Working with the results

In this example, we show how to work with the results of metafeature extraction.

from sklearn.datasets import load_iris
from pymfe.mfe import MFE

data = load_iris()
y = data.target
X = data.data

Parsing a subset of metafeatures

After extracting metafeatures, parse a subset of interest from the results.

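# Extract three groups of metafeatures; measure_time="avg" also records timing.
model = MFE(groups=["relative", "general", "model-based"], measure_time="avg")
model.fit(X, y)
ft = model.extract()
# ft[0] holds the metafeature names and ft[1] the corresponding values.
print("\n".join("{:50} {:30}".format(x, y) for x, y in zip(ft[0], ft[1])))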
attr_to_inst                                                  0.02666666666666667
best_node.mean.relative                                                       3.0
best_node.sd.relative                                                         1.0
cat_to_num                                                                    0.0
elite_nn.mean.relative                                                        4.0
elite_nn.sd.relative                                                          6.0
freq_class.mean                                                0.3333333333333333
freq_class.sd                                                                 0.0
inst_to_attr                                                                 37.5
leaves                                                                          9
leaves_branch.mean                                             3.7777777777777777
leaves_branch.sd                                               1.2018504251546631
leaves_corrob.mean                                             0.1111111111111111
leaves_corrob.sd                                              0.15051762539834182
leaves_homo.mean                                                37.46666666666667
leaves_homo.sd                                                 13.142298124757328
leaves_per_class.mean                                          0.3333333333333333
leaves_per_class.sd                                           0.22222222222222224
linear_discr.mean.relative                                                    7.0
linear_discr.sd.relative                                                      2.5
naive_bayes.mean.relative                                                     5.0
naive_bayes.sd.relative                                                       2.5
nodes                                                                           8
nodes_per_attr                                                                2.0
nodes_per_inst                                                0.05333333333333334
nodes_per_level.mean                                                          1.6
nodes_per_level.sd                                             0.8944271909999159
nodes_repeated.mean                                            2.6666666666666665
nodes_repeated.sd                                              0.5773502691896258
nr_attr                                                                         4
nr_bin                                                                          0
nr_cat                                                                          0
nr_class                                                                        3
nr_inst                                                                       150
nr_num                                                                          4
num_to_cat                                                                    nan
one_nn.mean.relative                                                          6.0
one_nn.sd.relative                                                            5.0
random_node.mean.relative                                                     2.0
random_node.sd.relative                                                       4.0
tree_depth.mean                                                3.0588235294117645
tree_depth.sd                                                  1.4348601079588785
tree_imbalance.mean                                           0.19491705385114738
tree_imbalance.sd                                             0.13300709991513865
tree_shape.mean                                                0.2708333333333333
tree_shape.sd                                                 0.10711960313126631
var_importance.mean                                                          0.25
var_importance.sd                                             0.27845186989521703
worst_node.mean.relative                                                      1.0
worst_node.sd.relative                                                        7.0
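
Because the extract output consists of parallel sequences (names in `ft[0]`, values in `ft[1]`), looking up a single metafeature by name is easier once the two are paired into a dictionary. The following is a minimal sketch using a few values copied from the output above; the helper name `to_dict` is hypothetical and not part of pymfe.

```python
def to_dict(names, values):
    """Pair metafeature names with their values (hypothetical helper)."""
    return dict(zip(names, values))


# A few entries copied from the output above, standing in for ft[0] and ft[1].
names = ["attr_to_inst", "inst_to_attr", "nr_attr", "nr_class", "nr_inst"]
values = [0.02666666666666667, 37.5, 4, 3, 150]

features = to_dict(names, values)
print(features["nr_inst"])       # 150
print(features["inst_to_attr"])  # 37.5
```

With the real results, the same helper would be called as `to_dict(ft[0], ft[1])`.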

From the extract output, parse only the ‘general’ metafeatures:

ft_general = model.parse_by_group("general", ft)
print("\n".join("{:50} {:30}".format(x, y) for x, y in zip(ft_general[0],
                                                           ft_general[1])))
attr_to_inst                                                  0.02666666666666667
cat_to_num                                                                    0.0
freq_class.mean                                                0.3333333333333333
freq_class.sd                                                                 0.0
inst_to_attr                                                                 37.5
nr_attr                                                                         4
nr_bin                                                                          0
nr_cat                                                                          0
nr_class                                                                        3
nr_inst                                                                       150
nr_num                                                                          4
num_to_cat                                                                    nan
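
Note that `num_to_cat` is `nan` in the output above, presumably because the dataset has no categorical attributes (`nr_cat` is 0), so the ratio is undefined. Before feeding the values to another tool, it may help to drop such entries. The sketch below does this in plain Python on a few values copied from the output; it is an illustration, not a pymfe feature.

```python
import math

# Entries copied from the 'general' output above.
names = ["cat_to_num", "nr_cat", "nr_num", "num_to_cat"]
values = [0.0, 0, 4, float("nan")]

# Keep only the (name, value) pairs whose value is not NaN.
defined = [(n, v) for n, v in zip(names, values) if not math.isnan(v)]
print([n for n, _ in defined])  # ['cat_to_num', 'nr_cat', 'nr_num']
```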

You can also parse by several groups at once. In this case, each selected metafeature must belong to at least one of the given groups.

ft_subset = model.parse_by_group(["general", "model-based"], ft)
print("\n".join("{:50} {:30}".format(x, y) for x, y in zip(ft_subset[0],
                                                           ft_subset[1])))
attr_to_inst                                                  0.02666666666666667
cat_to_num                                                                    0.0
freq_class.mean                                                0.3333333333333333
freq_class.sd                                                                 0.0
inst_to_attr                                                                 37.5
leaves                                                                          9
leaves_branch.mean                                             3.7777777777777777
leaves_branch.sd                                               1.2018504251546631
leaves_corrob.mean                                             0.1111111111111111
leaves_corrob.sd                                              0.15051762539834182
leaves_homo.mean                                                37.46666666666667
leaves_homo.sd                                                 13.142298124757328
leaves_per_class.mean                                          0.3333333333333333
leaves_per_class.sd                                           0.22222222222222224
nodes                                                                           8
nodes_per_attr                                                                2.0
nodes_per_inst                                                0.05333333333333334
nodes_per_level.mean                                                          1.6
nodes_per_level.sd                                             0.8944271909999159
nodes_repeated.mean                                            2.6666666666666665
nodes_repeated.sd                                              0.5773502691896258
nr_attr                                                                         4
nr_bin                                                                          0
nr_cat                                                                          0
nr_class                                                                        3
nr_inst                                                                       150
nr_num                                                                          4
num_to_cat                                                                    nan
tree_depth.mean                                                3.0588235294117645
tree_depth.sd                                                  1.4348601079588785
tree_imbalance.mean                                           0.19491705385114738
tree_imbalance.sd                                             0.13300709991513865
tree_shape.mean                                                0.2708333333333333
tree_shape.sd                                                 0.10711960313126631
var_importance.mean                                                          0.25
var_importance.sd                                             0.27845186989521703
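
To make the selection rule concrete, here is a small sketch of filtering parallel sequences by group membership. The `GROUP_OF` table and the `select_by_groups` function are hypothetical and only illustrate the idea; they are not how pymfe is implemented.

```python
# Hypothetical group membership table, for illustration only.
GROUP_OF = {
    "nr_inst": "general",
    "nr_class": "general",
    "leaves": "model-based",
    "nodes": "model-based",
    "one_nn.mean.relative": "relative",
}


def select_by_groups(groups, results):
    """Keep entries whose metafeature belongs to any of `groups`.

    `results` is a tuple of parallel sequences with the names first,
    mirroring the shape of the extract output.
    """
    wanted = set(groups)
    keep = [i for i, name in enumerate(results[0])
            if GROUP_OF.get(name) in wanted]
    return tuple([seq[i] for i in keep] for seq in results)


# Names and values copied from the output above.
results = (
    ["leaves", "nodes", "nr_class", "nr_inst", "one_nn.mean.relative"],
    [9, 8, 3, 150, 6.0],
)
subset = select_by_groups(["general", "model-based"], results)
print(subset[0])  # ['leaves', 'nodes', 'nr_class', 'nr_inst']
print(subset[1])  # [9, 8, 3, 150]
```

The 'relative' entry is dropped because its group is not in the requested list, which matches what the output above shows for the real results.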

This may be an uncommon scenario, since the user has usually already instantiated an MFE model to extract the metafeatures, but there is no need to instantiate an MFE model to parse the results.

ft_subset = MFE.parse_by_group(["general", "model-based"], ft)
print("\n".join("{:50} {:30}".format(x, y) for x, y in zip(ft_subset[0],
                                                           ft_subset[1])))
attr_to_inst                                                  0.02666666666666667
cat_to_num                                                                    0.0
freq_class.mean                                                0.3333333333333333
freq_class.sd                                                                 0.0
inst_to_attr                                                                 37.5
leaves                                                                          9
leaves_branch.mean                                             3.7777777777777777
leaves_branch.sd                                               1.2018504251546631
leaves_corrob.mean                                             0.1111111111111111
leaves_corrob.sd                                              0.15051762539834182
leaves_homo.mean                                                37.46666666666667
leaves_homo.sd                                                 13.142298124757328
leaves_per_class.mean                                          0.3333333333333333
leaves_per_class.sd                                           0.22222222222222224
nodes                                                                           8
nodes_per_attr                                                                2.0
nodes_per_inst                                                0.05333333333333334
nodes_per_level.mean                                                          1.6
nodes_per_level.sd                                             0.8944271909999159
nodes_repeated.mean                                            2.6666666666666665
nodes_repeated.sd                                              0.5773502691896258
nr_attr                                                                         4
nr_bin                                                                          0
nr_cat                                                                          0
nr_class                                                                        3
nr_inst                                                                       150
nr_num                                                                          4
num_to_cat                                                                    nan
tree_depth.mean                                                3.0588235294117645
tree_depth.sd                                                  1.4348601079588785
tree_imbalance.mean                                           0.19491705385114738
tree_imbalance.sd                                             0.13300709991513865
tree_shape.mean                                                0.2708333333333333
tree_shape.sd                                                 0.10711960313126631
var_importance.mean                                                          0.25
var_importance.sd                                             0.27845186989521703
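
Calling a method directly on the class, as above, is what Python's `classmethod` (or `staticmethod`) decorator allows. The sketch below shows the mechanism with a hypothetical `Extractor` class and a hypothetical `parse_by_prefix` method; whether pymfe implements `parse_by_group` exactly this way is an assumption here.

```python
class Extractor:
    """Hypothetical stand-in for a class with a class-level parser."""

    @classmethod
    def parse_by_prefix(cls, prefix, results):
        """Keep the entries whose name starts with `prefix`."""
        names, values = results
        keep = [i for i, n in enumerate(names) if n.startswith(prefix)]
        return [names[i] for i in keep], [values[i] for i in keep]


results = (["nr_attr", "nr_inst", "nodes"], [4, 150, 8])

# Called on the class itself and on an instance: both give the same result.
from_class = Extractor.parse_by_prefix("nr_", results)
from_instance = Extractor().parse_by_prefix("nr_", results)
print(from_class)                   # (['nr_attr', 'nr_inst'], [4, 150])
print(from_class == from_instance)  # True
```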

