Using Summaries

In this example we will explain the different ways to select summary functions.

# Load a dataset
from sklearn.datasets import load_iris
from pymfe.mfe import MFE

data = load_iris()
y = data.target
X = data.data

Summary Methods

Several meta-features produce multiple values, and mean and sd are the standard way to summarize them. To increase flexibility, PyMFE provides summary (or post-processing) methods for meta-features that produce multiple values. A summary method can be a descriptive statistic (resulting in a single value) or a distribution (resulting in multiple values).
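
To make the idea concrete, the sketch below mimics how single-value summary functions turn the several values of one meta-feature into named outputs. It is not PyMFE's implementation: the summarize helper and the SUMMARIES table are hypothetical, only the Python standard library is used, and the six cor values are copied (rounded) from the raw output shown at the end of this page.

```python
import statistics

# Raw values of the "cor" meta-feature, copied (rounded) from the raw
# output at the end of this page.
cor_values = [0.11756978, 0.87175378, 0.4284401,
              0.81794113, 0.36612593, 0.96286543]

# Hypothetical table of summary functions; each one maps many values
# to a single number.
SUMMARIES = {
    "max": max,
    "mean": statistics.mean,
    "min": min,
    "sd": statistics.stdev,  # sample standard deviation (divides by n - 1)
}


def summarize(feature, values, summary_names):
    """Return (names, results) with one '<feature>.<summary>' entry each."""
    names = ["{}.{}".format(feature, s) for s in summary_names]
    results = [SUMMARIES[s](values) for s in summary_names]
    return names, results


names, results = summarize("cor", cor_values, ["max", "mean", "min", "sd"])
for name, value in zip(names, results):
    print("{:20} {:.6f}".format(name, value))
```

Up to rounding of the inputs, the printed numbers agree with the cor.max, cor.mean, cor.min and cor.sd rows of the first example output on this page.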

The post-processing methods are set with the summary parameter. It is possible to compute the min, max, mean, median, kurtosis, and standard deviation, among others, as the following examples illustrate:

Apply several statistical measures as post processing

mfe = MFE(summary=["max", "min", "median", "mean", "var", "sd", "kurtosis",
                   "skewness"])
mfe.fit(X, y)
ft = mfe.extract()
# Print one "name value" row per extracted meta-feature.
print("\n".join("{:50} {:30}".format(name, value)
                for name, value in zip(ft[0], ft[1])))
attr_conc.kurtosis                                            -0.9474216477983255
attr_conc.max                                                  0.4299566853449739
attr_conc.mean                                                0.20980476831180148
attr_conc.median                                              0.18467386404867223
attr_conc.min                                                 0.08478331361536394
attr_conc.sd                                                   0.1195879817732128
attr_conc.skewness                                             0.7075924186351203
attr_conc.var                                                0.014301285384590275
attr_ent.kurtosis                                             -1.7072116699243152
attr_ent.max                                                   2.3156530476978263
attr_ent.mean                                                  2.2771912775084115
attr_ent.median                                                2.3034401979164256
attr_ent.min                                                    2.186231666502969
attr_ent.sd                                                   0.06103943244855649
attr_ent.skewness                                             -0.7209530933492252
attr_ent.var                                                0.0037258123136418905
attr_to_inst                                                  0.02666666666666667
best_node.kurtosis                                                           -3.0
best_node.max                                                  0.6666666666666666
best_node.mean                                                 0.6666666666666667
best_node.median                                               0.6666666666666666
best_node.min                                                  0.6666666666666666
best_node.sd                                               1.1702778228589004e-16
best_node.skewness                                                            0.0
best_node.var                                              1.3695501826753678e-32
can_cor.kurtosis                                                            -2.75
can_cor.max                                                    0.9848208927389822
can_cor.mean                                                   0.7280089563896481
can_cor.median                                                 0.7280089563896481
can_cor.min                                                   0.47119702004031394
can_cor.sd                                                     0.3631869233645244
can_cor.skewness                                          -2.5347649085285293e-16
can_cor.var                                                    0.1319047413029889
cat_to_num                                                                    0.0
class_conc.kurtosis                                             -2.34680678006496
class_conc.max                                                 0.4011425322248528
class_conc.mean                                               0.27347384133126745
class_conc.median                                             0.28650664619878463
class_conc.min                                                0.11973954070264788
class_conc.sd                                                 0.14091096327223987
class_conc.skewness                                          -0.07091647996659645
class_conc.var                                               0.019855899570310535
class_ent                                                       1.584962500721156
cor.kurtosis                                                  -1.9476130087221712
cor.max                                                        0.9628654314027961
cor.mean                                                        0.594116025760156
cor.median                                                     0.6231906153010576
cor.min                                                       0.11756978413300208
cor.sd                                                         0.3375443182856702
cor.skewness                                                 -0.18142911996033195
cor.var                                                       0.11393616680693783
cov.kurtosis                                                  -1.9705891027997176
cov.max                                                        1.2956093959731547
cov.mean                                                       0.5966542132736764
cov.median                                                     0.4229635346756151
cov.min                                                      0.042434004474272924
cov.sd                                                         0.5582672431248462
cov.skewness                                                  0.34072276443380106
cov.var                                                        0.3116623147462162
eigenvalues.kurtosis                                          -1.6906307400544616
eigenvalues.max                                                 4.228241706034867
eigenvalues.mean                                               1.1432392617449672
eigenvalues.median                                             0.1604401239857764
eigenvalues.min                                              0.023835092973449445
eigenvalues.sd                                                 2.0587713015069764
eigenvalues.skewness                                           0.7454458797939764
eigenvalues.var                                                 4.238539271908729
elite_nn.kurtosis                                             -0.4687499999999991
elite_nn.max                                                                  1.0
elite_nn.mean                                                  0.9333333333333333
elite_nn.median                                                0.9333333333333333
elite_nn.min                                                                  0.8
elite_nn.sd                                                   0.06285393610547088
elite_nn.skewness                                             -0.7159456159513794
elite_nn.var                                                 0.003950617283950616
eq_num_attr                                                    1.8780672345507194
freq_class.kurtosis                                                          -3.0
freq_class.max                                                 0.3333333333333333
freq_class.mean                                                0.3333333333333333
freq_class.median                                              0.3333333333333333
freq_class.min                                                 0.3333333333333333
freq_class.sd                                                                 0.0
freq_class.skewness                                                           0.0
freq_class.var                                                                0.0
g_mean.kurtosis                                                -1.876087805810185
g_mean.max                                                      5.785720390427728
g_mean.mean                                                    3.2230731578977903
g_mean.median                                                  3.1324323471229167
g_mean.min                                                     0.8417075469176013
g_mean.sd                                                      2.0229431040263726
g_mean.skewness                                               0.10017663652972701
g_mean.var                                                      4.092298802127854
gravity                                                        3.2082811597489393
h_mean.kurtosis                                               -1.8765954987057685
h_mean.max                                                      5.728905057850834
h_mean.mean                                                    2.9783891110628673
h_mean.median                                                  2.8449903044543063
h_mean.min                                                    0.49467077749202265
h_mean.sd                                                       2.145948231748242
h_mean.skewness                                                0.1382251313881372
h_mean.var                                                      4.605093813343408
inst_to_attr                                                                 37.5
iq_range.kurtosis                                              -1.809694974469229
iq_range.max                                                   3.4999999999999996
iq_range.mean                                                  1.7000000000000002
iq_range.median                                                1.4000000000000004
iq_range.min                                                                  0.5
iq_range.sd                                                    1.2754084313139324
iq_range.skewness                                               0.485861717653184
iq_range.var                                                    1.626666666666666
joint_ent.kurtosis                                            -2.3945662964722434
joint_ent.max                                                   3.410577680708083
joint_ent.mean                                                 3.0182209990602855
joint_ent.median                                               2.9901513033202027
joint_ent.min                                                  2.6820037088926547
joint_ent.sd                                                   0.3821875549207214
joint_ent.skewness                                            0.03611581267545158
joint_ent.var                                                  0.1460673271362794
kurtosis.kurtosis                                              -2.098903711032839
kurtosis.max                                                  0.13870467668072406
kurtosis.mean                                                 -0.8105361276250795
kurtosis.median                                               -0.9819958777250918
kurtosis.min                                                  -1.4168574317308589
kurtosis.sd                                                    0.7326910069728161
kurtosis.skewness                                             0.30302223794237043
kurtosis.var                                                   0.5368361116988393
leaves                                                                          9
leaves_branch.kurtosis                                         0.4284461976769669
leaves_branch.max                                                               5
leaves_branch.mean                                             3.7777777777777777
leaves_branch.median                                                          4.0
leaves_branch.min                                                               1
leaves_branch.sd                                               1.2018504251546631
leaves_branch.skewness                                        -1.1647123778290422
leaves_branch.var                                              1.4444444444444444
leaves_corrob.kurtosis                                        -1.7726882865481086
leaves_corrob.max                                              0.3333333333333333
leaves_corrob.mean                                             0.1111111111111111
leaves_corrob.median                                         0.013333333333333334
leaves_corrob.min                                            0.006666666666666667
leaves_corrob.sd                                              0.15051762539834182
leaves_corrob.skewness                                         0.6063813319643286
leaves_corrob.var                                            0.022655555555555557
leaves_homo.kurtosis                                          -1.1186086765355643
leaves_homo.max                                                              57.6
leaves_homo.mean                                                37.46666666666667
leaves_homo.median                                                           36.0
leaves_homo.min                                                              18.0
leaves_homo.sd                                                 13.142298124757328
leaves_homo.skewness                                            0.317544360680112
leaves_homo.var                                                            172.72
leaves_per_class.kurtosis                                     -2.3333333333333335
leaves_per_class.max                                           0.5555555555555556
leaves_per_class.mean                                          0.3333333333333333
leaves_per_class.median                                        0.3333333333333333
leaves_per_class.min                                           0.1111111111111111
leaves_per_class.sd                                           0.22222222222222224
leaves_per_class.skewness                                  2.1076890233118196e-16
leaves_per_class.var                                          0.04938271604938273
lh_trace                                                       32.477316568194915
linear_discr.kurtosis                                          1.1714277215943012
linear_discr.max                                                              1.0
linear_discr.mean                                              0.9800000000000001
linear_discr.median                                                           1.0
linear_discr.min                                               0.8666666666666667
linear_discr.sd                                               0.04499657051403685
linear_discr.skewness                                         -1.6391493111228852
linear_discr.var                                            0.0020246913580246905
mad.kurtosis                                                  -1.8614049069823586
mad.max                                                        1.8532499999999998
mad.mean                                                                1.0934175
mad.median                                                                1.03782
mad.min                                                       0.44477999999999973
mad.sd                                                         0.5785781994035033
mad.skewness                                                  0.21354801391337835
mad.var                                                            0.334752732825
max.kurtosis                                                   -2.182177604436795
max.max                                                                       7.9
max.mean                                                        5.425000000000001
max.median                                                                   5.65
max.min                                                                       2.5
max.sd                                                         2.4431878083083722
max.skewness                                                 -0.13254651618979896
max.var                                                         5.969166666666667
mean.kurtosis                                                 -1.9225347042283154
mean.max                                                        5.843333333333334
mean.mean                                                      3.4645000000000006
mean.median                                                     3.407666666666667
mean.min                                                       1.1993333333333336
mean.sd                                                         1.918485079431164
mean.skewness                                                 0.06361261265760602
mean.var                                                                 3.680585
median.kurtosis                                                 -2.04337146925642
median.max                                                                    5.8
median.mean                                                    3.6125000000000003
median.median                                                               3.675
median.min                                                                    1.3
median.sd                                                       1.919364043982624
median.skewness                                             -0.061080929963701326
median.var                                                     3.6839583333333326
min.kurtosis                                                  -1.9261232920910136
min.max                                                                       4.3
min.mean                                                       1.8499999999999999
min.median                                                                    1.5
min.min                                                                       0.1
min.sd                                                         1.8083141320025125
min.skewness                                                   0.3693439632179755
min.var                                                                      3.27
mut_inf.kurtosis                                               -2.303310024453824
mut_inf.max                                                    1.2015788914374017
mut_inf.mean                                                   0.8439327791692818
mut_inf.median                                                 0.9067678693618417
mut_inf.min                                                   0.36061648651604195
mut_inf.sd                                                     0.4222019352579773
mut_inf.skewness                                             -0.11787771034076516
mut_inf.var                                                   0.17825447413558124
naive_bayes.kurtosis                                          -1.1414812611540737
naive_bayes.max                                                               1.0
naive_bayes.mean                                               0.9533333333333334
naive_bayes.median                                             0.9333333333333333
naive_bayes.min                                                0.8666666666666667
naive_bayes.sd                                                0.04499657051403685
naive_bayes.skewness                                         -0.31221891640435945
naive_bayes.var                                               0.00202469135802469
nodes                                                                           8
nodes_per_attr                                                                2.0
nodes_per_inst                                                0.05333333333333334
nodes_per_level.kurtosis                                      -1.6700000000000004
nodes_per_level.max                                                             3
nodes_per_level.mean                                                          1.6
nodes_per_level.median                                                        1.0
nodes_per_level.min                                                             1
nodes_per_level.sd                                             0.8944271909999159
nodes_per_level.skewness                                        0.603738353924943
nodes_per_level.var                                                           0.8
nodes_repeated.kurtosis                                        -2.333333333333333
nodes_repeated.max                                                              4
nodes_repeated.mean                                            2.6666666666666665
nodes_repeated.median                                                         3.0
nodes_repeated.min                                                              1
nodes_repeated.sd                                              1.5275252316519465
nodes_repeated.skewness                                      -0.20782656212951636
nodes_repeated.var                                              2.333333333333333
nr_attr                                                                         4
nr_bin                                                                          0
nr_cat                                                                          0
nr_class                                                                        3
nr_cor_attr                                                                   0.5
nr_disc                                                                         2
nr_inst                                                                       150
nr_norm                                                                       1.0
nr_num                                                                          4
nr_outliers                                                                     1
ns_ratio                                                        1.698308838945616
num_to_cat                                                                    nan
one_nn.kurtosis                                               -1.3167187500000028
one_nn.max                                                                    1.0
one_nn.mean                                                                  0.96
one_nn.median                                                                 1.0
one_nn.min                                                     0.8666666666666667
one_nn.sd                                                     0.05621826951410451
one_nn.skewness                                               -0.7204063794571065
one_nn.var                                                  0.0031604938271604923
p_trace                                                         1.191898822470078
random_node.kurtosis                                                         -3.0
random_node.max                                                0.6666666666666666
random_node.mean                                               0.6666666666666667
random_node.median                                             0.6666666666666666
random_node.min                                                0.6666666666666666
random_node.sd                                             1.1702778228589004e-16
random_node.skewness                                                          0.0
random_node.var                                            1.3695501826753678e-32
range.kurtosis                                                -1.8858268700023024
range.max                                                                     5.9
range.mean                                                     3.5750000000000006
range.median                                                   3.0000000000000004
range.min                                                                     2.4
range.sd                                                       1.6500000000000001
range.skewness                                                 0.5188872193004419
range.var                                                                  2.7225
roy_root                                                       32.191925524310506
sd.kurtosis                                                   -1.7876277008883368
sd.max                                                         1.7652982332594662
sd.mean                                                        0.9478670787835934
sd.median                                                      0.7951518984691048
sd.min                                                         0.4358662849366982
sd.sd                                                          0.5712994109375844
sd.skewness                                                     0.541487250344505
sd.var                                                          0.326383016937631
sd_ratio                                                       1.2708666438750897
skewness.kurtosis                                             -2.3196335687878826
skewness.max                                                   0.3126147039228578
skewness.mean                                                 0.06273198447775732
skewness.median                                               0.10386208214673759
skewness.min                                                 -0.26941093030530366
skewness.sd                                                   0.29439896290757683
skewness.skewness                                            -0.10337620962487609
skewness.var                                                  0.08667074936105679
sparsity.kurtosis                                              -2.340915274336955
sparsity.max                                                 0.039048200122025624
sparsity.mean                                                  0.0287147773948895
sparsity.median                                              0.029555212805869355
sparsity.min                                                 0.016700483845793663
sparsity.sd                                                  0.011032357470087495
sparsity.skewness                                            -0.06436063304281459
sparsity.var                                               0.00012171291134779536
t_mean.kurtosis                                               -1.9391694118386529
t_mean.max                                                      5.797777777777777
t_mean.mean                                                    3.4705555555555554
t_mean.median                                                  3.4411111111111112
t_mean.min                                                     1.2022222222222223
t_mean.sd                                                      1.9048021402275979
t_mean.skewness                                                0.0327130494008835
t_mean.var                                                      3.628271193415637
tree_depth.kurtosis                                           -0.7921333239178021
tree_depth.max                                                                  5
tree_depth.mean                                                3.0588235294117645
tree_depth.median                                                             3.0
tree_depth.min                                                                  0
tree_depth.sd                                                  1.4348601079588785
tree_depth.skewness                                           -0.5738062414925108
tree_depth.var                                                 2.0588235294117645
tree_imbalance.kurtosis                                       -2.1867984670621965
tree_imbalance.max                                            0.35355339059327373
tree_imbalance.mean                                           0.19491705385114738
tree_imbalance.median                                         0.18313230988382748
tree_imbalance.min                                            0.05985020504366078
tree_imbalance.sd                                             0.13300709991513865
tree_imbalance.skewness                                       0.12675317882685808
tree_imbalance.var                                           0.017690888627835678
tree_shape.kurtosis                                          -0.28142447562999795
tree_shape.max                                                                0.5
tree_shape.mean                                                0.2708333333333333
tree_shape.median                                                            0.25
tree_shape.min                                                            0.15625
tree_shape.sd                                                 0.10711960313126631
tree_shape.skewness                                            0.9140413021207706
tree_shape.var                                                     0.011474609375
var.kurtosis                                                   -1.721549473595456
var.max                                                         3.116277852348993
var.mean                                                       1.1432392617449665
var.median                                                     0.6333498881431767
var.min                                                         0.189979418344519
var.sd                                                         1.3325463926454557
var.skewness                                                   0.6911005971389304
var.var                                                        1.7756798885524168
var_importance.kurtosis                                       -1.6933440581861985
var_importance.max                                             0.9226107085346216
var_importance.mean                                                          0.25
var_importance.median                                         0.03869464573268919
var_importance.min                                                            0.0
var_importance.sd                                             0.44925548152944056
var_importance.skewness                                        0.7416243043271853
var_importance.var                                            0.20183048768424952
w_lambda                                                     0.023438633222267347
worst_node.kurtosis                                           -1.5739109350454201
worst_node.max                                                 0.6666666666666666
worst_node.mean                                                              0.58
worst_node.median                                                             0.6
worst_node.min                                                 0.4666666666666667
worst_node.sd                                                  0.0773001205818937
worst_node.skewness                                          -0.24632978798366398
worst_node.var                                               0.005975308641975306
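
Because the names and values come back as two parallel lists, a long result like the one above is easier to query once the two are paired up. The following is a minimal sketch: the ft tuple is a hand-written stand-in with three entries copied from the output above, and the variable names features and cor_only are only illustrative.

```python
# Stand-in for the tuple returned by mfe.extract(): a list of names and
# a parallel list of values. The three entries are copied from the
# output above.
ft = (
    ["attr_to_inst", "cor.mean", "nr_inst"],
    [0.02666666666666667, 0.594116025760156, 150],
)

# Pair each name with its value for direct lookup.
features = dict(zip(ft[0], ft[1]))
print(features["cor.mean"])

# Keep only the summaries of one meta-feature by matching its prefix.
cor_only = {k: v for k, v in features.items() if k.startswith("cor.")}
print(cor_only)
```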

Apply quantiles as the post-processing method

mfe = MFE(features=["cor"], summary=["quantiles"])
mfe.fit(X, y)
ft = mfe.extract()
# Print one "name value" row per extracted meta-feature.
print("\n".join("{:50} {:30}".format(name, value)
                for name, value in zip(ft[0], ft[1])))
cor.quantiles.0                                               0.11756978413300208
cor.quantiles.1                                               0.38170447548496433
cor.quantiles.2                                                0.6231906153010576
cor.quantiles.3                                                0.8583006134828313
cor.quantiles.4                                                0.9628654314027961
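
The five numbers above can be read as the minimum, the three quartiles, and the maximum of the raw cor values. The sketch below checks that reading with the standard library only. It assumes linear interpolation between data points (the "inclusive" method of statistics.quantiles, Python 3.8+), which agrees with the printed output but is not taken from PyMFE's source. The input values are copied (rounded) from the raw output at the end of this page.

```python
import statistics

# Raw "cor" values, copied (rounded) from the raw output on this page.
cor_values = [0.11756978, 0.87175378, 0.4284401,
              0.81794113, 0.36612593, 0.96286543]

# Three inner cut points (25%, 50%, 75%). The "inclusive" method treats
# the data minimum and maximum as the 0% and 100% quantiles.
inner = statistics.quantiles(cor_values, n=4, method="inclusive")
quantiles = [min(cor_values)] + inner + [max(cor_values)]

for i, q in enumerate(quantiles):
    print("cor.quantiles.{} {:.6f}".format(i, q))
```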

Apply a histogram as the post-processing method

mfe = MFE(features=["cor"], summary=["histogram"])
mfe.fit(X, y)
ft = mfe.extract()
# Print one "name value" row per extracted meta-feature.
print("\n".join("{:50} {:30}".format(name, value)
                for name, value in zip(ft[0], ft[1])))
cor.histogram.0                                               0.16666666666666666
cor.histogram.1                                                               0.0
cor.histogram.2                                               0.16666666666666666
cor.histogram.3                                               0.16666666666666666
cor.histogram.4                                                               0.0
cor.histogram.5                                                               0.0
cor.histogram.6                                                               0.0
cor.histogram.7                                                               0.0
cor.histogram.8                                                0.3333333333333333
cor.histogram.9                                               0.16666666666666666
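
The ten values above sum to one, which suggests relative frequencies over ten equal-width bins spanning the minimum to the maximum of the raw cor values. The following sketch reproduces them under that assumption, using plain Python. The bin count and the normalization are inferred from the printed output, not from PyMFE's source, and the input values are copied (rounded) from the raw output at the end of this page.

```python
# Raw "cor" values, copied (rounded) from the raw output on this page.
cor_values = [0.11756978, 0.87175378, 0.4284401,
              0.81794113, 0.36612593, 0.96286543]
n_bins = 10  # assumed from the ten outputs printed above

low, high = min(cor_values), max(cor_values)
width = (high - low) / n_bins

counts = [0] * n_bins
for v in cor_values:
    # The maximum belongs to the last bin, so clamp the index.
    index = min(int((v - low) / width), n_bins - 1)
    counts[index] += 1

# Convert counts to relative frequencies.
freqs = [c / len(cor_values) for c in counts]
for i, f in enumerate(freqs):
    print("cor.histogram.{} {:.4f}".format(i, f))
```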

Get the raw values without summarizing them

mfe = MFE(features=["cor"], summary=None)
mfe.fit(X, y)
ft = mfe.extract()
print(ft)
(['cor'], [array([0.11756978, 0.87175378, 0.4284401 , 0.81794113, 0.36612593,
       0.96286543])])
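
With the raw values in hand, any post-processing that is not offered as a built-in summary can be applied manually. The sketch below shows two illustrative statistics on the values printed above. The fraction_above helper and the output names cor.range and cor.frac_above_0.5 are hypothetical and are not PyMFE features.

```python
# Raw "cor" values, copied from the output printed above.
cor_values = [0.11756978, 0.87175378, 0.4284401,
              0.81794113, 0.36612593, 0.96286543]


def fraction_above(values, threshold):
    """Share of values strictly greater than the threshold."""
    return sum(v > threshold for v in values) / len(values)


# Two custom summaries computed directly from the raw values.
value_range = max(cor_values) - min(cor_values)
strong = fraction_above(cor_values, 0.5)

print("cor.range", value_range)
print("cor.frac_above_0.5", strong)
```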
