Extracting meta-features from unsupervised learning

This example shows how to extract meta-features for unsupervised machine learning tasks.

# Load a dataset
from sklearn.datasets import load_iris
from pymfe.mfe import MFE

data = load_iris()
X = data.data
y = data.target  # not used below: unsupervised extraction only needs X

For unsupervised tasks, omit the target attribute when fitting the data to the MFE model. The pymfe package then automatically finds and extracts only the meta-features suitable for this type of task.

# Extract default unsupervised measures
mfe = MFE()
mfe.fit(X)
ft = mfe.extract()
print("\n".join("{:50} {:30}".format(name, value) for name, value in zip(ft[0], ft[1])))

# Extract all available unsupervised measures
mfe = MFE(groups="all")
mfe.fit(X)
ft = mfe.extract()
print("\n".join("{:50} {:30}".format(name, value) for name, value in zip(ft[0], ft[1])))
attr_conc.mean                                                0.20980476831180148
attr_conc.sd                                                   0.1195879817732128
attr_ent.mean                                                  2.2771912775084115
attr_ent.sd                                                   0.06103943244855649
attr_to_inst                                                  0.02666666666666667
cat_to_num                                                                    0.0
cor.mean                                                        0.594116025760156
cor.sd                                                         0.3375443182856702
cov.mean                                                       0.5966542132736764
cov.sd                                                         0.5582672431248462
eigenvalues.mean                                               1.1432392617449672
eigenvalues.sd                                                 2.0587713015069764
g_mean.mean                                                    3.2230731578977903
g_mean.sd                                                      2.0229431040263726
h_mean.mean                                                    2.9783891110628673
h_mean.sd                                                       2.145948231748242
inst_to_attr                                                                 37.5
iq_range.mean                                                  1.7000000000000002
iq_range.sd                                                    1.2754084313139324
kurtosis.mean                                                 -0.8105361276250795
kurtosis.sd                                                    0.7326910069728161
mad.mean                                                                1.0934175
mad.sd                                                         0.5785781994035033
max.mean                                                        5.425000000000001
max.sd                                                         2.4431878083083722
mean.mean                                                      3.4645000000000006
mean.sd                                                         1.918485079431164
median.mean                                                    3.6125000000000003
median.sd                                                       1.919364043982624
min.mean                                                       1.8499999999999999
min.sd                                                         1.8083141320025125
nr_attr                                                                         4
nr_bin                                                                          0
nr_cat                                                                          0
nr_cor_attr                                                                   0.5
nr_inst                                                                       150
nr_norm                                                                       1.0
nr_num                                                                          4
nr_outliers                                                                     1
num_to_cat                                                                    nan
range.mean                                                     3.5750000000000006
range.sd                                                       1.6500000000000001
sd.mean                                                        0.9478670787835934
sd.sd                                                          0.5712994109375844
skewness.mean                                                 0.06273198447775732
skewness.sd                                                   0.29439896290757683
sparsity.mean                                                  0.0287147773948895
sparsity.sd                                                  0.011032357470087495
t_mean.mean                                                    3.4705555555555554
t_mean.sd                                                      1.9048021402275979
var.mean                                                       1.1432392617449665
var.sd                                                         1.3325463926454557
attr_conc.mean                                                0.20980476831180148
attr_conc.sd                                                   0.1195879817732128
attr_ent.mean                                                  2.2771912775084115
attr_ent.sd                                                   0.06103943244855649
attr_to_inst                                                  0.02666666666666667
cat_to_num                                                                    0.0
cohesiveness.mean                                               67.10333333333334
cohesiveness.sd                                                 5.355733510152213
cor.mean                                                        0.594116025760156
cor.sd                                                         0.3375443182856702
cov.mean                                                       0.5966542132736764
cov.sd                                                         0.5582672431248462
eigenvalues.mean                                               1.1432392617449672
eigenvalues.sd                                                 2.0587713015069764
g_mean.mean                                                    3.2230731578977903
g_mean.sd                                                      2.0229431040263726
h_mean.mean                                                    2.9783891110628673
h_mean.sd                                                       2.145948231748242
inst_to_attr                                                                 37.5
iq_range.mean                                                  1.7000000000000002
iq_range.sd                                                    1.2754084313139324
kurtosis.mean                                                 -0.8105361276250795
kurtosis.sd                                                    0.7326910069728161
mad.mean                                                                1.0934175
mad.sd                                                         0.5785781994035033
max.mean                                                        5.425000000000001
max.sd                                                         2.4431878083083722
mean.mean                                                      3.4645000000000006
mean.sd                                                         1.918485079431164
median.mean                                                    3.6125000000000003
median.sd                                                       1.919364043982624
min.mean                                                       1.8499999999999999
min.sd                                                         1.8083141320025125
nr_attr                                                                         4
nr_bin                                                                          0
nr_cat                                                                          0
nr_cor_attr                                                                   0.5
nr_inst                                                                       150
nr_norm                                                                       1.0
nr_num                                                                          4
nr_outliers                                                                     1
num_to_cat                                                                    nan
one_itemset.mean                                                              0.2
one_itemset.sd                                                0.04993563108104261
range.mean                                                     3.5750000000000006
range.sd                                                       1.6500000000000001
sd.mean                                                        0.9478670787835934
sd.sd                                                          0.5712994109375844
skewness.mean                                                 0.06273198447775732
skewness.sd                                                   0.29439896290757683
sparsity.mean                                                  0.0287147773948895
sparsity.sd                                                  0.011032357470087495
t2                                                            0.02666666666666667
t3                                                           0.013333333333333334
t4                                                                            0.5
t_mean.mean                                                    3.4705555555555554
t_mean.sd                                                      1.9048021402275979
two_itemset.mean                                                             0.32
two_itemset.sd                                                 0.0851125499534728
var.mean                                                       1.1432392617449665
var.sd                                                         1.3325463926454557
wg_dist.mean                                                   0.4620901765870531
wg_dist.sd                                                    0.05612193762635788
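
The listing above appears to contain both printed results back to back: the default run first, then the groups="all" run, which starts again at attr_conc.mean. To see which measures the second run adds, the two name lists can be compared. The following is a minimal sketch, not part of the original example: the helper added_measures is hypothetical, and the name lists are short excerpts copied by hand from the output above rather than produced by pymfe.

```python
# Excerpts of the meta-feature names printed above (values omitted).
# Assumption: these hand-copied lists stand in for ft[0] of each run.
default_names = ["attr_conc.mean", "attr_conc.sd", "cor.mean", "nr_inst", "var.sd"]
all_names = [
    "attr_conc.mean", "attr_conc.sd", "cohesiveness.mean", "cohesiveness.sd",
    "cor.mean", "nr_inst", "one_itemset.mean", "one_itemset.sd",
    "t2", "t3", "t4", "two_itemset.mean", "two_itemset.sd",
    "var.sd", "wg_dist.mean", "wg_dist.sd",
]


def added_measures(base, extended):
    """Return the measures found only in `extended`.

    Hypothetical helper: the summary suffix (".mean", ".sd") is stripped,
    so each measure is reported once.
    """
    base_set = set(base)
    extra = {name.split(".")[0] for name in extended if name not in base_set}
    return sorted(extra)


extra = added_measures(default_names, all_names)
print(extra)
```

With the real results, the same helper could be called on the first element of each tuple returned by extract(), since the example above reads the names from ft[0].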

Total running time of the script: (0 minutes 0.306 seconds)
