Welcome to PyMFE’s documentation!

Install

Requirements

The PyMFE package requires the following dependencies:

numpy

scipy

scikit-learn

patsy

pandas

statsmodels

texttable

tqdm

gower

igraph

Install

PyMFE is available on PyPI. You can install it via pip as follows:

pip install -U pymfe

You can also install the development version directly from GitHub:

pip install -U git+https://github.com/ealcobaca/pymfe.git

If you prefer, you can clone the repository and install it from source. Use the following commands to get a copy from GitHub and install it along with its dependencies:

git clone https://github.com/ealcobaca/pymfe.git
cd pymfe
pip install .

Test and coverage

If you want to run the tests and check test coverage before installing:

$ make install-dev
$ make test-cov

Using PyMFE

Extracting meta-features with PyMFE is easy.

The simplest way to extract meta-features is by instantiating the MFE class. By default, it computes five groups of meta-features, using mean and standard deviation as summary functions: General, Statistical, Information-theoretic, Model-based, and Landmarking. The fit method is called with X and y, and the extract method then computes the selected measures. A simple example of using PyMFE for a supervised task is given next:

# Load a dataset
from sklearn.datasets import load_iris
from pymfe.mfe import MFE

data = load_iris()
y = data.target
X = data.data

# Extract default measures
mfe = MFE()
mfe.fit(X, y)
ft = mfe.extract()
print(ft)

# Extract general, statistical and information-theoretic measures
mfe = MFE(groups=["general", "statistical", "info-theory"])
mfe.fit(X, y)
ft = mfe.extract()
print(ft)

For more examples, see the examples gallery in the documentation.
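The extract method returns a pair of aligned lists: the meta-feature names and their computed values. A small plain-Python helper (illustrative, not part of the PyMFE API) can turn that pair into a dictionary for easier lookup:

```python
def features_to_dict(ft):
    """Pair meta-feature names with their values.

    `ft` mimics the (names, values) tuple returned by MFE.extract().
    """
    names, values = ft
    return dict(zip(names, values))

# Hypothetical extract() output, shown here as literals:
ft = (["nr_inst", "nr_attr"], [150, 4])
features = features_to_dict(ft)
print(features["nr_attr"])  # prints 4
```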

Meta-feature Description Table

The table below lists, for each meta-feature, its group, a short description, and a paper reference. Examples of how to compute each meta-feature are available in the examples gallery.

Meta-feature description

Each entry below gives, in order: the group, the meta-feature name, its description, and the reference.

clustering

ch

Compute the Calinski and Harabasz index.

[1] T. Calinski, J. Harabasz, A dendrite method for cluster analysis, Commun. Stat. Theory Methods 3 (1) (1974) 1–27.

clustering

int

Compute the INT index.

[1] Bezdek, J. C.; Pal, N. R. (1998a). Some new indexes of cluster validity. IEEE Transactions on Systems, Man, and Cybernetics, Part B, v.28, n.3, p.301–315.

clustering

nre

Compute the normalized relative entropy.

[1] Bruno Almeida Pimentel, André C.P.L.F. de Carvalho. A new data characterization for selecting clustering algorithms using meta-learning. Information Sciences, Volume 477, 2019, Pages 203-219.

clustering

pb

Compute the Pearson correlation between class matching and instance distances.

[1] J. Lev, “The Point Biserial Coefficient of Correlation”, Ann. Math. Statist., Vol. 20, no.1, pp. 125-126, 1949.

clustering

sc

Compute the number of clusters with size smaller than a given size.

[1] Bruno Almeida Pimentel, André C.P.L.F. de Carvalho. A new data characterization for selecting clustering algorithms using meta-learning. Information Sciences, Volume 477, 2019, Pages 203-219.

clustering

sil

Compute the mean silhouette value.

[1] P.J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math. 20 (1987) 53–65.

clustering

vdb

Compute the Davies and Bouldin Index.

[1] D.L. Davies, D.W. Bouldin, A cluster separation measure, IEEE Trans. Pattern Anal. Mach. Intell. 1 (2) (1979) 224–227.

clustering

vdu

Compute the Dunn Index.

[1] J.C. Dunn, Well-separated clusters and optimal fuzzy partitions, J. Cybern. 4 (1) (1974) 95–104.
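To make the clustering group more concrete, the Calinski-Harabasz index (the ch meta-feature) can be sketched with NumPy alone; the function below is an illustration of the index, not PyMFE's implementation:

```python
import numpy as np

def calinski_harabasz(X, y):
    """Calinski-Harabasz index: between- vs. within-cluster dispersion."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, k = X.shape[0], len(classes)
    overall_mean = X.mean(axis=0)
    ssb = ssw = 0.0
    for c in classes:
        Xc = X[y == c]
        center = Xc.mean(axis=0)
        ssb += len(Xc) * np.sum((center - overall_mean) ** 2)  # between-group
        ssw += np.sum((Xc - center) ** 2)                      # within-group
    return (ssb / (k - 1)) / (ssw / (n - k))
```

Well-separated, compact classes yield large values of the index.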

complexity

c1

Compute the entropy of class proportions.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 15). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107.

complexity

c2

Compute the imbalance ratio.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 16). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107.

complexity

cls_coef

Clustering coefficient.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 9). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107.

complexity

density

Average density of the network.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 9). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107.

complexity

f1

Maximum Fisher’s discriminant ratio.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 9). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107. [2] Ramón A Mollineda, José S Sánchez, and José M Sotoca. Data characterization for effective prototype selection. In 2nd Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA), pages 27–34, 2005.

complexity

f1v

Directional-vector maximum Fisher’s discriminant ratio.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 9). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107. [2] Witold Malina. Two-parameter fisher criterion. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 31(4):629–636, 2001.

complexity

f2

Volume of the overlapping region.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 9). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107. [2] Marcilio C P Souto, Ana C Lorena, Newton Spolaôr, and Ivan G Costa. Complexity measures of supervised classification tasks: a case study for cancer gene expression data. In International Joint Conference on Neural Networks (IJCNN), pages 1352–1358, 2010. [3] Lisa Cummins. Combining and Choosing Case Base Maintenance Algorithms. PhD thesis, National University of Ireland, Cork, 2013.

complexity

f3

Compute feature maximum individual efficiency.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 6). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107.

complexity

f4

Compute the collective feature efficiency.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 7). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107.

complexity

hubs

Hub score.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 9). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107.

complexity

l1

Sum of error distance by linear programming.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 9). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107.

complexity

l2

Compute the OVO subsets error rate of a linear classifier.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 9). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107.

complexity

l3

Non-linearity of a linear classifier.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 9). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107.

complexity

lsc

Local set average cardinality.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 15). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107. [2] Enrique Leyva, Antonio González, and Raúl Pérez. A set of complexity measures designed for applying meta-learning to instance selection. IEEE Transactions on Knowledge and Data Engineering, 27(2):354–367, 2014.

complexity

n1

Compute the fraction of borderline points.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 9-10). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107.

complexity

n2

Ratio of intra and extra class nearest neighbor distance.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 9). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107.

complexity

n3

Error rate of the nearest neighbor classifier.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 9). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107.

complexity

n4

Compute the non-linearity of the k-NN Classifier.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 9-11). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107.

complexity

t1

Fraction of hyperspheres covering data.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 9). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107. [2] Tin K Ho and Mitra Basu. Complexity measures of supervised classification problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3):289–300, 2002.

complexity

t2

Compute the average number of features per point.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 15). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107.

complexity

t3

Compute the average number of PCA dimensions per point.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 15). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107.

complexity

t4

Compute the ratio of the PCA dimension to the original dimension.

[1] Ana C. Lorena, Luís P. F. Garcia, Jens Lehmann, Marcilio C. P. Souto, and Tin K. Ho. How Complex is your classification problem? A survey on measuring classification complexity (V2). (2019) (Cited on page 15). Published in ACM Computing Surveys (CSUR), Volume 52 Issue 5, October 2019, Article No. 107.
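As an illustration of the idea behind f1, a two-class version of the maximum Fisher's discriminant ratio can be written directly; this is a sketch only, not PyMFE's exact formulation (which also handles multiclass data and rescales the result):

```python
import numpy as np

def max_fisher_ratio(X, y):
    """Largest per-feature Fisher discriminant ratio for a two-class problem."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    c0, c1 = np.unique(y)
    A, B = X[y == c0], X[y == c1]
    # (difference of class means)^2 over the sum of class variances, per feature
    ratio = (A.mean(axis=0) - B.mean(axis=0)) ** 2 / (A.var(axis=0) + B.var(axis=0))
    return float(ratio.max())
```

Larger values mean that at least one feature separates the two classes well.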

concept

cohesiveness

Compute the improved version of the weighted distance, which captures how dense or sparse the example distribution is.

[1] Vilalta, R and Drissi, Y (2002). A Characterization of Difficult Problems in Classification. Proceedings of the 2002 International Conference on Machine Learning and Applications (pp. 133-138).

concept

conceptvar

Compute the concept variation that estimates the variability of class labels among examples.

[1] Vilalta, R. (1999). Understanding accuracy performance through concept characterization and algorithm analysis. In Proceedings of the ICML-99 workshop on recent advances in meta-learning and future work (pp. 3-9).

concept

impconceptvar

Compute the improved concept variation that estimates the variability of class labels among examples.

[1] Vilalta, R and Drissi, Y (2002). A Characterization of Difficult Problems in Classification. Proceedings of the 2002 International Conference on Machine Learning and Applications (pp. 133-138).

concept

wg_dist

Compute the weighted distance, which captures how dense or sparse the example distribution is.

[1] Vilalta, R. (1999). Understanding accuracy performance through concept characterization and algorithm analysis. In Proceedings of the ICML-99 workshop on recent advances in meta-learning and future work (pp. 3-9).

general

attr_to_inst

Compute the ratio between the number of attributes and the number of instances.

[1] Alexandros Kalousis and Theoharis Theoharis. NOEMON: Design, implementation and performance results of an intelligent assistant for classifier selection. Intelligent Data Analysis, 3(5):319–337, 1999.

general

cat_to_num

Compute the ratio between the number of categorical and numeric features.

[1] Matthias Feurer, Jost Tobias Springenberg, and Frank Hutter. Using meta-learning to initialize Bayesian optimization of hyperparameters. In International Conference on Meta-learning and Algorithm Selection (MLAS), pages 3 – 10, 2014.

general

freq_class

Compute the relative frequency of each distinct class.

[1] Guido Lindner and Rudi Studer. AST: Support for algorithm selection with a CBR approach. In European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), pages 418 – 423, 1999.

general

inst_to_attr

Compute the ratio between the number of instances and attributes.

[1] Petr Kuba, Pavel Brazdil, Carlos Soares, and Adam Woznica. Exploiting sampling and meta-learning for parameter setting for support vector machines. In 8th IBERAMIA Workshop on Learning and Data Mining, pages 209 – 216, 2002.

general

nr_attr

Compute the total number of attributes.

[1] Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

general

nr_bin

Compute the number of binary attributes.

[1] Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

general

nr_cat

Compute the number of categorical attributes.

[1] Robert Engels and Christiane Theusinger. Using a data metric for preprocessing advice for data mining applications. In 13th European Conference on Artificial Intelligence (ECAI), pages 430 – 434, 1998.

general

nr_class

Compute the number of distinct classes.

[1] Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

general

nr_inst

Compute the number of instances (rows) in the dataset.

[1] Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

general

nr_num

Compute the number of numeric features.

[1] Robert Engels and Christiane Theusinger. Using a data metric for preprocessing advice for data mining applications. In 13th European Conference on Artificial Intelligence (ECAI), pages 430 – 434, 1998.

general

num_to_cat

Compute the ratio between the number of numerical and categorical features.

[1] Matthias Feurer, Jost Tobias Springenberg, and Frank Hutter. Using meta-learning to initialize Bayesian optimization of hyperparameters. In International Conference on Meta-learning and Algorithm Selection (MLAS), pages 3 – 10, 2014.
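Several of the general meta-features above are simple enough to write out directly. The sketch below (NumPy only, illustrative rather than PyMFE's code) computes a few of them:

```python
import numpy as np

def general_measures(X, y):
    """Compute a handful of simple general meta-features by hand."""
    X = np.asarray(X)
    n_inst, n_attr = X.shape
    return {
        "nr_inst": n_inst,                  # number of instances (rows)
        "nr_attr": n_attr,                  # number of attributes (columns)
        "attr_to_inst": n_attr / n_inst,    # attributes per instance
        "inst_to_attr": n_inst / n_attr,    # instances per attribute
        "nr_class": len(np.unique(y)),      # number of distinct classes
    }

X = np.arange(12).reshape(6, 2)
print(general_measures(X, [0, 0, 1, 1, 2, 2]))
```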

info-theory

attr_conc

Compute the concentration coefficient of each pair of distinct attributes.

[1] Alexandros Kalousis and Melanie Hilario. Model selection via meta-learning: a comparative study. International Journal on Artificial Intelligence Tools, 10(4):525–554, 2001.

info-theory

attr_ent

Compute Shannon’s entropy for each predictive attribute.

[1] Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

info-theory

class_conc

Compute concentration coefficient between each attribute and class.

[1] Alexandros Kalousis and Melanie Hilario. Model selection via meta-learning: a comparative study. International Journal on Artificial Intelligence Tools, 10(4):525–554, 2001.

info-theory

class_ent

Compute target attribute Shannon’s entropy.

[1] Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

info-theory

eq_num_attr

Compute the number of attributes equivalent for a predictive task.

[1] Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

info-theory

joint_ent

Compute the joint entropy between each attribute and class.

[1] Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

info-theory

mut_inf

Compute the mutual information between each attribute and target.

[1] Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

info-theory

ns_ratio

Compute the noisiness of attributes.

[1] Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.
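The entropy-based measures in this group all reduce to Shannon's entropy over discrete values. A minimal sketch follows (not PyMFE's implementation, which also discretizes numeric attributes first):

```python
import numpy as np

def entropy(values):
    """Shannon entropy (base 2) of a discrete sequence."""
    _, counts = np.unique(list(values), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mutual_information(attr, target):
    """mut_inf-style measure: H(attr) + H(target) - H(attr, target)."""
    # Encode each (attribute, target) pair as a string so joint entropy
    # can reuse the same entropy helper.
    pairs = [f"{a}|{t}" for a, t in zip(attr, target)]
    return entropy(attr) + entropy(target) - entropy(pairs)
```

An attribute identical to the class has mutual information equal to the class entropy, while an unrelated attribute has mutual information near zero.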

itemset

one_itemset

Compute the one itemset meta-feature.

[1] Song, Q., Wang, G., & Wang, C. (2012). Automatic recommendation of classification algorithms based on data set characteristics. Pattern recognition, 45(7), 2672-2689.

itemset

two_itemset

Compute the two itemset meta-feature.

[1] Song, Q., Wang, G., & Wang, C. (2012). Automatic recommendation of classification algorithms based on data set characteristics. Pattern recognition, 45(7), 2672-2689.
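The itemset measures count frequencies of discretized attribute values. Below is a rough sketch of the one-itemset idea using equal-width binning; PyMFE's actual discretization may differ:

```python
import numpy as np

def one_itemset(X, n_bins=2):
    """Relative frequency of each equal-width bin, per attribute."""
    X = np.asarray(X, dtype=float)
    freqs = []
    for col in X.T:
        # Inner bin edges for equal-width discretization of this attribute
        edges = np.linspace(col.min(), col.max(), n_bins + 1)
        binned = np.clip(np.digitize(col, edges[1:-1]), 0, n_bins - 1)
        counts = np.bincount(binned, minlength=n_bins)
        freqs.extend(counts / len(col))
    return freqs
```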

landmarking

best_node

Performance of the best single decision tree node.

[1] Hilan Bensusan and Christophe Giraud-Carrier. Discovering task neighbourhoods through landmark learning performances. In 4th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), pages 325 – 330, 2000. [2] Johannes Furnkranz and Johann Petrak. An evaluation of landmarking variants. In 1st ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 57 – 68, 2001.

landmarking

elite_nn

Performance of Elite Nearest Neighbor.

[1] Hilan Bensusan and Christophe Giraud-Carrier. Discovering task neighbourhoods through landmark learning performances. In 4th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), pages 325 – 330, 2000.

landmarking

linear_discr

Performance of the Linear Discriminant classifier.

[1] Hilan Bensusan and Christophe Giraud-Carrier. Discovering task neighbourhoods through landmark learning performances. In 4th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), pages 325 – 330, 2000. [2] Johannes Furnkranz and Johann Petrak. An evaluation of landmarking variants. In 1st ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 57 – 68, 2001.

landmarking

naive_bayes

Performance of the Naive Bayes classifier.

[1] Hilan Bensusan and Christophe Giraud-Carrier. Discovering task neighbourhoods through landmark learning performances. In 4th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), pages 325 – 330, 2000. [2] Johannes Furnkranz and Johann Petrak. An evaluation of landmarking variants. In 1st ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 57 – 68, 2001.

landmarking

one_nn

Performance of the 1-Nearest Neighbor classifier.

[1] Hilan Bensusan and Christophe Giraud-Carrier. Discovering task neighbourhoods through landmark learning performances. In 4th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), pages 325 – 330, 2000.

landmarking

random_node

Performance of the single decision tree node model induced by a random attribute.

[1] Hilan Bensusan and Christophe Giraud-Carrier. Discovering task neighbourhoods through landmark learning performances. In 4th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), pages 325 – 330, 2000. [2] Johannes Furnkranz and Johann Petrak. An evaluation of landmarking variants. In 1st ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 57 – 68, 2001.

landmarking

worst_node

Performance of the single decision tree node model induced by the least informative attribute.

[1] Hilan Bensusan and Christophe Giraud-Carrier. Discovering task neighbourhoods through landmark learning performances. In 4th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), pages 325 – 330, 2000. [2] Johannes Furnkranz and Johann Petrak. An evaluation of landmarking variants. In 1st ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 57 – 68, 2001.
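The idea behind the one_nn landmarker can be sketched as a leave-one-out 1-nearest-neighbor error rate. PyMFE itself scores landmarkers with cross-validation, so treat this as an illustration only:

```python
import numpy as np

def one_nn_error(X, y):
    """Leave-one-out error rate of a 1-nearest-neighbor classifier."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    errors = 0
    for i in range(len(X)):
        d = np.sum((X - X[i]) ** 2, axis=1)  # squared distances to all points
        d[i] = np.inf                        # exclude the point itself
        errors += y[np.argmin(d)] != y[i]    # nearest neighbor's label vs. truth
    return errors / len(X)
```

Low error suggests the classes form well-separated neighborhoods; high error suggests heavy class overlap.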

model-based

leaves

Compute the number of leaf nodes in the DT model.

[1] Yonghong Peng, PA Flach, Pavel Brazdil, and Carlos Soares. Decision tree-based data characterization for meta-learning. In 2nd ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 111 – 122, 2002.

model-based

leaves_branch

Compute the size of branches in the DT model.

[1] Yonghong Peng, PA Flach, Pavel Brazdil, and Carlos Soares. Decision tree-based data characterization for meta-learning. In 2nd ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 111 – 122, 2002.

model-based

leaves_corrob

Compute the leaves corroboration of the DT model.

[1] Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.

model-based

leaves_homo

Compute the DT model Homogeneity for every leaf node.

[1] Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.

model-based

leaves_per_class

Compute the proportion of leaves per class in DT model.

[1] Andrey Filchenkov and Arseniy Pendryak. Datasets meta-feature description for recommending feature selection algorithm. In Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT), pages 11 – 18, 2015.

model-based

nodes

Compute the number of non-leaf nodes in DT model.

[1] Yonghong Peng, PA Flach, Pavel Brazdil, and Carlos Soares. Decision tree-based data characterization for meta-learning. In 2nd ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 111 – 122, 2002.

model-based

nodes_per_attr

Compute the ratio of nodes per number of attributes in DT model.

[1] Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.

model-based

nodes_per_inst

Compute the ratio of non-leaf nodes per number of instances in DT model.

[1] Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.

model-based

nodes_per_level

Compute the ratio of number of nodes per tree level in DT model.

[1] Yonghong Peng, PA Flach, Pavel Brazdil, and Carlos Soares. Decision tree-based data characterization for meta-learning. In 2nd ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 111 – 122, 2002.

model-based

nodes_repeated

Compute the number of repeated nodes in DT model.

[1] Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.

model-based

tree_depth

Compute the depth of every node in the DT model.

[1] Yonghong Peng, PA Flach, Pavel Brazdil, and Carlos Soares. Decision tree-based data characterization for meta-learning. In 2nd ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 111 – 122, 2002.

model-based

tree_imbalance

Compute the tree imbalance for each leaf node.

[1] Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.

model-based

tree_shape

Compute the tree shape for every leaf node.

[1] Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.

model-based

var_importance

Compute the features importance of the DT model for each attribute.

[1] Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.
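All model-based meta-features are read off a decision tree induced on the data. With scikit-learn (already a PyMFE dependency), a few of them can be inspected directly; the mapping to meta-feature names in the comments is approximate, and PyMFE builds its own tree internally:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
tree = DecisionTreeClassifier(random_state=0).fit(data.data, data.target)

leaves = tree.get_n_leaves()                # cf. the "leaves" meta-feature
depth = tree.get_depth()                    # cf. "tree_depth" (maximum depth)
non_leaf = tree.tree_.node_count - leaves   # cf. "nodes" (non-leaf nodes)
print(leaves, depth, non_leaf)
```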

statistical

can_cor

Compute canonical correlations of data.

[1] Alexandros Kalousis. Algorithm Selection via Meta-Learning. PhD thesis, Faculty of Science of the University of Geneva, 2002.

statistical

cor

Compute the absolute value of the correlation of distinct dataset column pairs.

[1] Ciro Castiello, Giovanna Castellano, and Anna Maria Fanelli. Meta-data: Characterization of input features for meta-learning. In 2nd International Conference on Modeling Decisions for Artificial Intelligence (MDAI), pages 457–468, 2005. [2] Matthias Reif, Faisal Shafait, Markus Goldstein, Thomas Breuel, and Andreas Dengel. Automatic classifier selection for non-experts. Pattern Analysis and Applications, 17(1):83–96, 2014. [3] Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

statistical

cov

Compute the absolute value of the covariance of distinct dataset attribute pairs.

[1] Ciro Castiello, Giovanna Castellano, and Anna Maria Fanelli. Meta-data: Characterization of input features for meta-learning. In 2nd International Conference on Modeling Decisions for Artificial Intelligence (MDAI), pages 457–468, 2005. [2] Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

statistical

eigenvalues

Compute the eigenvalues of covariance matrix from dataset.

[1] Shawkat Ali and Kate A. Smith. On learning algorithm selection for classification. Applied Soft Computing, 6(2):119 – 138, 2006.

statistical

g_mean

Compute the geometric mean of each attribute.

[1] Shawkat Ali and Kate A. Smith-Miles. A meta-learning approach to automatic kernel selection for support vector machines. Neurocomputing, 70(1):173 – 186, 2006.

statistical

gravity

Compute the distance between the centers of mass of the minority and majority classes.

[1] Shawkat Ali and Kate A. Smith. On learning algorithm selection for classification. Applied Soft Computing, 6(2):119 – 138, 2006.

statistical

h_mean

Compute the harmonic mean of each attribute.

[1] Shawkat Ali and Kate A. Smith-Miles. A meta-learning approach to automatic kernel selection for support vector machines. Neurocomputing, 70(1):173 – 186, 2006.

statistical

iq_range

Compute the interquartile range (IQR) of each attribute.

[1] Shawkat Ali and Kate A. Smith-Miles. A meta-learning approach to automatic kernel selection for support vector machines. Neurocomputing, 70(1):173 – 186, 2006.

statistical

kurtosis

Compute the kurtosis of each attribute.

[1] Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

statistical

lh_trace

Compute the Lawley-Hotelling trace.

[1] Lawley D. A Generalization of Fisher’s z Test. Biometrika. 1938;30(1):180-187. [2] Hotelling H. A generalized T test and measure of multivariate dispersion. In: Neyman J, ed. Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press; 1951:23-41.

statistical

mad

Compute the Median Absolute Deviation (MAD) adjusted by a factor.

[1] Shawkat Ali and Kate A. Smith. On learning algorithm selection for classification. Applied Soft Computing, 6(2):119 – 138, 2006.

statistical

max

Compute the maximum value from each attribute.

[1] Robert Engels and Christiane Theusinger. Using a data metric for preprocessing advice for data mining applications. In 13th European Conference on on Artificial Intelligence (ECAI), pages 430 – 434, 1998.

statistical

mean

Compute the mean value of each attribute.

[1] Robert Engels and Christiane Theusinger. Using a data metric for preprocessing advice for data mining applications. In 13th European Conference on on Artificial Intelligence (ECAI), pages 430 – 434, 1998.

statistical

median

Compute the median value from each attribute.

[1] Robert Engels and Christiane Theusinger. Using a data metric for preprocessing advice for data mining applications. In 13th European Conference on on Artificial Intelligence (ECAI), pages 430 – 434, 1998.

statistical

min

Compute the minimum value from each attribute.

[1] Robert Engels and Christiane Theusinger. Using a data metric for preprocessing advice for data mining applications. In 13th European Conference on Artificial Intelligence (ECAI), pages 430 – 434, 1998.

statistical

nr_cor_attr

Compute the number of distinct highly correlated pairs of attributes.

[1] Mostafa A. Salama, Aboul Ella Hassanien, and Kenneth Revett. Employment of neural network and rough set in meta-learning. Memetic Computing, 5(3):165 – 177, 2013.

statistical

nr_disc

Compute the number of canonical correlations between each attribute and class.

[1] Guido Lindner and Rudi Studer. AST: Support for algorithm selection with a CBR approach. In European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), pages 418 – 423, 1999.

statistical

nr_norm

Compute the number of attributes normally distributed based on a given method.

[1] Christian Kopf, Charles Taylor, and Jorg Keller. Meta-Analysis: From data characterisation for meta-learning to meta-regression. In PKDD Workshop on Data Mining, Decision Support, Meta-Learning and Inductive Logic Programming, pages 15 – 26, 2000.

statistical

nr_outliers

Compute the number of attributes with at least one outlier value.

[1] Christian Kopf and Ioannis Iglezakis. Combination of task description strategies and case base properties for meta-learning. In 2nd ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning(IDDM), pages 65 – 76, 2002. [2] Peter J. Rousseeuw and Mia Hubert. Robust statistics for outlier detection. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(1):73 – 79, 2011.

statistical

p_trace

Compute the Pillai’s trace.

[1] Pillai K.C.S. (1955). Some new test criteria in multivariate analysis. Ann Math Stat: 26(1):117–21. [2] Seber, G.A.F. (1984). Multivariate Observations. New York: John Wiley and Sons.

statistical

range

Compute the range (max - min) of each attribute.

[1] Shawkat Ali and Kate A. Smith-Miles. A meta-learning approach to automatic kernel selection for support vector machines. Neurocomputing, 70(1):173 – 186, 2006.

statistical

roy_root

Compute the Roy’s largest root.

[1] Roy SN. On a Heuristic Method of Test Construction and its use in Multivariate Analysis. Ann Math Stat. 1953;24(2):220-238. [2] Kuhfeld, W.F. A note on Roy’s largest root. Psychometrika (1986) 51: 479. https://doi.org/10.1007/BF02294069

statistical

sd

Compute the standard deviation of each attribute.

[1] Robert Engels and Christiane Theusinger. Using a data metric for preprocessing advice for data mining applications. In 13th European Conference on Artificial Intelligence (ECAI), pages 430 – 434, 1998.

statistical

sd_ratio

Compute a statistical test for homogeneity of covariances.

[1] Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

statistical

skewness

Compute the skewness for each attribute.

[1] Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

statistical

sparsity

Compute the (possibly normalized) sparsity metric for each attribute.

[1] Mostafa A. Salama, Aboul Ella Hassanien, and Kenneth Revett. Employment of neural network and rough set in meta-learning. Memetic Computing, 5(3):165 – 177, 2013.

statistical

t_mean

Compute the trimmed mean of each attribute.

[1] Robert Engels and Christiane Theusinger. Using a data metric for preprocessing advice for data mining applications. In 13th European Conference on Artificial Intelligence (ECAI), pages 430 – 434, 1998.

statistical

var

Compute the variance of each attribute.

[1] Ciro Castiello, Giovanna Castellano, and Anna Maria Fanelli. Meta-data: Characterization of input features for meta-learning. In 2nd International Conference on Modeling Decisions for Artificial Intelligence (MDAI), pages 457–468, 2005.

statistical

w_lambda

Compute the Wilks’ Lambda value.

[1] Guido Lindner and Rudi Studer. AST: Support for algorithm selection with a CBR approach. In European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), pages 418 – 423, 1999.
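Several of the attribute-wise measures in the table can be reproduced directly with NumPy/SciPy. The snippet below is an illustrative sketch, not the package’s internal implementation; the column-wise results correspond to the iq_range, mad, skewness, kurtosis, and range entries above:

```python
# Illustrative per-attribute statistics computed with NumPy/SciPy,
# mirroring a few entries of the statistical meta-feature table.
import numpy as np
from scipy import stats

# Toy data: 4 instances, 2 numeric attributes (columns).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 35.0],
              [4.0, 70.0]])

iq_range = stats.iqr(X, axis=0)                               # interquartile range
mad = stats.median_abs_deviation(X, axis=0, scale="normal")   # MAD, normal-consistency factor
skewness = stats.skew(X, axis=0)                              # per-attribute skewness
kurt = stats.kurtosis(X, axis=0)                              # per-attribute kurtosis
value_range = X.max(axis=0) - X.min(axis=0)                   # range (max - min)

print(iq_range, value_range)
```

In the actual extractor, each of these per-attribute vectors would then be collapsed by the chosen summary functions (mean, sd, etc.).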

Note

Relative and Subsampling Landmarking are subcases of Landmarking. Thus, the Landmarking descriptions also apply to the Relative and Subsampling groups.

Note

More information about the implementation can be found in the API Documentation.

API Documentation

This is the full API documentation of the PyMFE package.

pymfe.mfe: Meta-feature extractor

Main module for extracting metafeatures from datasets.

mfe.MFE([groups, features, summary, ...])

Core class for metafeature extraction.

pymfe.general: General Meta-features

A module dedicated to the extraction of general metafeatures.

general.MFEGeneral()

Keep methods for metafeatures of General/Simple group.

pymfe.statistical: Statistical Meta-features

A module dedicated to the extraction of statistical metafeatures.

statistical.MFEStatistical()

Keep methods for metafeatures of Statistical group.

pymfe.info_theory: Information theory Meta-features

A module dedicated to the extraction of Information Theoretic Metafeatures.

info_theory.MFEInfoTheory()

Keeps methods for metafeatures of Information Theory group.

pymfe.model_based: Model-based Meta-features

Module dedicated to extraction of model-based metafeatures.

model_based.MFEModelBased()

Keep methods for metafeatures of model-based group.

pymfe.landmarking: Landmarking Meta-features

Module dedicated to extraction of landmarking metafeatures.

landmarking.MFELandmarking()

Keep methods for metafeatures of landmarking group.

pymfe.relative: Relative Landmarking Meta-features

Module dedicated to extraction of relative landmarking metafeatures.

relative.MFERelativeLandmarking()

Keep methods for metafeatures of relative landmarking group.

pymfe.clustering: Clustering Meta-features

A module dedicated to the extraction of clustering metafeatures.

clustering.MFEClustering()

Keep methods for metafeatures of Clustering group.

pymfe.concept: Concept Meta-features

Module dedicated to extraction of Concept Metafeatures.

concept.MFEConcept()

Keep methods for metafeatures of Concept group.

pymfe.itemset: Itemset Meta-features

Module dedicated to extraction of itemset metafeatures.

itemset.MFEItemset()

Keep methods for metafeatures of Itemset group.

pymfe.complexity: Complexity Meta-features

Module dedicated to extraction of complexity metafeatures.

complexity.MFEComplexity()

Keep methods for metafeatures of Complexity group.

What is new on PyMFE package?

The PyMFE releases are available on PyPI and GitHub.

Version 0.3.0

  • Metafeature extraction with confidence intervals

  • Pydoc fixes and package documentation/code consistency improvements

    • Reformatted the ‘model-based’ group metafeature extraction method arguments into a consistent format: all model-based metafeatures now receive a single mandatory argument, ‘dt_model’, and all other arguments are optional values from precomputations. It is now much easier to use those methods directly, without the main class (MFE), if desired.

    • Now accepting user custom arguments in precomputation methods.

    • Added the ‘extract_from_model’ MFE method, making it easy to extract model-based metafeatures from a pre-fitted model without using the training data.

  • Memory issues

    • Now handling memory errors in precomputations, postcomputations, and metafeature extraction as regular exceptions.

  • Categorical attributes one-hot encoding option

    • Added option to encode categorical attributes using one-hot encoding instead of the current gray encoding.

  • New nan-resilient summary functions

    • All summary functions can now be calculated ignoring ‘nan’ values, using their nan-resilient versions.

  • Online documentation improvement

Version 0.2.0

  • New meta-feature groups

    • Complexity

    • Itemset

    • Concept

  • New feature in MFE to list meta-feature descriptions and references

  • Dev class update

  • Integration, system tests, tests updates

  • Old module reviews

  • Docstring improvement

  • Online documentation improvement

  • Clustering group updated

  • Landmarking group updated

  • Statistical group updated

Version 0.1.1

  • Bugs solved

    • Fixed a false positive from mypy

    • The contributing link now works

  • Added a note about how to add a new meta-feature

  • Modified the ‘verbosity’ argument (of the ‘extract’ method) from boolean to integer, so the user can choose the desired level of verbosity. Verbosity = 1 shows a progress bar during the meta-feature extraction process. Verbosity = 2 keeps all the previous verbose messages (i.e., it logs every “extract” step) plus additional information about the percentage of progress done so far.

Version 0.1.0

  • Meta-feature groups available

    • Relative landmarking

    • Clustering-based

    • Relative subsampling landmarking

  • Makefile to help developers

  • New Functionalities

    • Now you can list available groups

    • Now you can list available meta-features

  • Documentation

    • New examples

    • New README

  • Bugs

    • Problems parsing categorical meta-features solved

    • Categorization of attributes with constant values solved

  • Test

    • Several new tests added

Version 0.0.3

  • Documentation improvement

  • Setup improvement

  • Meta-feature groups available:

    • Simple

    • Statistical

    • Information-theoretic

    • Model-based

    • Landmarking

About us

Contributors

You can find the contributors of this package here.

Citing PyMFE

If you use pymfe in a scientific publication, we would appreciate citations to the following paper:

Edesio Alcobaça, Felipe Siqueira, Adriano Rivolli, Luís P. F. Garcia, Jefferson T. Oliva, & André C. P. L. F. de Carvalho (2020). MFE: Towards reproducible meta-feature extraction. Journal of Machine Learning Research, 21(111), 1-5.

You can also use the bibtex format:

@article{JMLR:v21:19-348,
  author  = {Edesio Alcobaça and
             Felipe Siqueira and
             Adriano Rivolli and
             Luís P. F. Garcia and
             Jefferson T. Oliva and
             André C. P. L. F. de Carvalho
  },
  title   = {MFE: Towards reproducible meta-feature extraction},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {111},
  pages   = {1-5},
  url     = {http://jmlr.org/papers/v21/19-348.html}
}

Getting started

Information to install, test, and contribute to the package.

API Documentation

In this section, we document expected types and allowed features for all functions, and all parameters available for the meta-feature extraction.

Examples

A set of examples illustrating the use of the PyMFE package. In this section, you will learn how PyMFE works, along with patterns, tips, and more.

What’s new?

Log of the PyMFE history.

About PyMFE

If you would like to know more about this project, how to cite it, and the contributors, see this section.