pymfe.general.MFEGeneral
- class pymfe.general.MFEGeneral
Keep methods for metafeatures of the General/Simple group.
The convention adopted for metafeature extraction methods is to always start with the ft_ prefix to allow automatic method detection. This prefix is predefined within the _internal module.
All method signatures follow the conventions and restrictions listed below:
- For independent attribute data, X means every type of attribute, N means numeric attributes only, and C stands for categorical attributes only. Note that the categorical attribute sets of X and C, and the numerical attribute sets of X and N, may differ because of data transformations performed while fitting data into the MFE model. These transformations are enabled by, respectively, the transform_num and transform_cat arguments of fit (MFE method).
- Only arguments in the MFE _custom_args_ft attribute (set up inside the fit method) are allowed to be required method arguments. All other arguments must be strictly optional (i.e., have a predefined default value).
- The initial assumption is that the user can change any optional argument, without any previous verification of the argument value or its type, via the kwargs argument of the extract method of the MFE class.
- The return value of all feature extraction methods should be a single value or a generic List (preferably a np.ndarray) type with numeric values.
There is another type of method adopted for automatic detection: methods with the precompute_ prefix. These methods run automatically while fitting data into an MFE model, and their objective is to precompute common values shared between more than one feature extraction method. This strategy trades higher memory consumption for faster feature extraction. Their return value must always be a dictionary whose keys are possible extra arguments for both feature extraction methods and other precomputation methods. Note that precomputed values are shared between all valid feature-extraction modules (e.g., class_freqs computed in module statistical can freely be used by any precomputation or feature extraction method of module landmarking).
- __init__(*args, **kwargs)
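The prefix conventions described above can be illustrated with a small, self-contained sketch. It does not use pymfe: `ToyGroup`, `detect`, and `precompute_row_count` are hypothetical names invented for this example, and the detection logic is only an assumption about how prefix-based discovery might work, not the library's actual implementation.

```python
import inspect


class ToyGroup:
    """Hypothetical metafeature group, used only for illustration."""

    @classmethod
    def precompute_row_count(cls, X, **kwargs):
        # Precomputation methods return a dict of possible extra arguments.
        return {"n_rows": len(X)}

    @classmethod
    def ft_nr_inst(cls, X, n_rows=None):
        # The optional argument is filled from the precomputed dict if present.
        return len(X) if n_rows is None else n_rows

    @classmethod
    def helper(cls, X):
        # No recognized prefix, so this method is never detected.
        return X


def detect(cls, prefix):
    """Return the sorted names of callables on `cls` that start with `prefix`."""
    return sorted(
        name
        for name, _ in inspect.getmembers(cls, predicate=callable)
        if name.startswith(prefix)
    )


feature_methods = detect(ToyGroup, "ft_")
precompute_methods = detect(ToyGroup, "precompute_")

X = [[1, 2], [3, 4], [5, 6]]

# Run every precomputation first, accumulating the shared values.
precomputed = {}
for name in precompute_methods:
    precomputed.update(getattr(ToyGroup, name)(X, **precomputed))

# Then call each feature method with the shared values as keyword arguments.
results = {
    name: getattr(ToyGroup, name)(X, **precomputed) for name in feature_methods
}
```

In this toy setup every precomputed key happens to match a parameter of the only feature method. A real dispatcher would presumably have to filter the shared dictionary down to the arguments each method accepts.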
Methods
__init__(*args, **kwargs)
ft_attr_to_inst(X) Compute the ratio between the number of attributes and instances.
ft_cat_to_num(X, cat_cols) Compute the ratio between the number of categoric and numeric features.
ft_freq_class(y[, class_freqs]) Compute the relative frequency of each distinct class.
ft_inst_to_attr(X) Compute the ratio between the number of instances and attributes.
ft_nr_attr(X) Compute the total number of attributes.
ft_nr_bin(X) Compute the number of binary attributes.
ft_nr_cat(cat_cols) Compute the number of categorical attributes.
ft_nr_class(y[, classes]) Compute the number of distinct classes.
ft_nr_inst(X) Compute the number of instances (rows) in the dataset.
ft_nr_num(X, cat_cols) Compute the number of numeric features.
ft_num_to_cat(X, cat_cols) Compute the ratio between the number of numeric and categoric features.
precompute_general_class([y]) Precompute distinct classes and their frequencies from y.
- classmethod ft_attr_to_inst(X: ndarray) -> float
Compute the ratio between the number of attributes and instances.
It is effectively the inverse of the value given by ft_inst_to_attr.
- Parameters
- X
np.ndarray Fitted data.
- Returns
- float
The ratio between the number of attributes and instances.
References
- 1
Alexandros Kalousis and Theoharis Theoharis. NOEMON: Design, implementation and performance results of an intelligent assistant for classifier selection. Intelligent Data Analysis, 3(5):319–337, 1999.
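As a minimal sketch of the relationship between this ratio and its inverse, the snippet below computes both on a plain list of rows. The helper names `attr_to_inst` and `inst_to_attr` are local stand-ins written for this example, not calls into pymfe, and a rectangular, non-empty dataset is assumed.

```python
def attr_to_inst(X):
    """Number of attributes (columns) divided by number of instances (rows)."""
    return len(X[0]) / len(X)


def inst_to_attr(X):
    """Number of instances (rows) divided by number of attributes (columns)."""
    return len(X) / len(X[0])


# Four instances described by two attributes.
X = [[1.0, 0.5], [2.0, 0.1], [3.0, 0.9], [4.0, 0.3]]

ratio = attr_to_inst(X)
inverse = inst_to_attr(X)
```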
- classmethod ft_cat_to_num(X: ndarray, cat_cols: List[int]) -> Union[int, float]
Compute the ratio between the number of categoric and numeric features.
If the number of numeric features is zero, np.nan is returned instead.
Effectively the inverse of the value given by ft_num_to_cat.
- Parameters
- X
np.ndarray Fitted data.
- cat_cols
list of int List containing the indices of each categorical column in X.
- Returns
- int or float
Ratio between the number of categorical and numerical attributes.
References
- 1
Matthias Feurer, Jost Tobias Springenberg, and Frank Hutter. Using meta-learning to initialize bayesian optimization of hyperparameters. In International Conference on Meta-learning and Algorithm Selection (MLAS), pages 3 – 10, 2014.
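The zero-denominator rule can be sketched as follows. This is an illustration on plain Python lists, under the assumption that the number of numeric columns equals the total number of columns minus `len(cat_cols)`. The function `cat_to_num` here is a local stand-in, not the pymfe method, and it returns `math.nan` where the library is documented to return `np.nan`.

```python
import math


def cat_to_num(X, cat_cols):
    """Ratio of categorical to numeric columns; NaN if no column is numeric."""
    n_attr = len(X[0])       # total number of columns
    n_cat = len(cat_cols)    # columns flagged as categorical
    n_num = n_attr - n_cat   # assumption: every other column is numeric
    if n_num == 0:
        return math.nan
    return n_cat / n_num


X = [[1.0, 2.0, "a"], [3.0, 4.0, "b"]]
mixed = cat_to_num(X, cat_cols=[2])                  # one categorical, two numeric
all_cat = cat_to_num([["a", "b"]], cat_cols=[0, 1])  # no numeric column
```

Because NaN never compares equal to itself, callers should check the second result with `math.isnan` (or `np.isnan`) rather than with `==`.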
- classmethod ft_freq_class(y: ndarray, class_freqs: Optional[ndarray] = None) -> ndarray
Compute the relative frequency of each distinct class.
- Parameters
- y
np.ndarray Target attribute.
- class_freqs
np.ndarray, optional Absolute frequency of each distinct class. Argument used to take advantage of precomputations.
- Returns
- np.ndarray
Relative frequency of each distinct class.
References
- 1
Guido Lindner and Rudi Studer. AST: Support for algorithm selection with a CBR approach. In European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), pages 418 – 423, 1999.
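A rough sketch of how the optional `class_freqs` argument might be used: if absolute counts are supplied they are reused, otherwise they are derived from `y`. The function below is a standalone illustration built on `collections.Counter`. Ordering the classes by sorted label is an assumption made for this example.

```python
from collections import Counter


def freq_class(y, class_freqs=None):
    """Relative frequency of each distinct class, ordered by sorted class label."""
    if class_freqs is None:
        # No precomputed counts available: count the classes here.
        class_freqs = [count for _, count in sorted(Counter(y).items())]
    total = len(y)
    return [count / total for count in class_freqs]


y = ["a", "b", "a", "a"]
computed = freq_class(y)
reused = freq_class(y, class_freqs=[3, 1])  # counts supplied by a precomputation
```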
- classmethod ft_inst_to_attr(X: ndarray) -> float
Compute the ratio between the number of instances and attributes.
It is effectively the inverse of the value given by ft_attr_to_inst.
- Parameters
- X
np.ndarray Fitted data.
- Returns
- float
Ratio of number of instances and number of predictive attributes.
References
- 1
Petr Kuba, Pavel Brazdil, Carlos Soares, and Adam Woznica. Exploiting sampling and meta-learning for parameter setting for support vector machines. In 8th IBERAMIA Workshop on Learning and Data Mining, pages 209 – 216, 2002.
- classmethod ft_nr_attr(X: ndarray) -> int
Compute the total number of attributes.
- Parameters
- X
np.ndarray Fitted data.
- Returns
- int
Total number of attributes in the data without transformations.
References
- 1
Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.
- classmethod ft_nr_bin(X: ndarray) -> int
Compute the number of binary attributes.
Any attribute that has exactly two distinct values is considered a binary attribute, independently of its data type.
- Parameters
- X
np.ndarray Fitted data.
- Returns
- int
Number of binary attributes in X.
References
- 1
Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.
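The "exactly two distinct values, regardless of data type" rule can be sketched on a plain list of rows. `nr_bin` below is a local illustration rather than the pymfe method. It transposes the rows with `zip` and counts the columns whose set of values has size two.

```python
def nr_bin(X):
    """Count columns that contain exactly two distinct values."""
    return sum(1 for column in zip(*X) if len(set(column)) == 2)


X = [
    [0, "x", 1.5],
    [1, "y", 2.5],
    [0, "x", 3.5],
]
# Column 0 has values {0, 1}, column 1 has {"x", "y"}, column 2 has three values.
n_binary = nr_bin(X)
```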
- classmethod ft_nr_cat(cat_cols: List[int]) -> int
Compute the number of categorical attributes.
- Parameters
- cat_cols
list of int List containing the indices of each categorical column in X.
- Returns
- int
Number of categorical attributes in X.
References
- 1
Robert Engels and Christiane Theusinger. Using a data metric for preprocessing advice for data mining applications. In 13th European Conference on Artificial Intelligence (ECAI), pages 430 – 434, 1998.
- classmethod ft_nr_class(y: ndarray, classes: Optional[ndarray] = None) -> int
Compute the number of distinct classes.
- Parameters
- y
np.ndarray Target attribute.
- classes
np.ndarray, optional Array with all distinct classes. The main purpose of this argument is to benefit from precomputations.
- Returns
- int
Number of distinct classes in y.
References
- 1
Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.
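A brief sketch of how the optional `classes` argument could short-circuit the computation, assuming that a supplied array already holds the distinct labels. `nr_class` is a hypothetical standalone helper, not the pymfe method.

```python
def nr_class(y, classes=None):
    """Number of distinct classes, reusing precomputed labels when given."""
    if classes is None:
        # No precomputed labels: derive the distinct classes from y.
        classes = set(y)
    return len(classes)


y = ["spam", "ham", "spam", "eggs"]
from_target = nr_class(y)
from_precomputed = nr_class(y, classes=["eggs", "ham", "spam"])
```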
- classmethod ft_nr_inst(X: ndarray) -> int
Compute the number of instances (rows) in the dataset.
- Parameters
- X
np.ndarray Fitted data.
- Returns
- int
Number of instances in X.
References
- 1
Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.
- classmethod ft_nr_num(X: ndarray, cat_cols: List[int]) -> int
Compute the number of numeric features.
- Parameters
- X
np.ndarray Fitted data.
- cat_cols
list of int List containing the indices of each categorical column in X.
- Returns
- int
Number of numerical attributes in X.
References
- 1
Robert Engels and Christiane Theusinger. Using a data metric for preprocessing advice for data mining applications. In 13th European Conference on Artificial Intelligence (ECAI), pages 430 – 434, 1998.
- classmethod ft_num_to_cat(X: ndarray, cat_cols: List[int]) -> Union[int, float]
Compute the ratio between the number of numerical and categorical features.
If the number of categoric features is zero, np.nan is returned instead.
Effectively the inverse of the value given by ft_cat_to_num.
- Parameters
- X
np.ndarray Fitted data.
- cat_cols
list of int List containing the indices of each categorical column in X.
- Returns
- int or float
If X has at least one categorical feature, then return the ratio of numerical and categorical features. Return np.nan otherwise.
References
- 1
Matthias Feurer, Jost Tobias Springenberg, and Frank Hutter. Using meta-learning to initialize bayesian optimization of hyperparameters. In International Conference on Meta-learning and Algorithm Selection (MLAS), pages 3 – 10, 2014.
- classmethod precompute_general_class(y: Optional[ndarray] = None, **kwargs) -> Dict[str, Any]
Precompute distinct classes and their frequencies from y.
- Parameters
- y
np.ndarray Target attribute.
- **kwargs
Additional arguments. May contain values previously precomputed by other precomputation methods, which can help speed up this precomputation.
- Returns
- dict
The following precomputed items are returned:
- classes (np.ndarray): distinct classes of y, if y is not None.
- class_freqs (np.ndarray): class frequencies of y, if y is not None.
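The shape of the returned dictionary can be sketched as below. This standalone function mirrors only the documented contract (an empty result when `y` is `None`, otherwise the keys `classes` and `class_freqs`) and uses plain lists instead of `np.ndarray`. Skipping the work when both keys already appear in `kwargs` is an assumption made for this illustration.

```python
from collections import Counter


def precompute_general_class(y=None, **kwargs):
    """Return distinct classes of y and their absolute frequencies."""
    precomputed = {}
    already_done = {"classes", "class_freqs"}.issubset(kwargs)
    if y is not None and not already_done:
        counts = sorted(Counter(y).items())  # sorted by class label
        precomputed["classes"] = [label for label, _ in counts]
        precomputed["class_freqs"] = [count for _, count in counts]
    return precomputed


fresh = precompute_general_class(["b", "a", "b"])
no_target = precompute_general_class(None)
cached = precompute_general_class(["b", "a", "b"], classes=["a", "b"], class_freqs=[1, 2])
```

The keys of the result match the optional parameter names of ft_nr_class (`classes`) and ft_freq_class (`class_freqs`), which is how a precomputed value can be passed on as an extra argument.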