pymfe.general.MFEGeneral
- class pymfe.general.MFEGeneral
Keep methods for metafeatures of the General/Simple group.

The convention adopted for metafeature extraction related methods is to always start with the ft_ prefix, which allows automatic method detection. This prefix is predefined within the _internal module.

All method signatures follow the conventions and restrictions listed below:
- For independent attribute data, X means every type of attribute, N means numeric attributes only, and C stands for categorical attributes only. Note that the categorical attribute sets of X and C, and the numerical attribute sets of X and N, may differ due to data transformations performed while fitting data into the MFE model, enabled respectively by the transform_num and transform_cat arguments of fit (MFE method).
- Only arguments in the MFE _custom_args_ft attribute (set up inside the fit method) are allowed to be required method arguments. All other arguments must be strictly optional (i.e., have a predefined default value).
- The initial assumption is that the user can change any optional argument, without any prior verification of the argument's value or type, via the kwargs argument of the extract method of the MFE class.
- The return value of every feature extraction method should be a single value or a generic List (preferably an np.ndarray) of numeric values.
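To illustrate why the ft_ prefix matters, the sketch below collects methods purely by name prefix using Python's inspect module. ToyGroup and detect_methods are hypothetical names made up for this example; pymfe's own detection logic lives in its _internal module and may work differently.

```python
import inspect
from typing import Callable, Dict

import numpy as np


# Hypothetical toy group; stands in for a class such as MFEGeneral.
class ToyGroup:
    @classmethod
    def ft_nr_inst(cls, X: np.ndarray) -> int:
        return X.shape[0]

    @classmethod
    def ft_nr_attr(cls, X: np.ndarray) -> int:
        return X.shape[1]

    @classmethod
    def helper(cls) -> None:  # no "ft_" prefix, so it is not detected
        pass


def detect_methods(group: type, prefix: str = "ft_") -> Dict[str, Callable]:
    """Collect the callables of `group` whose names start with `prefix`."""
    return {
        name: member
        for name, member in inspect.getmembers(group, callable)
        if name.startswith(prefix)
    }


X = np.zeros((5, 3))  # 5 instances, 3 attributes
methods = detect_methods(ToyGroup)
results = {name: fn(X) for name, fn in methods.items()}
print(results)  # {'ft_nr_attr': 3, 'ft_nr_inst': 5}
```

Because detection relies only on the name, adding a new metafeature amounts to adding one more prefixed classmethod; helper methods stay invisible as long as they avoid the prefix.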
There is another type of method adopted for automatic detection, identified by the precompute_ prefix. These methods run automatically while fitting data into an MFE model, and their objective is to precompute common values shared by more than one feature extraction method. This strategy trades higher memory consumption for faster feature extraction. Their return value must always be a dictionary whose keys are possible extra arguments for both feature extraction methods and other precomputation methods. Note that precomputed values are shared among all valid feature extraction modules (e.g., class_freqs computed in module statistical can freely be used by any precomputation or feature extraction method of module landmarking).
- __init__(*args, **kwargs)
Methods
- __init__(*args, **kwargs)
- ft_attr_to_inst(X): Compute the ratio between the number of attributes and instances.
- ft_cat_to_num(X, cat_cols): Compute the ratio between the number of categoric and numeric features.
- ft_freq_class(y[, class_freqs]): Compute the relative frequency of each distinct class.
- ft_inst_to_attr(X): Compute the ratio between the number of instances and attributes.
- ft_nr_attr(X): Compute the total number of attributes.
- ft_nr_bin(X): Compute the number of binary attributes.
- ft_nr_cat(cat_cols): Compute the number of categorical attributes.
- ft_nr_class(y[, classes]): Compute the number of distinct classes.
- ft_nr_inst(X): Compute the number of instances (rows) in the dataset.
- ft_nr_num(X, cat_cols): Compute the number of numeric features.
- ft_num_to_cat(X, cat_cols): Compute the ratio between the number of numeric and categoric features.
- precompute_general_class([y], **kwargs): Precompute distinct classes and their frequencies from y.
- classmethod ft_attr_to_inst(X: ndarray) -> float
Compute the ratio between the number of attributes and instances.
It is effectively the inverse of the value given by ft_inst_to_attr.
- Parameters
- X : np.ndarray
Fitted data.
- Returns
- float
The ratio between the number of attributes and instances.
References
- 1
Alexandros Kalousis and Theoharis Theoharis. NOEMON: Design, implementation and performance results of an intelligent assistant for classifier selection. Intelligent Data Analysis, 3(5):319–337, 1999.
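A minimal NumPy sketch of the documented ratio, assuming X holds one instance per row and one attribute per column. The function name attr_to_inst is hypothetical and this is not pymfe's own code.

```python
import numpy as np


def attr_to_inst(X: np.ndarray) -> float:
    """Ratio between the number of attributes (columns) and instances (rows)."""
    return X.shape[1] / X.shape[0]


X = np.arange(20).reshape(10, 2)  # 10 instances, 2 attributes
print(attr_to_inst(X))  # 0.2
```

Values well below 1 indicate many more instances than attributes; values above 1 flag wide, high-dimensional datasets.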
- classmethod ft_cat_to_num(X: ndarray, cat_cols: List[int]) -> Union[int, float]
Compute the ratio between the number of categoric and numeric features.
If the number of numeric features is zero, np.nan is returned instead. Effectively the inverse of the value given by ft_num_to_cat.
- Parameters
- X : np.ndarray
Fitted data.
- cat_cols : list of int
List containing the indices of each categorical column in X.
- Returns
- int or float
Ratio between the number of categorical and numerical attributes, or np.nan if X has no numerical attributes.
References
- 1
Matthias Feurer, Jost Tobias Springenberg, and Frank Hutter. Using meta-learning to initialize bayesian optimization of hyperparameters. In International Conference on Meta-learning and Algorithm Selection (MLAS), pages 3 – 10, 2014.
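The sketch below reproduces the documented behavior of ft_cat_to_num with plain NumPy, including the np.nan case. It assumes every column not listed in cat_cols counts as numeric; the function name cat_to_num is hypothetical.

```python
from typing import List, Union

import numpy as np


def cat_to_num(X: np.ndarray, cat_cols: List[int]) -> Union[int, float]:
    num_cat = len(cat_cols)
    # Assumption: every column that is not categorical is numeric.
    num_num = X.shape[1] - num_cat
    if num_num == 0:
        return np.nan  # no numeric attributes: the ratio is undefined
    return num_cat / num_num


X = np.array([["a", 1.0, 2.0], ["b", 3.0, 4.0]], dtype=object)
print(cat_to_num(X, cat_cols=[0]))  # 0.5
print(cat_to_num(X, cat_cols=[0, 1, 2]))  # nan
```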
- classmethod ft_freq_class(y: ndarray, class_freqs: Optional[ndarray] = None) -> ndarray
Compute the relative frequency of each distinct class.
- Parameters
- y : np.ndarray
Target attribute.
- class_freqs : np.ndarray, optional
Absolute frequency of each distinct class. Argument used to take advantage of precomputations.
- Returns
- np.ndarray
Relative frequency of each distinct class.
References
- 1
Guido Lindner and Rudi Studer. AST: Support for algorithm selection with a CBR approach. In European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), pages 418 – 423, 1999.
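As a sketch of what this metafeature computes, the function below counts classes with numpy.unique(..., return_counts=True) and divides by the number of instances. When precomputed absolute frequencies are passed in, the counting step is skipped, which is the purpose of the class_freqs argument. The name freq_class is hypothetical and this is not pymfe's source.

```python
from typing import Optional

import numpy as np


def freq_class(y: np.ndarray, class_freqs: Optional[np.ndarray] = None) -> np.ndarray:
    if class_freqs is None:
        # Absolute count of each distinct class, in sorted class order.
        _, class_freqs = np.unique(y, return_counts=True)
    return class_freqs / y.size


y = np.array(["a", "b", "a", "c", "a", "b"])
print(freq_class(y))  # relative frequencies of "a", "b", "c"
print(freq_class(y, class_freqs=np.array([3, 2, 1])))  # same result, no recount
```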
- classmethod ft_inst_to_attr(X: ndarray) -> float
Compute the ratio between the number of instances and attributes.
It is effectively the inverse of the value given by ft_attr_to_inst.
- Parameters
- X : np.ndarray
Fitted data.
- Returns
- float
Ratio between the number of instances and the number of predictive attributes.
References
- 1
Petr Kuba, Pavel Brazdil, Carlos Soares, and Adam Woznica. Exploiting sampling and meta-learning for parameter setting for support vector machines. In 8th IBERAMIA Workshop on Learning and Data Mining, pages 209 – 216, 2002.
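A minimal sketch, assuming rows are instances and columns are attributes, that computes this ratio and checks the stated inverse relationship with ft_attr_to_inst. Both function names are hypothetical.

```python
import math

import numpy as np


def inst_to_attr(X: np.ndarray) -> float:
    return X.shape[0] / X.shape[1]


def attr_to_inst(X: np.ndarray) -> float:
    return X.shape[1] / X.shape[0]


X = np.ones((150, 4))  # 150 instances, 4 attributes
print(inst_to_attr(X))  # 37.5
# The two ratios are reciprocals of each other (up to floating-point rounding).
print(math.isclose(inst_to_attr(X) * attr_to_inst(X), 1.0))  # True
```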
- classmethod ft_nr_attr(X: ndarray) -> int
Compute the total number of attributes.
- Parameters
- X : np.ndarray
Fitted data.
- Returns
- int
Total number of attributes in the data without transformations.
References
- 1
Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.
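The count is simply the number of columns of X. The sketch below uses a small mixed-type array; the function name nr_attr is hypothetical.

```python
import numpy as np


def nr_attr(X: np.ndarray) -> int:
    """Number of columns (attributes) of the data as given."""
    return X.shape[1]


# One categorical and one numeric attribute.
X = np.array([["red", 1.5], ["blue", 0.3], ["green", 2.2]], dtype=object)
print(nr_attr(X))  # 2
```

Per the "without transformations" wording above, one-hot encoding the color column (3 indicator columns plus the numeric one, 4 in total) would not change this metafeature, which refers to the 2 original attributes.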
- classmethod ft_nr_bin(X: ndarray) -> int
Compute the number of binary attributes.
Any attribute that has exactly two distinct values is considered a binary attribute, regardless of its data type.
- Parameters
- X : np.ndarray
Fitted data.
- Returns
- int
Number of binary attributes in X.
References
- 1
Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.
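The rule "exactly two distinct values, regardless of data type" can be sketched by counting unique values column by column. This is an illustrative NumPy version with a hypothetical name, not pymfe's implementation; it assumes each column's values are mutually comparable so numpy.unique can sort them.

```python
import numpy as np


def nr_bin(X: np.ndarray) -> int:
    # A column is binary when it holds exactly two distinct values,
    # whatever their type (strings, integers, floats, ...).
    return sum(np.unique(X[:, j]).size == 2 for j in range(X.shape[1]))


X = np.array([
    ["yes", 0, 1.5],
    ["no", 1, 2.5],
    ["yes", 1, 3.5],
], dtype=object)
print(nr_bin(X))  # 2 (the "yes"/"no" column and the 0/1 column)
```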
- classmethod ft_nr_cat(cat_cols: List[int]) -> int
Compute the number of categorical attributes.
- Parameters
- cat_cols : list of int
List containing the indices of each categorical column in X.
- Returns
- int
Number of categorical attributes in X.
References
- 1
Robert Engels and Christiane Theusinger. Using a data metric for preprocessing advice for data mining applications. In 13th European Conference on Artificial Intelligence (ECAI), pages 430 – 434, 1998.
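Since ft_nr_cat receives only cat_cols, the value depends on the column indices alone and never inspects X. A one-line sketch with a hypothetical name:

```python
from typing import List


def nr_cat(cat_cols: List[int]) -> int:
    # Only the list of categorical column indices matters here.
    return len(cat_cols)


print(nr_cat([0, 3, 4]))  # 3
print(nr_cat([]))  # 0
```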
- classmethod ft_nr_class(y: ndarray, classes: Optional[ndarray] = None) -> int
Compute the number of distinct classes.
- Parameters
- y : np.ndarray
Target attribute.
- classes : np.ndarray, optional
Array with all distinct classes. This argument is mainly intended to benefit from precomputations.
- Returns
- int
Number of distinct classes in y.
References
- 1
Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.
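A sketch of the documented behavior: when the precomputed classes array is available its size is used directly, otherwise the distinct values of y are counted. The name nr_class is hypothetical.

```python
from typing import Optional

import numpy as np


def nr_class(y: np.ndarray, classes: Optional[np.ndarray] = None) -> int:
    if classes is not None:
        return classes.size  # reuse the precomputed distinct classes
    return np.unique(y).size


y = np.array([0, 2, 2, 1, 0, 2])
print(nr_class(y))  # 3
print(nr_class(y, classes=np.array([0, 1, 2])))  # 3
```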
- classmethod ft_nr_inst(X: ndarray) -> int
Compute the number of instances (rows) in the dataset.
- Parameters
- X : np.ndarray
Fitted data.
- Returns
- int
Number of instances in X.
References
- 1
Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.
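The number of instances is the row count of X. A minimal sketch with a hypothetical function name:

```python
import numpy as np


def nr_inst(X: np.ndarray) -> int:
    return X.shape[0]


X = np.zeros((100, 5))  # 100 rows (instances), 5 columns (attributes)
print(nr_inst(X))  # 100
```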
- classmethod ft_nr_num(X: ndarray, cat_cols: List[int]) -> int
Compute the number of numeric features.
- Parameters
- X : np.ndarray
Fitted data.
- cat_cols : list of int
List containing the indices of each categorical column in X.
- Returns
- int
Number of numerical attributes in X.
References
- 1
Robert Engels and Christiane Theusinger. Using a data metric for preprocessing advice for data mining applications. In 13th European Conference on Artificial Intelligence (ECAI), pages 430 – 434, 1998.
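A sketch assuming that every column not listed in cat_cols is numeric, so the numeric count is the total column count minus the categorical count. The name nr_num is hypothetical.

```python
from typing import List

import numpy as np


def nr_num(X: np.ndarray, cat_cols: List[int]) -> int:
    # Assumption: every column that is not categorical counts as numeric.
    return X.shape[1] - len(cat_cols)


X = np.empty((4, 6), dtype=object)  # 6 attributes in total
cat_cols = [1, 5]
print(nr_num(X, cat_cols))  # 4
# Numeric and categorical counts add up to the total number of attributes.
print(nr_num(X, cat_cols) + len(cat_cols) == X.shape[1])  # True
```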
- classmethod ft_num_to_cat(X: ndarray, cat_cols: List[int]) -> Union[int, float]
Compute the ratio between the number of numeric and categoric features.
If the number of categoric features is zero, np.nan is returned instead. Effectively the inverse of the value given by ft_cat_to_num.
- Parameters
- X : np.ndarray
Fitted data.
- cat_cols : list of int
List containing the indices of each categorical column in X.
- Returns
- int or float
If X has at least one categorical feature, the ratio between the number of numerical and categorical features; np.nan otherwise.
References
- 1
Matthias Feurer, Jost Tobias Springenberg, and Frank Hutter. Using meta-learning to initialize bayesian optimization of hyperparameters. In International Conference on Meta-learning and Algorithm Selection (MLAS), pages 3 – 10, 2014.
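Mirroring the ft_cat_to_num case, this sketch divides in the opposite direction and returns np.nan when there are no categorical columns. It again assumes non-categorical columns are numeric; num_to_cat is a hypothetical name.

```python
from typing import List, Union

import numpy as np


def num_to_cat(X: np.ndarray, cat_cols: List[int]) -> Union[int, float]:
    num_cat = len(cat_cols)
    if num_cat == 0:
        return np.nan  # no categorical attributes: the ratio is undefined
    # Assumption: every column that is not categorical is numeric.
    return (X.shape[1] - num_cat) / num_cat


X = np.empty((10, 5), dtype=object)  # 5 attributes in total
print(num_to_cat(X, [0, 1]))  # 1.5
print(num_to_cat(X, []))  # nan
```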
- classmethod precompute_general_class(y: Optional[ndarray] = None, **kwargs) -> Dict[str, Any]
Precompute distinct classes and their frequencies from y.
- Parameters
- y : np.ndarray, optional
Target attribute.
- **kwargs
Additional arguments. May contain values computed earlier by other precomputation methods, which can help speed up this precomputation.
- Returns
- dict
The following precomputed items are returned:
- classes (np.ndarray): distinct classes of y, if y is not None.
- class_freqs (np.ndarray): class frequencies of y, if y is not None.
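The sketch below shows how the returned dictionary can feed the feature extraction methods: its keys match the optional argument names of ft_nr_class (classes) and ft_freq_class (class_freqs). All three functions are hypothetical stand-ins, not pymfe code, and skipping the computation when both values already arrive through kwargs is an assumption about how the kwargs can "help speed up" this step.

```python
from typing import Any, Dict, Optional

import numpy as np


def precompute_general_class(y: Optional[np.ndarray] = None, **kwargs) -> Dict[str, Any]:
    """Hypothetical stand-in that mirrors the documented return dictionary."""
    precomp_vals: Dict[str, Any] = {}
    # Assumption: skip the work when y is missing or both values were
    # already supplied by an earlier precomputation method via kwargs.
    if y is not None and not {"classes", "class_freqs"}.issubset(kwargs):
        classes, class_freqs = np.unique(y, return_counts=True)
        precomp_vals["classes"] = classes
        precomp_vals["class_freqs"] = class_freqs
    return precomp_vals


def nr_class(y: np.ndarray, classes: Optional[np.ndarray] = None) -> int:
    return classes.size if classes is not None else np.unique(y).size


def freq_class(y: np.ndarray, class_freqs: Optional[np.ndarray] = None) -> np.ndarray:
    if class_freqs is None:
        _, class_freqs = np.unique(y, return_counts=True)
    return class_freqs / y.size


y = np.array(["a", "b", "a", "a"])
shared = precompute_general_class(y)

# Each dictionary entry is passed as the keyword argument of the same name,
# so neither feature function has to recount the classes.
print(nr_class(y, classes=shared["classes"]))  # 2
print(freq_class(y, class_freqs=shared["class_freqs"]))  # [0.75 0.25]
print(precompute_general_class(None))  # {}
```

Computing np.unique once and sharing the result is the memory-for-speed trade-off described for precompute_ methods: the arrays stay in memory, but every consumer avoids a second pass over y.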