pymfe.general.MFEGeneral

class pymfe.general.MFEGeneral

Keeps methods for metafeatures of the General/Simple group.

The convention adopted for metafeature extraction-related methods is to always start their names with the ft_ prefix, which allows automatic method detection. This prefix is predefined within the _internal module.

All method signatures follow the conventions and restrictions listed below:

For independent attribute data, X means every type of attribute, N means numeric attributes only, and C stands for categorical attributes only. Note that the categorical attribute sets of X and C, and the numerical attribute sets of X and N, may differ because of data transformations performed while fitting data into the MFE model, enabled by the transform_num and transform_cat arguments of fit (MFE method), respectively.
Only arguments in the MFE _custom_args_ft attribute (set up inside the fit method) are allowed to be required method arguments. All other arguments must be strictly optional (i.e., have a predefined default value).
The initial assumption is that the user can change any optional argument, without any prior verification of the argument's value or type, via the kwargs argument of the extract method of the MFE class.
The return value of every feature extraction method should be a single value or a generic list-like type (preferably an np.ndarray) of numeric values.

Another type of method is also detected automatically: methods whose names start with the precompute_ prefix. These methods run automatically while fitting data into an MFE model, and their purpose is to precompute common values shared by more than one feature extraction method. This strategy trades higher memory consumption for faster feature extraction. Their return value must always be a dictionary whose keys are possible extra arguments for both feature extraction methods and other precomputation methods. Note that precomputed values are shared among all valid feature-extraction modules (e.g., class_freqs computed in module statistical can freely be used by any precomputation or feature extraction method of module landmarking).
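The naming conventions above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical and for illustration only: `ToyGroup`, `precompute_shape`, `find_methods` and `extract_all` are not part of pymfe, and pymfe's actual detection code is not reproduced here. It shows methods being found by prefix, precomputed dictionaries being merged into a shared pool of arguments, and each ft_ method receiving only the arguments its signature declares.

```python
import inspect
from typing import Any, Callable, Dict, List, Optional, Tuple

import numpy as np


class ToyGroup:
    """Hypothetical metafeature group following the ft_/precompute_ naming convention."""

    @classmethod
    def precompute_shape(cls, X: Optional[np.ndarray] = None, **kwargs) -> Dict[str, Any]:
        # A precomputation method always returns a dict; its keys become extra arguments.
        return {} if X is None else {"shape": X.shape}

    @classmethod
    def ft_nr_inst(cls, X: np.ndarray, shape: Optional[Tuple[int, ...]] = None) -> int:
        # "shape" is optional, so the method still works without precomputation.
        return (shape if shape is not None else X.shape)[0]

    @classmethod
    def ft_nr_attr(cls, X: np.ndarray, shape: Optional[Tuple[int, ...]] = None) -> int:
        return (shape if shape is not None else X.shape)[1]


def find_methods(group: Any, prefix: str) -> List[Tuple[str, Callable]]:
    # Automatic detection: every callable attribute whose name starts with the prefix.
    return [
        (name, method)
        for name, method in inspect.getmembers(group, callable)
        if name.startswith(prefix)
    ]


def extract_all(group: Any, X: np.ndarray, y: np.ndarray) -> Dict[str, Any]:
    # Shared argument pool: the fitted data plus every precomputed value.
    args: Dict[str, Any] = {"X": X, "y": y}
    for _, precompute in find_methods(group, "precompute_"):
        args.update(precompute(**args))

    results: Dict[str, Any] = {}
    for name, method in find_methods(group, "ft_"):
        # Forward only the arguments declared in the method's signature.
        params = inspect.signature(method).parameters
        results[name] = method(**{k: v for k, v in args.items() if k in params})
    return results
```

In this sketch the precomputed `shape` is an optional argument of each ft_ method, in line with the rule that only fit-time arguments may be required.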

__init__(*args, **kwargs)

Methods

`__init__`(*args, **kwargs)
`ft_attr_to_inst`(X)	Compute the ratio between the number of attributes and instances.
`ft_cat_to_num`(X, cat_cols)	Compute the ratio between the number of categorical and numeric features.
`ft_freq_class`(y[, class_freqs])	Compute the relative frequency of each distinct class.
`ft_inst_to_attr`(X)	Compute the ratio between the number of instances and attributes.
`ft_nr_attr`(X)	Compute the total number of attributes.
`ft_nr_bin`(X)	Compute the number of binary attributes.
`ft_nr_cat`(cat_cols)	Compute the number of categorical attributes.
`ft_nr_class`(y[, classes])	Compute the number of distinct classes.
`ft_nr_inst`(X)	Compute the number of instances (rows) in the dataset.
`ft_nr_num`(X, cat_cols)	Compute the number of numeric features.
`ft_num_to_cat`(X, cat_cols)	Compute the ratio between the number of numerical and categorical features.
`precompute_general_class`([y])	Precompute distinct classes and their frequencies from `y`.

classmethod ft_attr_to_inst(X: ndarray) → float

Compute the ratio between the number of attributes and instances.

It is effectively the inverse of the value given by ft_inst_to_attr.
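Both ratios depend only on the shape of X. A minimal sketch with hypothetical helper names (not pymfe's implementation), assuming X stores instances as rows and attributes as columns:

```python
import numpy as np


def attr_to_inst(X: np.ndarray) -> float:
    # Number of attributes (columns) divided by the number of instances (rows).
    return X.shape[1] / X.shape[0]


def inst_to_attr(X: np.ndarray) -> float:
    # Number of instances (rows) divided by the number of attributes (columns).
    return X.shape[0] / X.shape[1]


X = np.zeros((150, 4))  # 150 instances, 4 attributes
print(inst_to_attr(X))  # 37.5
print(attr_to_inst(X))
```

Since each value is the reciprocal of the other, their product is 1 up to floating-point rounding.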

Parameters

X (np.ndarray): Fitted data.

Returns

float: The ratio between the number of attributes and instances.

References

1: Alexandros Kalousis and Theoharis Theoharis. NOEMON: Design, implementation and performance results of an intelligent assistant for classifier selection. Intelligent Data Analysis, 3(5):319–337, 1999.

classmethod ft_cat_to_num(X: ndarray, cat_cols: List[int]) → Union[int, float]

Compute the ratio between the number of categorical and numeric features.

If the number of numeric features is zero, np.nan is returned instead.

It is effectively the inverse of the value given by ft_num_to_cat.
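A minimal sketch of both ratios and their np.nan guard. The helper names are hypothetical, and it assumes that every column not listed in cat_cols counts as numerical:

```python
from typing import List

import numpy as np


def cat_to_num(X: np.ndarray, cat_cols: List[int]) -> float:
    # Assumption: every column not listed in cat_cols counts as numerical.
    n_cat = len(cat_cols)
    n_num = X.shape[1] - n_cat
    if n_num == 0:
        return np.nan  # undefined ratio: no numerical attributes
    return n_cat / n_num


def num_to_cat(X: np.ndarray, cat_cols: List[int]) -> float:
    n_cat = len(cat_cols)
    n_num = X.shape[1] - n_cat
    if n_cat == 0:
        return np.nan  # undefined ratio: no categorical attributes
    return n_num / n_cat
```

For a 5-column X with cat_cols = [0, 3], the sketch gives 2/3 for cat_to_num and 1.5 for num_to_cat.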

Parameters

X (np.ndarray): Fitted data.
cat_cols (list of int): List containing the indices of each categorical column in X.

Returns

int or float: Ratio between the number of categorical and numerical attributes, or np.nan if X has no numerical attributes.

References

1: Matthias Feurer, Jost Tobias Springenberg, and Frank Hutter. Using meta-learning to initialize Bayesian optimization of hyperparameters. In International Conference on Meta-learning and Algorithm Selection (MLAS), pages 3 – 10, 2014.

classmethod ft_freq_class(y: ndarray, class_freqs: Optional[ndarray] = None) → ndarray

Compute the relative frequency of each distinct class.

Parameters

y (np.ndarray): Target attribute.
class_freqs (np.ndarray, optional): Absolute frequency of each distinct class. Argument used to take advantage of precomputations.

Returns

np.ndarray: Relative frequency of each distinct class.
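A sketch of the computation with a hypothetical helper name, assuming the relative frequency is each class's absolute count divided by the number of instances. When class_freqs is supplied (for example, from precompute_general_class), the counting step is skipped:

```python
from typing import Optional

import numpy as np


def freq_class(y: np.ndarray, class_freqs: Optional[np.ndarray] = None) -> np.ndarray:
    if class_freqs is None:
        # Absolute frequency of each distinct class, ordered by sorted class label.
        _, class_freqs = np.unique(y, return_counts=True)
    # Relative frequency: absolute counts divided by the number of instances.
    return class_freqs / y.size
```

For y = ["a", "b", "a", "c"] the sketch returns [0.5, 0.25, 0.25], and the entries always sum to 1.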

References

1: Guido Lindner and Rudi Studer. AST: Support for algorithm selection with a CBR approach. In European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), pages 418 – 423, 1999.

classmethod ft_inst_to_attr(X: ndarray) → float

Compute the ratio between the number of instances and attributes.

It is effectively the inverse of the value given by ft_attr_to_inst.

Parameters

X (np.ndarray): Fitted data.

Returns

float: Ratio between the number of instances and the number of predictive attributes.

References

1: Petr Kuba, Pavel Brazdil, Carlos Soares, and Adam Woznica. Exploiting sampling and meta-learning for parameter setting for support vector machines. In 8th IBERAMIA Workshop on Learning and Data Mining, pages 209 – 216, 2002.

classmethod ft_nr_attr(X: ndarray) → int

Compute the total number of attributes.

Parameters

X (np.ndarray): Fitted data.

Returns

int: Total number of attributes in the data without transformations.

References

1: Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

classmethod ft_nr_bin(X: ndarray) → int

Compute the number of binary attributes.

Any attribute with exactly two distinct values is considered binary, regardless of its data type.
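The rule above can be sketched as a column-wise count of distinct values. The helper name is hypothetical, and the sketch assumes each column holds mutually comparable values so that np.unique can sort them:

```python
import numpy as np


def nr_bin(X: np.ndarray) -> int:
    # A column is binary when it holds exactly two distinct values, whatever its dtype.
    return sum(1 for col in X.T if np.unique(col).size == 2)
```

Because only the number of distinct values matters, a numeric 0/1 column and a string "yes"/"no" column are both counted.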

Parameters

X (np.ndarray): Fitted data.

Returns

int: Number of binary attributes in X.

References

1: Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

classmethod ft_nr_cat(cat_cols: List[int]) → int

Compute the number of categorical attributes.

Parameters

cat_cols (list of int): List containing the indices of each categorical column in X.

Returns

int: Number of categorical attributes in X.

References

1: Robert Engels and Christiane Theusinger. Using a data metric for preprocessing advice for data mining applications. In 13th European Conference on Artificial Intelligence (ECAI), pages 430 – 434, 1998.

classmethod ft_nr_class(y: ndarray, classes: Optional[ndarray] = None) → int

Compute the number of distinct classes.

Parameters

y (np.ndarray): Target attribute.
classes (np.ndarray, optional): Array with all distinct classes. This argument mainly exists to take advantage of precomputations.

Returns

int: Number of distinct classes in y.

References

1: Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

classmethod ft_nr_inst(X: ndarray) → int

Compute the number of instances (rows) in the dataset.

Parameters

X (np.ndarray): Fitted data.

Returns

int: Number of instances in X.

References

1: Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

classmethod ft_nr_num(X: ndarray, cat_cols: List[int]) → int

Compute the number of numeric features.

Parameters

X (np.ndarray): Fitted data.
cat_cols (list of int): List containing the indices of each categorical column in X.

Returns

int: Number of numerical attributes in X.

References

1: Robert Engels and Christiane Theusinger. Using a data metric for preprocessing advice for data mining applications. In 13th European Conference on Artificial Intelligence (ECAI), pages 430 – 434, 1998.

classmethod ft_num_to_cat(X: ndarray, cat_cols: List[int]) → Union[int, float]

Compute the ratio between the number of numerical and categorical features.

If the number of categorical features is zero, np.nan is returned instead.

Effectively the inverse of the value given by ft_cat_to_num.

Parameters

X (np.ndarray): Fitted data.
cat_cols (list of int): List containing the indices of each categorical column in X.

Returns

int or float: If X has at least one categorical feature, then return the ratio of numerical and categorical features. Return np.nan otherwise.

References

1: Matthias Feurer, Jost Tobias Springenberg, and Frank Hutter. Using meta-learning to initialize Bayesian optimization of hyperparameters. In International Conference on Meta-learning and Algorithm Selection (MLAS), pages 3 – 10, 2014.

classmethod precompute_general_class(y: Optional[ndarray] = None, **kwargs) → Dict[str, Any]

Precompute distinct classes and their frequencies from y.

Parameters

y (np.ndarray, optional): Target attribute.
**kwargs: Additional arguments. These may include values computed earlier by other precomputation methods, which can help speed up this precomputation.

Returns

dict

The following precomputed items are returned:

classes (np.ndarray): distinct classes of y, if y is not None.
class_freqs (np.ndarray): class frequencies of y, if y is not None.
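A minimal sketch of such a precomputation, written as a standalone function rather than the actual classmethod. It assumes np.unique with return_counts=True is used to get both items in one pass, and (as a further assumption) that the work is skipped when both values already arrive through kwargs from another precomputation method:

```python
from typing import Any, Dict, Optional

import numpy as np


def precompute_general_class(y: Optional[np.ndarray] = None, **kwargs) -> Dict[str, Any]:
    precomp_vals: Dict[str, Any] = {}
    # Compute only when y is given and both values were not already precomputed.
    if y is not None and not {"classes", "class_freqs"}.issubset(kwargs):
        classes, class_freqs = np.unique(y, return_counts=True)
        precomp_vals["classes"] = classes
        precomp_vals["class_freqs"] = class_freqs
    return precomp_vals
```

The returned keys match the optional parameter names of ft_nr_class (classes) and ft_freq_class (class_freqs), so the dictionary can be forwarded to those methods as keyword arguments.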