pymfe.concept.MFEConcept
- class pymfe.concept.MFEConcept
Keep methods for metafeatures of the Concept group.
The convention adopted for metafeature extraction related methods is to always start with the ft_ prefix to allow automatic method detection. This prefix is predefined within the _internal module.
All method signatures follow the conventions and restrictions listed below:
- For independent attribute data, X means every type of attribute, N means numeric attributes only, and C stands for categorical attributes only. Note that the categorical attribute sets of X and C, and the numerical attribute sets of X and N, may differ because of data transformations performed while fitting data into the MFE model. These transformations are enabled by, respectively, the transform_num and transform_cat arguments of fit (MFE method).
- Only arguments in the MFE _custom_args_ft attribute (set up inside the fit method) are allowed to be required method arguments. All other arguments must be strictly optional (i.e., have a predefined default value).
- The initial assumption is that the user can change any optional argument, without any prior verification of the argument value or its type, via the kwargs argument of the extract method of the MFE class.
- The return value of all feature extraction methods should be a single value or a generic list (preferably an np.ndarray) with numeric values.
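The prefix and required-argument conventions above can be illustrated with a minimal sketch built on the standard inspect module. This is not the code from the _internal module: ToyGroup, detect_methods, invalid_required_args, and the allowed-argument set used in the usage note are hypothetical names chosen for illustration.

```python
import inspect
from typing import Callable, Dict, List, Set


class ToyGroup:
    """Hypothetical metafeature group used only for illustration."""

    @classmethod
    def ft_mean_abs(cls, N: List[float], scale: float = 1.0) -> float:
        # "N" is a required data argument; "scale" is optional.
        return scale * sum(abs(v) for v in N) / len(N)

    @classmethod
    def ft_bad(cls, N: List[float], threshold: float) -> float:
        # "threshold" has no default value, so it breaks the convention.
        return float(sum(v > threshold for v in N))

    @classmethod
    def helper(cls) -> None:
        # No "ft_" prefix, so this method is not detected as a metafeature.
        return None


def detect_methods(group: type, prefix: str = "ft_") -> Dict[str, Callable]:
    """Collect every callable attribute whose name starts with `prefix`."""
    return {
        name: member
        for name, member in inspect.getmembers(group, callable)
        if name.startswith(prefix)
    }


def invalid_required_args(method: Callable, allowed: Set[str]) -> List[str]:
    """Return the required parameters that are not in the allowed set."""
    params = inspect.signature(method).parameters.values()
    return [
        p.name
        for p in params
        if p.default is inspect.Parameter.empty
        and p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
        and p.name not in allowed
    ]
```

Calling detect_methods(ToyGroup) picks up ft_bad and ft_mean_abs but not helper. With an assumed allowed set such as {"X", "N", "C", "y"} (a stand-in for the real contents of _custom_args_ft, which this page does not list), invalid_required_args flags threshold in ft_bad and nothing in ft_mean_abs.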
There is another type of method adopted for automatic detection: methods that start with the precompute_ prefix. These methods run automatically while fitting data into an MFE model, and their objective is to precompute common values shared by more than one feature extraction method. This strategy trades higher memory consumption for faster feature extraction. Their return value must always be a dictionary whose keys are possible extra arguments for both feature extraction methods and other precomputation methods. Note that precomputed values are shared between all valid feature-extraction modules (e.g., class_freqs computed in module statistical can freely be used by any precomputation or feature extraction method of module landmarking).
- __init__(*args, **kwargs)
Methods
- __init__(*args, **kwargs)
- ft_cohesiveness(N[, cohesiveness_alpha, ...]): Compute the improved version of the weighted distance, which captures how dense or sparse the example distribution is.
- ft_conceptvar(N, y[, conceptvar_alpha, ...]): Compute the concept variation that estimates the variability of class labels among examples.
- ft_impconceptvar(N, y[, ...]): Compute the improved concept variation that estimates the variability of class labels among examples.
- ft_wg_dist(N[, wg_dist_alpha, ...]): Compute the weighted distance, which captures how dense or sparse the example distribution is.
- precompute_concept_dist(N[, concept_dist_metric]): Precompute the pairwise distance matrix used by the concept metafeatures.
- classmethod ft_cohesiveness(N: ndarray, cohesiveness_alpha: float = 1.0, concept_dist_metric: str = 'euclidean', concept_distances: Optional[ndarray] = None) -> ndarray
Compute the improved version of the weighted distance, which captures how dense or sparse the example distribution is.
- Parameters
- N : np.ndarray
  Numerical fitted data.
- cohesiveness_alpha : float, optional
  The alpha value used to adjust the weight. The higher the alpha, the smaller the effect of the weight on the computation.
- concept_dist_metric : str, optional
  Metric used to compute the distance between each pair of examples. See cdist from scipy for more options. Used only if the argument concept_distances is None.
- concept_distances : np.ndarray, optional
  Distance matrix of examples from N. Argument used to take advantage of precomputations.
- Returns
  np.ndarray
  An array with the cohesiveness for each example.
References
- 1
Vilalta, R. and Drissi, Y. (2002). A Characterization of Difficult Problems in Classification. Proceedings of the 2002 International Conference on Machine Learning and Applications (pp. 133-138).
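The interplay between concept_dist_metric and concept_distances described above can be sketched as follows. This is an illustration of the documented fallback behaviour, not the library's implementation: resolve_distances is a hypothetical helper, and it assumes the metric string is passed straight to scipy's cdist.

```python
from typing import Optional

import numpy as np
from scipy.spatial.distance import cdist


def resolve_distances(
    N: np.ndarray,
    concept_dist_metric: str = "euclidean",
    concept_distances: Optional[np.ndarray] = None,
) -> np.ndarray:
    """Return a pairwise distance matrix, computing it only if needed."""
    if concept_distances is None:
        # No precomputed matrix was supplied, so build one with the metric.
        concept_distances = cdist(N, N, metric=concept_dist_metric)
    return concept_distances


N = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

dist_euclidean = resolve_distances(N)
dist_manhattan = resolve_distances(N, concept_dist_metric="cityblock")

# A precomputed matrix is returned unchanged and the metric is ignored.
reused = resolve_distances(N, "cityblock", concept_distances=dist_euclidean)
```

Because the metric is consulted only when no matrix is given, a caller that supplies concept_distances is responsible for having computed it with the metric they intend.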
- classmethod ft_conceptvar(N: ndarray, y: ndarray, conceptvar_alpha: float = 2.0, concept_dist_metric: str = 'euclidean', concept_minimum: float = 1e-09, concept_distances: Optional[ndarray] = None) -> ndarray
Compute the concept variation that estimates the variability of class labels among examples.
- Parameters
- N : np.ndarray
  Numerical fitted data.
- y : np.ndarray
  Target attribute.
- conceptvar_alpha : float, optional
  The alpha value used to adjust the weight. The higher the alpha, the smaller the effect of the weight on the computation.
- concept_dist_metric : str, optional
  Metric used to compute the distance between each pair of examples. See cdist from scipy for more options. Used only if the argument concept_distances is None.
- concept_minimum : float, optional
  The minimum value considered in the computation. It is added where necessary to avoid division by zero.
- concept_distances : np.ndarray, optional
  Distance matrix of examples from N. Argument used to take advantage of precomputations.
- Returns
  np.ndarray
  An array with the concept variation for each example.
References
- 1
Vilalta, R. (1999). Understanding accuracy performance through concept characterization and algorithm analysis. In Proceedings of the ICML-99 workshop on recent advances in meta-learning and future work (pp. 3-9).
- classmethod ft_impconceptvar(N: ndarray, y: ndarray, impconceptvar_alpha: float = 1.0, concept_dist_metric: str = 'euclidean', concept_distances: Optional[ndarray] = None) -> ndarray
Compute the improved concept variation that estimates the variability of class labels among examples.
- Parameters
- N : np.ndarray
  Numerical fitted data.
- y : np.ndarray
  Target attribute.
- impconceptvar_alpha : float, optional
  The alpha value used to adjust the weight. The higher the alpha, the smaller the effect of the weight on the computation.
- concept_dist_metric : str, optional
  Metric used to compute the distance between each pair of examples. See cdist from scipy for more options. Used only if the argument concept_distances is None.
- concept_distances : np.ndarray, optional
  Distance matrix of examples from N. Argument used to take advantage of precomputations.
- Returns
  np.ndarray
  An array with the improved concept variation for each example.
References
- 1
Vilalta, R. and Drissi, Y. (2002). A Characterization of Difficult Problems in Classification. Proceedings of the 2002 International Conference on Machine Learning and Applications (pp. 133-138).
- classmethod ft_wg_dist(N: ndarray, wg_dist_alpha: float = 2.0, concept_dist_metric: str = 'euclidean', concept_minimum: float = 1e-09, concept_distances: Optional[ndarray] = None) -> ndarray
Compute the weighted distance, which captures how dense or sparse the example distribution is.
- Parameters
- N : np.ndarray
  Numerical fitted data.
- wg_dist_alpha : float, optional
  The alpha value used to adjust the weight. The higher the alpha, the smaller the effect of the weight on the computation.
- concept_dist_metric : str, optional
  Metric used to compute the distance between each pair of examples. See cdist from scipy for more options. Used only if the argument concept_distances is None.
- concept_minimum : float, optional
  The minimum value considered in the computation. It is added where necessary to avoid division by zero.
- concept_distances : np.ndarray, optional
  Distance matrix of examples from N. Argument used to take advantage of precomputations.
- Returns
  np.ndarray
  An array with the weighted distance for each example.
References
- 1
Vilalta, R. (1999). Understanding accuracy performance through concept characterization and algorithm analysis. In Proceedings of the ICML-99 workshop on recent advances in meta-learning and future work (pp. 3-9).
- classmethod precompute_concept_dist(N: ndarray, concept_dist_metric: str = 'euclidean', **kwargs) -> Dict[str, Any]
Precompute the pairwise distance matrix used by the concept metafeatures.
- Parameters
- N : np.ndarray
  Numerical fitted data.
- concept_dist_metric : str, optional
  Metric used to compute the distance between each pair of examples. See cdist from scipy for more options.
- **kwargs
  Additional arguments. May contain values already produced by other precomputation methods, which can help speed up this precomputation.
- Returns
  dict
  With the following precomputed items:
  - concept_distances (np.ndarray): Distance matrix of examples from N.
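How a precomputed dictionary such as the one returned here might flow into feature extraction methods can be sketched in plain Python. Everything below is a hypothetical illustration under stated assumptions, not pymfe's internal machinery: precompute_toy_dist, ft_toy_max_dist, and call_with_shared are invented names, and skipping work when the key is already present in kwargs is an assumed reading of the kwargs description.

```python
import inspect
import math
from typing import Any, Callable, Dict, List, Optional

Matrix = List[List[float]]


def precompute_toy_dist(N: Matrix, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical precompute step: pairwise Euclidean distances."""
    if "concept_distances" in kwargs:
        # An earlier precompute step already produced this value.
        return {}
    return {"concept_distances": [[math.dist(u, v) for v in N] for u in N]}


def ft_toy_max_dist(
    N: Matrix, concept_distances: Optional[Matrix] = None
) -> float:
    """Hypothetical feature: the largest pairwise distance."""
    if concept_distances is None:
        # Fall back to computing the matrix when nothing was precomputed.
        concept_distances = precompute_toy_dist(N)["concept_distances"]
    return max(max(row) for row in concept_distances)


def call_with_shared(func: Callable, shared: Dict[str, Any]) -> Any:
    """Call `func` with only the shared values its signature accepts."""
    accepted = inspect.signature(func).parameters
    return func(**{k: v for k, v in shared.items() if k in accepted})


N = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
shared: Dict[str, Any] = {"N": N}
shared.update(precompute_toy_dist(**shared))  # computed once, at "fit" time
result = call_with_shared(ft_toy_max_dist, shared)  # reuses the cached matrix
```

The dictionary keys double as argument names, which is why the returned key concept_distances matches the optional parameter of the same name in the ft_ methods documented above.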