pymfe.info_theory.MFEInfoTheory
- class pymfe.info_theory.MFEInfoTheory[source]
Keeps methods for metafeatures of the Information Theory group.
The convention adopted for metafeature extraction related methods is to always start with the ft_ prefix to allow automatic method detection. This prefix is predefined within the _internal module.
All method signatures follow the conventions and restrictions listed below:
For independent attribute data, X means every type of attribute, N means Numeric attributes only and C stands for Categorical attributes only. It is important to note that the categorical attribute sets between X and C and the numerical attribute sets between X and N may differ due to data transformations, performed while fitting data into the MFE model, enabled by, respectively, the transform_num and transform_cat arguments from fit (MFE method).
Only arguments in the MFE _custom_args_ft attribute (set up inside the fit method) are allowed to be required method arguments. All other arguments must be strictly optional (i.e., have a predefined default value).
The initial assumption is that the user can change any optional argument, without any previous verification of argument value or its type, via the kwargs argument of the extract method of the MFE class.
The return value of all feature extraction methods should be a single value or a generic list (preferably an np.ndarray) with numeric values.
There is another type of method adopted for automatic detection: methods with the precompute_ prefix. These methods run automatically while fitting some data into an MFE model, and their objective is to precompute some common value shared between more than one feature extraction method. This strategy is a trade-off between consuming more system memory and speeding up feature extraction. Their return value must always be a dictionary whose keys are possible extra arguments for both feature extraction methods and other precomputation methods. Note that precomputed values are shared between all valid feature-extraction modules (e.g., class_freqs computed in module statistical can freely be used by any precomputation or feature extraction method of module landmarking).
- __init__(*args, **kwargs)
Methods

__init__(*args, **kwargs)

ft_attr_conc(C[, max_attr_num, random_state])
  Compute concentration coef. of each pair of distinct attributes.

ft_attr_ent(C[, attr_ent])
  Compute Shannon's entropy for each predictive attribute.

ft_class_conc(C, y)
  Compute concentration coefficient between each attribute and class.

ft_class_ent(y[, class_ent, class_freqs])
  Compute target attribute Shannon's entropy.

ft_eq_num_attr(C, y[, class_ent, ...])
  Compute the number of attributes equivalent for a predictive task.

ft_joint_ent(C, y[, joint_ent])
  Compute the joint entropy between each attribute and class.

ft_mut_inf(C, y[, mut_inf, attr_ent, ...])
  Compute the mutual information between each attribute and target.

ft_ns_ratio(C, y[, attr_ent, mut_inf])
  Compute the noisiness of attributes.

precompute_class_freq([y])
  Precompute each distinct class (absolute) frequencies.

precompute_entropy([y, C, class_freqs])
  Precompute various values related to Shannon's Entropy.
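Usage sketch. In practice these metafeatures are usually reached through the extract method of the MFE class rather than by calling the ft_* classmethods directly. The snippet below is a minimal illustration of that workflow, assuming the "info-theory" group name and a scikit-learn toy dataset; exact metafeature names and values depend on the installed pymfe version.

import numpy as np
from sklearn.datasets import load_iris
from pymfe.mfe import MFE

# Load a small tabular dataset (numeric attributes + categorical target).
data = load_iris()
X, y = data.data, data.target

# Restrict extraction to the Information Theory group; numeric attributes
# can be discretized internally (see the transform_num argument of fit),
# so the C-based methods of this module still apply.
mfe = MFE(groups=["info-theory"])
mfe.fit(X, y)

names, values = mfe.extract()
for name, value in zip(names, values):
    print(f"{name}: {value}")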
- classmethod ft_attr_conc(C: ndarray, max_attr_num: Optional[int] = 12, random_state: Optional[int] = None) ndarray [source]
Compute concentration coef. of each pair of distinct attributes.
- Parameters
- C : np.ndarray
  Categorical fitted data.
- max_attr_num : int, optional
  Maximum number of attributes considered. If C has more attributes than this value, this feature will be calculated on a sample of max_attr_num random attributes. If None, then all attributes are considered. Note that the cost of this method is combinatorial in the number of attributes considered.
- random_state : int, optional
  Used only if max_attr_num is given and C has more attributes than it. This random seed is set before sampling the C attributes.
- Returns
- np.ndarray
  Concentration coefficient for each pair of distinct predictive attributes.
References
- 1
Alexandros Kalousis and Melanie Hilario. Model selection via meta-learning: a comparative study. International Journal on Artificial Intelligence Tools, 10(4):525–554, 2001.
- classmethod ft_attr_ent(C: ndarray, attr_ent: Optional[ndarray] = None) ndarray [source]
Compute Shannon’s entropy for each predictive attribute.
The Shannon’s Entropy H of a vector x is defined as:
H(x) = -sum_{val in phi_x}(P(x = val) * log2(P(x = val)))
Where phi_x is the set of all possible distinct values in vector x and P(x = val) is the probability of x assuming the value val in phi_x.
- Parameters
- C : np.ndarray
  Categorical fitted data.
- attr_ent : np.ndarray, optional
  This argument is this method's own return value, meant to exploit possible attribute entropy precomputations.
- Returns
- np.ndarray
  Entropy of each predictive attribute.
References
- 1
Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.
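Illustration. The entropy formula above can be reproduced directly with NumPy, column by column, from the frequencies of each distinct value. This is an independent sketch, not pymfe's internal implementation; the shannon_entropy helper and the toy array C are hypothetical.

import numpy as np

def shannon_entropy(values: np.ndarray) -> float:
    """H(x) = -sum(P(x = val) * log2(P(x = val))) over distinct values."""
    _, counts = np.unique(values, return_counts=True)
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))

# Toy categorical data: one row per instance, one column per attribute.
C = np.array([
    ["a", "x"],
    ["a", "y"],
    ["b", "x"],
    ["b", "y"],
])

attr_ent = np.array([shannon_entropy(C[:, j]) for j in range(C.shape[1])])
print(attr_ent)  # both columns are uniform over 2 values -> [1.0, 1.0]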
- classmethod ft_class_conc(C: ndarray, y: ndarray) ndarray [source]
Compute concentration coefficient between each attribute and class.
- Parameters
- C : np.ndarray
  Categorical fitted data.
- y : np.ndarray
  Target attribute.
- Returns
- np.ndarray
  Concentration coefficient between each predictive attribute and the target attribute (class).
References
- 1
Alexandros Kalousis and Melanie Hilario. Model selection via meta-learning: a comparative study. International Journal on Artificial Intelligence Tools, 10(4):525–554, 2001.
- classmethod ft_class_ent(y: ndarray, class_ent: Optional[float] = None, class_freqs: Optional[ndarray] = None) float [source]
Compute target attribute Shannon’s entropy.
The Shannon’s Entropy H of a vector y is defined as:
H(y) = -sum_{val in phi_y}(P(y = val) * log2(P(y = val)))
Where phi_y is the set of all possible distinct values in vector y and P(y = val) is the probability of y assuming the value val in phi_y.
- Parameters
- y : np.ndarray
  Target attribute.
- class_ent : float, optional
  Entropy of the target attribute y. Used to exploit precomputations. If NoneType, this argument is calculated using the method ft_class_ent.
- class_freqs : np.ndarray, optional
  Absolute frequency of each distinct class in y. This argument is meant to exploit precomputations, used if class_ent is NoneType.
- Returns
- float
  Entropy of the target attribute.
References
- 1
Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.
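Illustration. The same entropy formula applies to the target attribute, here computed from precomputed absolute class frequencies (the class_freqs shortcut described above). The toy vector y is hypothetical.

import numpy as np

# Toy target vector with an imbalanced class distribution.
y = np.array([0, 0, 0, 1, 1, 2])

# Absolute class frequencies, as produced by a class_freqs precomputation.
_, class_freqs = np.unique(y, return_counts=True)

# H(y) = -sum(P(y = val) * log2(P(y = val))), with P estimated from frequencies.
probs = class_freqs / class_freqs.sum()
class_ent = -np.sum(probs * np.log2(probs))
print(round(float(class_ent), 4))  # ~1.4591 bits for frequencies [3, 2, 1]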
- classmethod ft_eq_num_attr(C: ndarray, y: ndarray, class_ent: Optional[float] = None, class_freqs: Optional[ndarray] = None, mut_inf: Optional[ndarray] = None) float [source]
Compute the number of attributes equivalent for a predictive task.
The attribute equivalence E is defined as:
E = attr_num * (H(y) / sum_x(MI(x, y)))
Where H(y) is the Shannon's Entropy of the target attribute and MI(x, y) is the Mutual Information between the predictive attribute x and the target attribute y.
- Parameters
- C : np.ndarray
  Categorical fitted data.
- y : np.ndarray
  Target attribute.
- class_ent : float, optional
  Entropy of the target attribute y. Used to exploit precomputations. If NoneType, this argument is calculated using the method ft_class_ent.
- class_freqs : np.ndarray, optional
  Absolute frequency of each distinct class in y. This argument is meant to exploit precomputations, used if class_ent is NoneType.
- mut_inf : np.ndarray, optional
  Values of mutual information between each attribute of C and the target y. Similarly to the argument above, this argument's purpose is to exploit precomputations of mutual information. If this argument is NoneType, then it is calculated using the method ft_mut_inf.
- Returns
- float
  Estimated number of equivalent predictive attributes.
References
- 1
Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.
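Illustration. A worked example of E = attr_num * (H(y) / sum_x(MI(x, y))) using hypothetical precomputed values for class_ent and mut_inf.

import numpy as np

# Hypothetical precomputed values for a dataset with 4 predictive attributes.
attr_num = 4
class_ent = 1.5                              # H(y), in bits
mut_inf = np.array([0.9, 0.6, 0.3, 0.2])     # MI(x, y) for each attribute x

# E = attr_num * (H(y) / sum_x(MI(x, y)))
eq_num_attr = attr_num * (class_ent / mut_inf.sum())
print(eq_num_attr)  # 4 * (1.5 / 2.0) = 3.0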
- classmethod ft_joint_ent(C: ndarray, y: ndarray, joint_ent: Optional[ndarray] = None) ndarray [source]
Compute the joint entropy between each attribute and class.
The Joint Entropy H between a predictive attribute x and the target attribute y is defined as:
H(x, y) = -sum_{phi_x}(sum_{phi_y}(p_i_j * log2(p_i_j)))
Where phi_x and phi_y are sets of possible distinct values for, respectively, x and y, and p_i_j is defined as:
p_i_j = P(x = phi_x_i, y = phi_y_j)
That is, p_i_j is the joint probability of x assuming a specific value i in the set phi_x simultaneously with y assuming a specific value j in the set phi_y.
- Parameters
- C : np.ndarray
  Categorical fitted data.
- y : np.ndarray
  Target attribute.
- joint_ent : np.ndarray, optional
  This argument is this method's own return value, meant to exploit possible joint entropy precomputations.
- Returns
- np.ndarray
  Estimated joint entropy between each predictive attribute and the target attribute (class attribute).
References
- 1
Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.
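Illustration. The joint entropy formula can be reproduced with NumPy by counting the frequencies of distinct (x, y) pairs. The joint_entropy helper and the toy data are hypothetical sketches of the formula, not pymfe's internal implementation.

import numpy as np

def joint_entropy(x: np.ndarray, y: np.ndarray) -> float:
    """H(x, y) = -sum over (i, j) of p_i_j * log2(p_i_j)."""
    # Build the joint (contingency) counts over distinct (x, y) pairs.
    pairs = np.stack([x.astype(str), y.astype(str)], axis=1)
    _, counts = np.unique(pairs, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# Toy attribute and target (hypothetical values).
x = np.array(["a", "a", "b", "b"])
y = np.array([0, 1, 0, 1])

print(joint_entropy(x, y))  # four equiprobable (x, y) pairs -> 2.0 bits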
- classmethod ft_mut_inf(C: ndarray, y: ndarray, mut_inf: Optional[ndarray] = None, attr_ent: Optional[ndarray] = None, class_ent: Optional[float] = None, joint_ent: Optional[ndarray] = None, class_freqs: Optional[ndarray] = None) ndarray [source]
Compute the mutual information between each attribute and target.
The Mutual Information MI between an independent attribute x and the target attribute y is defined as:
MI(x, y) = H(x) + H(y) - H(x, y)
Where H(x) and H(y) are, respectively, the Shannon's Entropy of x and y (see the documentation of ft_attr_ent or ft_class_ent for more information), and H(x, y) is the joint entropy of x and y (see the ft_joint_ent documentation for more details).
- Parameters
- C : np.ndarray
  Categorical fitted data.
- y : np.ndarray
  Target attribute.
- mut_inf : np.ndarray, optional
  This argument is this method's own return value, meant to exploit possible mutual information precomputations.
- attr_ent : np.ndarray, optional
  Values of each attribute entropy in C. This argument's purpose is to exploit possible precomputations of attribute entropy. If NoneType, this argument is calculated using the ft_attr_ent method.
- class_ent : float, optional
  Entropy of the target attribute y. Used to exploit precomputations. If NoneType, this argument is calculated using the method ft_class_ent.
- joint_ent : np.ndarray, optional
  Joint entropy between each independent attribute in C and the target attribute y. If NoneType, this argument is calculated using the method ft_joint_ent.
- class_freqs : np.ndarray, optional
  Absolute frequency of each distinct class in y.
- Returns
- np.ndarray
  Mutual information between each attribute and the target attribute.
References
- 1
Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.
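Illustration. A sketch of MI(x, y) = H(x) + H(y) - H(x, y) computed per attribute, built from entropy and joint entropy helpers like the ones sketched above. The helpers and the toy data are hypothetical, not pymfe internals.

import numpy as np

def entropy(v: np.ndarray) -> float:
    _, counts = np.unique(v, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def joint_entropy(x: np.ndarray, y: np.ndarray) -> float:
    pairs = np.stack([x.astype(str), y.astype(str)], axis=1)
    _, counts = np.unique(pairs, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# Toy data: the first attribute mirrors the class, the second is unrelated.
C = np.array([
    ["a", "p"],
    ["a", "q"],
    ["b", "p"],
    ["b", "q"],
])
y = np.array([0, 0, 1, 1])

class_ent = entropy(y)
mut_inf = np.array([
    entropy(C[:, j]) + class_ent - joint_entropy(C[:, j], y)
    for j in range(C.shape[1])
])
print(mut_inf)  # ~[1.0, 0.0]: only the first attribute is informative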
- classmethod ft_ns_ratio(C: ndarray, y: ndarray, attr_ent: Optional[ndarray] = None, mut_inf: Optional[ndarray] = None) float [source]
Compute the noisiness of attributes.
Let y be the target attribute and x one predictive attribute in a dataset C. The noisiness N is defined as:
N = (sum_x(attr_entropy(x)) - sum_x(MI(x, y))) / sum_x(MI(x, y))
where MI(x, y) is the mutual information between the target attribute y and the predictive attribute x, and all sums are performed over each distinct attribute x in C.
- Parameters
- C : np.ndarray
  Categorical fitted data.
- y : np.ndarray
  Target attribute.
- attr_ent : np.ndarray, optional
  Values of each attribute entropy in C. This argument's purpose is to exploit possible precomputations of attribute entropy. If NoneType, this argument is calculated using the ft_attr_ent method.
- mut_inf : np.ndarray, optional
  Values of mutual information between each attribute of C and the target y. Similarly to the argument above, this argument's purpose is to exploit precomputations of mutual information. If this argument is NoneType, then it is calculated using the method ft_mut_inf.
- Returns
- float
  Estimated noisiness of the predictive attributes.
References
- 1
Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.
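Illustration. A worked example of the noisiness formula with hypothetical precomputed attr_ent and mut_inf values.

import numpy as np

# Hypothetical precomputed values for three predictive attributes.
attr_ent = np.array([1.0, 1.5, 2.0])   # H(x) for each attribute x
mut_inf = np.array([0.8, 0.5, 0.2])    # MI(x, y) for each attribute x

# N = (sum_x(H(x)) - sum_x(MI(x, y))) / sum_x(MI(x, y))
ns_ratio = (attr_ent.sum() - mut_inf.sum()) / mut_inf.sum()
print(ns_ratio)  # (4.5 - 1.5) / 1.5 = 2.0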
- classmethod precompute_class_freq(y: Optional[ndarray] = None, **kwargs) Dict[str, Any] [source]
Precompute each distinct class (absolute) frequencies.
- Parameters
- y : np.ndarray, optional
  Target attribute.
- kwargs:
  Additional arguments. May contain values previously precomputed by other precomputation methods, which can help speed up this precomputation.
- Returns
- dict
  With the following precomputed items:
  - class_freqs (np.ndarray): absolute frequency of each distinct class in y, if y is not NoneType.
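Illustration. The class_freqs precomputation described above amounts to counting each distinct class; the sketch below reproduces it with np.unique and packs it into the extra-argument dictionary format that precompute_ methods return. The toy vector y is hypothetical.

import numpy as np

# Toy target attribute (hypothetical values).
y = np.array([0, 0, 1, 1, 1, 2])

# Absolute frequency of each distinct class, keyed so downstream
# ft_* methods can receive it as the class_freqs extra argument.
_, class_freqs = np.unique(y, return_counts=True)
precomp = {"class_freqs": class_freqs}
print(precomp)  # {'class_freqs': array([2, 3, 1])}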
- classmethod precompute_entropy(y: Optional[ndarray] = None, C: Optional[ndarray] = None, class_freqs: Optional[ndarray] = None, **kwargs) Dict[str, Any] [source]
Precompute various values related to Shannon’s Entropy.
- Parameters
- C : np.ndarray, optional
  Categorical fitted data.
- y : np.ndarray, optional
  Target attribute.
- class_freqs : np.ndarray, optional
  Absolute frequency of each distinct class in y.
- kwargs:
  Additional arguments. May contain values previously precomputed by other precomputation methods, which can help speed up this precomputation.
- Returns
- dict
  With the following precomputed items:
  - class_ent (float): Shannon's Entropy of y, if it is not NoneType.
  - attr_ent (np.ndarray): Shannon's Entropy of each attribute in C, if it is not NoneType.
  - joint_ent (np.ndarray): Joint Entropy between each attribute in C and the target attribute y, if both are not NoneType.
  - mut_inf (np.ndarray): Mutual information between each attribute in C and y, if both are not NoneType.