pymfe.info_theory.MFEInfoTheory

class pymfe.info_theory.MFEInfoTheory

Keeps methods for metafeatures of the Information Theory group.

The convention adopted for metafeature extraction related methods is to always start with the ft_ prefix to allow automatic method detection. This prefix is predefined within the _internal module.

All method signatures follow the conventions and restrictions listed below:

  1. For independent attribute data, X means every type of attribute, N means Numeric attributes only, and C stands for Categorical attributes only. It is important to note that the categorical attribute sets between X and C and the numerical attribute sets between X and N may differ due to data transformations performed while fitting data into the MFE model, enabled respectively by the transform_num and transform_cat arguments of the fit method (MFE).

  2. Only arguments in the MFE _custom_args_ft attribute (set up inside the fit method) are allowed to be required method arguments. All other arguments must be strictly optional (i.e., have a predefined default value).

  3. The initial assumption is that the user can change any optional argument, without any prior verification of the argument value or its type, via the kwargs argument of the extract method of the MFE class.

  4. The return value of all feature extraction methods should be a single value or a generic list type (preferably an np.ndarray) with numeric values.

There is another type of method adopted for automatic detection: those with the precompute_ prefix. These methods run automatically while some data is fitted into an MFE model, and their objective is to precompute common values shared between more than one feature extraction method. This strategy trades extra system memory consumption for faster feature extraction. Their return value must always be a dictionary whose keys are possible extra arguments for both feature extraction methods and other precomputation methods. Note that precomputed values are shared between all valid feature-extraction modules (e.g., class_freqs computed in the statistical module can freely be used by any precomputation or feature extraction method of the landmarking module).
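As a rough illustration of this precomputation pattern, the sketch below calls the documented classmethods by hand (in normal use they are invoked automatically through the MFE class); the toy C and y arrays are made up for the example.

    import numpy as np

    from pymfe.info_theory import MFEInfoTheory

    # Toy categorical data: 4 instances, 2 predictive attributes.
    C = np.array([["a", "x"], ["a", "y"], ["b", "x"], ["b", "y"]])
    y = np.array([0, 0, 1, 1])

    # The returned dict keys (class_ent, attr_ent, joint_ent, mut_inf)
    # match optional arguments of the ft_* methods documented below.
    precomp = MFEInfoTheory.precompute_entropy(y=y, C=C)

    # Forwarding the dict as keyword arguments lets ft_mut_inf reuse the
    # precomputed values instead of recomputing them.
    mut_inf = MFEInfoTheory.ft_mut_inf(C, y, **precomp)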

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

ft_attr_conc(C[, max_attr_num, random_state])

Compute the concentration coefficient of each pair of distinct attributes.

ft_attr_ent(C[, attr_ent])

Compute Shannon's entropy for each predictive attribute.

ft_class_conc(C, y)

Compute concentration coefficient between each attribute and class.

ft_class_ent(y[, class_ent, class_freqs])

Compute target attribute Shannon's entropy.

ft_eq_num_attr(C, y[, class_ent, ...])

Compute the number of attributes equivalent for a predictive task.

ft_joint_ent(C, y[, joint_ent])

Compute the joint entropy between each attribute and class.

ft_mut_inf(C, y[, mut_inf, attr_ent, ...])

Compute the mutual information between each attribute and target.

ft_ns_ratio(C, y[, attr_ent, mut_inf])

Compute the noisiness of attributes.

precompute_class_freq([y])

Precompute each distinct class (absolute) frequencies.

precompute_entropy([y, C, class_freqs])

Precompute various values related to Shannon's Entropy.

classmethod ft_attr_conc(C: ndarray, max_attr_num: Optional[int] = 12, random_state: Optional[int] = None) → ndarray

Compute the concentration coefficient of each pair of distinct attributes.

Parameters
C : np.ndarray

Categorical fitted data.

max_attr_num : int, optional

Maximum number of attributes considered. If C has more attributes than this value, this feature will be calculated on a sample of max_attr_num random attributes. If None, all attributes are considered. Note that the cost of this method grows combinatorially with the number of attributes considered.

random_state : int, optional

Used only if max_attr_num is given and C has more attributes than it. This random seed is set before sampling attributes of C.

Returns
np.ndarray

Concentration coefficient for each pair of distinct predictive attributes.

References

[1] Alexandros Kalousis and Melanie Hilario. Model selection via meta-learning: a comparative study. International Journal on Artificial Intelligence Tools, 10(4):525–554, 2001.
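The concentration coefficient itself is not spelled out on this page; the sketch below implements Goodman and Kruskal's tau, a common formulation of the concentration coefficient in the meta-learning literature [1], as an assumption about the definition rather than a copy of pymfe's implementation (concentration is a hypothetical helper name).

    import numpy as np
    import pandas as pd

    def concentration(x: np.ndarray, y: np.ndarray) -> float:
        # Joint relative-frequency (contingency) table of the two attributes.
        pij = pd.crosstab(x, y).to_numpy() / x.size
        pi = pij.sum(axis=1)  # marginal distribution of x (rows)
        pj = pij.sum(axis=0)  # marginal distribution of y (columns)
        # Assumed definition (Goodman-Kruskal's tau): proportional reduction
        # in the error of predicting y once x is known.
        return (np.sum(pij ** 2 / pi[:, np.newaxis]) - np.sum(pj ** 2)) / (
            1.0 - np.sum(pj ** 2)
        )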

classmethod ft_attr_ent(C: ndarray, attr_ent: Optional[ndarray] = None) → ndarray

Compute Shannon’s entropy for each predictive attribute.

The Shannon’s Entropy H of a vector x is defined as:

H(x) = -sum_{val in phi_x}(P(x = val) * log2(P(x = val)))

Where phi_x is the set of all possible distinct values in vector x and P(x = val) is the probability of x assuming the value val in phi_x.
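As a minimal NumPy illustration of this definition (shannon_entropy is a hypothetical helper, not part of pymfe; probabilities are estimated by relative frequencies):

    import numpy as np

    def shannon_entropy(x: np.ndarray) -> float:
        # Estimate P(x = val) by the relative frequency of each distinct value.
        _, counts = np.unique(x, return_counts=True)
        probs = counts / x.size
        # probs > 0 by construction, so log2 is safe.
        return float(-np.sum(probs * np.log2(probs)))

    print(shannon_entropy(np.array(["a", "a", "b", "c"])))  # 1.5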

Parameters
C : np.ndarray

Categorical fitted data.

attr_ent : np.ndarray, optional

This argument is this method's own return value, meant to exploit possible attribute entropy precomputations.

Returns
np.ndarray

Entropy of each predictive attribute.

References

[1] Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

classmethod ft_class_conc(C: ndarray, y: ndarray) → ndarray

Compute concentration coefficient between each attribute and class.

Parameters
C : np.ndarray

Categorical fitted data.

y : np.ndarray

Target attribute.

Returns
np.ndarray

Concentration coefficient between each predictive attribute and the target attribute (class).

References

[1] Alexandros Kalousis and Melanie Hilario. Model selection via meta-learning: a comparative study. International Journal on Artificial Intelligence Tools, 10(4):525–554, 2001.

classmethod ft_class_ent(y: ndarray, class_ent: Optional[float] = None, class_freqs: Optional[ndarray] = None) → float

Compute target attribute Shannon’s entropy.

The Shannon’s Entropy H of a vector y is defined as:

H(y) = -sum_{val in phi_y}(P(y = val) * log2(P(y = val)))

Where phi_y is the set of all possible distinct values in vector y and P(y = val) is the probability of y assuming the value val in phi_y.

Parameters
y : np.ndarray

Target attribute.

class_ent : float, optional

Entropy of the target attribute y, meant to exploit precomputations. If NoneType, the entropy is computed from class_freqs (or directly from y, if class_freqs is also NoneType).

class_freqs : np.ndarray, optional

Absolute frequency of each distinct class in y. This argument is meant to exploit precomputations, and is used only if class_ent is NoneType.

Returns
float

Entropy of the target attribute.

References

[1] Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

classmethod ft_eq_num_attr(C: ndarray, y: ndarray, class_ent: Optional[float] = None, class_freqs: Optional[ndarray] = None, mut_inf: Optional[ndarray] = None) → float

Compute the number of attributes equivalent for a predictive task.

The attribute equivalence E is defined as:

E = attr_num * (H(y) / sum_x(MI(x, y)))

Where H(y) is the Shannon's entropy of the target attribute, MI(x, y) is the mutual information between the predictive attribute x and the target attribute y, and attr_num is the number of predictive attributes.
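A direct NumPy transcription of this formula, assuming class_ent (H(y)) and mut_inf (one MI(x, y) value per attribute) were already obtained, e.g. via precompute_entropy (eq_num_attr is a hypothetical helper name):

    import numpy as np

    def eq_num_attr(class_ent: float, mut_inf: np.ndarray) -> float:
        # E = attr_num * H(y) / sum(MI): H(y) over the mean MI per attribute.
        return mut_inf.size * class_ent / np.sum(mut_inf)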

Parameters
C : np.ndarray

Categorical fitted data.

y : np.ndarray

Target attribute.

class_ent : float, optional

Entropy of the target attribute y. Used to exploit precomputations. If NoneType, this argument is calculated using the method ft_class_ent.

class_freqs : np.ndarray, optional

Absolute frequency of each distinct class in y. This argument is meant to exploit precomputations, and is used only if class_ent is NoneType.

mut_inf : np.ndarray, optional

Values of mutual information between each attribute in C and the target y. Similarly to the arguments above, its purpose is to exploit precomputations of mutual information. If NoneType, it is calculated using the method ft_mut_inf.

Returns
float

Estimated number of equivalent predictive attributes.

References

[1] Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

classmethod ft_joint_ent(C: ndarray, y: ndarray, joint_ent: Optional[ndarray] = None) → ndarray

Compute the joint entropy between each attribute and class.

The Joint Entropy H between a predictive attribute x and target attribute y is defined as:

H(x, y) = -sum_{i in phi_x}(sum_{j in phi_y}(p_i_j * log2(p_i_j)))

Where phi_x and phi_y are the sets of possible distinct values of, respectively, x and y, and p_i_j is defined as:

p_i_j = P(x = phi_x_i, y = phi_y_j)

That is, p_i_j is the joint probability of x assuming the specific value i in the set phi_x simultaneously with y assuming the specific value j in the set phi_y.
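A minimal sketch of this definition, estimating each p_i_j by the relative frequency of the corresponding (x, y) pair (joint_entropy is a hypothetical helper, not pymfe's implementation):

    import numpy as np

    def joint_entropy(x: np.ndarray, y: np.ndarray) -> float:
        # np.unique over rows of the stacked pairs yields the joint
        # absolute frequency of each distinct (x, y) combination.
        pairs = np.column_stack((x, y)).astype(str)
        _, counts = np.unique(pairs, axis=0, return_counts=True)
        probs = counts / x.size  # p_i_j estimates; all strictly positive
        return float(-np.sum(probs * np.log2(probs)))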

Parameters
C : np.ndarray

Categorical fitted data.

y : np.ndarray

Target attribute.

joint_ent : np.ndarray, optional

This argument is this method's own return value, meant to exploit possible joint entropy precomputations.

Returns
np.ndarray

Estimated joint entropy between each predictive attribute and the target attribute (class attribute).

References

[1] Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

classmethod ft_mut_inf(C: ndarray, y: ndarray, mut_inf: Optional[ndarray] = None, attr_ent: Optional[ndarray] = None, class_ent: Optional[float] = None, joint_ent: Optional[ndarray] = None, class_freqs: Optional[ndarray] = None) → ndarray

Compute the mutual information between each attribute and target.

The mutual information MI between an independent attribute x and the target attribute y is defined as:

MI(x, y) = H(x) + H(y) - H(x, y)

Where H(x) and H(y) are, respectively, the Shannon's entropies of x and y (see the documentation of ft_attr_ent or ft_class_ent for more information), and H(x, y) is the joint entropy of x and y (see the ft_joint_ent documentation for more details).
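A self-contained sketch of this identity, applied per column of C to mirror the documented return shape (helper names are hypothetical):

    import numpy as np

    def _entropy(counts: np.ndarray) -> float:
        probs = counts / counts.sum()
        return float(-np.sum(probs * np.log2(probs)))

    def mutual_information(C: np.ndarray, y: np.ndarray) -> np.ndarray:
        # MI(x, y) = H(x) + H(y) - H(x, y) for each column x of C.
        h_y = _entropy(np.unique(y, return_counts=True)[1])
        mi = []
        for col in C.T:
            h_x = _entropy(np.unique(col, return_counts=True)[1])
            pairs = np.column_stack((col, y)).astype(str)
            h_xy = _entropy(np.unique(pairs, axis=0, return_counts=True)[1])
            mi.append(h_x + h_y - h_xy)
        return np.array(mi)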

Parameters
C : np.ndarray

Categorical fitted data.

y : np.ndarray

Target attribute.

mut_inf : np.ndarray, optional

This argument is this method's own return value, meant to exploit possible mutual information precomputations.

attr_ent : np.ndarray, optional

Entropy of each attribute in C. The purpose of this argument is to exploit possible precomputations of attribute entropy. If NoneType, it is calculated using the ft_attr_ent method.

class_ent : float, optional

Entropy of the target attribute y. Used to exploit precomputations. If NoneType, this argument is calculated using the method ft_class_ent.

joint_ent : np.ndarray, optional

Joint entropy between each attribute in C and the target attribute y. If NoneType, this argument is calculated using the method ft_joint_ent.

class_freqs : np.ndarray, optional

Absolute frequency of each distinct class in y.

Returns
np.ndarray

Mutual information between each attribute and the target attribute.

References

[1] Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

classmethod ft_ns_ratio(C: ndarray, y: ndarray, attr_ent: Optional[ndarray] = None, mut_inf: Optional[ndarray] = None) → float

Compute the noisiness of attributes.

Let y be the target attribute and x a predictive attribute of the fitted dataset. The noisiness N is defined as:

N = (sum_x(attr_entropy(x)) - sum_x(MI(x, y))) / sum_x(MI(x, y))

where MI(x, y) is the mutual information between the target attribute y and the predictive attribute x, and each sum is performed over all distinct predictive attributes x.
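A direct transcription of this formula, assuming attr_ent and mut_inf are precomputed arrays with one entry per predictive attribute (ns_ratio is a hypothetical helper name):

    import numpy as np

    def ns_ratio(attr_ent: np.ndarray, mut_inf: np.ndarray) -> float:
        # The summed mutual information is the "signal"; the excess
        # attribute entropy beyond it is treated as noise.
        signal = np.sum(mut_inf)
        return (np.sum(attr_ent) - signal) / signal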

Parameters
C : np.ndarray

Categorical fitted data.

y : np.ndarray

Target attribute.

attr_ent : np.ndarray, optional

Entropy of each attribute in C. The purpose of this argument is to exploit possible precomputations of attribute entropy. If NoneType, it is calculated using the ft_attr_ent method.

mut_inf : np.ndarray, optional

Values of mutual information between each attribute in C and the target y. Similarly to the argument above, its purpose is to exploit precomputations of mutual information. If NoneType, it is calculated using the method ft_mut_inf.

Returns
float

Estimated noisiness of the predictive attributes.

References

[1] Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

classmethod precompute_class_freq(y: Optional[ndarray] = None, **kwargs) → Dict[str, Any]

Precompute each distinct class (absolute) frequencies.

Parameters
y : np.ndarray, optional

Target attribute.

kwargs:

Additional arguments. May contain values precomputed by other precomputation methods executed before this one, which can help speed up this precomputation.

Returns
dict
With the following precomputed items:
  • class_freqs (np.ndarray): absolute frequency of each distinct class in y, if y is not NoneType.

classmethod precompute_entropy(y: Optional[ndarray] = None, C: Optional[ndarray] = None, class_freqs: Optional[ndarray] = None, **kwargs) → Dict[str, Any]

Precompute various values related to Shannon’s Entropy.

Parameters
C : np.ndarray, optional

Categorical fitted data.

y : np.ndarray, optional

Target attribute.

class_freqs : np.ndarray, optional

Absolute frequency of each distinct class in y.

kwargs:

Additional arguments. May contain values precomputed by other precomputation methods executed before this one, which can help speed up this precomputation.

Returns
dict
With the following precomputed items:
  • class_ent (float): Shannon’s Entropy of y, if it is not NoneType.

  • attr_ent (np.ndarray): Shannon’s Entropy of each attribute in C, if it is not NoneType.

  • joint_ent (np.ndarray): Joint Entropy between each attribute in C and the target attribute y, if both are not NoneType.

  • mut_inf (np.ndarray): mutual information between each attribute in C and y, if they both are not NoneType.