pymfe.model_based.MFEModelBased

class pymfe.model_based.MFEModelBased

Keeps methods for metafeatures of the model-based group.

The convention adopted for metafeature extraction related methods is to always start with the ft_ prefix, to allow automatic method detection. This prefix is predefined within the _internal module.

All method signatures follow the conventions and restrictions listed below:

  1. For independent attribute data, X means every type of attribute, N means Numeric attributes only, and C stands for Categorical attributes only. Note that the categorical attribute sets in X and C, and the numerical attribute sets in X and N, may differ due to data transformations performed while fitting data into the MFE model, enabled respectively by the transform_num and transform_cat arguments of fit (an MFE method).

  2. Only arguments in the MFE _custom_args_ft attribute (set up inside the fit method) are allowed to be required method arguments. All other arguments must be strictly optional (i.e., have a predefined default value).

  3. The initial assumption is that the user can change any optional argument, without any prior verification of the argument value or its type, via the kwargs argument of the extract method of the MFE class.

  4. The return value of all feature extraction methods should be a single value or a generic list type (preferably an np.ndarray) with numeric values.

There is another type of method adopted for automatic detection: methods prefixed with precompute_. These methods run automatically while fitting data into an MFE model, and their objective is to precompute some common value shared by more than one feature extraction method. This strategy trades higher memory consumption for faster feature extraction. Their return value must always be a dictionary whose keys are possible extra arguments for both feature extraction methods and other precomputation methods. Note that precomputed values are shared between all valid feature-extraction modules (e.g., class_freqs computed in the statistical module can freely be used by any precomputation or feature extraction method of the landmarking module).
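
For illustration, a minimal extraction sketch for this group (assuming the public pymfe API and a scikit-learn toy dataset):

    from sklearn.datasets import load_iris
    from pymfe.mfe import MFE

    data = load_iris()

    # Restrict extraction to the model-based group; the precompute_*
    # methods run during fit() and cache the internal decision tree.
    mfe = MFE(groups=["model-based"])
    mfe.fit(data.data, data.target)
    names, values = mfe.extract()
    for name, value in zip(names, values):
        print(name, value)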

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

extract_table(dt_model[, leaf_nodes])

Bookkeep some information from the dt_model into a table (array).

ft_leaves(dt_model)

Compute the number of leaf nodes in the DT model.

ft_leaves_branch(dt_model[, leaf_nodes, ...])

Compute the size of branches in the DT model.

ft_leaves_corrob(dt_model[, leaf_nodes, ...])

Compute the leaves corroboration of the DT model.

ft_leaves_homo(dt_model[, tree_shape, ...])

Compute the DT model Homogeneity for every leaf node.

ft_leaves_per_class(dt_model[, dt_info_table])

Compute the proportion of leaves per class in the DT model.

ft_nodes(dt_model)

Compute the number of non-leaf nodes in the DT model.

ft_nodes_per_attr(dt_model)

Compute the ratio of nodes per number of attributes in the DT model.

ft_nodes_per_inst(dt_model)

Compute the ratio of non-leaf nodes per number of instances in the DT model.

ft_nodes_per_level(dt_model[, ...])

Compute the ratio of the number of nodes per tree level in the DT model.

ft_nodes_repeated(dt_model[, dt_info_table, ...])

Compute the number of repeated nodes in the DT model.

ft_tree_depth(dt_model[, dt_node_depths])

Compute the depth of every node in the DT model.

ft_tree_imbalance(dt_model[, leaf_nodes, ...])

Compute the tree imbalance for each leaf node.

ft_tree_shape(dt_model[, tree_shape, ...])

Compute the tree shape for every leaf node.

ft_var_importance(dt_model)

Compute the feature importance of the DT model for each attribute.

precompute_model_based_class(N[, y, ...])

Precompute the DT Model and some information related to it.

classmethod extract_table(dt_model: DecisionTreeClassifier, leaf_nodes: Optional[ndarray] = None) → ndarray

Bookkeep some information from the dt_model into a table (array).

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

leaf_nodes : np.ndarray, optional

Boolean array with value True if a node is a leaf. If given, can improve performance.

Returns
np.ndarray

Table of DT model node properties. See the Notes section below for the column layout.

Notes

Each line in the returned array represents a node, where:
  • Column 0: the id of the attribute split at that node.

  • Column 1: the number of examples that fall into that node.

  • Column 2: 0 if the node is not a leaf; otherwise, the class number represented by that leaf node.
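
The layout above can be reproduced from scikit-learn's tree internals. A sketch (not pymfe's internal code; encoding the leaf class as the majority class index plus one is an assumption made here so that 0 can flag internal nodes):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def node_info_table(dt_model: DecisionTreeClassifier) -> np.ndarray:
        tree = dt_model.tree_
        table = np.zeros((tree.node_count, 3), dtype=int)
        leaf = tree.children_left == -1                # leaves have no children
        table[:, 0] = np.where(leaf, 0, tree.feature)  # split attribute id
        table[:, 1] = tree.n_node_samples              # examples per node
        # Majority class (plus one) for leaves; 0 for internal nodes.
        table[leaf, 2] = tree.value[leaf].argmax(axis=-1).ravel() + 1
        return table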

classmethod ft_leaves(dt_model: DecisionTreeClassifier) → int

Compute the number of leaf nodes in the DT model.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

Returns
int

Number of leaf nodes in the DT model.

References

1

Yonghong Peng, P. A. Flach, Pavel Brazdil, and Carlos Soares. Decision tree-based data characterization for meta-learning. In 2nd ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 111 – 122, 2002.
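
For a fitted scikit-learn tree, this count is available directly:

    # Assuming a fitted sklearn.tree.DecisionTreeClassifier.
    n_leaves = dt_model.get_n_leaves()
    # Equivalently, via the tree internals (leaves have no children):
    n_leaves = int((dt_model.tree_.children_left == -1).sum())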

classmethod ft_leaves_branch(dt_model: DecisionTreeClassifier, leaf_nodes: Optional[ndarray] = None, dt_node_depths: Optional[ndarray] = None) → ndarray

Compute the size of branches in the DT model.

The size of branches consists of the depths of all leaves of the DT model.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

leaf_nodes : np.ndarray, optional

Boolean array with value True if a node is a leaf. Argument used to take advantage of precomputations.

dt_node_depths : np.ndarray, optional

Depth of every DT model node. Argument used to take advantage of precomputations.

Returns
np.ndarray

Size of branches of the DT model.

References

1

Yonghong Peng, P. A. Flach, Pavel Brazdil, and Carlos Soares. Decision tree-based data characterization for meta-learning. In 2nd ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 111 – 122, 2002.
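
scikit-learn does not expose node depths directly; a possible helper (a sketch, not pymfe's internal routine) derives them by traversing children_left and children_right:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def node_depths(dt_model: DecisionTreeClassifier) -> np.ndarray:
        tree = dt_model.tree_
        depths = np.zeros(tree.node_count, dtype=int)
        stack = [(0, 0)]  # (node id, depth), starting at the root
        while stack:
            node, depth = stack.pop()
            depths[node] = depth
            if tree.children_left[node] != -1:  # internal node
                stack.append((tree.children_left[node], depth + 1))
                stack.append((tree.children_right[node], depth + 1))
        return depths

    # The branch sizes are then the depths restricted to the leaves:
    leaves_branch = node_depths(dt_model)[dt_model.tree_.children_left == -1]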

classmethod ft_leaves_corrob(dt_model: DecisionTreeClassifier, leaf_nodes: Optional[ndarray] = None, dt_info_table: Optional[ndarray] = None) → ndarray

Compute the leaves corroboration of the DT model.

The leaves corroboration is the proportion of examples that belong to each leaf of the DT model.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

leaf_nodes : np.ndarray, optional

Boolean array with value True if a node is a leaf. Argument used to take advantage of precomputations.

dt_info_table : np.ndarray, optional

DT model properties table calculated with the extract_table method. Check its documentation for more information. Argument used to take advantage of precomputations.

Returns
np.ndarray

Leaves corroboration for every leaf node.

References

1

Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.

classmethod ft_leaves_homo(dt_model: DecisionTreeClassifier, tree_shape: Optional[ndarray] = None, leaf_nodes: Optional[ndarray] = None, dt_node_depths: Optional[ndarray] = None) → ndarray

Compute the DT model Homogeneity for every leaf node.

The DT model homogeneity is calculated as the number of leaves divided by the structural shape of the DT model (as computed by the ft_tree_shape method).

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

tree_shape : np.ndarray, optional

Tree shape as calculated in ft_tree_shape. Argument used to take advantage of precomputations.

leaf_nodes : np.ndarray, optional

Boolean array with value True if a node is a leaf. Used only if tree_shape is None. Argument used to take advantage of precomputations.

dt_node_depths : np.ndarray, optional

Depth of every DT model node. Used only if tree_shape is None. Argument used to take advantage of precomputations.

Returns
np.ndarray

The DT model homogeneity for every leaf node.

References

1

Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.
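
Following the description above, the computation reduces to an elementwise division (a sketch assuming n_leaves as returned by ft_leaves and tree_shape as returned by ft_tree_shape):

    # One homogeneity value per leaf node.
    leaves_homo = n_leaves / tree_shape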

classmethod ft_leaves_per_class(dt_model: DecisionTreeClassifier, dt_info_table: Optional[ndarray] = None) → ndarray

Compute the proportion of leaves per class in the DT model.

This quantity is the proportion of leaves of the DT model associated with each class.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

dt_info_table : np.ndarray, optional

DT model properties table calculated with the extract_table method. Check its documentation for more information. Argument used to take advantage of precomputations.

Returns
np.ndarray

Leaves per class.

References

1

Andrey Filchenkov and Arseniy Pendryak. Datasets meta-feature description for recommending feature selection algorithm. In Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT), pages 11 – 18, 2015.
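
Using the properties table described in extract_table, where column 2 is 0 for internal nodes and holds the class number for leaves, a sketch of the computation:

    import numpy as np

    leaf_classes = dt_info_table[dt_info_table[:, 2] > 0, 2]
    _, counts = np.unique(leaf_classes, return_counts=True)
    leaves_per_class = counts / leaf_classes.size  # proportion per class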

classmethod ft_nodes(dt_model: DecisionTreeClassifier) → int

Compute the number of non-leaf nodes in the DT model.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

Returns
int

Number of non-leaf nodes.

References

1

Yonghong Peng, P. A. Flach, Pavel Brazdil, and Carlos Soares. Decision tree-based data characterization for meta-learning. In 2nd ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 111 – 122, 2002.

classmethod ft_nodes_per_attr(dt_model: DecisionTreeClassifier) → float

Compute the ratio of nodes per number of attributes in the DT model.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

Returns
float

Ratio of the number of non-leaf nodes per number of attributes.

References

1

Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.

classmethod ft_nodes_per_inst(dt_model: DecisionTreeClassifier) → float

Compute the ratio of non-leaf nodes per number of instances in the DT model.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

Returns
float

Ratio of the number of non-leaf nodes per number of instances.

References

1

Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.

classmethod ft_nodes_per_level(dt_model: DecisionTreeClassifier, dt_node_depths: Optional[ndarray] = None, non_leaf_nodes: Optional[ndarray] = None) → ndarray

Compute the ratio of the number of nodes per tree level in the DT model.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

dt_node_depths : np.ndarray, optional

Depth of every DT model node. Argument used to take advantage of precomputations.

non_leaf_nodes : np.ndarray, optional

Boolean array with value True if a node is non-leaf. Argument used to take advantage of precomputations.

Returns
np.ndarray

Number of nodes per level.

References

1

Yonghong Peng, P. A. Flach, Pavel Brazdil, and Carlos Soares. Decision tree-based data characterization for meta-learning. In 2nd ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 111 – 122, 2002.
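
Given precomputed node depths and a non-leaf mask, the per-level counts follow from a bincount (a sketch; pymfe's exact normalization may differ):

    import numpy as np

    # dt_node_depths: depth of every node; non_leaf_nodes: boolean mask.
    nodes_per_level = np.bincount(dt_node_depths[non_leaf_nodes])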

classmethod ft_nodes_repeated(dt_model: DecisionTreeClassifier, dt_info_table: Optional[ndarray] = None, non_leaf_nodes: Optional[ndarray] = None) → ndarray

Compute the number of repeated nodes in the DT model.

The number of repeated nodes is the number of repeated attributes that appear in the DT model (i.e., attributes used as the split attribute in more than one non-leaf node).

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

dt_info_table : np.ndarray, optional

DT model properties table calculated with the extract_table method. Check its documentation for more information. Argument used to take advantage of precomputations.

non_leaf_nodes : np.ndarray, optional

Boolean array with value True if a node is non-leaf. Argument used to take advantage of precomputations.

Returns
np.ndarray

Absolute frequency of each repeated node.

References

1

Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.

classmethod ft_tree_depth(dt_model: DecisionTreeClassifier, dt_node_depths: Optional[ndarray] = None) → ndarray

Compute the depth of every node in the DT model.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

dt_node_depths : np.ndarray, optional

Depth of each node in the DT model. Argument used to take advantage of precomputations.

Returns
np.ndarray

Depth of every node in the DT model.

References

1

Yonghong Peng, P. A. Flach, Pavel Brazdil, and Carlos Soares. Decision tree-based data characterization for meta-learning. In 2nd ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 111 – 122, 2002.

classmethod ft_tree_imbalance(dt_model: DecisionTreeClassifier, leaf_nodes: Optional[ndarray] = None, dt_node_depths: Optional[ndarray] = None) → ndarray

Compute the tree imbalance for each leaf node.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

leaf_nodes : np.ndarray, optional

Boolean array with value True if a node is a leaf. Argument used to take advantage of precomputations.

dt_node_depths : np.ndarray, optional

Depth of every DT model node. Argument used to take advantage of precomputations.

Returns
np.ndarray

Tree imbalance values for every leaf node.

References

1

Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.

classmethod ft_tree_shape(dt_model: DecisionTreeClassifier, tree_shape: Optional[ndarray] = None, leaf_nodes: Optional[ndarray] = None, dt_node_depths: Optional[ndarray] = None) → ndarray

Compute the tree shape for every leaf node.

The tree shape is the probability of arriving at each leaf given a random walk. We call this the structural shape of the DT model.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

tree_shape : np.ndarray, optional

This method's output. Argument used to take advantage of precomputations.

leaf_nodes : np.ndarray, optional

Boolean array with value True if a node is a leaf. Used only if tree_shape is None. Argument used to take advantage of precomputations.

dt_node_depths : np.ndarray, optional

Depth of every DT model node. Used only if tree_shape is None. Argument used to take advantage of precomputations.

Returns
np.ndarray

The tree shape for every leaf node.

References

1

Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.
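
One reading of the description above, as a sketch: if the random walk picks one of the two children uniformly at each split, a leaf at depth d is reached with probability 2^-d (pymfe may apply a further transformation to these probabilities):

    # dt_node_depths, leaf_nodes: precomputed arrays as described above.
    leaf_depths = dt_node_depths[leaf_nodes]
    tree_shape = 2.0 ** -leaf_depths  # arrival probability per leaf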

classmethod ft_var_importance(dt_model: DecisionTreeClassifier) → ndarray

Compute the feature importance of the DT model for each attribute.

It is calculated using the Gini index to estimate the amount of information used in the DT model.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

Returns
np.ndarray

Features importance given by the DT model.

References

1

Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.
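
scikit-learn exposes this Gini-based importance directly on a fitted estimator:

    # One importance value per attribute (Gini importance).
    importances = dt_model.feature_importances_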

classmethod precompute_model_based_class(N: ndarray, y: Optional[ndarray] = None, dt_model: Optional[DecisionTreeClassifier] = None, random_state: Optional[int] = None, hypparam_model_dt: Optional[Dict[str, Any]] = None, **kwargs) → Dict[str, Any]

Precompute the DT Model and some information related to it.

Parameters
N : np.ndarray

Numerical fitted data.

y : np.ndarray, optional

Target attribute.

dt_model : sklearn.tree.DecisionTreeClassifier, optional

Decision tree fitted with the given data N and y. This argument can be used to provide a previously fitted custom model.

random_state : int, optional

If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used only if argument dt_model is None.

hypparam_model_dt : dict, optional

Extra hyperparameters passed to the DT model during its creation. Used only if argument dt_model is None.

kwargs:

Additional arguments. May contain values previously precomputed by other precomputation methods, which can help speed up this precomputation.

Returns
dict
With the following precomputed items:
  • dt_model (sklearn.tree.DecisionTreeClassifier): decision tree classifier either fitted with the given data N and y (if dt_model is None), or the given dt_model.

  • dt_info_table (np.ndarray): some tree properties table.

  • dt_node_depths (np.ndarray): the depth of each tree node, ordered by node index (e.g., index 1 holds the depth of node 1, index 2 the depth of node 2, and so on).
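
A sketch of the fitting step under the defaults described above (node_info_table and node_depths refer to the helper sketches earlier on this page):

    from sklearn.tree import DecisionTreeClassifier

    if dt_model is None:
        dt_model = DecisionTreeClassifier(
            random_state=random_state, **(hypparam_model_dt or {})
        ).fit(N, y)

    precomp = {
        "dt_model": dt_model,
        "dt_info_table": node_info_table(dt_model),
        "dt_node_depths": node_depths(dt_model),
    }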