pymfe.model_based.MFEModelBased

class pymfe.model_based.MFEModelBased

Keeps methods for metafeatures of the model-based group.

The convention adopted for metafeature extraction related methods is to always start with the ft_ prefix, to allow automatic method detection. This prefix is predefined within the _internal module.

All method signatures follow the conventions and restrictions listed below:

  1. For independent attribute data, X means every type of attribute, N means Numeric attributes only, and C stands for Categorical attributes only. Note that the categorical attribute sets in X and C, and the numerical attribute sets in X and N, may differ due to data transformations performed while fitting data into the MFE model, enabled respectively by the transform_num and transform_cat arguments of fit (an MFE method).

  2. Only arguments in the MFE _custom_args_ft attribute (set up inside the fit method) are allowed to be required method arguments. All other arguments must be strictly optional (i.e., have a predefined default value).

  3. The initial assumption is that the user can change any optional argument, without any prior verification of the argument value or its type, via the kwargs argument of the extract method of the MFE class.

  4. The return value of all feature extraction methods should be a single value or a generic list type (preferably an np.ndarray) with numeric values.

There is another type of method adopted for automatic detection: methods prefixed with precompute_. These methods run automatically while fitting data into an MFE model, and their objective is to precompute some common value shared by more than one feature extraction method. This strategy trades higher memory consumption for faster feature extraction. Their return value must always be a dictionary whose keys are possible extra arguments for both feature extraction methods and other precomputation methods. Note that precomputed values are shared between all valid feature-extraction modules (e.g., class_freqs computed in the statistical module can freely be used by any precomputation or feature extraction method of the landmarking module).
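
For illustration, a minimal extraction sketch for this group (assuming the public pymfe API and a scikit-learn toy dataset):

    from sklearn.datasets import load_iris
    from pymfe.mfe import MFE

    data = load_iris()

    # Restrict extraction to the model-based group; the precompute_*
    # methods run during fit() and cache the internal decision tree.
    mfe = MFE(groups=["model-based"])
    mfe.fit(data.data, data.target)
    names, values = mfe.extract()
    for name, value in zip(names, values):
        print(name, value)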

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

extract_table(dt_model[, leaf_nodes])

Bookkeep some information from the dt_model into a table (array).

ft_leaves(dt_model)

Compute the number of leaf nodes in the DT model.

ft_leaves_branch(dt_model[, leaf_nodes, ...])

Compute the size of branches in the DT model.

ft_leaves_corrob(dt_model[, leaf_nodes, ...])

Compute the leaves corroboration of the DT model.

ft_leaves_homo(dt_model[, tree_shape, ...])

Compute the DT model Homogeneity for every leaf node.

ft_leaves_per_class(dt_model[, dt_info_table])

Compute the proportion of leaves per class in the DT model.

ft_nodes(dt_model)

Compute the number of non-leaf nodes in the DT model.

ft_nodes_per_attr(dt_model)

Compute the ratio of nodes per number of attributes in the DT model.

ft_nodes_per_inst(dt_model)

Compute the ratio of non-leaf nodes per number of instances in the DT model.

ft_nodes_per_level(dt_model[, ...])

Compute the ratio of the number of nodes per tree level in the DT model.

ft_nodes_repeated(dt_model[, dt_info_table, ...])

Compute the number of repeated nodes in the DT model.

ft_tree_depth(dt_model[, dt_node_depths])

Compute the depth of every node in the DT model.

ft_tree_imbalance(dt_model[, leaf_nodes, ...])

Compute the tree imbalance for each leaf node.

ft_tree_shape(dt_model[, tree_shape, ...])

Compute the tree shape for every leaf node.

ft_var_importance(dt_model)

Compute the feature importance of the DT model for each attribute.

precompute_model_based_class(N[, y, ...])

Precompute the DT Model and some information related to it.

classmethod extract_table(dt_model: DecisionTreeClassifier, leaf_nodes: Optional[ndarray] = None) → ndarray

Bookkeep some information from the dt_model into a table (array).

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

leaf_nodes : np.ndarray, optional

Boolean array with value True if a node is a leaf. If given, can improve performance.

Returns
np.ndarray

Table of DT model node properties. See the Notes section below for the column layout.

Notes

Each line in the returned array represents a node, where:
  • Column 0: the id of the attribute split at that node.

  • Column 1: the number of examples that fall into that node.

  • Column 2: 0 if the node is not a leaf; otherwise, the class number represented by that leaf node.
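
The layout above can be reproduced from scikit-learn's tree internals. A sketch (not pymfe's internal code; encoding the leaf class as the majority class index plus one is an assumption made here so that 0 can flag internal nodes):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def node_info_table(dt_model: DecisionTreeClassifier) -> np.ndarray:
        tree = dt_model.tree_
        table = np.zeros((tree.node_count, 3), dtype=int)
        leaf = tree.children_left == -1                # leaves have no children
        table[:, 0] = np.where(leaf, 0, tree.feature)  # split attribute id
        table[:, 1] = tree.n_node_samples              # examples per node
        # Majority class (plus one) for leaves; 0 for internal nodes.
        table[leaf, 2] = tree.value[leaf].argmax(axis=-1).ravel() + 1
        return table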

classmethod ft_leaves(dt_model: DecisionTreeClassifier) → int

Compute the number of leaf nodes in the DT model.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

Returns
int

Number of leaf nodes in the DT model.

References

1

Yonghong Peng, P. A. Flach, Pavel Brazdil, and Carlos Soares. Decision tree-based data characterization for meta-learning. In 2nd ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 111 – 122, 2002.
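
For a fitted scikit-learn tree, this count is available directly:

    # Assuming a fitted sklearn.tree.DecisionTreeClassifier.
    n_leaves = dt_model.get_n_leaves()
    # Equivalently, via the tree internals (leaves have no children):
    n_leaves = int((dt_model.tree_.children_left == -1).sum())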

classmethod ft_leaves_branch(dt_model: DecisionTreeClassifier, leaf_nodes: Optional[ndarray] = None, dt_node_depths: Optional[ndarray] = None) → ndarray

Compute the size of branches in the DT model.

The size of branches consists of the depths of all leaves of the DT model.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

leaf_nodes : np.ndarray, optional

Boolean array with value True if a node is a leaf. Argument used to take advantage of precomputations.

dt_node_depths : np.ndarray, optional

Depth of every DT model node. Argument used to take advantage of precomputations.

Returns
np.ndarray

Size of branches of the DT model.

References

1

Yonghong Peng, P. A. Flach, Pavel Brazdil, and Carlos Soares. Decision tree-based data characterization for meta-learning. In 2nd ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 111 – 122, 2002.
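
scikit-learn does not expose node depths directly; a possible helper (a sketch, not pymfe's internal routine) derives them by traversing children_left and children_right:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def node_depths(dt_model: DecisionTreeClassifier) -> np.ndarray:
        tree = dt_model.tree_
        depths = np.zeros(tree.node_count, dtype=int)
        stack = [(0, 0)]  # (node id, depth), starting at the root
        while stack:
            node, depth = stack.pop()
            depths[node] = depth
            if tree.children_left[node] != -1:  # internal node
                stack.append((tree.children_left[node], depth + 1))
                stack.append((tree.children_right[node], depth + 1))
        return depths

    # The branch sizes are then the depths restricted to the leaves:
    leaves_branch = node_depths(dt_model)[dt_model.tree_.children_left == -1]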

classmethod ft_leaves_corrob(dt_model: DecisionTreeClassifier, leaf_nodes: Optional[ndarray] = None, dt_info_table: Optional[ndarray] = None) → ndarray

Compute the leaves corroboration of the DT model.

The leaves corroboration is the proportion of examples that belong to each leaf of the DT model.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

leaf_nodes : np.ndarray, optional

Boolean array with value True if a node is a leaf. Argument used to take advantage of precomputations.

dt_info_table : np.ndarray, optional

DT model properties table calculated with the extract_table method. Check its documentation for more information. Argument used to take advantage of precomputations.

Returns
np.ndarray

Leaves corroboration for every leaf node.

References

1

Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.

classmethod ft_leaves_homo(dt_model: DecisionTreeClassifier, tree_shape: Optional[ndarray] = None, leaf_nodes: Optional[ndarray] = None, dt_node_depths: Optional[ndarray] = None) → ndarray

Compute the DT model Homogeneity for every leaf node.

The DT model homogeneity is calculated as the number of leaves divided by the structural shape of the DT model (as computed by the ft_tree_shape method).

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

tree_shape : np.ndarray, optional

Tree shape as calculated in ft_tree_shape. Argument used to take advantage of precomputations.

leaf_nodes : np.ndarray, optional

Boolean array with value True if a node is a leaf. Used only if tree_shape is None. Argument used to take advantage of precomputations.

dt_node_depths : np.ndarray, optional

Depth of every DT model node. Used only if tree_shape is None. Argument used to take advantage of precomputations.

Returns
np.ndarray

The DT model homogeneity for every leaf node.

References

1

Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.
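
Following the description above, the computation reduces to an elementwise division (a sketch assuming n_leaves as returned by ft_leaves and tree_shape as returned by ft_tree_shape):

    # One homogeneity value per leaf node.
    leaves_homo = n_leaves / tree_shape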

classmethod ft_leaves_per_class(dt_model: DecisionTreeClassifier, dt_info_table: Optional[ndarray] = None) → ndarray

Compute the proportion of leaves per class in the DT model.

This quantity is the proportion of leaves of the DT model associated with each class.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

dt_info_table : np.ndarray, optional

DT model properties table calculated with the extract_table method. Check its documentation for more information. Argument used to take advantage of precomputations.

Returns
np.ndarray

Leaves per class.

References

1

Andrey Filchenkov and Arseniy Pendryak. Datasets meta-feature description for recommending feature selection algorithm. In Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT), pages 11 – 18, 2015.
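
Using the properties table described in extract_table, where column 2 is 0 for internal nodes and holds the class number for leaves, a sketch of the computation:

    import numpy as np

    leaf_classes = dt_info_table[dt_info_table[:, 2] > 0, 2]
    _, counts = np.unique(leaf_classes, return_counts=True)
    leaves_per_class = counts / leaf_classes.size  # proportion per class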

classmethod ft_nodes(dt_model: DecisionTreeClassifier) → int

Compute the number of non-leaf nodes in the DT model.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

Returns
int

Number of non-leaf nodes.

References

1

Yonghong Peng, P. A. Flach, Pavel Brazdil, and Carlos Soares. Decision tree-based data characterization for meta-learning. In 2nd ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 111 – 122, 2002.

classmethod ft_nodes_per_attr(dt_model: DecisionTreeClassifier) → float

Compute the ratio of nodes per number of attributes in the DT model.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

Returns
float

Ratio of the number of non-leaf nodes per number of attributes.

References

1

Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.

classmethod ft_nodes_per_inst(dt_model: DecisionTreeClassifier) → float

Compute the ratio of non-leaf nodes per number of instances in the DT model.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

Returns
float

Ratio of the number of non-leaf nodes per number of instances.

References

1

Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.

classmethod ft_nodes_per_level(dt_model: DecisionTreeClassifier, dt_node_depths: Optional[ndarray] = None, non_leaf_nodes: Optional[ndarray] = None) → ndarray

Compute the ratio of the number of nodes per tree level in the DT model.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

dt_node_depths : np.ndarray, optional

Depth of every DT model node. Argument used to take advantage of precomputations.

non_leaf_nodes : np.ndarray, optional

Boolean array with value True if a node is non-leaf. Argument used to take advantage of precomputations.

Returns
np.ndarray

Number of nodes per level.

References

1

Yonghong Peng, P. A. Flach, Pavel Brazdil, and Carlos Soares. Decision tree-based data characterization for meta-learning. In 2nd ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 111 – 122, 2002.
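
Given precomputed node depths and a non-leaf mask, the per-level counts follow from a bincount (a sketch; pymfe's exact normalization may differ):

    import numpy as np

    # dt_node_depths: depth of every node; non_leaf_nodes: boolean mask.
    nodes_per_level = np.bincount(dt_node_depths[non_leaf_nodes])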

classmethod ft_nodes_repeated(dt_model: DecisionTreeClassifier, dt_info_table: Optional[ndarray] = None, non_leaf_nodes: Optional[ndarray] = None) → ndarray

Compute the number of repeated nodes in the DT model.

The number of repeated nodes is the number of repeated attributes that appear in the DT model (i.e., attributes used as the split attribute in more than one non-leaf node).

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

dt_info_table : np.ndarray, optional

DT model properties table calculated with the extract_table method. Check its documentation for more information. Argument used to take advantage of precomputations.

non_leaf_nodes : np.ndarray, optional

Boolean array with value True if a node is non-leaf. Argument used to take advantage of precomputations.

Returns
np.ndarray

Absolute frequency of each repeated node.

References

1

Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.

classmethod ft_tree_depth(dt_model: DecisionTreeClassifier, dt_node_depths: Optional[ndarray] = None) → ndarray

Compute the depth of every node in the DT model.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

dt_node_depths : np.ndarray, optional

Depth of each node in the DT model. Argument used to take advantage of precomputations.

Returns
np.ndarray

Depth of every node in the DT model.

References

1

Yonghong Peng, P. A. Flach, Pavel Brazdil, and Carlos Soares. Decision tree-based data characterization for meta-learning. In 2nd ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 111 – 122, 2002.

classmethod ft_tree_imbalance(dt_model: DecisionTreeClassifier, leaf_nodes: Optional[ndarray] = None, dt_node_depths: Optional[ndarray] = None) → ndarray

Compute the tree imbalance for each leaf node.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

leaf_nodes : np.ndarray, optional

Boolean array with value True if a node is a leaf. Argument used to take advantage of precomputations.

dt_node_depths : np.ndarray, optional

Depth of every DT model node. Argument used to take advantage of precomputations.

Returns
np.ndarray

Tree imbalance values for every leaf node.

References

1

Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.

classmethod ft_tree_shape(dt_model: DecisionTreeClassifier, tree_shape: Optional[ndarray] = None, leaf_nodes: Optional[ndarray] = None, dt_node_depths: Optional[ndarray] = None) → ndarray

Compute the tree shape for every leaf node.

The tree shape is the probability of arriving at each leaf given a random walk. We call this the structural shape of the DT model.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

tree_shape : np.ndarray, optional

This method's output. Argument used to take advantage of precomputations.

leaf_nodes : np.ndarray, optional

Boolean array with value True if a node is a leaf. Used only if tree_shape is None. Argument used to take advantage of precomputations.

dt_node_depths : np.ndarray, optional

Depth of every DT model node. Used only if tree_shape is None. Argument used to take advantage of precomputations.

Returns
np.ndarray

The tree shape for every leaf node.

References

1

Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.
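
One reading of the description above, as a sketch: if the random walk picks one of the two children uniformly at each split, a leaf at depth d is reached with probability 2^-d (pymfe may apply a further transformation to these probabilities):

    # dt_node_depths, leaf_nodes: precomputed arrays as described above.
    leaf_depths = dt_node_depths[leaf_nodes]
    tree_shape = 2.0 ** -leaf_depths  # arrival probability per leaf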

classmethod ft_var_importance(dt_model: DecisionTreeClassifier) → ndarray

Compute the feature importance of the DT model for each attribute.

It is calculated using the Gini index to estimate the amount of information used in the DT model.

Parameters
dt_model : sklearn.tree.DecisionTreeClassifier

The DT model.

Returns
np.ndarray

Features importance given by the DT model.

References

1

Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33 – 42, 2000.
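
scikit-learn exposes this Gini-based importance directly on a fitted estimator:

    # One importance value per attribute (Gini importance).
    importances = dt_model.feature_importances_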

classmethod precompute_model_based_class(N: ndarray, y: Optional[ndarray] = None, dt_model: Optional[DecisionTreeClassifier] = None, random_state: Optional[int] = None, hypparam_model_dt: Optional[Dict[str, Any]] = None, **kwargs) → Dict[str, Any]

Precompute the DT Model and some information related to it.

Parameters
N : np.ndarray

Numerical fitted data.

y : np.ndarray, optional

Target attribute.

dt_model : sklearn.tree.DecisionTreeClassifier, optional

Decision tree fitted with the given data N and y. This argument can be used to provide a previously fitted custom model.

random_state : int, optional

If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used only if argument dt_model is None.

hypparam_model_dt : dict, optional

Extra hyperparameters passed to the DT model during its creation. Used only if argument dt_model is None.

kwargs:

Additional arguments. May contain values previously precomputed by other precomputation methods, which can help speed up this precomputation.

Returns
dict
With the following precomputed items:
  • dt_model (sklearn.tree.DecisionTreeClassifier): decision tree classifier either fitted with the given data N and y (if dt_model is None), or the given dt_model.

  • dt_info_table (np.ndarray): some tree properties table.

  • dt_node_depths (np.ndarray): the depth of each tree node, ordered by node index (e.g., index 1 holds the depth of node 1, index 2 the depth of node 2, and so on).
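
A sketch of the fitting step under the defaults described above (node_info_table and node_depths refer to the helper sketches earlier on this page):

    from sklearn.tree import DecisionTreeClassifier

    if dt_model is None:
        dt_model = DecisionTreeClassifier(
            random_state=random_state, **(hypparam_model_dt or {})
        ).fit(N, y)

    precomp = {
        "dt_model": dt_model,
        "dt_info_table": node_info_table(dt_model),
        "dt_node_depths": node_depths(dt_model),
    }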