pymfe.model_based.MFEModelBased
- class pymfe.model_based.MFEModelBased[source]
Keep methods for metafeatures of the model-based group.
The convention adopted for metafeature extraction related methods is to always start with the ft_ prefix to allow automatic method detection. This prefix is predefined within the _internal module.
All method signatures follow the conventions and restrictions listed below:
For independent attribute data, X means every type of attribute, N means numeric attributes only, and C stands for categorical attributes only. It is important to note that the categorical attribute sets between X and C, and the numerical attribute sets between X and N, may differ due to data transformations performed while fitting data into the MFE model, enabled respectively by the transform_num and transform_cat arguments of fit (MFE method).
Only arguments in the MFE _custom_args_ft attribute (set up inside the fit method) are allowed to be required method arguments. All other arguments must be strictly optional (i.e., have a predefined default value).
The initial assumption is that the user can change any optional argument, without any previous verification of argument value or its type, via the kwargs argument of the extract method of the MFE class.
The return value of all feature extraction methods should be a single value or a generic list (preferably a np.ndarray) with numeric values.
There is another type of method adopted for automatic detection: methods with the precompute_ prefix. These methods run automatically while fitting data into an MFE model, and their objective is to precompute some common value shared between more than one feature extraction method. This strategy is a trade-off between higher memory consumption and faster feature extraction. Their return value must always be a dictionary whose keys are possible extra arguments for both feature extraction methods and other precomputation methods. Note that precomputed values are shared between all valid feature-extraction modules (e.g., class_freqs computed in the statistical module can freely be used by any precomputation or feature extraction method of the landmarking module).
- __init__(*args, **kwargs)
Methods
__init__(*args, **kwargs)
extract_table(dt_model[, leaf_nodes]): Bookkeep some information table from the dt_model into an array.
ft_leaves(dt_model): Compute the number of leaf nodes in the DT model.
ft_leaves_branch(dt_model[, leaf_nodes, ...]): Compute the size of branches in the DT model.
ft_leaves_corrob(dt_model[, leaf_nodes, ...]): Compute the leaves corroboration of the DT model.
ft_leaves_homo(dt_model[, tree_shape, ...]): Compute the DT model homogeneity for every leaf node.
ft_leaves_per_class(dt_model[, dt_info_table]): Compute the proportion of leaves per class in the DT model.
ft_nodes(dt_model): Compute the number of non-leaf nodes in the DT model.
ft_nodes_per_attr(dt_model): Compute the ratio of nodes per number of attributes in the DT model.
ft_nodes_per_inst(dt_model): Compute the ratio of non-leaf nodes per number of instances in the DT model.
ft_nodes_per_level(dt_model[, ...]): Compute the ratio of the number of nodes per tree level in the DT model.
ft_nodes_repeated(dt_model[, dt_info_table, ...]): Compute the number of repeated nodes in the DT model.
ft_tree_depth(dt_model[, dt_node_depths]): Compute the depth of every node in the DT model.
ft_tree_imbalance(dt_model[, leaf_nodes, ...]): Compute the tree imbalance for each leaf node.
ft_tree_shape(dt_model[, tree_shape, ...]): Compute the tree shape for every leaf node.
ft_var_importance(dt_model): Compute the feature importance of the DT model for each attribute.
precompute_model_based_class(N[, y, ...]): Precompute the DT model and some information related to it.
- classmethod extract_table(dt_model: DecisionTreeClassifier, leaf_nodes: Optional[ndarray] = None) ndarray [source]
Bookkeep some information table from the dt_model into an array.
- Parameters
- dt_model
sklearn.tree.DecisionTreeClassifier
The DT model.
- leaf_nodes
np.ndarray, optional
Boolean array with value True if a node is a leaf. If given, can improve performance.
- Returns
np.ndarray
DT model properties table. See Notes for the column layout.
Notes
- Each row in the returned array represents a node, where:
Column 0: the id of the attribute split on at that node.
Column 1: the number of examples that fall on that node.
Column 2: 0 if the node is not a leaf; otherwise, the class number represented by that leaf node.
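A minimal sketch of a table with this layout, built from plain sklearn attributes (an illustration only, not pymfe's actual implementation; note that with 0-based class indices a class-0 leaf is indistinguishable from an internal node in column 2, and sklearn stores -2 in the feature array for leaves):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
dt_model = DecisionTreeClassifier(random_state=0).fit(X, y)
tree = dt_model.tree_

leaf_nodes = tree.children_left == -1

# One row per node: [split attribute id, examples in node, leaf class].
table = np.zeros((tree.node_count, 3), dtype=int)
table[:, 0] = tree.feature        # id of the split attribute (-2 for leaves)
table[:, 1] = tree.n_node_samples # examples that fall on each node
# Majority class per leaf; argmax is unaffected by whether tree.value
# stores counts or class fractions (this changed across sklearn versions).
table[leaf_nodes, 2] = np.argmax(tree.value[leaf_nodes].squeeze(axis=1), axis=1)
```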
- classmethod ft_leaves(dt_model: DecisionTreeClassifier) int [source]
Compute the number of leaf nodes in the DT model.
- Parameters
- dt_model
sklearn.tree.DecisionTreeClassifier
The DT model.
- Returns
- int
Number of leaf nodes in the DT model.
References
- 1
Yonghong Peng, P. A. Flach, Pavel Brazdil, and Carlos Soares. Decision tree-based data characterization for meta-learning. In 2nd ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 111-122, 2002.
- classmethod ft_leaves_branch(dt_model: DecisionTreeClassifier, leaf_nodes: Optional[ndarray] = None, dt_node_depths: Optional[ndarray] = None) ndarray [source]
Compute the size of branches in the DT model.
The size of branches consists of the depths of all leaves of the DT model.
- Parameters
- dt_model
sklearn.tree.DecisionTreeClassifier
The DT model.
- leaf_nodes
np.ndarray, optional
Boolean array with value True if a node is a leaf. Argument used to take advantage of precomputations.
- dt_node_depths
np.ndarray, optional
Depth of every DT model node. Argument used to take advantage of precomputations.
- Returns
np.ndarray
Size of branches of the DT model.
References
- 1
Yonghong Peng, P. A. Flach, Pavel Brazdil, and Carlos Soares. Decision tree-based data characterization for meta-learning. In 2nd ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 111-122, 2002.
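The underlying quantity can be sketched with plain sklearn (an illustration only, not pymfe's code): compute the depth of every node by walking the children arrays from the root, then keep the depths of the leaves as the branch sizes:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
dt_model = DecisionTreeClassifier(random_state=0).fit(X, y)
tree = dt_model.tree_

# Compute the depth of every node with an explicit root-to-leaf walk.
dt_node_depths = np.zeros(tree.node_count, dtype=int)
stack = [(0, 0)]  # (node id, depth), starting at the root
while stack:
    node, depth = stack.pop()
    dt_node_depths[node] = depth
    if tree.children_left[node] != -1:  # internal node: visit both children
        stack.append((tree.children_left[node], depth + 1))
        stack.append((tree.children_right[node], depth + 1))

leaf_nodes = tree.children_left == -1
leaves_branch = dt_node_depths[leaf_nodes]  # size of each branch
```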
- classmethod ft_leaves_corrob(dt_model: DecisionTreeClassifier, leaf_nodes: Optional[ndarray] = None, dt_info_table: Optional[ndarray] = None) ndarray [source]
Compute the leaves corroboration of the DT model.
The leaves corroboration is the proportion of examples that belong to each leaf of the DT model.
- Parameters
- dt_model
sklearn.tree.DecisionTreeClassifier
The DT model.
- leaf_nodes
np.ndarray, optional
Boolean array with value True if a node is a leaf. Argument used to take advantage of precomputations.
- dt_info_table
np.ndarray, optional
DT model properties table calculated with the extract_table method. Check its documentation for more information. Argument used to take advantage of precomputations.
- Returns
np.ndarray
Leaves corroboration for every leaf node.
References
- 1
Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33-42, 2000.
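As a sketch of this proportion using plain sklearn (an illustration, not pymfe's implementation), the number of training examples reaching each node is available in n_node_samples, and the root holds all of them:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
dt_model = DecisionTreeClassifier(random_state=0).fit(X, y)
tree = dt_model.tree_

leaf_nodes = tree.children_left == -1
# Proportion of training examples that reach each leaf.
leaves_corrob = tree.n_node_samples[leaf_nodes] / tree.n_node_samples[0]
```

Since every example ends in exactly one leaf, these proportions sum to 1.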
- classmethod ft_leaves_homo(dt_model: DecisionTreeClassifier, tree_shape: Optional[ndarray] = None, leaf_nodes: Optional[ndarray] = None, dt_node_depths: Optional[ndarray] = None) ndarray [source]
Compute the DT model Homogeneity for every leaf node.
The DT model homogeneity is calculated as the number of leaves divided by the structural shape (which is calculated by the ft_tree_shape method) of the DT model.
- Parameters
- dt_model
sklearn.tree.DecisionTreeClassifier
The DT model.
- tree_shape
np.ndarray, optional
Tree shape as calculated in ft_tree_shape. Argument used to take advantage of precomputations.
- leaf_nodes
np.ndarray, optional
Boolean array with value True if a node is a leaf. Used only if tree_shape is None. Argument used to take advantage of precomputations.
- dt_node_depths
np.ndarray, optional
Depth of every DT model node. Used only if tree_shape is None. Argument used to take advantage of precomputations.
- Returns
np.ndarray
The DT model homogeneity for every leaf node.
References
- 1
Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33-42, 2000.
- classmethod ft_leaves_per_class(dt_model: DecisionTreeClassifier, dt_info_table: Optional[ndarray] = None) ndarray [source]
Compute the proportion of leaves per class in the DT model.
This quantity is the proportion of leaves of the DT model associated with each class.
- Parameters
- dt_model
sklearn.tree.DecisionTreeClassifier
The DT model.
- dt_info_table
np.ndarray, optional
DT model properties table calculated with the extract_table method. Check its documentation for more information. Argument used to take advantage of precomputations.
- Returns
np.ndarray
Leaves per class.
References
- 1
Andrey Filchenkov and Arseniy Pendryak. Datasets meta-feature description for recommending feature selection algorithm. In Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT), pages 11-18, 2015.
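A plain-sklearn sketch of this quantity (an illustration, not pymfe's code): take the majority class of each leaf, count leaves per class, and divide by the total number of leaves:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
dt_model = DecisionTreeClassifier(random_state=0).fit(X, y)
tree = dt_model.tree_

leaf_nodes = tree.children_left == -1
# Majority class of each leaf, then the fraction of leaves per class.
leaf_classes = np.argmax(tree.value[leaf_nodes].squeeze(axis=1), axis=1)
counts = np.bincount(leaf_classes, minlength=dt_model.n_classes_)
leaves_per_class = counts / leaf_nodes.sum()
```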
- classmethod ft_nodes(dt_model: DecisionTreeClassifier) int [source]
Compute the number of non-leaf nodes in the DT model.
- Parameters
- dt_model
sklearn.tree.DecisionTreeClassifier
The DT model.
- Returns
- int
Number of non-leaf nodes.
References
- 1
Yonghong Peng, P. A. Flach, Pavel Brazdil, and Carlos Soares. Decision tree-based data characterization for meta-learning. In 2nd ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 111-122, 2002.
- classmethod ft_nodes_per_attr(dt_model: DecisionTreeClassifier) float [source]
Compute the ratio of nodes per number of attributes in the DT model.
- Parameters
- dt_model
sklearn.tree.DecisionTreeClassifier
The DT model.
- Returns
- float
Ratio of the number of non-leaf nodes per number of attributes.
References
- 1
Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33-42, 2000.
- classmethod ft_nodes_per_inst(dt_model: DecisionTreeClassifier) float [source]
Compute the ratio of non-leaf nodes per number of instances in the DT model.
- Parameters
- dt_model
sklearn.tree.DecisionTreeClassifier
The DT model.
- Returns
- float
Ratio of the number of non-leaf nodes per number of instances.
References
- 1
Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33-42, 2000.
- classmethod ft_nodes_per_level(dt_model: DecisionTreeClassifier, dt_node_depths: Optional[ndarray] = None, non_leaf_nodes: Optional[ndarray] = None) ndarray [source]
Compute the ratio of the number of nodes per tree level in the DT model.
- Parameters
- dt_model
sklearn.tree.DecisionTreeClassifier
The DT model.
- dt_node_depths
np.ndarray, optional
Depth of every DT model node. Argument used to take advantage of precomputations.
- non_leaf_nodes
np.ndarray, optional
Boolean array with value True if a node is non-leaf. Argument used to take advantage of precomputations.
- Returns
np.ndarray
Number of nodes per level.
References
- 1
Yonghong Peng, P. A. Flach, Pavel Brazdil, and Carlos Soares. Decision tree-based data characterization for meta-learning. In 2nd ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 111-122, 2002.
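The per-level counts underlying this metafeature can be sketched with plain sklearn (an illustration only; whether pymfe further normalizes these counts into a ratio is not specified here). A three-class separable dataset gives a tree with internal nodes on two levels:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Three separable classes give a tree with two internal nodes.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 1, 1, 2, 2])
dt_model = DecisionTreeClassifier(random_state=0).fit(X, y)
tree = dt_model.tree_

# Depth of every node via a root-to-leaf walk over the children arrays.
dt_node_depths = np.zeros(tree.node_count, dtype=int)
stack = [(0, 0)]
while stack:
    node, depth = stack.pop()
    dt_node_depths[node] = depth
    if tree.children_left[node] != -1:
        stack.append((tree.children_left[node], depth + 1))
        stack.append((tree.children_right[node], depth + 1))

non_leaf_nodes = tree.children_left != -1
# Number of internal nodes found at each tree level (depth 0, 1, ...).
nodes_per_level = np.bincount(dt_node_depths[non_leaf_nodes])
```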
- classmethod ft_nodes_repeated(dt_model: DecisionTreeClassifier, dt_info_table: Optional[ndarray] = None, non_leaf_nodes: Optional[ndarray] = None) ndarray [source]
Compute the number of repeated nodes in the DT model.
The number of repeated nodes is the number of repeated attributes that appear in the DT model.
- Parameters
- dt_model
sklearn.tree.DecisionTreeClassifier
The DT model.
- dt_info_table
np.ndarray, optional
DT model properties table calculated with the extract_table method. Check its documentation for more information. Argument used to take advantage of precomputations.
- non_leaf_nodes
np.ndarray, optional
Boolean array with value True if a node is non-leaf. Argument used to take advantage of precomputations.
- Returns
np.ndarray
Absolute frequency of each repeated node.
References
- 1
Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33-42, 2000.
- classmethod ft_tree_depth(dt_model: DecisionTreeClassifier, dt_node_depths: Optional[ndarray] = None) ndarray [source]
Compute the depth of every node in the DT model.
- Parameters
- dt_model
sklearn.tree.DecisionTreeClassifier
The DT model.
- dt_node_depths
np.ndarray, optional
Depth of each node in the DT model. Argument used to take advantage of precomputations.
- Returns
np.ndarray
Depth of every node in the DT model.
References
- 1
Yonghong Peng, P. A. Flach, Pavel Brazdil, and Carlos Soares. Decision tree-based data characterization for meta-learning. In 2nd ECML/PKDD International Workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning (IDDM), pages 111-122, 2002.
- classmethod ft_tree_imbalance(dt_model: DecisionTreeClassifier, leaf_nodes: Optional[ndarray] = None, dt_node_depths: Optional[ndarray] = None) ndarray [source]
Compute the tree imbalance for each leaf node.
- Parameters
- dt_model
sklearn.tree.DecisionTreeClassifier
The DT model.
- leaf_nodes
np.ndarray, optional
Boolean array with value True if a node is a leaf. Argument used to take advantage of precomputations.
- dt_node_depths
np.ndarray, optional
Depth of every DT model node. Argument used to take advantage of precomputations.
- Returns
np.ndarray
Tree imbalance values for every leaf node.
References
- 1
Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33-42, 2000.
- classmethod ft_tree_shape(dt_model: DecisionTreeClassifier, tree_shape: Optional[ndarray] = None, leaf_nodes: Optional[ndarray] = None, dt_node_depths: Optional[ndarray] = None) ndarray [source]
Compute the tree shape for every leaf node.
The tree shape is the probability of arriving at each leaf given a random walk. We call this the structural shape of the DT model.
- Parameters
- dt_model
sklearn.tree.DecisionTreeClassifier
The DT model.
- tree_shape
np.ndarray, optional
This method's output. Argument used to take advantage of precomputations.
- leaf_nodes
np.ndarray, optional
Boolean array with value True if a node is a leaf. Used only if tree_shape is None. Argument used to take advantage of precomputations.
- dt_node_depths
np.ndarray, optional
Depth of every DT model node. Used only if tree_shape is None. Argument used to take advantage of precomputations.
- Returns
np.ndarray
The tree shape for every leaf node.
References
- 1
Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33-42, 2000.
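The per-leaf random-walk probability can be sketched with plain sklearn (an illustration only; pymfe's returned values may apply a further transformation to these probabilities). A random walk takes each branch with probability 1/2, so a leaf at depth d is reached with probability 2 ** -d:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
dt_model = DecisionTreeClassifier(random_state=0).fit(X, y)
tree = dt_model.tree_

# Depth of every node via a root-to-leaf walk over the children arrays.
dt_node_depths = np.zeros(tree.node_count, dtype=int)
stack = [(0, 0)]
while stack:
    node, depth = stack.pop()
    dt_node_depths[node] = depth
    if tree.children_left[node] != -1:
        stack.append((tree.children_left[node], depth + 1))
        stack.append((tree.children_right[node], depth + 1))

leaf_nodes = tree.children_left == -1
# Probability of reaching each leaf: one halving per level descended.
leaf_probs = np.power(2.0, -dt_node_depths[leaf_nodes].astype(float))
```

Because the per-branch probabilities halve at every split, the leaf probabilities always sum to 1.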
- classmethod ft_var_importance(dt_model: DecisionTreeClassifier) ndarray [source]
Compute the feature importance of the DT model for each attribute.
It is calculated using the Gini index to estimate the amount of information used by the DT model.
- Parameters
- dt_model
sklearn.tree.DecisionTreeClassifier
The DT model.
- Returns
np.ndarray
Features importance given by the DT model.
References
- 1
Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33-42, 2000.
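sklearn exposes the Gini-based importance directly on the fitted classifier, one normalized value per attribute, which is the quantity this metafeature is built on (a sketch of the underlying sklearn attribute, not pymfe's exact code path):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Two attributes, but only the first one is informative (the second is constant).
X = np.array([[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])
y = np.array([0, 0, 1, 1])
dt_model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Gini-based importance, normalized to sum to 1 over the attributes.
importances = dt_model.feature_importances_
```

Here the single split on attribute 0 accounts for all impurity reduction, so it receives the full importance and the constant attribute receives none.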
- classmethod precompute_model_based_class(N: ndarray, y: Optional[ndarray] = None, dt_model: Optional[DecisionTreeClassifier] = None, random_state: Optional[int] = None, hypparam_model_dt: Optional[Dict[str, Any]] = None, **kwargs) Dict[str, Any] [source]
Precompute the DT Model and some information related to it.
- Parameters
- N
np.ndarray
Numerical fitted data.
- y
np.ndarray, optional
Target attribute.
- dt_model
sklearn.tree.DecisionTreeClassifier, optional
Decision tree fitted with the given data N and y. This argument can be used to provide a previously fitted custom model.
- random_state
int, optional
If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used only if argument dt_model is None.
- kwargs
Additional arguments. May contain values precomputed before this method by other precomputation methods, which can help speed up this precomputation.
- Returns
dict
- With the following precomputed items:
dt_model (sklearn.tree.DecisionTreeClassifier): decision tree classifier, either fitted with the given data N and y (if dt_model is None) or the given dt_model.
dt_info_table (np.ndarray): some tree properties table.
dt_node_depths (np.ndarray): the depth of each tree node, ordered by node index (e.g., index one contains node one's depth, index two contains node two's depth, and so on).
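The precomputation contract described above can be illustrated with a hypothetical helper (precompute_sketch is not a pymfe function; it only mirrors the documented behavior: fit a tree unless one is given, and return a dict whose keys match extraction-method argument names):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def precompute_sketch(N, y, dt_model=None, random_state=None):
    """Hypothetical illustration of the precomputation contract:
    return a dict whose keys match extraction-method argument names."""
    if dt_model is None:
        dt_model = DecisionTreeClassifier(random_state=random_state).fit(N, y)
    tree = dt_model.tree_
    # Depth of each node, ordered by node index.
    dt_node_depths = np.zeros(tree.node_count, dtype=int)
    stack = [(0, 0)]
    while stack:
        node, depth = stack.pop()
        dt_node_depths[node] = depth
        if tree.children_left[node] != -1:
            stack.append((tree.children_left[node], depth + 1))
            stack.append((tree.children_right[node], depth + 1))
    return {"dt_model": dt_model, "dt_node_depths": dt_node_depths}

precomp = precompute_sketch(np.array([[0.0], [1.0], [2.0], [3.0]]),
                            np.array([0, 0, 1, 1]), random_state=0)
```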