Note
Click here to download the full example code
A developer sample class for Metafeature groups.
This class was built to give a model of how you should write a metafeature group class as a Pymfe developer. Please read this entire guide with attention before programming your own class.
- At the end of this reading, you will know:
How to register your class as a valid MFE metafeature class
Which are the special method name prefixes, and how to properly use each of them
Which are the rules involving precomputation, metafeature extraction and post-processing methods
Which are the coding practices usually adopted in this library, that you should follow in order to get your changes accepted in the master branch
Also, feel free to copy this file to use as template for your own class.
First, some tips and tricks which may help you follow the code standards stabilished in this library.
1. Use type annotations as much as possible
Always run mypy to check if the variable types was specified correctly. You can install it with pip using the following command line:
>>> pip install -U mypy
Use the following command before pushing your modifications to the remote repository:
>>> mypy pymfe --ignore-missing-imports
The expected output for this command is no output.
Note that all warnings must be fixed to your modifications be accepted in the master branch, so take your time to fix your variable types carefully.
2. Use pylint to check your code style and auto-formatters such as yapf
Pylint can be used to check if your code follow some coding practices adopted by the python community. You can install with with pip using the following command:
>>> pip install -U pylint
It can be harsh sometimes, so we have decided to disable some of the verifications. You can use the following command to check if your code met the standards stabilished in this library,
>>> pylint pymfe -d 'C0103, R0913, R0902, R0914, C0302, R0904, R0801, E1101'
The expected output is something like
>>> Your code has been rated at 10.00/10 (previous run: x/10, y)
Your code will not be accepted in the master branch unless it gets the maximum pylint score.
Yapf is a code auto-formatter which usually solves a large amount of coding style related issues automatically.
>>> pip install -U yapf
If you use the flag -i
, Yapf changes your code in-place.
>>> yapf -i yourModulename.py
3. Make all verifications with the provided Makefile
You can use the Makefile provided in the root directory to run mypy, pylint, and also pytest. Obviously, all tests (both for coding style and programming logic) must pass in order to your modifications be accepted.
You can use the tag test-cov
for run tests and get the code coverage:
>>> make test-cov
You can use the tag test
for only run the tests:
>>> make test
You can use the tag code-check
for make all verifications with mypy,
pylint and pep8:
>>> make code-check
Remember that your code must pass all verifications included in both
code-check
and test
/test-cov
to your changes be accepted in
the master branch.
Note
This example shows how to create a new group of meta-features. If you want only to add a new meta-feature, you should insert it in the meta-feature group file and create an “ft_” method to it. The new meta-feature will be automatically picked up (as the method “ft_metafeature_name” in this example). You should not forget to use the precompute methods to save time.
Note
You should not forget to create tests for all new functionalities that you implemented. All tests can be found in ./tests/ fold. Please follow the existing code style while creating your tests as much as possible.
Note
This class is being updated in GitHub, check this link to see the current version.
import typing as t
import time
import numpy as np
class MFEBoilerplate:
"""The class name must start with ``MFE`` (just to keep code consistency)
concatenated with the corresponding metafeature group name (e.g.,
``MFEStatistical`` or ``MFEGeneral``) in CamelCase format.
Also, the class must be registered in the ``_internal.py`` module to be
an official MFE class, because the pymfe framework is supposed to detect
the metafeature extraction methods automatically, so you must explain
where it is supposed to look for those methods.
Three tuples at module level in ``_internal.py`` module must be updated
to your new class be detected correctly:
1. VALID_GROUPS: str
Here you should write the name of your metafeature group. (e.g.,
``statistical`` or ``general``. This name is the value that will
be given by the user in the ``groups`` MFE parameter to extract
the all the metafeatures programmed here. Please select a
sufficiently representative name for your metafeature group.
2. GROUP_PREREQUISITES : str or :obj:`tuple` of str
Use this tuple to register other MFE metafeature group classes
as dependencies of your class. This means that, if the user ask
to extract the metafeatures of your class, then all metafeature
groups in the prerequisites will also be extracted also (even if
the user doesn't ask explicity for these groups). Note that the
possible consequences this may imply must be solved within this
class post-processing methods (these methods will be explained
later in this same guide.)
The values of this tuple can be strings (which means one single
dependency), sequences with strings (which means your class has
multiple dependencies), or simply None (which means your class
has no dependencies). Generally your class will not have any
dependency, so just stick to the last option if you are not sure
so far.
3. VALID_MFECLASSES : MFE Classes
In this tuple you should just insert a reference to your class.
Note that this imply that this module must be imported at the top
of the module ``_internal.py``.
These three tuples have one-to-one correspondence using the indexes,
so the order of values does matter. Please insert your class in the
same index for all three tuples.
===================================================================
For example, if we want to make this specific template class an official
MFE class, those three tuples should be updated as follows: (Remember that
all tuples below are found in ``_internal.py`` module.)
-------------------------------------------------------------------
# 1. First, choose carefully a metafeature group name. This value will be
# used directly by the user when extracting the metafeatures programmed
# in this class, so it must be meaningful and as short as possible.
VALID_GROUPS = (
...,
"boilerplate",
)
-------------------------------------------------------------------
# 2. Generally your class will not have any dependency, so you should
# just register ``None`` as prerequisites. Remember that a class can
# have any number of dependencies (0, 1 or more than 1.)
GROUP_PREREQUISITES = (
...,
None,
)
-------------------------------------------------------------------
# 3. The last step is to insert your class in this tuple below.
# Remember to import your module in the ``_internal.py`` module.
# So, for instance, to register this class, 'MFEBoilerplate', as
# an official MFE metafeature extractor class, we should make the
# following modifications:
import pymfe._dev as _dev
VALID_MFECLASSES = (
...,
_dev.MFEBoilerplate,
)
After this three simple steps, your class is now an official MFE
metafeature extraction class. From now on you no longer need to
worry about the ``_internal.py`` module and any other external
pymfe module.
===================================================================
Now that you know how to handle the issues related to the
``_internal.py`` module, let's start with the actual MFE class
development.
This tutorial is built to introduce all the different elements
following the natural order of how a regular MFE Class is usually
presented.
Therefore, the order that we shall see the different concepts in
this guide is:
1. Precomputation methods (prefixed with ``precompute_``)
Methods related to this subject:
1.1 precompute_basic_precomp_method
1.2 precompute_more_info
1.3 precompute_random_values
2. Feature extraction methods (prefixed with ``ft_``)
Methods related to this subject:
2.1 ft_metafeature_name
2.2 ft_fitted_data_arguments
2.3 ft_using_precomputed_values
2.4 ft_about_return_values
3. Regular/auxiliary methods (non-prefixed )
Methods related to this subject:
3.1 _protected_methods
3.2 non_protected_methods_without_any_prefixes
4. Postprocessing methods (prefixed with ``postprocess_``)
Methods related to this subject:
4.1 postprocess_groupName1_groupName2
So, we shall start looking at a example of a precomputation
method.
"""
# Important detail: all methods must be classmethods; there is no class
# instantiation in the pymfe framework.
@classmethod
def precompute_basic_precomp_method(cls,
y: t.Optional[np.ndarray] = None,
argument_bar: t.Optional[int] = None,
**kwargs) -> t.Dict[str, t.Any]:
"""A precomputation method example.
The pydoc of each method must explain cleary what is the purpose of
that method. This method is supposed to introduce a powerful concept
of the pymfe framework: precomputation methods.
1. Why precomputation methods?
-----------------------------------------------------------------
All methods whose name is prefixed with ``precompute_`` are
executed automatically before the metafeature extraction. These
methods are extremely important to improve the performance of
the Pymfe library, as it is quite common that different metafeature
extraction methods uses the very same information.
The idea behind this type of methods is to fill up a shared cache
with all values that can be shared by different metafeature extraction
methods, and also between different metafeature group classes. This
means that the values precomputed in ``MFEFoo`` class can also be used
in the ``MFEBar`` class methods.
2. Naming convention of a precomputation method
-----------------------------------------------------------------
The name of the method does not matter, as long as it starts with
the prefix ``precompute_``. This prefix is used to tell the Pymfe
framework that this is a precomputation method. As you will see during
this guide, the Pymfe rely heavily on prefixes in the method names, so
it is important that you don't forget them, and use them appropriately.
3. Arguments of a precomputation method
-----------------------------------------------------------------
The structure of these precomputation methods is pretty simple. In the
arguments you can specify custom parameters such as ``X`` and ``y``
that are automatically given by the MFE class. Those attributes can be
registered in a special attribute in the MFE class, or also given by
the user, but you should not rely on this feature; just stick to the
MFE registered arguments, and let all user-customizable attributes
have a default value. How exactly those arguments arrive as method
arguments is not important to develop an MFE metafeature extraction
class. If you're curious, you should examine the ``mfe.py`` and
``_internal.py`` modules by yourself, but it will take some time and
is not encouraged unless you plan an actual framework redesign.
It is obligatory to receive the ``kwargs`` in every precomputation
method. You are free to pick up values from it. We recommend you to
use the ``get`` method for this task. However, it is forbidden to
remove or modify the existing values in it. This parameter must be
considered ``read-only`` except to the insertion of new key-value
pairs. The reason behind this is that there's no guarantee of any
execution order of the precomputation methods within a class and
neither between classes, so all precomputation methods must have
the chance to read the same values.
4. Return values of precomputation methods
-----------------------------------------------------------------
All precomputation methods must return a dictionary with strings as
keys. The value data type can be anything. Note that the name of the
keys will be used later to match the argument names of feature
extraction methods. It means that, if you return a dictionary in the
form: {'foo': 1, 'bar': ('a', 'b')}, then all feature extraction
methods with an argument named ``foo`` will receive value ``1``, and
every method with argument named ``bar`` will receive a tuple with 'a'
and 'b' elements. Always choose meaningful key/argument names.
As this framework rely on a dictionary to distribute the parameters
between feature extraction methods, your precomputed keys should never
replace existing keys with different values, and you should not give
the same name to parameters with different semantics or purposes.
The rule of thumb for the pymfe lybrary is: 'if two things have the
same name, then they are the same thing'. Therefore, avoid extremely
generic argument names such as ``freqs``, ``mean``, ``model`` etc.
5. The user can disable precomputation methods
-----------------------------------------------------------------
Keep in mind that the user can disable the precomputation methods,
mainly due to memory constraints.
Never rely on these methods to produce any mandatory arguments. All
the precomputed values here should go to optional parameters and all
receptor metafeature extraction methods must be responsible to verify
if all values were effectively precomputed (i.e., they are not
``None``). If this is not the case, unfortunately these methods must
compute those arguments for themselves. If it is not clear how it
works for you by now, it will probably be easier to grasp when we
reach our first actual metafeature extraction method. For now, it is
just important to keep in mind that: you will need to recompute all
the stuff precomputed in every precomputations methods inside other
methods whenever those values are needed for the case when the user
disable the precomputation methods.
Parameters
----------
y : :obj:`np.ndarray`, optional
Always give clear and meaningful description to every argument.
argument_bar : int, optional
Some user-given attribute.
**kwargs:
Additional arguments. May have previously precomputed before
this method from other precomputed methods, so they can help
speed up this precomputation avoiding duplicated work.
Returns
-------
:obj:`dict`
The following precomputed items are returned:
* ``y_unique`` (:obj:`np.ndarray`): unique values from
``y``, if it is not None.
* ``absolute_bar`` (float): absolute value of
``argument_bar``, if it is not None.
"""
precomp_vals = {} # type: t.Dict[str, t.Any]
# Always consider that your precomputation argument could
# be precomputed by another precomputation method (even if
# from a different module), so check if the new key is not
# already in kwargs before calculating anything.
if argument_bar is not None and "absolute_bar" not in kwargs:
precomp_vals["absolute_bar"] = abs(argument_bar)
# The number of precomputed values within a single precomputation
# method vary greatly, from just a single value to a few amount.
# As long as all values are semantically sufficiently related with
# each other, you don't need to create new precomputation methods.
if y is not None and "y_unique" not in kwargs:
y_unique = np.unique(y, return_counts=False)
precomp_vals["y_unique"] = y_unique
# Always return a dictionary, even if it is empty
return precomp_vals
@classmethod
def precompute_more_info(cls,
argument_bar: t.Optional[int] = None,
**kwargs) -> t.Dict[str, t.Any]:
"""Highly relevant information about precomputation methods.
1. How many precomputation methods per class?
-----------------------------------------------------------------
Every MFE metafeature extraction class can have as many of
precomputation methods as needed. Don't hesitate to create
new precomputation methods whenever you think it will help
to improve the performance of the package.
2. How many precomputed values per precomputation method?
-----------------------------------------------------------------
There is no limit of how many values can be precomputed within
a single precomputation method.
However, try to keep every precomputation method precompute only
related values to avoid confusion. Prefer to calculate dissociated
values in distinct precomputation methods.
3. Using other precomputed values in a precomputation method
-----------------------------------------------------------------
Don't rely on the execution order of precomputation methods. Always
assume that the precomputation methods (even within the same class)
can be executed in any order. However, it does not mean that you
can't at least try to use previously precomputed methods: that's why
the 'kwargs' is used in all precomputation methods.
If needed, try to get a value from 'kwargs' using the 'get' method
(i.e., kwargs.get('argument_name') - remember 'kwargs' is just a
Python dictionary.) Then, check whether that value was successfully
gotten (i.e., is not None).
Parameters
----------
argument_bar : int, optional
Some user-given attribute. Note that it has the same value as
in the previous precomputation method, because it is the same
argument (it has the same name.)
**kwargs:
Additional arguments. May have previously precomputed before
this method from other precomputed methods, so they can help
speed up this precomputation avoiding duplicated work.
Returns
-------
:obj:`dict`
The following precomputed items are returned:
* ``double_absolute_bar`` (int): two times the
value of ``absolute_bar``, which may or may not
be precomputed in the previous precomputation
method. If it is not the case, we precompute
``absolute_bar`` here and also store its value.
* ``qux`` (float): value is equal to 1.0.
* ``quux`` (:obj:`complex`) Imaginary value related to
``qux``.
* ``quuz`` (:obj:`np.ndarray`): an sequence based
on ``qux``.
"""
precomp_vals = {} # type: t.Dict[str, t.Any]
if argument_bar is not None and "double_absolute_bar" not in kwargs:
# May have been precomputed from another precomputation method
absolute_bar = kwargs.get("absolute_bar")
# Wrong! 'absolute_bar' may be None
# precomp_vals["double_absolute_bar"] = 2 * absolute_bar
if absolute_bar is None:
absolute_bar = abs(argument_bar)
# Because we needed to calculate 'absolute_bar' here, does
# not hurt also storing this value also, to prevent it
# being recalculated in 'precompute_basic_precomp_method'.
precomp_vals["absolute_bar"] = absolute_bar
# Correct: now 'absolute_bar' is guaranteed to be not None
precomp_vals["double_absolute_bar"] = 2 * absolute_bar
if not {"qux", "quux", "quuz"}.issubset(kwargs):
precomp_vals["qux"] = 1.0
precomp_vals["quux"] = 5 + 1.0j * (precomp_vals["qux"])
precomp_vals["quuz"] = np.array(
[precomp_vals["qux"] + i for i in np.arange(5)])
return precomp_vals
@classmethod
def precompute_random_values(cls,
random_state: t.Optional[int] = None,
**kwargs) -> t.Dict[str, t.Any]:
"""Precomputation method with pseudo-random behavior.
1. An important pymfe default argument for you: 'random_state'
-----------------------------------------------------------------
If you are using anything with pseudo-random properties, you shall
always get the pymfe framework global random seed using the
``random_state`` argument. This seed is user defined. You can get
it for any precomputation, metafeature extraction or post-processing
methods.
2. Important aspects related to pseudo-random behaviour
-----------------------------------------------------------------
Uncontrolled pseudo-random behavior is absolutely forbidden in
this package.
Also, pseudo-random methods must have related automated tests.
Therefore, setting up the random seed (as long as the user define
it) is never optional.
Parameters
----------
random_state : int, optional
If given, controls the pseudo-random behavior inside this
method, so the results will be reproducible.
**kwargs:
Additional arguments. May have previously precomputed before
this method from other precomputed methods, so they can help
speed up this precomputation avoiding duplicated work.
Returns
-------
:obj:`dict`
The following precomputed items are returned:
* ``random_special_num`` (float): a random value
that must be controlled by the random seed specified
by the user using the ``random_state`` pymfe framework
global argument.
"""
precomp_vals = {} # type: t.Dict[str, t.Any]
if "random_special_num" not in kwargs:
if random_state is not None:
np.random.seed(random_state)
aux = np.random.randint(-5, 5, size=10)
precomp_vals["random_special_num"] = np.random.choice(aux, size=1)
return precomp_vals
@classmethod
def ft_metafeature_name(
cls,
X: np.ndarray,
y: np.ndarray,
random_state: t.Optional[int] = None,
opt_arg_bar: float = 1.0,
opt_arg_baz: np.ndarray = None,
) -> int:
"""Single-line description of this feature extraction method.
The purpose of this method is to introduce the first actual
metafeature extraction method.
1. Metafeature extraction methods: the most important ones
-----------------------------------------------------------------
Similarly to the precomputation methods, the feature extraction
method names are also prefixed. All your feature extraction method
names must be prefixed with ``ft_``.
2. The pymfe framework provides arguments automatically
-----------------------------------------------------------------
As mentioned in the documentation of the very first precomputation
method, the pymfe framework is responsible to provide to every
precomputation (those prefixed with ``precompute_``, metafeature
extraction (those prefixed with ``ft_``) and post-processing (we
will see those later) methods its arguments. 'How?', you may ask.
The short answer is dictionary unpacking: the MFE class holds some
dictionaries that are unpacked while calling those prefixed methods.
Then, if a method's argument happens to match with a dictionary
key, that argument will assume the matched key value.
All precomputed values are packed into one of those dictionaries
(and it happens automatically; you don't need to worry about it.)
Therefore, the same value provided as the key of some precomputed
dictionary is used to match directly the parameter name. All
parameters must be treated as read-only values; it is forbidden to
modify any value inside any feature extraction method.
We will see more about which default parameters are given by the
pymfe framework soon in the ``ft_fitted_data_arguments`` method
just below. However, if you want to see with your own eyes the
actual values, you can check out search for the instance attribute
``mfe.MFE._custom_args_ft`` of the MFE class (inside the ``mfe.py``
module). This attribute is set up inside the ``mfe.MFE.fit`` method.
If you have a very good reason, feel free to insert new values
in there if (and only if) they are needed. Note that it is highly
unlikely.
2. Mandatory & optional arguments of metafeature extraction methods
-----------------------------------------------------------------
The only arguments allowed to be mandatory (i.e., arguments without
any default value) are the ones registered inside the MFE attribute
``_custom_args_ft`` (check this out in the ``mfe.py`` module.)
All other values must have a default value, without any exception.
Remember that all arguments can be customized directly by the user
while calling the ``extract`` MFE method. You usually don't need
to worry about if the user uses incorrect data types for the
arguments, as it will most probably raise an TypeError exception.
However, sometimes you should consider handling incorrect values
(such as probability arguments with values not within the range
0 and 1.) Usually, just returning ``np.nan`` (if your metafeature
is non-summarizable) or ``np.array([np.nan])`` (if your metafeature
is summarizable) is one way to go when handling incorrect arguments.
3. Return values of metafeature extraction methods
-----------------------------------------------------------------
We'll see about this soon in the ``ft_about_return_values`` method.
Arguments
---------
X : :obj:`np.ndarray`
All attributes fitted in the model (numerical and categorical
ones). While writing your method documentations, you don't need
to write about very common arguments such as ``X``, ``y``, ``N``
and ``C``. In fact, you are encouraged to just omit these.
y : :obj:`np.ndarray`
Target attributes. Again, no need to write about these type of
arguments in the method documentation, as it can get way too
much repetitive without any information gain.
random_state : int, optional
Extremely important argument. This one is a fixed feature from the
MFE framework. If your method has ANY pseudo-random behaviour,
you should use specifically this argument to provide the random
seed. In this case, it would be nice if you write about what
is the random behaviour of your method to make clear to the
user why he or she ever needs a random seed in the first place.
opt_arg_bar : float, optional
Argument used to detect carbon footprints of hungry dinosaurs.
opt_arg_baz : :obj:`np.ndarray`, optional
If None, this argument is foo. Otherwise, this argument is bar.
Returns
-------
int
Give a clear description about the returned value.
Notes
-----
You can use the notes section of the documentation to provide
references, and also ``very specific`` details of the method.
"""
# Inside the feature extraction method you can do whenever you
# want, just make sure to:
# 1. Always return a single number, a single np.nan or a numpy
# array with numeric values (or np.nan) - no exceptions!
# 2. Make it run as fast as possible. Metafeatures with high
# computational complexity are discouraged.
# You can raise ValueError, TypeError and LinAlgError exceptions.
if opt_arg_bar <= 0.0:
raise ValueError("'opt_arg_bar' must be positive!")
# When using pseudo-random functions, ALWAYS use random_state
# to enforce experiment replication. Uncontrolled pseudo-random
# behavior is absolutely forbidden.
if opt_arg_baz is None:
np.random.seed(random_state)
opt_arg_baz = np.random.choice(10, size=5, replace=False)
aux_1, aux_2 = np.array(X.shape) * y.size
np.random.seed(random_state)
random_ind = np.random.randint(opt_arg_baz.size)
ret = aux_1 * opt_arg_bar / (aux_2 + opt_arg_baz[random_ind])
return ret
@classmethod
def ft_fitted_data_arguments(cls, X: np.ndarray, N: np.ndarray,
C: np.ndarray, y: np.ndarray) -> int:
"""Information about some arguments related to fitted data.
1. Handling Numerical, Categorical and Mixed data types
-----------------------------------------------------------------
Not all feature extraction methods handles all type of data. Some
methods only work for numerical values, while others works only for
categorical values. A few ones work for both data types, but this
is generally not the case.
The Pymfe framework provides easy access to the fitted data
attributes separated by data type (numerical and categorical).
You can use the attribute ``X`` to get all the original fitted
data (without any data transformations), attribute ``N`` to get
only the numerical attributes and, similarly, ``C`` to get only
the categorical attributes.
Arguments
---------
X : :obj:`np.ndarray`
All fitted original data, without any data transformation
such as discretization or one-hot encoding.
N : :obj:`np.ndarray`
Just numerical attributes of the fitted data, with possibly
categorical data one-hot encoded (if the user uses this
type of transformation.)
C : :obj:`np.ndarray`
Just the categorical attributes of the fitted data, with
possibly numerical data discretized (if the user uses
this type of transformation.)
y : :obj:`np.ndarray`
Target attribute.
Returns
-------
int
Some important return value.
Notes
-----
You can even receive more than one of these attributes in the
same method, but keep in mind that this may cause confusion as
the user may enable or disable data transformations (encoding
for categorical values and discretization for numerical values).
"""
ret = np.array(X.shape) + np.array(N.shape) + np.array(C.shape)
return np.prod(ret) * y.size
@classmethod
def ft_using_precomputed_values(
cls,
y: np.ndarray,
# y_unique: np.ndarray, # Wrong! Need an default value.
y_unique: t.Optional[np.ndarray] = None) -> np.ndarray:
"""Metafeature extraction method using precomputed values.
1. How to use precomputed arguments
-----------------------------------------------------------------
Within any metafeature extraction method, you can safely assume that
all precomputation methods (even the ones of other MFE classes) were
all executed (successfully or not!), and their values are hopefully
ready to be used as arguments. Note that the pymfe framework has a
huge resilience against exceptions, so the code will most probably
continue to flow even if a few precomputation methods were not
successful for some reasons (e.g., math domain errors.)
To get precomputed values is no different than getting a pymfe
default automatic argument (such as ``X`` and ``y``): just match
the argument name with the precomputed dictionary key. For
instance, the argument ``y_unique`` was precomputed in the
``precompute_basic_precomp_method`` and is probably ready to be used in
this metafeature extraction method, IF the user does not
disabled the precomputations. As we can't guarantee whether the
user will or will not disable the precomputations, we need to
always check if ``y_unique`` is different than ``None`` before
using it. If, unfortunatelly, it is not the case, then we need
to compute ``y_unique`` inside this method.
2. When to use precomputed arguments
-----------------------------------------------------------------
Always! :)
3. The precomputation cache is shared among all pymfe classes
-----------------------------------------------------------------
Remember that you can also use precomputed values from other
pymfe metafeature extraction classes (and, therefore, your
precomputed values will also be automatically available to the
other classes aswell.)
Arguments
---------
y : :obj:`np.ndarray`
Target attribute.
y_unique : :obj:`np.ndarray`, optional
Argument precomputed in the ``precompute_basic_precomp_method``
precomputation method. Note that it must be an optional
argument (because it is forbidden to rely on precomputation
methods to fill mandatory arguments, as the user can disable
precomputation methods whenever he or she wants.) Note also
that the argument name must match exatcly the corresponding
dictionary key given inside the precomputation method.
Returns
-------
:obj:`np.ndarray`
Describe your return value.
"""
# res = -1.0 * y_unique # Wrong! 'y_unique' may be None!
# You need to verify if precomputed values is None. If this
# is the case, you need to manually compute it inside the method
# that needs that value.
if y_unique is None:
# If ``y_unique`` is None, it means probably that the user
# disabled the precomputations (or something went wrong inside
# the precomputation method,) so we need to compute
# it now as this argument is needed to compute the
# method's output.
# Obviously, the computation inside the metafeature
# extraction method must be identical to the computation
# in the precomputation method, as both results must
# always match. Once again, remember:
# 'If two things have the same name, then they are the
# same thing'.
y_unique = np.unique(y, return_counts=False)
res = -1.0 * y_unique # Correct: 'y_unique' is surely not None
return res
@classmethod
def ft_about_return_values(
cls,
y: np.ndarray,
) -> np.ndarray:
"""Information about return values of feature extraction methods.
1. You have two return options for metafeature extraction methods
-----------------------------------------------------------------
The return value of any feature extraction method should be
a single value (int, float, numpy number, or a :obj:`np.nan`,)
or a numpy array. This array must contain only numbers or
:obj:`np.nan`.
2. What's the difference?
-----------------------------------------------------------------
If the return value is a single number, the output value of this
method will be transformed directly into a MFE class extract output.
If it is a numpy array, then this output will automatically be
summarized using every user selected summary functions.
3. A more detailed explanation
-----------------------------------------------------------------
If you return a single value, your metafeature is said to be
'non-summarizable'. It means that the value your method return is
the value the user will get. If you need to return an invalid
value, always return 'np.nan'.
If you return an numpy array, then your metafeature is said to be
'summarizable', and the user will get a few statistics related to
the values your method returns (instead of the actual values):
its mean, standard deviation, quantiles, variance etc. It will
happen automatically, and you should not worry about this. You
can put 'np.nan' inside your array. If you need to return an
entire invalid array, consider returning 'np.array([np.nan])'.
DO NOT return a single 'np.nan', as it is reserved for the
'non-summarizable' metafeature extraction methods.
Arguments
---------
y : :obj:`np.ndarray`
Target attribute.
Returns
-------
:obj:`np.ndarray`
This method returns a numpy array, so its output value will
be summarized automatically by the MFE framework before
outputting to the user.
"""
# Either your method return a single value, or it return an
# numpy array. You can't mix both within a single metafeature
# extraction method.
if np.any(y < 0):
# My metafeature can't handle negative 'y' values, so I
# can return an invalid array
# return np.nan # Wrong! It is not an array!
return np.array([np.nan]) # Correct.
if y.size > 20:
return np.power(y, 1 / 4) + np.arange(y.size)
return np.sqrt(y) + np.arange(y.size)
@classmethod
def _protected_methods(cls, arg_foo: float) -> float:
"""Tips for using protected methods.
1. How to use Python's protected methods in pymfe code
-----------------------------------------------------------------
Protected methods (methods whose name starts with a underscore)
should be used whenever you need to modularize better your code,
and even more if you need to use the same piece of code between
two or more different metafeature extraction methods.
2. Using private methods
-----------------------------------------------------------------
Private methods (methods prefixed with two underscores) are not
really necessary, and their use must be justified somehow.
So far, there is not even a single private method in any pymfe
code.
3. Protected method documentation
-----------------------------------------------------------------
You don't need to follow the standard documentation format for
protected methods (method description, argument list, return value
description etc.) Instead, you can be more technical since the
documentation will probably be more suitable for other developers
and maintainers of the package. If you fell more confortable with
the standard format (just like the public methods), there is no
harm to follow it in the protected method documentation then.
"""
def inner_functions(x: float, lamb: float = 1.0) -> float:
"""Usage of inner functions.
1. When to use inner functions
---------------------------------------------------------
Use them whenever you need modularize a piece of code that
is way too much specific for the method that contains it.
Therefore, it is highly unlikely that this same piece of
code may ever be used from another method.
2. How many inner functions per method?
---------------------------------------------------------
These functions are quite useful for very complex feature
extraction methods with many steps needed to reach the final
result. In that case, consider creating a separated inner
function for every step.
"""
return np.abs(np.tanh(x * lamb) * 0.7651j)
return np.max(inner_functions(arg_foo), 0.0)
@classmethod
def non_protected_methods_without_any_prefixes(cls) -> None:
"""Don't use non-protected regular methods.
The main reason to avoid this type of methods is because
it will be shown in the package documentation despite the
fact that it is not of the user's interest.
"""
raise NotImplementedError(
"Hide me prefixing my name with a single '_'.")
@classmethod
def postprocess_groupName1_groupName2(
cls, mtf_names: t.List[str], mtf_vals: t.List[float],
mtf_time: t.List[float], class_indexes: t.Sequence[int],
groups: t.Tuple[str, ...], inserted_group_dep: t.FrozenSet[str],
**kwargs
) -> t.Optional[t.Tuple[t.List[str], t.List[float], t.List[float]]]:
"""Introduction to post-processing methods.
1. What is a post-processing method?
-----------------------------------------------------------------
The post-processing methods can be used to either modify in-place
previously generated metafeatures (not necessarily from the same
group) or to generate new metafeatures using previously extracted
metafeatures just before outputting the results to the user. The
popularity of this type of method is not even close to the
preprocessing ones, but they may be useful in some specific cases
(mainly related to `somehow` merge the dependencies data with the
generated data from the dependent class.)
For instance, the 'Relative Landmarking' metafeature group is
entirely based on post-processing methods: that specific group needs
every 'Landmarking' metafeature results and, therefore, it can be
computed only after the metafeature extraction process finishes
(because we have no guarantees of the metafeature extraction order.)
So, if your MFE class does not have any external dependencies, nor it
is supposed to somehow merge two or more metafeature values, you
don't need to read this section, and you are already good to go
and develop your own MFE class. If it is not your case, then stay
with us for a couple of extra minutes more.
2. Structure of a post-processing method
-----------------------------------------------------------------
All post-processing methods receive all previously extracted
metafeatures from every MFE class. It will not receive just the
metafeatures related to the metafeature extraction methods of this
class. It is very import to keep this in mind.
There's a very important trick with the naming of these post-processing
methods, other than just prefixing they with ``postprocess_``.
You can put names of metafeature groups of interest separated by
underscores. All metafeature indexes related to any of the selected
groups will arrive in the ``class_indexes`` argument automatically.
For example, suppose a post-processing method named like:
postprocess_infotheory_statistical(...)
This implies that the indices of both `information theory` and
`statistical` metafeature groups will arrive inside the
``class_indexes`` sequence. Using this feature, one can easily
work with these metafeatures without needing to separate them by
hand. Of course, you can give as many metafeature group names as
needed. If you need them all, then simply don't put any metafeature
group name, as every metafeature is an metafeature of interest in
this case.
There were various arguments that are automatically filled for
this type of methods (as you can see just above in this method
signature). Check the ``arguments`` section for more details
about each one.
3. How many post-processing methods are necessary?
-----------------------------------------------------------------
Just like the preprocessing and metafeature extraction methods,
an MFE class may have any number post-processing methods, including
none. In fact, no post-processing method is by far the common case.
4. Return value of post-processing methods
-----------------------------------------------------------------
The return value of post-processing methods must be either None,
or a tuple with exactly three lists. In the first case (returning
None), the post-processing method is probably supposed to modify
the received metafeature values in-place (which is perfectly
fine). In the second case (returning three lists), these lists
will be considered new metafeatures and will be appended to the
MFE output before given to the user. These lists must follow the
order given below:
1. New metafeature names
2. New metafeature values
3. Time elapsed to extract every new metafeature
Now, let's take a quick look at the common post-processing method
arguments. Note that all the arguments listed below are actual
arguments from the pymfe framework, and you can use they in your
post-processing methods.
Arguments
---------
mtf_names : :obj:`list` of str
A list containing all previously extracted metafeature names.
mtf_vals : :obj:`list` of float
A list containing all previously extracted metafeature values.
mtf_time : :obj:`list` of float
A list containing all time elapsed for each metafeature
previously extracted.
class_indexes : Sequence of int
Indexes of the metafeatures related to this method ``groups of
interest``. The ``groups of interest`` are the metafeature groups
whose name are in this method's name after the ``postprocess_``
prefix, separated with underscores (in this example, they are
``groupName1`` and ``groupName2``.)
If it is not clear for you so far, the metafeatures received
in this method are all the metafeatures extracted in every MFE
classes, not just the ones related to this class. Then, this
argument can be used as reference to target only the metafeatures
effectively used in this post-processing method.
If you need every single metafeature extracted for your
post-processing method, then this argument does not matter (nor
your post-processing method name, as long as it is correctly
prefixed with ``postprocess_``) as every metafeature is of your
particular interest, and there is no need for an auxiliary
list to split the metafeatures.
groups : :obj:`tuple` of str
Extracted metafeature groups (including metafeature groups
inserted due to group dependencies). Can be used as reference
inside the post-processing method.
inserted_group_dep : :obj:`tuple` of :obj:`str
Extracted metafeature groups due to class dependencies. Can be
used as a reference inside the post-processing method.
**kwargs:
Just like the preprocessing methods, the kwargs is also
mandatory in post-processing methods. It can be used to
retrieve additional arguments using the ``get`` method.
Returns
-------
if not None:
:obj:`tuple` with three :obj:`list`
These lists are (necessarily in this order):
1. New metafeature names
2. New metafeature values
3. Time elapsed to extract every new metafeature
"""
# Sometimes you can cheat pylint in case you are not using some
# arguments, such as kwargs. Keep in mind that this fact should
# not be abused just to avoid pylint warnings. Always take some
# time to fix your code.
# pylint: disable=W0613
new_mtf_names = [] # type: t.List[str]
new_mtf_vals = [] # type: t.List[float]
new_mtf_time = [] # type: t.List[float]
# In this example, this post-processing method returns
# new metafeatures conditionally. Note that this variable
# ``change_in_place`` is fabricated for this example; it
# is not a true feature of the Pymfe framework!!! The
# decision of whether or not to change metafeatures in
# place depends on your particular context!
change_in_place = kwargs.get("change_in_place", False)
if change_in_place:
# Make changes in-place using the ``class_indexes`` as
# reference. Note that these indexes are collected using
# this post-processing method name as reference (check the
# documentation of this method for a clear explanation.)
for index in class_indexes:
time_start = time.time()
mtf_vals[index] *= 2.0
mtf_names[index] += ".twice"
mtf_time[index] += time.time() - time_start
# Don't return new metafeatures, as the changes made are
# in-place in this particular situation.
return None
# The previous branch was not taken: therefore, the changes
# are not in-place. This means that new metafeatures will be
# created and appended to the previously existing ones. Note
# that whether the new feature values are supposed to be identical
# to its in-place variants are context dependent. If you have
# good reasons to do make they different, then you are allowed to.
# Create new metafeatures (in this case, the user will receive
# twice as many values as separated metafeatures.) Note that the
# number of new metafeatures also is context dependent: your
# post-processing method may return as many as new metafeatures it
# is supposed to return.
for index in class_indexes:
time_start = time.time()
new_mtf_vals.append(-1.0 * new_mtf_vals[index])
new_mtf_names.append("{}.negative".format(new_mtf_names[index]))
new_mtf_time.append(new_mtf_time[index] + time.time() - time_start)
# Finally:
# Return new metafeatures produced in this method. Pay attention to the
# order of these lists, as it must be preserved for any post-processing
# method.
return new_mtf_names, new_mtf_vals, new_mtf_time
Total running time of the script: ( 0 minutes 0.005 seconds)