Unverified commit f5bd4fc3, authored by HansBug, committed by GitHub

Merge pull request #25 from opendilab/doc/practice

doc(hansbug): add practice pages
......@@ -60,8 +60,6 @@ gen
### Images template
# JPEG
*.jpg
*.jpeg
*.jpe
*.jif
*.jfif
......
Apply to Scikit-Learn
===========================
``TreeValue`` can be used in practice not only with ``numpy`` or ``torch``, but also with other libraries such as ``scikit-learn``.
The following section demonstrates PCA applied to tree-structured arrays.
In traditional machine learning, PCA (Principal Component Analysis) is often used to preprocess data:
it centers the data and projects it onto the directions of greatest variance, which reduces the dimensionality
and complexity of the input and can improve the efficiency and quality of later learning steps. The figure below illustrates the idea.
.. figure:: heading_of_pca.jpg
    :alt: PCA Principle

    PCA in a nutshell. Source: Lavrenko and Sutton 2011, slide 13.
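To make the idea in the figure concrete, here is a minimal pure-Python sketch of PCA for 2-D points. It is only an illustration under simplifying assumptions: the helper name ``pca_2d`` is hypothetical, it handles exactly two features, and it uses the closed-form principal-axis angle of a 2x2 covariance matrix, whereas scikit-learn's ``PCA`` is SVD-based and may return components with flipped signs.

```python
import math


def pca_2d(points):
    """Project 2-D points onto their first principal component (sketch only)."""
    n = len(points)
    # Step 1: center the data on its mean.
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Step 2: entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Step 3: angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))

    # Step 4: 1-D coordinates of each point along that direction.
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, scores


# Points lying on the line y = x: all variance is along the diagonal.
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
direction, scores = pca_2d(points)
print(direction)
print(scores)
```

For these sample points the principal direction is the diagonal, so a single coordinate per point is enough to describe the data.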
scikit-learn provides the ``PCA`` class for this purpose, and its ``fit_transform`` method
fits the model and returns the reduced data in one call. For a set of ``np.array`` data arranged in a tree structure,
tree support can be added simply by wrapping ``fit_transform`` so that it is applied to every leaf.
The code is as follows:
.. literalinclude:: sklearn.demo.py
    :language: python
    :linenos:
The output should look like the following (the exact values vary between runs, because the input data is random):
.. literalinclude:: sklearn.demo.py.txt
    :language: text
    :linenos:
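The wrapping step can be pictured with a plain-Python approximation that needs neither ``treevalue`` nor ``scikit-learn``. This is a sketch of the idea only, not how ``FastTreeValue.func()`` is implemented: the names ``tree_func`` and ``center_columns`` are hypothetical, nested dicts stand in for the tree, and a simple column-centering function stands in for ``fit_transform``.

```python
def tree_func(leaf_func):
    """Lift a function on leaves into one that walks a nested dict (sketch only)."""
    def wrapper(tree):
        if isinstance(tree, dict):
            # Inner node: recurse into every child and keep the same keys.
            return {key: wrapper(value) for key, value in tree.items()}
        # Leaf: apply the original function.
        return leaf_func(tree)
    return wrapper


def center_columns(rows):
    """Stand-in leaf operation: subtract each column's mean from a row-major matrix."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]


center_tree = tree_func(center_columns)

# Same shape of tree as the demo: one leaf at 'a', one nested leaf at 'x' -> 'c'.
data = {
    'a': [[1.0, 2.0], [3.0, 6.0]],
    'x': {'c': [[0.0, 10.0], [4.0, 20.0], [8.0, 30.0]]},
}
result = center_tree(data)
print(result)
```

The result has the same keys and nesting as the input, and each leaf is transformed independently, which is why leaves of different shapes can live in one tree.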
For further information, see the links below:
* `Official documentation of PCA in scikit-learn <https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html?highlight=pca#sklearn.decomposition.PCA>`_
* `Details of PCA <https://devopedia.org/principal-component-analysis>`_
import numpy as np
from sklearn.decomposition import PCA
from treevalue import FastTreeValue

# Lift PCA's fit_transform so that it is applied to every leaf array of a tree.
# Each leaf keeps min(n_samples, n_features) components.
fit_transform = FastTreeValue.func()(lambda x: PCA(min(*x.shape)).fit_transform(x))

if __name__ == '__main__':
    # Two leaves with different shapes: (4, 3) at 'a' and (5, 4) at 'x.c'.
    data = FastTreeValue({
        'a': np.random.randint(-5, 15, (4, 3)),
        'x': {
            'c': np.random.randint(-15, 5, (5, 4)),
        }
    })
    print("Original int data:")
    print(data)

    pdata = fit_transform(data)
    print("Fit transformed data:")
    print(pdata)
......@@ -26,6 +26,7 @@ structure processing when the calculation is tree-based.
    :caption: Best Practice

    best_practice/numpy/index
    best_practice/sklearn/index

.. toctree::
    :maxdepth: 2
......
......@@ -7,4 +7,5 @@ packaging
sphinx-multiversion~=0.2.4
where~=1.0.2
numpy>=1.19,<2
easydict>=1.7,<2
\ No newline at end of file
easydict>=1.7,<2
scikit-learn>=0.24.2
\ No newline at end of file