* [Selecting dimensionality reduction with Pipeline and GridSearchCV](https://scikit-learn.org/stable/auto_examples/plot_compare_reduction.html#sphx-glr-auto-examples-plot-compare-reduction-py)
*[Feature Union with Heterogeneous Data Sources](https://scikit-learn.org/stable/auto_examples/hetero_feature_union.html#sphx-glr-auto-examples-hetero-feature-union-py)
\ No newline at end of file
*[Feature Union with Heterogeneous Data Sources](https://scikit-learn.org/stable/auto_examples/hetero_feature_union.html#sphx-glr-auto-examples-hetero-feature-union-py)
* Kilian Weinberger, Anirban Dasgupta, John Langford, Alex Smola and Josh Attenberg (2009). [用于大规模多任务学习的特征散列](http://alex.smola.org/papers/2009/Weinbergeretal09.pdf). Proc. ICML.
> * 用于文本特征提取和评估的样本管道 [Sample pipeline for text feature extraction and evaluation](https://scikit-learn.org/stable/auto_examples/model_selection/grid_search_text_feature_extraction.html#sphx-glr-auto-examples-model-selection-grid-search-text-feature-extraction-py)
对于文本分类任务中的外核缩放的完整示例,请参阅文本文档的外核分类 [Out-of-core classification of text documents](https://scikit-learn.org/stable/auto_examples/applications/plot_out_of_core_classification.html#sphx-glr-auto-examples-applications-plot-out-of-core-classification-py).
*[A demo of structured Ward hierarchical clustering on a raccoon face image](https://scikit-learn.org/stable/auto_examples/cluster/plot_face_ward_segmentation.html#sphx-glr-auto-examples-cluster-plot-face-ward-segmentation-py)
*[Spectral clustering for image segmentation](https://scikit-learn.org/stable/auto_examples/cluster/plot_segmentation_toy.html#sphx-glr-auto-examples-cluster-plot-segmentation-toy-py)
*[Feature agglomeration vs. univariate selection](https://scikit-learn.org/stable/auto_examples/cluster/plot_feature_agglomeration_vs_univariate_selection.html#sphx-glr-auto-examples-cluster-plot-feature-agglomeration-vs-univariate-selection-py)
\ No newline at end of file
*[Feature agglomeration vs. univariate selection](https://scikit-learn.org/stable/auto_examples/cluster/plot_feature_agglomeration_vs_univariate_selection.html#sphx-glr-auto-examples-cluster-plot-feature-agglomeration-vs-univariate-selection-py)
一般来说,机器学习算法受益于数据集的标准化。如果数据集中存在一些离群值,那么稳定的缩放或转换更合适。不同缩放、转换以及归一在一个包含边缘离群值的数据集中的表现在 [Compare the effect of different scalers on data with outliers](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#sphx-glr-auto-examples-preprocessing-plot-all-scaling-py) 中有着重说明。
[`Imputer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Imputer.html#sklearn.preprocessing.Imputer "sklearn.preprocessing.Imputer") 可以在 Pipeline 中用作构建支持插补的合成模型。参见 [Imputing missing values before building an estimator](https://scikit-learn.org/stable/auto_examples/plot_missing_values.html#sphx-glr-auto-examples-plot-missing-values-py) 。
*[The Johnson-Lindenstrauss bound for embedding with random projections](https://scikit-learn.org/stable/auto_examples/plot_johnson_lindenstrauss_bound.html#sphx-glr-auto-examples-plot-johnson-lindenstrauss-bound-py)
* Sanjoy Dasgupta. 2000. [Experiments with random projection.](http://cseweb.ucsd.edu/~dasgupta/papers/randomf.pdf) In Proceedings of the Sixteenth conference on Uncertainty in artificial intelligence (UAI‘00), Craig Boutilier and Moisés Goldszmidt (Eds.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 143-151.
* Ella Bingham and Heikki Mannila. 2001. [Random projection in dimensionality reduction: applications to image and text data.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.5135&rep=rep1&type=pdf) In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining (KDD ‘01). ACM, New York, NY, USA, 245-250.
* Sanjoy Dasgupta and Anupam Gupta, 1999. [An elementary proof of the Johnson-Lindenstrauss Lemma.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.39.3334&rep=rep1&type=pdf)
## 4.5.2\. 高斯随机投影
## 5.5.2\. 高斯随机投影
The [`sklearn.random_projection.GaussianRandomProjection`](https://scikit-learn.org/stable/modules/generated/sklearn.random_projection.GaussianRandomProjection.html#sklearn.random_projection.GaussianRandomProjection"sklearn.random_projection.GaussianRandomProjection") 通过将原始输入空间投影到随机生成的矩阵(该矩阵的组件由以下分布中抽取) :math:[`](#id4)N(0, frac{1}{n_{components}})`降低维度。
...
...
@@ -61,7 +61,7 @@ The [`sklearn.random_projection.GaussianRandomProjection`](https://scikit-learn.
@@ -99,4 +99,4 @@ The [`sklearn.random_projection.GaussianRandomProjection`](https://scikit-learn.
参考:
* D. Achlioptas. 2003. [Database-friendly random projections: Johnson-Lindenstrauss with binary coins](www.cs.ucsc.edu/~optas/papers/jl.pdf). Journal of Computer and System Sciences 66 (2003) 671–687
* Ping Li, Trevor J. Hastie, and Kenneth W. Church. 2006. [Very sparse random projections.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.62.585&rep=rep1&type=pdf) In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD ‘06). ACM, New York, NY, USA, 287-296.
\ No newline at end of file
* Ping Li, Trevor J. Hastie, and Kenneth W. Church. 2006. [Very sparse random projections.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.62.585&rep=rep1&type=pdf) In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD ‘06). ACM, New York, NY, USA, 287-296.
@@ -103,4 +103,4 @@ kernel 方法的一个缺点是,在优化过程中有可能存储大量的 ker
| [[VZ2010]](#id5) | [“Efficient additive kernels via explicit feature maps”](https://www.robots.ox.ac.uk/~vgg/publications/2011/Vedaldi11/vedaldi11.pdf) Vedaldi, A. and Zisserman, A. - Computer Vision and Pattern Recognition 2010 |
| [[VVZ2010]](#id6) | [“Generalized RBF feature maps for Efficient Detection”](https://www.robots.ox.ac.uk/~vgg/publications/2010/Sreekanth10/sreekanth10.pdf) Vempati, S. and Vedaldi, A. and Zisserman, A. and Jawahar, CV - 2010 |
\ No newline at end of file
| [[VVZ2010]](#id6) | [“Generalized RBF feature maps for Efficient Detection”](https://www.robots.ox.ac.uk/~vgg/publications/2010/Sreekanth10/sreekanth10.pdf) Vempati, S. and Vedaldi, A. and Zisserman, A. and Jawahar, CV - 2010 |
* C.D. Manning, P. Raghavan and H. Schütze (2008). Introduction to Information Retrieval. Cambridge University Press. [http://nlp.stanford.edu/IR-book/html/htmledition/the-vector-space-model-for-scoring-1.html](http://nlp.stanford.edu/IR-book/html/htmledition/the-vector-space-model-for-scoring-1.html)
已被证明在机器学习中运用到无噪声数据中是有用的. 可见例如 [Machine learning for quantum mechanics in a nutshell](http://onlinelibrary.wiley.com/doi/10.1002/qua.24954/abstract/).
## 4.7.7\. 卡方核函数
## 5.7.7\. 卡方核函数
```py
在计算机视觉应用中训练非线性支持向量机时,卡方核函数是一种非常流行的选择.
...
...
@@ -143,4 +143,4 @@ array([0, 1, 0, 1])
参考:
* Zhang, J. and Marszalek, M. and Lazebnik, S. and Schmid, C. Local features and kernels for classification of texture and object categories: A comprehensive study International Journal of Computer Vision 2007 [http://research.microsoft.com/en-us/um/people/manik/projects/trade-off/papers/ZhangIJCV06.pdf](http://research.microsoft.com/en-us/um/people/manik/projects/trade-off/papers/ZhangIJCV06.pdf)
\ No newline at end of file
* Zhang, J. and Marszalek, M. and Lazebnik, S. and Schmid, C. Local features and kernels for classification of texture and object categories: A comprehensive study International Journal of Computer Vision 2007 [http://research.microsoft.com/en-us/um/people/manik/projects/trade-off/papers/ZhangIJCV06.pdf](http://research.microsoft.com/en-us/um/people/manik/projects/trade-off/papers/ZhangIJCV06.pdf)