
Kernel PCA for nonlinear dimensionality reduction

Most techniques in statistics are linear by nature, so to capture nonlinearity we might need to apply some transformation. PCA is, of course, a linear transformation. In this recipe, we'll apply a nonlinear transformation first and then apply PCA for dimensionality reduction.


Getting ready

Life would be so easy if data were always linearly separable, but unfortunately it's not. Kernel PCA can help to circumvent this issue. The data is first run through a kernel function, which implicitly maps it into a different (typically higher-dimensional) feature space; PCA is then performed in that space.
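To make those two steps concrete, here is a minimal from-scratch sketch of kernel PCA in NumPy. It assumes an RBF kernel with a hand-picked `gamma=0.5` and random toy data; both are illustrative choices, not part of the recipe. The helper `kernel_pca` is a hypothetical name. The sketch builds the kernel matrix, centers it in feature space, eigendecomposes it, and compares the result against scikit-learn's `KernelPCA`.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.metrics.pairwise import rbf_kernel


def kernel_pca(X, n_components, gamma):
    """Hypothetical helper: minimal kernel PCA with an RBF kernel."""
    # 1. Kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    K = rbf_kernel(X, gamma=gamma)
    # 2. Center K in feature space (equivalent to mean-centering phi(x))
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # 3. Eigendecomposition of the symmetric centered kernel matrix
    eigvals, eigvecs = np.linalg.eigh(K_c)
    # eigh returns ascending eigenvalues; keep the largest n_components
    idx = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[idx], eigvecs[:, idx]
    # 4. Projections of the training points onto the kernel principal axes
    return eigvecs * np.sqrt(eigvals)


rng = np.random.RandomState(0)  # assumed seed, for reproducibility
X = rng.normal(size=(50, 2))

ours = kernel_pca(X, n_components=2, gamma=0.5)
ref = KernelPCA(n_components=2, kernel="rbf", gamma=0.5,
                eigen_solver="dense").fit_transform(X)

# Eigenvectors are only defined up to sign, so compare absolute values
print(np.allclose(np.abs(ours), np.abs(ref), atol=1e-6))
```

The nonlinearity comes entirely from the kernel. Replacing `rbf_kernel` with a plain dot product (`X @ X.T`) would make this ordinary PCA.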


To familiarize yourself with the kernel functions, a good exercise is to think about how to generate data that is separable by each of the kernel functions available in kernel PCA. Here, we'll do that with the cosine kernel. This recipe has a bit more theory than the previous recipes.


How to do it...

The cosine kernel works by comparing the angle between two samples represented in the feature space. It is useful when the magnitudes of the vectors distort the distance measures typically used to compare samples, such as Euclidean distance.


As a reminder, the cosine between two vectors is given by the following:

$$\cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|}$$

This means that the cosine between A and B is the dot product of the two vectors normalized by the product of their individual norms. The magnitudes of vectors A and B have no influence on this calculation.
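A quick numerical check of that formula follows. The vectors are arbitrary illustrative values. The `cosine` function is a hypothetical helper written for this sketch. `sklearn.metrics.pairwise.cosine_similarity` is the scikit-learn function that computes the same quantity for every pair of rows.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def cosine(a, b):
    """Hypothetical helper: cos(theta) = (A . B) / (||A|| * ||B||)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))


A = np.array([1.0, 2.0])
B = np.array([3.0, 1.0])

print(cosine(A, B))              # approximately 0.7071, i.e. 1/sqrt(2)
# Rescaling either vector leaves the cosine unchanged
print(cosine(10 * A, 0.1 * B))   # same value as above
# scikit-learn's pairwise version agrees
print(cosine_similarity([A], [B])[0, 0])
```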


So, let's generate some data and see how useful it is. First, we'll imagine there are two different underlying processes; we'll call them A and B:

>>> import numpy as np
>>> # Covariance matrices must be symmetric and positive semidefinite
>>> A1_mean = [1, 1]
>>> A1_cov = [[2, .99], [.99, 1]]
>>> A1 = np.random.multivariate_normal(A1_mean, A1_cov, 50)
>>> A2_mean = [5, 5]
>>> A2_cov = [[2, .99], [.99, 1]]
>>> A2 = np.random.multivariate_normal(A2_mean, A2_cov, 50)
>>> A = np.vstack((A1, A2))  # process A: 100 points in two clusters
>>> B_mean = [5, 0]
>>> B_cov = [[.5, -.45], [-.45, .5]]  # strong negative correlation (-0.9)
>>> B = np.random.multivariate_normal(B_mean, B_cov, 100)

Once plotted, it will look like the following:
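The plot itself is not reproduced here, so the following is a minimal sketch for drawing it with matplotlib. It regenerates data like the recipe's. The fixed seed, the symmetric covariance matrices (the off-diagonal values are assumptions chosen so the matrices are valid), the colors and markers, and the output file name are all illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)  # assumed seed
A1 = rng.multivariate_normal([1, 1], [[2, .99], [.99, 1]], 50)
A2 = rng.multivariate_normal([5, 5], [[2, .99], [.99, 1]], 50)
A = np.vstack((A1, A2))
B = rng.multivariate_normal([5, 0], [[.5, -.45], [-.45, .5]], 100)

fig, ax = plt.subplots(figsize=(6, 5))
ax.scatter(A[:, 0], A[:, 1], color="r", marker="o", label="A")
ax.scatter(B[:, 0], B[:, 1], color="b", marker="x", label="B")
ax.set_title("Processes A and B")
ax.legend()
fig.savefig("kpca_raw_data.png")  # hypothetical output file name
plt.close(fig)
```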

By visual inspection, it seems that the two classes come from different processes, but separating them with a single cut might be difficult. So, we'll use kernel PCA with the cosine kernel discussed earlier:


>>> from sklearn import decomposition
>>> kpca = decomposition.KernelPCA(kernel='cosine', n_components=1)
>>> AB = np.vstack((A, B))  # stack both processes: 200 samples
>>> AB_transformed = kpca.fit_transform(AB)  # shape (200, 1)

Visualized in one dimension after the kernel PCA, the dataset looks like the following:


Contrast this with PCA without a kernel:

Clearly, kernel PCA does a much better job of separating the two processes in one dimension.
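The comparison above is made by eye. One rough way to put a number on it is to compute the best accuracy that a single threshold on each 1-D projection can achieve. This sketch is an assumption, not the recipe's own method. It regenerates similar data with an assumed seed and valid (symmetric) covariance matrices, and `best_threshold_accuracy` is a hypothetical helper. The exact scores depend on the random draw.

```python
import numpy as np
from sklearn import decomposition


def best_threshold_accuracy(z, labels):
    """Hypothetical helper: best accuracy of one cut on 1-D values z."""
    best = 0.0
    for t in np.unique(z):
        pred = z >= t
        # Either class may end up on the high side of the cut, so try both
        acc = max(np.mean(pred == labels), np.mean(pred != labels))
        best = max(best, acc)
    return best


rng = np.random.RandomState(0)  # assumed seed
A = np.vstack((rng.multivariate_normal([1, 1], [[2, .99], [.99, 1]], 50),
               rng.multivariate_normal([5, 5], [[2, .99], [.99, 1]], 50)))
B = rng.multivariate_normal([5, 0], [[.5, -.45], [-.45, .5]], 100)
AB = np.vstack((A, B))
labels = np.r_[np.zeros(len(A), dtype=bool), np.ones(len(B), dtype=bool)]

kpca = decomposition.KernelPCA(kernel="cosine", n_components=1)
z_kpca = kpca.fit_transform(AB).ravel()
z_pca = decomposition.PCA(n_components=1).fit_transform(AB).ravel()

print("kernel PCA:", best_threshold_accuracy(z_kpca, labels))
print("plain PCA: ", best_threshold_accuracy(z_pca, labels))
```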

How it works...

Besides the cosine kernel, several other kernels are available, and you can even write your own kernel function. The available kernels are:


1. poly (polynomial)
2. rbf (radial basis function)
3. sigmoid
4. cosine
5. precomputed
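The `precomputed` option means you compute the kernel (Gram) matrix yourself and pass it in place of the data. As a minimal sketch on random toy data (an assumption), passing a cosine-similarity matrix with `kernel="precomputed"` should give the same projection as `kernel="cosine"`, up to the sign ambiguity of eigenvectors. `eigen_solver="dense"` is set only to make the comparison deterministic.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.RandomState(0)  # assumed seed
X = rng.normal(loc=3.0, size=(60, 2))

# Option 1: let KernelPCA compute the cosine kernel itself
z_builtin = KernelPCA(n_components=1, kernel="cosine",
                      eigen_solver="dense").fit_transform(X)

# Option 2: compute the kernel matrix yourself and pass it in
K = cosine_similarity(X)  # shape (n_samples, n_samples)
z_pre = KernelPCA(n_components=1, kernel="precomputed",
                  eigen_solver="dense").fit_transform(K)

# Same projection up to the sign of the eigenvector
print(np.allclose(np.abs(z_builtin), np.abs(z_pre)))
```

This route is handy when the kernel is expensive to compute and you want to reuse it, or when it is not one of the built-in options.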

There are also options contingent on the kernel choice. For example, the degree argument sets the degree of the poly kernel and is ignored by the other kernels; gamma affects the rbf, poly, and sigmoid kernels; and coef0 sets the independent term of the poly and sigmoid kernels.
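To see where these parameters enter, the following sketch checks scikit-learn's kernel functions against their closed forms on random toy data. The parameter values are arbitrary assumptions. `KernelPCA(kernel="poly", degree=3, gamma=0.5, coef0=1.0)` would use the same polynomial kernel.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel

rng = np.random.RandomState(0)  # assumed seed
X = rng.normal(size=(5, 3))

degree, gamma, coef0 = 3, 0.5, 1.0  # illustrative values

# poly: K(x, y) = (gamma * <x, y> + coef0) ** degree
K_poly = polynomial_kernel(X, degree=degree, gamma=gamma, coef0=coef0)
K_manual = (gamma * X @ X.T + coef0) ** degree
print(np.allclose(K_poly, K_manual))

# rbf: K(x, y) = exp(-gamma * ||x - y||^2); degree plays no role here
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
print(np.allclose(rbf_kernel(X, gamma=gamma), np.exp(-gamma * sq_dists)))
```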

The recipe on SVM will cover the rbf kernel function in more detail.

A word of caution: kernel methods are great for creating separability, but they can also cause overfitting if used without care.
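One common guard against overfitting is to choose the kernel and its parameters by cross-validation instead of by eye. This is a minimal sketch of that idea, not the recipe's own method. It assumes `KernelPCA` followed by a `LogisticRegression` classifier in a `Pipeline`, an illustrative grid of kernels and gamma values, and regenerated data with an assumed seed and valid covariance matrices.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

rng = np.random.RandomState(0)  # assumed seed
A = np.vstack((rng.multivariate_normal([1, 1], [[2, .99], [.99, 1]], 50),
               rng.multivariate_normal([5, 5], [[2, .99], [.99, 1]], 50)))
B = rng.multivariate_normal([5, 0], [[.5, -.45], [-.45, .5]], 100)
X = np.vstack((A, B))
y = np.r_[np.zeros(len(A)), np.ones(len(B))]

pipe = Pipeline([
    ("kpca", KernelPCA(n_components=1)),
    ("clf", LogisticRegression()),
])
# Illustrative search space: the cosine kernel vs. rbf with several gammas
param_grid = [
    {"kpca__kernel": ["cosine"]},
    {"kpca__kernel": ["rbf"], "kpca__gamma": [0.01, 0.1, 1.0, 10.0]},
]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Scores are measured on held-out folds, so a kernel setting that only memorizes the training points will not be rewarded.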


Original author: Trent Hauck

