
Kernel PCA for nonlinear dimensionality reduction


Most of the techniques in statistics are linear by nature, so in order to capture nonlinearity, we might need to apply some transformation. PCA is, of course, a linear transformation. In this recipe, we'll look at applying a nonlinear transformation, and then apply PCA for dimensionality reduction.

Getting ready

Life would be so easy if data were always linearly separable, but unfortunately it's not. Kernel PCA can help to circumvent this issue: the data is first run through a kernel function that projects it onto a different space, and then PCA is performed.

To familiarize yourself with the kernel functions, it is a good exercise to think about how to generate data that is separable by the kernel functions available in kernel PCA. Here, we'll do that with the cosine kernel. This recipe has a bit more theory than the previous ones.

How to do it...

The cosine kernel works by comparing the angle between two samples represented in the feature space. It is useful when the magnitude of the vectors perturbs the typical distance measure used to compare samples.

As a reminder, the cosine between two vectors is given by the following:
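$$\cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$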

This means that the cosine between A and B is the dot product of the two vectors normalized by the product of their individual norms. The magnitudes of the vectors A and B have no influence on this calculation.
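As a quick illustration (my own sketch, not part of the original recipe), the cosine can be computed directly with NumPy, and rescaling a vector leaves the value unchanged:

import numpy as np

def cosine(a, b):
    # Dot product normalized by the product of the individual norms
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0])
b = np.array([3.0, 1.0])

print(cosine(a, b))       # cosine of the angle between a and b
print(cosine(10 * a, b))  # identical value: magnitude has no influence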

So, let's generate some data and see how useful it is. First, we'll imagine there are two different underlying processes; we'll call them A and B:

import numpy as np

# Process A is a mixture of two Gaussian blobs centered at (1, 1) and (5, 5).
# (NumPy may warn that these covariance matrices are not symmetric
# positive-semidefinite; they are kept exactly as given in the recipe.)
A1_mean = [1, 1]
A1_cov = [[2, .99], [1, 1]]
A1 = np.random.multivariate_normal(A1_mean, A1_cov, 50)
A2_mean = [5, 5]
A2_cov = [[2, .99], [1, 1]]
A2 = np.random.multivariate_normal(A2_mean, A2_cov, 50)
A = np.vstack((A1, A2))

# Process B is a single blob centered at (5, 0)
B_mean = [5, 0]
B_cov = [[.5, -1], [-.9, .5]]
B = np.random.multivariate_normal(B_mean, B_cov, 100)

Once plotted, it will look like the following:
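The plot appears only as an image in the original post; a minimal matplotlib sketch that would reproduce it (assuming the A and B arrays generated above) is:

import matplotlib.pyplot as plt

# Scatter the two processes in the original two-dimensional feature space
plt.scatter(A[:, 0], A[:, 1], c='red', label='A', alpha=0.5)
plt.scatter(B[:, 0], B[:, 1], c='blue', label='B', alpha=0.5)
plt.legend()
plt.title('Processes A and B')
plt.show()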

By visual inspection, it seems that the two classes come from different processes, but separating them with a single cut might be difficult. So, we'll use kernel PCA with the cosine kernel discussed earlier:

from sklearn import decomposition

# Kernel PCA with the cosine kernel, reducing the stacked dataset to one component
kpca = decomposition.KernelPCA(kernel='cosine', n_components=1)
AB = np.vstack((A, B))
AB_transformed = kpca.fit_transform(AB)

Visualized in one dimension after the kernel PCA, the dataset looks like the following:
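The figure is again an image in the original; one way to sketch the one-dimensional projection (assuming AB_transformed from the block above, where the first 100 rows come from A and the last 100 from B, and matplotlib imported as before) is:

# Spread the single kernel PCA component along the x-axis, one color per process
plt.scatter(AB_transformed[:100, 0], np.zeros(100), c='red', label='A', alpha=0.5)
plt.scatter(AB_transformed[100:, 0], np.zeros(100), c='blue', label='B', alpha=0.5)
plt.legend()
plt.title('Kernel PCA (cosine) projection')
plt.show()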

Contrast this with PCA without a kernel:
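For the contrast figure, a plain linear PCA projection to one component could be produced the same way; this is a sketch rather than the original code:

# Ordinary linear PCA down to one component for comparison
pca = decomposition.PCA(n_components=1)
AB_pca = pca.fit_transform(AB)

plt.scatter(AB_pca[:100, 0], np.zeros(100), c='red', label='A', alpha=0.5)
plt.scatter(AB_pca[100:, 0], np.zeros(100), c='blue', label='B', alpha=0.5)
plt.legend()
plt.title('Linear PCA projection')
plt.show()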

Clearly, the kernel PCA does a much better job.

How it works...

There are several other kernels available besides the cosine kernel. You can even write your own kernel function or supply a precomputed kernel matrix (see the sketch after this list). The available kernels are:

1. poly (polynomial)

2. rbf (radial basis function)

3. sigmoid

4. cosine

5. precomputed
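As an illustration of the precomputed option (a sketch of my own, assuming the AB array and the decomposition import from earlier), the cosine kernel matrix can be built explicitly with scikit-learn's cosine_similarity and handed to KernelPCA:

from sklearn.metrics.pairwise import cosine_similarity

# Build the cosine kernel matrix ourselves and let KernelPCA consume it directly
K = cosine_similarity(AB)
kpca_pre = decomposition.KernelPCA(kernel='precomputed', n_components=1)
AB_precomputed = kpca_pre.fit_transform(K)
# The result should match the kernel='cosine' projection up to sign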

There are also options contingent on the kernel choice. For example, the degree argument specifies the degree of the poly kernel, while gamma affects the rbf, poly, and sigmoid kernels.
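For instance (a sketch with arbitrary parameter values, not from the original recipe), an RBF-kernel PCA with an explicit gamma and a polynomial-kernel PCA with an explicit degree could be configured like this:

# gamma controls the width of the RBF kernel
kpca_rbf = decomposition.KernelPCA(kernel='rbf', gamma=0.1, n_components=1)

# degree controls the order of the polynomial kernel
kpca_poly = decomposition.KernelPCA(kernel='poly', degree=3, gamma=1.0, n_components=1)

AB_rbf = kpca_rbf.fit_transform(AB)
AB_poly = kpca_poly.fit_transform(AB)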

The recipe on SVM will cover the rbf kernel function in more detail.

A word of caution: kernel methods are great for creating separability, but they can also cause overfitting if used without care.
