
Viewing data distribution with sklearn

MachineLP
Published 2018-01-09 11:40:35
Collected in the column 小鹏的专栏
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedShuffleSplit


train_data = pd.read_csv("train.csv")
LABELS = train_data['species']

# Pop the 'id' column out of train_data.
ID = train_data.pop('id')
# print(train_data[0:1])

# Pop the 'species' column out of train_data.
y = train_data.pop('species')
# Encode the species names as integer class labels.
y = LabelEncoder().fit(y).transform(y)
print(y)

# standardize the data by setting the mean to 0 and std to 1
standardize = True
X = StandardScaler().fit(train_data).transform(train_data) if standardize else train_data.values
print(X[0:1])

from sklearn.decomposition import PCA, IncrementalPCA
n_components = 2
ipca = IncrementalPCA(n_components=n_components, batch_size=10)
X_ipca = ipca.fit_transform(X)

pca = PCA(n_components=n_components)
X_pca = pca.fit_transform(X)

colors = ['navy', 'turquoise', 'darkorange', 'blue', 'purple', 'green',
          'yellow','red','pink', 'palegoldenrod','navy', 'turquoise', 'darkorange', 'blue', 'purple', 'green',
          'yellow','red','pink', 'palegoldenrod','navy', 'turquoise', 'darkorange', 'blue', 'purple', 'green',
          'yellow','red','pink', 'palegoldenrod',]

for X_transformed, title in [(X_ipca, "Incremental PCA"), (X_pca, "PCA")]:
    plt.figure(figsize=(8, 8))
    # Plot the first 25 classes; np.unique(LABELS) gives the sorted class names,
    # which is the same order LabelEncoder uses for its integer codes.
    for color, i, target_name in zip(colors, range(25), np.unique(LABELS)):
        plt.scatter(X_transformed[y == i, 0], X_transformed[y == i, 1],
                    color=color, lw=2, label=target_name)

    if "Incremental" in title:
        # Component signs may differ between the two estimators, so compare absolute values.
        err = np.abs(np.abs(X_pca) - np.abs(X_ipca)).mean()
        plt.title(title + " of train.csv\nMean absolute unsigned error "
                  "%.6f" % err)
    else:
        plt.title(title + " of train.csv")
    plt.legend(loc="best", shadow=False, scatterpoints=1)
    plt.axis([-10, 10, -10, 10])

plt.show()
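The script encodes species names as integers and later needs a class name for each integer when labelling the scatter plot. As a minimal sketch, using made-up species names rather than values from train.csv, `LabelEncoder` keeps the sorted unique labels in `classes_`, so integer code `i` corresponds to `classes_[i]`:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels, for illustration only (not taken from train.csv).
species = np.array(["Quercus", "Acer", "Quercus", "Betula", "Acer"])

encoder = LabelEncoder().fit(species)
codes = encoder.transform(species)

# classes_ holds the sorted unique labels; code i maps to classes_[i].
class_names = encoder.classes_

# inverse_transform maps the integer codes back to the original names.
decoded = encoder.inverse_transform(codes)

print(codes)
print(class_names)
```

Keeping the fitted encoder around, instead of discarding it after `transform`, is one way to get legend labels that match the integer classes.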
The full script again, with the `plt.legend` call commented out:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedShuffleSplit


train_data = pd.read_csv("train.csv")
LABELS = train_data['species']

# Pop the 'id' column out of train_data.
ID = train_data.pop('id')
# print(train_data[0:1])

# Pop the 'species' column out of train_data.
y = train_data.pop('species')
# Encode the species names as integer class labels.
y = LabelEncoder().fit(y).transform(y)
print(y)

# standardize the data by setting the mean to 0 and std to 1
standardize = True
X = StandardScaler().fit(train_data).transform(train_data) if standardize else train_data.values
print(X[0:1])

from sklearn.decomposition import PCA, IncrementalPCA
n_components = 2
ipca = IncrementalPCA(n_components=n_components, batch_size=10)
X_ipca = ipca.fit_transform(X)

pca = PCA(n_components=n_components)
X_pca = pca.fit_transform(X)

colors = ['navy', 'turquoise', 'darkorange', 'blue', 'purple', 'green',
          'yellow','red','pink', 'palegoldenrod','navy', 'turquoise', 'darkorange', 'blue', 'purple', 'green',
          'yellow','red','pink', 'palegoldenrod','navy', 'turquoise', 'darkorange', 'blue', 'purple', 'green',
          'yellow','red','pink', 'palegoldenrod',]

for X_transformed, title in [(X_ipca, "Incremental PCA"), (X_pca, "PCA")]:
    plt.figure(figsize=(8, 8))
    # Plot the first 25 classes; np.unique(LABELS) gives the sorted class names,
    # which is the same order LabelEncoder uses for its integer codes.
    for color, i, target_name in zip(colors, range(25), np.unique(LABELS)):
        plt.scatter(X_transformed[y == i, 0], X_transformed[y == i, 1],
                    color=color, lw=2, label=target_name)

    if "Incremental" in title:
        # Component signs may differ between the two estimators, so compare absolute values.
        err = np.abs(np.abs(X_pca) - np.abs(X_ipca)).mean()
        plt.title(title + " of train.csv\nMean absolute unsigned error "
                  "%.6f" % err)
    else:
        plt.title(title + " of train.csv")
    #plt.legend(loc="best", shadow=False, scatterpoints=1)
    plt.axis([-10, 10, -10, 10])

plt.show()
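The scripts above compare `PCA` with `IncrementalPCA` through a "mean absolute unsigned error". The following is a rough, self-contained sketch of that comparison on synthetic data; the random matrix is only a stand-in for train.csv, and the sizes and `batch_size` are arbitrary assumptions. Absolute values are taken before subtracting because the sign of each principal component may differ between the two estimators. The sketch also prints `explained_variance_ratio_`, which indicates how much of the variance the 2D projection retains.

```python
import numpy as np
from sklearn.decomposition import PCA, IncrementalPCA
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)

# Synthetic stand-in for train.csv: 200 samples, 10 correlated features.
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(200, 10))
X = StandardScaler().fit_transform(X)

# Full PCA sees the whole matrix at once.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# IncrementalPCA processes the data in mini-batches of 50 rows.
ipca = IncrementalPCA(n_components=2, batch_size=50)
X_ipca = ipca.fit_transform(X)

# Compare absolute values so that a sign flip in a component is not counted as error.
err = np.abs(np.abs(X_pca) - np.abs(X_ipca)).mean()

print("mean absolute unsigned error: %.6f" % err)
print("explained variance ratio (PCA):", pca.explained_variance_ratio_)
```

A low explained variance ratio would suggest that the 2D scatter plot hides much of the structure in the data, so it is worth checking before reading too much into the plot.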
Originally published 2017-02-10 on the author's personal site/blog.
