GitHub link:
PandasCute/2018-CCF-BDCI-China-Unicom-Research-Institute-top2 (github.com)
First, thanks to 林有夕 for providing this PPT, which I hear is packed with practical content.
Without further ado, here is the complete PPT.
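Some readers have asked how the figure above was generated. Ahem, pay attention, this is the key point!
The code below produces the t-SNE visualization of the w2v clustering result. Practical content, as promised.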
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Load the w2v vectors for the 1_total_fee feature.
df = pd.read_csv('1_total_fee_w2v.csv')
fee_labels = list(df['1_total_fee'].astype('str'))
name = list(df)  # column names; name[1:] are used below as the vector dimensions

def plot_with_labels(low_dim_embs, labels, filename='tsne.png'):
    """Scatter the 2-D points, annotate each with its label, and save the figure."""
    assert low_dim_embs.shape[0] >= len(labels), "More labels than embeddings"
    plt.figure(figsize=(10, 18))
    for i, label in enumerate(labels):
        x, y = low_dim_embs[i, :]
        plt.scatter(x, y)
        # xytext gives the label offset in points; without it matplotlib
        # reuses (x, y) as the offset and the labels drift away from their points.
        plt.annotate(label, xy=(x, y), xytext=(5, 2),
                     textcoords='offset points', ha='right', va='bottom')
    plt.savefig(filename)

# n_iter is called max_iter in recent scikit-learn releases;
# rename it if this call raises a TypeError.
tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
plot_only = min(300, len(df))  # plot at most 300 rows; must stay above the perplexity
low_dim_embs = tsne.fit_transform(df.iloc[:plot_only][name[1:]])
labels = fee_labels[:plot_only]
plot_with_labels(low_dim_embs, labels)
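The script above reads `1_total_fee_w2v.csv`, which is not included in this post. To try the plotting code without the competition data, a stand-in file can be generated. The sketch below is only an assumption about the layout: the first column is `1_total_fee` and every remaining column is one vector dimension, which is how the script appears to use it. The function names, the `vec_*` column names, and the fee values are all made up for illustration, and the vectors are random clusters, not real w2v output.

```python
import csv
import random

LABEL_COLUMN = "1_total_fee"  # label column name used by the plotting script


def make_demo_rows(n_rows=300, dim=16, n_clusters=5, seed=0):
    """Return (header, rows) in the assumed layout: label first, then `dim` floats."""
    rng = random.Random(seed)
    # Hypothetical fee values used as labels; the real data has its own values.
    fees = [str(10 * (k + 1)) for k in range(n_clusters)]
    # One random centre per fee value, so t-SNE has some clusters to separate.
    centres = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in fees]
    # The vec_* column names are invented; the script only uses their position.
    header = [LABEL_COLUMN] + [f"vec_{j}" for j in range(dim)]
    rows = []
    for _ in range(n_rows):
        k = rng.randrange(n_clusters)
        vector = [round(c + rng.gauss(0, 0.5), 4) for c in centres[k]]
        rows.append([fees[k]] + vector)
    return header, rows


def write_demo_csv(path, **kwargs):
    """Write the demo rows to `path` as a CSV file with a header line."""
    header, rows = make_demo_rows(**kwargs)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

Calling `write_demo_csv('demo_w2v.csv')` and pointing `pd.read_csv` at that file should be enough to exercise the plotting code end to end. The resulting picture only shows the synthetic clusters and says nothing about the real feature.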
To build a great product: think of yourself as an outstanding engineer, not as a machine learning expert.