It supports many ways of reading data; two of them are briefly introduced here.
Reading data from a CSV file:
$ pip install pandas
$ python
>>> import pandas as pd
>>> data = pd.read_csv('test.csv')
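You can try read_csv without a test.csv on disk by reading CSV text from an in-memory buffer. The column names and values below are a made-up sample, not the author's file. The sketch also shows that an empty field is read as NaN.

```python
import io

import pandas as pd

# Made-up CSV content standing in for test.csv (hypothetical sample data).
csv_text = """x,y,shape,color
0.5,0.3,,1
0.3,0.5,24,1
"""

# read_csv accepts any file-like object, not only a file path.
data = pd.read_csv(io.StringIO(csv_text))
print(data.shape)            # (rows, columns)
print(data["shape"].isna())  # the empty field was read as NaN
```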
Reading data from MySQL:
$ pip install sqlalchemy
$ pip install mysqlclient   # Python 3 compatible fork of MySQL-python (provides the MySQLdb driver)
$ python
>>> import pandas as pd
>>> from sqlalchemy import create_engine
>>> engine = create_engine('mysql://user:password@localhost/test')
>>> with engine.connect() as conn, conn.begin():
...     data = pd.read_sql_table('data', conn)
...
>>> data
x y shape color xx
0 0.8 21 2 0.60
1 NaN 0.9 23 2 0.93
2 0.5 0.3 NaN 1 0.30
3 0.3 0.5 24 1 0.10
4 0.0 0.2 25 2 0.00
5 0.3 0.3 25 1 0.10
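If no MySQL server is at hand, you can try the same idea with Python's built-in sqlite3 module. This swaps in a different database and is not the setup above. pandas accepts a plain sqlite3 connection for read_sql_query, but read_sql_table needs an SQLAlchemy connectable. The table contents here are made up.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for MySQL (assumption: no server needed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (x REAL, y REAL, shape INTEGER, color INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [(None, 0.9, 23, 2), (0.5, 0.3, None, 1), (0.3, 0.5, 24, 1)],  # hypothetical rows
)
conn.commit()

# read_sql_query runs any SELECT statement; SQL NULL comes back as a missing value.
data = pd.read_sql_query("SELECT * FROM data", conn)
conn.close()
print(data)
```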
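Clean out data that does not meet the requirements by dropping every row that contains a missing value (NaN). dropna returns a new DataFrame and leaves data itself unchanged: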
>>> data.dropna(how='any')
x y shape color xx
0 0.8 21 2 0.6
3 0.3 0.5 24 1 0.1
4 0.0 0.2 25 2 0.0
5 0.3 0.3 25 1 0.1
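The how and subset parameters control which rows dropna removes. Below is a small sketch on a frame built from a few of the values shown above. To keep the cleaned result, assign it back, e.g. data = data.dropna().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [np.nan, 0.5, 0.3],
                   "shape": [23, np.nan, 24],
                   "color": [2, 1, 1]})

any_nan = df.dropna(how="any")   # drop rows with at least one NaN
all_nan = df.dropna(how="all")   # drop only rows where every value is NaN
by_x = df.dropna(subset=["x"])   # only look for NaN in column x

print(len(df), len(any_nan), len(all_nan), len(by_x))
```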
Get the number of rows and columns:
>>> data.shape  # 6 rows, 5 columns
(6, 5)
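shape is an ordinary (rows, columns) tuple, so you can unpack it directly. len() and size give related counts. Here is a sketch on a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"shape": [21, 23, 24], "color": [2, 2, 1]})

n_rows, n_cols = df.shape  # unpack the (rows, columns) tuple
print(n_rows, n_cols)
print(len(df))             # number of rows only
print(df.size)             # total number of cells: rows * columns
```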
Get the column names:
>>> data.columns
Index([u'x', u'y', u'shape', u'color', u'xx'], dtype='object')
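The u'' prefixes in the output above come from Python 2; Python 3 prints the same Index without them. You can also turn columns into a plain list or use it for membership tests, as in this small sketch:

```python
import pandas as pd

# Empty frame that only carries the column labels used in this article.
df = pd.DataFrame(columns=["x", "y", "shape", "color", "xx"])

names = df.columns.tolist()        # plain Python list of column labels
has_color = "color" in df.columns  # membership test against the column Index
print(names, has_color)
```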
SELECT comparison, similar to: select shape, color from data limit 3;
>>> data[['shape','color']].head(3)
shape color
0 21 2
1 23 2
2 NaN 1
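The same column selection can also be written with .loc (by label) or .iloc (by position). The sketch below, on a frame built from values shown above, checks that the three forms agree:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"shape": [21, 23, np.nan, 24],
                   "color": [2, 2, 1, 1],
                   "xx": [0.60, 0.93, 0.30, 0.10]})

a = df[["shape", "color"]].head(3)         # column list, then first 3 rows
b = df.loc[:, ["shape", "color"]].head(3)  # same selection via .loc
c = df.iloc[:3][["shape", "color"]]        # first 3 rows by position, then columns
print(a.equals(b), a.equals(c))
```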
WHERE comparison, similar to: select * from data where color = 2 limit 3; (whole rows are returned, not just the color column)
>>> data[data['color'] == 2].head(3)
x y shape color xx
0 0.8 21 2 0.60
1 NaN 0.9 23 2 0.93
4 0.0 0.2 25 2 0.00
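To combine several conditions, use & and | with parentheses around each comparison, or pass the condition as a string to DataFrame.query. To return a single column like the SQL select color, pass the column name to .loc. Here is a sketch on a frame built from the color and xx values shown above:

```python
import pandas as pd

df = pd.DataFrame({"color": [2, 2, 1, 1, 2, 1],
                   "xx": [0.60, 0.93, 0.30, 0.10, 0.00, 0.10]})

# select * from df where color = 2 and xx > 0.5;
both = df[(df["color"] == 2) & (df["xx"] > 0.5)]  # parentheses are required with &
same = df.query("color == 2 and xx > 0.5")         # string form of the same filter

# select color from df where color = 2 limit 3;
only_color = df.loc[df["color"] == 2, "color"].head(3)
print(both, same, only_color, sep="\n")
```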
GROUP BY comparison, similar to: select color, count(*) from data group by color;
>>> data.groupby('color').size()
color
1 3
2 3
dtype: int64
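size() counts rows the way count(*) does. Selecting a column and calling count() behaves like count(column) and skips NaN. Below is a sketch using the shape and color values shown above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"shape": [21, 23, np.nan, 24, 25, 25],
                   "color": [2, 2, 1, 1, 2, 1]})

rows_per_color = df.groupby("color").size()              # like count(*)
shapes_per_color = df.groupby("color")["shape"].count()  # like count(shape): NaN skipped
print(rows_per_color.to_dict(), shapes_per_color.to_dict())
```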
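JOIN comparison, similar to: select * from data inner join data2 on data.x = data2.x;
Here data2 is a second DataFrame with the same columns as data. Non-key columns that appear in both get the default suffixes _x and _y. Unlike SQL, pandas also matches NaN keys with each other, which is why the row with x = NaN appears in the result.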
>>> pd.merge(data, data2, on='x')
x y_x shape_x color_x xx_x y_y shape_y color_y xx_y
0 0.8 21 2 0.60 21 2 0.60
1 NaN 0.9 23 2 0.93 0.9 23 2 0.93
2 0.5 0.3 NaN 1 0.30 0.3 NaN 1 0.30
3 0.3 0.5 24 1 0.10 0.5 24 1 0.10
4 0.3 0.5 24 1 0.10 0.3 25 1 0.10
5 0.3 0.3 25 1 0.10 0.5 24 1 0.10
6 0.3 0.3 25 1 0.10 0.3 25 1 0.10
7 0.0 0.2 25 2 0.00 0.2 25 2 0.00
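pd.merge does an inner join by default. The how parameter selects other join types, such as "left" for a SQL LEFT JOIN. The small made-up frames below also show NaN keys matching each other:

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"x": [0.3, np.nan, 0.5], "a": [1, 2, 3]})
right = pd.DataFrame({"x": [0.3, np.nan, 0.9], "b": [10, 20, 30]})

inner = pd.merge(left, right, on="x")                 # how="inner" is the default
left_join = pd.merge(left, right, on="x", how="left") # keep every row of `left`

print(inner)      # 0.3 matches 0.3, and NaN matches NaN
print(left_join)  # x = 0.5 has no partner, so b is NaN there
```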
Plotting with matplotlib:
$ pip install matplotlib
$ python
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
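>>> d = {'hubei': 20, 'guangdong': 10, 'zhejiang': 15}  # demo data as key: value pairs
>>> ts = pd.Series(d)  # build a Series; the dict keys become the index
>>> ts.plot(kind='barh')  # draw a horizontal bar chart
>>> plt.savefig('test.png')  # save the chart as an image file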
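On a machine without a display, such as a server, you can select matplotlib's non-interactive Agg backend before importing pyplot. Using an explicit figure and axes also makes it clear which chart gets saved. This sketch reuses the demo data. Sorting the bars, the axis label and the temporary output directory are illustrative choices, not part of the original.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to files only; no window is opened
import matplotlib.pyplot as plt
import pandas as pd

d = {"hubei": 20, "guangdong": 10, "zhejiang": 15}
ts = pd.Series(d)

fig, ax = plt.subplots()
ts.sort_values().plot(kind="barh", ax=ax)  # draw onto this specific Axes
ax.set_xlabel("value")                     # hypothetical axis label
fig.tight_layout()

out_path = os.path.join(tempfile.mkdtemp(), "test.png")
fig.savefig(out_path)
plt.close(fig)  # free the figure's memory
```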