The MongoDB mentioned last time is actually very easy to install; its download page happened to be unreachable for a few days, which made the whole thing look scarier than it really is.
With a few commands it's done in minutes.
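(The exact commands depend on your platform; the author's original commands aren't shown here. On macOS with Homebrew, one common route is MongoDB's official tap — a sketch, not the author's setup:)

```shell
# add MongoDB's official Homebrew tap, then install and start the server
brew tap mongodb/brew
brew install mongodb-community
brew services start mongodb-community
```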
1
Basic MongoDB operations
Type mongo to enter the shell, then:
> use my_quant
switched to db my_quant
> db
my_quant
> db.my_quant.insert({"code":"000001","date":"2015-01-05","index":true,"close":3350.52,"high":3369.28,"low":3253.88,"open":3258.63,"volume":531352391})
Three commands are all it takes to create a new database (MongoDB creates it lazily, on the first insert).
If you then want to delete a database, the following commands do it:
> show databases
admin 0.000GB
config 0.000GB
daily 0.000GB
local 0.000GB
my_quant 0.000GB
> use daily
switched to db daily
> show collections
daily
> db
daily
> db.dropDatabase()
{ "dropped" : "daily", "ok" : 1 }
> show databases
admin 0.000GB
config 0.000GB
local 0.000GB
my_quant 0.000GB
Of course db.daily.drop() works too, though strictly speaking it drops only the daily collection, whereas db.dropDatabase() removes the whole current database.
2
Building our own database
The previous post covered the basic functions for crawling stock data; this time we store the crawled data in our own database. First, a rough sketch of the quant-analysis project layout:
├── README
└── MyQuant_v1              # quant analysis program directory
    ├── __init__.py
    ├── data                # data processing
    │   ├── __init__.py
    │   └── data_crawler.py # crawls index and stock data
    ├── util                # shared utilities
    │   ├── __init__.py
    │   └── database.py     # database connection
    ├── backtest            # backtesting
    │   ├── __init__.py
    │   └── _backtest_      # planned: a backtest equity-curve plot
    ├── factor              # factors
    │   ├── __init__.py
    │   └── _factor_.py     # no plans to develop
    ├── strategy            # strategies
    │   ├── __init__.py
    │   └── _strategy_      # planned: something simple, mainly for backtesting
    ├── trading             # trading
    │   ├── __init__.py
    │   └── _trading_       # no plans to develop
    └── log                 # log directory
        ├── __init__.py
        ├── backtest.log    # no plans to develop
        └── transactions.log # no plans to develop
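One practical note on this layout: running data/data_crawler.py directly only works if the project root (MyQuant_v1) is on sys.path, so that "from util.database import DB_CONN" resolves. A small sketch of one common fix, placed at the top of the script (this snippet is illustrative, not part of the original code):

```python
import os
import sys

# When data/data_crawler.py is run as a script, add the project root
# (the parent of the data/ directory) to sys.path so that sibling
# packages like util can be imported.
PROJECT_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
if PROJECT_ROOT not in sys.path:
    sys.path.append(PROJECT_ROOT)
```

Alternatively, run the script from the project root with "python -m data.data_crawler", which avoids touching sys.path at all.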
As for why to bother with a framework at all: as mentioned before, once a program grows complex, keeping everything in one file means even a trivial change can break things badly.
First, a small appetizer: connecting to the database takes only two lines of code.
#!/usr/bin/env python3.6
# -*- coding: utf-8 -*-
# @Time : 2019-07-06 20:14
# @Author : Ed Frey
# @File : database.py
# @Software: PyCharm
from pymongo import MongoClient
DB_CONN = MongoClient('mongodb://127.0.0.1:27017')['my_quant']
And now, pay attention, here comes the main course!
#!/usr/bin/env python3.6
# -*- coding: utf-8 -*-
# @Time : 2019-07-01 22:11
# @Author : Ed Frey
# @File : data_crawler.py
# @Software: PyCharm
import tushare as ts
from util.database import DB_CONN
from pymongo import UpdateOne
from datetime import datetime


class DataCrawler:
    def __init__(self):
        self.daily = DB_CONN['daily']
        self.daily_hfq = DB_CONN['daily_hfq']
        self.daily_qfq = DB_CONN['daily_qfq']

    def crawl_index(self, begin_date=None, end_date=None):
        """
        Crawl the indexes' daily trading data to build up our own database.
        :param begin_date: YYYY-MM-DD
        :param end_date: YYYY-MM-DD
        """
        codes = ['000001', '000300', '399001', '399005', '399006']
        if begin_date is None:
            begin_date = '2008-01-01'
        if end_date is None:
            end_date = datetime.now().strftime('%Y-%m-%d')
        for code in codes:
            df_daily = ts.get_k_data(code, index=True, start=begin_date, end=end_date)
            self.save_data(code, df_daily, self.daily, {'index': True})

    def save_data(self, code, df_daily, collection, extra_fields=None):
        """
        Save the data into MongoDB.
        :param code: stock code
        :param df_daily: DataFrame holding the K-line data
        :param collection: the collection to save into
        :param extra_fields: other fields that may be needed some day
        """
        update_requests = []
        for df_index in df_daily.index:
            daily_obj = df_daily.loc[df_index]
            doc = self.daily_obj_2_doc(code, daily_obj)
            if extra_fields is not None:
                doc.update(extra_fields)
            update_requests.append(
                UpdateOne(
                    {'code': doc['code'], 'date': doc['date'], 'index': doc['index']},
                    {'$set': doc},
                    upsert=True)
            )
        if len(update_requests) > 0:
            update_result = collection.bulk_write(update_requests, ordered=False)
            print('Saving index daily, code: %s, inserted: %4d, modified: %4d'
                  % (code, update_result.upserted_count, update_result.modified_count),
                  flush=True)

    def crawl(self, begin_date=None, end_date=None):
        """
        Crawl the stocks' daily trading data to build up our own database,
        covering unadjusted, backward-adjusted (hfq) and forward-adjusted (qfq) prices.
        :param begin_date: YYYY-MM-DD
        :param end_date: YYYY-MM-DD
        """
        df_stock = ts.get_stock_basics()
        codes = list(df_stock.index)
        if begin_date is None:
            begin_date = '2008-01-01'
        if end_date is None:
            end_date = datetime.now().strftime('%Y-%m-%d')
        for code in codes:
            df_daily = ts.get_k_data(code, autype=None, start=begin_date, end=end_date)
            self.save_data(code, df_daily, self.daily, {'index': False})
            df_daily_hfq = ts.get_k_data(code, autype='hfq', start=begin_date, end=end_date)
            self.save_data(code, df_daily_hfq, self.daily_hfq, {'index': False})
            df_daily_qfq = ts.get_k_data(code, autype='qfq', start=begin_date, end=end_date)
            self.save_data(code, df_daily_qfq, self.daily_qfq, {'index': False})

    @staticmethod
    def daily_obj_2_doc(code, daily_obj):
        # convert one DataFrame row (a Series) into a MongoDB document
        doc = dict(daily_obj)
        doc['code'] = code
        return doc


if __name__ == '__main__':
    dc = DataCrawler()
    dc.crawl_index(begin_date='2019-07-01', end_date='2019-07-03')
    dc.crawl(begin_date='2019-07-01', end_date='2019-07-03')
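The heart of save_data is the upsert key: every document is matched on the triple (code, date, index), so re-crawling the same period updates documents in place instead of inserting duplicates. A minimal pure-Python sketch of the (filter, update) pair that gets handed to UpdateOne (the helper name here is illustrative, not from the original code):

```python
def build_upsert_pair(doc):
    """Return the (filter, update) pair save_data builds for one K-line row.

    The filter's three fields act as a logical primary key: crawling the
    same (code, date) twice updates the existing document rather than
    inserting a duplicate.
    """
    flt = {'code': doc['code'], 'date': doc['date'], 'index': doc['index']}
    return flt, {'$set': doc}


row = {'code': '000001', 'date': '2019-07-01', 'index': True, 'close': 3044.9}
flt, update = build_upsert_pair(row)
# flt == {'code': '000001', 'date': '2019-07-01', 'index': True}
```

With upsert=True, MongoDB inserts the document when no match exists, which is exactly why re-running the crawler over an already-covered date range reports "inserted: 0, modified: 0".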
A random sample of the test output:
/Users/Ed_Frey/anaconda2/envs/python36/bin/python /Users/Ed_Frey/Desktop/MyQuant_v1/data/data_crawler.py
Saving index daily, code: 000001, inserted: 0, modified: 0
Saving index daily, code: 000300, inserted: 0, modified: 0
Saving index daily, code: 399001, inserted: 0, modified: 0
Saving index daily, code: 300776, inserted: 3, modified: 0
Saving index daily, code: 600230, inserted: 3, modified: 0
Saving index daily, code: 600230, inserted: 3, modified: 0
<urlopen error timed out>
The first few lines are data that had already been crawled, so nothing is inserted or modified; the later lines are new data being inserted.
Crawling a mere 3 trading days of data still wasn't finished after half an hour.
It also reports a timeout now and then. That's what building a database from free online sources gets you; the exchanges' most detailed data feeds carry VIP fees of several hundred thousand yuan a year. So we'll just keep crawling, slowly.
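Those occasional timeouts can be softened with a simple retry wrapper around each tushare call. A sketch (not part of the original code; "fetch" would be something like "lambda: ts.get_k_data(code, start=begin_date, end=end_date)", and the retry counts are arbitrary):

```python
import time
from urllib.error import URLError


def fetch_with_retry(fetch, retries=3, wait=5):
    """Call fetch(); on a network timeout, back off and try again."""
    for attempt in range(retries):
        try:
            return fetch()
        except URLError:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(wait * (attempt + 1))  # linear back-off
```

Because save_data upserts on (code, date, index), it is also safe to simply rerun the whole crawl after a failure: already-saved days are updated in place, not duplicated.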
As for checking from the terminal whether the database has been updated: just query with the appropriate filters. A sample (the output is long, so it's trimmed here):
> db.daily.find({"code":"000001"})
{ "_id" : ObjectId("5d20a455f9174eb451727881"), "code" : "000001", "date" : "2019-07-01", "index" : true, "close" : 3044.9, "high" : 3045.37, "low" : 3014.69, "open" : 3024.62, "volume" : 250840433 }
{ "_id" : ObjectId("5d20a455f9174eb451727882"), "code" : "000001", "date" : "2019-07-02", "index" : true, "close" : 3043.94, "high" : 3048.48, "low" : 3033.78, "open" : 3042.58, "volume" : 214520624 }
{ "_id" : ObjectId("5d20a455f9174eb451727883"), "code" : "000001", "date" : "2019-07-03", "index" : true, "close" : 3015.26, "high" : 3031.83, "low" : 3006.32, "open" : 3031.83, "volume" : 212296173 }
> db.daily.find({"date":"2019-07-01"})
{ "_id" : ObjectId("5d20a455f9174eb451727881"), "code" : "000001", "date" : "2019-07-01", "index" : true, "close" : 3044.9, "high" : 3045.37, "low" : 3014.69, "open" : 3024.62, "volume" : 250840433 }
{ "_id" : ObjectId("5d20ab5df9174eb451727894"), "code" : "000300", "date" : "2019-07-01", "index" : true, "close" : 3935.81, "high" : 3936.67, "low" : 3886.91, "open" : 3899.33, "volume" : 158370340 }
{ "_id" : ObjectId("5d20ab5df9174eb451727898"), "code" : "399001", "date" : "2019-07-01", "index" : true, "close" : 9530.46, "high" : 9530.46, "low" : 9339.77, "open" : 9384.79, "volume" : 337747139 }
{ "_id" : ObjectId("5d20ab5df9174eb45172789c"), "code" : "399005", "date" : "2019-07-01", "index" : true, "close" : 5908.05, "high" : 5909.84, "low" : 5803.07, "open" : 5828.43, "volume" : 33168558 }
{ "_id" : ObjectId("5d20ab5ef9174eb4517278a0"), "code" : "399006", "date" : "2019-07-01", "index" : true, "close" : 1568.16, "high" : 1568.4, "low" : 1534.56, "open" : 1545.06, "volume" : 20994431 }
{ "_id" : ObjectId("5d20ab5ff9174eb4517278a4"), "code" : "600150", "date" : "2019-07-01", "index" : false, "close" : 24, "high" : 24.98, "low" : 23.65, "open" : 24, "volume" : 369836 }
{ "_id" : ObjectId("5d20ab61f9174eb4517278ad"), "code" : "600153", "date" : "2019-07-01", "index" : false, "close" : 8.94, "high" : 9.08, "low" : 8.91, "open" : 8.98, "volume" : 134979 }
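A couple more shell queries are handy for spot checks (a sketch reusing the same fields as above; the index field is what separates indexes from individual stocks):

```
> db.daily.find({"date": "2019-07-01", "index": false}).count()
> db.daily.find({"code": "000001"}, {"_id": 0, "date": 1, "close": 1}).sort({"date": -1})
```

The first counts how many stocks (not indexes) were saved for a given day; the second projects just the date and close for one code, newest first.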
Alright, that's it for this installment!