
Python seek + thread: analyzing very large log files

葫芦
Published 2019-04-17 16:31:56
Included in the column: 葫芦
#!/usr/bin/env python
# -*- coding: utf-8 -*-

'''
./flowdata.log
2017-02-02 15:29:19,390 [views:111:ebitpost] [INFO]- ebitapi: http://218.85.118.8:8000/api/user/query, ebit response: src_ip: 110.86.101.119:63688, content: {"data":{"basic_rate_down":20480,"basic_rate_up":2048,"dial_acct":"fj::059391534153","max_linerate_down":102400,"max_linerate_up":102400},"message":"提速判断成功","result":0}
'''
'''
./ipdb_cn.txt
1.1.1.0  中国 广东 深圳
1.1.2.0  中国 广东 深圳
...
233.233.2.0  中国 新疆 乌鲁木齐
'''
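import io,re,heapq,threading
from collections import Counter
from multiprocessing import Pool
# Maps a province name (third column of ipdb_cn.txt) to the set of
# three-octet IP prefixes listed for it.
dic={}
def readconfig():
    # io.open decodes each line as UTF-8, so province names are unicode strings.
    with io.open('./ipdb_cn.txt',mode='r',encoding='utf-8') as f:
        for i in f:
            nn=i.split()
            if len(nn)<3:
                continue  # skip blank or incomplete lines
            prefix='.'.join(nn[0].split('.')[:-1])
            dic.setdefault(nn[2],set()).add(prefix)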
t=threading.Thread(target=readconfig)
t.start()
# Seek to the end of the log to get its total size in bytes, then close it.
with open('./flowdata.log','r') as tf:
    tf.seek(0,2)
    total=tf.tell()
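def run(start,end):
    # Collect the distinct source IPs from the lines read between byte offsets start and end.
    regex=re.compile(r'_ip:\s?([0-9]+(?:\.[0-9]+){3})')
    s=set()
    with open('./flowdata.log','r') as f:
        f.seek(start,0)
        # Use readline() rather than iterating f: file iteration reads ahead,
        # so tell() would no longer match the line that was just processed.
        for i in iter(f.readline,''):
            m=regex.search(i)
            if m:
                s.add(m.group(1))
            # Stop once the end of this chunk is passed; the line that
            # straddles the boundary has been read in full here.
            if f.tell()>end:
                break
    return s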
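# Scan the log as 12 byte ranges spread over 4 worker processes.
p=Pool(4)
results=[]
for i in range(12):
    # Floor division keeps the offsets integral so they can be passed to seek().
    result=p.apply_async(run,args=(i*total//12,(i+1)*total//12))
    results.append(result)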
p.close()
p.join()
t.join()
filset=set()
for result in results:
    filset|=result.get()
sumfil=len(filset)
filist=list(filset)
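def refn(start,end):
    # For each distinct IP in the slice, emit every province whose prefix set
    # contains the IP's first three octets.
    return [k for i in filist[start:end] for k in dic if i[:i.rindex('.')] in dic[k]]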
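# Resolve the distinct IPs to provinces as 8 slices spread over 4 worker processes.
p=Pool(4)
results=[]
for i in range(8):
    # Floor division keeps the slice bounds integral.
    result=p.apply_async(refn,args=(i*sumfil//8,(i+1)*sumfil//8))
    results.append(result)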
p.close()
p.join()
fn=[]
for result in results:
    fn+=result.get()
# Share of distinct IPs per province, as a percentage, printed from largest to smallest.
fdic=Counter(fn)
ret=[{'n':k,'v':fdic[k]/float(sumfil)*100} for k in fdic]
sortl=heapq.nlargest(len(ret),ret,key=lambda s:s['v'])
for i in sortl:
    print(i['n'] + '   ' + '%.2f' % i['v'] + '%')
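
The script above cuts the log at raw byte offsets, so a chunk can start in the middle of a line. As a possible variation (not the method used above), the cut points can first be moved forward to the next line boundary, so that every line belongs to exactly one chunk. The sketch below is written for Python 3 and opens the file in binary mode; the helper names `line_aligned_ranges` and `scan_range` are hypothetical and introduced only for illustration.

```python
import os
import re

# Same pattern as in the script above, compiled for bytes because the
# file is opened in binary mode here.
IP_RE = re.compile(br'_ip:\s?([0-9]+(?:\.[0-9]+){3})')


def line_aligned_ranges(path, parts):
    """Split a file into at most `parts` contiguous byte ranges that each
    start at the beginning of a line (hypothetical helper)."""
    size = os.path.getsize(path)
    cuts = [0]
    with open(path, 'rb') as f:
        for i in range(1, parts):
            f.seek(i * size // parts)
            f.readline()          # skip the rest of the line we landed in
            pos = f.tell()
            if cuts[-1] < pos < size:
                cuts.append(pos)
    cuts.append(size)
    return list(zip(cuts[:-1], cuts[1:]))


def scan_range(path, start, end):
    """Return the set of source IPs found in lines that begin within
    [start, end) (hypothetical helper)."""
    found = set()
    with open(path, 'rb') as f:
        f.seek(start)
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            m = IP_RE.search(line)
            if m:
                found.add(m.group(1).decode('ascii'))
    return found
```

Each `(start, end)` pair returned by `line_aligned_ranges` could then be handed to `Pool.apply_async` in the same way the script passes its byte offsets to `run`, and the returned sets merged with `|=`.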
Originally published: 2018/09/21
