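Learning Python: Analyzing Article Data

When studying NLP we often work with text corpora and run some analysis on them. In this article we get hands-on by analyzing one of Andrew Ng's answers on Quora.

The original text is reproduced below: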

Deep Learning is an amazing tool that is helping numerous groups create exciting AI applications. It is helping us build self-driving cars, accurate speech recognition, computers that can understand images, and much more.

Despite all the recent progress, I still see huge untapped opportunities ahead. There're many projects in precision agriculture, consumer finance, medicine, ... where I see a clear opportunity for deep learning to have a big impact, but that none of us have had time to focus on yet. So I'm confident deep learning isn't going to "plateau" anytime soon and that it'll continue to grow rapidly.

Deep Learning has also been overhyped. Because neural networks are very technical and hard to explain, many of us used to explain it by drawing an analogy to the human brain. But we have pretty much no idea how the biological brain works. UC Berkeley's Michael Jordan calls deep learning a "cartoon" of the biological brain--a vastly oversimplified version of something we don't even understand--and I agree. Despite the media hype, we're nowhere near being able to build human-level intelligence. Because we fundamentally don't know how the brain works, attempts to blindly replicate what little we know in a computer also has not resulted in particularly useful AI systems. Instead, the most effective deep learning work today has made its progress by drawing from CS and engineering principles and at most a touch of biological inspiration, rather than try to blindly copy biology.

Concretely, if you hear someone say "The brain does X. My system also does X. Thus we're on a path to building the brain," my advice is to run away!

Many of the ideas used in deep learning have been around for decades. Why is it taking off only now? Two of the key drivers of its progress are: (i) scale of data and (ii) scale of computation. With our society spending more time on websites and mobile devices, for the past two decades we've been rapidly accumulating data. It was only recently that we figured out how to scale computation so as to build deep learning algorithms that can take advantage of this voluminous amount of data.

This has now put us in two positive feedback loops, which is accelerating the progress of deep learning:

First, now that we have huge machines to absorb huge amounts of data, the value of big data is clearer. This creates a greater incentive to acquire more data, which in turn creates a greater incentive to build bigger/faster neural networks.

Second, that we have fast deep learning implementations also speeds up innovation, and accelerates deep learning's research progress. Many people underestimate the impact of computer systems investments in deep learning. When carrying out deep learning research, we start out not knowing what algorithms will and won't work, and our job is to run a lot of experiments and figure it out. If we have an efficient compute infrastructure that lets you run an experiment in a day rather than a week, then your research progress could be almost 7x as fast!

This is why around 2008 my group at Stanford started advocating shifting deep learning to GPUs (this was really controversial at that time; but now everyone does it); and I'm now advocating shifting to HPC (High Performance Computing/Supercomputing) tactics for scaling up deep learning. Machine learning should embrace HPC. These methods will make researchers more efficient and help accelerate the progress of our whole field.

To summarize: Deep learning has already helped AI made tremendous progress. But the best is still to come!

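We have two goals in analyzing this article: to work out its word frequencies, and to count how many times each word occurs. The steps below work toward both.

We will use the matplotlib module to draw the plot.

1: Tokenization

One nice property of English text is that words are separated by spaces. However, words are often followed or surrounded by punctuation such as periods and commas, which gets in the way. We therefore use the string module to strip punctuation and whitespace from each word: string.punctuation supplies the punctuation characters to strip, and string.whitespace supplies the whitespace characters. As for hist[word] = hist.get(word, 0) + 1, this statement is a compact equivalent of the if-else version (if the word is already in the dictionary, add 1 to its count; otherwise set its count to 1). The dictionary records each word together with the number of times it occurs.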
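
The article's own code for this step is not shown here, so the following is only a minimal sketch of the approach described above. The function name word_histogram and the choice to lowercase every word are assumptions made for illustration, not part of the original.

```python
import string


def word_histogram(text):
    """Count how often each word occurs in `text` (hypothetical helper)."""
    # Characters to strip from both ends of every token.
    strippables = string.punctuation + string.whitespace
    hist = {}
    for token in text.split():
        # Lowercasing is an assumption, so "Deep" and "deep" count together.
        word = token.strip(strippables).lower()
        if not word:
            continue  # the token was nothing but punctuation
        # Compact form of:
        #   if word in hist: hist[word] += 1
        #   else:            hist[word] = 1
        hist[word] = hist.get(word, 0) + 1
    return hist


sample = 'Deep learning is a tool. Deep Learning has also been "overhyped".'
hist = word_histogram(sample)
print(hist["deep"], hist["learning"], hist["overhyped"])  # prints: 2 2 1
```

Note that strip() only removes characters at the ends of a token, so contractions such as "isn't" keep their apostrophe.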

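The result is as follows:

2: Sorting

After the previous step we have each word and its count, but the entries are in no particular order, so here we sort them using a list. Python lists have a sort() method; it sorts in ascending order by default, so pass reverse=True to order the counts from largest to smallest.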
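
A sketch of the sorting step, assuming the counts live in a dictionary that maps each word to its number of occurrences. The name most_common and the sample data are hypothetical.

```python
def most_common(hist):
    """Return (count, word) pairs sorted from most to least frequent."""
    # Put the count first so the tuples sort by count.
    pairs = [(count, word) for word, count in hist.items()]
    # list.sort() is ascending by default; reverse=True gives largest first.
    pairs.sort(reverse=True)
    return pairs


counts = {"deep": 5, "learning": 4, "the": 9}  # made-up sample data
for count, word in most_common(counts):
    print(word, count)
# prints:
# the 9
# deep 5
# learning 4
```

Words with equal counts end up in reverse alphabetical order, because the tuples are compared element by element.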

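The result is as follows:

3: Plotting

Plotting with matplotlib should be familiar to everyone by now, so we simply draw the result.

Ideally the x-axis would carry a label for each word, as in the example below:

That would be the best presentation, but after switching to a different computer I found the labels along the bottom looked ugly and hopelessly crowded, so I removed them. If you are interested, you can add them back yourself.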
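
The plotting code is not shown here, so the sketch below is one possible way to draw the counts with matplotlib, not necessarily the author's. It assumes a bar chart, and the function name plot_counts, the show_labels flag, and the sample data are all hypothetical. Rotating the tick labels is one common way to reduce the crowding mentioned above.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_counts(pairs, show_labels=False):
    """Draw a bar chart from (count, word) pairs and return the figure."""
    counts = [count for count, _ in pairs]
    words = [word for _, word in pairs]
    positions = list(range(len(pairs)))

    fig, ax = plt.subplots(figsize=(10, 4))
    ax.bar(positions, counts)
    ax.set_ylabel("occurrences")
    if show_labels:
        # One tick per word; vertical text keeps long label lists readable.
        ax.set_xticks(positions)
        ax.set_xticklabels(words, rotation=90, fontsize=8)
    else:
        ax.set_xticks([])  # hide the x-axis labels entirely
    fig.tight_layout()
    return fig


pairs = [(9, "the"), (5, "deep"), (4, "learning")]  # made-up sample data
fig = plot_counts(pairs, show_labels=True)
# fig.savefig("word_counts.png")  # uncomment to write the image to disk
```

With hundreds of distinct words the labels will still be crowded; plotting only the first few dozen pairs is a simple way around that.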

The complete code is as follows:
