
Python learning: article data analysis

When studying NLP we often work with corpora and run some simple analyses on them. In this article we will get hands-on practice by analyzing an answer Andrew Ng posted on Quora:

The original answer is reproduced below:

Deep Learning is an amazing tool that is helping numerous groups create exciting AI applications. It is helping us build self-driving cars, accurate speech recognition, computers that can understand images, and much more.

Despite all the recent progress, I still see huge untapped opportunities ahead. There're many projects in precision agriculture, consumer finance, medicine, ... where I see a clear opportunity for deep learning to have a big impact, but that none of us have had time to focus on yet. So I'm confident deep learning isn't going to "plateau" anytime soon and that it'll continue to grow rapidly.

Deep Learning has also been overhyped. Because neural networks are very technical and hard to explain, many of us used to explain it by drawing an analogy to the human brain. But we have pretty much no idea how the biological brain works. UC Berkeley's Michael Jordan calls deep learning a "cartoon" of the biological brain--a vastly oversimplified version of something we don't even understand--and I agree. Despite the media hype, we're nowhere near being able to build human-level intelligence. Because we fundamentally don't know how the brain works, attempts to blindly replicate what little we know in a computer also has not resulted in particularly useful AI systems. Instead, the most effective deep learning work today has made its progress by drawing from CS and engineering principles and at most a touch of biological inspiration, rather than try to blindly copy biology.

Concretely, if you hear someone say "The brain does X. My system also does X. Thus we're on a path to building the brain," my advice is to run away!

Many of the ideas used in deep learning have been around for decades. Why is it taking off only now? Two of the key drivers of its progress are: (i) scale of data and (ii) scale of computation. With our society spending more time on websites and mobile devices, for the past two decades we've been rapidly accumulating data. It was only recently that we figured out how to scale computation so as to build deep learning algorithms that can take advantage of this voluminous amount of data.

This has now put us in two positive feedback loops, which is accelerating the progress of deep learning:

First, now that we have huge machines to absorb huge amounts of data, the value of big data is clearer. This creates a greater incentive to acquire more data, which in turn creates a greater incentive to build bigger/faster neural networks.

Second, that we have fast deep learning implementations also speeds up innovation, and accelerates deep learning's research progress. Many people underestimate the impact of computer systems investments in deep learning. When carrying out deep learning research, we start out not knowing what algorithms will and won't work, and our job is to run a lot of experiments and figure it out. If we have an efficient compute infrastructure that lets you run an experiment in a day rather than a week, then your research progress could be almost 7x as fast!

This is why around 2008 my group at Stanford started advocating shifting deep learning to GPUs (this was really controversial at that time; but now everyone does it); and I'm now advocating shifting to HPC (High Performance Computing/Supercomputing) tactics for scaling up deep learning. Machine learning should embrace HPC. These methods will make researchers more efficient and help accelerate the progress of our whole field.

To summarize: Deep learning has already helped AI made tremendous progress. But the best is still to come!

We have two goals in analyzing this article: count how many times each word appears, and then visualize those word frequencies. The steps below work toward both:

We will use the matplotlib module to draw the chart:

1: Tokenization

One nice property of English text is that the words are separated by spaces, but punctuation such as periods and commas still clings to them and gets in the way, so we use the string module to strip it off: string.punctuation holds the punctuation characters and string.whitespace the whitespace characters. As for hist[word] = hist.get(word, 0) + 1, this line is equivalent to the if-else form above; it records each word together with the number of times it appears.
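The original code screenshot did not survive, so here is a minimal counting sketch consistent with the description above (the hist.get line and the string.punctuation/string.whitespace stripping come from the text; the function name process_text and the sample sentence are my own assumptions):

```python
import string

def process_text(text, hist=None):
    """Split text on whitespace, strip punctuation from each word,
    and count occurrences in the hist dictionary."""
    if hist is None:
        hist = {}
    for word in text.split():
        # strip leading/trailing punctuation and whitespace, normalize case
        word = word.strip(string.punctuation + string.whitespace).lower()
        if word:
            # equivalent to the if-else form:
            #   if word in hist: hist[word] += 1
            #   else:            hist[word] = 1
            hist[word] = hist.get(word, 0) + 1
    return hist

hist = process_text('Deep Learning is an amazing tool, and deep learning is overhyped.')
print(hist['deep'], hist['learning'], hist['is'])  # 2 2 2
```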

The result is as follows:

2: Sorting

The previous step gives us each word and its count, but in no particular order, so here we sort the results: a Python list has a sort() method, which we can use to order the entries from largest to smallest.
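A sketch of the sort step under the same assumptions (sort_counts is my name for it): a dict has no order, so we copy it into a list of (count, word) pairs and use list.sort() with reverse=True to go from most to least frequent:

```python
def sort_counts(hist):
    """Return (count, word) pairs ordered from most to least frequent."""
    pairs = [(count, word) for word, count in hist.items()]
    pairs.sort(reverse=True)  # tuples compare by count first, so this sorts descending by frequency
    return pairs

print(sort_counts({'the': 12, 'learning': 7, 'deep': 5}))
# [(12, 'the'), (7, 'learning'), (5, 'deep')]
```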

The result is as follows:

3: Plotting

Everyone is already familiar with plotting in matplotlib; here we draw the chart:

Ideally the chart would also include word labels along the bottom, like this:

That version would be best, but after switching to another computer I found the labels along the bottom looked terrible and cramped, so I removed them; interested readers can add them back themselves.
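A plotting sketch along these lines (plot_counts and the output filename are my assumptions; the xticks line for the bottom labels is left commented out, matching the choice described above):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

def plot_counts(pairs, top=20, filename='word_freq.png'):
    """Save a bar chart of the `top` most frequent words; pairs are (count, word)."""
    counts = [c for c, _ in pairs[:top]]
    words = [w for _, w in pairs[:top]]
    plt.bar(range(len(counts)), counts)
    # Word labels along the bottom get crowded for large `top`;
    # uncomment to add them back, rotated so they overlap less:
    # plt.xticks(range(len(words)), words, rotation=90)
    plt.tight_layout()
    plt.savefig(filename)

plot_counts([(12, 'the'), (7, 'learning'), (5, 'deep')])
```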

The complete code is as follows:
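The code listing itself did not survive the page conversion, so here is a complete runnable sketch assembled from the three steps described above (all names and the inline sample text are my own assumptions; in practice you would read the full answer from a file instead):

```python
import string

import matplotlib
matplotlib.use('Agg')  # headless backend so the script also runs without a display
import matplotlib.pyplot as plt

def process_text(text):
    """Count word frequencies, stripping punctuation and whitespace."""
    hist = {}
    for word in text.split():
        word = word.strip(string.punctuation + string.whitespace).lower()
        if word:
            hist[word] = hist.get(word, 0) + 1
    return hist

def sort_counts(hist):
    """Return (count, word) pairs ordered from most to least frequent."""
    pairs = [(count, word) for word, count in hist.items()]
    pairs.sort(reverse=True)
    return pairs

def plot_counts(pairs, top=20, filename='word_freq.png'):
    """Save a bar chart of the `top` most frequent words."""
    counts = [c for c, _ in pairs[:top]]
    words = [w for _, w in pairs[:top]]
    plt.bar(range(len(counts)), counts)
    plt.tight_layout()
    plt.savefig(filename)

sample = ('Deep Learning is an amazing tool that is helping numerous '
          'groups create exciting AI applications.')
pairs = sort_counts(process_text(sample))
print(pairs[:3])
plot_counts(pairs)
```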

This article is part of the Tencent Cloud self-media sharing program.
