A Simple Tag Word Cloud in Python

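wordcloud is a Python library for generating word clouds. It is easy to use and powerful, and I personally recommend it. GitHub: https://github.com/amueller/word_cloud Official site: https://amueller.github.io/word_cloud/ This article took me an hour and a half to write and takes about fifteen minutes to read. After reading it you should be able to get started with wordcloud.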

Chinese word clouds and other topics will be covered in my next article.

Quickly generating a word cloud

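from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Read the whole text; the with-block closes the file afterwards
with open('txt/AliceEN.txt', 'r') as fp:
    text = fp.read()

# width, height and margin set the image properties.
# background_color sets the background color; the default is black.
# generate() tokenizes the whole text automatically, but it handles Chinese
# poorly; Chinese tokenization is covered in my next article.
wordcloud = WordCloud(background_color="white", width=1000, height=860, margin=2).generate(text)

# You can choose the font with the font_path parameter:
# wordcloud = WordCloud(font_path=r'D:\Fonts\simkai.ttf').generate(text)

plt.imshow(wordcloud)
plt.axis("off")
plt.show()

# Save the image. Note that in the example of the third section the
# saved image takes its size from the mask instead.
wordcloud.to_file('test.png')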
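
To make the automatic tokenization step less opaque, here is a rough sketch of the idea: split the text into words, count them, and hand the counts to the library. The helper names `count_words` and `cloud_from_counts` are hypothetical and not part of wordcloud. The library's real tokenizer does more (stopwords, collocations, plural normalization), so treat this as an approximation only. `generate_from_frequencies` is the library method that accepts a word-to-count mapping.

```python
import re
from collections import Counter


def count_words(text, min_length=2):
    """Lower-case the text, split it into words, and count them.

    A simplified stand-in for the tokenization that generate() performs.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if len(w) >= min_length)


def cloud_from_counts(counts, **wordcloud_kwargs):
    """Build a WordCloud from precomputed counts (requires wordcloud)."""
    # Imported lazily so count_words works without wordcloud installed
    from wordcloud import WordCloud
    return WordCloud(**wordcloud_kwargs).generate_from_frequencies(counts)


sample = "Alice was beginning to get very tired. Alice had nothing to do."
counts = count_words(sample)
print(counts.most_common(3))
```

Calling `cloud_from_counts(counts, background_color="white")` should then give an object that can be shown or saved exactly like the one above. Counting yourself is also a natural place to plug in a different tokenizer later.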


Custom font colors

This code comes mainly from the wordcloud GitHub repository, where you can download the example.

#!/usr/bin/env python
"""
Colored by Group Example
========================

Generating a word cloud that assigns colors to words based on
a predefined mapping from colors to words
"""

from wordcloud import (WordCloud, get_single_color_func)
import matplotlib.pyplot as plt

class SimpleGroupedColorFunc(object):
    """Create a color function object which assigns EXACT colors
       to certain words based on the color to words mapping

       Parameters
       ----------
       color_to_words : dict(str -> list(str))
         A dictionary that maps a color to the list of words.

       default_color : str
         Color that will be assigned to a word that's not a member
         of any value from color_to_words.
    """

    def __init__(self, color_to_words, default_color):
        self.word_to_color = {word: color
                              for (color, words) in color_to_words.items()
                              for word in words}

        self.default_color = default_color

    def __call__(self, word, **kwargs):
        return self.word_to_color.get(word, self.default_color)

class GroupedColorFunc(object):
    """Create a color function object which assigns DIFFERENT SHADES of
       specified colors to certain words based on the color to words mapping.

       Uses wordcloud.get_single_color_func

       Parameters
       ----------
       color_to_words : dict(str -> list(str))
         A dictionary that maps a color to the list of words.

       default_color : str
         Color that will be assigned to a word that's not a member
         of any value from color_to_words.
    """

    def __init__(self, color_to_words, default_color):
        self.color_func_to_words = [
            (get_single_color_func(color), set(words))
            for (color, words) in color_to_words.items()]

        self.default_color_func = get_single_color_func(default_color)

    def get_color_func(self, word):
        """Returns a single_color_func associated with the word"""
        try:
            color_func = next(
                color_func for (color_func, words) in self.color_func_to_words
                if word in words)
        except StopIteration:
            color_func = self.default_color_func

        return color_func

    def __call__(self, word, **kwargs):
        return self.get_color_func(word)(word, **kwargs)

text = """The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!"""

# Since the text is small collocations are turned off and text is lower-cased
wc = WordCloud(collocations=False).generate(text.lower())

# Define the color for each group of words
color_to_words = {
    # words below will be colored with a green single color function
    '#00ff00': ['beautiful', 'explicit', 'simple', 'sparse',
                'readability', 'rules', 'practicality',
                'explicitly', 'one', 'now', 'easy', 'obvious', 'better'],
    # will be colored with a red single color function
    'red': ['ugly', 'implicit', 'complex', 'complicated', 'nested',
            'dense', 'special', 'errors', 'silently', 'ambiguity',
            'guess', 'hard']
}

# Words that are not in any of the color_to_words values
# will be colored with a grey single color function
default_color = 'grey'

# Create a color function with single tone
# grouped_color_func = SimpleGroupedColorFunc(color_to_words, default_color)

# Create a color function with multiple tones
grouped_color_func = GroupedColorFunc(color_to_words, default_color)

# Apply our color function
# You can also derive color_func from an image; see the next section for details
wc.recolor(color_func=grouped_color_func)

# Plot
plt.figure()
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
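
Both classes above work because recolor only needs a callable that receives the word plus layout details as keyword arguments and returns a color string. As a minimal sketch of that protocol, the hypothetical function below maps larger words to warmer hues. The name `size_to_hue_color_func` and the 10 to 100 font-size range are my own assumptions. It returns an `hsl(...)` string, a format that Pillow's color parser accepts.

```python
def size_to_hue_color_func(word, font_size=None, position=None,
                           orientation=None, random_state=None, **kwargs):
    """Color words by font size: small words blue, large words red."""
    min_size, max_size = 10, 100      # assumed font-size range
    size = min(max(font_size or min_size, min_size), max_size)
    # Map the size linearly onto hue 240 (blue) down to 0 (red)
    fraction = (size - min_size) / (max_size - min_size)
    hue = int(round(240 * (1 - fraction)))
    return "hsl(%d, 80%%, 50%%)" % hue


print(size_to_hue_color_func("python", font_size=100))
print(size_to_hue_color_func("zen", font_size=10))
```

It can be applied the same way as the grouped functions, for example `wc.recolor(color_func=size_to_hue_color_func)`.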


Generating a word cloud from a background image and setting stopwords

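This code also comes mainly from the wordcloud GitHub repository, where you can download the example together with the original image and the result images.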

#!/usr/bin/env python
"""
Image-colored wordcloud
=======================

You can color a word-cloud by using an image-based coloring strategy
implemented in ImageColorGenerator. It uses the average color of the region
occupied by the word in a source image. You can combine this with masking -
pure-white will be interpreted as 'don't occupy' by the WordCloud object when
passed as mask.
If you want white as a legal color, you can just pass a different image to
"mask", but make sure the image shapes line up.
"""

from os import path
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

d = path.dirname(__file__)

# Read the whole text.
text = open(path.join(d, 'alice.txt')).read()

# read the mask / color image taken from
# http://jirkavinse.deviantart.com/art/quot-Real-Life-quot-Alice-282261010
alice_coloring = np.array(Image.open(path.join(d, "alice_color.png")))

# Set the stopwords
stopwords = set(STOPWORDS)
stopwords.add("said")

# The mask parameter sets the shape of the word cloud
wc = WordCloud(background_color="white", max_words=2000, mask=alice_coloring,
               stopwords=stopwords, max_font_size=40, random_state=42)
# generate word cloud
wc.generate(text)

# create coloring from image
image_colors = ImageColorGenerator(alice_coloring)

# show
# With only mask set, you get a word cloud in the shape of the image
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.figure()
# recolor wordcloud and show
# we could also give color_func=image_colors directly in the constructor
# either way, the word colors follow the color layout of the given image
plt.imshow(wc.recolor(color_func=image_colors), interpolation="bilinear")
plt.axis("off")
plt.figure()
plt.imshow(alice_coloring, cmap=plt.cm.gray, interpolation="bilinear")
plt.axis("off")
plt.show()

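The results are shown below:

The original Alice image

Word cloud generated in the shape of the image

Word cloud with font colors taken from the image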
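
The mask does not have to come from an image file. As the docstring of the example notes, pure-white pixels mean "don't occupy", so any grid of 0 and 255 values can act as a shape. The sketch below builds a circular mask as plain nested lists. `circle_mask_rows` and its parameters are hypothetical names of mine, and the default sizes are arbitrary.

```python
def circle_mask_rows(size=300, radius=130):
    """Return a size x size grid: 0 inside the circle, 255 (white) outside.

    White cells are the ones a WordCloud mask treats as off-limits.
    """
    center = size // 2
    rows = []
    for y in range(size):
        row = []
        for x in range(size):
            inside = (x - center) ** 2 + (y - center) ** 2 <= radius ** 2
            row.append(0 if inside else 255)
        rows.append(row)
    return rows


rows = circle_mask_rows(size=300, radius=130)
print(len(rows), len(rows[0]), rows[150][150], rows[0][0])
```

To use it, convert the rows with `np.array(rows, dtype=np.uint8)` and pass the result as `mask=` to `WordCloud`, ideally with `background_color="white"` so the masked area blends in.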

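# A Chinese example: build a word cloud from a list of signature strings
import re

import jieba
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from wordcloud import WordCloud, ImageColorGenerator

# Matches leftover emoji codes and markup or decorative characters
JUNK_PATTERN = re.compile(r"1f\d+\w*|[<>/=【】『』♂ω]")

def friends_signature():
    # get_data is not defined in this article; it is assumed to return
    # a list of signature strings
    signature = get_data("Signature")
    wash_signature = []
    for item in signature:
        # Skip signatures that contain emoji markup
        if "emoji" in item:
            continue
        wash_signature.append(JUNK_PATTERN.sub("", item))

    words = "".join(wash_signature)

    print(wash_signature)

    # Segment in full mode, then join with spaces so WordCloud can split the words
    wordlist = jieba.cut(words, cut_all=True)
    word_space_split = " ".join(wordlist)

    # The image sets the shape; with scale=2 the output is twice its size
    coloring = np.array(Image.open("img/num.jpg"))

    # font_path is required for Chinese text: it must point to a font
    # that contains Chinese glyphs, e.g. simkai.ttf
    my_wordcloud = WordCloud(background_color="white", max_words=800,
                             mask=coloring, max_font_size=120, random_state=30,
                             scale=2, font_path="fonts/STKAITI.TTF").generate(word_space_split)

    # recolor() changes the cloud in place, so one imshow call is enough
    image_colors = ImageColorGenerator(coloring)
    plt.imshow(my_wordcloud.recolor(color_func=image_colors))
    plt.axis("off")
    plt.show()

    # Save the image
    my_wordcloud.to_file('Signature/signature.png')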
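
The cleaning step in `friends_signature` discards any signature that mentions "emoji". A gentler alternative, sketched below, strips only the emoji markup and keeps the rest of the text. It assumes the emoji arrive as `<span class="emoji ...">` tags, which is my guess from the regular expression in the function rather than something the article states. `clean_signature` is a hypothetical helper name.

```python
import re

# Assumed markup: <span class="emoji emoji1f60a"></span>
EMOJI_SPAN = re.compile(r'<span class="emoji[^"]*">\s*</span>')
# Decorative characters removed by the original regular expression
JUNK_CHARS = re.compile(r"[<>/=【】『』♂ω]")


def clean_signature(text):
    """Remove emoji span tags and decorative characters from one signature."""
    text = EMOJI_SPAN.sub("", text)
    text = JUNK_CHARS.sub("", text)
    return text.strip()


raw = '人生苦短<span class="emoji emoji1f60a"></span>【我用Python】'
print(clean_signature(raw))
```

Inside the loop, this would replace the `continue` branch: every signature is cleaned and appended, and signatures that end up empty can be skipped.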
