Q: Modifying Python code for reuse (Xlsx to Txt file and WordCloud)

Stack Overflow user

Asked on 2021-02-22 06:24:21

Answers 1, Views 68, Followers 0, Votes 0

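For the past month I have been teaching myself Python (zero coding experience; Python is my first programming language), and I have finally written my first usable work-related code. I am trying to improve it for reuse: it converts comment-based xlsx data into a txt file of strings and finally into a word cloud. The working code is below.

How the code works: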

Step 1. The xlsx file is a 4-column Excel worksheet.
Step 2. Python extracts all of column B.
Step 3. The values are converted to strings (str), the spaces are removed, and the result is written to a txt file.
Step 4. WordCloud removes unwanted words using a stopwords list.
Step 5. The word cloud is generated in the configured format.
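Steps 2 and 3 boil down to: take the value in column B of every data row, turn it into a string, remove the spaces, and keep one comment per line. A minimal, library-free sketch of that transformation, assuming the rows have already been read as tuples (openpyxl's `ws.iter_rows(min_row=2, values_only=True)` yields rows in that shape); the function name `column_b_to_lines` and the sample rows are hypothetical:

```python
def column_b_to_lines(rows):
    """Return one space-free string per row, taken from column B (index 1).

    `rows` is any iterable of row tuples, e.g. (id, comment, rating, date);
    rows whose column B is empty (None) are skipped.
    """
    lines = []
    for row in rows:
        if len(row) < 2 or row[1] is None:
            continue  # nothing in column B for this row
        lines.append(str(row[1]).replace(' ', ''))
    return lines


# Made-up sample rows standing in for rows 2 onward of the worksheet.
rows = [
    (1, '很好 用', 5, '2021-02-01'),
    (2, None, 4, '2021-02-02'),
    (3, 'good value', 5, '2021-02-03'),
]
print(column_b_to_lines(rows))  # ['很好用', 'goodvalue']
```

Joining the result with `'\n'.join(...)` and writing it to a UTF-8 file gives a comparable one-comment-per-line txt file, without the intermediate Sheet2 and without the alignment padding that `DataFrame.to_string` inserts.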

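I would like to improve it in the following ways:

Change the file directory in one simple step instead of copying and pasting the directory name several times (so I can skip editing every file path by hand).
Create the txt file name from the xlsx file name (so I don't have to type it manually every time).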
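Both requests can be handled by keeping the folder in one variable and deriving every other path from the xlsx file name. A minimal sketch using the standard-library `pathlib`, assuming the txt and png files should sit next to the xlsx file and share its stem; `BASE_DIR` and `output_paths` are hypothetical names:

```python
from pathlib import Path

# Change this one line when working on a different folder (hypothetical constant).
BASE_DIR = Path(r"C:\Users\shakesmilk\Desktop\staub")


def output_paths(xlsx_path):
    """Derive the txt and png paths from the xlsx path (same folder, same stem)."""
    xlsx_path = Path(xlsx_path)
    return {
        'txt': xlsx_path.with_suffix('.txt'),
        'png': xlsx_path.with_suffix('.png'),
    }


paths = output_paths(BASE_DIR / "staub天猫商品评论.xlsx")
print(paths['txt'].name)  # staub天猫商品评论.txt
print(paths['png'].name)  # staub天猫商品评论.png
```

To process every workbook in the folder, the same helper can be called inside a loop such as `for xlsx in BASE_DIR.glob('*.xlsx'):`.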

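If anyone has a better, more refined approach, please let me know. I am very new to this, so if you need any other information to clarify anything, just tell me.

Any help would be greatly appreciated, thanks.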

import openpyxl as xl
import wordcloud
from wordcloud import WordCloud,STOPWORDS
from matplotlib.pyplot import imread
import jieba
import pandas as pd

# opening the source excel file ( repeated steps needed for every different document)
filename = "C:\\Users\\shakesmilk\\Desktop\\staub\\staub天猫商品评论.xlsx"
wb1 = xl.load_workbook(filename)
ws1 = wb1.worksheets[0]


# opening the destination excel file ( repeated steps needed for every different document)
filename1 = "C:\\Users\\shakesmilk\\Desktop\\staub\\staub天猫商品评论.xlsx"
wb2 = xl.load_workbook(filename1)
wb2.create_sheet('Sheet2')
ws2 = wb2.worksheets[1]

# calculate the total number of rows in the source excel file
mr = ws1.max_row

# copy column B of the source sheet to column B of the destination sheet,
# starting at row 2 (row 1, presumably the header, is skipped); one loop is enough
for i in range(2, mr + 1):
    ws2.cell(row=i, column=2).value = ws1.cell(row=i, column=2).value

# delete the empty first column (A) so the copied data moves into column A
ws2.delete_cols(1)
# saving the destination excel file
wb2.save(filename1)

# converting sheet 2 with pandas to a txt file
df = pd.read_excel(filename, sheet_name=1)

with open("C:\\Users\\shakesmilk\\Desktop\\staub\\file.txt", mode='w', encoding='utf-8') as outfile:
    df.to_string(outfile, header=False, index=False)

# open, read & remove spaces from the txt file
commentfiletxt = "C:\\Users\\shakesmilk\\Desktop\\staub\\file.txt"

with open(commentfiletxt, 'r', encoding='utf-8') as f:
    lines = f.readlines()
# remove spaces
lines = [line.replace(' ', '') for line in lines]
# finally, write the lines back to the file
with open(commentfiletxt, 'w', encoding='utf-8') as f:
    f.writelines(lines)



# txt file generated > next to create wordcloud


# wordcloud start
# words to remove from the wordcloud
stopwords = set(STOPWORDS)
stopwords.update(['此用户没有填写评论', 'hellip', 'zwj', '其他特色', '还没用', '非常喜欢', '产品功能', '没有用'])

mask = imread('moon.jpg')
with open(commentfiletxt, 'r', encoding='utf-8') as file:
    text = file.read()

words = jieba.lcut(text)  # precise-mode word segmentation
newtxt = ' '.join(words)  # join the words with spaces
# generate from the segmented text (newtxt); multi-word stopwords such as
# '非常喜欢' only match if the segmenter keeps them as a single token
wd = WordCloud(stopwords=stopwords,
               font_path="MSYH.TTC",
               background_color="white",
               width=800,
               height=300,
               max_words=500,
               max_font_size=200,
               mask=mask,
               ).generate(newtxt)
# save picture
wd.to_file('staub2.png')
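A stopword only removes a token that matches it exactly, so the stopword list has to be written at the same granularity as the tokens that reach the word cloud. After jieba segmentation, a phrase such as '非常喜欢' is typically split into smaller words, so listing the whole phrase may no longer remove anything. A rough, standard-library sketch of the counting step (an approximation for illustration, not WordCloud's actual implementation), assuming the text has already been split into tokens; `count_words` and the token list are hypothetical:

```python
from collections import Counter


def count_words(tokens, stopwords):
    """Count tokens, skipping exact stopword matches and blank tokens."""
    return Counter(t for t in tokens if t.strip() and t not in stopwords)


# Hand-written tokens standing in for segmented comments (not real jieba output).
tokens = ['非常', '喜欢', '质量', '颜色', '非常', '喜欢', '好评']

stopwords = {'非常喜欢', '好评'}
print(count_words(tokens, stopwords))
# Counter({'非常': 2, '喜欢': 2, '质量': 1, '颜色': 1})

stopwords |= {'非常', '喜欢'}  # list the segmented pieces instead of the phrase
print(count_words(tokens, stopwords))
# Counter({'质量': 1, '颜色': 1})
```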

Tags: python-3.x, pandas, data-conversion

1 Answer

Stack Overflow user

Answered on 2021-03-13 03:58:33

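I kept reading Learning Python, 5th Edition, and functions are clearly a good way to make code reusable. I doubt anyone is interested in noob code, but I'm sure plenty of beginners are trying to polish theirs, so I'm answering my own question as I work through the book, in the hope that it helps someone. P.S. I'm currently reading about classes and suspect I could restructure this code further, but for now, here is my "def" example for anyone who needs it: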

import openpyxl as xl
import wordcloud
from wordcloud import WordCloud, STOPWORDS
from matplotlib.pyplot import imread
import numpy as np
import jieba
import pandas as pd

def open_excel(filename):
    # load the source workbook and remember its path for create_txtf()
    global wb1, ws1, filenametxt
    wb1 = xl.load_workbook(filename)
    ws1 = wb1.worksheets[0]
    filenametxt = filename
    print('Loading WorkBook Completed')


def create_sheet(filename1):
    # add 'Sheet2' and copy column B of the first sheet into it
    global wb2, ws2
    wb2 = xl.load_workbook(filename1)
    wb2.create_sheet('Sheet2')
    ws2 = wb2.worksheets[1]
    print('Sheet 2 Created')

    mr = ws1.max_row
    # copy column B from row 2 onward; one loop is enough, and the workbook
    # is saved once after copying instead of once per cell
    for i in range(2, mr + 1):
        ws2.cell(row=i, column=2).value = ws1.cell(row=i, column=2).value
    print("Data Extracted To 'Column B'")
    ws2.delete_cols(1)
    print('Empty Space in Column 1 Deleted')
    wb2.save(filename1)


def create_txtf(tfile):
    # read Sheet2 (index 1) back with pandas and dump it as plain text
    global df
    df = pd.read_excel(filenametxt, sheet_name=1)
    with open(tfile, mode='w', encoding='utf-8') as outfile:
        df.to_string(outfile, header=False, index=False)
    print('txt file created as', tfile)

def convert_remove(tfile1):
    # open, read & remove spaces from the txt file
    global commentfiletxt
    commentfiletxt = tfile1

    with open(commentfiletxt, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    # remove spaces
    lines = [line.replace(' ', '') for line in lines]
    print("Empty spaces ' ' are removed")
    # finally, write the lines back to the file
    with open(commentfiletxt, 'w', encoding='utf-8') as f:
        f.writelines(lines)
    print('Data is written correctly without spaces')
    return tfile1


def wordcloudpic(picname, maskpathn):
    # words to remove from the wordcloud
    stopwords = set(STOPWORDS)
    stopwords.update(['此用户没有填写评论', 'hellip', 'zwj', '其他特色', '还没用', '非常喜欢', '产品功能', '没有用',
                      '东西收到了', 'S', 'sode', 'c', 's左右', 'u', 'middot', 'theta', 'rdquo', 'ldquo',
                      'ec', 'ok', '好评', '不错', '很好', '满意', '好用', '老板大气', '好', 'nbsp'])
    # a JPEG mask is read as uint8 already; a PNG mask comes back as floats
    # in [0, 1] and would need to be multiplied by 255 before this cast
    mask = imread(maskpathn)
    mask = mask.astype(np.uint8)

    with open(commentfiletxt, 'r', encoding='utf-8') as file:
        text = file.read()
    words = jieba.lcut(text)  # precise-mode word segmentation
    newtxt = ' '.join(words)  # join the words with spaces
    # generate from the segmented text, not the raw text
    wd = WordCloud(stopwords=stopwords,
                   font_path="MSYH.TTC",
                   background_color="white",
                   width=800,
                   height=300,
                   max_words=500,
                   max_font_size=200,
                   mask=mask,
                   ).generate(newtxt)

    # save picture
    wd.to_file(picname)


if __name__ == "__main__":
    open_excel("C:\\Users\\shakesmilk\\Desktop\\testtest\\test天猫商品评.xlsx")
    create_sheet("C:\\Users\\shakesmilk\\Desktop\\testtest\\test天猫商品评.xlsx")
    create_txtf("C:\\Users\\shakesmilk\\Desktop\\testtest\\file.txt")
    convert_remove("C:\\Users\\shakesmilk\\Desktop\\testtest\\file.txt")
    wordcloudpic('test.png', 'bubble.jpg')
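The functions above share data through global variables (wb1, ws1, filenametxt, commentfiletxt), so they only work when called in exactly the order of the `__main__` block, one workbook at a time. Taking inputs as parameters and returning outputs removes that coupling. A minimal sketch of convert_remove rewritten that way, using only the standard library; the name `remove_spaces_in_file` and its optional `dst` parameter are hypothetical:

```python
import tempfile
from pathlib import Path


def remove_spaces_in_file(src, dst=None):
    """Remove every ' ' from a UTF-8 text file and return the path written.

    If `dst` is None the source file is overwritten, as convert_remove() does.
    """
    src = Path(src)
    dst = src if dst is None else Path(dst)
    text = src.read_text(encoding='utf-8')
    dst.write_text(text.replace(' ', ''), encoding='utf-8')
    return dst


# Demo on a throwaway file in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / 'file.txt'
    demo.write_text('很好 用\n good  value\n', encoding='utf-8')
    out = remove_spaces_in_file(demo)
    result = out.read_text(encoding='utf-8')
    same_file = (out == demo)

print(repr(result))  # '很好用\ngoodvalue\n'
```

Because the function returns the path it wrote, its result can be passed straight to the next step instead of being stored in a global.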

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/66310932
