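Over the past month I have been teaching myself Python (zero coding experience; Python is my first programming language), and I have finally written my first usable piece of work-related code. I am now trying to improve it so that it can be reused. It converts comment-based xlsx data into a plain-text ('string' type) txt file and finally into a word cloud. The working code is below: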
How the code works:
Step 1. The xlsx file is an Excel worksheet with 4 columns.
Step 2. Python extracts all of column 'B'.
Step 3. It converts the values into 'str' format, removes spaces, and writes them to a txt file.
Step 4. wordcloud removes words using the stop words.
Step 5. It generates the word cloud according to the format settings.
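Steps 3 and 4 can be pictured on a small in-memory list before any files are involved. The following is only a minimal sketch: the sample `comments`, the `stop_comments` set, and the helper `clean_comments` are hypothetical names made up for illustration, and `collections.Counter` stands in for the frequency counting that a word cloud is built on.

```python
from collections import Counter

# Hypothetical sample standing in for the values read from column B.
comments = ["很 好用", "此用户没有填写评论", "很 好用", None, "锅 很重"]

# Hypothetical stop list; the real script extends wordcloud.STOPWORDS instead.
stop_comments = {"此用户没有填写评论"}


def clean_comments(values, stop):
    """Stringify each value, remove spaces, and skip empty or stop entries."""
    cleaned = []
    for value in values:
        if value is None:  # empty spreadsheet cell
            continue
        text = str(value).replace(" ", "")
        if text and text not in stop:
            cleaned.append(text)
    return cleaned


cleaned = clean_comments(comments, stop_comments)

# Count how often each remaining entry occurs.
frequencies = Counter(cleaned)
```

Keeping the cleaning step in a function that takes and returns plain lists makes it easy to check on a handful of comments before running it on a whole workbook.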
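I would like to improve it in one way:
If anyone knows a better way to refine it, please let me know. I am very new to this, so tell me if you need any other information to clarify anything.
Any help would be greatly appreciated, thank you.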
import openpyxl as xl
import wordcloud
from wordcloud import WordCloud,STOPWORDS
from matplotlib.pyplot import imread
import jieba
import pandas as pd
# opening the source excel file (repeated steps needed for every different document)
filename = "C:\\Users\\shakesmilk\\Desktop\\staub\\staub天猫商品评论.xlsx"
wb1 = xl.load_workbook(filename)
ws1 = wb1.worksheets[0]
# opening the destination excel file (repeated steps needed for every different document)
filename1 = "C:\\Users\\shakesmilk\\Desktop\\staub\\staub天猫商品评论.xlsx"
wb2 = xl.load_workbook(filename1)
wb2.create_sheet('Sheet2')
ws2 = wb2.worksheets[1]
# calculate total number of rows and
# columns in source excel file
mr = ws1.max_row
mc = ws1.max_column
minr = ws2.min_row
# copying the cell values from source
# excel file to destination excel file
for i in range(1, mr + 1):
    for j in range(0, mc + 1):
        # reading cell value from source excel file
        c = ws1.cell(row=i + 1, column=2)
        # writing the read value to destination excel file
        ws2.cell(row=i + 1, column=2).value = c.value
# deleting first empty row/column
ws2.delete_cols(1)
# saving the destination excel file
wb2.save(str(filename1))
# converting sheet 2 with pandas to txt file
df = pd.read_excel(filename, sheet_name=1)
with open("C:\\Users\\shakesmilk\\Desktop\\staub\\file.txt", mode='w', encoding='utf-8') as outfile:
    df.to_string(outfile, header=None, index=None)
# open, read & remove spaces from txt file
commentfiletxt = "C:\\Users\\shakesmilk\\Desktop\\staub\\file.txt"
with open(commentfiletxt, 'r', encoding='utf-8') as f:
    lines = f.readlines()
# remove spaces
lines = [line.replace(' ', '') for line in lines]
# finally, write lines in the file
with open(commentfiletxt, 'w', encoding='utf-8') as f:
    f.writelines(lines)
# txt file generated > next to create wordcloud
# wordcloud start
# remove words from wordcloud
stopwords = set(STOPWORDS)
stopwords.update(['此用户没有填写评论', 'hellip', 'zwj', '其他特色', '还没用', '非常喜欢', '产品功能', '没有用'])
mask = imread('moon.jpg')
with open(commentfiletxt, 'r', encoding='utf-8') as file:
    text = file.read()
words = jieba.lcut(text)  # precise-mode word segmentation
newtxt = ' '.join(words)  # join the segments with spaces
# note: generate() receives the raw text; the segmented newtxt is not used
wd = wordcloud.WordCloud(stopwords=stopwords,
                         font_path="MSYH.TTC",
                         background_color="white",
                         width=800,
                         height=300,
                         max_words=500,
                         max_font_size=200,
                         mask=mask,
                         ).generate(text)
# save picture
wd.to_file('staub2.png')
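The comments in the script note that the path setup has to be repeated for every document. One possible way to cut that repetition, shown here only as a sketch, is to derive the output paths from the workbook path with `pathlib`. The helper name `derive_output_paths` and the sample path `staub/comments.xlsx` are hypothetical, and placing the outputs next to the workbook is an assumption rather than something the original script does.

```python
from pathlib import Path


def derive_output_paths(xlsx_path):
    """Return (txt_path, png_path) located next to the given workbook."""
    source = Path(xlsx_path)
    # with_suffix() swaps only the extension and keeps folder and file stem.
    return source.with_suffix(".txt"), source.with_suffix(".png")


# Hypothetical workbook path used purely for illustration.
txt_path, png_path = derive_output_paths("staub/comments.xlsx")
```

With this, only the workbook path changes from one document to the next; the text file and picture names follow from it.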
Posted on 2021-03-13 03:58:33
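I kept reading "Learning Python, 5th Edition", and functions are clearly a good way to make code reusable. I suppose nobody is interested in noob code, but I am sure plenty of beginners are struggling to polish theirs, so I am answering my own question as I work through the book, hoping it helps anyone who needs it. P.S. I am currently reading about classes, and I guess I could transform this code further, but for now, for those who need help, here is my "def" example: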
import openpyxl as xl
import wordcloud
from wordcloud import WordCloud,STOPWORDS
from matplotlib.pyplot import imread
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import jieba
import pandas as pd
def open_excel(filename):
    global wb1, ws1, filenametxt
    wb1 = xl.load_workbook(filename)
    ws1 = wb1.worksheets[0]
    filenametxt = filename
    print('Loading WorkBook Completed')
def create_sheet(filename1):
    global wb2, ws2
    wb2 = xl.load_workbook(filename1)
    wb2.create_sheet('Sheet2')
    ws2 = wb2.worksheets[1]
    print('Sheet 2 Created')
    mr = ws1.max_row
    mc = ws1.max_column
    minr = ws2.min_row
    for i in range(1, mr + 1):
        for j in range(0, mc + 1):
            # reading cell value from source excel file
            c = ws1.cell(row=i + 1, column=2)
            ws2.cell(row=i + 1, column=2).value = c.value
    wb2.save(filename1)
    print("Data Extracted To 'Column B'")
    ws2.delete_cols(1)
    print('Empty Space in Column 1 Deleted')
    wb2.save(filename1)
def create_txtf(tfile):
    global df
    df = pd.read_excel(filenametxt, sheet_name=1)
    with open(tfile, mode='w', encoding='utf-8') as outfile:
        df.to_string(outfile, header=None, index=None)
    print('txt file created as file.txt')
def convert_remove(tfile1):
    # open, read & remove spaces from txt file
    global commentfiletxt
    commentfiletxt = tfile1
    with open(commentfiletxt, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    # remove spaces
    lines = [line.replace(' ', '') for line in lines]
    print("Empty Spaces ' ' are removed")
    # finally, write lines in the file
    with open(commentfiletxt, 'w', encoding='utf-8') as f:
        f.writelines(lines)
    print('Data is written correctly without spaces')
    return tfile1
def wordcloudpic(picname, maskpathn):
    stopwords = set(STOPWORDS)
    stopwords.update(['此用户没有填写评论', 'hellip', 'zwj', '其他特色', '还没用', '非常喜欢',
                      '产品功能', '没有用', '东西收到了', 'S', 'sode', 'c', 's左右', 'u',
                      'middot', 'u', 'theta', 'rdquo', 'ldquo', 'ec', 'ok', '好评', '不错',
                      '很好', '满意', '好用', '老板大气', '好', 'nbsp'])
    mask = imread(maskpathn)
    mask = mask.astype(np.uint8)
    with open(commentfiletxt, 'r', encoding='utf-8') as file:
        text = file.read()
    words = jieba.lcut(text)  # precise-mode word segmentation
    newtxt = ' '.join(words)  # join the segments with spaces
    # note: generate() receives the raw text; the segmented newtxt is not used
    wd = wordcloud.WordCloud(stopwords=stopwords,
                             font_path="MSYH.TTC",
                             background_color="white",
                             width=800,
                             height=300,
                             max_words=500,
                             max_font_size=200,
                             mask=mask,
                             ).generate(text)
    # save picture
    wd.to_file(picname)
if __name__ == "__main__":
    open_excel("C:\\Users\\shakesmilk\\Desktop\\testtest\\test天猫商品评.xlsx")
    create_sheet("C:\\Users\\shakesmilk\\Desktop\\testtest\\test天猫商品评.xlsx")
    create_txtf("C:\\Users\\shakesmilk\\Desktop\\testtest\\file.txt")
    convert_remove("C:\\Users\\shakesmilk\\Desktop\\testtest\\file.txt")
    wordcloudpic('test.png', 'bubble.jpg')
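The functions above hand data to each other through `global` variables, so they only work when called in this exact order. A possible next step, sketched here for the space-removal stage only, is to pass everything in through parameters and hand results back with `return`. The names `strip_spaces`, `rewrite_without_spaces`, and `read_comments` are hypothetical and not part of the code above.

```python
from pathlib import Path


def strip_spaces(lines):
    """Return a new list of lines with every space character removed."""
    return [line.replace(" ", "") for line in lines]


def rewrite_without_spaces(txt_path):
    """Rewrite the text file in place without spaces and return its path."""
    path = Path(txt_path)
    with path.open("r", encoding="utf-8") as f:
        lines = f.readlines()
    with path.open("w", encoding="utf-8") as f:
        f.writelines(strip_spaces(lines))
    return path


def read_comments(txt_path):
    """Return the whole text file as one string."""
    return Path(txt_path).read_text(encoding="utf-8")
```

Because each function returns what the next one needs, the calls can be chained, for example `text = read_comments(rewrite_without_spaces("file.txt"))`, and each function can be tried out on its own.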
https://stackoverflow.com/questions/66310932