Q: How to remove punctuation from a text file using string.punctuation

Stack Overflow user

Asked 2018-06-02 04:58:31

Answers: 1 · Views: 1.4K · Followers: 0 · Votes: 0

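I wrote a function that finds the 20 most common words in a book I downloaded as plain text. The Python textbook I'm working from says to use import string and then the replace or translate methods to remove any punctuation, but when I print the lines after the replace step, they all still contain punctuation. I tried swapping the order of the line = line.strip() and line = line.replace(string.punctuation,'') steps, but that didn't help. I've never used replace before, so for all I know I'm using it wrong. The rest of my program works; it's just this one step that has me frustrated.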

import string
def function():
    infile = open('gutbook.txt','r',encoding='utf-8')
    count = dict()
    list2 = list()
    for line in infile:
        line = line.strip()
        line = line.replace(string.punctuation,'')
        line = line.lower().split()
        if line== []:
            continue
        for i in line:
            count[i] = count.get(i,0) + 1
    for key,value in count.items():
        newtuple = (value,key)
        list2.append(newtuple)
    list3 = sorted(list2,reverse = True)
    print(list3[:20])



function()
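
The replace step most likely leaves every line unchanged because str.replace treats its first argument as one literal substring: it would only match if all 32 characters of string.punctuation appeared in the line consecutively and in that exact order. A minimal sketch of that behaviour follows; the sample line is made up for illustration and is not from the book.

```python
import string

sample = "Hello, world! It's a test."  # hypothetical sample line

# str.replace searches for the whole 32-character string as ONE substring,
# which does not occur in ordinary text, so nothing is removed.
unchanged = sample.replace(string.punctuation, '')

# Replacing one punctuation character at a time does strip them.
cleaned = sample
for ch in string.punctuation:
    cleaned = cleaned.replace(ch, '')

print(unchanged)
print(cleaned)
```

The loop works but scans the line once per punctuation character; translate or a regular expression does the same job in a single pass.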

python

replace

1 Answer

Stack Overflow user

Answered 2018-06-02 05:03:21

Use a regular expression.

Example:

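import re
import string

# Raw string so the backslash in the sample text is kept literally
text = r"Hello ! #$%&'()*+,-./:;<=>?@[\]^_`{|}~ World"

# Character class built from every character in string.punctuation
print(re.sub("[" + re.escape(string.punctuation) + "]", "", text))
# or: drop everything except ASCII letters, digits, and whitespace
print(re.sub(r'[^a-zA-Z0-9\s]', '', text))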
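
The question also mentions the translate method, which can be used instead of a regular expression. The sketch below is one possible way to do it, not the answerer's code: the names PUNCT_TABLE, top_words, and sample_lines are hypothetical, and collections.Counter is swapped in for the hand-built dictionary and tuple list in the question.

```python
import string
from collections import Counter

# Translation table that deletes every ASCII punctuation character.
# (Hypothetical name; str.maketrans with three arguments maps the
# characters of the third argument to None.)
PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def top_words(lines, n=20):
    """Return the n most common words as (word, count) pairs."""
    counts = Counter()
    for line in lines:
        # Strip punctuation, lowercase, and split on whitespace.
        words = line.translate(PUNCT_TABLE).lower().split()
        counts.update(words)
    return counts.most_common(n)


# Made-up sample data for illustration.
sample_lines = ["The cat, the dog!", "A dog? The end."]
print(top_words(sample_lines, 3))
```

To run it on the book from the question, the open file object can be passed directly, for example with open('gutbook.txt', encoding='utf-8') as infile: print(top_words(infile)). Note that string.punctuation contains only ASCII characters, so typographic quotes or dashes in the text would remain.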

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/50650921
