在前面的问题之后,如果某个列表中的搜索词在字符串中返回,则我尝试使用代码返回字符串,如下所示。
import re
from nltk import tokenize
from nltk.tokenize import sent_tokenize
def foo():
List1 = ['risk','cancer','ocp','hormone','OCP',]
txt = "Risk factors for breast cancer have been well characterized. Breast cancer is 100 times more frequent in women than in men.\
Factors associated with an increased exposure to estrogen have also been elucidated including early menarche, late menopause, later age\
at first pregnancy, or nulliparity. The use of hormone replacement therapy has been confirmed as a risk factor, although mostly limited to \
the combined use of estrogen and progesterone, as demonstrated in the WHI (2). Analysis showed that the risk of breast cancer among women using \
estrogen and progesterone was increased by 24% compared to placebo. A separate arm of the WHI randomized women with a prior hysterectomy to \
conjugated equine estrogen (CEE) versus placebo, and in that study, the use of CEE was not associated with an increased risk of breast cancer (3).\
Unlike hormone replacement therapy, there is no evidence that oral contraceptive (OCP) use increases risk. A large population-based case-control study \
examining the risk of breast cancer among women who previously used or were currently using OCPs included over 9,000 women aged 35 to 64 \
(half of whom had breast cancer) (4). The reported relative risk was 1.0 (95% CI, 0.8 to 1.3) among women currently using OCPs and 0.9 \
(95% CI, 0.8 to 1.0) among prior users. In addition, neither race nor family history was associated with a greater risk of breast cancer among OCP users."
words = txt
corpus = " ".join(words).lower()
sentences1 = sent_tokenize(corpus)
a = [" ".join([sentences1[i-1],j]) for i,j in enumerate(sentences1) if [item in List1] in word_tokenize(j)]
for i in a:
print i,'\n','\n'
foo()
问题是python空闲不打印任何东西。我能做错什么呢?它所做的就是运行代码,我得到了
发布于 2015-11-08 00:34:29
你的问题我不太清楚,所以如果我搞错了,请纠正我。您是否试图将关键字列表(在list1中)与文本(在txt中)匹配?那是,
我没有编写一个复杂的正则表达式来解决您的问题,而是将其分解为两部分。
首先,我把所有的文本分解成一个句子列表。然后编写简单的正则表达式来遍历每一个句子。这种方法的麻烦在于它不是很有效率,但是它解决了您的问题。
希望这一小块代码能帮助您找到真正的解决方案。
def foo():
List1 = ['risk','cancer','ocp','hormone','OCP',]
txt = "blah blah blah - truncated"
words = txt
matches = []
sentences = re.split(r'\.', txt)
keyword = List1[0]
pattern = keyword
re.compile(pattern)
for sentence in sentences:
if re.search(pattern, sentence):
matches.append(sentence)
print("Sentence matching the word (" + keyword + "):")
for match in matches:
print (match)
-产生随机数
from random import randint
List1 = ['risk','cancer','ocp','hormone','OCP',]
print(randint(0, len(List1) - 1)) # gives u random index - use index to access List1
https://stackoverflow.com/questions/33591551
复制相似问题