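I have a list of SKU names and need to resolve the abbreviations in them into full words.
The abbreviations vary in length (2-5 characters), but their letters appear in the same order as in the actual word.
Here are a few examples:
SKU name: "235 DSKTP 10LB" -> "Desktop"
SKU name: "222840 MSE 2 2oz" -> "Mouse"
Other notes:
I have tried a few regular expressions, but none of them worked.
Is there a regex pattern along the lines of d?e?s?k?t?o?p?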
Posted on 2019-06-19 04:13:57
import re
from collections import OrderedDict
data = '''
235 DSKTP 10LB
222840 MSE 2oz
1234 WNE 1L
12345 XXX 23L
RND PTT GNCH 16 OZ 007349012845
FRN SHL CNCH 7.05 OZ 007473418910
TWST CLNT 16 OZ 00733544
'''
words = ['Desktop',
'Mouse',
'Tree',
'Wine',
'Gnocchi',
'Shells',
'Cellentani']
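def compare(sku_abbr, full_word):
    # Keep only the letters of full_word that also occur in sku_abbr
    # (the symmetric difference holds the letters unique to one side).
    s = ''.join(c for c in full_word if c not in set(sku_abbr) ^ set(full_word))
    # Drop repeated letters while preserving their first-seen order.
    s = ''.join(OrderedDict.fromkeys(s).keys())
    return s == sku_abbr
for full_sku in data.splitlines():
    if not full_sku:
        continue
    # Try every run of three or more capital letters as a candidate abbreviation.
    for sku_abbr in re.findall(r'([A-Z]{3,})', full_sku):
        should_break = False
        for w in words:
            if compare(sku_abbr.upper(), w.upper()):
                print(full_sku, w)
                should_break = True
                break
        if should_break:
            break
    else:
        # Runs only if no candidate abbreviation matched any word.
        print(full_sku, '* NOT FOUND *')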
Prints:
235 DSKTP 10LB Desktop
222840 MSE 2oz Mouse
1234 WNE 1L Wine
12345 XXX 23L * NOT FOUND *
RND PTT GNCH 16 OZ 007349012845 Gnocchi
FRN SHL CNCH 7.05 OZ 007473418910 Shells
TWST CLNT 16 OZ 00733544 Cellentani
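The pattern the question asks about (d?e?s?k?t?o?p) can also be built directly from each candidate word. The following is only a minimal sketch, not part of the answer above: the helper names word_to_pattern and expand are hypothetical, and the requirement that an abbreviation keeps the word's first letter is an added assumption meant to cut down on false matches.

```python
import re

# Candidate words, copied from the answer above.
WORDS = ['Desktop', 'Mouse', 'Tree', 'Wine', 'Gnocchi', 'Shells', 'Cellentani']

def word_to_pattern(word):
    """Turn 'Desktop' into D?E?S?K?T?O?P? (every letter optional, order kept)."""
    return re.compile(''.join(re.escape(c) + '?' for c in word.upper()))

def expand(abbr, words=WORDS, min_len=2):
    """Return the first word that contains the letters of abbr in order, or None."""
    abbr = abbr.upper()
    if not abbr or len(abbr) < min_len:
        return None
    for word in words:
        # Assumption: the abbreviation keeps the word's first letter.
        if word[:1].upper() != abbr[0]:
            continue
        # fullmatch requires the whole abbreviation to fit the optional-letter pattern.
        if word_to_pattern(word).fullmatch(abbr):
            return word
    return None

for abbr in ['DSKTP', 'MSE', 'GNCH', 'XXX']:
    print(abbr, expand(abbr))
```

With the word list above this resolves DSKTP, MSE and GNCH to Desktop, Mouse and Gnocchi, and returns None for XXX. Because every letter in the pattern is optional, short abbreviations can fit several words, so the order of the word list decides which one is returned.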
Posted on 2019-06-19 04:27:39
You can create a dictionary that maps each abbreviation to its full word:
import re
names = ["235 DSKTP 10LB", "222840 MSE 2oz"]
abbrs = {'DSKTP':'Desktop', 'MSE':'Mouse'}
# Raw string so that \s is passed to the regex engine unchanged.
matched = [re.findall(r'(?<=\s)[a-zA-Z]+(?=\s)', i) for i in names]
result = ['N/A' if not i else abbrs.get(i[0], i[0]) for i in matched]
print(result)
Output:
['Desktop', 'Mouse']
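The list comprehension above looks only at the first alphabetic token of each name, so a name such as "RND PTT GNCH 16 OZ 007349012845" would come back as 'RND'. A possible variation, sketched here under assumptions, checks every token against the dictionary. The function name lookup and the extra 'GNCH' entry are hypothetical additions for illustration.

```python
import re

# Abbreviation table; 'GNCH' is an illustrative extra entry, not from the answer.
ABBRS = {'DSKTP': 'Desktop', 'MSE': 'Mouse', 'GNCH': 'Gnocchi'}

def lookup(sku_name, abbrs=ABBRS, default='N/A'):
    """Return the full word for the first known abbreviation in sku_name."""
    # \b[A-Za-z]+\b matches standalone alphabetic tokens, so the 'LB' in
    # '10LB' is skipped because it is glued to digits.
    for token in re.findall(r'\b[A-Za-z]+\b', sku_name):
        word = abbrs.get(token.upper())
        if word is not None:
            return word
    return default

names = ['235 DSKTP 10LB', '222840 MSE 2oz',
         'RND PTT GNCH 16 OZ 007349012845', '12345 XXX 23L']
print([lookup(n) for n in names])
```

Unlike the lookaround version, the word-boundary pattern also finds tokens at the very start or end of the name, and unknown abbreviations fall through to the default instead of being echoed back.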
https://stackoverflow.com/questions/56656228