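I want to extract all temperatures and temperature ranges, whether or not there are spaces between their parts. Right now I can only get the ones without spaces, using: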
re.findall(r'[0-9°c-]+', text)

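What do I need to add to the regex so that I also correctly match the ones with spaces in between? 50 ° C (50, space, °, space, C) should be treated as a single unit, not as three parts.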
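For illustration, here is a minimal sketch of the problem described above. The sample string is made up for this example and is not from the original post.

```python
import re

# Made-up sample text for illustration.
text = 'It is 50 ° c now, with 30°c-40°c later'

# The character class cannot span whitespace, so every run of
# allowed characters becomes its own match.
parts = re.findall(r'[0-9°c-]+', text)
print(parts)  # ['50', '°', 'c', '30°c-40°c']
```

The range without spaces comes back whole, while 50 ° c is split into three separate matches.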
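Posted on 2019-05-08 13:57:27
You can use
-?\d+(?:\.\d+)?\s*°\s*c(?:\s*-\s*-?\d+(?:\.\d+)?\s*°\s*c)?
See the regex demo.
The pattern consists of a -?\d+(?:\.\d+)?\s*°\s*c block that appears twice (the second occurrence matches the optional range part), and it matches negative and fractional temperature values:
-? - an optional hyphen
\d+ - 1+ digits
(?:\.\d+)? - an optional fractional part
\s* - 0+ whitespace characters
° - a degree symbol
\s* - 0+ whitespace characters
c - a c character
(?:\s*-\s*<ABOVE_BLOCK>)? - 1 or 0 repetitions of a hyphen enclosed in 0+ whitespace characters, followed by the same block as described above.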
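The breakdown above can also be written directly into the pattern with re.VERBOSE, which ignores whitespace in the pattern and allows comments. This is only a sketch: the name TEMP_RANGE and the sample string are made up for illustration.

```python
import re

# Same pattern as above, spread over several lines with comments.
TEMP_RANGE = re.compile(r"""
    -?\d+(?:\.\d+)?      # optional minus sign, digits, optional fraction
    \s*°\s*c             # degree sign and c, each after optional whitespace
    (?:                  # optional range part
        \s*-\s*          #   hyphen with optional whitespace around it
        -?\d+(?:\.\d+)?  #   second temperature value
        \s*°\s*c         #   degree sign and c again
    )?
""", re.VERBOSE)

sample = "from 30° c - 50 ° c, then 2°c and -4.5 °c"
found = [m.group() for m in TEMP_RANGE.finditer(sample)]
print(found)  # ['30° c - 50 ° c', '2°c', '-4.5 °c']
```

In verbose mode a literal space in the pattern is ignored, so whitespace still has to be matched with \s as it is here.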
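In Python, it makes sense to build the pattern dynamically:
tb = r'-?\d+(?:\.\d+)?\s*°\s*c'
rx = r'{0}(?:\s*-\s*{0})?'.format(tb)
results = re.findall(rx, s)

If c is optional, replace \s*c with (?:\s*c)?.
If both ° and c are optional, replace \s*°\s*c with (?:\s*°\s*c)? (both present or both absent) or with (?:\s*°(?:\s*c)?)? (c allowed only after °).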
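One more adjustment may be worth considering: the patterns here use a lowercase c, while the question writes the unit as 50 ° C. A minimal sketch, assuming an uppercase C should match as well, passes re.IGNORECASE to re.findall. The sample string is made up for illustration.

```python
import re

tb = r'-?\d+(?:\.\d+)?\s*°\s*c'
rx = r'{0}(?:\s*-\s*{0})?'.format(tb)

# Made-up sample mixing uppercase and lowercase units.
s = 'Today 50 ° C, tomorrow 30°C - 40 °c'

# re.IGNORECASE lets the literal c in the pattern match C as well.
results = re.findall(rx, s, flags=re.IGNORECASE)
print(results)  # ['50 ° C', '30°C - 40 °c']
```

Since the pattern has no capturing groups, re.findall returns the full matches as plain strings.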
Here is the temperature block pattern where both the degree symbol and the c character are optional but keep the same order as before:
tb = r'-?\d+(?:\.\d+)?(?:\s*°(?:\s*c)?)?'

import re
s = 'This is some temperature 30° c - 50 ° c 2°c 34.5 °c 30°c - 40 °c and "30° - 40, and -45.5° - -56.5° range'
tb = r'-?\d+(?:\.\d+)?(?:\s*°(?:\s*c)?)?'
rx = r'{0}(?:\s*-\s*{0})?'.format(tb)
results = re.findall(rx, s)
print(results)
# => ['30° c - 50 ° c', '2°c', '34.5 °c', '30°c - 40 °c', '30° - 40', '-45.5° - -56.5°']

If the degree symbol may be missing while c may still be present, move the grouping boundaries:
tb = r'-?\d+(?:\.\d+)?(?:\s*°)?(?:\s*c)?'
                      ^-------^^-------^

See this regex demo and the full Python code demo:
import re
s = 'This is some temperature 30° c - 50 ° c 2°c 34.5 °c 30°c - 40 °c and "30° - 40, and -45.5° - -56.5° range 30c - 50 °c" or 30c - 40'
tb = r'-?\d+(?:\.\d+)?(?:\s*°)?(?:\s*c)?'
rx = r'{0}(?:\s*-\s*{0})?'.format(tb)
results = re.findall(rx, s)
print(results)

Output:
['30° c - 50 ° c', '2°c', '34.5 °c', '30°c - 40 °c', '30° - 40', '-45.5° - -56.5°', '30c - 50 °c', '30c - 40']

Posted on 2019-05-08 13:42:29
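Try the following pattern:
\d+°c(?:\s*-\d+°c)?

Sample script:
import re

text = "It is 50°c today. One range is 30°c-40°c and here is another 10°c -20°c"
matches = re.findall(r'\d+°c(?:\s*-\d+°c)?', text)
print(matches)

Output (as printed by Python 2, which shows the UTF-8 bytes of °; Python 3 prints the ° character itself):
['50\xc2\xb0c', '30\xc2\xb0c-40\xc2\xb0c', '10\xc2\xb0c -20\xc2\xb0c']

Posted on 2019-05-08 13:59:53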
This expression might help you do that:
(([0-9°c\s]+)(?:-[0-9°]+c))|([0-9°\s]+c)
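A compact way to try this expression in Python is sketched below. It is an illustration rather than part of the original answer. Because the expression contains capturing groups, re.findall would return tuples, so the sketch uses re.finditer and takes group(0) instead. It also strips each match, since the \s inside the character classes picks up the whitespace in front of a temperature.

```python
import re

regex = r"(([0-9°c\s]+)(?:-[0-9°]+c))|([0-9°\s]+c)"
test_str = "This is some temperature 30°c-40°c. 50 ° c. 30°c -40°c"

# group(0) is the whole match, whichever alternative matched;
# strip() drops the leading whitespace captured by the character classes.
results = [m.group(0).strip() for m in re.finditer(regex, test_str)]
print(results)  # ['30°c-40°c', '50 ° c', '30°c -40°c']
```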

Graph
[Diagram of how the expression works; the original post also linked to a tool for visualizing other expressions.]

Example test
const regex = /(([0-9°c\s]+)(?:-[0-9°]+c))|([0-9°\s]+c)/gm;
const str = `This is some temperature 30°c-40°c. 50 ° c. 30°c -40°c`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
Python test
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re

regex = r"(([0-9°c\s]+)(?:-[0-9°]+c))|([0-9°\s]+c)"
test_str = "This is some temperature 30°c-40°c. 50 ° c. 30°c -40°c"
matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

https://stackoverflow.com/questions/56042140