Question: Removing bracketed notes, regex

Stack Overflow user

Asked 2022-01-07 17:16:46

Answers: 3 | Views: 98 | Followers: 0 | Votes: 1

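I have a 300+ page document, and I would like to remove all of the notes I wrote in it, which are enclosed in "[(" and ")]". Because I sometimes nest several notes, as in "[(blah [(blah [(blah)])] )]", I need to make sure that I don't delete only "[(blah [(blah [(blah)]".

I'm not sure what the most efficient way to do this is; it's a big job. My idea was to check that no two consecutive "[(" occur with ".*" between them, and then delete only the simple, non-nested "[( ... )]" notes. I'm hoping there is a better way than that, though.

I think the two regexes I would use are something like "/(?<=[()\s\S*(?=()/gi" and "/(?[([().*/gi". Something like that? Sorry, I'm still trying to figure this stuff out.

Also, can I write a Python program that opens an OpenOffice (odt) file and edits it? "open(r'C:\Users\Blah\Documents\Blah.odt', 'rw').read()" would work for that too, right?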

python

regex

loops

3 Answers

Stack Overflow user

Answered 2022-02-05 18:27:51

Alternatively, you can use pyparsing.

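import pyparsing as pp

# Repeatedly keep the text before the next '[(' and suppress the (possibly nested) note; then keep the rest.
pattern = pp.ZeroOrMore(pp.Regex(r'.*?(?=\[\()') + pp.Suppress(pp.nested_expr('[(', ')]'))) + pp.Regex(r'.*')
# Keep whitespace rather than letting pyparsing skip it.
pattern = pattern.leave_whitespace()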

txt = ''
result = ''.join(pattern.parse_string(txt))
assert result == ''

txt = 'blah'
result = ''.join(pattern.parse_string(txt))
assert result == 'blah'

# '.' does not match a newline in these patterns, so only the first line comes back.
txt = 'blah\nblah'
result = ''.join(pattern.parse_string(txt))
assert result == 'blah'

txt = '[(blah [(blah [(blah)])] )]'
result = ''.join(pattern.parse_string(txt))
assert result == ''

txt = ' blah [] blah () blah [( blah [] blah () )] blah [[]] blah (()) blah ([]) blah '
result = ''.join(pattern.parse_string(txt))
assert result == ' blah [] blah () blah  blah [[]] blah (()) blah ([]) blah '

txt = 'a[(b[(c)])]d[()]e[(f[(g[(h)]i[(j)])]k[(l[(m)])])n[(o)])]p[(q[(r)]s)]t[(u[(v[(w)]x[(y)]z)])]!'
result = ''.join(pattern.parse_string(txt))
assert result == 'adept!'

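* pyparsing can be installed with pip install pyparsing

Note:

If a [( )] pair is broken (e.g., a[(b[(c)], a[(b)]c)], etc.), you will get an unexpected result or an IndexError will be raised. So use it with care. (See: Python extract string in a phrase)

Votes: 1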
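Where a stray or missing delimiter is a real possibility, a plain depth counter is one alternative that fails loudly instead of returning a surprising result. The following is only a minimal sketch and is not part of the pyparsing approach above: strip_notes and its parameters are hypothetical names, and it assumes that any ")]" found outside a note is an error rather than ordinary text.

```python
def strip_notes(text, open_tok="[(", close_tok=")]"):
    """Remove every open_tok ... close_tok note, including nested ones.

    Hypothetical helper: raises ValueError when the delimiters do not balance.
    """
    out = []    # characters that lie outside every note
    depth = 0   # current nesting level
    i = 0
    while i < len(text):
        if text.startswith(open_tok, i):
            depth += 1
            i += len(open_tok)
        elif text.startswith(close_tok, i):
            if depth == 0:
                # Assumption: a closer with no matching opener is a mistake.
                raise ValueError(f"unmatched {close_tok!r} at index {i}")
            depth -= 1
            i += len(close_tok)
        else:
            if depth == 0:
                out.append(text[i])
            i += 1
    if depth:
        raise ValueError(f"{depth} unclosed {open_tok!r}")
    return "".join(out)


print(strip_notes("a[(b[(c)])]d"))  # -> ad
```

Because this scans character by character and uses no regex, text that spans several lines is handled the same way as a single line.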

Stack Overflow user

Answered 2022-01-07 17:33:54

Please take a look at the following:

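import re

a = '[(blah [(blah [(blah)])] )]'

# Lazily match from a '[' to the nearest following ']'.
x = re.compile(r'([\[])(.*?)([\]])')
# For the nested input above this stops at the first ']' and leaves ')] )]' behind.
remove_text = re.sub(x, r'', a)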

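Votes: 0

Stack Overflow user

Answered 2022-01-07 20:34:46

One approach is to repeatedly remove (replace each match with an empty string) the notes that contain no nested notes, until a pass makes no more replacements. If the maximum nesting depth is n, n+1 iterations are performed. The regular expression to match is: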

\[\((?:(?!\[\().)*?\)\]

Demo

Consider the string:

begin [(Mary [(had [(a )]lil' [(lamb [(whose [(fleece )])])])])]was [(white [( as )])]snow
      1      2     3    3     3      4       5         5 4 3 2 1    1       2      2 1

As shown, this has five levels of nesting. After the first round of replacements we get:

begin [(Mary [(had lil' [(lamb [(whose )])])])]was [(white )]snow
      1      2          3      4        4 3 2 1    1        1

After the second round:

begin [(Mary [(had lil' [(lamb )])])]was snow
      1      2          3       3 2 1

After the third round:

begin [(Mary [(had lil' )])]was snow
      1      2           2 1

After the fourth round:

begin [(Mary )]was snow
      1       1

After the fifth round:

begin was snow

After the next attempted round:

begin was snow

Since no replacement was made in the last step, we are done.

The regular expression can be broken down as follows.
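The rounds traced above can be sketched in Python with re.subn, which returns both the new string and the number of replacements made. This is a hedged sketch only: remove_notes and INNERMOST_NOTE are hypothetical names, and the re.DOTALL flag is an assumption added here so that a note spanning several lines can still match.

```python
import re

# Matches a note that contains no nested '[(' (an innermost note).
# re.DOTALL is an assumption, added so '.' can also cross line breaks.
INNERMOST_NOTE = re.compile(r"\[\((?:(?!\[\().)*?\)\]", re.DOTALL)


def remove_notes(text):
    """Strip innermost notes round by round until a round changes nothing.

    Returns the cleaned text and the number of rounds performed.
    """
    rounds = 0
    while True:
        text, replaced = INNERMOST_NOTE.subn("", text)
        rounds += 1
        if replaced == 0:
            return text, rounds


sample = ("begin [(Mary [(had [(a )]lil' [(lamb [(whose [(fleece )])])])])]"
          "was [(white [( as )])]snow")
print(remove_notes(sample))  # -> ('begin was snow', 6)
```

The six rounds reported for the sample are the five that remove something plus the final one that finds nothing, which corresponds to the n+1 iterations mentioned earlier.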

\[\(        # match '[('
(?:         # begin non-capture group
  (?!\[\()  # negative lookahead asserts that the next two chars are not '[('
  .         # match any char
)*?         # end non-capture group and execute zero or more times lazily
\)\]        # match ')]'

The regular expression uses a technique called the tempered greedy token solution.

Votes: 0
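To see what the tempering changes, it may help to compare the pattern with a plain lazy one on a small nested string. This is an illustrative sketch; the sample string and variable names are assumptions made for the comparison.

```python
import re

sample = "a[(b[(c)])]d"

# Plain lazy match: starts at the first '[(' and stops at the first ')]'.
plain_lazy = re.compile(r"\[\(.*?\)\]")
# Tempered match: each '.' is allowed only where '[(' does not start.
tempered = re.compile(r"\[\((?:(?!\[\().)*?\)\]")

lazy_match = plain_lazy.search(sample).group()
tempered_match = tempered.search(sample).group()

print(lazy_match)                   # -> [(b[(c)]
print(tempered_match)               # -> [(c)]
print(plain_lazy.sub("", sample))   # -> a)]d
print(tempered.sub("", sample))     # -> a[(b)]d
```

The plain lazy pattern pairs the outer opener with the inner closer and leaves a stray ")]" behind, whereas the tempered pattern can only match a note with no opener inside it, so the remaining text stays balanced for the next round.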

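Original page content provided by Stack Overflow. Translation support for the source page was provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/70624889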
