Q: VB.NET 2010: matching multi-line comments with a regular expression

Stack Overflow user

Asked 2018-07-17 00:14:58

Answers: 1, Views: 107, Followers: 0, Votes: 0

I want to remove multi-line comments (Java/C/C++/...) from a file. For this I wrote a regular expression:

/\*[^\*]*(\*+[^\*/][^\*]*)*\*+/

This regex works well in Notepad++ and Geany (search and replace all with nothing). In VB.NET, the same regex behaves differently.

I am using:

Microsoft Visual Studio 2010 (Version 10.0.40219.1 SP1Rel)
Microsoft .NET Framework (4.7.02053 SP1Rel)

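The files I run the replacement on are not very complex. I do not need to handle quoted text that might start or end a comment.

@sln Thanks for your detailed reply. I will quickly explain my regex the same way you did: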

/\*                      Find the beginning of the comment.
[^\*]*                   Match any chars, but not an asterisk.
                         We need to deal with finding an asterisk now:
(\*+[^\*/][^\*]*)*       This regex breaks down to:
 \*+                     Consume asterisk(s).
    [^\*/]               Match any other char that is not an asterisk or a / (would end the comment!).
          [^\*]*         Match any other chars that are not asterisks.
(               )*       Try to find more asterisks followed by other chars.

\*+/                     Match 1 to n asterisks and finish the comment with /.

Here are two code snippets:

First:

text

/*
 * block comment
 *
 */ /* comment1 */ /* comment2 */

My text to keep.

/* more comments */

more text

Second:

text

/*
 * block comment
 *
 */ /* comment1 *//* comment2 */

My text to keep.

/* more comments */

more text

The only difference is

/* comment1 *//* comment2 */

Removing the matches with Notepad++ and Geany works for both cases. With VB.NET, the regex fails on the second example. After the removal, the second example looks like this:

text



more text

But it should look like this:

text



My text to keep.



more text

I am using System.Text.RegularExpressions:

Imports System.Text.RegularExpressions

' file_path_ holds the path of the file to clean
Dim content As String = IO.File.ReadAllText(file_path_)
Dim multiline_comment_remover As Regex = New Regex("/\*[^\*]*(\*+[^\*/][^\*]*)*\*+/")
content = multiline_comment_remover.Replace(content, "")
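
As a cross-check outside .NET, here is a minimal Python sketch (not part of the original post) that runs the same pattern over the second snippet. The names COMMENT_RE and SAMPLE are illustrative. In Python's re module the pattern leaves "My text to keep." in place for this input, which agrees with the Notepad++/Geany result; it is a different engine, so it neither reproduces nor explains the VB.NET behavior described here.

```python
import re

# The pattern from the question, unchanged.
COMMENT_RE = re.compile(r"/\*[^\*]*(\*+[^\*/][^\*]*)*\*+/")

# The second snippet from the question (comment1 and comment2 are adjacent).
SAMPLE = (
    "text\n"
    "\n"
    "/*\n"
    " * block comment\n"
    " *\n"
    " */ /* comment1 *//* comment2 */\n"
    "\n"
    "My text to keep.\n"
    "\n"
    "/* more comments */\n"
    "\n"
    "more text\n"
)

# Replace every match with the empty string, as in the VB.NET code.
result = COMMENT_RE.sub("", SAMPLE)
print(result)
```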

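I would expect VB.NET to produce the same result as Notepad++ and Geany. As sln answered, my regex "should work in a weird way". The question is: why does VB.NET not process this regex as expected? That question is still open.

Since sln's answer got my code working, I am accepting it, even though it does not explain why VB.NET dislikes my regex. Thanks for your help! I learned a lot!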

regex

vb.net-2010

1 Answer

Stack Overflow user

Accepted answer

Posted 2018-07-17 00:49:27

I think you can use a general-purpose C++ comment stripper.

Basically, it's

Globally find, replace with $2

Demo PCRE: https://regex101.com/r/UldYK5/1

Demo Python: https://regex101.com/r/avfSfB/1

    # raw:   (?m)((?:(?:^[ \t]*)?(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|/\*|//)))?|//(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n(?=[ \t]*(?:\r?\n|/\*|//))|(?=\r?\n))))+)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|(?:\r?\n|[\S\s])[^/"'\\\s]*)
    # delimited:  /(?m)((?:(?:^[ \t]*)?(?:\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|\/\*|\/\/)))?|\/\/(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n(?=[ \t]*(?:\r?\n|\/\*|\/\/))|(?=\r?\n))))+)|((?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?:\r?\n(?:(?=(?:^[ \t]*)?(?:\/\*|\/\/))|[^\/"'\\\r\n]*))+|[^\/"'\\\r\n]+)+|[\S\s][^\/"'\\\r\n]*)/

    (?m)                             # Multi-line modifier
    (                                # (1 start), Comments
         (?:
              (?: ^ [ \t]* )?                  # <- To preserve formatting
              (?:
                   /\*                              # Start /* .. */ comment
                   [^*]* \*+
                   (?: [^/*] [^*]* \*+ )*
                   /                                # End /* .. */ comment
                   (?:                              # <- To preserve formatting
                        [ \t]* \r? \n
                        (?=
                             [ \t]*
                             (?: \r? \n | /\* | // )
                        )
                   )?
                |
                   //                               # Start // comment
                   (?:                              # Possible line-continuation
                        [^\\]
                     |  \\
                        (?: \r? \n )?
                   )*?
                   (?:                              # End // comment
                        \r? \n
                        (?=                              # <- To preserve formatting
                             [ \t]*
                             (?: \r? \n | /\* | // )
                        )
                     |  (?= \r? \n )
                   )
              )
         )+                               # Grab multiple comment blocks if need be
    )                                # (1 end)

 |                                 ## OR

    (                                # (2 start), Non - comments
         # Quotes
         # ======================
         (?:                              # Quote and Non-Comment blocks
              "
              [^"\\]*                          # Double quoted text
              (?: \\ [\S\s] [^"\\]* )*
              "
           |                                 # --------------
              '
              [^'\\]*                          # Single quoted text
              (?: \\ [\S\s] [^'\\]* )*
              '
           |                                 # --------------

              (?:                              # Qualified Linebreak's
                   \r? \n
                   (?:
                        (?=                              # If comment ahead just stop
                             (?: ^ [ \t]* )?
                             (?: /\* | // )
                        )
                     |                                 # or,
                        [^/"'\\\r\n]*                    # Chars which doesn't start a comment, string, escape,
                                                         # or line continuation (escape + newline)
                   )
              )+
           |                                 # --------------
              [^/"'\\\r\n]+                    # Chars which doesn't start a comment, string, escape,
                                               # or line continuation (escape + newline)

         )+                               # Grab multiple instances

      |                                 # or,
         # ======================
         # Pass through

         [\S\s]                           # Any other char
         [^/"'\\\r\n]*                    # Chars which doesn't start a comment, string, escape,
                                          # or line continuation (escape + newline)

    )                                # (2 end), Non - comments

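If you are using a particular engine that doesn't support assertions,

then you'll have to use this one.

It won't preserve formatting, though.

Usage is the same as above.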

    # (/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^/"'\\]*)


    (                                # (1 start), Comments 
         /\*                              # Start /* .. */ comment
         [^*]* \*+
         (?: [^/*] [^*]* \*+ )*
         /                                # End /* .. */ comment
      |  
         //                               # Start // comment
         (?: [^\\] | \\ \n? )*?           # Possible line-continuation
         \n                               # End // comment
    )                                # (1 end)
 |  
    (                                # (2 start), Non - comments 
         "
         (?: \\ [\S\s] | [^"\\] )*        # Double quoted text
         "
      |  '
         (?: \\ [\S\s] | [^'\\] )*        # Single quoted text
         ' 
      |  [\S\s]                           # Any other char
         [^/"'\\]*                        # Chars which doesn't start a comment, string, escape,
                                          # or line continuation (escape + newline)
    )                                # (2 end)
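
To make the "replace with $2" idea concrete, here is a minimal Python sketch using the simpler, assertion-free pattern above. The names STRIP_COMMENTS and strip_comments and the sample input are illustrative assumptions, not part of the answer. Group 1 captures comments and group 2 captures everything else, so substituting each match with group 2 drops comments while leaving quoted text that merely looks like a comment untouched.

```python
import re

# Simplified pattern from the answer, laid out in verbose mode.
STRIP_COMMENTS = re.compile(r'''
    (                                            # (1) comments
        /\* [^*]* \*+ (?: [^/*] [^*]* \*+ )* /   # block comment
      | // (?: [^\\] | \\ \n? )*? \n             # line comment, newline included
    )
  | (                                            # (2) non-comments
        " (?: \\ [\S\s] | [^"\\] )* "            # double-quoted text
      | ' (?: \\ [\S\s] | [^'\\] )* '            # single-quoted text
      | [\S\s] [^/"'\\]*                         # any other run of text
    )
''', re.VERBOSE)


def strip_comments(source: str) -> str:
    # Keep group 2; when a comment matched, group 2 is unset and we emit "".
    return STRIP_COMMENTS.sub(lambda m: m.group(2) or "", source)


sample = 'x = "/* not a comment */"; /* real */ y = 1; // tail\nz'
cleaned = strip_comments(sample)
print(cleaned)
```

Because the line-comment branch requires a trailing newline, a // comment on the very last line of a file without a final newline is left in place by this sketch.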

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/51366120
