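I'm working on a pure-Ruby implementation of a parser for Xcode project files (PBXProject) and need a little help with a regular expression.
PBXProject files contain a bunch of odd, hash-like lines with mixed content. What I have now is the regex (.*?) = (.*?)( \/\* (.*) \*\/)?; ? which handles the simpler case (the first line below). On the second line, however, it stops too early (at the first ; character).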
isa = PBXBuildFile; fileRef = C0480C2015F4F91F00E0A2F4 /* zip.c */;
isa = PBXBuildFile; fileRef = C0480C2315F4F91F00E0A2F4 /* ZipArchive.mm */; settings = {COMPILER_FLAGS = "-fno-objc-arc"; };
So what I want are simple name = value pairs, i.e.
isa = PBXBuildFile
settings = {COMPILER_FLAGS = "-fno-objc-arc"; }
Is there a simple way to do this with a single regular expression?
Posted on 2012-09-07 19:33:26
This regex works well:
[a-zA-Z0-9]*\s*?=\s*?.*?(?:{[^}]*}|(?=;))
Note that only one level of braces is supported; the regex will not handle nested braces.
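As a rough illustration, the pattern can be applied in Ruby with String#scan. This is only a sketch: the constant name PAIR_RE and the variable names are made up for the example, and the braces are backslash-escaped here purely to keep the literal unambiguous.

```ruby
# The pattern from this answer; PAIR_RE is a hypothetical name.
# Braces are escaped so they cannot be read as a quantifier.
PAIR_RE = /[a-zA-Z0-9]*\s*?=\s*?.*?(?:\{[^}]*\}|(?=;))/

# Second sample line from the question.
line = 'isa = PBXBuildFile; fileRef = C0480C2315F4F91F00E0A2F4 /* ZipArchive.mm */; ' \
       'settings = {COMPILER_FLAGS = "-fno-objc-arc"; };'

# The pattern has no capture groups, so scan returns an array of
# whole-match strings, left to right.
pairs = line.scan(PAIR_RE)

# Split each match at the first "=" only, so "=" inside a braced value is kept.
fields = pairs.map { |pair| pair.split(/\s*=\s*/, 2) }.to_h

pairs.each { |pair| puts pair }
```

Because the terminating ; is only checked with a lookahead, it is not part of any match, and the braced settings value stays in one piece.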
From your example, the following lines will be captured:
isa = PBXBuildFile
fileRef = C0480C2015F4F91F00E0A2F4 /* zip.c */
isa = PBXBuildFile
fileRef = C0480C2315F4F91F00E0A2F4 /* ZipArchive.mm */
settings = {COMPILER_FLAGS = "-fno-objc-arc"; }
Here is an explanation of the regex:
[a-zA-Z0-9]*\s*?=\s*?.*?(?:{[^}]*}|(?=;))
Options: ^ and $ match at line breaks
Match a single character present in the list below «[a-zA-Z0-9]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
A character in the range between “a” and “z” «a-z»
A character in the range between “A” and “Z” «A-Z»
A character in the range between “0” and “9” «0-9»
Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “=” literally «=»
Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match any single character that is not a line break character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the regular expression below «(?:{[^}]*}|(?=;))»
Match either the regular expression below (attempting the next alternative only if this one fails) «{[^}]*}»
Match the character “{” literally «{»
Match any character that is NOT a “}” «[^}]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “}” literally «}»
Or match regular expression number 2 below (the entire group fails if this one fails to match) «(?=;)»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=;)»
Match the character “;” literally «;»
Posted on 2012-09-07 18:49:08
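Depending on the exact nature of what you want to parse, it may not be possible with a single regular expression. The second line you are having trouble with suggests that nested patterns may be involved. A regex can only match nesting up to a fixed depth, which is one of the reasons parsing XHTML with regexes is discouraged. If you really need to handle nesting of arbitrary depth, you may want to look at something like Treetop.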
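To make the nesting point concrete, here is a small hand-rolled sketch (not Treetop, and not a full PBXProject grammar) that splits on ; only when it is outside braces and outside double quotes. The method name split_statements is hypothetical, and the sketch assumes there are no escaped quotes and no ; inside /* */ comments.

```ruby
# Split text into top-level "name = value" statements.
# A ';' ends a statement only at brace depth 0 and outside double quotes.
# Assumptions: no escaped quotes, no ';' inside /* ... */ comments.
def split_statements(text)
  statements = []
  current = String.new
  depth = 0
  in_quotes = false

  text.each_char do |ch|
    if ch == '"'
      in_quotes = !in_quotes
    elsif !in_quotes
      depth += 1 if ch == '{'
      depth -= 1 if ch == '}'
    end

    if ch == ';' && depth.zero? && !in_quotes
      # End of a top-level statement: store it without the ';'.
      statements << current.strip unless current.strip.empty?
      current = String.new
    else
      current << ch
    end
  end

  statements << current.strip unless current.strip.empty?
  statements
end

# A made-up input with two levels of braces and a ';' inside a string.
nested = 'isa = PBXBuildFile; settings = {A = {B = "x;y"; }; C = 1; };'
result = split_statements(nested)
result.each { |statement| puts statement }
```

A counter like this handles any depth because it tracks state explicitly, which is exactly what a plain regular expression cannot do.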
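If you don't need it to be robust, you could try an expression like this:
/((?i)(?:[^;]+=\s*\{.*?\})|[^;]+=[^;]+);/
This first tries to match content of the form something = {anything}; if that fails, it matches something = something up to the next ;. You should be able to use string.scan(/regex/) to find all matches in a given string. Handling blocks this way avoids problems such as ending the match too early, and makes it easy to extract the pairs.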
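A minimal sketch of that scan call on the question's second sample line might look like this. The names STATEMENT_RE, raw and settings are invented for the example; the stripping step is an addition, needed because the captured text keeps the space that follows each ;.

```ruby
# The expression from this answer; STATEMENT_RE is a hypothetical name.
STATEMENT_RE = /((?i)(?:[^;]+=\s*\{.*?\})|[^;]+=[^;]+);/

line = 'isa = PBXBuildFile; fileRef = C0480C2315F4F91F00E0A2F4 /* ZipArchive.mm */; ' \
       'settings = {COMPILER_FLAGS = "-fno-objc-arc"; };'

# With one capture group, scan returns an array of one-element arrays,
# so flatten it and trim the whitespace left over after each ';'.
raw = line.scan(STATEMENT_RE).flatten.map(&:strip)

# Split at the first "=" only, keeping any "=" inside a braced value.
settings = raw.map { |statement| statement.split(/\s*=\s*/, 2) }.to_h

settings.each { |name, value| puts "#{name} -> #{value}" }
```

Note that .*? stops at the first closing brace, so this has the same one-level limit on braces as any other single-regex approach.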
Further reading:
https://stackoverflow.com/questions/12314482