Something along these lines (I haven't tested it; it's just a suggestion):

<pre><code>Regex r = new Regex(@"Idiom:([^\n]+)\n([^o]+)(o([^o]+)o)*");
</code></pre>

Something like this should work. I haven't tested it, but with a little debugging I expect it will do the job.

I know you tagged the question <code>regex</code>, but this is another way to extract the lines.

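<pre><code>// Requires: using System.Collections.Generic; using System.IO;
using ( var textReader = new StreamReader("idioms.txt") )
{
    var idioms = new List&lt;Idiom&gt;();
    string line = textReader.ReadLine();
    while ( line != null )
    {
        if ( !line.StartsWith("idiom: ") )
        {
            // Not the start of an entry (for example a blank line): skip it.
            line = textReader.ReadLine();
            continue;
        }

        var idiom = new Idiom { IdiomExamples = new List&lt;IdiomExample&gt;() };
        // Substring strips only the prefix; Replace would also remove later occurrences.
        idiom.Meaning = line.Substring("idiom: ".Length);
        // Assumes every entry has a description line right after the first line.
        idiom.Description = textReader.ReadLine();

        // Collect the example lines. The first non-example line stays in `line`,
        // so the outer loop can inspect it instead of losing it.
        while ( ( line = textReader.ReadLine() ) != null &amp;&amp; line.StartsWith("o ") )
            idiom.IdiomExamples.Add(new IdiomExample { Item = line.Substring(2) });

        idioms.Add(idiom);
    }

    // idioms ready
}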
</code></pre>

Here is my regex for your problem: 

<pre><code>(?&lt;section&gt;(?&lt;idiom&gt;^.+?):(?&lt;meaning&gt;.+)[\n](?&lt;description&gt;.*?)(?&lt;examples&gt;(?&lt;example&gt;o.+[\s\r\n])+))
</code></pre>

I tested it a little, but I think you will have to fix a few small problems. In general, it works well.

Settings for this regex: 

<pre><code>RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace | RegexOptions.CultureInvariant
</code></pre>
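As a rough illustration of how the pattern and options above might be applied, here is a minimal sketch. The pattern and options are copied from this answer; the names `ParsedIdiom`, `IdiomRegexSketch`, and `Extract` are hypothetical and not part of the question's code, and the sketch assumes `\n` line endings with every example line terminated by a newline.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical holder for one matched section (not the Idiom class from the question).
public sealed class ParsedIdiom
{
    public string Name { get; set; }
    public string Meaning { get; set; }
    public List<string> Examples { get; } = new List<string>();
}

public static class IdiomRegexSketch
{
    // Pattern and options copied from the answer above.
    private const string Pattern =
        @"(?<section>(?<idiom>^.+?):(?<meaning>.+)[\n](?<description>.*?)(?<examples>(?<example>o.+[\s\r\n])+))";

    private const RegexOptions Options =
        RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.ExplicitCapture |
        RegexOptions.IgnorePatternWhitespace | RegexOptions.CultureInvariant;

    public static List<ParsedIdiom> Extract(string text)
    {
        var result = new List<ParsedIdiom>();
        foreach (Match match in Regex.Matches(text, Pattern, Options))
        {
            var parsed = new ParsedIdiom
            {
                Name = match.Groups["idiom"].Value.Trim(),
                Meaning = match.Groups["meaning"].Value.Trim()
            };

            // A group inside a repetition keeps each repetition in its Captures collection.
            foreach (Capture capture in match.Groups["example"].Captures)
            {
                // Drop the leading 'o' marker and the surrounding whitespace.
                parsed.Examples.Add(capture.Value.Substring(1).Trim());
            }

            result.Add(parsed);
        }
        return result;
    }
}
```

Reading the repeated `example` group through `Captures` is what yields one entry per example line; `Groups["example"].Value` alone would return only the last repetition.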

Well, you have three ways to work with your file. The first is to use a regex: it is the quickest to develop and the slowest to run. The second is to read the text into strings and process them with LINQ or whatever you like. To me this approach is bug-prone, hard to scale, and so on, but it performs better, which can be critical if you deal with very large files. The third is to use formal grammars and state machines or something like that. I have never implemented anything of the kind, but I know it is fast and very hard to develop and maintain, so I recommend starting with regexes and migrating to another approach only if performance becomes your bottleneck.

Hope this helps!

Your example has no description, but this regexp accepts an optional description. It gives you an idea of how to parse your input; it is not the whole C# code.

See <a href="http://regex101.com/r/wY1mJ0" rel="nofollow">this demo</a> and look at the groups.

<pre><code>(?smx)
^ 
([^:\n]+):\s*([^\n]+)
\n([^o].*?\n|)
(^o.*?)
(?=\Z|^[^o:\n]+:)
</code></pre>

After this: 

<ol>
<li>Group 1 contains the idiom</li>
<li>Group 2 contains the meaning</li>
<li>Group 3 contains the description, if present</li>
<li>Group 4 contains all the examples</li>
</ol>

This regex does not split your examples into separate items; that is the next job. You may also want to trim some unwanted newlines.
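That follow-up step could look roughly like the sketch below. It assumes the text captured by group 4 has one example per line, each starting with `o ` as in the question; `ExampleSplitter` and `Split` are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ExampleSplitter
{
    // Splits the raw examples block (the text captured by group 4) into separate examples.
    public static List<string> Split(string examplesBlock)
    {
        var examples = new List<string>();
        foreach (var rawLine in examplesBlock.Split('\n'))
        {
            // Trim removes a trailing '\r' and any stray spaces around the line.
            var line = rawLine.Trim();

            // Keep only lines that carry the "o " example marker; blank lines are skipped.
            if (line.StartsWith("o ", StringComparison.Ordinal))
            {
                examples.Add(line.Substring(2).Trim());
            }
        }
        return examples;
    }
}
```

Trimming each line here also takes care of the leftover newlines mentioned above.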

I have a text file whose content looks something like this:

<pre><code>idiom: meaning
description.
o example1.
o example2.

idiom: meaning
description.
o example1.
o example2.

.
.
.
</code></pre>

As you can see, the file consists of paragraphs like the ones above, and each paragraph contains data that I want to extract (note that examples start with <code>o</code>). For example, I have these classes to hold the data:

<pre><code>public class Idiom
{
 public string Idiom { get; set; }
 public string Meaning { get; set; }
 public string Description { get; set; }
 public IList&lt;IdiomExample&gt; IdiomExamples { get; set; }
}

public class IdiomExample
{
 public string Item { get; set; }
}
</code></pre>

Is there any way to extract those fields from the file? Any ideas?

Edit:
The file could contain anything (idioms, verbs, ...); those are just examples. Here is a sample that follows my pattern:

<pre><code>little by little: gradually, slowly (also: step by step)
o Karen's health seems to be improving little by little.
o If you study regularly each day, step by step your vocabulary will increase.
to tire out: to make very weary due to difficult conditions or hard effort (also: to wear out) (S)
o The hot weather tired out the runners in the marathon.
o Does studying for final exams wear you out? It makes me feel worn out!
</code></pre>

Thanks in advance

C# : parsing text file
