
Here is the solution.

<pre class="lang-py prettyprint-override"><code>import re

content = "&lt;abc&gt;d&lt;e&gt;&lt;f&gt;ghi&lt;j&gt;"
result = re.findall(r"&lt;.*?&gt;|[^&lt;&gt;]+", content)

print(result)
</code></pre>

Output:

<pre class="lang-py prettyprint-override"><code>['&lt;abc&gt;', 'd', '&lt;e&gt;', '&lt;f&gt;', 'ghi', '&lt;j&gt;']
</code></pre>

Explanations:

<ul>
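<li>regex <code>&lt;.*?&gt;</code> matches a <code>&lt;</code>, then as few characters as possible up to the next <code>&gt;</code>, i.e. anything of the form <code>&lt;content&gt;</code></li>
<li>regex <code>[^&lt;&gt;]+</code> matches everything else: a run of one or more characters that are neither <code>&lt;</code> nor <code>&gt;</code></li>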
</ul>

In brief, at each position <code>findall</code> first tries to match <code>&lt;content&gt;</code> and, if that fails, falls back to the run of non-bracket text. That way, the content is split without losing the separators.
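As a quick check of the explanation above, here is a minimal sketch that wraps the same pattern in a helper and verifies that the pieces join back into the input. The names <code>TOKEN_RE</code> and <code>split_keep_tags</code> are hypothetical, not part of the answer, and the round trip only holds when the input has no stray <code>&lt;</code> or <code>&gt;</code>.

```python
import re

# Same pattern as above: a <...> tag, or a run of non-bracket characters.
TOKEN_RE = re.compile(r"<.*?>|[^<>]+")


def split_keep_tags(text):
    """Return tags and the text between them, in order (hypothetical helper)."""
    return TOKEN_RE.findall(text)


content = "<abc>d<e><f>ghi<j>"
pieces = split_keep_tags(content)
print(pieces)

# For well-formed input nothing is lost: the pieces join back to the original.
print("".join(pieces) == content)
```

Comparing the joined pieces against the input is a cheap way to detect that something was silently skipped.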


I believe you can use <code>split</code> with this regex:

<pre><code>(?&lt;=&gt;)(?=[a-z&lt;])|(?&lt;=[a-z&gt;])(?=&lt;)
</code></pre>

<a href="https://regex101.com/r/WNy5n9/1" rel="nofollow noreferrer">https://regex101.com/r/WNy5n9/1</a> 

It's nothing more than two alternatives, each a paired lookbehind/lookahead assertion, so every match is zero-width.

Expanded 

<pre><code> (?&lt;= &gt; ) # Behind a &gt;
 (?= [a-z&lt;] ) # Ahead either a-z or &lt;
| # or,
 (?&lt;= [a-z&gt;] ) # Behind either a-z or &gt;
 (?= &lt; ) # Ahead a &lt;
</code></pre>
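The expanded layout above can be used directly in Python by compiling with <code>re.VERBOSE</code>, which ignores whitespace and <code>#</code> comments outside character classes. The sketch below is an illustration rather than part of the original answer; the variable names are assumptions. It checks that the verbose and compact forms find the same zero-width positions.

```python
import re

# The expanded pattern, compiled with re.VERBOSE so that the layout and
# the comments are ignored by the regex engine.
rx_verbose = re.compile(r"""
    (?<= > )          # behind: a >
    (?= [a-z<] )      # ahead: a-z or <
  |                   # or
    (?<= [a-z>] )     # behind: a-z or >
    (?= < )           # ahead: a <
""", re.VERBOSE)

rx_compact = re.compile(r"(?<=>)(?=[a-z<])|(?<=[a-z>])(?=<)")

s = "<abc>d<e><f>ghi<j>test><g>"

# Both forms should report the same zero-width split positions.
positions_verbose = [m.start() for m in rx_verbose.finditer(s)]
positions_compact = [m.start() for m in rx_compact.finditer(s)]
print(positions_verbose == positions_compact)
```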

Update 
Note that in versions of Python prior to 3.7, <code>re.split</code> did not split 
on an empty (zero-width) match, so this pattern does not work there. 
Presumably the implementation could not tell an empty match from an empty 
string, or did not know how to bump along after a zero-width match. 

This was fixed in version 3.7, 
so here you go.

Demo 

Version 3.7.3 

<pre><code>&gt;&gt;&gt; import sys
&gt;&gt;&gt; print( sys.version )
3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Intel)]
</code></pre>

Code 

<pre><code>&gt;&gt;&gt; import re
&gt;&gt;&gt; rx = re.compile( r"(?&lt;=&gt;)(?=[a-z&lt;])|(?&lt;=[a-z&gt;])(?=&lt;)" )
&gt;&gt;&gt; s = "&lt;abc&gt;d&lt;e&gt;&lt;f&gt;ghi&lt;j&gt;test&gt;&lt;g&gt;"
&gt;&gt;&gt; x = re.split( rx, s )
&gt;&gt;&gt; print ( x )
['&lt;abc&gt;', 'd', '&lt;e&gt;', '&lt;f&gt;', 'ghi', '&lt;j&gt;', 'test&gt;', '&lt;g&gt;']
</code></pre>
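If <code>re.split</code> on empty matches is not available, one possible workaround is to collect the match positions with <code>finditer</code> and slice the string by hand. This is a sketch under that assumption, not something the answer proposes: <code>split_at_matches</code> is a hypothetical helper, and it was not tested here on interpreters older than 3.7.

```python
import re

rx = re.compile(r"(?<=>)(?=[a-z<])|(?<=[a-z>])(?=<)")


def split_at_matches(pattern, text):
    """Cut text at every match of pattern (hypothetical helper).

    Uses finditer and slicing instead of re.split, so it does not rely
    on how re.split treats empty matches.
    """
    pieces = []
    start = 0
    for m in pattern.finditer(text):
        pieces.append(text[start:m.start()])
        start = m.end()
    pieces.append(text[start:])  # whatever follows the last match
    return pieces


s = "<abc>d<e><f>ghi<j>test><g>"
print(split_at_matches(rx, s))
```

On Python 3.7 and later this should give the same list as the <code>re.split</code> call in the demo.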


In the proposed solution, a single opening <code>&lt;</code> or closing <code>&gt;</code> that is not part of a <code>&lt;...&gt;</code> pair is excluded from the result.

If you also want to keep a stray <code>&lt;</code> or <code>&gt;</code>, you could use:

<pre><code>&lt;[^&lt;&gt;]*&gt;|(?:(?!&lt;[^&lt;&gt;]*&gt;).)+
</code></pre>

Explanation

<ul>
<li><code>&lt;[^&lt;&gt;]*&gt;</code> Match an opening <code>&lt;</code>, then 0+ characters that are neither <code>&lt;</code> nor <code>&gt;</code>, then a closing <code>&gt;</code></li>
<li><code>|</code> Or</li>
<li><code>(?:(?!&lt;[^&lt;&gt;]*&gt;).)+</code> Tempered greedy token: match any character (except a newline), one at a time, as long as a complete <code>&lt;...&gt;</code> pattern does not start at that position</li>
</ul>

<a href="https://regex101.com/r/Nc4Jwt/1" rel="nofollow noreferrer">Regex demo</a> | <a href="https://ideone.com/fxIic3" rel="nofollow noreferrer">Python demo</a>

For example:

<pre><code>import re
content = "&lt;abc&gt;d&lt;e&gt;&lt;f&gt;ghi&lt;j&gt;test&gt;&lt;g&gt;"
result = re.findall(r"&lt;[^&lt;&gt;]*&gt;|(?:(?!&lt;[^&lt;&gt;]*&gt;).)+", content)
print(result)
</code></pre>

Result

<pre><code>['&lt;abc&gt;', 'd', '&lt;e&gt;', '&lt;f&gt;', 'ghi', '&lt;j&gt;', 'test&gt;', '&lt;g&gt;']
</code></pre>
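To make the difference from the simpler <code>&lt;.*?&gt;|[^&lt;&gt;]+</code> pattern concrete, the sketch below runs both on the same input and checks which result joins back into the original string. The variable names are illustrative assumptions.

```python
import re

content = "<abc>d<e><f>ghi<j>test><g>"

simple = re.findall(r"<.*?>|[^<>]+", content)
tempered = re.findall(r"<[^<>]*>|(?:(?!<[^<>]*>).)+", content)

print(simple)    # the stray > after "test" is dropped
print(tempered)  # the stray > stays attached to "test"

# Only the tempered pattern reproduces the input when the pieces are joined.
print("".join(simple) == content)
print("".join(tempered) == content)
```

The tempered pattern tests a lookahead at every character, so it does more work per character than the simple one.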

@edzech asked how to split a string and keep the separators in it. His question was <a href="https://stackoverflow.com/questions/55875494/splitting-text-without-losing-separator">marked as duplicate</a>, although the approach here is different from the one in the "duplicate". 

We want to split a string while keeping the delimiters attached to their content rather than separated from it.
In brief, for <code>&lt;abc&gt;d&lt;e&gt;&lt;f&gt;ghi&lt;j&gt;</code>, we want:

<pre class="lang-py prettyprint-override"><code>['&lt;abc&gt;', 'd', '&lt;e&gt;', '&lt;f&gt;', 'ghi', '&lt;j&gt;']
</code></pre>

instead of:

<pre class="lang-py prettyprint-override"><code>['&lt;', 'abc', '&gt;', 'd', '&lt;', 'e', '&gt;', '&lt;', 'f', '&gt;', 'ghi', '&lt;', 'j', '&gt;']
</code></pre>

Using <code>split</code> does not help, since it cuts the string at each separator. We want each separator to stay attached to its content.
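The question does not show the code that produced the unwanted list, so the following is only a guess at how it arises. It assumes a plain <code>str.split</code> on one hand and a <code>re.split</code> with a capturing group on the other; the variable names are hypothetical.

```python
import re

content = "<abc>d<e><f>ghi<j>"

# Plain split on one bracket: the separator is discarded entirely.
print(content.split(">"))

# A capturing group makes re.split keep the brackets, but as separate items.
# Empty strings produced between adjacent brackets are filtered out.
detached = [piece for piece in re.split(r"([<>])", content) if piece]
print(detached)
```

Neither call leaves the brackets attached to the text they enclose.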

How to split a string and keep the separators in it
