
As you commented, if you want to extract <code>"&lt;a class="timetable work" href="test.com/"; and "?tag=meta376"&gt;Test&lt;/a&gt;"</code> you can use the following regex:

<pre><code>&lt;a class="timetable.*?&lt;\/a&gt;
</code></pre>

<a href="http://regex101.com/r/qI5yH9/1" rel="nofollow">Working demo</a>

If you want to grab the content, just wrap the regex in a capturing group:

<pre><code>(&lt;a class="timetable.*?&lt;\/a&gt;)
</code></pre>

The match is:

<pre><code>MATCH 1
1. [9-80] `&lt;a class="timetable work" href="test.com/"; and "?tag=meta376"&gt;Test&lt;/a&gt;`
</code></pre>
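The lazy quantifier <code>.*?</code> is a PCRE feature; POSIX basic and extended regular expressions do not have it. As a rough sketch for plain <code>grep</code>, a negated character class can stand in for it. The sample line and the function name <code>extract_anchor</code> below are made up for illustration, and <code>grep -o</code> is a common extension (GNU and BSD) rather than part of POSIX.

```shell
#!/bin/sh
# Hypothetical sample line: the anchor sits between other markup.
line='<p>x</p><a class="timetable work" href="http://www.test.com/pagename?tag=meta376">Test</a><b>y</b>'

# extract_anchor: print only the matching <a ...>...</a> element.
# [^<]* replaces the lazy .*? here: it cannot run past the next "<",
# so the match stops at the first </a> after the opening tag.
extract_anchor() {
    printf '%s\n' "$1" | grep -o '<a class="timetable[^<]*</a>'
}

extract_anchor "$line"
```

This only works while the link text contains no nested tags, since <code>[^&lt;]*</code> stops at the first <code>&lt;</code> it meets.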


I think this is what you want:

<pre><code>sed 's_^.*&lt;a [^&lt;&gt;]* href="https*://[^/]*/\([^"?]*\).*$_\1_'
</code></pre>
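To see how this kind of substitution behaves when the anchor shares a line with other content, here is a small sketch with the capture group written as <code>\(...\)</code>. The sample line and the function name <code>extract_pagename</code> are assumptions for the demo, not part of the question.

```shell
#!/bin/sh
# Hypothetical input: the anchor is embedded in a longer line.
line='<td>x</td><a class="timetable work" href="http://www.test.com/pagename?tag=meta376">Test</a> more text'

# extract_pagename: replace the whole line with the first path segment.
#   ^.*<a [^<>]* href="   everything up to the href attribute
#   https*://[^/]*/       scheme and host, up to the first slash
#   \([^"?]*\)            group 1: the path, up to a quote or "?"
#   .*$                   the rest of the line, discarded
extract_pagename() {
    printf '%s\n' "$1" |
        sed 's_^.*<a [^<>]* href="https*://[^/]*/\([^"?]*\).*$_\1_'
}

extract_pagename "$line"   # prints: pagename
```

Using <code>_</code> as the delimiter avoids having to escape the slashes in the URL. Note that this pattern does not check the class attribute, so it would match any anchor with an <code>href</code> on the line.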


Giving you exactly what you asked for using exactly the delimiters you told us to use:

<pre><code>$ sed -n 's|.*&lt;a class="timetable work" href="http://www\.test\.com/\(.*\)?tag=meta376"&gt;Test&lt;/a&gt;|\1|p' file
pagename
</code></pre>
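The <code>-n</code> option together with the <code>p</code> flag means only lines where the substitution succeeded are printed. A minimal sketch of that behaviour on a few made-up lines follows; the function name <code>print_pagenames</code> and the sample input are hypothetical, and the trailing <code>.*</code> is my addition so that anything after <code>&lt;/a&gt;</code> on the same line is dropped too.

```shell
#!/bin/sh
# print_pagenames: read HTML on stdin, print the captured path segment
# for each line containing the exact anchor, and nothing for other lines.
# In a basic regular expression "?" is a literal character, so "?tag" needs no escape.
print_pagenames() {
    sed -n 's|.*<a class="timetable work" href="http://www\.test\.com/\(.*\)?tag=meta376">Test</a>.*|\1|p'
}

# Three hypothetical lines; only the second one matches.
printf '%s\n' \
    '<h1>Timetable</h1>' \
    '<li><a class="timetable work" href="http://www.test.com/pagename?tag=meta376">Test</a></li>' \
    '<li><a class="other" href="http://www.test.com/elsewhere">Skip</a></li>' |
    print_pagenames   # prints: pagename
```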


I know it may be tempting to handle this using a regular expression but here's an alternative.

You are trying to parse some HTML, so use an HTML parser. Here's an example in Perl:

<pre><code>use strict;
use warnings;
use feature qw(say);

use HTML::TokeParser::Simple;
use URI::URL;

my $filename = 'file.html'; 
my $parser = HTML::TokeParser::Simple-&gt;new($filename);

while (my $anchor = $parser-&gt;get_tag('a')) {
    next unless defined(my $class = $anchor-&gt;get_attr('class'));
    next unless $class =~ /\btimetable\b/ and $class =~ /\bwork\b/;
    my $url = url $anchor-&gt;get_attr('href');
    say substr($url-&gt;path, 1);
}
</code></pre>

Parse the HTML using <a href="http://search.cpan.org/~gaas/HTML-Parser-3.71/lib/HTML/TokeParser.pm" rel="nofollow"><code>HTML::TokeParser::Simple</code></a>. Loop through the <code>&lt;a&gt;</code> tags, skipping any that don't have both classes. For the ones that do, use <a href="http://search.cpan.org/~rse/lcwa-1.0.0/lib/lwp/lib/URI/URL.pm" rel="nofollow"><code>URI::URL</code></a> to parse the URL and extract the "path" component (which, in your case, would be "/pagename"). As you didn't want the leading slash, I used <code>substr</code> to remove the first character.

Output:

<pre><code>pagename
</code></pre>

I know it's much longer than a single regex but it's also a lot more robust and will continue to work even when the format of your HTML changes slightly in the future. HTML parsers exist for a reason :)


I would use <code>awk</code> for this:

<pre><code>awk -F"[/?]" '/timetable work/ {print $4}' file
pagename
</code></pre>

It searches for a line containing <code>timetable work</code>, then prints the fourth field, using <code>/</code> or <code>?</code> as the field separator.
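To see why the fourth field is the one wanted, it can help to print every field <code>awk</code> produces for the sample line. This is only a sketch: the function names <code>show_fields</code> and <code>fourth_field</code> are made up, and the input is fed on standard input instead of from <code>file</code>.

```shell
#!/bin/sh
# The sample anchor from the question, on a line of its own.
line='<a class="timetable work" href="http://www.test.com/pagename?tag=meta376">Test</a>'

# show_fields: number and print each field when splitting on "/" or "?".
show_fields() {
    printf '%s\n' "$1" |
        awk -F'[/?]' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}

# fourth_field: the same awk program as in the answer, reading stdin.
fourth_field() {
    printf '%s\n' "$1" | awk -F'[/?]' '/timetable work/ { print $4 }'
}

show_fields "$line"    # field 2 is empty (between the two slashes of "//")
fourth_field "$line"   # prints: pagename
```

The field number depends on how many <code>/</code> and <code>?</code> characters come before the URL path, so any earlier markup on the same line (a closing tag such as <code>&lt;/td&gt;</code>, for instance) shifts the count.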

I am trying to extract "pagename" from the following:

<pre><code>&lt;a class="timetable work" href="http://www.test.com/pagename?tag=meta376"&gt;Test&lt;/a&gt;
</code></pre>

I tried to get it to work using <code>sed</code>, but it only reports "invalid command code".

What line of code would you suggest to get the pagename? By the way, the tag is not on a line of its own; there is more content on the same line. That should not make a difference, though, since only what is between the delimiters should matter, right?

Thanks in advance for helping me out!

sed/grep - get text between two strings (html)
