I agree with Gordon, you MUST use an HTML parser to parse HTML. But if you really want a regex, you can try this one:

<pre><code>/^&lt;a.*?href=(["\'])(.*?)\1.*$/
</code></pre>

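This matches <code>&lt;a</code> at the beginning of the string, followed by any number of any character (non-greedy, <code>.*?</code>), then <code>href=</code>, followed by the link surrounded by either <code>"</code> or <code>'</code>. The backreference <code>\1</code> requires the closing quote to be the same character as the opening one.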

<pre><code>$str = '&lt;a title="this" href="that"&gt;what?&lt;/a&gt;';
preg_match('/^&lt;a.*?href=(["\'])(.*?)\1.*$/', $str, $m);
var_dump($m);
</code></pre>

Output:

<pre><code>array(3) {
 [0]=&gt;
 string(37) "&lt;a title="this" href="that"&gt;what?&lt;/a&gt;"
 [1]=&gt;
 string(1) """
 [2]=&gt;
 string(4) "that"
}
</code></pre>
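Because the pattern above is anchored with <code>^</code> and <code>$</code>, it only handles a string that is a single link. A minimal sketch of the same idea applied to a larger fragment, assuming you want every href in it; the sample <code>$html</code> value and the unanchored variant of the pattern are illustrative assumptions, not part of the original answer:

```php
<?php
// Hypothetical sample input; not from the original answer.
$html = '<p><a title="this" href="that">what?</a> and <a href=\'other\'>more</a></p>';

// Same approach, but without ^ and $ so the pattern can match anywhere.
// Group 1 is the opening quote, group 2 is the href value,
// and \1 requires the closing quote to match the opening one.
preg_match_all('/<a\s[^>]*?href=(["\'])(.*?)\1/i', $html, $m);

// $m[2] holds the captured href values, one per matched link.
print_r($m[2]);
```

This is still a regex, so it inherits the usual fragility on unusual markup (unquoted attributes, comments, and so on).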

The pattern you want to look for would be the link anchor pattern, something like:

<pre><code>$regex_pattern = "/&lt;a href=\"(.*)\"&gt;(.*)&lt;\/a&gt;/";
</code></pre>
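A quick check of what this pattern accepts, as a sketch: it expects <code>href</code> to come immediately after <code>&lt;a </code>, so it matches a plain link but not the tag from the question. The two sample strings are illustrative.

```php
<?php
$regex_pattern = "/<a href=\"(.*)\">(.*)<\/a>/";

// href is the first attribute: the pattern matches.
$first = preg_match($regex_pattern, '<a href="that">what?</a>', $m1);

// href comes after another attribute: the literal "<a href" is never found.
$second = preg_match($regex_pattern, '<a title="this" href="that">what?</a>', $m2);

var_dump($first, $second);
```

Here <code>$m1[1]</code> holds the href value and <code>$m1[2]</code> the link text, while <code>$m2</code> stays empty.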

Why don't you just match:

<pre><code>"&lt;a.*?href\s*=\s*['"](.*?)['"]"

&lt;?php

$str = '&lt;a title="this" href="that"&gt;what?&lt;/a&gt;';

$res = array();

preg_match_all("/&lt;a.*?href\s*=\s*['\"](.*?)['\"]/", $str, $res);

var_dump($res);

?&gt;
</code></pre>

then

<pre><code>$ php test.php
array(2) {
 [0]=&gt;
 array(1) {
 [0]=&gt;
 string(27) "&lt;a title="this" href="that""
 }
 [1]=&gt;
 array(1) {
 [0]=&gt;
 string(4) "that"
 }
}
</code></pre>

This works. I've just removed the first capture group (the parentheses around the quote character).

For anyone who still hasn't found a solution: this is very easy and fast using SimpleXML.

<pre><code>$a = new SimpleXMLElement('&lt;a href="www.something.com"&gt;Click here&lt;/a&gt;');
echo $a['href']; // will echo www.something.com
</code></pre>

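It's working for me. Note that <code>SimpleXMLElement</code> requires well-formed XML, so this only works when the markup is also valid XML.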

Quick test: <code>&lt;a\s+[^&gt;]*href=(\"\'??)([^\1]+)(?:\1)&gt;(.*)&lt;\/a&gt;</code> seems to do the trick, with the first group being the quote character (" or '), the second the href value (that), and the third the link text (what?).

The reason I left the first group (the " or ') in there is that you can backreference it later for the closing quote, so that both quotes are the same character.

See live example on: <a href="http://www.rubular.com/r/jsKyK2b6do" rel="nofollow">http://www.rubular.com/r/jsKyK2b6do</a>
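One caveat when porting this to PHP: in PCRE, <code>\1</code> inside a character class such as <code>[^\1]</code> is not treated as a backreference. A possible variant, offered only as a sketch and not as the pattern tested on Rubular above, uses a lazy quantifier followed by <code>\1</code> to stop at the matching closing quote:

```php
<?php
$str = '<a title="this" href="that">what?</a>';

// Group 1: quote character, group 2: href value, group 3: link text.
// (.*?)\1 reads "as little as possible, up to the same quote that opened the value".
preg_match('/<a\s+[^>]*?href=(["\'])(.*?)\1[^>]*>(.*?)<\/a>/is', $str, $m);

var_dump($m[1], $m[2], $m[3]);
```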

I'm not sure what you're trying to do here, but if you're trying to validate the link, then look at PHP's <code>filter_var()</code>.

If you really need to use a regular expression then check out this tool, it may help:
<a href="http://regex.larsolavtorvik.com/" rel="nofollow">http://regex.larsolavtorvik.com/</a>
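For the validation case, a minimal sketch of <code>filter_var()</code> with <code>FILTER_VALIDATE_URL</code>; the candidate values are made up for illustration. The function returns the filtered value on success and <code>false</code> on failure:

```php
<?php
// Hypothetical sample values for illustration.
$candidates = ['http://www.example.com/page', 'not a url'];

$results = [];
foreach ($candidates as $url) {
    // Compare strictly against false, since a valid URL is returned as a string.
    $results[$url] = filter_var($url, FILTER_VALIDATE_URL) !== false;
    echo $url, ' => ', $results[$url] ? 'valid' : 'invalid', PHP_EOL;
}
```

Keep in mind that this validates absolute URLs with a scheme, so a relative href such as <code>that</code> will not pass.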

Using your regex, I modified it a bit to suit your needs.

<code>&lt;a.*?href=("|')(.*?)("|').*?&gt;(.*)&lt;\/a&gt;</code>

I personally suggest you use an <a href="http://docs.php.net/manual/en/domdocument.loadhtml.php" rel="nofollow">HTML parser</a>.

EDIT: Tested
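For comparison, a short sketch of the parser route using <code>DOMDocument::loadHTML()</code> on the tag from the question. The variable names and the shape of the <code>$links</code> array are my own choices for illustration:

```php
<?php
$html = '<a title="this" href="that">what?</a>';

// Collect libxml warnings internally instead of printing them for sloppy markup.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

$links = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    // Attribute order and quoting style do not matter to the parser.
    $links[] = [
        'href' => $a->getAttribute('href'),
        'text' => $a->textContent,
    ];
}

print_r($links);
```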

The following works for me and returns both the <code>href</code> and the <code>value</code> of the anchor tag.

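<pre><code>$urls = array(); // collected links
// 's' lets . match newlines, 'i' makes the match case-insensitive
preg_match_all("'\&lt;a.*?href=\"(.*?)\".*?\&gt;(.*?)\&lt;\/a\&gt;'si", $html, $match);
if (!empty($match[0])) {
    foreach ($match[0] as $k =&gt; $e) {
        $urls[] = array(
            'anchor' =&gt; $e,             // the whole anchor element
            'href'   =&gt; $match[1][$k],  // the href value
            'value'  =&gt; $match[2][$k]   // the link text
        );
    }
}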
</code></pre>

The multidimensional array called <code>$urls</code> now contains associative sub-arrays that are easy to use.

I'm trying to find the links on a page.

My regex is:

<pre><code>/&lt;a\s[^&gt;]*href=(\"\'??)([^\"\' &gt;]*?)[^&gt;]*&gt;(.*)&lt;\/a&gt;/
</code></pre>

but it seems to fail on:

<pre><code>&lt;a title="this" href="that"&gt;what?&lt;/a&gt;
</code></pre>

How would I change my regex to deal with <code>href</code> not being placed first in the <code>a</code> tag?

Grabbing the href attribute of an A element
