YQL Console Link
Query:
select * from html where url='http://www.cbs.com/shows/big_brother/video/' and xpath='//div[@id="cbs-video-metadata-wrapper"]/div[@class="cbs-video-share"]/a'
Returns:
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
yahoo:count="1" yahoo:created="2011-07-09T23:14:02Z" yahoo:lang="en-US">
<diagnostics>
<publiclyCallable>true</publiclyCallable>
<url execution-time="146" proxy="DEFAULT"><![CDATA[http://www.cbs.com/shows/big_brother/video/]]></url>
<user-time>163</user-time>
<service-time>146</service-time>
<build-version>19262</build-version>
</diagnostics>
<results>
<a class="twitter-share-button" href="http://twitter.com/share"/>
</results>
</query>
It should return something like the following:
<results>
<a href="http://twitter.com/share" data-url="http://www.cbs.com/shows/big_brother/video/2045825951/big-brother-episode-1" class="twitter-share-button"></a>
</results>
If I back the query up one level, the element is stripped out entirely, even though I could also have used it to get the data I need.
Posted on 2012-11-09 06:48:14
We have a new HTML parser that now recognizes custom attributes.
Add compat="html5" to the query to trigger the new parser.
For example:
select * from html where url = "http://mydomain.com" and compat="html5"
https://stackoverflow.com/questions/6638218
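As a rough sketch of how the compat="html5" suggestion might be applied to the query from the question, the Python below builds the request URL and then pulls the data-url attribute out of a response. It is only an illustration: the endpoint constant, the function names build_yql_url and extract_data_urls, and the idea of combining xpath with compat in one query are assumptions that do not appear in the original post, and the YQL service may no longer be reachable.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: the historical public YQL endpoint; not given in the original post.
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"


def build_yql_url(page_url, xpath, compat="html5"):
    """Build a YQL request URL that asks for the newer HTML parser.

    Hypothetical helper: it assumes xpath and compat can be combined
    in the same where clause.
    """
    # The xpath is wrapped in single quotes because it contains double quotes.
    query = (
        f'select * from html where url="{page_url}" '
        f"and xpath='{xpath}' and compat=\"{compat}\""
    )
    return YQL_ENDPOINT + "?" + urlencode({"q": query})


def extract_data_urls(response_xml):
    """Return the data-url value of every <a> element that has one."""
    root = ET.fromstring(response_xml)
    return [a.get("data-url") for a in root.iter("a") if a.get("data-url")]
```

With the old parser the list would be empty for the response shown in the question, because the data-url attribute is dropped; if the new parser preserves it, the same function would return the episode URL.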