I am trying to parse an RSS feed from a link. Here is my code:
$content = file_get_contents($this->feed);
print_r($content);
$rss = new SimpleXmlElement($content);
print_r($rss);
$rss_split = array();
/* foreach ($rss->channel->item as $item) {
$title = (string) $item->title; // Title
$link = (string) $item->link; // Url Link
$description = (string) $item->description; //Description
$rss_split[] = '<div><a href="' . $link . '" target="_blank" title="" >' . $title . ' </a><hr></div>';
}*/
The full XML can be downloaded here: http://devilsworkshop.org/feed/
Here is an excerpt to illustrate the structure:
<item>
<title>Windows 8 Appstore resembles a ghost town</title>
<link>http://devilsworkshop.org/windows-appstore-resembles-ghost-town/</link>
<comments>http://devilsworkshop.org/windows-appstore-resembles-ghost-town/#comments</comments>
<pubDate>Tue, 18 Sep 2012 05:30:22 +0000</pubDate>
<dc:creator>Vibin</dc:creator>
<category><![CDATA[Analysis]]></category>
<category><![CDATA[Windows 8]]></category>
<guid isPermaLink="false">http://devilsworkshop.org/?p=62284</guid>
<description><![CDATA[<p>Microsoft is all set to release Windows 8 for public in the coming weeks. Apparently, the biggest change in Windows 8 seems to be the Metro UI (I know it’s no more called Metro, but let’s keep it like that [...]</p><p>--
This Post <a href="http://devilsworkshop.org/windows-appstore-resembles-ghost-town/">Windows 8 Appstore resembles a ghost town</a> is Published on <a href="http://devilsworkshop.org">Devils Workshop</a> .
</p><h3>Related posts:</h3><ul>
<li><a href='http://devilsworkshop.org/googles-new-look-resembles-yahoo-search/' rel='bookmark' title='Google’s new look resembles Yahoo Search'>Google’s new look resembles Yahoo Search</a></li>
</ul>]]></description>
<content:encoded><![CDATA[<p>Microsoft is all set to release Windows 8 for public in the coming weeks. Apparently, the biggest change in Windows 8 seems to be the Metro UI (I know it’s no more called Metro, but let’s keep it like that for simplicity) and apps.</p>
<ul>
<h2>Apps are less advanced</h2>
<p>Metro is great on tablets, but on desktop, it looks like an OS with dumbed down apps. Take Skitch for example, it is an app for taking and editing screenshots and was previously a Mac-only app but recently came to Windows 8. Just compare these two apps and you’ll know what I meant.</p>
<p>Here’s how Skitch looks in Windows 8:</p>
<p><a href="http://devilsworkshop.org/files/2012/09/SkitchinWindows8.png"><img style=' display: block; margin-right: auto; margin-left: auto;' class="aligncenter size-full wp-image-62302" title="SkitchinWindows8" src="http://devilsworkshop.org/files/2012/09/SkitchinWindows8.png" alt="" width="740" height="570" /></a></p>
<p>And now, this is the Mac version of Skitch:</p>
<p><a href="http://devilsworkshop.org/files/2012/09/SkitchinMac.png"><img style=' display: block; margin-right: auto; margin-left: auto;' class="aligncenter size-full wp-image-62301" title="SkitchinMac" src="http://devilsworkshop.org/files/2012/09/SkitchinMac.png" alt="" width="671" height="575" /></a></p>
<p>Another example can be Newsmix, an app which will let you read stuff that matters to you – in a Magazine layout. Apparently, this app is a fail for someone like me who subscribe to 50+ blogs.</p>
<p><a href="http://devilsworkshop.org/files/2012/09/NewsmixinWindows8.png"><img style=' display: block; margin-right: auto; margin-left: auto;' class="aligncenter size-large wp-image-62305" title="NewsMix in Windows 8" src="http://devilsworkshop.org/files/2012/09/NewsmixinWindows8-1024x640.png" alt="news-mix-windows-8" width="620" height="387" /></a><br />
Sure, it will be great on a Windows slate, but not really on a PC/laptop.</p>
<li><a href='http://devilsworkshop.org/how-to-enable-hibernate-option-in-windows-vistawindows-7/' rel='bookmark' title='How to enable Hibernate Option in Windows Vista/Windows 7'>How to enable Hibernate Option in Windows Vista/Windows 7</a></li>
<li><a href='http://devilsworkshop.org/windows-store/' rel='bookmark' title='Microsoft to Introduce Windows Store with Windows 8 Platform'>Microsoft to Introduce Windows Store with Windows 8 Platform</a></li>
</ul>]]>
</content:encoded>
<wfw:commentRss>http://devilsworkshop.org/windows-appstore-resembles-ghost-town/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>
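When I print $content, it shows the images from the content:encoded tag. But printing $rss does not show that tag at all, and the description tag just shows SimpleXMLElement Object().
I want to parse both of these tags. What am I doing wrong?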
Answered 2012-09-18 20:21:53
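First, print_r() is not a good way to inspect SimpleXML objects, because they are not "normal" PHP objects and print_r() does not show everything they contain. You could try my simplexml_dump() function, which lists the content, children, and attributes of a particular node or list of nodes.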
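As a rough illustration of the point about print_r(), the sketch below parses a small made-up item (the XML string and variable names are assumptions for demonstration, not the real feed). It reads the CDATA description by casting the node to a string, and dumps the node with asXML() instead of print_r():

```php
<?php
// Hypothetical, trimmed-down item used only for illustration; not the real feed.
$xml = <<<XML
<item>
  <title>Example title</title>
  <description><![CDATA[<p>Hello <b>world</b></p>]]></description>
</item>
XML;

$item = new SimpleXMLElement($xml);

// Casting a node to string returns its text content, CDATA included.
$description = (string) $item->description;
echo $description, "\n";

// asXML() serialises the node back to markup, which is usually a more
// faithful view of the element than print_r().
echo $item->description->asXML(), "\n";
```

The string cast is normally all that is needed to get the description text out of the question's feed items.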
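Second, the content:encoded element is in the namespace with the prefix content, so you need to use the ->children() method to tell SimpleXML to access nodes in that namespace rather than the default one. For example: echo $item->children('content', true)->encoded;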
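Putting that together, a minimal sketch of the loop might look like the following. The inline XML is a made-up stand-in for the feed, and the namespace URI is the one the RSS content module normally uses; it is an assumption that the real feed declares the same URI for the content prefix.

```php
<?php
// Hypothetical feed fragment standing in for the real feed.
$xml = <<<XML
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Example title</title>
      <description><![CDATA[<p>Short summary</p>]]></description>
      <content:encoded><![CDATA[<p>Full <em>article</em> body</p>]]></content:encoded>
    </item>
  </channel>
</rss>
XML;

$rss = new SimpleXMLElement($xml);
$parsed = array();

foreach ($rss->channel->item as $item) {
    // Passing true as the second argument means the first argument is a
    // namespace prefix rather than a namespace URI.
    $byPrefix = (string) $item->children('content', true)->encoded;

    // Equivalent lookup by namespace URI; this keeps working even if a
    // feed binds the same namespace to a different prefix.
    $byUri = (string) $item->children('http://purl.org/rss/1.0/modules/content/')->encoded;

    $parsed[] = array(
        'title'          => (string) $item->title,
        'description'    => (string) $item->description,
        'content'        => $byPrefix,
        'content_by_uri' => $byUri,
    );
}

print_r($parsed);
```

Casting each node to string up front means the resulting array holds plain PHP strings, so print_r() shows the actual text rather than empty SimpleXMLElement objects.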
Answered 2012-09-18 15:28:12
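Of course printing $rss does not show the data. It shows exactly what it is meant to, because $rss really is a SimpleXMLElement Object.
Beyond that, though, as far as I can tell your XML document cannot be parsed because it is not valid UTF-8. After copying it to my client and combing through it, I found a number of xA0 and x92 characters.
Once I replaced them with the corresponding characters (a space and an apostrophe, respectively) and saved the document, it parsed fine.
This is almost certainly your problem.
The fix looks like this: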
$char_arr = array('/\xa0/','/\x92/','/\x96/');
$rep_arr = array(' ','\'','-');
$content = preg_replace($char_arr, $rep_arr, $content);
Make sure the replacement runs before the SimpleXML object is created:
$content = file_get_contents($this->feed);
print_r($content);
$char_arr = array('/\xa0/','/\x92/','/\x96/');
$rep_arr = array(' ','\'','-');
$content = preg_replace($char_arr, $rep_arr, $content);
$rss = new SimpleXmlElement($content);
This should solve your problem; I tested it myself and it works on my end.
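One caveat with replacing single bytes: in a document that is already valid UTF-8, a byte such as xA0 can be part of a multi-byte sequence (a non-breaking space is encoded as C2 A0), so stripping it blindly can damage good text. A more conservative sketch is shown below. It assumes the mbstring extension is enabled and that the stray bytes are Windows-1252, where x92 is a right single quote and x96 is an en dash; the helper name toUtf8() and the sample string are made up for illustration.

```php
<?php
// Hypothetical helper: converts only when the input is not already valid UTF-8.
// Assumes the invalid input is Windows-1252 encoded and mbstring is available.
function toUtf8(string $raw): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw; // already valid, leave untouched
    }
    return mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');
}

// Made-up sample containing a raw Windows-1252 apostrophe (byte 0x92).
$content = "<item><title>it\x92s here</title></item>";

$clean = toUtf8($content);
$rss = new SimpleXMLElement($clean);

echo (string) $rss->title, "\n";
```

If a feed mixes valid UTF-8 with stray single bytes, converting the whole document this way would double-encode the valid parts, so treat this as a starting point rather than a complete fix.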
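Answered 2013-04-23 22:16:41
Thanks to IMSoP's answer, I went straight to http://php.net/simplexml, where I found the xmlObjToArr($obj) function by xaviered_at gmail_dot_com, and used it to solve the same problem.
For anyone still looking for a simple way to get the content between the content:encoded tags, here is a short and straightforward script: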
<?php
echo "<pre>";
$url = "http://devilsworkshop.org/feed/";
$rss = simplexml_load_file($url);
if ($rss) {
    $items = $rss->channel->item;
    foreach ($items as $item) {
        $title = $item->title;
        $image = $item->image;
        $link = $item->link;
        $published_on = $item->pubDate;
        $description = $item->description;
        // bring the <content:encoded> item from the SimpleXMLElement object into an array
        $content = xmlObjToArr($item->children('content', true)->encoded);
        echo "
title: $title
image: $image
link: $link
published on: $published_on
description: $description
content:
";
        print_r($content);
    }
}

function xmlObjToArr($obj) {
    $namespace = $obj->getDocNamespaces(true);
    $namespace[NULL] = NULL;
    $children = array();
    $attributes = array();
    $name = strtolower((string)$obj->getName());
    $text = trim((string)$obj);
    if (strlen($text) <= 0) {
        $text = NULL;
    }
    // get info for all namespaces
    if (is_object($obj)) {
        foreach ($namespace as $ns => $nsUrl) {
            // attributes
            $objAttributes = $obj->attributes($ns, true);
            foreach ($objAttributes as $attributeName => $attributeValue) {
                $attribName = strtolower(trim((string)$attributeName));
                $attribVal = trim((string)$attributeValue);
                if (!empty($ns)) {
                    $attribName = $ns . ':' . $attribName;
                }
                $attributes[$attribName] = $attribVal;
            }
            // children
            $objChildren = $obj->children($ns, true);
            foreach ($objChildren as $childName => $child) {
                $childName = strtolower((string)$childName);
                if (!empty($ns)) {
                    $childName = $ns . ':' . $childName;
                }
                $children[$childName][] = xmlObjToArr($child);
            }
        }
    }
    return array(
        'name' => $name,
        'text' => $text,
        'attributes' => $attributes,
        'children' => $children
    );
}
?>
https://stackoverflow.com/questions/12471500