Parsing XML with PHP does not show data for some tags

Stack Overflow user
Asked 2012-09-18 14:30:50
3 answers · 2.3K views · 0 followers · score 1

I am trying to parse an RSS feed from a link. Here is my code:

            $content = file_get_contents($this->feed);     
            print_r($content);   
            $rss = new SimpleXmlElement($content);
            print_r($rss);
            $rss_split = array();
           /* foreach ($rss->channel->item as $item) {
                $title = (string) $item->title; // Title
                $link = (string) $item->link; // Url Link
                $description = (string) $item->description; //Description               
                $rss_split[] = '<div><a href="' . $link . '" target="_blank" title="" >' . $title . ' </a><hr></div>';
            }*/

The full XML can be downloaded here: http://devilsworkshop.org/feed/

Here is an excerpt that illustrates the structure:

<item>
    <title>Windows 8 Appstore resembles a ghost town</title>
    <link>http://devilsworkshop.org/windows-appstore-resembles-ghost-town/</link>
    <comments>http://devilsworkshop.org/windows-appstore-resembles-ghost-town/#comments</comments>
    <pubDate>Tue, 18 Sep 2012 05:30:22 +0000</pubDate>
    <dc:creator>Vibin</dc:creator>
    <category><![CDATA[Analysis]]></category>
    <category><![CDATA[Windows 8]]></category>

    <guid isPermaLink="false">http://devilsworkshop.org/?p=62284</guid>
    <description><![CDATA[<p>Microsoft is all set to release Windows 8 for public in the coming weeks. Apparently, the biggest change in Windows 8 seems to be the Metro UI (I know it&#8217;s no more called Metro, but let&#8217;s keep it like that [...]</p><p>--
            This Post <a href="http://devilsworkshop.org/windows-appstore-resembles-ghost-town/">Windows 8 Appstore resembles a ghost town</a> is Published on <a href="http://devilsworkshop.org">Devils Workshop</a> .
        </p><h3>Related posts:</h3><ul>
            <li><a href='http://devilsworkshop.org/googles-new-look-resembles-yahoo-search/' rel='bookmark' title='Google&#8217;s new look resembles Yahoo Search'>Google&#8217;s new look resembles Yahoo Search</a></li>
        </ul>]]></description>
    <content:encoded><![CDATA[<p>Microsoft is all set to release Windows 8 for public in the coming weeks. Apparently, the biggest change in Windows 8 seems to be the Metro UI (I know it&#8217;s no more called Metro, but let&#8217;s keep it like that for simplicity) and apps.</p>
        <ul>
        <h2>Apps are less advanced</h2>
        <p>Metro is great on tablets, but on desktop, it looks like an OS with dumbed down apps. Take Skitch for example, it is an app for taking and editing screenshots and was previously a Mac-only app but recently came to Windows 8. Just compare these two apps and you&#8217;ll know what I meant.</p>
        <p>Here&#8217;s how Skitch looks in Windows 8:</p>
        <p><a href="http://devilsworkshop.org/files/2012/09/SkitchinWindows8.png"><img style=' display: block; margin-right: auto; margin-left: auto;'  class="aligncenter size-full wp-image-62302" title="SkitchinWindows8" src="http://devilsworkshop.org/files/2012/09/SkitchinWindows8.png" alt="" width="740" height="570" /></a></p>
        <p>And now, this is the Mac version of Skitch:</p>
        <p><a href="http://devilsworkshop.org/files/2012/09/SkitchinMac.png"><img style=' display: block; margin-right: auto; margin-left: auto;'  class="aligncenter size-full wp-image-62301" title="SkitchinMac" src="http://devilsworkshop.org/files/2012/09/SkitchinMac.png" alt="" width="671" height="575" /></a></p>
        <p>Another example can be Newsmix, an app which will let you read stuff that matters to you &#8211; in a Magazine layout. Apparently, this app is a fail for someone like me who subscribe to 50+ blogs.</p>
        <p><a href="http://devilsworkshop.org/files/2012/09/NewsmixinWindows8.png"><img style=' display: block; margin-right: auto; margin-left: auto;'  class="aligncenter size-large wp-image-62305" title="NewsMix in Windows 8" src="http://devilsworkshop.org/files/2012/09/NewsmixinWindows8-1024x640.png" alt="news-mix-windows-8" width="620" height="387" /></a><br />
            Sure, it will be great on a Windows slate, but not really on a PC/laptop.</p>
        <li><a href='http://devilsworkshop.org/how-to-enable-hibernate-option-in-windows-vistawindows-7/' rel='bookmark' title='How to enable Hibernate Option in Windows Vista/Windows 7'>How to enable Hibernate Option in Windows Vista/Windows 7</a></li>
        <li><a href='http://devilsworkshop.org/windows-store/' rel='bookmark' title='Microsoft to Introduce Windows Store with Windows 8 Platform'>Microsoft to Introduce Windows Store with Windows 8 Platform</a></li>
        </ul>]]>
    </content:encoded>          
    <wfw:commentRss>http://devilsworkshop.org/windows-appstore-resembles-ghost-town/feed/</wfw:commentRss>
    <slash:comments>0</slash:comments>
</item>

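When I print $content, it shows the images from the content:encoded tag. But printing $rss does not show that tag at all, and the description tag just shows SimpleXMLElement Object().

I want to parse both of these tags. What am I doing wrong?

3 Answers

Stack Overflow user

Accepted answer

Answered 2012-09-18 20:21:53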

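First, print_r() is not a good choice for inspecting SimpleXML objects, because they are not "normal" PHP objects. You could try my simplexml_dump() function, which lists the content, children, and attributes of a particular node or list of nodes.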
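To see why print_r() is misleading here, it can help to list an item's children per namespace by hand. The following is a minimal sketch, not the simplexml_dump() function mentioned above: the helper name listChildrenByNamespace() and the inline sample feed are assumptions made up for illustration, and only the standard SimpleXML methods getDocNamespaces() and children() are used.

```php
<?php
// Hypothetical helper (not from the original answer): group the names of a
// node's child elements by namespace prefix, so namespaced children such as
// content:encoded become visible.
function listChildrenByNamespace(SimpleXMLElement $node): array
{
    $result = [];
    // Prefix => URI for every namespace declared in the document,
    // plus '' => null to cover elements that are in no namespace.
    $namespaces = $node->getDocNamespaces(true) + ['' => null];
    foreach ($namespaces as $prefix => $uri) {
        foreach ($node->children($uri) as $name => $child) {
            $result[$prefix][] = $name;
        }
    }
    return $result;
}

// Made-up sample feed that mirrors the structure of the excerpt in the question.
$xml = <<<'XML'
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>Example post</title>
      <dc:creator>Vibin</dc:creator>
      <description><![CDATA[<p>Short summary</p>]]></description>
      <content:encoded><![CDATA[<p>Full <b>body</b></p>]]></content:encoded>
    </item>
  </channel>
</rss>
XML;

$item = (new SimpleXMLElement($xml))->channel->item;
$childrenByPrefix = listChildrenByNamespace($item);
print_r($childrenByPrefix);
```

The point of the sketch is that the encoded and creator elements are present in the parsed tree; they are only reachable once the namespace is named explicitly.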

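Second, the content:encoded element is in the namespace bound to the prefix content, so you need to use the ->children() method to tell SimpleXML to access nodes in that namespace rather than the default one. For example: echo $item->children('content', true)->encoded;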
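As a slightly fuller illustration of that call, here is a small sketch that reads both description and content:encoded from each item. The function name readItems() and the inline sample feed are assumptions for the example; casting each element to string is what returns the CDATA text that print_r() does not display.

```php
<?php
// Hypothetical helper: return title, description and full body for each item.
function readItems(string $xml): array
{
    $rss = new SimpleXMLElement($xml);
    $items = [];
    foreach ($rss->channel->item as $item) {
        $items[] = [
            'title'       => (string) $item->title,
            // Casting to string returns the CDATA text of the element.
            'description' => (string) $item->description,
            // 'content' is a namespace prefix, hence the second argument true.
            'body'        => (string) $item->children('content', true)->encoded,
        ];
    }
    return $items;
}

// Made-up sample feed with the same shape as the excerpt in the question.
$sampleFeed = <<<'XML'
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Example post</title>
      <link>http://example.com/post</link>
      <description><![CDATA[<p>Short summary</p>]]></description>
      <content:encoded><![CDATA[<p>Full <b>body</b></p>]]></content:encoded>
    </item>
  </channel>
</rss>
XML;

$parsedItems = readItems($sampleFeed);
print_r($parsedItems);
```

Passing the namespace URI instead, as in $item->children('http://purl.org/rss/1.0/modules/content/'), should behave the same and does not depend on which prefix the feed happens to use.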

Score: 2

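Stack Overflow user

Answered 2012-09-18 15:28:12

Of course printing $rss does not show the data. It shows exactly what it is meant to show, because $rss really is a SimpleXMLElement object.

Beyond that, however, as far as I can tell your XML document cannot be parsed because it is not valid UTF-8. After copying it into my client and combing through it, I found a number of \xA0 and \x92 characters.

After replacing each of them with the corresponding character (a space and an apostrophe) and saving the document, it parsed fine.

This is most likely your problem.

The fix is as follows: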

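// Replace the stray bytes that make the feed invalid UTF-8.
$char_arr = array('/\xa0/','/\x92/','/\x96/');
// '&#160;' is used rather than '&nbsp;' because XML does not predefine the nbsp entity.
$rep_arr = array('&#160;','\'','-');
$content = preg_replace($char_arr, $rep_arr, $content);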

Make sure this code runs before the SimpleXML object is created:

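$content = file_get_contents($this->feed);
print_r($content);
// Replace the stray bytes before the string is handed to SimpleXML.
$char_arr = array('/\xa0/','/\x92/','/\x96/');
// '&#160;' is used rather than '&nbsp;' because XML does not predefine the nbsp entity.
$rep_arr = array('&#160;','\'','-');
$content = preg_replace($char_arr, $rep_arr, $content);
$rss = new SimpleXmlElement($content);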

This should solve your problem; I tested it myself and it works on my end.
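One caveat with a blanket byte replacement is that a byte such as 0xA0 can also appear as a continuation byte inside a valid multi-byte UTF-8 character, so the regexes above may alter valid text. The sketch below is one possible variation, not the answer's method: it leaves (approximately) well-formed UTF-8 sequences alone and maps only stray bytes. The function name repairStrayBytes() is hypothetical, and the byte map assumes the stray bytes are Windows-1252, which the answer does not state.

```php
<?php
// Hypothetical helper: replace only bytes that are NOT part of a UTF-8
// sequence. Assumption: stray bytes come from Windows-1252 text.
function repairStrayBytes(string $bytes): string
{
    // The three bytes named in the answer, mapped to their Windows-1252 meaning.
    $map = [
        "\xA0" => "\u{00A0}", // no-break space
        "\x92" => "\u{2019}", // right single quotation mark
        "\x96" => "\u{2013}", // en dash
    ];
    // Alternatives 1-4 match ASCII runs and (loosely) valid UTF-8 sequences;
    // the final group captures any single byte that fits none of them.
    $pattern = '/[\x00-\x7F]+'
        . '|[\xC2-\xDF][\x80-\xBF]'
        . '|[\xE0-\xEF][\x80-\xBF]{2}'
        . '|[\xF0-\xF4][\x80-\xBF]{3}'
        . '|(.)/s';
    $fixed = preg_replace_callback(
        $pattern,
        function (array $m) use ($map): string {
            if (!isset($m[1])) {
                return $m[0]; // valid sequence, keep unchanged
            }
            // Unknown stray bytes become U+FFFD (replacement character).
            return $map[$m[1]] ?? "\u{FFFD}";
        },
        $bytes
    );
    return $fixed ?? $bytes; // fall back to the input if PCRE reports an error
}

// Made-up input: a valid "é" (C3 A9) next to stray 0x92, 0xA0 and 0x96 bytes.
$repaired = repairStrayBytes("caf\xC3\xA9 it\x92s\xA0ok \x96 done");
$parsed = new SimpleXMLElement(repairStrayBytes("<a>it\x92s</a>"));
```

The UTF-8 pattern here is deliberately loose (it does not reject overlong forms or surrogates), so treat it as a starting point rather than a validator.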

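Score: 1

Stack Overflow user

Answered 2013-04-23 22:16:41

Thanks to IMSoP's answer, which led me straight to http://php.net/simplexml, I found the xmlObjToArr($obj) function by xaviered_at gmail_dot_com there and used it to solve the same problem.

For those still looking for a simple way to get at the content inside the content:encoded tag, here is a short and straightforward script: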

<?php

echo "<pre>";

$url = "http://devilsworkshop.org/feed/";
$rss = simplexml_load_file($url);

if($rss){

    $items = $rss->channel->item;

    foreach($items as $item){

        $title = $item->title;
        $image = $item->image;
        $link = $item->link;
        $published_on = $item->pubDate;
        $description = $item->description;

        // convert the <content:encoded> element (a SimpleXMLElement object) into an array
        $content = xmlObjToArr($item->children('content', true)->encoded);


        echo "

        title: $title
        image: $image
        link: $link
        published on: $published_on
        description: $description
        content: 
        ";

        print_r($content);

    }
}


function xmlObjToArr($obj) {
        $namespace = $obj->getDocNamespaces(true);
        $namespace[NULL] = NULL;

        $children = array();
        $attributes = array();
        $name = strtolower((string)$obj->getName());

        $text = trim((string)$obj);
        if( strlen($text) <= 0 ) {
            $text = NULL;
        }

        // get info for all namespaces
        if(is_object($obj)) {
            foreach( $namespace as $ns=>$nsUrl ) {
                // attributes
                $objAttributes = $obj->attributes($ns, true);
                foreach( $objAttributes as $attributeName => $attributeValue ) {
                    $attribName = strtolower(trim((string)$attributeName));
                    $attribVal = trim((string)$attributeValue);
                    if (!empty($ns)) {
                        $attribName = $ns . ':' . $attribName;
                    }
                    $attributes[$attribName] = $attribVal;
                }

                // children
                $objChildren = $obj->children($ns, true);
                foreach( $objChildren as $childName=>$child ) {
                    $childName = strtolower((string)$childName);
                    if( !empty($ns) ) {
                        $childName = $ns.':'.$childName;
                    }
                    $children[$childName][] = xmlObjToArr($child);
                }
            }
        }

        return array(
            'name'=>$name,
            'text'=>$text,
            'attributes'=>$attributes,
            'children'=>$children
        );
    }


?>
Score: 0

Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/12471500
