
I am fetching a page with a PHP crawler. How can I count the total views (hits) on each URL? Is this possible with PHP?

Stack Overflow user
Asked 2019-07-28 13:41:11
1 answer, 37 views, 0 followers, 0 votes
 $main_url="http://programming.com";
 $str = file_get_contents($main_url);

 // Gets Webpage Title
 if(strlen($str)>0)
 {

      $str = trim(preg_replace('/\s+/', ' ', $str)); // supports line breaks inside <title>
      // Non-greedy, case-insensitive match; fall back to '' when there is no <title>
      $title = preg_match('/<title>(.*?)<\/title>/i', $str, $m) ? $m[1] : '';
 }

 // Gets Webpage Description
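 $url = parse_url($main_url);
 // get_meta_tags() returns false on failure, so fall back to an empty array
 $tags = @get_meta_tags($url['scheme'].'://'.$url['host']) ?: [];
 $description = isset($tags['description']) ? $tags['description'] : '';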

 // Gets Webpage Internal Links
 $doc = new DOMDocument; 
 @$doc->loadHTML($str); 

 $sec_url = [];
 $items = $doc->getElementsByTagName('a');
 foreach($items as $value)
 {
      // Skip anchors that have no href attribute
      if ($value->hasAttribute('href')) {
           $sec_url[] = $value->getAttribute('href');
      }
 }

 /*foreach ($sec_url as  $value) {
        print_r($value);

        ?>
    <br>
        <?php

}*/

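// $conn is assumed to be an open mysqli connection created elsewhere.
// A prepared statement keeps scraped values from breaking or injecting into the SQL.
$stmt = mysqli_prepare($conn, "insert into datascience (link,title,description,internal_link) values (?,?,?,?)");
foreach($sec_url as $value)
{
    mysqli_stmt_bind_param($stmt, 'ssss', $main_url, $title, $description, $value);
    $res = mysqli_stmt_execute($stmt);
}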

1 Answer

Stack Overflow user

Answered 2019-07-28 15:25:54

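Use XPath on the loaded document. That keeps the extraction consistent.

Then build XPath expressions that locate the values inside each <article>, so you can pick out all the relevant details for each item. In XPath, use the descendant axis (descendant::...) to say that you want nodes inside the context node, which is passed as the last argument to evaluate().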
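To see the effect of the context node in isolation, here is a minimal sketch on an inline HTML fragment. The markup is hypothetical and only loosely modelled on the structure this answer assumes (an <article> containing an h2 with class "title" and a span with title "Views"); the real page may differ.

```php
<?php
// Hypothetical markup, not taken from the real page.
$html = <<<HTML
<html><body>
  <article><h2 class="title">First</h2><span title="Views">10</span></article>
  <article><h2 class="title">Second</h2><span title="Views">25</span></article>
</body></html>
HTML;

$doc = new DOMDocument;
libxml_use_internal_errors(true); // libxml warns about HTML5 tags such as <article>
$doc->loadHTML($html);
$xp = new DOMXPath($doc);

$second = $doc->getElementsByTagName('article')->item(1);

// Relative to the context node: only looks inside the second <article>.
$relative = $xp->evaluate("string(descendant::h2[@class='title'])", $second);

// "//" starts from the document root, so the context node is ignored and
// the first matching <h2> in the whole document is used.
$absolute = $xp->evaluate("string(//h2[@class='title'])", $second);

echo $relative, PHP_EOL; // Second
echo $absolute, PHP_EOL; // First
```

The relative form is what keeps each title paired with the view count from the same <article>.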

$main_url="http://programming.com";
$str = file_get_contents($main_url);

$doc = new DOMDocument;
libxml_use_internal_errors(true);
$doc->loadHTML($str);
$xp = new DOMXPath($doc);

$title = $doc->getElementsByTagName("title")[0]->textContent;
$description = $xp->evaluate("string(//meta[@name='description']/@content)");

echo $title.PHP_EOL;
echo $description.PHP_EOL;

$articles = $doc->getElementsByTagName('article');
$pageArticles = [];
foreach($articles as $article) {
    // The title comes from <h2 class="title">, the view count from <span title="Views">
    $articleTitle = trim($xp->evaluate("string(descendant::h2[@class='title'])", $article));
    $articleViews = trim($xp->evaluate("string(descendant::span[@title='Views'])", $article));
    $pageArticles[] = ["title" => $articleTitle, "views" => $articleViews];
}

print_r($pageArticles);

This gives the following output:

Tutorials - Programming.com
Tap into the collective intelligence of researchers who are working on the same problems you are - right now.

Array
(
    [0] => Array
        (
            [title] => HTML Cheat Sheet
            [views] => 1,031
        )

    [1] => Array
        (
            [title] => Best Java Training Institutes In Noida
            [views] => 390
        )

    [2] => Array
        (
            [title] => Best Salesforce Training institutes in noida
            [views] => 329
        )

    [3] => Array
        (
            [title] => Top Quality Digital Marketing Training Institutes in Noida
            [views] => 382
        )

    [4] => Array
        (
            [title] => Make your studies with professional Best Oracle Training Institutes in Noida
            [views] => 308
        )

    [5] => Array
        (
            [title] => Create a Unique Project with a Best Linux Training Institutes in Noida
            [views] => 374
        )

    [6] => Array
        (
            [title] => Webtrackker Technology Best Dot Net Training Institutes Available To Guide the Students
            [views] => 385
        )

    [7] => Array
        (
            [title] => Availability of My University Help Offers Great Benefit to Students
            [views] => 430
        )

    [8] => Array
        (
            [title] => Webtrackker Institute of Professional Studies: Hadoop Training Institute in Noida
            [views] => 350
        )

    [9] => Array
        (
            [title] => The Best Quality Digital Marketing Training Institutes in Noida
            [views] => 416
        )

)
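
Since the question asks for a total, the per-article view counts can be summed once they are converted to integers. This is a minimal sketch, assuming the counts arrive as strings with thousands separators such as "1,031". `parseViews()` is a hypothetical helper, not part of the original answer, and the sample values are taken from the first three results above.

```php
<?php
// Hypothetical helper: turn a scraped count such as "1,031" into an integer.
function parseViews(string $raw): int
{
    // Keep digits only, so separators and stray whitespace are dropped.
    return (int) preg_replace('/\D+/', '', $raw);
}

// Sample values taken from the first three results above.
$pageArticles = [
    ["title" => "HTML Cheat Sheet", "views" => "1,031"],
    ["title" => "Best Java Training Institutes In Noida", "views" => "390"],
    ["title" => "Best Salesforce Training institutes in noida", "views" => "329"],
];

$totalViews = 0;
foreach ($pageArticles as $article) {
    $totalViews += parseViews($article["views"]);
}

echo $totalViews, PHP_EOL; // 1750
```

Note that this only totals the counts the site itself displays; the crawler has no way to measure traffic on a third-party page.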
Votes: 0
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/57238264
