[Non-static pages] [PHP crawler] [Dynamic rendering] Scraping JS-rendered data with QueryList

躺平程序员老修
Published 2023-09-05 16:19:56

Background

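When scraping, the target page's data is often rendered dynamically by JavaScript. At the moment the page is fetched the data has not been rendered yet, and there is no reliable way to know when rendering has finished, so the scraper gets only the bare HTML skeleton or nothing at all. Fortunately there is a third-party package that handles this. This post uses the Honor of Kings (王者荣耀) official site as the example; the final data can be seen in the following mini program: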

[Image: gh_cd4007fc0841_344.jpg]

The jaeger/querylist scraping tool

Official documentation: https://querylist.cc/docs/guide/v4/PhantomJS

    # Core package
    composer require jaeger/querylist
    # Plugin for scraping JS-rendered pages (also requires the PhantomJS binary: https://phantomjs.org/download.html)
    composer require jaeger/querylist-phantomjs
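
    use QL\QueryList;
    use QL\Ext\PhantomJs;

    $url = 'https://www.litblc.com';    // URL of the page to scrape (include the scheme)
    $phantomPath = 'E:/githubShyzhen/FakePHP/phantomjs-2.1.1-windows/bin/phantomjs.exe';    // Path to the downloaded PhantomJS binary
    $ql = QueryList::getInstance();
    $ql->use(PhantomJs::class, $phantomPath);    // Register the PhantomJS plugin
    $html = $ql->browser($url)->getHtml();       // HTML after the page's JavaScript has run
    $dom = QueryList::html($html);

    $dom->find('.title-name')->text();
    ...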
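
The background section notes that it is hard to know when rendering has finished. One way to guard against a half-rendered page is to check the returned HTML for an expected marker and fetch again if it is missing. The following is a minimal sketch under assumptions: `fetchUntilRendered` and its parameters are hypothetical names, not part of QueryList, and retrying only repeats the request; it does not make PhantomJS wait longer on a single load.

```php
<?php
/**
 * Call $fetch until $isReady accepts the returned HTML or the attempts run out.
 * Hypothetical helper for illustration; not part of QueryList.
 *
 * @param callable $fetch       Returns the page HTML as a string.
 * @param callable $isReady     Receives the HTML, returns true once the data is present.
 * @param int      $maxAttempts Maximum number of fetches.
 * @param int      $delayMs     Pause between attempts, in milliseconds.
 * @return string|null The first HTML that passed the check, or null if none did.
 */
function fetchUntilRendered(callable $fetch, callable $isReady, int $maxAttempts = 3, int $delayMs = 500): ?string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $html = (string) $fetch();
        if ($isReady($html)) {
            return $html;
        }
        // Wait before the next attempt, but not after the last one.
        if ($attempt < $maxAttempts) {
            usleep($delayMs * 1000);
        }
    }
    return null;
}

// Possible usage with the snippet above (assumes $ql and $url are already set up):
// $html = fetchUntilRendered(
//     function () use ($ql, $url) { return $ql->browser($url)->getHtml(); },
//     function (string $html) { return strpos($html, 'title-name') !== false; }
// );
```

A `null` result means the marker never appeared, which is usually a better signal than silently parsing an empty page.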

Example code

    public function spader()
    {
        $this->handleSpader(105);
    }

    public function handleSpader($id)
    {
        $url = 'https://pvp.qq.com/web201605/herodetail/'.$id.'.shtml';
        $ql = QueryList::getInstance();
        $ql->use(PhantomJs::class,'E:/githubShyzhen/FakePHP/phantomjs-2.1.1-windows/bin/phantomjs.exe');
        $html = $ql->browser($url)->getHtml();

        $dom = QueryList::html($html);

        $mingTips = $dom->find('.sugg-tips')->text();
        $equipTips = $dom->find('.equip-tips')->eq(0)->text();


        // Inscription ("ming", 铭文) JSON
        $ming1Ids = $dom->find('.sugg-u1')->attr('data-ming');
        $tempIds = explode('|', $ming1Ids);
        $ming1Id = $tempIds[0];
        $ming2Id = $tempIds[1];
        $ming3Id = $tempIds[2];

        $ming1 = $dom->find('.sugg-u1 li')->eq(0);
        $ming2 = $dom->find('.sugg-u1 li')->eq(1);
        $ming3 = $dom->find('.sugg-u1 li')->eq(2);


        $ming1Name = $ming1->find('p')->eq(0)->text();
        $ming1Intro1 = $ming1->find('p')->eq(1)->text();
        $ming1Intro2 = $ming1->find('p')->eq(2)->text();
        $ming1Intro3 = $ming1->find('p')->eq(3)->text();


        $ming2Name = $ming2->find('p')->eq(0)->text();
        $ming2Intro1 = $ming2->find('p')->eq(1)->text();
        $ming2Intro2 = $ming2->find('p')->eq(2)->text();
        $ming2Intro3 = $ming2->find('p')->eq(3)->text();

        $ming3Name = $ming3->find('p')->eq(0)->text();
        $ming3Intro1 = $ming3->find('p')->eq(1)->text();
        $ming3Intro2 = $ming3->find('p')->eq(2)->text();
        $ming3Intro3 = $ming3->find('p')->eq(3)->text();

        $mingRes = [
            ['id' => $ming1Id, 'name' => $ming1Name, 'intro' => trim(implode('|', [$ming1Intro1, $ming1Intro2, $ming1Intro3]), '|')],
            ['id' => $ming2Id, 'name' => $ming2Name, 'intro' => trim(implode('|', [$ming2Intro1, $ming2Intro2, $ming2Intro3]), '|')],
            ['id' => $ming3Id, 'name' => $ming3Name, 'intro' => trim(implode('|', [$ming3Intro1, $ming3Intro2, $ming3Intro3]), '|')],
        ];
        $mingJson = json_encode($mingRes, JSON_UNESCAPED_UNICODE);


        // equipment JSON
        $equipmentDom = $dom->find('.equip-list')->eq(0);
        $eIdStr = $equipmentDom->attr('data-item');
        $eIds = explode('|', $eIdStr);
        $e1Id = $eIds[0];
        $e2Id = $eIds[1];
        $e3Id = $eIds[2];
        $e4Id = $eIds[3];
        $e5Id = $eIds[4];
        $e6Id = $eIds[5];

        $e1Name = $equipmentDom->find('#Jname')->eq(0)->text();
        $e2Name = $equipmentDom->find('#Jname')->eq(1)->text();
        $e3Name = $equipmentDom->find('#Jname')->eq(2)->text();
        $e4Name = $equipmentDom->find('#Jname')->eq(3)->text();
        $e5Name = $equipmentDom->find('#Jname')->eq(4)->text();
        $e6Name = $equipmentDom->find('#Jname')->eq(5)->text();

        $eRes = [
            ['id' => $e1Id, 'name' => $e1Name, 'intro' => ''],
            ['id' => $e2Id, 'name' => $e2Name, 'intro' => ''],
            ['id' => $e3Id, 'name' => $e3Name, 'intro' => ''],
            ['id' => $e4Id, 'name' => $e4Name, 'intro' => ''],
            ['id' => $e5Id, 'name' => $e5Name, 'intro' => ''],
            ['id' => $e6Id, 'name' => $e6Name, 'intro' => ''],
        ];
        $eJson = json_encode($eRes, JSON_UNESCAPED_UNICODE);


        // counterHero JSON
        $heroDom = $dom->find('.hero-info-box')->find('.hero-info')->eq(1);
        $h1Id = $heroDom->find('img')->eq(0)->src;
        $h2Id = $heroDom->find('img')->eq(1)->src;
        $h1Intro = $heroDom->find('.hero-list-desc')->find('p')->eq(0)->text();
        $h2Intro = $heroDom->find('.hero-list-desc')->find('p')->eq(1)->text();

        $id1 = substr($h1Id, strripos($h1Id, '/') + 1, strripos($h1Id, '.') - strripos($h1Id, '/') - 1);
        $id2 = substr($h2Id, strripos($h2Id, '/') + 1, strripos($h2Id, '.') - strripos($h2Id, '/') - 1);
        $heroRes = [
            ['id' => $id1, 'name' => $this->handleHeroName($id1), 'intro' => $h1Intro],
            ['id' => $id2, 'name' => $this->handleHeroName($id2), 'intro' => $h2Intro],
        ];
        $heroJson = json_encode($heroRes, JSON_UNESCAPED_UNICODE);

        $resHeroId = $id;
        $resMing = $mingJson;
        $resMingTips = $mingTips;
        $resEquipment = $eJson;
        $resEtips = $equipTips;
        $resCh = $heroJson;

        // Assemble the SQL statement
        $sql = "INSERT INTO `wangzhe_hero_tutorial` (`hero_id`,`ming`,`ming_tips`,`equipment`,`equipment_tips`,`counter_hero`, `created_at`, `updated_at`) VALUES ('$resHeroId', '$resMing', '$resMingTips', '$resEquipment', '$resEtips', '$resCh', '2022-03-29 16:29:53', '2022-03-29 16:29:53');";

        echo $sql;

        exit;
    }

    public function handleHeroName($heroId)
    {
        $json = '{"105": "廉颇","106": "小乔"}';
        $heroArr = json_decode($json, true);
        return $heroArr[$heroId] ?? '';    // Empty name for ids missing from the map
    }
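
Two spots in the example above are fragile: the hero id is cut out of the image URL with `substr`/`strripos` arithmetic, and the SQL is assembled by interpolating scraped text, which breaks as soon as a value contains a single quote. The sketch below shows one possible alternative for each. The function names `heroIdFromImageUrl` and `buildTutorialInsert` are hypothetical, the image URL shape is an assumption, and column names are assumed to contain only letters, digits, and underscores so they can double as placeholder names.

```php
<?php
/**
 * Return the file name without its extension from an image URL,
 * e.g. a hypothetical ".../106.jpg" gives "106".
 */
function heroIdFromImageUrl(string $src): string
{
    // parse_url drops any query string or fragment; fall back to the raw string if parsing fails.
    $path = parse_url($src, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        $path = $src;
    }
    return pathinfo($path, PATHINFO_FILENAME);
}

/**
 * Build an INSERT with named placeholders so values are bound rather than interpolated.
 * Returns [sql, params], suitable for PDO::prepare() and PDOStatement::execute().
 */
function buildTutorialInsert(array $row): array
{
    $columns = [];
    $placeholders = [];
    $params = [];
    foreach ($row as $column => $value) {
        $columns[] = '`' . str_replace('`', '``', $column) . '`';
        $placeholders[] = ':' . $column;
        $params[':' . $column] = $value;
    }
    $sql = sprintf(
        'INSERT INTO `wangzhe_hero_tutorial` (%s) VALUES (%s)',
        implode(', ', $columns),
        implode(', ', $placeholders)
    );
    return [$sql, $params];
}

// Possible usage, assuming $pdo is an existing PDO connection:
// [$sql, $params] = buildTutorialInsert(['hero_id' => $resHeroId, 'ming' => $resMing]);
// $pdo->prepare($sql)->execute($params);
```

Binding the values leaves quoting to the database driver, so the JSON strings can contain any characters.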
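
This article is part of the Tencent Cloud self-media sharing program and is shared from the author's personal site/blog.
In case of infringement, please contact cloudcommunity@tencent.com for removal.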
