
Unable to scrape data that appears to be encrypted

Stack Overflow user
Asked 2022-02-02 16:26:48
Answers: 2 | Views: 165 | Followers: 0 | Votes: 0

I'm a beginner at web scraping, and I'm trying to get the phone numbers from this page: https://www.quickerala.com/listings?q=Healthcare&location=Alleppey

The number sits inside this element:

<span data-qk-el-trackcontact="1" data-trackdata="{&quot;business&quot;:&quot;427151&quot;,&quot;address&quot;:&quot;430820&quot;,&quot;number&quot;:&quot;653252%6252%3252%553252%6252%3252%653252%6252%3252%843252%6252%3252%653252%6252%3252%553252%6252%3252%753252%6252%3252%843252%6252%3252%653252%6252%3252%753252%6252%3252%&quot;,&quot;type&quot;:&quot;mobile&quot;,&quot;page&quot;:&quot;businessListings&quot;}" data-qk-el-unobfuscate="1" data-unobfuscate-text="653252%6252%3252%553252%6252%3252%653252%6252%3252%843252%6252%3252%653252%6252%3252%553252%6252%3252%753252%6252%3252%843252%6252%3252%653252%6252%3252%753252%6252%3252%">9809780878</span>

I can see the phone number "9809780878", but I don't know how to extract it.

I tried:

response.xpath('//div[@class="listContacts brtop-10"]//data-trackdata') 

response.xpath('//span[@data-qk-el-trackcontact=1]/data-trackdata') 

response.xpath('//span[@data-qk-el-trackcontact=1]').extract()   

without any luck.
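For what it's worth, the attempts above fail mainly because XPath addresses an attribute as `@data-trackdata`, not as a child step named `data-trackdata`. The sketch below illustrates the idea with the standard library's `xml.etree.ElementTree` rather than Scrapy, on a shortened copy of the element (the `number` field is omitted for brevity, and `SNIPPET` is a name made up for this example). It assumes the markup is well-formed; whether the raw server response contains the visible number is not something this example can confirm.

```python
import json
import xml.etree.ElementTree as ET

# Shortened, hypothetical copy of the element from the question,
# wrapped in a <div> so there is a parent to search from.
SNIPPET = (
    '<div><span data-qk-el-trackcontact="1" '
    'data-trackdata="{&quot;business&quot;:&quot;427151&quot;,'
    '&quot;address&quot;:&quot;430820&quot;,'
    '&quot;type&quot;:&quot;mobile&quot;,'
    '&quot;page&quot;:&quot;businessListings&quot;}">'
    '9809780878</span></div>'
)

root = ET.fromstring(SNIPPET)

# Select the element by attribute value, then read the attribute from it.
span = root.find(".//span[@data-qk-el-trackcontact='1']")
track_data = json.loads(span.get("data-trackdata"))

print(track_data["business"])  # value of the "business" key
print(span.text)               # text content of the <span>
```

In Scrapy, the equivalent selection would be along the lines of `response.xpath('//span[@data-qk-el-trackcontact="1"]/@data-trackdata').get()`, with `/text()` in place of the attribute step for the element's text.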


2 Answers

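Stack Overflow user

Accepted answer

Posted 2022-02-02 17:41:40

The mobile numbers are encrypted in the response for specific regions.

The numbers may also be available in other parts of the page, but in this solution I focus on copying the website's JS decryption function and executing it in our Python script.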

# Library Import
from bs4 import BeautifulSoup as Soup
import requests as rq
import js2py
import json

Find and copy the site's JS decryption function, then execute it with js2py:

# JS function to decode mobile number
script = '''
function(e) {
    var t = e.split("").reverse().join(""),
        n = "",
        o = decodeURIComponent(decodeURIComponent(t)).split("#&#");
    for (var i = 0; i < o.length; i++) "" != o[i] && (n += String.fromCharCode(o[i]));
    return n
}
'''
number_fun = js2py.eval_js(script)
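
The same decoding can probably be done without a JS engine. The following is a minimal sketch of a pure-Python port of the function above, assuming the obfuscation scheme is exactly what the JS shows (reverse the string, URL-decode twice, split on `#&#`, map character codes to characters). The name `decode_number` is hypothetical, and the sample value is the `data-unobfuscate-text` attribute from the question.

```python
from urllib.parse import unquote


def decode_number(obfuscated: str) -> str:
    """Hypothetical Python port of the site's JS decoder shown above."""
    # Step 1: reverse the string, as e.split("").reverse().join("") does.
    reversed_text = obfuscated[::-1]
    # Step 2: URL-decode twice, mirroring the nested decodeURIComponent calls.
    decoded = unquote(unquote(reversed_text))
    # Step 3: split on the "#&#" separator, skip empty pieces, and turn
    # each remaining character code into its character.
    return "".join(chr(int(code)) for code in decoded.split("#&#") if code)


# The data-unobfuscate-text value from the question, one chunk per line.
SAMPLE = (
    "653252%6252%3252%"
    "553252%6252%3252%"
    "653252%6252%3252%"
    "843252%6252%3252%"
    "653252%6252%3252%"
    "553252%6252%3252%"
    "753252%6252%3252%"
    "843252%6252%3252%"
    "653252%6252%3252%"
    "753252%6252%3252%"
)

print(decode_number(SAMPLE))  # prints 9809780878
```

If the site changes its encoding, this port would need to change with it, which is the trade-off against re-running the copied JS.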

The rest of the code is as usual:

# Process Data
res = rq.get(
    "https://www.quickerala.com/listings?q=Healthcare&location=Alleppey")
res_data = Soup(res.text, features="html.parser")
rows = res_data.find_all("button", {"data-el": "view-location"})
for row in rows:
    try:
        json_data = json.loads(row["data-map"])
        name = json_data["name"]
        location = json_data["location"]
        phone = number_fun(json_data["phone"][0])
        print(name, location, phone)
    except Exception:
        # Skip listings whose data is missing or malformed
        continue

Output:

Nediyathu Speech & Hearing Aid Centre Thrissur Mavelikara, Alleppey 9446477258
Mypharma Laboratories Alleppey, Alleppey 9809780878
Air Rescuers World Wide Pvt Ltd Chengannur, Alleppey 9870001118
Abhaya Ayurveda Hospital Ennakkad, Alleppey 9539297062
SOUPARNIKA AYURVEDA CLINIC Mavelikara, Alleppey 8075803773
Thottikuzhiyil Medicals Alappuzha, Alleppey 9447470538
Aawaaz Speech & Hearing Care Centre Cherthala, Alleppey 9995822386
DANA GYM Alleppey, Alleppey 9567448535
Upasana Yoga and Reiki Clinic Ambalapuzha, Alleppey 9947260352
SIDHA DEEPAM Cherthala, Alleppey 8714233349
SANTHISUKHAM Alleppey, Alleppey 8111928007
SAMANGA AYURVEDA & PHYSIOTHERAPY REHAB. Alleppey, Alleppey 90618 60702
KRV AYURVEDA KENDRAM Alleppey, Alleppey 9447145738
School of Life Skills Alappuzha, Alleppey 9895458500
Kripa Wellness Clinic & Diabetic Research Center Kayamkulam, Alleppey 0479 2446789
Dr. Oommens Eye Hospital & Microsurgery Center Chengannur, Alleppey 0479-2453416
Santhigiri Ayurveda & Siddha Vaidyasala, Chengannur, Alleppey 479-2452582
Santhigiri Ayurveda & Siddha Hospital Alleppey, Alleppey 0478-2879734
Santhigiri Ayurveda & Siddha Hospital Thiruvambady, Alleppey 0477-3200724
Tricare Diagnostics Chengannur, Alleppey 0479-2456664
Dhathri Ayurveda Hospital & Panchakarma Center Kayamkulam, Alleppey 0479-2431403
Pranala Diagnostics Haripad, Alleppey 9495603511
Sankar's Healthcare Diagnostics Pathirappally, Alleppey 9961234488
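Votes: 2

Stack Overflow user

Posted 2022-02-03 02:38:26

All the data for each listing is contained in a script tag just above that listing's div. Since the OP pointed out that the question is about Scrapy, here is a solution using Scrapy. The data_json object contains more information that you can scrape as well.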

import scrapy
import json

class TestSpider(scrapy.Spider):
    name = 'test'

    start_urls = ['https://www.quickerala.com/listings?q=Healthcare&location=Alleppey']

    allowed_domains = ['quickerala.com']

    def parse(self, response):
        for data in response.xpath("//div[@class='listingWrap']/script[@type='application/ld+json']/text()").getall():
            data_json = json.loads(data)
            yield {
                "name": data_json.get('name'),
                "telephone": data_json.get('telephone')
            }

Run the spider and you will get results like the following:

2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'Nediyathu Speech &amp; Hearing Aid Centre Thrissur', 'telephone': ['9446477258']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'Mypharma Laboratories', 'telephone': ['9809780878']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'Air Rescuers World Wide Pvt Ltd', 'telephone': ['9870001118']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'Abhaya Ayurveda Hospital', 'telephone': ['9539297062', '0479 2466021']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'SOUPARNIKA AYURVEDA CLINIC', 'telephone': ['8075803773', '04792478536']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'Thottikuzhiyil Medicals', 'telephone': ['9447470538']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'Aawaaz Speech &amp; Hearing Care Centre', 'telephone': ['9995822386', '0478 3242386']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'DANA GYM', 'telephone': ['9567448535']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'DANA GYM', 'telephone': ['9567448535']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'Upasana Yoga and Reiki Clinic', 'telephone': ['9947260352']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'SIDHA DEEPAM', 'telephone': ['8714233349']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'SANTHISUKHAM', 'telephone': ['8111928007']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'SAMANGA AYURVEDA &amp; PHYSIOTHERAPY REHAB.', 'telephone': ['90618 60702']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'KRV AYURVEDA KENDRAM', 'telephone': ['9447145738']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'School of Life Skills', 'telephone': ['9895458500']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'School of Life Skills', 'telephone': ['9895458500']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'Kripa Wellness Clinic &amp; Diabetic Research Center', 'telephone': ['0479 2446789', '0479 2448118']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'Ashta Ayurveda Vaidyalayam', 'telephone': ['9747172442', '8848224376']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'Karingattil Medicals', 'telephone': ['8281123295', '9747112263']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'Dr. Oommens Eye Hospital &amp; Microsurgery Center', 'telephone': ['0479-2453416']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'Santhigiri Ayurveda &amp; Siddha Vaidyasala,', 'telephone': ['479-2452582']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'Santhigiri Ayurveda &amp; Siddha Hospital', 'telephone': ['0478-2879734']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'Santhigiri Ayurveda &amp; Siddha Hospital', 'telephone': ['0477-3200724']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'Tricare Diagnostics', 'telephone': ['0479-2456664']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'Dhathri Ayurveda Hospital &amp; Panchakarma Center', 'telephone': ['0479-2431403', '0479-2431535']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'Pranala Diagnostics', 'telephone': ['9495603511', '0479-2443131']}
2022-02-03 05:34:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.quickerala.com/listings?q=Healthcare&location=Alleppey>
{'name': 'Sankar&#039;s Healthcare Diagnostics', 'telephone': ['9961234488']}
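
The names in the output above still contain HTML entities such as `&amp;` and `&#039;`. Here is a small sketch of one way to clean them up with the standard library's `html.unescape`. The helper `clean_item` is a hypothetical name; it could be applied to each item before it is yielded, or in an item pipeline.

```python
import html


def clean_item(item: dict) -> dict:
    """Return a copy of a scraped item with HTML entities in 'name' decoded."""
    cleaned = dict(item)
    name = cleaned.get("name")
    if isinstance(name, str):
        # Converts &amp; to & and &#039; to an apostrophe, among others.
        cleaned["name"] = html.unescape(name)
    return cleaned


raw = {"name": "Sankar&#039;s Healthcare Diagnostics", "telephone": ["9961234488"]}
print(clean_item(raw))
```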
Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/70959203
