How to scrape all App Store apps in Google Play Search

Stack Overflow user
Asked 2022-04-03 16:31:45
2 answers, 1.5K views, 0 followers, 0 votes

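I am trying to use find_all(), but I am having trouble finding the tags that hold the specific information I want.

I would like to build a wrapper so that I can extract data from the app store, such as title, publisher, etc. (public HTML info).

I know the code is not right. The closest thing to a div identifier I could find is "c4".

Any insight helps.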

# Imports
import requests
from bs4 import BeautifulSoup

# Data Defining
url = "https://play.google.com/store/search?q=weather%20app"

# Getting HTML

page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
soup.get_text()

results = soup.find_all(id="c4")

I am expecting output listing different weather apps and their info:

Weather App 1
Develop Company 1

Google Weather App
Develop Company 2

Bing Weather App
Bing Developers

2 Answers

Stack Overflow user

Accepted answer

Posted 2022-04-03 18:02:49

Here is the output I get from the URL:

import requests
from bs4 import BeautifulSoup

url = "https://play.google.com/store/search?q=weather%20app"
response = requests.get(url)

soup = BeautifulSoup(response.content, "html.parser")

# These class names are generated by Google Play and matched the page when this answer was posted
cards = soup.find_all("div", class_="vU6FJ p63iDd")

for card in cards:
    app_name = card.find("div", class_="WsMG1c nnK0zc").text
    company = card.find("div", class_="KoLSrc").text
    print("Name: " + app_name)
    print("Company: " + company)

Output:

Name: Weather app
Company: Accurate Weather Forecast & Weather Radar Map  
Name: AccuWeather: Weather Radar
Company: AccuWeather
Name: Weather Forecast - Accurate Local Weather & Widget
Company: Weather Forecast & Widget & Radar
Name: 1Weather Forecasts & Radar
Company: OneLouder Apps
Name: MyRadar Weather Radar
Company: ACME AtronOmatic LLC
Name: Weather data & microclimate : Weather Underground
Company: Weather Underground
Name: Weather & Widget - Weawow
Company: weawow weather app
Name: Weather forecast
Company: smart-pro android apps
Name: The Secret World of Weather: How to Read Signs in Every Cloud, Breeze, Hill, Street, Plant, Animal, and Dewdrop
Company: Tristan Gooley
Name: The Weather Machine: A Journey Inside the Forecast
Company: Andrew Blum
Name: The Mobile Mind Shift: Engineer Your Business to Win in the Mobile Moment
Company: Julie Ask
Name: Together: The Healing Power of Human Connection in a Sometimes Lonely World
Company: Vivek H. Murthy
Name: The Meadow
Company: James Galvin
Name: The Ancient Egyptian Culture Revealed, 2nd edition
Company: Moustafa Gadalla
Name: The Ancient Egyptian Culture Revealed, 2nd edition
Company: Moustafa Gadalla
Name: Chaos Theory
Company: Introbooks Team
Name: Survival Training: Killer Tips for Toughness and Secret Smart Survival Skills       
Company: Wesley Jones
Name: Kiasunomics 2: Economic Insights for Everyday Life
Company: Ang Swee Hoon
Name: Summary of We Are The Weather by Jonathan Safran Foer
Company: QuickRead
Name: Learn Swift by Building Applications: Explore Swift programming through iOS app development
Company: Emil Atanasov
Name: Weather Hazard Warning Application in Car-to-X Communication: Concepts, Implementations, and Evaluations
Company: Attila Jaeger
Name: Mobile App Development with Ionic, Revised Edition: Cross-Platform Apps with Ionic, Angular, and Cordova
Company: Chris Griffith
Name: Good Application Makes a Good Roof Better: A Simplified Guide: Installing Laminated Asphalt Shingles for Maximum Life & Weather Protection
Company: ARMA Asphalt Roofing Manufacturers Association
Name: The Secret World of Weather: How to Read Signs in Every Cloud, Breeze, Hill, Street, Plant, Animal, and Dewdrop
Company: Tristan Gooley
Name: The Weather Machine: A Journey Inside the Forecast
Company: Andrew Blum
Name: Space Physics and Aeronomy, Space Weather Effects and Applications
Company: Book 5
Name: How to Build Android Apps with Kotlin: A hands-on guide to developing, testing, and publishing your first apps with Android
Company: Alex Forrester
Name: Android 6 for Programmers: An App-Driven Approach, Edition 3
Company: Paul J. Deitel
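
The listing above mixes apps with books, presumably because the search URL carries no category filter. The other answer on this page passes a `c` parameter set to `apps` for that purpose. A minimal sketch of building such a URL with the standard library follows; `build_search_url` is a hypothetical helper name, and whether `c=apps` still filters results depends on Google Play's current behavior.

```python
from urllib.parse import urlencode

BASE_URL = "https://play.google.com/store/search"


def build_search_url(query: str, category: str = "apps") -> str:
    """Return a search URL restricted to one category (hypothetical helper)."""
    params = {"q": query, "c": category}
    # urlencode escapes the query string; spaces become "+"
    return f"{BASE_URL}?{urlencode(params)}"


search_url = build_search_url("weather app")
print(search_url)
```

The resulting string can be passed to `requests.get()` in place of the hard-coded URL.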
Votes: 0

Stack Overflow user

Posted 2022-04-07 12:07:03

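Make sure you pass a user-agent so the request looks like it comes from a "real" user. Without a user-agent in the request headers, you can sometimes receive different HTML with different elements and selectors, or some kind of error.

Check what your user-agent is and update it when possible, because websites may block requests that carry an outdated user-agent, for example one from Chrome 70.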
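
As a small illustration of the point about headers, the sketch below builds (but does not send) a request with a browser-like user-agent and prints the default one that `requests` would otherwise use. The user-agent string is just an example value and `build_search_request` is a hypothetical helper name.

```python
import requests

# Example value only: substitute the user-agent string of a current browser.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/100.0.4896.79 Safari/537.36"
    )
}


def build_search_request(query: str) -> requests.PreparedRequest:
    """Prepare a Google Play search request without sending it."""
    request = requests.Request(
        "GET",
        "https://play.google.com/store/search",
        params={"q": query},
        headers=HEADERS,
    )
    return request.prepare()


prepared = build_search_request("weather app")
print("default:", requests.utils.default_user_agent())  # identifies itself as python-requests
print("custom: ", prepared.headers["User-Agent"])
print("url:    ", prepared.url)
```

To actually send it, something like `requests.Session().send(prepared, timeout=30)` should work; passing `headers=HEADERS` directly to `requests.get()` has the same effect.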

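Additionally, have a look at the SelectorGadget Chrome extension, which lets you grab CSS selectors visually by clicking on the desired element in your browser.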
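
CSS selectors obtained this way can be used with BeautifulSoup's `select()` and `select_one()`. The sketch below runs against a made-up HTML fragment (not real Google Play markup) and guards against missing elements, since `select_one()` returns `None` when nothing matches. The `card` class and the `text_or_none` helper are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Made-up fragment for illustration; real pages use different markup.
SAMPLE_HTML = """
<div class="card"><span class="DdYX5">Example Weather</span><span class="wMUdtb">Example Co</span></div>
<div class="card"><span class="DdYX5">No Publisher App</span></div>
"""


def text_or_none(parent, css_selector):
    """Return the stripped text of the first match, or None if there is no match."""
    node = parent.select_one(css_selector)
    return node.get_text(strip=True) if node is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
apps = [
    {
        "title": text_or_none(card, ".DdYX5"),
        "company": text_or_none(card, ".wMUdtb"),
    }
    for card in soup.select("div.card")
]
print(apps)
```

Returning `None` for a missing field keeps one malformed card from aborting the whole loop with an `AttributeError`.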

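Update 06/06/2022.

Google recently changed its UI. Google Play Search now returns a limited number of apps, i.e. there is no pagination.

Code and full example in the online IDE (code updated after Google's change):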

import json
import re

import requests
from bs4 import BeautifulSoup  # the "lxml" parser used below requires the lxml package


def bs4_scrape_google_play_store_search_apps(
    query: str, filter_by: str = "apps", country: str = "US"
):
    # https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
    params = {
        "q": query,     # search query
        "gl": country,  # country of the search; different countries return different apps
        "c": filter_by  # result category filter, e.g. apps, books, movies
    }

    # https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.79 Safari/537.36",
    }

    html = requests.get("https://play.google.com/store/search", params=params, headers=headers, timeout=30)
    soup = BeautifulSoup(html.text, "lxml")

    apps_data = []

    for app in soup.select("[jscontroller=tKHFxf]"):
        title = app.select_one(".DdYX5").text
        company = app.select_one(".wMUdtb").text
        app_icon = app.select_one(".j2FCNc img")["srcset"]

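        try:
            thumbnail = app.select_one(".Shbxxd img")["srcset"]
        except (TypeError, KeyError):
            # element or srcset attribute is missing: fall back to the alternative thumbnail element
            thumbnail = app.select_one(".Vc0mnc img")["src"]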

        app_link = f'https://play.google.com{app.select_one(".Si6A0c.Gy4nib")["href"]}'
        app_id = app.select_one("a")["href"].split("id=")[1]
        
        try:
            # https://regex101.com/r/SZLPRp/1
            rating = re.search(r"\d{1}\.\d{1}", app.select_one(".ubGTjb div")["aria-label"]).group()
        except (TypeError, KeyError, AttributeError):
            # element, aria-label attribute, or regex match is missing
            rating = None
        
        
        apps_data.append({
            "title": title,
            "app_link": app_link,
            "company": company,
            "rating": float(rating) if rating else None,  # None when no rating was found
            "app_id": app_id,
            "thumbnail": thumbnail,
            "icon": app_icon
        })

    print(json.dumps(apps_data, indent=2, ensure_ascii=False))

bs4_scrape_google_play_store_search_apps(query="maps", filter_by="apps", country="US")

Part of the output:

[
  {
    "title": "Google Maps",
    "app_link": "https://play.google.com/store/apps/details?id=com.google.android.apps.maps",
    "company": "Google LLC",
    "rating": 3.9,
    "app_id": "com.google.android.apps.maps",
    "thumbnail": "https://play-lh.googleusercontent.com/FQx43QTaAqeOtoTLylK3WIs7ySKuGS8AurXNA1Kj34m6w6CjavF4Oj3s5DB6xZZ7DS63=w832-h470-rw 2x",
    "icon": "https://play-lh.googleusercontent.com/Kf8WTct65hFJxBUDm5E-EpYsiDoLQiGGbnuyP6HBNax43YShXti9THPon1YKB6zPYpA=s128-rw 2x"
  }, ... other results
  {
    "title": "GPS, Maps, Voice Navigation & Directions",
    "app_link": "https://play.google.com/store/apps/details?id=com.maps.voice.navigation.traffic.gps.location.route.driving.directions",
    "company": "AppStar Studios",
    "rating": 4.0,
    "app_id": "com.maps.voice.navigation.traffic.gps.location.route.driving.directions",
    "thumbnail": "https://i.ytimg.com/vi/4E2NyVZlOjc/hqdefault.jpg",
    "icon": "https://play-lh.googleusercontent.com/NrK0b-e6cpj4yYkDuNZJHO9KUAl8pSj9TGi4Xw4GbPZ6UVsnAlLBH2AZuEMpb24Xig=s128-rw 2x"
  }
]
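
The `rating` and `app_id` values in the output above are derived by string handling on an `aria-label` and on a link `href`. The sketch below isolates those two steps using only the standard library so they can be checked without a network request. The `aria-label` text is an assumed example, and `parse_rating` and `parse_app_id` are hypothetical helper names. Parsing the query string is swapped in for `split("id=")` so that extra URL parameters do not end up in the ID.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse


def parse_rating(aria_label: str) -> Optional[float]:
    """Extract a rating such as 3.9 from label text, or None if absent."""
    match = re.search(r"\d\.\d", aria_label)
    return float(match.group()) if match else None


def parse_app_id(href: str) -> Optional[str]:
    """Extract the value of the id query parameter from an app link."""
    values = parse_qs(urlparse(href).query).get("id")
    return values[0] if values else None


print(parse_rating("Rated 3.9 stars out of five stars"))  # assumed label text
print(parse_rating("No rating available"))
print(parse_app_id("/store/apps/details?id=com.google.android.apps.maps"))
print(parse_app_id("/store/apps/details?id=com.example.app&hl=en"))
```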

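Another solution could be to use the Google Play Store API from SerpApi. It is a paid API with a free plan.

The difference is that you don't need to create a parser from scratch, maintain it, figure out how to extract the data, or bypass blocks from Google or other search engines.

Code to integrate: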

from serpapi import GoogleSearch
import json

params = {
    "api_key": "API KEY",      # your serpapi api key
    "engine": "google_play",   # search engine
    "hl": "en",                # language
    "store": "apps",           # apps search
    "gl": "us",                # country to search from; different countries return different results
    "q": "weather"             # search query
}

search = GoogleSearch(params)  # where data extraction happens
results = search.get_dict()    # JSON -> Python dictionary

apps_data = []

for apps in results["organic_results"]:
    for app in apps["items"]:
        apps_data.append({
            "title": app.get("title"),
            "link": app.get("link"),
            "description": app.get("description"),
            "product_id": app.get("product_id"),
            "rating": app.get("rating"),
            "thumbnail": app.get("thumbnail"),
            })

print(json.dumps(apps_data, indent=2, ensure_ascii=False))

Partial output (there is other data that you can see in the playground):

[
  {
    "title": "Weather app",
    "link": "https://play.google.com/store/apps/details?id=com.weather.forecast.weatherchannel",
    "description": "The weather channel, tiempo weather forecast, weather radar & weather map",
    "product_id": "com.weather.forecast.weatherchannel",
    "rating": 4.7,
    "thumbnail": "https://play-lh.googleusercontent.com/GdXjVGXQ90eVNpb1VoXWGT3pff2M9oe3yDdYGIsde7W9h3s2S6FDLfo1uO-gljBZ1QXO=s128-rw"
  },
  {
    "title": "The Weather Channel - Radar",
    "link": "https://play.google.com/store/apps/details?id=com.weather.Weather",
    "description": "Weather Forecast & Snow Radar: local rain tracker, weather maps & alerts",
    "product_id": "com.weather.Weather",
    "rating": 4.6,
    "thumbnail": "https://play-lh.googleusercontent.com/RV3DftXlA7WUV7w-BpE8zM0X7Y4RQd2vBvZVv6A01DEGb_eXFRjLmUhSqdbqrEl9klI=s128-rw"
  },
  {
    "title": "AccuWeather: Weather Radar",
    "link": "https://play.google.com/store/apps/details?id=com.accuweather.android",
    "description": "Your local weather forecast, storm tracker, radar maps & live weather news",
    "product_id": "com.accuweather.android",
    "rating": 4.0,
    "thumbnail": "https://play-lh.googleusercontent.com/EgDT3XrIaJbhZjINCWsiqjzonzqve7LgAbim8kHXWgg6fZnQebqIWjE6UcGahJ6yugU=s128-rw"
  },
  {
    "title": "Weather by WeatherBug",
    "link": "https://play.google.com/store/apps/details?id=com.aws.android",
    "description": "The Most Accurate Weather Forecast. Alerts, Radar, Maps & News from WeatherBug",
    "product_id": "com.aws.android",
    "rating": 4.7,
    "thumbnail": "https://play-lh.googleusercontent.com/_rZCkobaGZzXN3iquPr4u2KOe7C-ljnrSkBfw6sVL1kpUfq3sBl5MoRJEisBSnxaD-M=s128-rw"
  }, ... other results
]
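
The nested loop in the integration code assumes every entry in `organic_results` carries an `items` list. A defensive sketch of the same flattening step follows, run against a small hand-written dictionary whose shape is inferred from that loop and not checked against SerpApi's documentation. `flatten_apps` is a hypothetical helper name.

```python
import json

# Hand-written sample shaped like the response used above (an assumption).
sample_results = {
    "organic_results": [
        {
            "items": [
                {
                    "title": "Weather app",
                    "product_id": "com.weather.forecast.weatherchannel",
                    "rating": 4.7,
                }
            ]
        },
        {"title": "A section without items"},
    ]
}


def flatten_apps(results, fields=("title", "product_id", "rating")):
    """Collect the selected fields of every item across all result sections."""
    apps = []
    for section in results.get("organic_results", []):
        for app in section.get("items", []):
            apps.append({field: app.get(field) for field in fields})
    return apps


apps = flatten_apps(sample_results)
print(json.dumps(apps, indent=2, ensure_ascii=False))
```

Using `.get()` with an empty-list default means a response with no results, or a section without items, yields an empty list instead of a `KeyError`.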

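I also have a dedicated blog post, Scrape Google Play Search Apps in Python, with a step-by-step explanation that would be too much for this answer.

Disclaimer, I work for SerpApi.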

Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/71727849
