I am trying to extract the text of the comments on a web page, scraping it with BeautifulSoup from the URL. When I open the URL in a browser I can see the comments on the page, but the HTML object that BeautifulSoup returns does not contain those tags or their text.
I am scraping with BeautifulSoup and 'html.parser'. I successfully extracted the like/view/comment counts for the video on the page, but the comment section itself is not present in the HTML. My browser is Chrome and the system is Ubuntu 18.04.1 LTS.
Here is the code I used (in Python):
from urllib.request import urlopen
from urllib.error import HTTPError  # needed for the except clause below
from bs4 import BeautifulSoup

webpage_link = "https://www.airvuz.com/video/Majestic-Beast-Nanuk?id=59b2a56141ab4823e61ea901"
try:
    page = urlopen(webpage_link)
except HTTPError as err:  # webpage cannot be found
    print("ERROR! %s" % (webpage_link))

soup = BeautifulSoup(page, 'html.parser')
The expected result is that the soup object contains everything visible on the page, in particular the text of the comments (e.g. "Not being there I enjoyed a lot seeing the life style of white bear. Thanks to the provider for such documentary." and "WOOOW... amazing..."); however, I cannot find the corresponding nodes in the soup object. Any help would be greatly appreciated!
Posted on 2019-03-25 09:26:13
The comments are generated by JavaScript via an ajax request. You can send the same request yourself and read the comments from the JSON response. You can find that request in the Network tab of the browser's inspect tools.
from urllib.request import urlopen
import json

webpage_link = "https://www.airvuz.com/api/comments/video/59b2a56141ab4823e61ea901?page=1&limit=20"
page = urlopen(webpage_link).read()
comments_json = json.loads(page)
for comment_info in comments_json['data']:
    print(comment_info['comment'].strip())
Output
Not being there I enjoyed a lot seeing the life style of white bear. Thanks to the provider for such documentary.
WOOOW... amazing...
I've been photographing polar bears for years, but to see this footage from a drones perspective was epic! Well done and congratz on the Nominee! Well deserved.
You are da man Florian!
Absolutely outstanding!
This is incredible
jaw dropping
This is wow amazing, love it.
So cool! Did the bears react to the drone at all?
Congratulations! It's awesome! I am watching in tears....
Awesome!
perfect video awesome
It is very, very beautiful !!! Sincere congratulations
Made my day, exquisite, thank you
Wow
Super!
Marvelous!
Man this is incredible!
Material is good, but edi is bad. This history about beer's family...
Muy bueno!
https://stackoverflow.com/questions/55330107