Q: Trying to use BeautifulSoup to scrape Yelp ratings and export them to CSV, but the CSV only contains the review text, with no ratings or IDs

Stack Overflow user

Asked 2021-11-03 14:01:22

Answers: 1, Views: 106, Followers: 0, Votes: 0

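I'm trying to collect 100 reviews/ratings for a Yelp restaurant with BeautifulSoup for an assignment. Specifically, I'm looking for: review ID, review comment, review rating.

I'm new to Python and I feel like I'm missing something very obvious.

This is what I have so far: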

from bs4 import BeautifulSoup
import urllib.request

url = 'https://www.yelp.com/biz/ichiran-times-square-new-york-4?osq=Ichiban+Ramen'
ourUrl = urllib.request.urlopen(url)

soup = BeautifulSoup(ourUrl,'html.parser')
type(soup)
print(soup.prettify())

for i in soup.find_all('div', {'class':" arrange-unit__373c0__3XPkE arrange-unit-fill__373c0__38Zde border-color--default__373c0__r305k"}):
    ID.append(i.find("div").get("aria-label"))

soup.find('p', {'class':"comment__373c0__Nsutg css-n6i4z7"})

i = soup.find('p', {'class':"comment__373c0__Nsutg css-n6i4z7"})
i.text

review=[]
rating = []
ID = []

for x in range(0,10):
    url = "https://www.yelp.com/biz/ichiran-times-square-new-york-4?osq=Ichiban+Ramen="+str(10*x)
    ourUrl = urllib.request.urlopen(url)
    soup = BeautifulSoup(ourUrl,'html.parser')

    #for i in soup,

    for i in soup.find_all('div', {'class':" i-stars__373c0___sZu0 i-stars--regular-5__373c0__20dKs border-color--default__373c0__1yxBb overflow--hidden__373c0__1TJqF"}):
        per_rating = i.text
        rating.append(per_rating)

    for i in soup.find_all('span', {'class':" arrange-unit__373c0__3XPkE arrange-unit-fill__373c0__38Zde border-color--default__373c0__r305k"}):
        ID.append(i.find("div").get("aria-label"))

    for i in soup.find_all('p', {'class':"comment__373c0__Nsutg css-n6i4z7"}):
        per_review=i.text
        review.append(per_review)

len(review)

Below is my attempt at exporting to CSV. In the CSV I only get the review text and nothing else:

with open('Review.csv','a',encoding = 'utf-8') as f:
     for each in review:
          f.write(each+'\n')

python

web-scraping

Stack Overflow user

Answered 2021-11-03 14:35:48

Edit (updated)

The problem actually appears to be caused by not targeting the correct tags in the HTML.

# Import regex package
import re

# Narrow down the section that you are searching in to avoid erroneous elements
child = soup.find('div', {'class': 'css-79elbk border-color--default__373c0__1ei3H'})

for x in child.find_all('span', {'class':"fs-block css-m6anxm"}):
    # Ignore the titular "Username"
    if x.text != 'Username':
        ID.append(x.text)

for x in child.find_all('div', {'class':re.compile(r'i-stars.+')}):
    rating.append(x.get('aria-label'))

for x in child.find_all('p', {'class':'comment__373c0__Nsutg css-n6i4z7'}):
    comment = x.find('span', {'class':'raw__373c0__tQAx6'})
    review.append(comment.text)

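The ID needs to target the specific element with 'class':"fs-block css-m6anxm", while the rating class differs depending on the number of stars awarded, so a regular expression is used to identify any class starting with i-stars.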
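Since rating.append(x.get('aria-label')) stores the whole label string, a small helper can turn it into a number before export. The following is only a sketch: it assumes the label looks like "5 star rating", which is not confirmed by the page markup shown here, and parse_star_rating is a hypothetical helper name.

```python
import re

# Assumption: aria-label values look like "5 star rating" or "4.5 star rating".
_RATING_RE = re.compile(r"(\d+(?:\.\d+)?)\s*star")


def parse_star_rating(label):
    """Return the numeric rating found in an aria-label, or None if absent."""
    if not label:
        # x.get('aria-label') returns None when the attribute is missing.
        return None
    match = _RATING_RE.search(label)
    return float(match.group(1)) if match else None
```

For example, rating.append(parse_star_rating(x.get('aria-label'))) would store floats (or None) instead of raw label strings.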

Original answer

I believe your problem is that you are only looping over review, when you also need to loop over ID and rating:

# Create new_line to work around f-strings issue with '\'
new_line = '\n'

with open('Review.csv','a',encoding = 'utf-8') as f:
     for i in range(len(review)):
          f.write(f'{review[i]},{ID[i]},{rating[i]}{new_line}')
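One caveat with joining fields by hand: review text often contains commas or line breaks, which would split a review across several CSV columns or rows. A possible alternative, sketched here with the standard library's csv module, lets the writer handle quoting. The function name write_reviews and the sample values are made up for illustration.

```python
import csv
import io


def write_reviews(f, ids, ratings, reviews):
    """Write a header plus one row per review to an open text file.

    zip() stops at the shortest list, so extra items in a longer
    list are silently dropped. Returns the number of data rows written.
    """
    writer = csv.writer(f)
    writer.writerow(["review_id", "review_rating", "review_comment"])
    count = 0
    for row in zip(ids, ratings, reviews):
        writer.writerow(row)  # fields with commas or newlines get quoted
        count += 1
    return count


# Usage with an in-memory buffer; for a real file, open it with
# open('Review.csv', 'w', newline='', encoding='utf-8') instead.
buffer = io.StringIO()
count = write_reviews(
    buffer,
    ["Alice B."],
    ["5 star rating"],
    ["Great broth, long wait."],
)
csv_text = buffer.getvalue()
```

Opening the file with newline='' is what the csv module documentation recommends, so that the writer controls line endings itself.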

To achieve this, you could also look at the pandas package.

You can create a DataFrame and then export it to many different file types, including CSV, for example:

# Import Pandas package
import pandas as pd

# Store list values, along with column headings, in a dictionary
d = {'review_comment': review, 'review_id': ID, 'review_rating': rating}

# Create dataframe from the dictionary
df = pd.DataFrame(data=d)

# Export the dataframe as a CSV
df.to_csv('desired/save/location.csv', index=False)
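Building a DataFrame from a dictionary of lists raises a ValueError when the lists have different lengths, which can easily happen if one selector matches more elements than another. A quick check before creating the DataFrame makes the mismatch easier to diagnose. This is a sketch only; check_equal_lengths is a hypothetical helper, not part of pandas.

```python
def check_equal_lengths(columns):
    """Return {name: length} for each column, raising if the lengths differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        # Include every length so the mismatched selector is easy to spot.
        raise ValueError(f"column lengths differ: {lengths}")
    return lengths
```

Calling check_equal_lengths(d) just before pd.DataFrame(data=d) would report which of review, ID, or rating is out of step.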

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/69826227
