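I've written some code that scrapes the "Address" and "Phone" for a few shop names, and it works fine. However, it needs two parameters to be filled in before it can do its job. I'd like to do the same thing from a CSV file, where "Name" is in the first column and "Lid" is in the second column, with the scraped results placed in the third and fourth columns respectively. At the moment I have no idea how to run the search from a CSV file. Any suggestion would be greatly appreciated.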
import requests
from lxml import html
Names=["Literati Cafe","Standard Insurance Co","Suehiro Cafe"]
Lids=["3221083","497670909","12183177"]
for Name in Names and Lids:
    Page_link="https://www.yellowpages.com/los-angeles-ca/mip/"+Name.replace(" ","-")+"-"+Name
    response = requests.get(Page_link)
    tree = html.fromstring(response.text)
    titles = tree.xpath('//article[contains(@class,"business-card")]')
    for title in titles:
        Address= title.xpath('.//p[@class="address"]/span/text()')[0]
        Contact = title.xpath('.//p[@class="phone"]/text()')[0]
        print(Address,Contact)
Posted on 2017-05-24 22:34:59
You can get the `Names` and `Lids` lists from the CSV like this:
import csv
Names, Lids = [], []
with open("file_name.csv", "r") as f:
    reader = csv.DictReader(f)
    for line in reader:
        Names.append(line["Name"])
        Lids.append(line["Lid"])
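(Never mind the PEP 8 naming violations for now ;).) You can then use these lists in the rest of your code. That said, I'm not sure what you are trying to achieve with your `for Name in Names and Lids:` loop, but it is not giving you what you think it is: it does not loop through the `Names` list, only through the `Lids` list. `Names and Lids` is a boolean expression, and because `Names` is non-empty it evaluates to `Lids`.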
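To make that concrete, here is a small sketch using the sample values from the question. The lowercase variable names and the use of `zip` are suggestions of mine, not part of the original code; `zip` is simply one common way to walk two lists in step.

```python
names = ["Literati Cafe", "Standard Insurance Co", "Suehiro Cafe"]
lids = ["3221083", "497670909", "12183177"]

# `and` returns its second operand when the first one is truthy,
# so this is just the `lids` list, not a pairing of the two lists.
looped_over = names and lids
print(looped_over is lids)

# zip() pairs the i-th name with the i-th lid.
pairs = list(zip(names, lids))

base = "https://www.yellowpages.com/los-angeles-ca/mip/"
links = [base + name.replace(" ", "-") + "-" + lid for name, lid in pairs]
print(links[0])
```

With the pairs in hand, each link is built from a name and its matching lid, which is what the URL pattern in the question appears to need.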
A first step towards improving this would be to replace your loop with a loop over the CSV itself, like this:
with open("file_name.csv", "r") as f:
    reader = csv.DictReader(f)
    for entry in reader:
        page_link = "https://www.yellowpages.com/los-angeles-ca/mip/{}-{}".format(entry["Name"].replace(" ","-"), entry["Lid"])
        # rest of your scraping code...
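The question also asks for the results to end up in the third and fourth columns. A minimal sketch of that step follows, assuming the input has `Name` and `Lid` headers. The output headers `Address` and `Phone`, the helpers `build_page_link` and `add_results`, and the `scrape` callable are hypothetical names introduced here for illustration; `scrape` stands in for the requests/lxml code from the question and is assumed to return an (address, phone) pair for a given page link.

```python
import csv
import io

BASE_URL = "https://www.yellowpages.com/los-angeles-ca/mip/"


def build_page_link(name, lid):
    # Same URL pattern as in the loop above.
    return "{}{}-{}".format(BASE_URL, name.replace(" ", "-"), lid)


def add_results(in_file, out_file, scrape):
    # `scrape` is a placeholder: any callable taking a page link and
    # returning an (address, phone) tuple.
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(
        out_file, fieldnames=["Name", "Lid", "Address", "Phone"]
    )
    writer.writeheader()
    for entry in reader:
        link = build_page_link(entry["Name"], entry["Lid"])
        address, phone = scrape(link)
        writer.writerow({
            "Name": entry["Name"],
            "Lid": entry["Lid"],
            "Address": address,
            "Phone": phone,
        })


# Demo with in-memory files and a stand-in scraper (no network access).
def fake_scrape(page_link):
    return "address for " + page_link.rsplit("-", 1)[-1], "phone unknown"


source = io.StringIO("Name,Lid\nLiterati Cafe,3221083\n")
target = io.StringIO()
add_results(source, target, fake_scrape)
print(target.getvalue())
```

With real files, open both with `newline=""` as the `csv` module documentation recommends, and write to a separate output file rather than overwriting the input while it is still being read.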
https://stackoverflow.com/questions/44167840