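I'm new to Scrapy. I'm trying to scrape data from this table, starting by getting the data for all countries. It mostly works, but the first few items I get back are empty strings. Could you please take a look at this code and help me if you can? Thank you.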
import scrapy

class Covid19Spider(scrapy.Spider):
    name = 'covid19'
    allowed_domains = ['worldometers.info']
    start_urls = ['https://www.worldometers.info/coronavirus/']

    def parse(self, response):
        table = response.xpath('//*[contains(@class, "table table-bordered")]')[0]
        trs = table.xpath('.//tr')[3:]
        for tr in trs:
            country = tr.xpath('.//td[2]//a//text()|'
                               './/td[2]//text()').extract_first().strip()
            yield {
                "Country": country,
            }
The output I get is as follows:
{'Country': ''}
2020-09-20 23:01:18 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.worldometers.info/coronavirus/>
{'Country': ''}
2020-09-20 23:01:18 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.worldometers.info/coronavirus/>
{'Country': ''}
2020-09-20 23:01:18 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.worldometers.info/coronavirus/>
{'Country': ''}
2020-09-20 23:01:18 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.worldometers.info/coronavirus/>
{'Country': ''}
2020-09-20 23:01:18 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.worldometers.info/coronavirus/>
{'Country': 'World'}
2020-09-20 23:01:18 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.worldometers.info/coronavirus/>
{'Country': 'USA'}
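A likely cause of the empty strings (an assumption, not verified against the live page): `extract_first()` returns only the first matched text node, and an XPath union such as `.//td[2]//a//text() | .//td[2]//text()` returns nodes in document order rather than preferring the `<a>` branch. If a cell starts with a whitespace-only text node (for example a newline before a nested tag), `.strip()` turns it into `''`. The sketch below uses plain Python lists standing in for the list of strings that `tr.xpath('.//td[2]//text()').getall()` would return; the cell contents and the helper names `first_text` and `cell_text` are hypothetical.

```python
def first_text(nodes):
    """Mimic extract_first().strip(): take only the first text node, then strip it."""
    return nodes[0].strip() if nodes else ""


def cell_text(nodes):
    """Join every non-blank text node of a cell, dropping surrounding whitespace."""
    return " ".join(node.strip() for node in nodes if node.strip())


# Hypothetical text nodes for a cell like "<td>\n<nobr>North America</nobr>\n</td>"
continent_cell = ["\n", "North America", "\n"]
# Hypothetical text nodes for a cell like "<td><a href='...'>USA</a> </td>"
country_cell = ["USA", " "]

print(repr(first_text(continent_cell)))  # ''
print(repr(cell_text(continent_cell)))   # 'North America'
print(repr(cell_text(country_cell)))     # 'USA'
```

In the spider this would correspond to something like `cell_text(tr.xpath('.//td[2]//text()').getall())`; XPath's `normalize-space()` function is another way to get similar whitespace handling.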
Posted on 2020-09-21 06:35:37
Following @PaulM's suggestion, I changed the starting row: I set it to 9 instead of 3, and it now works well. Here is the solution:
import scrapy

class Covid19Spider(scrapy.Spider):
    name = 'covid19'
    allowed_domains = ['worldometers.info']
    start_urls = ['https://www.worldometers.info/coronavirus/']

    def parse(self, response):
        table = response.xpath('//*[contains(@class, "table table-bordered")]')[0]
        trs = table.xpath('.//tr')[9:]  # Start at row 9 instead of 3
        for tr in trs:
            # default='' avoids calling .strip() on None if a row has no matching text
            country = tr.xpath('.//td[2]//a//text()|'
                               './/td[2]//text()').extract_first(default='').strip()
            yield {
                "Country": country,
            }
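Hard-coding `[9:]` works only as long as the number of rows above the first country stays the same. A sketch of an alternative, assuming the unwanted rows are exactly those whose extracted name is `''` or `'World'` (as in the question's output): skip rows by value instead of by position. The list `scraped` stands in for the names extracted in `parse()`; `SKIP_VALUES` and `iter_countries` are hypothetical names.

```python
from typing import Iterable, Iterator

# Hypothetical skip list: the values the unwanted leading rows produced in the
# question's output. It would need adjusting if the table's layout changes.
SKIP_VALUES = {"", "World"}


def iter_countries(names: Iterable[str]) -> Iterator[dict]:
    """Yield one item per country, skipping placeholder rows by value, not position."""
    for name in names:
        if name in SKIP_VALUES:
            continue
        yield {"Country": name}


# Values in the order the question's spider produced them (from its log output).
scraped = ["", "", "", "", "", "World", "USA"]
items = list(iter_countries(scraped))
print(items)  # [{'Country': 'USA'}]
```

Inside `parse()` the same idea is an `if country in SKIP_VALUES: continue` check before the `yield`, so the loop no longer depends on a fixed row offset.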
https://stackoverflow.com/questions/63981739