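I have a bunch of other scripts, but when I run this spider from cmd and open the .csv file to look at the saved "Title" values, I get the XPath itself copied into Excel instead of the titles. Any idea why?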
import scrapy


class MovieSpider(scrapy.Spider):
    name = 'movie'
    allowed_domains = ['https://www.imdb.com/search/title?start=1']
    start_urls = ['https://www.imdb.com/search/title?start=1/']

    def parse(self, response):
        titles = response.xpath('//*[@id="main"]/div/div/div[3]/div[1]/div[3]/h3/a')
        pass
        print(titles)
        for title in titles:
            yield {'Title': title}

I also tried the two below:
for subject in titles:
    yield {
        'Title': subject.xpath('.//h3[@class="lister-item-header"]/a/text()').extract_first(),
        'Runtime': subject.xpath('.//p[@class="text-muted"]/span/text()').extract_first(),
        'Description': subject.xpath('.//p[@class="text-muted"]/p/text()').extract_first(),
        'Director': subject.xpath('.//*[@id="main"]/a/text()').extract_first(),
        'Rating': subject.xpath('.//div[@class="inline-block ratings-imdb-rating"]/strong/text()').extract_first()
    }

Posted on 2018-09-20 13:50:42
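response.xpath() returns Selector objects, not strings, so yielding one directly puts the selector's text representation (XPath included) into the CSV. Use extract() or extract_first() to get the actual values. You can also use shorter, more general XPath expressions: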
import scrapy


class MovieSpider(scrapy.Spider):
    name = 'movie'
    # allowed_domains expects bare domain names, not full URLs
    allowed_domains = ['imdb.com']
    start_urls = ['https://www.imdb.com/search/title?start=1/']

    def parse(self, response):
        # One selector per movie entry; the field XPaths below are relative to it
        subjects = response.xpath('//div[@class="lister-item mode-advanced"]')
        for subject in subjects:
            yield {
                'Title': subject.xpath('.//h3[@class="lister-item-header"]/a/text()').extract_first(),
                'Rating': subject.xpath('.//div[@class="inline-block ratings-imdb-rating"]/strong/text()').extract_first(),
                'Runtime': subject.xpath('.//span[@class="runtime"]/text()').extract_first(),
                'Description': subject.xpath('.//p[@class="text-muted"]/text()').extract_first(),
                'Director': subject.xpath('.//p[contains(text(), "Director")]/a[1]/text()').extract_first(),
            }

https://stackoverflow.com/questions/52418097