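I'm new to Scrapy and Python. I want to scrape data from a search results page. When the page loads, it shows the default content, but what I need to scrape is the filtered content, and I also need to handle pagination. How can I do that?
This is the URL I need to scrape items from, with the time filter set to "Today": https://teslamotorsclub.com/tmc/post-ratings/6/posts
I have tried different approaches, but none of them work.
This is what I have so far. It is mostly just the layout of the spider.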
import scrapy


class TmcnfSpider(scrapy.Spider):
    name = 'tmcnf'
    allowed_domains = ['teslamotorsclub.com']
    start_urls = ['https://teslamotorsclub.com/tmc/post-ratings/6/posts']

    def start_requests(self):
        # Show form from a filtered search result
        pass  # placeholder: not implemented yet

    def parse(self, response):
        # some code scraping item
        # Yield url for pagination
        pass  # placeholder: not implemented yet
Posted on 2019-05-10 17:44:30
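To get the posts under the "Today" filter, you need to send a POST request with a payload to this URL: https://teslamotorsclub.com/tmc/post-ratings/6/posts. The code below should give you the results you are after.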
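import scrapy


class TmcnfSpider(scrapy.Spider):
    name = "teslamotorsclub"
    start_urls = ["https://teslamotorsclub.com/tmc/post-ratings/6/posts"]

    def parse(self, response):
        # Form fields for the time filter; '4' is the value used here for "Today"
        payload = {'time_chooser': '4', '_xfToken': ''}
        # FormRequest sends the payload as a form-encoded POST to the same URL
        yield scrapy.FormRequest(response.url, formdata=payload, callback=self.parse_results)

    def parse_results(self, response):
        # Each post title is the link text inside an h3.title heading
        for title in response.css("h3.title > a::text").getall():
            yield {"title": title.strip()}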
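To make it clearer what the form request in the answer puts on the wire, here is a minimal standard-library sketch that builds (but does not send) an equivalent POST request. The helper name build_filter_request and its parameters are hypothetical and only for illustration. The field names and values are the ones from the answer's payload, and the sketch assumes the site accepts an ordinary form-encoded body.

```python
from urllib.parse import urlencode
from urllib.request import Request

URL = "https://teslamotorsclub.com/tmc/post-ratings/6/posts"


def build_filter_request(url, time_chooser="4", xf_token=""):
    """Build, without sending, a form-encoded POST request for the time filter.

    Hypothetical helper: the field names mirror the payload in the answer.
    """
    payload = {"time_chooser": time_chooser, "_xfToken": xf_token}
    # Form data is serialised as key=value pairs joined by '&'
    body = urlencode(payload).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers, method="POST")


request = build_filter_request(URL)
print(request.get_method())  # POST
print(request.data)          # b'time_chooser=4&_xfToken='
```

Printing the body like this is a quick way to compare your spider's request with the one the browser sends when you pick the filter by hand (visible in the browser's network tab).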
https://stackoverflow.com/questions/56074404