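First log in to the site in a browser and copy the cookie, convert it to a dictionary, and store it in the COOKIES pool in settings.py. A middleware then attaches a cookie from the pool to each request so the spider crawls as a logged-in user.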
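1. Convert the cookie to a dictionary

def cookieChangeToDict(cookie):
    '''
    Convert a cookie string into a dictionary.
    :param cookie: the cookie string copied after logging in
    :return: dict mapping cookie names to values
    '''
    cookieDict = {}
    for item in cookie.split(';'):
        # Skip fragments without '=', e.g. the empty piece after a trailing ';'
        if '=' not in item:
            continue
        # Split on the first '=' only, because a value may itself contain '='
        name, value = item.split('=', maxsplit=1)
        cookieDict[name.strip()] = value.strip()
    return cookieDict

if __name__ == '__main__':
    cookie = """ your cookie """
    print(cookieChangeToDict(cookie))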
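As an alternative to splitting the string by hand, the standard library's http.cookies.SimpleCookie can parse the same kind of cookie string. The sketch below is only an illustration: the function name cookie_string_to_dict and the sample cookie values are made up for this example. SimpleCookie can be stricter than a manual split about unusual names or values, so the hand-written converter may be more tolerant of what a real browser hands you.

```python
from http.cookies import SimpleCookie


def cookie_string_to_dict(raw_cookie):
    """Parse a 'name=value; name2=value2' string into a plain dict."""
    parsed = SimpleCookie()
    parsed.load(raw_cookie)
    # Each entry is a Morsel object; .value holds the cookie value
    return {name: morsel.value for name, morsel in parsed.items()}


# Hypothetical cookie string, not taken from any real site
sample = "sessionid=abc123; csrftoken=xyz789"
cookies = cookie_string_to_dict(sample)
print(cookies)
```

Either converter yields a dictionary that can be passed as the cookies argument of a Scrapy request.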
2. Send requests with the logged-in cookie
Method 1:
def start_requests(self):
    url = ''
    return [scrapy.FormRequest(url, cookies=self.cookies, callback=self.parse)]
Method 2: use a middleware:
from scrapy import signals
from scrapy.downloadermiddlewares.cookies import CookiesMiddleware
import random

from renren.settings import COOKIES


class RandomCookieMiddleware(CookiesMiddleware):
    '''
    Random cookie pool
    '''
    def process_request(self, request, spider):
        # Attach a randomly chosen cookie dictionary from the pool to the request
        cookie = random.choice(COOKIES)
        request.cookies = cookie
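To make the behaviour of process_request easier to see in isolation, here is a minimal sketch that runs without Scrapy installed. It relies on the fact that a Scrapy downloader middleware is simply a class exposing process_request, and that returning None tells Scrapy to keep processing the request. The names RandomCookieSketch, COOKIE_POOL and fake_request are hypothetical stand-ins, and the pool contents are made up.

```python
import random
from types import SimpleNamespace

# Hypothetical pool; in the project this would be COOKIES from settings.py
COOKIE_POOL = [
    {"sessionid": "account-a"},
    {"sessionid": "account-b"},
]


class RandomCookieSketch:
    """Middleware-shaped class that picks one cookie dict per request."""

    def __init__(self, cookie_pool):
        self.cookie_pool = cookie_pool

    def process_request(self, request, spider):
        # Overwrite the request's cookies with a random entry from the pool
        request.cookies = random.choice(self.cookie_pool)
        # None means "continue with the remaining middlewares and download"
        return None


# Stand-in for scrapy.Request, only for this demonstration
fake_request = SimpleNamespace(url="https://example.com/", cookies={})
result = RandomCookieSketch(COOKIE_POOL).process_request(fake_request, spider=None)
print(fake_request.cookies)
```

Scrapy calls process_request in ascending priority order, and its built-in CookiesMiddleware sits at priority 700 by default. A custom middleware registered at 543 therefore runs first, and the built-in one then turns request.cookies into the actual Cookie header.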
Set the following in settings.py:
ROBOTSTXT_OBEY = False
COOKIES_ENABLED = True
DOWNLOADER_MIDDLEWARES = {
    'renren.middlewares.RandomCookieMiddleware': 543,
}
# Pool of cookie dictionaries produced by cookieChangeToDict
COOKIES = [ ]
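The COOKIES list is expected to hold dictionaries, one per saved login. One possible way to fill it, sketched below, is to keep the raw cookie strings in a list and convert them when the settings module is loaded. RAW_COOKIES, to_dict and the cookie values are assumptions made for this illustration and are not part of the original project.

```python
# Hypothetical raw cookie strings copied from the browser after logging in
RAW_COOKIES = [
    "sessionid=account-a; csrftoken=token-a",
    "sessionid=account-b; csrftoken=token-b",
]


def to_dict(raw):
    """Compact converter: split on ';', then on the first '=' of each piece."""
    pairs = (item.split("=", 1) for item in raw.split(";") if "=" in item)
    return {name.strip(): value.strip() for name, value in pairs}


# Resulting pool: a list of dicts, the shape random.choice(COOKIES) expects
COOKIES = [to_dict(raw) for raw in RAW_COOKIES]
print(COOKIES)
```

Keeping several logins in the pool lets the middleware spread requests across them, at the cost of having to refresh each cookie when it expires.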