
Simulating Login with Selenium and Requests

Author: 小歪
Published 2018-07-25 15:41:37

This post covers simulated login. The target site is GitHub.

The Simple Way: Selenium

Start with the simple approach: log in through a real browser. It is essentially foolproof.

GitHub's login page is simple, with no CAPTCHA or similar obstacles. The code is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

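import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.maximize_window()


def login(account, password):
    driver.get('https://github.com/login')
    time.sleep(2)  # crude fixed wait for the page to load
    # find_element_by_* was removed in Selenium 4; use find_element(By.<kind>, value)
    driver.find_element(By.ID, 'login_field').send_keys(account)
    driver.find_element(By.ID, 'password').send_keys(password)
    # the button classes are as recorded in 2018 and may have changed since
    driver.find_element(By.XPATH, '//input[@class="btn btn-primary btn-block"]').click()
    # do whatever you want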


if __name__ == '__main__':
    account, password = 'account', 'password'
    login(account, password)
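
The fixed `time.sleep(2)` above either wastes time or is too short on a slow connection. A polling wait is usually more robust. Selenium provides `WebDriverWait` for this; the sketch below only illustrates the idea with a hypothetical helper, `wait_until` (not part of the original post or of Selenium), so that it runs without a browser.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success. Hypothetical helper for illustration.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within %.1f s' % timeout)
        time.sleep(interval)
```

With the Selenium script, the condition could be something like `lambda: driver.find_elements(By.ID, 'login_field')`, which returns an empty list until the field exists.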

Analyzing the Request: Requests

Open the developer tools (F12), log in with a wrong account and password, and copy the request as cURL:

curl 'https://github.com/session' -H 'Cookie: has_recent_activity=1; _octo=GH1.1.1477592343.1531820067; logged_in=no; _gh_sess=UEZzYnVCMVlhNkVOdE5rU1hWRFpDbmFlY0UyQ1Y2b3Z4TGw2NFlTMmJLUWk5VENVQ3Q4TWxiSWN5ckEyZXN0MUFkT29XVjQvbWJVbm9RV0JNQmc1TmU0UnBtK0taUXJpcElqUk5PNGZ5TjZOQ2ZPRVR4NU5WQXcrb2xWRnRBMnRPMkRWYzYvWmVGY0FrYU12Q3BVVTY3dXVSblliNG4rWjc2QXVwR2pjQ1pzZXM1MFk1MjU5OUw2WkFLTU1BMzJDWGlTeXliNzNaejlUaW43cWhFNzQ0MFFVVmJ1aEppbzdtQTZkRERmUm5mWExkRDlmWW5lNk9mdlFYb05MQUtubDZBbXFJWjV6eFhic3JiWlRtZ2QxZ2FqZUxnOGFheUgzaXJmc290b0Jma09pRTJZdHZySEVmdVdGZHVBU3ZTVTJRM0pESnE1N1VPRDM0ck9FZzNJZTN5VWljUktyZ3FZQU16THVBeFBXV3BNPS0tSDh4WVV6U2RSNjlBL3FNQ3VaRGxEUT09--71cf0886128d55b42c82cf6f7b76e007ebfdc77b; _ga=GA1.2.57857743.1531820085; _gat=1; tz=Asia%2FShanghai' -H 'Origin: https://github.com' -H 'Accept-Encoding: gzip, deflate, br' -H 'Accept-Language: zh-CN,zh;q=0.9,en;q=0.8' -H 'Upgrade-Insecure-Requests: 1' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36' -H 'Content-Type: application/x-www-form-urlencoded' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8' -H 'Cache-Control: max-age=0' -H 'Referer: https://github.com/login' -H 'Connection: keep-alive' --data 'commit=Sign+in&utf8=%E2%9C%93&authenticity_token=%2BtgUHwMIxnoHOHNMqQFkLak9mJzrxt%2B4yfFiZaf66WiMB5ZyRaVXq%2BFpZsM%2BtxgaRRX6Fzfezu1IdRqy%2BTGHwg%3D%3D&login=123456788%40gmail.com&password=123456788' --compressed

Converted to Python code:

import requests

cookies = {
    'has_recent_activity': '1',
    '_octo': 'GH1.1.1477592343.1531820067',
    'logged_in': 'no',
    '_gh_sess': 'UEZzYnVCMVlhNkVOdE5rU1hWRFpDbmFlY0UyQ1Y2b3Z4TGw2NFlTMmJLUWk5VENVQ3Q4TWxiSWN5ckEyZXN0MUFkT29XVjQvbWJVbm9RV0JNQmc1TmU0UnBtK0taUXJpcElqUk5PNGZ5TjZOQ2ZPRVR4NU5WQXcrb2xWRnRBMnRPMkRWYzYvWmVGY0FrYU12Q3BVVTY3dXVSblliNG4rWjc2QXVwR2pjQ1pzZXM1MFk1MjU5OUw2WkFLTU1BMzJDWGlTeXliNzNaejlUaW43cWhFNzQ0MFFVVmJ1aEppbzdtQTZkRERmUm5mWExkRDlmWW5lNk9mdlFYb05MQUtubDZBbXFJWjV6eFhic3JiWlRtZ2QxZ2FqZUxnOGFheUgzaXJmc290b0Jma09pRTJZdHZySEVmdVdGZHVBU3ZTVTJRM0pESnE1N1VPRDM0ck9FZzNJZTN5VWljUktyZ3FZQU16THVBeFBXV3BNPS0tSDh4WVV6U2RSNjlBL3FNQ3VaRGxEUT09--71cf0886128d55b42c82cf6f7b76e007ebfdc77b',
    '_ga': 'GA1.2.57857743.1531820085',
    '_gat': '1',
    'tz': 'Asia%2FShanghai',
}

headers = {
    'Origin': 'https://github.com',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Cache-Control': 'max-age=0',
    'Referer': 'https://github.com/login',
    'Connection': 'keep-alive',
}

data = [
  ('commit', 'Sign in'),
  ('utf8', '\u2713'),
  ('authenticity_token', '+tgUHwMIxnoHOHNMqQFkLak9mJzrxt+4yfFiZaf66WiMB5ZyRaVXq+FpZsM+txgaRRX6Fzfezu1IdRqy+TGHwg=='),
  ('login', '123456788@gmail.com'),
  ('password', '123456788'),
]

response = requests.post('https://github.com/session', headers=headers, cookies=cookies, data=data)
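
To check how the `--data` string of the curl command maps to the `data` list, the body can be decoded with the standard library. This is a small sketch; `decode_form` is a name introduced here for illustration, and the string is the payload from the captured request.

```python
from urllib.parse import parse_qsl

# The --data string exactly as it appears in the copied curl command.
raw = ('commit=Sign+in&utf8=%E2%9C%93&authenticity_token=%2BtgUHwMIxnoHOHNMqQFkLak9mJzrxt'
       '%2B4yfFiZaf66WiMB5ZyRaVXq%2BFpZsM%2BtxgaRRX6Fzfezu1IdRqy%2BTGHwg%3D%3D'
       '&login=123456788%40gmail.com&password=123456788')


def decode_form(body):
    """Decode an application/x-www-form-urlencoded body into (name, value) pairs."""
    # '+' becomes a space and %XX escapes are decoded as UTF-8.
    return parse_qsl(body, keep_blank_values=True)


fields = decode_form(raw)
print(len(fields), 'fields decoded')
```

The decoded pairs match the `data` list: `Sign+in` becomes `Sign in`, `%E2%9C%93` becomes `\u2713`, and `%2B` becomes the `+` characters inside the token.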

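Two things need attention: the cookies and the form parameters. Start with the parameters. The only unusual one is authenticity_token, which looks like a verification token. Open the search panel with Ctrl+Shift+F; the token turns up in the returned HTML: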

    <!-- '"` --><!-- </textarea></xmp> --></option></form><form action="/session" accept-charset="UTF-8" method="post"><input name="utf8" type="hidden" value="&#x2713;" /><input type="hidden" name="authenticity_token" value="CTujn/pHGMQBpEhYcJj9Mn6ChsNSkd5ul8rgNSP/6/KxdZlhS0ABKblsq1pLn6EaQvIGLMzl/IQawaDL8KFjDw==" />      <div class="auth-form-header p-0">
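
One way to pull the token out of that markup is an HTML parser rather than a regular expression. The sketch below uses the standard library's `html.parser`; `TokenParser` and `extract_token` are names introduced here, and `SAMPLE_HTML` is a trimmed copy of the form shown above.

```python
from html.parser import HTMLParser


class TokenParser(HTMLParser):
    """Remember the value of the first <input name="authenticity_token">."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <input ... /> tags are also routed through this method.
        attrs = dict(attrs)
        if (self.token is None and tag == 'input'
                and attrs.get('name') == 'authenticity_token'):
            self.token = attrs.get('value')


def extract_token(html):
    """Return the token string, or None if the field is absent."""
    parser = TokenParser()
    parser.feed(html)
    return parser.token


SAMPLE_HTML = (
    '<form action="/session" accept-charset="UTF-8" method="post">'
    '<input name="utf8" type="hidden" value="&#x2713;" />'
    '<input type="hidden" name="authenticity_token" '
    'value="CTujn/pHGMQBpEhYcJj9Mn6ChsNSkd5ul8rgNSP/6/KxdZlhS0ABKblsq1pLn6EaQvIGLMzl/IQawaDL8KFjDw==" />'
    '</form>'
)

print(extract_token(SAMPLE_HTML))
```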

That settles authenticity_token. The next step is to obtain the cookies.

Searching further for the keywords _gh_sess and _octo turns up this piece of JS:

var e, t = void 0, r = void 0, n = this._getCookie("_octo"), a = [];

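My guess was that the cookies are not generated locally. Checking the request that loads the GitHub home page confirmed it: the relevant values appear under Response Cookies. A Session can therefore maintain the session state.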
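
The mechanism can be seen in isolation, without touching GitHub. The sketch below is a stand-in, not the post's method: it uses the standard library's `urllib` with a `CookieJar` in place of `requests.Session`, and a toy local server whose cookie name `demo_sess` and paths are made up for the demonstration. The GET response sets a cookie, and the following POST sends it back automatically.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Toy server: GET sets a cookie, POST echoes the Cookie header it received."""

    def do_GET(self):
        self.send_response(200)
        self.send_header('Set-Cookie', 'demo_sess=abc123; Path=/')
        self.send_header('Content-Length', '0')
        self.end_headers()

    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        self.rfile.read(length)  # consume the form body
        body = (self.headers.get('Cookie') or 'no cookie').encode()
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo():
    """Return the Cookie header the server saw on the POST request."""
    server = HTTPServer(('127.0.0.1', 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = 'http://127.0.0.1:%d' % server.server_port
    # The opener plays the role of the session: it owns the cookie jar.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar()))
    try:
        opener.open(base + '/login').read()  # the server sets the cookie
        with opener.open(base + '/session', data=b'login=x') as resp:
            return resp.read().decode()  # the cookie was sent back
    finally:
        server.shutdown()
        server.server_close()


print(demo())
```

A `requests` session behaves the same way: cookies set by one response are stored on the session object and attached to later requests to the same site.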

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import re
import requests


headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate',  # 'br' dropped: Brotli responses are decoded only if a brotli package is installed
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
    'Connection': 'keep-alive',
    'Host': 'github.com',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}

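s = requests.Session()
s.headers.update(headers)

def get_token():
    # GET the login page; the Session keeps the cookies set by this response
    url = 'https://github.com/login'
    response = s.get(url)
    pat = r'name="authenticity_token" value="(.*?)"'
    match = re.search(pat, response.text)
    if match is None:
        raise ValueError('authenticity_token not found in the login page')
    return match.group(1)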

def login(authenticity_token, account, password):
    payload = {
        'commit': 'Sign in',
        'utf8': '\u2713',
        'authenticity_token': authenticity_token,
        'login': account,
        'password': password,
    }
    url = 'https://github.com/session'
    response = s.post(url, data=payload)
    print(response)
    # do whatever you want


if __name__ == '__main__':
    account, password = 'account', 'password'
    authenticity_token = get_token()
    login(authenticity_token, account, password)
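
The script prints only the response object, which does not say whether the login worked. One possible check is the `logged_in` cookie, which has the value `no` in the captured request. The sketch below assumes the site changes it to `yes` after a successful login; that value is an assumption and does not appear in the capture. `is_logged_in` is a hypothetical helper.

```python
def is_logged_in(cookies):
    """Return True if the cookie mapping carries logged_in=yes.

    Assumption: the site flips the logged_in cookie from 'no' to 'yes'
    after a successful login; only 'no' appears in the captured request.
    """
    return cookies.get('logged_in') == 'yes'


# Plain dicts stand in for a cookie jar here.
print(is_logged_in({'logged_in': 'no'}))
print(is_logged_in({'logged_in': 'yes'}))
```

With the script above, the call would be `is_logged_in(s.cookies)`, since the session's cookie jar also supports `.get(name)`.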

Comparison

Selenium:

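  • Pros: simple and requires little thought; no need to analyze complex web requests or to maintain session state
  • Cons: slow, slow, slow (and in some cases the page's JS does not finish loading)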

Requests:

  • Pros: fast, and it deepens your understanding of cookie-based login
  • Cons: finding the relevant parameters takes time

If GitHub itself is what you are interested in, you can use the GitHub API directly.
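
As a rough sketch of what that looks like, the code below builds, but does not send, a request for the authenticated user through the GitHub REST API. It assumes you have created a personal access token; `YOUR_TOKEN` is a placeholder and `build_user_request` is a name introduced here.

```python
import urllib.request


def build_user_request(token):
    """Build (but do not send) a GET request for the authenticated user."""
    return urllib.request.Request(
        'https://api.github.com/user',
        headers={
            'Accept': 'application/vnd.github+json',
            'Authorization': 'Bearer ' + token,
        },
    )


req = build_user_request('YOUR_TOKEN')  # placeholder, not a real token
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. No login form, token scraping, or cookie handling is involved.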

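I have recently been using Selenium to deal with CAPTCHAs and found it very capable; doing the same by simulating requests would be far harder.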

Of these two ways to simulate a login, which do you prefer? (๑• . •๑)

Originally published 2018-07-17 on the WeChat official account Python爬虫与算法进阶.
