
Python POST to check boxes on a webpage

Stack Overflow user
Asked 2019-10-22 01:04:02
1 answer, 439 views, 0 followers, 1 vote

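I am trying to scrape a webpage that publishes prices for the Mexican electricity market. The page has checkboxes that must be checked before the files with the prices are displayed. Once the relevant boxes are checked, I want to pull the links on the page and check whether the specific file I am looking for has been posted. I am stuck on the first part: checking the boxes with requests.post. I used Fiddler to capture the parameters the browser sends when it posts, and I pass those same parameters to requests.post. I expect to be able to parse all the "href" links out of the response, but I get none. Any help pointing me toward a solution would be greatly appreciated. Below are the relevant parts of my code: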

```python
data = {
    "ctl00$ContentPlaceHolder1$toolkit": "ctl00$ContentPlaceHolder1$UpdatePanel1|ctl00$ContentPlaceHolder1$treePrincipal",
    # note: ASP.NET's field is __EVENTTARGET, with two leading underscores
    "_EVENTTARGET": "ctl00$ContentPlaceHolder1$treePrincipal",
    # note: requests sends only the keys of a dict value; the answer uses json.dumps
    "__EVENTARGUMENT": {"commandName": "Check", "index": "0:0:0:0"},
    "__VIEWSTATE": "/verylongstringhere",
    "__VIEWSTATEGENERATOR": "6B88769A",
    "__EVENTVALIDATION": "/wEdAAPhpIpHlL5kdIfX6MRCtKcRwfFVx5pEsE3np13JV2opXVEvSNmVO1vU+umjph0Dtwe41EcPKcg0qvxOp6m6pWTIV4q0ZOXSBrDwJTrxjo3dZg==",
    "ctl00_ContentPlaceHolder1_treePrincipal_ClientState": {
        "expandedNodes": [], "collapsedNodes": [],
        "logEntries": [], "selectedNodes": [],
        "checkedNodes": ["0", "0:0", "0:0:0", "0:0:0:0"],
        "scrollPosition": 0,
    },
    "ctl00_ContentPlaceHolder1_ListViewNodos_ClientState": "",
    "ctl00_ContentPlaceHolder1_NotifAvisos_ClientState": "",
    "ctl00$ContentPlaceHolder1$NotifAvisos$hiddenState": "",
    "ctl00_ContentPlaceHolder1_NotifAvisos_XmlPanel_ClientState": "",
    "ctl00_ContentPlaceHolder1_NotifAvisos_TitleMenu_ClientState": "",
    "__ASYNCPOST": "true",
}
```
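In the data above, __EVENTARGUMENT and the tree's ClientState are Python dicts. When requests form-encodes a dict value it iterates over it, so only the dict's keys end up in the body; the accepted answer avoids this by passing json.dumps(...) strings. The sketch below shows the difference using requests.Request(...).prepare(), which builds the body locally without sending anything. It also derives checkedNodes from the node index, assuming (from the two payloads in this thread, not from any documentation of the tree control) that the clicked node and every ancestor prefix are listed. checked_path and check_node_fields are hypothetical helper names.

```python
import json
from urllib.parse import parse_qs

import requests

URL = "https://www.cenace.gob.mx/SIM/VISTA/REPORTES/PreEnergiaSisMEM.aspx"


def checked_path(index):
    """Expand a node index like "0:1:1:0" into the node and all of its ancestors."""
    parts = index.split(":")
    return [":".join(parts[: i + 1]) for i in range(len(parts))]


def check_node_fields(index):
    """Hypothetical helper: JSON-encoded postback fields for checking one tree node.

    Assumes checkedNodes must list every ancestor prefix, as in this thread's payloads.
    """
    return {
        "__EVENTARGUMENT": json.dumps({"commandName": "Check", "index": index}),
        "ctl00_ContentPlaceHolder1_treePrincipal_ClientState": json.dumps({
            "expandedNodes": [], "collapsedNodes": [],
            "logEntries": [], "selectedNodes": [],
            "checkedNodes": checked_path(index),
            "scrollPosition": 0,
        }),
    }


# prepare() only builds the request body locally; nothing is sent over the network.
as_dict = requests.Request(
    "POST", URL, data={"__EVENTARGUMENT": {"commandName": "Check", "index": "0:0:0:0"}}
).prepare()
as_json = requests.Request("POST", URL, data=check_node_fields("0:0:0:0")).prepare()

print(as_dict.body)  # → __EVENTARGUMENT=commandName&__EVENTARGUMENT=index
print(json.loads(parse_qs(as_json.body)["__EVENTARGUMENT"][0]))
```

The first body shows the server receiving just the words commandName and index, which it cannot interpret as a Check command.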

```python
headers = {
    'Accept': '*/*',
    # a Brotli ('br') response is only decompressed if a Brotli package is installed
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
    # requests replaces this with the actual length of the encoded body
    'Content-Length': '26255',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Cookie': '_ga=GA1.3.1966843891.1571403663; _gid=GA1.3.1095695800.1571665852',
    'Host': 'www.cenace.gob.mx',
    'Origin': 'https://www.cenace.gob.mx',
    'Referer': 'https://www.cenace.gob.mx/SIM/VISTA/REPORTES/PreEnergiaSisMEM.aspx',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-origin',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36',
    'X-MicrosoftAjax': 'Delta=true',
    'X-Requested-With': 'XMLHttpRequest',
}
```

```python
import requests

url = "https://www.cenace.gob.mx/SIM/VISTA/REPORTES/PreEnergiaSisMEM.aspx"
# verify=False disables TLS certificate verification
r = requests.post(url, data=data, headers=headers, verify=False)
```

This is what Fiddler shows for the POST: (screenshot not included)
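For the last step described in the question (collect every href and check whether a particular file has been posted), here is a minimal sketch using the standard library's html.parser and urljoin instead of BeautifulSoup. The sample markup, the file name and the link path are made up for illustration; LinkCollector and find_file_link are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

PAGE_URL = "https://www.cenace.gob.mx/SIM/VISTA/REPORTES/PreEnergiaSisMEM.aspx"


class LinkCollector(HTMLParser):
    """Collect absolute href values from <a> tags, skipping javascript: links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip anchors without an href and ASP.NET postback links
        if href and not href.lower().startswith("javascript:"):
            self.links.append(urljoin(self.base_url, href))


def find_file_link(html, base_url, filename):
    """Return the first link whose path (ignoring any query string) ends with filename."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    for link in collector.links:
        if link.split("?", 1)[0].endswith(filename):
            return link
    return None


# Hypothetical markup and file name, for illustration only
sample = (
    "<a href=\"javascript:__doPostBack('tree','')\">tree</a>"
    '<a href="/Docs/example_2019-10-22.zip">download</a>'
)
print(find_file_link(sample, PAGE_URL, "example_2019-10-22.zip"))
# → https://www.cenace.gob.mx/Docs/example_2019-10-22.zip
```

urljoin resolves relative hrefs the way a browser would, which avoids the double slash that string formatting produces for root-relative paths.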


Stack Overflow user

Accepted answer

Answered 2019-10-22 04:38:16

Your __EVENTVALIDATION and __VIEWSTATE fields are probably incorrect. You can GET the initial page and scrape all the inputs with their initial values.
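To illustrate what "scrape all the inputs with their initial values" yields, here is a minimal sketch with the standard library's html.parser, run on a hypothetical HTML fragment (real pages carry much longer __VIEWSTATE values). InputCollector is a hypothetical name; like the answer's code, it maps every named input to its value attribute, or an empty string when that attribute is missing.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Map each named <input> to its value ('' when the value attribute is missing)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


# Hypothetical fragment with made-up token values
html = (
    '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />'
    '<input type="hidden" name="__EVENTVALIDATION" value="xyz789" />'
    '<input type="hidden" name="__EVENTTARGET" />'
)
collector = InputCollector()
collector.feed(html)
print(collector.fields)
```

Because the tokens come from the same server response that the POST will follow, they match what the server expects, unlike values copied from an earlier Fiddler session.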

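The code below collects the inputs from the first request, edits them as you did, then sends the POST request and scrapes all the href values from the response: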

```python
import json
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

base_url = "https://www.cenace.gob.mx"
url = "{}/SIM/VISTA/REPORTES/PreEnergiaSisMEM.aspx".format(base_url)
headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}

# GET the page first so __VIEWSTATE, __EVENTVALIDATION, etc. are current
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, "html.parser")
payload = {
    t["name"]: t.get("value", "")
    for t in soup.select("input")
    if t.has_attr("name")
}

# Simulate checking tree node 0:1:1:0 through an async (UpdatePanel) postback
payload["ctl00$ContentPlaceHolder1$toolkit"] = "ctl00$ContentPlaceHolder1$UpdatePanel1|ctl00$ContentPlaceHolder1$treePrincipal"
payload["__EVENTTARGET"] = "ctl00$ContentPlaceHolder1$treePrincipal"
payload["__ASYNCPOST"] = "true"
# Structured values must be sent as JSON strings, not Python dicts
payload["__EVENTARGUMENT"] = json.dumps({
    "commandName": "Check",
    "index": "0:1:1:0",
})
payload["ctl00_ContentPlaceHolder1_treePrincipal_ClientState"] = json.dumps({
    "expandedNodes": [], "collapsedNodes": [],
    "logEntries": [], "selectedNodes": [],
    "checkedNodes": ["0", "0:1", "0:1:1", "0:1:1:0"],
    "scrollPosition": 0,
})

r = requests.post(url, data=payload, headers=headers)
soup = BeautifulSoup(r.text, "html.parser")

# Print absolute URLs of all real links, skipping anchors without an href
# and javascript: postback links
print([
    urljoin(url, t["href"])
    for t in soup.find_all("a", href=True)
    if not t["href"].startswith("javascript")
])
```
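Because the request sets __ASYNCPOST=true, the server replies with an ASP.NET AJAX partial-postback "delta" rather than a full HTML page. BeautifulSoup still finds the <a> tags inside it, but a follow-up postback (for example, checking another box) would need the refreshed hidden fields from that delta. Below is a sketch that splits the response into records, assuming the commonly observed layout length|type|id|content|, where length counts the characters of content (so content may itself contain "|"). That layout is an assumption, not something stated in this thread; parse_delta, hidden_fields and the sample string are hypothetical.

```python
def parse_delta(text):
    """Split an ASP.NET AJAX delta response into (type, id, content) records.

    Assumes each record is "length|type|id|content|", where length is the
    number of characters in content.
    """
    records = []
    pos = 0
    while pos < len(text):
        sep = text.index("|", pos)
        length = int(text[pos:sep])
        type_end = text.index("|", sep + 1)
        rtype = text[sep + 1:type_end]
        id_end = text.index("|", type_end + 1)
        rid = text[type_end + 1:id_end]
        start = id_end + 1
        content = text[start:start + length]
        # The content must be followed by the record terminator
        if text[start + length:start + length + 1] != "|":
            raise ValueError("malformed delta record at offset %d" % pos)
        records.append((rtype, rid, content))
        pos = start + length + 1
    return records


def hidden_fields(records):
    """Collect hiddenField records (e.g. a refreshed __VIEWSTATE) into a dict."""
    return {rid: content for rtype, rid, content in records if rtype == "hiddenField"}


# Made-up sample in the assumed format
sample = (
    "1|#||4|"
    "10|updatePanel|ctl00_ContentPlaceHolder1_UpdatePanel1|<a>x|y</a>|"
    "6|hiddenField|__VIEWSTATE|abc123|"
)
records = parse_delta(sample)
print(hidden_fields(records))  # → {'__VIEWSTATE': 'abc123'}
```

The returned dict could be merged into payload before the next POST so that __VIEWSTATE and __EVENTVALIDATION stay in step with the server.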
Votes: 0
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/58491033
