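I want to retrieve the contents of the page http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp
When I copy and paste this URL into the browser's address bar, I get the full page content.
However, neither a POST (sending the "dData1" parameter) nor a GET works from R with the httr package.
POST request passing the "dData1" parameter: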
library(httr)
url <- "http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp"
body <- list(dData1 = "16/05/2018")
POST(url, body = body, encode = "form", verbose())
The result is:
-> POST /pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp HTTP/1.1
-> Host: www2.bmf.com.br
(...omitted...)
->
>> dData1=16%2F05%2F2018
<- HTTP/1.1 200 OK
(...omitted...)
<-
Response [http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp]
Date: 2018-06-02 16:28
Status: 200
Content-Type: text/html
Size: 111 kB
NA
Even a plain GET does not return the page content:
library(httr)
url <- "http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp"
GET(url, verbose())
The result is:
-> GET /pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp HTTP/1.1
(...omitted...)
->
<- HTTP/1.1 200 OK
(...omitted...)
<-
Response [http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp]
Date: 2018-06-02 16:33
Status: 200
Content-Type: text/html
Size: 140 kB
NA
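I have inspected the request headers with the browser's developer tools, but I cannot work out what I am doing wrong, and I still cannot get the content of this page. Any hints would be appreciated.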
Answered 2018-06-03 01:19:13
The site is not UTF-8 encoded, so you need to find the correct encoding and specify it when parsing the content:
library(httr)
my_url <- "http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp"
response <- GET(my_url)
response
# Parse the body, declaring the source encoding explicitly
content(response, as = "parsed", encoding = "iso-8859-1")
Result:
> content(response,as = "parsed",encoding = "iso-8859-1")
{xml_document}
<html class="no-js" lang="pt-br">
[1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n<meta name="viewport" content="width=device-width, initial-scale=1.0">\n<link rel=" ...
[2] <body>\n<!-- Google Tag Manager -->\r\n<noscript><iframe src="//www.googletagmanager.com/ns.html?id=GTM-KPF8G3" height="0" width="0" style="display:none;visibil ...
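To see why the encoding matters, here is a minimal offline sketch in base R. It does not contact the site: the byte vector is a made-up sample (the Portuguese word "Participação" as an ISO-8859-1 server would send it), and the variable names are hypothetical. It only illustrates the general mechanism, namely that single-byte accented characters are malformed when read as UTF-8 and decode cleanly once the real encoding is declared.

```r
# Hypothetical sample: "Participação" encoded as ISO-8859-1.
# "ç" is the single byte 0xE7 and "ã" is the single byte 0xE3.
latin1_bytes <- as.raw(c(0x50, 0x61, 0x72, 0x74, 0x69, 0x63, 0x69, 0x70,
                         0x61, 0xE7, 0xE3, 0x6F))
raw_text <- rawToChar(latin1_bytes)

# Interpreted as UTF-8, this byte sequence is malformed:
# 0xE7 announces a multi-byte character, but 0xE3 is not a continuation byte.
is_valid_utf8 <- validUTF8(raw_text)

# Declaring the actual source encoding lets iconv() produce proper UTF-8.
decoded <- iconv(raw_text, from = "ISO-8859-1", to = "UTF-8")

print(is_valid_utf8)
print(validUTF8(decoded))
print(nchar(decoded, type = "bytes"))
```

With a real response, the raw bytes are available through content(response, as = "raw"), so the same check can be applied to the downloaded body before choosing an encoding.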
https://stackoverflow.com/questions/50659046