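I started programming not long ago and ran into this problem. I want to collect stock data from the site https://statusinvest.com.br/acoes/petr4, but apparently the values are rendered with JavaScript, so BeautifulSoup does not pick them up. I would appreciate any help understanding this.
Posted on 2022-11-19 07:47:13
This section not only needs JS to load, it does not actually load until you scroll to it. You could try to figure out which request and/or which piece of JS renders that section and then replicate it in Python, but I think using Selenium would be easier. I even have this function to make it more convenient to automate some of the simpler/common interactions before scraping the HTML: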
#### FIRST PASTE [or DOWNLOAD&IMPORT] FUNCTION DEF from https://pastebin.com/kEC9gPC8 ####
soup = linkToSoup_selenium(
    'https://statusinvest.com.br/acoes/petr4',
    # it actually just has to scroll, not click [but I haven't added an option for that yet]
    clickFirst='//strong[@data-item="avg_F"]',
    ecx='//strong[@data-item="avg_F"][text()!="-"]'  # waits till this loads
)
if soup is not None:
    print({
        t.find_previous_sibling().get_text(' ').strip(): t.get_text(' ').strip()
        for t in soup.select('div#payout-section span.title + strong.value')
    })
prints
{'MÉDIA': '83,32%', 'ATUAL': '124,13% \n ( 48,97% acima da média )', 'MENOR\xa0VALOR': '26,35% \n ( 2019 )', 'MAIOR\xa0VALOR': '144,51% \n \n( 2020 )'}
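The scraped values are strings in Brazilian number format (comma as decimal separator, dot as thousands separator), so they usually need converting before any arithmetic. Below is a minimal sketch of one way to do that; `parse_br_number` is a hypothetical helper name, not part of the pastebin function or of BeautifulSoup, and it only looks at the first number it finds in the text.

```python
import re


def parse_br_number(text):
    """Convert a pt-BR formatted number such as '83,32%' to a float.

    Only the first number in the text is used. Returns None when the
    text contains no digits (for example the '-' placeholder).
    """
    # First run of digits, thousands dots and an optional decimal part.
    match = re.search(r'-?\d[\d.]*(?:,\d+)?', text)
    if match is None:
        return None
    # pt-BR uses '.' for thousands and ',' for decimals.
    return float(match.group().replace('.', '').replace(',', '.'))


print(parse_br_number('83,32%'))
print(parse_br_number('R$ 7.009.130.357,11'))
```

Strings like '124,13% \n ( 48,97% acima da média )' hold two numbers; this sketch returns only the first one, so split the text first if you need both.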
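EDIT: I eventually noticed the API used to fetch the data (https://statusinvest.com.br/acao/payoutresult?code=petr4&companyid=408&type=0). You can actually build that URL from the HTML that is available even before the JS loading takes place: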
soup.select_one('#payout-section[data-company][data-code]').attrs
should return
{'id': 'payout-section', 'data-company': '408', 'data-code': 'petr4', 'data-category': '1'}
so the URL can be formed with
payout = soup.select_one('#payout-section[data-company][data-code]')
if payout:
    compId, dCode = payout.get('data-company'), payout.get('data-code')
    apiUrl = 'https://statusinvest.com.br/acao'
    apiUrl = f'{apiUrl}/payoutresult?code={dCode}&companyid={compId}&type=0'
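If you need this for more than one ticker, the same URL construction can be wrapped in a small helper. This is only a sketch: `build_payout_url` and the `WINDOW_TYPES` labels are hypothetical names, and the mapping of `type` values to time windows is my guess from the site's behaviour, not something documented.

```python
from urllib.parse import urlencode

# Guessed meaning of the `type` query parameter (an assumption, not documented).
WINDOW_TYPES = {'5y': 0, '10y': 1, 'max': 2}


def build_payout_url(code, company_id, window='5y'):
    """Build the payoutresult URL from the data-code / data-company values."""
    query = urlencode({
        'code': code,
        'companyid': company_id,
        'type': WINDOW_TYPES[window],
    })
    return f'https://statusinvest.com.br/acao/payoutresult?{query}'


print(build_payout_url('petr4', '408'))
```

Using `urlencode` instead of string formatting keeps the query valid even if a value ever contains characters that need escaping.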
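I think the type parameter selects the time window: 0 for 5 years, 1 for 10 years, and 2 for the maximum window.
requests.get(apiUrl, headers=headers).json() (where headers is a dict of request headers, e.g. with a browser-like User-Agent) should return something like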
{
"actual": 124.12623323305537,
"avg": 83.32096287339556,
"avgDifference": 48.97359434223362,
"minValue": 26.353309862919502,
"minValueRank": 2019,
"maxValue": 144.51093035368598,
"maxValueRank": 2020,
"actual_F": "124,13%",
"avg_F": "83,32%",
"avgDifference_F": "48,97% acima da m\u00e9dia",
"minValue_F": "26,35%",
"minValueRank_F": "2019",
"maxValue_F": "144,51%",
"maxValueRank_F": "2020",
"chart": {
"categoryUnique": true,
"category": [
"2018",
"2019",
"2020",
"2021",
"2022"
],
"series": {
"percentual": [
{
"value": 27.189302754606462,
"value_F": "27,19%"
},
{
"value": 26.353309862919502,
"value_F": "26,35%"
},
{
"value": 144.51093035368598,
"value_F": "144,51%"
},
{
"value": 94.42503816271046,
"value_F": "94,43%"
},
{
"value": 124.12623323305537,
"value_F": "124,13%"
}
],
"proventos": [
{
"value": 7009130357.11,
"value_F": "R$ 7.009.130.357,11",
"valueSmall_F": "7,01 B"
},
{
"value": 10577427979.68,
"value_F": "R$ 10.577.427.979,68",
"valueSmall_F": "10,58 B"
},
{
"value": 10271836929.54,
"value_F": "R$ 10.271.836.929,54",
"valueSmall_F": "10,27 B"
},
{
"value": 100721299707.4,
"value_F": "R$ 100.721.299.707,40",
"valueSmall_F": "100,72 B"
},
{
"value": 179966901777.61,
"value_F": "R$ 179.966.901.777,61",
"valueSmall_F": "179,97 B"
}
],
"lucroLiquido": [
{
"value": 25779000000.0,
"value_F": "R$ 25.779.000.000,00",
"valueSmall_F": "25,78 B"
},
{
"value": 40137000000.0,
"value_F": "R$ 40.137.000.000,00",
"valueSmall_F": "40,14 B"
},
{
"value": 7108000000.0,
"value_F": "R$ 7.108.000.000,00",
"valueSmall_F": "7,11 B"
},
{
"value": 106668000000.0,
"value_F": "R$ 106.668.000.000,00",
"valueSmall_F": "106,67 B"
},
{
"value": 144987000000.0,
"value_F": "R$ 144.987.000.000,00",
"valueSmall_F": "144,99 B"
}
]
}
}
}
and then you can pick the values you want from there. (I think it also includes the data for the chart.)
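As a rough illustration of picking values out of that response, the chart part can be turned into one row per year by pairing each entry of `category` with the entry at the same position in every series. The sketch below assumes the response keeps the shape shown above; `payout_rows` is a hypothetical helper name, and `sample` is just the last two years of that response, trimmed by hand.

```python
def payout_rows(data):
    """Return one dict per year, combining all chart series by position."""
    chart = data['chart']
    rows = []
    for i, year in enumerate(chart['category']):
        row = {'year': year}
        # Each series is a list aligned with chart['category'].
        for name, points in chart['series'].items():
            row[name] = points[i]['value']
        rows.append(row)
    return rows


# Hand-trimmed excerpt of the response above (2021 and 2022 only).
sample = {
    'chart': {
        'category': ['2021', '2022'],
        'series': {
            'percentual': [{'value': 94.42503816271046}, {'value': 124.12623323305537}],
            'proventos': [{'value': 100721299707.4}, {'value': 179966901777.61}],
            'lucroLiquido': [{'value': 106668000000.0}, {'value': 144987000000.0}],
        },
    }
}

rows = payout_rows(sample)
for row in rows:
    print(row)
```

A list of flat dicts like this can be passed straight to `csv.DictWriter` or a DataFrame constructor if you want a table.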
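Posted on 2022-11-19 07:41:35
Hopefully OP's next question will include a minimal, reproducible example. Here is one way of getting some data from that page using requests and BeautifulSoup: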
from bs4 import BeautifulSoup as bs
import requests
headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
r = requests.get('https://statusinvest.com.br/acoes/petr4', headers=headers)
soup = bs(r.text, 'html.parser')
valor_atual = soup.select_one('h3:-soup-contains("Valor atual")').find_next('strong').text
min_52_semanas = soup.select_one('h3:-soup-contains("Min. 52 semanas")').find_next('strong').text
print('Valor atual:', valor_atual)
print('Min. 52 semanas:', min_52_semanas)
### and now some values hydrated in page by Javascript, from an API endpoint:
api_url = 'https://statusinvest.com.br/acao/payoutresult?code=petr4&companyid=408&type=0'
api_headers = {
'referer': 'https://statusinvest.com.br/acoes/petr4',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
r = requests.get(api_url, headers=api_headers)
print(r.json())
Result in terminal:
Valor atual: 26,54
Min. 52 semanas: 15,85
{'actual': 124.12623323305537, 'avg': 83.32096287339556, 'avgDifference': 48.97359434223362, 'minValue': 26.353309862919502, 'minValueRank': 2019, 'maxValue': 144.51093035368598, 'maxValueRank': 2020, 'actual_F': '124,13%', 'avg_F': '83,32%', 'avgDifference_F': '48,97% acima da média', 'minValue_F': '26,35%', 'minValueRank_F': '2019', 'maxValue_F': '144,51%', 'maxValueRank_F': '2020', 'chart': {'categoryUnique': True, 'category': ['2018', '2019', '2020', '2021', '2022'], 'series': {'percentual': [{'value': 27.189302754606462, 'value_F': '27,19%'}, {'value': 26.353309862919502, 'value_F': '26,35%'}, {'value': 144.51093035368598, 'value_F': '144,51%'}, {'value': 94.42503816271046, 'value_F': '94,43%'}, {'value': 124.12623323305537, 'value_F': '124,13%'}], 'proventos': [{'value': 7009130357.11, 'value_F': 'R$ 7.009.130.357,11', 'valueSmall_F': '7,01 B'}, {'value': 10577427979.68, 'value_F': 'R$ 10.577.427.979,68', 'valueSmall_F': '10,58 B'}, {'value': 10271836929.54, 'value_F': 'R$ 10.271.836.929,54', 'valueSmall_F': '10,27 B'}, {'value': 100721299707.4, 'value_F': 'R$ 100.721.299.707,40', 'valueSmall_F': '100,72 B'}, {'value': 179966901777.61, 'value_F': 'R$ 179.966.901.777,61', 'valueSmall_F': '179,97 B'}], 'lucroLiquido': [{'value': 25779000000.0, 'value_F': 'R$ 25.779.000.000,00', 'valueSmall_F': '25,78 B'}, {'value': 40137000000.0, 'value_F': 'R$ 40.137.000.000,00', 'valueSmall_F': '40,14 B'}, {'value': 7108000000.0, 'value_F': 'R$ 7.108.000.000,00', 'valueSmall_F': '7,11 B'}, {'value': 106668000000.0, 'value_F': 'R$ 106.668.000.000,00', 'valueSmall_F': '106,67 B'}, {'value': 144987000000.0, 'value_F': 'R$ 144.987.000.000,00', 'valueSmall_F': '144,99 B'}]}}}
The BeautifulSoup documentation can be found here: https://beautiful-soup-4.readthedocs.io/en/latest/
https://stackoverflow.com/questions/74497235