
<p>Yes, the page definitely loads its content through JavaScript requests. You can either copy those requests (headers, payload, ...) as shown in the browser's developer tools and send them manually with the <strong><code>requests</code></strong> library, or (better, in my opinion) use a browser automation driver such as <strong><code>selenium</code></strong> to scrape dynamic pages.</p>
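<p>As a rough illustration of the selenium route, here is a minimal sketch rather than a definitive implementation. The helper names <code>wait_for_rendered_html</code> and <code>scrape_with_chrome</code> are made up for this example, and the <code>div.racer-row</code> selector is an assumption taken from the question's own <code>find_all</code> call. The polling loop is hand-rolled to keep the sketch short; selenium also ships <code>WebDriverWait</code> for the same job.</p>

```python
import time


def wait_for_rendered_html(driver, url, css_selector, timeout=15.0, poll=0.5):
    """Open `url` and return the page source once `css_selector` matches.

    `driver` is any object exposing selenium's WebDriver interface
    (get, find_elements, page_source).
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the value of selenium's By.CSS_SELECTOR constant
        if driver.find_elements("css selector", css_selector):
            return driver.page_source
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"no element matched {css_selector!r} within {timeout}s"
            )
        time.sleep(poll)


def scrape_with_chrome(url, css_selector):
    """Hypothetical entry point; assumes selenium and Chrome are installed."""
    from selenium import webdriver  # third-party: pip install selenium

    driver = webdriver.Chrome()
    try:
        return wait_for_rendered_html(driver, url, css_selector)
    finally:
        driver.quit()  # always close the browser, even on timeout
```

<p>Calling <code>scrape_with_chrome('https://zone4.ca/race/2020-11-08/c91ec8f6/results/', 'div.racer-row')</code> should then hand back HTML that BeautifulSoup can parse as usual, assuming the rows really carry that class once rendered.</p>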



<p>The data is loaded dynamically via JavaScript, but you can construct the Ajax request yourself and parse some of the data with this script:</p>
<pre><code>import re
import json
import requests
from datetime import datetime, timezone

url = 'https://zone4.ca/race/2020-11-08/c91ec8f6/results/'
html_doc = requests.get(url).text

# grab the JS object literal passed to callback(...) in the page source
data = re.search(r'callback\((\{.*\})\)', html_doc, flags=re.S).group(1).replace(&quot;'&quot;, '&quot;')
# quote the bare keys so the literal parses as JSON
data = json.loads(re.sub(r'([^\s]+):', r'&quot;\1&quot;:', data))
data_url = &quot;https://zone4.ca/public/data/race.json?url={url}&amp;page={page}&amp;channel_id={channelID}&amp;channel_class=StandardRace&amp;entity_id={entityID}&quot;

# fill the URL template from the parsed parameters and fetch the JSON feed
feed = requests.get(data_url.format(**data)).json()

# uncomment this to print all data:
# print(json.dumps(feed, indent=4))

for racer in feed['tree']['_child_racers']:
    print(racer['first_name'][0], racer['last_name'][0])
    for t in racer['_child_timedentitys']:
        # lap times live under the keys time_1_list ... time_11_list
        for i in range(1, 12):
            time = t.get('time_{}_list'.format(i))
            if not time:
                continue
            # the value is in microseconds since the Unix epoch
            dtobj = datetime.fromtimestamp(time[0][0] / 1_000_000, timezone.utc)
            print('\tLap {}: {}'.format(i, dtobj))
</code></pre>
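<p>The two regex steps in the script are terse, so here is an offline sketch of what they do. The <code>html_doc</code> snippet and every value in it are hypothetical stand-ins, since the real page markup may differ. It assumes the page wraps the parameters in a call of the form <code>callback({...})</code> with one key per line.</p>

```python
import json
import re

# Hypothetical stand-in for the page source; the real markup may differ.
html_doc = """
<script>
    callback({
        url: '/race/2020-11-08/c91ec8f6/',
        page: 'results',
        channelID: 'abc123',
        entityID: 42
    })
</script>
"""

# Step 1: capture the object literal between "callback(" and ")".
literal = re.search(r'callback\((\{.*\})\)', html_doc, flags=re.S).group(1)

# Step 2: JSON requires double quotes, so swap the single quotes.
literal = literal.replace("'", '"')

# Step 3: wrap each bare key (a run of non-space characters before a colon)
# in double quotes.
literal = re.sub(r'([^\s]+):', r'"\1":', literal)

data = json.loads(literal)
print(data)

data_url = (
    "https://zone4.ca/public/data/race.json?url={url}&page={page}"
    "&channel_id={channelID}&channel_class=StandardRace&entity_id={entityID}"
)
print(data_url.format(**data))
```

<p>Note the limits of this shortcut: the key pattern only works when each key is preceded by whitespace and no value contains a colon (an absolute <code>https://</code> URL inside the literal would break it).</p>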
<p>Prints:</p>
<pre><code>Tim Shea
    Lap 1: 2020-11-08 14:40:54.611000+00:00
    Lap 2: 2020-11-08 14:45:17.259000+00:00
    Lap 3: 2020-11-08 14:49:48.259000+00:00
    Lap 4: 2020-11-08 14:54:18.778000+00:00
    Lap 5: 2020-11-08 14:58:52.099000+00:00
    Lap 6: 2020-11-08 15:03:17.700000+00:00
    Lap 7: 2020-11-08 15:07:44.818000+00:00
    Lap 8: 2020-11-08 15:12:18.896000+00:00
    Lap 9: 2020-11-08 15:16:52.010000+00:00
    Lap 10: 2020-11-08 15:21:18.897000+00:00
    Lap 11: 2020-11-08 15:25:55.058000+00:00
Zachary Steinman
    Lap 1: 2020-11-08 14:41:32.912000+00:00
    Lap 2: 2020-11-08 14:46:29.458000+00:00
    Lap 3: 2020-11-08 14:51:29.970000+00:00
    Lap 4: 2020-11-08 14:56:30.875000+00:00
    Lap 5: 2020-11-08 15:01:40.057000+00:00
    Lap 6: 2020-11-08 15:06:47.620000+00:00
    Lap 7: 2020-11-08 15:11:58.790000+00:00
    Lap 8: 2020-11-08 15:17:09.099000+00:00
    Lap 9: 2020-11-08 15:22:14.819000+00:00
    Lap 10: 2020-11-08 15:27:19.859000+00:00
Kent Williams
    Lap 1: 2020-11-08 14:42:40.399000+00:00
    Lap 2: 2020-11-08 14:48:33.714000+00:00

...and so on.
</code></pre>
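<p>The printed values are clock times at the end of each lap, not lap durations. If durations are what you need, a small sketch along these lines may help. The helper names are made up for this example, and the raw integers are back-computed from the first three timestamps printed above for Tim Shea, not copied from the feed.</p>

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_microseconds(us):
    """Convert integer microseconds since the Unix epoch to an aware UTC datetime."""
    # Integer timedelta arithmetic sidesteps the float rounding of us / 1_000_000.
    return EPOCH + timedelta(microseconds=us)


def lap_splits(stamps):
    """Return the time elapsed between each pair of consecutive timestamps."""
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# Back-computed from the printed output above (assumption, not feed data).
raw = [1604846454611000, 1604846717259000, 1604846988259000]
stamps = [from_microseconds(us) for us in raw]

print(stamps[0])  # 2020-11-08 14:40:54.611000+00:00
for number, split in enumerate(lap_splits(stamps), start=2):
    print('Lap {} took {}'.format(number, split))
```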


<p>I'm trying to scrape some info from a website using BeautifulSoup, but the output differs from the web page's HTML. The content I am trying to get out of the webpage is in</p>
<pre><code>&lt;div class=&quot;page-content&quot;&gt;
</code></pre>
<p>but in my BeautifulSoup object it shows up as:</p>
<pre><code>&lt;div class=&quot;page-content loading&quot;&gt;&lt;/div&gt;
</code></pre>
<p>with nothing inside the div. I tried to find the elements I was looking for anyway, but the search came back empty. I also tried the html5lib and lxml parsers, but that didn't change the output. Is the browser running some JavaScript that prevents me from getting the full page HTML? I'm new to this, so any suggestions would be appreciated.</p>
<p>Here is my script:</p>
<pre><code>import requests
from bs4 import BeautifulSoup

URL = 'https://zone4.ca/race/2020-11-08/c91ec8f6/results'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

results = soup.find_all(&quot;div&quot;, class_=&quot;racer-row&quot;)

print(results)
print(soup)
</code></pre>
</code></pre>
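<p>One quick way to confirm that JavaScript is filling in the page is to count CSS classes in the HTML the server actually returns. The sketch below uses only the standard library. The <code>ClassCounter</code> helper is made up for this example, and <code>static_html</code> is the placeholder div quoted above rather than a live download.</p>

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count how many start tags carry each CSS class."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.counts.update(value.split())


def count_classes(html):
    parser = ClassCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# The placeholder exactly as it appears in the soup from the question.
static_html = '<body><div class="page-content loading"></div></body>'
counts = count_classes(static_html)

print(counts["racer-row"])  # 0: no result rows in the served HTML
print(counts["loading"])    # 1: only the empty placeholder is present
```

<p>Zero <code>racer-row</code> elements next to a <code>loading</code> placeholder suggests the rows are added later by a script, which no choice of parser can recover from the static response.</p>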


BeautifulSoup outputs &lt;div class=&quot;page-content&quot;&gt; as &lt;div class=&quot;page-content loading&quot;&gt; with none of the contents of the div?
