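I'm trying to build a proof of concept by downloading an episode of Bob's Burgers from https://www.watchcartoononline.com/bobs-burgers-season-9-episode-3-tweentrepreneurs.
I can't figure out how to extract the video URL from this site. I used the Chrome and Firefox web developer tools to work out that the video is in an iframe, but searching for iframes with BeautifulSoup and extracting their src URLs returns links that have nothing to do with the video. Where are the references to the mp4 or flv files? (I can see them in the developer tools, although clicking on them is forbidden.)
Any insight into how to scrape video with BeautifulSoup and requests would be appreciated.
Here is some code in case it's needed. A lot of tutorials say to use 'a' tags, but I don't get any 'a' tags back.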
import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.watchcartoononline.com/bobs-burgers-season-9-episode-5-live-and-let-fly")
soup = BeautifulSoup(r.content, 'html.parser')
links = soup.find_all('iframe')
for link in links:
    print(link['src'])
Answered 2018-11-08 04:53:26
import requests

url = "https://disk19.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e03.mp4?st=_EEVz36ktZOv7ZxlTaXZfg&e=1541637622"

def download_file(url, filename):
    # NOTE the stream=True parameter
    r = requests.get(url, stream=True)
    with open(filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
                # f.flush() commented by recommendation from J.F.Sebastian
    return filename

download_file(url, "bobs.burgers.s09e03.mp4")
This code downloads the episode to your computer. The video URL is in the <source> tag nested inside the <video> tag.
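Answered 2021-04-10 07:50:38
Background information
(scroll down for your answer)
Getting the video is only easy when the site you're scraping states the video file explicitly in its HTML. For example, say you want to get the .mp4 file from a site of your choice by looking up its URL in inspect element. Take this site: https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314
If we look it up in inspect element, there will be an src containing the .mp4.
Now, if we try to grab the .mp4 URL from this site like this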
import requests
from bs4 import BeautifulSoup

html_url = "https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314"
html_response = requests.get(html_url)
soup = BeautifulSoup(html_response.text, 'html.parser')
for mp4 in soup.find_all('video'):
    mp4 = mp4['src']
    print(mp4)
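we get a KeyError: 'src'. This happens because the actual video URL is stored in the source element, not on the video tag itself. We can see this by printing what soup.find_all('video') returns: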
import requests
from bs4 import BeautifulSoup

html_url = "https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314"
html_response = requests.get(html_url)
soup = BeautifulSoup(html_response.text, 'html.parser')
# Print each <video> element, including its nested tags
for video in soup.find_all('video'):
    print(video)
Output:
<video class="video-js vjs-default-skin vjs-big-play-centered" controls="" data-setup="{}" height="264" id="example_video_1" poster="" preload="none" width="640">
<source src="https://mountainoservo0002.animecdn.com/Yakunara-Mug-Cup-mo/Yakunara-Mug-Cup-mo-Episode-01.1-1080p.mp4" type="video/mp4"/>
</video>
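As a rough sketch of the lookup described here, the snippet below collects the URL from either place it can live: a src attribute directly on <video>, or a src on a <source> nested inside it. It uses only the standard library's html.parser so it runs without extra installs; the names VideoSrcParser and find_video_urls are made up for illustration, and the sample markup is the output shown above.

```python
from html.parser import HTMLParser


class VideoSrcParser(HTMLParser):
    """Collect src values from <video> tags and from <source> tags inside them."""

    def __init__(self):
        super().__init__()
        self.video_depth = 0  # greater than 0 while inside a <video> element
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "video":
            self.video_depth += 1
            if attrs.get("src"):
                self.urls.append(attrs["src"])
        elif tag == "source" and self.video_depth and attrs.get("src"):
            self.urls.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "video" and self.video_depth:
            self.video_depth -= 1


def find_video_urls(html):
    """Return every video URL found in the given HTML string, in document order."""
    parser = VideoSrcParser()
    parser.feed(html)
    return parser.urls


sample = """
<video class="video-js vjs-default-skin vjs-big-play-centered" controls="" preload="none">
<source src="https://mountainoservo0002.animecdn.com/Yakunara-Mug-Cup-mo/Yakunara-Mug-Cup-mo-Episode-01.1-1080p.mp4" type="video/mp4"/>
</video>
"""
print(find_video_urls(sample))
```

Because missing attributes are read with .get() rather than indexed directly, a <video> tag without src is simply skipped instead of raising KeyError.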
So if we now want to download the .mp4, we use the source element and read its src.
import requests
import shutil  # - - shutil.copyfileobj copies data from one file-like object to another
from bs4 import BeautifulSoup  # - - We could honestly do this without soup

# - - Get the url of the site you want to scrape
html_url = "https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314"
html_response = requests.get(html_url)
soup = BeautifulSoup(html_response.text, 'html.parser')

# - - Get the .mp4 url and the filename (the last <source> found wins)
for vid in soup.find_all('source'):
    url = vid['src']
    filename = vid['src'].split('/')[-1]

# - - Get the video
response = requests.get(url, stream=True)

# - - Make sure the status is OK
if response.status_code == 200:
    # - - Let urllib3 decode gzip/deflate-encoded content while reading the raw stream
    response.raw.decode_content = True
    with open(filename, 'wb') as f:
        # - - Copy what's in response.raw and transfer it into the file
        shutil.copyfileobj(response.raw, f)
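(You can obviously simplify this by copying the source's src by hand and using it as the URL directly, without going through html_url. I just wanted to show that you can look up the .mp4, i.e. the source's src, programmatically.)
Again, not every site is this clear-cut. With this site in particular, we're lucky that it's manageable. Other sites you try to scrape video from may require you to move from the Elements tab (in inspect element) to the Network tab. There you have to find the fragments behind the embedded link and download them all to assemble the full video. That isn't always easy, but the video on the site you asked about is.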
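The answer doesn't say which format those fragments come in. As a minimal sketch, assuming the Network tab shows an HLS media playlist (a .m3u8 text file listing segment URIs), the segment URLs could be collected like this. The function name segment_urls, the example.com URL and the sample playlist are all hypothetical.

```python
from urllib.parse import urljoin


def segment_urls(playlist_text, playlist_url):
    """Return absolute URLs for the segments listed in an HLS media playlist."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments; the rest are segment URIs
        if line and not line.startswith("#"):
            # Segment URIs may be relative to the playlist's own URL
            urls.append(urljoin(playlist_url, line))
    return urls


# Hypothetical playlist, shaped like what the Network tab might show
playlist = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
seg0.ts
#EXTINF:10.0,
seg1.ts
#EXT-X-ENDLIST
"""
for segment in segment_urls(playlist, "https://example.com/video/index.m3u8"):
    print(segment)
```

Each URL could then be fetched with requests.get(url, stream=True) and written to a single output file in order. Whether the result plays back depends on the site: encrypted streams or other container formats need more work than this sketch covers.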
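Your answer
Go to inspect element, click the Chromecast Player (2. Player) located above the video to view its HTML attributes, and finally click the embed link, which should look like this:
/inc/embed/embed.php?file=bobs.burgers.s09e05.flv&hd=1&pid=437035&h=25424730eed390d0bb4634fa93a2e96c&t=1618011716&embed=cizgi
Once that's done, click play, make sure inspect element is open, click the video to view its attributes (or press Ctrl+F and search for <video>), and copy the src:
https://cdn.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e05.mp4?st=f9OWlOq1e-2M9eUVvhZa8A&e=1618019876
Now we can download it with Python.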
import requests
# - - shutil.copyfileobj copies data from one file-like object to another
import shutil

url = "https://cdn.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e05.mp4?st=f9OWlOq1e-2M9eUVvhZa8A&e=1618019876"
response = requests.get(url, stream=True)
if response.status_code == 200:
    # - - Let urllib3 decode gzip/deflate-encoded content while reading the raw stream
    response.raw.decode_content = True
    with open('bobs-burgers.mp4', 'wb') as f:
        # - - Take the data from response.raw and transfer it to the file
        shutil.copyfileobj(response.raw, f)
    print('downloaded file')
else:
    print('Download failed')
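One likely reason a copied URL ends in 'Download failed' is that the link is time-limited. The sketch below assumes the e query parameter is a Unix timestamp after which the link stops working. That is a guess based on how the URL looks, not something the site documents, and the helpers link_expiry and is_expired are made-up names.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs


def link_expiry(url, param="e"):
    """Return the assumed expiry as a UTC datetime, or None if it can't be read."""
    values = parse_qs(urlparse(url).query).get(param)
    if not values or not values[0].isdigit():
        return None
    # ASSUMPTION: the parameter holds seconds since the Unix epoch
    return datetime.fromtimestamp(int(values[0]), tz=timezone.utc)


def is_expired(url, now=None):
    """Return True if the link's assumed expiry time has passed."""
    expiry = link_expiry(url)
    if expiry is None:
        return False  # no expiry information, so nothing to report
    now = now or datetime.now(timezone.utc)
    return now >= expiry


video_url = "https://cdn.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e05.mp4?st=f9OWlOq1e-2M9eUVvhZa8A&e=1618019876"
print(link_expiry(video_url), is_expired(video_url))
```

If the check reports an expired link, repeating the inspect element steps to copy a fresh src is probably the only fix.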
https://stackoverflow.com/questions/53196594