Web scraping videos

Stack Overflow user

Asked 2018-11-08 03:37:40

2 answers · 9.4K views · 0 followers · score 2

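As a proof of concept, I am trying to download an episode of Bob's Burgers from https://www.watchcartoononline.com/bobs-burgers-season-9-episode-3-tweentrepreneurs.

I can't figure out how to extract the video URL from this site. I used the Chrome and Firefox web developer tools to work out that the video sits in an iframe, but searching for iframes with BeautifulSoup and extracting their src URLs returns links that have nothing to do with the video. Where are the references to the mp4 or flv files? (I can see them in the developer tools, although clicking on them is forbidden.)

Any insight into how to scrape video with BeautifulSoup and requests would be appreciated.

Here is some code, in case it helps. Many tutorials say to use 'a' tags, but I don't get any 'a' tags back.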

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.watchcartoononline.com/bobs-burgers-season-9-episode-5-live-and-let-fly")
soup = BeautifulSoup(r.content,'html.parser')
links = soup.find_all('iframe')
for link in links:
    print(link['src'])

python

video

screen-scraping

2 Answers

Stack Overflow user

Answered 2018-11-08 04:53:26

import requests

# Direct link to the video file, copied from the page
url = "https://disk19.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e03.mp4?st=_EEVz36ktZOv7ZxlTaXZfg&e=1541637622"

def download_file(url, filename):
    # stream=True avoids loading the whole video into memory at once
    with requests.get(url, stream=True) as r:
        r.raise_for_status()  # stop early on 403/404 instead of saving an error page
        with open(filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024):
                if chunk:  # filter out keep-alive chunks
                    f.write(chunk)
    return filename

download_file(url, "bobs.burgers.s09e03.mp4")

This code downloads the episode to your computer. The video URL is in the src of a <source> tag nested inside the <video> tag.
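As a minimal sketch of that last point, the snippet below pulls the src out of a <source> nested in a <video>. The HTML string is a made-up stand-in for the player page (the real markup may differ), and find_video_sources is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Made-up sample markup standing in for the player page; the real page may differ.
SAMPLE_HTML = """
<video id="player" controls>
  <source src="https://example.com/videos/episode.mp4" type="video/mp4">
</video>
"""

def find_video_sources(html):
    """Return the src of every <source> tag nested inside a <video> tag."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS selector: <source> elements with a src attribute, anywhere under a <video>
    return [tag["src"] for tag in soup.select("video source[src]")]

print(find_video_sources(SAMPLE_HTML))
```

Any URL returned this way could then be passed to download_file.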

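Score: 5

Stack Overflow user

Answered 2021-04-10 07:50:38

Background information

(Scroll down for your answer.)

This is only easy when the site you are trying to get the video from declares the video file explicitly in its HTML. For example, say you want to get the .mp4 file from a site of your choice by referencing its URL. Take https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314: if we look at that page in inspect element, there will be a src containing the .mp4.

Now, if we try to get the .mp4 URL from this site like so: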

import requests
from bs4 import BeautifulSoup 


html_url = "https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314"
html_response = requests.get(html_url) 
soup = BeautifulSoup(html_response.text, 'html.parser') 


for mp4 in soup.find_all('video'):
    mp4 = mp4['src']

print(mp4)

We get KeyError: 'src'. This happens because the actual video URL is stored in the nested <source> element, which we can see if we print out the values in soup.find_all('video'):

import requests
from bs4 import BeautifulSoup 


html_url = "https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314"
html_response = requests.get(html_url) 
soup = BeautifulSoup(html_response.text, 'html.parser') 


# Print each <video> element so its nested tags are visible
for video in soup.find_all('video'):
    print(video)

Output:

<video class="video-js vjs-default-skin vjs-big-play-centered" controls="" data-setup="{}" height="264" id="example_video_1" poster="" preload="none" width="640">
<source src="https://mountainoservo0002.animecdn.com/Yakunara-Mug-Cup-mo/Yakunara-Mug-Cup-mo-Episode-01.1-1080p.mp4" type="video/mp4"/>
</video>
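The KeyError above can be avoided with Tag.get, which returns None when the attribute is missing, followed by a fallback to the nested <source>. This is only a sketch, assuming the page is shaped like the output above. video_src is a hypothetical helper and the URL in the sample is a placeholder.

```python
from bs4 import BeautifulSoup

# Trimmed-down version of the <video> element printed above, with a placeholder URL.
SAMPLE_HTML = """
<video class="video-js" id="example_video_1" preload="none">
<source src="https://example.com/Episode-01.mp4" type="video/mp4"/>
</video>
"""

def video_src(video):
    """Return the video URL from a <video> tag, or None if there is none."""
    # Tag.get returns None instead of raising KeyError when src is absent.
    src = video.get("src")
    if src is None:
        # Fall back to the first nested <source> that has a src attribute.
        source = video.find("source", src=True)
        if source is not None:
            src = source["src"]
    return src

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
for video in soup.find_all("video"):
    print(video_src(video))
```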

So, if we now want to download the .mp4, we use the source element and get the src from it.

import requests
import shutil  # copyfileobj copies data from one file-like object to another
from bs4 import BeautifulSoup  # optional; the src could also be copied by hand

# URL of the page to scrape
html_url = "https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314"
html_response = requests.get(html_url)
soup = BeautifulSoup(html_response.text, 'html.parser')

# Get the .mp4 URL and the filename (with several <source> tags, the last one wins)
for vid in soup.find_all('source'):
    url = vid['src']
    filename = vid['src'].split('/')[-1]

# Request the video as a stream rather than loading it into memory
response = requests.get(url, stream=True)

# Make sure the status is OK
if response.status_code == 200:
    # Have urllib3 decode any gzip/deflate content-encoding on the raw stream
    response.raw.decode_content = True

    with open(filename, 'wb') as f:
        # Copy what's in response.raw into the file
        shutil.copyfileobj(response.raw, f)

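(You could obviously simplify this by copying the source element's src by hand and using it as the URL directly, without going through html_url. I just wanted to show that you can reference the .mp4, i.e. the source's src, from code.)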
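One detail worth noting: split('/')[-1] keeps any query string, so a src that carries a token (?st=...&e=...) would produce an awkward filename. A possible alternative, sketched with the standard library only, is below. filename_from_url and the example URL are hypothetical.

```python
from posixpath import basename
from urllib.parse import unquote, urlsplit

def filename_from_url(url, default="video.mp4"):
    """Return the last path segment of url, ignoring query string and fragment."""
    path = urlsplit(url).path
    # unquote turns percent-escapes such as %20 back into plain characters
    name = unquote(basename(path))
    return name or default

print(filename_from_url("https://example.com/cizgi/show.s01e01.mp4?st=abc&e=123"))
```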

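Again, not every site is this clear-cut. We got lucky that this particular site is manageable. Other sites you try to scrape video from may require you to go from Elements (in inspect element) to Network. There you have to find the segments behind the embedded link and download them all to assemble the full video. That is not always easy, but the video on the site you asked about is.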
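For the segmented case, here is a rough sketch of listing the segment URLs. It assumes the site serves an HLS-style .m3u8 playlist, which is an assumption: the answer does not say which streaming format a given site uses. The playlist text and URLs are invented, and segment_urls is a hypothetical helper.

```python
from urllib.parse import urljoin

# Invented playlist for illustration; a real one would be copied from the Network tab.
PLAYLIST_URL = "https://example.com/stream/index.m3u8"
PLAYLIST_TEXT = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
seg-000.ts
#EXTINF:10.0,
seg-001.ts
#EXT-X-ENDLIST
"""

def segment_urls(playlist_text, playlist_url):
    """Return absolute segment URLs, in order, from a simple media playlist."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments; the rest are segment URIs.
        if line and not line.startswith("#"):
            urls.append(urljoin(playlist_url, line))
    return urls

for u in segment_urls(PLAYLIST_TEXT, PLAYLIST_URL):
    print(u)
```

Each URL could then be fetched with requests and its bytes appended to a single output file in playlist order.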

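Your answer

Go to inspect element, click the Chromecast Player (2. Player) located at the top of the video to see its HTML attributes, and finally click the embed link, which should look like this: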

/inc/embed/embed.php?file=bobs.burgers.s09e05.flv&amp;hd=1&amp;pid=437035&amp;h=25424730eed390d0bb4634fa93a2e96c&amp;t=1618011716&amp;embed=cizgi
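The embed path is relative and, as copied from the HTML, has its ampersands escaped as &amp;. Here is a small sketch of turning it into a full URL with the standard library. The base URL is assumed to be the site from the question, and absolute_embed_url is a hypothetical helper.

```python
from html import unescape
from urllib.parse import parse_qs, urljoin, urlsplit

BASE = "https://www.watchcartoononline.com/"  # assumed: the site from the question
EMBED = ("/inc/embed/embed.php?file=bobs.burgers.s09e05.flv&amp;hd=1"
         "&amp;pid=437035&amp;h=25424730eed390d0bb4634fa93a2e96c"
         "&amp;t=1618011716&amp;embed=cizgi")

def absolute_embed_url(base, embed_path):
    """Unescape HTML entities in embed_path and resolve it against base."""
    return urljoin(base, unescape(embed_path))

full_url = absolute_embed_url(BASE, EMBED)
print(full_url)
# The file parameter names the video the embed page will play.
print(parse_qs(urlsplit(full_url).query)["file"])
```

BeautifulSoup already unescapes attribute values when it parses a page, so the unescape step only matters when the string is copied by hand.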

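Once you have done that, click play, make sure inspect element is open, click on the video to see its attributes (or press Ctrl+F and search for <video>), and copy the src: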

https://cdn.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e05.mp4?st=f9OWlOq1e-2M9eUVvhZa8A&e=1618019876

Now we can download it with Python.

import requests
import shutil  # copyfileobj copies data from one file-like object to another

url = "https://cdn.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e05.mp4?st=f9OWlOq1e-2M9eUVvhZa8A&e=1618019876"

# stream=True avoids loading the whole video into memory
response = requests.get(url, stream=True)

if response.status_code == 200:
    # Have urllib3 decode any gzip/deflate content-encoding on the raw stream
    response.raw.decode_content = True

    with open('bobs-burgers.mp4', 'wb') as f:
        # Copy the data from response.raw into the file
        shutil.copyfileobj(response.raw, f)
    print('downloaded file')
else:
    print('Download failed')
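If the download fails, one possible cause is that the copied link has gone stale. The st and e query parameters look like a signed token and an expiry time. That is a guess from the URL's shape, not something the site documents. Under that assumption, a quick check might look like this (link_expiry is a hypothetical helper):

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

def link_expiry(url):
    """Return the e parameter as an aware UTC datetime, or None if absent.

    Assumes e is a Unix timestamp, which is a guess from the URL's shape.
    """
    values = parse_qs(urlsplit(url).query).get("e")
    if not values or not values[0].isdigit():
        return None
    return datetime.fromtimestamp(int(values[0]), tz=timezone.utc)

url = "https://cdn.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e05.mp4?st=f9OWlOq1e-2M9eUVvhZa8A&e=1618019876"
expiry = link_expiry(url)
if expiry is not None and expiry < datetime.now(timezone.utc):
    print("Link has probably expired; copy a fresh src from the page.")
```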

Score: 2

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/53196594
