
Downloading the Google Machine Learning Crash Course: python3, pipenv, requests-html

iOSDevLog
Published 2018-05-17 11:06:04
Collected in the column: iOSDevLog

python3 https://www.python.org

Download and install it from the official site, or use brew.

$ brew install python3
$ brew linkapps python3   # optional; newer Homebrew releases no longer ship linkapps

pipenv https://github.com/pypa/pipenv

$ pip install pipenv

.zshrc

eval "$(pipenv --completion)"

requests-html http://html.python-requests.org/en/latest/

$ git clone https://github.com/iOSDevLog/Machine-Learning-Crash-Course
$ cd Machine-Learning-Crash-Course
$ pipenv --python 3.6
$ pipenv install requests-html
$ pipenv shell     # To activate this project's virtualenv
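
Before running the script, it can help to confirm that the shell is using the virtualenv you expect. The following is a small sketch and not part of the original project: `check_environment` is a hypothetical helper that reports whether the interpreter is at least Python 3.6 and whether a `requests_html` module can be found.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 6), module_name="requests_html"):
    """Return (python_ok, module_found) for the current interpreter.

    Hypothetical helper for illustration; not part of the original repo.
    """
    python_ok = sys.version_info >= min_version
    # find_spec returns None when a top-level module cannot be located.
    module_found = importlib.util.find_spec(module_name) is not None
    return python_ok, module_found


if __name__ == "__main__":
    python_ok, module_found = check_environment()
    print("Python >= 3.6:", python_ok)
    print("requests_html importable:", module_found)
```

If the second value is False inside `pipenv shell`, rerunning `pipenv install requests-html` in the project directory is the likely fix.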

fetch_course.py

#!/usr/bin/env python3
import os
import ssl
import time
import urllib.request
from urllib.parse import urljoin

from requests_html import HTMLSession

# Disable HTTPS certificate verification for urllib (insecure; kept from the original script).
ssl._create_default_https_context = ssl._create_unverified_context

base_url = 'https://developers.google.com/machine-learning/crash-course/'
output_dir = 'course_html/'

def course_info(course_url):
    """Return (video URL, captions URL, next page URL) for one course page."""
    session = HTMLSession()
    response = session.get(course_url)

    data_video_url = ''
    data_captions_url = ''
    # video_info = response.html.find('.devsite-vplus', first=True)
    # data_video_url = video_info.attrs['data-video-url']
    # data_captions_url = video_info.attrs['data-captions-url']

    next_url_info = response.html.find('div.devsite-steps-next > a.devsite-steps-link', first=True)
    # The last page has no "next" link; return None so the crawl can stop.
    next_url = urljoin(course_url, next_url_info.attrs['href']) if next_url_info else None

    return (data_video_url, data_captions_url, next_url)

def getHtml(url):
    """Download the raw bytes of a page."""
    return urllib.request.urlopen(url).read()

def saveHtml(file_name, file_content):
    """Write the page bytes to course_html/<file_name>.html."""
    os.makedirs(output_dir, exist_ok=True)
    file_name = file_name.replace('/', '_') + '.html'
    path = os.path.join(output_dir, file_name)
    with open(path, 'wb') as f:
        f.write(file_content)

if __name__ == '__main__':
    current_url = 'https://developers.google.com/machine-learning/crash-course/framing/check-your-understanding'
    while current_url:
        try:
            (_, _, next_url) = course_info(current_url)
            if next_url:
                saveHtml(os.path.basename(next_url), getHtml(next_url))
                print(next_url)
            # Advance only after the page was saved, so a failure retries the same page.
            current_url = next_url
        except Exception as error:
            print("Request failed:", error)
            time.sleep(5)
            print("Was a nice sleep, now let me continue...")
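
The loop above sleeps for five seconds and tries again whenever a request fails, with no limit on the number of attempts. A bounded variant could be sketched as follows; `fetch_with_retry`, its parameters, and the `fetch` callable are hypothetical names introduced for illustration and are not part of the original script.

```python
import time


def fetch_with_retry(fetch, url, max_attempts=3, delay_seconds=5, sleep=time.sleep):
    """Call fetch(url), pausing between failures, and give up after max_attempts.

    All names here are hypothetical; fetch is any callable such as getHtml.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception as error:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            print(f"Attempt {attempt} failed ({error}); retrying in {delay_seconds}s")
            sleep(delay_seconds)
```

Passing `sleep` as a parameter is only a convenience for testing; in normal use the default `time.sleep` applies, for example `fetch_with_retry(getHtml, next_url)`.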

data_video_url is the relative URL of the MP4 video, and data_captions_url is the relative URL of the captions.

Combining them with base_url gives the absolute URLs; I'll write that part up later.
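
One way to do that conversion is `urllib.parse.urljoin` from the standard library. This is only a sketch under assumptions: the relative paths below are made-up placeholders, since the real attribute values are not shown here, and `to_absolute` is a hypothetical helper rather than the author's code.

```python
from urllib.parse import urljoin

base_url = 'https://developers.google.com/machine-learning/crash-course/'


def to_absolute(relative_url, base=base_url):
    """Resolve a relative URL against base (hypothetical helper)."""
    return urljoin(base, relative_url)


# Placeholder paths for illustration only; the real values come from the page.
print(to_absolute('videos/example.mp4'))
print(to_absolute('/machine-learning/crash-course/captions/example.vtt'))
```

Note that `urljoin` resolves a path with a leading slash against the host rather than against the full `base_url`, so the result depends on which form the page attributes use.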

Shared from the author's personal site/blog. Originally published: 2018.03.01
