Batch-Downloading Papers with Python

DoubleHelix
Published 2023-09-06 10:03:13
Included in the column: 生物信息云

Method 1:

pip3 install sci-hub
scihub -c

Download:

scihub -s 10.1186/s12864-016-2858-0

Method 2:

import requests
from bs4 import BeautifulSoup
import os
path = os.path.join(".", "downloadArticles")
os.makedirs(path, exist_ok=True)  # create the folder that holds the downloaded articles
head = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36'
}  # updated 2021-06-07: browser-like User-Agent to avoid HTTP 403 errors
# doi_list.txt stores one DOI per line; strip() removes the line ending,
# so a trailing newline after the last DOI is no longer required
with open("doi_list.txt", "r", encoding="utf-8") as f:
    dois = [line.strip() for line in f if line.strip()]
for doi in dois:
    url = "https://www.sci-hub.ren/" + doi + "#"  # the Sci-Hub lookup address in use at the time of writing
    download_url = ""  # updated 2021-11-11: reset so the error log never shows a stale URL
    try:
        r = requests.get(url, headers=head, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        soup = BeautifulSoup(r.text, "html.parser")
        # The PDF address sits in an <iframe> or, failing that, an <embed> tag
        # (the older lookup via soup.div.ul.find(href="#") no longer works)
        tag = soup.iframe if soup.iframe is not None else soup.embed
        download_url = tag.attrs["src"]
        if download_url.startswith("//"):
            download_url = "https:" + download_url  # protocol-relative address
        print(doi + " is downloading...\n  --The download url is: " + download_url)
        download_r = requests.get(download_url, headers=head, timeout=60)
        download_r.raise_for_status()
        with open(os.path.join(path, doi.replace("/", "_") + ".pdf"), "wb") as temp:
            temp.write(download_r.content)
    except Exception:
        # log the failed DOI (and the PDF address, if one was found) and continue
        with open("error.txt", "a", encoding="utf-8") as error:
            error.write(doi + " occurs error!\n")
            if "https://" in download_url:
                error.write(" --The download url is: " + download_url + "\n\n")
    else:
        print(doi + " downloaded successfully.\n")
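
The script above expects one DOI per line and builds each file name by replacing "/" in the DOI. Those two steps could be pulled out as standalone helpers, roughly as sketched below. The function names `read_dois` and `doi_to_filename` are hypothetical and do not appear in the original script. Skipping duplicate DOIs and replacing the wider set of characters that Windows rejects in file names are assumptions added for illustration.

```python
import re


def read_dois(text):
    """Return the DOIs in `text` (one per line), skipping blank lines and duplicates."""
    seen = set()
    dois = []
    for raw in text.splitlines():
        doi = raw.strip()  # drops "\n", "\r\n" and stray spaces
        if doi and doi not in seen:
            seen.add(doi)
            dois.append(doi)
    return dois


def doi_to_filename(doi):
    """Map a DOI to a PDF file name, replacing characters unsafe in file names with "_"."""
    return re.sub(r'[\\/:*?"<>|]', "_", doi) + ".pdf"


if __name__ == "__main__":
    # The DOI below is the one used in the Method 1 example.
    sample = "10.1186/s12864-016-2858-0\n\n10.1186/s12864-016-2858-0\n"
    for doi in read_dois(sample):
        print(doi, "->", doi_to_filename(doi))
```

Because `read_dois` strips whitespace instead of cutting off the last character, it does not matter whether the final line of `doi_list.txt` ends with a newline.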
Originally published on 2023-06-06; shared from the MedBioInfoCloud WeChat official account.