Automating a PDF download with wget when the response Content-Type is text/html;charset=UTF-8

Stack Overflow user
Asked 2020-01-15 20:18:36
Answers: 1 · Views: 119 · Followers: 0 · Votes: 0

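I am looking for a way to download a PDF file from an online library using Python's wget module. An example URL is https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8006280

The Content-Type of the response is 'text/html;charset=UTF-8'. Calling the download function gives me a stamp.jsp file containing some HTML. Can anyone work out how to get at the PDF file?

Thanks.

Received HTML: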

<script type="text/javascript" src="/assets/vendor/jquery/jquery.js?cv=20191217_000002" charset="utf-8"></script>

<!-- Fingerprint Cookie -->
<script type="text/javascript" src="/assets/vendor/js-cookie/src/js.cookie.js?cv=20191217_000002"></script>
<script type="text/javascript" src="/assets/vendor/fingerprintjs2/fingerprint2.js?cv=20191217_000002"></script>
<script type="text/javascript" src="/assets/js/lib/core/fingerprint.js?cv=20191217_000002"></script>
<script type="text/javascript">Xplore.Fingerprint.init();</script>

<!-- BEGIN: tealium in stamp/stamp.jsp. NOTE stamp.jsp does not use template.jsp, nor include common/assets.jsp, so including tealiumAnalytics.jsp here -->








		<!-- BEGIN: TealiumAnalytics.jsp -->
		
		
		
		
		
		
		
		
		
		
		
		
			
				
			
			
			
			
		
		
		
		
		
		

			<script type ="text/javascript">
 				// tealium config vars
				var TEALIUM_CONFIG_TAGGING_ENABLED = true;		
				var TEALIUM_CONFIG_CDN_URL = '//tags.tiqcdn.com/utag/';
				var TEALIUM_CONFIG_ACCOUNT_PROFILE_ENV = 'ieeexplore/main/prod';
				
				// tealium utag_data values for user 
				var TEALIUM_userType = 'Anonymous';
				var TEALIUM_userInstitutionId = '';
				var TEALIUM_userId = '';
				var TEALIUM_user_third_party = '';
				
				var TEALIUM_products = '';
			</script>


			<script type="text/javascript">
			// asynchronously load tealium's utag.js , which declares tealium JS variables like; utag_data, utag
			(function(a,b,c,d){
			
				a=TEALIUM_CONFIG_CDN_URL + TEALIUM_CONFIG_ACCOUNT_PROFILE_ENV + '/utag.js';
				b=document;c='script';d=b.createElement(c);d.src=a;
				d.type='text/java'+c;d.async=true;
				a=b.getElementsByTagName(c)[0];a.parentNode.insertBefore(d,a);
			})();
			</script>

			<script type="text/javascript" src="/assets/js/analytics/tealiumTagsData.js?cv=20191217_000002"></script>
			<script type="text/javascript" src="/assets/js/analytics/tealiumAnalytics.js?cv=20191217_000002"></script>


		
 		
		<!-- END: TealiumAnalytics.jsp -->
			 

<!-- END: tealium in stamp/stamp.jsp -->
		



<html lang="en-US">
	<head>	
		<title>IEEE Xplore Full-Text PDF: </title>
		<style>
			html {
			    margin: 0;
			    padding: 0;
			    overflow: hidden;
			}
			body {
			    margin: 0;
			    padding: 0;
			}
			iframe {
				display: block;
				position: fixed;
				width: 100%;
				height: 100%;
			}
		</style>
	</head>
	<body>
		<iframe src="https://ieeexplore.ieee.org/ielx7/6979/8326752/08006280.pdf?tp=&arnumber=8006280&isnumber=8326752&ref=" frameborder=0></iframe>
	</body>
</html>


1 Answer

Stack Overflow user

Accepted answer

Posted 2020-01-15 20:26:16

I'm afraid the URL you are pointing to will not work.

Looking at the network traffic (for example with Chrome's inspector), I can see the actual .pdf link: https://ieeexplore.ieee.org/ielx7/6979/8326752/08006280.pdf

>>> import wget
>>> url = "https://ieeexplore.ieee.org/ielx7/6979/8326752/08006280.pdf"
>>> response = wget.download(url=url, out="/path/to/your/directory")
100% [..........................................................................] 2817205 / 2817205>>>
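
The same link also appears in the HTML quoted in the question, as the `src` of the `<iframe>`, so it can be read from the page instead of copied by hand from the browser. Below is a minimal sketch of that idea using only the standard library. The names `IframeSrcParser` and `extract_pdf_url` are made up for illustration, and the sample HTML is a trimmed copy of the markup in the question. Whether the server then serves the file without a browser session or subscription is outside what this sketch covers.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class IframeSrcParser(HTMLParser):
    """Records the src attribute of the first <iframe> encountered."""

    def __init__(self) -> None:
        super().__init__()
        self.src: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; keep only the first iframe.
        if tag == "iframe" and self.src is None:
            self.src = dict(attrs).get("src")


def extract_pdf_url(html: str, base_url: str) -> Optional[str]:
    """Return the iframe target as an absolute URL, or None if there is no iframe."""
    parser = IframeSrcParser()
    parser.feed(html)
    if parser.src is None:
        return None
    # urljoin leaves an absolute src untouched and resolves a relative one.
    return urljoin(base_url, parser.src)


# Trimmed copy of the wrapper page shown in the question.
sample_html = """<html lang="en-US">
  <body>
    <iframe src="https://ieeexplore.ieee.org/ielx7/6979/8326752/08006280.pdf?tp=&arnumber=8006280&isnumber=8326752&ref=" frameborder=0></iframe>
  </body>
</html>"""

page_url = "https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8006280"
pdf_url = extract_pdf_url(sample_html, page_url)
print(pdf_url)
```

The resulting `pdf_url` could then be passed to `wget.download(url=pdf_url, out=...)` in the same way as the hard-coded link above.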
Votes: 2
Original page content provided by Stack Overflow; translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link: https://stackoverflow.com/questions/59751300
