首页
学习
活动
专区
工具
TVP
发布
社区首页 >问答首页 >将长url转换为短url -替换python

将长url转换为短url -替换python
EN

Stack Overflow用户
提问于 2018-05-30 05:24:40
回答 1查看 57关注 0票数 1

myfile.txt中的长urls必须转换为短urls。这是在myfile.txt中:

代码语言:javascript
复制
26-04-2018 | Publication  2018, 88936 , https://search.publications.com/pgm-2018-88936.html?search=%3fzkt%3dextended%26pst%3dPublication%26vrt%3d%26zkd%3dInTitle%26dpr%3dAll%26spd%3d20180529%26epd%3d20180529%26sdt%3dDatePublication%26pubId%3d%26pnr%3d1%26rpp%3d10&resultInx=0&sorttype=1&sortorder=4

19-04-2018 | Publication 2018, 8168 , https://search.publications.com/pgm-2018-8168.html?search=%3fzkt%3dextended%26pst%3dPublication%26vrt%3d%26zkd%3dInTitle%26dpr%3dAll%26spd%3d20180529%26epd%3d20180529%26sdt%3dDatePublication%26pubId%3d%26pnr%3d1%26rpp%3d10&resultInx=1&sorttype=1&sortorder=4

26-03-2018 | Publication 2018, 611724 , https://search.publications.com/pgm-2018-611724.html?search=%3fzkt%3dextended%26pst%3dPublication%26vrt%3d%26zkd%3dInTitle%26dpr%3dAll%26spd%3d20180529%26epd%3d20180529%26sdt%3dDatePublication%26pubId%3d%26pnr%3d1%26rpp%3d10&resultInx=2&sorttype=1&sortorder=4

01-02-2017 | Publication 2017, 1452026 , https://search.publications.com/pgm-2017-1452026.html?search=%3fzkt%3dextended%26pst%3dPublication%26vrt%3d%26zkd%3dInTitle%26dpr%3dAll%26spd%3d20180529%26epd%3d20180529%26sdt%3dDatePublication%26pubId%3d%26pnr%3d1%26rpp%3d10&resultInx=3&sorttype=1&sortorder=4

在python 2.7中有这样的代码:

代码语言:javascript
复制
import re

with open('myfile.txt', 'r+') as myfile:
    data = myfile.read()
    url = re.findall(r'[^https.+?]', data)
    urlshort = re.findall(r'[^https.+html?]', data)
    for url in data:
        myfile.write(url.replace(url, urlshort, data))
myfile.close()

输出为:

回溯(最近一次调用):文件"/pyscripts/data.py",第9行,在myfile.write(url.replace(url,urlshort,data)) TypeError:需要一个整数

如何在文件中实现此功能?

EN

回答 1

Stack Overflow用户

发布于 2018-05-30 05:30:49

(https.*html).*中使用re.sub

代码语言:javascript
复制
import re

s = """
26-04-2018 | Publication  2018, 88936 , https://search.publications.com/pgm-2018-88936.html?search=%3fzkt%3dextended%26pst%3dPublication%26vrt%3d%26zkd%3dInTitle%26dpr%3dAll%26spd%3d20180529%26epd%3d20180529%26sdt%3dDatePublication%26pubId%3d%26pnr%3d1%26rpp%3d10&resultInx=0&sorttype=1&sortorder=4    
19-04-2018 | Publication 2018, 8168 , https://search.publications.com/pgm-2018-8168.html?search=%3fzkt%3dextended%26pst%3dPublication%26vrt%3d%26zkd%3dInTitle%26dpr%3dAll%26spd%3d20180529%26epd%3d20180529%26sdt%3dDatePublication%26pubId%3d%26pnr%3d1%26rpp%3d10&resultInx=1&sorttype=1&sortorder=4  
26-03-2018 | Publication 2018, 611724 , https://search.publications.com/pgm-2018-611724.html?search=%3fzkt%3dextended%26pst%3dPublication%26vrt%3d%26zkd%3dInTitle%26dpr%3dAll%26spd%3d20180529%26epd%3d20180529%26sdt%3dDatePublication%26pubId%3d%26pnr%3d1%26rpp%3d10&resultInx=2&sorttype=1&sortorder=4    
01-02-2017 | Publication 2017, 1452026 , https://search.publications.com/pgm-2017-1452026.html?search=%3fzkt%3dextended%26pst%3dPublication%26vrt%3d%26zkd%3dInTitle%26dpr%3dAll%26spd%3d20180529%26epd%3d20180529%26sdt%3dDatePublication%26pubId%3d%26pnr%3d1%26rpp%3d10&resultInx=3&sorttype=1&sortorder=4
"""

print(re.sub(r'(https.*html).*', r'\1', s))

输出:

代码语言:javascript
复制
26-04-2018 | Publication  2018, 88936 , https://search.publications.com/pgm-2018-88936.html
19-04-2018 | Publication 2018, 8168 , https://search.publications.com/pgm-2018-8168.html
26-03-2018 | Publication 2018, 611724 , https://search.publications.com/pgm-2018-611724.html
01-02-2017 | Publication 2017, 1452026 , https://search.publications.com/pgm-2017-1452026.html

这样,您就可以将re.sub的整个结果写入到您的文件中,而不是尝试替换您当前操作的方式。

票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/50593169

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档