文章/答案/技术大牛

发布

社区首页 >问答首页 >如何使函数修复href链接？

问如何使函数修复href链接？
EN

Stack Overflow用户

提问于 2021-07-18 12:30:54

回答 2查看 47关注 0票数 0

是否有请求/selenium函数将href链接转换为适当的链接，如：

clickLink("https://www.google.com","about")

返回像https://www.google.com/about这样的值吗？

就像它修复了href链接并转换为常规链接一样。

例如：

https://google.com about https://google.com/about
//www.pastebin.com/ / https://www.pastebin.com/

等

我试着做一个，但没有运气

def fixLink(Link,LinkOriginalPage):
    '''Fixes link. ex. /f/d -> https://www.wtds.com/f/d
    LinkOriginalPage=page Link redirected from'''
    if Link.startswith("https://") or Link.startswith("http://"):
        return "debug1 " + Link # , and exit
        #fix 329 links crawled! - Latest link: https://www.wikipedia.com/https://kl.wikipedia.org/
    if Link.startswith("//"):
        Link="debug2 " + "https:"+Link # example, //www.pastebin.com/ -> http://www.pastebin.com/
        # print(Link)
        return Link # due to glitch
    # now link does not start with //
    # check if link is like a/b/c->site.com/a/b/c
    asciiLetters="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    linkStartsWithValidProtocol=not (Link.startswith("http://") or Link.startswith("https://"))
    linkDoesNotStartWithSlash=Link[0] in asciiLetters
    if linkStartsWithValidProtocol and linkDoesNotStartWithSlash:
        if LinkOriginalPage.endswith("/"):
            Link="debug3 " + LinkOriginalPage+Link
        else:
            Link="debug4 " + LinkOriginalPage+"/"+Link
        return Link
    # now link does not start with ascii letter
    # check if link is like /a/b/c
    if Link.startswith("/"):
        domainOfLink=getDomainFromLink(LinkOriginalPage)
        # print(domainOfLink)
        Link="debug 5|"+LinkOriginalPage+" http://"+domainOfLink+Link
        # print("startswith / "+Link)
        return Link # due to glitch
    # fix div links (widely used bad code practice)
    if Link.startswith("#"):
        #glitch, invalud url like *&YT -> invalud url schema
        #fix div
        domainOfLink=getDomainFromLink(LinkOriginalPage)
        Link="debug 6 "+domainOfLink+Link
        return Link
    # return the output if not returned (nvm)
    return "https://about.io"

python

回答 2

Stack Overflow用户

回答已采纳

发布于 2021-07-18 13:14:21

您可以在urllib.parse中使用"urljoin“函数。下面是一个例子。

from urllib.parse import urljoin
a = "http://www.example.com"
b = "index.html"
print(urljoin(a,b))
# Returns 'http://www.example.com/index.html'

PS。http://www.example.com/确实存在。

票数 0

Stack Overflow用户

发布于 2021-07-18 12:37:59

你试过把这些链接当作字符串吗？e.g

x = "google.com"
y = "about"
final_string= x+y

然后用它作为函数中的参数？

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/68429106

复制

相似问题

问如何使函数修复href链接？
EN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问如何使函数修复href链接？EN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问如何使函数修复href链接？
EN