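Is there a requests/selenium function that converts href links into proper absolute links, e.g.
clickLink("https://www.google.com","about")
returning a value like https://www.google.com/about?
In other words, something that fixes an href link and turns it into a regular URL.
For example:
https://google.com + about -> https://google.com/about
//www.pastebin.com/ + / -> https://www.pastebin.com/, etc.
I tried to write one myself, but with no luck: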
from string import ascii_letters
from urllib.parse import urlparse

def fixLink(Link, LinkOriginalPage):
    '''Fixes a link, e.g. /f/d -> https://www.wtds.com/f/d
    LinkOriginalPage = the page the link was found on'''
    # Already absolute: return it unchanged (no "debug" prefix)
    # fix 329 links crawled! - Latest link: https://www.wikipedia.com/https://kl.wikipedia.org/
    if Link.startswith("https://") or Link.startswith("http://"):
        return Link
    # Scheme-relative link, e.g. //www.pastebin.com/ -> https://www.pastebin.com/
    if Link.startswith("//"):
        return "https:" + Link
    # Scheme and host of the original page; urlparse stands in for the
    # getDomainFromLink() helper, which is not shown in the question
    page = urlparse(LinkOriginalPage)
    origin = page.scheme + "://" + page.netloc
    # Path-relative link like a/b/c -> page/a/b/c
    # (checking "Link and ..." avoids an IndexError on an empty string)
    if Link and Link[0] in ascii_letters:
        if LinkOriginalPage.endswith("/"):
            return LinkOriginalPage + Link
        return LinkOriginalPage + "/" + Link
    # Root-relative link like /a/b/c -> https://site.com/a/b/c
    if Link.startswith("/"):
        return origin + Link
    # Fragment ("div") links like #top -> page#top
    # (prepending only the domain produced an invalid URL with no scheme)
    if Link.startswith("#"):
        return LinkOriginalPage.split("#")[0] + Link
    # Anything else falls back to a placeholder, as in the original
    return "https://about.io"
Posted on 2021-07-18 13:14:21
You can use the urljoin function from urllib.parse. Here is an example:
from urllib.parse import urljoin
a = "http://www.example.com"
b = "index.html"
print(urljoin(a,b))
# Returns 'http://www.example.com/index.html'
P.S. http://www.example.com/ does actually exist.
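urljoin applies the standard relative-reference rules, so it covers every case from the question, not just simple appending. A minimal sketch; the page URL, the href values and the resolve_links helper are illustrative, not from the question:

```python
from urllib.parse import urljoin

def resolve_links(page_url, hrefs):
    """Resolve each href against the URL of the page it was found on."""
    return [urljoin(page_url, href) for href in hrefs]

# Example page URL (hypothetical)
page = "https://www.google.com/search/index.html"

hrefs = [
    "about",                  # path-relative: replaces the last path segment
    "/about",                 # root-relative: keeps only scheme and host
    "//www.pastebin.com/",    # scheme-relative: inherits the page's scheme
    "#top",                   # fragment-only: same page plus the fragment
    "https://example.org/x",  # already absolute: returned unchanged
]

for href, url in zip(hrefs, resolve_links(page, hrefs)):
    print(href, "->", url)
```

Note that the base matters: urljoin("https://www.google.com/search", "about") gives https://www.google.com/about, because the last path segment of the base is replaced. Only a base ending in "/" (https://www.google.com/search/) keeps that segment as a directory.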
Posted on 2021-07-18 12:37:59
Have you tried treating these links as plain strings? e.g.
x = "google.com"
y = "about"
final_string = x + "/" + y
and then passing that as the argument to your function?
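Plain concatenation only handles the simplest case; hrefs such as /f/d or //www.pastebin.com/ need the scheme and host of the page they came from. For a crawler, a sketch under these assumptions: hrefs are collected from <a> tags with the standard library's html.parser and each one is resolved against the page URL with urljoin. LinkCollector and the sample HTML are hypothetical. With requests, response.url is a sensible base because it is the final URL after redirects. The sketch ignores any <base href> element in the page. With Selenium, element.get_attribute("href") typically already returns the resolved absolute URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Hypothetical helper: collects every <a href> and resolves it against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))

# Illustrative HTML; in practice this would be the page body (e.g. response.text)
html = ('<a href="about">About</a> '
        '<a href="//www.pastebin.com/">Paste</a> '
        '<a href="/f/d">F</a>')

collector = LinkCollector("https://www.google.com/")
collector.feed(html)
print(collector.links)
```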
https://stackoverflow.com/questions/68429106