I need to scrape the Authority Score, Organic Search Traffic, and Backlinks for burton.com from Semrush using Selenium.
The script below gives some errors.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
url = 'https://www.semrush.com/analytics/overview/?q=burton.com&searchType=domain' #your url
options = Options() #set up options
options.add_argument('--headless') #add --headless mode to options
driver = webdriver.Chrome(executable_path='c:\chromedriver.exe',chrome_options=options)
#note: executable_path will depend on where your chromedriver.exe is located
driver.get(url) #get response
driver.implicitly_wait(1) #wait to load content
elements = driver.find_element("xpath", '//a[@href="/info/burton.com+(by+organic)"]') #grab that stuff you wanted?
for e in elements: print(e.get_attribute('text').strip()) #print text fields
driver.quit() #close the driver when you're done
Below is the error I get in Visual Studio Code. Semrush requires logging in with a free trial to view the data above; could that be causing the problem here?
PS C:\Users\akein> & C:/Python310/python.exe c:/Users/akein/OneDrive/Desktop/aaa.py
c:\Users\akein\OneDrive\Desktop\aaa.py:12: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
driver = webdriver.Chrome(executable_path='c:\chromedriver.exe',chrome_options=options)
c:\Users\akein\OneDrive\Desktop\aaa.py:12: DeprecationWarning: use options instead of chrome_options
driver = webdriver.Chrome(executable_path='c:\chromedriver.exe',chrome_options=options)
DevTools listening on ws://127.0.0.1:50030/devtools/browser/6a717a35-4404-46d0-b2df-fa1ba06fbb3d
[1008/234714.670:INFO:CONSOLE(2)] "limitPopup", source: https://static.semrush.com/domain-overview/vendor.2365e1d7f296adbbe3c8.chunk.js (2)
[1008/234715.006:INFO:CONSOLE(2)] "SyntaxError: Unexpected token 'B', "Bad Request
" is not valid JSON", source: https://static.semrush.com/domain-overview/vendor.2365e1d7f296adbbe3c8.chunk.js (2)
[1008/234715.058:INFO:CONSOLE(2)] "dataLayerProxy: prop [[getByName]] is not supported", source: https://static.semrush.com/domain-overview/vendor.2365e1d7f296adbbe3c8.chunk.js (2)
[1008/234715.059:INFO:CONSOLE(2)] "dataLayerProxy: method call is not supported", source: https://static.semrush.com/domain-overview/vendor.2365e1d7f296adbbe3c8.chunk.js (2)
[1008/234715.059:INFO:CONSOLE(2)] "dataLayerProxy: prop [[getByName]] is not supported", source: https://static.semrush.com/domain-overview/vendor.2365e1d7f296adbbe3c8.chunk.js (2)
[1008/234715.060:INFO:CONSOLE(2)] "dataLayerProxy: method call is not supported", source: https://static.semrush.com/domain-overview/vendor.2365e1d7f296adbbe3c8.chunk.js (2)
[1008/234715.060:INFO:CONSOLE(2)] "dataLayerProxy: prop [[getByName]] is not supported", source: https://static.semrush.com/domain-overview/vendor.2365e1d7f296adbbe3c8.chunk.js (2)
[1008/234715.060:INFO:CONSOLE(2)] "dataLayerProxy: method call is not supported", source: https://static.semrush.com/domain-overview/vendor.2365e1d7f296adbbe3c8.chunk.js (2)
[1008/234715.068:INFO:CONSOLE(2)] "dataLayerProxy: prop [[getByName]] is not supported", source: https://static.semrush.com/domain-overview/vendor.2365e1d7f296adbbe3c8.chunk.js (2)
[1008/234715.068:INFO:CONSOLE(2)] "dataLayerProxy: method call is not supported", source: https://static.semrush.com/domain-overview/vendor.2365e1d7f296adbbe3c8.chunk.js (2)
[1008/234715.433:INFO:CONSOLE(2)] "SSO Frontend. You are using old value for defaultActiveTab parameter.
Please use loginForm instead of login.
For more information see the documentation.", source: https://static.semrush.com/domain-overview/vendor.2365e1d7f296adbbe3c8.chunk.js (2)
Traceback (most recent call last):
File "c:\Users\akein\OneDrive\Desktop\aaa.py", line 18, in <module>
elements = driver.find_element("xpath", '//a[@href="/info/burton.com+(by+organic)"]') #grab that stuff you wanted?
File "C:\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 856, in find_element
return self.execute(Command.FIND_ELEMENT, {
File "C:\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 429, in execute
self.error_handler.check_response(response)
File "C:\Python310\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 243, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//a[@href="/info/burton.com+(by+organic)"]"}
(Session info: headless chrome=106.0.5249.103)
Stacktrace:
Backtrace:
Ordinal0 [0x00D71ED3+2236115]
Ordinal0 [0x00D092F1+1807089]
Ordinal0 [0x00C166FD+812797]
Ordinal0 [0x00C455DF+1005023]
Ordinal0 [0x00C457CB+1005515]
Ordinal0 [0x00C77632+1209906]
Ordinal0 [0x00C61AD4+1120980]
Ordinal0 [0x00C759E2+1202658]
Ordinal0 [0x00C618A6+1120422]
Ordinal0 [0x00C3A73D+960317]
Ordinal0 [0x00C3B71F+964383]
GetHandleVerifier [0x0101E7E2+2743074]
GetHandleVerifier [0x010108D4+2685972]
GetHandleVerifier [0x00E02BAA+532202]
GetHandleVerifier [0x00E01990+527568]
Ordinal0 [0x00D1080C+1837068]
Ordinal0 [0x00D14CD8+1854680]
Ordinal0 [0x00D14DC5+1854917]
Ordinal0 [0x00D1ED64+1895780]
BaseThreadInitThunk [0x7666FA29+25]
RtlGetAppContainerNamedObjectPath [0x77427A9E+286]
RtlGetAppContainerNamedObjectPath [0x77427A6E+238]
Answered on 2022-10-09 06:49:25
The error no such element: Unable to locate element: {"method":"xpath","selector":"//a[@href="/info/burton.com+(by+organic)"]"}
is raised because that XPath does not match anything on the page.
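As a side note, when it is unclear whether an element is on the page at all, one option is to use find_elements (plural), which returns an empty list instead of raising NoSuchElementException. The helper below is a minimal sketch; first_text_or_none is a hypothetical name, not part of Selenium, and it assumes driver is an already started WebDriver.

```python
def first_text_or_none(driver, xpath):
    """Return the stripped text of the first XPath match, or None if there is none.

    `driver` is assumed to be a Selenium WebDriver (or any object exposing
    find_elements(by, value)); this helper name is hypothetical.
    """
    # find_elements returns [] when nothing matches, so no exception handling is needed.
    matches = driver.find_elements("xpath", xpath)
    if not matches:
        return None
    return matches[0].text.strip()
```

A None result here would mean the XPath matched nothing, which on this page is what happens before logging in.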
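First, you need to log in when you visit the site.
To extract the Authority Score, Organic Search Traffic, and Backlinks
from the site, you can locate each field by its label
and then find the value relative to that label
(the value elements do not have any specific id of their own).
Your solution would look like this: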
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = 'https://www.semrush.com/analytics/overview/?q=burton.com&searchType=domain' #your url
options = Options() #set up options (a single object, so no setting gets overwritten)
options.add_experimental_option('excludeSwitches', ['enable-logging'])
options.add_argument('--headless') #add --headless mode to options
options.add_argument('--window-size=1920,1080') # Chrome expects width,height
options.add_argument('--log-level=3') # Only display fatal logs and remove info logs from selenium output console
# executable_path and chrome_options are deprecated; pass a Service object and options instead
driver = webdriver.Chrome(service=Service(r'c:\chromedriver.exe'), options=options)
driver.get(url) #get response
# Log in to the website
driver.find_element(By.XPATH, "//span[contains(text(), 'Log In')]").click()
driver.find_element(By.ID, "email").send_keys("your username")
driver.find_element(By.ID, "password").send_keys("your password")
driver.find_element(By.XPATH, "//div[contains(text(), 'Log in')]").click()
# Ensure user is logged in
WebDriverWait(driver, 3).until(EC.presence_of_element_located((By.TAG_NAME, "use")))
# Add label of the field for which details need to be fetched
tags = ['Authority Score', 'Organic search traffic', 'Backlinks']
for tag in tags:
    # Find the label, climb to its column container, then read the main number inside it
    print(driver.find_element(By.XPATH, f"//span[contains(text(), '{tag}')]/ancestor::div[@direction='column']/descendant::a[@data-at='main-number']/span").text)
driver.quit() #close the driver when you're done
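The label-relative lookup in the loop above could also be factored into small helpers, which makes the XPath easier to inspect on its own. This is only a sketch: value_xpath and scrape_fields are hypothetical names, the XPath is copied from the script, and the markup it relies on (direction='column', data-at='main-number') is whatever Semrush served at the time and may change.

```python
def value_xpath(label):
    """Build the XPath that finds a field's value relative to its label text."""
    # Note: a label containing a single quote would break this simple quoting.
    return (
        f"//span[contains(text(), '{label}')]"
        "/ancestor::div[@direction='column']"
        "/descendant::a[@data-at='main-number']/span"
    )


def scrape_fields(driver, labels):
    """Return {label: value text} for each label, using a logged-in driver."""
    # "xpath" is the same locator strategy string as By.XPATH.
    return {
        label: driver.find_element("xpath", value_xpath(label)).text
        for label in labels
    }
```

Called as scrape_fields(driver, ['Authority Score', 'Organic search traffic', 'Backlinks']) after the login step, it would return the three values in a dictionary rather than printing them.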
The log lines in the output are not errors in the Selenium script. They are browser console messages that the page emits when you visit the URL above, i.e. https://www.semrush.com/analytics/overview/?q=burton.com&searchType=domain
You can also view those messages manually by navigating to the URL and opening the console (right-click on the page and click Inspect).
If you do not want them to show up in your Selenium script's console, you can add the following Chrome argument so that logs are shown only when there is an error on the site: options.add_argument('--log-level=3')
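If you want to switch the quiet logging on and off while debugging, one possibility is to build the argument list in a small function. The sketch below uses hypothetical helper names (chrome_arguments, apply_arguments); the only thing it assumes about the options object is the add_argument method that Selenium's Options class provides.

```python
def chrome_arguments(headless=True, quiet=True):
    """Return the Chrome command-line switches to use for a scraping run."""
    arguments = []
    if headless:
        arguments.append("--headless")
    if quiet:
        # Level 3 keeps only fatal messages in the console output.
        arguments.append("--log-level=3")
    return arguments


def apply_arguments(options, arguments):
    """Add each switch to an options object exposing add_argument()."""
    for argument in arguments:
        options.add_argument(argument)
    return options
```

With Selenium this would be used as apply_arguments(Options(), chrome_arguments(quiet=False)) when you do want to see the page's console messages.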
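Updating the answer with the output from running the same script. This run passed executable_path and chrome_options to webdriver.Chrome, which is what triggers the two DeprecationWarning lines: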
PS C:\Users\akein> & C:/Python310/python.exe c:/Users/akein/OneDrive/Desktop/stackhelp1.py
c:\Users\akein\OneDrive\Desktop\stackhelp1.py:15: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
driver = webdriver.Chrome(executable_path='c:\chromedriver.exe',chrome_options=options)
c:\Users\akein\OneDrive\Desktop\stackhelp1.py:15: DeprecationWarning: use options instead of chrome_options
driver = webdriver.Chrome(executable_path='c:\chromedriver.exe',chrome_options=options)
These are console messages from the web page.
DevTools listening on ws://127.0.0.1:61855/devtools/browser/2d0fea02-dfc1-499b-be1c-698558028f9b
[1009/140232.922:INFO:CONSOLE(1)] "dataLayerProxy: prop [[getByName]] is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140232.923:INFO:CONSOLE(1)] "dataLayerProxy: method call is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140232.923:INFO:CONSOLE(1)] "dataLayerProxy: prop [[getByName]] is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140232.923:INFO:CONSOLE(1)] "dataLayerProxy: method call is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140232.924:INFO:CONSOLE(1)] "dataLayerProxy: prop [[getByName]] is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140232.924:INFO:CONSOLE(1)] "dataLayerProxy: method call is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140232.931:INFO:CONSOLE(1)] "dataLayerProxy: prop [[getByName]] is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140232.932:INFO:CONSOLE(1)] "dataLayerProxy: method call is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140233.435:INFO:CONSOLE(2)] "limitPopup", source: https://static.semrush.com/domain-overview/vendor.2365e1d7f296adbbe3c8.chunk.js (2)
[1009/140234.006:INFO:CONSOLE(2)] "SyntaxError: Unexpected token 'B', "Bad Request
" is not valid JSON", source: https://static.semrush.com/domain-overview/vendor.2365e1d7f296adbbe3c8.chunk.js (2)
[1009/140234.520:INFO:CONSOLE(2)] "SSO Frontend. You are using old value for defaultActiveTab parameter.
Please use loginForm instead of login.
For more information see the documentation.", source: https://static.semrush.com/domain-overview/vendor.2365e1d7f296adbbe3c8.chunk.js (2)
[1009/140235.753:INFO:CONSOLE(1)] "dataLayerProxy: prop [[getByName]] is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140235.754:INFO:CONSOLE(1)] "dataLayerProxy: method call is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140235.755:INFO:CONSOLE(1)] "dataLayerProxy: prop [[getByName]] is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140235.755:INFO:CONSOLE(1)] "dataLayerProxy: method call is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140235.756:INFO:CONSOLE(1)] "dataLayerProxy: prop [[getByName]] is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140235.756:INFO:CONSOLE(1)] "dataLayerProxy: method call is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140236.107:INFO:CONSOLE(2)] "Munchkin.init("%s") options: 519-IIY-869 [object Object]", source: https://www.semrush.com/static/spa.vendors.chunk.87dbdd75124bc5d6b456.js (2)
[1009/140237.151:INFO:CONSOLE(2)] "Your client application uses libraries for user authentication or authorization that will soon be deprecated. See the [Migration Guide](https://developers.google.com/identity/gsi/web/guides/gis-migration) for more information.", source: https://www.semrush.com/static/spa.vendors.chunk.87dbdd75124bc5d6b456.js (2)
[1009/140241.967:INFO:CONSOLE(1)] "dataLayerProxy: prop [[getByName]] is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140241.967:INFO:CONSOLE(1)] "dataLayerProxy: method call is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140241.968:INFO:CONSOLE(1)] "dataLayerProxy: prop [[getByName]] is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140241.968:INFO:CONSOLE(1)] "dataLayerProxy: method call is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140241.968:INFO:CONSOLE(1)] "dataLayerProxy: prop [[getByName]] is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140241.969:INFO:CONSOLE(1)] "dataLayerProxy: method call is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140241.994:INFO:CONSOLE(1)] "dataLayerProxy: prop [[getByName]] is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140241.994:INFO:CONSOLE(1)] "dataLayerProxy: method call is not supported", source: https://www.semrush.com/__static__/webpack/data_layer_proxy.bce1755d.js (1)
[1009/140244.394:INFO:CONSOLE(2)] "Munchkin.init("%s") options: 519-IIY-869 [object Object]", source: https://static.semrush.com/domain-overview/vendor.2365e1d7f296adbbe3c8.chunk.js (2)
[1009/140245.412:INFO:CONSOLE(2)] "FirstScreenHeroData222 summary_organic", source: https://static.semrush.com/domain-overview/vendor.2365e1d7f296adbbe3c8.chunk.js (2)
[1009/140245.412:INFO:CONSOLE(2)] "FirstScreenHeroData222 summary_adwords", source: https://static.semrush.com/domain-overview/vendor.2365e1d7f296adbbe3c8.chunk.js (2)
[1009/140245.708:INFO:CONSOLE(2)] "FirstScreenHeroData222 summary_backlinks", source: https://static.semrush.com/domain-overview/vendor.2365e1d7f296adbbe3c8.chunk.js (2)
This is the actual output:
72
1.6M
74.1M
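The values come back as display strings such as 1.6M. If numbers are needed, a conversion along these lines might work. It is a sketch under the assumption that K, M, and B stand for thousand, million, and billion; parse_metric is a hypothetical helper and the suffix meanings are not confirmed by Semrush here.

```python
# Assumed meaning of the display suffixes (not confirmed by the source).
_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_metric(text):
    """Convert a display string such as '72' or '1.6M' into an integer."""
    cleaned = text.strip().replace(",", "")
    suffix = cleaned[-1:].upper()
    if suffix in _SUFFIXES:
        # Scale the numeric part, then round away floating-point noise.
        return round(float(cleaned[:-1]) * _SUFFIXES[suffix])
    return round(float(cleaned))
```

Applied to the three strings above, this gives 72, 1600000, and 74100000.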
https://stackoverflow.com/questions/73999564