There is no really straightforward way to get the HTML source code of a <code>webelement</code>; you have to use JavaScript. I am not too sure about the Python bindings, but you can easily do it like this in Java. I am sure there must be something similar to the <code>JavascriptExecutor</code> class in Python.
<pre><code>WebElement element = driver.findElement(By.id(&quot;foo&quot;));
String contents = (String)((JavascriptExecutor)driver).executeScript(&quot;return arguments[0].innerHTML;&quot;, element);
</code></pre>

Sure, we can get the full HTML source code with the script below in Selenium Python:

<pre><code>elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("outerHTML")
</code></pre>

If you want to save it to a file:

<pre><code>with open('c:/html_source_code.html', 'wb') as f:
    f.write(source_code.encode('utf-8'))
</code></pre>
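A minimal sketch of the save step as a reusable function, assuming Python 3 and that <code>source_code</code> is the string returned by <code>get_attribute("outerHTML")</code>. The function name <code>save_html</code> and its parameters are hypothetical, chosen only for illustration.

```python
import io


def save_html(source_code, path):
    """Write an HTML string to `path` as UTF-8; return the number of characters written."""
    # io.open takes an explicit encoding, so no manual .encode() call is needed
    # and non-ASCII characters in the page are preserved.
    with io.open(path, "w", encoding="utf-8") as f:
        return f.write(source_code)
```

It could then be called as <code>save_html(source_code, 'c:/html_source_code.html')</code>.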

I suggest saving it to a file because the source code can be very long.

In Ruby, using selenium-webdriver (2.32.1), there is a <code>page_source</code> method that returns the entire page source.

This may look outdated, but I will leave it here anyway. The correct way to do it in your case is:
<pre><code>elem = wd.find_element_by_css_selector('#my-id')
html = wd.execute_script(&quot;return arguments[0].innerHTML;&quot;, elem)
</code></pre>
or
<pre><code>html = elem.get_attribute('innerHTML')
</code></pre>
Both work for me (selenium-server-standalone-2.35.0).
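The two approaches above can be combined into one small helper. This is only a sketch: the name <code>element_html</code> and the <code>outer</code> flag are hypothetical, and it assumes <code>driver</code> and <code>element</code> are a Selenium WebDriver and WebElement.

```python
def element_html(driver, element, outer=False):
    """Return the inner (default) or outer HTML of a WebElement as a string."""
    prop = "outerHTML" if outer else "innerHTML"
    # First try the plain attribute lookup.
    html = element.get_attribute(prop)
    if html is None:
        # Fall back to asking the browser directly through JavaScript.
        html = driver.execute_script("return arguments[0]." + prop + ";", element)
    return html
```

With the objects from the question this would be called as <code>element_html(wd, elem, outer=True)</code> to include the element's own tag. Note that in newer Selenium 4 releases the <code>find_element_by_*</code> helpers are no longer available; there the element would be located with <code>wd.find_element(By.CSS_SELECTOR, '#my-id')</code>, with <code>By</code> imported from <code>selenium.webdriver.common.by</code>.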

In Java with Selenium 2.53.0, this returns the source of the whole page:

<pre><code>driver.getPageSource();
</code></pre>

I hope this helps:
<a href="http://selenium.googlecode.com/svn/trunk/docs/api/java/org/openqa/selenium/WebElement.html" rel="nofollow">http://selenium.googlecode.com/svn/trunk/docs/api/java/org/openqa/selenium/WebElement.html</a>

The Java method is described there:

<pre><code>java.lang.String getText()
</code></pre>

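Unfortunately, there is no method with that name in the Python bindings (the visible text is exposed as the <code>text</code> property instead). So you can translate the method names from Java to Python and try another approach using the available methods, without fetching the whole page source.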

For example:

<pre><code>my_id = elem[0].get_attribute('my-id')
</code></pre>

<code>innerHTML</code> returns the markup inside the selected element, while <code>outerHTML</code> returns that inner markup together with the selected element itself.
Example:
Suppose your element is as follows:
<pre><code>&lt;tr id=&quot;myRow&quot;&gt;&lt;td&gt;A&lt;/td&gt;&lt;td&gt;B&lt;/td&gt;&lt;/tr&gt;
</code></pre>
<h3>innerHTML element output</h3>
<pre><code>&lt;td&gt;A&lt;/td&gt;&lt;td&gt;B&lt;/td&gt;
</code></pre>
<h3>outerHTML element output</h3>
<pre><code>&lt;tr id=&quot;myRow&quot;&gt;&lt;td&gt;A&lt;/td&gt;&lt;td&gt;B&lt;/td&gt;&lt;/tr&gt;
</code></pre>
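The same distinction can be reproduced without a browser. The sketch below is only an analogy, not Selenium code: it uses Python's standard <code>xml.etree.ElementTree</code> on the well-formed fragment above, and the variable names <code>row</code>, <code>outer_html</code> and <code>inner_html</code> are illustrative.

```python
import xml.etree.ElementTree as ET

row = ET.fromstring('<tr id="myRow"><td>A</td><td>B</td></tr>')

# "outerHTML": the element itself plus everything inside it.
outer_html = ET.tostring(row, encoding="unicode")

# "innerHTML": only the content between the element's opening and closing tags.
inner_html = (row.text or "") + "".join(
    ET.tostring(child, encoding="unicode") for child in row
)

print(outer_html)
print(inner_html)
```

A real browser serializes HTML rather than XML, so results on arbitrary pages may differ in details such as attribute quoting or void elements.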
Live Example:
<a href="http://www.java2s.com/Tutorials/JavascriptDemo/f/find_out_the_difference_between_innerhtml_and_outerhtml_in_javascript_example.htm" rel="nofollow noreferrer">http://www.java2s.com/Tutorials/JavascriptDemo/f/find_out_the_difference_between_innerhtml_and_outerhtml_in_javascript_example.htm</a>
Below is the syntax required for each language binding. Change <code>innerHTML</code> to <code>outerHTML</code> as required.
Python:
<pre><code>element.get_attribute('innerHTML')
</code></pre>
Java:
<pre><code>elem.getAttribute(&quot;innerHTML&quot;);
</code></pre>
If you want the HTML of the whole page, use the code below (Java):
<pre><code>driver.getPageSource();
</code></pre>

This works seamlessly for me.

<pre><code>element.get_attribute('innerHTML')
</code></pre>

The method I prefer for getting the rendered HTML is the following:
<pre><code>driver.get(&quot;http://www.google.com&quot;)
body_html = driver.find_element_by_xpath(&quot;/html/body&quot;)
print(body_html.text)
</code></pre>
However, the above method removes all the tags (including nested ones) and returns only the text content. If you are interested in getting the HTML markup as well, use the method below.
<pre><code>print(body_html.get_attribute(&quot;innerHTML&quot;))
</code></pre>

If you are interested in a solution for <a href="https://en.wikipedia.org/wiki/Selenium_(software)#Selenium_Remote_Control" rel="nofollow noreferrer">Selenium Remote Control</a> in Python, here is how to get innerHTML:
<pre><code>innerHTML = sel.get_eval(&quot;window.document.getElementById('prodid').innerHTML&quot;)
</code></pre>

And in a <a href="https://en.wikipedia.org/wiki/PHPUnit" rel="nofollow noreferrer">PHPUnit</a> Selenium test it looks like this:
<pre><code>$text = $this-&gt;byCssSelector('.some-class-name')-&gt;attribute('innerHTML');
</code></pre>

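Use <code>execute_script</code> to get the HTML.
bs4 (BeautifulSoup) can then be used to access the HTML tags quickly.
<pre class="lang-py prettyprint-override"><code>from bs4 import BeautifulSoup

# driver is an existing WebDriver instance; ask the browser for the whole document.
html = driver.execute_script(&quot;return document.documentElement.outerHTML&quot;)
bs4_onepage_object = BeautifulSoup(html, &quot;html.parser&quot;)
# &quot;atag&quot; and &quot;attribute&quot; are placeholders for a tag name and a CSS class.
bs4_div_object = bs4_onepage_object.find_all(&quot;atag&quot;, class_=&quot;attribute&quot;)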
</code></pre>
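If bs4 is not installed, a rough equivalent of the <code>find_all</code> step can be sketched with the standard library's <code>html.parser</code> instead. This swaps BeautifulSoup for a different parser, assumes Python 3, and the class name <code>TagCollector</code> is hypothetical. It only records the attributes of matching start tags, not their contents.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect the attributes of every start tag with a given name and CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # The class attribute may hold several space-separated class names.
        classes = (attributes.get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.matches.append(attributes)
```

After <code>collector = TagCollector("atag", "attribute")</code> and <code>collector.feed(html)</code>, the matching tags' attributes are available in <code>collector.matches</code>.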

In <a href="https://github.com/php-webdriver/php-webdriver" rel="nofollow noreferrer">PHP Selenium WebDriver</a> you can get the page source like this:
<pre class="lang-php prettyprint-override"><code>$html = $driver-&gt;getPageSource();
</code></pre>
Or get the HTML of an element like this:
<pre class="lang-php prettyprint-override"><code>// Use 'innerHTML' instead if you only need the HTML of the element's content
$html = $element-&gt;getDomProperty('outerHTML');
</code></pre>

<pre><code>WebElement element = driver.findElement(By.id(&quot;foo&quot;));
String contents = (String)((JavascriptExecutor)driver).executeScript(&quot;return arguments[0].innerHTML;&quot;, element); 
</code></pre>
This code works, and the returned source also includes any JavaScript contained in the element!

I'm using the Python bindings to run Selenium WebDriver:
<pre><code>from selenium import webdriver
wd = webdriver.Firefox()
</code></pre>
I know I can grab a webelement like so:
<pre><code>elem = wd.find_element_by_css_selector('#my-id')
</code></pre>
And I know I can get the full page source with...
<pre><code>wd.page_source
</code></pre>
But is there a way to get the &quot;element source&quot;?
<pre><code>elem.source # &lt;-- returns the HTML as a string
</code></pre>
The Selenium WebDriver documentation for Python is basically non-existent, and I don't see anything in the code that seems to enable that functionality.
What is the best way to access the HTML of an element (and its children)?

Get HTML source of WebElement in Selenium WebDriver using Python
