Why does html.parse(url) fail, while html.fromstring works when using requests, and html.parse(url2) works? lxml 3.4.2
Python 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import requests
>>>
lxml's builder allows for easy generation of HTML and XML, as shown here:
>>> from lxml.builder import E
>>> import lxml.etree
>>> lxml.etree.tostring(E.html('hello'))
b'<html>hello</html>'
However, if I include text that already contains HTML, it escapes the angle brackets, as it should:
>>> lxml.etree.tostring(E
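A minimal sketch of one way to embed existing markup without escaping, assuming the markup is a well-formed fragment: parse it into an element first and pass that element to the builder as a child. The names `fragment`, `page`, and `serialized` are illustrative and not from the question.

```python
from lxml import etree
from lxml.builder import E

# Parse the existing markup into an element instead of passing it as text.
# (Assumption: the fragment is well-formed XML with a single root element.)
fragment = etree.fromstring("<b>bold</b>")

# Strings become text content; elements are appended as children.
page = E.html(E.body("hello ", fragment))

serialized = etree.tostring(page)
print(serialized)
```

Passing the string itself would still be escaped, which is the behaviour described above; only parsed elements are inserted as structure.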
I am new to lxml. I want to download a web page and extract the data I am interested in. My code is:
import urllib2
from lxml import etree
url = "http://www.example.com/"
html = urllib2.urlopen(url)
root = etree.parse(html) # the problem is here
Can someone explain to me why this is wrong?
The error is:
Traceback (most recent call last):
File "yatego.py", line 10, in <module
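The traceback above is cut off, so the actual cause is not visible. One common reason for `etree.parse` to fail on a downloaded page is that the default parser expects well-formed XML, while most web pages are not. The sketch below assumes that is the situation here; it uses an in-memory `io.BytesIO` object as a stand-in for the network response, and `page` is made-up sample markup.

```python
import io
from lxml import etree

# Stand-in for the bytes a network response would return (assumption:
# the real page is ordinary HTML that is not well-formed XML).
page = b"<html><body><p>Hello<br>world</p></body></html>"

# The default XML parser rejects the unclosed <br>.
try:
    etree.parse(io.BytesIO(page))
    xml_ok = True
except etree.XMLSyntaxError:
    xml_ok = False

# An explicit HTML parser tolerates it.
tree = etree.parse(io.BytesIO(page), etree.HTMLParser())
paragraph_text = tree.xpath("string(//p)")
print(xml_ok, paragraph_text)
```

With a real response, the file-like object returned by `urlopen` could be passed in place of the `BytesIO` object, again assuming the failure is a syntax error rather than a network or encoding problem.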
Consider this Python script:
from lxml import etree
html = '''
<html xmlns="http://www.w3.org/1999/xhtml">
<head></head>
<body>
<p>This is some text followed with 2 citations.<span class="footnote">1</span>
<span сlass="fo
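When using Inkscape, I keep running into errors that seem to mean its Python 2 vs. 3 expectations are not being met, even though I have both installed. For example, when I try to generate a new document from a template, I get: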
Traceback (most recent call last):
File "empty_generic.py", line 82, in <module>
c.affect()
File "/usr/share/inkscape/extensions/inkex.py", line 285, in affect
self.output()
Fi
Below is the Python code I wrote using lxml:
import urllib.request
from lxml import etree
#import lxml.html as html
from copy import deepcopy
from lxml import etree
from lxml import html
some_xml_data = "<span>text1<div>ddd</div>text2<div>ddd</div>text3</span>"
root = etree.fro
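The test below reads a file and uses lxml.html to generate the leaf nodes of the page's DOM/graph.
However, I am also trying to figure out how to take the input from a "string". Using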
lxml.html.fromstring(s)
does not work, because it produces an "Element" rather than an "ElementTree".
So I am trying to figure out how to convert an Element into an ElementTree.
Thoughts?
Test code::
import lxml.html
from lxml import etree # trying this to see if needed
# to convert from eleme
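A minimal sketch of getting from an Element to an ElementTree, using two calls lxml provides: `getroottree()` on the element, or wrapping the element with `etree.ElementTree(...)`. The sample string `s` and the leaf-path listing are illustrative assumptions, since the original test code is truncated.

```python
import lxml.html
from lxml import etree

s = "<html><body><p>one</p><p>two</p></body></html>"

element = lxml.html.fromstring(s)      # an element, not a tree
tree = element.getroottree()           # the tree that owns the element
wrapped = etree.ElementTree(element)   # alternative: wrap it explicitly

# With a tree in hand, tree-level methods such as getpath() are available.
# Here leaf nodes are taken to be elements with no child elements.
leaf_paths = [tree.getpath(el) for el in element.iter() if len(el) == 0]
print(leaf_paths)
```

`getroottree()` is usually the more convenient of the two when the element came out of a parser, because it returns the document the element already belongs to.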
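I am trying to parse an HTML table into Python (2.7). When I use one of the first two approaches with a string (as shown in the example), it works perfectly. However, when I try to use etree.XML on a page fetched with urllib, I get an error. I have checked every solution I could find, and the variable I am passing is also a str. Regarding the following code: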
from lxml import etree
import urllib
yearurl="http://www.boxofficemojo.com/yearly/chart/?yr=2014&p=.htm"
s=urllib.urlopen(yearurl).read()
print type (s)
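A small problem, but I am really stuck here and do not understand what is going on. I just want to parse plain XHTML from a web page, nothing special…
Here is the error: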
File "class/page.py", line 85, in xslParse
doc = lxml.etree.fromstring(self.content)
File "lxml.etree.pyx", line 2753, in lxml.etree.fromstring (src/lxml/lxml.etree.c:54647)
File "parser.pxi", line 1578
I have a problem wrapping an image in a div.
from lxml.html import fromstring
from lxml import etree
tree = fromstring('<img src="/img.png"/> some text')
div = etree.Element('div')
div.insert(0, tree.find('img'))
tree.insert(0, div)
print etree.tostring(tree)
<span><div><im
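The output above is cut off, so the exact complaint is not visible. The sketch below assumes the issue is that the text following the image moves into the div along with it: in lxml, trailing text is stored in the element's `tail` and travels with the element. Detaching the tail before the move and reattaching it to the wrapper keeps the text outside the div. The variable names `img`, `trailing_text`, and `result` are illustrative.

```python
from lxml import etree
from lxml.html import fromstring

tree = fromstring('<img src="/img.png"/> some text')
img = tree.find('img')

# The text after the image is stored as img.tail; set it aside first.
trailing_text = img.tail
img.tail = None

div = etree.Element('div')
div.append(img)            # moves the image out of the tree and into the div
div.tail = trailing_text   # the text now follows the wrapper instead
tree.insert(0, div)

result = etree.tostring(tree)
print(result)
```

If the intent is the opposite, with the text inside the div as well, leaving `img.tail` untouched gives that result.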
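The HTML parser in lxml.etree seems to have a maximum depth limit. If the nesting depth exceeds 254, the parsed text is no longer traversed. Here is a Python snippet that demonstrates this: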
import lxml.etree as etree
# Set up HTML tags
x = "<span>"
x_ = "</span>"
# Set recursion depth to 255
depth = 255
# Construct and parse using lxml.etree.HTML
# This gives an empty list []
print(e
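A hedged sketch for experimenting with the limit: `etree.HTMLParser` accepts a `huge_tree` option, which lxml documents as disabling security restrictions and supporting very deep trees. Whether it changes the result at depth 255 depends on the libxml2 version in use, so the code only collects the counts and makes no claim about them. The helper `count_nested_spans` is a hypothetical name introduced for this example.

```python
from lxml import etree

def count_nested_spans(depth, huge_tree=False):
    """Parse `depth` nested <span> elements and count how many are found."""
    markup = "<span>" * depth + "</span>" * depth
    parser = etree.HTMLParser(huge_tree=huge_tree)
    root = etree.HTML(markup, parser)
    if root is None:  # nothing could be parsed at all
        return 0
    return len(root.xpath("//span"))

shallow = count_nested_spans(10)
deep_default = count_nested_spans(255)
deep_huge = count_nested_spans(255, huge_tree=True)
print(shallow, deep_default, deep_huge)
```

Note that `huge_tree=True` relaxes protections against maliciously large input, so it is best reserved for trusted documents.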
I want to use XPath in Python. I tried
import xml.etree.ElementTree as ET
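Since this library is of limited use, and after searching Google for a long time, I had to switch to lxml. I ran into several problems during installation and finally got lxml installed, but when I use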
from lxml import etree
it returns an error, as shown below. Can you tell me how to solve this problem?
Traceback (most recent call last):
File "<pyshell#0>", line 1, in <module>
from lxml import etr
When running a Python script, I received the following error:
Traceback (most recent call last):
File "/var/scripts/SchoolClosureManager/SchoolClosureManager.py", line 210, in <module>
runnable.run()
File "/var/scripts/SchoolClosureManager/SchoolClosureManager.py", line 18, in run
reporter = SchoolClosur
I am using lxml to extract data from a web page, but I cannot convert the resulting ElementUnicode objects to strings. Here is my code:
from lxml import html
from lxml import etree
from lxml.etree import tostring
url = 'https://www.imdb.com/title/tt5848272/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2413b25e-e3f6-4229-9efd-599bb9ab1f97&pf_rd_r=9S5A89ZHEXE4K8SZBC40&pf_rd_s=ri
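A minimal sketch of the conversion, assuming the objects in question come from an XPath `text()` query. In lxml these results are already a subclass of the built-in string type, so calling `str()` on each one is enough to get plain strings. The sample markup below is made up and stands in for the downloaded page.

```python
from lxml import html

# Made-up markup standing in for the downloaded page.
doc = html.fromstring("<html><body><h1> Title </h1></body></html>")

# text() results are lxml "smart strings": str subclasses that also
# remember which element they came from.
results = doc.xpath("//h1/text()")

# str() yields ordinary strings; strip() tidies the whitespace.
titles = [str(r).strip() for r in results]
print(titles)
```

Passing such a result to `etree.tostring` fails because that function expects an element, not a string, which may be the source of the confusion.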
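I have a file. Please download it and save it as blog.xml. It is the list of my posts on Google Blogger. I wrote some code to parse it, and there is something going on with lxml.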
code1:
from stripogram import html2text
import feedparser
d = feedparser.parse('blog.xml')
for num,entry in enumerate(d.entries):
string=entry.content[0]['value'].encode("utf-8")
print html2text(s