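Background:
I'm learning web scraping and decided to use Python and Beautiful Soup. The program asks the user for a link and then narrows down their HTML search on that page.
Problem:
When I ask the user to define their own extension to the soup page (for example .div.div.a), I append it to a string and try to execute that string inside a print call, but it always returns None. How can I run the extension built from the collected user input and print the result? In this example I'm scraping Newegg's search results for graphics cards.
Example link: https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20cards
Keep in mind that in the code below I have already used findAll for div class="item-info", so the extension is searched for within that block.
I have already tried calling exec() on the string, but it doesn't seem to work: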
isdone = ""
while isdone != "done":
    try:
        route = "container"
        userinput = input("what extensions would you like to search for?\n seperate each denotion with a space \n ex: div div img[\"title\"]\n: ")
        inputRoute = userinput.split(' ')
        for i in range(len(inputRoute)):
            route += "." + inputRoute[i]
        print("---\n"+route+"\n---")
        print("Current Route ^\n---")
        print("output:\n", exec(route),"\n---")  # actual results if the user had input a
        print(container.a)  # what I actually want to output (if the user only input a)
        # add the ability to add extensions, ex: container.div.a.img["foo"] -ignore this stackoverflow
        isdone = input("are you happy with these extensions? \n type 'done' when happy\n or enter to change extension\n: ")
    except Exception as e:
        print(e)
        input("Make sure their is no leftover spaces\npress enter to continue")
'#' marks my comments throughout the output. This is the console output:
'what extensions would you like to search for?
seperate each denotion with a space
ex: div div img["title"]
: a # <--what I put in the input
---
container.a #what
---
Current Route ^
---
output:
None # <-- what actually outputs when i use exec()
---
<a class="item-brand" href="https://www.newegg.com/EVGA/BrandStore/ID-1402">
<img alt="EVGA" class="lazy-img" data-effect="fadeIn" data-src="//c1.neweggimages.com/Brandimage_70x28//Brand1402.gif" src="//c1.neweggimages.com/WebResource/Themes/2005/Nest/blank.gif" title="EVGA">
</img></a>
are you happy with these extensions?
type 'done' when happy
or enter to change extension
:'
Posted on 2019-05-28 07:40:04
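If container is your BeautifulSoup object, then eval('container.a') will return the first <a> tag, which is the same value that print(container.a) shows in your output. exec() runs the string as statements and always returns None, which is why your print call shows None. Using eval or exec is probably not a good idea in your case, though; see Why should exec() and eval() be avoided?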
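I would suggest using find_all and its attrs argument instead, although parsing the input will probably be a lot harder than you currently expect.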
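As one possible way to avoid eval and exec altogether, the route can be parsed by hand and walked one step at a time. The sketch below is not the find_all/attrs approach itself; it uses getattr and item lookup instead, and the names resolve_route and TOKEN as well as the stand-in objects are hypothetical. It assumes each token is a tag name optionally followed by a quoted attribute, as in the prompt's example div div img["title"].

```python
import re
from types import SimpleNamespace

# A token is a tag name, optionally followed by ["attr"] or ['attr'].
# Tag names are limited to letters and digits so that input such as
# __class__ cannot reach Python internals through getattr.
TOKEN = re.compile(r'([A-Za-z][A-Za-z0-9]*)(?:\[["\']([\w-]+)["\']\])?')

def resolve_route(root, user_input):
    """Follow space-separated tokens from root without eval or exec."""
    current = root
    for token in user_input.split():  # split() also drops leftover spaces
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"cannot parse token: {token!r}")
        name, attribute = match.groups()
        current = getattr(current, name, None)  # same lookup as current.<name>
        if current is None:
            return None  # no such child, stop early
        if attribute is not None:
            current = current[attribute]  # same lookup as current["attr"]
    return current

# Hypothetical stand-in for a parsed element, used only to make the sketch runnable.
container = SimpleNamespace(
    div=SimpleNamespace(div=SimpleNamespace(img={"title": "EVGA"}))
)

print(resolve_route(container, 'div div img["title"]'))  # EVGA
print(resolve_route(container, "span"))                  # None
```

With a real BeautifulSoup Tag as root, getattr(tag, "div") behaves like tag.div, which returns the first matching child, so the same function should apply unchanged. Returning every match rather than the first would require find_all at each step.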
https://stackoverflow.com/questions/56333131