I have a problem with None elements. I added an if to test for this, but it still doesn't work:
soup = BeautifulSoup(htmlpage, "lxml")
element = soup.find(None, "div", class_='12345').find('a')
if element is not None:
    print "Your element is: " + element.text
else:
    print "No element"
Error:
page =
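One likely reason the `if` never helps in the snippet above: the chained `.find('a')` runs before the check, so if the outer `find` returns `None` the code fails before reaching it. A minimal sketch, assuming the goal is the first `<a>` inside a `div` with class `12345`, uses `select_one`, which returns `None` when any part of the selector fails to match. The sample HTML, the helper name `first_link_text`, and the `html.parser` backend are assumptions made for illustration.

```python
from bs4 import BeautifulSoup


def first_link_text(html):
    """Return the text of the first <a> inside a div with class 12345, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Class names starting with a digit are awkward in ".class" syntax,
    # so an attribute selector is used instead.
    element = soup.select_one('div[class~="12345"] a')
    return element.text if element is not None else None


# Hypothetical inputs: one page with the element, one without.
print(first_link_text('<div class="12345"><a href="#">hello</a></div>'))
print(first_link_text('<p>no matching div here</p>'))
```

Because the whole lookup is a single call, there is only one place where `None` can appear, and the `is not None` test covers it.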
I am trying to extract the style from their product names (for example, "Tote" from "Platinum Faux Patent Leather Tote"). Here is my code:
from urllib.request import urlopen
from bs4 import BeautifulSoup
saksurl="http://www.saksfifthavenue.com/Handbags/shop/_/N-52jzot/Ne-6lvnb5?FOLDER%3C%3Efolder_id=2534374306622829"
html = urlopen(saksurl)
bsObj = B
from bs4 import BeautifulSoup
import requests
import time
urls = ['http://www.soku.com/search_playlist/q_python_orderby_1_limitdate_0?site=14&page={}&spm=a2h0k.8191403.0.00'.format(str(i)) for i in range(1,30,1)]
def UUrl(urls):
def Url(url):
single_urls = []
t
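I want to scrape all the text in the paragraphs. Below are the link and the code I have so far. Note: for now I only want to extract the first 10 pages.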
import requests
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
import re
import time
#create a list
Title = []
Paragraph = []
#scrape the first 10 page data(the no of pages can be changed)
pages = np.arange(1, 11)  # the stop value is exclusive, so this yields pages 1..10
for page in pages:
I am trying to scrape the website here, using the following code:
from bs4 import BeautifulSoup
import urllib.request
html = urllib.request.urlopen("ftp://ftp.sec.gov/edgar/daily-index/")
soup = BeautifulSoup(line, "lxml")
soup.a # or soup.find_all('a') neither of them works
#return None.
Please help, I'm really frustrated. I suspect the tags are causing the problem. The site's hyper
For example:
<p>I am in a paragraph element!</p>
I am plaintext!
How can I get the text "I am plaintext!" in BeautifulSoup 4 by calling find("p")?
I have already tried:
from bs4 import BeautifulSoup
soup = BeautifulSoup("...", "html.parser")
soup.find("p").findNextSibling()
# Returns None
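A possible explanation for the `None` above: `findNextSibling()` with no arguments looks for the next sibling tag, while the plain text after `</p>` is a text node rather than a tag. A minimal sketch, assuming the document is exactly the two-line example shown earlier, reads the `.next_sibling` attribute instead. The helper name `text_after` is hypothetical.

```python
from bs4 import BeautifulSoup

html = "<p>I am in a paragraph element!</p>\nI am plaintext!"
soup = BeautifulSoup(html, "html.parser")


def text_after(tag):
    """Return the stripped text node directly following `tag`, or None."""
    sibling = tag.next_sibling
    # .next_sibling may be None (nothing follows) or another Tag;
    # only a plain text node is returned here.
    if sibling is None or getattr(sibling, "name", None) is not None:
        return None
    return str(sibling).strip()


print(text_after(soup.find("p")))
```

If a comment or another tag can sit between the paragraph and the text, looping over `tag.next_siblings` and picking the first non-empty string would be the more robust variant.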
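I am looking for a more elegant way to assign variable values when a function may return None and the function call is followed by chained methods.
In the example below, I use BeautifulSoup to parse HTML; if the element I am looking for is not found, the initial function call returns None. The chained access then breaks the code, because .string is not an attribute of a None object.
This all makes sense, but I wonder whether there is a cleaner way to write these variable assignments that does not break on None values.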
# I want to do something like this but it throws error if soup.find returns
# none because .string is not a metho
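One way to keep such assignments short is to move the `None` check into a small helper. The sketch below is only an illustration: `find_string` is a hypothetical name, not part of BeautifulSoup, and the sample HTML is made up.

```python
from bs4 import BeautifulSoup


def find_string(soup, *args, default=None, **kwargs):
    """Call soup.find(...) and return the tag's .string, or `default` if no tag matches."""
    tag = soup.find(*args, **kwargs)
    return tag.string if tag is not None else default


soup = BeautifulSoup("<h1>Title</h1><p class='intro'>Hello</p>", "html.parser")

title = find_string(soup, "h1")
intro = find_string(soup, "p", class_="intro")
missing = find_string(soup, "h2", default="")  # no <h2>, so the default is used

print(title, intro, repr(missing))
```

Each assignment stays on one line, and a missing element yields the chosen default instead of raising `AttributeError`.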
I'm trying to use BeautifulSoup, but I get this error:
AttributeError: 'NoneType' object has no attribute 'string'
The problem is on this line:
_a = _dev.find('a')
Here is my code:
for _dev in devs:
    _d = _dev.find('div')
    authors.append(_d.text.strip())
    _a = _dev.find('a')
    if _a.string is not None:
        names.app
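The error message suggests that `_dev.find('a')` returned `None` for at least one entry, so `_a.string` fails before the `is None` test can run. A minimal sketch of a guarded loop follows, assuming each entry holds a `div` with the author and optionally an `a` with the name. The sample HTML and the `dev` class are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second entry has no <a> tag.
html = """
<div class="dev"><div>Alice</div><a href="/a">alpha</a></div>
<div class="dev"><div>Bob</div></div>
"""
soup = BeautifulSoup(html, "html.parser")
devs = soup.find_all("div", class_="dev")

authors, names = [], []
for _dev in devs:
    _d = _dev.find("div")
    if _d is not None:
        authors.append(_d.text.strip())
    _a = _dev.find("a")
    # Check the tag itself first, then its .string.
    if _a is not None and _a.string is not None:
        names.append(_a.string.strip())

print(authors, names)
```

Testing the tag before its `.string` means entries without a link are skipped rather than crashing the loop.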
I am trying to scrape contact information from a company listings website using Beautiful Soup.
The contact information is stored in span tags with id='valuephone_' or 'valuewebsite_'.
from bs4 import BeautifulSoup
import requests
url = "https://www.timesbusinessdirectory.com/company-listings"
html=requests.get(url)
soup=BeautifulSoup(html.text,'lxml')
for i in soup.find_al
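If the ids mentioned above are really prefixes followed by a record number (an assumption, since the page markup is not shown here), one option is to pass a compiled regular expression as the `id` filter of `find_all`. The sample HTML below is invented for illustration.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real listing page.
html = """
<span id="valuephone_1">+65 1234 5678</span>
<span id="valuewebsite_1">www.example.com</span>
<span id="other_1">ignored</span>
"""
soup = BeautifulSoup(html, "html.parser")

# Match any span whose id starts with "valuephone_" or "valuewebsite_".
pattern = re.compile(r"^value(phone|website)_")
contacts = [span.get_text(strip=True) for span in soup.find_all("span", id=pattern)]

print(contacts)
```

If the ids turn out to be exact strings rather than prefixes, a plain list such as `id=["valuephone_", "valuewebsite_"]` would serve the same purpose.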
I am using Python 3 and version 4.9.3 of the BeautifulSoup module. I am trying to use this package to practice parsing some simple HTML.
The string I have is as follows:
text = '''<li><p>Some text</p>is put here</li><li><p>And other text is put here</p></li>'''
I use BeautifulSoup as follows:
x = BeautifulSoup(text, "html.parser
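I have been writing a script that uses Python with the BeautifulSoup and requests modules to fetch some data from a URL. I split the code into several functions to make it more modular. Although I expected the code to work well, it sometimes fails with an AttributeError. More precisely, this is the error I get:
Traceback (most recent call last):
  File "stats_tracker.py", line 140, in <module>
    print_interval(60)
  File "stats_tracker.py", line 108, i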
from bs4 import BeautifulSoup
from urllib.request import urlopen as uReq
import requests
url = 'https://en.wikisource.org/wiki/Main_Page'
r = requests.get(url)
Soup = BeautifulSoup(r.text, "html5lib")
List = Soup.find("div",class_="enws-mainpage-widget-content", id