I have some output produced with BeautifulSoup, but some of the h2 headings in it are useless to me.
My code:
import requests
from bs4 import BeautifulSoup
url = 'https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')
attraction_place=soup.find_all('h2', class_="sitename")
for attraction in attraction_place:
    print(attraction.text)
type(attraction)
Output:
1 Vigeland Sculpture Park
2 Akershus Fortress
3 Viking Ship Museum
4 The National Museum
5 Munch Museum
6 Royal Palace
7 The Museum of Cultural History
8 Fram Museum
9 Holmenkollen Ski Jump and Museum
10 Oslo Cathedral
11 City Hall (Rådhuset)
12 Aker Brygge
13 Natural History Museum & Botanical Gardens
14 Oslo Opera House and Annual Music Festivals
Where to Stay in Oslo for Sightseeing
Tips and Tours: How to Make the Most of Your Visit to Oslo
More Related Articles on PlanetWare.com
I would like to get a list like this:
attraction=[Vigeland Sculpture Park, Akershus Fortress, ......]
Thank you very much in advance.
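Posted on 2019-06-06 05:54:04
A nice, simple way is to grab the alt attribute of the photos. This gives clean text output, and there are only 14 of them, so no slicing or indexing is needed.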
from bs4 import BeautifulSoup as bs
import requests
r = requests.get('https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm')
soup = bs(r.content, 'lxml')
attractions = [item['alt'] for item in soup.select('.photo [alt]')]
print(attractions)
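To see what the `.photo [alt]` selector matches without hitting the network, here is a minimal sketch run against a small inline HTML fragment. The markup below is a hypothetical stand-in, not the real page structure (the actual PlanetWare HTML may differ), and it uses the built-in 'html.parser' so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may be structured differently.
html = """
<div class="site">
  <h2 class="sitename">1 Vigeland Sculpture Park</h2>
  <div class="photo"><img src="a.jpg" alt="Vigeland Sculpture Park"></div>
</div>
<div class="site">
  <h2 class="sitename">2 Akershus Fortress</h2>
  <div class="photo"><img src="b.jpg" alt="Akershus Fortress"></div>
</div>
<h2 class="sitename">Where to Stay in Oslo for Sightseeing</h2>
"""

soup = BeautifulSoup(html, "html.parser")

# '.photo [alt]' selects any descendant of a class="photo" element that has an alt attribute.
attractions = [item["alt"] for item in soup.select(".photo [alt]")]
print(attractions)
```

Because the trailing h2 has no photo next to it in this fragment, it never enters the list, which is why this approach needs no slicing.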
Posted on 2019-06-06 05:46:32
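new = []
count = 1
for attraction in attraction_place:
    # Keep only the first 14 headings. Use `if`, not `while`:
    # a `while` here would append the first heading 14 times.
    if count < 15:
        text = attraction.text
        new.append(text)
        count += 1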
Posted on 2019-06-06 06:00:14
You can use a slice.
for attraction in attraction_place[:14]:
    print(attraction.text)
type(attraction)
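The slice still prints each heading with its rank number in front, whereas the question asks for a plain list of names. One possible way to get there, sketched below, is to keep only the headings that start with a number and strip that number. The `headings` list is a hard-coded sample of the strings that `attraction.text` yields in the question's output, and `numbered_names` is a hypothetical helper name, not part of BeautifulSoup.

```python
import re

# Sample of the heading texts shown in the question's output.
headings = [
    "1 Vigeland Sculpture Park",
    "2 Akershus Fortress",
    "3 Viking Ship Museum",
    "Where to Stay in Oslo for Sightseeing",
]


def numbered_names(texts):
    """Keep headings that begin with a rank number and return them without it."""
    names = []
    for text in texts:
        match = re.match(r"^\d+\s+(.*)$", text.strip())
        if match:  # headings without a leading number are skipped
            names.append(match.group(1))
    return names


attraction = numbered_names(headings)
print(attraction)
```

With the real data this would be called as `numbered_names(a.text for a in attraction_place)`. Filtering on the leading number avoids hard-coding 14, at the cost of assuming every wanted heading is numbered.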
https://stackoverflow.com/questions/56468212