BeautifulSoup: trying to extract data from multiple rows

Stack Overflow user
Asked 2018-07-07 05:22:03
2 answers, 110 views, 0 followers, 1 vote

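I'm new to programming. I'm trying to write a program that scrapes the moonset time for my local area (Tampa) and displays it when I enter a date.

Here is my code: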

from bs4 import BeautifulSoup
import urllib.request
def GetMoonSet():
  # setup the source
  with urllib.request.urlopen("https://www.timeanddate.com/moon/usa/tampa") as url:
    req = url.read()

  soup = BeautifulSoup(req, "html.parser")
  the_rows = soup('table', {'id': "tb-7dmn"})[0].tbody('tr')

  day1 = the_rows[0].findChildren('td')
  day2 = the_rows[1].findChildren('td')
  day3 = the_rows[2].findChildren('td')
  day4 = the_rows[3].findChildren('td')
  day5 = the_rows[4].findChildren('td')
  day6 = the_rows[5].findChildren('td')
  day7 = the_rows[6].findChildren('td')
  day8 = the_rows[7].findChildren('td')
  day9 = the_rows[8].findChildren('td')
  day10 = the_rows[9].findChildren('td')
  day11 = the_rows[10].findChildren('td')
  day12 = the_rows[11].findChildren('td')
  day13 = the_rows[12].findChildren('td')
  day14 = the_rows[13].findChildren('td')
  day15 = the_rows[14].findChildren('td')
  day16 = the_rows[15].findChildren('td')
  day17 = the_rows[16].findChildren('td')
  day18 = the_rows[17].findChildren('td')
  day19 = the_rows[18].findChildren('td')
  day20 = the_rows[19].findChildren('td')
  day21 = the_rows[20].findChildren('td')
  day22 = the_rows[21].findChildren('td')
  day23 = the_rows[22].findChildren('td')
  day24 = the_rows[23].findChildren('td')
  day25 = the_rows[24].findChildren('td')
  day26 = the_rows[25].findChildren('td')
  day27 = the_rows[26].findChildren('td')
  day28 = the_rows[27].findChildren('td')
  day29 = the_rows[28].findChildren('td')
  day30 = the_rows[29].findChildren('td')

  what_date = input("Please enter a date for this month ")

  if what_date == "1":
    print("The moon will set at " + day1[1].text)
  elif what_date == "2":
    print("The moon will set at " + day2[1].text)
  elif what_date == "3":
    print("The moon will set at " + day3[1].text)
  elif what_date == "4":
    print("The moon will set at " + day4[1].text)
  elif what_date == "5":
    print("The moon will set at " + day5[1].text)
  elif what_date == "6":
    print("The moon will set at " + day6[1].text)
  elif what_date == "7":
    print("The moon will set at " + day7[1].text)
  elif what_date == "8":
    print("The moon will set at " + day8[1].text)
  elif what_date == "9":
    print("The moon will set at " + day9[1].text)
  elif what_date == "10":
    print("The moon will set at " + day10[1].text)
  elif what_date == "11":
    print("The moon will set at " + day11[1].text)
  elif what_date == "12":
    print("The moon will set at " + day12[1].text)
  elif what_date == "13":
    print("The moon will set at " + day13[1].text)
  elif what_date == "14":
    print("The moon will set at " + day14[1].text)
  elif what_date == "15":
    print("The moon will set at " + day15[1].text)
  elif what_date == "16":
    print("The moon will set at " + day16[1].text)
  elif what_date == "17":
    print("The moon will set at " + day17[1].text)
  elif what_date == "18":
    print("The moon will set at " + day18[1].text)
  elif what_date == "19":
    print("The moon will set at " + day19[1].text)
  elif what_date == "20":
    print("The moon will set at " + day20[1].text)
  elif what_date == "21":
    print("The moon will set at " + day21[1].text)
  elif what_date == "22":
    print("The moon will set at " + day22[1].text)
  elif what_date == "23":
    print("The moon will set at " + day23[1].text)
  elif what_date == "24":
    print("The moon will set at " + day24[1].text)
  elif what_date == "25":
    print("The moon will set at " + day25[1].text)
  elif what_date == "26":
    print("The moon will set at " + day26[1].text)
  elif what_date == "27":
    print("The moon will set at " + day27[1].text)
  elif what_date == "28":
    print("The moon will set at " + day28[1].text)
  elif what_date == "29":
    print("The moon will set at " + day29[1].text)
  elif what_date == "30":
    print("The moon will set at " + day30[1].text)
  else:
    print("Please enter a different number (e.g. 4, 5, 28, 30)")

GetMoonSet()

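I'm sure it's not the prettiest code, but my real problem is extracting the data. From day 4 through day 17, a moonrise occurs in the first column. When I request the data, that extra information puts me off by one column. I know I could change days 4-17 to use `[2].text`, but next month will be different and it will no longer work.

When I enter 2, it displays: The moon will set at 10:22 am.

When I enter 4, it displays: The moon will set at ↑ (99°).

Am I making this too hard? Is there a way to extract only the moonset times with find_all?

Thanks!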


2 Answers

Stack Overflow user

Accepted answer

Posted 2018-07-07 14:13:10

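As others have mentioned, use a loop to parse the moonset times and store them in a dictionary. When the user later enters a date, look up the moonset time in that dictionary.

The following code does this: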

from bs4 import BeautifulSoup
import urllib.request

# Initialize a dictionary
d = {}

# Function to extract moonset times
def GetMoonSet():
    global d

    print("Extracting moonset times...")
    # setup the source
    with urllib.request.urlopen("https://www.timeanddate.com/moon/usa/tampa") as url:
        req = url.read()

    soup = BeautifulSoup(req, "html.parser")
    the_rows = soup('table', {'id': "tb-7dmn"})[0].tbody('tr')

    for row in the_rows:
        col_header = row.findChildren("th")[0]
        day = col_header.getText().strip(' \t\n\r')
        d[day] = 'NA'
        cols = row.findChildren("td")
        for col in cols:
            title = col.get('title')
            classes = col.get('class')
            # Moonset cells carry a "The Moon sets in ..." title and exactly two CSS classes
            if title is not None and title.startswith("The Moon sets in") and classes is not None and len(classes) == 2:
                d[day] = col.getText()

# Collect moonset times
GetMoonSet()

# Ask for date from user and print the corresponding moonset time
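while True:
    date = input("Please enter a valid date for this month: ").strip()
    # Re-prompt when the input is not a day found in the table (this also covers non-numeric input)
    if date not in d:
        print("No data for that date.")
        continue
    print("Moonset time on {} is {}.".format(date, d[date]))
    break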

This code will output:

Extracting moonset times...
Please enter a valid date for this month: 5
Moonset time on 5 is 1:01 pm.
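
To try the loop-and-dictionary idea without touching the network, here is a minimal sketch that runs it against a small inline HTML sample. The markup is hypothetical: it only imitates the attributes mentioned in this thread (the `tb-7dmn` id and `title` values starting with "The Moon sets"), and the real page may be structured differently. `SAMPLE_HTML` and `parse_moonsets` are names invented for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the table discussed above; the real page may differ.
SAMPLE_HTML = """
<table id="tb-7dmn"><tbody>
<tr><th>2</th>
  <td title="No moonrise on this day">-</td>
  <td title="The Moon sets in the West">10:22 am</td></tr>
<tr><th>5</th>
  <td title="The Moon rises in the East">12:50 am</td>
  <td title="Direction of moonrise">(95 deg)</td>
  <td title="The Moon sets in the West">1:01 pm</td></tr>
<tr><th>20</th>
  <td title="The Moon rises in the East">2:30 pm</td></tr>
</tbody></table>
"""


def parse_moonsets(html):
    """Return {day: moonset text}, using 'NA' for days without a moonset cell."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="tb-7dmn")
    moonsets = {}
    for row in table.find_all("tr"):
        day = row.find("th").get_text(strip=True)
        # Select the cell by its title attribute, not by its position in the row.
        cell = row.find(
            "td",
            title=lambda t: t is not None and t.startswith("The Moon sets"),
        )
        moonsets[day] = cell.get_text(strip=True) if cell else "NA"
    return moonsets


moonsets = parse_moonsets(SAMPLE_HTML)
print(moonsets)
```

Returning the dictionary from the function, instead of filling a global, makes the parsing step easy to check on its own before wiring it to `input()`.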
Votes: 0

Stack Overflow user

Posted 2018-07-07 06:08:05

That table looks like it was built not to be parsed! It looks like the title attribute may be the key you need:

import re  # needed for re.compile; `soup` is the object built in the question's code
for i in soup.table.tbody.find_all(class_="pdr0", title=re.compile("^The Moon sets ")):
  print(i.get_text())

And, to make what you are attempting more compact:

msets = {}
title = re.compile("^The Moon sets ")
for row in soup.table.tbody.find_all('tr'):
  day  = row.get('data-day')  # .get() returns None instead of raising KeyError when the attribute is missing
  mset = row.find(title=title)
  if day and mset:
    msets[day] = mset.get_text()

what_date = input("Please enter a date for this month: ")
if what_date in msets:
  print("the moon will set at " + msets[what_date])
else:
  print("i don't know about that date.")
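
The difference between indexing by position and matching on `title` can be seen in a small offline sketch. The sample rows are hypothetical: they imitate the `data-day`, `pdr0` and `title` attributes used above and reproduce the symptom from the question, where a moonrise earlier in the row pushes the moonset into a later column. `SAMPLE_ROWS`, `by_position` and `by_title` are names made up for this example.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical rows: on day 4 a moonrise comes first, so the moonset moves to a later column.
SAMPLE_ROWS = """
<table><tbody>
<tr data-day="2"><th>2</th>
  <td title="No moonrise on this day">-</td>
  <td class="pdr0" title="The Moon sets in the West">10:22 am</td></tr>
<tr data-day="4"><th>4</th>
  <td class="pdr0" title="The Moon rises in the East">12:15 am</td>
  <td title="Direction of moonrise">(99 deg)</td>
  <td class="pdr0" title="The Moon sets in the West">12:10 pm</td></tr>
<tr><th>Note</th><td>row without a data-day attribute</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(SAMPLE_ROWS, "html.parser")
sets_title = re.compile("^The Moon sets ")

by_position = {}
by_title = {}
for row in soup.table.tbody.find_all("tr"):
    day = row.get("data-day")  # None for rows lacking the attribute
    if day is None:
        continue
    cells = row.find_all("td")
    # Fixed column index, as in the question: breaks when the columns shift.
    by_position[day] = cells[1].get_text()
    # Attribute match, as in this answer: independent of column order.
    match = row.find(class_="pdr0", title=sets_title)
    if match is not None:
        by_title[day] = match.get_text()

print(by_position)
print(by_title)
```

With this sample, the positional lookup returns the direction cell for day 4, while the title match still finds the moonset time.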

A rule of thumb in programming: if you find yourself repeating the same thing over and over, you probably need a loop.
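
As a rough illustration of that rule, the sketch below replaces the `day1` ... `day30` variables and the long `if`/`elif` chain from the question with one dictionary built in a loop. The `rows` data is made up and stands in for the scraped cell texts, and `moonset_by_day` and `describe` are hypothetical names. The sketch deliberately keeps the question's positional index, so it shows only the refactoring, not the fix for the shifting columns.

```python
# Hypothetical cell texts per day, standing in for the scraped table rows.
rows = [
    ["-", "10:22 am"],
    ["11:05 pm", "11:10 am"],
    ["11:40 pm", "11:55 am"],
]

# One loop replaces the day1, day2, ... variables:
# map "1", "2", ... to the second cell of each row.
moonset_by_day = {str(n): cells[1] for n, cells in enumerate(rows, start=1)}


def describe(day):
    """A single dictionary lookup replaces the whole if/elif chain."""
    if day in moonset_by_day:
        return "The moon will set at " + moonset_by_day[day]
    return "Please enter a different number (e.g. 1, 2, 3)"


print(describe("2"))
print(describe("31"))
```

Because the keys are strings, the value returned by `input()` can be used for the lookup directly, without converting it to an integer.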

Votes: 1
Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link: https://stackoverflow.com/questions/51217764
