
Q: Variable does not update when the HTML text changes (web scraping)

Stack Overflow user

Asked 2021-01-16 13:03:53

Answers: 1 | Views: 301 | Followers: 0 | Votes: 0

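I'm new to Python and I can't figure out why this doesn't work, but I've narrowed the problem down to one line of code.

The purpose of this bot is to scrape HTML from a website (using BeautifulSoup) and post to Discord when the text changes. I use FC2 and FR2 (flightcategory2 and flightrestrictions2) as memory variables that the code checks against on every run. If they are the same, the code waits for a set interval and checks again; if they differ, it posts the new values.

However, when I run this code, the variables "flightCategory" and "flightRestrictions" change the first time the code runs, but for some reason they stop changing when the HTML on the website changes. The line in question is inside this if block.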

if 1==1:  # using 1==1 so this loop constantly runs for testing, otherwise I have it set for a time
    flightCategory, flightRestrictions = und.getInfo()

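In debug mode the code runs, but the variables do not update. I'm confused about why they update the first time the code runs but not on subsequent runs. This line is critical to how my code works.

Below is a shortened version of the code to make it easier to read. I appreciate your help.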

import time

import requests
from bs4 import BeautifulSoup

FC2 = 0
FR2 = 0
flightCategory = ""
flightRestrictions = ""

class UND:
    def __init__(self):
        page = requests.get("http://sof.aero.und.edu")
        self.soup = BeautifulSoup(page.content, "html.parser")

    def getFlightCategory(self):    # Takes the appropriate html text and sets it to a variable
        flightCategoryClass = self.soup.find(class_="auto-style1b")
        return flightCategoryClass.get_text()

    def getRestrictions(self):  # Takes the appropriate html text and sets it to a variable
        flightRestrictionsClass = self.soup.find(class_="auto-style4")
        return flightRestrictionsClass.get_text()

    def getInfo(self):
        return self.getFlightCategory(), self.getRestrictions()

und = UND()
while 1 == 1:
    if 1==1:    #using 1==1 so this loop constantly runs for testing, otherwise I have it set for a time
        flightCategory, flightRestrictions = und.getInfo()  # scrape the HTML from the web
        if flightCategory == FC2 and flightRestrictions == FR2:  # if previous check is the same as this check then skip posting
            pass  # Do something
        elif flightCategory != FC2 or flightRestrictions != FR2:  # if any variable has changed since the last time
            FC2 = flightCategory  # set the comparison variable to equal the variable
            FR2 = flightRestrictions
            if flightRestrictions == "Manager on Duty:":  # if this is seen only output category
                pass  # Do something
            elif flightRestrictions != "Manager on Duty:":
                pass  # Do something
    else:
        print("Outside Time")
        time.sleep(5)  # Wait _ seconds. This would be set for 30 min but for testing it is 5 seconds.

Tags: beautifulsoup, python, variables, debugging, web-scraping

Stack Overflow user

Accepted answer

Answered 2021-01-16 13:23:44

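Based on your code, you only send a request to http://sof.aero.und.edu when you create the instance of the UND class. As a result, the instance's soup attribute is never updated during the loop, and you keep reading stale values.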
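To see the effect in isolation, here is a minimal sketch that needs no network access. The `Snapshot` class, the `site` dict, and the "VFR"/"IFR" values are hypothetical stand-ins for the UND class, the live website, and its text; only the pattern (reading the source once in `__init__`) mirrors the code in the question.

```python
class Snapshot:
    """Reads its source once, in __init__, like the UND class in the question."""

    def __init__(self, fetch):
        # This call runs exactly once, when the object is created.
        self.text = fetch()

    def get_info(self):
        # Always returns the value captured in __init__.
        return self.text


# Hypothetical stand-in for the website: a dict we can change between reads.
site = {"category": "VFR"}
snap = Snapshot(lambda: site["category"])

first = snap.get_info()
site["category"] = "IFR"   # the "website" changes
second = snap.get_info()   # the snapshot does not notice

print(first, second)  # prints: VFR VFR
```

Calling `get_info()` again never re-runs the fetch, which is why the loop keeps comparing the same values.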

You can make it work with the following logic:

class UND:
    def __init__(self):
        pass

    def scrape(self):
        page = requests.get("http://sof.aero.und.edu")
        self.soup = BeautifulSoup(page.content, "html.parser")

    ## SOME CODE

und = UND()
while 1 == 1:
    und.scrape() # We scrape the website at the beginning of each loop iteration

    ## SOME OTHER CODE
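For a fuller picture, the sketch below fills in the elided parts under some assumptions. The `fetch` parameter, the `detect_changes` helper, and the canned HTML pages are hypothetical and exist only so the example runs offline; the class names `auto-style1b` and `auto-style4` come from the question. It is one possible arrangement, not the only way to structure the loop.

```python
from bs4 import BeautifulSoup


class UND:
    """Re-parses the page every time scrape() is called."""

    def __init__(self, fetch):
        # `fetch` is a hypothetical injectable callable that returns page HTML.
        self.fetch = fetch
        self.soup = None

    def scrape(self):
        self.soup = BeautifulSoup(self.fetch(), "html.parser")

    def get_info(self):
        category = self.soup.find(class_="auto-style1b").get_text()
        restrictions = self.soup.find(class_="auto-style4").get_text()
        return category, restrictions


def detect_changes(und, checks):
    """Scrape `checks` times; collect each result that differs from the previous one."""
    previous = None
    changes = []
    for _ in range(checks):
        und.scrape()  # refresh the soup before every comparison
        current = und.get_info()
        if current != previous:
            changes.append(current)
            previous = current
    return changes


# Canned pages standing in for three successive responses from the site.
pages = iter([
    '<p class="auto-style1b">A</p><p class="auto-style4">x</p>',
    '<p class="auto-style1b">A</p><p class="auto-style4">x</p>',
    '<p class="auto-style1b">B</p><p class="auto-style4">x</p>',
])
und = UND(lambda: next(pages))
changes = detect_changes(und, checks=3)
print(changes)  # prints: [('A', 'x'), ('B', 'x')]
```

Against the real site, `fetch` could be something like `lambda: requests.get("http://sof.aero.und.edu").content`, with a `time.sleep(...)` between iterations so the server is not polled continuously.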

Votes: 0


Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link: https://stackoverflow.com/questions/65749969
