Q: How can I free up CPU resources for this loop in a Jupyter notebook?

Stack Overflow user

Asked 2021-10-12 08:17:53

1 answer, 154 views, 0 followers, 1 vote

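I am trying to run an automated process daily in a Jupyter notebook (on deepnote.com). After the first iteration of the while loop finishes and the next one starts (inside the for loop nested in the while loop), the virtual machine crashes with the following message:

KernelInterrupted: Execution interrupted by the Jupyter kernel

Here is the code: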

.
.
.

while y < 5:
    print(f'\u001b[45m Try No. {y} out of 5 \033[0m')

    #make the driver wait up to 10 seconds for an element to appear when locating it.

    driver.implicitly_wait(10)

    #values for the example.
    #Declaring several variables for looping.
    #Let's start at the newest page.

    link = 'https...'
    driver.get(link)

    #Here we use an Xpath element to get the initial page

    initial_page = int(driver.find_element_by_xpath('Xpath').text)
    print(f'The initial page is the No. {initial_page}')   
    final_page = initial_page + 120
    
    pages = np.arange(initial_page, final_page+1, 1)
    minimun_value = 0.95
    maximum_value = 1.2
    
    #the variable to_place is set as a string value that must exist in the rows in order to be scraped.
    #if it doesn't exist it is ignored.
    to_place = 'A particular place'

    #the same comment stated above is applied to the variable POINTS.
    POINTS = 'POINTS'

    #let's set a final dataframe which will contain all the scraped data from the arange that
    #matches with the parameters set (minimun_value, maximum value, to_place, POINTS).
    df_final = pd.DataFrame()
    dataframe_final = pd.DataFrame()
    #set another final dataframe  for the 2ND PART OF THE PROCESS.
    initial_df = pd.DataFrame()

    #set a for loop for each page from the arange.
    for page in pages:
        #INITIAL SEARCH.
        #look for general data of the link.
        #amount of results and pages for the execution of the for loop, "page" variable is used within the {}. 
        url = 'https...page={}&p=1'.format(page)
        
        print(f'\u001b[42m Current page: {page} \033[0m '+'\u001b[42m Final page: '+str(final_page)+'\033[0m '+'\u001b[42m Page left: '+str(final_page-page)+'\033[0m '+'\u001b[45m Try No. '+str(y)+' out of '+str(5)+'\033[0m'+'\n')
        driver.get(url)
        #Here we tell the scraper to try to find the total number of subpages of this page, provided the page IS NOT empty.
        #if it finds them, the scraper will proceed to execute the rest of the procedure.
        try:
            subpages = driver.find_element_by_xpath('Xpath').text
            print(f'Reading the information about the number of subpages of this page ... {subpages}')
            subpages = int(re.search(r'\d{0,3}$', subpages).group())
            print(f'This page has {subpages} subpages in total')
                            
            df = pd.DataFrame()
            df2 = pd.DataFrame()
            
            print(df)
            print(df2)
            
            #FOR LOOP.
            #search at each subpage all the rows that contain the previous parameters set.
            #minimun_value, maximum value, to_place, POINTS.
            
            #set a sub-loop for each row from the table of each subpage of each page
            for subpage in range(1,subpages+1):
            
                url = 'https...page={}&p={}'.format(page,subpage)
                driver.get(url)
                identities_found = int(driver.find_element_by_xpath('Xpath').text.replace('A total of ','').replace(' identities found','').replace(',',''))
                identities_found_last = identities_found%50
                
                print(f'Página: {page} de {pages}') #AT THIS LINE CRASHED THE LAST TIME
                .
                .
                .
        #If the particular page is empty
        except:
            print(f"This page No. {page} IT'S EMPTY ¯\\_₍⸍⸌̣ʷ̣̫⸍̣⸌₎_/¯, ¡NEXT! ")
    .  
    .
    .

    y += 1
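
One detail in the code above is worth a small illustration. The pattern `r'\d{0,3}$'` also matches an empty string at the end of the text, so `int(...)` can raise `ValueError` on a page that is not actually empty, and the bare `except:` would then report it as empty. It also keeps only the last three digits of a longer number. The sketch below shows one possible alternative; the helper name `parse_trailing_count` and the sample strings are hypothetical, since the real page text is not shown in the question.

```python
import re
from typing import Optional


def parse_trailing_count(text: str) -> Optional[int]:
    """Return the integer at the very end of `text`, or None if there is none."""
    match = re.search(r'\d+$', text.strip())
    return int(match.group()) if match else None


# Hypothetical label strings; the real page text is not shown in the question.
print(parse_trailing_count('Page 1 of 12'))    # 12
print(parse_trailing_count('Page 1 of 1234'))  # 1234 (r'\d{0,3}$' would yield 234)
print(parse_trailing_count('No results'))      # None
```

Returning `None` instead of raising lets the caller tell "no count found" apart from other failures, which a bare `except:` would otherwise hide.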

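At first I thought the KernelInterrupted error was thrown because the virtual machine ran out of virtual memory during the second iteration.

But after several tests I found that my program consumes hardly any RAM: the virtual RAM usage barely changed at any point in the process before the kernel crashed, I can guarantee that.

So now I think the virtual CPU of my virtual machine may be what is crashing the kernel, but if so, I just don't understand why. This is the first time I have had to deal with this situation, and the program runs fine on my PC.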
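
One way to check the RAM and CPU hypotheses is to measure each iteration directly. The following is a minimal sketch using only the standard library; `measure` and `fake_page` are hypothetical names, not part of the original program. Note that `tracemalloc` and `time.process_time()` only see the Python process itself, not the browser that Selenium drives.

```python
import time
import tracemalloc


def measure(step, *args):
    """Run step(*args) and return (result, cpu_seconds, wall_seconds, peak_bytes)."""
    tracemalloc.start()
    cpu_start = time.process_time()   # CPU time used by this process only
    wall_start = time.perf_counter()  # elapsed real time
    try:
        result = step(*args)
    finally:
        cpu_used = time.process_time() - cpu_start
        wall_used = time.perf_counter() - wall_start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, cpu_used, wall_used, peak


def fake_page(n):
    # Stand-in for scraping one page (hypothetical workload).
    return sum(i * i for i in range(n))


result, cpu, wall, peak = measure(fake_page, 100_000)
print(f'cpu={cpu:.3f}s wall={wall:.3f}s peak={peak / 1024:.1f} KiB')
```

If the CPU time stays far below the wall time, the loop is mostly waiting (for example on page loads) rather than computing, which would argue against a CPU bottleneck in the Python code.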

Can any data scientist or machine learning engineer here help me? Thanks in advance.

deepnote

python

debugging

web-scraping

memory-management

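1 Answer

Stack Overflow user

Accepted answer

Answered 2021-10-13 02:45:06

I found the answer on the Deepnote community forum. Put simply, the platform's free-tier machines do not guarantee continuous (24/7) operation, regardless of which program is running in the VM.

That's it. Problem solved.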
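
Since the machine can be stopped at any time, a long scraping run could record its progress and resume later. The following is a minimal sketch, not something the answer itself proposes. It assumes the notebook's filesystem persists across restarts, and the names `load_checkpoint`, `save_checkpoint`, `run` and the file `progress.json` are hypothetical.

```python
import json
import os
import tempfile


def load_checkpoint(path, default_page):
    """Return the next page to process, or default_page if no usable checkpoint exists."""
    try:
        with open(path, encoding='utf-8') as fh:
            return json.load(fh)['next_page']
    except (FileNotFoundError, KeyError, TypeError, json.JSONDecodeError):
        return default_page


def save_checkpoint(path, next_page):
    """Write to a temp file, then rename, so an abrupt stop cannot leave a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    with os.fdopen(fd, 'w', encoding='utf-8') as fh:
        json.dump({'next_page': next_page}, fh)
    os.replace(tmp_path, path)


def run(path, first_page, last_page, process_page):
    """Process pages first_page..last_page, resuming from the checkpoint if present."""
    start = load_checkpoint(path, first_page)
    for page in range(start, last_page + 1):
        process_page(page)
        save_checkpoint(path, page + 1)


# Usage: the second call resumes at page 4 instead of starting over.
seen = []
ckpt = os.path.join(tempfile.mkdtemp(), 'progress.json')
run(ckpt, 1, 3, seen.append)
run(ckpt, 1, 5, seen.append)
print(seen)  # [1, 2, 3, 4, 5]
```

Page numbers here are plain Python integers. A NumPy integer taken from `np.arange` would need `int(page)` before being passed to `json.dump`.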

Votes: 2

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/69537171
