So I wrote the code and ran it, and I got the .xlsx file, but the order of the output differs from the order of the URL list I entered in the code.
#importing the libraries
import re
import lxml
import chardet
from os import truncate
import bs4
from bs4 import BeautifulSoup
import multiprocessing
import requests
import pandas as pd
from fake_useragent import UserAgent
import numpy as np
urls = list(('https://isabad.com/advanced-professional-email-templates-opencart-extension' ,
'https://isabad.com/seo-basic-pack-opencart-extension',
'https://isabad.com/x-shipping-pro',
'https://isabad.com/bot-blocker-opencart-extension',
'https://isabad.com/opencart-mobile-application'
))
dit = {}
user_agent = UserAgent()
for url in urls:
    data = requests.get(url, headers={"user-agent": user_agent.chrome})
    soup = bs4.BeautifulSoup(data.content, "lxml")
    dit[url] = soup.find_all("title")
ex = pd.DataFrame({"title": dit ,})
print(ex)
ex.to_excel('sasa.xlsx', index=False, engine='xlsxwriter')
How can I fix this?
Answered on 2021-01-20 23:10:08
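You are using a set data structure to store your list of URLs, and a set in Python is an unordered data structure. To get the output in the same order, you should store the URLs in a list data structure, like this: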
urls = [
'https://www.sample.com/search/category-mobile/' ,
'https://www.sample.com/search/category-tablet-ebook-reader',
'https://www.sample.com/search/category-laptop/',
'https://www.sample.com/search/category-computer-parts/',
'https://www.sample.com/search/category-office-machines/'
]
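To make the difference concrete, here is a minimal sketch using the sample URLs above. The variable names (`urls_list`, `urls_set`, `unique_in_order`) are only illustrative. It assumes you may also want to drop duplicates, which is the usual reason for reaching for a set; `dict.fromkeys` does that while keeping first-seen order.

```python
# A list keeps elements in the order they were written.
urls_list = [
    "https://www.sample.com/search/category-mobile/",
    "https://www.sample.com/search/category-tablet-ebook-reader",
    "https://www.sample.com/search/category-laptop/",
    "https://www.sample.com/search/category-computer-parts/",
    "https://www.sample.com/search/category-office-machines/",
]

# A set holds the same elements, but its iteration order is not guaranteed
# to match the order in which they were written.
urls_set = set(urls_list)

# If duplicates must be removed without losing order, dict.fromkeys keeps
# the first occurrence of each key in insertion order (Python 3.7+).
with_duplicate = urls_list + [urls_list[0]]
unique_in_order = list(dict.fromkeys(with_duplicate))

for url in unique_in_order:
    print(url)
```

Iterating over `urls_list` or `unique_in_order` always visits the URLs in the order shown, whereas iterating over `urls_set` may not.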
Cheers!
Answered on 2021-01-20 23:17:27
Use a list, so that the results will be in the same order as you defined them.
urls = ['https://www.sample.com/search/category-mobile/' ,
'https://www.sample.com/search/category-tablet-ebook-reader',
'https://www.sample.com/search/category-laptop/',
'https://www.sample.com/search/category-computer-parts/',
'https://www.sample.com/search/category-office-machines/'
]
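The same idea applies to the results: if each result is appended to a list while looping over the URLs, the rows come out in the order of the list. The sketch below is only an illustration under stated assumptions. `fetch_title` is a hypothetical stand-in for the `requests` and `BeautifulSoup` step in the question, so that the example runs without network access.

```python
urls = [
    "https://www.sample.com/search/category-mobile/",
    "https://www.sample.com/search/category-tablet-ebook-reader",
    "https://www.sample.com/search/category-laptop/",
]


def fetch_title(url):
    # Hypothetical placeholder: a real version would download the page
    # and extract its <title>. Here we derive a fake title from the URL.
    slug = url.rstrip("/").rsplit("/", 1)[-1]
    return "Title for " + slug


# Append one record per URL; a list preserves the order of the appends.
rows = []
for url in urls:
    rows.append({"url": url, "title": fetch_title(url)})

for row in rows:
    print(row["url"], "->", row["title"])
```

Passing `rows` to `pd.DataFrame(rows)` gives one row per list element, in list order. Keeping the `url` column also means each title can still be matched to its page when the sheet is written with `index=False`.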
https://stackoverflow.com/questions/65812129