文章/答案/技术大牛

发布

社区首页 >问答首页 >美丽汤错误："openpyxl.utils.exceptions.illegalcharactererror“

问美丽汤错误："openpyxl.utils.exceptions.illegalcharactererror“
EN

Stack Overflow用户

提问于 2022-11-27 03:16:26

回答 1查看 28关注 0票数 -1

我试图从本地保存在我的硬盘中的html文件中提取文本。然后将它们粘贴到excel文件中的每一行中。在Mac上这样做，下面是完整的代码：

# install/import all prerequisites first
# from cgitb import text
from openpyxl import Workbook, load_workbook
from bs4 import BeautifulSoup


# create a question that asks how many files you have
i = 1
n = int(input("How many files ? "))
# final_n = n - 1


# the list of files
files = []


# the list of files only has 1 file contained by default
# while loop will create multiple files in the list so that I don't have to do the tedious work
while i <= n:
    files.append("folder/SplashBidNoticeAbstractUI (" + str(i) +").html")
    i = i+1


# load an existing Libreoffice Calc file
wb = Workbook()
ws = wb.active
ws.title = "Data"


# add the titles on the first row, each column with the respective title
ws.append(["DatePublished", "Closing Date", "Category", "Procuring Entity", "Approved Budget for the Contract", "Name", "Delivery Period", "Reference Number", "Title", "Area of Delivery", "Solicitation Number", "Contact"])


# the actual magic.
# extract desired data from the html files and then
# paste in the active Libreoffice Calc file

for i in files:
    with open(i, "r", errors="ignore") as html_file:
        content = html_file.read()  # does something
        soup = BeautifulSoup(content, "html.parser")  # does something

        # extracts data from the webpages
        
        if soup.find("span", id="lblDisplayDatePublish") != None:
            datePublished = soup.find("span", id="lblDisplayDatePublish").text
        else: datePublished = ""

        if soup.find("span", id="lblDisplayCloseDateTime") != None:
            cd  = soup.find("span", id="lblDisplayCloseDateTime").text
        else: cd = ""

        if soup.find("span", id="lblDisplayCategory") != None:
            cat = soup.find("span", id="lblDisplayCategory").text
        else: cat = ""

        if soup.find("span", id="lblDisplayProcuringEntity") != None:
            pro_id = soup.find("span", id="lblDisplayProcuringEntity").text.replace("", "")
        else: pro_id = ""

        if soup.find("span", id="lblDisplayBudget") != None:
            abc = soup.find("span", id="lblDisplayBudget").text
        else: abc = ""

        if soup.find("span", id="lblHeader") != None:
            name = soup.find("span", id="lblHeader").text.replace(" ", "_").replace("\n", "_")
        else: name = ""

        if soup.find("span", id="lblDisplayPeriod") != None:
            delp = soup.find("span", id="lblDisplayPeriod").text
        else: delp = ""

        if soup.find("span", id="lblDisplayReferenceNo")!= None:
            ref_num = soup.find("span", id="lblDisplayReferenceNo").text
        else: ref_num = ""

        if soup.find("span", id="lblDisplayTitle")!= None:
            title = soup.find("span", id="lblDisplayTitle").text.replace(" ", "_").replace("\n", "_")
        else: title = ""

        if soup.find("span", id="lblDisplayAOD") != None:
            aod = soup.find("span", id="lblDisplayAOD").text.replace("\n", "_")
        else: aod = ""

        if soup.find("span", id="lblDisplaySolNumber") != None:
            solNr = soup.find("span", id="lblDisplaySolNumber").text
        else: solNr = ""

        if soup.find("span", id="lblDisplayContactPerson")!= None:
            contact = soup.find("span", id="lblDisplayContactPerson").text
        else: contact = ""

        # just an assurance that the code worked and nothing screwed up
        print("\nBid" + i)
        print("Date Published: " + datePublished)
        print("Closing Date: " + cd)
        print("Category : " + cat)
        print("Procurement Entity : " + pro_id)
        print("Name: " + name)
        print("Delivery Period: " + delp)
        print("ABC: " + abc)
        print("Reference Number : " + ref_num)
        print("Title : " + title)
        print("Area of Delivery : " + aod)
        print("Solicitation Number: "+ solNr)
        print("Contact: "+ contact)

        # pastes the data inside the calc file under the titles
        ws.append([datePublished, cd, cat, pro_id, abc, name, delp, ref_num, title, aod, solNr, contact])

# saves file so work is safe and sound
filename = input("filename: ") 
wb.save(filename + ".xlsx")

print("Saved into '" + filename + ".xlsx'.")

为所有37,820 html文件运行此代码。

我尝试将我的Python版本从3.9更新到3.10和3.11。我试过运行python3 phase3.py和python phase3.py。我还重新安装了bs4和openpyxl。不过，问题还没有解决。这是错误

openpyxl.utils.exceptions.illegalcharactererror

python-3.x

excel

beautifulsoup

openpyxl

python

回答 1

Stack Overflow用户

发布于 2022-11-27 03:47:27

当您尝试分配ASCII控制字符时，Openpyxl模块将引发此exception (例如，"\x00"、"\x01"、.)一个细胞的价值。这意味着至少有一个html文件包含此类字符。因此，您需要使用str.encode来转义这些。

代之以：

ws.append([datePublished, cd, cat, pro_id, abc, name, delp, ref_num, title, aod, solNr, contact])

通过这一点：

    try:
        list_of_vals= [datePublished, cd, cat, pro_id, abc, name, delp, ref_num, title, aod, solNr, contact]
        ws.append(list_of_valls)
    except openpyxl.utils.exceptions.illegalcharactererror:
        ws.append([cell_val.encode("ascii") for cell_val in list_of_vals]

注意:注意这里的凹痕。

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/74587240

复制

相似问题

问美丽汤错误："openpyxl.utils.exceptions.illegalcharactererror“
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问美丽汤错误："openpyxl.utils.exceptions.illegalcharactererror“EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问美丽汤错误："openpyxl.utils.exceptions.illegalcharactererror“
EN