
Problem opening DBF files in Python

Stack Overflow user
Asked 2019-07-26 08:01:48
4 answers, 9.2K views, 0 followers, 4 votes

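I am trying to open several DBF files and convert them to dataframes. Most of them work fine, but for one of the files I get the error: "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 15: invalid start byte"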

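I have read about this error in other threads, where it came up when opening csv, xlsx, and other file types. The suggested solution there was to include encoding = 'utf-8' in the part of the code that reads the file. Unfortunately, I have not found a solution for DBF files, and my knowledge of DBF files is very limited.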

So far I have tried:

1)

import pandas as pd
from dbfread import DBF
dbf = DBF('file.DBF')
dbf = pd.DataFrame(dbf)

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 8: character maps to <undefined>

2)

from simpledbf import Dbf5
dbf = Dbf5('file.DBF')
dbf = dbf.to_dataframe()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 15: invalid start byte

3)

# this block of code copied from https://gist.github.com/ryan-hill/f90b1c68f60d12baea81 
import pandas as pd
import pysal as ps

def dbf2DF(dbfile, upper=True): #Reads in DBF files and returns Pandas DF
    db = ps.table(dbfile) #Pysal to open DBF
    d = {col: db.by_col(col) for col in db.header} #Convert dbf to dictionary
    #pandasDF = pd.DataFrame(db[:]) #Convert to Pandas DF
    pandasDF = pd.DataFrame(d) #Convert to Pandas DF
    if upper == True: #Make columns uppercase if wanted 
        pandasDF.columns = map(str.upper, db.header) 
    db.close() 
    return pandasDF

dfb = dbf2DF('file.DBF')

AttributeError: module 'pysal' has no attribute 'open'

Finally, if I try to install the dbfpy module, I get: SyntaxError: invalid syntax

Any suggestions on how to solve this?
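
Both decode errors above suggest that the file's text is stored in a code page other than UTF-8 or cp1252. As a diagnostic aid, here is a minimal stdlib-only sketch that reads the language driver byte from a DBF header (offset 29 in the dBase/FoxPro layout) and checks which candidate codecs can decode a sample of bytes. The helper names, the partial `LANGUAGE_DRIVERS` table, and the candidate list are assumptions made for illustration; they are not part of dbfread or simpledbf.

```python
from typing import Iterable, Optional

# Partial, illustrative mapping of DBF language driver IDs to Python codecs.
# Many other IDs exist; 0x00 usually means the code page was never recorded.
LANGUAGE_DRIVERS = {0x01: "cp437", 0x02: "cp850", 0x03: "cp1252"}


def language_driver_id(header: bytes) -> int:
    """Return the language driver byte (offset 29) of a 32-byte DBF header."""
    if len(header) < 32:
        raise ValueError("a DBF header is at least 32 bytes long")
    return header[29]


def first_working_codec(raw: bytes, candidates: Iterable[str]) -> Optional[str]:
    """Return the first codec that decodes raw without error, else None."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Hypothetical usage against a real file (path taken from the question):
# with open("file.DBF", "rb") as fh:
#     driver = language_driver_id(fh.read(32))
#     print(hex(driver), LANGUAGE_DRIVERS.get(driver, "unknown"))
```

A codec that decodes without error is not necessarily the right one, since single-byte code pages such as cp850 accept every byte value, so the decoded text still needs a visual check. Once a plausible codec is found, it can be passed explicitly, for example through the `encoding` argument of dbfread's `DBF` or the `codec` argument of simpledbf's `Dbf5`.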


Stack Overflow user

Answered 2022-09-01 02:53:52

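For everyone who helped me with this issue: I had to fix a corrupted .dbf file (so the data came from a .dbf and had to go back into a .dbf). My particular problem was that the dates throughout the .dbf were just wildly wrong. I tried and failed with many methods, hitting many errors, to take the file apart and reassemble it, before arriving at the following: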

#Modify dbase3 file to recast null date fields as a default date and 
#reimport back into dbase3 file

import collections
import datetime
from typing import OrderedDict
import dbf as dbf1
from simpledbf import Dbf5
from dbfread import DBF, FieldParser
import pandas as pd
import numpy as np

#Default date to overwrite NaN values
blank_date = datetime.date(1900, 1, 1)

#Read in dbase file from Old Path and point to new Path
old_path = r"C:\...\ex.dbf"
new_path = r"C:\...\newex.dbf"

#Establish 1st rule for resolving corrupted dates
class MyFieldParser(FieldParser):
    def parse(self, field, data):
        try:
            return FieldParser.parse(self, field, data)
        except ValueError:
            return blank_date

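#Collect the original .DBF data while stepping over any errors
table = DBF(
    old_path,
    encoding=None,
    ignorecase=True,
    lowernames=False,
    parserclass=MyFieldParser,
    recfactory=collections.OrderedDict,
    load=False,
    raw=False,
    ignore_missing_memofile=False,
    char_decode_errors='ignore',
)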

#Grab the Header Name, Old School Variable Format, and number of characters/length for each variable
dbfh = Dbf5(old_path, codec='utf-8')
headers = dbfh.fields
hdct = {x[0]: x[1:] for x in headers}
hdct.pop('DeletionFlag')
keys = hdct.keys()

#Position of Type and Length relative to field name
ftype = 0
characters = 1

# Reformat and join all old school DBF Header fields in required format
fields = list()

for key in keys:
    ftemp = hdct.get(key)
    k1 = str(key)
    res1 = ftemp[ftype]
    res2 = ftemp[characters]
    if k1 == "decimal_field_name":
        fields.append(k1 + " " + res1 + "(" + str(res2) + ",2)")
    elif res1 == 'N':
        fields.append(k1 + " " + res1 + "(" + str(res2) + ",0)")
    elif res1 == 'D':
        fields.append(k1 + " " + res1)
    elif res1 == 'L':
        fields.append(k1 + " " + res1)
    else: 
        fields.append(k1 + " " + res1 + "(" + str(res2) + ")")


addfields = '; '.join(str(f) for f in fields)

#load the records of the .dbf into a dataframe
df = pd.DataFrame(iter(table))

#go ham reformatting date fields to ensure they are in the correct format
df['DATE_FIELD1'] = df['DATE_FIELD1'].replace(np.nan, blank_date)

df['DATE_FIELD1'] = pd.to_datetime(df['DATE_FIELD1'])


# eliminate further errors in the dataframe
df = df.fillna('0')

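#drop added "record index" field from dataframe
#NOTE: with inplace=False the result is discarded, so this line leaves df unchanged;
#to_dict('records') below omits the index either way
df.set_index('existing_primary_key', inplace=False)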


#initialize defaultdict and convert the dataframe into a .DBF appendable format
dd = collections.defaultdict(list)
records = df.to_dict('records',into=dd)

#create the new .DBF file
new_table = dbf1.Table(new_path, addfields)

#append the dataframe to the new .DBF file
new_table.open(mode=dbf1.READ_WRITE)

for record in records:
    new_table.append(record)

new_table.close()
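
The key step in the code above is `MyFieldParser`, which substitutes a default date whenever parsing a field raises `ValueError`. The same idea can be sketched in isolation with only the standard library. `parse_dbf_date` is a hypothetical helper, not a dbfread function, and it assumes date fields are stored as 8 ASCII characters in YYYYMMDD form. Mapping blank fields to the default as well is an assumption that mirrors the `replace(np.nan, blank_date)` step above.

```python
import datetime

# Default date used in place of unreadable or blank values (same as the answer).
BLANK_DATE = datetime.date(1900, 1, 1)


def parse_dbf_date(raw: bytes, default: datetime.date = BLANK_DATE) -> datetime.date:
    """Parse an 8-byte YYYYMMDD date field, returning default on any failure."""
    try:
        text = raw.decode("ascii").strip()
        # Reject blanks, short values, and non-digit debris up front, because
        # strptime alone would accept some malformed inputs such as "2019726".
        if len(text) != 8 or not text.isdigit():
            return default
        return datetime.datetime.strptime(text, "%Y%m%d").date()
    except ValueError:
        # Covers impossible dates (month 13) and non-ASCII bytes alike,
        # since UnicodeDecodeError is a subclass of ValueError.
        return default
```

Keeping the fallback in one small function makes it easy to log or count how many fields were replaced, which helps when judging how badly a file is damaged.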
0 votes
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/57215656
