首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >熊猫无法阅读excel,因为错误

熊猫无法阅读excel,因为错误
EN

Stack Overflow用户
提问于 2022-10-23 10:18:13
回答 1查看 51关注 0票数 0

我在阅读这个excel表格时出错了。

专业Forecasters.xlsx调查

我用Python做的尝试

代码语言:javascript
运行
复制
spf = pd.read_excel(
    "ds/us/survey_of_professional_forecasters_all.xlsx"
)

错误

代码语言:javascript
运行
复制
TypeError                                 Traceback (most recent call last)
File D:\Programs\Python_Anaconda\lib\site-packages\openpyxl\descriptors\base.py:55, in _convert(expected_type, value)
     54 try:
---> 55     value = expected_type(value)
     56 except:

TypeError: an integer is required (got type datetime.date)

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 spf = pd.read_excel(
      2     "ds/us/survey_of_professional_forecasters_all.xlsx"
      3 )
      4 spf

File D:\Programs\Python_Anaconda\lib\site-packages\pandas\util\_decorators.py:211, in deprecate_kwarg.<locals>._deprecate_kwarg.<locals>.wrapper(*args, **kwargs)
    209     else:
    210         kwargs[new_arg_name] = new_arg_value
--> 211 return func(*args, **kwargs)

File D:\Programs\Python_Anaconda\lib\site-packages\pandas\util\_decorators.py:331, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    325 if len(args) > num_allow_args:
    326     warnings.warn(
    327         msg.format(arguments=_format_argument_list(allow_args)),
    328         FutureWarning,
    329         stacklevel=find_stack_level(),
    330     )
--> 331 return func(*args, **kwargs)

File D:\Programs\Python_Anaconda\lib\site-packages\pandas\io\excel\_base.py:482, in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, parse_dates, date_parser, thousands, decimal, comment, skipfooter, convert_float, mangle_dupe_cols, storage_options)
    480 if not isinstance(io, ExcelFile):
    481     should_close = True
--> 482     io = ExcelFile(io, storage_options=storage_options, engine=engine)
    483 elif engine and engine != io.engine:
    484     raise ValueError(
    485         "Engine should not be specified when passing "
    486         "an ExcelFile - ExcelFile already has the engine set"
    487     )

File D:\Programs\Python_Anaconda\lib\site-packages\pandas\io\excel\_base.py:1695, in ExcelFile.__init__(self, path_or_buffer, engine, storage_options)
   1692 self.engine = engine
   1693 self.storage_options = storage_options
-> 1695 self._reader = self._engines[engine](self._io, storage_options=storage_options)

File D:\Programs\Python_Anaconda\lib\site-packages\pandas\io\excel\_openpyxl.py:557, in OpenpyxlReader.__init__(self, filepath_or_buffer, storage_options)
    547 """
    548 Reader using openpyxl engine.
    549 
   (...)
    554 {storage_options}
    555 """
    556 import_optional_dependency("openpyxl")
--> 557 super().__init__(filepath_or_buffer, storage_options=storage_options)

File D:\Programs\Python_Anaconda\lib\site-packages\pandas\io\excel\_base.py:545, in BaseExcelReader.__init__(self, filepath_or_buffer, storage_options)
    543 self.handles.handle.seek(0)
    544 try:
--> 545     self.book = self.load_workbook(self.handles.handle)
    546 except Exception:
    547     self.close()

File D:\Programs\Python_Anaconda\lib\site-packages\pandas\io\excel\_openpyxl.py:568, in OpenpyxlReader.load_workbook(self, filepath_or_buffer)
    565 def load_workbook(self, filepath_or_buffer: FilePath | ReadBuffer[bytes]):
    566     from openpyxl import load_workbook
--> 568     return load_workbook(
    569         filepath_or_buffer, read_only=True, data_only=True, keep_links=False
    570     )

File D:\Programs\Python_Anaconda\lib\site-packages\openpyxl\reader\excel.py:317, in load_workbook(filename, read_only, keep_vba, data_only, keep_links)
    290 """Open the given filename and return the workbook
    291 
    292 :param filename: the path to open or a file-like object
   (...)
    313 
    314 """
    315 reader = ExcelReader(filename, read_only, keep_vba,
    316                     data_only, keep_links)
--> 317 reader.read()
    318 return reader.wb

File D:\Programs\Python_Anaconda\lib\site-packages\openpyxl\reader\excel.py:279, in ExcelReader.read(self)
    277 self.read_strings()
    278 self.read_workbook()
--> 279 self.read_properties()
    280 self.read_theme()
    281 apply_stylesheet(self.archive, self.wb)

File D:\Programs\Python_Anaconda\lib\site-packages\openpyxl\reader\excel.py:173, in ExcelReader.read_properties(self)
    171 if ARC_CORE in self.valid_files:
    172     src = fromstring(self.archive.read(ARC_CORE))
--> 173     self.wb.properties = DocumentProperties.from_tree(src)

File D:\Programs\Python_Anaconda\lib\site-packages\openpyxl\descriptors\serialisable.py:103, in Serialisable.from_tree(cls, node)
    100     else:
    101         attrib[tag] = obj
--> 103 return cls(**attrib)

File D:\Programs\Python_Anaconda\lib\site-packages\openpyxl\packaging\core.py:104, in DocumentProperties.__init__(self, category, contentStatus, keywords, lastModifiedBy, lastPrinted, revision, version, created, creator, description, identifier, language, modified, subject, title)
    102 self.creator = creator
    103 self.lastModifiedBy = lastModifiedBy
--> 104 self.modified = modified
    105 self.created = created
    106 self.title = title

File D:\Programs\Python_Anaconda\lib\site-packages\openpyxl\descriptors\base.py:268, in DateTime.__set__(self, instance, value)
    266     except ValueError:
    267         raise ValueError("Value must be ISO datetime format")
--> 268 super(DateTime, self).__set__(instance, value)

File D:\Programs\Python_Anaconda\lib\site-packages\openpyxl\descriptors\nested.py:35, in Nested.__set__(self, instance, value)
     32         raise ValueError("Tag does not match attribute")
     34     value = self.from_tree(value)
---> 35 super(Nested, self).__set__(instance, value)

File D:\Programs\Python_Anaconda\lib\site-packages\openpyxl\descriptors\base.py:67, in Convertible.__set__(self, instance, value)
     64 def __set__(self, instance, value):
     65     if ((self.allow_none and value is not None)
     66         or not self.allow_none):
---> 67         value = _convert(self.expected_type, value)
     68     super(Convertible, self).__set__(instance, value)

File D:\Programs\Python_Anaconda\lib\site-packages\openpyxl\descriptors\base.py:57, in _convert(expected_type, value)
     55         value = expected_type(value)
     56     except:
---> 57         raise TypeError('expected ' + str(expected_type))
     58 return value

TypeError: expected <class 'datetime.datetime'>

什么起作用了

  • 使用“查找并替换”将所有#N/A替换为空白
  • 打开那张特别的纸

在所有工作表上执行查找和替换操作都不起作用,因为程序只是崩溃。

结论

任何帮助都将不胜感激。

EN

回答 1

Stack Overflow用户

发布于 2022-10-23 12:27:34

熊猫会看到你表格的前几行,从中推断出类型。很明显,在一张床单的某个地方,有一个datetime在一个熊猫认为是int的专栏中,它导致了

代码语言:javascript
运行
复制
TypeError: an integer is required (got type datetime.date)

加载数据的一个解决方案是

代码语言:javascript
运行
复制
spf = pd.read_excel(
    "ds/us/survey_of_professional_forecasters_all.xlsx",
    dtype=object
)

这实际上应该将所有内容解析为字符串,然后您可以逐个遍历这些列,并确保类型在

代码语言:javascript
运行
复制
spf.info()

另一种选择就是你一直在做的事情:检查原始数据并确保它们是正确的。

不幸的是,加载未清理的数据总是需要大量(手动)工作:

另外,用小步骤浏览表格真的很有帮助,例如,一张一张地

代码语言:javascript
运行
复制
spf = pd.read_excel(..., sheet_name=0)
票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/74170324

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档