前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >使用Pandas读取大型Excel文件

使用Pandas读取大型Excel文件

作者头像
hankleo
发布2020-09-17 10:19:37
2.2K0
发布2020-09-17 10:19:37
举报
文章被收录于专栏:Hank’s BlogHank’s Blog
代码语言:javascript
复制
import os
import pandas as pd

HERE = os.path.abspath(os.path.dirname(__file__))
DATA_DIR = os.path.abspath(os.path.join(HERE, '..', 'data'))

def make_df_from_excel(file_name, nrows):
    """Read from an Excel file in chunks and make a single DataFrame.
    Parameters
    ----------
    file_name : str
    nrows : int
        Number of rows to read at a time. These Excel files are too big,
        so we can't read all rows in one go.
    """
    file_path = os.path.abspath(os.path.join(DATA_DIR, file_name))
    xl = pd.ExcelFile(file_path)

    # In this case, there was only a single Worksheet in the Workbook.
    sheetname = xl.sheet_names[0]

    # Read the header outside of the loop, so all chunk reads are
    # consistent across all loop iterations.
    df_header = pd.read_excel(file_path, sheetname=sheetname, nrows=1)
    # print(f"Excel file: {file_name} (worksheet: {sheetname})")
    print(f"文件名:{file_name}")
    print(f"工作表:{sheetname}")

    chunks = []
    i_chunk = 0
    # The first row is the header. We have already read it, so we skip it.
    skiprows = 1
    while True:
        df_chunk = pd.read_excel(
            file_path, sheetname=sheetname,
            nrows=nrows, skiprows=skiprows, header=None)
        skiprows += nrows
        # When there is no data, we know we can break out of the loop.
        if not df_chunk.shape[0]:
            break
        else:
            # print(f"  - chunk {i_chunk} ({df_chunk.shape[0]} rows)")
            print(f"行数:{df_chunk.shape[0]}")
            chunks.append(df_chunk)
        i_chunk += 1

    df_chunks = pd.concat(chunks)
    # Rename the columns to concatenate the chunks with the header.
    columns = {i: col for i, col in enumerate(df_header.columns.tolist())}
    df_chunks.rename(columns=columns, inplace=True)
    df = pd.concat([df_header, df_chunks])
    return df

if __name__ == '__main__':
    df = make_df_from_excel('/Users/mac/Desktop/Data/demo.xlsx', nrows=1000000)

from: cnblogs.com/everfight/p/pandas_read_large_number.html

本文参与 腾讯云自媒体同步曝光计划,分享自作者个人站点/博客。
原始发表:2019-10-15 ,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档