Excel office helper: two ways to merge multiple Excel files with Python

二爷

Published 2020-11-03 15:43:28

Included in the column: 二爷记

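I have a set of website keyword data exported from a tool: the site-lookup data from the 站长之家 (Chinaz) tools for the top 100 websites in Baidu search results, 96 Excel files in total. Why 96 and not 100? Because some websites appeared more than once and their files overwrote each other, for example large sites such as Baidu's own properties, Zhihu, JD.com and Alibaba.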

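Requirement

The task is to merge the data from all of these files into a single Excel document.

Approach

There are two ways to do this in Python. The first uses the third-party libraries xlrd and XlsxWriter: open each file, read its data, and write everything into a new Excel file. The second uses the third-party library pandas to read the data of every file and write the combined result to a new file (saved as CSV in the code below). In practice the two are much the same.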

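Preliminary step

Excluding noise websites

Simple checks leave out the noise data, i.e. the files of the large websites (here zhihu, baidu and bilibili) that would distort the results.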

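    def get_excels(self):
        """
        Collect the paths of all Excel files in excel_path,
        skipping the files of noise websites (zhihu, baidu, bilibili)
        :return: excel_files
        """
        excluded = ("zhihu", "baidu", "bilibili")  # sites to leave out
        excel_files = []
        excels = os.listdir(self.excel_path)
        print(len(excels))  # number of entries in the folder
        for excel in excels:
            # skip Excel's "~$" lock files and anything that is not a spreadsheet
            if excel.startswith("~$") or not excel.lower().endswith((".xls", ".xlsx")):
                continue
            if not any(site in excel for site in excluded):
                excel = os.path.join(self.excel_path, excel)
                print(excel)
                excel_files.append(excel)

        print(len(excel_files))  # number of files that will be merged

        return excel_files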
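The filtering rule can be tried out on its own without touching the disk. The sketch below is a hypothetical helper, `filter_excel_names`, that applies the same substring check as `get_excels` to a plain list of names; the extra checks for the `~$` lock files Excel leaves behind and for non-spreadsheet extensions are assumptions, and the sample file names are made up.

```python
# Hypothetical helper: the same filtering rule as get_excels, applied to a list of names
EXCLUDED_SITES = ("zhihu", "baidu", "bilibili")


def filter_excel_names(names, excluded=EXCLUDED_SITES):
    """Return the spreadsheet file names that do not mention an excluded site."""
    kept = []
    for name in names:
        # "~$..." files are temporary lock files Excel creates while a workbook is open
        if name.startswith("~$"):
            continue
        # assumption: only .xls/.xlsx files should be merged
        if not name.lower().endswith((".xls", ".xlsx")):
            continue
        # plain substring test, as in the article's nested if checks
        if any(site in name for site in excluded):
            continue
        kept.append(name)
    return kept


if __name__ == "__main__":
    # invented file names for illustration
    sample = ["www.jd.com.xlsx", "www.zhihu.com.xlsx", "~$www.jd.com.xlsx",
              "notes.txt", "tieba.baidu.com.xls"]
    print(filter_excel_names(sample))  # ['www.jd.com.xlsx']
```

Because the check is a plain substring test, any file whose name merely contains "baidu" (a subdomain, for example) is dropped as well; match on the full domain if that is too broad.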

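Method one

Use the third-party libraries xlrd and XlsxWriter to open each file, read its data, and write it into a new Excel file. Note that xlrd 2.0 and later only reads the legacy .xls format; to read .xlsx files with this code, install xlrd 1.2.0 (pip install xlrd==1.2.0) or switch the reader to openpyxl.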

    def get_first_exceldata(self):
        """
        Open every file with xlrd and write all rows into one new Excel file with XlsxWriter
        :return:
        """
        # read the data
        # note: xlrd >= 2.0 only opens .xls; .xlsx needs xlrd==1.2.0 (or openpyxl)
        data=[]
        excel_files=self.get_excels()
        for excel_file in excel_files:
            wb=xlrd.open_workbook(excel_file)
            for sheet in wb.sheets():
                for rownum in range(sheet.nrows):
                    # assumes row 0 of every sheet is the same header: keep it only once
                    if rownum == 0 and data:
                        continue
                    data.append(sheet.row_values(rownum))
        print(data)
        print(len(data))  # total number of rows merged

        # write the data
        workbook = xlsxwriter.Workbook(self.first_target_xls)
        worksheet = workbook.add_worksheet()
        #font = workbook.add_format({"font_size": 14})
        for i, row in enumerate(data):
            for j, value in enumerate(row):
                worksheet.write(i, j, value)
                #worksheet.write(i, j, value, font)

        workbook.close()  # close the workbook; XlsxWriter writes the .xlsx file at this point
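Method one boils down to appending the rows of many tables into one list and then writing each value to a (row, column) cell. The sketch below shows that logic with plain Python lists instead of xlrd and XlsxWriter; `merge_tables` and `cells` are hypothetical names, and it assumes every file starts with the same header row, which is kept only once.

```python
# Hypothetical helpers: the core of method one without any Excel library.
# Each table is a list of rows (lists), shaped like xlrd's sheet.row_values() output.
def merge_tables(tables, has_header=True):
    """Concatenate tables row by row, keeping only the first table's header."""
    merged = []
    for table in tables:
        # assumption: every table starts with the same header row
        rows = table[1:] if has_header and merged else table
        merged.extend(rows)
    return merged


def cells(rows):
    """Yield (row, col, value) in the order worksheet.write(i, j, value) receives them."""
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            yield i, j, value


if __name__ == "__main__":
    # invented sample rows
    file_a = [["site", "keyword"], ["jd.com", "phone"]]
    file_b = [["site", "keyword"], ["taobao.com", "shoes"]]
    merged = merge_tables([file_a, file_b])
    print(merged)  # [['site', 'keyword'], ['jd.com', 'phone'], ['taobao.com', 'shoes']]
    for i, j, value in cells(merged):
        print(i, j, value)  # in method one this is worksheet.write(i, j, value)
```

Keeping the header only once makes the output match what pandas produces in method two; pass `has_header=False` if the files have no header row.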

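Method two

Use the third-party library pandas to read the data of every file and write it out as one new file. Note that this code saves the result as a CSV file (which Excel opens directly) rather than as an .xlsx workbook.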

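    def get_second_exceldata(self):
        """
        Read every file with pandas and save the combined data as one CSV file
        :return:
        """
        data = []
        excel_files = self.get_excels()
        for excel_file in excel_files:
            # read the first sheet into a DataFrame (recent pandas reads .xlsx via openpyxl)
            df = pd.read_excel(excel_file)
            data.append(df)

        # stack all DataFrames; columns are matched by name, rows renumbered from 0
        result = pd.concat(data, ignore_index=True)
        # utf-8-sig writes a BOM so Excel displays the Chinese text correctly
        result.to_csv(self.second_target_xls, encoding='utf-8-sig', sep=',', index=False)
        # to write an .xlsx workbook instead: result.to_excel(<path>.xlsx, index=False)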
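The key step in method two is `pd.concat`. The minimal sketch below uses small in-memory DataFrames as stand-ins for what `pd.read_excel` returns (the column names and values are invented) to show that columns are matched by name, so files whose columns come in a different order still line up, and that `ignore_index=True` renumbers the rows.

```python
import pandas as pd

# Stand-ins for two exported files (invented data); in the article they come from pd.read_excel
df_a = pd.DataFrame({"keyword": ["python", "excel"], "rank": [1, 2]})
df_b = pd.DataFrame({"rank": [5], "keyword": ["pandas"]})  # same columns, other order

# Columns are aligned by name; ignore_index=True gives a fresh 0..n-1 row index
merged = pd.concat([df_a, df_b], ignore_index=True)
print(merged)

# A file with an extra column: rows from the other files get NaN in that column
df_c = pd.DataFrame({"keyword": ["xlrd"], "rank": [9], "note": ["legacy"]})
with_extra = pd.concat([merged, df_c], ignore_index=True)
print(with_extra)
```

Without `ignore_index=True` each file keeps its own 0-based index, so the merged frame would contain repeated row labels; that does not matter for `to_csv(index=False)`, but it does for later lookups by label.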

Full reference source code

# -*- coding: utf-8 -*-
# Merge multiple Excel files  20201015
# author / WeChat: huguo00289


import os
import xlrd
import xlsxwriter
import pandas as pd

class Hb():
    def __init__(self):
        self.excel_path=r'E:/Python/mryq'  # folder holding the source Excel files
        self.first_target_xls = "E:/python/first_mryq.xlsx"  # output of method one
        self.second_target_xls = "E:/python/second_mryq.csv"  # output of method two (CSV)

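    def get_excels(self):
        """
        Collect the paths of all Excel files in excel_path,
        skipping the files of noise websites (zhihu, baidu, bilibili)
        :return: excel_files
        """
        excluded = ("zhihu", "baidu", "bilibili")  # sites to leave out
        excel_files = []
        excels = os.listdir(self.excel_path)
        print(len(excels))  # number of entries in the folder
        for excel in excels:
            # skip Excel's "~$" lock files and anything that is not a spreadsheet
            if excel.startswith("~$") or not excel.lower().endswith((".xls", ".xlsx")):
                continue
            if not any(site in excel for site in excluded):
                excel = os.path.join(self.excel_path, excel)
                print(excel)
                excel_files.append(excel)

        print(len(excel_files))  # number of files that will be merged

        return excel_files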



    def get_first_exceldata(self):
        """
        Open every file with xlrd and write all rows into one new Excel file with XlsxWriter
        :return:
        """
        # read the data
        # note: xlrd >= 2.0 only opens .xls; .xlsx needs xlrd==1.2.0 (or openpyxl)
        data=[]
        excel_files=self.get_excels()
        for excel_file in excel_files:
            wb=xlrd.open_workbook(excel_file)
            for sheet in wb.sheets():
                for rownum in range(sheet.nrows):
                    # assumes row 0 of every sheet is the same header: keep it only once
                    if rownum == 0 and data:
                        continue
                    data.append(sheet.row_values(rownum))
        print(data)
        print(len(data))  # total number of rows merged

        # write the data
        workbook = xlsxwriter.Workbook(self.first_target_xls)
        worksheet = workbook.add_worksheet()
        #font = workbook.add_format({"font_size": 14})
        for i, row in enumerate(data):
            for j, value in enumerate(row):
                worksheet.write(i, j, value)
                #worksheet.write(i, j, value, font)

        workbook.close()  # close the workbook; XlsxWriter writes the .xlsx file at this point



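    def get_second_exceldata(self):
        """
        Read every file with pandas and save the combined data as one CSV file
        :return:
        """
        data = []
        excel_files = self.get_excels()
        for excel_file in excel_files:
            # read the first sheet into a DataFrame (recent pandas reads .xlsx via openpyxl)
            df = pd.read_excel(excel_file)
            data.append(df)

        # stack all DataFrames; columns are matched by name, rows renumbered from 0
        result = pd.concat(data, ignore_index=True)
        # utf-8-sig writes a BOM so Excel displays the Chinese text correctly
        result.to_csv(self.second_target_xls, encoding='utf-8-sig', sep=',', index=False)
        # to write an .xlsx workbook instead: result.to_excel(<path>.xlsx, index=False)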


if __name__=='__main__':
    spider=Hb()
    spider.get_first_exceldata()
    spider.get_second_exceldata()