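Why isn't pandas removing duplicates?

Stack Overflow user
Asked 2022-11-12 23:14:36
Answers: 1, Views: 39, Followers: 0, Votes: 0

I'm very new to Python and pandas, and my problem is that my final output file doesn't exclude any of the duplicates on 'Customer Number'. Any suggestions as to why this is happening would be greatly appreciated!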

import pandas as pd
import numpy as np #numpy is the module which can replace errors from huge datasets 
from openpyxl import load_workbook
from openpyxl.styles import Font

df_1 = pd.read_excel('PRT Tracings 2020.xlsx', sheet_name='Export') #this is reading the Excel document shifts and looks at sheet
df_2 = pd.read_excel('PRT Tracings 2021.xlsx', sheet_name='Export') #this reads the same Excel document but looks at a different sheet
df_3 = pd.read_excel('PRT Tracings YTD 2022.xlsx', sheet_name='Export') #this reads a different Excel file, and only has one sheet so no need to have it read a sheet

df_all = pd.concat([df_1, df_2, df_3], sort=False) #this combines the sheets from 1,2,3 and the sort function as false so our columns stay in the same order

to_excel = df_all.to_excel('Combined_PRT_Tracings.xlsx', index=None) #this Excel file combines all three sheets into one spreadsheet

df_all = df_all.replace(np.nan, 'N/A', regex=True) #replaces errors with N/A

remove = ['ORDERNUMBER', 'ORDER_TYPE', 'ORDERDATE', 'Major Code Description', 'Product_Number_And_Desc', 'Qty', 'Order_$', 'Order_List_$'] #this will remove all unwanted columns
df_all.drop(columns=remove, inplace=True)

df_all.drop_duplicates(subset=['Customer Number'], keep=False) #this will remove all duplicates from the tracing number syntax with pandas module

to_excel = df_all.to_excel('Combined_PRT_Tracings.xlsx', index=None) #this Excel file combines all three sheets into one spreadsheet

wb = load_workbook('Combined_PRT_Tracings.xlsx') #we are using this to have openpyxl read the data, from the spreadsheet already created
ws = wb.active #this workbook is active

wb.save('Combined_PRT_Tracings.xlsx')

1 Answer

Stack Overflow user

Accepted answer

Answered 2022-11-12 23:22:32

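You should either assign the return value of df_all.drop_duplicates to a variable, or set inplace=True to overwrite the DataFrame's contents. By default the method returns a new DataFrame and leaves the original untouched, which prevents unwanted changes to the original data.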

Try:

df_all = df_all.drop_duplicates(subset='Customer Number', keep=False)

Or, equivalently:

df_all.drop_duplicates(subset='Customer Number', keep=False, inplace=True)

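With keep=False, this removes every row whose Customer Number appears more than once in the DataFrame, so no copy of a duplicated customer survives. If you want to keep the first or the last row of each set of duplicates instead, change keep to 'first' or 'last'.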
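As a rough illustration of the points above, here is a minimal sketch on a small made-up DataFrame. The data and the variable names (`df`, `no_repeats`, `first_only`, `last_only`) are assumptions for demonstration only, not the asker's actual spreadsheets. It shows that an unassigned call leaves the DataFrame unchanged, and how the three `keep` settings differ.

```python
import pandas as pd

# Hypothetical toy data standing in for the combined spreadsheets.
df = pd.DataFrame({
    "Customer Number": [101, 102, 101, 103, 102],
    "Region": ["N", "S", "E", "W", "S"],
})

# Calling drop_duplicates without assigning the result (and without
# inplace=True) returns a new DataFrame and leaves df as it was.
df.drop_duplicates(subset="Customer Number", keep=False)
rows_after_unassigned_call = len(df)

# keep=False drops every row whose Customer Number is repeated.
no_repeats = df.drop_duplicates(subset="Customer Number", keep=False)

# keep="first" / keep="last" retain one row per Customer Number.
first_only = df.drop_duplicates(subset="Customer Number", keep="first")
last_only = df.drop_duplicates(subset="Customer Number", keep="last")

print(rows_after_unassigned_call)
print(no_repeats)
print(first_only)
print(last_only)
```

If the goal is one row per customer rather than discarding every customer that appears more than once, `keep="first"` or `keep="last"` is probably the setting to reach for.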

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/74417286
