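I have the test data below, where C1, C2, etc. denote columns.
My goal is to run a Python script that identifies how many columns, and which ones, contain a number different from the first column (C1). In this example C1 is 888, but it could be any other value. I need to find how many columns in the same row contain a value different from C1. The data is currently stored in a CSV file. I haven't written much code for this yet because I'm struggling to figure out how to approach it: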
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 C16 C17 C18 C19
888 888 888 888 888 888 888 888 888 888 888 888 888 888 888 999 999 239.66 214.75
Code:
```python
import csv

with open(r'path', 'r') as r:
    reader = csv.DictReader(r)
    diff = []
    x = 0
    for row in reader:
        diff.append(row)
```
Posted on 2019-02-21 04:25:30
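Just go through the keys other than the column of interest and check whether their values match the value in 'C1'.
```python
import csv

with open("test.csv", "r") as r:
    reader = csv.DictReader(r)
    diff = []
    for row in reader:
        print(row)
        rowdiff = []
        val = row['C1']  # the header in the sample data is "C1" (keys are case-sensitive)
        for key in row:
            if key != 'C1' and row[key] != val:
                rowdiff += [key]
        # Place tuple of (# differences, column keys) in diff
        diff.append((len(rowdiff), rowdiff))
print(diff)
```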
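The loop above can be tried without a file on disk. The sketch below wraps the same logic in a hypothetical helper `count_diffs` and reads the question's sample row from an in-memory string via `io.StringIO`; it assumes the real file is comma-separated. Values are compared as text, exactly as `csv.DictReader` returns them, so `888` and `888.0` would count as different.

```python
import csv
import io

# Sample data from the question, assumed to be comma-separated in the real file.
SAMPLE = (
    "C1,C2,C3,C4,C5,C6,C7,C8,C9,C10,C11,C12,C13,C14,C15,C16,C17,C18,C19\n"
    "888,888,888,888,888,888,888,888,888,888,888,888,888,888,888,"
    "999,999,239.66,214.75\n"
)


def count_diffs(f, first="C1"):
    """Hypothetical helper: for each row, return (count, columns) whose value
    differs from the value in column `first`. Values are compared as strings."""
    result = []
    for row in csv.DictReader(f):
        val = row[first]
        cols = [key for key in row if key != first and row[key] != val]
        result.append((len(cols), cols))
    return result


diff = count_diffs(io.StringIO(SAMPLE))
print(diff)  # [(4, ['C16', 'C17', 'C18', 'C19'])]
```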
Posted on 2019-02-21 04:18:47
You can also use pandas here. Assume your DataFrame is df:
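```
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 C16 C17 C18 C19
0 888 888 888 888 888 888 888 888 888 888 888 888 888 888 888 999 999 239.66 214.75
```
```python
mask = df.eq(df.iloc[0, 0])  # boolean mask: True where a cell equals C1 of row 0
df[~mask].dropna(axis=1)     # matching cells become NaN; drop every column containing NaN
```
```
C16 C17 C18 C19
0 999 999 239.66 214.75
```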
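The mask above compares every cell with the single value df.iloc[0, 0], and dropna(axis=1) drops a column as soon as any row matches, so it fits the one-row example. For several rows, a sketch (assuming the first column is named C1) compares each row with its own C1 value using `DataFrame.ne` with `axis=0` and counts the mismatches per row with `sum(axis=1)`. The second row below is made-up data for illustration only.

```python
import pandas as pd

# Row 0 is the sample from the question; row 1 is made-up data for illustration.
df = pd.DataFrame(
    [
        [888] * 15 + [999, 999, 239.66, 214.75],
        [5] * 18 + [7],
    ],
    columns=[f"C{i}" for i in range(1, 20)],
)

# Compare every other column with the same row's C1 value (axis=0 aligns on rows).
diff_mask = df.drop(columns="C1").ne(df["C1"], axis=0)

counts = diff_mask.sum(axis=1)  # number of differing columns in each row
print(counts.tolist())  # [4, 1]

# Names of the differing columns, row by row.
cols = [row[row].index.tolist() for _, row in diff_mask.iterrows()]
print(cols)  # [['C16', 'C17', 'C18', 'C19'], ['C19']]
```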
Posted on 2019-02-21 04:26:54
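The code below fills a dictionary in which each key is a row and each value lists the columns that don't match the first column. Note that it treats reader as a pandas DataFrame (it uses reader.columns), not a csv.DictReader.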
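```python
import pandas as pd

reader = pd.read_csv(r'path')  # assumed: the CSV loaded into a pandas DataFrame

dict_cols = {}
for row in range(len(reader)):
    diff_cols = []  # columns in this row whose value differs from the first column
    for col in reader.columns[1:]:
        if reader[reader.columns[0]].iloc[row] != reader[col].iloc[row]:
            diff_cols.append(col)
    dict_cols[row] = diff_cols
```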
This might be a bit convoluted, though.
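If the nested loop feels convoluted, the same dict_cols mapping (row index to list of columns that differ from the first column) can be built from a single boolean mask. A sketch, assuming reader is a pandas DataFrame whose first column is the reference column; the one-row frame here just reproduces the question's sample.

```python
import pandas as pd

# One-row frame reproducing the sample data; `reader` mirrors the name used above.
reader = pd.DataFrame(
    [[888] * 15 + [999, 999, 239.66, 214.75]],
    columns=[f"C{i}" for i in range(1, 20)],
)

first = reader.columns[0]
# True where a cell differs from the first column's value in the same row.
mask = reader.drop(columns=first).ne(reader[first], axis=0)

# Row index -> list of differing column names, like dict_cols above.
dict_cols = {idx: row[row].index.tolist() for idx, row in mask.iterrows()}
print(dict_cols)  # {0: ['C16', 'C17', 'C18', 'C19']}
```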
https://stackoverflow.com/questions/54794451