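I am importing a text file containing 5 columns of data (of different data types). For some reason, once the data has been imported and cleaned, every column has the pandas dtype object, so I cannot tell the columns apart.
My goal is to distinguish the columns by data type and drop the columns that hold a particular data type. The code and results are below: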
import pandas as pd
import re
data = pd.read_csv('SevAvail2.txt', sep="\t", header=None)
df = pd.DataFrame(data)
header = df.column = df.iloc[0]
header = df.reindex(df.index.drop(0))
# print(header)
df = header
df = df.loc[:, df.isnull().mean() < .95]
# count remaining column length and print list with count
col_length = len(df.columns)
print(col_length)
header_label = []
for i in range(0, col_length):
    header_label.append(i)
# reset headers to (0 : n)
df.columns = header_label
# print(df)
for column in df.columns[0:]:
    print(df[column])
Resulting columns:
1 AB21313BF
2 AB21313GF
3 AB21313SF
4 AB21313CF
5 AB21313KF
Name: 0, dtype: object
1 BABA TECH
2 LALA TECH
3 NDMP
4 IND CORP
5 CAMP
Name: 1, dtype: object
1 9.2500
2 15.7500
3 7.0000
4 19.7500
5 33.5000
Name: 2, dtype: object
1 -65
2 1.75
3 0
4 -4
5 .75)
Name: 3, dtype: object
1 4,501,561.00
2 3,145,531.00
3 1,454,303.00
4 1,420,949.00
5 1,095,575.00
Name: 4, dtype: object
Posted on 2019-03-15 04:08:54
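You can use the pandas infer_dtype API to infer the data type of a column.
Example:
import pandas as pd
df = pd.DataFrame({'c1': [1,2], 'c2': [1.0,2.0], 'c3': ["a","b"]})
for c in df.columns:
    # infer_dtype lives in the public pandas.api.types namespace
    print(pd.api.types.infer_dtype(df[c]))
Output:
integer
floating
string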
Documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.api.types.infer_dtype.html
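Note that infer_dtype looks at the values themselves, so a column of numeric text is still reported as 'string'. The following is a minimal sketch of that behaviour; the column names (code, price, volume) are made up for illustration, and the values are a few rows copied from the question.

```python
import pandas as pd

# Hypothetical frame mimicking the question: every value was read in as text.
df = pd.DataFrame({
    "code": ["AB21313BF", "AB21313GF"],
    "price": ["9.2500", "15.7500"],
    "volume": ["4,501,561.00", "3,145,531.00"],
})

# Map each column name to the type inferred from its values.
inferred = {c: pd.api.types.infer_dtype(df[c]) for c in df.columns}
print(inferred)
```

All three columns come back as 'string', which is why text that looks numeric needs an extra conversion step before the columns can be told apart.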
Numbers stored as strings:
When a number contains "," and is stored as a string (for example '4,501,561.00'), one brute-force approach is:
import pandas as pd
df = pd.DataFrame({'c1': ['4,501,561.00','501,561.00'], 'c2': [1.0,2.0], 'c3': ["a","b"]})
for c in df.columns:
    if pd.api.types.infer_dtype(df[c]) == 'string':
        # Or is it a number stored as a string?
        try:
            # Strip the thousands separators, then attempt the conversion
            df[c].str.replace(',', '', regex=False).astype(float)
            print("floating")
        except ValueError:
            print("string")
    else:
        print(pd.api.types.infer_dtype(df[c]))
https://stackoverflow.com/questions/55170792
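The check above only prints a label. To actually drop columns by type, as the question asks, one possible follow-up is to store the converted values and then filter with select_dtypes. This is a sketch under assumptions, not part of the original answer: the helper name convert_numeric_text and the column names are hypothetical, and it treats "," purely as a thousands separator.

```python
import pandas as pd

def convert_numeric_text(df):
    """Return a copy in which text columns that parse fully as numbers become numeric."""
    out = df.copy()
    for c in out.columns:
        if pd.api.types.infer_dtype(out[c]) != "string":
            continue
        # Assumption: "," is only ever a thousands separator.
        cleaned = out[c].str.replace(",", "", regex=False)
        parsed = pd.to_numeric(cleaned, errors="coerce")
        # Convert only if every non-missing value parsed; otherwise keep the text.
        if (parsed.notna() | out[c].isna()).all():
            out[c] = parsed
    return out

# Hypothetical column names; values copied from the question.
df = pd.DataFrame({
    "code": ["AB21313BF", "AB21313GF"],
    "price": ["9.2500", "15.7500"],
    "volume": ["4,501,561.00", "3,145,531.00"],
})

converted = convert_numeric_text(df)
numeric_only = converted.select_dtypes(include="number")  # drops the text columns
text_only = converted.select_dtypes(exclude="number")     # drops the numeric columns
print(converted.dtypes)
```

Using errors="coerce" turns unparseable values into NaN instead of raising, so the all-values-parsed check replaces the try/except from the brute-force version.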