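I'm working on a project in which I import data from SQL into a pandas DataFrame. That seemed to go smoothly, but when I call pandas' mean() on a column, it raises a TypeError saying that a concatenated string of the values could not be converted to numeric (see below):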
Sample DataFrame:
df =
  ProductSKU  OverallHeight-ToptoBottom
0    AAI2185                       74.5
1    AAI2275                         47
2    AAI2686                       56.5
3   AASA1002                      73.23
4   AASA1032                      39.37
5   AASA1039                      72.44
6   AASA1099                       75.6
7   AASA1101                         38
8   ABCM1910                         69
9   ABCM1980                         72
Function:
    def summarizeTagData(df, tag):
        # Mean and standard deviation of the non-missing values
        avgValue = df.loc[:, tag].dropna().mean()  # <--- Breaks here
        stdevValue = df.loc[:, tag].dropna().std()
        # Three-sigma bounds; the lower bound is clamped at 0
        lowerBound = max(avgValue - (3 * stdevValue), 0)
        upperBound = avgValue + (3 * stdevValue)
        # Rows above the upper bound, and rows with no value at all
        outsideRangeCount = df[df[tag] > upperBound].shape[0]
        missingDataCount = df[df[tag].isnull()].shape[0]
        dataDict = {
            "Average": avgValue,
            "StDev": stdevValue,
            "UpperBound": upperBound,
            "LowerBound": lowerBound,
            "OutsideRange": outsideRangeCount,
            "MissingData": missingDataCount,
        }
        return dataDict
Console output:
    summarizeTagData(df, 'OverallHeight-ToptoBottom')
    Traceback (most recent call last):
      File "<ipython-input-22-f1f26a0a0520>", line 1, in <module>
        summarizeTagData(df, 'OverallHeight-ToptoBottom')
      File "C:/Users/tmori/Google Drive/Projects/Product Dimension Accuracy/ProductDataTag_Analysis.py", line 23, in summarizeTagData
        avgValue = df.loc[:,tag].dropna().mean()
      File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\generic.py", line 5310, in stat_func
        numeric_only=numeric_only)
      ...
      File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\nanops.py", line 293, in nanmean
        the_sum = _ensure_numeric(values.sum(axis, dtype=dtype_sum))
      File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\nanops.py", line 743, in _ensure_numeric
        raise TypeError('Could not convert %s to numeric' % str(x))
    TypeError: Could not convert 74.54756.573.2339.3772.4475.6386972 to numeric
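The strangest part (and the part I can't figure out) is that when I import the same data from a CSV, it works perfectly. It only breaks when I load it through SQL. Is there something I'm doing wrong?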
Best, Tom
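
One thing that might be worth checking, though the post does not confirm it: the error message shows the values run together ("74.5" + "47" + "56.5" + ...), which is what summing strings would produce. That would be consistent with the SQL column arriving as text while the CSV reader infers floats. The sketch below is only an illustration under that assumption. The in-memory SQLite table `products`, its TEXT column type, and the three sample rows are hypothetical stand-ins for the real database, which is not shown in the post.

```python
import io
import sqlite3

import pandas as pd

tag = "OverallHeight-ToptoBottom"

# Hypothetical SQL source: the height column is declared as TEXT,
# so the values come back as strings rather than numbers.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE products (ProductSKU TEXT, "OverallHeight-ToptoBottom" TEXT)')
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("AAI2185", "74.5"), ("AAI2275", "47"), ("AAI2686", "56.5")],
)
df_sql = pd.read_sql("SELECT * FROM products", conn)
conn.close()

# The same rows loaded from CSV: read_csv infers a numeric dtype here.
csv_text = "ProductSKU,OverallHeight-ToptoBottom\nAAI2185,74.5\nAAI2275,47\nAAI2686,56.5\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Compare how each loader typed the column.
print("SQL numeric?", pd.api.types.is_numeric_dtype(df_sql[tag]))
print("CSV numeric?", pd.api.types.is_numeric_dtype(df_csv[tag]))

# Coerce the SQL column to numbers; unparseable values become NaN
# instead of raising, so dropna() and mean() behave as intended.
df_sql[tag] = pd.to_numeric(df_sql[tag], errors="coerce")
print(df_sql[tag].mean())
```

If the dtype check on the real DataFrame reports a non-numeric column, converting it with `pd.to_numeric` before calling `summarizeTagData` (or casting the column in the SQL query itself) may be enough. If the column is already numeric, the cause lies elsewhere and this sketch does not apply.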
https://stackoverflow.com/questions/44522741