在csv文件中,我有两组地理位置数据作为lat/long。前两列(0和1)包括第一组的纬度和经度,后两列(2和3)包括第二组。第一组包含大约100个条目,第二组包含大约75项。我想把这两个集合都画在同一张图上。我已经使用numpy.loadtxt将csv文件加载到Python中。
第一组加载时没有任何问题,但当我试图加载第二组时,会得到以下错误:
ValueError:无法将字符串转换为浮动:‘’
经过几个小时的沮丧之后,我终于确定Python正在读取第二个数据集中的空白csv单元格,这样它将与第一个数据集相等。这确实是我听过的最愚蠢的理由,因为它读取空白单元格,期望它们包含信息,然后抱怨没有信息。
为什么Python读取空白单元格,我如何使它停止这样做?
要重新创建错误,请用数据填充两列,用较少的数据填充下两列。如果前两列有10个条目,则下两个列的条目不超过9个。请将其保存为csv文件并运行以下代码:
import numpy as np
import matplotlib.pyplot as plt
lat1 = np.loadtxt(r'C:\Users\XXXXX\Desktop\latlong.csv', usecols=0, delimiter=',')
long1 = np.loadtxt(r'C:\Users\XXXXX\Desktop\latlong.csv', usecols=1, delimiter=',')
lat2 = np.loadtxt(r'C:\Users\XXXXX\Desktop\latlong.csv', usecols=2, delimiter=',')
long2 = np.loadtxt(r'C:\Users\XXXXX\Desktop\latlong.csv', usecols=3, delimiter=',')
plt.plot(lat1, long1, 'o')
plt.plot(lat2, long2, '+')
您将得到以下错误:
Traceback (most recent call last):
File "C:\Users\XXXXX\AppData\Local\Programs\Spyder\pkgs\spyder_kernels\py3compat.py", line 356, in compat_exec
exec(code, globals, locals)
File "c:\users\XXXXX\XXXXX\XXXXX\XXXXX\XXXXX\untitled0.py", line 11, in <module>
lat1 = np.loadtxt(r'C:\Users\XXXXX\Desktop\latlong.csv', usecols=0, delimiter=',')
File "C:\Users\XXXXX\AppData\Local\Programs\Spyder\pkgs\numpy\lib\npyio.py", line 1163, in loadtxt
chunk.append(packer(convert_row(words)))
File "C:\Users\XXXXX\AppData\Local\Programs\Spyder\pkgs\numpy\lib\npyio.py", line 1142, in convert_row
return [*map(_conv, vals)]
File "C:\Users\XXXXX\AppData\Local\Programs\Spyder\pkgs\numpy\lib\npyio.py", line 725, in _floatconv
return float(x) # The fastest path.
ValueError: could not convert string to float: ''
下面是我试图使用的数据示例:
41.878 -93.0977 43.78444 -88.7879
44.062 -114.742 38.58763 -80.4549
40.63313 -89.3985 43.07597 -107.29
40.55122 -85.6024
39.0119 -98.4842
前两列比第二列多两个lat/long条目。由于我不知道的原因,Python正在读取第二列中的另外4个空白单元格,然后尝试将空白单元格用作绘图数据,给出一个ValueError。为什么Python读取空白数据并试图绘制它呢?解决这个问题的办法是什么?
发布于 2022-08-25 11:08:13
我建议您立即读取整个文件,而不是尝试加载单个列。如前所述,可以使用np.genfromtxt()
更好地处理带有空值的行。
这两组数据可绘制如下:
import matplotlib.pyplot as plt
import numpy as np
latlong = np.genfromtxt('latlong.csv', delimiter=',', filling_values=np.nan)
plt.plot(latlong[:,0], latlong[:,1], 'ro')
plt.plot(latlong[:,2], latlong[:,3], 'g+')
plt.show()
对于您的示例数据,这将给您提供:
如果数据没有逗号分隔,则可以使用Python读取器对数据进行预解析,以确保每行包含正确的值数。然后,可以将它们直接加载到numpy数组中,并按如下方式转换为浮动:
import matplotlib.pyplot as plt
import numpy as np
import csv
data = []
with open('latlong.csv') as f_input:
for row in csv.reader(f_input, delimiter=' ', skipinitialspace=True):
row.extend([''] * (4 - len(row))) # pad any short rows
data.append(row)
latlong = np.array(data)
latlong[latlong == ''] = np.nan # convert empty values into nan
latlong = latlong.astype(float) # convert strings to floats
plt.plot(latlong[:,0], latlong[:,1], 'ro')
plt.plot(latlong[:,2], latlong[:,3], 'g+')
plt.show()
发布于 2022-08-24 22:42:23
从docs numpy.loadtxt
这个函数的目标是成为一个简单格式化文件的快速阅读器。genfromtxt函数提供了更复杂的处理方法,例如,缺少值的行。
沿……方向的命令
np.genfromtxt('test.csv', delimiter=',', filling_values=np.nan)
应该用nan
替换空值。您可以选择其他类似于0.0
的东西。
总结性:不傻。
发布于 2022-08-25 15:12:56
用你的样本:
In [1]: txt='''41.878 -93.0977 43.78444 -88.7879
...: 44.062 -114.742 38.58763 -80.4549
...: 40.63313 -89.3985 43.07597 -107.29
...: 40.55122 -85.6024
...: 39.0119 -98.4842'''
我在lat1
行上得到了一个错误,就像您所做的那样,但是对于另一个字符串:
In [2]: lat1 = np.loadtxt(txt.splitlines(), usecols=0, delimiter=',')
Traceback (most recent call last):
Input In [2] in <cell line: 1>
lat1 = np.loadtxt(txt.splitlines(), usecols=0, delimiter=',')
File /usr/local/lib/python3.10/dist-packages/numpy/lib/npyio.py:1313 in loadtxt
arr = _read(fname, dtype=dtype, comment=comment, delimiter=delimiter,
File /usr/local/lib/python3.10/dist-packages/numpy/lib/npyio.py:979 in _read
arr = _load_from_filelike(
ValueError: could not convert string '41.878 -93.0977 43.78444 -88.7879' to float64 at row 0, column 1.
这里有一个逗号分隔符,因此它试图将整行转换为浮点数,显然失败了。
让它使用默认的空格分隔符:
In [3]: lat1 = np.loadtxt(txt.splitlines(), usecols=0)
In [4]: lat1
Out[4]: array([41.878 , 44.062 , 40.63313, 40.55122, 39.0119 ])
In [5]: np.loadtxt(txt.splitlines(), usecols=1)
Out[5]: array([ -93.0977, -114.742 , -89.3985, -85.6024, -98.4842])
但是当我们要求它在较短的行上加载列2时,它就失败了:
In [6]: np.loadtxt(txt.splitlines(), usecols=2)
Traceback (most recent call last):
Input In [6] in <cell line: 1>
np.loadtxt(txt.splitlines(), usecols=2)
File /usr/local/lib/python3.10/dist-packages/numpy/lib/npyio.py:1313 in loadtxt
arr = _read(fname, dtype=dtype, comment=comment, delimiter=delimiter,
File /usr/local/lib/python3.10/dist-packages/numpy/lib/npyio.py:979 in _read
arr = _load_from_filelike(
ValueError: invalid column index 2 at row 4 with 2 columns
在尝试阅读整个示例时,loadtxt
和genfromtxt
的短行都有问题:
In [7]: np.loadtxt(txt.splitlines())
Traceback (most recent call last):
Input In [7] in <cell line: 1>
np.loadtxt(txt.splitlines())
File /usr/local/lib/python3.10/dist-packages/numpy/lib/npyio.py:1313 in loadtxt
arr = _read(fname, dtype=dtype, comment=comment, delimiter=delimiter,
File /usr/local/lib/python3.10/dist-packages/numpy/lib/npyio.py:979 in _read
arr = _load_from_filelike(
ValueError: the number of columns changed from 4 to 2 at row 4; use `usecols` to select a subset and avoid this error
In [8]: np.genfromtxt(txt.splitlines())
Traceback (most recent call last):
Input In [8] in <cell line: 1>
np.genfromtxt(txt.splitlines())
File /usr/local/lib/python3.10/dist-packages/numpy/lib/npyio.py:2266 in genfromtxt
raise ValueError(errmsg)
ValueError: Some errors were detected !
Line #4 (got 2 columns instead of 4)
Line #5 (got 2 columns instead of 4)
(我使用的是最新的numpy
版本,它有一个重述词loadtxt
,因此它的错误消息可能有所不同。)
将文本更改为使用逗号,包括缺失字段的标记,genfromtxt
工作得很好:
In [9]: txt='''41.878, -93.0977, 43.78444, -88.7879
...: 44.062 , -114.742 , 38.58763 , -80.4549
...: 40.63313 , -89.3985, 43.07597, -107.29
...: 40.55122 , -85.6024,,
...: 39.0119 , -98.4842,,'''
In [12]: np.genfromtxt(txt.splitlines(), delimiter=',')
Out[12]:
array([[ 41.878 , -93.0977 , 43.78444, -88.7879 ],
[ 44.062 , -114.742 , 38.58763, -80.4549 ],
[ 40.63313, -89.3985 , 43.07597, -107.29 ],
[ 40.55122, -85.6024 , nan, nan],
[ 39.0119 , -98.4842 , nan, nan]])
https://stackoverflow.com/questions/73480017
复制相似问题