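I am trying to create a very simple Pandas DataFrame from a dictionary. The dictionary has 3 items, and so does the DataFrame. Here is the version that works: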
>>> # from a dictionary
>>> import numpy as np
>>> import pandas as pd
>>> dict1 = {"x": [1, 2, 3],
...          "y": list(
...              [
...                  [2, 4, 6],
...                  [3, 6, 9],
...                  [4, 8, 12]
...              ]
...          ),
...          "z": 100}
>>> df1 = pd.DataFrame(dict1)
>>> df1
   x           y    z
0  1   [2, 4, 6]  100
1  2   [3, 6, 9]  100
2  3  [4, 8, 12]  100
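In my second attempt, the value for the key y is a NumPy array instead of a list, and I again try to create a DataFrame from the dictionary. The line that creates the DataFrame raises an error. Below are the code I tried to run and the error I got (in separate blocks for readability).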
>>> dict2 = {"x": [1, 2, 3],
...          "y": np.array(
...              [
...                  [2, 4, 6],
...                  [3, 6, 9],
...                  [4, 8, 12]
...              ]
...          ),
...          "z": 100}
>>> df2 = pd.DataFrame(dict2)  # see the block below for the error
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
d:\studies\compsci\pyscripts\study\pandas-realpython\data-delightful\01.intro.ipynb Cell 10' in <module>
1 # from a dicitionary
2 dict1 = {"x": [1, 2, 3],
3 "y": np.array(
4 [
(...)
9 ),
10 "z": 100}
---> 12 df1 = pd.DataFrame(dict1)
File ~\anaconda3\envs\dst\lib\site-packages\pandas\core\frame.py:636, in DataFrame.__init__(self, data, index, columns, dtype, copy)
630 mgr = self._init_mgr(
631 data, axes={"index": index, "columns": columns}, dtype=dtype, copy=copy
632 )
634 elif isinstance(data, dict):
635 # GH#38939 de facto copy defaults to False only in non-dict cases
--> 636 mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
637 elif isinstance(data, ma.MaskedArray):
638 import numpy.ma.mrecords as mrecords
File ~\anaconda3\envs\dst\lib\site-packages\pandas\core\internals\construction.py:502, in dict_to_mgr(data, index, columns, dtype, typ, copy)
494 arrays = [
495 x
496 if not hasattr(x, "dtype") or not isinstance(x.dtype, ExtensionDtype)
497 else x.copy()
498 for x in arrays
499 ]
500 # TODO: can we get rid of the dt64tz special case above?
--> 502 return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
File ~\anaconda3\envs\dst\lib\site-packages\pandas\core\internals\construction.py:120, in arrays_to_mgr(arrays, columns, index, dtype, verify_integrity, typ, consolidate)
117 if verify_integrity:
118 # figure out the index, if necessary
119 if index is None:
--> 120 index = _extract_index(arrays)
121 else:
122 index = ensure_index(index)
File ~\anaconda3\envs\dst\lib\site-packages\pandas\core\internals\construction.py:661, in _extract_index(data)
659 raw_lengths.append(len(val))
660 elif isinstance(val, np.ndarray) and val.ndim > 1:
--> 661 raise ValueError("Per-column arrays must each be 1-dimensional")
663 if not indexes and not raw_lengths:
664 raise ValueError("If using all scalar values, you must pass an index")
ValueError: Per-column arrays must each be 1-dimensional
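Why does the second attempt end in an error like that, even though the dimensions of the two arrays are the same? What is the workaround for this?
Answered on 2022-03-22 23:28:37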
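If you look closely at the error message and take a quick look at the relevant pandas source code:
elif isinstance(val, np.ndarray) and val.ndim > 1:
    raise ValueError("Per-column arrays must each be 1-dimensional")
you will see that when a dictionary value is a NumPy array with more than one dimension, as in your example, pandas raises this error. A list works because a Python list has no notion of more than one dimension, even when it is a list of lists: pandas only sees three elements, each of which happens to be a list.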
lst = [[1,2,3],[4,5,6],[7,8,9]]
len(lst)  # 3: the outer list has 3 elements, like shape (3,), not (3, 3) as with a NumPy array
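To make the contrast concrete, here is a minimal sketch that runs both cases side by side. The variable names (nested, df_list, raised) are illustrative only, and the exact error text may differ between pandas versions, so the sketch only checks that a ValueError is raised.

```python
import numpy as np
import pandas as pd

nested = [[2, 4, 6], [3, 6, 9], [4, 8, 12]]

# List of lists: each inner list becomes one cell of an object column.
df_list = pd.DataFrame({"x": [1, 2, 3], "y": nested, "z": 100})
y_dtype = df_list["y"].dtype        # object dtype
first_cell = df_list.loc[0, "y"]    # a plain Python list

# 2-D array: ndim is 2, so the DataFrame constructor rejects it.
arr = np.array(nested)
try:
    pd.DataFrame({"x": [1, 2, 3], "y": arr, "z": 100})
    raised = False
except ValueError:
    raised = True

print(y_dtype, type(first_cell), arr.ndim, raised)
```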
You can try np.array([1, 2, 3]) instead; it works because the number of dimensions is 1. You can check this with:
arr = np.array([1,2,3])
print(arr.ndim) # output is 1
If you need to start from a NumPy array in the dictionary, you can use .tolist() to convert the array to a list before passing it to the DataFrame.
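As a sketch of that workaround applied to the dictionary from the question (the names arr and df2 are chosen here for illustration), the 2-D array is converted with tolist() before it goes into the dictionary:

```python
import numpy as np
import pandas as pd

arr = np.array([[2, 4, 6], [3, 6, 9], [4, 8, 12]])

# tolist() turns the 2-D array into a list of three Python lists,
# so pandas stores one inner list per row in column "y".
df2 = pd.DataFrame({"x": [1, 2, 3], "y": arr.tolist(), "z": 100})
print(df2)
```

The resulting column y holds Python lists rather than numeric values, so vectorized arithmetic on that column will not behave like it would on a numeric column.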
https://stackoverflow.com/questions/71577514