Suppose you have the following DataFrame:
In [1]: import pandas as pd
In [2]: index = [('California',2000),('California', 2010), ('New York', 2000),
('New York', 2000), ('New York', 2010), ('Texas', 2000), ('Texas',2010)]
In [3]: populations = [33871648, 37253956,189765457,19378102,20851820,25145561
...: ]
In [4]: pop_df = pd.DataFrame(populations,index=index,columns=["Data"])
In [5]: pop_df
Out[5]:
                         Data
(California, 2000)   33871648
(California, 2010)   37253956
(New York, 2000)    189765457
(New York, 2010)     19378102
(Texas, 2000)        20851820
(Texas, 2010)        25145561
How do I index this DataFrame to get all of the California data? I tried pop_df[('California',)] and got a KeyError.
So I ran the following, but still got a KeyError:
In [6]: index2 = pd.MultiIndex.from_tuples(index)
In [7]: pop_df2 = pop_df.reindex(index2)
In [8]: pop_df2
Out[8]:
                      Data
California 2000   33871648
           2010   37253956
New York   2000  189765457
           2010   19378102
Texas      2000   20851820
           2010   25145561
In [9]: pop_df2['California']
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/opt/miniconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3079 try:
-> 3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'California'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-141-18a1a54664b0> in <module>
----> 1 pop_df2['California']
~/opt/miniconda3/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
3022 if self.columns.nlevels > 1:
3023 return self._getitem_multilevel(key)
-> 3024 indexer = self.columns.get_loc(key)
3025 if is_integer(indexer):
3026 indexer = [indexer]
~/opt/miniconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
3083
3084 if tolerance is not None:
KeyError: 'California'
What is the correct way to index into a MultiIndex DataFrame?
Posted on 2021-03-11 18:20:29
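You want .loc[]. Without it, you are looking up a column named "California" rather than an index label.
By the way, there is a typo in your input: the ('New York', 2000) index entry appears twice. Here is the complete code.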
In [1]: import pandas as pd
...: index = [
...: ('California',2000),
...: ('California', 2010),
...: ('New York', 2000),
...: ('New York', 2010),
...: ('Texas', 2000),
...: ('Texas',2010)
...: ]
...: populations = [33871648, 37253956,189765457,19378102,20851820,25145561]
...: pop_df = pd.DataFrame(populations,index=index,columns=["Data"])
...: index2 = pd.MultiIndex.from_tuples(index)
...: pop_df2 = pop_df.reindex(index2)
...: pop_df2.loc['California']
Out[1]:
          Data
2000  33871648
2010  37253956
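As a minimal sketch of the difference between [] and .loc described above, the snippet below builds the MultiIndex up front and passes it to the constructor. Skipping the reindex step is my own assumption about what is convenient here, not something the question requires, and the names multi_index and pop_direct are made up for illustration.

```python
import pandas as pd

# The six corrected index entries (the duplicated one removed).
index = [
    ("California", 2000), ("California", 2010),
    ("New York", 2000), ("New York", 2010),
    ("Texas", 2000), ("Texas", 2010),
]
populations = [33871648, 37253956, 189765457, 19378102, 20851820, 25145561]

# Assumption: build the MultiIndex first instead of reindexing afterwards.
multi_index = pd.MultiIndex.from_tuples(index)
pop_direct = pd.DataFrame(populations, index=multi_index, columns=["Data"])

# .loc looks up row labels; a single label matches the first index level.
california = pop_direct.loc["California"]

# Plain [] looks up columns, so "Data" is the only valid key here.
data_column = pop_direct["Data"]

# A full (state, year) tuple plus a column name selects a single value.
ca_2010 = pop_direct.loc[("California", 2010), "Data"]
```

Here california holds only the two California rows, indexed by year, while pop_direct["California"] would still raise a KeyError because no such column exists.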
Posted on 2021-03-11 18:19:03
df['somename'] looks up a column; df.loc['somename'] looks up an index label. You need:
pop_df2.loc['California']
Output:
          Data
2000  33871648
2010  37253956
You can also use xs, which lets you select on any level and, with drop_level=False, keeps the full index hierarchy:
# drop_level defaults to True,
# which makes xs behave like .loc on the top level
pop_df2.xs('California', level=0, drop_level=False)
Output:
                     Data
California 2000  33871648
           2010  37253956
Or use xs on the second level:
pop_df2.xs(2010, level=1, drop_level=False)
which gives you:
                     Data
California 2010  37253956
New York   2010  19378102
Texas      2010  25145561
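If the levels have names, xs can select by name instead of by position. The sketch below assumes level names "state" and "year", which are hypothetical and do not appear in the question, and uses a made-up frame called pop_named.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("California", 2000), ("California", 2010),
     ("New York", 2000), ("New York", 2010),
     ("Texas", 2000), ("Texas", 2010)],
    names=["state", "year"],  # hypothetical level names, not in the original
)
pop_named = pd.DataFrame(
    {"Data": [33871648, 37253956, 189765457, 19378102, 20851820, 25145561]},
    index=index,
)

# Default drop_level=True removes the level that was selected on,
# leaving only the state level in the result.
by_year = pop_named.xs(2010, level="year")

# drop_level=False keeps both levels in the result.
by_year_full = pop_named.xs(2010, level="year", drop_level=False)
```

Selecting by name tends to read more clearly than level=1 once a frame has more than two levels, but either form should give the same rows.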
Posted on 2021-03-11 18:27:55
Try using IndexSlice:
pop_df2.loc[pd.IndexSlice[['California'],],]
Out[52]:
                     Data
California 2000  33871648
           2010  37253956
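IndexSlice is also useful when you want to slice on more than one level at once. This is only a rough sketch: the frame name pop_sorted is made up, and the sort_index() call is there because label slicing on a MultiIndex needs a sorted index.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("California", 2000), ("California", 2010),
     ("New York", 2000), ("New York", 2010),
     ("Texas", 2000), ("Texas", 2010)]
)
pop_sorted = pd.DataFrame(
    {"Data": [33871648, 37253956, 189765457, 19378102, 20851820, 25145561]},
    index=index,
).sort_index()  # label slices on a MultiIndex need a sorted index

idx = pd.IndexSlice

# Every state, but only the 2010 rows.
year_2010 = pop_sorted.loc[idx[:, 2010], :]

# Label slices include both endpoints: California through New York.
ca_to_ny = pop_sorted.loc[idx["California":"New York", :], :]
```

Unlike xs, this keeps all index levels in the result and can combine conditions on several levels in a single .loc call.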
https://stackoverflow.com/questions/66588217