Q: "KeyError" in a pandas DataFrame

Stack Overflow user

Asked 2021-05-03 13:12:10

Answers: 1 | Views: 616 | Followers: 0 | Votes: 0

Code:

ps = PorterStemmer()
tokens = []
for i in range(0,len(df)):
    tweet = str(df['clean_tweet'][i])
    tweet = tweet.lower()
    tweet = tweet.split()
    tweet = [ps.stem(word) for word in tweet if word not in stopWords]
    tweet = ' '.join(tweet)
    tokens.append(tweet)
    print(tokens[i])
df['clean_tweet'] = tokens
df.head()

For some reason this throws a KeyError.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\anaconda3\envs\machine_learning\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine._get_loc_duplicates()

pandas\_libs\index_class_helper.pxi in pandas._libs.index.Int64Engine._maybe_get_bool_indexer()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine._unpack_bool_indexer()

KeyError: 31962

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-30-7794ad45df60> in <module>
      2 tokens = []
      3 for i in range(0,len(df)):
----> 4     tweet = str(df['clean_tweet'][i])
      5     tweet = tweet.lower()
      6     tweet = tweet.split()

~\anaconda3\envs\machine_learning\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    851 
    852         elif key_is_scalar:
--> 853             return self._get_value(key)
    854 
    855         if is_hashable(key):

~\anaconda3\envs\machine_learning\lib\site-packages\pandas\core\series.py in _get_value(self, label, takeable)
    959 
    960         # Similar to Index.get_value, but we do not fall back to positional
--> 961         loc = self.index.get_loc(label)
    962         return self.index._get_values_for_loc(self, loc, label)
    963 

~\anaconda3\envs\machine_learning\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 31962

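I don't know why this error occurs. The DataFrame has 56745 rows × 4 columns, and the code is clearly able to convert the tweets into tokenized tweets, so I thought the KeyError might be raised when I overwrite the DataFrame column with the list of tokens.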

python

pandas

dataframe

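1 Answer

Stack Overflow user

Answered 2021-05-03 17:19:12

KeyError: 31962 is most likely raised because your DataFrame's index is not contiguous and has no label 31962. df['clean_tweet'][i] looks the value up by index label, not by position, so looping over range(0, len(df)) fails as soon as i reaches a label that is missing from the index. You can use apply() on the Series instead: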

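def clean_tweet(tweet):
    # ps and stopWords are the objects already defined in the question.
    # str() mirrors the cast in the question so non-string values do not break .lower()
    tweet = str(tweet).lower()
    tweet = tweet.split()
    tweet = [ps.stem(word) for word in tweet if word not in stopWords]
    return ' '.join(tweet)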

df['clean_tweet'] = df['clean_tweet'].apply(clean_tweet)
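
To make the diagnosis concrete, here is a minimal sketch of how the same error can be reproduced and avoided. It rests on assumptions: the toy data, the names part_a, part_b and normalize, and the use of pd.concat to produce the gap in the index are all hypothetical, since the question does not show how df was built. The traceback does pass through _get_loc_duplicates, which suggests the real index contains duplicate labels, and concatenating frames without resetting the index is one common way to end up there. normalize is a simplified stand-in for clean_tweet that skips stemming and stop-word removal so the snippet needs only pandas.

```python
import pandas as pd

# Hypothetical toy data: two frames that each carry their own 0-based index.
part_a = pd.DataFrame({"clean_tweet": ["Hello World", "Foo Bar"]})
part_b = pd.DataFrame({"clean_tweet": ["Baz Qux"]})

# Concatenating without ignore_index=True keeps the old labels,
# so the index becomes [0, 1, 0]: three rows, but no label 2.
df = pd.concat([part_a, part_b])

# df["clean_tweet"][i] is a label lookup on an integer index.
# range(len(df)) yields 0, 1, 2, and label 2 does not exist.
try:
    df["clean_tweet"][2]
    raised = False
except KeyError:
    raised = True

# Option 1: positional access with .iloc ignores the labels entirely.
by_position = [df["clean_tweet"].iloc[i] for i in range(len(df))]

# Option 2: rebuild a contiguous 0..n-1 index, after which the
# original loop over range(len(df)) would work as written.
df_reset = df.reset_index(drop=True)


def normalize(tweet):
    """Simplified stand-in for clean_tweet: lowercase and collapse whitespace."""
    return " ".join(str(tweet).lower().split())


# Option 3: apply() visits every row regardless of its index label.
cleaned = df["clean_tweet"].apply(normalize)

print(raised)
print(by_position)
print(list(df_reset.index))
print(cleaned.tolist())
```

With duplicate labels there is a second, quieter hazard: df["clean_tweet"][0] in this toy frame returns a two-row Series rather than a single string, so wrapping it in str() would tokenize the printed Series instead of the tweet. Both .iloc and apply() sidestep that as well.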

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/67363760
