"KeyError" in a pandas DataFrame

Stack Overflow user
Asked 2021-05-03 13:12:10
1 answer, 616 views, 0 followers, 0 votes

Code:

ps = PorterStemmer()
tokens = []
for i in range(0,len(df)):
    tweet = str(df['clean_tweet'][i])
    tweet = tweet.lower()
    tweet = tweet.split()
    tweet = [ps.stem(word) for word in tweet if word not in stopWords]
    tweet = ' '.join(tweet)
    tokens.append(tweet)
    print(tokens[i])
df['clean_tweet'] = tokens
df.head()

For some reason, this raises a KeyError:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\anaconda3\envs\machine_learning\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine._get_loc_duplicates()

pandas\_libs\index_class_helper.pxi in pandas._libs.index.Int64Engine._maybe_get_bool_indexer()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine._unpack_bool_indexer()

KeyError: 31962

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-30-7794ad45df60> in <module>
      2 tokens = []
      3 for i in range(0,len(df)):
----> 4     tweet = str(df['clean_tweet'][i])
      5     tweet = tweet.lower()
      6     tweet = tweet.split()

~\anaconda3\envs\machine_learning\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    851 
    852         elif key_is_scalar:
--> 853             return self._get_value(key)
    854 
    855         if is_hashable(key):

~\anaconda3\envs\machine_learning\lib\site-packages\pandas\core\series.py in _get_value(self, label, takeable)
    959 
    960         # Similar to Index.get_value, but we do not fall back to positional
--> 961         loc = self.index.get_loc(label)
    962         return self.index._get_values_for_loc(self, loc, label)
    963 

~\anaconda3\envs\machine_learning\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 31962

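I don't know why this error occurs. The DataFrame has a shape of 56745 rows × 4 columns, and the code is evidently able to convert tweets into tokenized tweets, so I suspect the KeyError is raised when I overwrite the DataFrame column with the list of tokens.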


1 Answer

Stack Overflow user

Answered 2021-05-03 17:19:12

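KeyError: 31962 is probably raised because your DataFrame's index is not contiguous and the label 31962 is missing from it. Note that the traceback points at line 4, the read df['clean_tweet'][i], not at the final assignment. You can avoid index lookups entirely by using apply() on the Series:
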
def clean_tweet(tweet):
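    # str() mirrors the cast in the question, in case the column holds non-string values such as NaN
    tweet = str(tweet).lower()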
    tweet = tweet.split()
    tweet = [ps.stem(word) for word in tweet if word not in stopWords]
    return ' '.join(tweet)

df['clean_tweet'] = df['clean_tweet'].apply(clean_tweet)
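
To illustrate the likely cause, here is a minimal sketch on made-up data (the toy DataFrame and the helper `lookup_fails` are hypothetical, not taken from the question). It shows that `df['clean_tweet'][i]` looks up the index label `i`, not the i-th row, so a gap in the index raises KeyError, and it shows two possible workarounds if you want to keep the loop: positional access with `.iloc`, or rebuilding the index with `reset_index(drop=True)`. The traceback also passes through `_get_loc_duplicates`, which hints that the index may contain duplicate labels as well (for example after concatenating frames without resetting the index); that is an inference, not something the question confirms.

```python
import pandas as pd

# Hypothetical toy data standing in for the asker's DataFrame.
df = pd.DataFrame(
    {"clean_tweet": ["Running fast", "Cats sleeping", "Dogs barking", "Birds singing"]}
)

# Dropping a row leaves a gap in the index: the labels are now 0, 2, 3.
df = df.drop(index=1)


def lookup_fails(frame, label):
    """Return True if the label-based lookup frame['clean_tweet'][label] raises KeyError."""
    try:
        frame["clean_tweet"][label]
    except KeyError:
        return True
    return False


# len(df) is 3, so range(len(df)) visits 0, 1, 2, but label 1 no longer exists.
missing_label_fails = lookup_fails(df, 1)

# Workaround 1: positional access ignores the index labels.
by_position = [df["clean_tweet"].iloc[i] for i in range(len(df))]

# Workaround 2: rebuild a contiguous 0..n-1 index, then the original loop works.
df_reset = df.reset_index(drop=True)
by_label = [df_reset["clean_tweet"][i] for i in range(len(df_reset))]

print(missing_label_fails)
print(by_position == by_label)
```

Checking `df.index.is_unique` and `df.index.is_monotonic_increasing` on the real data should confirm whether the index is the culprit. The `apply()` approach above sidesteps the issue because it never looks rows up by label.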
0 votes

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/67363760
