Code:
ps = PorterStemmer()
tokens = []
for i in range(0,len(df)):
    tweet = str(df['clean_tweet'][i])
    tweet = tweet.lower()
    tweet = tweet.split()
    tweet = [ps.stem(word) for word in tweet if word not in stopWords]
    tweet = ' '.join(tweet)
    tokens.append(tweet)
    print(tokens[i])
df['clean_tweet'] = tokens
df.head()
For some reason, this throws a KeyError:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\anaconda3\envs\machine_learning\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3079 try:
-> 3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine._get_loc_duplicates()
pandas\_libs\index_class_helper.pxi in pandas._libs.index.Int64Engine._maybe_get_bool_indexer()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine._unpack_bool_indexer()
KeyError: 31962
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-30-7794ad45df60> in <module>
2 tokens = []
3 for i in range(0,len(df)):
----> 4 tweet = str(df['clean_tweet'][i])
5 tweet = tweet.lower()
6 tweet = tweet.split()
~\anaconda3\envs\machine_learning\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
851
852 elif key_is_scalar:
--> 853 return self._get_value(key)
854
855 if is_hashable(key):
~\anaconda3\envs\machine_learning\lib\site-packages\pandas\core\series.py in _get_value(self, label, takeable)
959
960 # Similar to Index.get_value, but we do not fall back to positional
--> 961 loc = self.index.get_loc(label)
962 return self.index._get_values_for_loc(self, loc, label)
963
~\anaconda3\envs\machine_learning\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
3083
3084 if tolerance is not None:
KeyError: 31962
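I don't understand why this error occurs. The DataFrame has 56745 rows × 4 columns, and the code clearly manages to turn the tweets into tokenized tweets, so I thought the KeyError might happen when I overwrite the DataFrame column with the list of tokens.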
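A minimal reproduction of the failure mode, using a small hypothetical DataFrame (not the asker's data) whose index has gaps after some rows were dropped. It shows that with an integer index, Series[i] is a label lookup, that .iloc is positional, and that assigning a same-length list to a column succeeds:

```python
import pandas as pd

# Hypothetical toy data: rows labelled 1 and 3 are dropped,
# so the index labels are 0, 2, 4 while the positions are 0, 1, 2.
df = pd.DataFrame({"clean_tweet": ["a", "b", "c", "d", "e"]})
df = df.drop(index=[1, 3])
print(list(df.index))  # [0, 2, 4]

# With an integer index, Series[i] treats i as a *label*, not a position.
try:
    df["clean_tweet"][1]
    failed = False
except KeyError:
    failed = True
print(failed)  # True: no row is labelled 1

# .iloc is always positional, so it works for every i in range(len(df)).
values = [df["clean_tweet"].iloc[i] for i in range(len(df))]
print(values)  # ['a', 'c', 'e']

# Assigning a plain list of matching length is positional and succeeds,
# so overwriting the column is not what raises the KeyError.
df["clean_tweet"] = [v.upper() for v in values]
print(df["clean_tweet"].tolist())  # ['A', 'C', 'E']
```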
Posted on 2021-05-03 17:19:12
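KeyError: 31962
is probably raised because your DataFrame's index is not contiguous and the label 31962 is missing: df['clean_tweet'][i] looks i up as an index label, not as a row position. You can try apply() on the Series instead: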
# ps and stopWords are defined as in the question
def clean_tweet(tweet):
    tweet = str(tweet).lower()  # str() guards against NaN values
    tweet = tweet.split()
    tweet = [ps.stem(word) for word in tweet if word not in stopWords]
    return ' '.join(tweet)

df['clean_tweet'] = df['clean_tweet'].apply(clean_tweet)
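A self-contained sketch of the apply() approach on a frame whose index has gaps. To avoid depending on NLTK, the stemmer and stop-word set here are hypothetical stand-ins for the question's ps and stopWords (the stem function is a no-op placeholder, not real stemming):

```python
import pandas as pd

# Hypothetical stand-ins for the question's stopWords and ps.stem
stop_words = {"is", "the", "a"}

def stem(word):
    return word  # placeholder: no actual stemming is done

def clean_tweet(tweet):
    # str() guards against NaN values, as the question's loop did
    words = str(tweet).lower().split()
    return " ".join(stem(w) for w in words if w not in stop_words)

# Toy frame with a non-contiguous index (labels 0 and 5)
df = pd.DataFrame({"clean_tweet": ["The cat IS here", "a Dog"]}, index=[0, 5])

# apply() maps over the values and keeps the original labels,
# so gaps in the index never cause a lookup by a missing label.
df["clean_tweet"] = df["clean_tweet"].apply(clean_tweet)
print(df["clean_tweet"].tolist())  # ['cat here', 'dog']
```

If you prefer to keep the original loop, calling df = df.reset_index(drop=True) first renumbers the labels to 0..n-1, so df['clean_tweet'][i] then matches the row position.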
https://stackoverflow.com/questions/67363760