I am trying to split the strings in column tweet_text when the value in column lang is en.
Here is how to do this on a single string:
s = 'I am always sad'
s_split = s.split(" ")
This returns:
['I', 'am', 'always', 'sad']
My current code does not work:
df['tweet_text'] = df.apply(lambda x: x['tweet_text'].split(" ") if x['lang'] is 'en' else x['tweet_text'], axis = 1)
Data dictionary:
{'lang': {1404: 'en',
1943: 'en',
2169: 'en',
2502: 'de',
3981: 'nl',
4226: 'en',
7223: 'en',
8557: 'de',
11339: 'pt',
11854: 'en'},
'tweet_text': {1404: 'I am always sad when a colleague loses his job and Frank is not just a colleague he is an impoant person in my',
1943: 'It remains goalless at FNB Stadium between Kaizer Chiefs and Baroka at halftimeRead more',
2169: 'Which one gets your vote 05',
2502: 'Was sagt ihr zu den ersten Minuten',
3981: 'En we gaan door speelronde begint vandaagTegen wie speelt jouw favoriete club',
4226: 'Quote tweet or replyYour favourite Mesut Ozil moment as a Gunner was',
7223: 'How to follow the game live The opponent Current form Did you know The squad Koeman said It must b',
8557: 'BAYERN BAYERN BAYERN BAYERN BAYERN BAYERN BAYERN BAYERN BAYERN BAYERN BAYERN BAYERN BAYERN BAYERN BAYERN BAYERN',
11339: '9o golo para',
11854: 'have loads of boss stuff available on their store products available including the m'}}
Posted on 2021-02-11 07:24:10
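Use == instead of is. You can also use split() instead of split(" "); for text separated by single spaces the two give the same result.
df['tweet_text'] = df.apply(lambda x: x['tweet_text'].split() if x['lang'] == 'en' else x['tweet_text'], axis = 1)
Alternatively, you can use Series.str.split on only the en rows: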
m = df['lang'] == 'en'
df.loc[m, 'tweet_text'] = df.loc[m, 'tweet_text'].str.split()
Posted on 2021-02-11 07:30:13
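As a small check of the two points made in the answer above (== versus is, and split() versus split(" ")), here is a minimal sketch in plain Python. The variable names are made up for illustration, and the runtime-built string only stands in for a value read from a DataFrame. Whether is returns True for two equal strings is an implementation detail, so the sketch does not rely on its result.

```python
# A string built at runtime, standing in for a value read from a DataFrame.
lang = "".join(["e", "n"])
expected = "en"

# == compares the contents of the two strings.
same_value = lang == expected

# is compares object identity; two equal strings need not be the same
# object, so this result should not be used for the language check.
same_object = lang is expected

# split(" ") cuts at every single space, so a double space yields an
# empty string; split() cuts at runs of whitespace instead.
text = "I am  always sad"  # note the double space
by_space = text.split(" ")
by_whitespace = text.split()

print(same_value)
print(by_space)
print(by_whitespace)
```

For the sample tweets, which use single spaces between words, both split variants give the same lists.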
You can also do this:
mask = (df["lang"] == "en", "tweet_text")  # (row selector, column label) tuple
df.loc[mask] = df.loc[mask].str.split()
https://stackoverflow.com/questions/66150362
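To see that the boolean-mask approaches from both answers select the same cells, here is a minimal runnable sketch using four rows from the question's sample data. The names row_mask, indexer, a and b are illustrative only, and casting the column to object dtype is an assumption made here so that the column can hold both strings and lists regardless of the pandas version.

```python
import pandas as pd

# Four rows copied from the question's sample data.
df = pd.DataFrame(
    {
        "lang": ["en", "en", "de", "pt"],
        "tweet_text": [
            "It remains goalless at FNB Stadium between Kaizer Chiefs and Baroka at halftimeRead more",
            "Which one gets your vote 05",
            "Was sagt ihr zu den ersten Minuten",
            "9o golo para",
        ],
    },
    index=[1943, 2169, 2502, 11339],
)
# Assumption: keep the column as object dtype so it can store lists.
df["tweet_text"] = df["tweet_text"].astype(object)

row_mask = df["lang"] == "en"

# Variant 1: row mask and column label passed separately.
a = df.copy()
a.loc[row_mask, "tweet_text"] = a.loc[row_mask, "tweet_text"].str.split()

# Variant 2: the same pair packed into a tuple first.
indexer = (row_mask, "tweet_text")
b = df.copy()
b.loc[indexer] = b.loc[indexer].str.split()

print(a["tweet_text"].tolist() == b["tweet_text"].tolist())
```

Writing df.loc[x, y] and df.loc[(x, y)] is the same thing in Python, since both pass one tuple to the indexer. The tuple form only saves repeating the column label.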