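I have a dataset of chat messages from chat rooms. I need to filter out every chat room in which only one person wrote anything (even if that person wrote several messages). So, in the sample dataset below, I need to eliminate chat rooms 1, 6, and 8.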
data.table(Chatroom = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 8), Person = c("A","A", "B","C","D","E","F","G","H","I","J","J","J","K","L","M", "M"), Message = c("Hi", "You there?", "Hello", "Hi", "Hey", "Howdy", "Hi", "Hey", "Greetings", "Hi", "Hi", "Hello?", "Anyone there?", "Hey", "Hi", "Hello?", "Helllooooooo?"))
Chatroom Person Message
1: 1 A Hi
2: 1 A You there?
3: 2 B Hello
4: 2 C Hi
5: 3 D Hey
6: 3 E Howdy
7: 4 F Hi
8: 4 G Hey
9: 5 H Greetings
10: 5 I Hi
11: 6 J Hi
12: 6 J Hello?
13: 6 J Anyone there?
14: 7 K Hey
15: 7 L Hi
16: 8 M Hello?
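17: 8 M Helllooooooo?
Obviously this can be done by hand, but I have a large amount of data to filter.
Is there a way to do this in R with one or more scripts?
I imagine I would need one script to identify and save a list of the chat rooms that contain only one person, and then another script to remove the chat rooms on that list, but I don't know which functions can do this.
Help?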
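The two-step idea described above (first collect the single-person chat rooms, then drop them) can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the data is rebuilt as a plain data.frame rather than a data.table, and the names `chats`, `people_per_room`, `solo_rooms`, and `kept` are made up for the example.

```r
# Sample data from the question, rebuilt as a plain data.frame (assumption:
# base R only, so no packages are needed to run this sketch).
chats <- data.frame(
  Chatroom = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 8),
  Person = c("A", "A", "B", "C", "D", "E", "F", "G", "H", "I",
             "J", "J", "J", "K", "L", "M", "M"),
  Message = c("Hi", "You there?", "Hello", "Hi", "Hey", "Howdy", "Hi", "Hey",
              "Greetings", "Hi", "Hi", "Hello?", "Anyone there?", "Hey", "Hi",
              "Hello?", "Helllooooooo?")
)

# Step 1: count the distinct people in each chat room and collect the IDs
# of the rooms where only one person wrote anything.
people_per_room <- tapply(chats$Person, chats$Chatroom,
                          function(p) length(unique(p)))
solo_rooms <- names(people_per_room)[people_per_room == 1]

# Step 2: drop every row that belongs to one of those rooms.
# tapply() returns the room IDs as character names, so compare as character.
kept <- chats[!(as.character(chats$Chatroom) %in% solo_rooms), ]
```

Keeping `solo_rooms` as its own object matches the "save a list" part of the idea. If the list is not needed afterwards, both steps can be folded into one grouped filter.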
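Answered 2022-08-12 19:35:20
There are many options. My first attempt was to use .SD, grouping by Chatroom (with the data.table from the question stored as df):
df[, .SD[uniqueN(Person)>1], Chatroom]
Some alternatives that may be slightly faster:
df[, ct:=uniqueN(Person), Chatroom][ct>1][,ct:=NULL]
or
df[, ct:=length(unique(Person)), Chatroom][ct>1][,ct:=NULL]
or
df[, ct:=max(rleid(Person)), Chatroom][ct>1][,ct:=NULL]
Answered 2022-08-12 19:45:15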
This can easily be done with filter(). First, assign your data to a name. From there, you can pipe (%>%) it into group_by() and then filter(). Group by Chatroom, not Person: the condition has to count the distinct people in each chat room, not the messages each person wrote.
library(dplyr)
df <- data.frame(Chatroom = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 8), Person = c("A","A", "B","C","D","E","F","G","H","I","J","J","J","K","L","M", "M"), Message = c("Hi", "You there?", "Hello", "Hi", "Hey", "Howdy", "Hi", "Hey", "Greetings", "Hi", "Hi", "Hello?", "Anyone there?", "Hey", "Hi", "Hello?", "Helllooooooo?"))
final <- df %>% group_by(Chatroom) %>% filter(n_distinct(Person) > 1) %>% ungroup()
final
https://stackoverflow.com/questions/73338903
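As a quick sanity check of the group-and-filter idea, here is a minimal sketch, assuming dplyr is installed. It counts the distinct people per chat room with `n_distinct()` and keeps only rooms with more than one. The `check` data frame, including room 9 and persons N and O, is hypothetical and not part of the original dataset. It covers a case the sample data does not: someone who writes more than once in a room that another person also writes in.

```r
library(dplyr)

# Hypothetical test data (not from the question): room 1 has a single
# person, while in room 9 person N writes twice and person O once.
check <- data.frame(
  Chatroom = c(1, 1, 9, 9, 9),
  Person   = c("A", "A", "N", "N", "O"),
  Message  = c("Hi", "You there?", "Hi", "Anyone?", "Hello")
)

# Keep only the rooms in which more than one distinct person wrote.
kept <- check %>%
  group_by(Chatroom) %>%
  filter(n_distinct(Person) > 1) %>%
  ungroup()

kept
```

Here room 1 is dropped and all three rows of room 9 are kept, including both of N's messages, because the condition is evaluated once per chat room rather than once per person.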