I am following this Stack Overflow post: Sort based on Frequency in R
I am trying to sort my data by the most frequent value in the column "Node_A".
library(dplyr)
Data_I_Have <- data.frame(
"Node_A" = c("John", "John", "John", "John", "John", "Peter", "Tim", "Kevin", "Adam", "Adam", "Xavier"),
"Node_B" = c("Claude", "Peter", "Tim", "Tim", "Claude", "Henry", "Kevin", "Claude", "Tim", "Henry", "Claude"),
" Place_Where_They_Met" = c("Chicago", "Boston", "Seattle", "Boston", "Paris", "Paris", "Chicago", "London", "Chicago", "London", "Paris"),
"Years_They_Have_Known_Each_Other" = c("10", "10", "1", "5", "2", "8", "7", "10", "3", "3", "5"),
"What_They_Have_In_Common" = c("Sports", "Movies", "Computers", "Computers", "Video Games", "Sports", "Movies", "Computers", "Sports", "Sports", "Video Games")
)
sort = Data_I_Have %>% arrange(Node_A, desc(Freq))
Can someone tell me what I am doing wrong? Thanks.
Posted on 2020-11-06 10:20:13
As the last answer in the post you mention concludes:
Data_I_Have %>%
  group_by(Node_A) %>%
  arrange(desc(n()))
# Node_A Node_B X.Place_Where_They_Met Years_They_Have_Known_Each_Other What_They_Have_In_Common
# <chr> <chr> <chr> <chr> <chr>
# 1 John Claude Chicago 10 Sports
# 2 John Peter Boston 10 Movies
# 3 John Tim Seattle 1 Computers
# 4 John Tim Boston 5 Computers
# 5 John Claude Paris 2 Video Games
# 6 Peter Henry Paris 8 Sports
# 7 Tim Kevin Chicago 7 Movies
# 8 Kevin Claude London 10 Computers
# 9 Adam Tim Chicago 3 Sports
# 10 Adam Henry London 3 Sports
# 11 Xavier Claude Paris 5 Video Games
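In the output above the rows are still in the order of the original data (the two Adam rows come after Peter, Tim and Kevin), so `n()` does not appear to have been evaluated per group there. Below is a minimal sketch of one way to make the group size explicit before sorting. It assumes a reduced two-column copy of the question's data; the names `df` and `sorted_by_freq` and the tie-break on `Node_A` are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Reduced copy of the question's data (two columns only, for brevity)
df <- data.frame(
  Node_A = c("John", "John", "John", "John", "John", "Peter",
             "Tim", "Kevin", "Adam", "Adam", "Xavier"),
  Node_B = c("Claude", "Peter", "Tim", "Tim", "Claude", "Henry",
             "Kevin", "Claude", "Tim", "Henry", "Claude")
)

sorted_by_freq <- df %>%
  group_by(Node_A) %>%
  mutate(n = n()) %>%        # store each group's size on every row
  ungroup() %>%
  arrange(desc(n), Node_A)   # most frequent first, ties broken by name
```

With this data the five John rows should come first, followed by the two Adam rows, and then the names that occur only once.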
Posted on 2020-11-06 10:19:35
You need to count the data before you can sort it. You can try:
library(dplyr)
Data_I_Have %>%
  count(Node_A, sort = TRUE) %>%
  left_join(Data_I_Have, by = 'Node_A')
# Node_A n Node_B X.Place_Where_They_Met Years_They_Have_Known_Each_Other What_They_Have_In_Common
#1 John 5 Claude Chicago 10 Sports
#2 John 5 Peter Boston 10 Movies
#3 John 5 Tim Seattle 1 Computers
#4 John 5 Tim Boston 5 Computers
#5 John 5 Claude Paris 2 Video Games
#6 Adam 2 Tim Chicago 3 Sports
#7 Adam 2 Henry London 3 Sports
#8 Kevin 1 Claude London 10 Computers
#9 Peter 1 Henry Paris 8 Sports
#10 Tim 1 Kevin Chicago 7 Movies
#11 Xavier 1 Claude Paris 5 Video Games
Alternatively, we can use add_count instead of count, so that we do not have to join the data.
Data_I_Have %>% add_count(Node_A, sort = TRUE)
If it is not needed, you can drop the n column from the final output.
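As a rough sketch of that last step, the helper column can be dropped with `select(-n)` once the rows are sorted. The reduced two-column data frame `df` and the name `result` are assumptions made for illustration only.

```r
library(dplyr)

# Reduced copy of the question's data (two columns only, for brevity)
df <- data.frame(
  Node_A = c("John", "John", "John", "John", "John", "Peter",
             "Tim", "Kevin", "Adam", "Adam", "Xavier"),
  Node_B = c("Claude", "Peter", "Tim", "Tim", "Claude", "Henry",
             "Kevin", "Claude", "Tim", "Henry", "Claude")
)

result <- df %>%
  add_count(Node_A, sort = TRUE) %>%  # add n per Node_A and sort by it, descending
  select(-n)                          # remove the helper column again
```

The row order is kept after the column is removed, so `result` has the original columns only, sorted by how often each Node_A value occurs.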
https://stackoverflow.com/questions/64707943