df2 <- structure(list(location = c("Dayton", "Toledo"), total_voters = c(236L,
332L), candidate_1 = c(49L, 61L), candidate_2 = c(33L, 78L),
candidate_3 = c(19L, 71L), candidate_5 = c(42L, 52L)), row.names = c(NA,
-2L), class = "data.frame")
I have data from a SQL query that is shaped like this:
+----------+--------------+-------------+-------------+-------------+-------------+-------------+
| location | total_voters | candidate_1 | candidate_2 | candidate_3 | candidate_4 | candidate_5 |
+----------+--------------+-------------+-------------+-------------+-------------+-------------+
| Dayton | 236 | 49 | 33 | 19 | 93 | 42 |
| Toledo | 332 | 61 | 78 | 71 | 70 | 52 |
+----------+--------------+-------------+-------------+-------------+-------------+-------------+
These numbers are the vote counts for each candidate. What I want is to use R (I imagine via dplyr or tidyr) to pivot this data so that it looks like this:
+-------------+-------+----------+--------------+
| candidate | votes | location | total_voters |
+-------------+-------+----------+--------------+
| candidate_1 | 49 | Dayton | 236 |
| candidate_2 | 33 | Dayton | 236 |
| candidate_3 | 19 | Dayton | 236 |
| candidate_4 | 93 | Dayton | 236 |
| candidate_5 | 42 | Dayton | 236 |
| candidate_1 | 61 | Toledo | 332 |
| candidate_2 | 78 | Toledo | 332 |
| candidate_3 | 71 | Toledo | 332 |
| candidate_4 | 70 | Toledo | 332 |
| candidate_5 | 52 | Toledo | 332 |
+-------------+-------+----------+--------------+
What is the most efficient way to do this in R?
Answered 2020-02-21 06:53:58
Here is one option with pivot_longer
library(dplyr)
library(tidyr)
df1 %>%
pivot_longer(cols = everything(), names_to = 'candidate', values_to = 'votes')
# A tibble: 5 x 2
# candidate votes
# <chr> <dbl>
#1 candidate_1 49
#2 candidate_2 33
#3 candidate_3 19
#4 candidate_4 93
#5 candidate_5 42
With the updated data,
df2 %>%
pivot_longer(cols = -c(location, total_voters),
names_to = 'candidate', values_to = 'votes')
# A tibble: 8 x 4
# location total_voters candidate votes
# <chr> <int> <chr> <int>
#1 Dayton 236 candidate_1 49
#2 Dayton 236 candidate_2 33
#3 Dayton 236 candidate_3 19
#4 Dayton 236 candidate_5 42
#5 Toledo 332 candidate_1 61
#6 Toledo 332 candidate_2 78
#7 Toledo 332 candidate_3 71
#8 Toledo 332 candidate_5 52
Or in base R, this can be done with stack
stack(df1)[2:1]
or by converting to a table
as.data.frame.table(as.matrix(df1))[,-1]
or, as @markus suggested,
reshape2::melt(df1)
Data
df1 <- data.frame(candidate_1 = 49, candidate_2 = 33,
candidate_3 = 19, candidate_4 = 93, candidate_5 = 42)
df2 <- structure(list(location = c("Dayton", "Toledo"), total_voters = c(236L,
332L), candidate_1 = c(49L, 61L), candidate_2 = c(33L, 78L),
candidate_3 = c(19L, 71L), candidate_5 = c(42L, 52L)), row.names = c(NA,
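-2L), class = "data.frame")
Answered 2020-02-21 06:53:51
You can actually do this with data.frame + t, i.e.,
dflong <- data.frame(t(dfwide))
Answered 2020-02-21 07:19:02
If the column names are candidate_1, candidate_2, and so on, you can simply use the melt function from the reshape2 package.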
a=data.frame(candidate_1=49,
candidate_2=33,
candidate_3=19,
candidate_4=93,
candidate_5=42)
b=reshape2::melt(a)
https://stackoverflow.com/questions/60329706
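The base R options above are shown on the single-row df1 only. For the updated data that also carries location and total_voters, the stack idea can probably be extended as in the sketch below. It is only an illustration, not something taken from the answers: the names vote_cols, long and res are made up here, and it assumes every vote column starts with candidate_. The data is the df2 from the question, which has no candidate_4 column.

```r
# Sample data copied from the question (candidate_4 is absent in this dput)
df2 <- data.frame(
  location = c("Dayton", "Toledo"),
  total_voters = c(236L, 332L),
  candidate_1 = c(49L, 61L),
  candidate_2 = c(33L, 78L),
  candidate_3 = c(19L, 71L),
  candidate_5 = c(42L, 52L)
)

# Assumption: every vote column is named candidate_<n>
vote_cols <- grep("^candidate_", names(df2), value = TRUE)

# stack() returns the columns `values` and `ind`, filled column by column
long <- stack(df2[vote_cols])

# Repeat the id columns once per stacked candidate column
res <- data.frame(
  candidate    = as.character(long$ind),
  votes        = long$values,
  location     = rep(df2$location, times = length(vote_cols)),
  total_voters = rep(df2$total_voters, times = length(vote_cols))
)

# Sort by location so the rows follow the layout asked for in the question
res <- res[order(res$location, res$candidate), ]
rownames(res) <- NULL
res
```

This needs no extra packages. For anything beyond a quick one-off, pivot_longer is likely the more readable choice, since it keeps the id columns aligned for you.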