I hope this hasn't been asked before, although I did check carefully.
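Basically, I have a dataset of 21980 rows and 9 columns. Each row is made up of 4 possible values: "other", "anarchy", "stability" and "change". For example, a row: 1, anarchy stability, other. I would like a list that gives me, for each row (object), the lengths of the repeats of each government value (anarchy, other, stability, change).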
To illustrate with the previous row: ID1, other anarchy stability, stability, other stability
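The first element of my big output list would be: "anarchy" = 2,2 (two repeated runs of length 2), "stability" = 1,2 (one single stability and one run of length 2), "other" = 1 (one other), "change" = 0 (no change in this row).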
Basically, I want this for every row of the whole dataset. The code I came up with is below (unfortunately, it doesn't work):
matric
k <- 0
test <- list(rec)
test[[1]]$stability <- 1
test[[1]]$stability <- 2
for (j in 1:length(matric$OBJECTID)) {
  for (i in 2:8) {
    if (matric[j, i] == "stability") {
      while (matric[j, i] == matric[j, i + 1]) {
        k <- k + 1
        biglist[[j]]$stability <- k
        k <- i + k
      }
    }
    if (matric[j, i] == "change") {
      while (matric[j, i] == matric[j, i + 1]) {
        k <- k + 1
        biglist[[j]]$change <- k
        k <- i + k
      }
    }
    if (matric[j, i] == "anarchy") {
      while (matric[j, i] == matric[j, i + 1]) {
        k <- k + 1
        biglist[[j]]$anarchy <- k
        k <- i + k
      }
    }
    if (matric[j, i] == "other") {
      while (matric[j, i] == matric[j, i + 1]) {
        k <- k + 1
        biglist[[j]]$other <- k
        k <- i + k
      }
    }
  }
}
matric is the data.frame. biglist is an empty list of 21980 elements, each of which is a list with four names: "stability", "anarchy", "change" and "other".
Thanks.
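I should also mention that I found an easy way to get the repeats of each value in a row with the function rle(). However, that isn't enough, because at the end of the day what I really need are numbers giving the length of each value's repeats ("anarchy", "change", etc.), so that I can average them afterwards.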
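The output of rle() can in fact be grouped by value to get exactly these numbers. A minimal base-R sketch, assuming the values sit in every column except the first (ID) column; the example row, the helper run_lengths_by_value() and the two-row demo data frame are hypothetical stand-ins, not the question's data:

```r
# Hypothetical row matching the counts in the question:
# anarchy = 2,2; stability = 1,2; other = 1; change = none
row_vals <- c("anarchy", "anarchy", "stability", "other",
              "anarchy", "anarchy", "stability", "stability")

govs <- c("anarchy", "change", "other", "stability")

# Hypothetical helper: rle() finds each run of identical neighbours,
# split() groups the run lengths by value. The factor carries all four
# levels, so values that never occur still get an (empty) entry.
run_lengths_by_value <- function(x, levels = govs) {
  r <- rle(as.character(x))  # as.character() also handles factor columns
  split(r$lengths, factor(r$values, levels = levels))
}

runs <- run_lengths_by_value(row_vals)
runs$anarchy    # 2 2
runs$stability  # 1 2
runs$other      # 1
runs$change     # integer(0)

# Hypothetical two-row stand-in for matric: an ID column plus 8 value columns
demo <- data.frame(
  OBJECTID = 1:2,
  matrix(c(row_vals, rev(row_vals)), nrow = 2, byrow = TRUE),
  stringsAsFactors = FALSE
)

# One list per row, in the same shape as the biglist described above
biglist <- lapply(seq_len(nrow(demo)), function(j)
  run_lengths_by_value(unlist(demo[j, -1])))

# Mean run length of each value per row (NaN where a value is absent)
mean_runs <- t(sapply(biglist, function(r) sapply(r, mean)))
mean_runs
```

Swapping demo for matric should give one such list per row; the split() step is what turns the flat rle() output into per-value numbers that can be averaged.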
Posted on 2019-04-14 06:32:44
Here is a tidyverse solution: we reshape the data into long form, then group and count to summarize runs of consecutive repeated values.
library(tidyverse)
# using sample data from below
df %>%
  # convert to long form to help with grouping & counting
  gather(col, val, -OBJECTID) %>%
  arrange(OBJECTID, col) %>%
  # for each OBJECTID row...
  group_by(OBJECTID) %>%
  # Assign a group to each contiguous set of vals by making
  # a new group whenever val doesn't match the prior one
  mutate(new_grp = val != lag(val, default = ""),
         grp = cumsum(new_grp)) %>%
  ungroup() %>%
  # Count how many in each group & word within each row
  count(OBJECTID, val, grp) %>%
  # Count how many groups of each length by word & row
  count(OBJECTID, val, n) %>%
  rename(grp_length = n,
         count = nn)
# A tibble: 103,432 x 4
OBJECTID val grp_length count
<int> <chr> <int> <int>
1 1 anarchy 1 1
2 1 change 1 1
3 1 change 2 1
4 1 other 1 1
5 1 stability 1 1
6 1 stability 3 1
7 2 anarchy 1 1
8 2 anarchy 2 1
9 2 change 1 1
10 2 change 2 1
# … with 103,422 more rows
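This means object 1 has one "anarchy" run of length 1, one "change" run of length 1 and one of length 2, one "other" run of length 1, and one "stability" run of length 1 and one of length 3.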
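The grouping step in the pipeline (start a new group whenever val differs from lag(val), then number the groups with cumsum()) is the same run detection that rle() performs. A small base-R sketch on row 1 of the sample data below, using c("", head(val, -1)) as a stand-in for dplyr's lag(val, default = ""):

```r
# Row 1 of the sample data below (columns c1 to c9)
val <- c("change", "change", "anarchy", "change", "stability",
         "stability", "stability", "other", "stability")

# Base-R stand-in for lag(val, default = ""): shift everything right by one
prev <- c("", head(val, -1))

# A new run starts wherever a value differs from its predecessor;
# the running total of those starts numbers the runs 1, 2, 3, ...
grp <- cumsum(val != prev)         # 1 1 2 3 4 4 4 5 6

run_len <- tabulate(grp)           # cells per run: 2 1 1 3 1 1
run_val <- val[!duplicated(grp)]   # value of each run

# Same result as rle()
r <- rle(val)
stopifnot(identical(run_len, r$lengths), identical(run_val, r$values))

# Number of runs of each length per value (what the second count() tallies)
table(run_val, run_len)
```

The table agrees with the OBJECTID 1 rows of the output above: change has runs of length 1 and 2, stability has runs of length 1 and 3.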
Sample data:
df_rows <- 21980
df_columns <- 9
set.seed(42)
df <- tibble(
  OBJECTID = rep(1:df_rows, each = df_columns),
  col = rep(paste0("c", 1:df_columns), times = df_rows),
  val = sample(c("other", "anarchy", "stability", "change"),
               size = df_rows * df_columns, replace = TRUE)
) %>% spread(col, val)
> df
# A tibble: 21,980 x 10
OBJECTID c1 c2 c3 c4 c5 c6 c7 c8 c9
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 change change anarchy change stability stability stability other stability
2 2 stability anarchy stability change anarchy anarchy change change other
3 3 anarchy stability change other change change other stability anarchy
4 4 change anarchy change stability change anarchy stability other change
5 5 other other change stability anarchy anarchy other change anarchy
6 6 change change stability change stability anarchy anarchy anarchy change
7 7 other stability stability other anarchy stability stability change change
8 8 stability change other anarchy change stability other other other
9 9 other anarchy other stability other anarchy stability other stability
10 10 other anarchy stability change stability other other other anarchy
# … with 21,970 more rows
Posted on 2019-04-14 05:15:55
Assuming you have a data frame df with 9 columns that look like the following, and that I have understood your question correctly:
str(df)
$ OBJECTID: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5
$ REP1 : chr "anarchy" "change" "stability" "other" ...
$ REP2 : chr "anarchy" "stability" "anarchy" "change" ...
$ REP3 : chr "other" "anarchy" "stability" "anarchy" ...
$ REP4 : chr "change" "stability" "change" "anarchy" ...
$ REP5 : chr "anarchy" "stability" "stability" "other" ...
$ REP6 : chr "other" "anarchy" "stability" "stability" ...
$ REP7 : chr "stability" "stability" "anarchy" "stability" ...
$ REP8 : chr "change" "anatchy" "change" "chang
You can reshape it with tidyr and then count, for each OBJECTID, how many times each government occurs.
library(tidyr)
library(dplyr)  # group_by() and summarize() come from dplyr, not tidyr
df %>%
  gather(rep, gov, 2:9) %>%
  group_by(OBJECTID, gov) %>%
  summarize(count = n())
You'll get something like this:
OBJECTID gov count
1 anarchy 3
1 change 2
1 other 2
1 stability 1
2 anarchy 3
2 change 1
2 stability 4
3 anatchy 2
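These counts are the number of times each value occurs in a row, which equals the sum of that value's run lengths rather than the run lengths themselves. A short base-R check on row 1 of the tidyverse answer's sample data (a different data set from the df used here) illustrates the relationship:

```r
# Row 1 of the sample data from the tidyverse answer
val <- c("change", "change", "anarchy", "change", "stability",
         "stability", "stability", "other", "stability")

# Occurrences per value: what the gather/summarize pipeline counts
occurrences <- table(val)

# Run lengths per value: what the question asks for
r <- rle(val)
runs <- split(r$lengths, r$values)

# Each occurrence count is the sum of that value's run lengths,
# so a count of 3 cannot tell runs of 2 + 1 apart from a single run of 3
run_totals <- sapply(runs, sum)
stopifnot(identical(as.vector(occurrences[names(run_totals)]),
                    unname(run_totals)))
```

For example, change occurs 3 times in that row, but as one run of length 2 and one of length 1.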
https://stackoverflow.com/questions/55671688