
Question: Counting the lengths of repeated sequences in a list in R

Stack Overflow user

Asked on 2019-04-14 03:20:31

Answers: 2 · Views: 126 · Followers: 0 · Votes: 0

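I hope this hasn't been asked before, although I checked carefully.

Basically, I have a dataset with 21980 rows and 9 columns. Every row is made up of four possible values: "other", "anarchy", "stability" and "change". For example, a row: 1, anarchy stability, other

For each row (object), I want a list that gives me the lengths of the repeats of each government value ("anarchy", "other", "stability", "change").

To illustrate with the previous row: ID1, other anarchy stability, stability, other stability

The first element of my big output list would be: "anarchy" = 2, 2 (there are two repeated sequences of length 2); "stability" = 1, 2 (one single "stability" and one repeat of length 2); "other" = 1 (one "other"); "change" = 0 (there is no "change" in this row).

Basically, I want to get this for every row of the whole dataset. Here is the code I came up with (unfortunately, it doesn't work):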

matric
k <- 0

test <- list(rec)
test[[1]]$stability <- 1
test[[1]]$stability <- 2

for (j in 1:length(matric$OBJECTID)) {
  for (i in 2:8) {
    if (matric[j, i] == "stability") {
      while (matric[j, i] == matric[j, i + 1]) {
        k <- k + 1
        biglist[[j]]$stability <- k
        k <- i + k
      }
    }
    if (matric[j, i] == "change") {
      while (matric[j, i] == matric[j, i + 1]) {
        k <- k + 1
        biglist[[j]]$change <- k
        k <- i + k
      }
    }
    if (matric[j, i] == "anarchy") {
      while (matric[j, i] == matric[j, i + 1]) {
        k <- k + 1
        biglist[[j]]$anarchy <- k
        k <- i + k
      }
    }
    if (matric[j, i] == "other") {
      while (matric[j, i] == matric[j, i + 1]) {
        k <- k + 1
        biglist[[j]]$other <- k
        k <- i + k
      }
    }
  }
}

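matric is a data.frame. biglist is an empty list of 21980 elements, each of which is a list with four names: "stability", "anarchy", "change" and "other".

Thanks.

Also, I should mention that I found a way to easily get the repeats of each value in a row with the function rle(). Still, that doesn't solve it on its own, because at the end of the day what I really need are the numbers corresponding to the repeat lengths of each value ("anarchy", "change", etc.), so that I can average them afterwards.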
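A minimal base-R sketch of filling a list with this shape, using rle() instead of the nested while loops. It assumes the value columns come after an ID column, as in the loop over columns 2 to 9. row_runs() is a hypothetical helper, and the two-row matric below is toy data, not the real dataset.

```r
govs <- c("anarchy", "stability", "other", "change")

# Hypothetical helper: run lengths of each value in one row. The factor with
# all four levels makes every name present, even for values that never occur.
row_runs <- function(x) {
  r <- rle(as.character(x))  # as.character() also drops the column names
  split(r$lengths, factor(r$values, levels = govs))
}

# Toy stand-in for matric: an ID column followed by the value columns
matric <- data.frame(
  OBJECTID = 1:2,
  c1 = c("anarchy", "change"), c2 = c("anarchy", "change"),
  c3 = c("stability", "change"), c4 = c("other", "other"),
  stringsAsFactors = FALSE
)

# One element per row, each a list of four named run-length vectors
biglist <- lapply(seq_len(nrow(matric)), function(j) {
  row_runs(unlist(matric[j, -1]))
})
```

With this helper, a value that does not occur in a row comes out as integer(0) rather than 0. That is convenient if the lengths are averaged later, because an empty vector does not pull a mean towards zero.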
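One way to turn rle() output into numbers that can be averaged is to keep each run as a (value, length) pair, stack the pairs from all rows, and then average by value. This is a sketch only. The two rows are hypothetical, and pooling every run across rows is just one possible definition of "average".

```r
govs <- c("anarchy", "stability", "other", "change")

# Two hypothetical rows (not from the real dataset), one per matrix row
rows <- rbind(
  c("anarchy", "anarchy", "stability", "other"),
  c("change", "change", "change", "other")
)

# rle() on each row, then stack all runs into one data frame
all_runs <- do.call(rbind, lapply(seq_len(nrow(rows)), function(j) {
  r <- rle(rows[j, ])
  data.frame(row = j, value = r$values, length = r$lengths,
             stringsAsFactors = FALSE)
}))

# Mean run length per value, pooled over all rows
# (a value with no runs at all would give NA)
mean_run <- tapply(all_runs$length, factor(all_runs$value, levels = govs), mean)
```

Keeping the row column in all_runs also makes per-row averages possible, for example with tapply(all_runs$length, list(all_runs$row, all_runs$value), mean).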

Tags: count, repeat, find-occurrences

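2 Answers

Stack Overflow user

Posted on 2019-04-14 06:32:44

Here is a tidyverse solution: we gather the data into long form, then group and count to summarize the runs of consecutive repeated values.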

library(tidyverse)
# using sample data from below

df %>%
  # convert to long form to help with grouping & counting
  gather(col, val, -OBJECTID) %>%
  arrange(OBJECTID, col) %>%

  # for each OBJECTID row...
  group_by(OBJECTID) %>%
  # Assign a group to each contiguous set of vals by making
  #   a new group whenever val doesn't match the prior one
  mutate(new_grp = val != lag(val, default = ""),
         grp = cumsum(new_grp)) %>%
  ungroup() %>%

  # Count how many in each group & word within each row
  count(OBJECTID, val, grp) %>%
  # Count how many groups of each length by word & row
  count(OBJECTID, val, n) %>%
  rename(grp_length = n,
         count      = nn)
# A tibble: 103,432 x 4
   OBJECTID val       grp_length count
      <int> <chr>          <int> <int>
 1        1 anarchy            1     1
 2        1 change             1     1
 3        1 change             2     1
 4        1 other              1     1
 5        1 stability          1     1
 6        1 stability          3     1
 7        2 anarchy            1     1
 8        2 anarchy            2     1
 9        2 change             1     1
10        2 change             2     1
# … with 103,422 more rows

This means object 1 has one "anarchy" run of length 1, two "change" runs (of lengths 1 and 2), one "other" run of length 1, and two "stability" runs (of lengths 1 and 3).

Sample data:
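The grouping step in the pipeline works by starting a new group whenever a value differs from the previous one, then taking a cumulative sum. Here is a base-R sketch of that idea on its own, applied to row 1 of the sample data below. Base R stands in for lag() and count() here; this is not the answer's code.

```r
# Row 1 of the sample data, in column order c1..c9
val <- c("change", "change", "anarchy", "change", "stability",
         "stability", "stability", "other", "stability")

# TRUE wherever a value differs from the one before it; the first element is
# compared with "" (like lag(val, default = "")), so it always starts a group
new_grp <- val != c("", head(val, -1))
grp <- cumsum(new_grp)  # run ids: 1 1 2 3 4 4 4 5 6

# Length of each run, and the value each run belongs to
run_len <- as.vector(table(grp))
run_val <- val[!duplicated(grp)]
```

run_len and run_val give the same runs as rle(val): "change" of lengths 2 and 1, "anarchy" of length 1, "stability" of lengths 3 and 1, and "other" of length 1. These match the rows listed for OBJECTID 1 in the result above.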

df_rows <- 21980
df_columns <- 9
set.seed(42)
df <- tibble(
  OBJECTID = rep(1:df_rows, each = df_columns),
  col = rep(paste0("c", 1:df_columns), times = df_rows),
  val = sample(c("other", "anarchy", "stability", "change"),
               size = df_rows * df_columns, replace = TRUE)
) %>% spread(col, val)

> df
# A tibble: 21,980 x 10
   OBJECTID c1        c2        c3        c4        c5        c6        c7        c8        c9       
      <int> <chr>     <chr>     <chr>     <chr>     <chr>     <chr>     <chr>     <chr>     <chr>    
 1        1 change    change    anarchy   change    stability stability stability other     stability
 2        2 stability anarchy   stability change    anarchy   anarchy   change    change    other    
 3        3 anarchy   stability change    other     change    change    other     stability anarchy  
 4        4 change    anarchy   change    stability change    anarchy   stability other     change   
 5        5 other     other     change    stability anarchy   anarchy   other     change    anarchy  
 6        6 change    change    stability change    stability anarchy   anarchy   anarchy   change   
 7        7 other     stability stability other     anarchy   stability stability change    change   
 8        8 stability change    other     anarchy   change    stability other     other     other    
 9        9 other     anarchy   other     stability other     anarchy   stability other     stability
10       10 other     anarchy   stability change    stability other     other     other     anarchy 
# … with 21,970 more rows
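Since the question ultimately asks for average run lengths, the (OBJECTID, val, grp_length, count) result of this answer can be reduced to one mean per object and value, weighting each length by how often it occurs. This is a base-R sketch that rebuilds the OBJECTID 1 rows of that result by hand; it is not part of the original answer.

```r
# The OBJECTID 1 rows of the result tibble, rebuilt by hand
runs <- data.frame(
  OBJECTID   = 1L,
  val        = c("anarchy", "change", "change", "other", "stability", "stability"),
  grp_length = c(1L, 1L, 2L, 1L, 1L, 3L),
  count      = c(1L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Weighted mean run length: sum(length * count) / sum(count) per object and value
runs$total <- runs$grp_length * runs$count
sums <- aggregate(cbind(total, count) ~ OBJECTID + val, data = runs, FUN = sum)
sums$mean_length <- sums$total / sums$count
```

If you stay in the tidyverse, appending group_by(OBJECTID, val) %>% summarise(mean_length = weighted.mean(grp_length, count)) to the answer's pipeline should compute the same quantity.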

Votes: 1

Stack Overflow user

Posted on 2019-04-14 05:15:55

Assuming you have a data frame df with 9 columns like the ones below, and that I have understood your question correctly:

str(df)

 $ OBJECTID: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5
 $ REP1    : chr  "anarchy" "change" "stability" "other" ...
 $ REP2    : chr  "anarchy" "stability" "anarchy" "change" ...
 $ REP3    : chr  "other" "anarchy" "stability" "anarchy" ...
 $ REP4    : chr  "change" "stability" "change" "anarchy" ...
 $ REP5    : chr  "anarchy" "stability" "stability" "other" ...
 $ REP6    : chr  "other" "anarchy" "stability" "stability" ...
 $ REP7    : chr  "stability" "stability" "anarchy" "stability" ...
 $ REP8    : chr  "change" "anatchy" "change" "chang

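You can reshape it with tidyr and count how many times each government value occurs for each OBJECTID:

library(dplyr)  # group_by(), summarize(), n() and %>%
library(tidyr)  # gather()
df %>%
  gather(rep, gov, 2:9) %>%
  group_by(OBJECTID, gov) %>%
  summarize(count = n())

You'll get something like this: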

OBJECTID  gov       count
1        anarchy    3       
1        change     2       
1        other      2       
1        stability  1       
2        anarchy    3       
2        change     1       
2        stability  4       
3        anatchy    2
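Note that this pipeline counts the total occurrences of each value in a row, not the lengths of consecutive runs. The base-R sketch below contrasts the two on one row, borrowed from the first answer's sample data for illustration.

```r
# Row 1 of the first answer's sample data, used only as an illustration
row <- c("change", "change", "anarchy", "change", "stability",
         "stability", "stability", "other", "stability")

# Total occurrences per value (what the gather/summarize pipeline counts)
totals <- table(row)

# Run lengths per value (what the question asks for)
runs <- rle(row)
run_lengths <- split(runs$lengths, runs$values)

# The run lengths of each value always add up to its total count
sums_match <- all(vapply(names(run_lengths),
                         function(v) sum(run_lengths[[v]]) == sum(row == v),
                         logical(1)))
```

Here "change" has a total count of 3 but consists of two runs, of lengths 2 and 1. That difference is exactly what the run-length view adds.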

Votes: 0

Page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/55671688
