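I'm a new, self-taught R user and need some help.
I'm working with a dataset that records where people live and whether that area is classified as metro, regional, or rural, over 7 years (2015-2021). Each individual has a unique ID, and each year is on a new row (i.e. 7 rows per ID). I want to work out how many people stayed in the same place, how many moved, and where they moved to.
I'm really struggling to work out what I need to do to get the desired output, but I assume there is a way to get a summary table with the number of people who have not moved (+- where they are located) and the number of people who have moved (+- where they moved to).
Your assistance would be greatly appreciated.
Dummy dataset: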
library(tidyverse) # provides tribble()
stack <- tribble(
~ID, ~Year, ~Residence, ~Locality,
#--/--/--/----
"a", "2015", "Sydney", "Metro",
"a", "2016", "Sydney", "Metro",
"a", "2017", "Sydney", "Metro",
"a", "2018", "Sydney", "Metro",
"a", "2019", "Sydney", "Metro",
"a", "2020", "Sydney", "Metro",
"a", "2021", "Sydney", "Metro",
"b", "2015", "Sydney", "Metro",
"b", "2016", "Orange", "Regional",
"b", "2017", "Orange", "Regional",
"b", "2018", "Orange", "Regional",
"b", "2019", "Orange", "Regional",
"b", "2020", "Broken Hill", "Rural",
"b", "2021", "Sydney", "Metro",
"c", "2015", "Dubbo", "Regional",
"c", "2016", "Dubbo", "Regional",
"c", "2017", "Dubbo", "Regional",
"c", "2018", "Dubbo", "Regional",
"c", "2019", "Dubbo", "Regional",
"c", "2020", "Dubbo", "Regional",
"c", "2021", "Dubbo", "Regional",
)
Cheers in advance.
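Answered 2022-04-22 01:07:11
You can use the lead function to add columns containing each person's location in the following year. Using mutate with across, you can apply lead to both columns at once. You can then do a row-wise comparison and look for moves before summarising.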
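As a quick illustration of what lead does on its own, here is a minimal sketch. The vector `loc` is made up for this example and is not part of the question's data. Each element is paired with the one that follows it, and the last element has no successor, so it becomes NA.

```r
library(dplyr)

# Hypothetical locations for one person, ordered by year
loc <- c("Sydney", "Orange", "Orange", "Sydney")

# lead() shifts the vector forward by one position; the last value becomes NA
loc_next <- lead(loc)

# TRUE where the following year's location differs from the current one
moved <- loc != loc_next

# Count the moves, ignoring the trailing NA
n_moves <- sum(moved, na.rm = TRUE)
```

The pipeline in this answer does the same thing per ID, which is why the rows with a missing next year are filtered out before counting.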
library(tidyverse)
# Group by individual before applying the lead function
# Apply the lead function to the two listed columns and add "nextyear" as a suffix
# Add a logical column which returns TRUE if any change of residence or locality is detected
# Summarise the data by individual, retaining the location with the max year
stack %>%
  unite(col = "Location", c(Residence, Locality), sep = "-") %>%
  group_by(ID) %>%
  mutate(across(c("Year", "Location"), list(nextyear = lead)),
         Move = Location != Location_nextyear) %>%
  filter(!is.na(Year_nextyear)) %>%
  mutate(nb.of.moves = sum(Move, na.rm = TRUE)) %>%
  slice_max(Year) %>%
  select(ID, last.location = Location_nextyear, nb.of.moves)
# A tibble: 3 x 3
# Groups: ID [3]
ID last.location nb.of.moves
<chr> <chr> <int>
1 a Sydney-Metro 0
2 b Sydney-Metro 3
3 c Dubbo-Regional 0
Answered 2022-04-22 03:12:25
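Here is another tidyverse option, this time using cumsum. We can take the cumulative sum of the change flags to show how many times each individual has moved (if they moved at all). Then we can slice the last row for each person and get a count for each location. The change column indicates how many times they moved. However, it is unclear what you want the final product to look like.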
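To see why the cumulative sum counts moves, here is a small sketch on a single made-up vector (`loc` is hypothetical and not taken from the question's data). The first comparison against lag is NA because there is no previous year, so it is treated as "no change", which is the role the `TRUE ~ FALSE` fallback plays in the case_when version.

```r
library(dplyr)

# Hypothetical locations for one person, ordered by year
loc <- c("Sydney", "Orange", "Orange", "Broken Hill", "Sydney")

# Compare each year with the previous one; the first element has no predecessor
changed <- loc != lag(loc)

# Treat the missing first comparison as "no change"
changed[is.na(changed)] <- FALSE

# Running total of moves; the last value is the total number of moves
change <- cumsum(changed)
```

Taking the last element of `change` corresponds to the `slice(n())` step applied per ID.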
library(tidyverse)
stack %>%
  group_by(ID) %>%
  mutate(
    change = cumsum(case_when(
      paste0(Residence, Locality) != lag(paste0(Residence, Locality)) ~ TRUE,
      TRUE ~ FALSE
    ))
  ) %>%
  slice(n()) %>%
  ungroup %>%
  count(Residence, Locality, change)
Output
Residence Locality change n
<chr> <chr> <int> <int>
1 Dubbo Regional 0 1
2 Sydney Metro 0 1
3 Sydney Metro 3 1
Answered 2022-04-22 07:22:53
Using data.table.
library(data.table)
setDT(stack) # convert to data.table
setorder(stack, ID, Year) # assure rows are in correct order
stack[, rle(paste(Residence, Locality, sep=', ')), by=.(ID)]
## ID lengths values
## 1: a 7 Sydney, Metro
## 2: b 1 Sydney, Metro
## 3: b 4 Orange, Regional
## 4: b 1 Broken Hill, Rural
## 5: b 1 Sydney, Metro
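## 6: c 7 Dubbo, Regional
So, a stayed in Sydney for 7 years; b stayed in Sydney for 1 year, then moved to Orange for 4 years, then moved to Broken Hill for 1 year, then moved back to Sydney for 1 year.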
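For readers unfamiliar with rle, the following base R sketch shows the run-length encoding on one vector, here b's sequence of residences from the dummy data. The variable names `loc`, `runs` and `n_moves` are my own and are used only for illustration.

```r
# b's residences from 2015 to 2021, in year order
loc <- c("Sydney", "Orange", "Orange", "Orange", "Orange",
         "Broken Hill", "Sydney")

# rle() collapses consecutive repeats into (length, value) pairs
runs <- rle(loc)

runs$lengths  # how many consecutive years were spent at each stop
runs$values   # the stops themselves, in order

# Each new run after the first one corresponds to a move
n_moves <- length(runs$lengths) - 1
```

Because rle only merges consecutive repeats, returning to Sydney in 2021 starts a new run and is counted as a separate move.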
To determine how many times each individual moved:
result <- stack[, rle(paste(Residence, Locality, sep=', ')), by=.(ID)]
result[, .(N=.N-1), by=.(ID)]
## ID N
## 1: a 0
## 2: b 3
## 3: c 0
So, a and c did not move at all, and b moved 3 times.
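One possible way to turn these counts into the kind of summary table the question asks for is sketched below. The question does not spell out the exact shape of the desired table, so this is an assumption: the column names `first_loc`, `last_loc`, `status` and `n_people` are my own choices, and the dummy data is rebuilt in compact form so the block runs on its own.

```r
library(data.table)

# Rebuild the dummy data from the question in compact form
stack <- data.table(
  ID        = rep(c("a", "b", "c"), each = 7),
  Year      = rep(as.character(2015:2021), times = 3),
  Residence = c(rep("Sydney", 7),
                "Sydney", rep("Orange", 4), "Broken Hill", "Sydney",
                rep("Dubbo", 7)),
  Locality  = c(rep("Metro", 7),
                "Metro", rep("Regional", 4), "Rural", "Metro",
                rep("Regional", 7))
)
setorder(stack, ID, Year)  # make sure rows are in year order within each ID

# One row per person: first location, last location, and number of moves
per_person <- stack[, .(
  first_loc = paste(Residence[1], Locality[1], sep = ", "),
  last_loc  = paste(Residence[.N], Locality[.N], sep = ", "),
  n_moves   = length(rle(paste(Residence, Locality, sep = ", "))$lengths) - 1L
), by = ID]

# Label each person as having moved or stayed
per_person[, status := fifelse(n_moves > 0, "moved", "stayed")]

# Summary table: number of people by status and final location
summary_tab <- per_person[, .(n_people = .N), by = .(status, last_loc)]
summary_tab
```

Grouping by `first_loc` as well would show where the movers started, if that is closer to what is needed.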
https://stackoverflow.com/questions/71962237