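I want to create a new dataset containing the "Other" local authority data zones whose rank is closest to each "Glasgow" local authority data zone, both the nearest higher-ranked and the nearest lower-ranked one. It is selection with replacement, so the same observation can be selected multiple times.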
DataZone<- c("1005247", "1003253", "1003708", "1003158", "1003428",
"1004568", "1008765", "1001122", "1005234")
LocalAuthority<-c("Other", "Glasgow","Glasgow","Glasgow","Glasgow", "Other",
"Glasgow", "Glasgow", "Other")
Rank<-c(1,2,3,4,5,6,7,8,9)
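df<-data.frame(DataZone, LocalAuthority, Rank)
What I want in the new dataset is:
DataZone 1005247: 4 times, because it is closest to 4 Glasgow DZs
DataZone 1004568: 6 times, because it is closest to 4 Glasgow DZs and also closest to 2 further Glasgow DZs
DataZone 1005234: 2 times, because it is closest to 2 Glasgow DZs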
My code gives me a new dataset containing all the relevant DataZones, but it does not allow a data zone to be selected more than once:
library(dplyr)
df<-df[order(df$Rank),]
IncControls = df %>%
mutate(newcol = ifelse(!LocalAuthority=="Glasgow"&
(lag(LocalAuthority)=="Glasgow"|lead(LocalAuthority)=="Glasgow"),1,0) ) %>%
filter(newcol==1)
Posted on 2017-09-07 00:54:18
You need to specify the format of the output, but you could try this:
library(dplyr) # lead() and lag() below are the dplyr versions
Lengths <- rle(LocalAuthority)$lengths
# 1 4 1 2 1
Positions <- cumsum(rle(LocalAuthority)$lengths)
# 1 5 6 8 9
isGlasgow <- rle(LocalAuthority)$values=="Glasgow"
# FALSE TRUE FALSE TRUE FALSE
F <- rep(DataZone[head(Positions[lead(isGlasgow)],-1)], Lengths[isGlasgow])
# "1005247" "1005247" "1005247" "1005247" "1004568" "1004568"
R <- rep(DataZone[tail(Positions[lag(isGlasgow)],-1)], Lengths[isGlasgow])
# "1004568" "1004568" "1004568" "1004568" "1005234" "1005234"
ans <- sort(c(F,R))
ans
# "1004568" "1004568" "1004568" "1004568" "1004568" "1004568" "1005234"
# "1005234" "1005247" "1005247" "1005247" "1005247"
table(ans)
# 1004568 1005234 1005247
#       6       2       4
https://stackoverflow.com/questions/46076751
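If the result is needed as a data frame with repeated rows rather than a character vector, the same selection can be sketched in base R with findInterval(). This is only an illustration, not the answer's own method: the names glasgow, other, below, above and controls are made up for the example, and it assumes the ranks are unique and the data is sorted by Rank.

```r
# Rebuild the example data from the question
DataZone <- c("1005247", "1003253", "1003708", "1003158", "1003428",
              "1004568", "1008765", "1001122", "1005234")
LocalAuthority <- c("Other", "Glasgow", "Glasgow", "Glasgow", "Glasgow", "Other",
                    "Glasgow", "Glasgow", "Other")
Rank <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
df <- data.frame(DataZone, LocalAuthority, Rank, stringsAsFactors = FALSE)
df <- df[order(df$Rank), ]

# Split into the zones to be matched and the candidate controls
# (hypothetical helper names, not from the original post)
glasgow <- df[df$LocalAuthority == "Glasgow", ]
other   <- df[df$LocalAuthority == "Other", ]

# For each Glasgow rank: index of the nearest "Other" row ranked below it
# (0 when there is none), and of the nearest "Other" row ranked above it
below <- findInterval(glasgow$Rank, other$Rank)
above <- below + 1

# Drop indices that fall outside the "Other" rows, keep the repeats
idx <- c(below[below >= 1], above[above <= nrow(other)])

# Indexing with repeated row numbers selects with replacement
controls <- other[sort(idx), ]
table(controls$DataZone)
```

Unlike the rle() approach, this sketch does not rely on run lengths, so it treats each Glasgow row separately; for the example data both give the same counts per DataZone.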