Here is a vector containing values between 0 and 1:
a <- runif(100, 0, 1)
I want to apply the following conversion:
>= 0.975 becomes AA+
<= 0.025 becomes AA-
< 0.975 && > 0.025 becomes AA
a[a >= 0.975] = 'AA+'
sum(a == 'AA+')
3
a[a <= 0.025] = 'AA-'
sum(a == 'AA-')
2
a[a > 0.025 && a < 0.975] = 'AA'
sum(a == 'AA')
100
In other words:
a
[1] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
[16] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
[31] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
[46] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
[61] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
[76] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
[91] "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA"
I am confused about why this happens. Why does AA overwrite the first two conversions?
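Posted on 2018-03-04 04:49:46
Note that once you do this:
a[a >= 0.975] = 'AA+'
the entire vector a is converted to character, which is not really what is wanted. It is better to do this: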
aa <- character(length(a)) # pre-allocate aa
aa[a >= 0.975] <- "AA+"
aa[a > 0.025 & a < 0.975] <- "AA" # note &, not &&
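aa[a <= 0.025] <- "AA-"
Here are some alternatives:
1) cut  cut will work, except that the value 0.975 will be assigned to "AA" (note also that cut returns a factor rather than a character vector):
cut(a, c(0, 0.025, 0.975, 1), labels = c("AA-", "AA", "AA+"))
2) Subscripting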
c("AA-", "AA", "AA+")[1 + (a > 0.025) + (a >= 0.975)]
3) ifelse
ifelse(a <= 0.025, "AA-", ifelse(a < 0.975, "AA", "AA+"))
4) case_when
library(dplyr)
case_when( a <= 0.025 ~ "AA-",
a < 0.975 ~ "AA",
TRUE ~ "AA+")
Posted on 2018-03-04 04:34:25
1) Modified original solution  We need to use a single & rather than &&
a[a > 0.025 & a < 0.975] = 'AA'
table(a)
# a
# AA AA- AA+
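# 92 5 3
2) Explanation (from ?"&")
& and && indicate logical AND, and | and || indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined.
The difference becomes clear once we see that the && condition returns a single element: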
a > 0.025 && a < 0.975
#[1] TRUE
This single TRUE is recycled across the whole index, so every element is replaced with 'AA'.
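The recycling step can be seen in isolation with a minimal sketch. The vector x and the names idx_scalar, idx_vec, y and z are hypothetical and not from the original post. The scalar TRUE stands in for what && returned here; in R 4.3.0 and later, to my knowledge, && with an operand longer than one is an error rather than silently using the first element, so the sketch avoids calling && on a vector.

```r
# Hypothetical three-element vector, one value per target class
x <- c(0.01, 0.50, 0.99)

# A length-one logical index is recycled to the length of the vector,
# so the assignment touches every element
idx_scalar <- TRUE
y <- x
y[idx_scalar] <- "AA"
print(y)   # every element is now "AA"

# The elementwise `&` yields one logical per element,
# so only the matching positions are replaced
idx_vec <- x > 0.025 & x < 0.975
z <- as.character(x)
z[idx_vec] <- "AA"
print(z)   # only the middle element is "AA"
```

The design point is that a logical index is only selective when it has one entry per element; anything shorter is silently recycled.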
But if we use a single &, we get one logical value per element:
a > 0.025 & a < 0.975
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [13] TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
# [25] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
# [37] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [49] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [61] TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [73] TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
# [85] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
# [97] TRUE TRUE TRUE TRUE
3) Alternative solution  If we want a neater approach, there is findInterval
c("AA-", "AA", "AA+")[findInterval(a, c(0, 0.025, 0.975))]
4) replace  Another option is to use replace
library(dplyr) #for chaining
replace(a, a >= 0.975, 'AA+') %>%
replace(., .<= 0.025, 'AA-') %>%
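# note: after the first replace the vector is character, so the later comparisons are done on strings
replace(., . > 0.025 & . < 0.975, 'AA')
Data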
set.seed(42)
a <- runif(100, 0, 1)
a[a >= 0.975] = 'AA+'
a[a <= 0.025] = 'AA-'
https://stackoverflow.com/questions/49091992
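As a cross-check of the approaches discussed in this thread, the sketch below applies four of them to the same numeric vector and verifies that they agree. It assumes a is kept numeric throughout (unlike the Data block above, which has already overwritten part of a with strings). The names labs, by_index, by_ifelse, by_subscript and by_interval are hypothetical. Agreement at the exact boundary values 0.025 and 0.975 is not tested here: findInterval uses left-closed intervals by default, so it would label exactly 0.025 as "AA" rather than "AA-".

```r
set.seed(42)
a <- runif(100, 0, 1)   # stays numeric; results go into separate vectors

labs <- c("AA-", "AA", "AA+")

# Pre-allocated character vector filled with elementwise conditions
by_index <- character(length(a))
by_index[a >= 0.975] <- "AA+"
by_index[a > 0.025 & a < 0.975] <- "AA"   # single &, elementwise
by_index[a <= 0.025] <- "AA-"

# Nested ifelse
by_ifelse <- ifelse(a <= 0.025, "AA-", ifelse(a < 0.975, "AA", "AA+"))

# Subscripting: the sum of the two logicals picks position 1, 2 or 3
by_subscript <- labs[1 + (a > 0.025) + (a >= 0.975)]

# findInterval returns the index of the interval each value falls in
by_interval <- labs[findInterval(a, c(0, 0.025, 0.975))]

# All four approaches should give the same labels on this data
stopifnot(
  identical(by_index, by_ifelse),
  identical(by_index, by_subscript),
  identical(by_index, by_interval)
)
table(by_index)
```

Keeping the numeric input separate from the character output avoids the coercion problem described in the first answer and lets every condition be evaluated on numbers.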