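I need to split the "value" variable in the dataset below into three variables: an estimate, a low value, and a high value. Note that sometimes there is no confidence interval, so I only have the value.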
country gho year publishstate value
Afghanistan Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate) 1980 Published 4.9 [2.5-8.6]
Afghanistan Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate) 1981 Published 5.1 [2.7-8.5]
Afghanistan Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate) 1982 Published 5.2 [2.9-8.5]
Afghanistan Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate) 1983 Published 5.4 [3.1-8.6]
I tried this:
Data$estimate <- sub("\\[.*","",Data$value)
but it only works for creating the estimate variable. I was thinking of using strsplit, but that doesn't work either.
Can you help?
Many thanks,
N.
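The sub() attempt above already isolates the estimate; the same idea can be extended to the bounds with capture groups. A minimal base R sketch, assuming value is a character column whose entries look like "4.9 [2.5-8.6]" or just "4.9". The sample data below, including the row without an interval, is made up for illustration.

```r
# Hypothetical sample data: the third row has no confidence interval.
Data <- data.frame(
  value = c("4.9 [2.5-8.6]", "5.1 [2.7-8.5]", "6.0"),
  stringsAsFactors = FALSE
)

# Estimate: drop everything from the first "[" on (the original sub() idea),
# trim the leftover space and convert to a number.
Data$estimate <- as.numeric(trimws(sub("\\[.*", "", Data$value)))

# Which rows actually contain an interval of the form "[low-high]"?
has_ci <- grepl("\\[[0-9.]+-[0-9.]+\\]", Data$value)

# Pull the bounds out with capture groups; rows without an interval stay NA.
ci_pattern <- ".*\\[([0-9.]+)-([0-9.]+)\\].*"
Data$low  <- NA_real_
Data$high <- NA_real_
Data$low[has_ci]  <- as.numeric(sub(ci_pattern, "\\1", Data$value[has_ci]))
Data$high[has_ci] <- as.numeric(sub(ci_pattern, "\\2", Data$value[has_ci]))

Data
```

Rows without an interval keep their estimate and get NA for low and high.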
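Posted on 2020-01-27 13:27:52
Using the data shown in reproducible form in the Note at the end, we can use separate as shown. The fill = "right" argument fills lower and upper with NA when only one subfield is present, that is, when there is no confidence interval.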
library(dplyr)
library(tidyr)
DF %>%
separate(value, c("value", "lower", "upper", NA), sep = "[^0-9.]+", fill = "right", convert = TRUE)  # convert = TRUE makes the new columns numeric
Note
Lines <- "country,glucose,year,publishstate,value
Afghanistan,Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate),1980,Published,4.9 [2.5-8.6]
Afghanistan,Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate),1981,Published,5.1 [2.7-8.5]
Afghanistan,Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate),1982,Published,5.2 [2.9-8.5]
Afghanistan,Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate),1983,Published,5.4 [3.1-8.6]"
DF <- read.csv(text = Lines, header = TRUE, as.is = TRUE)
Posted on 2020-01-27 13:40:03
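To see what fill = "right" does in the separate() answer, here is a small sketch on a made-up data frame DF2 (hypothetical, not from the question) whose second row has no interval. The convert = TRUE argument is used so the resulting columns are numeric.

```r
library(tidyr)

# Hypothetical two-row example; the second value has no interval.
DF2 <- data.frame(value = c("4.9 [2.5-8.6]", "6.0"), stringsAsFactors = FALSE)

# Split on runs of characters that are neither digits nor dots.
# "4.9 [2.5-8.6]" gives "4.9", "2.5", "8.6" and an empty piece after "]",
# which the NA entry in the column names discards.
# "6.0" gives a single piece; fill = "right" pads lower and upper with NA.
res <- separate(DF2, value, c("value", "lower", "upper", NA),
                sep = "[^0-9.]+", fill = "right", convert = TRUE)
res
```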
Posted on 2020-01-27 13:40:50
Here is another approach that uses only base R:
lapply(strsplit(Data$value, "[^[:digit:].]"), function(x) as.numeric(x[x != ""]))
# [[1]]
# [1] 4.9 2.5 8.6
#
# [[2]]
# [1] 5.1 2.7 8.5
#
# [[3]]
# [1] 5.2 2.9 8.5
#
# [[4]]
# [1] 5.4 3.1 8.6
https://stackoverflow.com/questions/59931822
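The base R answer returns a list rather than columns. One way to finish the job, sketched under the assumption that each entry holds the estimate first and then the lower and upper bounds, is to pad every element to length 3 with NA and bind the rows. The sample Data below is hypothetical.

```r
# Hypothetical input; the last value has no confidence interval.
Data <- data.frame(value = c("4.9 [2.5-8.6]", "5.1 [2.7-8.5]", "6.0"),
                   stringsAsFactors = FALSE)

# Same split as in the answer: keep the pieces made of digits and dots.
parts <- lapply(strsplit(Data$value, "[^[:digit:].]"),
                function(x) as.numeric(x[x != ""]))

# x[1:3] pads shorter vectors with NA; rbind stacks them into a matrix.
m <- do.call(rbind, lapply(parts, function(x) x[1:3]))

Data$estimate <- m[, 1]
Data$low      <- m[, 2]
Data$high     <- m[, 3]
Data
```

Indexing x[1:3] past the end of a shorter vector returns NA, which is what handles the values without a confidence interval.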