我有表格的数据表
ID REGION INCOME_BAND RESIDENCY_YEARS
1 SW Under 5,000 10-15
2 Wales Over 70,000 1-5
3 Center 15,000-19,999 6-9
4 SE 15,000-19,999 15-19
5 North 15,000-19,999 10-15
6 North 15,000-19,999 6-9
创建者
exp = data.table(
ID = c(1,2,3,4,5,6),
REGION=c("SW", "Wales", "Center", "SE", "North", "North"),
INCOME_BAND = c("Under ?5,000", "Over ?70,000", "?15,000-?19,999", "?15,000-?19,999", "?15,000-?19,999","?15,000-?19,999"),
RESIDENCY_YEARS = c("10-15","1-5","6-9","15-19","10-15", "6-9"))
我想把这个转变成
我成功地完成了dcast的大部分工作:
exp.dcast = dcast(exp,ID~REGION+INCOME_BAND+RESIDENCY_YEARS, fun=length,
value.var=c('REGION', 'INCOME_BAND', 'RESIDENCY_YEARS'))
然而,我需要一些帮助创建合理的列标题。目前我有
[“身份证” “REGION.1_Center__15,000-?19,999_6-9” “REGION.1_?15-?19,999_10-15” “REGION.1_?15,000-?19,999_6-9” “REGION.1_SE_15-?19,999_15-19”"REGION.1_SW_Under ?5,000_10-15“"REGION.1_Wales_Over ?70,000_1-5” 收入带1中心15,000-19,999 6-9 收入谱带1_(15,000-?19,999_10-15) 收入谱带1_1_… 收入带1_SE_15-15-19,999-19 "INCOME_BAND.1_SW_Under ?5,000_10-15“ "INCOME_BAND.1_Wales_Over ?70000_1-5“ "RESIDENCY_YEARS.1_Center_?15,000-?19,999_6-9“"RESIDENCY_YEARS.1_North_?15,000-?19,999_10-15”“居住年份1_15,000-?19,999_6-9” 居住年份1_SE_15-15-19 999 15-19 "RESIDENCY_YEARS.1_SW_Under ?5,000_10-15“ "RESIDENCY_YEARS.1_Wales_Over ?70000_1-5“
我希望列的标题是
ID SW Wales Center SE North Under 5,000 Over 70,000 15,000-19,999 1-5 6-9 10-15 15-19
有人能给我建议吗?
发布于 2017-12-04 13:37:42
这个显然很简单的问题不容易回答。因此,我们将一步一步地前进。
首先,OP试图同时重塑多个值列,从而创建了所有可用组合的不必要的交叉积。
为了以同样的方式对待所有值,我们需要首先对所有值列进行melt()
,然后再进行整形:
melt(exp, id.vars = "ID")[, dcast(.SD, ID ~ value, length)]
ID 1-5 10-15 15-19 6-9 ?15,000-?19,999 Center North Over ?70,000 SE SW Under ?5,000 Wales 1: 1 0 1 0 0 0 0 0 0 0 1 1 0 2: 2 1 0 0 0 0 0 0 1 0 0 0 1 3: 3 0 0 0 1 1 1 0 0 0 0 0 0 4: 4 0 0 1 0 1 0 0 0 1 0 0 0 5: 5 0 1 0 0 1 0 1 0 0 0 0 0 6: 6 0 0 0 1 1 0 1 0 0 0 0 0
现在,结果有13列,而不是19列,并且这些列是根据请求的相应值命名的。
不幸的是,列以错误的顺序出现,因为它们是按字母顺序排列的。有两种改变顺序的方法:
重塑后列的顺序更改
setcolorder()
函数重新排序已到位的data.table
列,例如不进行复制:
# define column order = order of values
col_order <- c("North", "Wales", "Center", "SW", "SE", "Under ?5,000", "?15,000-?19,999", "Over ?70,000", "1-5", "6-9", "10-15", "15-19")
melt(exp, id.vars = "ID")[, dcast(.SD, ID ~ value, length)][
# reorder columns
, setcolorder(.SD, c("ID", col_order))]
ID North Wales Center SW SE Under ?5,000 ?15,000-?19,999 Over ?70,000 1-5 6-9 10-15 15-19 1: 1 0 0 0 1 0 1 0 0 0 0 1 0 2: 2 0 1 0 0 0 0 0 1 1 0 0 0 3: 3 0 0 1 0 0 0 1 0 0 1 0 0 4: 4 0 0 0 0 1 0 1 0 0 0 0 1 5: 5 1 0 0 0 0 0 1 0 0 0 1 0 6: 6 1 0 0 0 0 0 1 0 0 1 0 0
现在,所有REGION
列首先出现,然后按照指定的顺序显示INCOME_BAND
和RESIDENCY_YEARS
列。
重塑前设定因子水平
如果将value
转换为具有适当排序因子级别的因素,dcast()
将使用因子级别对列进行排序:
melt(exp, id.vars = "ID")[, value := factor(value, col_order)][
, dcast(.SD, ID ~ value, length)]
ID North Wales Center SW SE Under ?5,000 ?15,000-?19,999 Over ?70,000 1-5 6-9 10-15 15-19 1: 1 0 0 0 1 0 1 0 0 0 0 1 0 2: 2 0 1 0 0 0 0 0 1 1 0 0 0 3: 3 0 0 1 0 0 0 1 0 0 1 0 0 4: 4 0 0 0 0 1 0 1 0 0 0 0 1 5: 5 1 0 0 0 0 0 1 0 0 0 1 0 6: 6 1 0 0 0 0 0 1 0 0 1 0 0
在重塑前设置因子级别-懒散版本
如果按REGION
、INCOME_BAND
和RESIDENCY_YEARS
对列进行分组就足够了,那么我们可以使用一个捷径来避免在col_order
中指定每个值。来自fct_inorder()
包的forcats
函数通过它们在向量中的第一次出现来重新排序因子级别:
melt(exp, id.vars = "ID")[, value := factor(value, col_order)][
, dcast(.SD, ID ~ value, length)]
ID SW Wales Center SE North Under ?5,000 Over ?70,000 ?15,000-?19,999 10-15 1-5 6-9 15-19 1: 1 1 0 0 0 0 1 0 0 1 0 0 0 2: 2 0 1 0 0 0 0 1 0 0 1 0 0 3: 3 0 0 1 0 0 0 0 1 0 0 1 0 4: 4 0 0 0 1 0 0 0 1 0 0 0 1 5: 5 0 0 0 0 1 0 0 1 1 0 0 0 6: 6 0 0 0 0 1 0 0 1 0 0 1 0
这是因为melt()
的输出是由variable
排序的。
melt(exp, id.vars = "ID")
ID variable value 1: 1 REGION SW 2: 2 REGION Wales 3: 3 REGION Center 4: 4 REGION SE 5: 5 REGION North 6: 6 REGION North 7: 1 INCOME\_BAND Under ?5,000 8: 2 INCOME\_BAND Over ?70,000 9: 3 INCOME\_BAND ?15,000-?19,999 10: 4 INCOME\_BAND ?15,000-?19,999 11: 5 INCOME\_BAND ?15,000-?19,999 12: 6 INCOME\_BAND ?15,000-?19,999 13: 1 RESIDENCY\_YEARS 10-15 14: 2 RESIDENCY\_YEARS 1-5 15: 3 RESIDENCY\_YEARS 6-9 16: 4 RESIDENCY\_YEARS 15-19 17: 5 RESIDENCY\_YEARS 10-15 18: 6 RESIDENCY\_YEARS 6-9
https://stackoverflow.com/questions/46224334
复制相似问题