首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >数据表dcast列标题

数据表dcast列标题
EN

Stack Overflow用户
提问于 2017-09-14 16:37:37
回答 1查看 348关注 0票数 0

我有表格的数据表

代码语言:javascript
运行
复制
ID  REGION  INCOME_BAND RESIDENCY_YEARS
1   SW  Under 5,000 10-15
2   Wales   Over 70,000 1-5
3   Center  15,000-19,999   6-9
4   SE  15,000-19,999   15-19
5   North   15,000-19,999   10-15
6   North   15,000-19,999   6-9

创建者

代码语言:javascript
运行
复制
exp = data.table(
  ID = c(1,2,3,4,5,6),
  REGION=c("SW", "Wales", "Center", "SE", "North", "North"),
  INCOME_BAND = c("Under ?5,000", "Over ?70,000", "?15,000-?19,999", "?15,000-?19,999", "?15,000-?19,999","?15,000-?19,999"),
  RESIDENCY_YEARS = c("10-15","1-5","6-9","15-19","10-15", "6-9"))

我想把这个转变成

我成功地完成了dcast的大部分工作:

代码语言:javascript
运行
复制
exp.dcast = dcast(exp,ID~REGION+INCOME_BAND+RESIDENCY_YEARS, fun=length,
  value.var=c('REGION', 'INCOME_BAND', 'RESIDENCY_YEARS'))

然而,我需要一些帮助创建合理的列标题。目前我有

[“身份证” “REGION.1_Center__15,000-?19,999_6-9” “REGION.1_?15-?19,999_10-15” “REGION.1_?15,000-?19,999_6-9” “REGION.1_SE_15-?19,999_15-19”"REGION.1_SW_Under ?5,000_10-15“"REGION.1_Wales_Over ?70,000_1-5” 收入带1中心15,000-19,999 6-9 收入谱带1_(15,000-?19,999_10-15) 收入谱带1_1_… 收入带1_SE_15-15-19,999-19 "INCOME_BAND.1_SW_Under ?5,000_10-15“ "INCOME_BAND.1_Wales_Over ?70000_1-5“ "RESIDENCY_YEARS.1_Center_?15,000-?19,999_6-9“"RESIDENCY_YEARS.1_North_?15,000-?19,999_10-15”“居住年份1_15,000-?19,999_6-9” 居住年份1_SE_15-15-19 999 15-19 "RESIDENCY_YEARS.1_SW_Under ?5,000_10-15“ "RESIDENCY_YEARS.1_Wales_Over ?70000_1-5“

我希望列的标题是

代码语言:javascript
运行
复制
ID  SW  Wales   Center  SE  North   Under 5,000 Over 70,000 15,000-19,999   1-5 6-9 10-15   15-19

有人能给我建议吗?

EN

回答 1

Stack Overflow用户

发布于 2017-12-04 13:37:42

这个显然很简单的问题不容易回答。因此,我们将一步一步地前进。

首先,OP试图同时重塑多个值列,从而创建了所有可用组合的不必要的交叉积。

为了以同样的方式对待所有值,我们需要首先对所有值列进行melt(),然后再进行整形:

代码语言:javascript
运行
复制
melt(exp, id.vars = "ID")[, dcast(.SD, ID ~ value, length)]

ID 1-5 10-15 15-19 6-9 ?15,000-?19,999 Center North Over ?70,000 SE SW Under ?5,000 Wales 1: 1 0 1 0 0 0 0 0 0 0 1 1 0 2: 2 1 0 0 0 0 0 0 1 0 0 0 1 3: 3 0 0 0 1 1 1 0 0 0 0 0 0 4: 4 0 0 1 0 1 0 0 0 1 0 0 0 5: 5 0 1 0 0 1 0 1 0 0 0 0 0 6: 6 0 0 0 1 1 0 1 0 0 0 0 0

现在,结果有13列,而不是19列,并且这些列是根据请求的相应值命名的。

不幸的是,列以错误的顺序出现,因为它们是按字母顺序排列的。有两种改变顺序的方法:

重塑后列的顺序更改

setcolorder()函数重新排序已到位的data.table列,例如不进行复制:

代码语言:javascript
运行
复制
# define column order = order of values
col_order <- c("North", "Wales", "Center", "SW", "SE", "Under ?5,000", "?15,000-?19,999", "Over ?70,000", "1-5", "6-9", "10-15", "15-19")

melt(exp, id.vars = "ID")[, dcast(.SD, ID ~ value, length)][
  # reorder columns
  , setcolorder(.SD, c("ID", col_order))]

ID North Wales Center SW SE Under ?5,000 ?15,000-?19,999 Over ?70,000 1-5 6-9 10-15 15-19 1: 1 0 0 0 1 0 1 0 0 0 0 1 0 2: 2 0 1 0 0 0 0 0 1 1 0 0 0 3: 3 0 0 1 0 0 0 1 0 0 1 0 0 4: 4 0 0 0 0 1 0 1 0 0 0 0 1 5: 5 1 0 0 0 0 0 1 0 0 0 1 0 6: 6 1 0 0 0 0 0 1 0 0 1 0 0

现在,所有REGION列首先出现,然后按照指定的顺序显示INCOME_BANDRESIDENCY_YEARS列。

重塑前设定因子水平

如果将value转换为具有适当排序因子级别的因素,dcast()将使用因子级别对列进行排序:

代码语言:javascript
运行
复制
melt(exp, id.vars = "ID")[, value := factor(value, col_order)][
  , dcast(.SD, ID ~ value, length)]

ID North Wales Center SW SE Under ?5,000 ?15,000-?19,999 Over ?70,000 1-5 6-9 10-15 15-19 1: 1 0 0 0 1 0 1 0 0 0 0 1 0 2: 2 0 1 0 0 0 0 0 1 1 0 0 0 3: 3 0 0 1 0 0 0 1 0 0 1 0 0 4: 4 0 0 0 0 1 0 1 0 0 0 0 1 5: 5 1 0 0 0 0 0 1 0 0 0 1 0 6: 6 1 0 0 0 0 0 1 0 0 1 0 0

在重塑前设置因子级别-懒散版本

如果按REGIONINCOME_BANDRESIDENCY_YEARS对列进行分组就足够了,那么我们可以使用一个捷径来避免在col_order中指定每个值。来自fct_inorder()包的forcats函数通过它们在向量中的第一次出现来重新排序因子级别:

代码语言:javascript
运行
复制
melt(exp, id.vars = "ID")[, value := factor(value, col_order)][
  , dcast(.SD, ID ~ value, length)]

ID SW Wales Center SE North Under ?5,000 Over ?70,000 ?15,000-?19,999 10-15 1-5 6-9 15-19 1: 1 1 0 0 0 0 1 0 0 1 0 0 0 2: 2 0 1 0 0 0 0 1 0 0 1 0 0 3: 3 0 0 1 0 0 0 0 1 0 0 1 0 4: 4 0 0 0 1 0 0 0 1 0 0 0 1 5: 5 0 0 0 0 1 0 0 1 1 0 0 0 6: 6 0 0 0 0 1 0 0 1 0 0 1 0

这是因为melt()的输出是由variable排序的。

代码语言:javascript
运行
复制
melt(exp, id.vars = "ID")

ID variable value 1: 1 REGION SW 2: 2 REGION Wales 3: 3 REGION Center 4: 4 REGION SE 5: 5 REGION North 6: 6 REGION North 7: 1 INCOME\_BAND Under ?5,000 8: 2 INCOME\_BAND Over ?70,000 9: 3 INCOME\_BAND ?15,000-?19,999 10: 4 INCOME\_BAND ?15,000-?19,999 11: 5 INCOME\_BAND ?15,000-?19,999 12: 6 INCOME\_BAND ?15,000-?19,999 13: 1 RESIDENCY\_YEARS 10-15 14: 2 RESIDENCY\_YEARS 1-5 15: 3 RESIDENCY\_YEARS 6-9 16: 4 RESIDENCY\_YEARS 15-19 17: 5 RESIDENCY\_YEARS 10-15 18: 6 RESIDENCY\_YEARS 6-9

票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/46224334

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档