The data.table package plays dirty and picks on honest folks

Author: 邓飞
Published 2020-12-15 14:41:15
Included in the column: 育种数据分析之放飞自我

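Here's what happened. This morning I was happily writing code, using data.table inside a loop to read a batch of files and assign values in bulk. When I ran it, the results were wrong, and the warnings pointed to a type mismatch. I'm writing this up in the hope that it helps whoever runs into the same thing.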

The warnings:

R console:
Warning messages:
1: In set(x, j = name, value = value) :
  Coercing 'character' RHS to 'integer' to match the type of the target column (column 1 named 'Number').
2: In set(x, j = name, value = value) : NAs introduced by coercion

I looked up the data.table documentation:

❝Unlike <- for data.frame, the (potentially large) LHS [Left Hand Side] is not coerced to match the type of the (often small) RHS [Right Hand Side]. Instead the RHS is coerced to match the type of the LHS, if necessary. Where this involves double precision values being coerced to an integer column, a warning is given (whether or not fractional data is truncated). The motivation for this is efficiency. It is best to get the column types correct up front and stick to them. Changing a column type is possible but deliberately harder: provide a whole column as the RHS. This RHS is then plonked into that column slot and we call this plonk syntax, or replace column syntax if you prefer. By needing to construct a full length vector of a new type, you as the user are more aware of what is happening, and it's clearer to readers of your code that you really do intend to change the column type. ❞

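In short: unlike `<-` on a data.frame, data.table coerces the RHS to the type of the existing column, never the other way round. If that coercion cannot succeed, for example when "a1" is forced into an integer column, the result is NA and you get a warning, not an error. There are two ways around this:

1. Make the types agree first. If a numeric column is going to receive character values, convert the column to character, then assign.
2. Supply a full-length RHS, with one value per row. data.table then replaces the whole column instead of coercing the value, and the column takes the new type.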
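Here is a minimal sketch of the rule behind both remedies, using an illustrative table `dt`. It assumes a reasonably recent data.table. A length-1 RHS is recycled into the existing column and coerced to that column's type. A full-length RHS replaces the column ("plonk") and keeps its own type.

```r
library(data.table)

dt <- data.table(x = 1:3)          # x is an integer column

# Length-1 RHS: recycled into the existing column, so the RHS is coerced
# to the column's type (the double 5 is stored as the integer 5L).
# Older data.table versions may also warn here.
dt[, x := 5]
type_after_scalar <- typeof(dt$x)

# Full-length RHS (one value per row): the whole column is replaced,
# so the column takes the type of the new vector.
dt[, x := c(1.5, 2.5, 3.5)]
type_after_full <- typeof(dt$x)

print(c(scalar = type_after_scalar, full = type_after_full))
```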

1. Generate the data

Create a data.table:

R code:
# Build a small data.table: integer, double and character columns
library(data.table)

df = data.table(x = 1:10,y = rnorm(10),z = paste0("ttt",1:10))
df
str(df)
R console:
> df
     x           y     z
 1:  1  0.55319365  ttt1
 2:  2 -0.08265915  ttt2
 3:  3 -1.50851585  ttt3
 4:  4 -0.19653575  ttt4
 5:  5 -1.55555254  ttt5
 6:  6  0.03887365  ttt6
 7:  7  0.36618923  ttt7
 8:  8 -0.93304230  ttt8
 9:  9 -0.24562587  ttt9
10: 10  1.52407895 ttt10
> str(df)
Classes ‘data.table’ and 'data.frame': 10 obs. of  3 variables:
 $ x: int  1 2 3 4 5 6 7 8 9 10
 $ y: num  0.5532 -0.0827 -1.5085 -0.1965 -1.5556 ...
 $ z: chr  "ttt1" "ttt2" "ttt3" "ttt4" ...
 - attr(*, ".internal.selfref")=<externalptr> 

Here x is an integer column, y is a double (numeric) column, and z is a character column.
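The existing column type determines what happens to an assigned value, so it can help to check the types before assigning. This is a small sketch using base R's `vapply()` with `typeof()` on the same example table:

```r
library(data.table)

df <- data.table(x = 1:10, y = rnorm(10), z = paste0("ttt", 1:10))

# Storage type of each column, i.e. the type an assigned
# length-1 value will be coerced to
col_types <- vapply(df, typeof, character(1))
print(col_types)
```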

2. Reproducing the error: setting column x to "a1"

R console:
> df$x = "a1"
Warning messages:
1: In set(x, j = name, value = value) :
  Coercing 'character' RHS to 'integer' to match the type of the target column (column 1 named 'x').
2: In set(x, j = name, value = value) : NAs introduced by coercion

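The warning says the RHS is character while the target column is integer. Because the types don't match, "a1" is coerced to integer, and that produces NA. Note that although this is only a warning, the result is wrong. Look at the data after the assignment: every value in x has become NA. Not playing fair at all!!!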

R console:
> df
     x           y     z
 1: NA  0.55319365  ttt1
 2: NA -0.08265915  ttt2
 3: NA -1.50851585  ttt3
 4: NA -0.19653575  ttt4
 5: NA -1.55555254  ttt5
 6: NA  0.03887365  ttt6
 7: NA  0.36618923  ttt7
 8: NA -0.93304230  ttt8
 9: NA -0.24562587  ttt9
10: NA  1.52407895 ttt10
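The bad assignment only raises a warning, so it is easy to miss inside a loop. The following is a minimal sketch of one way to stop and report it, using base R's `tryCatch()` on an illustrative table `dt`. A blunter alternative is `options(warn = 2)`, which turns every warning into an error.

```r
library(data.table)

dt <- data.table(x = 1:10)   # integer column, as in the example above

# Catch the coercion warning (or an error, in case a data.table version
# signals one) so a loop can react instead of silently keeping NAs.
msg <- tryCatch(
  {
    dt[, x := "a1"]
    NA_character_              # reached only if nothing was signalled
  },
  warning = function(w) conditionMessage(w),
  error   = function(e) conditionMessage(e)
)

if (!is.na(msg)) message("Assignment to x failed: ", msg)
```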

With a data.frame, this problem does not occur:

R code:
df = data.frame(x = 1:10,y = rnorm(10),z = paste0("ttt",1:10))
df
str(df)

df$x = "a1"
df

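As you can see, the column is converted in a flash!!! People say data.table and data.frame are nearly the same, but this small difference is exactly what bites. Shaky fundamentals, pits full of bugs!!! (In the output below, z is a Factor because data.frame() defaulted to stringsAsFactors = TRUE before R 4.0.0.)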

R console:
> df = data.frame(x = 1:10,y = rnorm(10),z = paste0("ttt",1:10))
> df
    x          y     z
1   1 -0.5037848  ttt1
2   2 -1.4766567  ttt2
3   3 -0.1606073  ttt3
4   4 -0.6011270  ttt4
5   5  1.6626815  ttt5
6   6  0.2565216  ttt6
7   7  0.2683151  ttt7
8   8 -2.3469332  ttt8
9   9 -1.6655096  ttt9
10 10  0.3784420 ttt10
> str(df)
'data.frame': 10 obs. of  3 variables:
 $ x: int  1 2 3 4 5 6 7 8 9 10
 $ y: num  -0.504 -1.477 -0.161 -0.601 1.663 ...
 $ z: Factor w/ 10 levels "ttt1","ttt10",..: 1 3 4 5 6 7 8 9 10 2
> df$x = "a1"
> df
    x          y     z
1  a1 -0.5037848  ttt1
2  a1 -1.4766567  ttt2
3  a1 -0.1606073  ttt3
4  a1 -0.6011270  ttt4
5  a1  1.6626815  ttt5
6  a1  0.2565216  ttt6
7  a1  0.2683151  ttt7
8  a1 -2.3469332  ttt8
9  a1 -1.6655096  ttt9
10 a1  0.3784420 ttt10

3. Solution 1: convert column x to character first, then assign

First convert the column to character with df$x = as.character(df$x), then assign:

R code:
df = data.table(x = 1:10,y = rnorm(10),z = paste0("ttt",1:10))
df
str(df)

df$x = as.character(df$x)
df$x = "a1"
df

Done:

R console:
> df$x = as.character(df$x)
> df$x = "a1"
> df
     x          y     z
 1: a1 -0.8852575  ttt1
 2: a1 -0.1708877  ttt2
 3: a1  0.3803468  ttt3
 4: a1  0.4192728  ttt4
 5: a1  1.4413745  ttt5
 6: a1 -0.6828477  ttt6
 7: a1  0.4294502  ttt7
 8: a1 -0.1611874  ttt8
 9: a1 -2.3305019  ttt9
10: a1 -0.1424764 ttt10
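The same fix can be written with data.table's own `:=` operator, which modifies the table by reference instead of going through `$<-`. This sketch mirrors the example above and is not a different method:

```r
library(data.table)

df <- data.table(x = 1:10, y = rnorm(10), z = paste0("ttt", 1:10))

df[, x := as.character(x)]   # full-length RHS: x becomes a character column
df[, x := "a1"]              # character into character: recycled, no NA
str(df)
```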

4. Solution 2: make the assigned value as long as the column

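Make the assigned value the same length as the column, one value per row: df$x = rep("a1",dim(df)[1]). A full-length RHS replaces the column instead of being coerced into it.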

R code:
df = data.table(x = 1:10,y = rnorm(10),z = paste0("ttt",1:10))
str(df)

df$x = rep("a1",dim(df)[1])
df

That works too:

R console:
> df = data.table(x = 1:10,y = rnorm(10),z = paste0("ttt",1:10))
> str(df)
Classes ‘data.table’ and 'data.frame': 10 obs. of  3 variables:
 $ x: int  1 2 3 4 5 6 7 8 9 10
 $ y: num  1.425 0.0537 0.219 1.8867 -0.1562 ...
 $ z: chr  "ttt1" "ttt2" "ttt3" "ttt4" ...
 - attr(*, ".internal.selfref")=<externalptr> 
> df$x = rep("a1",dim(df)[1])
> df
     x           y     z
 1: a1  1.42502710  ttt1
 2: a1  0.05370049  ttt2
 3: a1  0.21899323  ttt3
 4: a1  1.88674618  ttt4
 5: a1 -0.15622174  ttt5
 6: a1  0.43704146  ttt6
 7: a1  1.31103082  ttt7
 8: a1 -0.09496113  ttt8
 9: a1  0.33710145  ttt9
10: a1 -0.05053140 ttt10
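The same full-length trick can be written more idiomatically. Inside `[...]`, `.N` is the number of rows. Outside it, data.table's `set()` takes the column and value directly, which can be convenient in a loop. A sketch with illustrative tables:

```r
library(data.table)

df <- data.table(x = 1:10, y = rnorm(10), z = paste0("ttt", 1:10))

# rep("a1", .N) is a full-length RHS, so column x is replaced
# rather than "a1" being coerced to integer
df[, x := rep("a1", .N)]

# The same idea with set(); no i is given, so the whole column is replaced
df2 <- data.table(x = 1:10)
set(df2, j = "x", value = rep("a1", nrow(df2)))
```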

5. Assigning a string to a numeric column fails; assigning a number to a character column works

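Isn't this blatant discrimination!!! Assigning a number to a character column with df$z = 123 runs just fine. The same rule is actually at work: the RHS is coerced to the column's type. Turning 123 into the string "123" can never fail, so nothing becomes NA, and z stays a character column.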

R code:
df = data.table(x = 1:10,y = rnorm(10),z = paste0("ttt",1:10))
str(df)

df$z = 123
df

The result:

R console:
> df = data.table(x = 1:10,y = rnorm(10),z = paste0("ttt",1:10))
> str(df)
Classes ‘data.table’ and 'data.frame': 10 obs. of  3 variables:
 $ x: int  1 2 3 4 5 6 7 8 9 10
 $ y: num  0.148 -0.795 1.16 0.375 0.765 ...
 $ z: chr  "ttt1" "ttt2" "ttt3" "ttt4" ...
 - attr(*, ".internal.selfref")=<externalptr> 
> df$z = 123
> df
     x          y   z
 1:  1  0.1484868 123
 2:  2 -0.7951205 123
 3:  3  1.1601522 123
 4:  4  0.3751982 123
 5:  5  0.7651195 123
 6:  6  0.7172938 123
 7:  7  1.6518403 123
 8:  8  0.3031258 123
 9:  9 -1.3506003 123
10: 10  1.4655129 123
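The printed table does not show the column type, so here is a quick check with the same setup. The number should have been coerced to the column's type, leaving z a character column that holds the string "123":

```r
library(data.table)

df <- data.table(x = 1:10, y = rnorm(10), z = paste0("ttt", 1:10))
df$z = 123

typeof(df$z)   # the column's type after the assignment
df$z[1]        # the stored value
```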

6. data.table plays dirty and picks on honest folks

Still, I'm going to keep using it, because it really is great!!!

My skills are shaky and the bugs are everywhere, so I'll keep filling in the pits.

This article is shared from the WeChat public account 育种数据分析之放飞自我. Originally published 2020-12-10.
