R: Error in `[.data.table`(out, , `:=`(term_id, 1L:.N), by = list(doc_id))

Stack Overflow user
Asked 2022-09-20 19:10:28
1 answer · 28 views · 0 followers · 0 votes

I am working in the R programming language.

I have the following data:

library(udpipe)
library(BTM)

my_data = structure(list(id = 1:8, reviews = c("I guess the employee decided to buy their lunch with my card my card hoping I wouldn't notice but since it took so long to run my car I want to head and check my bank account and sure enough they had bought food on my card that I did not receive leave. Had to demand for and for a refund because they acted like it was my fault and told me the charges are still pending even though they are for 2 different amounts.", 
                                            "I went to McDonald's and they charge me 50 for Big Mac when I only came with 49. The casher told me that I can't read correctly and told me to get glasses. I am file a report on your casher and now I'm mad.", 
                                            "I really think that if you can buy breakfast anytime then I should be able to get a cheeseburger anytime especially since I really don't care for breakfast food. I really like McDonald's food but I preferred tree lunch rather than breakfast. Thank you thank you thank you.", 
                                            "I guess the employee decided to buy their lunch with my card my card hoping I wouldn't notice but since it took so long to run my car I want to head and check my bank account and sure enough they had bought food on my card that I did not receive leave. Had to demand for and for a refund because they acted like it was my fault and told me the charges are still pending even though they are for 2 different amounts.", 
                                            "Never order McDonald's from Uber or Skip or any delivery service for that matter, most particularly one on Elgin Street and Rideau Street, they never get the order right. Workers at either of these locations don't know how to follow simple instructions. Don't waste your money at these two locations.", 
                                            "Employees left me out in the snow and wouldn’t answer the drive through. They locked the doors and it was freezing. I asked the employee a simple question and they were so stupid they answered a completely different question. Dumb employees and bad food.", 
                                            "McDonalds food was always so good but ever since they add new/more crispy chicken sandwiches it has come out bad. At first I thought oh they must haven't had a good day but every time I go there now it's always soggy, and has no flavor. They need to fix this!!!", 
                                            "I just ordered the new crispy chicken sandwich and I'm very disappointed. Not only did it taste horrible, but it was more bun than chicken. Not at all like the commercial shows. I hate sweet pickles and there were two slices on my sandwich. I wish I could add a photo to show the huge bun and tiny chicken."
)), class = "data.frame", row.names = c(NA, -8L))

I am trying to follow the instructions at https://cran.r-project.org/web/packages/BTM/readme/README.html to create a visualization.

I tried to start the process by running the following code:

udpipe_download_model("english-ewt", model_dir = "~/Desktop/")

eng_model = udpipe_load_model("~/Desktop/english-ewt-ud-2.5-191206.udpipe")

# line with error
out = udpipe(my_data$reviews, object = eng_model)

But now I get this error:

Error in `[.data.table`(out, , `:=`(term_id, 1L:.N), by = list(doc_id)) : 
  Supplied 2 items to be assigned to group 1 of size 0 in column 'term_id'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.
In addition: Warning message:
In strsplit(x$conllu, "\n", fixed = TRUE) : input string 1 is invalid UTF-8

Does anyone know why this error occurs, and what I can do to fix it?

Thanks!

> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252    LC_MONETARY=English_Canada.1252 LC_NUMERIC=C                    LC_TIME=English_Canada.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BTM_0.3.6    udpipe_0.8.9

loaded via a namespace (and not attached):
[1] compiler_4.1.3    Matrix_1.4-0      tools_4.1.3       Rcpp_1.0.8.3      tinytex_0.40      grid_4.1.3        data.table_1.14.2 xfun_0.30         lattice_0.20-45  

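1 Answer

Stack Overflow user

Accepted answer

Answered 2022-09-21 09:18:49

This happens because the text you pass to udpipe(my_data$reviews, object = eng_model) is not UTF-8 encoded, which the udpipe documentation requires. If your text is in latin1, the following may work.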

out = udpipe(iconv(my_data$reviews, from = 'latin1', to = 'UTF-8'), object = eng_model)
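
To check whether the encoding really is the problem before calling udpipe(), something like the sketch below may help. It uses only base R (rawToChar, validUTF8, iconv). The sample string is made up for illustration, and "CP1252" is an assumption based on the English_Canada.1252 locale shown in the question's sessionInfo(); your actual source encoding may differ.

```r
# Hypothetical sample string: "wouldn't" with a Windows-1252 curly
# apostrophe (byte 0x92), similar to the one in the sixth review.
x <- rawToChar(as.raw(c(0x77, 0x6f, 0x75, 0x6c, 0x64, 0x6e, 0x92, 0x74)))

# validUTF8() reports, per element, whether the bytes form valid UTF-8.
validUTF8(x)

# Convert from the assumed source encoding (CP1252) to UTF-8.
x_utf8 <- iconv(x, from = "CP1252", to = "UTF-8")
validUTF8(x_utf8)

# The same check on a whole vector lists the elements that are not valid UTF-8.
reviews <- c("plain ASCII text", x)
which(!validUTF8(reviews))
```

Running which(!validUTF8(my_data$reviews)) on the real data should point at the reviews that need converting, which is a quick way to confirm the diagnosis before changing anything else.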
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/73791569
