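I have a factor with many industry names. I need to collapse them into primary categories and industries. Because I let respondents answer in free text, I have a bloated number of levels (e.g. "financial services", "Financial services", "banking", "finance"). Since the spelling and case do not match, each variant shows up as its own level, so I am trying to collapse them with forcats: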
test <- fct_collapse(PrescreenF$Industry, Finance = c("Banking",
"Corporate Finance", "Finance", "Financial", "financial services",
"financial services", "Financial Services", "Financial services"),
NULL = "H")
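I get a warning that "Financial services" is an unknown level. This is very frustrating, because when I print the vector I can see that it is there. I have tried copying and pasting the exact text into the call and retyping it, but there seem to be hidden characters that prevent the match.
How do I collapse these values correctly?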
> test$industry
Banking
Corporate Finance
Finance Financial
financial services
financial services
Financial Services
Financial services
When I get to the last level, "Financial services", it tells me it is an unknown string.
Edit: output of dput(x$Industry)
structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L,
4L, 3L, 3L, 3L, 5L, 7L, 8L, 9L, 10L, 11L, 12L, 12L, 13L, 14L,
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L,
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L,
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 17L, 18L, 18L, 18L,
18L, 19L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 25L, 26L, 27L, 28L
), .Label = c("", "{\"ImportId\":\"QID8_TEXT\"}", "Finance",
"Financial ", "Financial services ", "Please indicate the industry you work in (e.g. technology, healthcare etc):",
"Cleantech", "Delivery", "e-commerce/fashion", "Food", "Food & Bev",
"Retail", "Service", "tech", "technology", "Technology", "IT, technology",
"Software", "Technology ", "Tehcnology", "Consulting", "Digital advertising",
"Education", "Higher education", "Technology, management consulting",
"University professor; teaching, research and service", "Information Technology and Services",
"mobile technology"), class = "factor")
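Edit: Figured it out. Some of the terms have extra whitespace at the end. For example, when I call Prescreen$Industry it returns names such as Banking and Corporate Finance, but it does not show that there is a space after the level. The level is actually "Banking " rather than "Banking", and the printed output in R does not reveal that. How can I make this visible so that it does not happen again?
Can I run a length function over the column? If so, how would that work (e.g. for "Banking")?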
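One way to make the stray whitespace visible is sketched below in base R. The `industry` factor is made-up illustration data, not the asker's real column. `nchar()` plays the role of a string-length function, and `levels()` prints each level in quotes, so a trailing space shows up inside the closing quote.

```r
# Hypothetical example data: "Banking " and "Financial " end with a space.
industry <- factor(c("Banking ", "Finance", "Financial ", "Finance"))

# Printing a factor shows its values without quotes, which hides trailing spaces.
# levels() returns a character vector, which prints with quotes around each entry.
lv <- levels(industry)
print(lv)

# Wrapping each level in brackets is another way to expose the padding.
print(sprintf("[%s]", lv))

# nchar() gives the number of characters in each level, spaces included.
lens <- nchar(lv)
print(data.frame(level = lv, n_chars = lens))

# Levels that change when trimmed are the ones carrying extra whitespace.
padded <- lv[lv != trimws(lv)]
print(padded)
```

Running the last check right after importing the data is a cheap way to catch padded levels before any collapsing is attempted.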
Answered 2017-10-05 21:14:19
If "x" is your data frame:
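library(stringr)
# Convert the factor to character so the strings can be edited
x$industry <- as.character(x$industry)
# Remove leading and trailing whitespace from every value
x$industry <- str_trim(x$industry)
# Convert back to a factor; values that are now identical share one level
x$industry <- as.factor(x$industry)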
Then you can go back to fct_collapse() to simplify your factor.
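The whole sequence can be sketched end to end as follows. This is a minimal illustration on made-up data (the `industry` vector is hypothetical), and it swaps in base R's `trimws()` applied to the levels directly instead of the `str_trim()` round trip above; either should work. Assigning trimmed names back to `levels()` merges any levels that become identical.

```r
library(forcats)

# Hypothetical data with the same kind of trailing whitespace as in the question.
industry <- factor(c("Banking ", "Finance", "Financial ", "Financial services ",
                     "financial services", "Technology", "Technology "))

# Trim the level names in place; duplicated names are merged into one level.
levels(industry) <- trimws(levels(industry))

# With the whitespace gone, the names passed to fct_collapse() match exactly.
collapsed <- fct_collapse(
  industry,
  Finance = c("Banking", "Finance", "Financial",
              "Financial services", "financial services")
)

print(levels(collapsed))
print(table(collapsed))
```

Trimming the levels rather than the values touches each distinct string only once, which is convenient when the column is long but has few distinct entries.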
https://stackoverflow.com/questions/46592154