
R - Convert a character vector to a data frame

Stack Overflow user
Asked 2018-06-06 01:44:02
3 answers, 2.4K views, 0 followers, 2 votes

This seems like it should be a fairly simple problem, but I can't seem to find a simple solution.

I have a character vector that looks like this:

my_info <- c("Fruits",
             "North America",
             "Apples",
             "Michigan",
             "Europe",
             "Pomegranates",
             "Greece",
             "Oranges",
             "Italy",
             "Vegetables",
             "North America",
             "Potatoes",
             "Idaho",
             "Avocados",
             "California",
             "Europe",
             "Artichokes",
             "Italy",
             "Meats",
             "North America",
             "Beef",
             "Illinois")

I would like to parse this character vector into a data frame that looks like this:

[Image: screenshot of R console showing the desired data frame]

The lists of food types and regions will always stay the same, but the foods and their locations may change.

food_type <- c("Fruits","Vegetables","Meats")
region <- c("North America","Europe")

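I was thinking I need to use something like str_split, but with food_type and region as some kind of delimiter? I'm not sure how to proceed. The character vector does have an order.

Thanks.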


3 Answers

Stack Overflow user

Answered 2018-06-06 02:07:50

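One solution is to first convert the my_info vector into a matrix using ncol = 4. This splits your vector into a matrix/data frame.

Now you can apply the rules for food_type and region, and swap any food_type or region value that has landed in another column.

Note: I would ask the OP to check the data once. It seems that every 4 elements do not make up a complete row according to the description the OP provided.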

df <- as.data.frame(matrix(my_info, ncol = 4, byrow = TRUE))

names(df) <- c("Foodtype", "Region", "Food", "Location")

food_type <- c("Fruits","Vegetables","Meats")
region <- c("North America","Europe")

t(apply(df,1,function(x){
  for(i in seq_along(x)){
    #One can think of writing a swap function here. 
    if(x[i] %in% region ){
      temp = x[i]
      x[i] = x[2]
      x[2] = temp
    }
    #Swap any food_type wrongly placed in other column
    if(x[i] %in% food_type ){
      temp = x[i]
      x[i] = x[1]
      x[1] = temp
    }

  }
  x
}))


#       Foodtype       Region          Food         Location  
# [1,] "Fruits"       "North America" "Apples"     "Michigan"
# [2,] "Pomegranates" "Europe"        "Greece"     "Oranges" 
# [3,] "Vegetables"   "North America" "Italy"      "Potatoes"
# [4,] "Idaho"        "Europe"        "California" "Avocados"
# [5,] "Meats"        "North America" "Artichokes" "Italy"   
# [6,] "Fruits"       "North America" "Beef"       "Illinois"
# 
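The note about incomplete rows can be checked by counting elements. The sketch below is not part of the original answer; the names `n_total`, `is_header`, and `n_pairs` are introduced here purely for illustration.

```r
my_info <- c("Fruits", "North America", "Apples", "Michigan",
             "Europe", "Pomegranates", "Greece", "Oranges", "Italy",
             "Vegetables", "North America", "Potatoes", "Idaho",
             "Avocados", "California", "Europe", "Artichokes", "Italy",
             "Meats", "North America", "Beef", "Illinois")
food_type <- c("Fruits", "Vegetables", "Meats")
region <- c("North America", "Europe")

# The vector has 22 elements, which is not a multiple of 4, so
# matrix(my_info, ncol = 4, byrow = TRUE) has to recycle values
# (with a warning) to fill its last row.
n_total <- length(my_info)
n_total %% 4

# Food types and regions act as headers that appear only once per group;
# everything else comes in (food, location) pairs.
is_header <- my_info %in% c(food_type, region)
n_pairs <- sum(!is_header) / 2
n_pairs
```

The recycling is why the last row of the output above starts again with "Fruits" and "North America". The expected result has one row per (food, location) pair rather than one row per four elements.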
Votes: 2

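Stack Overflow user

Answered 2018-06-06 03:51:55

I have a rather long solution, but it should work as long as the foods and locations always stay in the same order.

First, create some data.frames with dplyr.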

library(dplyr)

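# tibble() replaces the deprecated data_frame().
# Note: the next two lines overwrite the original `region` and `food_type` vectors.
info <- tibble(my_info = my_info)
region <- tibble(region_id = region, region = region)
food_type <- tibble(food_type_id = food_type, food_type = food_type)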

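Next, create a data.frame that joins all of these together, use tidyr to fill in the missing values, and drop the rows we don't need. The most important trick is the last step, which creates a cols column based on the assumption that the order is always the same.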

library(tidyr)

df <- info %>% 
  left_join(food_type, by = c("my_info" = "food_type_id")) %>% 
  left_join(region, by = c("my_info" = "region_id")) %>% 
  fill(food_type) %>% 
  group_by(food_type) %>% 
  fill(region) %>% 
  filter(!is.na(region) & !(my_info == region)) %>% 
  ungroup %>% 
  mutate(cols = rep(c("food", "location"), group_size(.)/2 ))

This returns:

# A tibble: 14 x 4
   my_info      food_type  region        cols    
   <chr>        <chr>      <chr>         <chr>   
 1 Apples       Fruits     North America food    
 2 Michigan     Fruits     North America location
 3 Pomegranates Fruits     Europe        food    
 4 Greece       Fruits     Europe        location
 5 Oranges      Fruits     Europe        food    
 6 Italy        Fruits     Europe        location
 7 Beef         Meats      North America food    
 8 Illinois     Meats      North America location
 9 Potatoes     Vegetables North America food    
10 Idaho        Vegetables North America location
11 Avocados     Vegetables North America food    
12 California   Vegetables North America location
13 Artichokes   Vegetables Europe        food    
14 Italy        Vegetables Europe        location

Next, use tidyr to spread cols into food and location columns.

df <- df %>%
  group_by(food_type, region, cols) %>%
  mutate(ind = row_number()) %>% 
  spread(cols, my_info) %>% 
  select(-ind)

# A tibble: 7 x 4
# Groups:   food_type, region [5]
  food_type  region        food         location  
  <chr>      <chr>         <chr>        <chr>     
1 Fruits     Europe        Pomegranates Greece    
2 Fruits     Europe        Oranges      Italy     
3 Fruits     North America Apples       Michigan  
4 Meats      North America Beef         Illinois  
5 Vegetables Europe        Artichokes   Italy     
6 Vegetables North America Potatoes     Idaho     
7 Vegetables North America Avocados     California

This can all be done in one go; just remove the intermediate steps that create data.frames.
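A single-pipeline version might look like the sketch below. It is not the answerer's code: it assumes dplyr and tidyr version 1.0 or later, swaps the joins for `if_else()` flags, and swaps `spread()` for `pivot_wider()`. The lookup vectors are renamed `food_types` and `regions` (names chosen here) so they do not clash with the column names, and `pair` is a helper column introduced only to keep each food with its location.

```r
library(dplyr)
library(tidyr)

my_info <- c("Fruits", "North America", "Apples", "Michigan",
             "Europe", "Pomegranates", "Greece", "Oranges", "Italy",
             "Vegetables", "North America", "Potatoes", "Idaho",
             "Avocados", "California", "Europe", "Artichokes", "Italy",
             "Meats", "North America", "Beef", "Illinois")
food_types <- c("Fruits", "Vegetables", "Meats")   # renamed for this sketch
regions <- c("North America", "Europe")            # renamed for this sketch

res <- tibble(my_info = my_info) %>%
  # Mark header rows, leaving NA everywhere else
  mutate(food_type = if_else(my_info %in% food_types, my_info, NA_character_),
         region = if_else(my_info %in% regions, my_info, NA_character_)) %>%
  # Carry each header down to the rows that follow it
  fill(food_type) %>%
  group_by(food_type) %>%
  fill(region) %>%
  ungroup() %>%
  # Drop the header rows themselves
  filter(!is.na(region), my_info != region) %>%
  # Remaining rows alternate food, location; number each pair
  mutate(cols = rep(c("food", "location"), length.out = n()),
         pair = (row_number() + 1) %/% 2) %>%
  pivot_wider(names_from = cols, values_from = my_info) %>%
  select(-pair)

res
```

Like the original, this relies on every food being immediately followed by its location.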

Votes: 0

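Stack Overflow user

Answered 2018-06-06 04:54:45

Here are three alternatives. They all use forward filling from zoo (na.locf0, or na.locf in the second alternative) and the cn vector, which is defined only in the first.

1) Let cn be a vector of the same length as my_info that identifies which output column number each element of my_info belongs to. Let cdef be the output column definition vector 1:4, with the output column names as its names. Then, for each output column, create a vector of the same length as my_info that holds the elements belonging to that column and NA for all other elements. Finally, use na.locf0 to fill in the NA values and take the elements at the positions corresponding to column 4.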

library(zoo)

cn <- (my_info %in% food_type) + 2 * (my_info %in% region)
cn[cn == 0] <- 3:4

cdef <- c(food_type = 1, region = 2, food = 3, location = 4)

m <- sapply(cdef, function(i) na.locf0(ifelse(cn == i, my_info, NA))[cn == 4])

giving:

> m
     food_type    region          food           location    
[1,] "Fruits"     "North America" "Apples"       "Michigan"  
[2,] "Fruits"     "Europe"        "Pomegranates" "Greece"    
[3,] "Fruits"     "Europe"        "Oranges"      "Italy"     
[4,] "Vegetables" "North America" "Potatoes"     "Idaho"     
[5,] "Vegetables" "North America" "Avocados"     "California"
[6,] "Vegetables" "Europe"        "Artichokes"   "Italy"     
[7,] "Meats"      "North America" "Beef"         "Illinois"  
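To see how cn and the forward fill interact, here is a smaller base R sketch on a 7-element input. The vector `small` and the helper `fill_forward` are made up for this illustration; `fill_forward` is only a stand-in for zoo's na.locf0 so that the example runs without extra packages.

```r
food_type <- c("Fruits", "Vegetables", "Meats")
region <- c("North America", "Europe")
small <- c("Fruits", "North America", "Apples", "Michigan",
           "Europe", "Oranges", "Italy")

# 1 = food type, 2 = region, 0 = anything else
cn <- (small %in% food_type) + 2 * (small %in% region)
# The remaining elements alternate food (3), location (4); 3:4 is recycled
cn[cn == 0] <- 3:4
cn

# Stand-in for zoo::na.locf0: carry the last non-NA value forward,
# keeping leading NAs as NA
fill_forward <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))
  x[replace(idx, idx == 0, NA)]
}

cdef <- c(food_type = 1, region = 2, food = 3, location = 4)
# For each output column: keep its own elements, fill the gaps forward,
# then read the values off at the location positions (one per output row)
m_small <- sapply(cdef, function(i) fill_forward(ifelse(cn == i, small, NA))[cn == 4])
m_small
```

Each location position marks the end of one output row, which is why indexing by `cn == 4` yields exactly one row per (food, location) pair.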

We created character matrix output because the output is entirely character, but if you want a data frame anyway, use:

as.data.frame(m, stringsAsFactors = FALSE)

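2) Alternatively, we can create m from cn by placing my_info[i] into position (i, cn[i]) of an n x 4 matrix of NAs, using na.locf to fill in the NAs, and taking the rows that correspond to column 4.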

n <- length(my_info)
m2 <- na.locf(replace(matrix(NA, n, 4), cbind(1:n, cn), my_info))[cn == 4, ]
colnames(m2) <- c("food_type", "region", "food", "location")

identical(m2, m) # test
## [1] TRUE
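The key step in this alternative is indexing a matrix with a two-column matrix of (row, column) positions. A minimal base R sketch of that idea follows, again on a made-up 7-element input, with the hypothetical helper `fill_forward` standing in for zoo's forward fill.

```r
food_type <- c("Fruits", "Vegetables", "Meats")
region <- c("North America", "Europe")
small <- c("Fruits", "North America", "Apples", "Michigan",
           "Europe", "Oranges", "Italy")

# Output column number for each element, built as in alternative 1
cn <- (small %in% food_type) + 2 * (small %in% region)
cn[cn == 0] <- 3:4

n <- length(small)
mat <- matrix(NA_character_, nrow = n, ncol = 4)
# Each row of cbind(1:n, cn) is one (row, column) position,
# so element i of `small` lands in row i, column cn[i]
mat[cbind(1:n, cn)] <- small

# Stand-in for zoo's forward fill, applied to one column at a time
fill_forward <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))
  x[replace(idx, idx == 0, NA)]
}
filled <- apply(mat, 2, fill_forward)

# Rows that end at a location are the complete output rows
res_small <- filled[cn == 4, ]
res_small
```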

3) A third alternative for creating m from cn is to construct the matrix column by column, as follows:

m3 <- cbind( food_type = na.locf0(ifelse(cn == 1, my_info, NA))[cn == 3], 
        region = na.locf0(ifelse(cn == 2, my_info, NA))[cn == 3], 
        food = my_info[cn == 3], 
        location = my_info[cn == 4])

identical(m, m3) # test
## [1] TRUE
Votes: 0
Page content originally provided by Stack Overflow, with translation support from Tencent Cloud.
Original link: https://stackoverflow.com/questions/50706041
