Q: Converting a character vector to a data frame

Stack Overflow user

Asked on 2017-10-22 13:31:56

1 answer, 2.6K views, 0 followers, score 1

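I want to practice web scraping and am using R with the rvest package. I now have a character vector (p_text) with 125 elements and want to convert it into a data frame with 25 rows and 5 columns named q1, opt1, opt2, opt3, opt4.

So elements 1, 6, 11, ... go into column q1; elements 2, 7, 12, ... into opt1; elements 3, 8, 13, ... into opt2; and so on.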

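library(dplyr)   # re-exports the %>% pipe used below
library(rvest)   # read_html(), html_nodes(), html_text()

url <- 'http://upscfever.com/upsc-fever/en/test/en-test-sci1.html'

# Download and parse the page
webpage <- read_html(url)

# Extract the text of every <label> element (questions and answer options)
p_text <- webpage %>%
        html_nodes("label") %>%
        html_text()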

How can I do this?

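1 Answer

Stack Overflow user

Accepted answer

Posted on 2017-10-22 14:11:53

Convert to a matrix first so the elements are arranged correctly, then convert the matrix to a data frame: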

dat <- as.data.frame(matrix(p_text, ncol = 5, byrow = TRUE), stringsAsFactors = FALSE)
names(dat) <- c("q1", "opt1", "opt2", "opt3", "opt4")

str(dat)
## 'data.frame':   25 obs. of  5 variables:
##  $ q1  : chr  "Q1: Energy giving foods are " "Q2:Animal fats are categorized as" "Q3: Which is true" "Q4: Trans fats are" ...
##  $ opt1: chr  "Carbohydrates and fats" "saturated fatty acids" "saturated fatty acids are good for health" "unsaturated fats" ...
##  $ opt2: chr  "Carbohydrates and Proteins" "unsaturated fatty acids" "unsaturated fatty acids are harmful for health" "saturated fats" ...
##  $ opt3: chr  "Proteins and fats" "polyunsaturated fatty acids" "unsaturated fatty acids are good for health" "good for health" ...
##  $ opt4: chr  "carbohydrates, fats and proteins" "trans fats" "Animal fats are good for health" "animal fats" ...
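
To see why `byrow = TRUE` matters, here is a minimal sketch on a made-up 10-element vector. The values in `toy` and the name `toy_df` are illustrative assumptions, not the scraped data. By default `matrix()` fills column by column, which would scatter each question and its options across rows, while `byrow = TRUE` keeps each consecutive group of five together. The length check is an added safeguard and is not part of the original answer.

```r
# Hypothetical stand-in for p_text: 2 questions, each followed by 4 options
toy <- c("Q1: first?",  "a1", "b1", "c1", "d1",
         "Q2: second?", "a2", "b2", "c2", "d2")

# Guard against a partial record; matrix() would otherwise recycle
# values (with a warning) to fill the last row
stopifnot(length(toy) %% 5 == 0)

# Default fill is column-major: elements 1 and 2 land in the same column
wrong <- matrix(toy, ncol = 5)

# byrow = TRUE puts each group of five consecutive elements in one row
right <- matrix(toy, ncol = 5, byrow = TRUE)

toy_df <- as.data.frame(right, stringsAsFactors = FALSE)
names(toy_df) <- c("q1", "opt1", "opt2", "opt3", "opt4")

toy_df$q1
## [1] "Q1: first?"  "Q2: second?"
```

With the default fill, `wrong[, 1]` would instead contain `"Q1: first?"` and `"a1"`, mixing a question with one of its options.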

If you want to clean up the q1 column, you might do:

dat$q1 <- sub("^Q\\d{1,2}:[ ]?", "", dat$q1)

This removes the leading "Q", the question number, the colon, and an optional space after the colon.
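
As a rough illustration of what that pattern matches, the sketch below applies it to a few sample strings. The vector `q` is hypothetical: the first two values are modeled on the `str()` output above and the third is made up. The final `trimws()` call is an optional addition that is not in the original answer.

```r
# Hypothetical sample strings modeled on the q1 values shown above
q <- c("Q1: Energy giving foods are ",
       "Q2:Animal fats are categorized as",
       "Q25: Example")

# ^Q        literal "Q" at the start of the string
# \\d{1,2}  one or two digits (the question number)
# :[ ]?     a colon followed by an optional single space
cleaned <- sub("^Q\\d{1,2}:[ ]?", "", q)

# Optional extra step: strip any remaining leading/trailing whitespace
cleaned <- trimws(cleaned)

cleaned
## [1] "Energy giving foods are"        "Animal fats are categorized as"
## [3] "Example"
```

Because the pattern is anchored with `^` and `sub()` replaces only the first match, text later in a question is left untouched.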

Score 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/46874660
