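I have some job descriptions saved as a txt file. Job titles, job descriptions, team names, etc. are all lumped together, and I am trying to split them into columns. The text is about 5 pages long. Here is an example of the text structure: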
EXECUTIVE LEVEL
001 Chief Executive Officer: Job description of CEO.
040 Area Director: This line contains job description of the Area Director.
FINANCE TEAM
025 Chief Operating Officer: This line contains job description of the Chief Operating Officer
055 Chief Financial Officer: This person controls operations of the company and reports to the COO
MARKETING TEAM
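056 Marketing Director: This person is in charge of the marketing team. Blab la bla
I would like to create a data frame (or is it called a tibble now?) with 4 columns:
Column 1 - team name (EXECUTIVE LEVEL, FINANCE TEAM, MARKETING TEAM, etc.)
Column 2 - team number (001, 040, 025, 055, etc.)
Column 3 - job title (Chief Executive Officer, Chief Operating Officer, etc.)
Column 4 - job description
Thanks in advance.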
Posted on 2021-01-28 06:25:33
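# x holds the lines of the file (see the data at the end); drop the empty ones
x2 <- x[nzchar(x)]
# start a new group at each team header (headers begin with a letter, job lines with digits)
x3 <- split(x2, cumsum(grepl("^[A-Z]", x2)))
# per group: parse "num title: desc" from every line but the first, and attach the header as name
x4 <- lapply(x3, function(z) transform(strcapture("^([0-9]+)\\s+([^:]+):\\s*(.*)$", z[-1], list(num="", title="", desc="")), name=z[1]))
# stack the per-team data frames into one
x5 <- do.call(rbind, x4)
x5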
#     num                   title                                                                  desc            name
# 1.1 001 Chief Executive Officer                                               Job description of CEO. EXECUTIVE LEVEL
# 1.2 040           Area Director              This line contains job description of the Area Director. EXECUTIVE LEVEL
# 2.1 025 Chief Operating Officer     This line contains job description of the Chief Operating Officer    FINANCE TEAM
# 2.2 055 Chief Financial Officer This person controls operations of the company and reports to the COO    FINANCE TEAM
# 3   056      Marketing Director           This person is in charge of the marketing team. Blab la bla  MARKETING TEAM
Data, most likely the result of x <- readLines(path_to_file):
x <- c("EXECUTIVE LEVEL", "001 Chief Executive Officer: Job description of CEO.", "040 Area Director: This line contains job description of the Area Director.", "", "FINANCE TEAM", "025 Chief Operating Officer: This line contains job description of the Chief Operating Officer", "055 Chief Financial Officer: This person controls operations of the company and reports to the COO", "", "MARKETING TEAM", "056 Marketing Director: This person is in charge of the marketing team. Blab la bla")
https://stackoverflow.com/questions/65927895
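If it helps, the same steps can be wrapped in a small function that returns the four columns in the order the question asks for. This is only a sketch in base R: parse_jobs() and its column names are made up for illustration, and it assumes every job line starts with digits followed by whitespace while any other non-empty line is a team header.

```r
# Hypothetical helper (not part of the original answer): turns the raw lines
# into a data frame with columns team, number, title, description.
parse_jobs <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]              # drop empty lines

  # Assumption: job lines start with digits; everything else is a team header.
  is_job  <- grepl("^[0-9]+\\s", lines)
  headers <- lines[!is_job]

  # Index of the most recent header seen before each line (0 = none yet).
  header_idx <- cumsum(!is_job)
  # Prepending NA makes job lines that precede any header get an NA team.
  team <- c(NA_character_, headers)[header_idx[is_job] + 1L]

  # Split "001 Title: description" into three character columns;
  # keeping number as character preserves the leading zeros.
  parsed <- strcapture(
    "^([0-9]+)\\s+([^:]+):\\s*(.*)$",
    lines[is_job],
    data.frame(number = character(), title = character(),
               description = character(), stringsAsFactors = FALSE)
  )

  data.frame(team = team, parsed, stringsAsFactors = FALSE)
}

# Small sample in the same shape as the question's text
sample_lines <- c(
  "EXECUTIVE LEVEL",
  "001 Chief Executive Officer: Job description of CEO.",
  "040 Area Director: This line contains job description of the Area Director.",
  "",
  "FINANCE TEAM",
  "025 Chief Operating Officer: This line contains job description of the Chief Operating Officer"
)
jobs <- parse_jobs(sample_lines)
jobs
```

For a real file you would pass readLines(path_to_file) instead of sample_lines. If you prefer a tibble and have the tibble package installed, tibble::as_tibble(jobs) should convert the result. Job lines that do not match the pattern (for example, ones with no colon) come back as NA in the parsed columns, so they are worth checking for.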