
How to convert paragraphs into a table?

Stack Overflow user
Asked 2022-02-01 18:31:50
1 answer, 55 views, 0 followers, 2 votes

Input:

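The list below provides examples of items and services that should not be billed separately. Please note that the list is not all inclusive.

1. Routine Supplies - The hospital basic room and critical care area room (emergency department, observation, treatment room, cardiac, medical, surgical, pediatric, respiratory, burn, neonate (level III and IV), neurological, rehabilitative,post-anesthesia or recovery, andtrauma) daily charge shall include all of the following services, personal care and supply items and equipment

2. Medical Equipment- The hospital basic room and critical care area room(emergency department, observation, treatment room, cardiac,medical, surgical, pediatric,respiratory, burn, neonate (level III and IV), neurological, rehabilitative, post- anesthesia or recovery, andtrauma) daily charge shall include all of the following services, personal care and supply items and equipment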

This is my code, but it doesn't give the exact output I want:

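import pdfplumber
import re

demo = []
with pdfplumber.open('HCSC IL Inpatient_Outpatient Unbundling Policy- Facility.pdf') as pdf:
    for i in range(0, 50):  # look at up to the first 50 pages
        try:
            text = pdf.pages[i]
            # keep only character objects (drops lines, rects, images)
            clean_text = text.filter(lambda obj: obj["object_type"] == "char")
            # find runs of numbered lines ("1. ..."); the list of matches is stringified,
            # and pages with no match ('[]') become a single space
            demo.append(str(re.findall(r'(\d+\.\s.*\n?)+', clean_text.extract_text())).replace('[]', ' '))
        except IndexError:
            # the PDF has fewer than 50 pages
            print("")
            break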

I want the output as a CSV or a table, like this:

Section                            description

Routine Supplies                The hospital basic room and critical care area room (emergency 
                                department, observation, treatment room, cardiac, medical, 
                                surgical, pediatric, respiratory, burn, neonate (level III and 
                                IV), neurological, rehabilitative,post-anesthesia or recovery, 
                                andtrauma) daily charge shall include all of the following 
                                services, personal care and supply items and equipment

Medical Equipment               The hospital basic room and critical care area room(emergency 
                                department, observation, treatment room, cardiac,medical, 
                                surgical, pediatric,respiratory, burn, neonate (level III and 
                                IV), neurological, rehabilitative, post- anesthesia or 
                                recovery, andtrauma) daily charge shall include all of the 
                                following services, personal care and supply items and 
                                equipment 

1 Answer

Stack Overflow user

Answered 2022-02-01 18:57:24

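If you only want a set of rows with two items each, your regex needs to identify those two items. With your text stored in a variable named text, it looks like this: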

text = """\
The list below provides examples of items and services that should not be billed separately. Please note that the list is not all inclusive.

1. Routine Supplies - The hospital basic room and critical care area room (emergency department, observation, treatment room, cardiac, medical, surgical, pediatric, respiratory, burn, neonate (level III and IV), neurological, rehabilitative,post-anesthesia or recovery, andtrauma) daily charge shall include all of the following services, personal care and supply items and equipment

2. Medical Equipment- The hospital basic room and critical care area room(emergency department, observation, treatment room, cardiac,medical, surgical, pediatric,respiratory, burn, neonate (level III and IV), neurological, rehabilitative, post- anesthesia or recovery, andtrauma) daily charge shall include all of the following services, personal care and supply items and equipment"""

import re
from pprint import pprint
pattern = r"\d+\.\s*([^-]*)- (.*)"  # heading up to the first "-", then the rest of the line
demo = re.findall(pattern, text)
pprint(demo)

Output:

[('Routine Supplies ',
  'The hospital basic room and critical care area room (emergency department, '
  'observation, treatment room, cardiac, medical, surgical, pediatric, '
  'respiratory, burn, neonate (level III and IV), neurological, '
  'rehabilitative,post-anesthesia or recovery, andtrauma) daily charge shall '
  'include all of the following services, personal care and supply items and '
  'equipment'),
 ('Medical Equipment',
  'The hospital basic room and critical care area room(emergency department, '
  'observation, treatment room, cardiac,medical, surgical, '
  'pediatric,respiratory, burn, neonate (level III and IV), neurological, '
  'rehabilitative, post- anesthesia or recovery, andtrauma) daily charge shall '
  'include all of the following services, personal care and supply items and '
  'equipment')]

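Note that the strings are broken across lines only because pprint wraps long strings. Each tuple contains two things: the heading and the paragraph text.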
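The answer stops at a list of tuples, while the question asks for a CSV. Below is a minimal sketch of that last step using the standard-library csv module. It rests on assumptions not stated in the thread: that text extracted from the PDF wraps each paragraph over several lines (the answer's (.*) would then capture only the first line), that every item starts at the beginning of a line with a number and a period, and that the first hyphen separates heading from description, as in the answer's pattern. The names paragraphs_to_rows and write_rows, and the sample text, are hypothetical.

```python
import csv
import re
import sys


def paragraphs_to_rows(text):
    """Turn numbered paragraphs into (section, description) tuples.

    Assumes each item starts a line with "N." and that the first hyphen
    separates the heading from the description. Lines wrapped inside an
    item are joined with single spaces.
    """
    # Split just before each line that starts with "digits." (re.M makes ^ match
    # at every line start; splitting on a zero-width match needs Python 3.7+).
    chunks = re.split(r"^(?=\d+\.\s)", text, flags=re.M)
    rows = []
    for chunk in chunks:
        # Collapse the line breaks and extra spaces inside one paragraph.
        flat = " ".join(chunk.split())
        m = re.match(r"\d+\.\s*([^-]*)-\s*(.*)", flat)
        if m:  # the introductory sentence has no "N." prefix and is skipped
            rows.append((m.group(1).strip(), m.group(2).strip()))
    return rows


def write_rows(rows, out):
    """Write a header and the (section, description) rows to an open text file."""
    writer = csv.writer(out)
    writer.writerow(["Section", "description"])  # column names from the question
    writer.writerows(rows)


# Hypothetical, shortened sample wrapped the way PDF text extraction often wraps it.
sample = """The list below provides examples of items and services that should not be billed separately.

1. Routine Supplies - The hospital basic room and critical care
area room daily charge shall include all of the following services

2. Medical Equipment- The hospital basic room and critical care
area room daily charge shall include all of the following services
"""

rows = paragraphs_to_rows(sample)
write_rows(rows, sys.stdout)
```

The strip() calls also remove the trailing space visible in 'Routine Supplies ' in the answer's output. To save a file instead of printing, open it with open("output.csv", "w", newline="", encoding="utf-8") (the file name is only an example) and pass that file object to write_rows; the csv documentation recommends newline="" so that no blank rows appear on Windows.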

Votes: 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/70945713
