Input:
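The list below provides examples of items and services that should not be billed separately. Please note that the list is not all inclusive.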
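1. Routine Supplies - The hospital basic room and critical care area room (emergency department, observation, treatment room, cardiac, medical, surgical, pediatric, respiratory, burn, neonate (level III and IV), neurological, rehabilitative,post-anesthesia or recovery, andtrauma) daily charge shall include all of the following services, personal care and supply items and equipment
2. Medical Equipment- The hospital basic room and critical care area room(emergency department, observation, treatment room, cardiac,medical, surgical, pediatric,respiratory, burn, neonate (level III and IV), neurological, rehabilitative, post- anesthesia or recovery, andtrauma) daily charge shall include all of the following services, personal care and supply items and equipment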
This is my code, but it doesn't give me the exact output I want:
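import pdfplumber
import re

demo = []
with pdfplumber.open('HCSC IL Inpatient_Outpatient Unbundling Policy- Facility.pdf') as pdf:
    for i in range(0, 50):
        try:
            text = pdf.pages[i]
            # Keep only character objects (drops lines, rects, images, etc.)
            clean_text = text.filter(lambda obj: obj["object_type"] == "char")
            # Collect the numbered lines ("1. ...") from the page text
            demo.append(str(re.findall(r'(\d+\.\s.*\n?)+', clean_text.extract_text())).replace('[]', ' '))
        except IndexError:
            # The PDF has fewer than 50 pages: stop at the first missing one
            print("")
            break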
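A likely reason the output above is not in a usable shape: when a pattern has exactly one capture group, re.findall returns the group's text instead of the whole match, and a group repeated with + keeps only its last repetition. The sketch below uses a short illustrative sample string (not text from the PDF) to show the effect:

```python
import re

# Two numbered items on consecutive lines, similar in shape to the PDF text.
sample = "1. Routine Supplies - first item\n2. Medical Equipment- second item\n"

# Same pattern as in the code above: a single capture group repeated with '+'.
pattern = r'(\d+\.\s.*\n?)+'

# Both lines form ONE match, but findall returns only the group's last
# repetition, so the first item is silently dropped.
matches = re.findall(pattern, sample)
print(matches)  # ['2. Medical Equipment- second item\n']
```

Wrapping that list in str(), as the code above does, then produces a single "[...]" string per page rather than separate section and description fields.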
I want the output as a CSV file or a table:
Section | Description
Routine Supplies | The hospital basic room and critical care area room (emergency department, observation, treatment room, cardiac, medical, surgical, pediatric, respiratory, burn, neonate (level III and IV), neurological, rehabilitative,post-anesthesia or recovery, andtrauma) daily charge shall include all of the following services, personal care and supply items and equipment
Medical Equipment | The hospital basic room and critical care area room(emergency department, observation, treatment room, cardiac,medical, surgical, pediatric,respiratory, burn, neonate (level III and IV), neurological, rehabilitative, post- anesthesia or recovery, andtrauma) daily charge shall include all of the following services, personal care and supply items and equipment
Posted on 2022-02-01 18:57:24
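If you only want a list of rows with two items each, you need a regex that captures exactly those two items. With your text stored in a variable named text, this works: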
text = """\
The list below provides examples of items and services that should not be billed separately. Please note that the list is not all inclusive.
1. Routine Supplies - The hospital basic room and critical care area room (emergency department, observation, treatment room, cardiac, medical, surgical, pediatric, respiratory, burn, neonate (level III and IV), neurological, rehabilitative,post-anesthesia or recovery, andtrauma) daily charge shall include all of the following services, personal care and supply items and equipment
2. Medical Equipment- The hospital basic room and critical care area room(emergency department, observation, treatment room, cardiac,medical, surgical, pediatric,respiratory, burn, neonate (level III and IV), neurological, rehabilitative, post- anesthesia or recovery, andtrauma) daily charge shall include all of the following services, personal care and supply items and equipment"""
import re
from pprint import pprint
# Raw string so \d and \s reach the regex engine unchanged
pattern = r"\d+\.\s*([^-]*)- (.*)"
demo = re.findall(pattern, text)
pprint(demo)
Output:
[('Routine Supplies ',
'The hospital basic room and critical care area room (emergency department, '
'observation, treatment room, cardiac, medical, surgical, pediatric, '
'respiratory, burn, neonate (level III and IV), neurological, '
'rehabilitative,post-anesthesia or recovery, andtrauma) daily charge shall '
'include all of the following services, personal care and supply items and '
'equipment'),
('Medical Equipment',
'The hospital basic room and critical care area room(emergency department, '
'observation, treatment room, cardiac,medical, surgical, '
'pediatric,respiratory, burn, neonate (level III and IV), neurological, '
'rehabilitative, post- anesthesia or recovery, andtrauma) daily charge shall '
'include all of the following services, personal care and supply items and '
'equipment')]
Note that the strings only break across lines because of pprint. Each tuple contains two things: the heading and the paragraph text.
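To go from these tuples to the CSV the question asks for, the standard csv module is enough. The sketch below rests on two assumptions that are not part of the answer: that text extracted from the PDF may wrap one item over several lines (so the description is allowed to span lines and whitespace is collapsed afterwards), and that the heading ends at the first "-" on the item's line. The function names parse_items and to_csv are hypothetical.

```python
import csv
import io
import re

# Heading: text after "N." up to the first "-" on that line (no hyphens allowed).
# Description: everything after the "-", possibly over several lines (DOTALL),
# stopping at the next line that starts with "N. " or at the end of the text.
ITEM = re.compile(r"^\d+\.\s*([^-\n]*?)\s*-\s*(.*?)(?=^\d+\.\s|\Z)", re.M | re.S)

def parse_items(text):
    """Return (section, description) tuples with whitespace normalized."""
    rows = []
    for section, description in ITEM.findall(text):
        # Collapse line breaks and runs of spaces left over from line wrapping.
        rows.append((section.strip(), " ".join(description.split())))
    return rows

def to_csv(rows, out):
    """Write a header row plus one row per item to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["Section", "Description"])
    writer.writerows(rows)

# Illustrative wrapped input, shortened from the question's text.
sample = """1. Routine Supplies - The hospital basic room
and critical care area room daily charge
2. Medical Equipment- The hospital basic room
daily charge shall include equipment"""

rows = parse_items(sample)
buf = io.StringIO()
to_csv(rows, buf)
print(buf.getvalue())
```

To write a real file instead, pass something like open("unbundling.csv", "w", newline="", encoding="utf-8") as out (the filename is only an example). csv.writer quotes descriptions that contain commas automatically. Applied to the text string from the answer, parse_items should give the same two tuples, with the trailing space after "Routine Supplies" removed.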
https://stackoverflow.com/questions/70945713