
Extracting specific items from a text file for tokenization

Stack Overflow user
Asked 2018-11-11 13:10:42
Answers: 2 · Views: 72 · Followers: 0 · Votes: 0

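Below is the structure of the text file "info.txt". From this file I need to extract the ID and the Description (any method that extracts both accurately will do). The file contains about 500 ID/Description instances. Each ID stands for one Title and one Description in the text file.

For the first part, I'm not sure whether to store the IDs and the descriptions in two lists. If I use lists, I can then tokenize each description through the "descriptions" list (keeping in mind that this list holds 500 descriptions).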

ID: #22579462
Title: Quality Engineer
Description: Our client are a leading supplier of precision machined, high integrity components, integrated kits of parts and complete mechanical assemblies. Due to an large increase in workload they are recruiting a Quality Engineer Reporting to the Quality Manager, the successful individual will be responsible for providing documentation to fulfil our customers quality assurance requirements on specific contracts, whilst maintaining a system of storage and retrieval for documentation. The role will also support the internal audit schedule, performing audits as required. Responsibilities include: Documentation Checking all vendor supplied documentation to ensure it complies with the requirements or Express s customer specifications. Produce accurate, legible documentation packs, in accordance with customer requirements. Quality Systems Maintain system of storage and retrieval of all associated QA documentation in accordance with ISO9001:**** Certification Ensure certificates of conformance are checked, in accordance with the C of C matrix and any applicable concessions are referenced Material Certification Verify and approve certification on receipt for conformance to customer requirements and resolve discrepancies with suppliers Non conformance Raise and submit supplier reject reports and concessions. Store all responses received in relevant databases. Internal Auditing Carry out internal audits as and when required in line with the internal audit schedule. Identify and report all nonconformances within Quality Management System, and assist in corrective actions to close them out Supplier Rejects Ensure corrective action is received for supplier rejects submitted to key suppliers The Individual: Has experience within the quality department of a related company in a similar role Ideally from a mechanical or manufacturing engineering background. 
Ideally be familiar with the range of processes involved in the markets of Oil Must have good communication and organisational skills Has the ability to work as part of a team or as an individual. Has the ability to be customer facing and discuss technical / quality issues with vendors and customers

ID: #22933091
Title: Chef de Partie  Award Winning Dining  Live In  Share of Tips
Description: A popular hotel located in Norfolk which is a very busy operation has a position available for a Chef de Partie Role: A Chef de Partie capable of coping well under pressure is required to join the kitchen team at a hotel that has an excellent reputation for offering high quality dining to its guests and has gained accreditations in the main restaurant.The busy Brasserie style restaurant regularly serves **** covers for lunch and dinner so this Chef de Partie role will require you to be organised on your section ensuring all prep is complete to the standards expected by the Head Chef before each service. Requirements: All Chef de Parties applying for this role must have a strong background with highlights previous AA Rosette experience in a high volume operation.A candidate who is self motivated and capable of working well in a busy team of chefs would be ideal for this role. Benefits Include: Uniform Provided Meals on Duty Accommodation Available Share of Tips – IRO **** Per Month Excellent Opportunities To Progress If you are interested in this position or would like information on the other positions we are recruiting for or any temporary assignments please send your CV by clicking on the 'apply now' button below and our consultant Sean Bosley will do his utmost to assist you in your search for employment. In line with the requirements of the Asylum Immigration Act **** all applicants must be eligible to live and work in the UK. Documented evidence of the eligibility will be required from candidates as part of the recruitment process. This job was originally posted as  

ID: #23528672
Title: Senior Fatigue and Damage Tolerance Engineer
Description: Senior Fatigue Static stress (metallic or composite) Finite element analysis. Senior Fatigue Aerospace  ****K****K (dep on exp)  benefits package Bristol, Avon

ID: #23529949
Title: C I Design Engineer
Description: We are currently recruiting on behalf of our client who have an exciting opportunity available for a CE Produce CE Control Panel designs  Genera Arrangements, Detail drawings, Schematics Diagrams, Interlock Diagrams for typically PLC Specification of hardware and production of parts list. Manufacturing specification. Ensure Company policies and procedures are being applied across the projects. Manage the interface between CE Communicate at all levels with both internal and external customers to meet their expectations while meeting the project budget and programme constraints. Support the Lead Engineer in the delivery of scope to budget and programme. Provide technical expertise to tenders as and when required. Provide input to the development of the CE&l function and resource

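There are two things I want to achieve. First, build a unigram vocabulary over all the descriptions in the format word_string:integer_index. Second, create a text file in which each line corresponds to one description. A line starts with the ID (keeping the #), and the rest of the line is the sparse representation of that description as comma-separated word_index:word_freq pairs.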

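This is why I think storing the IDs and the descriptions in lists is ideal. That way, index 0 of the ID list would be #22579462, and index 0 of the description list would be the corresponding description text.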
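One possible way to produce both outputs from two parallel lists is sketched below. It rests on assumptions the question does not settle: a word is taken to be any run of ASCII letters, lowercased, and indices are assigned in alphabetical order. The names `tokenize`, `build_vocab` and `sparse_lines` are made up for illustration, and the two records are toy data rather than entries from info.txt.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: a "word" is a run of ASCII letters, lowercased.
    return re.findall(r"[a-z]+", text.lower())

def build_vocab(descriptions):
    # Map every distinct word to an integer index (alphabetical order).
    words = sorted({w for d in descriptions for w in tokenize(d)})
    return {w: i for i, w in enumerate(words)}

def sparse_lines(ids, descriptions, vocab):
    # One line per description: "#ID,word_index:word_freq,..."
    lines = []
    for id_, desc in zip(ids, descriptions):
        counts = Counter(vocab[w] for w in tokenize(desc))
        pairs = ",".join(f"{i}:{n}" for i, n in sorted(counts.items()))
        lines.append(f"{id_},{pairs}")
    return lines

ids = ["#1", "#2"]  # toy data, not from info.txt
descriptions = ["Quality engineer quality", "Chef engineer"]

vocab = build_vocab(descriptions)
vocab_lines = [f"{w}:{i}" for w, i in vocab.items()]  # word_string:integer_index
out = sparse_lines(ids, descriptions, vocab)
```

Writing `vocab_lines` and `out` to their files is then a matter of joining each list with newlines. A real job-advert corpus would probably need a more careful tokenizer (stop words, masked tokens such as `****`, and so on).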

Thanks in advance.


2 Answers

Stack Overflow user

Accepted answer

Posted 2018-11-11 22:17:27

You can read the file in one go and then parse it with a regular expression. The "rslt" list contains (ID, Description) tuples:

import re

with open("info.txt") as ff:
    rslt = re.findall(r"(?sm)^\s*ID:\s*#(\d+)\s*$.*?^Description:(.*?)(?:\s*(?=^ID: #)|\Z)", ff.read())

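(?sm) -> m: multiline mode; s: the dot (.) also matches newlines;

^\s*ID:\s*#(\d+) -> matches the "ID: #" label at the start of a line and captures the digits (group 1, without the #);

\s*$ -> after the number, the line may contain only whitespace;

.*?^Description: -> skips the title and matches the "Description:" label;

(.*?)(?:\s*(?=^ID: #)|\Z) -> (.*?) captures the description text (group 2), up to the next block that starts with "ID: #" or up to the end of the string (\Z).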
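To check the pattern without the real file, it can be run against a small in-memory sample. The sketch below uses two invented records (not taken from info.txt), strips the whitespace that group 2 picks up around the text, and puts the `#` back on each ID, since the question asks for it to be kept. The variable names `sample`, `ids` and `descriptions` are illustrative.

```python
import re

PATTERN = r"(?sm)^\s*ID:\s*#(\d+)\s*$.*?^Description:(.*?)(?:\s*(?=^ID: #)|\Z)"

# Two invented records; the first description spans two lines
# and contains a colon, which the pattern handles.
sample = (
    "ID: #101\n"
    "Title: Tester\n"
    "Description: First line of text.\n"
    "Second line: still the same description.\n"
    "\n"
    "ID: #102\n"
    "Title: Cook\n"
    "Description: Only one line.\n"
)

rslt = re.findall(PATTERN, sample)

# Split the (ID, Description) tuples into two parallel lists.
ids = ["#" + num for num, _ in rslt]
descriptions = [desc.strip() for _, desc in rslt]
```

After this, `ids[0]` and `descriptions[0]` refer to the same record, which is the layout the question describes.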

Votes: 2

Stack Overflow user

Posted 2018-11-11 15:43:01

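As noted in the comments, your data seems to lend itself to a dictionary. First, create a function that skips blank lines. That function comes from another answer (linked as "here" in the original post), which explains it well. Then call the function to import the txt line by line and store it in a dictionary. Finally, build a DataFrame whose index is your ID.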

import pandas as pd

file = r"C:\***\***\info.txt".replace('\\', '/')
my_keys = ['ID', 'Title', 'Description']  # the only fields we keep
d = {key: [] for key in my_keys}

def nonblank_lines(f):
    """Yield each line of f with trailing whitespace stripped, skipping blank lines."""
    for l in f:
        line = l.rstrip()
        if line:
            yield line

# import the txt line by line into a dictionary of lists
last_key = None
with open(file) as my_file:
    for line in nonblank_lines(my_file):
        # split on the first ': ' only, so colons inside a description survive
        key, sep, value = line.partition(': ')
        if sep and key in d:
            d[key].append(value)
            last_key = key
        elif last_key is not None:
            # a line without a known field label continues the previous field
            d[last_key][-1] += ' ' + line

# create a df with ID as index and the rest of the data in columns
df = pd.DataFrame(data={k: d[k] for k in ['Description', 'Title']},
                  index=d['ID'], columns=['Description', 'Title'])
df.to_csv(r'path\filename.txt', sep=',', index=True, header=True)  # save your df
Votes: 1
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/53249062
