Question: Using Python to convert a text file to a list while also combining some lines into a single item in the list

Stack Overflow user

Asked 2021-01-13 15:39:49

Answers: 1, Views: 262, Followers: 0, Votes: 0

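I am working on converting text files originally used for a VBA application into lists of strings for a new Python application. Each text file has "vectors" on separate lines with multiple strings, but for simplicity I am giving each one a single string here. The problem I am running into is that, because of the line character limit in Excel/VBA, some vectors take up multiple lines. Here is an example: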

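vector(1)="This is the first vector that only takes 1 line!"

vector(2)="This is some of the text for vector 2 but it continues!"

vector(2)= vector(2) & "This is the continuation of the text for vector 2!"

vector(3)= "This is a new vector with only a single line!"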

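What I am trying to do is iterate over the list created by splitlines() and build a new list by looking back at the previous line to see whether it has the same "vector(X)" label, then concatenating the strings before adding them to the final list. However, it ends up adding both the unfinished string and the concatenated string to the list. Here is the code I am using: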

import os
import re

Lines = open(doc).read().splitlines()
New_Lines = []
previous_label = 0
vector_label = 0
previous_contents = 0
vector_contents = 0
for z, vector_check in enumerate(Lines, 1):
    if vector_check.startswith("vector"):
        v_split = re.split(r"=", vector_check)
        previous_label = vector_label
        vector_label = v_split[0]
        previous_contents = vector_contents
        vector_contents = v_split[1]
    else :
        continue
    # print(vector_label)
    if previous_label != vector_label:
        repeat = 0
        New_Lines.append(vector_contents)
    else :
        repeat += 1
        vec_split_2 = re.split(r"&", v_split[1])
        vector_contents = previous_contents[:-1] + " " + vec_split_2[1][2:]
        New_Lines.append(vector_contents)
        print(vector_contents)
        continue
i = 1
for obj in New_Lines:
    print("vector_CRS(" + str(i) + ")=" + obj)
    i += 1

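This gives the result:

vector_CRS(1)="This is the first vector that only takes 1 line!"

vector_CRS(2)="This is some of the text for vector 2 but it continues!"

vector_CRS(3)="This is some of the text for vector 2 but it continues! This is the continuation of the text for vector 2!"

vector_CRS(4)="This is a new vector with only a single line!"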

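I have also tried looking ahead in the list (which is why the enumerate is there), but the results were worse than these. This is the last piece of the puzzle for a larger script, and although it feels simple, as if I am missing an easy answer, I have spent hours trying to fix this part.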

python

python-re

txt

1 Answer

Stack Overflow user

Answered 2021-01-13 16:21:24

If you have a text file vectors.txt that looks like this:

vector(1)="This is the first vector that only takes 1 line!"
vector(2)="This is some of the text for vector 2 but it continues!"
vector(2)= vector(2) & "This is the continuation of the text for vector 2!"
vector(3)= "This is a new vector with only a single line!"

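You can use itertools.groupby with a regular expression to group the lines by vector number. Then, with another regular expression, join the contents of all the lines in each group: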

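import re
import sys
from itertools import groupby


def merge_vectors(lines):
    # Key function: the number inside vector(N) at the start of the line.
    def vector_number(line):
        return re.match(r"vector\((\d+)\)", line).group(1)

    # groupby only groups consecutive lines that share the same key.
    for _, group in groupby(lines, key=vector_number):
        # Join the quoted text of every line in the group with a space.
        yield " ".join(re.search(r'"(.+)"', item).group(1) for item in group)


def main():
    with open("vectors.txt", "r") as file:
        lines = file.read().splitlines()

    print(list(merge_vectors(lines)))
    return 0


if __name__ == "__main__":
    sys.exit(main())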

Output:

['This is the first vector that only takes 1 line!', 'This is some of the text for vector 2 but it continues! This is the continuation of the text for vector 2!', 'This is a new vector with only a single line!']
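For comparison, the look-back idea described in the question can be made to produce the same list. The duplicate entry there comes from appending on both branches; a minimal sketch that extends the last item instead is shown below. The function name merge_by_lookback, the combined regex, and the inline sample list are illustrative assumptions, not part of the original code.

```python
import re


def merge_by_lookback(lines):
    """Merge consecutive lines that share a vector label into one string."""
    merged = []
    previous_label = None
    for line in lines:
        # Capture the label and the text between the first and last double quote.
        match = re.match(r'(vector\(\d+\))\s*=.*?"(.*)"', line)
        if match is None:
            continue  # skip lines that are not vector assignments
        label, text = match.groups()
        if label == previous_label and merged:
            # Same label as the previous line: extend the item already stored
            # instead of appending a second, longer copy.
            merged[-1] += " " + text
        else:
            merged.append(text)
        previous_label = label
    return merged


# Sample input copied from the vectors.txt example above.
sample = [
    'vector(1)="This is the first vector that only takes 1 line!"',
    'vector(2)="This is some of the text for vector 2 but it continues!"',
    'vector(2)= vector(2) & "This is the continuation of the text for vector 2!"',
    'vector(3)= "This is a new vector with only a single line!"',
]

print(merge_by_lookback(sample))
```

With the four sample lines this yields three items, with the two vector(2) lines joined by a single space.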

This assumes that the lines in vectors.txt are already grouped together by vector number. For example, it assumes you cannot have the following:

vector(1)="Part of one"
vector(2)="Part of two"
vector(1)= vector(1) & "Also part of one"
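If the file might contain interleaved lines like these, one option is to collect the fragments in a dictionary keyed by vector number rather than relying on adjacency. This is only a sketch under that assumption: merge_ungrouped and its regex are hypothetical names and choices, not taken from the answer's code. It relies on dictionaries preserving insertion order (Python 3.7 and later).

```python
import re


def merge_ungrouped(lines):
    """Merge vector lines by number even when they are not adjacent."""
    parts = {}  # vector number -> list of text fragments, in first-seen order
    for line in lines:
        match = re.match(r'vector\((\d+)\)\s*=.*?"(.*)"', line)
        if match is None:
            continue  # ignore anything that is not a vector assignment
        number, text = match.groups()
        parts.setdefault(number, []).append(text)
    return [" ".join(fragments) for fragments in parts.values()]


interleaved = [
    'vector(1)="Part of one"',
    'vector(2)="Part of two"',
    'vector(1)= vector(1) & "Also part of one"',
]

print(merge_ungrouped(interleaved))
# ['Part of one Also part of one', 'Part of two']
```

The trade-off is that every fragment is held in memory until the whole file has been read, whereas groupby can yield each merged vector as soon as its group ends.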

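Edit - I took a look at the text file in your repl.it. I made some changes to the regex patterns and the code, mostly by making a few steps more explicit. The patterns are now more lenient: for example, something like vector(2)= vector(2) & "" no longer raises an exception, but since there is nothing between the double quotes, it is ignored. Lines that do not end with a double quote are handled as well. All lines are filtered before processing so that only lines starting with vector_CRS(...) are kept, which means you no longer need to skip the first five lines manually.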

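import re
import sys
from itertools import groupby

line_pattern = r"vector_CRS\((?P<vector_number>\d+)\)"
# [^"]* stops at the closing quote (a greedy .* would swallow it and leave
# a stray quote in the content); the trailing "? keeps the match working
# for lines that have no closing quote.
content_pattern = r'"(?P<content>[^"]*)"?'


def is_vector_line(line):
    return re.match(line_pattern, line) is not None


def merge_vectors(lines):
    def key(line):
        return re.match(line_pattern, line).group("vector_number")

    def get_content(item):
        match = re.search(content_pattern, item)
        # A line without any double quote contributes nothing.
        return match.group("content") if match else ""

    for _, group in groupby(lines, key=key):
        # filter(None, ...) drops empty contents such as "".
        yield " ".join(filter(None, map(get_content, group)))


def main():
    # Keep only the lines that start with vector_CRS(...), stripped.
    with open("vectors.txt", "r") as file:
        lines = list(map(str.strip, filter(is_vector_line, file)))

    merged = list(merge_vectors(lines))
    print(merged)
    return 0


if __name__ == "__main__":
    sys.exit(main())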
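To check the edge cases described in the edit (empty quotes, a missing closing quote), a content pattern can be tried against a few lines in isolation. The sketch below uses a pattern variant that is an assumption of this example: a [^"]* character class followed by an optional closing quote. The helper name extract_content and the sample lines are likewise made up for illustration.

```python
import re

# Assumed pattern variant: [^"]* stops at the next double quote,
# and the closing quote itself is optional.
content_pattern = r'"(?P<content>[^"]*)"?'


def extract_content(line):
    """Return the text after the first double quote, or '' if there is none."""
    match = re.search(content_pattern, line)
    return match.group("content") if match else ""


samples = [
    'vector_CRS(2)= vector_CRS(2) & ""',   # empty quotes
    'vector_CRS(3)="no closing quote',     # closing quote missing
    'vector_CRS(4)="complete line"',       # ordinary line
    'vector_CRS(5)=',                      # no quotes at all
]

for sample in samples:
    print(repr(extract_content(sample)))
```

The empty results are what filter(None, ...) discards before the fragments are joined, so such lines add nothing to the merged string.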

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/65705113
