Q: What is the fastest way to save/load a large collection of strings (list/set) in Python 3.6?

Stack Overflow user

Asked 2018-07-05 06:54:24

Answers 1 · Views 533 · Followers 0 · Votes 0

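The file is 5 GB.

I did find a similar question on Stack Overflow where people suggested using numpy arrays, but I think that solution is meant for collections of numbers, not strings.

Is there a better way than eval(list.txt), or than importing a Python file in which the list is assigned to a variable?

What is the most efficient way to load/save a list of Python strings?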

python

python-3.x
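For comparison, here are the two usual whole-file approaches that avoid eval(): pickle, and a plain text file with one string per line. Both load the entire list into memory, which is exactly what the answer below avoids for a 5 GB file. The helper names are hypothetical, and the text variant assumes that no string contains '\n'.

```python
import pickle


def save_pickle(path, strings):
    # Serialise the whole list in one call; it loads back as the same list.
    with open(path, 'wb') as f:
        pickle.dump(strings, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    with open(path, 'rb') as f:
        return pickle.load(f)


def save_lines(path, strings):
    # One string per line. newline='\n' disables platform newline translation.
    # Assumes no string contains '\n'.
    with open(path, 'w', encoding='utf-8', newline='\n') as f:
        for s in strings:
            f.write(s)
            f.write('\n')


def load_lines(path):
    # Split on the separator and drop the empty piece after the final '\n'.
    with open(path, encoding='utf-8', newline='\n') as f:
        return f.read().split('\n')[:-1]
```

You can store a set the same way by saving `list(s)` and rebuilding it with `set(...)` after loading.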

1 answer

Stack Overflow user

Answered 2018-07-05 09:15:41

For the read-only case:

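```python
import numpy as np


class IndexedBlob:
    """Read-only random access to the lines of a large newline-delimited file."""

    def __init__(self, filename):
        index_filename = filename + '.index'
        # Map the raw bytes (dtype uint8) instead of reading them into RAM.
        # Note: np.memmap cannot map an empty file.
        blob = np.memmap(filename, mode='r')

        try:
            # If there is an existing index, reuse it.
            indices = np.memmap(index_filename, dtype='>i8', mode='r')
        except FileNotFoundError:
            # Otherwise create it: the byte offset of every b'\n'.
            # Scan in 64 MiB chunks so a 5 GB file never needs a
            # 5 GB boolean mask in memory.
            chunk = 1 << 26
            parts = [np.flatnonzero(blob[i:i + chunk] == ord('\n')) + i
                     for i in range(0, len(blob), chunk)]
            # Force a fixed big-endian int64 dtype so the file is predictable.
            indices = np.concatenate(parts).astype('>i8')
            with open(index_filename, 'wb') as f:
                # Add a virtual newline at offset -1 so line 0 starts at byte 0.
                np.array([-1], dtype='>i8').tofile(f)
                indices.tofile(f)
            # Then reopen it as a memmap to reduce memory
            # (this also picks up the -1 we added).
            indices = np.memmap(index_filename, dtype='>i8', mode='r')

        self.blob = blob
        self.indices = indices

    def __len__(self):
        # Number of complete (newline-terminated) lines.
        return len(self.indices) - 1

    def __getitem__(self, line):
        if not 0 <= line < len(self):
            raise IndexError(line)

        lo = self.indices[line] + 1  # byte after the previous newline
        hi = self.indices[line + 1]  # this line's terminating newline
        return self.blob[lo:hi].tobytes().decode()
```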
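The answer only covers reading. Below is a minimal sketch of the matching save side. It assumes the format that IndexedBlob reads: UTF-8, one string per line, with every line (including the last) terminated by '\n', because a final line without one is ignored. `save_strings` is a hypothetical helper and is not part of the original answer.

```python
import os


def save_strings(filename, strings):
    """Write one UTF-8 string per line, each terminated by b'\\n'."""
    with open(filename, 'wb') as f:
        for s in strings:
            data = s.encode('utf-8')
            # A newline inside a string would silently split it into two lines.
            if b'\n' in data:
                raise ValueError('string contains a newline: %r' % s)
            f.write(data)
            f.write(b'\n')
    # Any existing index describes the old contents. Delete it so that
    # IndexedBlob rebuilds it on the next open instead of reusing it.
    try:
        os.remove(filename + '.index')
    except FileNotFoundError:
        pass
```

After saving, `IndexedBlob(filename)[i]` returns the i-th string. The first open builds `filename + '.index'`, and later opens reuse it.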

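Some additional notes:

- Appending new strings at the end is easy: open the file in append mode and write a line (but watch out for earlier failed writes). Remember to update the index file by hand as well. An existing IndexedBlob object will only see the new line after you redo the mmap.
- By design, if the last line is missing its trailing newline, it is ignored (so that truncated or concurrent writes can be detected).
- You can shrink the index considerably by storing only every n-th newline and then doing a linear search at lookup time.
- If you use separate indices for the starts and the ends, you are no longer forced to store the strings in order, which opens up several possibilities for mutation. But if mutations are rare, rewriting the whole file and regenerating the index isn't too expensive.
- Consider using '\0' instead of '\n' as the separator (for example, if your strings can themselves contain newlines).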
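The sparse-index idea from the notes above can be sketched as follows. To stay self-contained, this sketch uses the standard-library mmap module instead of numpy and keeps the sparse index in memory instead of in a .index file. `SparseIndexedBlob` and its `every` parameter are hypothetical names. Building the index walks the file in Python, which is slow for a multi-gigabyte file. A vectorised numpy scan like the one in IndexedBlob would be faster.

```python
import mmap


class SparseIndexedBlob:
    """Hypothetical variant that stores the start offset of every n-th line only."""

    def __init__(self, filename, every=64):
        self.every = every
        with open(filename, 'rb') as f:
            # The mapping stays valid after the file object is closed.
            # mmap.mmap raises ValueError on an empty file.
            self.blob = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        # checkpoints[j] is the byte offset where line j * every starts.
        self.checkpoints = []
        pos, line = 0, 0
        while True:
            end = self.blob.find(b'\n', pos)
            if end == -1:
                break  # a trailing line without '\n' is ignored, as in IndexedBlob
            if line % every == 0:
                self.checkpoints.append(pos)
            pos, line = end + 1, line + 1
        self.count = line

    def __len__(self):
        return self.count

    def __getitem__(self, line):
        if not 0 <= line < self.count:
            raise IndexError(line)
        # Jump to the nearest stored line start at or before `line`...
        pos = self.checkpoints[line // self.every]
        # ...then scan forward linearly over at most `every - 1` newlines.
        for _ in range(line % self.every):
            pos = self.blob.find(b'\n', pos) + 1
        end = self.blob.find(b'\n', pos)
        return self.blob[pos:end].decode()
```

The index shrinks by roughly a factor of `every`. The cost is scanning up to `every - 1` extra newlines on each lookup.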

And of course:

- Whatever you do, general concurrent mutation is hard. If you need to do anything complicated, use a real database: it's the simplest solution.
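Here is a sketch of the "real database" route, using the standard-library sqlite3 module as a simple string store. SQLite is only one possible choice, because the answer does not name a specific database. The table layout and the helper names (`open_store`, `append_strings`, `get_string`, `count_strings`) are hypothetical. The sketch assumes rows are never deleted, so that a string's 0-based position equals its rowid minus one.

```python
import sqlite3


def open_store(path):
    # SQLite does its own file locking: many readers, one writer at a time.
    con = sqlite3.connect(path)
    con.execute('CREATE TABLE IF NOT EXISTS strings ('
                'id INTEGER PRIMARY KEY, s TEXT NOT NULL)')
    return con


def append_strings(con, strings):
    # One transaction for the whole batch, committed when the block exits;
    # much faster than committing row by row.
    with con:
        con.executemany('INSERT INTO strings (s) VALUES (?)',
                        ((s,) for s in strings))


def get_string(con, i):
    # Indexed lookup by position, assuming no deletions (id == i + 1).
    row = con.execute('SELECT s FROM strings WHERE id = ?',
                      (i + 1,)).fetchone()
    if row is None:
        raise IndexError(i)
    return row[0]


def count_strings(con):
    return con.execute('SELECT COUNT(*) FROM strings').fetchone()[0]
```

Unlike the newline-delimited file, strings stored this way may contain newlines.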

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/51181556
