I need to write a very "tall" two-column array to a text file, and it is very slow. I found that if I reshape the array into a wider one, the write is much faster. For example:
import time
import numpy as np
dataMat1 = np.random.rand(1000,1000)
dataMat2 = np.random.rand(2,500000)
dataMat3 = np.random.rand(500000,2)
start = time.perf_counter()
with open('test1.txt','w') as f:
    np.savetxt(f,dataMat1,fmt='%g',delimiter=' ')
end = time.perf_counter()
print(end-start)
start = time.perf_counter()
with open('test2.txt','w') as f:
    np.savetxt(f,dataMat2,fmt='%g',delimiter=' ')
end = time.perf_counter()
print(end-start)
start = time.perf_counter()
with open('test3.txt','w') as f:
    np.savetxt(f,dataMat3,fmt='%g',delimiter=' ')
end = time.perf_counter()
print(end-start)

Since the three data matrices contain the same number of elements, why does the last one take so much longer to write than the other two? And is there any way to speed up writing a "tall" array?
Posted on 2018-12-17 19:44:41
As hpaulj pointed out, savetxt loops over the rows of X and formats each row individually:
for row in X:
    try:
        v = format % tuple(row) + newline
    except TypeError:
        raise TypeError("Mismatch between array dtype ('%s') and "
                        "format specifier ('%s')"
                        % (str(X.dtype), format))
    fh.write(v)

I think the main time sink is all of the string-interpolation calls. If we pack all of the string interpolation into a single call, things go much faster:
with open('/tmp/test4.txt','w') as f:
    fmt = ' '.join(['%g']*dataMat3.shape[1])
    fmt = '\n'.join([fmt]*dataMat3.shape[0])
    data = fmt % tuple(dataMat3.ravel())
    f.write(data)

The full timing script:

import io
import time
import numpy as np
dataMat1 = np.random.rand(1000,1000)
dataMat2 = np.random.rand(2,500000)
dataMat3 = np.random.rand(500000,2)
start = time.perf_counter()
with open('/tmp/test1.txt','w') as f:
    np.savetxt(f,dataMat1,fmt='%g',delimiter=' ')
end = time.perf_counter()
print(end-start)
start = time.perf_counter()
with open('/tmp/test2.txt','w') as f:
    np.savetxt(f,dataMat2,fmt='%g',delimiter=' ')
end = time.perf_counter()
print(end-start)
start = time.perf_counter()
with open('/tmp/test3.txt','w') as f:
    np.savetxt(f,dataMat3,fmt='%g',delimiter=' ')
end = time.perf_counter()
print(end-start)
start = time.perf_counter()
with open('/tmp/test4.txt','w') as f:
    fmt = ' '.join(['%g']*dataMat3.shape[1])
    fmt = '\n'.join([fmt]*dataMat3.shape[0])
    data = fmt % tuple(dataMat3.ravel())
    f.write(data)
end = time.perf_counter()
print(end-start)

This reports:
0.1604848340011813
0.17416274400056864
0.6634929459996783
0.16207673999997496

Posted on 2018-12-17 18:19:35
The code for savetxt is Python and accessible. Basically, it performs a formatted write for each row/line. In effect it does:
for row in arr:
    f.write(fmt % tuple(row))

where fmt is derived from your fmt parameter and the shape of the array, e.g.:
'%g %g %g ...'

So it performs one file write per row of the array. Formatting each row also takes some time, but that part is done in memory by Python code.
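The row-wise loop described above can be sketched in a few lines (this is a simplified illustration of the behavior, not savetxt's actual source; the file path and array size are arbitrary):

```python
import numpy as np

arr = np.random.rand(5, 2)
delimiter = ' '

# Build the per-row format string the way savetxt does: repeat the user's
# fmt ('%g') once per column, joined by the delimiter -> '%g %g' here.
row_fmt = delimiter.join(['%g'] * arr.shape[1])

with open('/tmp/rowwise.txt', 'w') as f:
    for row in arr:
        # One string interpolation and one file write per row; for a
        # 500000-row array that is 500000 separate calls.
        f.write(row_fmt % tuple(row) + '\n')
```

This makes the cost model clear: the number of format-and-write calls scales with the number of rows, not the number of elements, which is why a tall, narrow array is the worst case.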
I expect loadtxt/genfromtxt to show the same time pattern: reading many rows takes longer.
pandas has a faster csv load. I haven't seen any discussion of its write speed.
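Since the answer leaves pandas' write speed as an open question, here is a minimal sketch of how one could time it for the same tall array, assuming pandas is installed (the output path and use of DataFrame.to_csv with float_format='%g' are my choices, not part of the original answers):

```python
import time
import numpy as np
import pandas as pd

dataMat3 = np.random.rand(500000, 2)

# Time writing the tall array via pandas; to_csv handles the
# formatting and writing internally rather than row by row in Python.
start = time.perf_counter()
pd.DataFrame(dataMat3).to_csv('/tmp/test5.txt', sep=' ',
                              header=False, index=False,
                              float_format='%g')
end = time.perf_counter()
print(end - start)
```

The resulting file has the same space-delimited '%g' layout as the savetxt versions, so the timings are directly comparable on your machine.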
https://stackoverflow.com/questions/53820891