
Q: How can I quickly write a large buffer to a binary file in C++?

Stack Overflow user

Asked 2012-07-19 23:18:23

Answers: 12, Views: 227.5K, Followers: 0, Votes: 269

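I'm trying to write huge amounts of data to my SSD (solid-state drive). By huge amounts I mean 80 GB.

I browsed the web for solutions, but the best I came up with was this: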

#include <fstream>
const unsigned long long size = 64ULL*1024ULL*1024ULL;  // 64 Mi elements = 512 MiB per write
unsigned long long a[size];
int main()
{
    std::ofstream myfile("file.binary", std::ios::out | std::ios::binary);
    if (!myfile) return 1;  // minimal error handling: the file could not be opened
    for(int i = 0; i < 32; ++i){
        //Some calculations to fill a[]
        myfile.write(reinterpret_cast<const char*>(a), size*sizeof(unsigned long long));
    }
    myfile.close();
}

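This program was compiled with Visual Studio 2010 with full optimizations and run under Windows 7; it tops out at about 20 MB/s. What really bothers me is that Windows can copy files from another SSD to this SSD at 150 MB/s to 200 MB/s, which is at least 7 times faster. That's why I think I should be able to go faster.

Is there any way to speed up my writes?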

Tags: c++, performance, optimization, file-io

12 answers

Stack Overflow user

Accepted answer

Posted 2012-07-20 00:11:11

This did the job (in 2012):

#include <stdio.h>
const unsigned long long size = 8ULL*1024ULL*1024ULL;  // 8 Mi elements = 64 MiB per fwrite
unsigned long long a[size];

int main()
{
    FILE* pFile;
    pFile = fopen("file.binary", "wb");
    if (!pFile) return 1;  // the file could not be opened
    for (unsigned long long j = 0; j < 1024; ++j){
        //Some calculations to fill a[]
        fwrite(a, 1, size*sizeof(unsigned long long), pFile);
    }
    fclose(pFile);
    return 0;
}

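I just timed 8 GB in 36 seconds, which is about 220 MB/s, and I think that maxes out my SSD. Also worth noting: the code in the question kept one core at 100%, whereas this code uses only 2-5%.

Thanks a lot to everyone.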
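A minimal sketch of how a throughput figure like the one above might be measured, assuming a hypothetical helper named `measure_fwrite_mibps` (it is not part of the answer). It times `fopen`, the `fwrite` loop and `fclose` with `std::chrono::steady_clock`, which is monotonic. Because the operating system may still be holding the written data in its page cache when `fclose` returns, the result can overstate what the SSD actually sustains.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: writes `chunks` copies of `buffer` to `path` with fwrite
// and returns the observed throughput in MiB/s, or -1.0 if opening or writing failed.
// fclose() is inside the timed region, but data may still sit in the OS page cache.
double measure_fwrite_mibps(const char* path,
                            const std::vector<unsigned long long>& buffer,
                            std::size_t chunks)
{
    assert(!buffer.empty());
    const std::size_t chunk_bytes = buffer.size() * sizeof(unsigned long long);

    const auto start = std::chrono::steady_clock::now();
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return -1.0;
    for (std::size_t i = 0; i < chunks; ++i) {
        if (std::fwrite(buffer.data(), 1, chunk_bytes, f) != chunk_bytes) {
            std::fclose(f);
            return -1.0;  // short write: disk full or I/O error
        }
    }
    if (std::fclose(f) != 0) return -1.0;
    const auto stop = std::chrono::steady_clock::now();

    const double seconds = std::chrono::duration<double>(stop - start).count();
    const double mib = static_cast<double>(chunk_bytes) * chunks / (1024.0 * 1024.0);
    return seconds > 0.0 ? mib / seconds : 0.0;
}
```

As a worked check of the figure above: 8 GiB is 8192 MiB, and 8192 / 36 ≈ 228 MiB/s, in line with the quoted ~220 MB/s.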

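Update: 5 years have passed; it's 2017 now. Compilers, hardware, libraries and my requirements have changed. That's why I made some changes to the code and took some new measurements.

First, the code: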

#include <fstream>
#include <chrono>
#include <vector>
#include <cstdint>
#include <numeric>
#include <random>
#include <algorithm>
#include <iostream>
#include <cassert>

std::vector<uint64_t> GenerateData(std::size_t bytes)
{
    assert(bytes % sizeof(uint64_t) == 0);
    std::vector<uint64_t> data(bytes / sizeof(uint64_t));
    std::iota(data.begin(), data.end(), 0);
    std::shuffle(data.begin(), data.end(), std::mt19937{ std::random_device{}() });
    return data;
}

long long option_1(std::size_t bytes)
{
    std::vector<uint64_t> data = GenerateData(bytes);

    auto startTime = std::chrono::high_resolution_clock::now();
    auto myfile = std::fstream("file.binary", std::ios::out | std::ios::binary);
    myfile.write((char*)&data[0], bytes);
    myfile.close();
    auto endTime = std::chrono::high_resolution_clock::now();

    return std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
}

long long option_2(std::size_t bytes)
{
    std::vector<uint64_t> data = GenerateData(bytes);

    auto startTime = std::chrono::high_resolution_clock::now();
    FILE* file = fopen("file.binary", "wb");
    fwrite(&data[0], 1, bytes, file);
    fclose(file);
    auto endTime = std::chrono::high_resolution_clock::now();

    return std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
}

long long option_3(std::size_t bytes)
{
    std::vector<uint64_t> data = GenerateData(bytes);

    std::ios_base::sync_with_stdio(false);
    auto startTime = std::chrono::high_resolution_clock::now();
    auto myfile = std::fstream("file.binary", std::ios::out | std::ios::binary);
    myfile.write((char*)&data[0], bytes);
    myfile.close();
    auto endTime = std::chrono::high_resolution_clock::now();

    return std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
}

int main()
{
    const std::size_t kB = 1024;
    const std::size_t MB = 1024 * kB;
    const std::size_t GB = 1024 * MB;

    for (std::size_t size = 1 * MB; size <= 4 * GB; size *= 2) std::cout << "option1, " << size / MB << "MB: " << option_1(size) << "ms" << std::endl;
    for (std::size_t size = 1 * MB; size <= 4 * GB; size *= 2) std::cout << "option2, " << size / MB << "MB: " << option_2(size) << "ms" << std::endl;
    for (std::size_t size = 1 * MB; size <= 4 * GB; size *= 2) std::cout << "option3, " << size / MB << "MB: " << option_3(size) << "ms" << std::endl;

    return 0;
}

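This code compiles with Visual Studio 2017 and g++ 7.2.0 (the new requirements). I ran it on two setups:

Laptop: Core i7, SSD, Ubuntu 16.04, g++ 7.2.0 with -std=c++11 -march=native -O3
Desktop: Core i7, SSD, Windows 10, Visual Studio 2017 version 15.3.1 with /Ox /Ob2 /Oi /Ot /GT /GL /Gy

This gave the following measurements (after dropping the 1MB values, which were obvious outliers):

Both option1 and option3 max out my SSD. I didn't expect this, because option2 used to be the fastest code on my old machine back then.

TL;DR: My measurements indicate using std::fstream over FILE.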
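Since the TL;DR favors `std::fstream`, here is a minimal sketch of the single large write from `option_1`, with the error checking the benchmark leaves out. The function name `write_with_ofstream` is hypothetical. It also omits `std::ios_base::sync_with_stdio(false)`: that call only affects the standard streams (`std::cin`, `std::cout`, `std::cerr`, `std::clog` and their wide counterparts), not `std::fstream` objects, so it is not expected to change `option_3`'s file timing.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes the whole vector to `path` as raw bytes.
// Returns false if the file could not be opened, a write failed, or the
// final flush in close() failed.
bool write_with_ofstream(const std::string& path, const std::vector<std::uint64_t>& data)
{
    assert(!path.empty());
    std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size() * sizeof(std::uint64_t)));
    out.close();  // flushes the buffer; sets failbit if closing fails
    return static_cast<bool>(out);
}
```

Checking the stream state after `close()` catches both a failed `write` (badbit) and a failed final flush (failbit).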

Votes: 263

Stack Overflow user

Posted 2012-07-19 23:53:28

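Try the following, in order:

1. Smaller buffer size. Writing ~2 MiB at a time might be a good start. On my last laptop, ~512 KiB was the sweet spot, but I haven't tested on my SSD yet.

   Note: I've noticed that very large buffers tend to decrease performance. I've seen speed losses when using 16-MiB buffers instead of 512-KiB buffers.

2. Use _open (or _topen if you want to be Windows-correct) to open the file, then use _write. This will probably avoid a lot of buffering, but it's not certain to.

3. Use Windows-specific functions like CreateFile and WriteFile. That will avoid any buffering in the standard library.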
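The list above can be combined into a rough sketch: option 3 (`CreateFileA`/`WriteFile`) with a chunk-size parameter for option 1, defaulting to ~2 MiB. The name `write_raw_chunked` is hypothetical, and the POSIX `open`/`write` fallback for non-Windows systems is an assumption added for portability; it is not part of the answer. This bypasses the C and C++ library buffers, but the OS file cache still applies unless flags such as `FILE_FLAG_NO_BUFFERING` are passed, and those impose sector-alignment requirements on the buffer and sizes.

```cpp
#include <cassert>
#include <cstddef>

#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <unistd.h>
#endif

// Hypothetical helper: writes `size` bytes from `data` to `path` in chunks of
// at most `chunk` bytes, bypassing the C/C++ library buffers.
// Windows: CreateFileA/WriteFile. Elsewhere (assumed fallback): POSIX open/write.
bool write_raw_chunked(const char* path, const char* data, std::size_t size,
                       std::size_t chunk = 2 * 1024 * 1024)  // ~2 MiB, per option 1
{
    // The chunk must fit in the DWORD byte count that WriteFile takes.
    assert(path != nullptr && chunk > 0 && chunk <= (std::size_t{1} << 30));
#ifdef _WIN32
    HANDLE h = CreateFileA(path, GENERIC_WRITE, 0, nullptr, CREATE_ALWAYS,
                           FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE) return false;
    while (size > 0) {
        const DWORD n = static_cast<DWORD>(size < chunk ? size : chunk);
        DWORD done = 0;
        if (!WriteFile(h, data, n, &done, nullptr) || done == 0) {
            CloseHandle(h);
            return false;
        }
        data += done;
        size -= done;
    }
    return CloseHandle(h) != 0;
#else
    const int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    while (size > 0) {
        // write() may write fewer bytes than requested; EINTR is treated as failure here.
        const ssize_t n = write(fd, data, size < chunk ? size : chunk);
        if (n <= 0) { close(fd); return false; }
        data += n;
        size -= static_cast<std::size_t>(n);
    }
    return close(fd) == 0;
#endif
}
```

To compare chunk sizes as the answer suggests, the same call can be timed with, for example, `512 * 1024` and `16 * 1024 * 1024` as the last argument.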

Votes: 26

Stack Overflow user

Posted 2012-07-20 00:04:23

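I see no difference between std::stream/FILE/device, or between buffered and unbuffered I/O.

Also note:

SSD drives "tend" to slow down (lower transfer rates) as they fill up.
SSD drives "tend" to slow down (lower transfer rates) as they get older (because of non-working bits).

I see the code run in 63 seconds.

That gives a transfer rate of 260M/s (my SSD looks slightly faster than yours):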

64 * 1024 * 1024 * 8 /* sizeof(unsigned long long) */ * 32 /* chunks */ = 16G
16G / 63 s ≈ 260M/s

Moving from std::fstream to FILE* gains nothing.

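#include <stdio.h>

const unsigned long long size = 64ULL*1024ULL*1024ULL;  // same buffer as in the question
unsigned long long a[size];

int main()
{
    FILE* stream = fopen("binary", "wb");  // "wb": binary mode matters on Windows
    if (!stream) return 1;

    for(int loop=0;loop < 32;++loop)
    {
         fwrite(a, sizeof(unsigned long long), size, stream);
    }
    fclose(stream);
}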

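So the C++ stream is running as fast as the underlying library allows.

But I think it's unfair to compare the OS with an application built on top of the OS. The application can make no assumptions (it does not know the drive is an SSD), so it uses the OS's file mechanisms for the transfer.

The OS, on the other hand, doesn't need to make any assumptions. It can tell what types of drives are involved and use the optimal technique for transferring the data, in this case a direct memory-to-memory transfer. Try writing a program that copies 80G from one location in memory to another and see how fast that is.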
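Following the suggestion above, here is a rough sketch of a memory-to-memory baseline, using a hypothetical helper `memcpy_mibps` and a much smaller size than 80G so that it fits comfortably in RAM. Both buffers are filled before timing so that first-touch page faults are not counted. An optimizing compiler may still shorten the repeated copies, so treat the number as an indication only.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies `bytes` bytes between two heap buffers `repeats`
// times and returns the observed rate in MiB/s. No file system is involved.
double memcpy_mibps(std::size_t bytes, int repeats)
{
    assert(bytes > 0 && repeats > 0);
    std::vector<char> src(bytes, 1), dst(bytes, 0);  // pages touched before timing

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        std::memcpy(dst.data(), src.data(), bytes);
    }
    const auto stop = std::chrono::steady_clock::now();

    // Read the destination so the copy is not optimized away entirely.
    volatile char sink = dst[bytes - 1];
    (void)sink;

    const double seconds = std::chrono::duration<double>(stop - start).count();
    const double mib = static_cast<double>(bytes) * repeats / (1024.0 * 1024.0);
    return seconds > 0.0 ? mib / seconds : 0.0;
}
```

Comparing `memcpy_mibps(256 * 1024 * 1024, 4)` with the file-write rates above gives a sense of how much of the gap is due to the storage path rather than raw copying.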

Edit

I changed my code to use the lower-level calls,

i.e. no buffering:

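#include <fcntl.h>
#include <unistd.h>


const unsigned long long size = 64ULL*1024ULL*1024ULL;
unsigned long long a[size];
int main()
{
    int data = open("test", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (data < 0) return 1;
    for(int loop = 0; loop < 32; ++loop)
    {
        // write() may write fewer bytes than requested, so loop until the chunk is done
        const char* p = reinterpret_cast<const char*>(a);
        size_t left = size * sizeof(unsigned long long);
        while (left > 0)
        {
            ssize_t n = write(data, p, left);
            if (n <= 0) { close(data); return 1; }
            p += n;
            left -= static_cast<size_t>(n);
        }
    }
    close(data);
}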

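This made no difference.

NOTE: My drive is an SSD. If you have a normal hard drive, you may see a difference between the two techniques above. But, as I expected, unbuffered and buffered writes make no difference when writing chunks larger than the buffer size.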
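To try the same buffered-versus-unbuffered comparison with stdio instead of raw `open`/`write`, `std::setvbuf` can switch a `FILE*` between full buffering (`_IOFBF`) and no buffering (`_IONBF`). The helper `open_with_buffer` is hypothetical. When no buffer pointer is supplied, the C standard only says the size "may" determine the buffer the library allocates, so an implementation is free to ignore it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: opens `path` for binary writing and sets stdio buffering.
// buffer_bytes == 0 disables buffering (_IONBF); otherwise full buffering (_IOFBF)
// is requested with that size, which the library may ignore since buf is null.
std::FILE* open_with_buffer(const char* path, std::size_t buffer_bytes)
{
    assert(path != nullptr);
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return nullptr;
    // setvbuf must be called before any other operation on the stream.
    const int rc = buffer_bytes == 0
        ? std::setvbuf(f, nullptr, _IONBF, 0)
        : std::setvbuf(f, nullptr, _IOFBF, buffer_bytes);
    if (rc != 0) { std::fclose(f); return nullptr; }
    return f;
}
```

Timing the same large `fwrite` through a stream opened with `0` and one opened with, say, `1 << 16` is one way to check the answer's claim on your own hardware.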

Edit 2:

Have you tried the fastest way to copy files in C++?

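#include <fstream>

int main()
{
    std::ifstream  input("input", std::ios::binary);
    std::ofstream  output("output", std::ios::binary);

    output << input.rdbuf();  // streams the whole input buffer into the output file
}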
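As a swapped-in alternative to the `rdbuf()` idiom (not what the answer used), C++17's `std::filesystem::copy_file` hands the whole copy to the standard library, which may use OS-level copy mechanisms; whether it does depends on the library implementation. The wrapper name `copy_with_filesystem` is hypothetical. It uses the `std::error_code` overload so failures are reported without exceptions.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>  // std::ofstream, only for creating a sample source file
#include <system_error>

// Hypothetical wrapper: copies `from` to `to`, replacing `to` if it exists.
// Returns false and leaves the reason in `ec` on failure instead of throwing.
bool copy_with_filesystem(const std::filesystem::path& from,
                          const std::filesystem::path& to,
                          std::error_code& ec)
{
    assert(!from.empty() && !to.empty());
    return std::filesystem::copy_file(
        from, to, std::filesystem::copy_options::overwrite_existing, ec);
}
```

For example, after writing "abc" to copy_src.bin with a `std::ofstream`, `copy_with_filesystem("copy_src.bin", "copy_dst.bin", ec)` should return true with `ec` empty, while a missing source file returns false and sets `ec`.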

Votes: 23

Page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/11563963
