Q: How to read a file using multiple threads in Java when a high-throughput (3 GB/s) file system is available

Stack Overflow user

Asked 2016-11-03 22:03:01

2 answers, 11.2K views, 0 followers, 4 votes

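I understand that on a normal spindle-drive system, reading a file with multiple threads is inefficient.

This is a different case: I have a high-throughput file system that provides read speeds of up to 3 GB/s, on a machine with 196 CPU cores and 2 TB of RAM.

A single-threaded Java program reads the file at 85-100 MB/s at most, so there is room to do much better than a single thread. I have to read files as large as 1 TB, and I have enough RAM to load them.

Currently I use the following or something similar, but I need to write something multithreaded to get better throughput: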

Java 7 Files: 50 MB/s

List<String> lines = Files.readAllLines(Paths.get(path), encoding);

Apache Commons IO: 48 MB/s

List<String> lines = FileUtils.readLines(new File("/path/to/file.txt"), "utf-8");

Guava: 45 MB/s

List<String> lines = Files.readLines(new File("/path/to/file.txt"), Charset.forName("utf-8"));

Java Scanner class: very slow

Scanner s = new Scanner(new File("filepath"));
ArrayList<String> list = new ArrayList<String>();
while (s.hasNext()){
    list.add(s.next());
}
s.close();

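I want to load the file and build the same ArrayList, in the correct order, as fast as possible.

There is another question that reads similarly, but it is actually different: that question discusses systems where multithreaded file I/O is physically impossible to make efficient. Thanks to advances in technology, we now have systems designed for high-throughput I/O, so the limiting factor is the CPU and software, which can be overcome by multithreading the I/O.

The other question also does not answer how to write the code for multithreaded I/O.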

java

multithreading

file-io

2 Answers

Stack Overflow user

Accepted answer

Answered 2016-11-04 22:04:44

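Here is a solution for reading a single file with multiple threads.

Divide the file into N chunks, read each chunk in its own thread, then merge the chunks in order. Beware of lines that cross chunk boundaries. This is the basic idea suggested by user https://stackoverflow.com/users/34397/slaks.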
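The boundary caveat can be made concrete. The sketch below shows one possible way to pick chunk offsets that always fall just after a newline, so that no line is split between two threads. The names `ChunkBoundaries`, `lineAlignedOffsets`, and `nextLineStart` are hypothetical and not part of the answer's code. It assumes `\n`-terminated lines in an ASCII-compatible encoding such as UTF-8, where the byte 0x0A never occurs inside a multi-byte character.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not part of the answer's code.
final class ChunkBoundaries {

    /**
     * Returns parts + 1 offsets; chunk i covers bytes [offsets[i], offsets[i + 1]).
     * Every offset other than 0 and the file size sits just after a '\n',
     * so no line is split between two chunks. Some chunks may be empty.
     */
    static long[] lineAlignedOffsets(FileChannel channel, int parts) throws IOException {
        long size = channel.size();
        long[] offsets = new long[parts + 1];
        offsets[parts] = size;
        for (int i = 1; i < parts; i++) {
            // Start from the nominal equal-size split, never moving backwards.
            long guess = Math.max(offsets[i - 1], size / parts * i);
            offsets[i] = nextLineStart(channel, guess, size);
        }
        return offsets;
    }

    /** Smallest offset >= pos that is 0 or directly follows a '\n'; size if there is none. */
    static long nextLineStart(FileChannel channel, long pos, long size) throws IOException {
        if (pos <= 0) {
            return 0;
        }
        ByteBuffer buf = ByteBuffer.allocate(8192);
        long scan = pos - 1; // look one byte back: pos may already follow a '\n'
        while (scan < size) {
            buf.clear();
            int n = channel.read(buf, scan); // positional read, does not move the channel position
            if (n <= 0) {
                break;
            }
            for (int j = 0; j < n; j++) {
                if (buf.get(j) == '\n') {
                    return scan + j + 1;
                }
            }
            scan += n;
        }
        return size;
    }

    // args[0] is the file path, args[1] the number of chunks
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get(args[0]), StandardOpenOption.READ)) {
            long[] offsets = lineAlignedOffsets(ch, Integer.parseInt(args[1]));
            for (int i = 0; i + 1 < offsets.length; i++) {
                System.out.println("chunk " + i + ": [" + offsets[i] + ", " + offsets[i + 1] + ")");
            }
        }
    }
}
```

Each thread can then decode its own range independently, and concatenating the per-chunk line lists in chunk order reproduces the original line order. A very long line can make chunks uneven, which this sketch does not try to balance.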

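Below is a benchmark of the multithreaded implementation on a single 20 GB file:

1 thread: 50 s: 400 MB/s

2 threads: 30 s: 666 MB/s

4 threads: 20 s: 1 GB/s

8 threads: 60 s: 333 MB/s

Equivalent Java 7 readAllLines(): 400 s: 50 MB/s

Note: this may only work on systems designed to support high-throughput I/O, not on ordinary personal computers.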

package filereadtests;

import java.io.*;
import static java.lang.Math.toIntExact;
import java.nio.*;
import java.nio.channels.*;
import java.nio.charset.Charset;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FileRead implements Runnable
{

private FileChannel _channel;
private long _startLocation;
private int _size;
int _sequence_number;

public FileRead(long loc, int size, FileChannel chnl, int sequence)
{
    _startLocation = loc;
    _size = size;
    _channel = chnl;
    _sequence_number = sequence;
}

@Override
public void run()
{
    try
    {
        System.out.println("Reading the channel: " + _startLocation + ":" + _size);

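        //allocate memory
        ByteBuffer buff = ByteBuffer.allocate(_size);

        //Read file chunk to RAM; a single read() may return fewer bytes than requested
        long position = _startLocation;
        while (buff.hasRemaining())
        {
            int bytes_read = _channel.read(buff, position);
            if (bytes_read < 0)
            {
                break; //end of file
            }
            position += bytes_read;
        }

        //chunk to String; a line or multi-byte UTF-8 character that straddles
        //a chunk boundary is not handled here
        String string_chunk = new String(buff.array(), 0, buff.position(), Charset.forName("UTF-8"));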

        System.out.println("Done Reading the channel: " + _startLocation + ":" + _size);

    } catch (Exception e)
    {
        e.printStackTrace();
    }
}

//args[0] is path to read file
//args[1] is the size of thread pool; Need to try different values to find the sweet spot
public static void main(String[] args) throws Exception
{
    FileInputStream fileInputStream = new FileInputStream(args[0]);
    FileChannel channel = fileInputStream.getChannel();
    long remaining_size = channel.size(); //get the total number of bytes in the file
    long chunk_size = remaining_size / Integer.parseInt(args[1]); //file_size/threads

    //Max allocation size allowed is ~2GB
    if (chunk_size > (Integer.MAX_VALUE - 5))
    {
        chunk_size = (Integer.MAX_VALUE - 5);
    }

    //thread pool
    ExecutorService executor = Executors.newFixedThreadPool(Integer.parseInt(args[1]));

    long start_loc = 0;//file pointer
    int i = 0; //loop counter
    while (remaining_size >= chunk_size)
    {
        //launches a new thread
        executor.execute(new FileRead(start_loc, toIntExact(chunk_size), channel, i));
        remaining_size = remaining_size - chunk_size;
        start_loc = start_loc + chunk_size;
        i++;
    }

    //load the last remaining piece
    executor.execute(new FileRead(start_loc, toIntExact(remaining_size), channel, i));

    //Tear Down
    executor.shutdown();

    //Wait for all threads to finish without spinning in a busy loop
    executor.awaitTermination(Long.MAX_VALUE, java.util.concurrent.TimeUnit.NANOSECONDS);
    System.out.println("Finished all threads");
    fileInputStream.close();
}

}
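The code above reads each chunk but discards `string_chunk`, so the merge step is left open. One possible way to get the chunks back in file order is to submit `Callable` tasks and collect their `Future` results in submission order. The sketch below is an assumption about how that could look, not the answer author's method; the names `OrderedChunkReader`, `readRange`, and `readChunks` are hypothetical. It returns raw bytes rather than strings, since decoding safely requires chunk boundaries aligned to line breaks.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration; not part of the answer's code.
final class OrderedChunkReader {

    /** Reads bytes [start, start + length) with positional reads, looping over short reads. */
    static byte[] readRange(FileChannel channel, long start, int length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(length);
        long pos = start;
        while (buf.hasRemaining()) {
            int n = channel.read(buf, pos);
            if (n < 0) {
                throw new IOException("Unexpected end of file at offset " + pos);
            }
            pos += n;
        }
        return buf.array();
    }

    /** Reads the file in chunkSize-byte pieces on a thread pool and returns them in file order. */
    static List<byte[]> readChunks(String path, int chunkSize, int threads) throws Exception {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel channel = FileChannel.open(Paths.get(path), StandardOpenOption.READ)) {
            long size = channel.size();
            List<Callable<byte[]>> tasks = new ArrayList<>();
            for (long start = 0; start < size; start += chunkSize) {
                final long from = start;
                final int length = (int) Math.min(chunkSize, size - from);
                tasks.add(() -> readRange(channel, from, length));
            }
            // invokeAll blocks until every task is done and returns the futures
            // in the same order as the task list, which is file order here.
            List<byte[]> chunks = new ArrayList<>();
            for (Future<byte[]> f : pool.invokeAll(tasks)) {
                chunks.add(f.get());
            }
            return chunks;
        } finally {
            pool.shutdown();
        }
    }

    // args[0] is the file path, args[1] the chunk size in bytes, args[2] the thread count
    public static void main(String[] args) throws Exception {
        List<byte[]> chunks = readChunks(args[0], Integer.parseInt(args[1]), Integer.parseInt(args[2]));
        System.out.println("Read " + chunks.size() + " chunks in order");
    }
}
```

Because the order comes from the task list rather than from completion time, no sequence numbers or sorting are needed. A failure in any chunk surfaces as an `ExecutionException` from `f.get()` instead of being printed and silently dropped.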

Votes: 7

Stack Overflow user

Answered 2016-11-03 22:22:41

You should first try Java 7's Files.readAllLines:

List<String> lines = Files.readAllLines(Paths.get(path), encoding);

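A multithreaded approach may not be a good choice, because it forces the file system to perform random reads (which is never a good thing for a file system).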
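Whether the single-threaded baseline is good enough on a given system can be checked by timing it. The sketch below is a minimal, hypothetical harness (`ReadBaseline` and `megabytesPerSecond` are illustrative names) that reports throughput with 1 MB taken as 10^6 bytes, the same convention under which 20 GB in 50 s works out to 400 MB/s. It assumes a UTF-8 file, and repeated runs may be served from the operating system's page cache rather than from storage.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical timing harness for illustration.
final class ReadBaseline {

    /** Throughput in MB/s, with 1 MB taken as 10^6 bytes. */
    static double megabytesPerSecond(long bytes, long elapsedNanos) {
        return (bytes / 1e6) / (elapsedNanos / 1e9);
    }

    // args[0] is the path of the file to read
    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args[0]);
        long start = System.nanoTime();
        List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
        long elapsed = System.nanoTime() - start;
        System.out.printf("%d lines, %.1f MB/s%n",
                lines.size(), megabytesPerSecond(Files.size(path), elapsed));
    }
}
```

Running the same measurement around a multithreaded reader gives a like-for-like comparison on the target file system.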

Votes: -3

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/40412008