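While working with relatively large text files, I noticed something odd: asynchronous reads and writes are actually slower than their synchronous counterparts.
For example, running this dummy code: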
var res1 = File.WriteAllLinesAsync(string.Format(@"C:\Projects\DelMee\file{0}.txt", i), lines);
var res2 = File.WriteAllLinesAsync(string.Format(@"C:\Projects\DelMee\file{0}_bck.txt", i), lines);
await res1;
await res2;
is actually slower than
File.WriteAllLines(string.Format(@"C:\Projects\DelMee\file{0}.txt", i), lines);
File.WriteAllLines(string.Format(@"C:\Projects\DelMee\file{0}_bck.txt", i), lines);
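In theory, the first approach should be faster, because the second write should start before the first one has finished. The performance difference is about 100% for 15 files of roughly 25 MB each (10 vs. 20 seconds).
I noticed the same behavior with ReadAllLines and ReadAllLinesAsync.
Update 0: The main idea is that all files have been processed by the time the TestFileWriteXXX functions complete. Therefore: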
Task.WhenAll(allTasks1); // Without await is not a valid option
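To illustrate why the await cannot be dropped, here is a minimal sketch. The class and method names (WhenAllSketch, WriteAllAsync) are hypothetical and not part of the test program; the point is only that the method's returned Task completes after every write has finished, which holds only because Task.WhenAll is awaited.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper, not taken from the question's test program.
public static class WhenAllSketch
{
    // Starts one asynchronous write per path, then waits for all of them.
    // Without the await, the caller would continue while writes are still in flight.
    public static async Task WriteAllAsync(IEnumerable<string> paths, IReadOnlyList<string> lines)
    {
        var tasks = new List<Task>();
        foreach (string path in paths)
        {
            tasks.Add(File.WriteAllLinesAsync(path, lines));
        }

        await Task.WhenAll(tasks);
    }
}
```

A caller would use it as `await WhenAllSketch.WriteAllAsync(paths, lines);` and can rely on the files being complete on the next line.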
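Update 1: I added reading and writing via threads, and it shows a 50% improvement. Here is the complete example:
Update 2: I updated the code to eliminate the buffer generation overhead.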
const int MaxAttempts = 5;
static void Main(string[] args)
{
    TestFileWrite();
    TestFileWriteViaThread();
    TestFileWriteAsync();
    Console.ReadLine();
}
private static void TestFileWrite()
{
    Clear();
    Stopwatch stopWatch = new Stopwatch();
    stopWatch.Start();
    Console.WriteLine("Begin TestFileWrite");
    for (int i = 0; i < MaxAttempts; ++i)
    {
        TestFileWriteInt(i);
    }
    TimeSpan ts = stopWatch.Elapsed;
    string elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}", ts.Hours, ts.Minutes, ts.Seconds, ts.Milliseconds / 10);
    Console.WriteLine("TestFileWrite took: " + elapsedTime);
}
private static void TestFileWriteViaThread()
{
    Clear();
    Stopwatch stopWatch = new Stopwatch();
    stopWatch.Start();
    Console.WriteLine("Begin TestFileWriteViaThread");
    List<Thread> _threads = new List<Thread>();
    for (int i = 0; i < MaxAttempts; ++i)
    {
        var t = new Thread(TestFileWriteInt);
        t.Start(i);
        _threads.Add(t);
    }
    _threads.ForEach(T => T.Join());
    TimeSpan ts = stopWatch.Elapsed;
    string elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}", ts.Hours, ts.Minutes, ts.Seconds, ts.Milliseconds / 10);
    Console.WriteLine("TestFileWriteViaThread took: " + elapsedTime);
}
private static void TestFileWriteInt(object oIndex)
{
    int index = (int)oIndex;
    List<string> lines = GenerateLines(index);
    File.WriteAllLines(string.Format(@"C:\Projects\DelMee\file{0}.txt", index), lines);
    File.WriteAllLines(string.Format(@"F:\Projects\DelMee\file{0}_bck.txt", index), lines);
    var text = File.ReadAllLines(string.Format(@"C:\Projects\DelMee\file{0}.txt", index));
    var text1 = File.ReadAllLines(string.Format(@"C:\Projects\DelMee\file{0}.txt", index));
    //File.WriteAllLines(string.Format(@"C:\Projects\DelMee\file_test{0}.txt", index), text1);
}
private static async void TestFileWriteAsync()
{
    Clear();
    Console.WriteLine("Begin TestFileWriteAsync ");
    Stopwatch stopWatch = new Stopwatch();
    stopWatch.Start();
    for (int i = 0; i < MaxAttempts; ++i)
    {
        List<string> lines = GenerateLines(i);
        var allTasks = new List<Task>();
        allTasks.Add(File.WriteAllLinesAsync(string.Format(@"C:\Projects\DelMee\file{0}.txt", i), lines));
        allTasks.Add(File.WriteAllLinesAsync(string.Format(@"F:\Projects\DelMee\file{0}_bck.txt", i), lines));
        await Task.WhenAll(allTasks);
        var allTasks1 = new List<Task<string[]>>();
        allTasks1.Add(File.ReadAllLinesAsync(string.Format(@"C:\Projects\DelMee\file{0}.txt", i)));
        allTasks1.Add(File.ReadAllLinesAsync(string.Format(@"C:\Projects\DelMee\file{0}.txt", i)));
        await Task.WhenAll(allTasks1);
        // await File.WriteAllLinesAsync(string.Format(@"C:\Projects\DelMee\file_test{0}.txt", i), allTasks1[0].Result);
    }
    stopWatch.Stop();
    TimeSpan ts = stopWatch.Elapsed;
    string elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}", ts.Hours, ts.Minutes, ts.Seconds, ts.Milliseconds / 10);
    Console.WriteLine("TestFileWriteAsync took: " + elapsedTime);
}
private static void Clear()
{
    for (int i = 0; i < 15; ++i)
    {
        System.IO.File.Delete(string.Format(@"C:\Projects\DelMee\file{0}.txt", i));
        System.IO.File.Delete(string.Format(@"F:\Projects\DelMee\file{0}_bck.txt", i));
    }
}
static string buffer = new string('a', 25 * 1024 * 1024);
private static List<string> GenerateLines(int i)
{
    return new List<string>() { buffer };
}
The results are:
TestFileWrite took: 00:00:03.50
TestFileWriteViaThread took: 00:00:01.63
TestFileWriteAsync took: 00:00:06.78
The machine has an 8-core CPU; C: and F: are two different SSDs (850 EVOs) on two separate SATA ports.
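Update 3 (conclusion): The slowdown appears to come from how the File.Write*Async helpers handle flushing a large amount of data. As the answers below show, it is better to use FileStream directly. Even so, async is still slower than sequential access.
For now, the fastest approach is still to use multiple threads.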
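The multi-threaded variant can also be expressed with the thread pool instead of dedicated Thread objects. The following is a sketch under that assumption, not the code that produced the timings above: it swaps new Thread(...) for Task.Run, and the names ParallelSyncWriteSketch and WriteOnThreadPoolAsync are hypothetical. Each file is still written with the synchronous File.WriteAllLines; only the scheduling is parallel.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: synchronous writes, run in parallel on the thread pool.
public static class ParallelSyncWriteSketch
{
    public static Task WriteOnThreadPoolAsync(IEnumerable<string> paths, IReadOnlyList<string> lines)
    {
        // ToList() forces every Task.Run to start before we begin waiting.
        List<Task> tasks = paths
            .Select(path => Task.Run(() => File.WriteAllLines(path, lines)))
            .ToList();

        return Task.WhenAll(tasks);
    }
}
```

Whether this matches the dedicated-thread numbers would need to be measured; the thread pool may start the work items more gradually than explicitly created threads.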
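Answered 2020-04-07 11:09:10
I think this is a well-known issue. If you search for it, you will find plenty of similar posts.
E.g. https://github.com/dotnet/runtime/issues/23196
If speed is the requirement for a single I/O operation, you should always use synchronous I/O and the synchronous methods.
The Write*Async methods internally open the file stream in asynchronous I/O mode, which carries overhead compared with synchronous I/O.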
https://learn.microsoft.com/en-us/windows/win32/fileio/synchronous-and-asynchronous-i-o
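"However, for relatively fast I/O operations, the overhead of processing kernel I/O requests and kernel signals may make asynchronous I/O less beneficial, particularly if many fast I/O operations need to be made. In this case, synchronous I/O would be better."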
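In addition, the async methods in FileStream and StreamWriter may suffer from a small buffer size. The default buffer size for writing to a file stream is 4 KB, which is far smaller than the file size (25 MB to 50 MB). Although that buffer size seems to work fine for the synchronous methods, it amplifies the overhead incurred by the asynchronous methods.
Looking at the source, the method yields the thread every time the buffer is full. If you write a 25 MB file with the default 4096-byte buffer, that happens 6400 times.
To optimize this, if the whole file is already in memory, you can set the buffer size to the file size, which reduces the context switches and synchronization between each write and flush.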
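The 6400 figure is simply the payload size divided by the buffer size. A small sketch of that arithmetic follows; BufferMath and FullBufferCount are hypothetical names used only for illustration, and the calculation assumes one yield per full buffer, as described above.

```csharp
using System;

// Hypothetical helper for the arithmetic only; it performs no I/O.
public static class BufferMath
{
    // Number of times a buffer of bufferBytes fills while writing payloadBytes
    // (rounded up, so a final partial buffer counts as one).
    public static long FullBufferCount(long payloadBytes, int bufferBytes)
    {
        if (bufferBytes <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(bufferBytes));
        }

        return (payloadBytes + bufferBytes - 1) / bufferBytes;
    }
}
```

With a 25 MB payload, `BufferMath.FullBufferCount(25L * 1024 * 1024, 4096)` gives 6400, while a 25 MB buffer brings the count down to 1.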
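If you open FileStream and StreamWriter with different buffer sizes in your code and run tests for Write and WriteAsync, you will see the difference. When the buffer size equals the file size, the difference between the synchronous and asynchronous methods is small.
For example: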
// 4KB buffer sync stream
using (var stream = new FileStream(
    path, FileMode.Create, FileAccess.Write, FileShare.Read,
    4096, FileOptions.SequentialScan))
{
    using (var writer = new StreamWriter(stream, Encoding.UTF8))
    {
        writer.Write(str25mb);
    }
}
// 25MB buffer sync stream
using (var stream = new FileStream(
    path, FileMode.Create, FileAccess.Write, FileShare.Read,
    25 * 1024 * 1024, FileOptions.SequentialScan))
{
    using (var writer = new StreamWriter(stream, Encoding.UTF8))
    {
        writer.Write(str25mb);
    }
}
// 4KB buffer async stream
using (var stream = new FileStream(
    path,
    FileMode.Create, FileAccess.Write, FileShare.Read,
    4096, FileOptions.Asynchronous | FileOptions.SequentialScan))
using (var writer = new StreamWriter(stream, Encoding.UTF8))
{
    await writer.WriteAsync(str25mb);
}
// 25MB buffer async stream
using (var stream = new FileStream(
    path,
    FileMode.Create, FileAccess.Write, FileShare.Read,
    25 * 1024 * 1024, FileOptions.Asynchronous | FileOptions.SequentialScan))
using (var writer = new StreamWriter(stream, Encoding.UTF8))
{
    await writer.WriteAsync(str25mb);
}
The results (I ran each test 10 times) are:
TestFileWriteWithLargeBuffer took: 00:00:00.9291647
TestFileWriteWithLargeBufferAsync took: 00:00:01.1950127
TestFileWrite took: 00:00:01.5251026
TestFileWriteAsync took: 00:00:03.6913877
Answered 2020-04-07 08:41:40
My first answer was wrong, because TestFileWriteAsync was not awaited when I tried removing the await from Task.WhenAll.
I have run a corrected test, and it shows that File.Write*Async is very slow.
Begin TestFileWriteAsync
TestFileWriteAsync took: 00:00:13.7128699
Begin TestFileWrite
TestFileWrite took: 00:00:01.5734895
Begin TestFileWriteViaThread
TestFileWriteViaThread took: 00:00:00.8322218
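My apologies.
I have checked the source code of the async methods.
It looks like File.WriteAllLinesAsync and File.WriteAllTextAsync use the same InternalWriteAllTextAsync, which copies the original buffer again, chunk by chunk.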
buffer = ArrayPool<char>.Shared.Rent(DefaultBufferSize);
int count = contents.Length;
int index = 0;
while (index < count)
{
    int batchSize = Math.Min(DefaultBufferSize, count - index);
    contents.CopyTo(index, buffer, 0, batchSize);
#if MS_IO_REDIST
    await sw.WriteAsync(buffer, 0, batchSize).ConfigureAwait(false);
#else
    await sw.WriteAsync(new ReadOnlyMemory<char>(buffer, 0, batchSize), cancellationToken).ConfigureAwait(false);
#endif
contents.CopyTo(index, buffer, 0, batchSize);
is the line that copies a chunk of the original data buffer.
You can try File.WriteAllBytesAsync, which takes the data buffer "as is" and performs no extra copy:
Begin TestFileWriteAsync
TestFileWriteAsync took: 00:00:00.7741439
Begin TestFileWrite
TestFileWrite took: 00:00:00.5772008
Begin TestFileWriteViaThread
TestFileWriteViaThread took: 00:00:00.4457552
WriteAllBytesAsync test source code
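The linked test source is not reproduced here, so the following is only a sketch of what such a call might look like. WriteBytesSketch and WriteTextAsBytesAsync are hypothetical names, and UTF-8 is an assumed encoding. Note that, unlike WriteAllLines, this writes the text exactly as given and appends no line terminator.

```csharp
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper, not the answer's actual test code.
public static class WriteBytesSketch
{
    // Encodes the text once, then hands the byte array to File.WriteAllBytesAsync,
    // which avoids the chunked char-buffer copy done by the text-based helpers.
    public static Task WriteTextAsBytesAsync(string path, string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return File.WriteAllBytesAsync(path, bytes);
    }
}
```

In a benchmark, the encoding step would probably belong outside the timed region (for example, encoding the 25 MB buffer once up front), so that only the write itself is measured.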
https://stackoverflow.com/questions/61074203