有一项任务。数组包含任意字符串。我们需要计算每个字符串在数组中发生的次数。在一个线程和多线程中解决任务,比较执行时间。
由于某种原因,单线程版本的运行速度比多线程版本快: 90 ms对300 ms。如何修复多线程版本,使其比单线程版本运行得更快?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;
namespace ParallelDictionary
{
class Program
{
static void Main(string[] args)
{
List<string> strs = new List<string>();
for (int i=0; i<1000000; i++)
{
strs.Add("qqq");
}
for (int i=0;i< 5000; i++)
{
strs.Add("aaa");
}
F(strs);
ParallelF(strs);
}
private static void F(List<string> strs)
{
Dictionary<string, int> freqs = new Dictionary<string, int>();
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
for (int i=0; i<strs.Count; i++)
{
if (!freqs.ContainsKey(strs[i]))
freqs[strs[i]] = 1;
else
freqs[strs[i]]++;
}
stopwatch.Stop();
Console.WriteLine("single-threaded {0} ms", stopwatch.ElapsedMilliseconds);
foreach (var kvp in freqs)
{
Console.WriteLine("{0} {1}", kvp.Key, kvp.Value);
}
}
private static void ParallelF(List<string> strs)
{
ConcurrentDictionary<string, int> freqs = new ConcurrentDictionary<string, int>();
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
Parallel.ForEach(strs, str =>
{
freqs.AddOrUpdate(str, 1, (key, value) => value + 1);
});
stopwatch.Stop();
Console.WriteLine("multi-threaded {0} ms", stopwatch.ElapsedMilliseconds);
foreach (var kvp in freqs)
{
Console.WriteLine("{0} {1}", kvp.Key, kvp.Value);
}
}
}
}发布于 2022-10-13 11:24:51
通过使用分区程序将数据分割成单独处理的块,可以使多线程版本比单线程版本稍微快一点。
然后,可以将每个块处理成一个单独的非并发字典,而不需要任何锁定。最后,在每个范围的末尾,您可以更新一个非并发的结果字典(您必须锁定它)。
就像这样:
private static void ParallelF(List<string> strs)
{
Dictionary<string, int> result = new Dictionary<string, int>();
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
object locker = new object();
Parallel.ForEach(Partitioner.Create(0, strs.Count), range =>
{
var freqs = new Dictionary<string, int>();
for (int i = range.Item1; i < range.Item2; ++i)
{
if (!freqs.ContainsKey(strs[i]))
freqs[strs[i]] = 1;
else
freqs[strs[i]]++;
}
lock (locker)
{
foreach (var kvp in freqs)
{
if (!result.ContainsKey(kvp.Key))
{
result[kvp.Key] = kvp.Value;
}
else
{
result[kvp.Key] += kvp.Value;
}
}
}
});
stopwatch.Stop();
Console.WriteLine("multi-threaded {0} ms", stopwatch.ElapsedMilliseconds);
foreach (var kvp in result)
{
Console.WriteLine("{0} {1}", kvp.Key, kvp.Value);
}
}在我的系统中,它提供了以下结果(对于发布版本,.NET 6):
single-threaded 50 ms
qqq 1000000
aaa 5000
multi-threaded 26 ms
qqq 1000000
aaa 5000只是稍微快一点..。如果值得的话,由你来决定。
发布于 2022-10-13 12:19:38
下面是另一种方法,它与Matthew Watson的Partitioner-based解决方案有相似之处,但使用了不同的API。它使用高级Parallel.ForEach重载,其签名如下所示:
/// <summary>
/// Executes a foreach (For Each in Visual Basic) operation with thread-local data
/// on an System.Collections.IEnumerable in which iterations may run in parallel,
/// loop options can be configured, and the state of the loop can be monitored and
/// manipulated.
/// </summary>
public static ParallelLoopResult ForEach<TSource, TLocal>(
IEnumerable<TSource> source,
ParallelOptions parallelOptions,
Func<TLocal> localInit,
Func<TSource, ParallelLoopState, TLocal, TLocal> body,
Action<TLocal> localFinally);由于TLocal是一个本地字典,它包含由单个工作线程计算的部分结果:
static Dictionary<string, int> GetFrequencies(List<string> source)
{
Dictionary<string, int> frequencies = new();
ParallelOptions options = new()
{
MaxDegreeOfParallelism = Environment.ProcessorCount
};
Parallel.ForEach(source, options, () => new Dictionary<string, int>(),
(item, state, partialFrequencies) =>
{
ref int occurences = ref CollectionsMarshal.GetValueRefOrAddDefault(
partialFrequencies, item, out bool exists);
occurences++;
return partialFrequencies;
}, partialFrequencies =>
{
lock (frequencies)
{
foreach ((string item, int partialOccurences) in partialFrequencies)
{
ref int occurences = ref CollectionsMarshal.GetValueRefOrAddDefault(
frequencies, item, out bool exists);
occurences += partialOccurences;
}
}
});
return frequencies;
}上面的代码还演示了低水平的CollectionsMarshal.GetValueRefOrAddDefault API的使用,它允许搜索和更新带有单一哈希键的字典。
我没有测量它(也没有测试它),但我预计它会比马修沃森的解决方案慢。原因是source是以同步方式枚举的。如果您能够处理复杂性,您可以考虑将这两种方法结合起来,以获得最佳性能。
https://stackoverflow.com/questions/74054577
复制相似问题