文章/答案/技术大牛

发布

社区首页 >问答首页 >Parallel.ForEach和ConcurrentDictionary比普通字典慢

问Parallel.ForEach和ConcurrentDictionary比普通字典慢
EN

Stack Overflow用户

提问于 2022-10-13 10:53:22

回答 2查看 73关注 0票数 3

有一项任务。数组包含任意字符串。我们需要计算每个字符串在数组中发生的次数。在一个线程和多线程中解决任务，比较执行时间。

由于某种原因，单线程版本的运行速度比多线程版本快: 90 ms对300 ms。如何修复多线程版本，使其比单线程版本运行得更快？

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;

namespace ParallelDictionary
{
    class Program
    {
        static void Main(string[] args)
        {
            List<string> strs = new List<string>();
            for (int i=0; i<1000000; i++)
            {
                strs.Add("qqq");
            }
            for (int i=0;i< 5000; i++)
            {
                strs.Add("aaa");
            }

            F(strs);
            ParallelF(strs);
        }

        private static void F(List<string> strs)
        {
            Dictionary<string, int> freqs = new Dictionary<string, int>();

            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Start();
            for (int i=0; i<strs.Count; i++)
            {
                if (!freqs.ContainsKey(strs[i]))
                    freqs[strs[i]] = 1;
                else
                    freqs[strs[i]]++;
            }
            stopwatch.Stop();

            Console.WriteLine("single-threaded {0} ms", stopwatch.ElapsedMilliseconds);
            foreach (var kvp in freqs)
            {
                Console.WriteLine("{0} {1}", kvp.Key, kvp.Value);
            }
        }

        private static void ParallelF(List<string> strs)
        {
            ConcurrentDictionary<string, int> freqs = new ConcurrentDictionary<string, int>();

            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Start();
            Parallel.ForEach(strs, str =>
            {
                freqs.AddOrUpdate(str, 1, (key, value) => value + 1);
            });
            stopwatch.Stop();

            Console.WriteLine("multi-threaded {0} ms", stopwatch.ElapsedMilliseconds);
            foreach (var kvp in freqs)
            {
                Console.WriteLine("{0} {1}", kvp.Key, kvp.Value);
            }
        }
    }
}

concurrentdictionary

multithreading

parallel.foreach

回答 2

Stack Overflow用户

回答已采纳

发布于 2022-10-13 11:24:51

通过使用分区程序将数据分割成单独处理的块，可以使多线程版本比单线程版本稍微快一点。

然后，可以将每个块处理成一个单独的非并发字典，而不需要任何锁定。最后，在每个范围的末尾，您可以更新一个非并发的结果字典(您必须锁定它)。

就像这样：

private static void ParallelF(List<string> strs)
{
    Dictionary<string, int> result = new Dictionary<string, int>();

    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();

    object locker = new object();

    Parallel.ForEach(Partitioner.Create(0, strs.Count), range => 
    {
        var freqs = new Dictionary<string, int>();

        for (int i = range.Item1; i < range.Item2; ++i)
        {
            if (!freqs.ContainsKey(strs[i]))
                freqs[strs[i]] = 1;
            else
                freqs[strs[i]]++;
        }

        lock (locker)
        { 
            foreach (var kvp in freqs)
            {
                if (!result.ContainsKey(kvp.Key))
                {
                    result[kvp.Key] = kvp.Value;
                }
                else
                {
                    result[kvp.Key] += kvp.Value;
                }
            }
        }
    });

    stopwatch.Stop();

    Console.WriteLine("multi-threaded {0} ms", stopwatch.ElapsedMilliseconds);
    foreach (var kvp in result)
    {
        Console.WriteLine("{0} {1}", kvp.Key, kvp.Value);
    }
}

在我的系统中，它提供了以下结果(对于发布版本，.NET 6)：

single-threaded 50 ms
qqq 1000000
aaa 5000
multi-threaded 26 ms
qqq 1000000
aaa 5000

只是稍微快一点..。如果值得的话，由你来决定。

票数 4

Stack Overflow用户

发布于 2022-10-13 12:19:38

下面是另一种方法，它与Matthew Watson的Partitioner-based解决方案有相似之处，但使用了不同的API。它使用高级Parallel.ForEach重载，其签名如下所示：

/// <summary>
/// Executes a foreach (For Each in Visual Basic) operation with thread-local data
/// on an System.Collections.IEnumerable in which iterations may run in parallel,
/// loop options can be configured, and the state of the loop can be monitored and
/// manipulated.
/// </summary>
public static ParallelLoopResult ForEach<TSource, TLocal>(
    IEnumerable<TSource> source,
    ParallelOptions parallelOptions,
    Func<TLocal> localInit,
    Func<TSource, ParallelLoopState, TLocal, TLocal> body,
    Action<TLocal> localFinally);

由于TLocal是一个本地字典，它包含由单个工作线程计算的部分结果：

static Dictionary<string, int> GetFrequencies(List<string> source)
{
    Dictionary<string, int> frequencies = new();
    ParallelOptions options = new()
    {
        MaxDegreeOfParallelism = Environment.ProcessorCount
    };
    Parallel.ForEach(source, options, () => new Dictionary<string, int>(),
        (item, state, partialFrequencies) =>
    {
        ref int occurences = ref CollectionsMarshal.GetValueRefOrAddDefault(
            partialFrequencies, item, out bool exists);
        occurences++;
        return partialFrequencies;
    }, partialFrequencies =>
    {
        lock (frequencies)
        {
            foreach ((string item, int partialOccurences) in partialFrequencies)
            {
                ref int occurences = ref CollectionsMarshal.GetValueRefOrAddDefault(
                    frequencies, item, out bool exists);
                occurences += partialOccurences;
            }
        }
    });
    return frequencies;
}

上面的代码还演示了低水平的CollectionsMarshal.GetValueRefOrAddDefault API的使用，它允许搜索和更新带有单一哈希键的字典。

我没有测量它(也没有测试它)，但我预计它会比马修沃森的解决方案慢。原因是source是以同步方式枚举的。如果您能够处理复杂性，您可以考虑将这两种方法结合起来，以获得最佳性能。

票数 1

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/74054577

复制

相似问题

问Parallel.ForEach和ConcurrentDictionary比普通字典慢
EN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Parallel.ForEach和ConcurrentDictionary比普通字典慢EN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Parallel.ForEach和ConcurrentDictionary比普通字典慢
EN