在我的机器上(Quad,8gb内存),运行Vista x64业务,使用Visual 2008 SP1,我正试图非常快地交叉两组数字。
我在C++中实现了两种方法,在C#中实现了一种方法。到目前为止,C#方法速度更快,我希望改进C++方法,使其比C#更快,我希望C++能够做到这一点。
下面是C#输出:(发布版本)
Found the intersection 1000 times, in 4741.407 ms下面是两种不同方法(发布C++构建)的初始x64输出:
Found the intersection (using unordered_map) 1000 times, in 21580.7ms
Found the intersection (using set_intersection) 1000 times, in 22366.6ms以下是最新的C++输出,用于三种方法(发布x64构建):
最新基准:
Found the intersection of 504 values (using unordered_map) 1000 times, in 28827.6ms
Found the intersection of 495 values (using set_intersection) 1000 times, in 9817.69ms
Found the intersection of 504 values (using unordered_set) 1000 times, in 24769.1ms因此,set_intersection方法现在比C#慢约2倍,但比最初的C++方法快2倍。
最新的C++代码:
Code:
// MapPerformance.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <hash_map>
#include <vector>
#include <iostream>
#include <time.h>
#include <algorithm>
#include <set>
#include <unordered_set>
#include <boost\unordered\unordered_map.hpp>
#include "timer.h"
using namespace std;
using namespace stdext;
using namespace boost;
using namespace tr1;
int runIntersectionTest2(const vector<int>& set1, const vector<int>& set2)
{
    // hash_map<int,int> theMap;
    // map<int,int> theMap;
    unordered_set<int> theSet;      
     theSet.insert( set1.begin(), set1.end() );
    int intersectionSize = 0;
    vector<int>::const_iterator set2_end = set2.end();
    for ( vector<int>::const_iterator iterator = set2.begin(); iterator != set2_end; ++iterator )
    {
        if ( theSet.find(*iterator) != theSet.end() )
        {
                intersectionSize++;
        }
    }
    return intersectionSize;
}
int runIntersectionTest(const vector<int>& set1, const vector<int>& set2)
{
    // hash_map<int,int> theMap;
    // map<int,int> theMap;
    unordered_map<int,int> theMap;  
    vector<int>::const_iterator set1_end = set1.end();
    // Now intersect the two sets by populating the map
    for ( vector<int>::const_iterator iterator = set1.begin(); iterator != set1_end; ++iterator )
    {
        int value = *iterator;
        theMap[value] = 1;
    }
    int intersectionSize = 0;
    vector<int>::const_iterator set2_end = set2.end();
    for ( vector<int>::const_iterator iterator = set2.begin(); iterator != set2_end; ++iterator )
    {
        int value = *iterator;
        unordered_map<int,int>::iterator foundValue = theMap.find(value);
        if ( foundValue != theMap.end() )
        {
            theMap[value] = 2;
            intersectionSize++;
        }
    }
    return intersectionSize;
}
int runSetIntersection(const vector<int>& set1_unsorted, const vector<int>& set2_unsorted)
{   
    // Create two vectors
    std::vector<int> set1(set1_unsorted.size());
    std::vector<int> set2(set2_unsorted.size());
    // Copy the unsorted data into them
    std::copy(set1_unsorted.begin(), set1_unsorted.end(), set1.begin());
    std::copy(set2_unsorted.begin(), set2_unsorted.end(), set2.begin());
    // Sort the data
    sort(set1.begin(),set1.end());
    sort(set2.begin(),set2.end());
    vector<int> intersection;
    intersection.reserve(1000);
    set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), back_inserter(intersection));
    return intersection.size(); 
}
void createSets( vector<int>& set1, vector<int>& set2 )
{
    srand ( time(NULL) );
    set1.reserve(100000);
    set2.reserve(1000);
    // Create 100,000 values for set1
    for ( int i = 0; i < 100000; i++ )
    {
        int value = 1000000000 + i;
        set1.push_back(value);
    }
    // Try to get half of our values intersecting
    float ratio = 200000.0f / RAND_MAX;
    // Create 1,000 values for set2
    for ( int i = 0; i < 1000; i++ )
    {
        int random = rand() * ratio + 1;
        int value = 1000000000 + random;
        set2.push_back(value);
    }
    // Make sure set1 is in random order (not sorted)
    random_shuffle(set1.begin(),set1.end());
}
int _tmain(int argc, _TCHAR* argv[])
{
    int intersectionSize = 0;
    vector<int> set1, set2; 
    createSets( set1, set2 );
    Timer timer;
    for ( int i = 0; i < 1000; i++ )
    {
        intersectionSize = runIntersectionTest(set1, set2);
    }
    timer.Stop();
    cout << "Found the intersection of " << intersectionSize << " values (using unordered_map) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;
    timer.Reset();
    for ( int i = 0; i < 1000; i++ )
    {
        intersectionSize = runSetIntersection(set1,set2);
    }
    timer.Stop();
    cout << "Found the intersection of " << intersectionSize << " values (using set_intersection) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;
    timer.Reset();
    for ( int i = 0; i < 1000; i++ )
    {
        intersectionSize = runIntersectionTest2(set1,set2);
    }
    timer.Stop();
    cout << "Found the intersection of " << intersectionSize << " values (using unordered_set) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;
    getchar();
    return 0;
}C#代码:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace DictionaryPerformance
{
    class Program
    {
        static void Main(string[] args)
        {
            List<int> set1 = new List<int>(100000);
            List<int> set2 = new List<int>(1000);
            // Create 100,000 values for set1
            for (int i = 0; i < 100000; i++)
            {
                int value = 1000000000 + i;
                set1.Add(value);
            }
            Random random = new Random(DateTime.Now.Millisecond);
            // Create 1,000 values for set2
            for (int i = 0; i < 1000; i++)
            {
                int value = 1000000000 + (random.Next() % 200000 + 1);
                set2.Add(value);
            }
            long start = System.Diagnostics.Stopwatch.GetTimestamp();
            for (int i = 0; i < 1000; i++)
            {
                runIntersectionTest(set1,set2);
            }
            long duration = System.Diagnostics.Stopwatch.GetTimestamp() - start;
            Console.WriteLine(String.Format("Found the intersection 1000 times, in {0} ms", ((float) duration * 1000.0f) / System.Diagnostics.Stopwatch.Frequency));
            Console.ReadKey();
        }
        static int runIntersectionTest(List<int> set1, List<int> set2)
        {
            Dictionary<int,int> theMap = new Dictionary<int,int>(100000);
            // Now intersect the two sets by populating the map
            foreach( int value in set1 )
            {
                theMap[value] = 1;
            }
            int intersectionSize = 0;
            foreach ( int value in set2 )
            {
                int count;
                if ( theMap.TryGetValue(value, out count ) )
                {
                    theMap[value] = 2;
                    intersectionSize++;
                }
            }
            return intersectionSize;
        }
    }
}C++代码:
// MapPerformance.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <hash_map>
#include <vector>
#include <iostream>
#include <time.h>
#include <algorithm>
#include <set>
#include <boost\unordered\unordered_map.hpp>
#include "timer.h"
using namespace std;
using namespace stdext;
using namespace boost;
int runIntersectionTest(vector<int> set1, vector<int> set2)
{
    // hash_map<int,int> theMap;
    // map<int,int> theMap;
    unordered_map<int,int> theMap;
    // Now intersect the two sets by populating the map
    for ( vector<int>::iterator iterator = set1.begin(); iterator != set1.end(); iterator++ )
    {
        int value = *iterator;
        theMap[value] = 1;
    }
    int intersectionSize = 0;
    for ( vector<int>::iterator iterator = set2.begin(); iterator != set2.end(); iterator++ )
    {
        int value = *iterator;
        unordered_map<int,int>::iterator foundValue = theMap.find(value);
        if ( foundValue != theMap.end() )
        {
            theMap[value] = 2;
            intersectionSize++;
        }
    }
    return intersectionSize;
}
int runSetIntersection(set<int> set1, set<int> set2)
{   
    set<int> intersection;
    set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), inserter(intersection, intersection.end()));
    return intersection.size(); 
}
int _tmain(int argc, _TCHAR* argv[])
{
    srand ( time(NULL) );
    vector<int> set1;
    vector<int> set2;
    set1.reserve(10000);
    set2.reserve(1000);
    // Create 100,000 values for set1
    for ( int i = 0; i < 100000; i++ )
    {
        int value = 1000000000 + i;
        set1.push_back(value);
    }
    // Create 1,000 values for set2
    for ( int i = 0; i < 1000; i++ )
    {
        int random = rand() % 200000 + 1;
        random *= 10;
        int value = 1000000000 + random;
        set2.push_back(value);
    }
    Timer timer;
    for ( int i = 0; i < 1000; i++ )
    {
        runIntersectionTest(set1, set2);
    }
    timer.Stop();
    cout << "Found the intersection (using unordered_map) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;
    set<int> set21;
    set<int> set22;
    // Create 100,000 values for set1
    for ( int i = 0; i < 100000; i++ )
    {
        int value = 1000000000 + i;
        set21.insert(value);
    }
    // Create 1,000 values for set2
    for ( int i = 0; i < 1000; i++ )
    {
        int random = rand() % 200000 + 1;
        random *= 10;
        int value = 1000000000 + random;
        set22.insert(value);
    }
    timer.Reset();
    for ( int i = 0; i < 1000; i++ )
    {
        runSetIntersection(set21,set22);
    }
    timer.Stop();
    cout << "Found the intersection (using set_intersection) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;
    getchar();
    return 0;
}好的,这是最新的,有一些更改:
C++ (Release,x64)结果:
Found the intersection of 503 values (using unordered_map) 1000 times, in 35131.1ms
Found the intersection of 494 values (using set_intersection) 1000 times, in 10317ms所以它比C#慢2倍。@Jalf:你得到了一些相当快的数字,我是不是做错了什么?
C++代码:
// MapPerformance.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <hash_map>
#include <vector>
#include <iostream>
#include <time.h>
#include <algorithm>
#include <set>
#include <boost\unordered\unordered_map.hpp>
#include "timer.h"
using namespace std;
using namespace stdext;
using namespace boost;
int runIntersectionTest(const vector<int>& set1, const vector<int>& set2)
{
    // hash_map<int,int> theMap;
    // map<int,int> theMap;
    unordered_map<int,int> theMap;  
    vector<int>::const_iterator set1_end = set1.end();
    // Now intersect the two sets by populating the map
    for ( vector<int>::const_iterator iterator = set1.begin(); iterator != set1_end; ++iterator )
    {
        int value = *iterator;
        theMap[value] = 1;
    }
    int intersectionSize = 0;
    vector<int>::const_iterator set2_end = set2.end();
    for ( vector<int>::const_iterator iterator = set2.begin(); iterator != set2_end; ++iterator )
    {
        int value = *iterator;
        unordered_map<int,int>::iterator foundValue = theMap.find(value);
        if ( foundValue != theMap.end() )
        {
            theMap[value] = 2;
            intersectionSize++;
        }
    }
    return intersectionSize;
}
int runSetIntersection(const vector<int> set1_unsorted, const vector<int> set2_unsorted)
{   
    // Create two vectors
    std::vector<int> set1(set1_unsorted.size());
    std::vector<int> set2(set2_unsorted.size());
    // Copy the unsorted data into them
    std::copy(set1_unsorted.begin(), set1_unsorted.end(), set1.begin());
    std::copy(set2_unsorted.begin(), set2_unsorted.end(), set2.begin());
    // Sort the data
    sort(set1.begin(),set1.end());
    sort(set2.begin(),set2.end());
    vector<int> intersection;
    intersection.reserve(1000);
    set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), inserter(intersection, intersection.end()));
    return intersection.size(); 
}
void createSets( vector<int>& set1, vector<int>& set2 )
{
    srand ( time(NULL) );
    set1.reserve(100000);
    set2.reserve(1000);
    // Create 100,000 values for set1
    for ( int i = 0; i < 100000; i++ )
    {
        int value = 1000000000 + i;
        set1.push_back(value);
    }
    // Try to get half of our values intersecting
    float ratio = 200000.0f / RAND_MAX;
    // Create 1,000 values for set2
    for ( int i = 0; i < 1000; i++ )
    {
        int random = rand() * ratio + 1;
        int value = 1000000000 + random;
        set2.push_back(value);
    }
    // Make sure set1 is in random order (not sorted)
    random_shuffle(set1.begin(),set1.end());
}
int _tmain(int argc, _TCHAR* argv[])
{
    int intersectionSize = 0;
    vector<int> set1, set2; 
    createSets( set1, set2 );
    Timer timer;
    for ( int i = 0; i < 1000; i++ )
    {
        intersectionSize = runIntersectionTest(set1, set2);
    }
    timer.Stop();
    cout << "Found the intersection of " << intersectionSize << " values (using unordered_map) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;
    timer.Reset();
    for ( int i = 0; i < 1000; i++ )
    {
        intersectionSize = runSetIntersection(set1,set2);
    }
    timer.Stop();
    cout << "Found the intersection of " << intersectionSize << " values (using set_intersection) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;
    getchar();
    return 0;
}发布于 2009-06-29 21:55:25
你的考试有几个问题。
首先,您不是在测试set交集,而是“创建两个数组,用随机数填充它们,然后执行set交集”。您应该只对您感兴趣的部分代码进行计时。即使你想要做这些事情,他们也不应该在这里做基准。一次测量一件事,以减少不确定性。如果您希望您的C++实现执行得更好,那么您首先需要知道它的哪一部分比预期的要慢。这意味着你必须把设置代码和交集测试分开。
其次,您应该多次运行测试,以考虑到可能的缓存效果和其他不确定性。(并且可能输出一次总运行时间,例如1000次运行,而不是每一次单独的时间。这样,就可以减少定时器带来的不确定性,定时器可能有有限的分辨率,并且在0-20ms范围内使用时会报告不准确的结果。
此外,就我从文档中读取的数据而言,应该对set_intersection的输入进行排序,而set2不会排序。A似乎没有理由使用unordered_map,而unordered_set将比您所做的要好得多。
关于所需的设置代码,请注意,您可能不需要填充向量才能运行交集。您自己的实现和set_intersection都已经在迭代器上工作了,所以您可以简单地将它们传递给输入已经存在的数据结构。
关于您的代码的一些更具体的注释:
++iterator而不是iterator++unordered_set (非unordered_map)的实验编辑:
我还没有试过您的C#版本,所以我无法正确地比较数字,但这是我修改过的测试。每台运行1000次,运行在具有4GB RAM的Core2Quad2.5GHz上:
std::set_intersection on std::set: 2606ms
std::set_intersection on tr1::unordered_set: 1014ms
std::set_intersection on sorted vectors: 171ms
std::set_intersection on unsorted vectors: 10140ms最后一个问题有点不公平,因为它必须同时复制和排序向量。理想情况下,只有这类应该是基准的一部分。我尝试创建一个使用1000个未排序向量的数组的版本(这样我就不必在每次迭代中复制未排序的数据了),但是性能差不多,或者更糟一些,因为这会导致不断的缓存丢失,所以我回到了这个版本。
我的密码是:
#define _SECURE_SCL 0
#include <ctime>
#include <vector>
#include <set>
#include <iostream>
#include <algorithm>
#include <unordered_set>
#include <windows.h>
template <typename T, typename OutIter>
void stl_intersect(const T& set1, const T& set2, OutIter out){
    std::set_intersection(set1.begin(), set1.end(), set2.begin(), set2.end(), out);
}
template <typename T, typename OutIter>
void sort_stl_intersect(T& set1, T& set2, OutIter out){
    std::sort(set1.begin(), set1.end());
    std::sort(set2.begin(), set2.end());
    std::set_intersection(set1.begin(), set1.end(), set2.begin(), set2.end(), out);
}
template <typename T>
void init_sorted_vec(T first, T last){
    for ( T cur = first; cur != last; ++cur)
    {
        int i = cur - first;
        int value = 1000000000 + i;
        *cur = value;
    }
}
template <typename T>
void init_unsorted_vec(T first, T last){
    for ( T cur = first; cur != last; ++cur)
    {
        int i = rand() % 200000 + 1;
        i *= 10;
        int value = 1000000000 + i;
        *cur = value;
    }
}
struct resize_and_shuffle {
    resize_and_shuffle(int size) : size(size) {}
    void operator()(std::vector<int>& vec){
        vec.resize(size);
    }
    int size;
};
int main()
{
    srand ( time(NULL) );
    std::vector<int> out(100000);
    std::vector<int> sortedvec1(100000);
    std::vector<int> sortedvec2(1000);
    init_sorted_vec(sortedvec1.begin(), sortedvec1.end());
    init_unsorted_vec(sortedvec2.begin(), sortedvec2.end());
    std::sort(sortedvec2.begin(), sortedvec2.end());
    std::vector<int> unsortedvec1(sortedvec1.begin(), sortedvec1.end());
    std::vector<int> unsortedvec2(sortedvec2.begin(), sortedvec2.end());
    std::random_shuffle(unsortedvec1.begin(), unsortedvec1.end());
    std::random_shuffle(unsortedvec2.begin(), unsortedvec2.end());
    std::vector<int> vecs1[1000];
    std::vector<int> vecs2[1000];
    std::fill(vecs1, vecs1 + 1000, unsortedvec1);
    std::fill(vecs2, vecs2 + 1000, unsortedvec2);
    std::set<int> set1(sortedvec1.begin(), sortedvec1.end());
    std::set<int> set2(sortedvec2.begin(), sortedvec2.end());
    std::tr1::unordered_set<int> uset1(sortedvec1.begin(), sortedvec1.end());
    std::tr1::unordered_set<int> uset2(sortedvec2.begin(), sortedvec2.end());
    DWORD start, stop;
    DWORD delta[4];
    start = GetTickCount();
    for (int i = 0; i < 1000; ++i){
        stl_intersect(set1, set2, out.begin());
    }
    stop = GetTickCount();
    delta[0] = stop - start;
    start = GetTickCount();
    for (int i = 0; i < 1000; ++i){
        stl_intersect(uset1, uset2, out.begin());
    }
    stop = GetTickCount();
    delta[1] = stop - start;
    start = GetTickCount();
    for (int i = 0; i < 1000; ++i){
        stl_intersect(sortedvec1, sortedvec2, out.begin());
    }
    stop = GetTickCount();
    delta[2] = stop - start;
    start = GetTickCount();
    for (int i = 0; i < 1000; ++i){
        sort_stl_intersect(vecs1[i], vecs1[i], out.begin());
    }
    stop = GetTickCount();
    delta[3] = stop - start;
    std::cout << "std::set_intersection on std::set: " << delta[0] << "ms\n";
    std::cout << "std::set_intersection on tr1::unordered_set: " << delta[1] << "ms\n";
    std::cout << "std::set_intersection on sorted vectors: " << delta[2] << "ms\n";
    std::cout << "std::set_intersection on unsorted vectors: " << delta[3] << "ms\n";
    return 0;
}没有理由让C++总是比C#更快。C#有几个关键优势,需要在C++中与之竞争。我能想到的主要问题是,在.NET领域中,动态分配是非常便宜的。每次C++向量、set或unordered_set (或任何其他容器)必须调整大小或展开时,都是非常昂贵的malloc操作。在.NET中,堆分配只不过是向指针添加偏移量。
因此,如果您想要C++版本竞争,您可能需要解决这个问题,允许您的容器调整大小而不必执行实际的堆分配,可能是通过为容器使用自定义分配程序(可能是boost::池可能是一个不错的选择,或者您可以尝试滚动您自己的)
另一个问题是,set_difference只在排序输入上工作,为了再现涉及排序的测试结果,我们必须在每次迭代中重新复制未排序的数据,这是非常昂贵的(尽管使用自定义分配器会有很大帮助)。我不知道您的输入采用什么形式,但是您可以直接对输入进行排序,而不需要复制它,然后直接对其运行set_difference。(如果输入是一个数组或至少是一个STL容器,那么很容易做到这一点。)
STL的一个关键优点是它非常灵活,可以在几乎任何输入序列上工作。在C#中,您几乎必须将输入复制到列表或字典或其他东西中,但是在C++中,您可能可以在原始输入上运行std::sort和set_intersection。
最后,当然,尝试在分析器中运行代码,并确切地查看时间是在哪里度过的。您也可以尝试通过GCC来运行代码。这是我的印象,STL在MSVC中的表现有时有点古怪。可能值得在另一个编译器下进行测试,看看是否有类似的时间安排。
最后,您可能会发现这些与C++ vs C#:http://blogs.msdn.com/ricom/archive/2005/05/10/416151.aspx的性能相关的博客文章。
本质上,这些方法的士气是肯定的,您可以在C++中获得更好的性能,但这是一项令人惊讶的工作。
发布于 2009-06-29 22:49:00
我现在看到的一个问题是,您通过值传递C++中的集合,而不是通过const引用传递集合。所以你每次经过他们的时候都在模仿他们!
另外,我也不会使用一个集合作为set_intersection的目标。我会用这样的方法
int runSetIntersection(const set<int>& set1, const set<int>& set2)
{   
    vector<int> intersection;
    intersection.reserve(10000) // or whatever the max is
    set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), back_inserter(intersection));
    return intersection.size(); 
}然而,这段代码仍然在函数内部分配。会更快
int runSetIntersection(const set<int>& set1, const set<int>& set2, vector<int>& scratch)
{   
    scratch.reserve(10000) // or whatever the max is
    set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), back_inserter(scratch));
    return scratch.size(); 
}然后在启动计时器之前分配划痕。
不过,如果您只是在寻找大小,那么手工编写的for循环,再加上set::find,可能会带来更好的结果。
发布于 2009-06-29 21:32:30
用这个..。
vector<int> set1(10000);
vector<int> set2(1000);..。得到非零初始大小的向量。然后不要使用push_back,而是直接更新值。
https://stackoverflow.com/questions/1060648
复制相似问题