文章/答案/技术大牛

发布

问分析Haskell程序
EN

Stack Overflow用户

提问于 2012-07-21 17:42:48

回答 4查看 3.5K关注 0票数 24

我有一段代码，它反复使用sequence从概率分布中抽取样本。从道德上讲，它会做这样的事情：

sampleMean :: MonadRandom m => Int -> m Float -> m Float
sampleMean n dist = do
  xs <- sequence (replicate n dist)
  return (sum xs)

只是有点复杂。我感兴趣的实际代码是likelihoodWeighting at 这个吉特布回购的函数。

我注意到n的运行时间尺度是非线性的。特别是，一旦n超过某个值，它就会达到内存限制，并且运行时间会急剧增加。我不确定，但我认为这是因为sequence正在构建一长串直到调用sum才能得到评估的块列表。

一旦我通过了大约100000个样本，程序就会慢到爬行。我想优化它(我的感觉是，1000万个样本不应该是一个问题)，所以我决定分析它--但是我在理解分析器的输出时遇到了一些小问题。

特征分析

我在一个文件main.hs中创建了一个简短的可执行文件，该文件使用10万个示例运行我的函数。这是‘s的输出

$ ghc -O2 -rtsopts main.hs
$ ./main +RTS -s

首先，我注意到--它分配了将近1.5GB的堆，并将60%的时间用于垃圾收集。这是否一般地表明了太多的懒惰？

 1,377,538,232 bytes allocated in the heap
 1,195,050,032 bytes copied during GC
   169,411,368 bytes maximum residency (12 sample(s))
     7,360,232 bytes maximum slop
           423 MB total memory in use (0 MB lost due to fragmentation)

Generation 0:  2574 collections,     0 parallel,  2.40s,  2.43s elapsed
Generation 1:    12 collections,     0 parallel,  1.07s,  1.28s elapsed

INIT  time    0.00s  (  0.00s elapsed)
MUT   time    1.92s  (  1.94s elapsed)
GC    time    3.47s  (  3.70s elapsed)
RP    time    0.00s  (  0.00s elapsed)
PROF  time    0.23s  (  0.23s elapsed)
EXIT  time    0.00s  (  0.00s elapsed)
Total time    5.63s  (  5.87s elapsed)

%GC time      61.8%  (63.1% elapsed)

Alloc rate    716,368,278 bytes per MUT second

Productivity  34.2% of total user, 32.7% of total elapsed

以下是来自

$ ./main +RTS -p

当我第一次运行这个程序时，结果发现有一个函数被反复调用，结果我可以回忆录它，它使事情的速度提高了2倍。然而，它并没有解决空间泄漏问题。

COST CENTRE           MODULE                no. entries  %time %alloc   %time %alloc

MAIN                  MAIN                    1        0   0.0    0.0   100.0  100.0
 main                 Main                  434        4   0.0    0.0   100.0  100.0
  likelihoodWeighting AI.Probability.Bayes  445        1   0.0    0.3   100.0  100.0
   distributionLW     AI.Probability.Bayes  448        1   0.0    2.6     0.0    2.6
   getSampleLW        AI.Probability.Bayes  446   100000  20.0   50.4   100.0   97.1
    bnProb            AI.Probability.Bayes  458   400000   0.0    0.0     0.0    0.0
    bnCond            AI.Probability.Bayes  457   400000   6.7    0.8     6.7    0.8
    bnVals            AI.Probability.Bayes  455   400000  20.0    6.3    26.7    7.1
     bnParents        AI.Probability.Bayes  456   400000   6.7    0.8     6.7    0.8
    bnSubRef          AI.Probability.Bayes  454   800000  13.3   13.5    13.3   13.5
    weightedSample    AI.Probability.Bayes  447   100000  26.7   23.9    33.3   25.3
     bnProb           AI.Probability.Bayes  453   100000   0.0    0.0     0.0    0.0
     bnCond           AI.Probability.Bayes  452   100000   0.0    0.2     0.0    0.2
     bnVals           AI.Probability.Bayes  450   100000   0.0    0.3     6.7    0.5
      bnParents       AI.Probability.Bayes  451   100000   6.7    0.2     6.7    0.2
     bnSubRef         AI.Probability.Bayes  449   200000   0.0    0.7     0.0    0.7

这是堆的配置文件。我不知道为什么它声称运行时是1.8秒-这个运行大约需要6秒。

有人能帮助我解释分析器的输出--即找出瓶颈所在，并为如何加快速度提供建议吗？

performance

haskell

strictness

回答 4

Stack Overflow用户

回答已采纳

发布于 2012-07-22 15:08:27

通过将约翰的建议的foldM应用于likelihoodWeighting，已经取得了巨大的改进。这使得这里的内存使用量减少了10倍，并使GC时间大大减少到几乎或实际上可以忽略不计。

使用当前源运行的分析结果

probabilityIO              AI.Util.Util          26.1   42.4    413 290400000
weightedSample.go          AI.Probability.Bayes  16.1   19.1    255 131200080
bnParents                  AI.Probability.Bayes  10.8    1.2    171   8000384
bnVals                     AI.Probability.Bayes  10.4    7.8    164  53603072
bnCond                     AI.Probability.Bayes   7.9    1.2    125   8000384
ndSubRef                   AI.Util.Array          4.8    9.2     76  63204112
bnSubRef                   AI.Probability.Bayes   4.7    8.1     75  55203072
likelihoodWeighting.func   AI.Probability.Bayes   3.3    2.8     53  19195128
%!                         AI.Util.Util           3.3    0.5     53   3200000
bnProb                     AI.Probability.Bayes   2.5    0.0     40        16
bnProb.p                   AI.Probability.Bayes   2.5    3.5     40  24001152
likelihoodWeighting        AI.Probability.Bayes   2.5    2.9     39  20000264
likelihoodWeighting.func.x AI.Probability.Bayes   2.3    0.2     37   1600000

和-s报告的13 5MB内存使用量，~5MB最大驻留时间。还不算太糟。

尽管如此，仍有一些问题我们可以改进。首先，一个相对较小的问题，在大计划中，AI.UTIl.Array.ndSubRef

ndSubRef :: [Int] -> Int
ndSubRef ns = sum $ zipWith (*) (reverse ns) (map (2^) [0..])

倒转列表，并将(2^)映射到另一个列表上是效率低下的，更好的是

ndSubRef = L.foldl' (\a d -> 2*a + d) 0

它不需要将整个列表保存在内存中(可能不是什么大问题，因为列表将是短的)，因为它可以反转，并且不需要分配第二个列表。分配的减少是明显的，大约10%，而且这部分运行速度要快得多，

ndSubRef                   AI.Util.Array          1.7    1.3     24   8000384

在修改后的运行概要中，但是由于它只需要整个时间的一小部分，所以总体影响很小。在weightedSample和likelihoodWeighting中，有可能有更大的鱼可供食用。

让我们在weightedSample中添加一些严格性，看看它是如何改变事物的：

weightedSample :: Ord e => BayesNet e -> [(e,Bool)] -> IO (Map e Bool, Prob)
weightedSample bn fixed =
    go 1.0 (M.fromList fixed) (bnVars bn)
    where
        go w assignment []     = return (assignment, w)
        go w assignment (v:vs) = if v `elem` vars
            then
                let w' = w * bnProb bn assignment (v, fixed %! v)
                in go w' assignment vs
            else do
                let p = bnProb bn assignment (v,True)
                x <- probabilityIO p
                go w (M.insert v x assignment) vs

        vars = map fst fixed

go的权参数不受强迫，赋值参数也不强制，因此它们可以建立块。让我们启用{-# LANGUAGE BangPatterns #-}并强制更新立即生效，还可以在将p传递给probabilityIO之前对其进行评估

go w assignment (v:vs) = if v `elem` vars
    then
        let !w' = w * bnProb bn assignment (v, fixed %! v)
        in go w' assignment vs
    else do
        let !p = bnProb bn assignment (v,True)
        x <- probabilityIO p
        let !assignment' = M.insert v x assignment
        go w assignment' vs

这进一步减少了分配(~9%)和小加速比(~%13%)，但总内存使用量和最大驻留时间变化不大。

我没有看到任何其他明显的改变，所以让我们看看likelihoodWeighting

func m _ = do
    (a, w) <- weightedSample bn fixed
    let x = a ! e
    return $! x `seq` w `seq` M.adjust (+w) x m

在最后一行中，首先，w现在已经在weightedSample中进行了计算，因此我们不需要在这里对其进行seq，需要键x来计算更新的地图，因此也不需要seqing。这条线上的坏消息是M.adjust。adjust无法强制更新函数的结果，因此可以在映射的值中构建块。您可以通过查找修改后的值强制对块进行计算，但是Data.Map在这里提供了一种更方便的方法，因为地图更新的关键是insertWith'。

func !m _ = do
    (a, w) <- weightedSample bn fixed
    let x = a ! e
    return (M.insertWith' (+) x w m)

(注: GHC在m上使用bang模式比在这里使用return $! ...优化得更好)。这会略微减少总分配，并且不会显著改变运行时间，但会对所使用的总内存和最大驻留时间产生很大影响：

 934,566,488 bytes allocated in the heap
   1,441,744 bytes copied during GC
      68,112 bytes maximum residency (1 sample(s))
      23,272 bytes maximum slop
           1 MB total memory in use (0 MB lost due to fragmentation)

运行时间上最大的改进将是避免使用randomIO，使用的StdGen非常慢。

我很惊讶bn*函数花费了多少时间，但是没有看到任何明显的低效率。

票数 14

Stack Overflow用户

发布于 2012-07-22 18:42:11

我很难消化这些档案，但我以前被踢屁股，因为MonadRandom的黑客是严格的。创建一个懒惰的MonadRandom版本使我的内存问题消失了。

我的同事还没有得到发布代码的许可，但我已经将Control.Monad.LazyRandom放到了巴斯丁上。或者，如果您想要看到解释完全懒惰的随机搜索(包括无限随机计算列表)的摘录，请查看http://www.cs.tufts.edu/~nr/pubs/mrfy-abstract.html。

票数 7

Stack Overflow用户

发布于 2012-07-22 00:12:19

我收集了一个非常基本的例子，张贴在这里：http://hpaste.org/71919。我不确定这是否像你的例子..。只是一个很小的东西，似乎起作用了。

使用-prof和-fprof-auto编译并运行100000次迭代，将得到以下分析输出(请原谅我的行号)：

  8 COST CENTRE                   MODULE               %time %alloc
  9 
 10 sample                        AI.Util.ProbDist      31.5   36.6
 11 bnParents                     AI.Probability.Bayes  23.2    0.0
 12 bnRank                        AI.Probability.Bayes  10.7   23.7
 13 weightedSample.go             AI.Probability.Bayes   9.6   13.4
 14 bnVars                        AI.Probability.Bayes   8.6   16.2
 15 likelihoodWeighting           AI.Probability.Bayes   3.8    4.2
 16 likelihoodWeighting.getSample AI.Probability.Bayes   2.1    0.7
 17 sample.cumulative             AI.Util.ProbDist       1.7    2.1
 18 bnCond                        AI.Probability.Bayes   1.6    0.0
 19 bnRank.ps                     AI.Probability.Bayes   1.1    0.0

以下是简要统计数字：

1,433,944,752 bytes allocated in the heap
 1,016,435,800 bytes copied during GC
   176,719,648 bytes maximum residency (11 sample(s))
     1,900,232 bytes maximum slop
           400 MB total memory in use (0 MB lost due to fragmentation)

INIT    time    0.00s  (  0.00s elapsed)
MUT     time    1.40s  (  1.41s elapsed)
GC      time    1.08s  (  1.24s elapsed)
Total   time    2.47s  (  2.65s elapsed)

%GC     time      43.6%  (46.8% elapsed)

Alloc rate    1,026,674,336 bytes per MUT second

Productivity  56.4% of total user, 52.6% of total elapsed

注意，分析器将手指指向sample。我通过使用return强制使用$!函数，下面是一些总结统计信息：

1,776,908,816 bytes allocated in the heap
  165,232,656 bytes copied during GC
   34,963,136 bytes maximum residency (7 sample(s))
      483,192 bytes maximum slop
           68 MB total memory in use (0 MB lost due to fragmentation)

INIT    time    0.00s  (  0.00s elapsed)
MUT     time    2.42s  (  2.44s elapsed)
GC      time    0.21s  (  0.23s elapsed)
Total   time    2.63s  (  2.68s elapsed)

%GC     time       7.9%  (8.8% elapsed)

Alloc rate    733,248,745 bytes per MUT second

Productivity  92.1% of total user, 90.4% of total elapsed

在GC方面，效率要高得多，但在时间上没有太大变化。您可能会继续以这种配置文件/调整的方式迭代，以锁定瓶颈，并获得更好的性能。

票数 4

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/11594481

复制

相似问题

问分析Haskell程序
EN

回答 4

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问分析Haskell程序EN

回答 4

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问分析Haskell程序
EN