我正在编写一个遗传算法来生成字符串"helloworld“。但当n为10,000或更大时,演进函数会产生堆栈溢出。
module Genetics where
import Data.List (sortBy)
import Random (randomRIO)
import Control.Monad (foldM)
class Gene g where
-- How ideal is the gene from 0.0 to 1.0?
fitness :: g -> Float
-- How does a gene mutate?
mutate :: g -> IO g
-- How many species will be explored?
species :: [g] -> Int
orderFitness :: (Gene g) => [g] -> [g]
orderFitness = reverse . sortBy (\a b -> compare (fitness a) (fitness b))
compete :: (Gene g) => [g] -> IO [g]
compete pool = do
let s = species pool
variants <- (mapM (mapM mutate) . map (replicate s)) pool
let pool' = (map head . map orderFitness) variants
return pool'
evolve :: (Gene g) => Int -> [g] -> IO [g]
evolve 0 pool = return pool
evolve n pool = do
pool' <- compete pool
evolve (n - 1) pool'有了species pool = 8,8个基因库可以复制到8个组中。每一组都发生变异,每组中最适合的被选择进行进一步的进化(返回到8个基因)。
发布于 2011-05-11 07:36:58
多亏了唐的deepseq建议,我能够将问题缩小到mapM mutate,它造成了太多的thunks。新版本有mutate',它使用seq来防止thunking。
module Genetics where
import Data.List (maximumBy)
import Random (randomRIO)
class Gene g where
-- How ideal is the gene from 0.0 to 1.0?
fitness :: g -> Float
-- How does a gene mutate?
mutate :: g -> IO g
-- How many species will be explored in each round?
species :: [g] -> Int
best :: (Gene g) => [g] -> g
best = maximumBy (\a b -> compare (fitness a) (fitness b))
-- Prevents stack overflow
mutate' :: (Gene g) => g -> IO g
mutate' gene = do
gene' <- mutate gene
gene' `seq` return gene'
drift :: (Gene g) => [[g]] -> IO [[g]]
drift = mapM (mapM mutate')
compete :: (Gene g) => [g] -> IO [g]
compete pool = do
let islands = map (replicate (species pool)) pool
islands' <- drift islands
let representatives = map best islands'
return representatives
evolve :: (Gene g) => Int -> [g] -> IO [g]
evolve 0 pool = return pool
evolve n pool = compete pool >>= evolve (n - 1)发布于 2011-05-11 06:39:12
如果您对性能感兴趣,我将使用快速随机数生成器,例如:
其次,compete看起来非常可疑,因为它完全是懒惰的,尽管它构建了一些潜在的大型结构。试着用deepseq锤子把它重写得更严格一点:
import Control.DeepSeq
compete :: (Gene g, NFData g) => [g] -> IO [g]
compete pool = do
let s = species pool
variants <- (mapM (mapM mutate) . map (replicate s)) pool
let pool' = (map head . map orderFitness) variants
pool' `deepseq` return pool'不过,这些东西都不需要放在IO中(单独发布)。类似于Rand monad的东西可能是more appropriate。
发布于 2011-05-11 07:08:45
在orderFitness是sortBy的情况下,您可以使用maximumBy和单个map,而不是使用(map head . map orderFitness)。这不会节省太多(因为您将从O(n log n)变为O(n),并且可能会从消除双映射中获得另一个因子2),但至少在某种程度上更简单和更有效。您还可以摆脱对反向的调用。
我怀疑这在没有deepseq的情况下解决了这个问题,但这应该是一个改进。
编辑:如果标准库和GHC是完美的,那么head . sortBy将产生与maximumBy相同的代码,而map head . map sortBy将产生与map (head . sortBy)相同的代码。遗憾的是,这两件事在实践中都不太可能是真的。sortBy会倾向于做一堆额外的内存分配,因为它是一种分而治之的算法。组合地图是您有时会得到的优化,但不应指望。
更重要的是,使用maximumBy更具声明性。更容易看到代码做了什么以及需要多长时间。在优化中利用它也应该更容易,因为我们知道目标是什么,而不仅仅是我们如何实现它。
https://stackoverflow.com/questions/5956554
复制相似问题