布卢姆-过滤器是一种数据结构,我们可以在其中插入元素,并检查它是否已经包含给定的元素。其特点是,如果contains查询返回true,那么它可能仍然是可能的,事实上,这个元素没有插入到过滤器中。(另一方面,如果它返回false,那么元素肯定不是以前插入的。)
该实现由长度n的位向量(最初的所有位都是0)和k哈希函数组成,后者将任何输入值映射到([0...n),即0包含,n独占)的范围内。在添加元素时,我们计算所有n哈希函数的映射值,并在位向量中将相应的位设置为1。类似地,在查询是否添加了一个元素时,我们计算所有哈希函数的值并返回true,如果所有对应位都是true,而false不是这样(即,如果至少一个函数的对应位为零)。
虽然任何评论/建议都是受欢迎的,但我最感兴趣的是以下几个方面:
bloom-contains不相等的地方,那么在reduce中是否有一种优雅的方法可以跳出1?)用于测试的散列函数的质量超出了本审查的范围。(我知道有更好的方法,但目前我只关注数据结构本身。)但是,如果你知道如何更好地组织它们,避免重复(但不让它成为全球性的!),那将是非常感谢的。
core.clj
(ns bloom-filter.core)
(defn bloom-create [numbits hash-functions]
(if (or (nil? numbits) (nil? hash-functions)) nil
{:bits (vec (repeat numbits 0)) :hash-functions hash-functions}))
(defn bloom-add [bloom-filter value]
(when-not (nil? bloom-filter)
(let [hash-functions (:hash-functions bloom-filter)
bits (:bits bloom-filter)
new-bits (reduce (fn [actual-bits hash-function] (assoc actual-bits (hash-function value) 1)) bits hash-functions)]
(assoc-in bloom-filter [:bits] new-bits))))
(defn bloom-contains [bloom-filter value]
(let [hash-functions (:hash-functions bloom-filter)
bits (:bits bloom-filter)]
(reduce (fn [actual-value hash-function] (and actual-value (= 1 (bits (hash-function value))))) true hash-functions)))core-test.clj
(ns bloom-filter.core-test
(:require [clojure.test :refer :all]
[bloom-filter.core :refer :all]))
(defn mod7-fun [num] (mod num 7))
(defn always-zero-fun [dontcare] 0)
(deftest create-test
(testing "create bloom filter"
(is (= nil (bloom-create nil nil)))
(is (= nil (bloom-create 0 nil)))
(is (= nil (bloom-create nil [])))
(is (= {:bits [] :hash-functions []} (bloom-create 0 [])))
(is (= {:bits [0] :hash-functions []} (bloom-create 1 [])))
(is (= {:bits [] :hash-functions [always-zero-fun]} (bloom-create 0 [always-zero-fun])))
))
(deftest add-test
(let [empty-filter (bloom-create 7 [])
single-fun-filter (bloom-create 7 [mod7-fun])
two-fun-filter (bloom-create 7 [mod7-fun always-zero-fun])]
(testing "add to bloom filter"
(is (= nil (bloom-add nil 3)))
(is (= empty-filter (bloom-add empty-filter nil)))
(is (= empty-filter (bloom-add empty-filter 10)))
(is (= {:bits [0 0 0 1 0 0 0] :hash-functions [mod7-fun]}
(bloom-add single-fun-filter 3)))
(is (= {:bits [1 0 0 1 0 0 0] :hash-functions [mod7-fun always-zero-fun]}
(bloom-add two-fun-filter 3)))
)))
(deftest contains-test
(let [empty-filter (bloom-create 7 [])
simple-filter (bloom-create 7 [mod7-fun])
filter-with-element (bloom-add simple-filter 3)]
(testing "bloom filter contains"
(is (true? (bloom-contains empty-filter 0)))
(is (false? (bloom-contains simple-filter 0)))
(is (true? (bloom-contains filter-with-element 3)))
(is (true? (bloom-contains filter-with-element 10)))
)))发布于 2018-02-09 09:13:40
首先,这是一个很好的问题和问题陈述。我的评论主要包括代码风格的改进,购买,也许你仍然会发现它有帮助。
bloom-create函数(if ... nil ...),您可以使用(when-not ... ...)。hash-functions是函数的集合,因此您应该编写(empty? hash-functions)而不是(nil? hash-functions)。请注意,这会改变程序的语义。assert表达式来编写简单的参数验证。比如:(assert (every? ifn? hash-functions))和(assert (number? numbits))。(vector-of :boolean ...)来获得更好的性能,而不是零向量。#(mod % numbits)来实现这一点。bloom-add函数(when-not (nil? bloom-filter) ...) you can write (when bloom-filter ...)bloom-filter参数的内容。(assoc ... :bits ...)而不是(assoc-in ... [:bits] ...)。reduce使思考速度更快。不幸的是,它们目前没有与vector-of一起工作:bloom-contains函数empty-filter、single-fun-filter和two-fun-filter。add-test中,您可能只检查结果的:bits部分,从而缩短测试用例。(defn bloom-create [numbits hash-functions]
(when-not (or (nil? numbits) (empty? hash-functions))
(assert (number? numbits))
(assert (every? ifn? hash-functions))
{:bits (apply vector-of :boolean (repeat numbits false))
:hash-functions (mapv (partial comp #(mod % numbits)) hash-functions)}))
(defn bloom-add [{:keys [hash-functions bits] :as bloom-filter} value]
(when-not (nil? bloom-filter)
(->> hash-functions
(reduce (fn [actual-bits hash-function]
(assoc actual-bits (hash-function value) true)) bits)
(assoc bloom-filter :bits))))
(defn bloom-contains? [{:keys [hash-functions bits]} value]
(some (map #(false? (bits (% value))))))
(require '[clojure.test :refer :all])
(defn mod7-fun [num] (mod num 7))
(defn always-zero-fun [dontcare] 0)
(def single-fun-filter (bloom-create 7 [mod7-fun]))
(def two-fun-filter (bloom-create 7 [mod7-fun always-zero-fun]))
(deftest add-test
(testing "add to bloom filter"
(is (= nil (bloom-add nil 3)))
(is (= [false false false true false false false]
(:bits (bloom-add single-fun-filter 3))))
(is (= [true false false true false false false]
(:bits (bloom-add two-fun-filter 3))))))https://codereview.stackexchange.com/questions/187050
复制相似问题