文章/答案/技术大牛

发布

社区首页 >问答首页 >Python __hash__：身份与等价

问Python hash：身份与等价
EN

Stack Overflow用户

提问于 2019-10-11 17:52:46

回答 2查看 1K关注 0票数 3

我有一个类，它的实例将通过与它们所携带的数据值不同的标识来区分。在我的代码中，我打算使用==来表示两个实例对于它们的数据是等价的，而is则表示两个变量引用同一个实例，即它们是相同的。据我理解，这一切都是很正常的。

此外，我希望在sets中使用实例(等效或不等效)，在dicts中使用作为键，这需要为类定义__hash__函数。

但在这方面，我不理解__hash__文档中的要求

唯一需要的属性是比较相等的对象具有相同的哈希值。

这是否意味着两个不同但等效的对象不能作为dict中的不同键使用，或者单独出现在set中？在下面的代码中，我通过重写__eq__和__hash__来反映我的预期用途，从而打破了这一要求。它的工作方式与Python2.7和3.7中预期的一样。

像我在这里所做的那样，违背__hash__的要求会带来什么负面后果？

有更好的方法来完成我的目标吗？

class A( object ):
        def __init__( self, v1, v2 ):
                self.v = ( v1, v2 )

        def __eq__( self, b ):
                return self.v[0] == b.v[0] and self.v[1] == b.v[1]

        def __hash__( self ):
                return id( self )

        def __str__( self ):
                return str( self.v )

p = A( 1, 0 )
q = A( 1, 0 )

print( str( p ), str( q ) )
print( "identical?", p is q )
print( "equivalent?", p == q )
print( "hashes", hash(p), hash(q) )

s = set( [p, q] )
print( "set length", len( s ) )
print( "both in set?", p in s, q in s )

d = { p:3, q:4 }
print( "dict length", len( d ) )
print( "as distinct keys", d[p], d[q] )

python

dictionary

hash

identity

equivalence

回答 2

Stack Overflow用户

发布于 2019-10-11 18:19:06

唯一需要的属性是比较相等的对象具有相同的哈希值。

规范文本中的“比较相等”意味着它们的__eq__方法的结果--不要求它们是同一个对象。

认为，__hash__必须基于__eq__中使用的值，而不是对象的"id“--这在代码中是不正确的。要使它发挥作用，就必须这样：

只要做：

      ...
      def __eq__( self, b ):
           return self.v[0] == b.v[0] and self.v[1] == b.v[1]

      def __hash__( self ):
           return hash((self.v[0], self.v[1]))

，这是否意味着两个不同但等效的对象不能作为dict中的不同键使用，或者单独出现在一个集合中？

是。这就是规范的意思。

解决方法是让类保留默认的__eq__实现来遵守规则，并实现代码中必须使用的另一种比较形式。

最简单的方法就是保留__eq__的默认实现，它通过标识进行比较，并有一个用于比较的任意方法(不支持重写操作符的语言中的代码无论如何都必须使用)：

class A( object ):
    ...
    def equals( self, b ):
       return self.v[0] == b.v[0] and self.v[1] == b.v[1]

p = A( 1, 0 )
q = A( 1, 0 )

print( str( p ), str( q ) )
print( "identical?", p is q )
print( "equivalent?", p.equals(q) )

在这方面有一些改进的方法--但最基本的方法是：__eq__必须遵守规则，并进行身份比较。

一种方法是拥有一个可以用作比较的“命名空间”的内部关联对象：

class CompareSpace:
    def __init__(self, parent):
        self.parent = parent

        def __eq__( self, other ):
            other = other.parent if isinstance(other, type(self)) else other 
            return self.parent.v[0] == other.v[0] and other.v[1] == b.parent.v[1]


    class A:
        def __init__( self, v1, v2 ):
            self.v = ( v1, v2 )
            self.comp = CompareSpace(self)

        def __str__( self ):
            return str( self.v )

p = A( 1, 0 )
q = A( 1, 0 )

print( str( p ), str( q ) )
print( "identical?", p is q )
print( "equivalent?", p.comp == q )
print( "hashes", hash(p), hash(q) )

破碎的示范

现在，我将插入一个例子，说明这个问题是如何中断的--我正在创建一个更加仔细的类，以确保这个问题在第一次尝试时就会发生。但是，如果问题每发生一次，即使每200万次发生一次，您的代码仍然会被破坏，无法在任何真实的情况下使用，即使是个人代码:您将拥有一本不确定的字典：

class Broken:
    def __init__(self, name):
        self.name = name
    def __hash__(self):
        return id(self) % 256
    def __eq__(self, other):
        return True
    def __repr__(self):
        return self.name


In [23]: objs = [Broken(f"{i:02d}") for i in range(64)]                                        

In [24]: print(objs)                                                                           
[00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]

In [25]: test = {}                                                                             

In [26]: for obj in objs: 
    ...:     if obj not in test: 
    ...:         test[obj] = 0 
    ...:                                                                                       

In [27]: print(test)                                                                           
{00: 0, 01: 0, 02: 0, 11: 0}

# Or with simple, unconditional, insertion:
In [29]: test = {obj: i for i, obj in enumerate(objs)}                                         

In [30]: test                                                                                  
Out[30]: {00: 57, 01: 62, 02: 63, 11: 60}

(我重复一遍，虽然您的值本身不会发生冲突，但内部dict代码必须将散列中的数目减少到哈希表中的索引中-不一定是按模块(%) -否则每个空dict都需要2 ** 64个空条目，而且只有当所有散列只有64位宽的情况下)

票数 1

Stack Overflow用户

发布于 2019-10-20 09:00:44

我做了更多的测试。其结果是，尽管缺乏文件，尽管有人警告说可能出了问题，

我写的代码从未失败过。

现在，我已经在64位和32位平台上，使用dict 2.7和3.0以及PyPy向set和CPython添加了数十亿个对象。我已经在一台更大的机器上尝试过了，在这个机器上，我一次向单个set中添加了超过20亿个对象。效果很好。我从未见过与OP中的代码发生冲突。

这不是侥幸或意外。

有人煞费苦心地确保了这种行为。神秘的是，为什么它没有记录下来？

我能从其他帖子中看到的最好结果是，容器算法在某种程度上失去了id()函数在OP类A中所保证的唯一性，当这种情况发生时，就会发生冲突，并调用__eq__。

这可能发生在某些平台和Python的某些实现上。但无论我在哪里尝试过，它都不会发生。

它可能与一些未记录的属性有关:对于任何类实例obj，

hash( id( obj ) ) == hash( obj )
# and
hash( hash( obj ) ) == hash( obj )

(事实上，hash( id( x ) )并不总是hash( x )。试试x = -2。事实上，对于对象实例obj，hash( obj ) == id( obj ) >> 16似乎是这样的。但在我看来，这可能是依赖于实现的。

为了查看代码何时或如何中断，我使用下面的代码进行了测试。这个想法是，如果A的某些实例与一个新实例发生某种冲突，那么它将无法放在集合中，因为__eq__无法打破平局。这段代码检查是否曾经发生过这种情况。我从来没见过。请自己试一试，让我知道你使用的是什么操作系统，Python的哪个版本！

小心打开一个控制台并在其中运行top，查看发生了什么。使用class A的OP定义

from __future__ import print_function
from sys import stdout

class A( object ):
    def __init__( self, v1, v2 ):
            self.v = ( v1, v2 )

    def __eq__( self, b ):
            return self.v[0] == b.v[0] and self.v[1] == b.v[1]

    def __hash__( self ):
            return id( self )

    def __str__( self ):
            return str( self.v )

NINSTANCES = 3000000    # play with this number -- carefully!
STATUS_INTERVAL = 100000

def test():
        """ hammer the set algorithms """
        s = set()
        instances = []
        for i in range( 0, NINSTANCES ):
                p = A( 1, 0 )
                s.add( p )
                instances.append( p )
                if not i % STATUS_INTERVAL:
                        stdout.write( str( i // STATUS_INTERVAL ) + " " )
                        stdout.flush()
        stdout.write( "\n" )

        print( "length of set", len( s ) )
        print( "number of instances", len( instances ) )

        for i in instances:
                if not i in s:
                        print( "INSTANCE DROPPED OUT!" )
test()

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/58346286

复制

相似问题

问Python hash：身份与等价
EN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Python __hash__：身份与等价EN

回答 2

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Python hash：身份与等价
EN