Simple general-purpose hash function for a collection


Please don't mark this as a duplicate; the questions I've found are either far too specific or more complex than what I'm looking for. For example, in the "What hash function" question, the accepted answer seems oriented toward hashing strings.

I've started programming in .NET, and I find it unfortunate that the built-in classes lack the ability to do basic things like check equality and find their hash. I'm sure they have design reasons for that; no need to defend .NET. I want to know how to avoid a significant sidetrack when I need to use a collection as a key to a dictionary. I want, for example, two different List objects containing equal values to map to the same entry in the dictionary. Out of the box, they don't: by default a List is equal only to itself, so another instance of a list with the same values is a different key.

Implementing Equals is straightforward. It's the hash function that I'm unsure of.

Is there something provided that I can call in my implementation of GetHashCode?

If I have to write it from scratch, what's a simple enough hash algorithm? I could use SHA1, but I think that would be overkill. I could XOR the hashes of all the items, but I think that would have nasty collision properties. I don't care whether computing the hash is blazingly fast, but I don't want my hash table to slow to linear time on data sets with some particular distribution. I'd like something simple enough that I can memorize it. Bonus if you can explain (or link to) why it works.

Be careful here. If you create a GetHashCode method for a List<T> (or a similar collection), then presumably it'll do something like this:

```csharp
public override int GetHashCode()
{
    int hash = 13;
    foreach (var t in this)
    {
        // x is some operation (undefined here) that combines
        // the previous hash value and the item's hash value
        hash = hash x t.GetHashCode();
    }
    return hash;
}
```

(I suggest something like the Jenkins hash for computing the hash code. Also look into the Wang hash, or bit mixer.)
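
One way to fill in the undefined x step, sketched under the assumption that Bob Jenkins's one-at-a-time hash is fed the four bytes of each item's hash code (so it mixes item hash codes, not the items' raw data). JenkinsListHash and Compute are hypothetical names, and one-at-a-time is only one member of the Jenkins family; note that it starts from 0 rather than 13 and adds a final mixing step after the loop.

```csharp
using System.Collections.Generic;

public static class JenkinsListHash
{
    // Order-dependent hash of a sequence: Jenkins one-at-a-time hash
    // applied to the four bytes of each item's hash code.
    public static int Compute<T>(IEnumerable<T> items)
    {
        // EqualityComparer<T>.Default returns 0 for null items instead of throwing.
        var comparer = EqualityComparer<T>.Default;
        unchecked
        {
            uint hash = 0;
            foreach (T item in items)
            {
                uint h = (uint)comparer.GetHashCode(item);
                for (int shift = 0; shift < 32; shift += 8)
                {
                    hash += (h >> shift) & 0xFF; // mix in one byte
                    hash += hash << 10;
                    hash ^= hash >> 6;
                }
            }
            // Final mixing step of one-at-a-time, so the last bytes
            // also spread into the high bits.
            hash += hash << 3;
            hash ^= hash >> 11;
            hash += hash << 15;
            return (int)hash;
        }
    }
}
```

Two sequences with equal items in the same order hash equally, whether they are lists or arrays; reordering the items will almost always change the result.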

Unless you compute that value the first time and then cache it, you will end up iterating over all of the items every time GetHashCode is called.

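So you've created GetHashCode and Equals for your collection, and you've put an instance into a Dictionary. Now you have to be careful not to change the collection (i.e., don't add or remove items) or any of the items inside it. Otherwise the value of GetHashCode will change, and the Dictionary will no longer find the entry, because it was stored under the old hash code.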
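
A small demonstration of that hazard, as a sketch. Instead of overriding GetHashCode on a List subclass, it passes a hypothetical ListValueComparer to the Dictionary constructor (which accepts an IEqualityComparer<TKey>); the 17/31 multiply-add combiner is an assumed simple choice, not the Jenkins hash. MutationDemo is likewise a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer giving List<int> value semantics (order matters).
public sealed class ListValueComparer : IEqualityComparer<List<int>>
{
    public bool Equals(List<int> x, List<int> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(List<int> list)
    {
        unchecked
        {
            int hash = 17;
            foreach (int item in list)
                hash = hash * 31 + item; // assumed multiply-add combiner
            return hash;
        }
    }
}

public static class MutationDemo
{
    // Adds a key, mutates it, then reports whether lookups still succeed.
    public static (bool sameObject, bool oldContents) LookupAfterMutation()
    {
        var dict = new Dictionary<List<int>, string>(new ListValueComparer());
        var key = new List<int> { 1, 2 };
        dict[key] = "value";

        key.Add(3); // the key's hash code changes while it sits in the dictionary

        return (dict.ContainsKey(key),                     // new hash code: no matching entry
                dict.ContainsKey(new List<int> { 1, 2 })); // old hash code, but Equals fails
    }
}
```

Both lookups return false: the entry is still in the dictionary, but it can no longer be reached through either the mutated list or a list with its original contents.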

I suggest that if you want to use a collection as a dictionary key, you make sure the collection is immutable.
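
A minimal sketch of such a key, assuming order-sensitive equality: ImmutableListKey<T> is a hypothetical class that copies its input into a private array (so callers cannot change it later) and computes the hash code once, at construction, so GetHashCode never re-iterates the items. The 17/31 combiner is an assumed simple choice; any order-dependent combiner would do.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical immutable, order-sensitive collection key.
public sealed class ImmutableListKey<T> : IEquatable<ImmutableListKey<T>>
{
    private readonly T[] items;     // private copy: callers cannot mutate it
    private readonly int cachedHash; // computed once, in the constructor

    public ImmutableListKey(IEnumerable<T> source)
    {
        items = new List<T>(source).ToArray();
        var comparer = EqualityComparer<T>.Default;
        int hash = 17;
        unchecked
        {
            foreach (T item in items)
                hash = hash * 31 + comparer.GetHashCode(item);
        }
        cachedHash = hash;
    }

    public int Count => items.Length;

    public bool Equals(ImmutableListKey<T> other)
    {
        if (ReferenceEquals(this, other)) return true;
        if (other is null || other.items.Length != items.Length) return false;
        var comparer = EqualityComparer<T>.Default;
        for (int i = 0; i < items.Length; i++)
            if (!comparer.Equals(items[i], other.items[i])) return false;
        return true;
    }

    public override bool Equals(object obj) => Equals(obj as ImmutableListKey<T>);

    public override int GetHashCode() => cachedHash;
}
```

A key built from new[] { 1, 2, 3 } and one built from a List<int> holding 1, 2, 3 are equal and share a hash code, so either finds the same Dictionary entry; a key built from 3, 2, 1 is a different key. Immutability also requires that the items themselves don't change.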

One other thing to consider: the concept of list equality isn't as simple as you indicate. For example, are the lists [1, 2, 3, 4, 5] and [5, 1, 3, 4, 2] equal? It rather depends on your definition of equality. Certainly A.Union(B) == A.Intersect(B) when the results are compared as sets, which means they're equal if your definition of equality is "contain the same items." But if order matters, the lists aren't equal.
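
The two definitions can be sketched with LINQ and HashSet<T>; EqualityDefinitions, OrderedEquals, and SameItems are hypothetical names. Enumerable.SequenceEqual checks order, while HashSet<T>.SetEquals compares the union and the intersection as sets.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers contrasting two definitions of list equality.
public static class EqualityDefinitions
{
    // Order matters: same items at the same positions.
    public static bool OrderedEquals<T>(IEnumerable<T> a, IEnumerable<T> b) =>
        a.SequenceEqual(b);

    // "Contain the same items": A.Union(B) and A.Intersect(B) hold the same
    // elements. Union and Intersect yield distinct items, so repeats are ignored.
    public static bool SameItems<T>(IEnumerable<T> a, IEnumerable<T> b) =>
        new HashSet<T>(a.Union(b)).SetEquals(a.Intersect(b));
}
```

For [1, 2, 3, 4, 5] and [5, 1, 3, 4, 2], OrderedEquals returns false and SameItems returns true.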

If your definition is "contain the same items," then the hash code calculation I showed above isn't going to work, because that computation is order dependent. If you wanted to compute the hash code of such lists, you'd have to sort them first.
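
A sketch of the sort-first idea, with one assumed variation: it sorts the items' hash codes rather than the items, so T does not need to be comparable. UnorderedHash is a hypothetical name. Any permutation yields the same sorted list of hash codes and therefore the same result; because repeats still count, this hash fits "same items, same number of times," not definitions that ignore duplicates.

```csharp
using System.Collections.Generic;

public static class UnorderedHash
{
    // Order-independent hash: sort the item hash codes, then combine them
    // in that canonical order with a multiply-add combiner (an assumption).
    public static int Compute<T>(IEnumerable<T> items)
    {
        var comparer = EqualityComparer<T>.Default;
        var hashes = new List<int>();
        foreach (T item in items)
            hashes.Add(comparer.GetHashCode(item));
        hashes.Sort();

        unchecked
        {
            int hash = 17;
            foreach (int h in hashes)
                hash = hash * 31 + h;
            return hash;
        }
    }
}
```

Compute(new[] { 1, 2, 3, 4, 5 }) and Compute(new[] { 5, 1, 3, 4, 2 }) return the same value.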

If the lists cannot contain duplicates, then computing equality is a matter of checking that they have the same length, creating a hash set of one list, and looking up each item of the other list in that hash set. If the lists can contain duplicates, then you either have to sort them to determine equality, or use some kind of dictionary with a count. Both of those imply that the objects contained in the list implement some form of equality comparer, etc.
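
Both approaches, sketched with hypothetical names (ListComparisons, EqualNoDuplicates, EqualWithDuplicates). The first assumes neither list contains duplicates; the second counts occurrences in a Dictionary<T, int> and assumes the items are non-null, since Dictionary keys cannot be null.

```csharp
using System.Collections.Generic;

public static class ListComparisons
{
    // Lists without duplicates: same length, and every item of b is found
    // in a hash set built from a.
    public static bool EqualNoDuplicates<T>(IList<T> a, IList<T> b)
    {
        if (a.Count != b.Count) return false;
        var set = new HashSet<T>(a);
        foreach (T item in b)
            if (!set.Contains(item)) return false;
        return true;
    }

    // Lists that may contain duplicates: count occurrences in a, then
    // consume one count per item of b (multiset equality).
    // Assumes items are non-null.
    public static bool EqualWithDuplicates<T>(IList<T> a, IList<T> b)
    {
        if (a.Count != b.Count) return false;
        var counts = new Dictionary<T, int>();
        foreach (T item in a)
        {
            counts[item] = counts.TryGetValue(item, out int n) ? n + 1 : 1;
        }
        foreach (T item in b)
        {
            if (!counts.TryGetValue(item, out int c) || c == 0) return false;
            counts[item] = c - 1;
        }
        return true;
    }
}
```

EqualWithDuplicates treats [1, 2, 2, 3] and [2, 3, 1, 2] as equal but [1, 2, 2] and [1, 1, 2] as different.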

And some definitions of equality don't take duplicates into account at all. That is, [1, 2, 3] would be equal to [3, 3, 3, 2, 1, 1].
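
Under that definition, Equals and GetHashCode must both ignore order and repetition. A sketch as a hypothetical SetSemanticsComparer<T>, usable as the comparer of a Dictionary<IEnumerable<T>, TValue>: equality uses HashSet<T>.SetEquals, and the hash is computed over the distinct items only, with their hash codes sorted first (the 17/31 combiner is an assumption).

```csharp
using System.Collections.Generic;

// Hypothetical comparer: sequences are equal when they contain the same
// distinct items, regardless of order or repetition.
public sealed class SetSemanticsComparer<T> : IEqualityComparer<IEnumerable<T>>
{
    public bool Equals(IEnumerable<T> x, IEnumerable<T> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return new HashSet<T>(x).SetEquals(y);
    }

    public int GetHashCode(IEnumerable<T> obj)
    {
        // Hash only the distinct items, in a canonical (sorted) order, so the
        // hash agrees with Equals: repeats and ordering cannot change it.
        var hashes = new List<int>();
        foreach (T item in new HashSet<T>(obj))
            hashes.Add(EqualityComparer<T>.Default.GetHashCode(item));
        hashes.Sort();
        unchecked
        {
            int hash = 17;
            foreach (int h in hashes)
                hash = hash * 31 + h;
            return hash;
        }
    }
}
```

With this comparer, [1, 2, 3] and [3, 3, 3, 2, 1, 1] are equal and produce the same hash code, so they map to the same dictionary entry.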

Considering the varying definitions of equality, and the effort it would have taken to allow for those and more in defining the behavior of List<T>, I can understand why whoever designed the collection classes didn't implement value equality, especially since it's pretty uncommon to use a List<T> or similar collection as the key in a dictionary or hash table.

