Eggdev: Major flaw with 1.9 hash_table() functions (stdarg read)

Darko Ilic darko at onvol.net
Thu Mar 3 12:57:07 CST 2005


1-5 doesn't make it any better, either. It will probably be resized too 
often or become too long, with the kind of data we have. I had this hope 
that you actually analysed the best ratio and found out 100 is optimal 
:( A quick (an possibly wrong) guess of mine would lie somewhere around 
10-20 for the hashtable that initially starts with 50 raws.


stdarg wrote:

>Hey,
>
>I'm not sure why I put 100 there, it must be some mistake. That is an awful lot
>of collisions before a resize. Feel free to replace it with a #defined constant
>and make it more reasonable like something in the range of 1-5 or so heh. Did I
>really put 100 in there? That's terrible.
>
>--- Bryan Drewery <lordares at shatow.net> wrote:
>  
>
>>I've been playing around/studying these functions for a while today and
>>noticed that there is a major flaw.
>>The idea of a hash table is to hash a value and jump directly to it's spot in
>>memory, without the need to loop any lists
>>(unless possibly because of collisions).
>>
>>to create a table, hash_table_create is called with the size of the initial
>>table.
>>For example, the user list is defined to be 50:
>>#define USER_HASH_SIZE  50
>>
>>hash_table_t *hash_table_create(hash_table_hash_alg alg, 
>>hash_table_cmp_alg cmp, int nrows, int flags)
>>{
>>         ...
>>         ht->max_rows = nrows;
>>         ...
>>}
>>
>>The maximum rows are set to the initial size value, so far so good...
>>
>>Now, skipping down over hash_table_insert()..
>>
>>ht->cells_in_use is increased for each entry into the table...
>>
>>...at the end of hash_table_insert hash_table_check_resize() is called to 
>>see if the table needs to be resized...
>>
>>In this function the following is used to check if the table is too small:
>>
>>         if (ht->cells_in_use / ht->max_rows > 100) {
>>                 hash_table_resize(...);
>>                 ...
>>         }
>>
>>Now, using the defined user table size of 50 and say 4900 users added, the 
>>value calculated is:
>>4900 / 50 = ** 98 **
>>98 is still less than the 100 required to resize and add rows to the 
>>table...Not until there are 5001 users will the table
>>increase from 1 row to 3 rows (there is a multiplier of 3 in the resizing)
>>
>>Point being, until their are SIZE*100 entries into the first row, the 
>>table is just a ** 1 ** row linked-list, which
>>defeats the entire purpose of having the hash table functions...
>>
>>I did some verification checks to this and the resulting hash % cells was 
>>always: ** 0 **, meaning the first row.
>>
>>Reviewing egglib/cvs, I see that the original code simply add a linked 
>>list at collisions as is the best way of handling
>>them and not dealing with resizing...
>>or perhaps lowering the row percentage from 100 to say 2.
>>
>>Seems that when the hash_table.* files were copied over from lib/egglib to 
>>lib/eggdrop, changes such as this occurred and
>>went unlogged by stdarg.
>>
>>
>>------
>>Bryan Drewery
>>
>>
>>    
>>
>
>
>	
>		
>__________________________________ 
>Celebrate Yahoo!'s 10th Birthday! 
>Yahoo! Netrospective: 100 Moments of the Web 
>http://birthday.yahoo.com/netrospective/
>
>
>  
>




More information about the Eggdev mailing list