Eggdev: Major flaw with 1.9 hash_table() functions (stdarg read)

Darko Ilic darko at onvol.net
Wed Mar 2 06:07:20 CST 2005


Hello Bryan,

Quickly glancing at the code (I am on a short lunch break), I can't see a 
relation between the fact that data is being stored in the first row and 
the formula that decides if the hashtable needs a resize. If what you are 
saying is, in fact, true, then there might be a problem with the hashing 
function (although that's unlikely, since it's age-old code checked and 
re-checked by countless programmers - I forgot the name of the algorithm 
in question).

As for the code that checks if the hashtable needs a resize - the magic 
number 100 has got to go and be replaced by a reasonably named constant or 
#define. The (possibly) new value has to be chosen carefully, because 
you don't want to end up growing the other way either (countless rows). I 
think the right balance should be decided by stdarg, who alone has the 
best picture of everything that uses hashtables and a rough idea of how 
population/saturation will occur (hostname hashes might show completely 
different behaviour). Perhaps he already did, but unfortunately used a 
faceless magic number instead of a symbolic constant.
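
For illustration only, such a check could look something like this. The 
names here (HASH_TABLE_MAX_LOAD, needs_resize) are hypothetical, not 
eggdrop's actual API; the struct fields mirror the ones in the quote 
below:

```c
/* Sketch: the magic number 100 replaced by a named constant.  The
 * threshold is the average chain length (entries per row) that
 * triggers a resize; the value 2 is only a placeholder to be tuned. */
#define HASH_TABLE_MAX_LOAD 2

struct hash_table {
        int cells_in_use;       /* total entries stored */
        int max_rows;           /* number of rows (buckets) */
};

static int needs_resize(const struct hash_table *ht)
{
        /* Grow once the average chain length exceeds the limit. */
        return ht->cells_in_use / ht->max_rows > HASH_TABLE_MAX_LOAD;
}
```

With the numbers from the example below (50 rows, 4900 entries), 
4900 / 50 = 98, far above a limit of 2, so the table would have been 
grown many times over instead of degenerating into one long list.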

This aside, it's very worrying that your experiments all showed an 
unbalanced population, and it deserves a good look. Did you test it with 
integers instead of strings? Perhaps there is a hidden little bug in 
my_string_hash() ?
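
A quick distribution check could look like the sketch below. The djb2 
algorithm is used purely as a stand-in, since my_string_hash()'s actual 
code isn't quoted here; the point is the bucket-counting harness, which 
would flag a hash that dumps everything into one row:

```c
#include <stdio.h>

/* Stand-in hash (djb2) -- NOT eggdrop's my_string_hash(), just a
 * well-known algorithm to demonstrate the test harness. */
static unsigned int djb2_hash(const char *s)
{
        unsigned int h = 5381;
        while (*s)
                h = h * 33 + (unsigned char)*s++;
        return h;
}

/* Hash nkeys synthetic keys into nrows buckets and return the
 * population of the fullest bucket. */
static int longest_row(int nkeys, int nrows)
{
        int counts[64] = {0};   /* nrows must be <= 64 in this sketch */
        char key[32];
        int i, max = 0;

        if (nrows > 64)
                return -1;
        for (i = 0; i < nkeys; i++) {
                snprintf(key, sizeof key, "user%d", i);
                counts[djb2_hash(key) % nrows]++;
        }
        for (i = 0; i < nrows; i++)
                if (counts[i] > max)
                        max = counts[i];
        return max;
}
```

If the fullest bucket holds all (or nearly all) of the keys, the hash 
or the row-index arithmetic is broken; a healthy hash keeps the maximum 
close to nkeys / nrows.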

This letter has to be cut short, (un)fortunately, since my lunch break is 
over.

Regards,
Darko


Bryan Drewery wrote:

>
> I've been playing around/studying these functions for a while today 
> and noticed that there is a major flaw.
> The idea of a hash table is to hash a value and jump directly to its 
> spot in memory, without the need to loop through any lists
> (unless possibly because of collisions).
>
> to create a table, hash_table_create is called with the size of the 
> initial table.
> For example, the user list is defined to be 50:
> #define USER_HASH_SIZE  50
>
> hash_table_t *hash_table_create(hash_table_hash_alg alg, 
> hash_table_cmp_alg cmp, int nrows, int flags)
> {
>         ...
>         ht->max_rows = nrows;
>         ...
> }
>
> The maximum rows are set to the initial size value, so far so good...
>
> Now, skipping down over hash_table_insert()..
>
> ht->cells_in_use is increased for each entry into the table...
>
> ...at the end of hash_table_insert hash_table_check_resize() is called 
> to see if the table needs to be resized...
>
> In this function the following is used to check if the table is too 
> small:
>
>         if (ht->cells_in_use / ht->max_rows > 100) {
>                 hash_table_resize(...);
>                 ...
>         }
>
> Now, using the defined user table size of 50 and, say, 4900 users added, 
> the value calculated is:
> 4900 / 50 = ** 98 **
> 98 is still less than the 100 required to resize and add rows to the 
> table... Not until there are 5050 users (5050 / 50 = 101, since this is 
> integer division) will the table
> increase from 1 row to 3 rows (there is a multiplier of 3 in the 
> resizing)
>
> Point being, until there are SIZE*100 entries in the first row, the 
> table is just a ** 1 ** row linked-list, which
> defeats the entire purpose of having the hash table functions...
>
> I did some verification checks on this, and the resulting hash % cells 
> was always ** 0 **, meaning everything landed in the first row.
>
> Reviewing egglib/cvs, I see that the original code simply added a linked 
> list at collisions, which is the best way of handling
> them, and did not deal with resizing at all...
> or perhaps the row percentage could be lowered from 100 to, say, 2.
>
> Seems that when the hash_table.* files were copied over from 
> lib/egglib to lib/eggdrop, changes such as this occurred and
> went unlogged by stdarg.
>
>
> ------
> Bryan Drewery
>
>
