Eggdev: Major flaw with 1.9 hash_table() functions (stdarg read)

Bryan Drewery lordares at shatow.net
Wed Mar 2 09:02:40 CST 2005


Sorry, small mistake in my explanation: the row is not ALWAYS 0, it just 
was when I tested with SIZE=1.

The problem is that a table of SIZE=50 has to hold 5001 entries/cells 
before its rows are expanded.
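
To make the threshold concrete, here is a small throwaway program - not 
eggdrop code, the names are just stand-ins echoing the fields quoted 
further down - that finds the first insert tripping a check of the form 
"cells_in_use / max_rows > 100" for the table sizes mentioned in this 
thread:

#include <stdio.h>

/* Throwaway sketch: for a given row count, find the first insert count
 * that satisfies "cells_in_use / max_rows > 100" (the check quoted
 * below), assuming plain int division.  With exact division the cutoff
 * would be just over max_rows*100; truncation can push it a bit later. */
static int first_resize(int max_rows)
{
        int cells_in_use;

        for (cells_in_use = 1; ; cells_in_use++)
                if (cells_in_use / max_rows > 100)
                        return cells_in_use;
}

int main(void)
{
        int sizes[] = { 1, 6, 50 };
        int i;

        for (i = 0; i < 3; i++)
                printf("SIZE=%d: first resize at insert #%d\n",
                       sizes[i], first_resize(sizes[i]));
        return 0;
}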

hth

------
Bryan Drewery


On Wed, 2 Mar 2005, Bryan Drewery wrote:

> Nah, the hash functions are all fine; the problem is the decision of 
> whether or not to resize. I did an experiment with a hash table of size 1, 
> which always ended up using row 0, and the same thing happened with size 6. 
> It never reached the 'full capacity' of a row needed to trigger a resize, 
> which is SIZE*100 per row.
>
> Am waiting for stdarg before changing anything though ;)
>
> ------
> Bryan Drewery
>
>
> On Wed, 2 Mar 2005, Darko Ilic wrote:
>
>> Hello Bryan,
>> 
>> Quickly glancing at the code (I am on a short lunch break), I can't see 
>> the relation between the fact that data is being stored in the first row 
>> and the formula that decides if the hashtable needs a resize. If what you 
>> are saying is, in fact, true - then there might be a problem with the 
>> hashing function (although that is unlikely, since it's age-old code 
>> checked and re-checked by countless programmers - I forget the name of 
>> the algorithm in question).
>> 
>> As for the code that checks if the hashtable needs a resize - the magic 
>> number 100 has got to go and be replaced by a reasonably named constant 
>> or #define. The (possibly) new value has to be chosen carefully, because 
>> you don't want to end up growing the other way either (countless rows). 
>> I think the right balance should be decided by stdarg, who alone has the 
>> best picture of everything that uses hashtables and a rough idea of how 
>> population/saturation will occur (hostname hashes might show a whole 
>> different behaviour). Perhaps he already did, but unfortunately used a 
>> faceless magic number instead of a symbolic constant.
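>> 
>> Something along these lines, perhaps - just a compilable sketch with a 
>> made-up name (HASH_TABLE_MAX_LOAD) and a cut-down struct, not a patch 
>> against the real hash_table.c:
>> 
>> #include <stdio.h>
>> 
>> #define HASH_TABLE_MAX_LOAD 100  /* average entries per row before a resize */
>> 
>> /* Cut-down stand-in for the real hash_table_t, only here to show the
>>  * shape of the check with a named constant instead of a bare 100. */
>> struct ht_dummy {
>>         int cells_in_use;
>>         int max_rows;
>> };
>> 
>> static int ht_needs_resize(const struct ht_dummy *ht)
>> {
>>         return ht->cells_in_use / ht->max_rows > HASH_TABLE_MAX_LOAD;
>> }
>> 
>> int main(void)
>> {
>>         struct ht_dummy ht = { 4900, 50 };
>> 
>>         printf("needs resize: %d\n", ht_needs_resize(&ht));  /* prints 0 */
>>         return 0;
>> }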
>> 
>> This aside, it's very worrying that your experiments all showed an 
>> unbalanced population, and it deserves a good look. Did you test it with 
>> integers instead of strings? Perhaps there is a hidden little bug in 
>> my_string_hash()?
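>> 
>> For a quick sanity check outside the bot, something like the following - 
>> it uses the well-known djb2 string hash as a stand-in, since I don't 
>> have my_string_hash() in front of me - prints how many synthetic keys 
>> land in each of 50 rows:
>> 
>> #include <stdio.h>
>> 
>> /* Stand-in string hash (djb2), NOT eggdrop's my_string_hash(); only
>>  * here to eyeball how evenly synthetic keys spread over 50 rows. */
>> static unsigned int djb2(const char *s)
>> {
>>         unsigned int h = 5381;
>> 
>>         while (*s) h = h * 33 + (unsigned char)*s++;
>>         return h;
>> }
>> 
>> int main(void)
>> {
>>         int counts[50] = { 0 };
>>         char key[32];
>>         int i;
>> 
>>         for (i = 0; i < 5000; i++) {
>>                 sprintf(key, "user%d", i);      /* synthetic keys */
>>                 counts[djb2(key) % 50]++;
>>         }
>>         for (i = 0; i < 50; i++)
>>                 printf("row %2d: %d keys\n", i, counts[i]);
>>         return 0;
>> }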
>> 
>> This letter has to be cut short, (un)fortunately, since my lunch break is 
>> over.
>> 
>> Regards,
>> Darko
>> 
>> 
>> Bryan Drewery wrote:
>> 
>>> 
>>> I've been playing around with/studying these functions for a while today 
>>> and noticed that there is a major flaw.
>>> The idea of a hash table is to hash a value and jump directly to its spot 
>>> in memory, without the need to loop over any lists
>>> (except possibly because of collisions).
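>>> 
>>> (For illustration only - toy types rather than the real hash_table_t - 
>>> the lookup described above boils down to something like this:)
>>> 
>>> #include <string.h>
>>> 
>>> /* Toy structures just for this sketch, not eggdrop's own. */
>>> struct toy_entry {
>>>         const char *key;
>>>         void *data;
>>>         struct toy_entry *next;   /* collision chain within one row */
>>> };
>>> 
>>> struct toy_entry *toy_lookup(struct toy_entry **rows, int nrows,
>>>                              unsigned int hash, const char *key)
>>> {
>>>         struct toy_entry *e;
>>> 
>>>         /* Jump straight to one row; only that row's chain is walked. */
>>>         for (e = rows[hash % nrows]; e; e = e->next)
>>>                 if (strcmp(e->key, key) == 0)
>>>                         return e;
>>>         return NULL;
>>> }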
>>> 
>>> To create a table, hash_table_create() is called with the size of the 
>>> initial table.
>>> For example, the user list size is defined to be 50:
>>> #define USER_HASH_SIZE  50
>>> 
>>> hash_table_t *hash_table_create(hash_table_hash_alg alg, 
>>> hash_table_cmp_alg cmp, int nrows, int flags)
>>> {
>>>         ...
>>>         ht->max_rows = nrows;
>>>         ...
>>> }
>>> 
>>> The maximum number of rows is set to the initial size value; so far, so good...
>>> 
>>> Now, skipping down to hash_table_insert()...
>>> 
>>> ht->cells_in_use is increased for each entry into the table...
>>> 
>>> ...at the end of hash_table_insert(), hash_table_check_resize() is called 
>>> to see if the table needs to be resized...
>>> 
>>> In this function the following is used to check if the table is too small:
>>> 
>>>         if (ht->cells_in_use / ht->max_rows > 100) {
>>>                 hash_table_resize(...);
>>>                 ...
>>>         }
>>> 
>>> Now, using the defined user table size of 50 and, say, 4900 users added, 
>>> the value calculated is:
>>> 4900 / 50 = ** 98 **
>>> 98 is still less than the 100 required to resize and add rows to the 
>>> table... Not until there are 5001 users will the table
>>> increase from 1 row to 3 rows (there is a multiplier of 3 in the resizing).
>>> 
>>> Point being, until there are SIZE*100 entries in the first row, the 
>>> table is just a ** 1 ** row linked-list, which
>>> defeats the entire purpose of having the hash table functions...
>>> 
>>> I did some verification checks on this, and the resulting hash % cells 
>>> was always ** 0 **, meaning the first row.
>>> 
>>> Reviewing egglib/cvs, I see that the original code simply added a linked 
>>> list at collisions, as is the best way of handling
>>> them, and did not deal with resizing at all...
>>> Or perhaps the row percentage could be lowered from 100 to, say, 2.
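>>> 
>>> (Roughly, the no-resize approach is just the classic chained insert - 
>>> toy code for illustration, not the old egglib source:)
>>> 
>>> /* Toy types again, same shape as in the lookup sketch above, so this
>>>  * fragment stands on its own. */
>>> struct toy_entry {
>>>         const char *key;
>>>         void *data;
>>>         struct toy_entry *next;
>>> };
>>> 
>>> /* Toy chained insert: on a collision the new entry is simply linked
>>>  * onto the front of the row's chain; the number of rows never
>>>  * changes, matching the no-resize behaviour described above. */
>>> void toy_insert(struct toy_entry **rows, int nrows,
>>>                 unsigned int hash, struct toy_entry *e)
>>> {
>>>         unsigned int row = hash % nrows;
>>> 
>>>         e->next = rows[row];    /* collisions just extend the chain */
>>>         rows[row] = e;
>>> }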
>>> 
>>> It seems that when the hash_table.* files were copied over from 
>>> lib/egglib to lib/eggdrop, changes such as this one occurred and
>>> went unlogged by stdarg.
>>> 
>>> 
>>> ------
>>> Bryan Drewery
>>> 
>>> 
>> 
>> 
>


