Eggdev: Major flaw with 1.9 hash_table() functions (stdarg read)

Bryan Drewery lordares at shatow.net
Wed Mar 2 08:48:49 CST 2005


Nah, the hash functions are all fine, it's whether or not to resize is the 
problem, I did an expirement with a hash table of size (1), which yielded 
always using row 0, same thing with size 6. It never reached the 'full 
capacity' of the row to move on to resize, which is SIZE*100 per row.

Am waiting for stdarg before changing anything though ;)

------
Bryan Drewery


On Wed, 2 Mar 2005, Darko Ilic wrote:

> Hello Bryan,
>
> Quickly glancing at code (I am on a short lunch break) i can't see relation 
> between the fact that data is being stored in first raw, and the formula that 
> decides if hashtable needs resize. If what you are saying is, infact, true - 
> then there might be a problem with hasing function (although unlikely since 
> it's an age old code checked and re-checked by countless programmers - I 
> forgot the name of the algorithm in question).
>
> As for the code that checks if hashtable needs resize - the magic number 100 
> has got to go and be replaced by a reasonable named constant or #define. The 
> (possibly) new value has to be considered carefully because you don't want to 
> end up growing other way too (countless raws). I think the right balance 
> should be decided by stdarg, who alone has the best picture of all stuff that 
> uses hashtables and a rough idea how will population/saturation occur 
> (hostname hashes might show a whole differnet behaviour). Perhaps he already 
> did, but unfortunantly used a faceless magic number instead of a symbolic 
> constant.
>
> This aside, it's very worrying that your experiments all showed unbalanced 
> population, and it deserves a good look. Did you test if with integers 
> instead of strings? Perhaps there is a hidden little bug in my_string_hash() 
> ?
>
> This letter has to be cut short (un)fortunantly, since my lunch break is 
> over.
>
> Regards,
> Darko
>
>
> Bryan Drewery wrote:
>
>> 
>> I've been playing around/studying these functions for a while today and 
>> noticed that there is a major flaw.
>> The idea of a hash table is to hash a value and jump directly to it's spot 
>> in memory, without the need to loop any lists
>> (unless possibly because of collisions).
>> 
>> to create a table, hash_table_create is called with the size of the initial 
>> table.
>> For example, the user list is defined to be 50:
>> #define USER_HASH_SIZE  50
>> 
>> hash_table_t *hash_table_create(hash_table_hash_alg alg, hash_table_cmp_alg 
>> cmp, int nrows, int flags)
>> {
>>         ...
>>         ht->max_rows = nrows;
>>         ...
>> }
>> 
>> The maximum rows are set to the initial size value, so far so good...
>> 
>> Now, skipping down over hash_table_insert()..
>> 
>> ht->cells_in_use is increased for each entry into the table...
>> 
>> ...at the end of hash_table_insert hash_table_check_resize() is called to 
>> see if the table needs to be resized...
>> 
>> In this function the following is used to check if the table is too small:
>> 
>>         if (ht->cells_in_use / ht->max_rows > 100) {
>>                 hash_table_resize(...);
>>                 ...
>>         }
>> 
>> Now, using the defined user table size of 50 and say 4900 users added, the 
>> value calculated is:
>> 4900 / 50 = ** 98 **
>> 98 is still less than the 100 required to resize and add rows to the 
>> table...Not until there are 5001 users will the table
>> increase from 1 row to 3 rows (there is a multiplier of 3 in the resizing)
>> 
>> Point being, until their are SIZE*100 entries into the first row, the table 
>> is just a ** 1 ** row linked-list, which
>> defeats the entire purpose of having the hash table functions...
>> 
>> I did some verification checks to this and the resulting hash % cells was 
>> always: ** 0 **, meaning the first row.
>> 
>> Reviewing egglib/cvs, I see that the original code simply add a linked list 
>> at collisions as is the best way of handling
>> them and not dealing with resizing...
>> or perhaps lowering the row percentage from 100 to say 2.
>> 
>> Seems that when the hash_table.* files were copied over from lib/egglib to 
>> lib/eggdrop, changes such as this occurred and
>> went unlogged by stdarg.
>> 
>> 
>> ------
>> Bryan Drewery
>> 
>> 
>
>



More information about the Eggdev mailing list