Eggdev: Major flaw with 1.9 hash_table() functions (stdarg read)

stdarg stdarg at yahoo.com
Thu Mar 3 12:47:07 CST 2005


I think originally I had it at 100 because I wanted to support fractions..
that's what I've done now. The right check is (100 * cells in use) / rows >
boundary. I've set boundary to 150 so basically when it gets to 150% capacity
it resizes to 300%, so it's half full. We can play around with the numbers to
find what is most efficient.

--- stdarg <stdarg at yahoo.com> wrote:
> Hey,
> 
> I'm not sure why I put 100 there, it must be some mistake. That is an awful
> lot
> of collisions before a resize. Feel free to replace it with a #defined
> constant
> and make it more reasonable like something in the range of 1-5 or so heh. Did
> I
> really put 100 in there? That's terrible.
> 
> --- Bryan Drewery <lordares at shatow.net> wrote:
> > 
> > I've been playing around/studying these functions for a while today and
> > noticed that there is a major flaw.
> > The idea of a hash table is to hash a value and jump directly to it's spot
> in
> > memory, without the need to loop any lists
> > (unless possibly because of collisions).
> > 
> > to create a table, hash_table_create is called with the size of the initial
> > table.
> > For example, the user list is defined to be 50:
> > #define USER_HASH_SIZE  50
> > 
> > hash_table_t *hash_table_create(hash_table_hash_alg alg, 
> > hash_table_cmp_alg cmp, int nrows, int flags)
> > {
> >          ...
> >          ht->max_rows = nrows;
> >          ...
> > }
> > 
> > The maximum rows are set to the initial size value, so far so good...
> > 
> > Now, skipping down over hash_table_insert()..
> > 
> > ht->cells_in_use is increased for each entry into the table...
> > 
> > ...at the end of hash_table_insert hash_table_check_resize() is called to 
> > see if the table needs to be resized...
> > 
> > In this function the following is used to check if the table is too small:
> > 
> >          if (ht->cells_in_use / ht->max_rows > 100) {
> >                  hash_table_resize(...);
> >                  ...
> >          }
> > 
> > Now, using the defined user table size of 50 and say 4900 users added, the 
> > value calculated is:
> > 4900 / 50 = ** 98 **
> > 98 is still less than the 100 required to resize and add rows to the 
> > table...Not until there are 5001 users will the table
> > increase from 1 row to 3 rows (there is a multiplier of 3 in the resizing)
> > 
> > Point being, until their are SIZE*100 entries into the first row, the 
> > table is just a ** 1 ** row linked-list, which
> > defeats the entire purpose of having the hash table functions...
> > 
> > I did some verification checks to this and the resulting hash % cells was 
> > always: ** 0 **, meaning the first row.
> > 
> > Reviewing egglib/cvs, I see that the original code simply add a linked 
> > list at collisions as is the best way of handling
> > them and not dealing with resizing...
> > or perhaps lowering the row percentage from 100 to say 2.
> > 
> > Seems that when the hash_table.* files were copied over from lib/egglib to 
> > lib/eggdrop, changes such as this occurred and
> > went unlogged by stdarg.
> > 
> > 
> > ------
> > Bryan Drewery
> > 
> > 
> 
> 
> 	
> 		
> __________________________________ 
> Celebrate Yahoo!'s 10th Birthday! 
> Yahoo! Netrospective: 100 Moments of the Web 
> http://birthday.yahoo.com/netrospective/
> 


	
		
__________________________________ 
Celebrate Yahoo!'s 10th Birthday! 
Yahoo! Netrospective: 100 Moments of the Web 
http://birthday.yahoo.com/netrospective/



More information about the Eggdev mailing list