ON-TIME QUIZ POST-MORTEM
We will try this again next week, in Ontime Quiz 3:
  OntimeQuiz2FinalAdjusted = max(OntimeQuiz2, OntimeQuiz3)
(Note there will be no adjustments to OntimeQuiz3.)

ON-TIME QUIZ 3 POST-MORTEM

1. (2 points) Draw a 'memory and pointers' diagram as of the point marked '/* HERE */' in the following program. Omit 'args'.

public class Test {
    private static Integer x0 = 9;
    public static void main(String[] args) {
        int x1 = x0;
        Integer x2 = x1;
        /* HERE */
        System.out.println(x0+x1+x2);
    }
}

x0        Integer
+----+    +---+
|  --|--->| 9 |
+----+    +---+

x1
+---+
| 9 |
+---+

x2        Integer
+----+    +---+
|  --|--->| 9 |
+----+    +---+

2. (3 points) Draw a 'memory and pointers' diagram as of the point marked '/* HERE */' in the following program. Omit 'args'.

public class Test {
    int zot = 2;
    Test(int z) { zot = z; }
    public static void main(String[] args) {
        Test t1 = new Test(4);
        Test t2 = new Test(1);
        t2.zot = t1.zot;
        /* HERE */
        System.out.println(t1.zot+t2.zot);
    }
}

t1        Test
+----+    +---+
|  --|--->| 4 |
+----+    +---+

t2        Test
+----+    +---+
|  --|--->| 4 |
+----+    +---+

3. (4 points) Draw a 'memory and pointers' diagram as of the point marked '/* HERE */' in the following program. Omit 'args'.
public class Test {
    int a;
    Test b, c;
    Test(int d, Test e, Test f) { a = d; b = e; c = f; }
    public static void main(String[] args) {
        Test t1 = new Test(7, new Test(2,null,null), null);
        t1.b.c = t1;
        t1 = t1.b;
        /* HERE */
        System.out.println(t1.a);
    }
}

[Answer diagram: two Test objects that point at each other. t1 points at the inner Test (a is 2); its b is null and its c points back at the outer Test (a is 7); the outer Test's b points at the inner Test and its c is null.]

QUESTIONS

LAST TIME:
- Memory & pointers quiz results :(, more hashing
TODAY:
- Memory & pointers quiz again
- Amortized O(1); open-address collision handling
- Designing with objects
- Extending classes

PROJECT 2 STATUS
- Under development
- Hoping for spec before the weekend

MANTRA OF THE DAY
Any system design problem can be solved by adding another level of indirection.
Any system performance problem can be solved by removing another level of indirection.

HASH TABLES - SEPARATE CHAINING

import com.remain.always.MyHash;

class Whatever {
    public static void main(String[] args) {
        MyHash h = new MyHash();
        h.insert("foo",1);
        h.insert("bar",7);
        h.insert("bletch",2);
        h.insert("mumble",3);
        h.insert("chaining",0);
    }
}

class MyHash {
    ...
    int hash(String s) { return s.length(); }
    // ^^^ AWFUL HASH FUNCTION! ^^^
    ...
};

(Wow, 5 inserts, 3 collisions! How did that happen?)

Inside MyHash somewhere..

[0] |   |
[1] | --|--> ("mumble",3) --> ("bletch",2) --> null
[2] |   |
[3] | --|--> ("chaining",0) --> ("bar",7) --> ("foo",1) --> null
[4] |   |

Hmm, better store the values here too.. Might want the hashcodes too..

AMORTIZED O(1) ALGORITHMS

Uh, how can hash table insert be O(1) if we have to rehash the table? Rehashing has to hit each stored association, so it's inherently O(n) in the number of associations. (Duh.)

Think about filling up an array:

class Test {
    private String[] table = new String[10];
    private int firstFreeIndex = 0;
    public void add(String s) {
        table[firstFreeIndex] = s;
        ++firstFreeIndex;
    }
    public static void main(String[] args) {
        Test t = new Test();
        for (int i = 0; i < 100; ++i) t.add("foo");
    }
}

How about this: When we've filled our size N array:
(1) Get a new size 2*N array as well and hold onto both
(2) Now for each of the next N add's, in addition to storing into the bigger array, copy one element from the old array to the new array. Touching 3 elements only is still O(1)..
(3) After those N copies, the old array can be tossed, and the whole thing repeated.

So that way, each 'add' operation is still O(1), right?

Well, not quite. Watch out for step (1).
For one thing, Java requires that all those 2*N array slots *have* to be initialized..

But still -- a key idea: If we double the size of the table then the amount of 'copying' work is the same as the amount of 'adding' work.
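The three-step scheme above (grab a 2*N array, then copy one old element per add) can be sketched as a small class. This is a hypothetical illustration, not course code -- the class and field names are made up, and reads have to check both arrays while a migration is in progress:

```java
class IncrementalArray {
    private String[] oldTable;              // non-null only while migrating
    private String[] table = new String[4];
    private int size = 0;                   // number of elements added
    private int migrated = 0;               // old slots copied over so far

    public void add(String s) {
        if (size == table.length) {         // full: step (1), grab 2*N array
            oldTable = table;
            table = new String[2 * oldTable.length];
            migrated = 0;
        }
        table[size++] = s;                  // the normal store
        if (oldTable != null) {             // step (2): copy ONE old element
            table[migrated] = oldTable[migrated];
            if (++migrated == oldTable.length)
                oldTable = null;            // step (3): toss the old array
        }
    }

    public String get(int i) {
        // while migrating, not-yet-copied slots still live in the old array
        if (oldTable != null && i < oldTable.length && i >= migrated)
            return oldTable[i];
        return table[i];
    }

    public int size() { return size; }
}
```

Each add touches at most three slots, so every single add really is O(1) work -- apart from the allocation-and-initialization cost in step (1) that the slide warns about.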
Suppose we don't insist that each and every 'add' finishes in O(1), but only that any large number N of 'add's takes O(N) time all together? Then each one is 'kind of' O(1).

Call it: 'amortized O(1)'
WARNING: NOT FOR USE IN PACEMAKERS!

So: How OFTEN does that O(n) operation have to happen?

Think about filling up an array, this time with a grow() that doubles:

class Test {
    private String[] table = new String[10];
    private int firstFreeIndex = 0;
    private void grow() {
        String[] old = table;
        table = new String[old.length*2];
        for (int i = 0; i < old.length; ++i)
            table[i] = old[i];
    }
    public void add(String s) {
        if (firstFreeIndex >= table.length) grow();
        table[firstFreeIndex] = s;
        ++firstFreeIndex;
    }
    public static void main(String[] args) {
        Test t = new Test();
        for (int i = 0; i < 100; ++i) t.add("foo");
    }
}
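One way to see the answer concretely: instrument the doubling grow() idea to count how many element copies it actually performs over many adds. This is a hypothetical variant (the counter field is made up), not the course's code:

```java
class GrowCounter {
    private String[] table = new String[10];
    private int firstFreeIndex = 0;
    long copies = 0;                     // total elements moved by grow()

    private void grow() {
        String[] old = table;
        table = new String[old.length * 2];
        for (int i = 0; i < old.length; ++i) {
            table[i] = old[i];
            ++copies;
        }
    }

    public void add(String s) {
        if (firstFreeIndex >= table.length) grow();
        table[firstFreeIndex++] = s;
    }

    public static void main(String[] args) {
        GrowCounter t = new GrowCounter();
        int n = 100000;
        for (int i = 0; i < n; ++i) t.add("foo");
        // Each grow() is O(#adds), but it runs so rarely that the total
        // copying stays proportional to n: fewer than 2 copies per add.
        System.out.println("adds=" + n + " copies=" + t.copies
                + " copies/add=" + (double) t.copies / n);
    }
}
```

Starting from capacity 10, the grows happen at 10, 20, 40, ..., 81920, so the copies total 10+20+...+81920 = 163830 for 100000 adds -- under 2 per add.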
How long does 'grow()' take, as a function of the # of add's? -> O(#adds) :(

But! How often is 'grow()' called, as a function of the # of add's? -> once in O(#adds)
So if we amortize the O(#adds) work per O(#adds) operations we get O(#adds)/O(#adds) = O(1) and end up at 'amortized O(1)' per add operation.

HashMap uses this! So does, e.g., ArrayList..

AMORTIZED O(1) ALGORITHMS -- E.G. 'new ArrayList(1)'
ArrayList                used  room  Bytes copied  Total bytes  Bytes copied /
                                     in add        copied       ArrayList length
"a"                        1     1        0             0           0.0
"ab"                       2     2        1             1           0.5
"abc."                     3     4        2             3           1.0
"abcd"                     4     4        0             3           0.75
"abcde..."                 5     8        4             7           1.4
"abcdef.."                 6     8        0             7           1.1666
"abcdefg."                 7     8        0             7           1.0
"abcdefgh"                 8     8        0             7           0.875
"abcdefghi......."         9    16        8            15           1.666
"abcdefghij......"        10    16        0            15           1.5
"abcdefghijk....."        11    16        0            15           1.363
"abcdefghijkl...."        12    16        0            15           1.25
"abcdefghijklm..."        13    16        0            15           1.154
"abcdefghijklmn.."        14    16        0            15           1.071
"abcdefghijklmno."        15    16        0            15           1.0
"abcdefghijklmnop"        16    16        0            15           0.938
"abcdefghijklmnop>        16    32       16            31           1.9375

AMORTIZED O(1) ALGORITHMS

The double-and-copy trick can be applied in lots of places. For example, hash table resizing. If you can get your algorithms so that
  (time to insert one key-val pair) is (approximately) O(1), and
  (time to rehash one key-val pair) is (approximately) O(1),
then amortized time to store or get a key-val pair, including rehashing, is also (approximately) O(1).

Question: What about having to chase down the chains in the hash table? How is that O(1)? ArrayList doesn't have to do that to get a char..
Answer: It's the miracle of big-Oh notation! SO LONG as we can say that the lengths of the chains are (approximately) bounded by a constant -- in other words, the chains don't keep getting longer as we insert more key-val pairs -- then we're still O(1)!

--> So, rehashing is critical for O(1), even with separate chaining. Without it, lookup in a hash table becomes O(n).
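To make the chains-stay-short point concrete, here is a minimal separate-chaining sketch. It is hypothetical (class and method names are made up, not the course's MyHash): it doubles the bucket array whenever entries/buckets reaches 1.0, so the average chain length stays bounded by a constant:

```java
class ChainedMap {
    private static class Node {
        final String key; int val; Node next;
        Node(String k, int v, Node n) { key = k; val = v; next = n; }
    }
    private Node[] buckets = new Node[8];
    private int count = 0;

    private int index(String key, int nBuckets) {
        // floorMod: String.hashCode() can be negative
        return Math.floorMod(key.hashCode(), nBuckets);
    }

    public void put(String key, int val) {
        if (count >= buckets.length) rehash();  // keep load factor <= 1.0
        int i = index(key, buckets.length);
        for (Node n = buckets[i]; n != null; n = n.next)
            if (n.key.equals(key)) { n.val = val; return; }
        buckets[i] = new Node(key, val, buckets[i]);
        ++count;
    }

    public Integer get(String key) {
        for (Node n = buckets[index(key, buckets.length)]; n != null; n = n.next)
            if (n.key.equals(key)) return n.val;
        return null;
    }

    private void rehash() {                 // O(count), but amortized away
        Node[] old = buckets;
        buckets = new Node[old.length * 2];
        for (Node head : old)
            for (Node n = head; n != null; ) {
                Node next = n.next;
                int i = index(n.key, buckets.length);
                n.next = buckets[i];        // relink node into new bucket
                buckets[i] = n;
                n = next;
            }
    }
}
```

With the doubling rehash in place, put and get are amortized O(1); comment out the rehash() call and the chains grow without bound, making lookup O(n).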
AMORTIZED O(1) ALGORITHMS

The double-and-copy trick can be applied in lots of places. For example, hash table resizing. If you can get your algorithms so that
  (time to insert one key-val pair) is (approximately) O(1), and
  (time to rehash one key-val pair) is (approximately) O(1),
then amortized time to store or get a key-val pair, including rehashing, is also (approximately) O(1).

Question: Do you have to *double* when you rehash?
On overflow                  Store   Find
If size = 2*size;       ->   O(1)    O(1)
   size = 3*size;       ->   O(1)    O(1)
   size = 3*size/2;     ->   O(1)    O(1)   (if size>1!)
   size = size+1;       ->   O(n)    O(1)
   size = size+100000;  ->   O(n)    O(1)

-> Not double, necessarily, but most grow in proportion to size..

HASH TABLES

Getting an index from a name
IDEA: 'Hash' the name up into a reasonably small number to use as an array index, with a HASH FUNCTION
Upside: Can use a reasonably small array
Downside: Have to deal with COLLISIONS: when two different names get hashed to the same number
Issues:
- What hash function?
  -> Speed
  -> Spread
- How to deal with collisions?
  -> 'Open addressing' - put collided entries somewhere else in the table
  -> 'Separate chaining' - make a linked list of collided entries at each index in the array
- What happens if our reasonably small table fills up?
  -> "With separate chaining, it never will!" Except for what?
  -> Need to rehash into a larger array
  -> And we can still have (amortized) O(1) add's
- What if we're stupid and/or unlucky in key values and hash functions?
  -> Well, that's why we say 'approximately amortized O(1)'..

HASH TABLES - OPEN ADDRESSING

For 'Open addressing' - put collided entries somewhere else in the table - there's an extra issue: How do you decide where in the array to put collisions?

HASH TABLES - OPEN ADDRESSING - STRAWMAN: LINEAR PROBING

Where to put a collided key? Obvious, easiest, awfullest idea:
- Just put it in the next free slot, wrapping around if you need to
- Called: 'Linear probing'

Same keys (foo, bar, bletch, mumble, chaining), same AWFUL hash function (key.length()), same 5-slot table. Each stored entry keeps its key and its hashcode.

Inserting:
- "foo": 3 -> slot [3] is free; store ("foo",3) at [3].
- "bar": 3 -> [3] is taken; have to check and confirm that 'bar' isn't 'foo', then move on; [4] is free; store ("bar",3) at [4].
- "bletch": 6, 6%5==1 -> [1] is free; store ("bletch",6) at [1].
- "mumble": 6, 6%5==1 -> [1] is taken ('mumble' is not 'bletch'); [2] is free; store ("mumble",6) at [2].
- "chaining": 8, 8%5==3 -> [3] is taken ('chaining' is not 'foo'); [4] is taken ('chaining' is not 'bar'); wrap around to [0], which is free; store ("chaining",8) at [0].

Final table:

[0] --> ("chaining", 8)
[1] --> ("bletch", 6)
[2] --> ("mumble", 6)
[3] --> ("foo", 3)
[4] --> ("bar", 3)

Lookup of "gah": 3 -> [3]: 'foo' is not 'gah'; [4]: 'bar' is not 'gah'; wrap to [0]: 'chaining' is not 'gah'; [1]: 'bletch' is not 'gah'; .. (with a full table, an unsuccessful lookup has to probe every slot).
foo +----+ +------------+ bar [0]| --------------------->|"chaining"|8| bletch |----| +----------+ +------------+ mumble [1]| ------->|"bletch"|6| chaining |----| +----------+ +----------+ > [2]| --------------------->|"mumble"|6| Same AWFUL hash |----| +---------+ +----------+ mumble is function: [3]| ------->|"foo" | 3| not gah key.length() |----| +---------+ [4]| ----\ +---------+ Lookup: +----+ \->|"bar" | 3| 3>gah +---------+ HASH TABLES - OPEN ADDRESSING - STRAWMAN: LINEAR PROBING Where to put a collided key? Obvious, easiest, awfullest idea: - Just put it in the next free slot, wrapping around if you need to - Called: 'Linear probing' Same keys.. foo +----+ +------------+ bar [0]| --------------------->|"chaining"|8| bletch |----| +----------+ +------------+ mumble [1]| ------->|"bletch"|6| chaining |----| +----------+ +----------+ [2]| --------------------->|"mumble"|6| Same AWFUL hash |----| +---------+ +----------+ function: > [3]| ------->|"foo" | 3| key.length() |----| +---------+ foo is not gah [4]| ----\ +---------+ Lookup: +----+ \->|"bar" | 3| 3>gah +---------+ HASH TABLES - OPEN ADDRESSING - STRAWMAN: LINEAR PROBING Where to put a collided key? Obvious, easiest, awfullest idea: - Just put it in the next free slot, wrapping around if you need to - Called: 'Linear probing' Same keys.. foo +----+ +------------+ bar [0]| --------------------->|"chaining"|8| bletch |----| +----------+ +------------+ mumble [1]| ------->|"bletch"|6| chaining |----| +----------+ +----------+ [2]| --------------------->|"mumble"|6| Same AWFUL hash |----| +---------+ +----------+ function: [3]| ------->|"foo" | 3| key.length() |----| +---------+ > [4]| ----\ +---------+ Lookup: +----+ \->|"bar" | 3| 3>gah +---------+ bar is not gah.. whups HASH TABLES - OPEN ADDRESSING - STRAWMAN: LINEAR PROBING Where to put a collided key? Obvious, easiest, awfullest idea: - Just put it in the next free slot, wrapping around if you need to - Called: 'Linear probing' Same keys.. 
foo +----+ +------------+ bar [0]| --------------------->|"chaining"|8| bletch |----| +----------+ +------------+ mumble [1]| ------->|"bletch"|6| chaining |----| +----------+ +----------+ [2]| --------------------->|"mumble"|6| Same AWFUL hash |----| +---------+ +----------+ function: [3]| ------->|"foo" | 3| key.length() |----| +---------+ [4]| ----\ +---------+ Do have to avoid Lookup: +----+ \->|"bar" | 3| looping.. gah +---------+ returns not found HASH TABLES - OPEN ADDRESSING - STRAWMAN: LINEAR PROBING Where to put a collided key? Obvious, easiest, awfullest idea: - Just put it in the next free slot, wrapping around if you need to - Called: 'Linear probing' - In real usage, never let the table get very near full before rehashing - But even so, 'linear probing' is awful. = Creates big 'islands' of collisions in the table. Same keys.. foo +----+ +------------+ bar [0]| --------------------->|"chaining"|8| bletch |----| +----------+ +------------+ mumble [1]| ------->|"bletch"|6| chaining |----| +----------+ +----------+ [2]| --------------------->|"mumble"|6| Same AWFUL hash |----| +---------+ +----------+ function: [3]| ------->|"foo" | 3| key.length() |----| +---------+ [4]| ----\ +---------+ Lookup: +----+ \->|"bar" | 3| gah +---------+ returns not found HASH TABLES - OPEN ADDRESSING - STRAWMAN: LINEAR PROBING Where to put a collided key? Obvious, easiest, awfullest idea: - Just put it in the next free slot, wrapping around if you need to - Called: 'Linear probing' - In real usage, never let the table get very near full before rehashing - But even so, 'linear probing' is awful. = Creates big 'islands' of collisions in the table. Same keys.. 
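The linear-probing insert and lookup walked through above can be sketched in Java. This is a minimal sketch, not the course's MyHash class (LinearProbeTable, insert, and lookup are made-up names), using the same deliberately AWFUL key.length() hash and a size-5 table so the keys land exactly as in the diagram:

```java
public class LinearProbeTable {
    private final String[] keys;
    private final int[] values;

    LinearProbeTable(int size) {
        keys = new String[size];
        values = new int[size];
    }

    // Same deliberately AWFUL hash function as the slides: key.length()
    private int hash(String key) {
        return key.length() % keys.length;
    }

    // Put the key in its hash slot, or the next free slot, wrapping around.
    // (Sketch: assumes the table never becomes completely full.)
    void insert(String key, int value) {
        int i = hash(key);
        while (keys[i] != null && !keys[i].equals(key))
            i = (i + 1) % keys.length;
        keys[i] = key;
        values[i] = value;
    }

    // Probe until we find the key or hit a hole; cap probes to avoid looping.
    Integer lookup(String key) {
        int i = hash(key);
        for (int probes = 0; probes < keys.length; ++probes) {
            if (keys[i] == null) return null;         // hole: not found
            if (keys[i].equals(key)) return values[i];
            i = (i + 1) % keys.length;
        }
        return null;                                  // wrapped all the way around: not found
    }

    public static void main(String[] args) {
        LinearProbeTable t = new LinearProbeTable(5);
        t.insert("foo", 1);
        t.insert("bar", 7);
        t.insert("bletch", 2);
        t.insert("mumble", 3);
        t.insert("chaining", 0);
        System.out.println(t.lookup("mumble"));  // 3
        System.out.println(t.lookup("gah"));     // null -- 'gah' probes 3,4,0,1,2, finds no hole
    }
}
```

Inserting the five keys reproduces the diagram: chaining@0, bletch@1, mumble@2, foo@3, bar@4 -- and the failed lookup of "gah" has to scan the entire table, which is exactly the linear-probing pathology.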
foo +----+ +------------+ bar [0]| --------------------->|"chaining"|8| bletch |----| +----------+ +------------+ mumble [1]| ------->|"bletch"|6| chaining |----| +----------+ +----------+ [2]| --------------------->|"mumble"|6| Same AWFUL hash |----| +---------+ +----------+ function: [3]| ------->|"foo" | 3| key.length() |----| +---------+ [4]| ----\ +---------+ How to do better? Lookup: +----+ \->|"bar" | 3| Would like to scatter gah +---------+ around the holes as much returns as possible.. not found HASH TABLES - OPEN ADDRESSING - QUADRATIC PROBING Where to put a collided key? - Linear probing: Try slot+1, slot+2, slot+3, slot+4... wrapping around - Quadratic probing: Try slot+1, slot+4, slot+9, slot+16... wrapping around HASH TABLES - OPEN ADDRESSING - QUADRATIC PROBING Where to put a collided key? - Linear probing: Try slot+1, slot+2, slot+3, slot+4... wrapping around - Quadratic probing: Try slot+1, slot+4, slot+9, slot+16... wrapping around + Leaves lots of nearby slots (e.g., slot+2, slot+3) open for other keys.. - Will we even hit every slot, eventually?? HASH TABLES - OPEN ADDRESSING - QUADRATIC PROBING Where to put a collided key? - Linear probing: Try slot+1, slot+2, slot+3, slot+4... wrapping around - Quadratic probing: Try slot+1, slot+4, slot+9, slot+16... wrapping around + Leaves lots of nearby slots (e.g., slot+2, slot+3) open for other keys.. - Will we even hit every slot, eventually?? Suppose everything hashes to 3, for example.. +----+ [0]| | |----| [1]| | |----| [2]| | |----| [3]| 1st| |----| [4]| | +----+ HASH TABLES - OPEN ADDRESSING - QUADRATIC PROBING Where to put a collided key? - Linear probing: Try slot+1, slot+2, slot+3, slot+4... wrapping around - Quadratic probing: Try slot+1, slot+4, slot+9, slot+16... wrapping around + Leaves lots of nearby slots (e.g., slot+2, slot+3) open for other keys.. - Will we even hit every slot, eventually?? Suppose everything hashes to 3, for example.. 
+----+ [0]| | |----| [1]| | |----| [2]| | |----| > [3]| 1st| |----| [4]| | +----+ HASH TABLES - OPEN ADDRESSING - QUADRATIC PROBING Where to put a collided key? - Linear probing: Try slot+1, slot+2, slot+3, slot+4... wrapping around - Quadratic probing: Try slot+1, slot+4, slot+9, slot+16... wrapping around + Leaves lots of nearby slots (e.g., slot+2, slot+3) open for other keys.. - Will we even hit every slot, eventually?? Suppose everything hashes to 3, for example.. +----+ [0]| | |----| [1]| | |----| [2]| | |----| [3]| 1st| |----| > [4]| 2nd| slot+1 +----+ HASH TABLES - OPEN ADDRESSING - QUADRATIC PROBING Where to put a collided key? - Linear probing: Try slot+1, slot+2, slot+3, slot+4... wrapping around - Quadratic probing: Try slot+1, slot+4, slot+9, slot+16... wrapping around + Leaves lots of nearby slots (e.g., slot+2, slot+3) open for other keys.. - Will we even hit every slot, eventually?? Suppose everything hashes to 3, for example.. +----+ [0]| | |----| [1]| | |----| [2]| | |----| > [3]| 1st| |----| [4]| 2nd| +----+ HASH TABLES - OPEN ADDRESSING - QUADRATIC PROBING Where to put a collided key? - Linear probing: Try slot+1, slot+2, slot+3, slot+4... wrapping around - Quadratic probing: Try slot+1, slot+4, slot+9, slot+16... wrapping around + Leaves lots of nearby slots (e.g., slot+2, slot+3) open for other keys.. - Will we even hit every slot, eventually?? Suppose everything hashes to 3, for example.. +----+ [0]| | |----| [1]| | |----| [2]| | |----| [3]| 1st| |----| > [4]| 2nd| slot+1 +----+ HASH TABLES - OPEN ADDRESSING - QUADRATIC PROBING Where to put a collided key? - Linear probing: Try slot+1, slot+2, slot+3, slot+4... wrapping around - Quadratic probing: Try slot+1, slot+4, slot+9, slot+16... wrapping around + Leaves lots of nearby slots (e.g., slot+2, slot+3) open for other keys.. - Will we even hit every slot, eventually?? Suppose everything hashes to 3, for example.. 
+----+ [0]| | |----| [1]| | |----| > [2]| 3rd| slot+4 |----| [3]| 1st| |----| [4]| 2nd| +----+ HASH TABLES - OPEN ADDRESSING - QUADRATIC PROBING Where to put a collided key? - Linear probing: Try slot+1, slot+2, slot+3, slot+4... wrapping around - Quadratic probing: Try slot+1, slot+4, slot+9, slot+16... wrapping around + Leaves lots of nearby slots (e.g., slot+2, slot+3) open for other keys.. - Will we even hit every slot, eventually?? Suppose everything hashes to 3, for example.. +----+ [0]| | |----| [1]| | |----| [2]| 3rd| |----| > [3]| 1st| |----| [4]| 2nd| +----+ HASH TABLES - OPEN ADDRESSING - QUADRATIC PROBING Where to put a collided key? - Linear probing: Try slot+1, slot+2, slot+3, slot+4... wrapping around - Quadratic probing: Try slot+1, slot+4, slot+9, slot+16... wrapping around + Leaves lots of nearby slots (e.g., slot+2, slot+3) open for other keys.. - Will we even hit every slot, eventually?? Suppose everything hashes to 3, for example.. +----+ [0]| | |----| [1]| | |----| [2]| 3rd| |----| [3]| 1st| |----| > [4]| 2nd| slot+1 +----+ HASH TABLES - OPEN ADDRESSING - QUADRATIC PROBING Where to put a collided key? - Linear probing: Try slot+1, slot+2, slot+3, slot+4... wrapping around - Quadratic probing: Try slot+1, slot+4, slot+9, slot+16... wrapping around + Leaves lots of nearby slots (e.g., slot+2, slot+3) open for other keys.. - Will we even hit every slot, eventually?? Suppose everything hashes to 3, for example.. +----+ [0]| | |----| [1]| | |----| > [2]| 3rd| slot+4 |----| [3]| 1st| |----| [4]| 2nd| +----+ HASH TABLES - OPEN ADDRESSING - QUADRATIC PROBING Where to put a collided key? - Linear probing: Try slot+1, slot+2, slot+3, slot+4... wrapping around - Quadratic probing: Try slot+1, slot+4, slot+9, slot+16... wrapping around + Leaves lots of nearby slots (e.g., slot+2, slot+3) open for other keys.. - Will we even hit every slot, eventually?? Suppose everything hashes to 3, for example.. 
+----+ [0]| | 7 |----| [1]| | 8 |----| > [2]| 3rd| slot+4 slot+9 ?? |----| [3]| 1st| 5 |----| [4]| 2nd| 6 +----+ HASH TABLES - OPEN ADDRESSING - QUADRATIC PROBING Where to put a collided key? - Linear probing: Try slot+1, slot+2, slot+3, slot+4... wrapping around - Quadratic probing: Try slot+1, slot+4, slot+9, slot+16... wrapping around + Leaves lots of nearby slots (e.g., slot+2, slot+3) open for other keys.. - Will we even hit every slot, eventually?? Suppose everything hashes to 3, for example.. +----+ [0]| | 7 12 |----| [1]| | 8 13 |----| [2]| 3rd| slot+4 slot+9 14 |----| [3]| 1st| 5 10 15 |----| > [4]| 2nd| 6 11 slot+16 ?? +----+ HASH TABLES - OPEN ADDRESSING - QUADRATIC PROBING Where to put a collided key? - Linear probing: Try slot+1, slot+2, slot+3, slot+4... wrapping around - Quadratic probing: Try slot+1, slot+4, slot+9, slot+16... wrapping around + Leaves lots of nearby slots (e.g., slot+2, slot+3) open for other keys.. - Will we even hit every slot, eventually?? Suppose everything hashes to 3, for example.. +----+ [0]| | 7 12 17 22 |----| [1]| | 8 13 18 23 |----| [2]| 3rd| slot+4 slot+9 14 19 24 |----| > [3]| 1st| 5 10 15 20 slot+25 ?? |----| [4]| 2nd| 6 11 slot+16 21 +----+ HASH TABLES - OPEN ADDRESSING - QUADRATIC PROBING Where to put a collided key? - Linear probing: Try slot+1, slot+2, slot+3, slot+4... wrapping around - Quadratic probing: Try slot+1, slot+4, slot+9, slot+16... wrapping around + Leaves lots of nearby slots (e.g., slot+2, slot+3) open for other keys.. - Will we even hit every slot, eventually?? Suppose everything hashes to 3, for example.. +----+ [0]| | 7 12 17 22 27 32 |----| [1]| | 8 13 18 23 28 33 |----| [2]| 3rd| slot+4 slot+9 14 19 24 29 34 |----| [3]| 1st| 5 10 15 20 slot+25 30 35 |----| > [4]| 2nd| 6 11 slot+16 21 26 31 slot+36 ??? +----+ HASH TABLES - OPEN ADDRESSING - QUADRATIC PROBING Where to put a collided key? - Linear probing: Try slot+1, slot+2, slot+3, slot+4... 
wrapping around - Quadratic probing: Try slot+1, slot+4, slot+9, slot+16... wrapping around + Leaves lots of nearby slots (e.g., slot+2, slot+3) open for other keys.. - Will we even hit every slot, eventually?? Suppose everything hashes to 3, for example.. +----+ [0]| | 7 12 17 22 27 32 37 42 47 |----| [1]| | 8 13 18 23 28 33 38 43 48 |----| [2]| 3rd| slot+4 slot+9 14 19 24 29 34 39 44 slot+49 ?? |----| [3]| 1st| 5 10 15 20 slot+25 30 35 40 45 |----| > [4]| 2nd| 6 11 slot+16 21 26 31 slot+36 41 46 +----+ HASH TABLES - OPEN ADDRESSING - QUADRATIC PROBING Where to put a collided key? - Linear probing: Try slot+1, slot+2, slot+3, slot+4... wrapping around - Quadratic probing: Try slot+1, slot+4, slot+9, slot+16... wrapping around + Leaves lots of nearby slots (e.g., slot+2, slot+3) open for other keys.. - Will we even hit every slot, eventually?? Suppose everything hashes to 3, for example.. +----+ [0]| | 7 12 17 22 27 32 37 42 47 52 57 62 |----| [1]| | 8 13 18 23 28 33 38 43 48 53 58 63 |----| [2]| 3rd| slot+4 slot+9 14 19 24 29 34 39 44 slot+49 64? |----| [3]| 1st| 5 10 15 20 slot+25 30 35 40 45 50 55 60 |----| > [4]| 2nd| 6 11 slot+16 21 26 31 slot+36 41 46 51 56 61 +----+ ??? This quadratic probing is all such a win? HASH TABLES - OPEN ADDRESSING - QUADRATIC PROBING Where to put a collided key? - Linear probing: Try slot+1, slot+2, slot+3, slot+4... wrapping around - Quadratic probing: Try slot+1, slot+4, slot+9, slot+16... wrapping around + Leaves lots of nearby slots (e.g., slot+2, slot+3) open for other keys.. - Will we even hit every slot, eventually?? HASH TABLES - OPEN ADDRESSING - QUADRATIC PROBING Where to put a collided key? - Linear probing: Try slot+1, slot+2, slot+3, slot+4... wrapping around - Quadratic probing: Try slot+1, slot+4, slot+9, slot+16... wrapping around + Leaves lots of nearby slots (e.g., slot+2, slot+3) open for other keys.. - Will we even hit every slot, eventually?? 
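The "will we even hit every slot" worry above can be checked directly. With everything hashing to slot 3 in the size-5 table, the quadratic probe sequence slot, slot+1, slot+4, slot+9, ... only ever reaches three distinct slots, no matter how far we probe (quick sketch; the class name is made up):

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class QuadraticProbeDemo {
    public static void main(String[] args) {
        int size = 5;   // table size from the slides (prime)
        int home = 3;   // every key hashes to slot 3
        Set<Integer> visited = new LinkedHashSet<>();
        for (int i = 0; i <= 10; ++i)
            visited.add((home + i * i) % size);  // slot, slot+1, slot+4, slot+9, ...
        // Only (size+1)/2 = 3 distinct slots are ever reached -- which is
        // why the guarantee below needs the table LESS than half full:
        System.out.println(visited);  // [3, 4, 2]
    }
}
```

Slots 0 and 1 are never probed, even though 5 is prime: a prime table size only guarantees that the first (size+1)/2 probes are distinct.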
-> Quadratic probing is guaranteed to find an empty slot IF: (1) The table is LESS than half full, AND (2) The size of the table is a PRIME number. - Really such a win? Well, (1) In practice you don't want tables to get anywhere near full anyway, (2) And linear probing is really really bad. - Well, what about other NON-really-really bad possibilities? HASH TABLES - OPEN ADDRESSING - QUADRATIC PROBING Where to put a collided key? - Linear probing: Try slot+1, slot+2, slot+3, slot+4... wrapping around - Quadratic probing: Try slot+1, slot+4, slot+9, slot+16... wrapping around + Leaves lots of nearby slots (e.g., slot+2, slot+3) open for other keys.. - Will we even hit every slot, eventually?? -> Quadratic probing is guaranteed to find an empty slot IF: (1) The table is LESS than half full, AND (2) The size of the table is a PRIME number. - Really such a win? Well, (1) In practice you don't want tables to get anywhere near full anyway, (2) And linear probing is really really bad. - Well, what about other non-really-really bad possibilities? -> DOUBLE HASHING: Let the 'collision increment' depend on the hash too. HASH TABLES - OPEN ADDRESSING - DOUBLE HASHING Where to put a collided key? .. - Double hashing: Let the 'increment' also depend on the key. E.g., class Test { private Object[] table = new Object[5]; private int toIndex(int hash) { // Given some hash value.. int index = hash%table.length; if (index<0) index = -index; return index; } .. HASH TABLES - OPEN ADDRESSING - DOUBLE HASHING Where to put a collided key? .. - Double hashing: Let the 'increment' also depend on the key. E.g., class Test { private Object[] table = new Object[5]; private int toIndex(int hash) { // Given some hash value.. int index = hash%table.length; if (index<0) index = -index; return index; } private int toIncrement(int hash) { // Given the same hash value int incr = hash%(table.length-1); // Dies on tablesize == 1.. 
if (incr<0) incr = -incr; return incr+1; // Can't let increment be zero.. } public static void main(String[] args) { Test t = new Test(); for (int h = 0; h < 10; ++h) System.out.printf("hash=%d index=%d increment=%d\n", h, t.toIndex(h), t.toIncrement(h)); } } HASH TABLES - OPEN ADDRESSING - DOUBLE HASHING Where to put a collided key? .. - Double hashing: Let the 'increment' also depend on the key. E.g., $ javac Test.java;java Test hash=0 index=0 increment=1 hash=1 index=1 increment=2 hash=2 index=2 increment=3 hash=3 index=3 increment=4 hash=4 index=4 increment=1 hash=5 index=0 increment=2 hash=6 index=1 increment=3 hash=7 index=2 increment=4 hash=8 index=3 increment=1 hash=9 index=4 increment=2 $ HASH TABLES - OPEN ADDRESSING - DOUBLE HASHING Where to put a collided key? .. - Double hashing: Let the 'increment' also depend on the key. E.g., $ javac Test.java;java Test Same keys.. hash=0 index=0 increment=1 3 foo hash=1 index=1 increment=2 +----+ 3 bar hash=2 index=2 increment=3 [0]| | 6 bletch hash=3 index=3 increment=4 |----| 6 mumble hash=4 index=4 increment=1 [1]| | 8 chaining hash=5 index=0 increment=2 |----| hash=6 index=1 increment=3 [2]| | hash=7 index=2 increment=4 |----| hash=8 index=3 increment=1 [3]| | hash=9 index=4 increment=2 |----| $ [4]| | Same AWFUL hash +----+ function: key.length() HASH TABLES - OPEN ADDRESSING - DOUBLE HASHING Where to put a collided key? .. - Double hashing: Let the 'increment' also depend on the key. E.g., $ javac Test.java;java Test Same keys.. hash=0 index=0 increment=1 3 foo hash=1 index=1 increment=2 +----+ 3 bar hash=2 index=2 increment=3 [0]| | 6 bletch hash=3 index=3 increment=4 |----| 6 mumble hash=4 index=4 increment=1 [1]| | 8 chaining hash=5 index=0 increment=2 |----| hash=6 index=1 increment=3 [2]| | hash=7 index=2 increment=4 |----| hash=8 index=3 increment=1 [3]| -->"foo" 0 coll. 
hash=9 index=4 increment=2 |----| $ [4]| | Same AWFUL hash +----+ function: key.length() HASH TABLES - OPEN ADDRESSING - DOUBLE HASHING Where to put a collided key? .. - Double hashing: Let the 'increment' also depend on the key. E.g., $ javac Test.java;java Test Same keys.. hash=0 index=0 increment=1 3 foo hash=1 index=1 increment=2 +----+ 3 bar hash=2 index=2 increment=3 [0]| | 6 bletch hash=3 index=3 increment=4 |----| 6 mumble hash=4 index=4 increment=1 [1]| | 8 chaining hash=5 index=0 increment=2 |----| hash=6 index=1 increment=3 [2]| -->"bar" 1 coll. hash=7 index=2 increment=4 |----| hash=8 index=3 increment=1 [3]| -->"foo" 0 coll. hash=9 index=4 increment=2 |----| $ [4]| | Same AWFUL hash +----+ function: key.length() HASH TABLES - OPEN ADDRESSING - DOUBLE HASHING Where to put a collided key? .. - Double hashing: Let the 'increment' also depend on the key. E.g., $ javac Test.java;java Test Same keys.. hash=0 index=0 increment=1 3 foo hash=1 index=1 increment=2 +----+ 3 bar hash=2 index=2 increment=3 [0]| | 6 bletch hash=3 index=3 increment=4 |----| 6 mumble hash=4 index=4 increment=1 [1]| -->"bletch" 0 coll. 8 chaining hash=5 index=0 increment=2 |----| hash=6 index=1 increment=3 [2]| -->"bar" 1 coll. hash=7 index=2 increment=4 |----| hash=8 index=3 increment=1 [3]| -->"foo" 0 coll. hash=9 index=4 increment=2 |----| $ [4]| | Same AWFUL hash +----+ function: key.length() HASH TABLES - OPEN ADDRESSING - DOUBLE HASHING Where to put a collided key? .. - Double hashing: Let the 'increment' also depend on the key. E.g., $ javac Test.java;java Test Same keys.. hash=0 index=0 increment=1 3 foo hash=1 index=1 increment=2 +----+ 3 bar hash=2 index=2 increment=3 [0]| | 6 bletch hash=3 index=3 increment=4 |----| 6 mumble hash=4 index=4 increment=1 [1]| -->"bletch" 0 coll. 8 chaining hash=5 index=0 increment=2 |----| hash=6 index=1 increment=3 [2]| -->"bar" 1 coll. hash=7 index=2 increment=4 |----| hash=8 index=3 increment=1 [3]| -->"foo" 0 coll. 
hash=9 index=4 increment=2 |----| $ [4]| -->"mumble" 1 coll. Same AWFUL hash +----+ function: key.length() HASH TABLES - OPEN ADDRESSING - DOUBLE HASHING Where to put a collided key? .. - Double hashing: Let the 'increment' also depend on the key. E.g., $ javac Test.java;java Test Same keys.. hash=0 index=0 increment=1 3 foo hash=1 index=1 increment=2 +----+ 3 bar hash=2 index=2 increment=3 [0]| -->"chaining" 2 coll. 6 bletch hash=3 index=3 increment=4 |----| 6 mumble hash=4 index=4 increment=1 [1]| -->"bletch" 0 coll. 8 chaining hash=5 index=0 increment=2 |----| hash=6 index=1 increment=3 [2]| -->"bar" 1 coll. hash=7 index=2 increment=4 |----| hash=8 index=3 increment=1 [3]| -->"foo" 0 coll. hash=9 index=4 increment=2 |----| $ [4]| -->"mumble" 1 coll. Same AWFUL hash +----+ function: key.length() HASH TABLES - OPEN ADDRESSING - DOUBLE HASHING Where to put a collided key? .. - Double hashing: Let the 'increment' also depend on the key. E.g., $ javac Test.java;java Test Same keys.. hash=0 index=0 increment=1 3 foo hash=1 index=1 increment=2 +----+ 3 bar hash=2 index=2 increment=3 [0]| -->"chaining" 2 coll. 6 bletch hash=3 index=3 increment=4 |----| 6 mumble hash=4 index=4 increment=1 [1]| -->"bletch" 0 coll. 8 chaining hash=5 index=0 increment=2 |----| hash=6 index=1 increment=3 [2]| -->"bar" 1 coll. hash=7 index=2 increment=4 |----| hash=8 index=3 increment=1 [3]| -->"foo" 0 coll. hash=9 index=4 increment=2 |----| $ [4]| -->"mumble" 1 coll. Same AWFUL hash +----+ function: key.length() - Will we always find the free slot? Do we need to keep the table half empty like with quadratic probing? HASH TABLES - OPEN ADDRESSING - DOUBLE HASHING Where to put a collided key? .. - Double hashing: Let the 'increment' also depend on the key. E.g., $ javac Test.java;java Test Same keys.. hash=0 index=0 increment=1 3 foo hash=1 index=1 increment=2 +----+ 3 bar hash=2 index=2 increment=3 [0]| -->"chaining" 2 coll. 
6 bletch hash=3 index=3 increment=4 |----| 6 mumble hash=4 index=4 increment=1 [1]| -->"bletch" 0 coll. 8 chaining hash=5 index=0 increment=2 |----| hash=6 index=1 increment=3 [2]| -->"bar" 1 coll. hash=7 index=2 increment=4 |----| hash=8 index=3 increment=1 [3]| -->"foo" 0 coll. hash=9 index=4 increment=2 |----| $ [4]| -->"mumble" 1 coll. Same AWFUL hash +----+ function: key.length() - Will we always find the free slot? Do we need to keep the table half empty like with quadratic probing? => We'll always find the free slot, IF the table size is a prime number => Half empty not required, but still don't want the table to get very full. HASH TABLES - OPEN ADDRESSING - DOUBLE HASHING - HASH FUNCTIONS class Test { private Object[] table = new Object[5]; private static int hash(String p) { return p.length(); } // AWFUL! private int toIndex(int hash) { // Given some hash value.. int index = hash%table.length; if (index<0) index = -index; return index; } private int toIncrement(int hash) { // Given the same hash value int incr = hash%(table.length-1); // Dies on tablesize == 1.. if (incr<0) incr = -incr; return incr+1; // Can't let increment be zero.. } String[] keys = { "foo", "bar", "bletch", "mumble", "chaining" }; public static void main(String[] args) { Test t = new Test(); for (String k : t.keys) System.out.printf("hash=%11d index=%d increment=%d (%s)\n", hash(k),t.toIndex(hash(k)),t.toIncrement(hash(k)), k); } } HASH TABLES - OPEN ADDRESSING - DOUBLE HASHING - HASH FUNCTIONS class Test { private Object[] table = new Object[5]; private static int hash(String p) { // STILL PRETTY BAD int sum = 0; for (int i = 0; i < p.length(); ++i) sum += p.charAt(i); return sum; } .. } HASH TABLES - OPEN ADDRESSING - DOUBLE HASHING - UPSHOTS - Make the first probe (0..tablesize-1) and the collision increment both (1..tablesize-1) both depend on the key, but in 'different', 'independent' ways. - Table size must be prime to ensure all slots accessible = (And >1 if the 1+h%(tablesize-1) approach to increment is used.. HASH TABLES - OPEN ADDRESSING - DOUBLE HASHING - UPSHOTS - Make the first probe (0..tablesize-1) and the collision increment both (1..tablesize-1) both depend on the key, but in 'different', 'independent' ways.

- Table size must be prime to ensure all slots accessible = (And >1 if the 1+h%(tablesize-1) approach to increment is used.. - Would be nice to have two whole separate hash functions, but often that's hard to come by: index = hash1(key) folded into 0..tablesize-1 incr = hash2(key) folded into 1..tablesize-1 HASH TABLES - OPEN ADDRESSING - DOUBLE HASHING - UPSHOTS - Make the first probe (0..tablesize-1) and the collision increment both (1..tablesize-1) both depend on the key, but in 'different', 'independent' ways. - Table size must be prime to ensure all slots accessible = (And >1 if the 1+h%(tablesize-1) approach to increment is used.. - Would be nice to have two whole separate hash functions, but often that's hard to come by: index = hash1(key) folded into 0..tablesize-1 incr = hash2(key) folded into 1..tablesize-1 - Well-designed implementations of open addressing with double hashing often *significantly outperform* separate chaining implementations! = Q: Why? All the operations are the same big-Oh's? HASH TABLES - OPEN ADDRESSING - DOUBLE HASHING - UPSHOTS - Make the first probe (0..tablesize-1) and the collision increment both (1..tablesize-1) both depend on the key, but in 'different', 'independent' ways. - Table size must be prime to ensure all slots accessible = (And >1 if the 1+h%(tablesize-1) approach to increment is used.. - Would be nice to have two whole separate hash functions, but often that's hard to come by: index = hash1(key) folded into 0..tablesize-1 incr = hash2(key) folded into 1..tablesize-1 - Well-designed implementations of open addressing with double hashing often *significantly outperform* separate chaining implementations! = Q: Why? All the operations are the same big-Oh's? = A1: In the real world, Big-Oh hides a multitude of sins. 
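Putting the pieces together, a double-hashing insert built on the toIndex/toIncrement scheme above might look like the sketch below. The probe loop and the DoubleHashTable/slot names are assumptions (the slides only show the index/increment math), and for brevity the table stores only keys, like a set:

```java
public class DoubleHashTable {
    private final String[] keys = new String[5];  // table size 5: prime, as required

    // Same AWFUL hash function as the slides
    private static int hash(String s) { return s.length(); }

    private int toIndex(int hash) {
        int index = hash % keys.length;
        if (index < 0) index = -index;
        return index;
    }

    private int toIncrement(int hash) {
        int incr = hash % (keys.length - 1);  // dies on tablesize == 1
        if (incr < 0) incr = -incr;
        return incr + 1;                      // can't let the increment be zero
    }

    // Probe index, index+incr, index+2*incr, ... wrapping around.
    // (Sketch: assumes the table never becomes completely full.)
    void insert(String key) {
        int h = hash(key);
        int i = toIndex(h);
        int incr = toIncrement(h);
        while (keys[i] != null && !keys[i].equals(key))
            i = (i + incr) % keys.length;
        keys[i] = key;
    }

    String slot(int i) { return keys[i]; }

    public static void main(String[] args) {
        DoubleHashTable t = new DoubleHashTable();
        for (String k : new String[] { "foo", "bar", "bletch", "mumble", "chaining" })
            t.insert(k);
        for (int i = 0; i < 5; ++i)
            System.out.println("[" + i + "] " + t.slot(i));
        // [0] chaining  [1] bletch  [2] bar  [3] foo  [4] mumble
    }
}
```

Inserting the five slide keys lands them exactly as in the diagram: "bar" collides at 3 but jumps by its increment of 4 straight to slot 2, and "chaining" resolves after only 2 collisions -- the key-dependent increments break up the islands that linear probing would build.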
HASH TABLES - OPEN ADDRESSING - DOUBLE HASHING - UPSHOTS - Make the first probe (0..tablesize-1) and the collision increment both (1..tablesize-1) both depend on the key, but in 'different', 'independent' ways. - Table size must be prime to ensure all slots accessible = (And >1 if the 1+h%(tablesize-1) approach to increment is used.. - Would be nice to have two whole separate hash functions, but often that's hard to come by: index = hash1(key) folded into 0..tablesize-1 incr = hash2(key) folded into 1..tablesize-1 - Well-designed implementations of open addressing with double hashing often *significantly outperform* separate chaining implementations! = Q: Why? All the operations are the same big-Oh's? = A1: In the real world, Big-Oh hides a multitude of sins. = A2: No single reason. But typically including things like: - Cache behavior. Separate chaining jumps all over the place - One integer addition and one comparison is fast compared to memory access HASHMAP VS TREEMAP - HashMap is one of two main Map implementation strategies. => The other is called TreeMap - TreeMap uses a kind of tree to store the associations (we don't have to worry specifically about how it does it.) => There are lots of possible tree algorithms for this sort of task. Some names: 'red-black tree', '2-3 tree', 'AVL tree'... - What we really need to know is this comparison chart between Hash and Tree:

For n entries in the Map or Set:
                   HashMap, HashSet    TreeMap, TreeSet
  storage method   hashtable           red-black tree
  space used       O(n)                O(n)
  put speed        approx O(1)         O(lg n)
  contains speed   approx O(1)         O(lg n)
  key methods      hashCode, equals    compareTo, equals
  iteration order  arbitrary           sorted

Rule of thumb: Use a Tree only if you need them sorted, otherwise use a Hash DESIGNING WITH OBJECTS DESIGNING WITH OBJECTS - In general, make a class to represent each 'natural kind' of thing that occurs in a programming problem.
- (In addition, extra classes are often used to implement 'design patterns' -- stereotypical ways of accomplishing various tasks.) DESIGNING WITH OBJECTS - In general, make a class to represent each 'natural kind' of thing that occurs in a programming problem. - (In addition, extra classes are often used to implement 'design patterns' -- stereotypical ways of accomplishing various tasks.) - Beginners to object-oriented programming tend to make two mistakes: = Not making enough classes (E.g., everything in one class) = Making too many classes (E.g., a separate class for every integer) - For now, assume each class goes into its own .java file, with the file named identically to the class name. - Classes stored in the same directory can generally access each other DESIGNING WITH OBJECTS - In general, make a class to represent each 'natural kind' of thing that occurs in a programming problem. - (In addition, extra classes are often used to implement 'design patterns' -- stereotypical ways of accomplishing various tasks.) - Beginners to object-oriented programming tend to make two mistakes: = Not making enough classes (E.g., everything in one class) = Making too many classes (E.g., a separate class for every integer) - For now, assume each class goes into its own .java file, with the file named identically to the class name. - Classes stored in the same directory can generally access each other - Guess likely classes: = in a poker-playing program = in an online music store = in a dorm-room assignment system = in a spreadsheet = in a video-player program = in an artificial ecology simulation program
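Going back to the HashMap-vs-TreeMap comparison chart, the 'iteration order' row can be seen directly with the standard java.util classes (MapOrderDemo is just a made-up demo name; the keys are the ones from the hashing slides):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapOrderDemo {
    public static void main(String[] args) {
        Map<String, Integer> hash = new HashMap<>();
        Map<String, Integer> tree = new TreeMap<>();
        for (String k : new String[] { "foo", "bar", "bletch", "mumble", "chaining" }) {
            hash.put(k, k.length());
            tree.put(k, k.length());
        }
        System.out.println(hash.keySet());  // arbitrary order -- don't rely on it
        System.out.println(tree.keySet());  // sorted: [bar, bletch, chaining, foo, mumble]
    }
}
```

Same associations, same O(n) space -- but only the TreeMap hands the keys back in sorted order, which is the rule-of-thumb reason to pick a Tree over a Hash.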