- Administrivia:
    - Tutoring available
    - Keep hacking away on M2 (due 1 week from today!)
    - Lab section on Friday?

====================

- The Java memory model and references
    - This is another exercise in how to think: having the right _mental_model_ of the machine -- the "what's going on under the hood?" model.
    - A good (correct) mental model is _absolutely_necessary_ to be able to program in a language -- you _have_ to be able to predict how the machine will respond. It also lets you think about the capabilities of the machine (and therefore design around, or by employing, those capabilities).
    - Many people found the *Set methods (entrySet(), keySet(), and values()) to be the hardest
        - Why? What's the conceptual problem?
        - Issue is: many people have the wrong mental model -- they misunderstand the Java data/memory model
        - Problem is that Java is designed to make your life easier in some ways, but it does so by abstracting the machine
        - Hides important details about how all of this stuff works...
    - Common picture: member var == "has-a" == "contains"
        - Consider the following very simple linked list class:

              public class MyList {
                  private String _data;
                  private MyList _next;
              }

          (omitting methods, constructors, etc.)
        - If you think of _data and _next as being _containers_, then you get the following picture:
          ** box diagram: MyList encloses _data and _next; _next
          ** recursively encloses another MyList, etc.
        - That's an OK picture for a singly linked list, but what, then, does a tree look like?
        - Even better -- what does the picture look like for a circularly linked list?
        - Note also that boxes have to "get bigger" as you add more data.
            - What if, instead of "String _data", it was "Vector _data"?
        - Memory doesn't work that way.
    - The (more nearly) "correct" memory picture is the following:
        - Memory is a long, flat array.
        - Java objects are just adjacent cells of that array.
        - When you create an object with "new", it is allocated as a fixed-length segment of that array. Thereafter, _that_object_ _is_never_moved_ (as far as your program can tell: the garbage collector may relocate it behind the scenes, but every reference to it is updated to match).
        - Any variable or member var of object type is _not_ a container for the object. Instead, it's a _pointer_ to some other blob of memory. (Think C-style pointer and you're in the right place.)
        - The _only_ things that are allocated blobs of memory are:
            - Primitive ("atomic") types (char, int, double, boolean, etc.)
            - References (pointers)
            - Groups of the above. (Arrays and objects.)
          ** Picture: memory layout for the object

              public class MemLayoutObj {
                  public int x;
                  public char c;
                  public boolean b;
                  public double y;
                  public byte[] bArr = new byte[8];
                  public String s;
              }

    - Instead of member var == "contains", think member var == "points to" (or "refers to")
    - Need this memory model to explain a couple of different effects:
        - Object resizing:
            - Many Java objects are self-resizing, but that's hiding a bunch of detail from you.
            - In M1, you saw how to build a self-resizing object:
                - Allocate a new blob of memory
                - Copy data from old memory area to new
                - Deallocate the old blob of memory (we skipped this step because the JVM's garbage collector picks this up for us)
            - The key is: you _always_ need to allocate new memory rather than growing old mem. (You can't know that there is enough free space next to the object to just grow it in place.)
        - Data views
            - The *Set methods return "data view" objects
            - Most common confusion: "How does the data get into the set object? How can that object only take up O(1) space when it has to hold all of those key/value pairs? Where does its data come from?"
            - This comes from the mistaken belief that the returned Set objects should "contain" the data.
            - Instead of "contains", think that the Set objects "point to" data.
              ** Picture: memory layout of simple HashMap and EntrySet objects.
            - Note that the EntrySet object simply _refers_to_ data that already exists. Doesn't need to copy/create any new data.
            - This idea is called a "data view"
            - Design pattern #1:
                - Defn: data view: an object that provides a functional interface for interacting with an underlying data store
                - Note: in this sense, _all_ Java objs that provide methods for interacting with private data are view objects.
                - MondoHashMap itself is a view object -- provides you a common interface (get(), put(), containsKey(), etc.) for interacting with a complex hidden object (the actual hash table itself)

====================

- Design exercise (group)
    - Design the inner loop for P1M2: the breadth-first search -- the download-and-parse-page cycle
    - Assume that you have magic methods

          public Reader downloadURLContent(String url)
          public List getPageURLs(Reader page)

    - Don't worry about mechanics of parsing page yet
    - Q1: How do you ensure that you don't crawl outside of cs.unm.edu?
    - Q2: How do you ensure that you never crawl any page twice?
    - Q3: What data structures do you need to support this loop?
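
====================

- Sketch: the "points to" model in runnable form
    - This goes back to the memory-model notes above. It is a minimal sketch, not course code: the class MemoryModelSketch and its methods (makeCycle, grow, keySetSeesLaterPut) are hypothetical names made up for illustration, and the standard java.util.HashMap stands in for MondoHashMap.
    - It shows three things: a circular list is easy once fields are references, "resizing" an array really means allocate-and-copy, and the Set returned by keySet() sees changes made to the map afterward because it refers to the map's data rather than copying it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class; none of these names come from the course code.
class MemoryModelSketch {

    // Same shape as MyList in the notes: each field holds a reference,
    // not a nested copy of another object.
    static class Node {
        String data;
        Node next;
        Node(String data) { this.data = data; }
    }

    // Builds a two-node circular list: a -> b -> a.
    static Node makeCycle() {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a; // hard to draw under "contains", trivial under "points to"
        return a;
    }

    // "Resizes" by allocating a larger array and copying the old contents.
    // The old array is never grown in place; it is left for the garbage collector.
    static int[] grow(int[] old) {
        int[] bigger = new int[Math.max(1, old.length * 2)];
        System.arraycopy(old, 0, bigger, 0, old.length);
        return bigger;
    }

    // Returns true if a key put into the map AFTER keySet() was called
    // is visible through the previously returned Set.
    static boolean keySetSeesLaterPut() {
        Map<String, Integer> map = new HashMap<>();
        map.put("x", 1);
        Set<String> keys = map.keySet(); // a view: no keys are copied here
        map.put("y", 2);
        return keys.contains("y") && keys.size() == 2;
    }

    public static void main(String[] args) {
        Node a = makeCycle();
        System.out.println(a.next.next == a);       // true

        int[] small = {1, 2, 3};
        int[] big = grow(small);
        System.out.println(big != small);           // true: a different blob of memory
        System.out.println(Arrays.toString(big));   // [1, 2, 3, 0, 0, 0]

        System.out.println(keySetSeesLaterPut());   // true
    }
}
```

    - Note that grow() hands back a _new_ reference; any variable still pointing at the old array keeps seeing the old, smaller blob. That is why a self-resizing object has to store the new reference in its own member var after copying.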