((I)) The SiteRiter Dynamic Web Site Creation System: An Introduction

-----------
((I.0)) Table of contents

(I.1) The pitch; slogans
(I.2) Background
(I.3) A few examples
(I.4) Terminology

-----------
((I.1)) The pitch; slogans

((I.1.1)) "Do you need a really great web site?! Of COURSE you DO! EVERYBODY does! But: HOW are you going to make it??

"PHP? RoR? JSF? Bah! Too much work! Forget about 'em! The AMAZING SiteRiter will INSTANTLY build and serve web sites for you -- even BIG web sites with thousands, or MILLIONS, or even BILLIONS of web pages, _OR_ _MORE_!! WOW!

"SiteRiter Haz Teh Pages!"

"How much would YOU PAY for a giant web site PACKED FULL of all your critical marketing content, and crosslinked nine ways to Sunday!

"But Wait! Don't Answer That!

"Yes friends, it's true. SiteRiter will EASILY serve up your MOST GIGUNDANTIC web sites, with up to BILLIONS or TRILLIONS of web pages _OR_ _MORE_, without breaking a sweat!

"Does SiteRiter sound TOO INCREDIBLE? Is there a CATCH?

"Wait a minute! How many files do _YOU_ have to supply to make all that magic happen??? To make YOUR GIANT web site?

"Listen to this my friends! Do you have to supply billions of files! No no no no! NOT billions! NOT millions! NOT even THOUSANDS OR HUNDREDS! YOU SUPPLY JUST \\\_ONE_/// COMPACT FILE! Aaaaah!

"**N*O*W** HOW MUCH WOULD YOU PAY?!"

((I.1.2)) "SiteRiter Builds Buzz Fast!"

((I.1.3)) "This ain't your grammar school grammar! (But can be!)"

((I.1.4)) "Keep calm and carry on"

-----------
((I.2)) Background

((I.2.1)) From http://en.wikipedia.org/wiki/Grammar_(computer_science)

In theoretical computer science, a formal grammar (sometimes simply called a grammar) is a set of formation rules that describe which strings formed from the alphabet of a formal language are syntactically valid within the language. ...

((I.2.2)) In that sense, a grammar _defines_ a language, using a set of grammar rules.
With a grammar in hand, it is possible to 'parse' an input 'sentence' to determine if that input is legal (which is to say, 'grammatical') according to the given grammar.

((I.2.3)) But, in addition to such 'input parsing', a grammar can also be used 'in reverse', to _generate_ sentences that are legal according to the grammar, and that is the key to SiteRiter.

((I.2.3.1)) With the easy-to-use 'SiteRiter Rule Format', you simply write a set of 'grammar rules' describing your web site -- but not the old pokey one-page-at-a-time way. The SiteRiter Rules File describes the WHOLE web site at once!

((I.2.3.2)) When you run SiteRiter, it quickly loads up your Rules File, and then it's ready to begin serving pages immediately, generating and linking pages on demand as requests come in.

((I.2.3.3)) With SiteRiter Page Generator in control, those pesky '404 errors' are a thing of the past! SiteRiter GUARANTEES to produce a valid webpage containing your content for every possible URL that arrives at your site!

((I.2.3.4)) And using SiteRiter's amazing pendant-patting "URLalyzer Technology", you are also guaranteed that even though myriads of different web pages are possible (depending on your Rules File), any particular URL will always reach exactly the same web page no matter how many times it is accessed. Rest assured that when your target web users bookmark their favorite pages, they will be able to return and find them there! (Given your Rules File).

((I.2.4)) SiteRiter is a web site generator based on the SiteRiter Site Definition Language (SDL). A full SiteRiter system includes a custom web server, the SiteRiter SDL parser (SDLP), and the SiteRiter Super Stochastic Page Information Generator (SSPIG).

((I.2.4.1)) Keep calm.

((I.2.4.2)) Carry on.

-----------
((I.3)) A few examples

((I.3.1)) Just to get concrete immediately, let's look at a few sample SiteRiter Rules Files and see what kind of output (e.g., web pages) they can generate.
We won't explain all the fine details at this point, so it'll be a bit confusing for a while, but that's good -- it should trigger questions for you to think about as you read on (and reread back).

((I.3.3)) Examples

((I.3.3.1)) Suppose a Rules File contained just one line:

----BEGIN-INPUT----
page = "Hello world!";
-----END-INPUT----

With this Rules File, the only possible resulting web page will be:

----BEGIN-OUTPUT----
Hello world!-----END-OUTPUT----

(note the lack of a newline after 'world!'), and that same output will be generated for all URLs.

((I.3.3.2)) This Rules File is quite similar:

----BEGIN-INPUT----
page = "Hi there!
";
-----END-INPUT----

and in this case the output generated for all URLs is:

----BEGIN-OUTPUT----
Hi there!
-----END-OUTPUT----

((I.3.3.3)) Now, here's a more interesting Rules File:

----BEGIN-INPUT----
show = greet " " user "!
";
greet = "Hello" | "Good" " " "day";
user = "world" | "to you";
-----END-INPUT----

and in this case, in a manner that is completely determined by what URL is requested, the user will receive one of the following four pages:

----BEGIN-OUTPUT1----
Hello world!
-----END-OUTPUT1----

----BEGIN-OUTPUT2----
Hello to you!
-----END-OUTPUT2----

----BEGIN-OUTPUT3----
Good day world!
-----END-OUTPUT3----

----BEGIN-OUTPUT4----
Good day to you!
-----END-OUTPUT4----

((I.3.3.4)) One last example, for now, starts to illustrate the awesome power of this fully-operational web site generation system:

----BEGIN-INPUT----
page = begin middle end;
begin = 'a';
middle = | "b" middle;
end = "c
";
-----END-INPUT----

which can produce -- in principle -- an _infinite_ number of different web pages, of which these are just samples (leaving out the BEGIN/END bracketing in this case):

ac
abc
abbc
abbbc
abbbbc
abbbbbc
abbbbbbc
abbbbbbbc
etc etc.

Again, what specific page is generated depends completely on what URL is requested, and any specific URL will always produce the same resulting page.
With this Rules File, the SSPIG will generate the "ac" page for about half of all URLs, the "abc" page for about one quarter of all possible URLs, the "abbc" page for about one eighth of URLs, and so on.

-----------
((I.4)) Terminology

((I.4.1)) These definitions don't have to make complete sense on first reading, so Keep Calm, but they should start to make sense once you've gotten all the way through the SiteRiter documents once or twice.

((I.4.1.1)) Note that some of these definitions are a bit non-standard, since they're somewhat spun towards SiteRiter's view of grammar rules and parsing rather than attempting to be completely general.

((I.4.1.2)) Note also that, as it's difficult to be precise before enough terminology is introduced, some statements here may be slightly off when taken as descriptions of SiteRiter's approach and behavior. If you find discrepancies between the descriptions in this section and stuff later (particularly in section (C)), that later stuff should probably be believed over this. (And of course if anything seems hopelessly busted or even just fishy, ask a question!)

((I.4.2)) Terms

((I.4.2.1)) TOKEN: A sequence of one or more characters (depending on specifics) that appears in a SiteRiter Rules File and is treated as a single unit for parsing purposes. Some types of tokens are called: NAME, LITERAL, and OPERATOR.

((I.4.2.2)) NAME: A token used to identify a SiteRiter RULE or SELECTOR. A name consists of one or more characters that obey the rules for Java identifiers. When EXPANDed, a name token is interpreted as the name of a RULE, and the corresponding rule is expanded, if it exists. See details in (C.3.4).

((I.4.2.3)) LITERAL: A token that provides zero or more specific characters to be incorporated into the output if the LITERAL is EXPANDed. There are two subvarieties of LITERAL token: DLITERAL and SLITERAL.

((I.4.2.3.1)) A DLITERAL token consists of an initial double-quote char (in Java, represented by '"'), followed by a 'body' of zero or more chars that are _not_ the double-quote char, followed by a final double-quote char. When EXPANDed, a DLITERAL produces its body -- the zero or more chars excluding the initial and final double-quote chars.

((I.4.2.3.2)) A SLITERAL token consists of an initial single-quote char (in Java, represented by '\''), followed by a 'body' of zero or more chars that are _not_ the single-quote char, followed by a final single-quote char. When EXPANDed, a SLITERAL produces its body -- the zero or more chars excluding the initial and final single-quote chars.

((I.4.2.4)) OPERATOR: A token consisting of a single char. OPERATOR tokens are used to describe the syntactic structures of Rules Files; operator tokens are never directly EXPANDed. There are four subvarieties of OPERATOR token: EQUAL, BAR, SEMICOLON, and COLON.

((I.4.2.4.1)) An EQUAL token is produced when the parser reads a '=' char outside of a LITERAL. The EQUAL token separates a RULE name from its associated RULE body.

((I.4.2.4.2)) A BAR token is produced when the parser reads a '|' char outside of a LITERAL. The BAR token separates one SEQUENCE from another inside a RULE body.

((I.4.2.4.3)) A SEMICOLON token is produced when the parser reads a ';' char outside of a LITERAL. The SEMICOLON token terminates a RULE body.

((I.4.2.4.4)) A COLON token is produced when the parser reads a ':' char outside of a LITERAL. The COLON token is used to add a SELECTOR to a RULE name.

((I.4.2.5)) A SEQUENCE consists of zero or more 'sequence tokens', each of which must be either a NAME or a LITERAL token. (Note that a SEQUENCE is not itself a token.) When EXPANDed, a sequence outputs the result of expanding each of its sequence tokens, in order.
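
To make the token definitions in (I.4.2.1) through (I.4.2.4) a bit more concrete, here is a minimal Java sketch of how Rules File text might be split into tokens. It is an illustration only, not SiteRiter's actual lexer: the names RuleLexerSketch, Kind, Token, and lex are hypothetical, and skipping whitespace between tokens and throwing on unexpected chars are assumptions that this section does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a Rules File lexer; not SiteRiter's actual implementation.
final class RuleLexerSketch {
    // One kind per token variety named in (I.4.2.2) through (I.4.2.4).
    enum Kind { NAME, DLITERAL, SLITERAL, EQUAL, BAR, SEMICOLON, COLON }

    static final class Token {
        final Kind kind;
        final String text; // the name, the literal body (quotes removed), or the operator char
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + "(" + text + ")"; }
    }

    static List<Token> lex(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // assumption: whitespace outside literals only separates tokens
            } else if (c == '"' || c == '\'') {
                // A literal runs to the next occurrence of the same quote char.
                int close = src.indexOf(c, i + 1);
                if (close < 0) {
                    throw new IllegalArgumentException("unterminated literal at index " + i);
                }
                Kind kind = (c == '"') ? Kind.DLITERAL : Kind.SLITERAL;
                tokens.add(new Token(kind, src.substring(i + 1, close)));
                i = close + 1;
            } else if (c == '=' || c == '|' || c == ';' || c == ':') {
                // Operators are single chars read outside of a literal.
                Kind kind = c == '=' ? Kind.EQUAL
                          : c == '|' ? Kind.BAR
                          : c == ';' ? Kind.SEMICOLON
                          : Kind.COLON;
                tokens.add(new Token(kind, String.valueOf(c)));
                i++;
            } else if (Character.isJavaIdentifierStart(c)) {
                // Names obey the rules for Java identifiers.
                int end = i + 1;
                while (end < src.length() && Character.isJavaIdentifierPart(src.charAt(end))) {
                    end++;
                }
                tokens.add(new Token(Kind.NAME, src.substring(i, end)));
                i = end;
            } else {
                // Assumption: anything else is an error.
                throw new IllegalArgumentException("unexpected char '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Lex one rule from the example in (I.3.3.3), with one literal switched to single quotes.
        System.out.println(lex("greet = \"Hello\" | 'Good' \" \" \"day\";"));
    }
}
```

Because a literal simply runs to the next matching quote char, operator chars such as '=' or '|' inside a literal body never become OPERATOR tokens, which matches the 'outside of a LITERAL' wording above.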

((I.4.2.6)) A CHOICE consists of zero or more SEQUENCEs, with a BAR token separating the sequences from each other, if there is more than one sequence in the choice.

((I.4.2.7)) A RULE consists of a HEAD, followed by an EQUAL token, followed by a CHOICE, followed by a SEMICOLON which terminates the rule. A rule associates a choice with a name. When EXPANDed, a rule outputs the result of expanding one of the SEQUENCEs of its CHOICE. The mechanism determining which sequence is expanded is described in (C.3.4.2).

((I.4.2.8)) A rule HEAD consists of a NAME token, whose content is called the 'name of the rule', optionally followed by a SELECTOR.

((I.4.2.9)) A SELECTOR consists of a COLON token followed by a NAME token. The appearance of a SELECTOR in a RULE HEAD modifies the mechanism by which the rule is EXPANDed.

((I.4.2.10)) EXPAND is the process of performing an expansion.

((I.4.2.10.1)) EXPANSION is the recursive process performed by the SSPIG. Overall, expansion produces the output from a SiteRiter Rules File; deeper in the recursion, expansion can be applied to RULEs, SEQUENCEs, and so on. The specifics of expansion depend on what is expanded, as sketched above and described in (C.3).

((I.4.2.11)) The START SYMBOL is the NAME of the first RULE in a SiteRiter Rules File.

((I.4.2.12)) The LEXER is the part of a program that converts input chars into tokens, to make parsing easier.

((I.4.2.13)) LEXING is the job performed by the lexer. Also called 'tokenizing'.

((I.4.2.14)) The PARSER is the part of a program that converts a stream of tokens into various internal data structures appropriate to the language being parsed, using the syntax of the language being processed to determine what actions to perform.

((I.4.2.15)) PARSING is the job performed by the parser.

((I.4.2.16)) TOKENIZING is the process of converting a sequence of input characters into a sequence of logical tokens. The job performed by a lexical analyzer. Also called lexing.
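
The definitions of RULE, CHOICE, SEQUENCE, EXPANSION, and START SYMBOL above can be tied together with a small Java sketch of URL-determined expansion, using the rules from example (I.3.3.4). This is a sketch under stated assumptions, not the SSPIG itself: the real mechanism for choosing a sequence is described in (C.3.4.2) and is not reproduced here. Seeding java.util.Random from the URL's hash code is an assumption made only to show how 'same URL, same page' can work, and the names ExpandSketch, RULES, pageFor, and expandRule are hypothetical. SELECTORs are ignored, and the 'end' literal is taken to be 'c' followed by a line break.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of URL-determined expansion; not the actual SSPIG.
final class ExpandSketch {
    // Each RULE name maps to its CHOICE: a list of SEQUENCEs, each a list of
    // sequence tokens. Encoding assumed for this sketch: literal tokens keep
    // their quote chars, name tokens are bare. Insertion order is preserved
    // so that the first rule can serve as the START SYMBOL.
    static final Map<String, List<List<String>>> RULES = new LinkedHashMap<>();
    static {
        RULES.put("page", List.of(List.of("begin", "middle", "end")));
        RULES.put("begin", List.of(List.of("'a'")));
        RULES.put("middle", List.of(List.<String>of(), List.of("\"b\"", "middle")));
        RULES.put("end", List.of(List.of("\"c\n\""))); // assumption: 'c' plus a line break
    }

    // Produce the page for a URL. The same URL always gives the same seed,
    // hence the same sequence of choices and the same page.
    static String pageFor(String url) {
        Random rng = new Random(url.hashCode()); // assumption: not the real (C.3.4.2) mechanism
        String start = RULES.keySet().iterator().next(); // START SYMBOL: name of the first rule
        StringBuilder out = new StringBuilder();
        expandRule(start, rng, out);
        return out.toString();
    }

    // Expand a rule: pick one SEQUENCE of its CHOICE, then expand each token in order.
    static void expandRule(String name, Random rng, StringBuilder out) {
        List<List<String>> choice = RULES.get(name);
        if (choice == null) {
            return; // assumption: a name with no matching rule contributes nothing
        }
        List<String> sequence = choice.get(rng.nextInt(choice.size()));
        for (String token : sequence) {
            char first = token.charAt(0);
            if (first == '"' || first == '\'') {
                out.append(token, 1, token.length() - 1); // a literal produces its body
            } else {
                expandRule(token, rng, out); // a name expands the rule it identifies
            }
        }
    }

    public static void main(String[] args) {
        for (String url : new String[] {"/", "/index.html", "/about", "/index.html"}) {
            System.out.print(url + " -> " + pageFor(url));
        }
    }
}
```

Which page a given URL maps to depends entirely on the seeding assumption, so no particular output is claimed here; the point is only that repeated requests for one URL print the same page, and that every page has the form 'a', zero or more 'b's, then 'c'.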