Command line help for SBNet

  -flag : sets up a flag's value from the command line
  -var : sets up a map between varName and varValue so that whenever the parser sees varName it automatically replaces it with varValue

Language Semantics:

calculate
  CPTs - calculates the CPTs of the current Bayes network
  score - calculates the score of the current Bayes network (given training data?)
  likelihood - calculates the likelihood of the on the
  error
    training - Calculates the training error of all datasets used to train the net
    general - Calculates the error of the testing datasets on the model trained with the training datasets
    leaveOneOut - Calculates the leave-one-out accuracies of all the training data.
      Controlled by the LOOCVRequantize and leaveXOut flags, by the other LOOCV flags, and by the findBestNBTK and findBestN flags. If findBestNBTK or findBestN is true, this will find the best setting for those values based on all the training data.
      Iteratively leaves out all of the data in the workspace, so make sure all data is linked to at least one BN, or some folds won't have removed data from the BNs.
    2sLeaveOneOut - Calculates the 2-state leave-one-out accuracies of all the training data.
      This function calls the leaveOneOut function to train the bestnbtk and n hyperparameters, searches the structure based on those values, and then classifies some left-out data. Thus, the classification of the left-out data does not use that data at all in learning (unlike leaveOneOut, which uses left-out data to train hyperparameters).
compare nets
  Compares the structural difference between two networks. Assumes both nets have the same set of nodes. Outputs the number of links in each network, the number of shared links, and the number of unique links each of the nets has.
copy
  net
    Creates a new Bayes net called . It then duplicates the set of nodes in the source, the set of links in the source and the CPT tables in the source. That is all it copies.
  CPT .
    Copies the source's CPTs into the target's CPT. Source and target must have the same arity and the same number of parents, and the parents must all have the same arity.
    ***WARNING*** This function also copies over the intermediate JPD that was created during the "set cpt distribution" functions. This intermediate JPD describes the actual behavior of the node (as opposed to the conditional behavior of the node given by the CPT), but that behavior is based on the parent structure of the source node that this CPT is being copied from. The copied intermediate JPD therefore will not reflect the actual behavior of this node (since the actual behavior of the parents may not be accurately described in the JPD), and future calls to set this CPT's distribution may not work as intended. ***WARNING***
create
  link - creates a link in the currently selected Bayesian network
  net - creates a new Bayesian network and makes it the currently selected one
  node - creates a node in the currently selected Bayesian network
    ***Any node with a name with extension '_t' or '_tp1' is assumed to be a node in a DBN, and thus data for this node will be treated differently than data for a normal BN node***
    ***Any node with a name starting with 'psxxxx_' where x is a number [0-9] will be assumed to be a pseudonode that refers to the same data as the node name without the 'psxx_' prefix. Other than that it behaves exactly like any other node***
  random - creates random BN structure elements
    links
      For each node, creates links with randomly chosen parents. It will not add links with _t nodes as a child or _tp1 nodes as a parent.
      The value is the probability, a real number between 0 and 1, that a randomly assigned parent will be accepted. If this value is 1, then each node will have parents. If it is 0, no node will have any parents. The expected number of parents per node is *.
      **This command uses the flag ForceCascadingLinks.
      If it is true, each node can only have a parent that occurs before it in the BN's list of nodes (this ensures that the CPT generators can deal with it)
      SPECIAL CASE - If = 0 and = 1, exactly one random link will be created with a random parent and a random child. The variables "__RANDOM_LINK_PARENT_0__" and "__RANDOM_LINK_CHILD_0__", where '0' is an auto-incrementing number, will be added to the variable list. This allows subsequent commands to reference the randomly generated parent and child. E.g., set cpt distribution uniform __RANDOM_LINK_CHILD_000__.
      **IMPORTANT** This function fails to make a link when it randomly picks a link that already exists, or picks a link that would violate the forcerandomlinksmaxparents flag setting. (I should fix it so it retries till it succeeds.) So for now, repeating this command 50 times, e.g., is not guaranteed to add 50 links, even if 50 links are possible.
    nodes <# nodes>
      Creates <# nodes> nodes with the name _x where x is a unique number starting at 0. Each node is created with arity . can either be BN or DBN. If it is DBN, two nodes are actually made, one with the _tp1 extension and one with the _t extension.
  search_omission - indicates that the node should not be searched in the search
  simulated
    data
      Creates a file of simulated data for the Bayesian network using the simulated data flags (all flags start with the string "simData").
    datas
      Creates multiple simulated data files based on a Bayes net. Each file will be named _x where x is an integer starting at 0. After creating the filename, this function simply calls the create simulated data function.
  variable
    Adds a variable to the RV list. Whenever the text of the variable is seen in the future, the parser will automatically replace it with the value of the variable. Good for defining variables for things like long file names to make an SBNet script file more readable. The name is alphanumeric and the value is of type filename (alphanumeric + things like '_' ':' '\' etc).
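
The variable replacement described above can be sketched in Python as follows. This is only an illustration, not SBNet's actual parser: it assumes a variable is replaced only when it matches a whole whitespace-separated token (the real parser may match text differently), and the function name, variable name and file name are all hypothetical.

```python
def substitute_variables(line, variables):
    """Replace each whitespace-separated token that exactly matches a
    variable name with that variable's value.

    Assumption: whole-token matching. SBNet's real matching rule is not
    documented here and may differ.
    """
    return " ".join(variables.get(token, token) for token in line.split())


# Hypothetical variable name and file name, for illustration only.
variables = {"SUBJECT_FILE": "c:\\experiments\\john_1.txt.sbn"}

print(substitute_variables("load data SUBJECT_FILE", variables))
```

Tokens that are not in the variable list pass through unchanged, so a script line that uses no variables is left as written.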
destroy
  dataset - removes a dataset from the workspace, and unlinks it from all Bayesian networks it was linked to.
  link - removes a link from the Bayes net
  node - removes a node from the Bayes net (and any links connecting to it)
exit - Closes SBNet.
help - Displays a help menu
infer
  DBN nodes - infers the nodes in a Bayes net based on a data file. It creates a new Bayes net with two nodes for each column in the data file, a _t node and a _tp1 node.
list
  nodes - Outputs a table with all the nodes in the net
  links - Outputs a table with all the nodes and the links between them in the net
  nets - Outputs a table with all the nets in the workspace
  dataset - Outputs a table listing the quantized values in a dataset
  datasets - Outputs a table listing all the datasets loaded into the net
  DOTGraph - Outputs a file with the DOT graph structure
  load flags - lists the settings of the flags (frequently this is not up to date)
  CPT * - lists the CPT(s) of the node(s)
  hierarchy * - lists the structure of hierarchy
link
  data - Links the dataset to the Bayes net . Bayes nets can only be trained on datasets that are linked to them.
  hierarchy - Links the hierarchy to the Bayes net. Only one hierarchy can be linked to a Bayes net at a time. **Must be done after the nodes in a hierarchy are created**
load
  classifications - Loads all the classifications for the datasets from a file in the format specified in the fileFormats.txt file
  cdata - Loads *continuous* data. The input file is text. See fileFormats.txt. The behavior of loading cdata is controlled by flags that start with "continuous". See "continuousDataZeroMean".
  data - Loads the discrete data in file fname into the Bayes net. The input file is binary (output of the prepdata function). See fileFormats.txt.
  hierarchy - loads a hierarchy into the workspace
  net - Loads a Bayesian network called from a file called of the format listed in the fileFormats.txt file.
  NodeSizeInfo - Loads a file that indicates how "big" each of the nodes in the Bayesian network is. E.g., for fMRI data, the nodes are regions of the brain that are composed of voxels. Some regions have many voxels (Left Hemisphere), some have few (nodule). Currently the only thing that uses this info is the shrinkage experiments. For the format of the input file, see fileFormats.txt.
prepdata - Given a directory name and a pattern, the pattern is matched against all files in that directory. Any file matching the pattern will be prepped, and an output binary file will be created that can be read in by the DBN learner.
  A file matches the pattern if the filename contains the string (nothing as complicated as wild cards). The output file name will be the same as the input file name except it will have an additional .sbn extension (any file with a .sbn extension will be ignored).
  The hash tables used in prepdata are consistent across an entire sbnet session. Subsequent calls of prepdata will use the same mappings. The flags forcePrepDataKeepNumericalTagsIncrement and forcePrepDataKeepNumericalTags are used to control prepdata's behavior.
  *** IMPORTANT *** All of the files for an experiment (across all classes) must be prepped in the same sbnet session, since prepdata maps the discrete RV values to the discrete values 0 -> arity in the order they were seen in the files. If separate sessions are used across classes, the discrete values that should be the same can be (and probably will be) mapped to different discrete values in the prepped data. This is not necessary if the forcePrepData flags are used. *** IMPORTANT ***
quantize *
quantize node *
  Quantizes a dataset with a particular quantization method. This process of quantization is controlled by flags that start with "quantize".
  ***IF __all__ TAG IS USED, ONLY QUANTIZES TRAINING DATASET***
  If "node" is indicated, only the named node in the datasets will be quantized
  q_method:
    "CIND" - [C]lass [IND]ependent quantization. All nodes are quantized in a class-independent way. This means that the same quantization setting applies to the same node in each of the datasets.
    "slidingWindowMean" - Calculates boundaries at zero, 1/2 max and 1/2 min values in each RV series independently. Thus, every RV in every dataset has different quantization boundaries.
      *** IMPORTANT: The data should have zero mean ***
      Currently, the quantization goes down to 4 states: very low, low, high, very high. To match old HBM experiments, load difftrend continuous data and turn off m_continuousDataZeroMean.
      *** NOTE - THE NAME 'slidingwindowmean' IS A HORRIBLE NAME FOR THIS--IT SHOULD BE 'highlow2' or something like that***
    "matchQuantizedPS" - Quantizes the continuous data in the dataset with the boundaries that result in the closest distribution of discrete values to those present in the dataset's current set of discrete data. (Originally, this attempted to match power spectrums, but that actually causes a significant difference in the two resulting distributions.) Uses zero as the middle quantization point. ***For 2-ary data, just uses zero as the middle quantization point.***
    "manual" - Quantizes the dataset manually based on a list of floats as boundaries.
quit
  Closes SBNet
randomize
  dataset - randomly permutes the rows in a dataset so that confidence measures can be taken. Columns in the dataset remain unchanged. Actually, uses the flag dataRandomizationMethod to determine how data is randomized. Also uses the flags preserveCrossCorrelations and forceCrossCorrelationsToChangeAcrossDatasets.
  hierarchy - Randomly modifies the loaded hierarchy. Does so in a way that leaves no isolated h-children (provided that there are no isolated h-children to begin with).
    **** CURRENTLY, THE RANDOMIZATION MUST BE DONE BEFORE LINKING HIERARCHY TO GRAPH. OTHERWISE, EACH NET WILL NOT HAVE CURRENT VIEW OF HIERARCHY ****
repeat [additional params]
  Repeats the last command number of times.
  type:
    "exactly" - Repeats the last command exactly as it was done before. No additional parameters.
    "incNumber" - Repeats the last command, but increments numbers parsed in the command. The additional parameters are a list of doubles of the form ... "endList"
      : 1 = integer, 2 = double, 3 = string, 4 = intList, 5 = doubleList
      : Which of the whatToInc's to increment. This requires knowledge of how commands are parsed. For instance, if a command requires two strings, the first string parsed will typically be 1, the second will be 2.
      : Strings and lists have multiple numbers that could be incremented. This value tells which one to increment.
      : How much the last value should be incremented.
      ***ALL VALUES ARE 1-INDEXED***
      E.g., to repeat "load data john_1.txt" while incrementing the number in the filename, use "repeat 10 incNumber 3 1 1 1 endList". The 3 says to increment a string, the first 1 says to increment string 1, the second 1 says to increment the first number found in the string, and the third 1 says to increment that number by 1.
      This repeat command does not modify random variables defined in the parser--those are replaced before the repeat command ever knew they existed. Use "literal" repeats when using parser-defined random variables.
    "literal_incNumber" - Repeats the last command, but does so by repeating the exact string of literals given in the last command. This repeat command will modify parts of parser-defined variables. The additional parameters are a double list of the form: "endList"
      - Which token in the command should be incremented.
      - Which number of the chosen token should be incremented.
      - How much the number in the token should be incremented.
    *** Do not place this after a line with a comment.
    Repeat history is cleared after comments *** (should fix that)
    *** Known bug - do not place a repeat command after a line that had a blank line before it. I.e.:
      1)
      2) create link a1 b1
      3) repeat 5 literal_incNumber 3 1 1 endList
    This will cause an error because line 1) was blank. ***
save
  net - saves the Bayesian network named to the file
select
  net - Selects the currently activated Bayesian network
set
  classification
    Sets the classification of a dataset to that of a Bayesian network. This is used when computing accuracy of classification on structure searches. An alternative to specifying each file independently is to use the "load classifications" command.
  CPT
    value
      Sets an entry in a CPT to a certain value. After the value is set, the other entries in the CPT multinomial are normalized so that their sum = 1 - value and the CPT is still a valid conditional probability distribution. The index list is 0-indexed. For example: "set CPT value bnet node_a 0 2 3 endList .2" would set the 0x2x3 entry of node node_a in the Bayes net bnet to the value .2, and the other values at _x2x3 in the CPT would be renormalized.
      If the index list contains any negative values, a random value for that index will be chosen.
      If value is negative, then the absolute value of the value will be used as a range to choose a value from around the current value of the CPT element. E.g., if "-0.1" is read and the current value in the CPT is "0.4", a random value between "0.3" and "0.5" will be chosen.
    distribution * [optional parameters]
      Sets the CPT for a node to be a certain type of distribution.
      If setting a single node's CPT dist, the tp1 nodes paired with the t nodes of the parents of this node must have already computed their distributions. If they have not, this function will ***SILENTLY FAIL*** (need to fix at some point). E.g., if RV_0_t -> RV_1_tp1, in order to fill in RV_1_tp1's CPT, the CPT for RV_0_tp1 must be filled in.
      One way to avoid this confusion is to simply fill in all CPTs with uniform distributions before trying to fill in individual CPTs. This now requires the CPTGenAssumeUnknownParentJPDsUniform flag to be set to true, otherwise even the uniform value will fail.
      The following values are valid for
        "mimicParents" - Sets the CPT so the child node duplicates the value of the parent nodes. If there is only a single parent, the child will always have the same value as the parent. If there are multiple parents, the child will randomly take a value, giving equal weight to the values of its parents. E.g., assume three parents, two have value '0' and one has value '1'. The child will have a 66% chance of being '0' and a 33% chance of being '1'. No optional parameters.
          **FUTURE** add an optional parameter that gives stochastic behavior, so the child is not always the parent.
          **NOTE** Right now, the CPT is smoothed so the child doesn't perfectly mimic the parents. Each element has .1 (or something) added to it, which is actually a pretty big smoothing factor. This needs to be a parameter**
        "uniform" - Sets every entry in the CPT to be 1/arity
        "infoScalable" - Sets each multinomial in the CPT to be a randomly generated distribution with the normalized information score (normalized Shannon information) specified by flag simDataInfoDistScore. Flag simDataInfoDistScorePrec determines how close the iteratively calculated score must be to simDataInfoDistScore before accepting the distribution.
        "relativeMutInfo" - Creates a CPT in which each of the parents of the child node has a specific normalized mutual information score with the child. This value requires the optional parameters to be a list of doubles. The doubles give the desired normalized mutual info score of each parent (listed in the order the parents are listed in the child's parent list).
          Each number gives the mutual information between the dep var and one of the indep vars:
            0 = indep var *IS NOT* predictive of dep var
            max = indep var *IS* predictive of dep var (where max = value of base information level)
        "changeMutInfo" - Changes the current CPT of the Bayesian network such that the relative mutual info between the child and the specified parent goes to the new specified value. The [optional parameters] are [x:int y:double] where x is which parent to change, and y is the new mutual information. x must range from 1 to the number of parents (notice that 0 is not the first parent; that is actually the index of the child).
        "randomizeCPT" - Changes the current CPT so that it is a randomized version of the previous one. The optional parameters are [#changes amount]. #changes tells how many locations in the CPT to randomly pick, and amount tells how much mass to move either to or from that location. This procedure starts from the intermediate JPD made by the other CPT creators, but does not save its changes back to that intermediate JPD. So this function can be called time and time again, and each time it is called it modifies the same intermediate JPD (so successive calls are not cumulative in their effect).
  data
    training - indicates the dataset is a training dataset
    testing - indicates the dataset is a testing dataset
  flag - Sets the value of flag to value .
    **WARNING** - Setting a flag does not invalidate the CPTs in the BNs, even though changing a flag after some CPTs have been calculated would result in different learned CPTs had the CPTs been relearned. **WARNING**
    allowIsochronalLinks - Allows isochronal links. **NOTE** This only applies to links added during explicit BN construction. It does not cause isochronal links to be searched for.
    BDeEquivSampSize - The equivalent sample size.
    BDEPrior - determines which prior the BDE algorithm will use.
      ILL_FORMED - SHOULD NOT USE. Sets Nijk = 1 and Nij = 1. Who knows what this does.
        Only here to maintain the ability to reproduce early tests.
      K2 - Sets Nijk = 1. Nij = sum of Nijk's. ***THIS IS NOT A BDE PRIOR, IT IS A BD PRIOR***
      SMALL_UNINF - Sets Nijk to 1/(#configs of parent&child), Nij = sum of Nijk's. Both are multiplied by BDeEquivSampSize.
    classificationType - determines what algorithm to use to classify the testing data
      likelihood - Use standard likelihood to classify the testing data
      maxLikeMargin - Find the classification boundary in likelihood space that minimizes the RSS of the datapoints (required for comparative scoring and used with scoreType = compLikelihood, CPTParamType = compMLE)
    continuousDataQuantization - Determines which type of quantization to use during the requantization step in the calculate loocv functions. ***When calling quantize dataset directly this flag is ignored***
    continuousDataTrendSize - Determines the size of the sliding window trend to use when normalizing the continuous data. Default value is 8.
    continuousDataZeroMean - If true, all continuous data is normalized about its mean via a sliding window trend. Default is true.
    CPTParamType - determines how the parameters for the CPT are learned
      "MLE" - the maximum likelihood parameters
      "compMLE" - the ACL parameters for the comparative search. This should be set together with the comparativeSearch flag set to true and scoreType = compLikelihood. Affected by the CPTParamCompMaxOrMid flag.
    CPTParamCompMaxOrMid <"max"|"mid"> - Determines which comp likelihood CPT parameter to use
      "max" - Use the parameters that actually maximize the ACL score
      "mid" - Use the parameters that increase the ACL score, but that don't zero out all CPT elements that correspond to less frequently seen events for the BN
    CPTParamShrinkage - Determines if CPT parameters should be shrunk to the CPT values of hierarchically related families. Defaults to false. Uses CPTParamShrinkageDataFolds.
      ***ASSUMES ALL HIERARCHICALLY RELATED RVs HAVE SAME ARITY***
    CPTParamShrinkageDataFolds - The number of subsets the data should be divided into during training of the shrinkage coefficients. Default is 5.
    CPTGenAssumeUnknownParentJPDsUniform - default false
      When generating CPTs, the corresponding JPDs for the node and its parents are also created. In order to create these JPDs, the JPDs of the node's parents must be obtained. If there is a temporally cyclic structure in a DBN, the necessary JPDs may not be computable. Set this flag to true to force any node with uncomputed parents to be considered uniform.
      ***warning: don't do this if you need the JPD for a BN to have relationships with a JPD for a different BN (like in generating the data for two classes that fools BDE but not ACL)***
    dataRandomizationMethod - Indicates the type of method used to create surrogate data
      'permuteRows' - Randomly permutes the rows of each dataset independently of each other. This method destroys all temporal correlations (including the colored noise caused by fMRI scanners) but leaves all isochronal correlations intact.
      'preserveAutocorrelation' - Randomly permutes the Fourier phases of the Fourier-transformed data. The permutation is the same for every ROI. Since the power spectrum of a time series is the Fourier transform of the autocorrelation function, and permuting the phases of the data does not alter the power spectrum, taking the inverse transform of the permuted coefficients creates a sample of the time series with the same autocorrelations. Since the permutation is the same for every RV, the cross correlations are also preserved (but I think only linear correlations). This randomization method is controlled by the "preserveCrossCorrelations" flag--if the flag is false, then cross correlations *are not* preserved.
      'pointWise' - Randomly changes the data in a dataset by simply selecting a row and column and swapping that point's value to the value seen in some other randomly selected row.
        This verifies that the new value is not the same as the previous value. Uses the flag dataRandomizationPointWiseCount to tell how many changes to make. Uses the flag dataRandomizationPointWiseMinimumRealNode to tell the randomizer which is the minimum node index that points to real data (often datasets will have several columns at the start which contain metadata, and randomizing them may be pointless or even harmful. As long as they are all at the front, this flag deals with it). This flag is zero-indexed. This randomization method keeps the multinomial distribution of the time series the same.
      'reverse' -- Not actually a randomization method. This takes a data series and reverses it.
    dataRandomizationPointWiseMinimumRealNode - Default 0. See the pointWise dataRandomization method.
    dataRandomizationPointWiseCount - See the pointWise dataRandomization method.
    debugVerifyFlags - If true, verifies that all CPTs are valid during every LOOCV fold. Throws a critical error if any are invalid.
    DBN_DATA_CYCLE - How frequently the data cycles. Has the effect of not computing CPT transition probabilities between the n and n+1 data elements. Can be thought of as breaking up the data into TOTAL_DATA_SIZE/DBN_DATA_CYCLE blocks.
    failureSuccessRatio - The ratio of failures to successes allowed before the structure search gives up
    findBestNBTK - During the calculate leave one out accuracies, test every possible number-best-to-use value. This will iteratively remove parent-sets from nodes and re-calculate LOOCV accuracies to find the best number-best-to-use value. This will increase the running time, as for every leaveXOut iteration, this must calculate LOOCV accuracies m times, where m is the number of nodes with parents. However, this does not increase the number of structure searches required, so it is better than simply trying multiple tests with numBestToKeep varying (which would duplicate many structure searches). If this is set to true, then the value in numBestToKeep is ignored.
      The flag highestNBTK is used. After finding the best NBTK value, the flag numBestToKeep is set to this value. However, unless the findBestNBTK flag is set to false, numBestToKeep will be ignored in subsequent searches. So if you want to search for the numBestToKeep parameter with leave-one-out cross validation and then use the best value in a full-data structure search, be sure to turn off findBestNBTK after calculating the leave-one-out cross validation statistics.
      ***findBestN and findBestNBTK must both be true or false. If one is true, the other is considered true also***
    findBestN - During calculate leave one out accuracies, figure out what the best value for the flag n (number of parents per node) is. When true, the n flag will be treated as the largest valid n value, and all values of n less than n will be searched (not by repeating the search of course, but by removing nodes).
      ***findBestN and findBestNBTK must both be true or false. If one is true, the other is considered true also***
    forceCascadingLinks - See "create random links" for a description of this flag.
    forceLeaveXOutSameClass - When calculating the error, should the leaveXOut datasets be forced to all have the same class? This should be set to true whenever a single series is broken up into individual datasets, and they are all left out together with leaveXOut.
    forcePrepDataKeepNumericalTags - If true, then the string values that are seen in the data being prepped are assumed to be integers in the range 0-x, where x is the highest number. Having this be true causes the binary data for each value to be set to the string value seen in the data. This will have bad results if the data is not like that. See the forcePrepDataKeepNumericalTagsIncrement and forcePrepDataKeepNumericalTagsIgnoreZero flags.
    forcePrepDataKeepNumericalTagsIncrement - Default 0. See forcePrepDataKeepNumericalTags.
      This is the amount added to the values seen in the data before they are converted to their binary value in the hash table. This is primarily used to deal with negative numbers so that they map to a positive value. Set this to the greatest negative value the variable takes on.
    forcePrepDataKeepNumericalTagsIgnoreZero - Default true. Should zero be considered a valid value for the variables to take on? If this is true, zero is not valid. All positive values will be mapped to (value + forcePrepDataKeepNumericalTagsIncrement - 1). WARNING: if there are zeroes in the file, they will be treated exactly the same as ones.
    hierarchicalConstraint - (true/false) Indicates whether the parents for nodes are limited by the results of previous hierarchical structure searches (true), or can be any parents in the same h-level.
    hierSearchAllowHAncestorParents - (true/false) Indicates whether hierarchical structure search should allow a node to have parents that are the h-ancestors of the node. Normally in hier search, the only parents a node is allowed to have are nodes contained within the same h-level as the child. This relaxes that so the node can have those nodes, as well as its h-ancestors, as parents.
    highestNBTK - Used in the search for the best NBTK; this is the highest value of numBestToKeep that is tried.
    laplaceSmoothingConst - The amount to add to each CPT element to smooth out the multinomials (and get rid of zero values). The value is added and then the multinomials are renormalized. Setting this to 1 is a common setting. This should be set to zero when shrinkage is used (as it has a uniform CPT in the hierarchy that does this constant type of smoothing).
    leaveXOut - The number of datasets to leave out when attempting to do leave-x-out cross validation
    LOOCVListLinksEveryFold - True if the LOOCV should print out all the links found in the datasets for every LOOCV fold.
      (Usually to test structural results for robustness/sensitivity to noise)
    LOOCVRequantize - True if data should be requantized for every LOOCV fold. Should be done if quantization uses more than just a single data file to calculate quantization boundaries. ***ONLY DO THIS if the quantization scheme is class independent (i.e., the boundaries for the left out data come from training data). If each dataset is quantized independently regardless of its training/testing status, simply quantize all the datasets once at the beginning of the experiments.***
    likelihoodUsesNBTK - Indicates that calculations of likelihood are to use the value of the numBestToKeep flag and not all the nodes.
    memoizeCPTSize - The number of CPTs to memoize during calculation of CPTs. Zero indicates that no CPTs will be memoized and that memoization overhead will not be incurred.
    minimumAcceptableImprovement - (aka minAccImp) This flag name is a misnomer. This value is treated as a constant structural complexity term that is subtracted from the score for every parent of a node. Thus, the more parents, the more times this value is subtracted from the score. For a penalty that is based on the number of elements in the CPT, use the flag structComplexityPenalty (for MDL-like penalties).
    n - Variable used as a parameter in different searches (see the search types for a description of its use)
    numBestToKeep - The number of the best nodes to keep in a search (see the search types for a description). The value of this flag is ignored if findBestNBTK is true.
    output1BasedIndex - If true, outputs the indices to CPTs as 1-based arrays as opposed to 0-based arrays. This is so that the output can be cut and pasted into Matlab.
    preserveCrossCorrelations - If true, the preserveAutocorrelation data randomization method will also preserve cross correlations between the RVs.
    randomSeed - The number to seed the random generator with. Setting this flag immediately seeds the random number generator.
      Setting this value to 0 will result in the generator being seeded by the current time.
    searchType - The type of topological search to use
      nGreedyBest - Search using the nGreedyBest algorithm. n indicates the number of parents each node is allowed to have. Supports numBestToKeep. Supports search_omission. This is obfuscated to be flat search with subSearch greedy.
      hierarchical - Search using the hierarchical algorithm. n indicates the number of parents each node is allowed to have. Supports search_omission. Requires the subSearch value to be set. Flag hierSearchAllowContainerParents modifies this structure search behavior. Flag hierarchicalConstraint modifies this structure search behavior.
      hierarchicalIndividual - Search using the individual hierarchical algorithm. This method does not constrain a node to have a possible parent set based on the results of its h-parent's structure. Instead, the search for each node starts at the top level of the hierarchy, where it is allowed to find a structure. Then, the results of that structure search constrain the parent set for a structure search *for the same child node* at a subsequent level in the hierarchy. After all hierarchical levels are searched (each constrained by the results at a previous h-level of the same child), a final structure search involving only the parents that were found throughout the different levels' searches is performed to determine the final parent set.
        ***NOTE*** Does not allow for the searched node to already have links--it will delete them during the search **NOTE**
      flat - Search using all nodes as possible parents of all other nodes (within DBN-valid structures of course). Supports numBestToKeep. Supports search_omission.
    scoreMinAcceptable - Determines if the structure search will clear all families with scores less than an acceptable amount determined by the scoreMinValue flag.
    The family will also be marked as not being one of the best, so if the
    flag likelihoodUsesNBTK is turned on, the family won't be used in
    likelihood calculations.
scoreMinValue - The cutoff point for when to reject a family based on its
    score. This feature is turned on by the scoreMinAcceptable flag.
scoreType - The scoring type to use when searching structure.
    BDE - Use Heckerman's BDE score. *** Actually, this is just the BD score.
        Depending on the priors chosen, it could be BDE. See flag BDEPrior. ***
    likelihood - Use straight likelihood.
    compLikelihood - Use comparative likelihood (a.k.a. approximate
        conditional likelihood (ACL)).
    CCL - Use class-conditional likelihood (a non-decomposable score!!!).
simCoolingRate - The cooling rate for simulated annealing.
    T = T * simCoolingRate (T = temperature)
simDataInfoDistScore - See the "infoScalable" CPT distribution setting.
    0 = nonuniform, 1 = uniform
simDataInfoDistScorePrec - See the "infoScalable" CPT distribution setting.
    Values less than 1e-6 are not recommended and may cause the iterative
    algorithm to loop indefinitely.
simTemperature - The starting temperature for simulated annealing.
structComplexityPenalty - (aka StrcCompPen) The amount to penalize the score
    of a family based on its structural complexity. This value is multiplied
    by the number of elements in the CPT for a family, and the product is
    subtracted from the node's overall score. If Log(data sequence size)/2 is
    used for this value, and the score type is likelihood, this yields the
    MDL score (I think). Zero indicates no penalty. This does nothing with a
    score that doesn't require the CPT to be generated to calculate the score
    (such as BDe).
subSearch - Indicates the sub-search method to be used by the overall search
    method.
    random - Simply select n parents randomly from a list, randomly replacing
        parents with better parents until failureSuccessRation is met.
    greedy - Select parents greedily (iteratively n times) from the list
        (note: this isn't actually greedy, it's iterative).
    simulatedAnnealing - Uses a simulated annealing approach. Requires values
        in the simTemperature and simCoolingRate flags.
    bestMix - Based on the size of the parent set for each node, perform the
        appropriate search.
        <= 16 parents - Use optimal (corresponds to about 2000 possible
                        parent configurations)
        <= 25 parents - Use simulated annealing (does well with medium-sized
                        parent sets)
        otherwise     - Use greedy (does best on large parent sets)
    **NOTE** Exhaustive search does not allow a node to have parents before
    it searches--it will delete them.

snapshot
    create - Take a snapshot of the link structure of the BN given and
        associate it with the name given.
    restore - Restore the snapshot link structure associated with the name
        given for the BN given.

To perform a comparative search, set the following flags:
    classificationType = maxLikeMargin if using discriminative params,
                         likelihood otherwise**
        (doing so isn't optimal, still figuring out a better method)
        (**using likelihood classification for any form of compLikelihood is
        wrong; I don't think maxLikeMargin fixes all the issues)
        (the classifier should actually be compLikelihood)
    scoreType = compLikelihood
    CPTParamType = compMLE (for discriminative params)
    CPTParamType = MLE (for generative params--I think this is probably better)
    CPTParamCompMaxOrMid
    set flag scoreMinAcceptable true  \ These two lines prevent families from having negative scores,
    set flag scoreMinValue 0          / which would indicate that the families have stronger correlations in the other family.
        This should definitely be turned on if CPTParamType = MLE.
    set flag laplacesmoothingconst 1
        (since the score depends on CPTs that are not trained on the data
        they were calculated from, having zeros in CPT elements in
        extra-class data will wreck compLikelihood)

To perform a normal likelihood search:
    classificationType = likelihood
    scoreType = likelihood (or BDE)
    CPTParamType = MLE

To create discrete surrogate data:
    1 - Quantize the data with the method used to quantize in the original
        experiment
    2 - set flag dataRandomizationMethod preserveAutocorrelation
    3 - Randomize the data
    4 - Quantize the data with the 'matchQuantizedPS' method

* - Indicates this value can be set to __all__, in which case all appropriate
    elements are matched.
% - Used to delimit a comment line in the input.
An integer list has the syntax: i1 i2 ... in "endList" where the i's are integers.
A double list has the syntax: d1 d2 ... dn "endList" where the d's are doubles.
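
The list syntax above (values followed by an endList terminator) can be
illustrated with a small sketch. This is not SBNet's own parser: the function
name parse_list is hypothetical, and it assumes that tokens are separated by
whitespace and that the quotes around "endList" in the description are not
part of the input.

```python
def parse_list(tokens, convert):
    """Read values from `tokens` until the endList terminator is reached.

    `convert` is int for integer lists and float for double lists.
    Returns (values, remaining_tokens).
    Raises ValueError if the terminator is missing or a token cannot be
    converted by `convert`.
    """
    values = []
    for pos, tok in enumerate(tokens):
        if tok == "endList":  # assumed literal terminator, without quotes
            return values, tokens[pos + 1:]
        values.append(convert(tok))
    raise ValueError("list is missing its endList terminator")


# An integer list followed by further input tokens.
ints, rest = parse_list("1 2 3 endList next".split(), int)

# A list of doubles.
doubles, _ = parse_list("0.5 1e-3 endList".split(), float)
```

Returning the unconsumed tokens lets a caller keep reading whatever follows
the list on the same input line.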