Physical Algebra

Timber allows the (advanced) user to execute queries by entering the physical plans directly. In this way, a query plan can be optimized to execute more efficiently than a naively generated plan. Some query functionalities that are not currently supported at Timber's XQuery level are suppported at the physical plan level.

A physical plan is composed of a sequence of iterators, listed in a depth-first preorder manner. First the Iterator then its inputs. Each Iterator is on a separate line. A line in the file that starts with a ‘*’ will be ignored (comment line). Start key is the pedigree (unique id) of the node. Iterators are case-sensitive. File name(s) where you will find more info about the Iterator or some of its input is written underneath the Iterator name (column 1).

The NRE is a unique number that gets assigned to a collection of nodes upon retrieval from indices or database. In some iterators, where nodes are being created (e.g. construct), NRE gets assigned to the newly created nodes. Each node in the witness tree will have an NRE. This NRE is used to reference the node.

For example, we have a structural join following two index accesses, in which we want to join book with author.

J,D,3,9,-1,-1,16,A,B

I,3,index_books_elementtag,books.xml,GIST,STR,book

I,9,index_books_elementtag,books.xml,GIST,STR,author

We assign book nodes NRE 3, and author nodes NRE 9. When joining, we reference node with NRE 3 and node with NRE 9, which will join book and author.

Physical Iterators

c, D, d, F, f, G, H, I, i, J, K, k, m, n, O, P, R, r, S, s, T, t, u, U, V, w

Iterator

Calling Format

Parameter Explanation

Index Access

IndexAccess.h

GistIndexAccess.h

I,nre,indexName,filename,indexSource,indexType,value1,value2

NRE: the nre to be assigned to nodes retrieved from the index.

IndexName: name of the index.

Filename: name of the file that the index is built on.

IndexSource: GIST: if the index is a gist index.

HASH: if index is a hash index.

IndexType: INT: the index key is an integer.

FLT: the index key is a float.

STR: the index key is a string.

DBL: the index key is a double.

Value1: the key value you are asking for.

Value2: if you are asking for a range, this is the second value. This is optional.

Scan Access

ScanIterator.h

SelectionCondition.h

S,nre,fileName,rootKey,nodeType,returnType,num1,num2,Left,opr,rightType,right,

…..,num3,start,end,…

NRE: the nre to be assigned to nodes retrieved from the database.

Filename: name of the file to scan.

RootKey: start key of the root element in the file.

NodeType: type of the node to scan for.

ALL_NODES: scan all nodes.

DOCUMENT_NODE: scan doc nodes.

ELEMENT_NODE: scan for element nodes.

ATTRIBUTE_NODE: scan for attribute nodes.

TEXT_NODE: scan for text nodes.

COMMENT_NODE: scan for comment nodes.

ReturnType: type of the node to be returned as part of the result of the scan.

THISNODE: return the node scanned.

PARENTNODE: return the parent of the scanned node.

ELEMENTNODE: return the element node that is closest to the scanned node (if attribute-node, get its parent)

ATTRIBUTENODE: return the attribute node that is closest to the scanned node (if elem-node, get its attrs)

SUBTREE: return the subtree of the scanned node.

Num1: the number of disjunctive conditions you want to apply on scanned nodes. Num2,Left,opr,rightType,right should occur num1number of times.

Num2: the number of conjunctive conditions in each disjunct. Left,opr,rightType,right should occur num2 number of times.

Left: one of the following values: (check SelectionCondition.h for more details) VALUE,LENGTH,XMLFILENAME,DTDFILENAME,NODETAG,

CHILDNUMBER, ATTRIBUTENUMBER, HAS_CHILD, HAS_ATTRIBUTE,ATTRIBUTE_VALUE,ELEMENTCONTENT

If you choose ATTRIBUTE_VALUE then pass here an attribute name.

Opr: one of the following values: EQ,LE,LT,GT,GE,NE: for =,<,<=,>,>=,!=

CONTAIN,CONTAINEDBY,STARTWITH

RightType: STR/INT/FLT

Right: value you want to be checked. if rightType is STR, you can pass <EMPTY> to conmpare against empty content. otherwise, pass a value.

Num3: number of scan ranges: start,end should occur num3 number of times.

The scan will scan the sub-trees contained in these start-end.

Start: the start key of the first node to scan.

End: the end key of the last node to scan.

Structural Join

StackBasedAncs.h

StackBasedDesc.h

J,Algorithm,ancsNRE,descNRE,leftSiblingNRE ,

RightSiblingNRE,expectedDepth,relation,projectWhich,N

Alg: A: if you want results to be sorted by ancs.

D: if you want results to be sorted by desc.

N: if you want a negated join (output ancs that don’t have desc).

a: if you want an ancs-outer join sorted by ancs (ancs is output even if it doesn’t join).

AncsNRE: the nre of the ancs in the ancs input witness tree.

DescNRE: the nre of the desc in the desc input witness tree.

LeftSiblingNRE: the nre of the left sibling in ordered semantics. If you want unordered semantics, Pass –1.

RightSiblingNRE: the nre of the right sibling in ordered semantics. If you want unordered semantics, Pass –1.

ExpectedDepth: the expected depth of the ancs. I.e. how many ancs are likely to be nested under other ancs . If in doubt, pass the depth of the XML document.

Relation: A: ancs-desc relationship between two inputs to the join.

P: parent-child relationship. The ancs should be one level higher than desc.

ProjectWhich: what you want to keep in the output of this join.

B: both ancs and desc are kept.

A: only ancs is kept. Valid when alg = A.

D: only desc is kept. Valid when alg = D.

N: this field is optional. If you pass an “N”, then output of the join will be nested descendants. If you don’t pass it or pass something else, output of join will be ancs-desc pairs. Valid only with stackBased ancestor join.

Filter

FilterIterator.h

FilterCondition.h

f,num1,num2,app,lType,lNRE,lNum,lStr,opr,rType,rNRE, rNum, rStr ,IndexName,FileName…

Num1: is the number of disjunctive conditions. Num2,lType, ….. should occur num1 number of times.

Num2: number of conjunctive conditions in each disjunctive condition. App,lType,lIndex,..

Should occur num2 number of times.

App: is one of the following

E: the tree will pass if EVERY node with nre lNRE matches the condition.

AO: the tree will pass if AT LEAST ONE node with nre lNRE matches the condition.

EO: the tree will pass if EXACTLY ONE node with nre lNRE matches the condition.

EX: the tree will pass if EXACTLY ONE node has the nre specified and it matches the condition.

Note that it is applied to the left part of the condition only.

lType : type of the left value of the predicate.

V: left side is a value.

A: left side is an attr.

T: left side is text directly under node. (child text)

DT: left side is text anywhere under node. (desc text)

L: left side is length.

LCV,LCA,LCT,LCDT,LCL: value/attr/text/descText/length of local child.

LAV,LAA,LAT,LADT,LAL: value/attr/text/descText/length of local ancs.

ACV,ACA,ACT,ACDT,ACL: value/attr/text/descText/length of actual child.

AAV,AAA,AAT,AADT,AAL: value/attr/text/descText/length of actual ancs.

SK: left side is a start key.

EK: left side is an end key.

LEV: left side is a level.

C: left side is a const.

LF: left side is local fan-out.

AF: left side is actual fan-out.

LD: left side is local sub-tree depth.

AD: left side is actual sub-tree depth.

CI: left side is a constant that we are getting from an Iterator.

lNRE : nre of the left node in the left input witness tree. This node we will be using its value, attr, text… etc. depending on lType.

lNum: is the child number if lType is one of the following:

LCV,LCA,LCT,LCDT,LCL,ACV,ACA,ACT,ACDT,ACL.

Or it is how many levels up you would like to go to get an ancestor if lType is one of the following:

LAV,LAA,LAT,LADT,LAL, AAV,AAA,AAT,AADT,AAL.

If lType = C and opr uses number comparison then lNum holds the number to be compared against.

lStr: is attrName if lType is one of the following:

A, LCA,LAA,ACA,AAA. Or if lType is V and lNRE returns an attribute node. Or if lType is C and operation uses string comparison, then this will hold the constant value to be compared against. if you want to compare against an empty string, pass <EMPTY> here.

opr:EQN,NEN,LTN,LEN,GTN,GEN:=,!=,<,<=,>,>= using number comparison.

EQS,NES,LTS,LES,GTS,GES:=,!=,<,<=,>,>= using string comparison.

S: left start with right. Strings.

B: left is at the beginning of right. Strings.

C: left contains right.

CD: left contained in right.

NS: left does not start with right.

NC: left does not contain right.

rType, rNRE,rNum ,… are exactly like the above except they are for the right side of the condition.

IndexName: if you want to use a join index to evaluate filter, pass the name of the index here. otherwise, pass NULL.

FileName: if you pass an indexName, you have to pass the xml file name here. otherwise, pass NULL.

Sort

SortIterator.h

ExternalSortIterator.h

R,exp.Size,sortByWhat,num,nre,order ,whereEmptyGoes,…,X

,order,X

ExpectedSize: is the number of trees we are getting as input.

SortByWhat:is

K: to sort by start key

S: to sort by witness tree score.

T: to sort by the whole tree (start keys).

If sorting by start key, give the following parameters:

Num: number of nodes you want to sort with. nre,order.whereEmptyGoes should occur num number of times.

nre : nre of node in input witness tree you want to be sorted with.

Order: A: ascending order.

D: descending order.

WhereEmptyGoes: where the empty nodes go. So if you provide an nre and it returns no nodes, this tree will be either put at the beginning or the end of the set of trees to be sorted. It takes one of two values:

B: the empty ones will go at the beginning.

E: the empty ones will be placed at the end.

If sorting by score or tree, give the following parameter:

Order: A: ascending order.

D: descending order.

X: this field is optional. If you pass an “X”, external sort will be used. If you don’t pass anything or pass something else, in-memory quick sort will be used.

Value Sort

ValueSortIterator.h

ExternalValueeSortIterator.h

r,exp.Size,num,sortBy,NRE,order,attrName ,whereEmptyGoes…,X

ExpectedSize: is the number of trees we are getting as input.

Num: number of nodes you want to sort with. sortBy , nre,order,attrName,whereEmptyGoes should occur num number of times.

SortBy: AN: sort by attribute value. Use number comparison.

AS: sort by attribute value. Use string comparison.

TN: sort by text. Use number comparison.

TS: sort by text. Use string comparison.

TG: sort by tag name.

VN: sort by value. Value is text if nre is for a text

node . It is tagname if nre is for an element node. It is attribute value if the nre belongs to an attribute node. In the last case attrName should be provided. Use number comparison.

VS: same as VN except use string comparison.

SK: sort by start key.

NRE: NRE of element that you want to sort by its text or attr.

Order: A: ascending.

D: descending.

AttrName: name of attribute to be sorted by in the case of AN, AS, NV and VS. Pass NULL if

You want to sort by TN or TS.

WhereEmptyGoes: where the empty nodes go. So if you provide an nre and it returns no nodes or you give an attribute name that is not in there, this tree will be either put at the beginning or the end of the set of trees to be sorted. It takes one of two values:

B: the empty ones will go at the beginning.

E: the empty ones will be placed at the end.

X: this field is optional. If you pass an “X”, external sort will be used. If you don’t pass anything or pass something else, in-memory quick sort will be used.

Set Ops.

UnionIterator.h – IntersectionIterator.h – DifferenceIterator.h

T,whichOp,length

WhichOp: U: union

I: intersection.

D: difference.

Length: starting from root, how many nodes you want to compare. Optional. If not specified, deep equality is used.

Group by

GroupByIterator.h

G,exp.Sz,groupByWhat,GroupByNRE,GroupByAttrName,keepTrees,Num,opr,

oprOnWhat,attrName,NRE,opNRE ….rootNRE,valueNRE,sort

ExpectedSize: is the number of trees we are getting as input.

GroupByWhat: AN: attribute. Use number comparison.

AS: attribute. Use string comparison.

TN: text. Use number comparison.

TS: text. Use string comparison.

SK: start key.

VN: value. if groupBy NRE is an element, then we group by tag. if groupBy NRE is an attr, then we group by attribute value of attribute groupByAttrName. if groupBy NRE is text, then we group by value of text. Use number comparison.

VS: value. if groupBy NRE is an element, then we group by tag. if groupBy NRE is an attr, then we group by attribute value of attribute groupByAttrName. if groupBy NRE is text, then we group by value of text. Use string comparison.

GroupByNRE: the NRE of the element you want to group by its attr, text or start key.

groupbyAttrName : is the attribute name of the node if you are grouping by attribute value. Otherwise, pass NULL.

KeepTrees: 0: do not keep the input trees. Just pass the grouped by value.

1: keep the input trees and pass the grouped by value.

Num: number of operations to be performed on the grouped trees. Opr, oprOnWhat, attrName,nre,opNRE should occur num number of times.

Opr: Operation to be performed on the grouped trees.

C: count

A: avg

S: sum

M: max

m: min

N: none

OprOnWhat: On what you want to perform the operation.

A: attribute. Number comparison.

LA: local attr. not from DB. number omparison.

T: text. Number comparison.

LT: local text. not from DB. Number comparison.

LF: local fanout.

AF: actual fanout.

LD: local depth.

AD: actual depth.

V: value. if NRE is an element, then we perform operation on tag. if NRE is an attr, then we perform operation onattribute value of attribute AttrName. if NRE is text, then we perform operation on value of text. Use number comparison.

AttrName: name of attribute if AN or AS. NULL otherwise.

NRE: NRE of node in input trees that you want to perform operation on its text, attr, fanout or depth.

opNRE : the nre to be assigned to the node holding the operation result value.

rootNRE : the NRE to be assigned to the dummy root added to the group by tree.

valueNRE : the NRE to be assigned to the dummy node holding the group by value (it is the first child of the root).

Sort: 0/1. if you want the groupby to sort by grouped by value, pass 1. if results are already sorted by grouped by value, pass 0.

Function

FunctionIterator.h

F,num,opr,oprOnWhat,attrName,NRE,opNRE ….treeLevel

Num: number of operations to be performed on input trees. Opr, oprOnWhat, attrName,nre,opNRE should occur num number of times.

Opr: C: count

A: avg

S: sum

M: max

m: min

OprOnWhat: A: attribute. Number comparison.

LA: local attr. not from DB. number comparison.

T: text. Number comparison.

LT: local text. not from DB. Number comparison.

LF: local fanout.

AF: actual fanout.

LD: local depth.

AD: actual depth.

AttrName: name of attribute if AN or AS. NULL otherwise.

NRE: NRE of node in input trees that you want to perform operation on its text, attr, fanout or depth.

opNRE : the nre to be assigned to the node holding the operation result value.

treeLevel : if you want the operations to be performed on the tree level pass 1. on the input level pass 0. The difference is that in the case of the input level, the operations are performed across all input trees and one output tree is generated holding the operation result values. In the tree level case, the operations are performed on every tree and the result nodes are appended to the result.

Value join

ValueJoinIterator.h

j,rootNRE,leftNRE,leftTag,rightNRE,rightTag,expSize,attrNameLeft,attrNameRight,

opr,joinWhatLeft,joinWhatRight,sortedInput,N,O,A

RootNRE: the nre to be assigned to the dummy root added to the joined tree.

LeftNRE: NRE of the node in the left input trees that will be checked for join.

LeftTag: the tag of the element in the left input tree to perform value join on. If you know its NRE, pass NULL here and pass the NRE in leftNRE. If leftTag is not NULL, leftNRE is ignored.

RightNRE: NRE of the node in the right input trees that will be checked for join.

RightTag: the tag of the element in the right input tree to perform value join on. If you know its NRE, pass NULL here and pass the NRE in rightNRE. If rightTag is not NULL, rightNRE is ignored.

Exp. Size: expected number of trees from the right input (join will block on right input).

AttrNameLeft: if joining made based on an attr value, then put attr Name of the left input here. NULL otherwise.

AttrNameRight: if joining made based on an attr value, then put attr Name of the right input here. NULL otherwise.

Operator: EQN: equality join. Use number comparison

NEN: non-equality join. Use number comparison

LTN: if left is < right, join. Use number comparison

LEN: if left is <= right, join. Use number comparison

GTN: if left is > right, join. Use number comparison

GEN: if left is >= right, join. Use number comparison

EQS: equality join. Use string comparison

NES: non-equality join. Use string comparison

LTS: if left is < right, join. Use string comparison

LES: if left is <= right, join. Use string comparison

GTS: if left is > right, join. Use string comparison

GES: if left is >= right, join. Use string comparison

C: if left CONTAINS right. String comparison.

CB: if left CONTAINED BY right. String comparison.

JoinWhatLeft: A: join by attr.

T: join by text.

DT: join by text not directly underneath node. but even desc text.

S: join by start key.

E: join be end key.

L: join by level.

N: join by nothing. Cartesian product.

TR: join by subtree rooted at leftNRE. Tags, content, and attributes are to be compared.

TNR: join by subtree rooted at leftNRE. Tags, content, and attributes are to be compared except the root tag.

V: value of the node leftNRE if it is an element node, then tag name is used . if it is an attr node, then attribute attrNameLeft is used. If it is a text node, then text is used.

JoinWhatRight: A: join by attr.

T: join by text..

DT: join by text not directly underneath node. but even desc text.

S: join by start key.

E: join by end key.

L: join by level.

N: join by nothing. Cartesian product.

TR: join by subtree rooted at rightNRE. Tags, content, and attributes are to be compared.

TNR: join by subtree rooted at rightNRE. Tags, content, and attributes are to be compared except the root tag.

V: value of the node rightNRE if it is an element node, then tag name is used . if it is an attr node, then attribute attrNameRight is used. If it is a text node, then text is used.

SortedInput: 0: if the input is not sorted by the join value. (nested-loops join is performed)

1: if the input is already sorted by join value. (sort-merge join is performed)

IndexName: if you want to use a join index to evaluate this join, pass the name of the index here. otherwise, pass NULL.

FileName: if you pass an indexName, you have to pass the xml file name here. otherwise, pass NULL.

N: this parameter is optional. If you pass N, then the output of the join will be nested by the left input. I.e. each left input tree will have all joining right input trees with it in the output tree.

O: this parameter is optional. If you pass O, then the join will be left outer join. I.e. each left input tree will be output whether or not it joined with a right input tree..

A: this parameter is optional. if you pass it, at least one of all the nodes with NRE leftNRE will need to match at least one of the nodes with NRE rightNRE. This works only with nested-loops join.

Index Hash Nested Value Join

IndexHashValueJoinIterator.h

H,expSize,indexName,filename,leftNRE,rightNRE,rootNRE,O,A

Exp. Size: expected number of trees from the right input (join will block on right input).

IndexName:The name of the value join index. it is the index to which we will be submitting start keys of node with leftNRE in input trees and get right nodes

FileName: The name of the XML file where right trees come from.

LeftNRE: NRE of the node in the left input trees that will be checked for join.

RightNRE: NRE of the node in the right input trees that will be checked for join.

RootNRE: the nre to be assigned to the dummy root added to the joined tree.

O: this parameter is optional. If you pass O, then the join will be left outer join. I.e. each left input tree will be output whether or not it joined with a right input tree..

A: this parameter is optional. if you pass it, at least one of all the nodes with NRE leftNRE will need to match at least one of the nodes with NRE rightNRE

One Sided Value Join

OneSidedValueJoinIterator.h

O,indexName,filename,leftNRE,assignedRightNRE,rootNRE,N,O

IndexName:The name of the value join index. it is the index to which we will be submitting start keys of node with leftNRE in input trees and get right nodes

FileName: The name of the XML file where right trees come from.

LeftNRE: NRE of the node in the left input trees that will be checked for join.

assignedRightNRE: NRE to be assigned to the nodes read from the index.

RootNRE: the nre to be assigned to the dummy root added to the joined tree.

O: this parameter is optional. If you pass O, then the join will be left outer join. I.e. each left input tree will be output whether or not it joined with a right input tree.

Child Chooser

ChildChooserIterator.h

c,whichChild,num,parentNRE...

WhichChild: which child you want to pass. 1: first child. 2:second child,…. –1:last child. So this Iterator expects input to have ancs-desc pairs sorted by ancs. And it will pass for each ancs, the i^th occurrence.

num: number of parents you want to pass their children. parentNRE should occur num number of times.

ParentNRE: the NRE of the ancs in the input tree.

What this iterator does is for a group of ParentNREs, it will pass the whichChild occurance in the input stream. input should be sorted on parentNREs key.

Duplicate Eliminator

DuplicateEliminatorIterator.h

D,byWhat,NRE,exp.Size,attrName,sort

ByWhat: SK: eliminate trees with duplicate start key (id) of a specific node.

TR: eliminate duplicate trees based on start keys. (whole tree is checked)

TS: eliminate trees with duplicate text values. String comparison.

TN: eliminate trees with duplicate text values. Number comparison.

AS: eliminate trees with duplicate attr values. String comparison.

AN: eliminate trees with duplicate attr values. Number comparison.

NRE: nre of the node to eliminate duplicates by. If eliminating duplicates by tree, this is ignored. If by start key, then the start key of this nre is used in eliminating duplicates. If by text, the text of the node with this nre is used. If by attr, then the attribute of this node is used.

ExpectedSize: is the number of trees we are getting as input. Pass whatever if sorting is turned off.

AttrName: if byWhat =AS or AN, then this is the name of the attribute that you want to eliminate duplicates of.

Sort: 0: if you do not want it to sort input. (it is already sorted).

1: if you want the Iterator to sort input before eliminating duplicates.

X: this field is optional. If you pass an “X” and sort=1, external sort will be used. If you don’t pass anything or pass something else and sort=1, in-memory quick sort will be used.

Phrase Finder

PhraseFinderIterator.h

n,NRE,indexName,filename,phrase

NRE: the nre to be assigned to the output nodes.

IndexName: is the name of the index to look up when searching for the words in the sentence. It should be some kind of an inverted index or so.

Filename: name of the file the index is built on.

Phrase: this is the phrase you want to find in the document. You can have “*” anywhere in the phrase to represent wildcards. For example, phrase: “hello ** world”

Will match “hello bad ugly world”, “hello blah blah world” but not “hello world” or “hello blah world”. Also, if you want to make sure that hello is the second word in the text, you should do “* hello world” third :”** hello world”.

Construct

Iterator

ConstructIterator.h

s,n1,lev,type,src,NRE,str1

str1,str2

str1,rootNRE,n2,ns,lev,rel,getWhat,attrName….

rootNRE,n2,ns,lev,rel,getWhat,attrName….

rootNRE,0,rootGetWhat,attrName

n1:is the number of nodes in the output tree (attribute is a node, element is a node and content is a node). Lev,type,src,… are repeated n1 number of times.

Lev: is the level of the node in the output tree (attr is 1 level deeper than its element)

Type: is the type of the node in the output tree:

E: element node.

A: attribute node.

C: content node.

OA: optional attribute. just like attribute except that if it doesn't have a value, it will not apper in teh output.

Src: source of data for this output node.

C: constant. I.e. data to be placed there should be specified now.

R: reference. I.e. a pattern tree is used with a reference root (exp. $a).

NRE: is the nre to be assigned to extra nodes. If there are nodes to be retrieved from the database, then this nre will be assigned to them in case type = C. if type = E or A, then this nre is assigned to the new node formed whether it is an element or attr node.

Now, there are 4 different information formats you need to provide.

1- If src is constant.

Str1: should be the constant that will be the element name if the type is element or

Text if the type is content.

Str1,str2: str1 should be the attribute name and sr2 is attribute value if type is attribute node.

2- If src is reference.

str1,rootNRE,n2,ns,lev,rel,getWhat,attrName…. If type is attribute.

str1: is the name of the attribute

RootNRE: is the nre of the root of the pattern tree node in the input witness trees.

n2: is the number of nodes in the pattern tree. Level, relation, getWhat should be repeated num-1 number of times. Level,relation,getWhat will describe nodes of the pattern tree. The nodes will be listed in a depth-first pre-order manner.

Ns: node source: if I, then index accesses are used to retrieve nodes. If W, then the witness tree is scanned for nodes. . If ns = I then you should pass in the following lines a bunch of index access. If ns=W, then after lev an extra field is required, tagName.

Level: the level of the node in the pattern tree.

tagName : pass this field only if ns = W. otherwise, don’t pass anything here.

Relation: the relationship with its parent in the pattern either A: ancs-desc or P: parent-child.

GetWhat: is what you want to get exactly for this node. It is ignored for internal nodes. Works only for leaf nodes. It can be one of the following:

S: gets the subtree rooted at this node.

T: gets the text of the node.

I: gets the node itself only.

V: gets value of the node. If it is an element node, the tag name is returned. If it is an attribute node, the attribute value of attrName is returned. If it is a text node, the text is returned. If it is a document node, the document name is returned.

A: gets an attribute of the node. In this case, you need to add a field to

the string just after ,A, which is the attr name that you want returned.

L: gets the local subtree of the node. The difference between this option and the ‘S’ option is that the ‘S’ will get you the subtree of the node from the database. While the ‘L’ option gets it from the witness tree.

attrName : is the name of the attribute to be retrieved if getWhat = A or getWhat=V. In the case of the V, if the node is not an attribute node, pass NULL. otherwise, don’t pass anything here.

NOTE: if ns = I, then the following num-1 lines to this line should be index accesses or scan lines. Each line corresponds to a pattern tree node in a depth-first pre-order traversal.

rootNRE,n2,ns,lev,tagName,rel,getWhat,attrName…. If type is element or content.

RootNRE: is the nre of the root of the pattern tree node in the input witness trees.

Level: the level of the node in the pattern tree.

tagName : pass this field only if ns = W. otherwise, don’t pass anything here.

Relation: the relationship with its parent in the pattern either A: ancs-desc or P: parent-child.

GetWhat: is what you want to get exactly for this node. It is ignored for internal nodes. Works only for leaf nodes. It can be one of the following:

S: gets the subtree rooted at this node.

T: gets the text of the node.

I: gets the node itself only.

A: gets an attribute of the node. In this case, you need to add a field to

the string just after ,A, which is the attr name that you want returned.

attrName : is the name of the attribute to be retrieved if getWhat = A or getWhat=V. In the case of the V, if the node is not an attribute node, pass NULL. otherwise, don’t pass anything here.

NOTE: if ns = I, then the following num-1 lines to this line should be index accesses or scan lines. Each line corresponds to a pattern tree node in a depth-first pre-order traversal.

rootNRE,0,rootGetWhat,attrName. This is a special case of the previous case. This is when you want to output the root or information regarding the root.

RootNRE: is the nre of the node in the input witness trees that you want to retrieve its data.

RootGetWhat: is what you want to get exactly for this node. It can be one of the following:

S: gets the subtree rooted at this node.

T: gets the text of the node.

I: gets the node itself only.

A: gets an attribute of the node. In this case, you need to add a field to

the string just after ,A, which is the attr name that you want returned.

attrName : is the name of the attribute to be retrieved if getWhat = A or getWhat=V. In the case of the V, if the node is not an attribute node, pass NULL. otherwise, don’t pass anything here.

Stack-based Term Join

TmpFile.h

t,NRE,indexName,filename,phrase,function1,function2,function3,depth,simpleScore,

keywordset,meetAlg,parentIndexName

NRE: the nre to be assigned to the output nodes.

IndexName: is the name of the index to look up when searching for the words in the sentence. It should be some kind of an inverted index or so.

Filename: name of the file the index is built on.

Phrase: this is the phrase you want to find its word in the document. And common ancestors of these words will be returned.

Function1: score function. The number you pass here is the number of the function in a switch statement. Basically, you write your function, add a case with a number to the first switch statement in the EVALUATION_OP_SBTERMJOIN case. The function should take in a char * and return a double.

Function2: ancestor depth threshold function. The number is the same as above but to the second switch statement. The function you use should take in Keytype and int and return an integer, which is the number of level to go up in the document. If –1 is returned the function does nothing.

Function3: ancestor qualifies function. The number is like the above except that it is the third switch statement in the EVALUATION_OP_SBTERMJOIN case. The function should take a double in and returns a true or false.

Depth: is the expected depth of the document.

SimpleScore: pass a 0 or 1 value here. 0 if you want the complex score to be calculated for each ancestor or 1 if you want a simple score that counts the number of occurrences of each term per ancs.

Keywordset:Cong stuff. Just pass 0.

MeetAlg: pass 0 or 1. 0 if you want the stack based alg to be performed. 1 if you want the meet alg to be performed.

ParentIndexName: optional. If passed the parent index is used in getting ancestors. Otherwise, accessing the database.

Navigational Get Relative

NavigationalGetRelative.h

V,nre,relation,num,nre2

NRE: nre of the node you want to get its relative.

Relation: one of the following:

1A: gets a specific ancestor.

1C: gets a specific child.

AAO: gets all ancestors and outputs them one at a time.

ADO : gets all descendants and outputs them one at a time.

ACO: gets all children and outputs them one at a time.

AAOI: gets all ancestors and outputs them one at a time.

Including the node itself.

ADOI: gets all descendants and outputs them one at a time. Including the node itself.

ACOI: gets all children and outputs them one at a time. Including the node itself

AAT: gets all ancestors and outputs them all at once in the same output tree.

ADT: gets all descendants and outputs them all at once in the same output tree.

ACT: gets all children and outputs them all at once in the same output tree.

AT: gets all text nodes under node index.

CT: gets children text nodes under node index.

A: gets attribute node of node index.

Num: is used only when relation is one of the following:

1A: num is used to specify which ancestor is returned in terms of levels higher. So if num=1, the immediate parent is returned. Num=2, the grandparent is returned and so on.

1C: num is used to specify which child to be returned. Num=0, first child is returned, num =1 second child,… num=-1, last child is returned.

NRE2: nre to be assigned to the retrieved relative(s).

FileWriter

FileWriterIterator.h

w,filename

Filename: the name of the file that you want your query results to be written to. This cannot be a leaf node. Place it anywhere in the query plan except at leaves.

FileReader

FileReaderIterator.h

d,filename

Filename: the name of the file that you want to read from. This must be a leaf node. You need to write query results using FileWriter before using fileReader to read them.

Pick Iterator

PickIterator.h

K,function1,function2,depth,sort

Function1: worth outputting function. This function determines whether or not a node is worth outputting. The number you pass here is the number of the function in a switch statement. Basically, you write your function, add a case with a number to the first switch statement in the EVALUATION_OP_PICK case. The function should take in a witnessTree * and return a bool.

Function2: percentage function. Returns the percentage of children of a node that if these children are worth outputting, the parent is output instead. The number is the same as above but to the second switch statement. The function you use should take in nothing and return a float.

Depth: is the expected depth of the document.

Sort: if 1, the input will be sorted in the Pick iterator. If 0, the input is assumed to be sorted by start key.

Data Instantiation

DataInstatiationIterator.h

i,index,scope,cutDepth,getElement,getAttr,getContent

Index: the index of the node in the witness tree that you want to instantiate its subtree.

Scope: I: get node itself

S: get whole subtree

C: get whole subtree until a given depth

CutDepth: applicable if scope = C. pass the depth to which you want the subtree to be instatiated.

GetElement: if 0, no element nodes in the subtree are returned. If 1, element nodes are returned.

GetAttr: if 0, no attr nodes in the subtree are returned. If 1, attr nodes are returned.

GetContent: if 0, no content nodes in the subtree are returned. If 1, content nodes are returned.

SortStopK

SortStopKIterator.h

k,K,inSize

K: number of outputs. This iterator returns top k scoring inputs.

inSize : number of inputs to be read by this iterator. Among these inputs, top k scoring ones will be output in order of descending score.

Projection

ProjectionIterator.h

P,num,NRE,..,preserveNodeOrder

Num: number of NRE you want to keep in the tree.

NRE: NRE of nodes to be kept. NRE should be repeated num number of times.

PreserveNodeOrder: if 1: nodes are kept in the output tree in their original order.

0: nodes are kept in the output tree in the order of the input NRE.

Universal Quantification

u,rootNRE,num,relation ,…,compareWhat,attrName,operation,number,str

rootNRE : the nre of the node you want to perform universal quantification on. For example, if you want to get books that have all authors affiliated with university of Michigan . Then, nre of book in the input trees should be passed here.

num: is the number of nodes in the pattern. So, to follow up on the previous example. the pattern is author with child/desc affiliation. So that gives us 2 for num. the following num lines in the evaluation plan should be index accesses to pattern nodes. In our example, the following 2 lines to the universal quantification line should be index access to author and then index access to affiliation.

relation: relation is repeated num number of times. For each node in the pattern, the corresponding relation is either P or A (parent-child or ancs-desc). This tells us the relation of the node with what comes before it in the pattern. For the root of the pattern, relation is its relationship with the rootNRE node. To follow up on our example, you should pass 2 relations, one is for book-author and the other is for author-affiliation.

compareWhat: it is one of two values:

T: if you want text of the leaf of the pattern to be checked.

A: if you want an attribute of the leaf of the pattern to be checked.

In our example, we want to check the affiliation text and make sure it is equal to university of Michigan .

attrName : if you use the A option in the previous parameter, then you should pass an attribute name. Otherwise, pass NULL.

operation:EQN,NEN,LTN,LEN,GTN,GEN:=,!=,<,<=,>,>= using number comparison.

EQS,NES,LTS,LES,GTS,GES:=,!=,<,<=,>,>= using string comparison.

C: contains. String comparison.

S: starts with. String comparison.

In our example, we should use EQS because we want the affilation text be equal to “ university of Michigan ”.

number: if you use any of the number operations, then pass a number to compare against. Otherwise, pass -1.

str : if you use any of the string operations, then pass a string. Otherwise, pass NULL. In our example, pass “University of Michigan” here.

MLCA (Meaningful Lowest Common Ancestor)

meaningfulClosestCommon AncestorStructure.h

m,num,NRE1,NRE2,...,NREnum,assignedNREforRoot,expectedSize,expectedDepth

Num: number of leaf nodes you want to keep in the MLCAS (Meaningful Lowest Common Ancestor Structure)

NRE_i: NRE value of the i_th leaf node

assignedNREforRoot: assigned NRE for the root node of each MLCAS

expectedSize : the max of the estimated size of all the inputs. Enter -1 if unknown.

expectedDepth : the depth of the lowest nodes from the inputs. Enter -1 if unknown

Update

UpdatesIterator.h

U,ModTNode,ElementNRE,CONST,constantValue
U,ModTNode,ElementNRE,REF,ElementRefNRE
U,ModTNode,ElementNRE,REF,DummyNodeNRE
U,ModTNode,ElementNRE,REF,AttributeNRE,AttributeName

Modifying a text node:

ModTNode will modify the text value of the element with NRE=ElementNRE.
Only one element node per witness tree should have an NRE=ElementNRE,
otherwise an error will be issued, and the transaction will be aborted. The new text value will become the last child of the element (and all previous text nodes for this element will be removed). When using the REF option, the NREs for ElementRefNRE, DummyNodeNRE, and AttributeNRE should also have only one node with this NRE in the witness tree. If this is not the case, the query will be carried out (by assigning the first of such values as the new text value for the element node with NRE=ElementNRE) but a warning will be issued.

Note: all subsequent update operators have these same restrictions. The first NRE, which specifies the variable to modify must be a singleton (only one in the witness tree), all subsequent NREs may be mapped to more than one node in a given witness tree, but only the first of such values will be used for the update.

...,CONST,constantValue: constantValue will be the new text value.
...,REF,ElementRefNRE: the text value for the element with NRE = ElementRefNRE will be the new text value.
...,REF,DummyNodeNRE: the value contained in the "Dummy Node" (e.g.
generated by the Function Iterator) with NRE=DummyNodeNRE is the new text
value.
...,REF,AttributeNRE,AttributeName: the value associated with the
attribute AttributeName is the new text value (an attribute node stores
all of the attribute name/value pairs associated with a given element, so
AttributeNRE by itself would specify a group of attributes).

U,ModANode,AttributeNRE,AttributeName,CONST,constantValue
U,ModANode,AttributeNRE,AttributeName,REF,ElementRefNRE
U,ModANode,AttributeNRE,AttributeName,REF,DummyNodeNRE
U,ModANode,AttributeNRE,AttributeName,REF,AttributeRefNRE,AttributeName

Modifying an attribute value:
ModANode will modify the value associated with AttributeName (for the
attribute node with NRE=AttributeNRE).
As with ModTNode, both CONST and REF values can be assigned to the
attribute value, and the descriptions of these options are the same as for
ModTNode. If the attribute AttributeName does not exist, an attribute
value pair is not inserted.

U,DelTNode,ElementNodeNRE

Deleting text node(s):

DelTNode deletes all of the text content from the element node specified
by ElementNodeNRE.

U,DelAttrValue,AttributeNRE,AttributeName

Deleting an attribute value:

DelAttrValue deletes the attribute value associated with AttributeName,
from the attribute node with NRE=AttributeNRE.

U,DelTree,ElementNRE

Deleting a subtree (rooted at an element node):

DelTree deletes the subtree rooted at the element with NRE=ElementNRE.
All descendent elements are deleted, as well as their associated text and
attributes.

U,InsAttr,ElementNRE,AttributeName,CONST,constantValue
U,InsAttr,ElementNRE,AttributeName,REF,ElementRefNRE
U,InsAttr,ElementNRE,AttributeName,REF,DummyNodeNRE
U,InsAttr,ElementNRE,AttributeName,REF,AttributeRefNRE,AttributeName

Inserting an attribute:

InsAttr inserts an attribute (and possibly an associated value) as the
last attribute of the element with NRE=ElementNRE.
The value of the attribute is given as either a constant value, or a
reference (in the same way as ModTNode).

Note: 'U,InsAttr,ElementNRE,AttributeName,<NOVAL>' is used to insert an
attribute without a corresponding value.

A. Inserting an element node with no associated text/content:
U,InsENode,ElementNRE,newChildElementTag,<NOVAL>

B. Inserting an element node with a constant value='val' for the child text node:
U,InsENode,ElementNRE,newChildElementTag,CONST,val

C. Inserting an element node with a reference value for the child text node
(note: X is not used, but must be included):
U,InsENode,ElementNRE,newChildElementTag,REF,ElementNodeNRE,X
U,InsENode,ElementNRE,newChildElementTag,REF,DummyNodeNRE,X
U,InsENode,ElementNRE,newChildElementTag,REF,AttributeNRE,AttributeName

nsENode inserts a new element as a child of the element having
NRE=ElementNRE, with (optional) text and attribute values.

We can include arguments for attribute insertion after any of the
five lines in 'A', 'B', and 'C',

These attribute arguments are given as 0 or more repetitions of the
following five patterns (where X is not used, but must be included):
,AttributeName,<NOVAL>
,AttributeName,CONST,constantValue
,AttributeName,REF,ElementRefNRE,X
,AttributeName,REF,DummyNodeNRE,X
,AttributeName,REF,AttributeRefNRE,AttributeName

Last Update: 07/21/2004