XDG: An XML Document Generator


Abstract

The XDG is a scalable document generator for synthetic XML data sets. It generates documents according to specific parameters, namely the number of nodes, the document depth, the fan-out of each element, and the number of different tag names. It can, for example, be used to evaluate specific access patterns of XPath queries.

Conceptually, the generator recursively creates as many child nodes as defined by the parameter ``Fan-out''. When the depth of the recursive calls reaches the value of the parameter ``Depth'', no further recursive calls are made. The frequency of tag names decreases by a factor of 2 for each subsequent tag name. For example, the value ``C'' for the parameter that sets the number of different tag names means that the tag names A, B, and C are used in the document: every second node gets tag name A, every fourth node gets tag name B, and every eighth node gets tag name C. The remaining nodes (one eighth in this example) also get tag name A, so that the fractions add up to 100%. The tool generates new nodes until the limit on the number of nodes (#Nodes) is reached.
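
The following Python sketch illustrates the generation scheme described above. It is not the XDG source: the function names and the num_tags parameter are hypothetical, and the tag rule (the tag index equals the number of trailing zero bits of the node's preorder number, with indices beyond the allowed range falling back to A) is inferred from the example document further down rather than taken from the tool. Starting a new top-level subtree whenever the previous one is exhausted before #Nodes is reached is likewise an assumption.

```python
import string
from typing import List, Tuple


def tag_for(node_id: int, num_tags: int) -> str:
    # Assumption inferred from the example: the tag index equals the number of
    # trailing zero bits of the preorder id, so odd ids (every 2nd node) get A,
    # ids 2, 6, 10, ... (every 4th) get B, ids 4, 12, 20, ... (every 8th) get C.
    zeros = (node_id & -node_id).bit_length() - 1
    # Indices beyond the allowed tag range are filled up with A.
    return string.ascii_uppercase[zeros] if zeros < num_tags else "A"


def generate_skeleton(num_nodes: int, depth: int, fan_out: int,
                      num_tags: int) -> List[Tuple[int, int, str]]:
    """Return (id, level, tag) for every node in preorder (level 1 = top)."""
    nodes: List[Tuple[int, int, str]] = []

    def visit(level: int) -> None:
        node_id = len(nodes) + 1  # preorder number, starting at 1
        nodes.append((node_id, level, tag_for(node_id, num_tags)))
        if level == depth:
            return  # maximum depth reached: no further recursive calls
        for _ in range(fan_out):
            if len(nodes) >= num_nodes:
                return  # node limit (#Nodes) reached
            visit(level + 1)

    # Hypothetical: keep starting top-level subtrees until #Nodes is reached.
    while len(nodes) < num_nodes:
        visit(1)
    return nodes
```

With num_tags=4 (tags A through D, chosen to match the example), `[tag for _, _, tag in generate_skeleton(10, 3, 3, 4)]` yields A, B, A, C, A, B, A, D, A, B, the same tag sequence as the example document.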

XDG generates elements with names A, B, C, ..., T
and   attributes with names a, b, c, ..., t
where every second element is A,
      every fourth element is B,
      every eighth element is C, and so on,
and   attribute a contains "0" or "1",
      attribute b contains "0", "1", "2", or "3",
      and so on.
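
The attribute values in the example document below follow a simple pattern: attribute number i (a for i = 0, b for i = 1, and so on) holds the element's preorder number modulo 2^(i+1). The sketch below encodes that rule; it is inferred from the example, not taken from the XDG source, and the function name is hypothetical.

```python
import string
from typing import Dict


def attribute_values(node_id: int, num_attrs: int) -> Dict[str, str]:
    """Values of attributes a, b, c, ... for the node with preorder number node_id.

    Hypothetical rule inferred from the example document: attribute number i
    (a = 0, b = 1, ...) holds node_id mod 2**(i + 1), so a ranges over 0..1,
    b over 0..3, c over 0..7, and so on.
    """
    return {string.ascii_lowercase[i]: str(node_id % 2 ** (i + 1))
            for i in range(num_attrs)}
```

For example, `attribute_values(8, 4)` gives `{'a': '0', 'b': '0', 'c': '0', 'd': '8'}`, matching element D with id 8 in the example.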

Additionally, everything is wrapped into a <DOC> element, and every element other than <DOC> has an attribute id that contains the element's preorder number (dmin). Numbering starts with 1. Text nodes occur only at leaf nodes. Every text node contains a string that starts and ends with '!'. In between, it lists all attributes of the corresponding leaf element as name-value pairs, each written as the attribute name immediately followed by its value (e.g. a1), with the pairs separated by '!'.
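
A minimal sketch of how such a text node can be built from an element's attributes, assuming the attributes are given in document order with id first (the function name is hypothetical):

```python
from typing import Mapping


def leaf_text(attrs: Mapping[str, str]) -> str:
    """Text content of a leaf element, e.g. "!id3!a1!b3!c3!d3!".

    Each attribute is written as its name immediately followed by its value;
    the pairs are separated by '!' and the string starts and ends with '!'.
    The mapping's iteration order is used as-is (id first, then a, b, ...).
    """
    return "!" + "!".join(name + value for name, value in attrs.items()) + "!"
```

Called with `{"id": "3", "a": "1", "b": "3", "c": "3", "d": "3"}`, it returns `!id3!a1!b3!c3!d3!`, the text of element A with id 3 in the example.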

An example document, generated by ./xdg -F -n 10 -d 3 -f 3, is:
<DOC>
  <A id="1" a="1" b="1" c="1" d="1">
    <B id="2" a="0" b="2" c="2" d="2">
      <A id="3" a="1" b="3" c="3" d="3">
      !id3!a1!b3!c3!d3!
      </A>
      <C id="4" a="0" b="0" c="4" d="4">
      !id4!a0!b0!c4!d4!
      </C>
      <A id="5" a="1" b="1" c="5" d="5">
      !id5!a1!b1!c5!d5!
      </A>
    </B>
    <B id="6" a="0" b="2" c="6" d="6">
      <A id="7" a="1" b="3" c="7" d="7">
      !id7!a1!b3!c7!d7!
      </A>
      <D id="8" a="0" b="0" c="0" d="8">
      !id8!a0!b0!c0!d8!
      </D>
      <A id="9" a="1" b="1" c="1" d="9">
      !id9!a1!b1!c1!d9!
      </A>
    </B>
    <B id="10" a="0" b="2" c="2" d="10">
    </B>
  </A>
</DOC>
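
To check a generated document against the properties described above, a small validator can be written with Python's standard xml.etree.ElementTree module. This is a hypothetical helper, not part of XDG: it checks the <DOC> root, preorder ids starting at 1, the documented attribute value ranges (a in 0..1, b in 0..3, ...), and that non-whitespace text appears only in leaf elements, in the '!'-separated format. It reports no problems for the example document above.

```python
import string
import xml.etree.ElementTree as ET
from typing import List


def check_xdg_document(xml_text: str) -> List[str]:
    """Return violations of the documented XDG properties (empty if none)."""
    problems: List[str] = []
    root = ET.fromstring(xml_text)
    if root.tag != "DOC":
        problems.append(f"root is <{root.tag}>, expected <DOC>")
    # Element.iter() yields elements in document order, i.e. preorder.
    # The first element is <DOC> itself, which carries no id.
    for expected_id, elem in enumerate(list(root.iter())[1:], start=1):
        if elem.get("id") != str(expected_id):
            problems.append(
                f"<{elem.tag}> has id {elem.get('id')}, expected {expected_id}")
        others = sorted(name for name in elem.attrib if name != "id")
        for name in others:
            # Attribute a holds 0..1, b holds 0..3, c holds 0..7, and so on.
            index = string.ascii_lowercase.find(name)
            value = elem.attrib[name]
            bad = (len(name) != 1 or index < 0 or not value.isdigit()
                   or int(value) >= 2 ** (index + 1))
            if bad:
                problems.append(f"id {expected_id}: bad attribute {name}={value!r}")
        text = (elem.text or "").strip()
        if text:
            if len(elem) > 0:
                problems.append(f"id {expected_id}: text inside a non-leaf element")
            # Expected form: !id<n>!a<v>!b<v>!...!
            pairs = [n + elem.attrib[n] for n in ["id"] + others if n in elem.attrib]
            expected = "!" + "!".join(pairs) + "!"
            if text != expected:
                problems.append(f"id {expected_id}: text {text!r}, expected {expected!r}")
    return problems
```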
 

Download

XDG 1.0 is available under the GNU Lesser General Public License (see License Agreement).

A tarball of the source code and a statically compiled binary for Linux are available for download.

Contact, Questions and Bug Reports

If you have questions related to the Natix installation, general comments and suggestions, or bug reports, feel free to join the Natix mailing list at http://pi3.informatik.uni-mannheim.de/mailman/listinfo/natixusers