Open Data Description Language (OpenDDL)

A general-purpose, human-readable, and strongly-typed data language for information exchange.

structure

structure ::= data-type ( name? "{" data-list? "}" | "[" integer-literal "]" name? "{" data-array-list? "}" )
            | identifier name? ( "(" ( property ( "," property )* )? ")" )? "{" structure* "}"

Figure 1. An OpenDDL file contains a sequence of structures that follow the production rule shown here.

Introduction

The Open Data Description Language (OpenDDL) is a generic text-based language that is designed to store arbitrary data in a concise human-readable format. It can be used as a means for easily exchanging information among many programs or simply as a method for storing a program's data in an editable format. One thing that sets OpenDDL apart from other data languages is the fact that each unit of data in an OpenDDL file has an explicitly specified type. This eliminates ambiguity and fragile inferencing practices that can impact the integrity of the data. This strong typing is further supported by the specification of an exact number of bits required to store numerical data values when converted to a binary representation.

The data structures in an OpenDDL file are organized as a collection of trees (also known as a forest). The language includes a built-in mechanism for making references from one data structure to any other data structure, effectively allowing the contents of a file to take the form of a directed graph.

As a foundation for higher-level data formats, OpenDDL is intended to be minimalistic. It assigns no meaning whatsoever to any data beyond its hierarchical organization, and it imposes no restrictions on the composition of data structures. Semantics and validation are left to be defined by specific higher-level formats derived from OpenDDL. The core language is designed to place as little burden as possible on readers so that it's easy to write programs that understand OpenDDL.

The OpenDDL syntax is illustrated in the “railroad diagrams” found throughout this document, and it is designed to feel familiar to C/C++ programmers. A significant feature of the language is that whitespace never has any meaning, so OpenDDL files can be formatted in any manner preferred.

The Language

An OpenDDL file is composed of a sequence of structures. A single structure consists of a type identifier followed by an optional name, an optional list of properties, and then its data payload enclosed in braces, as shown in Figure 1. There are two general types of structures: those with built-in types that contain primitive data such as integers or strings, and those that represent custom data structures defined by a derivative file format. As an example, suppose that a derivative file format defines a data type called Vertex that contains the 3D coordinates of a single vertex position. This could be written as follows.

Vertex
{
    float {1.0, 2.0, 3.0}
}

The Vertex identifier represents a custom data structure defined by the derivative file format, and it contains another structure of type float, which is a built-in primitive data type. The data in the float structure consists of the three values 1.0, 2.0, and 3.0. In general, raw data values in a primitive data structure are specified as a homogeneous comma-separated list of unbounded size, as shown in Figure 2.

The raw data inside a primitive data structure may also be specified as a comma-separated list of subarrays of values, as shown in Figure 3. The size of each subarray is specified by placing a positive integer value inside brackets immediately following the primitive type identifier, preceding the structure's name if it has one. Each value contained in the primitive data structure is then written as a comma-separated array of values enclosed in braces. As an example, suppose that a VertexArray structure expects to contain an array of 3D positions, each of which is specified as an array of three floating-point values. This would be written as follows.

VertexArray
{
    float[3]
    {
        {1.0, 2.0, 3.0},
        {0.5, 0.0, 0.5},
        {0.0, -1.0, 4.0}
    }
}

The number of elements in each subarray must always match the array size specified inside the brackets following the primitive type identifier. If the array size is one, then the braces are still required.
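The size rule above is simple for a reader to enforce once a payload has been parsed. The following Python sketch assumes the payload is already available as a list of lists; the function name `check_subarrays` is hypothetical and not part of any OpenDDL library.

```python
def check_subarrays(array_size, subarrays):
    """Return True if every subarray has exactly array_size elements."""
    if array_size < 1:
        raise ValueError("array size must be a positive integer")
    return all(len(sub) == array_size for sub in subarrays)


# The float[3] payload from the VertexArray example above.
payload = [[1.0, 2.0, 3.0], [0.5, 0.0, 0.5], [0.0, -1.0, 4.0]]
print(check_subarrays(3, payload))
```

A real reader would more likely reject the file at the first subarray of the wrong length rather than collect a single boolean at the end.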

Note that a reader would use its knowledge of the already-parsed data type to choose only a single rule in Figures 2 and 3, as opposed to allowing any of the types of data to appear inside the braces. (It is also not possible to disambiguate among the numerical data types without some extra information.) This restriction could be expressed in the grammar, but doing so would come at a significant cost in conciseness.

Figure 2. The data payload of a primitive data structure is a homogeneous array of values separated by commas.

data-array-list

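data-array-list ::= "{" bool-literal ( "," bool-literal )* "}" ( "," "{" bool-literal ( "," bool-literal )* "}" )*
                  | "{" integer-literal ( "," integer-literal )* "}" ( "," "{" integer-literal ( "," integer-literal )* "}" )*
                  | "{" float-literal ( "," float-literal )* "}" ( "," "{" float-literal ( "," float-literal )* "}" )*
                  | "{" string-literal ( "," string-literal )* "}" ( "," "{" string-literal ( "," string-literal )* "}" )*
                  | "{" reference ( "," reference )* "}" ( "," "{" reference ( "," reference )* "}" )*
                  | "{" data-type ( "," data-type )* "}" ( "," "{" data-type ( "," data-type )* "}" )*

Figure 3. A data payload may consist of an array of subarrays separated by commas. Each subarray contains a homogeneous array of values enclosed in braces.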

Primitive Data Types

OpenDDL defines the 15 primitive data types shown in Figure 4, and they are described in the following table.

Type             Description
bool             A boolean type that can have the value true or false.
int8             An 8-bit signed integer that can have values in the range [−128, 127].
int16            A 16-bit signed integer that can have values in the range [−32768, 32767].
int32            A 32-bit signed integer that can have values in the range [−2147483648, 2147483647].
int64            A 64-bit signed integer that can have values in the range [−9223372036854775808, 9223372036854775807].
unsigned_int8    An 8-bit unsigned integer that can have values in the range [0, 255].
unsigned_int16   A 16-bit unsigned integer that can have values in the range [0, 65535].
unsigned_int32   A 32-bit unsigned integer that can have values in the range [0, 4294967295].
unsigned_int64   A 64-bit unsigned integer that can have values in the range [0, 18446744073709551615].
half             A 16-bit floating-point type conforming to the standard S1E5M10 format.
float            A 32-bit floating-point type conforming to the standard S1E8M23 format.
double           A 64-bit floating-point type conforming to the standard S1E11M52 format.
string           A double-quoted character string with contents encoded in UTF-8.
ref              A sequence of structure names, or the keyword null.
type             A type whose values are identifiers naming types in the first column of this table.

When used as the identifier for a data structure, each entry in the above table indicates that the structure is a primitive structure and its data payload is composed of an array of literal values. Primitive structures cannot have substructures.

There is no implicit type conversion in OpenDDL. Data values belonging to a primitive structure must be parsable as literal values corresponding to the primitive data type.
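Because every integer type has a fixed bit width, a reader can derive the legal range of each type rather than hard-coding it. The Python sketch below illustrates this; the names `INTEGER_BITS`, `integer_range`, and `fits` are assumptions made for the example and do not come from the OpenDDL library.

```python
# Bit widths of the integer types listed in the table above.
INTEGER_BITS = {
    "int8": 8, "int16": 16, "int32": 32, "int64": 64,
    "unsigned_int8": 8, "unsigned_int16": 16,
    "unsigned_int32": 32, "unsigned_int64": 64,
}


def integer_range(type_name):
    """Return the inclusive (min, max) range of an OpenDDL integer type."""
    bits = INTEGER_BITS[type_name]
    if type_name.startswith("unsigned_"):
        return 0, 2**bits - 1
    return -(2**(bits - 1)), 2**(bits - 1) - 1


def fits(type_name, value):
    """Return True if value can be stored in the named integer type."""
    low, high = integer_range(type_name)
    return low <= value <= high
```

A reader could call `fits` on each parsed literal and reject the file when it returns False, which is one way to honor the rule that values must be parsable as literals of the declared type.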

The type data type is convenient for schemas built upon OpenDDL itself in order to define valid type usages in derivative file formats.

data-type

data-type ::= "bool" | "int8" | "int16" | "int32" | "int64" | "unsigned_int8" | "unsigned_int16" | "unsigned_int32" | "unsigned_int64" | "half" | "float" | "double" | "string" | "ref" | "type"

Figure 4. These are the 15 primitive data types defined in OpenDDL.

Identifiers

An identifier in OpenDDL is a sequence of characters composed from the set {A–Z, a–z, 0–9, _}, as shown in Figure 5. That is, an identifier is composed of uppercase and lowercase roman letters, the numbers 0–9, and the underscore. An identifier cannot begin with a number.

Identifiers are used to identify data structure types, structure names, and properties. The 15 primitive data types shown in Figure 4 are reserved as structure types, but they can still be used as structure names and property identifiers.

identifier

identifier ::= ( [A–Z] | [a–z] | "_" ) ( [0–9] | [A–Z] | [a–z] | "_" )*

Figure 5. An identifier is composed of uppercase and lowercase roman letters, the numbers 0–9, and the underscore.
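The identifier rule maps directly onto a regular expression. The following Python sketch is one way a reader might test a token; the helper name `is_identifier` is hypothetical.

```python
import re

# First character: a letter or underscore. Remaining characters: letters,
# digits, or underscores. The classes are spelled out so only ASCII matches.
IDENTIFIER = re.compile(r"[A-Za-z_][0-9A-Za-z_]*")


def is_identifier(text):
    """Return True if the whole string is a valid OpenDDL identifier."""
    return IDENTIFIER.fullmatch(text) is not None
```

Using `fullmatch` rather than `match` matters here, since a string such as `a-b` begins with a valid identifier but is not one as a whole.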

Names

Any structure in an OpenDDL file may have a name. Names are used to reference other structures from within primitive data structures or through property values. A name can be a global name or a local name. Each global name must be unique among all global names used inside the file containing it, and each local name must be unique among all local names used by its siblings in the structure tree. Local names can be reused inside different structures, and they can duplicate global names.

As shown in Figure 6, a name is composed of either a dollar sign character ($) or percent sign character (%) followed by an identifier with no intervening whitespace. A name that begins with a dollar sign is a global name, and a name that begins with a percent sign is a local name. A name is assigned to a structure by placing it right after the structure identifier (and no whitespace is technically required before the dollar sign), as in the following example.

Vertex $apex
{
    float {1.0, 2.0, 3.0}
}

This structure can be referenced from elsewhere in the file by using the name $apex.

name

name ::= ( "$" | "%" ) identifier

Figure 6. A name is composed of either a dollar sign character ($) or a percent sign character (%) followed by an identifier with no intervening whitespace.

References

A reference value is used to form a link to a specific structure within an OpenDDL file. If the target structure has a global name, then the value of a reference to it is simply the name of the structure, beginning with the dollar sign character. If the target structure has a local name, then the value of a reference to it depends on the scope in which the reference appears. If the reference appears in a structure that is a sibling of the target structure, then its value is the name of the target structure, beginning with the percent sign character. Otherwise, the value of the reference consists of a sequence of names, as shown in Figure 7, that identify a sequence of structures along a branch in the structure tree. Only the first name in the sequence can be a global name, and the rest must be local names.

The value of a reference can also be the keyword null to indicate that a reference has no target structure.

In the following example, where the Person, Name, and Friends data types are defined by some derivative format, references are used to link a data structure representing a person to the data structures representing his friends.

Person $chuck
{
    Name {string {"Charles"}}

    Friends
    {
        ref {$alice, $bob}
    }
}

Person $alice {...}
Person $bob {...}

reference

reference ::= name ( "%" identifier )* | "null"

Figure 7. A reference value is either the name of a structure or the keyword null. A structure may be identified by a sequence of names providing the path to the target along a branch of the structure tree.
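A reference can be split into the sequence of names that forms its path. The Python sketch below assumes that no whitespace appears between the names of a path and that the token has already been isolated from the surrounding text; `parse_reference` is a hypothetical helper, not a function of the OpenDDL library.

```python
import re

IDENT = r"[A-Za-z_][0-9A-Za-z_]*"
# One name (global or local) followed by zero or more local names.
REFERENCE = re.compile(rf"[$%]{IDENT}(?:%{IDENT})*")


def parse_reference(text):
    """Return None for null, otherwise the list of names forming the path."""
    if text == "null":
        return None
    if REFERENCE.fullmatch(text) is None:
        raise ValueError(f"not a valid reference: {text!r}")
    # Each path element keeps its leading $ or % so the scope stays visible.
    return re.findall(rf"[$%]{IDENT}", text)
```

The pattern only allows a dollar sign on the first name, which reflects the rule that every name after the first must be local. Resolving the path against the structure tree is left out of the sketch.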

Properties

A custom data structure in a derivative format may define one or more properties that can be specified separately from the data that the structure contains. Properties are written in a comma-separated list inside parentheses following the name of the structure (or just following the structure identifier if there is no name). As shown in Figure 8, each property is composed of a property identifier followed by an equals character (=) and the value of the property. The type of the property’s value must be specified by some external source of information such as a schema or program associated with a derivative format. For example, a string cannot be specified for a property that was expecting an integer. The specified type determines which rule in Figure 8 is applied, and a mismatch must be detected at the time that the property is parsed.

As an example, suppose that a data structure called Model defined a property called lod that takes an integer representing the level of detail to which its contents belong. This property would be specified as follows.

Model (lod = 2)
{
    ...
}

If another property called part existed and accepted a string (perhaps to identify a body part), then that property could be added to the list as follows.

Model (lod = 2, part = "Left Hand")
{
    ...
}

The order in which properties are listed is insignificant. Derivative file formats may require that certain properties always be specified. Optional properties must always have a default value or be specially handled as being in an unspecified state. The same property can be specified more than once in the same property list, and in such a case, all but the final value specified for the same property must be ignored.
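The rules for repeated and omitted properties can be sketched in a few lines of Python. The example assumes the property list has already been parsed into (identifier, value) pairs and that the derivative format supplies the defaults; `resolve_properties` and the default values shown are hypothetical.

```python
def resolve_properties(pairs, defaults):
    """Combine parsed (identifier, value) pairs with a format's defaults.

    A later occurrence of an identifier replaces any earlier one, so only
    the final value given for a repeated property is kept.
    """
    resolved = dict(defaults)
    for identifier, value in pairs:
        resolved[identifier] = value
    return resolved


# lod is specified twice, so the first value (1) is ignored.
pairs = [("lod", 1), ("part", "Left Hand"), ("lod", 2)]
print(resolve_properties(pairs, {"lod": 0, "part": ""}))
```

Storing the result in a dictionary also reflects the fact that the order of distinct properties carries no meaning.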

The syntax does not allow primitive data types to have a property list. (See Figure 1.)

Figure 8. A property is composed of an identifier followed by an equals character (=) and the value of the property.

Booleans

A boolean value is one of the keywords false or true, as shown in Figure 9.

bool-literal

bool-literal ::= "false" | "true"

Figure 9. A boolean value is either the keyword false or the keyword true.

Integers

The language allows integers to be specified as a decimal number, a hexadecimal number, an octal number, a binary number, or a single-quoted character literal.

Between any two consecutive digits of each type of integer literal, a single underscore character may be inserted as a separator to enhance readability. The presence of underscore characters and their positions have no significance, and they do not affect the value of a literal.

A decimal literal is simply composed of a sequence of numerical digits, as shown in Figure 10, and leading zeros are permitted.

A hexadecimal literal is specified by prefixing a number with 0x or 0X, as shown in Figure 12. This is followed, without any intervening whitespace, by any number of hexadecimal digits that don't cause the underlying integer type to overflow. As shown in Figure 11, the use of the letters A–F in a hexadecimal literal is not case sensitive.

An octal literal is specified by prefixing a number with 0o or 0O, as shown in Figure 13. This is followed, without any intervening whitespace, by any number of digits between 0 and 7, inclusive, that don't cause the underlying integer type to overflow.

A binary literal is specified by prefixing a number with 0b or 0B, as shown in Figure 14. This is followed, without any intervening whitespace, by any number of zeros and ones that don't cause the underlying integer type to overflow.

A character literal is specified by a sequence of printable ASCII characters enclosed in single quotes, as shown in Figure 16. OpenDDL supports the escape sequences listed in the following table and illustrated in Figure 15. Escape sequences may be used to generate control characters or arbitrary byte values. The single quote (') and backslash (\) characters cannot be represented directly and must be encoded with escape sequences. The \x escape sequence is always followed by exactly two hexadecimal digits. Each character (after resolving escape sequences) corresponds to exactly one byte in the resulting integer value, and the right-most character corresponds to the least significant byte.

Escape Sequence  ASCII Code  Description
\"               0x22        Double quote
\'               0x27        Single quote
\?               0x3F        Question mark
\\               0x5C        Backslash
\a               0x07        Bell
\b               0x08        Backspace
\f               0x0C        Formfeed
\n               0x0A        Newline
\r               0x0D        Carriage return
\t               0x09        Horizontal tab
\v               0x0B        Vertical tab
\xhh                         Byte value specified by the two hex digits hh

An integer literal is composed of an optional plus or minus sign followed by a decimal, hexadecimal, octal, binary, or character literal, as shown in Figure 17.

In the following example, the same integer value is repeated five times using different literal types.

unsigned_int32
{
    1094861636,
    0x41424344,
    0o10120441504,
    0b0100_0001_0100_0010_0100_0011_0100_0100,
    'ABCD'
}
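To make the equivalence of the five forms concrete, here is a Python sketch that converts each kind of integer literal to its value. It assumes well-formed input, so it does not validate underscore placement, check that digits suit the base, or test for overflow of the target type. The names `parse_integer` and `char_literal_value` are hypothetical.

```python
# Escape codes from the table above, mapped to their byte values.
ESCAPES = {'"': 0x22, "'": 0x27, "?": 0x3F, "\\": 0x5C, "a": 0x07, "b": 0x08,
           "f": 0x0C, "n": 0x0A, "r": 0x0D, "t": 0x09, "v": 0x0B}
PREFIXES = {"0x": 16, "0o": 8, "0b": 2}


def char_literal_value(body):
    """Convert the text between single quotes, one byte per character."""
    value, i = 0, 0
    while i < len(body):
        if body[i] == "\\" and body[i + 1] == "x":
            byte, i = int(body[i + 2:i + 4], 16), i + 4
        elif body[i] == "\\":
            byte, i = ESCAPES[body[i + 1]], i + 2
        else:
            byte, i = ord(body[i]), i + 1
        # Shifting first leaves the right-most character least significant.
        value = (value << 8) | byte
    return value


def parse_integer(text):
    """Return the value of a decimal, hex, octal, binary, or char literal."""
    negative = text.startswith("-")
    if text[:1] in ("+", "-"):
        text = text[1:]
    if text.startswith("'"):
        magnitude = char_literal_value(text[1:-1])
    else:
        digits = text.replace("_", "")  # underscores never affect the value
        base = PREFIXES.get(digits[:2].lower(), 10)
        magnitude = int(digits[2:] if base != 10 else digits, base)
    return -magnitude if negative else magnitude
```

Applied to the five literals in the example above, `parse_integer` yields 1094861636 each time.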

decimal-literal

decimal-literal ::= [0–9] ( "_"? [0–9] )*

Figure 10. A decimal literal is any sequence of numerical digits.

hex-digit

hex-digit ::= [0–9] | [A–F] | [a–f]

Figure 11. A hexadecimal digit is a numerical digit 0–9 or a letter A–F (with no regard for case).

hex-literal

hex-literal ::= ( "0x" | "0X" ) hex-digit ( "_"? hex-digit )*

Figure 12. A hexadecimal literal starts with 0x or 0X and continues with one or more hexadecimal digits.

octal-literal

octal-literal ::= ( "0o" | "0O" ) [0–7] ( "_"? [0–7] )*

Figure 13. An octal literal starts with 0o or 0O and continues with one or more octal digits.

binary-literal

binary-literal ::= ( "0b" | "0B" ) ( "0" | "1" ) ( "_"? ( "0" | "1" ) )*

Figure 14. A binary literal starts with 0b or 0B and continues with one or more binary digits.

escape-char

escape-char ::= '\"' | "\'" | "\?" | "\\" | "\a" | "\b" | "\f" | "\n" | "\r" | "\t" | "\v" | "\x" hex-digit hex-digit

Figure 15. An escape character consists of a backslash (\) followed by a single character code. In the case of the \x character code, the escape sequence includes exactly two additional hexadecimal digits.

char-literal

char-literal ::= "'" ( [U+0020–U+0026] | [U+0028–U+005B] | [U+005D–U+007E] | escape-char )+ "'"

Figure 16. A character literal is composed of a sequence of printable ASCII characters enclosed in single quotes. The single quote (') and backslash (\) characters cannot be represented directly and must be encoded with escape sequences.

Figure 17. An integer literal is composed of an optional sign followed by a decimal, hexadecimal, octal, binary, or character literal.

Floating-Point Numbers

The language allows floating-point numbers to be specified as a decimal number with or without a decimal point and fraction, and with or without a trailing exponent, as shown in Figure 18. When a fraction and/or exponent is present, the number format is the same as defined in C/C++. Floating-point numbers may also be specified as hexadecimal, octal, or binary literals representing the underlying bit pattern of the number. This is particularly useful for lossless exchange of floating-point data since round-off errors possible in the conversion to and from a decimal representation are avoided. Using a hexadecimal, octal, or binary representation is also the only way to specify a floating-point infinity or not-a-number (NaN) value.

As with integer literals, an underscore character may be inserted between any two consecutive numerical digits in a floating-point literal to enhance readability. Underscore characters are ignored and do not affect the value of a literal.

float-literal

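float-literal ::= ( "+" | "-" )? ( ( [0–9] ( "_"? [0–9] )* ( "." ( [0–9] ( "_"? [0–9] )* )? )? | "." [0–9] ( "_"? [0–9] )* ) ( ( "e" | "E" ) ( "+" | "-" )? [0–9] ( "_"? [0–9] )* )?
                                 | hex-literal | octal-literal | binary-literal )

Figure 18. A floating-point literal is composed of an optional sign followed by a number with or without a decimal point and an optional exponent. Hexadecimal, octal, and binary literals representing the underlying bit pattern are also accepted.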
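Interpreting a hexadecimal, octal, or binary literal as a floating-point value amounts to reinterpreting the integer's bits at the width of the enclosing type. The Python sketch below does this with the standard `struct` module; the `FORMATS` table and the function name `float_from_bits` are assumptions made for the example.

```python
import math
import struct

# struct codes for the unsigned integer and the float of the same width.
FORMATS = {"half": (">H", ">e"), "float": (">I", ">f"), "double": (">Q", ">d")}


def float_from_bits(type_name, bits):
    """Reinterpret an integer bit pattern as a value of the given float type."""
    int_format, float_format = FORMATS[type_name]
    return struct.unpack(float_format, struct.pack(int_format, bits))[0]


print(float_from_bits("float", 0x3F800000))             # 1.0
print(float_from_bits("float", 0x7F800000))             # inf
print(math.isnan(float_from_bits("float", 0x7FC00000)))  # True
```

The last two lines show values that have no decimal spelling in OpenDDL, which is why the bit-pattern form is the only way to write them.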

Strings

Strings in OpenDDL are composed of a sequence of characters enclosed in double quotes, as shown in Figure 19. Unicode values (encoded as UTF-8) in the following ranges may be directly included in a string literal:

[U+0020–U+0021]
[U+0023–U+005B]
[U+005D–U+007E]
[U+00A0–U+D7FF]
[U+E000–U+FFFD]
[U+010000–U+10FFFF]

This is the only place where non-ASCII characters are allowed other than in comments.

A string may contain the escape sequences defined for character literals (see Figure 15). The double quote (") and backslash (\) characters cannot be represented directly and must be encoded with escape sequences. String literals also support the \u escape sequence, which specifies a nonzero Unicode character using exactly four hexadecimal digits immediately following the u. In order to support Unicode characters outside the Basic Multilingual Plane (BMP), a six-digit code can be specified by using an uppercase U. The \U escape sequence must be followed by exactly six hexadecimal digits that specify a value in the range [0x000001, 0x10FFFF].

Multiple string literals may be placed adjacent to each other with or without intervening whitespace, and this results in concatenation.

string-literal

string-literal ::= ( '"' ( [U+0020–U+0021] | [U+0023–U+005B] | [U+005D–U+007E] | [U+00A0–U+D7FF] | [U+E000–U+FFFD] | [U+010000–U+10FFFF]
                         | escape-char
                         | "\u" hex-digit hex-digit hex-digit hex-digit
                         | "\U" hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit )* '"' )+

Figure 19. A string literal is composed of a sequence of Unicode characters enclosed in double quotes. The double-quote ("), backslash (\), and non-printing control characters are excluded from the set of characters that can be directly represented. A string may contain the same escape characters as a character literal as well as additional Unicode escape sequences. Adjacent strings are concatenated.
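The escape rules for strings can be sketched as a small decoder. The Python example below works on the text between the double quotes and returns UTF-8 bytes, on the assumption that a \x escape contributes one raw byte while \u and \U contribute the UTF-8 encoding of a code point. The names `unescape` and `concatenate` are hypothetical, and the sketch does not check that directly written characters fall within the permitted ranges.

```python
import re

# Single-character escape codes mapped to their byte values.
SIMPLE = {'"': 0x22, "'": 0x27, "?": 0x3F, "\\": 0x5C, "a": 0x07, "b": 0x08,
          "f": 0x0C, "n": 0x0A, "r": 0x0D, "t": 0x09, "v": 0x0B}

TOKEN = re.compile(
    r"\\U([0-9A-Fa-f]{6})|\\u([0-9A-Fa-f]{4})|\\x([0-9A-Fa-f]{2})|\\(.)|([^\\])",
    re.DOTALL)


def unescape(body):
    """Return the UTF-8 bytes denoted by the text between the double quotes."""
    out = bytearray()
    pos = 0
    while pos < len(body):
        m = TOKEN.match(body, pos)
        if m is None:
            raise ValueError("backslash at end of string")
        six, four, two, simple, plain = m.groups()
        if six is not None or four is not None:
            code = int(six if six is not None else four, 16)
            if not 1 <= code <= 0x10FFFF:
                raise ValueError("Unicode escape out of range")
            out += chr(code).encode("utf-8")
        elif two is not None:
            out.append(int(two, 16))
        elif simple is not None:
            if simple not in SIMPLE:
                raise ValueError(f"unknown escape sequence \\{simple}")
            out.append(SIMPLE[simple])
        else:
            out += plain.encode("utf-8")
        pos = m.end()
    return bytes(out)


def concatenate(bodies):
    """Join the decoded contents of adjacent string literals."""
    return b"".join(unescape(body) for body in bodies)
```

For example, `concatenate(["Left ", "Hand"])` produces the same bytes as the single literal "Left Hand".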

Comments and Whitespace

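The language supports C++-style block comments and single-line comments as follows:

Any occurrence of /* begins a comment that ends at the next occurrence of */. Block comments do not nest.
Any occurrence of // begins a comment that ends at the next newline character.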

If any sequence /*, */, or // appears inside a character literal or string literal, then it is part of the literal value and not treated as a comment.

Comments may include any Unicode characters encoded as UTF-8. The only other place where non-ASCII characters are allowed is inside a string literal (see Figure 19).

All characters having a value in the range [1, 32] (which includes the space, tab, newline, and carriage return characters), as well as all characters belonging to comments, are considered to be whitespace in OpenDDL. Any arbitrarily long contiguous sequence of whitespace characters is equivalent to a single space character.
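Since comments count as whitespace, a reader can replace each one with a single space before tokenizing, provided it skips over string and character literals. The Python sketch below assumes that block comments run from /* to the next */ without nesting and that line comments run from // to the end of the line, as in C++. The function name `strip_comments` is hypothetical.

```python
def strip_comments(text):
    """Replace each comment with one space, leaving literals untouched."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in "\"'":
            # Copy a string or character literal verbatim. A backslash
            # always escapes the character that follows it.
            j = i + 1
            while j < n and text[j] != ch:
                j += 2 if text[j] == "\\" else 1
            out.append(text[i:j + 1])
            i = j + 1
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end < 0:
                raise ValueError("unterminated block comment")
            out.append(" ")
            i = end + 2
        elif text.startswith("//", i):
            end = text.find("\n", i)
            out.append(" ")
            i = n if end < 0 else end  # the newline itself is kept
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Replacing a comment with a space rather than deleting it keeps neighboring tokens apart, which matches the rule that any run of whitespace is equivalent to a single space.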

OpenDDL Library

A free OpenDDL parser can be downloaded through the following link. See the software license in the source code. This package includes project files for Visual Studio 2013, and the solution file can be found in the OpenDDL-Parser/OpenDDL/ directory. The parser is built as the static library OpenDDL.lib, and a minimal example is included to demonstrate the simplest usage. Class and function documentation can be found in the header files as well as on the C4 Engine website.

Download OpenDDL Library

An example usage that is much more complex can be found in the OpenGEX Import Template, available on opengex.org.

Revision History

Version 1.1. 17-Nov-2014

Version 1.0. 24-Sep-2013

About

OpenDDL was created by Eric Lengyel, and the first parser was implemented in C4 Engine version 3.5. OpenDDL arose during the development of the Open Game Engine Exchange (OpenGEX) format.

The railroad diagrams on this page were generated with the Railroad Diagram Generator using this grammar and then manually tweaked.

Copyright © 2013–2014 by Eric Lengyel.

The OpenDDL specification and the images on this page by Eric Lengyel are licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Based on a work at http://openddl.org/.