=====================
Formats and Languages
=====================

The formats and languages used by Tangible Bit are explained in English and, in some cases, formally defined in EBNF.

EBNF Grammar
~~~~~~~~~~~~

We use the EBNF notation as used in the `Go Language Specification `_, which is almost identical to a subset of standard ISO EBNF. Additionally, we use the “except” construct from ISO EBNF:

``A - B``
    Matches any string that matches ``A`` but does not match ``B``.

We use ``#xN``, where ``N`` is a hexadecimal integer, to match the character whose Unicode code point is ``N``.

Predefined Rules
~~~~~~~~~~~~~~~~

The following EBNF rules are assumed to be predefined and are not formally defined in this document:

Char
    Any Unicode character.

UnicodeLetter
    A character classified as “Letter” in the Unicode database.

UnicodeDigit
    A character classified as “Digit” in the Unicode database.

Polyps
~~~~~~

Here’s the EBNF of the *Polyps* language:

.. include:: ../engines/polyps/polyps.ebnf
   :literal:

TODO: The ``unicode_letter`` and ``unicode_digit`` rules should really match Unicode letters and digits (``UnicodeLetter`` and ``UnicodeDigit`` as defined above), not just ASCII characters.

Tangible Data Format (TDF)
~~~~~~~~~~~~~~~~~~~~~~~~~~

TDF is used as a general-purpose data serialization format for object definitions and representations (and possibly for other purposes). Its data model is roughly equivalent to JSON, but TDF is meant to be easy for humans to read and write, as well as easy for computers to parse and serialize. TDF is a superset of the popular RFC 5322 “name: value” format, adding support for lists and nested content as well as Windows-INI-style “[section]” blocks (as syntactic sugar).

* There are two types of *values*: *compounds* and *atoms*.
* There are two types of *compounds*: *lists* and *maps*.
* There are two types of *lists*: *block lists* and *inline lists*.
* *Block lists* contain one or more *block list items*.
  (Note: the *empty list*, i.e. a *list* with zero items, must be written as an *inline list*, since *block lists* must contain at least one item.)
* *Block list items* are either *simple items* or *complex items*. If a *block list item* spans multiple lines, all subsequent lines must be *deeper indented* than the first one. All *block list items* in a *block list* must start at the same *indentation level* (they must be preceded by the same sequence of *inline whitespace*).
* *Simple items* are introduced by a hyphen (-), followed by *whitespace* and an *atom*.
* *Complex items* are introduced by a plus sign (+), followed by *whitespace* and a *compound*.
* *Inline lists* are enclosed in curly brackets (“{…}”) and contain zero or more *inline list items*.
* *Inline list items* are *atoms* or *inline lists* (*inline lists* can contain nested *inline lists*, but they cannot contain *block lists* or *maps*).
* Successive *inline list items* are separated by a comma (,).
* *Maps* contain zero or more *pairs*. All *pairs* in a *map* must start at the same *indentation level* (they must be preceded by the same sequence of *inline whitespace*).
* *Pairs* are either *simple pairs* or *complex pairs*. If a *pair* spans multiple lines, all subsequent lines must be *deeper indented* than the first one.
* *Simple pairs* contain a *key* followed by an *atom*, separated by a colon (:) followed by *whitespace*.
* *Complex pairs* contain a *key* followed by a *compound*, separated by two colons (::) followed by *whitespace*.
* *Keys* are *atoms*. The *keys* of all *pairs* in a *map* must be distinct: no *map* may contain the same *key* more than once. *Integer* and *float* representations of the same numeric value are considered identical (a *map* cannot contain both the *keys* “3” and “3.0”).
* There are three types of *atoms*: *strings*, *numbers*, and *literals*.
* There are three *literals* (all written without the surrounding quotes): “true” and “false” (representing the two Boolean values), and “null” (representing the null/nil/nothing value).
* There are two types of *numbers*: *integers* and *floats*. *Integers* are written like decimal integers in Polyps (cf. the ``decimal_lit`` rule above; octal and hexadecimal notations are not supported); *floats* are written as in Polyps (cf. the ``float_lit`` rule above).
* *Strings* are sequences of zero or more Unicode characters. If a *string* looks like an *atom* of another kind (a *number* or a *literal*), or if it starts with an opening bracket ([), an opening curly bracket ({), or *whitespace*, its first character must be *backslash-escaped*. If a *string* ends with *whitespace*, its last character must be *backslash-escaped*. If a *string* ends in a single backslash followed by a *linebreak* or the end of the document, that counts as a *backslash-escaped* *linebreak* at the end of the *string* (there is no need to add an extra *linebreak* explicitly). A plus or minus sign (+ or -) at the start of a *string* must be *backslash-escaped* if it is followed by *whitespace* (to prevent confusion with *list items*). The hash sign (#) must be *backslash-escaped* at the start of a *string* or wherever it is preceded by *whitespace* (to prevent confusion with *comments*). Colons (:) must be *backslash-escaped* within *keys*; commas (,) and curly brackets ({}) must be *backslash-escaped* within *inline list items*. Backslashes (\\) must be *backslash-escaped* within all *strings*. *Backslash-escaping* other characters is allowed, but not required.
* Characters are *backslash-escaped* by writing a backslash (\\) in front of the character.
* *Complex maps* are *maps* that contain only *complex pairs* (i.e. *keys* followed by *compounds*).
  There is an alternative syntax for serializing all the *complex pairs* of a *complex map*, provided that none of the *compounds* in the *complex pairs* is the *empty list* (a *list* with zero items): each *key* can be written as a *bracket key* on a line of its own, enclosed in brackets (“[…]”), followed by its *compound* on the subsequent lines. In this case, the *compound* must start at the same *indentation level* as the preceding *bracket key* (it must NOT be *deeper indented*). The *compounds* themselves must NOT use this alternative syntax, even if they are *complex maps*, since it would then be impossible to tell *bracket keys* belonging to the inner *complex map* from those belonging to the outer one. *Compounds* that are *lists* must be serialized as *block lists*, not as *inline lists*, since otherwise they could easily be confused with *bracket keys*. This is also why *lists* with zero items are forbidden in this alternative syntax: *block lists* must always have at least one item. (This alternative syntax is similar to the syntax of Windows “INI” files and Python’s “configparser” module.)
* *Whitespace* is any combination of *inline whitespace*, *linebreaks*, and the vertical tab (#x0B) and form feed (#x0C) characters.
* *Inline whitespace* is any combination of space (#x20) and tab (#x09) characters.
* There are three *linebreaks*: LF alone (#x0A, Unix style), CR alone (#x0D, old Mac style), and CR followed by LF (Windows style).
* Indentation expresses nesting. The sequence of *inline whitespace* between the start of a line and a *value* determines the *indentation level* of that *value*. Other *values* start at the same *indentation level* if they are preceded by the same sequence of *inline whitespace* at the start of their line; they are *deeper indented* if they are preceded by the same sequence of *inline whitespace* followed by additional *inline whitespace*.
  (Note that spaces and tabs are not interchangeable. If one *value* is preceded by, say, 8 spaces and another one by 1 tab, they do not start at the same *indentation level*, since the sequences of *inline whitespace* preceding them differ; nor is one of them *deeper indented* than the other, since neither whitespace sequence is a prefix of the other.)
* *Comments* are introduced by a hash sign (#) and extend to the end of the line. *Comments* can occur anywhere, but they must always be preceded by *whitespace* (except for a *comment* at the very start of the file). All *comments*, together with any *whitespace* preceding them on the same line, are discarded when reading TDF documents.
* A TDF *document* is a *compound*.

For examples, see the ``.tt`` files in the ``data/types`` directory and the ``.tb`` files in the ``sample`` directory.

Tangible Bit Query Language (TQL)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Used to query the TBit database for objects, locations, and other items it contains. Here’s the EBNF:

.. include:: querylanguage.ebnf
   :literal:

TODO:

* Calculations: evaluate to an *atom* using the standard arithmetic operators (+, -, \*, /, %). Can contain *atoms*. Grouping is used where necessary to resolve ambiguities.
* Conditions: evaluate to a Boolean using standard Boolean syntax: logical connectives (! & | ^), relational operators (== != < > <= >=), and set operators (in ~). Can contain calculations, *atoms*, and *atom* lists. Grouping is used where necessary to resolve ambiguities.

Tangible Bit Declaration Language (TDL)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Used to declare the properties of objects and processes.

TODO: describe.

Tangible Bit Type Definitions (TTD)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TODO: describe the interpretation of TTD: object and complex type declarations, with ``uri`` and ``desc`` fields and four keyword blocks, ``req``, ``opt``, ``reqlist``, and ``optlist`` (all optional, in any order). Describe the variant used for ``basetypes.tt`` (basic types).

Possibly define an additional *optlist* field for child categories: ``parent`` contains the URI of the parent category or categories (but in the base types, we just specify the parent type in parentheses after the name). Probably unify name and URI and allow declaring a ``@prefix`` field for relative URIs (cf. the prefixes used in N3/RDF).

Each file should contain the declaration of an object category as well as any complex types used in its definition.
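
The TTD notes above are still incomplete. As a rough illustration of the listed constraints, here is a Python sketch that checks a single declaration, assuming it has already been parsed from TDF into a mapping. The function ``check_declaration`` and its messages are hypothetical and not part of Tangible Bit. Treating ``uri`` and ``desc`` as required, rejecting any other top-level keys, and requiring each keyword block to be a map are likewise assumptions.

.. code-block:: python

   from __future__ import annotations

   from collections.abc import Mapping
   from typing import Any

   # Hypothetical constants: the names come from the TTD notes; treating
   # "uri" and "desc" as mandatory is an assumption.
   REQUIRED_FIELDS = {"uri", "desc"}
   KEYWORD_BLOCKS = {"req", "opt", "reqlist", "optlist"}


   def check_declaration(decl: Mapping[str, Any]) -> list[str]:
       """Return the problems found in one parsed TTD declaration.

       ``decl`` is assumed to be the TDF map of a single declaration,
       already parsed into a Python mapping. An empty list means that
       no problems were found.
       """
       keys = set(decl)
       problems = []
       # "uri" and "desc" are assumed to be required.
       for name in sorted(REQUIRED_FIELDS - keys):
           problems.append(f"missing field: {name}")
       # Any other top-level key is rejected (an assumption; the notes
       # do not say whether extra keys are allowed).
       for name in sorted(keys - REQUIRED_FIELDS - KEYWORD_BLOCKS):
           problems.append(f"unknown key: {name}")
       # Each keyword block is optional; if present, it is assumed to map
       # field names to types. Since only key lookups are used, the order
       # of the blocks does not matter.
       for name in sorted(keys & KEYWORD_BLOCKS):
           if not isinstance(decl[name], Mapping):
               problems.append(f"keyword block {name} must be a map")
       return problems

Returning a list of messages, rather than raising an exception at the first error, reports all problems in a declaration at once.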