=====================
Formats and Languages
=====================

The formats and languages used by Tangible Bit are explained in English and
(in some cases) formally defined in EBNF.

EBNF Grammar
~~~~~~~~~~~~

We use the EBNF notation as used in the `Go Language Specification
<http://golang.org/doc/go_spec.html#Notation>`_, which is almost identical
to a subset of standard ISO EBNF.

Additionally, we use the “except” construct from ISO EBNF:

A - B
  Matches any string that matches A but does not match B.

We use ``#xN``, where N is a hexadecimal integer, to match the character
whose Unicode code point is N.

Predefined Rules
~~~~~~~~~~~~~~~~

The following EBNF rules are assumed to be predefined and not formally
defined in this document:

Char
  Any Unicode character.

UnicodeLetter
  A character classified as “Letter” in the Unicode database.

UnicodeDigit
  A character classified as “Digit” in the Unicode database.

Polyps
~~~~~~

Here’s the EBNF of the *Polyps* language:

.. include:: ../engines/polyps/polyps.ebnf
   :literal:

TODO unicode_letter and unicode_digit should really be Unicode letters (as
defined above), not just ASCII letters.

Tangible Data Format (TDF)
~~~~~~~~~~~~~~~~~~~~~~~~~~

TDF is used as a general-purpose data serialization format for object
definitions and representations (and possibly other purposes). The data
model is roughly equivalent to JSON, but TDF is meant to be easily readable
and writable by humans, as well as being easily parsable and serializable
by computers. TDF is a superset of the popular RFC 5322 “name: value”
format, adding support for lists and nested content as well as for
Windows-INI-style “[section]” blocks (as syntactic sugar).

* There are two types of *values*: *compounds* and *atoms*.
* There are two types of *compounds*: *lists* and *maps*.
* There are two types of *lists*: *block lists* and *inline lists*.
* *Block lists* contain one or more *block list items*. (Note: the *empty
  list*\ —a *list* with zero items—must be written as an *inline list*,
  since *block lists* must contain at least one item.)
* *Block list items* are either *simple items* or *complex items*. If a
  *block list item* spans multiple lines, all subsequent lines must be
  *deeper indented* than the first one. All *block list items* in a *block
  list* must start at the same *indentation level* (they must be preceded
  by the same amount of *inline whitespace*).
* *Simple items* are introduced by a hyphen (-), followed by *whitespace*
  and an *atom*.
* *Complex items* are introduced by a plus sign (+), followed by
  *whitespace* and a *compound*.
* *Inline lists* are enclosed in curly brackets (“{…}”) and contain zero
  or more *inline list items*.
* *Inline list items* are *atoms* or *inline lists* (*inline lists* can
  contain nested *inline lists*, but they cannot contain *block lists* or
  *maps*).
* Successive *inline list items* are separated by a comma (,).
* *Maps* contain zero or more *pairs*. All *pairs* in a *map* must start at
  the same *indentation level* (they must be preceded by the same amount of
  *inline whitespace*).
* *Pairs* are either *simple pairs* or *complex pairs*. If a *pair* spans
  multiple lines, all subsequent lines must be *deeper indented* than the
  first one.
* *Simple pairs* contain a *key* followed by an *atom*, separated by a
  colon (:) followed by *whitespace*.
* *Complex pairs* contain a *key* followed by a *compound*, separated by
  two colons (::) followed by *whitespace*.
* *Keys* are *atoms*. The *keys* of all *pairs* in a *map* must be
  distinct—no *map* may contain the same *key* several times. *Integer*
  and *float* representations of the same numeric value are considered
  identical (a *map* cannot contain the *keys* “3” and “3.0” at the same
  time).
* There are three types of *atoms*: *strings*, *numbers* and *literals*.
* There are three *literals* (all written without the surrounding quotes):
  “true”, “false” (representing the two Boolean values), and “null”
  (representing the null/nil/nothing value).
* There are two types of *numbers*: *integers* and *floats*. *Integers* are
  written like decimal integers in Polyps (cf. the ``decimal_lit`` rule
  above—octal and hexadecimal notations are not supported); *floats* are
  written as in Polyps (cf. the ``float_lit`` rule above).
* *Strings* are sequences of zero or more Unicode characters. If a
  *string* looks like an *atom* of another kind (a *number* or *literal*)
  or if it starts with an opening bracket ([) or curly bracket ({) or with
  *whitespace*, its first character must be *backslash-escaped*. If a
  *string* ends with *whitespace*, its last character must be
  *backslash-escaped*\ —if a string ends in a single backslash, followed by
  a *linebreak* or the end of the document, that counts as a
  *backslash-escaped* *linebreak* at the end of the string (it’s not
  necessary to explicitly add an extra *linebreak*). Plus or minus signs
  (+-) at the start of a string must be *backslash-escaped* if followed by
  *whitespace* (to prevent confusion with *lists*). The hash sign (#) must
  be escaped at the start of the string or if preceded by *whitespace* (to
  prevent confusion with *comments*). Colons (:) must be
  *backslash-escaped* within *keys*; commas (,) and curly brackets ({})
  must be *backslash-escaped* within *inline list items*. Backslashes (\\)
  must be *backslash-escaped* within all *strings*. *Backslash-escaping*
  other characters is allowed, but not required.
* Characters are *backslash-escaped* by writing a backslash (\\) in front
  of the character.
* *Complex maps* are *maps* that contain only *complex pairs* (i.e. *keys*
  followed by *compounds*). There is an alternative syntax for serializing
  all the *complex pairs* of a *complex map*, provided that none of the
  *compounds* in the *complex pairs* is the *empty list* (a *list* with
  zero items): the *key* can be written as a *bracket key* on a line of its
  own, enclosed in brackets (“[…]”) and followed by the *compound* in
  subsequent lines. In this case, the *compound* must start at the same
  *indentation level* as the preceding *bracket key* (it must NOT be
  *deeper indented*). The *compounds* themselves must NOT use this
  alternative syntax for *complex maps*, even if they are *complex maps*
  (since it would be impossible to distinguish *bracket keys* belonging to
  the inner from those belonging to the outer *complex map*). *Compounds*
  that are *lists* must be serialized as *block lists*, not as *inline
  lists* (otherwise they could be easily confused with *bracket
  keys*\ —that’s the reason that *lists* with zero items are forbidden in
  this alternative syntax: *block lists* must always have at least one
  item). (This alternative syntax is similar to the syntax of Windows “INI”
  files and Python’s “configparser” module.)
* *Whitespace* is any combination of *inline whitespace*, *linebreaks*, and
  the vertical tab (#x0B) and form feed (#x0C) characters.
* *Inline whitespace* is any combination of space (#x20) and tab (#x09)
  characters.
* There are three *linebreaks*: just LF (#x0A, Unix style), just CR (#x0D,
  old Mac style), and CR followed by LF (Windows style).
* Indentation expresses nesting. The sequence of *inline whitespace*
  between the start of a line and a *value* determines the *indentation
  level* of the *value*. Other *values* start at the same *indentation
  level* if they are separated by the same sequence of *inline whitespace*
  from the start of the line; they are *deeper indented* if they are
  separated from the start of the line by the same sequence of *inline
  whitespace*, followed by additional *inline whitespace*. (Note that it’s
  not possible to exchange spaces for tabs or vice versa. If one *value* is
  preceded by, say, 8 spaces, and another one by 1 tab, they won’t start at
  the same *indentation level*, since the sequences of *inline whitespace*
  preceding them differ; neither is one of them *deeper indented* than the
  other, since neither whitespace sequence is a prefix of the other one.)
* *Comments* are introduced by a hash sign (#) and extend until the end of
  the line. *Comments* can occur anywhere, but they must always be preceded
  by *whitespace* (except for a *comment* that occurs at the very start of
  the file). All *comments* and all *whitespace* preceding them on the same
  line are discarded when reading TDF documents.
* A TDF *document* is a *compound*.
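
The indentation rule above compares sequences of *inline whitespace* by
prefix rather than by visual width. A minimal Python sketch of that
comparison follows; the function names ``inline_prefix`` and
``compare_indent`` are hypothetical and not part of any Tangible Bit code:

```python
def inline_prefix(line: str) -> str:
    """Return the leading run of inline whitespace (spaces and tabs) of a line."""
    i = 0
    while i < len(line) and line[i] in " \t":
        i += 1
    return line[:i]


def compare_indent(a: str, b: str) -> str:
    """Compare two inline-whitespace prefixes as described in the text.

    Returns "same" if both are identical, "deeper" if b extends a,
    "shallower" if a extends b, and "unrelated" if neither is a prefix
    of the other (for example 8 spaces versus 1 tab).
    """
    if a == b:
        return "same"
    if b.startswith(a):
        return "deeper"
    if a.startswith(b):
        return "shallower"
    return "unrelated"
```

With this sketch, ``compare_indent(" " * 8, "\t")`` yields ``"unrelated"``,
which mirrors the note that spaces and tabs cannot be exchanged.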

For examples, see the ``.tt`` files in the ``data/types`` directory and the
``.tb`` files in the ``sample`` directory.
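
The *backslash-escaping* rule for *strings* described above (a backslash
makes the following character literal) can be sketched in Python as shown
here. This is only an illustration under stated assumptions: the function
name ``unescape`` is hypothetical, the input is assumed to be the raw text
of a single *string* with comments and surrounding whitespace already
removed, and a trailing escaped *linebreak* is assumed to be represented
as LF:

```python
def unescape(raw: str) -> str:
    """Remove backslash-escaping from the raw text of a TDF string."""
    out = []
    chars = iter(raw)
    for ch in chars:
        if ch == "\\":
            # The character after a backslash is taken literally. A single
            # backslash at the very end counts as an escaped linebreak;
            # using LF for it is an assumption of this sketch.
            out.append(next(chars, "\n"))
        else:
            out.append(ch)
    return "".join(out)
```

For instance, the raw text ``\true`` would be read as the *string* "true"
rather than as the Boolean *literal*.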

Tangible Bit Query Language (TQL)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Used to query for objects, locations and other items contained in the TBit
database. Here’s the EBNF:

.. include:: querylanguage.ebnf
   :literal:

* TODO Calculations: evaluate to an atom using standard arithmetic
  operators (+, -, \*, /, %). Can contain atoms. Grouping is used where
  necessary to resolve ambiguities.
* Conditions: evaluate to a Boolean using standard Boolean syntax: logical
  connectives (! & | ^), relational operators (== != < > <= >=), set
  operators (in ~). Can contain calculations and atoms and atom lists.
  Grouping is used where necessary to resolve ambiguities.

Tangible Bit Declaration Language (TDL)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Used to declare the properties of objects and processes. TODO describe

Tangible Bit Type Definitions (TTD)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TODO describe interpretation of TTD; object and complex type declarations:
uri and desc fields, 4 keyword blocks: req, opt, reqlist, optlist (all
optional, order doesn’t matter); describe variant used for ``basetypes.tt``
(basic types). Possibly define additional *optlist* field for child
categories: ``parent`` contains the URI of the parent category/categories
(but in the base types, we just specify the parent type in parentheses
after the name)—probably unify name and URI and allow declaring a
``@prefix`` field for relative URIs (cf. prefixes used in N3/RDF). Each
file should contain the declaration of an object category as well as any
complex types used in its definition.