cad/src/files/mmp/mmpformat_versions.py


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418

# Copyright 2005-2008 Nanorex, Inc.  See LICENSE file for details.
"""
mmpformat_versions.py -- list of all mmpformat versions ever used, and
related parsing and testing code, and definitions of currently written
versions.

@author: Bruce
@version: $Id$
@copyright: 2005-2008 Nanorex, Inc.  See LICENSE file for details.

History:

bruce 080328 split this out of files_mmp_writing.py, which was earlier
split out of files_mmp.py, which may have been called fileIO.py when
this list was first started.

bruce 080328 added reader code (not in this file) to warn if the mmpformat
being read appears to be too new.

bruce 080410 added a long string comment about the mmpformat situation
for the upcoming release of NE1 1.0.0.

TODO:

- fix the logic bug in "info leaf", described below.
"""

# ==

# imports are lower down

_RAW_LIST_OF_KNOWN_MMPFORMAT_VERSIONS = \
"""
Notes by bruce 050217 about mmp file format version strings,
and a list of all that have been used so far.

Developers -- please maintain this list!

For general notes about when and how to change the mmp format version,
see a separate file, files_mmp_format_version.txt,
which may be in the same directory as this source file.

WARNING: THIS LONG STRING LITERAL IS PARSED TO OBTAIN THE LIST
OF KNOWN VERSIONS. Any line in it which starts with "'" (a single quote)
must look like the lines below which present an mmpformat version string
that has been used. The testing code in this module extracts this list
and makes sure each version string can be parsed by the mmp reader's
code for doing that, and makes sure they are presented here in
chronological order, and fit other rules described with that code.

===

There was no mmpformat record before 050130 (shortly before Alpha-1 was
released), though the format had several versions before then, not all of
which were upward-compatible.

At some points, new elements were added, but this was not listed here
as a new required format (though it should have been),
since it was not noticed that this broke the reading of new files by
old code due to exceptions for unrecognized elements.

===

'050130' -- the mmpformat record, using this format-version "050130",
was introduced just before Alpha-1 release, at or shortly after
the format was changed so that two (rather than one) NamedView (Csys) records
were stored, one for Home View and one for Last View

'050130 required; 050217 optional' -- introduced by bruce on 050217,
when the info record was added, for info chunk hotspot.
(The optional part needs incrementing whenever more kinds of info records
are interpretable, at least once per "release".)

'050130 required; 050421 optional' -- bruce, adding new info records,
namely "info leaf hidden" and "info opengroup open";
and adding "per-part views" in the initial data group,
whose names are HomeView%d and LastView%d. All these changes are
backward-compatible -- old code will ignore the new records.

'050130 required; 050422 optional' -- bruce, adding forward_ref,
info leaf forwarded, and info leaf disabled.

'050502 required' -- bruce, now writing bond2, bond3, bonda, bondg
for higher-valence bonds appearing in the model (if any). (The code
that actually writes these is not in this file.)

Actually, "required" is conservative -- these are only "required" if
higher-valence bonds are present in the model being written.

Unfortunately, we don't yet have any way to say that to old code reading the file.
(This would require declaring these new bond records in the file, using a "declare"
record known by older reading-code, and telling it (as part of the declaration,
something formal that meant) "if you see these new record types and don't understand
them, then you miss some essential bond info of the kind carried by bond1 which you
do understand". In other words, "error if you see bond2 (etc), don't understand it,
but do understand (and care about) bond1".)

'050502 required; 050505 optional' -- bruce, adding "info chunk color".

'050502 required; 050511 optional' -- bruce, adding "info atom atomtype".

Strictly speaking, these are required in the sense that the atoms in the file
will seem to have the wrong number of bonds if these are not understood. But since
the file would still be usable to old code, and no altered file would be better
for old code, we call these new records optional.

'050502 required; 050618 preferred' -- bruce, adding url-encoding of '(', ')',
and '%' in node names (so ')' is legal in them, fixing part of bug 474).
I'm calling it optional, since old code could read new files with only the
harmless cosmetic issue of the users seeing the encoded node-names.

I also decided that "preferred" is more understandable than "optional".
Nothing yet uses that word (except the user who sees this format in the
Part Properties dialog), so no harm is caused by changing it.

'050502 required; 050701 preferred' -- bruce, adding gamess jig and info gamess records.

'050502 required; 050706 preferred' -- bruce, increased precision of Linear Motor force & stiffness

'050920 required' -- bruce, save carbomeric bonds as their own bond type bondc, not bonda as before

'050920 required; 051102 preferred' -- bruce, adding "info leaf enable_in_minimize"

'050920 required; 051103 preferred' -- this value existed for some time; unknown whether the prior one actually existed or not

'050920 required; 060421 preferred' -- bruce, adding "info leaf dampers_enabled"

'050920 required; 060522 preferred' -- bruce, adding "comment" and "info leaf commentline <encoding>" [will be in Alpha8]

'050920 required; 070415 preferred' -- bruce, adding "bond_direction" record

'050920 required; 080115 preferred' -- bruce, adding group classifications DnaGroup, DnaSegment, DnaStrand, Block

'050920 required; 080321 preferred' -- bruce, adding info chunk display_as_pam, save_as_pam

'080327 required' -- bruce, ADDED AFTER THE FACT to cover new display style names but not new "write_bonds_compactly" records

'080327 required; 080412 preferred' -- bruce, to cover mark's "info opengroup nanotube-parameters" record, added today
    (it should be ok that 080412 > 080328, since the pair of dates are compared independently, *but* the preferred
     date defaults to the required date, therefore we also have to upgrade '080328 required' to '080328 required; 080412 preferred')

'080327 required; 080523 preferred' -- bruce, adding "info atom +5data" for saving PAM3+5 data on Ss3 pseudoatoms

'080327 required; 080529 preferred' -- bruce, adding "info atom ghost" for marking placeholder bases aka ghost bases

'080328 required' -- bruce, adding new records bond_chain, directional_bond_chain, and dna_rung_bonds
                     (all enabled for writing by the option write_bonds_compactly in the present code),
                     and also using new display style names

'080328 required; 080412 preferred' -- bruce, see comment for '080327 required; 080412 preferred'

'080328 required; 080523 preferred' -- bruce, see comment for '080327 required; 080523 preferred'

'080328 required; 080529 preferred' -- bruce, see comment for '080327 required; 080529 preferred'
"""

# ==

# comment about status for NE1 1.0.0, to be released soon: [bruce 080410]

"""
This describes the mmp format situation as it's being revised today [080410]
to prepare for releasing NE1 1.0.0 in the near future. It covers
two incompatible changes (write_bonds_compactly and new display style names),
one being written by default, one not. (It's edited email, but ought
to be rewritten from scratch to be more useful as a comment in this file.)

==

We will not enable the write_bonds_compactly optimized mmp format by
default, for NE1 1.0.0. The risk is small, the work is small, the
testing is small, but they are all nonzero, and the gain is at most a
20-30% savings in disk space (and some in runtime for file open or
save, perhaps, but less than that) (never yet measured, FYI), and at
this late stage (shortly before the release, 080410) that gain doesn't
really justify any nonzero risk or extra work.

(One of the work and/or risk items is that there are loose ends in the
writing code for certain cases where strands leave one corner of a dna
ladder and reenter another corner of the same ladder -- rare, but
theoretically possible; easy to fix, but not yet fixed; only a writing
code bug, not needing any change in format or reading code. I'd
forgotten I still needed to fix those writing bugs when I was
discussing this on today's call.)

We should still (and I just did) enable the new display style names,
which *are* worth enabling now for all mmp writing. Those have
essentially zero risk, and are a confusion-reduction rather than just
an optimization.

These changes were both part of a single new "mmpformat version". The
code before now was conservative -- if either change was enabled, it
wrote the new mmpformat version, to warn reading code about both
changes being potentially present. To avoid needless worry by reading
code that understands new display names but not new bond records (as
we are writing now), I'll add an intermediate mmpformat version "after
the fact" for that, dated one day before the one that includes both
changes: '080327 required'. This won't matter for NE1 reading code
(since if it's new enough to care about the mmpformat version record,
it can also read both revisions to the mmp format itself), but will
allow other code to rely on that record to know that compact bonds are
not present even though new display style names are present.

==

The debug_pref we have now controls write_bonds_compactly for NE1
save, only. It will remain available but off by default. In current
code, there is no way to turn it on for mmp writing meant for ND1 or
NV1.

If time permits, we'll add 2 other debug_prefs to enable
write_bonds_compactly for writing files for ND1 and for NV1, so we can
easily test it for them later.

(ND1 has code to read it but this is untested. NV1 doesn't read it and
no work on making it read it is urgent -- unlike for its reading new
display style codes, now required but trivial.)

==

It is worth noting here that there are two known compatibility bugs
not covered by the mmpformat version record value:

- New element numbers cause crashes when read by old code,
  or turn into the wrong element (I think) in fairly recent code.
  (Ideally, we'd read them as special atoms which permitted any
   number of bonds and could be written out again with the same
   "unrecognized element number".)

- There is a logic bug in the "info leaf" record, or in any info kind
  which can apply to more mmp recordnames in newer writing code
  ("leaf" is the only one). Old readers will ignore the new mmp recordnames
  it applies to, but will recognize "info leaf" and will associate it
  with the wrong record (whichever prior one it can apply to which they
  *did* recognize and read). This bug won't matter for this release,
  even though it adds a DnaMarker record to the set which can use
  info leaf, since that record can't occur without new element numbers
  (for PAM DNA elements), and those cause all older released reading
  code to crash. But it will matter for the next subsequent release
  in which we add any jigs applicable to not-just-added element numbers.
  (It even affects this release, for a debugging-only jig which can mark any
   kind of atom, but that can be ignored since it's undocumented and only
   useful for debugging.)

  The solution is to close off the set of mmprecords info leaf can apply to,
  after this upcoming release of NE1 1.0.0, and revise that scheme somehow
  for any new mmp recordname that needs it, by adding a new info kind just for
  that mmp recordname (which can be isomorphic to info leaf).

==

Update, 080521: I fixed a bug in Atom.writemmp's rounding of negative
atom position coordinates. I didn't modify the mmpformat version for this,
since it would cause old code to warn when reading new files, even though
these are *more* compatible with that reader code than older-written
files are (since their atom coordinates were written by buggy code).
This forgoes a chance to record in the file whether the writing bug
was fixed, but there will very soon be a new preferred mmpformat version
for other reasons, so it's not important.
"""

# ==

# When you revise these, also revise (if necessary) the body of
# def _mmp_format_version_we_can_read() in files_mmp.py.

MMP_FORMAT_VERSION_TO_WRITE = '050920 required; 080321 preferred'
    # this semi-formally indicates required & ideal reader versions...
    # for notes about when/how to revise this, see general notes referred to
    # at end of module docstring.

MMP_FORMAT_VERSION_TO_WRITE__WITH_NEW_DISPLAY_NAMES = '080327 required; 080529 preferred'
    # For NE1 1.0.0 we will write new display names but not (by default)
    # the new bond records enabled in current code by 'write_bonds_compactly'.

MMP_FORMAT_VERSION_TO_WRITE__WITH_COMPACT_BONDS_AND_NEW_DISPLAY_NAMES = '080328 required; 080529 preferred'
    # Soon after NE1 1.0.0, this can become the usually-written version, I hope,
    # and these separately named constants can go away. (Or, the oldest one
    # might be retained, so we can offer the ability to write mmp files for
    # old reading code, if there is any reason for that.)

_version_being_written_by_default = MMP_FORMAT_VERSION_TO_WRITE__WITH_NEW_DISPLAY_NAMES
    # used locally, only for startup prints, but still it ought to be
    # kept in sync with reality as determined by files_mmp_writing.py
    # and by the default values of certain debug_prefs

# ==

from utilities.debug import print_compact_traceback

# ==

def _extract(raw_list):
    lines = raw_list.split('\n')
    res = []
    for line in lines:
        if line.startswith("'"):
            res.append( line.split("'")[1] )
        continue
    return res

KNOWN_MMPFORMAT_VERSIONS = _extract(_RAW_LIST_OF_KNOWN_MMPFORMAT_VERSIONS)

assert MMP_FORMAT_VERSION_TO_WRITE in KNOWN_MMPFORMAT_VERSIONS

assert MMP_FORMAT_VERSION_TO_WRITE__WITH_NEW_DISPLAY_NAMES in KNOWN_MMPFORMAT_VERSIONS

assert MMP_FORMAT_VERSION_TO_WRITE__WITH_COMPACT_BONDS_AND_NEW_DISPLAY_NAMES in KNOWN_MMPFORMAT_VERSIONS

if _version_being_written_by_default != KNOWN_MMPFORMAT_VERSIONS[-1]:
    # warn, since this situation should be temporary; warning will be wrong
    # if these constants are not kept up to date with the actual writing code
    print "note: KNOWN_MMPFORMAT_VERSIONS contains more recent versions " \
          "than the one we're writing by default, %r" % _version_being_written_by_default
    pass

# ==

def parse_mmpformat(mmpformat): #bruce 080328
    """
    Parse an mmpformat value string of the syntax which can presently be used
    in an mmpformat mmp record, or any syntax which has been used previously.

    @return: a tuple of (ok, required_date, preferred_date)
             (containing types boolean, string (if ok), string (if ok),
              where the dates are raw date strings from the mmpformat record,
              understandable only to functions in this module, namely
              mmp_date_newer)

    @note: never raises exceptions for syntax errors, just returns ok == false
    """
    mmpformat = mmpformat.strip()
    try:
        sections = mmpformat.split(';')
        assert len(sections) in (1, 2)
        def parse_section(section, allowed_descriptive_words):
            """
            parse a section of the mmpformat record,
            such as "050502 required" or "050701 preferred",
            verifying the word is ok,
            and returning the date string.
            """
            words = section.strip().split()
            if len(words) == 1:
                assert words[0] == '050130' # kluge, special case
            else:
                assert len(words) == 2 and words[1] in allowed_descriptive_words
            return words[0]
        required_date = parse_section(sections[0], ["required"])
        if len(sections) == 1:
            preferred_date = required_date
        else:
            preferred_date = parse_section(sections[1], ["optional", "preferred"])
        return True, required_date, preferred_date
    except:
        msg = "syntax error in mmpformat (or bug) [%s]" % (mmpformat,)
        print_compact_traceback(msg + ": ")
        return False, None, None
    pass

def mmp_date_newer(date1, date2): #bruce 080328
    """
    """
    date1 = _normalize_mmp_date(date1)
    date2 = _normalize_mmp_date(date2)
    return (date1 > date2)

def _normalize_mmp_date(date): #bruce 080328
    """
    [private helper for mmp_date_newer]

    @note: raises exceptions for some mmp file syntax errors
    """
    assert type(date) == type('080328') and date.isdigit()
    if len(date) == 6:
        date = '20' + date
    assert len(date) == 8
    assert int(date) >= 20050130, "too early: %r" % (date,)
    assert date >= '20050130', "too early : %r" % (date,)
        # date of earliest mmpformat record, compared two equivalent ways
    return int(date)

def _mmp_format_newer( (required1, preferred1), (required2, preferred2) ):
    return mmp_date_newer( required1, required2) or \
           ( required1 == required2 and
             mmp_date_newer( preferred1, preferred2) )

# ==

def _test():
    """
    Make sure the parsing code can handle all the format versions,
    and (same thing) that newly defined format versions can be parsed.

    Also verify the versions occur in the right order.
    """
    last_version = None
    last_version_data = None

    for version in KNOWN_MMPFORMAT_VERSIONS:
        ok, required, preferred = parse_mmpformat(version)
        assert ok, "parse failure or error in %r" % (version,)
        assert not mmp_date_newer( required, preferred), "invalid dates in %r" % (version,)
        if last_version:
            assert _mmp_format_newer( (required, preferred), last_version_data ), \
                   "wrong order: %r comes after %r" % \
                   (version, last_version)
        last_version = version
        last_version_data = (required, preferred)
        continue
    # print "fyi: all %d mmpformat versions passed the test" % len(KNOWN_MMPFORMAT_VERSIONS)
    return

# always test, once at startup (since it takes less than 0.01 second)

_test()

# end