Notes on re-reading original ILL archive data from the PDP10 1973-78
Ron Ghosh, reghosh (at) gmail.com, December 2018
The format of the tapes remains that of the DECSystem10: data were written sequentially as 36-bit words. Each word is mapped onto 5 bytes, the fifth byte containing bits 32-35 and repeating several bits from byte 4. Physical record markers on the tape indicated the end of each record.
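The 5-byte mapping can be sketched as follows. This is only an illustration under stated assumptions: bytes 1 to 4 are taken to hold bits 0-31 (bit 0 being the most significant), the low four bits of byte 5 are taken to hold bits 32-35, and the repeated bits in the upper half of byte 5 are ignored. The function names are hypothetical and not taken from the original programs.

```python
MASK36 = (1 << 36) - 1


def unpack_word36(chunk):
    """Rebuild one 36-bit word from 5 tape bytes (assumed layout)."""
    if len(chunk) != 5:
        raise ValueError("a 36-bit word occupies exactly 5 bytes")
    b0, b1, b2, b3, b4 = chunk
    # Assumption: bytes 1-4 carry bits 0-31, the low nibble of byte 5
    # carries bits 32-35; the high nibble of byte 5 is discarded.
    return (b0 << 28) | (b1 << 20) | (b2 << 12) | (b3 << 4) | (b4 & 0x0F)


def unpack_record(record):
    """Split one tape record into a list of 36-bit words."""
    usable = len(record) - len(record) % 5  # drop any incomplete tail
    return [unpack_word36(record[i:i + 5]) for i in range(0, usable, 5)]


def to_signed(word):
    """Interpret a 36-bit word as a two's-complement integer."""
    return word - (1 << 36) if word & (1 << 35) else word
```

If the upper bits of byte 5 turn out to matter on a given tape, only the last term in unpack_word36 would need changing.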
The tape consisted of:
file header (120 bytes)
record header (640 bytes)
data or parameters record (200-20480 bytes)
record header (640 bytes)
data or parameters record (200-20480 bytes)
record header
etc.
The header record contained, amongst others:
36-bit word
   6   A5        instrument name
   7   A5        NOMEXP(1)
   8   A5        NOMEXP(2)
  12   A5        date_time_1
  13   A5        date_time_2
  14   A5        date_time_3
  15   integer   ireper
The integer control variable ireper was 0 for the beginning of a run, 1 for data or parameters, and 2 for a continuation of the previous data/parameter block. For example, D11 sometimes recorded the 4096 integer counts in 4 records of 10240 bytes or 2 records of 20480 bytes. The latter created problems since this is larger than the default VAX/VMS tape block size of 17k: if tapes were not mounted explicitly with the necessary block size of 20480, the data were truncated on rereading.
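As an illustration of the header layout above, the sketch below decodes the listed words from a 640-byte record header (128 words of 5 bytes). It assumes that the word numbers are counted from 1 in Fortran style, that A5 means five 7-bit ASCII characters packed left-justified in the word with the last bit unused, and the same 5-byte mapping assumed earlier in these notes. All function and key names are hypothetical.

```python
def unpack_word36(chunk):
    """Rebuild one 36-bit word from 5 bytes (assumed layout)."""
    b0, b1, b2, b3, b4 = chunk
    return (b0 << 28) | (b1 << 20) | (b2 << 12) | (b3 << 4) | (b4 & 0x0F)


def decode_a5(word):
    """Decode five 7-bit characters packed left-justified in a word."""
    return "".join(chr((word >> s) & 0x7F) for s in (29, 22, 15, 8, 1))


# Meaning of the control variable, as described in the text.
IREPER_MEANING = {
    0: "beginning of a run",
    1: "data or parameters",
    2: "continuation of previous data/parameter block",
}


def parse_record_header(header, base=1):
    """Extract the documented fields from a 640-byte record header.

    base=1 assumes the word numbers in the table are 1-based.
    """
    if len(header) != 640:
        raise ValueError("record header must be 640 bytes")
    words = [unpack_word36(header[i:i + 5]) for i in range(0, 640, 5)]

    def word(n):
        return words[n - base]

    return {
        "instrument": decode_a5(word(6)).strip(),
        "nomexp": (decode_a5(word(7)) + decode_a5(word(8))).strip(),
        "date_time": "".join(decode_a5(word(n)) for n in (12, 13, 14)),
        "ireper": word(15),
    }
```

A reader loop could then use the ireper value to decide whether a record starts a new run (0) or continues the previous block (2).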
Typical contents of ASCII data files
In the standard ILL ascii files the first lines are standard, comprising run number and identifier (80A).
The 80A record is
InstNomexp.....Date................
Carine instruments
d1a 1973 (Carine)  80A 16A 1I 13I 13F 76A 1I 1I 1I 2F 1I 1I 4I + 1 x ( 172F )
    (sets of 4 values including 1 detector, monitor and time)
d1a 1976 (Carine)  80A 16A 1I 10I 10F 76A 1I 1I 1I 2F 1I 1I 10I + 1 x ( 70F )
    (10 values per scan setting including 6 detectors, monitor and time)
d1a 1990 (PDP11)   80A 72A 8I 18F + 1 x ( 252F )
    (12 values per scan setting including 10 detectors, monitor and time)
Typically a few steps are recorded in each run-file. The type of scan is set in a group of integer flags, with some real parameters; the counts and angle data are stored as reals. Later, one file was sometimes used per instrument setting.
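The layout strings used in these notes can be split mechanically, which is convenient when comparing instruments or years. The sketch below assumes that "nA", "nI" and "nF" mean n characters, n integers and n reals respectively, and that "+ m x ( ... )" denotes m repeated data blocks. This reading of the notation and the function name are assumptions made for illustration.

```python
import re

FIELD = re.compile(r"(\d+)([AIF])")


def parse_layout(descriptor):
    """Split a layout string into (header_fields, repeat, block_fields).

    Each field is a (count, type) pair with type 'A', 'I' or 'F'.
    """
    head, _, tail = descriptor.partition("+")
    header = [(int(n), t) for n, t in FIELD.findall(head)]
    repeat, block = 0, []
    match = re.search(r"(\d+)\s*x\s*\((.*?)\)", tail)
    if match:
        repeat = int(match.group(1))
        block = [(int(n), t) for n, t in FIELD.findall(match.group(2))]
    return header, repeat, block


if __name__ == "__main__":
    layout = "80A 12A 16A 156I 2560A + 26 x ( 512I )"
    header, repeat, block = parse_layout(layout)
    print(header, repeat, block)
```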
Nicole
in5 1974 (Nicole)  80A 12A 16A 156I 2560A + 26 x ( 512I )
The initial write information is in lines 0000 to 0100, terminated by ";". The nomexp field is in line 0090; for all the Nicole instruments (IN4, IN5, D7, D11 etc.) this is the only title. The remaining lines are setup parameters for the delays, time-of-flight unit, distances etc., added manually.
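A line-numbered text field of this kind can be interpreted along the following lines. This is a rough sketch, not the original pre-interpreter: it assumes each entry starts with a four-digit line number, and it only converts parameters of the simple form "NAME = number [unit]", leaving expressions such as "4 X P" uninterpreted. The function names are hypothetical.

```python
import re


def parse_numbered_lines(lines):
    """Map line number -> text for a line-numbered metadata field."""
    entries = {}
    for line in lines:
        match = re.match(r"\s*(\d{4})\s+(.*\S)", line)
        if match:
            entries[int(match.group(1))] = match.group(2)
    return entries


def split_parameter(text):
    """Return (name, value, unit) for 'NAME = number [unit]', else None."""
    name, sep, rhs = text.partition("=")
    parts = rhs.split()
    # More than one trailing token is treated as an expression and skipped.
    if not sep or not parts or len(parts) > 2:
        return None
    try:
        value = float(parts[0])
    except ValueError:
        return None
    unit = parts[1] if len(parts) == 2 else ""
    return name.strip(), value, unit


if __name__ == "__main__":
    sample = ['0090 "EMPTYCAN"', "0110 P = 2997", "0160 CHW = 23.5 MMS"]
    entries = parse_numbered_lines(sample)
    print(entries[90].strip('"'))
    for number in sorted(n for n in entries if n > 100):
        print(number, split_parameter(entries[number]))
```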
in5 1991 (VMS)  80A 156I 512A 256F 128F + 123 x ( 512I )
    (additional real fields include recorded and manual parameters)
The metadata in 1974 were held in a hand-edited text field (2560A) similar to that shown below, later including information on detector angles etc. When the PDP11/34 was introduced in the Nicole-II project the ASCII field was retained and made available for treatment by pre-interpreting the line numbers and values, these being written into an additional real parameter block (128F), with scalers and angles in the 256F field.
0000 9
0010 0
0020 26
0030 0
0040 0
0050 0
0060 0
0070 1
0080 "BAND"
0090 "EMPTYCAN"
0100 540 ;
0110 P = 2997
0120 P3 = 4 X P
0130 D1/2 = 1097
0140 D1/3 = 1145 X 10
0150 D1/4 = 4953
0160 CHW = 23.5 MMS
0170 NCH = 507
0180 DTOF = 3478 MMS
0190 LAMBDA NOM. = 9.16 A
0200 DIST.CH4/M1 = 0.931 M
0210 DIST. CH4/S = 1.2885 M
0220 DIST. M1/S = 0.358 M
0230 DIST. M1/M2 = 0.871 M
0240 DIST. S/M2 = 0.589 M
0250 DIST. S/DET = 3.977 M
0260 MEAS TIME = 1215.78 MIN
0270 M1 = 618389
Data here have been re-aligned; the original is in 90-character records.
Reading DecSystem10 tape data stored by VAX/VMS
The ILL utility program SPECTRA (Blanchard, 1986) used the VMS Datatrieve library to manage all the archived data on the VMS system. It included the possibility of writing out data in the TAPDAT ascii file format (Ghosh, Pater 1981), which was the basis for the ascii file format subsequently adopted for use on the unix systems.
Much of the database was converted in the period after 1992, when unix was designated the preferred future system at the ILL. Consequently most early data were made available in the new ascii archive, which was compatible with big- (SGI) and little- (HP) endian systems, as well as PC-Windows. Inevitably some data were overlooked: D11 used the name D11A for several years up to 1989, and hence was never included in the list of known instruments. SPECTRA used its known list of instruments and cycles to identify the storage source (tape, later CDROM).
In 2004 (long after SPECTRA was no longer available or easy to modify) there was an opportunity to recover VMS data from the early days directly from the CDROMs; it was hence possible to complete the D11 archive from 1973 on, with a common data format matching current ascii standards and filling obvious gaps. Most of this work was performed on Linux systems, but required data transfer from the VMS files.
VMS files have well-defined attributes which cannot be matched on, for example, Unix systems, where the contents are simply a byte stream and it is up to the application to know how to interpret the data. The tape data on VMS were stored as variable-length records. VMS applications can easily obtain the record lengths (Fortran Q format descriptor, etc.), but the system has no simple utility to create a plain byte stream from these data without embedding control bytes. A small program was therefore written on the VMS system to read these variable-length records and write them out in 512-byte blocks in image format. Each record was preceded by a 4-byte little-endian integer giving its length (with -1 at the end). The file could then be copied to the Linux systems for treatment.
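Reading such a transfer file back on Linux might look like the sketch below. It assumes that each record follows its length prefix directly, and that the 512-byte blocking only adds padding after the final -1 marker; the actual transfer program may have differed in these details. The function name is hypothetical.

```python
import struct


def iter_records(stream):
    """Yield the variable-length records from a length-prefixed stream.

    Each record is preceded by a 4-byte little-endian signed length;
    a negative length (-1) marks the end of the data.
    """
    while True:
        prefix = stream.read(4)
        if len(prefix) < 4:
            return  # stream ended without an explicit terminator
        (length,) = struct.unpack("<i", prefix)
        if length < 0:
            return
        payload = stream.read(length)
        if len(payload) < length:
            raise EOFError("record shorter than its stated length")
        yield payload
```

Typical use would be to open the copied file in binary mode and loop over iter_records(f), treating 640-byte records as record headers and the others as data or parameters.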
Treatment involved reading the byte stream on Linux, converting each group of 5 bytes (40 bits) back to a 36-bit DEC10 word, with interpretation of byte order, for integers, reals and ascii. Here the record headers could be used to separate data and new runs. Finally the data could be written out following the standard ascii prescription and the compressed files added to the ascii archive. A little knowledge of the original structure helped. For D11, recorded monitor and preset values and others were held within the data record as a table of (name, floating-point value) entries, using a mixed byte stream of 3A5,F. The contents of about ten VMS-CDROMs of data from the DEC10 era were transformed to be readable on Linux. Even in 2004 there were some problems in reading the discs, which had the tape files HAA01, HAA02 etc. stored by the VMS-utility backup.
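The interpretation of reals and of the D11 name/value table can be sketched as follows. The sketch assumes the standard PDP-10 single-precision layout (sign bit, 8-bit excess-128 exponent, 27-bit fraction, negative values stored as the two's complement of the whole word) and A5 words holding five 7-bit characters left-justified. Where the table sits inside a D11 data record is not stated here, so the function simply takes a list of already unpacked 36-bit words; all names are hypothetical.

```python
MASK36 = (1 << 36) - 1


def decode_a5(word):
    """Decode five 7-bit characters packed left-justified in a word."""
    return "".join(chr((word >> s) & 0x7F) for s in (29, 22, 15, 8, 1))


def decode_float36(word):
    """Convert a 36-bit PDP-10 single-precision word to a Python float."""
    if word == 0:
        return 0.0
    negative = bool(word & (1 << 35))
    if negative:
        word = (-word) & MASK36  # undo the two's complement of the word
    exponent = (word >> 27) & 0xFF       # excess-128 exponent
    fraction = word & ((1 << 27) - 1)    # 27-bit binary fraction
    value = fraction / float(1 << 27) * 2.0 ** (exponent - 128)
    return -value if negative else value


def decode_name_value_table(words):
    """Decode consecutive 3A5,F entries: 15-character name plus a real."""
    table = []
    for i in range(0, len(words) - 3, 4):
        name = "".join(decode_a5(w) for w in words[i:i + 3]).strip()
        table.append((name, decode_float36(words[i + 3])))
    return table
```

As a check on the assumed format, the octal word 201400000000 is the usual PDP-10 representation of 1.0 and should decode to that value.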