Notes on re-reading original ILL archive data from the PDP10 1973-78

Ron Ghosh, reghosh (at) gmail.com, December 2018


This note mainly concerns PDP10 archive magnetic tapes which were later copied onto VAX/VMS disks, and subsequently archived onto optical CDROMS.

The format of the tapes remains that of the DECSystem10, namely written sequentially as 36-bit words. Each word is mapped into 5 bytes; the fifth byte contains bits 32-35 and repeats several bits from byte 4. Physical record markers on the tape indicated the end of records.
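
The mapping of a 36-bit word onto 5 bytes can be sketched as follows. This is a minimal illustration, not the original conversion code. It assumes the usual DEC numbering (bit 0 is the most significant bit), that bytes 1 to 4 carry bits 0-31 in order, and that the upper four bits of byte 5 only repeat bits already seen and can be discarded. The function names are hypothetical.

```python
def unpack_word(five):
    """Rebuild one 36-bit word from its 5-byte tape representation."""
    if len(five) != 5:
        raise ValueError("a DEC10 word occupies exactly 5 bytes")
    b0, b1, b2, b3, b4 = five
    # Assumption: bytes 1-4 hold bits 0-31 (bit 0 = most significant bit);
    # only the low 4 bits of byte 5 (bits 32-35) carry new information.
    return (b0 << 28) | (b1 << 20) | (b2 << 12) | (b3 << 4) | (b4 & 0x0F)


def unpack_words(record):
    """Convert a tape record (bytes) into a list of 36-bit integers."""
    usable = len(record) - len(record) % 5  # ignore any incomplete trailing word
    return [unpack_word(record[i:i + 5]) for i in range(0, usable, 5)]
```

With this layout a 640-byte record header yields 128 words, and a 20480-byte data record yields 4096 words.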

The tape consisted of:

file header(120 bytes)
record header(640 bytes)
data or parameters record (200-20480 bytes)
record header(640 bytes)
data or parameters record (200-20480 bytes)
record header etc.

The record header contained, amongst others, the following 36-bit words:

word    format    content
6       A5        instrument name
7       A5        NOMEXP(1)
8       A5        NOMEXP(2)
12      A5        date_time_1
13      A5        date_time_2
14      A5        date_time_3
15      integer   ireper

The integer control variable ireper was 0 for the beginning of a run, 1 for data or parameters, and 2 for a continuation of the previous data/parameter block. For example, D11 sometimes recorded the 4096 integer counts in 4 records of 10240 bytes or 2 records of 20480 bytes. The latter created problems since it is larger than the default VAX/VMS tape block size of 17k: if tapes were not mounted explicitly with the necessary block size of 20480, the data were truncated on rereading.
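
As an illustration, the header fields listed above could be extracted from a list of 36-bit words along the following lines. This is a sketch under stated assumptions: the word numbers are taken to count from 1 (adjustable through the hypothetical first_index parameter), A5 is taken to be the usual DEC10 packing of five 7-bit ASCII characters left-justified in the word with bit 35 unused, and integers are taken to be 36-bit two's complement. All function and key names are invented for the example.

```python
def a5_text(word):
    """Decode an A5 word: five 7-bit ASCII characters, bit 35 unused."""
    return "".join(chr((word >> shift) & 0x7F) for shift in (29, 22, 15, 8, 1))


def signed36(word):
    """Interpret a 36-bit word as a two's-complement integer."""
    return word - (1 << 36) if word & (1 << 35) else word


# Meaning of the control variable, as described in the text.
IREPER_MEANING = {
    0: "beginning of a run",
    1: "data or parameters",
    2: "continuation of the previous data/parameter block",
}


def parse_record_header(words, first_index=1):
    """Pick the documented fields out of a record header (list of 36-bit ints).

    Assumption: the word numbers in the table count from first_index.
    """
    def w(n):
        return words[n - first_index]

    return {
        "instrument": a5_text(w(6)).strip(),
        "nomexp": (a5_text(w(7)) + a5_text(w(8))).strip(),
        "date_time": a5_text(w(12)) + a5_text(w(13)) + a5_text(w(14)),
        "ireper": signed36(w(15)),
    }
```

A reader would then start a new run when ireper is 0 and append to the previous block when it is 2.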

Typical contents of ASCII data files

In the standard ILL ascii files the first lines are standard, comprising run number and identifier (80A).

The 80A record is
InstNomexp.....Date................

Carine instruments
d1a 1973 (Carine)
80A  16A 1I 13I 13F 76A 1I 1I 1I 2F 1I 1I  4I + 1 x (   172F  ) 
(4 values per scan setting including 1 detector, monitor and time)
d1a 1976 (Carine)
80A  16A 1I 10I 10F 76A 1I 1I 1I 2F 1I 1I 10I + 1 x ( 70F  ) 
(10 values per scan setting including 6 detectors, monitor and time)
d1a 1990 (PDP11)
80A  72A 8I 18F + 1 x ( 252F  ) 
(12 values per scan setting including 10 detectors, monitor and time)
Typically a few steps are recorded in each run-file. The type of scan is set in a set of integer flags with some real parameters, and the counts and angle data are stored as reals. Later, one file was sometimes used per instrument setting.
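
The layout strings above can be read mechanically. The sketch below assumes that A, I and F denote character, integer and real fields respectively, that the number in front of each letter is a count, and that "n x ( ... )" denotes n repeated data blocks. The function name parse_layout is hypothetical.

```python
import re


def parse_layout(layout):
    """Split a layout string into header fields and the repeated data block.

    Returns (header, repeat), where header is a list of (count, kind) pairs
    and repeat is (blocks, count, kind), or None if there is no repeated part.
    """
    head, _, tail = layout.partition("+")
    header = [(int(n), kind) for n, kind in re.findall(r"(\d+)\s*([AIF])", head)]
    repeat = None
    m = re.search(r"(\d+)\s*x\s*\(\s*(\d+)\s*([AIF])\s*\)", tail)
    if m:
        repeat = (int(m.group(1)), int(m.group(2)), m.group(3))
    return header, repeat


header, repeat = parse_layout(
    "80A  16A 1I 13I 13F 76A 1I 1I 1I 2F 1I 1I  4I + 1 x (   172F  )"
)
blocks, count, kind = repeat
# If each scan setting holds 4 values, the data block covers count // 4 settings.
settings = count // 4
```

For the d1a 1973 layout this reading gives one block of 172 reals, which at 4 values per setting would correspond to 43 scan settings.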

Nicole


in5 1974 (Nicole)
80A     12A     16A    156I   2560A   +   26 x (   512I  )
The initial write information is in lines 0000 to 0100, ending at the ";". The nomexp field is in line 0090; for all the Nicole instruments (IN4, IN5, D7, D11 etc.) this is the only title. The remaining lines are setup parameters for the delays, time-of-flight unit, distances etc., added manually.
in5 1991 (VMS)
80A    156I    512A    256F    128F   +  123 x (   512I  )
(additional real fields include recorded and manual parameters)
The metadata in 1974 was a hand-edited text field (2560A) similar to that shown below, later including information on detector angles etc. When the PDP11/34 was introduced in the Nicole-II project the ASCII field was retained, and made available for treatment by pre-interpreting the line number and values, these being written into an additional real parameter block (128F), with scalers and angles in the 256F field.
                                                                            
   0000    9                                                                    
   0010    0                                                          
   0020    26                                               
   0030    0                                      
   0040    0                            
   0050    0                  
   0060    0        
   0070    1                                                                              
   0080    "BAND"                                                               
   0090    "EMPTYCAN"                                                 
   0100        540                    ;                     
   0110    P = 2997                               
   0120    P3 = 4 X P                   
   0130    D1/2 = 1097        
   0140    D1/3 = 1145 X 10
   0150    D1/4 = 4953                                                                    
   0160    CHW = 23.5 MMS                                                       
   0170    NCH = 507                                                  
   0180    DTOF = 3478 MMS                                  
   0190    LAMBDA NOM. = 9.16 A                   
   0200    DIST.CH4/M1 = 0.931 M        
   0210    DIST. CH4/S = 1.2885 M
   0220    DIST. M1/S = 0.358 M
   0230    DIST. M1/M2 = 0.871 M                                                          
   0240    DIST. S/M2 = 0.589 M                                                 
   0250    DIST. S/DET = 3.977 M                                      
   0260    MEAS TIME = 1215.78 MIN                          
   0270    M1 = 618389                            

Data here have been re-aligned; the original is in 90-character records.
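
A text field of this kind can be split into numbered lines with a few lines of code. The sketch below works on the re-aligned form shown above, not on the original 90-character records, and assumes that every entry starts with a four-digit line number and that manual parameters use the form "NAME = value". The function names are hypothetical.

```python
import re

# A four-digit line number, some white space, then the content of the line.
LINE_RE = re.compile(r"^\s*(\d{4})\s+(.*?)\s*$")


def parse_nicole_text(text):
    """Return {line number: content} for every numbered line in the field."""
    entries = {}
    for raw in text.splitlines():
        m = LINE_RE.match(raw)
        if m:
            entries[int(m.group(1))] = m.group(2)
    return entries


def nomexp(entries):
    """The experiment title is held in line 0090, enclosed in double quotes."""
    return entries.get(90, "").strip('"')


def parameters(entries):
    """Collect the hand-written 'NAME = value' lines as strings."""
    params = {}
    for content in entries.values():
        name, sep, value = content.partition("=")
        if sep:
            params[name.strip()] = value.strip()
    return params
```

Values are kept as strings because the hand-edited entries mix numbers, units and expressions such as "4 X P".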

Reading DecSystem10 tape data stored by VAX/VMS

The ILL utility program SPECTRA (Blanchard, 1986) used the VMS Datatrieve library to manage all the archived data on the VMS system. It included the possibility of writing out data in the TAPDAT ascii file format (Ghosh, Pater 1981), which was the basis for the ascii file format subsequently adopted for use on the unix systems.

Much of the database was converted in the period beyond 1992 when unix was designated the preferred future system at the ILL. Consequently most early data were made available in the new ascii archive, which was compatible with big- (SGI) and little- (HP) endian systems, as well as PC-Windows. Inevitably some data were overlooked (D11 used the name D11A for several years up to 1989, and hence was never included in the list of known instruments.) SPECTRA used its known list of instruments and cycles to identify the storage source (tape, later CDROM).

In 2004 (long after SPECTRA was no longer available or easy to modify) there was an opportunity to recover VMS data from the early days directly from the CDROMS; it was hence possible to complete the D11 archive from 1973 on, with a common data format matching current ascii standards and filling obvious gaps. Most of this work was performed on Linux systems, but required data transfer from the VMS files.

VMS files have well defined attributes which cannot be matched, for example, on Unix systems where the contents are simply a byte stream, and it is up to the application to know how to interpret the data. The tape data on the VMS was stored as variable length record data. VMS applications can easily obtain the record lengths (Fortran Q format descriptor, etc). The system has no simple utility to create a simple byte-stream from these data without embedding control bytes. A small program was created on the VMS system to read these variable length records and write them out in 512 byte blocks in image format. Each record was preceded by its length as a 4-byte little endian integer (with -1 at the end). The file could then be copied to the linux systems for treatment.
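
On the linux side, a transfer file of this kind can be split back into records as sketched below. This is an illustration rather than the program actually used. It assumes that the length prefixes and records follow each other without padding, and that the 512 byte blocking only pads the file after the final -1 marker. The name iter_records is hypothetical.

```python
import struct


def iter_records(stream):
    """Yield the variable length records from a length-prefixed byte stream."""
    while True:
        prefix = stream.read(4)
        if len(prefix) < 4:
            return  # data ran out without an end marker
        (length,) = struct.unpack("<i", prefix)  # 4-byte little endian integer
        if length < 0:
            return  # -1 marks the end of the records
        record = stream.read(length)
        if len(record) < length:
            raise ValueError("truncated record: expected %d bytes" % length)
        yield record
```

Opening the transfer file in binary mode, for example with open(path, "rb"), and looping over iter_records then gives back the original tape records one at a time.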

Treatment involved reading the byte stream on linux, converting each 36-bit DEC10 word from its 5-byte (40-bit) representation, with interpretation of byte order, for integers, reals and ascii. Here the record headers could be used to separate data and new runs. Finally the data could be written out following the standard ascii prescription and the compressed files added to the ascii archive. A little knowledge of the original structure helped. For D11 there were recorded monitor and preset values and others in a table of (name, floating value) pairs, stored as a mixed stream of 3A5,F within the data record. The contents of about ten VMS-CDROMs of data from the DEC10 era were transformed to be readable on linux. Even in 2004 there were some problems in reading the discs, which had the tape files HAA01, HAA02 etc stored by the VMS-utility backup.
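
The interpretation of reals is the least obvious step. The sketch below decodes a 36-bit word in the PDP-10 single precision format (sign bit, 8-bit excess-128 exponent, 27-bit fraction, negative values stored as the two's complement of the whole word) and applies it to one 3A5,F table entry. It assumes the words have already been rebuilt as 36-bit integers and that A5 holds five 7-bit ASCII characters with bit 35 unused. The function names are hypothetical.

```python
MASK36 = (1 << 36) - 1


def dec10_real(word):
    """Decode a PDP-10 single precision floating point word."""
    sign = 1.0
    if word & (1 << 35):
        # Negative numbers are the two's complement of the positive word.
        word = (-word) & MASK36
        sign = -1.0
    exponent = (word >> 27) & 0xFF      # excess-128 binary exponent
    fraction = word & ((1 << 27) - 1)   # 27-bit fraction, 0.5 <= f < 1 if normalised
    return sign * (fraction / float(1 << 27)) * 2.0 ** (exponent - 128)


def a5_text(word):
    """Decode an A5 word: five 7-bit ASCII characters, bit 35 unused."""
    return "".join(chr((word >> shift) & 0x7F) for shift in (29, 22, 15, 8, 1))


def name_value(four_words):
    """Decode one 3A5,F entry: a 15-character name followed by a real."""
    name = "".join(a5_text(w) for w in four_words[:3]).rstrip()
    return name, dec10_real(four_words[3])
```

As a check, the octal word 201400000000 decodes to 1.0 and 576400000000 to -1.0.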