Solved

Fortran -> VB.net conversion - Binary Files

Posted on 2004-03-29
Last Modified: 2013-11-08
Hi, I desperately need the following Fortran code explained and translated into VB.NET (or any .NET language, though that will take more explaining) for me.  Its purpose is to take a single file as input, run it through a process, and create a bunch of little files as output.

----------------------------------------------------------------------------------------------------------------------------------------------------------------------
!++
!
!       Call Sequence:  Assign/user    Input File     Input$file
!                       Assign/user    Output File    Output$file
!                       Run Convert_from_hasp
!
!       Errors:         None
!==

         Options        /Extend_source
         Program        Convert_from_hasp
         Implicit       None

         
         !==============
         !
         ! Include Files
         !
         !==============

         Include        '($Ssdef)'

         !===========
         !
         ! Constants:
         !
         !===========        

         Parameter      Blocksize = 32760

         !================
         !
         ! Local variables
         !
         !================        
 
         Character*80            Buffer          ! Buffer for data entry      ! 2
         Integer*4               Byte_count      ! Physical length of record      ! 5
         Character*(BLOCKSIZE)   Data_buf        ! Buffer for data blocks
         Logical*1               Eof_found       ! Flag, when true, indicates first EOF record found ! 6
         Integer*2               I               ! Byte counter                    ! 7
         Integer*4               Input_lun       ! Channel for input file
         Integer*4               J               ! Loop counter                    ! 5
         Integer*4               K               ! Dummy variable              ! 5
         Integer*4               Lib$free_lun    ! RTL routine to release a channel
         Integer*4               Lib$get_lun     ! RTL routine to assign a channel
         Integer*4               Line_count      ! Current line number              ! 6
         Character*80            Msg             ! Buffer for error messages
         Integer*4               Nchars          ! Number of characters read
         Integer*4               Output_lun      ! Channel for output file
         Character*2             Rewind_cmd      ! Byte count on rewind command      ! 4
         Character*1             Space           ! EBCDIC space character      ! 5
         Integer*4               Status          ! Status variable
         Integer*4               Status2         ! Status variable              ! 2
         Character*1             Temp_nchars (2) ! Buffer for NCHARS field      ! 2
         
         !============================================================
         !
         ! Force the longword containing the byte count to have both a
         ! logical and integer representation.
         !
         !============================================================

         Equivalence (Nchars, Temp_nchars)                                ! 2
                                                              ! 4
         Data  Eof_found         /.False./                                ! 6
         Data  Rewind_cmd(1:1)   /'90'x/                                ! 4
         Data  Rewind_cmd(2:2)   /'00'x/                                ! 4
         Data  Space             /'40'x/                                ! 5

                                                              ! 6
         !============================================
         !
         ! Get the channel numbers for the data files.
         !
         !============================================

         status = lib$get_lun (input_lun)
         if (status .ne. SS$_NORMAL) then
            encode (80, 10, msg) 'VMS', status, 'get channel for input file.'
10          format (a, ' status code ', z6, ' received trying to ', a)
            call lib$put_output (msg)
            go to 999
         end if

         status = lib$get_lun (output_lun)
         if (status .ne. SS$_NORMAL) then
            encode (80, 10, msg) 'VMS', status, 'get channel for output file.'
            call lib$put_output (msg)
            go to 999
         end if
         
         !==================================================================
         !
         ! Open the data files.  Open the output file as new since we want
         ! to create a new version for every file defined by the input file.
         !
         !==================================================================

         open (input_lun, status='old', recordtype='variable', iostat=status,
     &      recl=80, shared, readonly, name='INPUT$FILE')
         if (status .ne. 0) then
            encode (80, 10, msg) 'I/O', status, 'open input file.'
            call lib$put_output (msg)
            go to 999
         end if

         open (output_lun, status='new', recordtype='variable',
     &      iostat=status, recl=BLOCKSIZE, name='OUTPUT$FILE')
         if (status .ne. 0) then
            encode (80, 10, msg) 'I/O', status, 'open output file.'
            call lib$put_output (msg)
            go to 999
         end if
         
         !==============================================================
         !
         ! Start the looping process.  We start by looking for a control
         ! record.  If one is not found, the process is aborted.
         !
         !==============================================================

         read (input_lun, 20, iostat=status) byte_count, buffer                    ! 5
20       format (q, a80)                                            ! 5
         line_count = 1                                                  ! 6
         do while ((status .eq. 0) .and. (status2 .eq. 0))                    ! 2
           
            !============================================================      ! 5
            !                                                        ! 5
            ! Handle the possibility that Hexadecimal 40 may be a valid              ! 6
            ! byte count.  The problem is that the SNA/Gateway strips off      ! 5
            ! trailing EBCDIC spaces (Hex 40) before transmitting the              ! 6
            ! record.  We have to examine the number of bytes read in order      ! 5
            ! to determine if we have to replace the EBCDIC spaces.              ! 5
            !                                                        ! 5
            !============================================================      ! 5
                                                              ! 5
            if (byte_count .eq. 0) then                                      ! 5
               buffer(2:2) = space                                      ! 5
               buffer(3:3) = space                                      ! 5
            else if (byte_count .eq. 2) then                                ! 5
               buffer(3:3) = space                                      ! 5
            end if                                                  ! 5
                                                              ! 5
            temp_nchars(2) = buffer(2:2)                                ! 3
            temp_nchars(1) = buffer(3:3)                                ! 3
                                                              ! 4
            !========================================================              ! 3
            !
            ! The number of bytes to read is zero.  Assume that the              ! 4
            ! record is an end-of-file record and create a new output              ! 3
            ! file.                                                  ! 3
            !
            !========================================================              ! 3
                                                              ! 4
            if (nchars .eq. 0) then                                      ! 3
                                                              ! 3
               close (output_lun, status='save', iostat=status)
                                                              ! 7
               if (eof_found) then                                      ! 6
                  status = -1                                            ! 6
                  go to 60                                            ! 6
               end if                                                  ! 6
               eof_found = .true.                                      ! 6
                                                              ! 6
               open (output_lun, status='new', recordtype='variable',
     &            iostat=status, recl=BLOCKSIZE, name='OUTPUT$FILE')
               if (status .ne. 0) then
                  encode (80, 10, msg) 'I/O', status, 'open output file.'
                  call lib$put_output (msg)
                  status = -1
                  go to 999
               end if
           
            else if (buffer(2:3) .eq. rewind_cmd) then                          ! 4
               status = -1                                            ! 4
               go to 60                                                  ! 4
                                                              ! 4
            else                                                  ! 4
                                                              ! 2
               !=========================================
               !
               ! Make sure we do not overflow the buffer.
               !
               !=========================================

               eof_found = .false.                                      ! 6
               if (nchars .gt. BLOCKSIZE) then
                  write (6, 25) line_count, buffer                          ! 6
25                format ('0Byte count of record exceeds buffer size.  Routine aborted.'/ ! 6
     &                    ' Error occurred at line number ', i8, ', record read was ',/ ! 6
     &                    '0', a80)                                      ! 6
                  status = -1
                  go to 999
               end if
               
               !=========================================================
               !
               ! Read the required number of bytes from the data file and
               ! write them out as one big record.
               !
               !=========================================================

               byte_count = 0                                            ! 6
               do while (byte_count .lt. nchars)                          ! 5
                  read (input_lun, 40, iostat=status) k, buffer                    ! 5
40                format (q, a80)                                      ! 5
                  line_count = line_count + 1                                ! 6
                                                              ! 6
                  !============================================================      ! 5
                  !                                                  ! 5
                  ! Because the SNA/Gateway strips off trailing spaces, if the      ! 6
                  ! byte count does not match the number of bytes that are      ! 6
                  ! supposed to be on the record, insert EBCDIC spaces to fill      ! 5
                  ! out the record.                                      ! 5
                  !                                                  ! 5
                  !============================================================      ! 5
                                                              ! 5
                  if (k .ne. 80) then                                      ! 5
                     do j = k + 1, 80                                      ! 5
                        buffer(j:j) = space                                ! 5
                     end do                                            ! 5
                  end if                                            ! 5
                                                              ! 5
                  i = nchars - byte_count                                ! 7
                  if (i .gt. 79) i = 79                                      ! 7
                                                              ! 7
                  data_buf(byte_count+1:) = buffer(2:i+1)                    ! 7
                  byte_count = byte_count + i                                ! 7
               end do                                                  ! 2

               write (output_lun, 50, iostat=status2) data_buf(1:nchars)      ! 2
50             format (a<nchars>)

            end if

            read (input_lun, 20, iostat=status) byte_count, buffer              ! 5
            line_count = line_count + 1                                      ! 6
60       end do
         
         !==========================
         !
         ! Close the files and exit.
         !
         !==========================

         close (input_lun)
         close (output_lun)

         status = lib$free_lun (input_lun)
         status = lib$free_lun (output_lun)

999      call exit (status)
         end

Question by:970170

Expert Comment

by:Bob Learned
Are the files in EBCDIC or ASCII format?  Is this from a mainframe?  How are the smaller output files different from the single input file?

Bob

Author Comment

by:970170
I believe the files are in EBCDIC format.  It is possible that the files originated from a mainframe.  For the example files I have, the input file is 561KB, and the output is 12 files, ranging from 1KB to 54KB and totalling only 280KB.  Does this tell you anything?

Expert Comment

by:Bob Learned
I studied Fortran in college, way back in the 80's.  I don't think that I can help you with the code, but I can try to help with interpreting the files, and trying to come up with a strategy.  I just read some of the comments:

>>The problem is that the SNA/Gateway strips off trailing EBCDIC spaces (Hex 40)

How is it that you have the input files, but you don't even know if they are EBCDIC or not?  Where did you get the input file from?  

Bob

Author Comment

by:970170
Well, I was sorta just given a bunch of files and a task.  I am more concerned with the Fortran to .NET conversion.  I believe I know how the system works, but I have no clue how to do anything with records within binary files in .NET.

Expert Comment

by:Bob Learned
What it looks like to me is variable-length records.  'byte_count' gets the number of bytes read from 'input_lun' file, into the 'buffer' variable.

Is this a correct assumption?

Bob

Author Comment

by:970170
Yes, except I'm not sure what the "20" is for when reading the control record.  I'm also not sure what the buffer(3:3) string manipulations mean.

Expert Comment

by:Bob Learned
Do you know what version of Fortran this is?

Bob

Author Comment

by:970170
I *believe* it is Fortran 77...

Expert Comment

by:Bob Learned
It appears that the read command is in the format:

read (UNIT, FMT, IOSTAT, ERR, END) where IOSTAT, ERR and END are optional.

Trying to find out what FMT is.


Accepted Solution

by: Bob Learned (earned 250 total points)
The 20 is the statement label for the format, referring to the line:

20       format (q, a80)

a80 = a character field 80 characters wide.

Bob


Expert Comment

by:farsight
Here's a tip regarding the Q format descriptor.  It does NOT consume any bytes, which is not what I initially assumed.  Yet the code seems to ignore the first character of input, resulting in only 79 actual data characters per record. Therefore, I suspect the first character encodes the record length.
http://h18009.www1.hp.com/fortran/docs/lrm/lrm0436.htm#q_edit

The biggest question I have is how do we read/write that type of file: i.e. one with logical records that have some particular length?
How are these files represented byte-by-byte?  In other words, can we read a byte-stream and interpret that in a way that allows us to distinguish the boundaries between records?

Other than that and the filenames, the rest seems (reasonably) straightforward.
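
On where the first character gets dropped: in the listing it happens in `data_buf(byte_count+1:) = buffer(2:i+1)`, which copies from position 2 onward and so carries at most 79 bytes out of each 80-byte record. The code never inspects byte 1 at all. A minimal Python sketch of that one step follows. The function name `payload_of` is made up for illustration, and Python is used only because it is compact; the same slicing should carry over to a .NET byte array.

```python
EBCDIC_SPACE = 0x40  # the Fortran 'Space' constant


def payload_of(record: bytes, remaining: int) -> bytes:
    """Return the data bytes carried by one data record.

    record    -- the record as read, possibly shorter than 80 bytes
                 because trailing EBCDIC spaces were stripped in transit
    remaining -- bytes still needed to complete the current block
    """
    # Restore stripped trailing spaces (the 'do j = k + 1, 80' loop).
    padded = record[:80].ljust(80, bytes([EBCDIC_SPACE]))
    # Skip byte 1 and take at most 79 bytes (buffer(2:i+1)).
    take = min(remaining, 79)
    return padded[1:1 + take]
```

Called once per data record with `remaining = nchars - byte_count`, this mirrors the inner `do while` loop of the listing.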

Author Comment

by:970170
Hmm, what do you mean when you say that the Q format descriptor doesn't consume any bytes?  Its purpose is to define the input type, right?  Also, when you say that the code ignores the first char of input, could you just highlight the line?  I think I'm not understanding the code well enough...

I also have the corresponding Fortran code that generates one of these files from a bunch of little files (the opposite of this snippet).  Would that be helpful?

Expert Comment

by:Bob Learned
As we sit here in the dark, pondering code that isn't ours, a ray of light comes streaming through the open window above:  "would that be helpful" a voice says.  It was like a voice from heaven.  We try to respond--and it finally comes out--Yes.

Author Comment

by:970170
funny guy :P - well, not sure how much sense it will make, but here goes nothing:

---------------------------------------------------------------------


!++
!
!       Call Sequence:  Assign/user    Input File     Input$file
!                       Assign/user    Output File    Output$file
!                       Run Convert_to_hasp
!
!       Description:    This routine reads a record from the file specified
!                       by the logical variable Input$file and generates a
!                       block character count record followed by enough 80
!                       byte records to represent the input record.  The
!                       generated records are written to a disk file
!                       which can be transmitted to the IPC.
!
!       Errors:         None
!==

         Options        /Extend_source
         Implicit       None

         
         !==============
         !
         ! Include Files
         !
         !==============

         Include        '($Ssdef)'

         !===========
         !
         ! Constants:
         !
         !===========        

         Parameter      Blocksize = 32760

         !================
         !
         ! Local variables
         !
         !================        
 
         Character*80            Buffer          ! Buffer for building print line ! 6
         Character*(BLOCKSIZE)   Data_buf        ! Buffer for data blocks
         Logical*1               Header_flag     ! Flag indicating header record should be printed ! 5
         Character*1             Header_rec (77) ! Header record for Panvalet files ! 5
         Integer*2               I               ! Loop counter
         Integer*4               Input_lun       ! Channel for input file
         Character*1             Len_rec (3)     ! EBCDIC characters "LEN"      ! 6
         Integer*4               Lib$free_lun    ! RTL routine to release a channel
         Integer*4               Lib$get_lun     ! RTL routine to assign a channel
         Integer*2               Min_amt         ! Minimum amount to move      ! 6
         Character*80            Msg             ! Buffer for error messages
         Integer*2               Nchars          ! Number of characters read
         Integer*4               Output_lun      ! Channel for output file
         Character*1             Space (77)      ! EBCDIC space character      ! 5
         Integer*4               Status          ! Status variable
         Character*4             String          ! Scratch space
         Logical*1               Temp_nchars (2) ! Temporary field for NCHARS      ! 4
                                                              ! 4
         Equivalence             (Nchars, Temp_nchars)                          ! 4
   
         Data  Header_flag /.true./                                      ! 5
         Data  (Header_rec(i), i = 1, 77) /'E4'x, '40'x, '5C'x, '5C'x, '40'x, 'F3'x, ! 6
     &                                     'F2'x, 'F7'x, 'F6'x, 'F0'x, '40'x, 'D7'x, ! 6
     &                                     'E2'x, '40'x, 'C4'x, 'C5'x, 'C3'x, '61'x, ! 6
     &                                     'D7'x, 'C1'x, 'D5'x, 'E5'x, 'C1'x, 'D3'x, ! 6
     &                                     'C5'x, 'E3'x, '40'x, 'D4'x, 'E4'x, 'D3'x, ! 6
     &                                     'E3'x, 'C9'x, 'C6'x, 'C9'x, 'D3'x, 'C5'x, ! 6
     &                                     '40'x, 'C6'x, 'D6'x, 'D9'x, 'D4'x, 'C1'x, ! 6
     &                                     'E3'x, 34 * '40'x/                    ! 5
         Data  (Len_rec(i), i = 1, 3)     /'D3'x, 'C5'x, 'D5'x/                    ! 6
         Data  (Space(i), i = 1, 77)      /77 * '40'x/                          ! 6

         
         !============================================
         !
         ! Get the channel numbers for the data files.
         !
         !============================================

         status = lib$get_lun (input_lun)
         if (status .ne. SS$_NORMAL) then
            encode (80, 10, msg) 'VMS', status, 'get channel for input file.'
10          format (a, ' status code ', z6, ' received trying to ', a)
            call lib$put_output (msg)
            go to 999
         end if

         status = lib$get_lun (output_lun)
         if (status .ne. SS$_NORMAL) then
            encode (80, 10, msg) 'VMS', status, 'get channel for output file.'
            call lib$put_output (msg)
            go to 999
         end if
         
         !==============================================================
         !
         ! Open the data files.  Because there may be more than one file
         ! to process, the output file must be open as unknown and the
         ! access type must be append.
         !
         !==============================================================

         open (input_lun, status='old', recordtype='variable', iostat=status,
     &      shared, readonly, name='INPUT$FILE')
         if (status .ne. 0) then
            encode (80, 10, msg) 'I/O', status, 'open input file.'
            call lib$put_output (msg)
            go to 999
         end if

         open (output_lun, status='unknown', recordtype='variable', recl=80,
     &      iostat=status, access='append', name='OUTPUT$FILE')
         if (status .ne. 0) then
            encode (80, 10, msg) 'I/O', status, 'open output file.'
            call lib$put_output (msg)
            go to 999
         end if
         
         !=================================================================
         !
         ! Read the first record.  Make sure that no characters were missed
         ! by comparing the number of bytes read with the maximum block
         ! size.  Convert the hexadecimal values to EBCDIC equivalents.
         ! Write the control record to the file followed by the record
         ! containing the data block.  The record containing the data
         ! block is broken into as many 80 byte records as required to
         ! equal the number of bytes specified in the control record.
         ! Finally, get the next record.
         !
         !=================================================================

         read (input_lun, 20, iostat=status) nchars, data_buf(1:BLOCKSIZE)
20       format (q, a)
                                                              ! 4
         do while ((status .eq. 0) .and. (nchars .ne. 0))                    ! 4
                                                              ! 4
            if (nchars .le. BLOCKSIZE) then                                ! 4
                                                              ! 4
               if (header_flag) then                                      ! 5
                  write (output_lun, 40) space(1), (temp_nchars(i),i=2,1,-1), header_rec ! 5
40                format (a1, 2a1, 77a1)                                ! 5
                  header_flag = .false.                                      ! 5
                                                              ! 6
               else                                                  ! 5
                  write (output_lun, 45) space(1), (temp_nchars(i), i = 2, 1, -1), len_rec, (space(i),i=1,74) ! 6
45                format (a1, 2a1, 3a1, 74a1)                                ! 6
               end if                                                  ! 5
                                                                       ! 6
               !============================================================      ! 6
               !                                                  ! 6
               ! Initialize the buffer with EBCDIC spaces.  Move a maximum of     ! 6
               ! 79 bytes to the buffer.  Write the record to file.  Shift      ! 6
               ! the record left and decrement the NCHARS variable.              ! 6
               !                                                  ! 6
               !============================================================      ! 6
                                                              ! 6
               do while (nchars .gt. 0)                                      ! 6
                  do i = 1, 80                                            ! 6
                     buffer(i:i) = space(1)                                ! 6
                  end do                                            ! 6
                                                              ! 6
                  min_amt = min (79, nchars)                                ! 6
                  buffer(2:min_amt + 1) = data_buf(1:min_amt)                    ! 6
                  write (output_lun, 50) buffer                                ! 6
50                format (a80)                                            ! 6
                                                              ! 6
                  data_buf(1:) = data_buf(80:)                                ! 6
                  nchars = nchars - 79                                      ! 6
               end do

            else
               encode (80, 60, msg) nchars
60             format ('Error in read.  Number of characters read was ', i8)
               call lib$put_output (msg)
               go to 999
            end if

            read (input_lun, 20, iostat=status) nchars, data_buf(1:BLOCKSIZE)
         end do
         
         !================================
         !
         ! Write out the end-of-file mark.
         !
         !================================

         nchars = 0                                                  ! 4
         write (output_lun, 45) space(1), (temp_nchars(i),i = 2, 1, -1), len_rec, (space(i),i=1,74) ! 6

         close (input_lun)
         close (output_lun)

         status = lib$free_lun (input_lun)
         status = lib$free_lun (output_lun)

999      call exit (status)
         end

Expert Comment

by:Bob Learned
Some interesting things to note from this:

>> The record containing the data block is broken into as many 80 byte records as required to equal the number of bytes specified in the control record.

>> Move a maximum of 79 bytes to the buffer.  Write the record to file.
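
Those two comments fit together: each 80-byte output record is one leading EBCDIC space plus up to 79 bytes of data, so a block of n bytes needs ceil(n / 79) records. A rough Python sketch of that splitting step is below. The name `split_block` is made up here, and the sketch covers only the inner `do while (nchars .gt. 0)` loop of Convert_to_hasp, not the control record that precedes it.

```python
EBCDIC_SPACE = b"\x40"


def split_block(block: bytes) -> list:
    """Split one data block into 80-byte records, 79 payload bytes each."""
    records = []
    for start in range(0, len(block), 79):
        chunk = block[start:start + 79]
        # Byte 1 is always an EBCDIC space; a short final chunk is
        # padded out to 80 bytes with EBCDIC spaces.
        records.append((EBCDIC_SPACE + chunk).ljust(80, EBCDIC_SPACE))
    return records
```

For example, a 200-byte block becomes three records carrying 79, 79, and 42 data bytes.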

Author Comment

by:970170
Can you diagram for me how the file is set up in terms of byte/record layout?  I'm having problems visualizing this whole thing :P

Author Comment

by:970170
So let's see if I'm understanding this right.

At the beginning of the file, there is a control record of 80 chars, which contains a byte count for the whole data block.  After the control record, there are a bunch of data records, each with a maximum of 79 chars of data plus a one-byte record length?  All these little records then have to add up to the byte count from the control record?
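
Going only by the two Fortran listings (no format specification was posted), that layout can be sketched as a decoder. Bytes 2 and 3 of each control record hold the block's byte count, high byte first; byte 1 of every record looks like an EBCDIC space filler rather than a length; a zero count ends one output file; two zero counts in a row, or the `'90'x '00'x` rewind marker, end the input. The Python below is a sketch under those readings. `split_hasp` is a made-up name, and it assumes the input has already been split into records, since how record boundaries survive the copy to Windows is still an open question in this thread.

```python
EBCDIC_SPACE = b"\x40"
REWIND_CMD = b"\x90\x00"   # Rewind_cmd in the Fortran
BLOCKSIZE = 32760


def split_hasp(records):
    """Yield one list of data blocks for each file packed into `records`.

    `records` is any iterable of bytes objects, one per input record,
    each at most 80 bytes (trailing EBCDIC spaces may be missing).
    """
    it = iter(records)
    blocks = []          # blocks of the file currently being rebuilt
    eof_seen = False     # True when the previous control record was an EOF
    for control in it:
        control = control[:80].ljust(80, EBCDIC_SPACE)
        count_field = control[1:3]                  # buffer(2:3)
        nchars = int.from_bytes(count_field, "big")
        if nchars == 0:                             # end-of-file record
            if eof_seen:                            # two in a row: stop
                break
            yield blocks
            blocks, eof_seen = [], True
        elif count_field == REWIND_CMD:
            break
        elif nchars > BLOCKSIZE:
            raise ValueError("byte count %d exceeds BLOCKSIZE" % nchars)
        else:
            eof_seen = False
            data = bytearray()
            while len(data) < nchars:
                rec = next(it, None)
                if rec is None:
                    raise ValueError("input ended inside a data block")
                rec = rec[:80].ljust(80, EBCDIC_SPACE)
                take = min(nchars - len(data), 79)
                data += rec[1:1 + take]             # skip byte 1 of each record
            blocks.append(bytes(data))
    if blocks:
        yield blocks     # input ended without a closing EOF record
```

Each yielded list corresponds to one output file, and each block in it to one record the Fortran would write with `write (output_lun, 50) data_buf(1:nchars)`. Padding short records with EBCDIC spaces is a simplification of the listing's special cases for control records of length 0 and 2.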

Expert Comment

by:Bob Learned
There is:

(1) header_rec:

>>Data  (Header_rec(i), i = 1, 77) /'E4'x, '40'x...

(2) len_rec (the EBCDIC characters "LEN"; written in every control record after the first, including the final zero-count EOF marker):

>>Data  (Len_rec(i), i = 1, 3)     /'D3'x, 'C5'x, 'D5'x/

(3) write (output_lun, 40) space(1), (temp_nchars(i),i=2,1,-1), header_rec:

>> Appears to write the header record with a reverse-order representation for the number of characters to write.
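
On the reverse order: `Nchars` and `Temp_nchars` share storage through the `Equivalence`, and VAX/VMS stores integers low byte first, so writing `temp_nchars(2)` and then `temp_nchars(1)` puts the high-order byte first in the file. A small Python sketch of building and reading a "LEN" control record under that reading (the function names are made up for illustration):

```python
import struct

EBCDIC_SPACE = b"\x40"
LEN_TAG = b"\xd3\xc5\xd5"   # Len_rec: EBCDIC "LEN"


def build_control(nchars: int) -> bytes:
    """Build an 80-byte control record the way format 45 lays it out."""
    # '>H' packs an unsigned 16-bit integer, high byte first.
    return EBCDIC_SPACE + struct.pack(">H", nchars) + LEN_TAG + EBCDIC_SPACE * 74


def read_count(control: bytes) -> int:
    """Recover the byte count from bytes 2 and 3 of a control record."""
    (nchars,) = struct.unpack(">H", control[1:3])
    return nchars
```

For the maximum block size, `build_control(32760)` starts with the bytes `40 7F F8`, and `read_count` returns 32760 from it. In .NET the equivalent would be combining the two bytes by hand, since `BinaryReader` reads integers low byte first.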

Author Comment

by:970170
1. What does all that /'E4'x, '40'x... stuff mean?  I'm sorry if I sound stupid; I've never worked with binary files before.
2. What do you mean by reverse-order representation?

Expert Comment

by:Bob Learned
The characters in the header record are EBCDIC characters that uniquely identify each record.  There might be some information indicated by the header, but I couldn't tell you from the code that I see.  I don't think, though, that you need to concern yourself with the header, other than to know that it is there, and how large it is.

The write statement in (3) above writes 1 space, then the two bytes of the character count in the order 2, 1 (the implied loop runs from 2 to 1 step -1), and then the header record.  The format statement labeled 40 (format (a1, 2a1, 77a1)) is 1 character + 2 characters + 77 characters.
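
As for what the `'E4'x, '40'x, ...` values are: each one is a single byte written in hexadecimal, and read as EBCDIC they spell ordinary text. A quick way to see it, sketched in Python with its built-in `cp037` codec (one common EBCDIC code page; which code page the originating system actually used is an assumption):

```python
# The first 43 bytes of Header_rec, copied from the Data statement;
# the remaining 34 bytes are all '40'x (EBCDIC space).
HEADER_HEX = (
    "E4 40 5C 5C 40 F3 F2 F7 F6 F0 40 D7 "
    "E2 40 C4 C5 C3 61 D7 C1 D5 E5 C1 D3 "
    "C5 E3 40 D4 E4 D3 E3 C9 C6 C9 D3 C5 "
    "40 C6 D6 D9 D4 C1 E3"
)
header = bytes.fromhex(HEADER_HEX) + b"\x40" * 34
len_rec = bytes.fromhex("D3 C5 D5")

print(header.decode("cp037").rstrip())  # U ** 32760 PS DEC/PANVALET MULTIFILE FORMAT
print(len_rec.decode("cp037"))          # LEN
```

In .NET, `System.Text.Encoding.GetEncoding(37)` should give the matching decoder for a byte array.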

Author Comment

by:970170
How would I go about reading in the control record in VB.NET?  Is there any way you could give me some code to play with?

Author Comment

by:970170
Aside: do you know if there is any way that copying these EBCDIC files from a VAX OS to a Windows OS could change characters within the file?  I was told that copying them straight might change the file, but I don't see how.  Is this an EBCDIC issue?

Expert Comment

by:Bob Learned
You could start with something like this:

  ' Requires "Imports System.IO" at the top of the file.
  Public Sub ReadBinaryFile(ByVal strFileName As String)

      Dim streamInput As FileStream = Nothing
      Dim readerInput As BinaryReader = Nothing

      Try

         streamInput = New FileStream(strFileName, FileMode.Open, FileAccess.Read)
         readerInput = New BinaryReader(streamInput)

         Dim byteLength() As Byte
         Dim byteHeader() As Byte

         ' Ignore the space.
         readerInput.ReadBytes(1)

         ' Get the number of characters in the data block from the
         ' control record (two bytes, high-order byte first).
         byteLength = readerInput.ReadBytes(2)

         ' Get the header record.
         byteHeader = readerInput.ReadBytes(77)

      Catch ex As Exception

         MsgBox(ex.ToString)

      Finally

         ' Closing the reader also closes the underlying stream.
         If Not readerInput Is Nothing Then
            readerInput.Close()
         ElseIf Not streamInput Is Nothing Then
            streamInput.Close()
         End If

      End Try

   End Sub

Let me know what kind of information you get from this, if it works.

Bob

Expert Comment

by:Bob Learned
One more thing, add Imports System.IO at the very top of the module or class where you put the code above.

Author Comment

by:970170
OK, got it to run, but it doesn't look like the byte arrays are being populated.  How do I convert them to ASCII representations or to some form that I can display?

Expert Comment

by:Bob Learned
Are there any values in the byte arrays:  Debug.WriteLine(byteLength(0))?
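
One way to check whether the arrays were filled is to print them as hex, with an EBCDIC-decoded column alongside, since the bytes will not be readable as ASCII. Below is a Python sketch of such a dump. The helper name `dump` is made up, and `cp037` is an assumed code page. In VB.NET, `BitConverter.ToString(byteHeader)` should give a comparable hex listing.

```python
def dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex, and EBCDIC (cp037) text columns."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("%02X" % b for b in chunk)
        # Decode as EBCDIC and show non-printable characters as '.'.
        text = "".join(
            c if c.isprintable() else "." for c in chunk.decode("cp037")
        )
        lines.append("%04X  %-*s  %s" % (offset, width * 3 - 1, hex_part, text))
    return "\n".join(lines)


# A zero-filled or empty array is easy to spot in this form.
print(dump(b"\x40\x7f\xf8\xd3\xc5\xd5" + b"\x40" * 14))
```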

Expert Comment

by:farsight
What I meant about the Q format descriptor (way earlier) is that it doesn't consume bytes, it just returns information from the reader.

A = Read a single character.
Q = How many bytes until end-of-record.

My example: A 100-yard game of "simon-says" where:
A = Take a long step of 1 yard.
Q = How many yards until the finish line.
Simon will say the Format (Q, Q, A5, Q, A10, Q, A85), consuming all 100 yards (5+10+85).

Simon: Ready?
Runner: Ready!
Simon: Q
Runner: 100 yards
Simon: Q
Runner: 100 yards, still
Simon: A5
Runner: Ran 5 yards
Simon: Q
Runner: 95 yards
Simon: A10
Runner: Done.
Simon: Q
Runner: 85 yards
Simon: A85
Runner: Finished!  I win!  Then, runs off muttering something about silly examples.

I do hope that makes it clear though.  Now, reread the link I initially provided, and I think it'll make more sense.
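
For anyone who prefers code to footraces, here is a minimal VB.NET sketch of the same game.  RecordCursor is a hypothetical helper invented for this illustration (it is not part of the Fortran runtime or of .NET), and it assumes the whole record is already sitting in a byte array.  Remaining plays the role of Q and Take plays the role of A.

```vbnet
Imports System

' Hypothetical helper for illustration only: a cursor over one record
' that has already been loaded into memory.
Public Class RecordCursor
    Private ReadOnly _record As Byte()
    Private _position As Integer = 0

    Public Sub New(ByVal record As Byte())
        _record = record
    End Sub

    ' Plays the role of Q: reports how many bytes are left, consumes nothing.
    Public ReadOnly Property Remaining() As Integer
        Get
            Return _record.Length - _position
        End Get
    End Property

    ' Plays the role of A<n>: consumes up to "count" bytes and returns them.
    Public Function Take(ByVal count As Integer) As Byte()
        Dim n As Integer = Math.Min(count, Remaining)
        Dim result(n - 1) As Byte
        Array.Copy(_record, _position, result, 0, n)
        _position += n
        Return result
    End Function
End Class

Module SimonSays
    Sub Main()
        ' A 100-byte record stands in for the 100-yard course.
        Dim runner As New RecordCursor(New Byte(99) {})

        Console.WriteLine(runner.Remaining)   ' Q   -> 100
        Console.WriteLine(runner.Remaining)   ' Q   -> 100, still
        runner.Take(5)                        ' A5
        Console.WriteLine(runner.Remaining)   ' Q   -> 95
        runner.Take(10)                       ' A10
        Console.WriteLine(runner.Remaining)   ' Q   -> 85
        runner.Take(85)                       ' A85
        Console.WriteLine(runner.Remaining)   ' nothing left -> 0
    End Sub
End Module
```

The point to notice is that asking for Remaining never moves the position; only Take does.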
0
 
LVL 12

Expert Comment

by:farsight
Comment Utility
Please do this for us, so that we know what bytes are in the source file.

(1) On your Windows machine, open a console window, and move to some (temporary) directory.
(2) Copy the VMS source file to a Windows file "Sample.dat" in that directory.  (The debug command we use below only accepts 8.3 filenames, so use a filename no longer than this:   12345678.123   .)
(3) In the same directory, (Use Notepad.exe, and) build a plain text file named debug.in containing the following (shown between, not including the lines):
--- --- --- debug.in --- --- ---
d
d
d
d
d
d
q
q

--- --- --- end-of-file --- --- ---
That's 6 d's and 2 q's and a blank line.  [ Tech detail: The second q is not used, but visually ensures that there's a REQUIRED end-of-line after the first q.  If you have only one q with an end-of-file, not an end-of-line, following it, things will lock up. ]

(4) Type this command:

debug Sample.dat < debug.in > debug.out

That command runs the old DOS debugger on the Sample.dat file, getting commands from debug.in, and sending output to debug.out.
(5) Post the contents of debug.out here for us to read.
The contents will look something like the following, where XXXX:0100 indicates the address of the first byte of the file, and the addresses on the left count up from that.  (It starts at 0100 because DOS loads some info before that which is NOT part of the file.)  In the middle are the bytes, in hexadecimal, 16 bytes to a line.  On the right are the same bytes shown as ASCII -- pretty useless to us, because the data is really EBCDIC.  That should give us enough data to determine the file format, or most of it.

C:\>debug sample.dat
-d
0B08:0100  3C 3C 3C 0C 0D 0A 3E 3E-3E 20 20 20 20 20 20 20   <<<...>>>
0B08:0110  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
0B08:0120  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
0B08:0130  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
0B08:0140  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
0B08:0150  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
0B08:0160  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
0B08:0170  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
-d
0B08:0180  3C 3C 3C 0C 0D 0A 3E 3E-3E 20 20 20 20 20 20 20   <<<...>>>
0B08:0190  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
0B08:01A0  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
0B08:01B0  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
0B08:01C0  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
0B08:01D0  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
0B08:01E0  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
0B08:01F0  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
-q
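
If the DOS debugger is not available, a rough equivalent of the dump can be produced from VB.NET.  This is only a sketch under a few assumptions: the module and function names are made up for this example, File.ReadAllBytes needs .NET 2.0 or later, and the offsets printed start at 0000 rather than at debug's 0100.  When no file name is given it falls back to the first 16 bytes of the asker's dump so that it still runs.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module HexDump
    ' Formats the first "count" bytes, 16 per line: offset, hex values, ASCII.
    Function FormatDump(ByVal data As Byte(), ByVal count As Integer) As String
        Dim sb As New StringBuilder()
        Dim limit As Integer = Math.Min(count, data.Length)
        For lineStart As Integer = 0 To limit - 1 Step 16
            Dim lineEnd As Integer = Math.Min(lineStart + 15, limit - 1)
            Dim ascii As New StringBuilder()
            sb.Append(lineStart.ToString("X4")).Append("  ")
            For i As Integer = lineStart To lineEnd
                sb.Append(data(i).ToString("X2")).Append(" ")
                ' Show printable ASCII as-is and everything else as a dot.
                If data(i) >= 32 AndAlso data(i) < 127 Then
                    ascii.Append(Chr(data(i)))
                Else
                    ascii.Append("."c)
                End If
            Next
            ' Pad a short final line so the ASCII column stays aligned.
            sb.Append(New String(" "c, (15 - (lineEnd - lineStart)) * 3))
            sb.Append("  ").AppendLine(ascii.ToString())
        Next
        Return sb.ToString()
    End Function

    Sub Main(ByVal args As String())
        Dim data As Byte()
        If args.Length > 0 Then
            data = File.ReadAllBytes(args(0))
        Else
            ' Fallback sample: the first 16 bytes from the dump in this thread.
            data = New Byte() {&H40, &H0, &H78, &HE4, &H40, &H5C, &H5C, &H40, _
                               &HF3, &HF2, &HF7, &HF6, &HF0, &H40, &HD7, &HE2}
        End If
        ' Six "d" commands in debug show 6 x 128 = 768 bytes.
        Console.Write(FormatDump(data, 768))
    End Sub
End Module
```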
0
 

Author Comment

by:970170
Comment Utility
added:

            ' Get the number of characters in the data block from the
            ' control record.
            byteLength = readerInput.ReadBytes(2)

            For i As Integer = 0 To 1
                Debug.WriteLine("len - " & byteLength(i))
            Next
            ' Get the header record.
            byteHeader = readerInput.ReadBytes(77)

            For i As Integer = 0 To 76
                Debug.WriteLine(byteHeader(i))
            Next


i get:


len - 0
len - 120
228
64
92
92
64
243
242
247
246
240
64
215
226
64
196
197
195
97
215
193
213
229
193
211
197
227
64
212
228
211
227
201
198
201
211
197
64
198
214
217
212
193
227
64
64
64
64
64
64
64
64
64
64
64
64
64
64
64
64
64
64
64
64
64
64
64
64
64
64
64
64
64
64
64
64
64
64

so.. the 0 and the 120 represents the datablock size.. the rest represents the header...

1) 0 and 120 are in what base?  how big does this say the datablock is in bytes?
2) the header seems to have data in it.. what purpose does the header serve?
3) does the datablock follow the header? (x many bytes from #1, not read in the code)
4) does the datablock then occupy the rest of the .dat file?  or does #2-#3 repeat until we reach x many bytes from #1?

Thank you so much for your help already.. I'm a mess :(
0
 

Author Comment

by:970170
Comment Utility
debug.out is as follows:

-d

1387:0100  40 00 78 E4 40 5C 5C 40-F3 F2 F7 F6 F0 40 D7 E2   @.x.@\\@.....@..
1387:0110  40 C4 C5 C3 61 D7 C1 D5-E5 C1 D3 C5 E3 40 D4 E4   @...a........@..
1387:0120  D3 E3 C9 C6 C9 D3 C5 40-C6 D6 D9 D4 C1 E3 40 40   .......@......@@
1387:0130  40 40 40 40 40 40 40 40-40 40 40 40 40 40 40 40   @@@@@@@@@@@@@@@@
1387:0140  40 40 40 40 40 40 40 40-40 40 40 40 40 40 40 40   @@@@@@@@@@@@@@@@
1387:0150  40 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   @...............
1387:0160  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
1387:0170  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
-d

1387:0180  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
1387:0190  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
1387:01A0  40 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   @...............
1387:01B0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
1387:01C0  00 00 00 00 00 00 00 00-00 00 40 40 40 40 40 40   ..........@@@@@@
1387:01D0  40 40 40 40 40 40 40 40-40 40 40 40 40 40 40 40   @@@@@@@@@@@@@@@@
1387:01E0  40 40 40 40 40 40 40 40-40 40 40 40 40 40 40 40   @@@@@@@@@@@@@@@@
1387:01F0  40 00 78 D3 C5 D5 40 40-40 40 40 40 40 40 40 40   @.x...@@@@@@@@@@
-d

1387:0200  40 40 40 40 40 40 40 40-40 40 40 40 40 40 40 40   @@@@@@@@@@@@@@@@
1387:0210  40 40 40 40 40 40 40 40-40 40 40 40 40 40 40 40   @@@@@@@@@@@@@@@@
1387:0220  40 40 40 40 40 40 40 40-40 40 40 40 40 40 40 40   @@@@@@@@@@@@@@@@
1387:0230  40 40 40 40 40 40 40 40-40 40 40 40 40 40 40 40   @@@@@@@@@@@@@@@@
1387:0240  40 F8 83 40 80 20 40 AD-01 02 E9 01 02 B9 01 02   @..@. @.........
1387:0250  F1 01 02 F1 7F 02 80 40-40 80 80 02 F7 40 40 F7   .......@@....@@.
1387:0260  43 40 A1 43 02 E5 08 02-C7 49 02 F7 49 02 DC 4C   C@.C.....I..I..L
1387:0270  02 80 0D 02 BC 0D 02 CE-0E 02 BC 0E 02 AE 10 02   ................
-d

1387:0280  89 51 02 DC 51 02 AB 51-02 BC 51 02 CD 52 02 A8   .Q..Q..Q..Q..R..
1387:0290  40 52 02 F7 52 02 C8 13-02 E9 57 02 8C 5B 02 B3   @R..R.....W..[..
1387:02A0  23 02 86 64 02 98 25 02-E0 67 02 A7 68 02 B0 2A   #..d..%..g..h..*
1387:02B0  02 E9 6B 02 B0 6B 02 F7-6B 02 40 40 40 40 40 40   ..k..k..k.@@@@@@
1387:02C0  40 40 40 40 40 40 40 40-40 40 40 40 40 40 40 40   @@@@@@@@@@@@@@@@
1387:02D0  40 40 40 40 40 40 40 40-40 40 40 40 40 40 40 40   @@@@@@@@@@@@@@@@
1387:02E0  40 00 78 D3 C5 D5 40 40-40 40 40 40 40 40 40 40   @.x...@@@@@@@@@@
1387:02F0  40 40 40 40 40 40 40 40-40 40 40 40 40 40 40 40   @@@@@@@@@@@@@@@@
-d

1387:0300  40 40 40 40 40 40 40 40-40 40 40 40 40 40 40 40   @@@@@@@@@@@@@@@@
1387:0310  40 40 40 40 40 40 40 40-40 40 40 40 40 40 40 40   @@@@@@@@@@@@@@@@
1387:0320  40 40 40 40 40 40 40 40-40 40 40 40 40 40 40 40   @@@@@@@@@@@@@@@@
1387:0330  40 B5 2C 02 BC 2C 02 83-6D 02 8A 6D 02 F2 6D 02   @.,..,..m..m..m.
1387:0340  E9 6D 02 97 6D 02 91 6D-02 E0 6D 02 FE 6D 02 C7   .m..m..m..m..m..
1387:0350  6E 02 D0 6E 02 C4 31 02-B9 31 02 C1 32 02 E9 73   n..n..1..1..2..s
1387:0360  02 D9 34 02 B5 34 02 D9-75 02 FE 76 02 C2 79 02   ..4..4..u..v..y.
1387:0370  D0 7A 02 FE 7C 02 DF 3E-02 EF 3E 02 F4 40 43 E0   .z..|..>..>..@C.
-d

1387:0380  40 40 40 8A 76 08 8F 13-4C C7 57 0D C1 73 4C E0   @@@.v...L.W..sL.
1387:0390  0E 0D 8F 4F 4C D0 37 4C-94 57 0D 92 57 4C 94 43   ...OL.7L.W..WL.C
1387:03A0  4A D3 57 4C E0 52 0D 8F-23 4A 40 40 40 40 40 40   J.WL.R..#J@@@@@@
1387:03B0  40 40 40 40 40 40 40 40-40 40 40 40 40 40 40 40   @@@@@@@@@@@@@@@@
1387:03C0  40 40 40 40 40 40 40 40-40 40 40 40 40 40 40 40   @@@@@@@@@@@@@@@@
1387:03D0  40 00 78 D3 C5 D5 40 40-40 40 40 40 40 40 40 40   @.x...@@@@@@@@@@
1387:03E0  40 40 40 40 40 40 40 40-40 40 40 40 40 40 40 40   @@@@@@@@@@@@@@@@
1387:03F0  40 40 40 40 40 40 40 40-40 40 40 40 40 40 40 40   @@@@@@@@@@@@@@@@
-q

0
 
LVL 12

Expert Comment

by:farsight
Comment Utility
If you need an online FORTRAN manual, there's one at   http://h18009.www1.hp.com/fortran/docs/lrm/lrm-frames.html  , which is the one I'm using until I know exactly what version your program uses.  That's where I found the Q descriptor.

Also, what is the size in bytes of the input file?
What is the size in bytes of the output files? (If you have them available.)

Here's a post with a lot of (my and other) simple BinaryReader example code in VB.NET.
http://www.experts-exchange.com/Programming/Programming_Languages/Dot_Net/Q_20874019.html
0
 

Author Comment

by:970170
Comment Utility
For the example files I have, the input file is 561KB, and the output are 12 files, ranging from 1KB to 54KB and totalling only 280KB.
0
 
LVL 12

Expert Comment

by:farsight
Comment Utility
1387:0100  40 00 78

I read this as one EBCDIC space (40 hex), and the number 00 78 hex, which is 120 decimal.
I also see 40 00 78 at 1387:01F0 and 1387:02E0 and 1387:03D0 ...

Let's convert this all to decimal
(Take the right-most four hex digits, subtract 100 hex (256 decimal), and convert the result to decimal.)
1387:0100 ===> file position 0 decimal
1387:01F0 ===> file position 240 decimal
1387:02E0 ===> file position 480 decimal
1387:03D0 ===> file position 720 decimal

So every 240 bytes, we have a record starting.  Does each one tell us that the record is 120 WORDS long?  (2 bytes per word)

Is EBCDIC a 2-byte code?

That's speculation.  Can someone confirm?
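
The arithmetic above can be checked with a few lines of VB.NET.  This is just a sketch; the function names are made up for the illustration.  It assumes the two length bytes are stored most significant byte first, which is what "00 78 = 120" implies.  Note that Debug.WriteLine prints byte values in decimal, so the asker's "0" and "120" are the same two bytes as 00 and 78 hex.

```vbnet
Imports System

Module LengthAndOffsets
    ' Combines two bytes, most significant first, into one number.
    Function BigEndianUInt16(ByVal high As Byte, ByVal low As Byte) As Integer
        Return CInt(high) * 256 + CInt(low)
    End Function

    ' debug loads the file at offset 0100 hex, so subtract that to get
    ' the position within the file itself.
    Function FilePosition(ByVal debugOffset As Integer) As Integer
        Return debugOffset - &H100
    End Function

    Sub Main()
        ' The two bytes after the leading EBCDIC space: 00 78 hex.
        Console.WriteLine(BigEndianUInt16(&H0, &H78))   ' 120

        ' The addresses in the dump where 40 00 78 was seen.
        For Each offset As Integer In New Integer() {&H100, &H1F0, &H2E0, &H3D0}
            Console.WriteLine(FilePosition(offset))     ' 0, 240, 480, 720
        Next
    End Sub
End Module
```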
0
 
LVL 12

Expert Comment

by:farsight
Comment Utility
> Is EBCDIC a 2-byte code?
Nope.   But I still think the count is for 2-byte words.

Here's the first bit converted by hand (error-prone),
based on this EBCDIC table:
http://www.natural-innovations.com/computing/asciiebcdic.html
It seems to make sense.  Copy and paste to Notepad.exe so that
the spacing is shown correctly with a fixed-width font.

1387:0100  40 00 78 E4 40 5C 5C 40-F3 F2 F7 F6 F0 40 D7 E2   @.x.@\\@.....@..
              nnnnn  U     *  *     3  2  7  6  0     P  S
nnnnn = decimal 120

1387:0110  40 C4 C5 C3 61 D7 C1 D5-E5 C1 D3 C5 E3 40 D4 E4   @...a........@..
               D  E  C  /  P  A  N  V  A  L  E  T     M  U

1387:0120  D3 E3 C9 C6 C9 D3 C5 40-C6 D6 D9 D4 C1 E3 40 40   .......@......@@
            L  T  I  F  I  L  E     F  O  R  M  A  T
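
Since converting by hand is error-prone, the same bytes can be run through .NET's own EBCDIC table as a cross-check.  This is a sketch that assumes the file uses code page 37 (IBM EBCDIC US/Canada); other EBCDIC variants differ in some punctuation.  The function name is made up for the example.  On the .NET Framework the code page is built in; on .NET Core and later it should need the System.Text.Encoding.CodePages provider registered first.

```vbnet
Imports System
Imports System.Text

Module EbcdicDemo
    ' Converts EBCDIC bytes to a .NET string.
    ' Assumption: code page 37; the real file may use a different variant.
    Function EbcdicToString(ByVal bytes As Byte()) As String
        Return Encoding.GetEncoding(37).GetString(bytes)
    End Function

    Sub Main()
        ' Bytes 0103 through 012D of the asker's dump (the text after 40 00 78).
        Dim header As Byte() = New Byte() { _
            &HE4, &H40, &H5C, &H5C, &H40, &HF3, &HF2, &HF7, &HF6, &HF0, _
            &H40, &HD7, &HE2, &H40, &HC4, &HC5, &HC3, &H61, &HD7, &HC1, _
            &HD5, &HE5, &HC1, &HD3, &HC5, &HE3, &H40, &HD4, &HE4, &HD3, _
            &HE3, &HC9, &HC6, &HC9, &HD3, &HC5, &H40, &HC6, &HD6, &HD9, _
            &HD4, &HC1, &HE3}

        ' Prints: U ** 32760 PS DEC/PANVALET MULTIFILE FORMAT
        Console.WriteLine(EbcdicToString(header))
    End Sub
End Module
```

The 32760 in the decoded text matches the Blocksize constant in the Fortran program, which suggests the decoding is on the right track.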
0
 
LVL 12

Expert Comment

by:farsight
Comment Utility
> For the example files I have, the input file is 561KB, and the output are 12 files, ranging from 1KB to 54KB and totalling only 280KB.

What I really want to know is how big the real production input files are.
If they're about this size, you could read the whole file in at once, store it in a byte-array, and process the byte-array from start to end.  If they're large multimegabyte files, then I wouldn't recommend that strategy.
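
A minimal sketch of the read-it-all-at-once strategy, assuming the file is small enough to hold in memory and that .NET 2.0 or later is available (File.ReadAllBytes and List(Of T)).  The module and function names are made up for this example.  It lists every offset where the bytes 40 00 78 appear, which is one way to test the guess that they start a record; the same three bytes could also turn up inside data by coincidence, so treat the list as a hint rather than proof.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module WholeFileScan
    ' Returns every offset at which the three bytes 40 00 78 occur.
    Function FindMarkers(ByVal data As Byte()) As List(Of Integer)
        Dim hits As New List(Of Integer)
        For i As Integer = 0 To data.Length - 3
            If data(i) = &H40 AndAlso data(i + 1) = &H0 AndAlso data(i + 2) = &H78 Then
                hits.Add(i)
            End If
        Next
        Return hits
    End Function

    Sub Main(ByVal args As String())
        If args.Length = 0 Then
            Console.WriteLine("Usage: WholeFileScan <input file>")
            Return
        End If

        ' Read the entire file into memory in one call.
        Dim data As Byte() = File.ReadAllBytes(args(0))
        Console.WriteLine("File size: " & data.Length.ToString() & " bytes")

        For Each offset As Integer In FindMarkers(data)
            Console.WriteLine(offset)
        Next
    End Sub
End Module
```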
0
 

Author Comment

by:970170
Comment Utility
The real production files are typically not larger than 1 MB.  I am beginning to see the light now..

...still digesting info and playing with stuff...

will get back to this post.. please dont go away!
0
 

Author Comment

by:970170
Comment Utility
question.. (might have been answered before, but im not sure)

read (input_lun, 20, iostat=status) byte_count, buffer
20  format (q, a80)

means..

byte_count = q, buffer = a80 (in a symbolic way)

if a80 = 80 characters (does this mean 80 bytes?)
what does q mean?  how big is the byte_count?  1 byte?

0
 
LVL 12

Expert Comment

by:farsight
Comment Utility
> if a80 = 80 characters (does this mean 80 bytes?)
Definitely.  ( Does this mean you didn't understand Simon-Says!??!  :~(     )
> what does q mean?
"The character count edit descriptor returns the remaining number of characters in the current input record."
> how big is the byte_count?
Integer*4               Byte_count      !!  This is the in-memory size of byte_count.
which is equivalent to Integer in .NET (a synonym is Int32).
It takes NO SPACE in the file, because it's computed, not read.
> 1 byte?
Nope.
0
 

Author Comment

by:970170
Comment Utility
Hmm..

ok i think im having a syntactical brain malfunction here..

read (input_lun, 20, iostat=status) byte_count, buffer
20  format (q, a80)

Again, this means "read 80 chars from input_lun into ..?"  I'd originally thought it went into both byte_count and buffer, but now you're saying that byte_count is computed, not read... so where is it computed?  With q being just a descriptor, is it then only used internally?
0
 
LVL 12

Expert Comment

by:farsight
Comment Utility
The read command itself computes the byte_count.
That is what the Q format descriptor instructs it to do.

Think of the format as a little macro language that the Read command interprets.
Q:    Fill the next variable with the number of remaining characters in the record.
A80: Fill the next variable with 80 characters read from the input stream.

Does that make sense?
(If not, try again ... but I'm running out of ways to describe this.  
... but I'm tired right now, so I'm sure some sleep will give me more creativity.)
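
One more way to put it, as a VB.NET sketch.  ReadRecord is a made-up name, and the sketch assumes one complete record has already been pulled out of the file into a byte array (on VMS the file system keeps track of where each record ends; a plain Windows byte stream has no such notion, so finding the record boundaries is a separate problem).  It also assumes that a record shorter than 80 bytes is padded with blanks, and uses the ASCII space for that.

```vbnet
Imports System
Imports System.Text

Module ReadQA80
    ' Emulates   read (lun, 20) byte_count, buffer   with   20 format (q, a80)
    ' for one record that is already in memory.
    Sub ReadRecord(ByVal record As Byte(), ByRef byteCount As Integer, ByRef buffer As Byte())
        ' Q: nothing has been consumed yet, so the count is the whole record.
        byteCount = record.Length

        ' A80: take up to 80 bytes; pad a short record with blanks.
        ' Assumption: the pad character is an ASCII space (20 hex).
        buffer = New Byte(79) {}
        For i As Integer = 0 To 79
            buffer(i) = If(i < record.Length, record(i), CByte(&H20))
        Next
    End Sub

    Sub Main()
        Dim count As Integer
        Dim buffer As Byte() = Nothing

        ' A 120-byte record: Q reports 120, A80 keeps only the first 80.
        ReadRecord(New Byte(119) {}, count, buffer)
        Console.WriteLine(count)           ' 120
        Console.WriteLine(buffer.Length)   ' 80

        ' A 5-byte record: Q reports 5, A80 pads the rest with blanks.
        ReadRecord(Encoding.ASCII.GetBytes("SHORT"), count, buffer)
        Console.WriteLine(count)           ' 5
        Console.WriteLine("[" & Encoding.ASCII.GetString(buffer) & "]")
    End Sub
End Module
```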
0
 

Author Comment

by:970170
Comment Utility
And so... on one not so notable day, upon a not so notable whim, the prodigal son finally returns to the land of understanding.  He looks up at the almighty Simon.. and bows.. a gesture of complete gratitude pulsating through his body.

So what other values can replace the q, and what do they mean?  Do you have an argument list for the read parameters?
0
 
LVL 12

Expert Comment

by:farsight
Comment Utility
Click on the link to the FORTRAN manual that I left way above (03/31/2004 07:25AM MST), and page down to Chapter 11, which is all about I/O formatting.  Section 11.3.9 covers the Q descriptor.  The READ statement is in 10.3.
0
 

Author Comment

by:970170
Comment Utility
How does Fortran know how many characters remain in the current record?  How would I emulate this in .NET?  Is there some sort of EOF marker?
0
 
LVL 12

Expert Comment

by:farsight
Comment Utility
That gets us back to my initial post:
"The biggest question I have is how do we read/write that type of file: i.e. one with logical records that have some particular length?
How are these files represented byte-by-byte?  In other words, can we read a byte-stream and interpret that in a way that allows us to distinguish the boundaries between records?
Other than that and the filenames, the rest seems (reasonably) straightforward."

It also explains why I was analyzing the file byte-by-byte.  Reread those posts.  I now think that the 1st character is a "carriage control" character.  For our purposes, we may be able to ignore it, though I'd check to see if the last record has a value different from hex 40.  If it's different, as I suspect, we might need to write a different value for the first byte of the last record.

It _APPEARS_ that the 2nd and 3rd characters of the record are a two-byte number representing the record length.  This exists on EVERY record (for this kind of file).  We do not need to read it using the Q format option, because we can just read it directly from the record.  (Technically, Q can be used multiple times at various positions to determine how many characters are left at that particular point.)

Following that, there are enough bytes so that the total number of WORDS in the record matches the number we read from the two-byte record length.

You could try the code below, in a suitable loop.  The code is based upon guesses I've made as to the format of the file.  It's almost certainly not correct.  It's especially likely that something goes wrong near or at the end-of-file (eof), because I have seen file data for that.  Add appropriate code for debugging/tracing, so you can see if you get the expected values while reading through the whole file.  Once you can read the whole file OK, you can start thinking about writing the little files.

[VB.NET -- untested]

    Private Sub TryReadFile(ByVal br As System.IO.BinaryReader)

        Dim r As Record
        Do
            r = New Record
            r.Read(br)

            ' Here insert code to show that Record r is complete and OK.

        Loop Until r.eof

    End Sub


    ' This is little more than a stub of a class.  You'll need to build it out and modify it to your needs.
    Class Record
        ' by default Inherits Object

        ' Of course this data should be private,
        ' with public properties to set/get them.
        ' (I'm just keeping this sample short.)
        Public eof As Boolean
        Public firstByte As Byte
        Public recordLengthInWords As Integer     ' Perhaps should be UInt16 ?
        Public data As Byte()

        ' Overrides the ToString() function provided by the Object class.
        Public Overrides Function ToString() As String
            Return ConvertEBCDICBytesToASCIIString(data)
        End Function

        Private Function ConvertEBCDICBytesToASCIIString(ByVal bytes As Byte()) As String
            ' Insert conversion code here.
        End Function

        ' Write may need additional parameters in order to correctly write the record.
        Public Sub Write(ByVal bw As System.IO.BinaryWriter)
            ' Insert code here.
        End Sub

        Public Sub Read(ByVal br As System.IO.BinaryReader)
            ' Compare stream positions rather than calling PeekChar, which tries
            ' to decode a character and can throw on arbitrary binary data.
            If br.BaseStream.Position >= br.BaseStream.Length Then
                Me.eof = True
            Else
                Me.eof = False

                ' Read the first byte.
                Me.firstByte = br.ReadByte()

                ' Read the record length.
                Dim secondByte As Byte = br.ReadByte()
                Dim thirdByte As Byte = br.ReadByte()
                Me.recordLengthInWords = Convert.ToInt16(secondByte) * 256 + Convert.ToInt16(thirdByte)
                'Perhaps should set eof to True here, if recordLengthInWords is zero (or near zero)???

                ' There are two bytes per word, and three bytes have already been read.
                Dim bytesToRead As Integer = Me.recordLengthInWords * 2 - 3

                ' Read the rest of the record (the data).  bytesToRead already
                ' excludes the three bytes read above, so do not subtract them again.
                Me.data = br.ReadBytes(bytesToRead)
            End If
        End Sub
    End Class
0
 
LVL 12

Assisted Solution

by:farsight
farsight earned 250 total points
Comment Utility
>  end-of-file (eof), because I have seen file data for that
I mean:
   end-of-file (eof), because I have NOT seen file data for that

0
