Solved

Difference In File Size Between Fixed Width and Comma Delimitted File

Posted on 2016-11-18
27
28 Views
Last Modified: 2016-11-21
Hi All,

I have a query that I'm exporting.  It has about 19,000 records in it about about 166 columns total.  I went through the Wizard to create the spec for the file and when it goes to export, the file is about 284 Mb's in size, which is crazy.  If I export it as a comma delimitted text file, the file is about 30mbs.  I need the file to stay in a specific format to feed into a vendor's site, which is why I did the spec wizard.  Does anyone have any idea why the fixed format length would create such a huge file?  Or can someone offer another suggestion for exporting this as a text file with fixed width columns?

Any help would be great appreciated.

-Anthony
0
Comment
Question by:Anthony6890
  • 13
  • 8
  • 4
  • +1
27 Comments
 
LVL 34

Expert Comment

by:PatHartman
Comment Utility
The fixed width file includes a space for every character in a field width.  For example, LastName is generally defined as 25 characters but the average last name is probably less then 15 characters so every record  is as long as it needs to be to hold all POTENTIAL characters whereas the .cav format truncates trailing spaces and so each record is only as long as it needs to be to accommodate the actual characters in THAT specific record.
0
 
LVL 1

Author Comment

by:Anthony6890
Comment Utility
Thanks Pat, that makes a lot of sense.  

Is there a way to get around this instead of doing a fixed width?  If I specified the table column widths, can I write VB code to begin certain columns at certain positions versus filling some areas with spaces?  My columns are already formatted to look the way they should upon export, I just need to ensure that the column being at specific positions...
0
 
LVL 18
Comment Utility
adding on to Pat's comment:

284mb/19,000 records is about 15 bytes/field

30mb/19,000 records is about 1.5 bytes/field

obviously you have a lot of short data and/or many fields that are not filled

> "I just need to ensure that the column being at specific positions..."

that means you need to pad with spaces, which makes the file much bigger.
0
 
LVL 1

Author Comment

by:Anthony6890
Comment Utility
Crystal, you would be correct.  Right now I'm testing a smaller population without filling in a lot of data across the table.  If I add more data, will that reduce the size of the file?
0
 
LVL 18
Comment Utility
Hi Anthony,

if each field takes the same space whether it has data or not. then the file size is dependent on the number of records.

correction:
284mb/19,000 records is about 15K/record

30mb/19,000 records is about 1.5K/record
0
 
LVL 1

Author Comment

by:Anthony6890
Comment Utility
That is so interesting to me. Can you offer any suggestions an a way that I can create a smaller file?  I've tried numerous things.  What is also crazy to me is that if I zip the file, it goes for 284Mb to about 4 Mbs...
0
 
LVL 34

Expert Comment

by:PatHartman
Comment Utility
How can adding more data reduce the size of the file?  Calculating the width of a fixed width file is straightforward.

Number of records * record length.

The record length is FIXED in a fixed width file.  That is what the format name means.  It is also why most people use .csv files which are significantly smaller if you have a lot of shorter than maximum width fields.  EVERY character of the record either contains some meaningful data or a place holding space character.  The .csv format uses a delimiter to separate fields so software that read the file doesn't need to count characters, it looks for the next instance of the delimiter to know when it gets to a new fielc.

Pat**********************Hartman*****************

or

Pat,Hartman

The * represent the space fillers.  The fixed record is 50 characters wide so the file size is 50 * number of records plus a few characters for internal header/footer characters.  The comma separated file would probably be about 60% of the size of the fixed width assuming the two fields were first name and last name.  For some types of data, the file sizes will be similar and the fixed could actually be smaller if all the data is actually as wide as the space allowed.  Things like phone numbers are 10 characters and assuming you have a phone number, it will take up all 10 characters.  There won't be any filler.
0
 
LVL 1

Author Comment

by:Anthony6890
Comment Utility
Is there a way to pass Null Values instead of Spaces for some of the areas that I'm having to fill the file?
0
 
LVL 34

Expert Comment

by:PatHartman
Comment Utility
The zip process eliminates redundant data so typically a fixed width file has a lot of contiguous spaces that can be zipped to 1 character and a count.
0
 
LVL 34

Expert Comment

by:PatHartman
Comment Utility
NO.  Fixed is FIXED.  That is what it means.  Short of zipping the file, which WILL reduce it's size, you CANNOT make it any shorter.  A Null is still a character.  It is a hex "00" whereas a space is a hex "40".  They each still take ONE BYTE.  And most applications will be adversely affected if you substitute nulls for spaces.
0
 
LVL 1

Author Comment

by:Anthony6890
Comment Utility
Ok, I completely understand.  I have one follow-up question, if I have a record that doesn't have any more data after a certain position, i.e. after position 1600; however, the record below that has data that goes out to position 8000, is Access space filling the first record from 1601-8000 because the record below it has data in those fields?
0
 
LVL 34

Expert Comment

by:PatHartman
Comment Utility
If the record width is defined as 8,000 characters then EVERY record must be 8,000 bytes followed by a crlf to end the record even if only the first field contains a single meaningful character and all the other fields are "empty".  It is what it is --- 8,000 bytes or nothin'

In a fixed width file, every record MUST BE EXACTLY the same Length.  Most applications will fail if they don't.  If you have records of differing lengths, you probably have data that has embedded crlf characters that you need to get rid of.

Some specialized file formats are a combination of fixed and delimited and have record termination characters other than crlf.  I use these for EDI transactions which are "standardized" formats used for communicating various types of data between trading partners.  For example, an 837p format is used to transmit a bill for professional services to the insurance company or whoever is going to pay the bill.  Each individual transaction might take up to 40 "records" of varying formats.  Some of the "records" had fixed width fields and some had variable length fields.  The definition for this type of exchange was an 800+ page document that still left gaping holes in the definition.  The response from the payer was an 835 format file that was somewhat less complicated but still took many "records" to communicate what they paid and why.
0
 
LVL 1

Author Comment

by:Anthony6890
Comment Utility
Thanks Pat, I'm learning a lot about file formats here.  In my case, we are remitting information about our members to our insurance carrier.  The carrier has a fixed width file layout that I need to abide by.  Within this file, instead of listing the dependents under the member, they list the dependents horizontally.  At times, a member might have 0 dependents or might have 14 dependents.  Each member record length is 1631 bytes total and each dependent record is an additional 968.  I'm assuming then, that each record has a total length of 15,183 bytes because I never included a crlf.  How can I insert that through an access query?
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 
LVL 45

Expert Comment

by:aikimark
Comment Utility
If you have text fields that have defaulted to a length of 255, then you might want to go back to the definition of the table and make those fields their actual length.
0
 
LVL 1

Author Comment

by:Anthony6890
Comment Utility
I have gone back and changed the values of all of the fields ti their appropriate length; however, I have about 4 filler fields that are memo fields that are filled in with a various combination of spaces and 0 fills...
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
You can choose to only display the first 255 characters of those memo fields
0
 
LVL 1

Author Comment

by:Anthony6890
Comment Utility
I made a little progress, if I open the exported file up into Notepadd++ and use their Blank Operations, remove trailing space, it cuts out all of the extra spaces at the end of the record and cuts the record down to the appropriate ending versus spacing it out to 15,000 bytes.
0
 
LVL 1

Author Comment

by:Anthony6890
Comment Utility
So I guess the question I have is whether I can do this operation while exporting the file or not?
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
I doubt it.  If you are supposed to exchange data to a system that is expecting a fixed width format, all your records should be the same length
0
 
LVL 1

Author Comment

by:Anthony6890
Comment Utility
True, but they are only requiring values for the fir 1631 bytes.  Anything after that is not required, but it could be on the file and there are specific locations where those values are supposed to be.  

I compared my file to historical ones, and that appears to be the way it was being handled.  So I'm assuming then there isn't a way to force access to end a record and not fill with spaces if there is no more data after a certain point?
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
There are a couple ways to handle this:
* You could potentially invoke the RTrim() function on the line before you write it.  However, it wouldn't necessarily know when to stop removing trailing spaces.
* You could use the Left() function to truncate records.  This seems to be the surest way.
0
 
LVL 34

Accepted Solution

by:
PatHartman earned 500 total points
Comment Utility
This is not a standard fixed width file.  It is a variable width file.  This is not a format that you should be creating with built in methods because they don't allow you to control the variable aspect of the record length.

I am assuming that your schema is properly normalized and you have dependents vertically in some table - either a persons table so all people are in one table or in a separate dependents table.

The procedure to create this export needs to open two recordsets based on tables or queries.  One for the employees and one for the dependents.  And an output file to write the exported records to.

The outer loop reads the employee query and formats the fixed header portion of the output record.  The inner loop reads the dependents records and concatenates them to the fixed portion of the output record.  When there are no more dependents, the completed record is written to the output file and the outer loop reads the next employee.

A process like this will create the correct length record for each situation.  The only extraneous characters will be the fillers for each field to pad them out to the fixed width.  There will not be unnecessary dependent segments created.  So an employee without dependents will write a record of 1631 bytes an employee with three dependents will write 1631 Plus 968 plus 968 plus 968 or 4535 characters.  The Print command will append the crlf as the record is written.

I've attached the code I used to create a similar format record.  As it evolved, it turned out that I didn't have to concatenate multiple records for the variable part of the record so this code writes a header concatenated with a single detail but it will show you how to force all the fields to be padded correctly.  Access helps us by stripping trailing spaces as data is stored and in fact, I don't want to store the trailing spaces so they have to be attached when the export is written.  notice how the Space(...) function is used to determine the actual length of the field and then concatenate 0-x spaces to pad to the fixed width.
Public Function ExportFile() As Boolean
Dim db As DAO.Database
Dim rsExportParam As DAO.Recordset
Dim rsExportFile As DAO.Recordset
Dim qd As DAO.QueryDef
Dim tdExpErr As DAO.TableDef
Dim rsExpErr As DAO.Recordset

Dim strRecord As String
Dim iRecLen As Integer

Dim FSO As FileSystemObject
Dim fsoFile As TextStream
Dim strQuery As String
Dim strPath As String
Dim strFile As String
Dim strZipFile As String
Dim ExportCount As Long

   On Error GoTo Err_Proc

    Set db = CurrentDb()
    Set rsExportParam = db.OpenRecordset("tblExportParameters", dbOpenSnapshot)
    strQuery = rsExportParam!ExportQuery
    strPath = rsExportParam!ExportPath
    If Right(strPath, 1) = "\" Then
        strFile = strPath & rsExportParam!ExportFile
    Else
        strFile = strPath & "\" & rsExportParam!ExportFile
    End If
    strZipFile = strFile & ".zip"

    Set qd = db.QueryDefs(strQuery)
        qd.Parameters!EnterExpDate = Null ' rsExportParam!ExportDate
        qd.Parameters!EnterEMS = rsExportParam!EMS
        qd.Parameters!EnterProvno = rsExportParam!Provider
        qd.Parameters!EnterExpGroup = rsExportParam!ExportGroup
    Set rsExportFile = qd.OpenRecordset(dbOpenDynaset, dbSeeChanges)
    
    ExportCount = DLookup("ExportCount", "tblExportLog", "GroupID = " & rsExportParam!ExportGroup)
    If rsExportFile.RecordCount > ExportCount Then
        GoSub WriteErr
        Exit Function
    End If
    
    
'''fso stuff
    Set FSO = New FileSystemObject
    Set fsoFile = FSO.CreateTextFile(strFile, True)
    
'''internal Access stuff
    TempVars("RecCountExp").Value = 0
    Do While Not rsExportFile.EOF
        TempVars("RecCountExp").Value = TempVars("RecCountExp").Value + 1

        strRecord = rsExportFile!AgencyID                                                         'x(09) column 1-9
        strRecord = strRecord & rsExportFile!CarePlanTranType                                     'x(01) column 10
        strRecord = strRecord & rsExportFile!CarePlanNum & Space(10 - Nz(Len(rsExportFile!CarePlanNum), 0))       'x(10) column 11-20
        strRecord = strRecord & rsExportFile!PriorAuthTranType                                    'x(01) column 21
        strRecord = strRecord & rsExportFile!PriorAuthNum & Space(10 - Nz(Len(rsExportFile!PriorAuthNum), 0))     'x(10) column 22-31
        strRecord = strRecord & rsExportFile!EmsNum & Space(12 - Len(rsExportFile!EmsNum))                  'x(12) column32-43
        strRecord = strRecord & rsExportFile!FirstInitial & Space(1 - Len(rsExportFile!FirstInitial))             'x(01) column 44
        If IsNull(rsExportFile!ProviderNum) Then
            strRecord = strRecord & Space(9)
        Else
            strRecord = strRecord & rsExportFile!ProviderNum          'x(09) column 45-53
        End If
        If IsNull(rsExportFile!PAAssignmentSub) Then                  'x(01) column 54
            strRecord = strRecord & Space(1)
        Else
            strRecord = strRecord & rsExportFile!PAAssignmentSub
        End If
        If Len(rsExportFile!ClinicalNotes) > 1000 Then
            strRecord = strRecord & Left(rsExportFile!ClinicalNotes, 1000)  'x(1000) column 55-1054
        Else
            strRecord = strRecord & rsExportFile!ClinicalNotes & Space(1000 - Nz(Len(rsExportFile!ClinicalNotes), 0)) 'x(1000) column 55-1054
        End If
        If Len(rsExportFile!ExternalNotes) > 1000 Then
            strRecord = strRecord & Left(rsExportFile!ExternalNotes, 1000)  'x(1000) column 1055 - 2054
        Else
            strRecord = strRecord & rsExportFile!ExternalNotes & Space(1000 - Nz(Len(rsExportFile!ExternalNotes), 0)) 'x(1000) column 1055 - 2054
        End If
        strRecord = strRecord & rsExportFile!NumOfLineItems                                       'x(02) column 2055-2056

        If Len(strRecord) <> 2056 Then
            'MsgBox "Invalid recordlength.", vbOKOnly
            TempVars("ErrCountExp").Value = TempVars("ErrCountExp").Value + 1
            rsExportFile.Edit
                rsExportFile!ErrDesc = "Invalid Header Rec length"
            rsExportFile.Update
            Debug.Print "Header ID = " & rsExportFile!HeaderID & " -- Rec Len = " & Len(strRecord)
            Exit Function
        End If
        strRecord = strRecord & Format(rsExportFile!LineNum, "00")                                'x(02) column 2057-2058
        strRecord = strRecord & rsExportFile!lipastatus & Space(1 - Nz(Len(rsExportFile!lipastatus), 0))          'x(01) column 2059
        strRecord = strRecord & rsExportFile!LITranType                                           'x(01) column 2060
        strRecord = strRecord & rsExportFile!ProcedureCode & Space(5 - Nz(Len(rsExportFile!ProcedureCode), 0))    'x(05) column 2061-2065
        strRecord = strRecord & rsExportFile!Mod1 & Space(2 - Nz(Len(rsExportFile!Mod1), 0))                         'x(02 column 2066-2067
        strRecord = strRecord & rsExportFile!Mod2 & Space(2 - Nz(Len(rsExportFile!Mod2), 0))                         'x(02 column 2068-2069
        strRecord = strRecord & rsExportFile!Mod3 & Space(2 - Nz(Len(rsExportFile!Mod3), 0))                         'x(02 column 2070-2071
        strRecord = strRecord & rsExportFile!Mod4 & Space(2 - Nz(Len(rsExportFile!Mod4), 0))                         'x(02 column 2072-2073
        strRecord = strRecord & rsExportFile!RevenueCode & Space(4 - Nz(Len(rsExportFile!RevenueCode), 0))        'x(04) column 2074-2077
        strRecord = strRecord & rsExportFile!ProcedureCodeMod & Space(2 - Nz(Len(rsExportFile!ProcedureCodeMod), 0)) 'x(02) column 2078-2079
        strRecord = strRecord & rsExportFile!ProcedureCodeList & Space(4 - Nz(Len(rsExportFile!ProcedureCodeList), 0))  'x(04) column 2080-2083
        strRecord = strRecord & Format(rsExportFile!FromDate, "yyyymmdd")                         'x(08) column 2084-2091
        strRecord = strRecord & Format(rsExportFile!ThroughDate, "yyyymmdd")                      'x(08) column 2092-2099
        If IsNull(rsExportFile!FundingSource) Then
            strRecord = strRecord & " "
        Else
            strRecord = strRecord & rsExportFile!FundingSource                                       'x(01) column 2100
        End If
        strRecord = strRecord & Format(Nz(rsExportFile!FrequencyNum, "0"), "0000")                        'x(04) column 2101-2104
        strRecord = strRecord & Nz(rsExportFile!FrequencyType, " ")                                        'x(01) column 2105
        strRecord = strRecord & Format(Nz(rsExportFile!PaReqAmt, 0), "000000.00")                       'x(09) column 2106-2114
        strRecord = strRecord & Format(Nz(rsExportFile!PAReqUnits, 0), "0000000")                       'x(07) column 2115-2121
        TempVars("SvcCountExp").Value = TempVars("SvcCountExp").Value + 1
        If Len(strRecord) <> 2121 Then
            'MsgBox "Invalid recordlenngth.", vbOKOnly
            TempVars("ErrCountExp").Value = TempVars("ErrCountExp").Value + 1
            rsExportFile.Edit
                If Nz(rsExportFile!PAReqUnits, 0) < 0 Then
                    rsExportFile!ErrDesc = "Req units is negative"
                Else
                    rsExportFile!ErrDesc = "Invalid Detail Rec length"
                End If
                    
            rsExportFile.Update
            Debug.Print "Detail ID = " & rsExportFile!DetailID & " -- Rec Len = " & Len(strRecord)
            'MsgBox "record skipped due to invalid record length - " & rsExportFile!emsnum, vbOKOnly
            GoTo SkipWrite
        End If
'''fso stuff
        strRecord = Replace$(strRecord, vbNullChar, Chr$(32))           'without this- export is Chinese due to some bug MS introduced in Sept 2013
        fsoFile.WriteLine strRecord
'''internal Access stuff
'        Print #1, strRecord
SkipWrite:
        rsExportFile.MoveNext
    Loop
    
'''fso stuff
    fsoFile.Close
'''internal Access stuff
'    Close #1
    If TempVars("ErrCountExp").Value > 0 Then
'        MsgBox ErrCount & " Errors were found.", vbOKOnly
        ' Log the error
        Set tdExpErr = db.TableDefs("tblExportError")
        Set rsExpErr = tdExpErr.OpenRecordset(dbOpenDynaset, dbSeeChanges)
        rsExpErr.AddNew
            rsExpErr!ErrorText = TempVars("ErrCountExp").Value & " Errors were found."
            rsExpErr!EMS = rsExportParam!EMS
            rsExpErr!Provider = rsExportParam!Provider
            rsExpErr!FromDate = rsExportParam!FromDate
            rsExpErr!ThruDate = rsExportParam!ThruDate
            Select Case rsExportParam!Action
                Case 1  'all
                    rsExpErr!Action = "All"
                Case 2  'Adds
                    rsExpErr!Action = "Adds"
                Case 3  'Changes
                    rsExpErr!Action = "Changes"
            End Select
            rsExpErr!ExportCount = TempVars("RecCountExp").Value
            rsExpErr!ErrCount = TempVars("ErrCountExp").Value
            rsExpErr!ExportDate = rsExportParam!ExportDate
            rsExpErr!GroupID = rsExportParam!ExportGroup
            rsExpErr!pCode = rsExportParam!pCode
        rsExpErr.Update
        Set rsExpErr = Nothing
        Set tdExpErr = Nothing
        
        'Since the errors are discovered so far into the process, I decided to let them write to the table and then move them to an error table
        'after the fact.
        'The errors have to be deleted because they are not sent to the portal and the return file match must be done row-by-row in the order
        'in which the records are sent because we do not have a unique identifier so we can't have rows in the Export table that won't match
        'rows being returned from the portal.
        DoCmd.RunMacro "mWarningsOff"
        DoCmd.OpenQuery "qAppendErrors"     'copy recs with errors to error table.
        'DoCmd.OpenQuery "qUpdateStatus"
        Set qd = db.QueryDefs!qUpdatePendingStatusToErrorNotSent
            qd.Parameters!EnterGroupID = rsExportParam!ExportGroup
            qd.Execute
        DoCmd.OpenQuery "qDelErrors"
        DoCmd.RunMacro "mWarningsOn"
    End If
    
    Call Zip(strZipFile, strFile)

    If TempVars("RecCountExp").Value = 0 Then
        ExportFile = False
    Else
        ExportFile = True
    End If
Exit_Proc:
    Exit Function
WriteErr:
        Set tdExpErr = db.TableDefs("tblExportError")
        Set rsExpErr = tdExpErr.OpenRecordset(dbOpenDynaset, dbSeeChanges)
        rsExpErr.AddNew
            rsExpErr!ErrorText = "Output file record length different from records created for the specified GroupID"
            rsExpErr!EMS = rsExportParam!EMS
            rsExpErr!Provider = rsExportParam!Provider
            rsExpErr!FromDate = rsExportParam!FromDate
            rsExpErr!ThruDate = rsExportParam!ThruDate
            Select Case rsExportParam!Action
                Case 1  'all
                    rsExpErr!Action = "All"
                Case 2  'Adds
                    rsExpErr!Action = "Adds"
                Case 3  'Changes
                    rsExpErr!Action = "Changes"
            End Select
            rsExpErr!ExportCount = rsExportParam!ExportGroup
            rsExpErr!ErrCount = rsExportFile.RecordCount
            rsExpErr!ExportDate = rsExportParam!ExportDate
            rsExpErr!GroupID = rsExportParam!ExportGroup
            rsExpErr!pCode = rsExportParam!pCode
        rsExpErr.Update
        Set rsExpErr = Nothing
        Set tdExpErr = Nothing
        Return

Err_Proc:

    Select Case Err.Number
        Case Else
            Debug.Print "Error " & Err.Number & " (" & Err.Description & ") in procedure ExportFile of Module modBatchUpload"
            
            Resume Next
            
            Resume
    End Select
End Function

Open in new window

0
 
LVL 1

Author Comment

by:Anthony6890
Comment Utility
I will try the left function now!  Thanks!!
0
 
LVL 1

Author Comment

by:Anthony6890
Comment Utility
Wow Pat, that is some code.  I will try that as well, give me a day or so to get that working to see how it will play out for me.  Thanks for all your help!
0
 
LVL 34

Expert Comment

by:PatHartman
Comment Utility
It is not easy to keep track of where you are when writing such a long record so I added comments in the code to ensure I kept up with the record format the client sent me.  Except for the FSO (File System Object) stuff which is used to write the record, the bulk of the logic you need to under stand is in the section that starts with -- Do While Not rsExportFile.EOF

Notice the comments off to the right --- 'x(09) column 45-53
These will be a life saver as you are testing and help you keep up with following the written format you were given.
0
 
LVL 1

Author Closing Comment

by:Anthony6890
Comment Utility
Pat, your code worked wonders.  I was able to get it to fully work yesterday (Sunday).  Thanks again for your help, much appreciated.

-Anthony
0
 
LVL 34

Expert Comment

by:PatHartman
Comment Utility
You're welcome Anthony.  Once you understand the concept, it is more tedious than difficult to create this type of output.
0

Featured Post

How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

Join & Write a Comment

I originally created this report in Crystal Reports 2008 where there is an option to underlay sections. I initially came across the problem in Access Reports where I was unable to run my border lines down through the entire page as I was using the P…
Overview: This article:       (a) explains one principle method to cross-reference invoice items in Quickbooks®       (b) explores the reasons one might need to cross-reference invoice items       (c) provides a sample process for creating a M…
Get people started with the utilization of class modules. Class modules can be a powerful tool in Microsoft Access. They allow you to create self-contained objects that encapsulate functionality. They can easily hide the complexity of a process from…
Access reports are powerful and flexible. Learn how to create a query and then a grouped report using the wizard. Modify the report design after the wizard is done to make it look better. There will be another video to explain how to put the final p…

772 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now