• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 597
  • Last Modified:

Error Parsing a Tab Delimited Text file

windows server 2003 - Railo 3.1 - MySQL 5
I ran into an error parsing a tab delimited text file and need some help solving it.  The error is:

Element at position [4] doesn't exist in array
Stacktrace The Error Occurred in
 C:\www\IDX\MCNMLS\test.cfm: line 21

19: <cfset MLSNum = trim(rec[2])>
20: <cfset PhotoLabel = trim(rec[3])>
21: <cfset DateLastModified = trim(rec[4])>
22: <cfset PhotoOrder = trim(rec[5])>
23: <cfset DisplayAsPortrait = trim(rec[6])>

I attached a screenshot of the area of the textfile where the error occurs and my code. I think its because there are empty fields? Any help appreciated! Thanks

 
     textfile
<cfquery name="qryphotolinks" datasource="MCNMLS">
DELETE FROM PhotoLinks
</cfquery>

<cffile action="read"  file="#ExpandPath( './ftp/txt/photos_links.txt' )#"  variable="txtFile">

<cfset rowCounter = 0>
                        
<cfloop index="record" list="#Replace(Replace(Replace(txtFile,'''','''''','all'),'|','','all'),chr(9),'| ','all')#" delimiters="#chr(13)##chr(10)#">
  <cfset rowCounter = rowCounter + 1>                       
            
<cfif rowCounter neq 1>

<cfset rec = listToArray(record,"|",true) >

<cfset PhotoLink = trim(rec[1])>
<cfset MLSNum = trim(rec[2])>
<cfset PhotoLabel = trim(rec[3])>
<cfset DateLastModified = trim(rec[4])>
<cfset PhotoOrder = trim(rec[5])>
<cfset DisplayAsPortrait = trim(rec[6])>

<cfquery name="qryInsertphotolinks" datasource="MCNMLS">
INSERT INTO PhotoLinks (PhotoLink,MLSNum,PhotoLabel,DateLastModified,PhotoOrder,DisplayAsPortrait)


VALUES (
<cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#PhotoLink#">,
<cfqueryparam cfsqltype="CF_SQL_INTEGER" value="#MLSNum#" null="#NOT isNumeric(MLSNum)#">,
<cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#PhotoLabel#">,
<cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#DateLastModified#">,
<cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#PhotoOrder#">,
<cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#DisplayAsPortrait#">
)


  </cfquery> 

 </cfif>          

</cfloop>

Open in new window

0
Bang-O-Matic
Asked:
Bang-O-Matic
  • 15
  • 10
  • 8
2 Solutions
 
gdemariaCommented:
Hmmm, you're doing a lot of converting of characters and changing the delimiter, I'm wondering why that is necessary.   Couldn't you just loop your end of line and then convert it to an array using the tab?  



<cfloop index="record" list="#Replace(Replace(Replace(txtFile,'''','''''','all'),'|','','all'),chr(9),'| ','all')#" delimiters="#chr(13)##chr(10)#">
  <cfset rowCounter = rowCounter + 1>                      
  <cfif rowCounter neq 1>
   <cfset rec = listToArray(record,"|",true) >




Change to something like this......

<cfloop index="record" list="#txtFile#" delimiters="#chr(13)##chr(10)#">
    <cfset rowCounter = rowCounter + 1>                              
     <cfif rowCounter neq 1>

    <cfset rec = listToArray(record,chr(9),true) >

    <cfif arrayLen(rec) neq 12>  <!-------- the number of columns you expect.....
         <cfdump var="#rec#">
    </cfif>


You can test the array to see if it holds the right number of entries and dump out when it does not.
You may find that text fields in your file contains a random carriage return or extra tab...


0
 
Bang-O-MaticAuthor Commented:
Hi gdemaria, I changed the code, attached, and about 15 seconds into parsing the file it errors out:

500 Servlet Exception
[show] java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space

I probably made a mistake in updating the code as you suggested, can you have another look? Thanks!



<cfsetting requestTimeOut = "12000">
<cfquery name="qryphotolinks" datasource="MCNMLS">
DELETE FROM PhotoLinks
</cfquery>

  
<cffile action="read"  file="#ExpandPath( './ftp/txt/photos_links.txt' )#"  variable="txtFile">

<cfset rowCounter = 0>
                        
<cfloop index="record" list="#txtFile#" delimiters="#chr(13)##chr(10)#">
    <cfset rowCounter = rowCounter + 1>                              
     <cfif rowCounter neq 1>

    <cfset rec = listToArray(record,chr(9),true) >

    <cfif arrayLen(rec) neq 6> 
         <cfdump var="#rec#">
    </cfif>


<cfset PhotoLink = trim(rec[1])>
<cfset MLSNum = trim(rec[2])>
<cfset PhotoLabel = trim(rec[3])>
<cfset DateLastModified = trim(rec[4])>
<cfset PhotoOrder = trim(rec[5])>
<cfset DisplayAsPortrait = trim(rec[6])>





<cfquery name="qryInsertphotolinks" datasource="MCNMLS">
INSERT INTO PhotoLinks (PhotoLink,MLSNum,PhotoLabel,DateLastModified,PhotoOrder,DisplayAsPortrait)


VALUES (
<cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#PhotoLink#">,
<cfqueryparam cfsqltype="CF_SQL_INTEGER" value="#MLSNum#" null="#NOT isNumeric(MLSNum)#">,
<cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#PhotoLabel#">,
<cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#DateLastModified#">,
<cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#PhotoOrder#">,
<cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#DisplayAsPortrait#">
)


  </cfquery> 

 </cfif>          

</cfloop>

Open in new window

0
 
Bang-O-MaticAuthor Commented:
I might add that this text file has approx 90k records in it.
0
Hire Technology Freelancers with Gigs

Work with freelancers specializing in everything from database administration to programming, who have proven themselves as experts in their field. Hire the best, collaborate easily, pay securely, and get projects done right.

 
_agx_Commented:
Is there a reason you can't use MySQL's import tool? With large imports, I usually prefer that instead of looping.  
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
0
 
gdemariaCommented:
The out of memory issue is a different issue (you're run out of server memory)

So seems like your code is now working ?
0
 
gdemariaCommented:

good idea agx,  importing that size file would be much faster
0
 
_agx_Commented:
Ooops, hit return too soon.

I was going to add figure out the initial error first. Once you have it working, consider trying mySQL's bulk import tool instead.
0
 
Bang-O-MaticAuthor Commented:
Thanks, I'm working on the java heap space error...as soon as I figure that out I will try the parse again.
0
 
_agx_Commented:
Were you getting OOM errors with the original code or just after adding the debugging stuff? Sometimes I've had problems when I have too much debugging/cfdumps in a page.  If it's the debugging code,  try aborting after the dump so you can figure out the error 1st:


    <cfif arrayLen(rec) neq 6>
         <cfdump var="#rec#">
         <cfabort>                                     <!==== abort here
    </cfif>



0
 
Bang-O-MaticAuthor Commented:
Ok just to test the parsing code, I deleted all but afew hundred records from the text file, left in the one where my code failed originally, and it still errors out with the updated code: Element at position [4] doesn't exist in array.
 As best I can tell thats the first empty space it hits.
0
 
Bang-O-MaticAuthor Commented:
OOM Just after adding the debugging. Originally I was just getting the "Element at position [4] doesn't exist in array." error.
0
 
_agx_Commented:
I'm wondering if gdemaria might be right about the file "containing.. a random carriage return or extra tab".  If it's not confidential - can you attach a sample of the text file that errors out?  
0
 
gdemariaCommented:

When the element at position 4 doesn't exist, you should see the CFDUMP of the values...   what do those values look like?    Does it look like a full row of data?



0
 
Bang-O-MaticAuthor Commented:
Heres the text file photos-links.txt
0
 
Bang-O-MaticAuthor Commented:
actually here the full text file, the first one I loaded was the one I deleted records from and also deleted some carriage returns at the end of the last field name. photos-links.txt
0
 
_agx_Commented:
.... 101172 example from model home  <!=== extra character ??
67       12/2/2010 8:43:32 AM 6 False


Yeah, I can see an extra chr(13) in there just like gdemaria said.  

116 [ascii = 104] = h
117 [ascii = 111] = o
118 [ascii = 109] = m
119 [ascii = 101] = e
120 [ascii = 13] =   <!--- extra new line --->
1 [ascii = 9] =
2 [ascii = 49] = 1
3 [ascii = 50] = 2
0
 
Bang-O-MaticAuthor Commented:
gdemaria heres the error
DUMP.jpg
0
 
_agx_Commented:
I can't download that 2nd file for some reason.  But in the 1st file there's definitely extra line feed causing the problem here:

.... 101172 example from model home  <!=== extra character ??
67       12/2/2010 8:43:32 AM 6 False
0
 
gdemariaCommented:

agx has it pin-pointed.   "Example from model home[CR]..."  


Where is your data coming from?   Can it go through a cleaning before being written to the file to ensure there are no extra carriage returns or tabs within the text that will conflict with the delimiters?
0
 
Bang-O-MaticAuthor Commented:
the file is downloaded daily from an MLS Provider. So I dont really have any control over it, I just have to work with what they provide.
0
 
gdemariaCommented:

As a courtesy, you may want to report the problem to them... the data is not good and they should be made aware of it.

What you could do is try to add intelligence to your code, it could be tricky.   If you don't have all the fields, you need to read the next line and then get the 2nd half of your variables from it.

Alternatively, if the data isn't so important that you get EVERY row, you can just consider it a bad row and skip it.
That's the easy way, just change your CFIF to wrap around the insert so you skip over the insert when the data is not good.

0
 
Bang-O-MaticAuthor Commented:
yeah I also noticed that when importing the text file with navicat it adds a 7th field that has no name or data in it.
0
 
Bang-O-MaticAuthor Commented:
" just change your CFIF to wrap around the insert so you skip over the insert when the data is not good."  Can you provide an example? that would be much appreciated...thanks
0
 
_agx_Commented:
Looks like they always end lines with the column delimiter (tab) then a new line.  So you could replace that with a plain chr(10). Then any new lines left over could be ignored,.

<cffile action="read"  file="#ExpandPath( './ftp/txt/photos_links.txt' )#"  variable="txtFile">
<cfset endLine = chr(9)&chr(13)&chr(10)>
<!--- lines end with delimiter plus new line --->
<cfset txtFile = replace(txtFile, endLine, chr(10), "all")>
<!--- otherwise, the new line is in the middle of the text --->
<cfset txtFile = replace(txtFile, chr(13)&chr(10), "", "all")>

... rest of code is same ....

Open in new window



wrap around the insert so you skip over the insert when the data is not good

... but IF the file always has 6 fields, the cfif's probably the more robust solution.
0
 
gdemariaCommented:

<cfsetting requestTimeOut = "12000">
<cfquery name="qryphotolinks" datasource="MCNMLS">
DELETE FROM PhotoLinks
</cfquery>

  
<cffile action="read"  file="#ExpandPath( './ftp/txt/photos_links.txt' )#"  variable="txtFile">

<cfset rowCounter = 0>
                        
<cfloop index="record" list="#txtFile#" delimiters="#chr(13)##chr(10)#">
    <cfset rowCounter = rowCounter + 1>                              
     <cfif rowCounter neq 1>
        <cfset rec = listToArray(record,chr(9),true) >
    
        <cfif arrayLen(rec) eq 6> 
                <cfset PhotoLink = trim(rec[1])>
                <cfset MLSNum = trim(rec[2])>
                <cfset PhotoLabel = trim(rec[3])>
                <cfset DateLastModified = trim(rec[4])>
                <cfset PhotoOrder = trim(rec[5])>
                <cfset DisplayAsPortrait = trim(rec[6])>
                
                <cfquery name="qryInsertphotolinks" datasource="MCNMLS">
                INSERT INTO PhotoLinks (PhotoLink,MLSNum,PhotoLabel,DateLastModified,PhotoOrder,DisplayAsPortrait)
                VALUES (
                <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#PhotoLink#">,
                <cfqueryparam cfsqltype="CF_SQL_INTEGER" value="#MLSNum#" null="#NOT isNumeric(MLSNum)#">,
                <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#PhotoLabel#">,
                <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#DateLastModified#">,
                <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#PhotoOrder#">,
                <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#DisplayAsPortrait#">
                )
                  </cfquery> 
        <cfelse>
           BAD RECORD<br>
        </cfif> <!---- only insert if there are enough columns ------>
     <cfelse>
         SKIP HEADER RECORD<br> 
     </cfif>          
</cfloop>

Open in new window

0
 
gdemariaCommented:
> Looks like they always end lines with the column delimiter (tab) then a new line.  So you could replace that with a plain chr(10). Then any new lines left over could be ignored,.

Nice!   That will help fix most issues.


Combine that with the CFIF to skip the insert if there is still an issue and you have a pretty solid solution
0
 
Bang-O-MaticAuthor Commented:
It works!, and adds all the records! great work guys! Thank you very, very much!
0
 
Bang-O-MaticAuthor Commented:
Thanks again!
0
 
Bang-O-MaticAuthor Commented:
Agx, Question... how did you see these characters in the text file?

116 [ascii = 104] = h
117 [ascii = 111] = o
118 [ascii = 109] = m
119 [ascii = 101] = e
120 [ascii = 13] =   <!--- extra new line --->
1 [ascii = 9] =
2 [ascii = 49] = 1
3 [ascii = 50] = 2
0
 
_agx_Commented:
Using an old fashioned loop ;-) Not tested, but it was something along the lines of

<cfset theLine = "abc#chr(10)#efg#chr(32)#blah">
<cfloop from="1" to="#len(theLine)#" index="x">
     <cfset theChar = mid(theLine, x, 1)>
     #x# [ascii = #asc(theChar)#] = #theChar#<br>
</cfloop>
0
 
_agx_Commented:
.. with <cfoutput> tags of course.
0
 
Bang-O-MaticAuthor Commented:
nice! thanks agx!
0
 
_agx_Commented:
Welcome :)
0

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

  • 15
  • 10
  • 8
Tackle projects and never again get stuck behind a technical roadblock.
Join Now