Solved

Parsing text files using ColdFusion

Posted on 2007-11-23
Last Modified: 2010-08-05
How can I parse through a directory of text files, some with different formatting, and grab key information using ColdFusion? The ultimate goal is to get all the key information from the fields below into a CSV spreadsheet. Although the formatting varies, the fields I want to capture are:

Sent:
Subject:
Name=
School=
Age=
Description=
Photo name:

Here is an example text file:

******* Example Start *******

From: jdoe@somedomain.com
Sent: Saturday, June 17, 2007 4:28 PM
To: someone@someWebSite.com
Subject: My best photos

Student photo submission


Name=Photo by John Doe
School=  Some School
Age=  15
Description=This is my trip to DisneyLand June 17, 2007.
Photo name: IMG_39587.jpg

******* Example End ******

NOTE: Some of the files do not contain all the fields and some contain other information that is not wanted in the final table.
Question by:CalDev
9 Comments
 
LVL 52

Expert Comment

by:_agx_
ID: 20340643
> some with different formatting

You need to find a common pattern to parse the files.  I understand some of the files do not contain all of these fields.  But if the fields are present, do they always start with

Sent:
Subject:
Name=
School=
Age=
Description=
Photo name:

?
 
LVL 52

Expert Comment

by:_agx_
ID: 20340647
i.e., do any of the files contain different headings, like

Sent At=  (Instead of Sent:)
Subject Value:  (Instead of Subject:)
Name of Photo - (Instead of Photo name:)

 

Author Comment

by:CalDev
ID: 20340672
_agx_:

All the fields are the same when present in the files. Some of the files don't have all the fields. Some files contain other fields, and all have some blank lines, but I would like to ignore those. The only fields I'm interested in are:

Sent:
Subject:
Name=
School=
Age=
Description=
Photo name:
 
LVL 52

Accepted Solution

by:_agx_ (earned 500 total points)
ID: 20340752
You could use cfdirectory to get a listing of all of the text files in your directory. Then loop through the query and read each text file line by line. Split each line on ":" or "=" and extract the key/value pair (if any),

   e.g. key = "Subject" and value = "My best photos"
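
To see how the split behaves in isolation, here is a minimal sketch that runs it against three lines from the example file in the question. The variable names (sampleLines, line, delim) are made up for illustration. Because listRest() returns everything after the first delimiter, the colon inside the "4:28" timestamp should stay part of the value.

```cfml
<!--- three sample lines from the question's example file, separated by "|" for convenience --->
<cfset sampleLines = listToArray("Sent: Saturday, June 17, 2007 4:28 PM|School=  Some School|Student photo submission", "|")>

<cfloop from="1" to="#arrayLen(sampleLines)#" index="i">
	<cfset line = sampleLines[i]>
	<!--- pick the delimiter: colon first, then equal sign --->
	<cfset delim = "">
	<cfif listLen(line, ":") gt 1>
		<cfset delim = ":">
	<cfelseif listLen(line, "=") gt 1>
		<cfset delim = "=">
	</cfif>

	<cfif delim neq "">
		<!--- key is the text before the first delimiter, value is everything after it --->
		<cfoutput>[#trim(listFirst(line, delim))#] = [#trim(listRest(line, delim))#]<br></cfoutput>
	<cfelse>
		<!--- no delimiter found, so the line carries no key/value pair --->
		<cfoutput>(skipped) #line#<br></cfoutput>
	</cfif>
</cfloop>
```

One caveat of checking for ":" first: any line that contains a colon is split on the colon, so a "Description=" value that itself contains a colon would be split at that colon rather than at the equal sign.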

Save all extracted values and finally write them to a CSV file.

Note: if you're reading a very large number of text files, you might want to use a slightly different approach, because building the whole CSV in a single variable could be very memory intensive.

   
<!--- delimiters --->
<cfset colon = ":">
<cfset equalSign = "=">
<cfset newLine = chr(13) & chr(10)>
<cfset columnDelim = ",">
<cfset qualifier = '"'>
 
<!--- append csv file headings --->
<cfset fieldNames = listToArray("SourceFilePath,Sent,Subject,Name,School,Age,Description,Photo name")>
<cfset headerRow = listQualify(arrayToList(fieldNames, columnDelim), qualifier, columnDelim, "all")>
<cfset outputContent = headerRow>
 
<!--- get a query of text files in your directory --->
<cfdirectory action="list" directory="c:\yourDirectory" filter="*.txt" name="getTextFiles">
 
<!--- for each text file --->
<cfloop query="getTextFiles">
	<!--- read the file contents into a variable --->
	<cffile action="read" file="#Directory#\#Name#" variable="filecontents">
	<!--- store the key fields from this file --->
	<cfset keyData = structNew()>
	<cfset keyData.sourceFilePath = "#Directory#\#Name#">
	
	<!--- read each line of the file --->
	<cfloop list="#filecontents#" index="inputline" delimiters="#newLine#">
		<cfset rowDelim = "" >
		<cfif listLen(inputline, colon) gt 1>
			<cfset rowDelim = colon >
		<cfelseif listLen(inputline, equalSign) gt 1>
			<cfset rowDelim = equalSign >
		</cfif>
		<!--- extract the key:value or key=value from each line --->
		<cfif rowDelim neq "">
			<cfset keyData[trim(listFirst(inputline, rowDelim))] = trim(listRest(inputline, rowDelim))>
		</cfif>
	</cfloop>
 
	<!--- if any key information was found, append results to output variable --->
	<cfif NOT StructIsEmpty(keyData)>
		<cfset outputLine = "" />
		<cfloop from="1" to="#ArrayLen(fieldNames)#" index="y">
			<cfif StructKeyExists(keyData, fieldNames[y])>
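				<!--- if this field was found in the file; double any embedded quotes so the CSV stays valid --->
				<cfset outputLine = listAppend(outputLine, qualifier & replace(keyData[fieldNames[y]], qualifier, qualifier & qualifier, "all") & qualifier, columnDelim)>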
			<cfelse>
				<!--- otherwise append an empty string --->
				<cfset outputLine = listAppend(outputLine, qualifier & qualifier, columnDelim)>
			</cfif>
		</cfloop>
		<cfset outputContent = outputContent & newLine & outputLine >
	</cfif>
</cfloop>
 
<!--- write the results to a csv file --->
<cffile action="write" file="c:\yourDirectory\theResults.csv" output="#outputContent#">
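
For the very-large-directory case mentioned in the note above, one possible variation (a sketch, not tested against the original files) is to write the header once and then append each file's row with cffile action="append", so no file's data is held in memory after its row is written. The sourceDir, csvPath and fieldValue variables are names introduced here for illustration, and the paths are the same placeholders used above. It also uses the StructCount check from the follow-up comment later in this thread.

```cfml
<!--- sketch only: same parsing as above, but each row is appended to disk immediately --->
<cfset qualifier = '"'>
<cfset newLine = chr(13) & chr(10)>
<cfset fieldNames = listToArray("SourceFilePath,Sent,Subject,Name,School,Age,Description,Photo name")>
<!--- assumption: same placeholder paths as the original example --->
<cfset sourceDir = "c:\yourDirectory">
<cfset csvPath = sourceDir & "\theResults.csv">

<!--- write the header row first; cffile adds a trailing newline by default --->
<cfset headerRow = listQualify(arrayToList(fieldNames), qualifier, ",", "all")>
<cffile action="write" file="#csvPath#" output="#headerRow#">

<cfdirectory action="list" directory="#sourceDir#" filter="*.txt" name="getTextFiles">
<cfloop query="getTextFiles">
	<cffile action="read" file="#Directory#\#Name#" variable="filecontents">
	<cfset keyData = structNew()>
	<cfset keyData.sourceFilePath = "#Directory#\#Name#">

	<!--- extract key:value or key=value pairs, line by line --->
	<cfloop list="#filecontents#" index="inputline" delimiters="#newLine#">
		<cfset rowDelim = "">
		<cfif listLen(inputline, ":") gt 1>
			<cfset rowDelim = ":">
		<cfelseif listLen(inputline, "=") gt 1>
			<cfset rowDelim = "=">
		</cfif>
		<cfif rowDelim neq "">
			<cfset keyData[trim(listFirst(inputline, rowDelim))] = trim(listRest(inputline, rowDelim))>
		</cfif>
	</cfloop>

	<!--- sourceFilePath is always present, so require more than one key --->
	<cfif structCount(keyData) gt 1>
		<cfset outputLine = "">
		<cfloop from="1" to="#arrayLen(fieldNames)#" index="y">
			<!--- missing fields become empty quoted values --->
			<cfset fieldValue = "">
			<cfif structKeyExists(keyData, fieldNames[y])>
				<cfset fieldValue = keyData[fieldNames[y]]>
			</cfif>
			<cfset outputLine = listAppend(outputLine, qualifier & fieldValue & qualifier, ",")>
		</cfloop>
		<!--- append this file's row; nothing accumulates across files --->
		<cffile action="append" file="#csvPath#" output="#outputLine#">
	</cfif>
</cfloop>
```

The trade-off is one disk write per source file instead of a single write at the end, which is probably acceptable for a one-off conversion but worth measuring if the script runs regularly.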

 

Author Comment

by:CalDev
ID: 20340883
_agx_:

I hope it doesn't offend you if I say you're a genius, because you definitely are!!! This script worked perfectly: it ran through over 1700 files in only a few seconds and built the CSV. I only have a few rows in the CSV that need a little cleanup, and other than that it's ready to go. Thank you for your help!

I will give all the points and an excellent grade right after this note.
 

Author Closing Comment

by:CalDev
ID: 31410700
Anyone looking for a solution to parse text files with ColdFusion, look no further; this one works great.
 
LVL 52

Expert Comment

by:_agx_
ID: 20340890
Glad I could help :)
 
LVL 52

Expert Comment

by:_agx_
ID: 20341823
For PAQ purposes:

This line
<cfif NOT StructIsEmpty(keyData)>

should be:
<cfif StructCount(keyData) GT 1>

Because the structure will always have _at least one_ key:  keyData.sourceFilePath
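
The difference between the two checks can be seen with a small sketch that builds the struct the way the loop does for a file with no recognizable fields. The file name here is hypothetical.

```cfml
<!--- the struct as it looks for a file in which no fields were parsed --->
<cfset keyData = structNew()>
<!--- hypothetical file name, for illustration only --->
<cfset keyData.sourceFilePath = "c:\yourDirectory\example.txt">

<cfoutput>
<!--- the struct already holds sourceFilePath, so it is never empty --->
<cfif NOT structIsEmpty(keyData)>
	StructIsEmpty check: row would be written (key count is #structCount(keyData)#)<br>
</cfif>
<!--- counting keys tells us whether anything besides the path was found --->
<cfif structCount(keyData) GT 1>
	StructCount check: row would be written<br>
<cfelse>
	StructCount check: row is skipped, no fields were parsed<br>
</cfif>
</cfoutput>
```

With the original check, such a file would still produce a CSV row containing only the file path; with the corrected check it is skipped.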
 

Author Comment

by:CalDev
ID: 20343308
Thank you for the follow up. I will make the change to the code.