Regular Expression to extract data from a text file

I have several text files of varying formats.

Sample data from text file:

NAME: JOHNSON  DATE: 6-29-98  NUMBER: 1120
ADDRESS: 123 WISNER ST  CITY: SCRANTON
PHONE: 555-1212  ACCOUNT #: NEW ACCOUNT
METER #: NA  DATE OF INSTALLATION: 6-29-98
COMPLETE YES: Y  NO:
COMMENT: FLAT RATE-(103)

I would like something that will run through the text file and output something like this for each instance of city it finds:

SCRANTON
WEST PITTSTON
WILKES-BARRE
SCRANTON
DALLAS
DALLAS
WILKES-BARRE

Basically, the regular expression needs to find each instance of "CITY: CITY NAME" followed by CR/LF.

I'm not sure how to do this. If there is an online resource or page where I can paste data from the files and get the output I am looking for, that could work. I am open to using PHP to do it as well.

I'm looking for something quick and easy.
Asked by wfninpa
 
käµfm³d 👽 commented:
If there is an online resource or page



I am open to using PHP to do it as well.

preg_match_all('/(?<=CITY:\s)\s*.*/', $source, $results);
var_dump($results);
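
For anyone without PHP, the same whole-file idea can be sketched in Python. This is only a rough sketch and not part of the original answer: the function name `extract_cities` is hypothetical, the second address in the sample is invented for illustration, and it assumes, as in the posted sample, that the city name runs to the end of the line. It uses a capture group instead of a lookbehind and stops at the line ending, so CR/LF files do not leave a stray carriage return in the result.

```python
import re

# Assumption: "CITY:" is followed by the city name, which runs to end of line.
# [^\r\n]* stops before either line-ending character, so CR/LF input is safe.
CITY_RE = re.compile(r"CITY:[ \t]*([^\r\n]*)")

def extract_cities(text):
    """Return every city name found in text, in order of appearance."""
    return [name.strip() for name in CITY_RE.findall(text)]

# Sample modeled on the question; the "9 MAIN ST" address is made up.
sample = (
    "NAME: JOHNSON  DATE: 6-29-98  NUMBER: 1120\r\n"
    "ADDRESS: 123 WISNER ST  CITY: SCRANTON\r\n"
    "PHONE: 555-1212  ACCOUNT #: NEW ACCOUNT\r\n"
    "ADDRESS: 9 MAIN ST  CITY: WEST PITTSTON\r\n"
)

for city in extract_cities(sample):
    print(city)
```

To run it against a real file, the file's contents could be read into a string and passed to `extract_cities` in place of `sample`.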

 
HonorGod (Software Engineer) commented:
What Operating System are you using, and what tools / utilities / languages do you have installed?

If you have Python, you could do something like:

'''Command: %(cmdName)s\n
Purpose: Locate instances of "City" in the specified input file, and display
         the associated value\n
  Usage: python %(cmdName)s.py inputFile\n
Example: python %(cmdName)s.py %(cmdName)s.txt'''

import os
import re
import sys

def main( filename ) :
  # Case-insensitive, so it matches "CITY:" as well as "City:"
  cityRE = re.compile( 'City: (.*)$', re.IGNORECASE )
  if not os.path.exists( filename ) :
    print( 'Error: File not found: %s\n' % filename )
    Usage()
  # "with" closes the file even if an error occurs while reading
  with open( filename ) as fh :
    for line in fh :
      mo = cityRE.search( line )
      if mo :
        print( mo.group( 1 ) )

def Usage( cmdName = None ) :
  if not cmdName :
    cmdName = os.path.basename( sys.argv[ 0 ] )
  if cmdName[ -3: ] == '.py' :
    cmdName = cmdName[ :-3 ]
  # Fill the %(cmdName)s placeholders in the module docstring
  print( __doc__ % locals() )
  sys.exit()


if __name__ == '__main__' :
  argc = len( sys.argv )
  if argc != 2 :
    print( 'Error: Unexpected number of command line arguments: %d\n' % argc )
    Usage()
  main( sys.argv[ 1 ] )
else :
  print( 'Error: script should be executed, not imported.\n' )
  Usage( __name__ )

Test output:
SCRANTON

 
point_pleasant commented:
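Quick shell script:

# Assumes CITY: is the 5th whitespace-separated field, as in the sample line,
# so the city name starts at field 6.
grep "CITY: " city.txt > /tmp/junk
while read -r inputline
do
        echo "$inputline" | awk '{print $6}'
done < /tmp/junk
rm /tmp/junk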
 
point_pleasant commented:
Oops, this version will handle city names up to three words long:

grep "CITY: " city.txt > /tmp/junk
while read -r inputline
do
        echo "$inputline" | awk '{print $6" "$7" "$8}'
done < /tmp/junk
rm /tmp/junk
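
One caveat about picking fields by position: it works only while "CITY:" stays the fifth whitespace-separated field, which holds for the sample line but not for an address with more or fewer words. The following is a rough Python sketch of the difference, not part of the original answer. Both function names are hypothetical, and the second address is invented for illustration.

```python
def city_by_position(line):
    """Roughly mimic awk '{print $6" "$7" "$8}': join fields 6 through 8."""
    fields = line.split()
    return " ".join(fields[5:8])

def city_by_label(line):
    """Take everything after the 'CITY:' label, wherever it falls."""
    _, found, rest = line.partition("CITY:")
    return rest.strip() if found else None

short = "ADDRESS: 123 WISNER ST  CITY: SCRANTON"
longer = "ADDRESS: 45 NORTH MAIN ST  CITY: WILKES-BARRE"  # invented address

print(city_by_position(short))   # SCRANTON
print(city_by_position(longer))  # CITY: WILKES-BARRE (label leaks in)
print(city_by_label(longer))     # WILKES-BARRE
```

Splitting on the label rather than counting fields keeps the result correct when the address length varies.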
 
wfninpa (Author) commented:
Thank you.  This was exactly what I needed.  The three sites were helpful for testing and modifying the code to my needs.
Question has a verified solution.
