• Status: Solved

Remove Escape Codes and Such From Text File

I have a file that was created when I intercepted a COBOL print command to LPT1, having redirected LPT1 to the Generic / Text driver.

The file it generates, however, has loads of ANSI characters and escape codes and such.

Is there some way to take this file and clean out everything but the regular ASCII characters so I can import it into Excel?

Thanks,
Asked by: BMIT
 
ReneGe commented:
May I suggest that you change your Zones from "MS DOS, Microsoft Operating Systems" to "MS DOS, VB Script and Scripting Languages"?

To do this, click on Request attention and ask for it.

Cheers,
Rene
 
Qlemo (Batchelor, Developer and EE Topic Advisor) commented:
Do you have any preference for a solution? There are many means to get an appropriate filter tool. One way is to use PowerShell:
 (gc nonascii.txt) -replace "[\x01-\x1f\x80-\xff]" | out-file ascii.txt


That removes every character with a code from 1 to 31 or from 128 to 255. Of course that is a very simplistic attempt, but since I do not know "the grammar" of your escape codes and such, I cannot be more precise. The issue is to detect a reliable pattern; if you can provide one, it can be stripped. For example, something like ESC[0,1h; (a made-up ANSI-like escape sequence). Most probably the COBOL program generates PCL or ESC/P code (HP or Epson).
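
Where PowerShell is not available, the same character-range filter can be sketched in Python (a language not used in this thread, chosen only for illustration). The names `strip_non_ascii` and `clean_file` are assumptions, and the file is read as Latin-1 so that every byte keeps its numeric value:

```python
import re

# Control characters 1-31 and extended characters 128-255,
# the same ranges as the PowerShell pattern.
NON_ASCII = re.compile(r"[\x01-\x1f\x80-\xff]")


def strip_non_ascii(line: str) -> str:
    """Return the line with control and extended characters removed."""
    return NON_ASCII.sub("", line)


def clean_file(src: str, dst: str) -> None:
    """Filter src line by line into dst (both file names are placeholders)."""
    # latin-1 maps each byte to the code point with the same value,
    # so the regex ranges line up with the raw byte values.
    with open(src, encoding="latin-1") as fin, \
            open(dst, "w", encoding="ascii") as fout:
        for line in fin:
            fout.write(strip_non_ascii(line.rstrip("\n")) + "\n")
```

Like the one-liner, this drops the ESC byte but leaves the printable remainder of each escape sequence in place (for example `[0,1h`), and it also removes tabs, so it is only a first pass.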
 
ReneGe commented:
@Qlemo
Do you know of any DOS app that does what gc does in PowerShell?
Thanks
 
Steve Knight (IT Consultancy) commented:
Can you give us an example, please? If it was a proper "Generic / Text Only" printout from Windows then it shouldn't contain any print codes, but if the application has sent them itself then it might.

What printer-related options, if any, does the application have?

It depends on the escape codes, you see, because if they are like this:

Epson type:
http://www.dragon-it.co.uk/links/epson_printer_codes.htm

or this:

HP type:
http://www.dragon-it.co.uk/links/hp_pcl_codes.htm

etc. then they can be searched for in different ways. Epson ones typically start with an Escape character but have no specific end to them. HP ones typically have a structure, though you can also combine them.

Another option might be that you have fairly set codes, e.g. the first 4 characters of each line can be removed.
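
Because Epson-style codes have no common terminator, one workable approach is a table of the exact sequences to delete. A minimal Python sketch follows (Python is not used in this thread). The table holds only a handful of common ESC/P codes as an illustration, and the name `strip_escp` is hypothetical:

```python
# Illustrative subset of Epson ESC/P codes. A real table would need
# every code the application actually sends.
ESCP_CODES = [
    "\x1b@",  # ESC @  initialise printer
    "\x1bE",  # ESC E  bold on
    "\x1bF",  # ESC F  bold off
    "\x1b4",  # ESC 4  italic on
    "\x1b5",  # ESC 5  italic off
    "\x0f",   # SI     condensed on
    "\x12",   # DC2    condensed off
]


def strip_escp(text: str, codes=ESCP_CODES) -> str:
    """Delete every listed code from the text."""
    # Longest codes first, so a short code never eats part of a longer one.
    for code in sorted(codes, key=len, reverse=True):
        text = text.replace(code, "")
    return text
```

Codes that carry parameter bytes would need their own entries or patterns, which is why a sample of the real output matters.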

Bed time here but hopefully a few pointers...

Steve
 
Qlemo (Batchelor, Developer and EE Topic Advisor) commented:
ReneGe,
gc is an alias for Get-Content, which is very similar to the DOS type command (but of course with a lot more options and processing capabilities). It is comparable to the VB(S) commands for reading a file stream.
 
Steve Knight (IT Consultancy) commented:
Looking back at some Quick Basic code from 1993, it seems I did this at the time. It stripped PCL codes from a "printed" report sent to disc, so the report could be viewed on screen as a preview for selection before printing. I.e. if you find an Escape (char 27), skip up to and including a capital letter, or to the end of the line.

Unfortunately, the best you could do for ESC/P codes etc., I would say, is a lookup table of search/replace items.

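FUNCTION StripPCL$ (In$)
' Remove PCL escape sequences: each one runs from Esc (char 27) up to
' and including the first capital letter, or to the end of the line.
Esc$ = CHR$(27)

New$ = In$
DO
      a = INSTR(New$, Esc$): b = a
      IF a > 0 THEN
            ' Scan forward from the Esc to the terminating capital letter
            DO: b = b + 1
                  ch$ = MID$(New$, b, 1)
            LOOP UNTIL (ch$ >= "A" AND ch$ <= "Z") OR ch$ = ""
            ' Cut out positions a..b and carry on with the rest
            New$ = LEFT$(New$, a - 1) + MID$(New$, b + 1)
      END IF
LOOP UNTIL a = 0

StripPCL$ = New$
END FUNCTION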
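
The same rule (from ESC up to and including the first capital letter, or to the end of the line) also fits in a single regular expression. A minimal Python sketch, not part of the original thread; the name `strip_pcl` is an assumption and the function expects one line at a time:

```python
import re

# ESC, then any run of characters that are not capital letters,
# then the terminating capital letter if the line has one.
PCL_SEQUENCE = re.compile(r"\x1b[^A-Z\n]*[A-Z]?")


def strip_pcl(line: str) -> str:
    """Remove PCL-style escape sequences from a single line."""
    return PCL_SEQUENCE.sub("", line)
```

Combined PCL sequences, which use lowercase letters until the final command, are removed as one unit because the match only stops at the first capital.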
 
ReneGe commented:
Thanks Q
 
Bill Prew commented:
I need to see a sample file to provide a solution to this. The COBOL output file may be using things like carriage returns with no line feeds for underlining, backspaces for bolding, or horizontal and vertical tabs for formatting, to mention a few. Some of these may need to be removed altogether, but others may need to be replaced with something else, like spaces or delimiters, to allow easy usage in Excel.

~bp
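
To illustrate the kind of handling described here, a rough Python sketch (Python is not used in this thread, and this is not the script attached later). It assumes overprinting is done with backspaces or bare carriage returns and that tabs stop every 8 columns. The name `normalise_line` is hypothetical:

```python
import re

# One character followed by a backspace: the first strike of an overstrike.
OVERSTRIKE = re.compile(r"[^\x08]\x08")


def normalise_line(line: str, tab_width: int = 8) -> str:
    """Flatten backspace and bare-CR overprinting, and expand tabs."""
    # Repeat until stable so runs like "AB\b\bAB" collapse fully.
    previous = None
    while previous != line:
        previous = line
        line = OVERSTRIKE.sub("", line)
    line = line.replace("\x08", "")  # stray backspaces at the line start

    # A bare CR restarts the print head at column 0: overlay each pass,
    # letting text win over blanks and underscores.
    merged = []
    for segment in line.split("\r"):
        for i, ch in enumerate(segment.expandtabs(tab_width)):
            if i >= len(merged):
                merged.append(ch)
            elif ch != " " and (merged[i] in " _" or ch != "_"):
                merged[i] = ch
    return "".join(merged).rstrip()
```

Expanding tabs to spaces keeps the columns aligned; swapping them for a delimiter instead would be an equally reasonable choice for an Excel import.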
 
BMIT (Author) commented:
I have no control over how the COBOL program creates the file, nor over how it sends it to the printer. The only thing about this ancient program that I can control is which LPT port it sends its data to.

This is an accounting system, and I'm trying to get a clean text file of the information we need for the W2s at year's end.  I will generate a file with a couple entries, and then remove the personal data from it and attach it so you can  have an idea what I'm working with.

I do a lot of VBScript and VBA programming, and I've been around programming and computers a while, but haven't heard of this "get-content".  Can you provide more information on this?

Also, I had read that when the Generic / Text print driver captures the data it strips all that extraneous information.  Am I not setting this up correctly?

Thanks,
 
Qlemo (Batchelor, Developer and EE Topic Advisor) commented:
1. The "Generic / Text" print driver does not strip anything. It just doesn't add print codes to the stream, so if a Windows program renders its printer output, only the plain byte stream is taken and stored as generated by the application.
It is similar with DOS programs: the program itself fills in the printer codes, and the Generic driver will not remove them.

2. get-content is part of the mighty PowerShell, which comes with Vista and above, and can be installed on XP / 2003. The most recent release is PowerShell v2, which comes with W7 / W2008R2 (and can again be installed on the older OS versions).
 
BMIT (Author) commented:
I will look into PowerShell and install it; I've heard of its virtues but have never used it.

I have attached a very short example of the file I will be working with. If I can get this file pared down so it has no non-standard characters, it would be easy to write a short script that puts it in the format I need, specifically space-delimited.

Attachment: W2EE.txt
 
Bill Prew commented:
Here's a VBS script that should clean it up so you can import it. Save it as a VBS file and then run it with either one or two filename arguments on the command line. If two files are provided, it reads the first, cleans up the codes, and writes to the second. If only one filename is provided, it overwrites that file with the changes.

~bp

Attachment: EE27378066.vbs
 
Qlemo (Batchelor, Developer and EE Topic Advisor) commented:
PowerShell again. To keep it simple, this removes everything from each ESC to the end of the line, plus the form feed. It works for the attached file.
(gc w2ee.txt) -replace "\x1B.+|\x0C" | out-file W2EE-ascii.txt -encoding ascii 
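
For comparison, roughly the same filter in Python (a sketch only, since Python is not used in this thread). The names `strip_escape_tail` and `clean_file` are assumptions. Writing with `errors="replace"` turns any remaining non-ASCII character into `?` instead of raising an error:

```python
import re

# Everything from an ESC to the end of the line, or a single form feed.
ESC_TO_EOL = re.compile(r"\x1b.+|\x0c")


def strip_escape_tail(line: str) -> str:
    """Drop ESC-to-end-of-line and form feeds from one line."""
    return ESC_TO_EOL.sub("", line)


def clean_file(src: str, dst: str) -> None:
    """Filter src line by line into dst (both file names are placeholders)."""
    with open(src, encoding="latin-1") as fin, \
            open(dst, "w", encoding="ascii", errors="replace") as fout:
        for line in fin:
            fout.write(strip_escape_tail(line.rstrip("\n")) + "\n")
```

Because the pattern needs at least one character after the ESC, a lone ESC at the very end of a line is left in place, and any text that follows an escape sequence on the same line is removed along with it.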

 
BMIT (Author) commented:
Guys, can't thank you enough for handling all the coding for me.  Since this is a time-sensitive project, the value of not having to build a script or learn PowerShell immediately cannot be overstated.

Both ways work, but the PowerShell option cleans up the data while better preserving the organization of the file.
I can do the rest of the formatting in Notepad if I wish, but I think I'll build a VBScript to make it do some fancier things.

Thanks,
 
Steve Knight (IT Consultancy) commented:
Blimey, if it was just two escape code sets I'd have used search & replace in Notepad :-) It was PCL then, as suggested... and the generic way to get rid of such codes is to use code like my "StripPCL" function above.

Steve