Solved

DOS Command to remove CR/LF to have one id and one text per line

Posted on 2016-10-05
13
50 Views
Last Modified: 2016-10-06
I have an output flat file received from a vendor.  In order to load into a table, I need one line per ID/note.  As you can see, there are misc carriage returns, line feeds and blank lines.  The beginning of each line/record will always be numbers but total number digits could vary.  Is there a way to run a DOS command Sample of FF with LF/CR etc. to fix this formatting?
I found a way to remove the blank lines and also to remove all CR/LF which gives me one large text file.
Any thoughts, direction, greatly appreciated.
0
Comment
Question by:Dianna Gibbs
  • 6
  • 4
  • 2
  • +1
13 Comments
 
LVL 23

Expert Comment

by:NVIT
ID: 41830853
Make file oneline.bat.
Change the value of FName to your filename. I used c:\myfile.txt

@echo off
setlocal enabledelayedexpansion

set FName=c:\myfile.txt

for /f "tokens=*" %%a in ('type "%FName%') do (
   set tail=%%a
   echo "!tail!" | findstr /c:"|">nul
   if !errorlevel! equ 0 (
      if defined MainIsSet echo main=!main!
      set main=!tail!
      set MainIsSet=1
   ) else (
      set main=!main!!tail!
   )
)
echo main=!main!

Open in new window


To send the results to file oneline.txt...
Open a CMD window
Type: oneline.bat >oneline.txt
0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 41830855
To give you a reliable result without wasting your time with lots of backing-and-forthing, it would work best if you posted some data to process as an example, and a second version of it in it's intended processed form.
0
 
LVL 23

Expert Comment

by:NVIT
ID: 41830858
Edit to line 6. Should be:
for /f "tokens=*" %%a in ('type "%FName%"') do (

Open in new window

0
 
LVL 83

Accepted Solution

by:
oBdA earned 500 total points
ID: 41830861
Batch with a little help from Powershell (version 2.0 or later, so it will run on Windows 7 or later):
@echo off
setlocal
set InputFile=C:\Temp\test.psv
set OutputFile=C:\Temp\out.psv
set Encoding=ASCII
echo Processing '%OutputFile%' ...
powershell.exe -ExecutionPolicy Bypass -Command "[IO.File]::ReadAllText('%InputFile%', [Text.Encoding]::'%Encoding%').Replace(\"`r`n\", ' ') -split ' (?=\d+\|)' | Set-Content -Path '%OutputFile%' -Encoding %Encoding%"
echo ... done.

Open in new window

0
 

Author Comment

by:Dianna Gibbs
ID: 41830905
NVT/LVL22 example is perfect!!!!...I have two follow up questions...

1) now each row begins with "main=" before the number - do I need a second bat to remove that string or can the code be changed?  

2) Second is there a way to add a "space" - see below.

main=1539769350|Specimen A contains 2.5 mL ...
main=1539719992|Original diagnosis :...
main=1539699576|7-year-old male with...

1539313832|A. DUODENUM (BIOPSY):
   DUODENAL MUCOSA WITH NO HISTOPATHOLOGIC ABNORMALITY
 
B. GASTRIC ANTRUM (BIOPSY):
   ANTRAL MUCOSA WITH NO HISTOPATHOLOGIC ABNORMALITY

OUTPUT- See "ABNORMALITYB. - "B"  - space before if possible:
A. DUODENUM (BIOPSY):DUODENAL MUCOSA WITH NO HISTOPATHOLOGIC ABNORMALITYB. GASTRIC ANTRUM (BIOPSY):AN
0
 
LVL 83

Expert Comment

by:oBdA
ID: 41831217
Maybe a word of warning: the "Note" field seems to be free text; as such, it doesn't really lend itself to plain batch processing, as some characters or character combinations may cause issues.
NVIT's script will, for example, strip all exclamation marks, and, worse, any text between a pair of exclamation marks in the same line will be lost, too:
Careful! This will be missing! This will be shown again! More missing! Some more text.

Open in new window

Combinations like the following will report errors:
Specimen labeled "<foo>"
Specimen labeled "A & B"
Other issues like that may arise, depending on what the person entered into the note field.
0
3 Use Cases for Connected Systems

Our Dev teams are like yours. They’re continually cranking out code for new features/bugs fixes, testing, deploying, testing some more, responding to production monitoring events and more. It’s complex. So, we thought you’d like to see what’s working for us.

 

Author Comment

by:Dianna Gibbs
ID: 41831779
So what is the best solution?
0
 
LVL 83

Expert Comment

by:oBdA
ID: 41831854
Well, that's why I decided to use Powershell in https:#a41830861
For ease of use, it's a batch file, but it uses Powershell to do the conversion, which doesn't have these issues.
For further refinement:
- Is there a minimum and/or a maximum length for the ID?
- Is the pipe (which is used as delimiter) an allowed character for the Note field, and if so, (how) is it escaped?
0
 

Author Comment

by:Dianna Gibbs
ID: 41832054
Ok thanks for your help - I tried you Powershell but there is still not a space between lines.  can that easily be added to your code?

1539313832|A. DUODENUM (BIOPSY):
    DUODENAL MUCOSA WITH NO HISTOPATHOLOGIC ABNORMALITY
 
 B. GASTRIC ANTRUM (BIOPSY):
    ANTRAL MUCOSA WITH NO HISTOPATHOLOGIC ABNORMALITY

 OUTPUT- See "ABNORMALITYB. - "B"  - space before if possible:
 A. DUODENUM (BIOPSY):DUODENAL MUCOSA WITH NO HISTOPATHOLOGIC ABNORMALITYB. GASTRIC ANTRUM (BIOPSY):AN
0
 

Author Comment

by:Dianna Gibbs
ID: 41832062
Sorry, I had imported the wrong file - your Powershell fixed both issues as far as I can tell - thanks.
0
 

Author Comment

by:Dianna Gibbs
ID: 41832208
oBdA - Quick question on the powershell script - I'm running it as a bat file but it doesn't seem to be disconnecting when it completes - I added an "Exit" at the end - am I missing something else?
0
 

Author Closing Comment

by:Dianna Gibbs
ID: 41832483
Thanks to all who helped me with this!!
0
 
LVL 83

Expert Comment

by:oBdA
ID: 41832615
What do you mean with "disconnecting"? If started using a double-click in Explorer, it should run and then close the window once it's done, and if started from an open command prompt, it will leave the prompt open as usual.
0

Featured Post

3 Use Cases for Connected Systems

Our Dev teams are like yours. They’re continually cranking out code for new features/bugs fixes, testing, deploying, testing some more, responding to production monitoring events and more. It’s complex. So, we thought you’d like to see what’s working for us.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I have published numerous articles here at Experts Exchange that present programs/scripts written in a language called AutoHotkey. Each of those articles has a brief paragraph describing where to download the product and how to install it. I have al…
How to remove superseded packages in windows w60 or w61 installation media (.wim) or online system to prevent unnecessary space. w60 means Windows Vista or Windows Server 2008. w61 means Windows 7 or Windows Server 2008 R2. There are various …
This Micro Tutorial will teach you how to censor certain areas of your screen. The example in this video will show a little boy's face being blurred. This will be demonstrated using Adobe Premiere Pro CS6.
Learn how to create flexible layouts using relative units in CSS.  New relative units added in CSS3 include vw(viewports width), vh(viewports height), vmin(minimum of viewports height and width), and vmax (maximum of viewports height and width).

920 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

16 Experts available now in Live!

Get 1:1 Help Now