Solved

DOS Command to remove CR/LF to have one id and one text per line

Posted on 2016-10-05
13
42 Views
Last Modified: 2016-10-06
I have an output flat file received from a vendor.  In order to load into a table, I need one line per ID/note.  As you can see, there are misc carriage returns, line feeds and blank lines.  The beginning of each line/record will always be numbers but total number digits could vary.  Is there a way to run a DOS command Sample of FF with LF/CR etc. to fix this formatting?
I found a way to remove the blank lines and also to remove all CR/LF which gives me one large text file.
Any thoughts, direction, greatly appreciated.
0
Comment
Question by:Dianna Gibbs
  • 6
  • 4
  • 2
  • +1
13 Comments
 
LVL 23

Expert Comment

by:NVIT
ID: 41830853
Make file oneline.bat.
Change the value of FName to your filename. I used c:\myfile.txt

@echo off
setlocal enabledelayedexpansion

set FName=c:\myfile.txt

for /f "tokens=*" %%a in ('type "%FName%') do (
   set tail=%%a
   echo "!tail!" | findstr /c:"|">nul
   if !errorlevel! equ 0 (
      if defined MainIsSet echo main=!main!
      set main=!tail!
      set MainIsSet=1
   ) else (
      set main=!main!!tail!
   )
)
echo main=!main!

Open in new window


To send the results to file oneline.txt...
Open a CMD window
Type: oneline.bat >oneline.txt
0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 41830855
To give you a reliable result without wasting your time with lots of backing-and-forthing, it would work best if you posted some data to process as an example, and a second version of it in it's intended processed form.
0
 
LVL 23

Expert Comment

by:NVIT
ID: 41830858
Edit to line 6. Should be:
for /f "tokens=*" %%a in ('type "%FName%"') do (

Open in new window

0
 
LVL 83

Accepted Solution

by:
oBdA earned 500 total points
ID: 41830861
Batch with a little help from Powershell (version 2.0 or later, so it will run on Windows 7 or later):
@echo off
setlocal
set InputFile=C:\Temp\test.psv
set OutputFile=C:\Temp\out.psv
set Encoding=ASCII
echo Processing '%OutputFile%' ...
powershell.exe -ExecutionPolicy Bypass -Command "[IO.File]::ReadAllText('%InputFile%', [Text.Encoding]::'%Encoding%').Replace(\"`r`n\", ' ') -split ' (?=\d+\|)' | Set-Content -Path '%OutputFile%' -Encoding %Encoding%"
echo ... done.

Open in new window

0
 

Author Comment

by:Dianna Gibbs
ID: 41830905
NVT/LVL22 example is perfect!!!!...I have two follow up questions...

1) now each row begins with "main=" before the number - do I need a second bat to remove that string or can the code be changed?  

2) Second is there a way to add a "space" - see below.

main=1539769350|Specimen A contains 2.5 mL ...
main=1539719992|Original diagnosis :...
main=1539699576|7-year-old male with...

1539313832|A. DUODENUM (BIOPSY):
   DUODENAL MUCOSA WITH NO HISTOPATHOLOGIC ABNORMALITY
 
B. GASTRIC ANTRUM (BIOPSY):
   ANTRAL MUCOSA WITH NO HISTOPATHOLOGIC ABNORMALITY

OUTPUT- See "ABNORMALITYB. - "B"  - space before if possible:
A. DUODENUM (BIOPSY):DUODENAL MUCOSA WITH NO HISTOPATHOLOGIC ABNORMALITYB. GASTRIC ANTRUM (BIOPSY):AN
0
 
LVL 83

Expert Comment

by:oBdA
ID: 41831217
Maybe a word of warning: the "Note" field seems to be free text; as such, it doesn't really lend itself to plain batch processing, as some characters or character combinations may cause issues.
NVIT's script will, for example, strip all exclamation marks, and, worse, any text between a pair of exclamation marks in the same line will be lost, too:
Careful! This will be missing! This will be shown again! More missing! Some more text.

Open in new window

Combinations like the following will report errors:
Specimen labeled "<foo>"
Specimen labeled "A & B"
Other issues like that may arise, depending on what the person entered into the note field.
0
IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 

Author Comment

by:Dianna Gibbs
ID: 41831779
So what is the best solution?
0
 
LVL 83

Expert Comment

by:oBdA
ID: 41831854
Well, that's why I decided to use Powershell in https:#a41830861
For ease of use, it's a batch file, but it uses Powershell to do the conversion, which doesn't have these issues.
For further refinement:
- Is there a minimum and/or a maximum length for the ID?
- Is the pipe (which is used as delimiter) an allowed character for the Note field, and if so, (how) is it escaped?
0
 

Author Comment

by:Dianna Gibbs
ID: 41832054
Ok thanks for your help - I tried you Powershell but there is still not a space between lines.  can that easily be added to your code?

1539313832|A. DUODENUM (BIOPSY):
    DUODENAL MUCOSA WITH NO HISTOPATHOLOGIC ABNORMALITY
 
 B. GASTRIC ANTRUM (BIOPSY):
    ANTRAL MUCOSA WITH NO HISTOPATHOLOGIC ABNORMALITY

 OUTPUT- See "ABNORMALITYB. - "B"  - space before if possible:
 A. DUODENUM (BIOPSY):DUODENAL MUCOSA WITH NO HISTOPATHOLOGIC ABNORMALITYB. GASTRIC ANTRUM (BIOPSY):AN
0
 

Author Comment

by:Dianna Gibbs
ID: 41832062
Sorry, I had imported the wrong file - your Powershell fixed both issues as far as I can tell - thanks.
0
 

Author Comment

by:Dianna Gibbs
ID: 41832208
oBdA - Quick question on the powershell script - I'm running it as a bat file but it doesn't seem to be disconnecting when it completes - I added an "Exit" at the end - am I missing something else?
0
 

Author Closing Comment

by:Dianna Gibbs
ID: 41832483
Thanks to all who helped me with this!!
0
 
LVL 83

Expert Comment

by:oBdA
ID: 41832615
What do you mean with "disconnecting"? If started using a double-click in Explorer, it should run and then close the window once it's done, and if started from an open command prompt, it will leave the prompt open as usual.
0

Featured Post

Free Trending Threat Insights Every Day

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

Join & Write a Comment

TOMORROW TOMORROW.BAT is inspired by a question I get asked over and over again; that is, "How can I use batch file commands to obtain tomorrow's date?" The crux of this batch file revolves around the XCOPY command - a technique I discovered w…
This article was inspired by a question here at Experts Exchange (http://www.experts-exchange.com/Software/Photos_Graphics/Images_and_Photos/Q_28629170.html). The requirements stated in that question are (1) reduce the file size of a large number of…
Excel styles will make formatting consistent and let you apply and change formatting faster. In this tutorial, you'll learn how to use Excel's built-in styles, how to modify styles, and how to create your own. You'll also learn how to use your custo…
This video shows how to remove a single email address from the Outlook 2010 Auto Suggestion memory. NOTE: For Outlook 2016 and 2013 perform the exact same steps. Open a new email: Click the New email button in Outlook. Start typing the address: …

760 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

17 Experts available now in Live!

Get 1:1 Help Now