Solved

DOS Command to remove CR/LF to have one id and one text per line

Posted on 2016-10-05
13
68 Views
Last Modified: 2016-10-06
I have an output flat file received from a vendor.  In order to load into a table, I need one line per ID/note.  As you can see, there are misc carriage returns, line feeds and blank lines.  The beginning of each line/record will always be numbers but total number digits could vary.  Is there a way to run a DOS command Sample of FF with LF/CR etc. to fix this formatting?
I found a way to remove the blank lines and also to remove all CR/LF which gives me one large text file.
Any thoughts, direction, greatly appreciated.
0
Comment
Question by:Dianna Gibbs
  • 6
  • 4
  • 2
  • +1
13 Comments
 
LVL 24

Expert Comment

by:NVIT
ID: 41830853
Make file oneline.bat.
Change the value of FName to your filename. I used c:\myfile.txt

@echo off
setlocal enabledelayedexpansion

set FName=c:\myfile.txt

for /f "tokens=*" %%a in ('type "%FName%') do (
   set tail=%%a
   echo "!tail!" | findstr /c:"|">nul
   if !errorlevel! equ 0 (
      if defined MainIsSet echo main=!main!
      set main=!tail!
      set MainIsSet=1
   ) else (
      set main=!main!!tail!
   )
)
echo main=!main!

Open in new window


To send the results to file oneline.txt...
Open a CMD window
Type: oneline.bat >oneline.txt
0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 41830855
To give you a reliable result without wasting your time with lots of backing-and-forthing, it would work best if you posted some data to process as an example, and a second version of it in it's intended processed form.
0
 
LVL 24

Expert Comment

by:NVIT
ID: 41830858
Edit to line 6. Should be:
for /f "tokens=*" %%a in ('type "%FName%"') do (

Open in new window

0
Guide to Performance: Optimization & Monitoring

Nowadays, monitoring is a mixture of tools, systems, and codes—making it a very complex process. And with this complexity, comes variables for failure. Get DZone’s new Guide to Performance to learn how to proactively find these variables and solve them before a disruption occurs.

 
LVL 84

Accepted Solution

by:
oBdA earned 500 total points
ID: 41830861
Batch with a little help from Powershell (version 2.0 or later, so it will run on Windows 7 or later):
@echo off
setlocal
set InputFile=C:\Temp\test.psv
set OutputFile=C:\Temp\out.psv
set Encoding=ASCII
echo Processing '%OutputFile%' ...
powershell.exe -ExecutionPolicy Bypass -Command "[IO.File]::ReadAllText('%InputFile%', [Text.Encoding]::'%Encoding%').Replace(\"`r`n\", ' ') -split ' (?=\d+\|)' | Set-Content -Path '%OutputFile%' -Encoding %Encoding%"
echo ... done.

Open in new window

0
 

Author Comment

by:Dianna Gibbs
ID: 41830905
NVT/LVL22 example is perfect!!!!...I have two follow up questions...

1) now each row begins with "main=" before the number - do I need a second bat to remove that string or can the code be changed?  

2) Second is there a way to add a "space" - see below.

main=1539769350|Specimen A contains 2.5 mL ...
main=1539719992|Original diagnosis :...
main=1539699576|7-year-old male with...

1539313832|A. DUODENUM (BIOPSY):
   DUODENAL MUCOSA WITH NO HISTOPATHOLOGIC ABNORMALITY
 
B. GASTRIC ANTRUM (BIOPSY):
   ANTRAL MUCOSA WITH NO HISTOPATHOLOGIC ABNORMALITY

OUTPUT- See "ABNORMALITYB. - "B"  - space before if possible:
A. DUODENUM (BIOPSY):DUODENAL MUCOSA WITH NO HISTOPATHOLOGIC ABNORMALITYB. GASTRIC ANTRUM (BIOPSY):AN
0
 
LVL 84

Expert Comment

by:oBdA
ID: 41831217
Maybe a word of warning: the "Note" field seems to be free text; as such, it doesn't really lend itself to plain batch processing, as some characters or character combinations may cause issues.
NVIT's script will, for example, strip all exclamation marks, and, worse, any text between a pair of exclamation marks in the same line will be lost, too:
Careful! This will be missing! This will be shown again! More missing! Some more text.

Open in new window

Combinations like the following will report errors:
Specimen labeled "<foo>"
Specimen labeled "A & B"
Other issues like that may arise, depending on what the person entered into the note field.
0
 

Author Comment

by:Dianna Gibbs
ID: 41831779
So what is the best solution?
0
 
LVL 84

Expert Comment

by:oBdA
ID: 41831854
Well, that's why I decided to use Powershell in https:#a41830861
For ease of use, it's a batch file, but it uses Powershell to do the conversion, which doesn't have these issues.
For further refinement:
- Is there a minimum and/or a maximum length for the ID?
- Is the pipe (which is used as delimiter) an allowed character for the Note field, and if so, (how) is it escaped?
0
 

Author Comment

by:Dianna Gibbs
ID: 41832054
Ok thanks for your help - I tried you Powershell but there is still not a space between lines.  can that easily be added to your code?

1539313832|A. DUODENUM (BIOPSY):
    DUODENAL MUCOSA WITH NO HISTOPATHOLOGIC ABNORMALITY
 
 B. GASTRIC ANTRUM (BIOPSY):
    ANTRAL MUCOSA WITH NO HISTOPATHOLOGIC ABNORMALITY

 OUTPUT- See "ABNORMALITYB. - "B"  - space before if possible:
 A. DUODENUM (BIOPSY):DUODENAL MUCOSA WITH NO HISTOPATHOLOGIC ABNORMALITYB. GASTRIC ANTRUM (BIOPSY):AN
0
 

Author Comment

by:Dianna Gibbs
ID: 41832062
Sorry, I had imported the wrong file - your Powershell fixed both issues as far as I can tell - thanks.
0
 

Author Comment

by:Dianna Gibbs
ID: 41832208
oBdA - Quick question on the powershell script - I'm running it as a bat file but it doesn't seem to be disconnecting when it completes - I added an "Exit" at the end - am I missing something else?
0
 

Author Closing Comment

by:Dianna Gibbs
ID: 41832483
Thanks to all who helped me with this!!
0
 
LVL 84

Expert Comment

by:oBdA
ID: 41832615
What do you mean with "disconnecting"? If started using a double-click in Explorer, it should run and then close the window once it's done, and if started from an open command prompt, it will leave the prompt open as usual.
0

Featured Post

Secure Your Active Directory - April 20, 2017

Active Directory plays a critical role in your company’s IT infrastructure and keeping it secure in today’s hacker-infested world is a must.
Microsoft published 300+ pages of guidance, but who has the time, money, and resources to implement? Register now to find an easier way.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

This article was inspired by a question here at Experts Exchange (http://www.experts-exchange.com/Software/Photos_Graphics/Images_and_Photos/Q_28629170.html). The requirements stated in that question are (1) reduce the file size of a large number of…
Introduction: Recently, I got a requirement to zip all files individually with batch file script in Windows OS. I don't know much about scripting, but I searched Google and found a lot of examples and websites to complete my task. Finally, I was ab…
A short tutorial showing how to set up an email signature in Outlook on the Web (previously known as OWA). For free email signatures designs, visit https://www.mail-signatures.com/articles/signature-templates/?sts=6651 If you want to manage em…
Finds all prime numbers in a range requested and places them in a public primes() array. I've demostrated a template size of 30 (2 * 3 * 5) but larger templates can be built such 210  (2 * 3 * 5 * 7) or 2310  (2 * 3 * 5 * 7 * 11). The larger templa…

726 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question