format file with sed

Hey All - just need some help with sed please?

I've almost got my output file where I would like it to be.  Here is what I am starting with:

(0010,0010) PN [TESTERMAN^TESTERMAN^^^]                       #  16, 1 PatientName
(0010,0020) LO [10026487]                               #   8, 1 PatientID
(0010,0030) DA [18750101]                               #   8, 1 PatientBirthDate
(0008,0020) DA [20130305]                               #   8, 1 StudyDate
(0008,0050) SH [1074110]                                #   8, 1 AccessionNumber
(0010,0040) CS [M]                                      #   2, 1 PatientSex

(0010,0010) PN [TESTERMAN^TESTERMAN^^^]                       #  16, 1 PatientName
(0010,0020) LO [10026487]                               #   8, 1 PatientID
(0010,0030) DA [18750101]                               #   8, 1 PatientBirthDate
(0008,0020) DA [20130305]                               #   8, 1 StudyDate
(0008,0050) SH [1074110]                                #   8, 1 AccessionNumber
(0010,0040) CS [M]                                      #   2, 1 PatientSex

(0010,0010) PN [TESTERMAN^TESTERMAN^^^]                       #  16, 1 PatientName
(0010,0020) LO [10026487]                               #   8, 1 PatientID
(0010,0030) DA [18750101]                               #   8, 1 PatientBirthDate
(0008,0020) DA [20130305]                               #   8, 1 StudyDate
(0008,0050) SH [1074110]                                #   8, 1 AccessionNumber
(0010,0040) CS [M]                                      #   2, 1 PatientSex

(0010,0010) PN [TESTERMAN^TESTERMAN^^^]                       #  16, 1 PatientName
(0010,0020) LO [10026487]                               #   8, 1 PatientID
(0010,0030) DA [18750101]                               #   8, 1 PatientBirthDate
(0008,0020) DA [20130305]                               #   8, 1 StudyDate
(0008,0050) SH [1074110]                                #   8, 1 AccessionNumber
(0010,0040) CS [M]                                      #   2, 1 PatientSex

(0010,0010) PN [TESTERMAN^TESTERMAN^^^]                       #  16, 1 PatientName
(0010,0020) LO [10026487]                               #   8, 1 PatientID
(0010,0030) DA [18750101]                               #   8, 1 PatientBirthDate
(0008,0020) DA [20130305]                               #   8, 1 StudyDate
(0008,0050) SH [1074110]                                #   8, 1 AccessionNumber
(0010,0040) CS [M]                                      #   2, 1 PatientSex

(0010,0010) PN [TESTERMAN^TESTERMAN^^^]                       #  16, 1 PatientName
(0010,0020) LO [10026487]                               #   8, 1 PatientID
(0010,0030) DA [18750101]                               #   8, 1 PatientBirthDate
(0008,0020) DA [20130305]                               #   8, 1 StudyDate
(0008,0050) SH [1074110]                                #   8, 1 AccessionNumber
(0010,0040) CS [M]                                      #   2, 1 PatientSex

(0010,0010) PN [TESTERMAN^TESTERMAN^^^]                       #  16, 1 PatientName
(0010,0020) LO [10026487]                               #   8, 1 PatientID
(0010,0030) DA [18750101]                               #   8, 1 PatientBirthDate
(0008,0020) DA [20130305]                               #   8, 1 StudyDate
(0008,0050) SH [1074110]                                #   8, 1 AccessionNumber
(0010,0040) CS [M]                                      #   2, 1 PatientSex

(0010,0010) PN [TESTERMAN^TESTERMAN^^^]                       #  16, 1 PatientName
(0010,0020) LO [10026487]                               #   8, 1 PatientID
(0010,0030) DA [18750101]                               #   8, 1 PatientBirthDate
(0008,0020) DA [20130305]                               #   8, 1 StudyDate
(0008,0050) SH [1074110]                                #   8, 1 AccessionNumber
(0010,0040) CS [M]                                      #   2, 1 PatientSex
E: DcmElement: Unknown Tag & Data (3028,3130) larger (808463408) than remaining bytes in file
E: dcmdump: I/O suspension or premature end of stream: reading file: d:\import\output.txt

...notice that this just repeats, I am only wanting one copy of each unique line.  This could end up repeated more than what is shown depending on what I run my script against to generate this text.

This is what I would like for it to end up displaying as (but with two 'tabs' after the field description so that everything is spaced correctly):

StudyDate		20130305
AccessionNumber	1074110
PatientName	TESTERMAN^TESTERMAN^^^
PatientID		10026487
PatientBirthDate	18750101
PatientSex		M

...& here is the code I am currently using:

sort output.txt | uniq | sed /E:/d | sed -e "/\[/s/.*\[\(.*\)\]/\1/" -e "s/^\(.*(no value available)\)$//" > test.txt

this is what I am currently getting back when I run the above code:

20130305                               #   8, 1 StudyDate
1074110                                #   8, 1 AccessionNumber
TESTERMAN^TESTERMAN^^^                       #  16, 1 PatientName
10026487                               #   8, 1 PatientID
18750101                               #   8, 1 PatientBirthDate
F                                      #   2, 1 PatientSex

I am currently running the above code in windows with gnuwin32 (coreutils for windows).  I could also run this in cygwin, but it will have to be executed on a PC that quite a few people will be doing the same task on.
doc_jay Asked:

Gerwin Jansen, EE MVE (Topic Advisor) Commented:
Seems like you want to replace a pattern with one or more spaces by 2 tab characters :)

Like adding this at the end:

| sed 's/[ ][ ]*/\t\t/'

doc_jay (Author) Commented:
Thanks, but the above code only puts two tabs 'in front' of everything on the left. I would like it all to be aligned to the left side. Also, I want the right side of each line swapped with the left side, so that it ends up looking like:

StudyDate		20130305
AccessionNumber	1074110
PatientName	TESTERMAN^TESTERMAN^^^
PatientID		10026487
PatientBirthDate	18750101
PatientSex		M

Gerwin Jansen, EE MVE (Topic Advisor) Commented:
I see, try this:
sort -u output.txt  | grep -v "^[ ]*$" | grep -v "^E:" | sed 's/\[//g;s/\]//g' | awk '{ if (length($7) > 15) print $7 "\t" $1; else print $7 "\t\t" $1 }'

Or do you want a sed only version?

doc_jay (Author) Commented:
I'm running this in a windows shell right now and it comes back with:

Input file specified two times.

awk: '{
awk: ^ invalid char ''' in expression

Do you mind posting a sed only version?

Gerwin Jansen, EE MVE (Topic Advisor) Commented:
Can you try this first:

awk: "{ ... }"

(so a double quote instead of a single quote)
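
One way to sidestep the quoting differences between cmd.exe and a Unix shell is to keep the awk program in its own file and pass it with -f, so that no shell has to parse the quotes, $ signs or backslashes. The following is only a sketch under a few assumptions: it is written for a POSIX shell such as the cygwin console, the file names format.awk and sample.txt are made up for illustration, and the values are assumed to contain no spaces, so that after the brackets are stripped the value is the third field and the field name is the last one.

```shell
# format.awk is a hypothetical file name; the awk program lives in the file,
# so the calling shell never sees its quotes or $ signs.
cat > format.awk <<'EOF'
# Skip blank lines and dcmdump error lines.
NF == 0 || /^E:/ { next }
{
    gsub(/\[|\]/, "")                          # drop the square brackets
    sep = (length($NF) > 15) ? "\t" : "\t\t"   # same length threshold as above
    print $NF sep $3                           # field name, tab(s), value
}
EOF

# Two sample lines in the dcmdump format shown in the question.
printf '%s\n' \
  '(0010,0020) LO [10026487]                               #   8, 1 PatientID' \
  '(0010,0030) DA [18750101]                               #   8, 1 PatientBirthDate' \
  > sample.txt

sort -u sample.txt | awk -f format.awk
```

From a .bat file the last line could stay the same as long as the GNU sort and awk are the ones found first on the PATH.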

doc_jay (Author) Commented:
That didn't seem to work either.  Here is the output:

D:\import>sort -u output.txt  | grep -v "^[ ]*$" | grep -v "^E:" | sed 's/\[//g;
s/\]//g' | awk "{ if (length($7) > 15) print $7 "\t" $1; else print $7 "\t\t" $1
 }"
Input file specified two times.

awk: { if (length($7) > 15) print $7 \t $1; else print $7 \t\t $1 }
awk:                                 ^ backslash not last character on line

Gerwin Jansen, EE MVE (Topic Advisor) Commented:
I don't have cygwin at the moment, so I can't really test that. I've got a sed suggestion for you:

sort -u output.txt  | sed "/^[ ]*$/d;/^E:/d;s/\[//g;s/\]//g" | sed "s/  */ /g" | sed "s/\(.* \).* .* .* .* .* \(.*\)/\2\t\1/"

Output:

StudyDate	(0008,0020) 
AccessionNumber	(0008,0050) 
PatientName	(0010,0010) 
PatientID	(0010,0020) 
PatientBirthDate	(0010,0030) 
PatientSex	(0010,0040) 

The first sed removes the unwanted lines (blank lines and lines starting with E:) and strips the square brackets, the second one collapses runs of spaces into a single space, and the third one prints the last and the first field with a tab as a separator. Getting a conditional one or 2 tabs by looking at the length of the first field is not really practical in sed. If you can use expand, for example, you can align like this:

sort -u output.txt  | sed "/^[ ]*$/d;/^E:/d;s/\[//g;s/\]//g" | sed "s/  */ /g" | sed "s/\(.* \).* .* .* .* .* \(.*\)/\2\t\1/" | expand -t 20

Output:

StudyDate           (0008,0020) 
AccessionNumber     (0008,0050) 
PatientName         (0010,0010) 
PatientID           (0010,0020) 
PatientBirthDate    (0010,0030) 
PatientSex          (0010,0040) 
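
To see what expand contributes on its own, here is a minimal sketch assuming the GNU coreutils expand (the one shipped with cygwin). It feeds two tab-separated lines through expand -t 20, which replaces each tab with enough spaces to reach the next tab stop, with stops set every 20 columns.

```shell
# One tab between the two columns; expand pads the first column with
# spaces so that the second column starts at column 20.
printf 'PatientID\t10026487\nPatientBirthDate\t18750101\n' | expand -t 20
```

A field name of 20 characters or more would push its value to the next stop at column 40, so the stop width has to be larger than the longest name.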

doc_jay (Author) Commented:
thanks -

  I tried your first 'sed' suggestion from your last post and it is removing the wrong info and leaving info that I would like stripped away.

I would like this to be left in the output file:

StudyDate		20130305
AccessionNumber	1074110
PatientName	TESTERMAN^TESTERMAN^^^
PatientID		10026487
PatientBirthDate	18750101
PatientSex		M

instead I am left with:  

StudyDate           (0008,0020) 
AccessionNumber     (0008,0050) 
PatientName         (0010,0010) 
PatientID           (0010,0020) 
PatientBirthDate    (0010,0030) 
PatientSex          (0010,0040) 

--also I needed to remove the '-u' option from sort for it to work in cygwin.

As for your last example to try, I can't use the last command 'expand' with the '-t' option.  Is 'expand' a linux tool that I can get?

Gerwin Jansen, EE MVE (Topic Advisor) Commented:
Ah, got the wrong field :)

About expand - it is a standard Linux tool: you should be able to add it to cygwin using the setup.exe of cygwin itself.

This is getting you the correct field:
sort -u sample.txt  | sed "/^[ ]*$/d;/^E:/d;s/\[//g;s/\]//g" | sed "s/  */ /g" | sed "s/.* .* \(.* \).* .* .* \(.*\)/\2\t\1/" | expand -t 20
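
The pattern above picks fields by counting spaces, so it depends on the value itself containing no spaces. As a hedged alternative, the sketch below captures the text between the square brackets and the last word on the line in a single sed expression. It assumes GNU sed with extended regular expressions (-E; older GNU builds spell it -r) and the dcmdump line layout shown in the question; the function name extract_fields and the file dcm_sample.txt are made up for illustration.

```shell
# Print "FieldName<TAB>value" for every dcmdump line of the form
#   (gggg,eeee) VR [value]   #  len, vm FieldName
# Lines that do not match (blank lines, "E:" errors) are dropped by -n ... p.
extract_fields() {
    sort -u "$1" |
    sed -n -E 's/^\([^)]*\) +[A-Z]{2} +\[(.*)\] +#.* ([^[:space:]]+)[[:space:]]*$/\2\t\1/p'
}

# Small sample with a duplicate, a blank line and an error line.
printf '%s\n' \
  '(0010,0010) PN [TESTERMAN^TESTERMAN^^^]                       #  16, 1 PatientName' \
  '(0008,0020) DA [20130305]                               #   8, 1 StudyDate' \
  '' \
  '(0010,0010) PN [TESTERMAN^TESTERMAN^^^]                       #  16, 1 PatientName' \
  'E: dcmdump: I/O suspension or premature end of stream' \
  > dcm_sample.txt

extract_fields dcm_sample.txt
```

The trailing [[:space:]]* also swallows a stray carriage return at the end of an input line, so it does not end up inside the field name.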

sort -u -> this is a unique sort, you can replace by:

sort | uniq
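
A quick way to convince yourself that the two forms give the same result, sketched with a throwaway file (dupes.txt is a made-up name). Note that uniq on its own only collapses adjacent duplicate lines, which is why the sort has to come first.

```shell
# Four lines, two distinct values, duplicates not adjacent.
printf 'b\na\nb\na\n' > dupes.txt

sort -u dupes.txt        # unique sort in one step
sort dupes.txt | uniq    # same result in two steps
```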

doc_jay (Author) Commented:
thanks - it's almost there, here is the output:

StudyDate
          20130305 
AccessionNumber
    1074110 
PatientName
        HYDE^JECKYL^^^ 
PatientID
          10026XXX 
PatientBirthDate
   19011231 
PatientSex
         M 

except in notepad (for windows) it is all displayed on one line.

Gerwin Jansen, EE MVE (Topic Advisor) Commented:
It seems the second matched pattern includes the newline, can you try this:
sort -u sample.txt  | sed "/^[ ]*$/d;/^E:/d;s/\[//g;s/\]//g" | sed "s/  */ /g" | sed "s/.* .* \(.* \).* .* .* \(.*\)$/\2\t\1/" | expand -t 20

doc_jay (Author) Commented:
no luck, now there is a new line after each word or 'entry':

StudyDate
          20130305 
AccessionNumber
    10741103 
PatientName
        TESTERMAN^TESTERMAN^^^ 
PatientID
          100263487 
PatientBirthDate
   18750101 
PatientSex
         M

Gerwin Jansen, EE MVE (Topic Advisor) Commented:
Pls post (attach file) your output.txt - I'll check later with cygwin

<edit>

I checked with cygwin, output looks OK to me:

$ head output.txt
(0010,0010) PN [TESTERMAN^TESTERMAN^^^]                       #  16, 1 PatientName
(0010,0020) LO [10026487]                               #   8, 1 PatientID
(0010,0030) DA [18750101]                               #   8, 1 PatientBirthDate
(0008,0020) DA [20130305]                               #   8, 1 StudyDate
(0008,0050) SH [1074110]                                #   8, 1 AccessionNumber
(0010,0040) CS [M]                                      #   2, 1 PatientSex

(0010,0010) PN [TESTERMAN^TESTERMAN^^^]                       #  16, 1 PatientName
(0010,0020) LO [10026487]                               #   8, 1 PatientID
(0010,0030) DA [18750101]                               #   8, 1 PatientBirthDate

user@host ~
$ sort -u output.txt  | sed "/^[ ]*$/d;/^E:/d;s/\[//g;s/\]//g" | sed "s/  */ /g" | sed "s/.* .* \(.* \).* .* .* \(.*\)/\2\t\1/" | expand -t 20
StudyDate           20130305
AccessionNumber     1074110
PatientName         TESTERMAN^TESTERMAN^^^
PatientID           10026487
PatientBirthDate    18750101
PatientSex          M

user@host ~

So how are you running this in cygwin, and how do you get the output into notepad?

doc_jay (Author) Commented:
@gerwinjansen -

  sorry - it looked like I missed this sed
 sed "s/  */ /g"

--after the 'expand' command I am doing
 > output_test.txt

This is a screenshot of how I am viewing the final file with Notepad++

I'm saving the output to a .txt file and opening with notepad

Gerwin Jansen, EE MVE (Topic Advisor) Commented:
Can you answer my 2 questions from above? I checked in cygwin and my output file has the 2 fields on the same line. Note that your screenshot shows 2 tabs on the 'next' line, where my sed command inserts just one.

doc_jay (Author) Commented:
This is what I am running in cygwin:

$ sort -u output.txt  | sed "/^[ ]*$/d;/^E:/d;s/\[//g;s/\]//g" | sed "s/  */ /g" | sed "s/  */ /g" | sed "s/.* .* \(.* \).* .* .* \(.*\)$/\2\t\1/" | expand -t 20 > test_output.txt

I get the output into notepad++ by using ">test_output.txt"

I have also attached the test_output.txt file that is generated.
test-output.txt

Gerwin Jansen, EE MVE (Topic Advisor) Commented:
Thanks, please post your 'input' file as well, you called it 'output.txt', this is what I asked for as well.

doc_jay (Author) Commented:
Here it is for you.  It's called output.txt because this info is being generated from another source.

thanks
output.txt

Gerwin Jansen, EE MVE (Topic Advisor) Commented:
Thanks for the output.txt file. Only now do I see that it is this input file that lacks the line endings notepad expects. See the attached image of your output.txt opened with notepad.

Appending a carriage return (\r) to every line as a last sed command will get you the correct output file:
sort -u output.txt  | sed "/^[ ]*$/d;/^E:/d;s/\[//g;s/\]//g" | sed "s/  */ /g" | sed "s/  */ /g" | sed "s/.* .* \(.* \).* .* .* \(.*\)$/\2\t\1/;s/$/\r/" | expand -t 20 > test_output.txt

This is your input: [image: output.txt shown in notepad]
This is your result: [image: test_output.txt shown in notepad]
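
The s/$/\r/ step can be looked at in isolation. The sketch below assumes GNU sed (which understands \r in both the pattern and the replacement) and wraps the conversion in a helper with the made-up name to_crlf. Unlike the one-liner above, it first removes a carriage return that is already at the end of the line, so running it on a file that already has Windows line endings does not double them.

```shell
# Convert every line to a Windows-style CR LF ending.
# Step 1 strips an existing trailing CR, step 2 appends exactly one.
to_crlf() {
    sed 's/\r$//; s/$/\r/'
}

# od -c makes the line endings visible as \r and \n.
printf 'PatientSex          M\n' | to_crlf | od -c
```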

doc_jay (Author) Commented:
Great!  I'll give this a shot 1st thing in the morning and hopefully I can get the whole process to flow.  End result is to get this emailed to myself and a co-worker when a remote site rips a CD with dicom to send to us.

doc_jay (Author) Commented:
So, this works great in cygwin.  I made a script to do this and I can run it and it creates the  new 'output' file within cygwin.  My ultimate goal is to run this from a command prompt in windows, which shouldn't be a problem if I just call bash from a command line to run the unix script.  

Here is where I am hitting a wall, when I run my .bat file:

echo
SET dcmtk=d:\apps\dcmtk\bin
%dcmtk%\dcmdump d:\import +sd +r -s +P  "0010,0010" +P "0010,0020" +P "0010,0030" +P "0008,0020" +P "0008,0050" +P "0010,0040" > d:\import\output.txt
c:\cygwin\bin\bash dcmdump_out_format_script.sh

my 'dcmdump_out_format_script.sh'

#!/bin/bash
sort -u /cygdrive/d/import/output.txt  | sed "/^[ ]*$/d;/^E:/d;s/\[//g;s/\]//g" | sed "s/  */ /g" | sed "s/  */ /g" | sed "s/.* .* \(.* \).* .* .* \(.*\)$/\2\t\1/;s/$/\r/" | expand -t 20 > /cygdrive/d/import/test_output.txt

it errors out with:  Invalid switch

and it also displays in my 'test_output.txt' file below

Microsoft (R) File Expansion Utility  Version 5.1.2600.0
Copyright (C) Microsoft Corp 1990-1999.  All rights reserved.

Unrecognized switch -t.

I don't know how to tell  it to run the 'expand' tool from cygwin instead of the microsoft tool!

any ideas?

thanks for all of your help by the way!!

Gerwin Jansen, EE MVE (Topic Advisor) Commented:
Ah, it should be a path issue: the Windows directory containing 'expand' comes before the cygwin directory that has expand in the PATH.  You could use the full path to your cygwin expand, or try copying the cygwin expand to expand1, for example, and replace it in the sed line above. Let me know if you need help with that.

<edit>

I checked, copying cygwin\bin\expand.exe to expand1.exe works the way I intended. You can try and verify.
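
Another option, sketched here under the assumption that the script is started by the cygwin bash and that the cygwin tools live in /usr/bin, is to put that directory first on the PATH at the top of the script. Both "Unrecognized switch -t" and the earlier "Input file specified two times." are consistent with the Windows expand.exe and sort.exe being found ahead of the GNU tools, so this would cover sort as well.

```shell
#!/bin/bash
# Put the cygwin tool directories first so that sort and expand resolve to
# the GNU versions rather than the Windows programs of the same name.
PATH=/usr/bin:/bin:$PATH
export PATH

# Show which binaries will actually be run.
command -v sort
command -v expand

# With the GNU expand in front, the -t option is accepted.
printf 'PatientSex\tM\n' | expand -t 20
```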

doc_jay (Author) Commented:
Thanks for your help on this - I got this running through a command prompt in windows with some different code.  I kept on getting a lot of 'invalid switch' when I would use bash from windows command prompt, even though it ran fine within a cygwin console.

I ended up following the suggestion here.

points on the way & thank you again!

doc_jay (Author) Commented:
excellent work!

Gerwin Jansen, EE MVE (Topic Advisor) Commented:
Thanks ;)