doc_jay
asked on
format file with sed
Hey All - just need some help with sed please?
I've almost got my output file where I would like it to be. Here is what I am starting with:
(0010,0010) PN [TESTERMAN^TESTERMAN^^^] # 16, 1 PatientName
(0010,0020) LO [10026487] # 8, 1 PatientID
(0010,0030) DA [18750101] # 8, 1 PatientBirthDate
(0008,0020) DA [20130305] # 8, 1 StudyDate
(0008,0050) SH [1074110] # 8, 1 AccessionNumber
(0010,0040) CS [M] # 2, 1 PatientSex
(0010,0010) PN [TESTERMAN^TESTERMAN^^^] # 16, 1 PatientName
(0010,0020) LO [10026487] # 8, 1 PatientID
(0010,0030) DA [18750101] # 8, 1 PatientBirthDate
(0008,0020) DA [20130305] # 8, 1 StudyDate
(0008,0050) SH [1074110] # 8, 1 AccessionNumber
(0010,0040) CS [M] # 2, 1 PatientSex
(0010,0010) PN [TESTERMAN^TESTERMAN^^^] # 16, 1 PatientName
(0010,0020) LO [10026487] # 8, 1 PatientID
(0010,0030) DA [18750101] # 8, 1 PatientBirthDate
(0008,0020) DA [20130305] # 8, 1 StudyDate
(0008,0050) SH [1074110] # 8, 1 AccessionNumber
(0010,0040) CS [M] # 2, 1 PatientSex
(0010,0010) PN [TESTERMAN^TESTERMAN^^^] # 16, 1 PatientName
(0010,0020) LO [10026487] # 8, 1 PatientID
(0010,0030) DA [18750101] # 8, 1 PatientBirthDate
(0008,0020) DA [20130305] # 8, 1 StudyDate
(0008,0050) SH [1074110] # 8, 1 AccessionNumber
(0010,0040) CS [M] # 2, 1 PatientSex
(0010,0010) PN [TESTERMAN^TESTERMAN^^^] # 16, 1 PatientName
(0010,0020) LO [10026487] # 8, 1 PatientID
(0010,0030) DA [18750101] # 8, 1 PatientBirthDate
(0008,0020) DA [20130305] # 8, 1 StudyDate
(0008,0050) SH [1074110] # 8, 1 AccessionNumber
(0010,0040) CS [M] # 2, 1 PatientSex
(0010,0010) PN [TESTERMAN^TESTERMAN^^^] # 16, 1 PatientName
(0010,0020) LO [10026487] # 8, 1 PatientID
(0010,0030) DA [18750101] # 8, 1 PatientBirthDate
(0008,0020) DA [20130305] # 8, 1 StudyDate
(0008,0050) SH [1074110] # 8, 1 AccessionNumber
(0010,0040) CS [M] # 2, 1 PatientSex
(0010,0010) PN [TESTERMAN^TESTERMAN^^^] # 16, 1 PatientName
(0010,0020) LO [10026487] # 8, 1 PatientID
(0010,0030) DA [18750101] # 8, 1 PatientBirthDate
(0008,0020) DA [20130305] # 8, 1 StudyDate
(0008,0050) SH [1074110] # 8, 1 AccessionNumber
(0010,0040) CS [M] # 2, 1 PatientSex
(0010,0010) PN [TESTERMAN^TESTERMAN^^^] # 16, 1 PatientName
(0010,0020) LO [10026487] # 8, 1 PatientID
(0010,0030) DA [18750101] # 8, 1 PatientBirthDate
(0008,0020) DA [20130305] # 8, 1 StudyDate
(0008,0050) SH [1074110] # 8, 1 AccessionNumber
(0010,0040) CS [M] # 2, 1 PatientSex
E: DcmElement: Unknown Tag & Data (3028,3130) larger (808463408) than remaining bytes in file
E: dcmdump: I/O suspension or premature end of stream: reading file: d:\import\output.txt
...notice that this just repeats, I am only wanting one copy of each unique line. This could end up repeated more than what is shown depending on what I run my script against to generate this text.
This is what I would like for it to end up displaying as (but with two 'tabs' after the field description so that everything is spaced correctly):
StudyDate 20130305
AccessionNumber 1074110
PatientName TESTERMAN^TESTERMAN^^^
PatientID 10026487
PatientBirthDate 18750101
PatientSex M
...& here is the code I am currently using:
sort output.txt | uniq | sed /E:/d | sed -e "/\[/s/.*\[\(.*\)\]/\1/" -e "s/^\(.*(no value available)\)$//" > test.txt
this is what I am currently getting back when I run the above code:
20130305 # 8, 1 StudyDate
1074110 # 8, 1 AccessionNumber
TESTERMAN^TESTERMAN^^^ # 16, 1 PatientName
10026487 # 8, 1 PatientID
18750101 # 8, 1 PatientBirthDate
F # 2, 1 PatientSex
I am currently running the above code in windows with gnuwin32 (coreutils for windows). I could also run this in cygwin, but it will have to be executed on a PC that quite a few people will be doing the same task on.
ASKER
Thanks, but the above code only put two tabs in front of everything on the left. I would like it all aligned to the left side. I am also looking for the right side of the text file to be swapped with the left side, so that it ends up looking like:
StudyDate 20130305
AccessionNumber 1074110
PatientName TESTERMAN^TESTERMAN^^^
PatientID 10026487
PatientBirthDate 18750101
PatientSex M
I see, try this:
sort -u output.txt | grep -v "^[ ]*$" | grep -v "^E:" | sed 's/\[//g;s/\]//g' | awk '{ if (length($7) > 15) print $7 "\t" $1; else print $7 "\t\t" $1 }'
Or do you want a sed only version?
ASKER
I'm running this in a windows shell right now and it comes back with:
Input file specified two times.
awk: '{
awk: ^ invalid char ''' in expression
Do you mind posting a sed only version?
Can you try this first:
awk: "{ ... }"
(so a double quote instead of a single quote)
ASKER
That didn't seem to work either. Here is the output:
D:\import>sort -u output.txt | grep -v "^[ ]*$" | grep -v "^E:" | sed 's/\[//g;
s/\]//g' | awk "{ if (length($7) > 15) print $7 "\t" $1; else print $7 "\t\t" $1
}"
Input file specified two times.
awk: { if (length($7) > 15) print $7 \t $1; else print $7 \t\t $1 }
awk: ^ backslash not last character on line
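Both errors above trace back to quoting. cmd.exe does not treat single quotes as quote characters, and once the outer quotes are changed to double quotes, the inner "\t" ends the string early. One way around this is to keep the awk program in its own file and run it with awk -f, so that nothing has to be quoted on the command line. Below is a minimal sketch written for a POSIX shell. The names swap.awk and swap_fields are hypothetical, and it assumes that the value is the third field and the description the last field once the brackets are removed.

```shell
# swap.awk is a hypothetical file name; keeping the program in a file means
# no awk quoting has to survive the Windows command line.
cat > swap.awk <<'EOF'
# Skip blank lines and dcmdump error lines (those starting with "E:").
/^[ \t]*$/ || /^E:/ { next }
{
    gsub(/[][]/, "")             # drop the square brackets around the value
    # Assumption: the value is field 3 and the description is the last field.
    printf "%s\t%s\n", $NF, $3
}
EOF

# Hypothetical helper: de-duplicate the file, then swap and tab-separate.
swap_fields() {
    sort -u "$1" | awk -f swap.awk
}
```

On Windows the file can be created in any editor; only the "awk -f swap.awk" part then has to appear on the command line.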
I don't have cygwin atm, so I can't really test that. I've got a sed suggestion for you:
sort -u output.txt | sed "/^[ ]*$/d;/^E:/d;s/\[//g;s/\]//g" | sed "s/  */ /g" | sed "s/\(.* \).* .* .* .* .* \(.*\)/\2\t\1/"
Output:
StudyDate (0008,0020)
AccessionNumber (0008,0050)
PatientName (0010,0010)
PatientID (0010,0020)
PatientBirthDate (0010,0030)
PatientSex (0010,0040)
The first sed removes the unwanted lines and the brackets, the second replaces runs of spaces with a single space, and the third prints the last and the first field with a tab as the separator. Choosing one or two tabs based on the length of the first field is not really possible in sed. If you can use expand, for example, you can align like this:
sort -u output.txt | sed "/^[ ]*$/d;/^E:/d;s/\[//g;s/\]//g" | sed "s/  */ /g" | sed "s/\(.* \).* .* .* .* .* \(.*\)/\2\t\1/" | expand -t 20
Output:
StudyDate (0008,0020)
AccessionNumber (0008,0050)
PatientName (0010,0010)
PatientID (0010,0020)
PatientBirthDate (0010,0030)
PatientSex (0010,0040)
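If expand is not available, awk's printf can do the padding instead. The sketch below assumes its input is already in the "name<TAB>value" form produced by the third sed, and pad_columns is a hypothetical helper name. It pads the first column to 20 characters, which gives the same alignment as expand -t 20 for names shorter than 20 characters. It is meant to be run from a bash script, where single quotes behave as expected.

```shell
# Hypothetical helper: read "name<TAB>value" lines on stdin and left-justify
# the name in a 20-character column.
pad_columns() {
    awk -F '\t' '{ printf "%-20s%s\n", $1, $2 }'
}
```

It would take the place of expand at the end of the pipeline, for example: ... | sed "s/.../\2\t\1/" | pad_columns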
ASKER
thanks -
I tried your first 'sed' suggestion from your last post and it is removing the wrong info and leaving info that I would like stripped away.
I would like this to be left in the output file:
StudyDate 20130305
AccessionNumber 1074110
PatientName TESTERMAN^TESTERMAN^^^
PatientID 10026487
PatientBirthDate 18750101
PatientSex M
instead I am left with:
StudyDate (0008,0020)
AccessionNumber (0008,0050)
PatientName (0010,0010)
PatientID (0010,0020)
PatientBirthDate (0010,0030)
PatientSex (0010,0040)
--also I needed to remove the '-u' option from sort for it to work in cygwin.
As for your last example to try, I can't use the last command 'expand' with the '-t' option. Is 'expand' a linux tool that I can get?
Ah, got the wrong field :)
About expand - it is a standard Linux tool: you should be able to add it to cygwin using the setup.exe of cygwin itself.
This is getting you the correct field:
sort -u sample.txt | sed "/^[ ]*$/d;/^E:/d;s/\[//g;s/\]//g" | sed "s/  */ /g" | sed "s/.* .* \(.* \).* .* .* \(.*\)/\2\t\1/" | expand -t 20
sort -u -> this is a unique sort; you can replace it with:
sort | uniq
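The expression above relies on counting space-separated fields with greedy .* matches, which is easy to get off by one. A more explicit alternative is to anchor on the literal brackets and the # sign. This is only a sketch, under the assumption that every data line looks like the sample: "(tag) VR [value] # length, count Name". The name extract_fields is hypothetical. Because of -n and the p flag, lines that do not match (blank lines and the E: lines) are dropped without a separate delete step.

```shell
# Hypothetical helper: print "Name<TAB>value" for each unique data line.
extract_fields() {
    tab=$(printf '\t')   # a literal tab, so the script does not rely on \t support in sed
    sort -u "$1" |
        sed -n "s/^([^)]*) *[A-Z][A-Z] *\[\(.*\)\] *#.* \([^ ]*\)\$/\2${tab}\1/p"
}
```

The first group captures everything between the brackets and the second captures the last word on the line, so the replacement simply swaps them.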
ASKER
thanks - it's almost there, here is the output:
StudyDate
20130305
AccessionNumber
1074110
PatientName
HYDE^JECKYL^^^
PatientID
10026XXX
PatientBirthDate
19011231
PatientSex
M
except in notepad (for windows) it is all displayed on one line.
It seems the second pattern matched includes the newline. Can you try this:
sort -u sample.txt | sed "/^[ ]*$/d;/^E:/d;s/\[//g;s/\]//g" | sed "s/  */ /g" | sed "s/.* .* \(.* \).* .* .* \(.*\)$/\2\t\1/" | expand -t 20
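sed removes the trailing newline before it applies the pattern, so \(.*\) cannot capture it. A carriage return left over from Windows line endings is a different matter: it would be captured along with the last field and moved to the front of the line. Whether that is what happens here is an assumption, not something confirmed in the thread. A small sketch for checking for and stripping carriage returns, with has_crlf and strip_cr as hypothetical helper names:

```shell
cr=$(printf '\r')   # a literal carriage return

# Succeeds if any line of the file ends in a carriage return (CRLF endings).
has_crlf() {
    grep -q "${cr}\$" "$1"
}

# Writes the file to stdout with every carriage return removed.
strip_cr() {
    tr -d '\r' < "$1"
}
```

If has_crlf reports a match, the pipeline could start with "strip_cr sample.txt | sort -u | ..." instead of "sort -u sample.txt | ...".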
ASKER
no luck, now there is a new line after each word or 'entry':
StudyDate
20130305
AccessionNumber
10741103
PatientName
TESTERMAN^TESTERMAN^^^
PatientID
100263487
PatientBirthDate
18750101
PatientSex
M
Pls post (attach file) your output.txt - I'll check later with cygwin
<edit>
I checked with cygwin, output looks OK to me:
$ head output.txt
(0010,0010) PN [TESTERMAN^TESTERMAN^^^] # 16, 1 PatientName
(0010,0020) LO [10026487] # 8, 1 PatientID
(0010,0030) DA [18750101] # 8, 1 PatientBirthDate
(0008,0020) DA [20130305] # 8, 1 StudyDate
(0008,0050) SH [1074110] # 8, 1 AccessionNumber
(0010,0040) CS [M] # 2, 1 PatientSex
(0010,0010) PN [TESTERMAN^TESTERMAN^^^] # 16, 1 PatientName
(0010,0020) LO [10026487] # 8, 1 PatientID
(0010,0030) DA [18750101] # 8, 1 PatientBirthDate
user@host ~
$ sort -u output.txt | sed "/^[ ]*$/d;/^E:/d;s/\[//g;s/\]//g" | sed "s/  */ /g" | sed "s/.* .* \(.* \).* .* .* \(.*\)/\2\t\1/" | expand -t 20
StudyDate 20130305
AccessionNumber 1074110
PatientName TESTERMAN^TESTERMAN^^^
PatientID 10026487
PatientBirthDate 18750101
PatientSex M
user@host ~
So how are you running this in cygwin, and how do you get the output into notepad?
ASKER
Can you answer my 2 questions from above? I checked in cygwin and my output file has the 2 fields on the same line. Note that your screenshot shows 2 tabs on the 'next' line, where my sed command inserts just one.
ASKER
This is what I am running in cygwin:
$ sort -u output.txt | sed "/^[ ]*$/d;/^E:/d;s/\[//g;s/\]//g" | sed "s/  */ /g" | sed "s/  */ /g" | sed "s/.* .* \(.* \).* .* .* \(.*\)$/\2\t\1/" | expand -t 20 > test_output.txt
I get the output into notepad++ by using ">test_output.txt"
I have also attached the test_output.txt file that is generated.
test-output.txt
Thanks, please post your 'input' file as well, you called it 'output.txt', this is what I asked for as well.
ASKER
Here it is for you. It's called output.txt because this info is being generated from another source.
thanks
output.txt
ASKER CERTIFIED SOLUTION
ASKER
Great! I'll give this a shot 1st thing in the morning and hopefully I can get the whole process to flow. End result is to get this emailed to myself and a co-worker when a remote site rips a CD with dicom to send to us.
ASKER
So, this works great in cygwin. I made a script to do this and I can run it and it creates the new 'output' file within cygwin. My ultimate goal is to run this from a command prompt in windows, which shouldn't be a problem if I just call bash from a command line to run the unix script.
Here is where I am hitting a wall, when I run my .bat file:
echo
SET dcmtk=d:\apps\dcmtk\bin
%dcmtk%\dcmdump d:\import +sd +r -s +P "0010,0010" +P "0010,0020" +P "0010,0030" +P "0008,0020" +P "0008,0050" +P "0010,0040" > d:\import\output.txt
c:\cygwin\bin\bash dcmdump_out_format_script.sh
my 'dcmdump_out_format_script.sh':
#!/bin/bash
sort -u /cygdrive/d/import/output.txt | sed "/^[ ]*$/d;/^E:/d;s/\[//g;s/\]//g" | sed "s/  */ /g" | sed "s/  */ /g" | sed "s/.* .* \(.* \).* .* .* \(.*\)$/\2\t\1/;s/$/\r/" | expand -t 20 > /cygdrive/d/import/test_output.txt
it errors out with: Invalid switch
and it also displays in my 'test_output.txt' file below
Microsoft (R) File Expansion Utility Version 5.1.2600.0
Copyright (C) Microsoft Corp 1990-1999. All rights reserved.
Unrecognized switch -t.
I don't know how to tell it to run the 'expand' tool from cygwin instead of the microsoft tool!
any ideas?
thanks for all of your help by the way!!
Ah, it should be a path issue: the Windows directory containing its own 'expand' comes before the cygwin directory in the path. You could use the full path to your cygwin expand, or try copying the cygwin expand to expand1, for example, and use that name in the sed line above. Let me know if you need help with that.
<edit>
I checked, copying cygwin\bin\expand.exe to expand1.exe works the way I intended. You can try and verify.
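Another option, sketched here as an assumption rather than the fix used in this thread, is to put cygwin's own directories at the front of PATH at the top of the bash script. The same concern applies to sort, since Windows ships its own sort.exe as well. The variable name resolved_expand is hypothetical and is only there to show which binary gets picked up.

```shell
#!/bin/bash
# Put cygwin's tool directories ahead of whatever PATH was inherited from
# Windows, so that expand (and sort) resolve to the cygwin versions.
PATH=/usr/bin:/bin:$PATH
export PATH

# Report which expand the rest of the script will actually run.
resolved_expand=$(command -v expand)
echo "using expand at: $resolved_expand"
```

With these lines at the top of the script, the existing pipeline can stay unchanged and no renamed copy of expand.exe is needed.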
ASKER
Thanks for your help on this - I got this running through a command prompt in windows with some different code. I kept on getting a lot of 'invalid switch' when I would use bash from windows command prompt, even though it ran fine within a cygwin console.
I ended up following the suggestion here.
points on the way & thank you again!
ASKER
excellent work!
Thanks ;)