maurice cristen

Windows 7: gawk code to remove duplicates?

I need gawk code to remove duplicates of this type:

john:john
mary:1234
sdfds:sadfsf
john:john
mary:1234

Dan Craciun

Are they each on a different line, like in the sample you provided?

Because if so, you don't need awk; sort plus uniq will do it (uniq on its own only removes adjacent duplicate lines).

HTH,
Dan
maurice cristen

ASKER

I have a huge txt file (1.8 GB). I need to remove the duplicates and keep the original order. Would uniq do that? Can you guide me, please?
The lines look like the sample I provided.
Keeping the order is trickier.

Normally you would sort the file, then use uniq to remove duplicates. But the resulting file will not have the original order.
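To make the ordering point concrete, here is a small sketch. The input lines and the file names (`order_demo.txt`, `sorted_result.txt`, `ordered_result.txt`) are made up for illustration and are not from the thread. `LC_ALL=C` is set only so the sort order is predictable, and `awk` can be replaced by `gawk`.

```shell
# Illustrative input (not from the thread): first occurrences are not in alphabetical order.
printf '%s\n' 'mary:1234' 'john:john' 'mary:1234' 'anna:abcd' > order_demo.txt

# sort -u removes the duplicate but returns the lines in sorted order.
LC_ALL=C sort -u order_demo.txt > sorted_result.txt

# The awk filter removes the duplicate and keeps first-occurrence order.
awk '!seen[$0]++' order_demo.txt > ordered_result.txt

echo "sort -u:"; cat sorted_result.txt
echo "awk:";     cat ordered_result.txt
```

With this input, the sorted variant should start with anna:abcd, while the awk variant should start with mary:1234, as in the input.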
App.Merge from the hashkiller forum can do that with good results, but it does not keep the order; it sorts alphabetically. Maybe I will try that if you say that doing it without sorting is tricky.
gawk '!seen[$0]++' file.txt > results.txt

It should work, but I don't know how it behaves on 1 GB+ files. Try it.
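For readers wondering what the one-liner does, the following is an expanded sketch of the same logic. The file names `sample.txt` and `sample_results.txt` are placeholders chosen for this example, and the sample lines are the ones from the question.

```shell
# Sample lines from the question, written to a scratch file (name is a placeholder).
printf '%s\n' 'john:john' 'mary:1234' 'sdfds:sadfsf' 'john:john' 'mary:1234' > sample.txt

awk '
{
    # seen[] maps each full line ($0) to the number of times it has appeared so far.
    if (seen[$0] == 0) {
        print $0      # first occurrence: keep it
    }
    seen[$0]++        # count this occurrence
}' sample.txt > sample_results.txt

cat sample_results.txt
```

Because the array keeps one entry per distinct line, memory use grows with the number of unique lines, which is the likely limiting factor on a file of this size.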
C:\Program Files (x86)\GnuWin32\bin>awk '!seen[$0]++' file.txt > results.txt
awk: '!seen[$0]++'
awk: ^ invalid char ''' in expression
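The error suggests that cmd.exe passed the single quotes through to awk as part of the program text, since cmd.exe does not treat single quotes as quoting characters. One way to avoid command-line quoting altogether is to put the program in a file and run it with `-f`. The sketch below assumes a program file named `dedup.awk` and scratch file names of my own choosing; it is not necessarily the accepted solution, which is not visible here. On cmd.exe the usual alternative is double quotes, as in `awk "!seen[$0]++" file.txt > results.txt`, though that form is untested here.

```shell
# dedup.awk contains only the awk program, so the command line needs no quoting.
# On Windows the same one-line file can be created in any text editor.
printf '%s\n' '!seen[$0]++' > dedup.awk

# Scratch input with the sample lines from the question (file name is a placeholder).
printf '%s\n' 'john:john' 'mary:1234' 'sdfds:sadfsf' 'john:john' 'mary:1234' > quoting_demo.txt

# The same command form works in cmd.exe: awk -f dedup.awk file.txt > results.txt
awk -f dedup.awk quoting_demo.txt > quoting_demo_results.txt

cat quoting_demo_results.txt
```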
ASKER CERTIFIED SOLUTION
Dan Craciun
Maybe portable GNU utilities would work. Can you send me your folder, please, via sendspace.com or something similar? I will delete the link afterwards.
If it doesn't work anyway, I will keep your code and try it on my other laptop, then close this thread successfully and give you the points!
1)
The comment from "Dan Craciun" works fine using Cygwin:
D:\cygwin\home\murugesandinesh> D:\cygwin\bin\gawk.exe '!seen[$0]++' file.txt
john:john
mary:1234
sdfds:sadfsf

D:\cygwin\home\murugesandinesh>

2)
Cygwin commands inside C:\Windows\System32\cmd.exe:
D:\cygwin\home\murugesandinesh> D:\cygwin\bin\sort.exe file.txt | D:\cygwin\bin\uniq.exe
john:john
mary:1234
sdfds:sadfsf

D:\cygwin\home\murugesandinesh>

3)
Cygwin commands inside C:\Windows\System32\cmd.exe, without using D:\cygwin\bin\uniq.exe:
D:\cygwin\home\murugesandinesh> D:\cygwin\bin\sort.exe -u file.txt
john:john
mary:1234
sdfds:sadfsf

D:\cygwin\home\murugesandinesh>

4)
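C:\Windows\System32\cmd.exe provides its own help on SORT, because Windows ships its own sort command, which is a different program from the Cygwin sort. Hence running "sort -u file.txt" without a path inside C:\Windows\System32\cmd.exe produces the following error:
Input file specified two times.
So use the full path when executing commands, on any operating system.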

5)
Use the redirection that fits the requirement.
Output to a file:
a. /full/path/command > outputfile.txt
Output and errors to separate files:
b. /full/path/command > outputfile.txt 2>error.txt
Output and errors to the same file:
c. /full/path/command > outputfile.txt 2>&1
Run the command in the background (in a Unix-style shell such as the Cygwin shell):
d. /full/path/command > outputfile.txt 2>&1 &
Display errors and output in the terminal and also append them to the output file:
e. /full/path/command 2>&1 | /usr/bin/tee -a outputfile.txt
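The redirections in item 5 can be tried out with a small sketch. `demo_command` is a hypothetical stand-in for /full/path/command, and all file names are placeholders chosen for this example. It assumes a POSIX-style shell such as the one Cygwin provides; the background `&` and `tee` forms are not cmd.exe syntax.

```shell
# Hypothetical stand-in for /full/path/command: one line to stdout, one to stderr.
demo_command() {
    echo "normal output"
    echo "error output" >&2
}

# b. stdout and stderr go to separate files.
demo_command > only_stdout.txt 2> only_stderr.txt

# c. stdout and stderr go to the same file.
demo_command > combined.txt 2>&1

# d. same as c, but run in the background; wait blocks until it has finished.
demo_command > background.txt 2>&1 &
wait

# e. show both streams in the terminal and append them to a file.
demo_command 2>&1 | tee -a tee_output.txt
```

After running it, only_stdout.txt should hold the stdout line, only_stderr.txt the stderr line, and the other files both lines.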
thank you