Member_2_5990268
asked on
Cygwin / Bash - Join command not working as expected
I have two CSV files I need to combine.
Both of them share a key column (ID), but each file has unique columns otherwise.
Not all of the IDs in the first file have matches in the second file.
I would like the output to include ALL rows from the first file whether they match or not. The rows that did have a match need the additional columns appended.
This seems to be a fairly common question floating around on the internet. The most common solution I found was to use 'join'. However, it isn't behaving as I was expecting.
$ ls
test1.csv test2.csv
$ cat test1.csv
22,Charles,Stross,388
33,William,Shakespeare,389
44,Michael,Jackson,385
55,Tom,Green,103
58,Edgar,Poe,22
66,John,Doe,938
99,Jane,Doe,2388
$ cat test2.csv
22,Stories,10/27/1983
33,Plays,12/31/1980
44,Music,7/23/1937
$ join -a 1 -t , test1.csv test2.csv
,Stories,10/27/198388
,Plays,12/31/1980peare,389
,Music,7/23/1937on,385
55,Tom,Green,103
58,Edgar,Poe,22
66,John,Doe,938
99,Jane,Doe,2388
The output seems to be dropping the "key" value from any rows that match, and I have no idea why the text is overlapping (for example, the Date on output Line 3 is overwriting the name).
Any suggestions?
Both of them share a key column (ID), but each file has unique columns otherwise.
Not all of the IDs in the first file have matches in the second file.
I would like the output to include ALL rows from the first file whether they match or not. The rows that did have a match need the additional columns appended.
This seems to be a fairly common question floating around on the internet. The most common solution I found was to use 'join'. However, it isn't behaving as I was expecting.
$ ls
test1.csv test2.csv
$ cat test1.csv
22,Charles,Stross,388
33,William,Shakespeare,389
44,Michael,Jackson,385
55,Tom,Green,103
58,Edgar,Poe,22
66,John,Doe,938
99,Jane,Doe,2388
$ cat test2.csv
22,Stories,10/27/1983
33,Plays,12/31/1980
44,Music,7/23/1937
$ join -a 1 -t , test1.csv test2.csv
,Stories,10/27/198388
,Plays,12/31/1980peare,389
,Music,7/23/1937on,385
55,Tom,Green,103
58,Edgar,Poe,22
66,John,Doe,938
99,Jane,Doe,2388
The output seems to be dropping the "key" value from any rows that match, and I have no idea why the text is overlapping (for example, the Date on output Line 3 is overwriting the name).
Any suggestions?
ASKER
$ join -a 1 -t , test1.csv test2.csv
,Stories,10/27/198388
,Plays,12/31/1980peare,389
,Music,7/23/1937on,385
55,Tom,Green,103
58,Edgar,Poe,22
66,John,Doe,938
99,Jane,Doe,2388
$ join -a 1 -t , test1.csv test2.csv | cat -v
22,Charles,Stross,388^M,St ories,10/2 7/1983^M
33,William,Shakespeare,389 ^M,Plays,1 2/31/1980^ M
44,Michael,Jackson,385^M,M usic,7/23/ 1937
55,Tom,Green,103^M
58,Edgar,Poe,22^M
66,John,Doe,938^M
99,Jane,Doe,2388
,Stories,10/27/198388
,Plays,12/31/1980peare,389
,Music,7/23/1937on,385
55,Tom,Green,103
58,Edgar,Poe,22
66,John,Doe,938
99,Jane,Doe,2388
$ join -a 1 -t , test1.csv test2.csv | cat -v
22,Charles,Stross,388^M,St
33,William,Shakespeare,389
44,Michael,Jackson,385^M,M
55,Tom,Green,103^M
58,Edgar,Poe,22^M
66,John,Doe,938^M
99,Jane,Doe,2388
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Ozo, you are a saint!
It works now; here is the entirety of what I had to do:
cat test1.csv | tr -d '\r' > testa.csv
cat test2.csv | tr -d '\r' > testb.csv
join -a 1 -t , testa.csv testb.csv
It works now; here is the entirety of what I had to do:
cat test1.csv | tr -d '\r' > testa.csv
cat test2.csv | tr -d '\r' > testb.csv
join -a 1 -t , testa.csv testb.csv
22,Charles,Stross,388,Stor
33,William,Shakespeare,389
44,Michael,Jackson,385,Mus
55,Tom,Green,103
58,Edgar,Poe,22
66,John,Doe,938
99,Jane,Doe,2388
is it possible you have an extra \r character at the end of the lines in test1.csv?
what do you see with
join -a 1 -t , test1.csv test2.csv | cat -v