Solved

Cygwin / Bash - Join command not working as expected

Posted on 2010-09-10
4
890 Views
Last Modified: 2013-11-15
I have two CSV files I need to combine.
Both of them share a key column (ID), but each file has unique columns otherwise.
Not all of the IDs in the first file have matches in the second file.  

I would like the output to include ALL rows from the first file whether they match or not.  The rows that did have a match need the additional columns appended.

This seems to be a fairly common question floating around on the internet. The most common solution I found was to use 'join'.  However, it isn't behaving as I was expecting.



$ ls
test1.csv  test2.csv

$ cat test1.csv
22,Charles,Stross,388
33,William,Shakespeare,389
44,Michael,Jackson,385
55,Tom,Green,103
58,Edgar,Poe,22
66,John,Doe,938
99,Jane,Doe,2388

$ cat test2.csv
22,Stories,10/27/1983
33,Plays,12/31/1980
44,Music,7/23/1937

$ join -a 1 -t , test1.csv test2.csv
,Stories,10/27/198388
,Plays,12/31/1980peare,389
,Music,7/23/1937on,385
55,Tom,Green,103
58,Edgar,Poe,22
66,John,Doe,938
99,Jane,Doe,2388


The output seems to be dropping the "key" value from any rows that match, and I have no idea why the text is overlapping (for example, the Date on output Line 3 is overwriting the name).

Any suggestions?  
0
Comment
Question by:Proots
  • 2
  • 2
4 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 33651437
join -a 1 -t , test1.csv test2.csv
22,Charles,Stross,388,Stories,10/27/1983
33,William,Shakespeare,389,Plays,12/31/1980
44,Michael,Jackson,385,Music,7/23/1937
55,Tom,Green,103
58,Edgar,Poe,22
66,John,Doe,938
99,Jane,Doe,2388

is it possible you have an extra \r character at the end of the lines in test1.csv?
what do you see with
join -a 1 -t , test1.csv test2.csv | cat -v
0
 
LVL 2

Author Comment

by:Proots
ID: 33652073
$ join -a 1 -t , test1.csv test2.csv
,Stories,10/27/198388
,Plays,12/31/1980peare,389
,Music,7/23/1937on,385
55,Tom,Green,103
58,Edgar,Poe,22
66,John,Doe,938
99,Jane,Doe,2388


$ join -a 1 -t , test1.csv test2.csv | cat -v
22,Charles,Stross,388^M,Stories,10/27/1983^M
33,William,Shakespeare,389^M,Plays,12/31/1980^M
44,Michael,Jackson,385^M,Music,7/23/1937
55,Tom,Green,103^M
58,Edgar,Poe,22^M
66,John,Doe,938^M
99,Jane,Doe,2388
0
 
LVL 84

Accepted Solution

by:
ozo earned 50 total points
ID: 33652098
So that's why the text is overlapping.
you may be able to remove it with
tr -d '\r'
0
 
LVL 2

Author Closing Comment

by:Proots
ID: 33653046
Ozo, you are a saint!
It works now; here is the entirety of what I had to do:

cat test1.csv | tr -d '\r' > testa.csv
cat test2.csv | tr -d '\r' > testb.csv
join -a 1 -t , testa.csv testb.csv
0

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Storage devices are generally used to save the data or sometime transfer the data from one computer system to another system. However, sometimes user accidentally erased their important data from the Storage devices. Users have to know how data reco…
In this article, you will read about the trends across the human resources departments for the upcoming year. Some of them include improving employee experience, adopting new technologies, using HR software to its full extent, and integrating artifi…
The viewer will learn how to successfully download and install the SARDU utility on Windows 8, without downloading adware.
The viewer will learn how to successfully download and install the SARDU utility on Windows 7, without downloading adware.

830 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question