Combine text files according to the first column

zhshqzyc asked
Last Modified: 2012-05-11
Hi, I have two text files. The first text file has the format:

ID  h1 h2 h3
a   1  5  7
b   3  3  5
c   0  4  8


And the second file:

ID h4 h5 h6
a  2  4  9
b  3  6  1
b  4  1  5


Now I want to merge the two files. The output should look like this:

ID  h1  h2  h3  h4  h5  h6
a   1   5   7   2   4   9
b   3   3   5   3   6   1
b               4   1   5
c   0   4   8


The two files have the same number of rows and columns. When the first-column values match, we just need to append the row. However, sometimes something went wrong in the experiment. Thus in file 1 the first column is
a
b
c


In the second file the first column is
a
b
b


The problem was solved at the MSDN Forum. However, I am uncomfortable with the solution because it looks complicated. I want to use LINQ to do it.
I had a very similar thread at expert-exchange.com
I hope that somebody can use a similar method to finish it. Actually, I have three files to be merged.

Thanks for your help.

CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2015

Commented:
How does the previous thread not work? I'm sure I'm missing something.

Author

Commented:
Yes, your previous thread is not working. But never mind.
Top Expert 2011

Commented:
Are the files sorted by ID?

Author

Commented:
Yes. ID is a key.
Top Expert 2011

Commented:
Do duplicate IDs only occur on file #2?
Top Expert 2011

Commented:
Also, does file #2 contain any ID that is not in file #1?

Author

Commented:
I don't know. Usually duplicates occur only in file 2, but I hope there is a general method that works whether they are in file 1, file 2, or both. If that is too hard, we can simplify the question and assume duplicate IDs occur only in file 2. That is okay.
Top Expert 2011

Commented:
Please try the following code (it assumes the files are tab delimited and sorted by ID):

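using System;
using System.IO;
using System.Linq;

class Program
{
	static void Main()
	{
		var fname1 = @"C:\temp\f1.txt";
		var fname2 = @"C:\temp\f2.txt";

		// Skip blank lines so that every remaining line starts with a key.
		var lines1 = File.ReadAllLines(fname1).Where(l => l.Length > 0).ToList();
		var lines2 = File.ReadAllLines(fname2).Where(l => l.Length > 0).ToList();

		// One tab per data column (counted in the header), used to pad a missing row.
		var filler1 = new string('\t', lines1[0].Count(ch => ch == '\t'));
		var filler2 = new string('\t', lines2[0].Count(ch => ch == '\t'));

		using (var output = new StreamWriter(@"C:\temp\output.txt"))
		{
			int i1 = 0, i2 = 0;
			while (i1 < lines1.Count || i2 < lines2.Count)
			{
				// null marks a file that has no lines left.
				var line1 = (i1 < lines1.Count) ? lines1[i1] : null;
				var line2 = (i2 < lines2.Count) ? lines2[i2] : null;

				var key1 = (line1 == null) ? null : line1.Split('\t')[0];
				var key2 = (line2 == null) ? null : line2.Split('\t')[0];

				// An exhausted file compares as greater than any key. The files must
				// be sorted in the same order that CompareTo uses.
				var cmp = (key1 == null) ? 1 : (key2 == null) ? -1 : key1.CompareTo(key2);

				if (cmp < 0)
				{
					// Key only in file 1: pad the file 2 columns.
					output.WriteLine(line1 + filler2);
					i1++;
				}
				else if (cmp > 0)
				{
					// Key only in file 2: pad the file 1 columns.
					output.WriteLine(key2 + filler1 + line2.Substring(key2.Length));
					i2++;
				}
				else
				{
					// Same key in both files: append the file 2 columns.
					output.WriteLine(line1 + line2.Substring(key2.Length));
					i1++;
					i2++;
				}
			}
		}
	}
}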
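
The code above expects tab-delimited input, while the sample data in the question appears to be aligned with spaces. If the real files are space-separated, a small pre-processing step along these lines could convert each line first. This is only a sketch: `Delimiters.ToTabDelimited` is a hypothetical helper name, and it assumes that no field contains embedded whitespace and that no input row has empty fields.

```csharp
using System;

static class Delimiters
{
	// Hypothetical helper: collapses every run of spaces or tabs into one tab.
	// Assumes fields never contain whitespace and are never empty.
	public static string ToTabDelimited(string line)
	{
		var fields = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
		return string.Join("\t", fields);
	}
}
```

It could be applied while reading, for example `File.ReadAllLines(fname1).Select(l => Delimiters.ToTabDelimited(l)).ToList()`, before the merge loop runs.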


Author

Commented:
Looks good for testing two files. So if I have three files, would you please modify the code?
Thanks again.
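
For three or more files, one possible LINQ-based approach is sketched here. It is not the solution posted in this thread, only an illustration under stated assumptions: the files are tab delimited, the first line of each file is its header, and every data row has as many fields as its header. `ColumnMerger.Merge` is a hypothetical name. Each row is identified by its ID plus its occurrence number within its own file, so the second `b` in one file lines up with the second `b` in another, or with empty columns if there is none.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ColumnMerger
{
	// Hypothetical helper: merges any number of tab-delimited tables on column 1.
	// Each table is a list of lines; the first line of each table is its header.
	public static List<string> Merge(params IList<string>[] tables)
	{
		// Per table: the number of data columns and a lookup from
		// (ID, occurrence number within that table) to the remaining fields.
		var parsed = tables.Select(t => new
		{
			Width = t[0].Split('\t').Length - 1,
			Rows = t.Skip(1)
				.Where(line => line.Length > 0)
				.Select(line => line.Split('\t'))
				.GroupBy(fields => fields[0])
				.SelectMany(g => g.Select((fields, n) => new
				{
					Key = Tuple.Create(g.Key, n),
					Rest = fields.Skip(1).ToArray()
				}))
				.ToDictionary(r => r.Key, r => r.Rest)
		}).ToList();

		// Header: the ID column name from the first table, then every table's data columns.
		var header = string.Join("\t",
			new[] { tables[0][0].Split('\t')[0] }
				.Concat(tables.SelectMany(t => t[0].Split('\t').Skip(1))));

		// One output row per distinct (ID, occurrence) pair, padded with empty
		// fields for any table that lacks that pair.
		var rows = parsed
			.SelectMany(p => p.Rows.Keys)
			.Distinct()
			.OrderBy(k => k.Item1, StringComparer.Ordinal)
			.ThenBy(k => k.Item2)
			.Select(k => string.Join("\t",
				new[] { k.Item1 }.Concat(parsed.SelectMany(p =>
					p.Rows.ContainsKey(k) ? p.Rows[k] : Enumerable.Repeat("", p.Width)))));

		return new[] { header }.Concat(rows).ToList();
	}
}
```

A call such as `ColumnMerger.Merge(File.ReadAllLines(fname1), File.ReadAllLines(fname2), File.ReadAllLines(fname3))` returns the merged lines, which can then be written with `File.WriteAllLines` (`fname3` is a placeholder for the third file). Unlike the loop-based version, this sketch does not need sorted input, but it holds all files in memory and emits rows in ordinal ID order.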

Author

Commented:
Thank you very muchhhhhh!