zhshqzyc

Combine text files according to the first column

Hi, I have two text files. The first text file has the format:

ID  h1 h2 h3
a   1  5  7
b   3  3  5
c   0  4  8

And the second file:

ID h4 h5 h6
a  2  4  9
b  3  6  1
b  4  1  5

Now I want to merge the two files. The output should look like this:

ID  h1  h2  h3  h4  h5  h6
a   1   5   7   2   4   9
b   3   3   5   3   6   1
b               4   1   5
c   0   4   8

The two files should have the same number of rows and columns. If the values in the first column match, we just need to append the row. However, sometimes something went wrong during the experiment, so the first columns differ. For example, in file 1 the first column is
a
b
c

In the second file the first column is
a
b
b

The problem was solved at the MSDN Forum. However, I am uncomfortable with the solution because it looks complicated. I want to use LINQ to do it.
I had a very similar thread at expert-exchange.com.
I hope that somebody can use a similar method to finish it. Actually, I have three files to be merged.

Thanks for help.
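
A minimal LINQ sketch of the merge described above, offered as one possible approach rather than the MSDN or accepted solution. It assumes whitespace-separated columns with no empty cells in the input, and it pairs duplicate IDs by occurrence (the first "b" with the first "b", the second "b" with a blank). The helper names `Parse`, `IndexRows`, and `MergeByLinq` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Splits a line on spaces/tabs: the first token is the ID, the rest are the values.
(string Key, string[] Values) Parse(string line)
{
    var parts = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
    return (parts[0], parts.Skip(1).ToArray());
}

// Maps (ID, occurrence number) to the row's values; a second "b" gets occurrence 1.
Dictionary<(string Key, int Occurrence), string[]> IndexRows(IEnumerable<string> dataLines) =>
    dataLines.Where(l => !string.IsNullOrWhiteSpace(l))
             .Select(Parse)
             .GroupBy(r => r.Key)
             .SelectMany(g => g.Select((r, n) => (Id: (Key: g.Key, Occurrence: n), Values: r.Values)))
             .ToDictionary(x => x.Id, x => x.Values);

// Hypothetical helper: full outer join of two tables on (ID, occurrence).
List<string> MergeByLinq(IList<string> file1, IList<string> file2)
{
    var head1 = Parse(file1[0]);
    var head2 = Parse(file2[0]);
    var left = IndexRows(file1.Skip(1));
    var right = IndexRows(file2.Skip(1));

    // Empty cells used when one file has no row for an (ID, occurrence) pair.
    var blank1 = Enumerable.Repeat("", head1.Values.Length).ToArray();
    var blank2 = Enumerable.Repeat("", head2.Values.Length).ToArray();

    var result = new List<string>
    {
        string.Join("\t", new[] { head1.Key }.Concat(head1.Values).Concat(head2.Values))
    };

    // Every (ID, occurrence) pair seen in either file, ordered by ID, then by occurrence.
    var ids = left.Keys.Union(right.Keys)
                  .OrderBy(id => id.Key, StringComparer.Ordinal)
                  .ThenBy(id => id.Occurrence);

    foreach (var id in ids)
    {
        var a = left.TryGetValue(id, out var v1) ? v1 : blank1;
        var b = right.TryGetValue(id, out var v2) ? v2 : blank2;
        result.Add(string.Join("\t", new[] { id.Key }.Concat(a).Concat(b)));
    }
    return result;
}

// Usage with the sample data from the question.
var f1 = new[] { "ID  h1 h2 h3", "a   1  5  7", "b   3  3  5", "c   0  4  8" };
var f2 = new[] { "ID h4 h5 h6", "a  2  4  9", "b  3  6  1", "b  4  1  5" };
foreach (var line in MergeByLinq(f1, f2)) Console.WriteLine(line);
```

Because rows are looked up by key, this version does not need the files to be sorted, at the cost of holding both files in memory. The output is tab-delimited, so a missing row shows up as consecutive tabs.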
kaufmed
How does the previous thread not work? I'm sure I'm missing something.
zhshqzyc

ASKER

Yes, your previous thread is not working. But never mind that.
Are the files sorted by ID?
Yes. ID is a key.
Do duplicate IDs only occur in file #2?
Also, does file #2 contain any ID that is not in file #1?
I don't know. Usually duplicates are only in file 2, but I hope there is a general method that handles them whether they are in file 1, file 2, or both. If that is too hard, we can simplify the question and assume duplicate IDs occur only in file 2. That is okay.
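
The two questions above (sorted by ID? duplicate IDs?) can also be checked on the data itself rather than assumed. A small sketch, assuming whitespace-separated columns and ordinal string comparison; `IdOf`, `IsSortedById`, and `DuplicateIds` are hypothetical helper names, and the header line is expected to be skipped before calling them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// First whitespace-separated token of a line, i.e. the ID column.
string IdOf(string line) =>
    line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)[0];

// True when no ID is smaller than the one before it (ordinal comparison).
bool IsSortedById(IEnumerable<string> dataLines)
{
    var ids = dataLines.Select(IdOf).ToList();
    return ids.Zip(ids.Skip(1), (prev, next) => string.CompareOrdinal(prev, next) <= 0)
              .All(inOrder => inOrder);
}

// IDs that appear on more than one row.
List<string> DuplicateIds(IEnumerable<string> dataLines) =>
    dataLines.Select(IdOf)
             .GroupBy(id => id)
             .Where(g => g.Count() > 1)
             .Select(g => g.Key)
             .ToList();

// Usage with the data rows of the second sample file (header already removed).
var file2Rows = new[] { "a  2  4  9", "b  3  6  1", "b  4  1  5" };
Console.WriteLine(IsSortedById(file2Rows));
Console.WriteLine(string.Join(",", DuplicateIds(file2Rows)));
```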
Please try the following code (it assumes the files are tab delimited and sorted by ID):

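using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

var fname1 = @"C:\temp\f1.txt";
var fname2 = @"C:\temp\f2.txt";

var lines1 = new List<string>(File.ReadAllLines(fname1));
var lines2 = new List<string>(File.ReadAllLines(fname2));

// One tab per data column, used to pad a row that is missing from one file.
var filler1 = new string('\t', lines1[0].Count(ch => ch == '\t'));
var filler2 = new string('\t', lines2[0].Count(ch => ch == '\t'));

// Sentinel line for an exhausted file. U+FFFF sorts after any real ID in an
// ordinal comparison; "\xFF" with the culture-sensitive CompareTo does not.
const string EndOfFile = "\uFFFF\t";

using (var output = new StreamWriter(@"C:\temp\output.txt"))
{
	for (int i1 = 0, i2 = 0; i1 < lines1.Count || i2 < lines2.Count; i1++, i2++)
	{
		var line1 = (i1 < lines1.Count) ? lines1[i1] : EndOfFile;
		var line2 = (i2 < lines2.Count) ? lines2[i2] : EndOfFile;
		
		// The ID is everything before the first tab.
		var key1 = line1.Substring(0, line1.IndexOf('\t'));
		var key2 = line2.Substring(0, line2.IndexOf('\t'));
		
		// Ordinal comparison, so the files must be sorted in ordinal order as well.
		var cmp = string.CompareOrdinal(key1, key2);
		
		if (cmp < 0)
		{
			// ID only in file 1: pad file 2's columns and revisit its current line next time.
			line2 = key1 + filler2;
			i2--;
		}
		else if (cmp > 0)
		{
			// ID only in file 2 (or a repeated ID): pad file 1's columns and revisit its current line.
			line1 = key2 + filler1;
			i1--;
		}
	
		// Append file 2's columns (without its ID) to file 1's line.
		output.WriteLine("{0}{1}", line1, line2.Substring(line2.IndexOf('\t')));
	}
}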

It looks good when testing with two files. If I have three files, would you please modify the code?
Thanks again.
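
One way the two-file loop above might be extended to three or more files is to keep one cursor per file and, on each pass, emit the smallest ID that any cursor currently points at. The sketch below is an illustration under the same assumptions as the two-file code (tab-delimited, header on the first line, rows sorted by ID in ordinal order); it is not the certified solution, and `KeyOf` and `MergeSorted` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The ID is everything before the first tab.
string KeyOf(string line) => line.Substring(0, line.IndexOf('\t'));

// Hypothetical helper: merges any number of sorted, tab-delimited tables by ID.
List<string> MergeSorted(string[][] files)
{
    // One run of tabs per file, used when that file has no row for the current ID.
    var fillers = files.Select(f => new string('\t', f[0].Count(ch => ch == '\t'))).ToArray();

    // Header: the first file's ID column name followed by every file's column names.
    var output = new List<string>
    {
        KeyOf(files[0][0]) + string.Concat(files.Select(f => f[0].Substring(f[0].IndexOf('\t'))))
    };

    // Cursor per file, starting after the header line.
    var pos = files.Select(_ => 1).ToArray();

    while (true)
    {
        // Files that still have unread rows.
        var active = Enumerable.Range(0, files.Length).Where(i => pos[i] < files[i].Length).ToList();
        if (active.Count == 0) break;

        // Smallest ID among the current rows.
        var min = active.Select(i => KeyOf(files[i][pos[i]]))
                        .OrderBy(k => k, StringComparer.Ordinal)
                        .First();

        var row = min;
        for (int i = 0; i < files.Length; i++)
        {
            if (pos[i] < files[i].Length && KeyOf(files[i][pos[i]]) == min)
            {
                // This file has the ID: append its columns and advance its cursor.
                var line = files[i][pos[i]++];
                row += line.Substring(line.IndexOf('\t'));
            }
            else
            {
                // This file lacks the ID (or has fewer repeats of it): pad with tabs.
                row += fillers[i];
            }
        }
        output.Add(row);
    }
    return output;
}

// Usage: the two sample files plus a made-up third file with one data column.
var m1 = new[] { "ID\th1\th2\th3", "a\t1\t5\t7", "b\t3\t3\t5", "c\t0\t4\t8" };
var m2 = new[] { "ID\th4\th5\th6", "a\t2\t4\t9", "b\t3\t6\t1", "b\t4\t1\t5" };
var m3 = new[] { "ID\th7", "a\t9", "c\t8" };
foreach (var line in MergeSorted(new[] { m1, m2, m3 })) Console.WriteLine(line);
```

Handling the header before the loop means it never takes part in the ID comparison, so it cannot be misplaced by an ID that happens to sort before "ID".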
ASKER CERTIFIED SOLUTION
wdosanjos
Thank you very muchhhhhh!