
Combine text files according to the first column

Hi, I have two text files. The first text file has the format:

ID  h1 h2 h3
a   1  5  7
b   3  3  5
c   0  4  8

And the second file:

ID h4 h5 h6
a  2  4  9
b  3  6  1
b  4  1  5

Now I want to merge the two files. The output should look like this:

ID  h1  h2  h3  h4  h5  h6
a   1   5   7   2   4   9
b   3   3   5   3   6   1
b               4   1   5
c   0   4   8

The two files are supposed to have the same rows and columns. When the first column (ID) is the same in both files, we just need to append the row. However, sometimes something goes wrong during the experiment, so the IDs do not match. For example, in file 1 the first column is
a
b
c

In the second file the first column is
a
b
b

The problem was solved on the MSDN Forum. However, I am uncomfortable with that solution because it looks complicated. I want to use LINQ to do it.
I had a very similar thread at expert-exchange.com.
I hope that somebody can use a similar method to finish it. Actually, I have three files to merge.

Thanks for help.
Asked by: zhshqzyc

käµfm³d 👽 Commented:
How does the previous thread not work? I'm sure I'm missing something.
 
zhshqzyc (Author) Commented:
Yes, the solution in the previous thread does not work. But never mind that.
 
wdosanjos Commented:
Are the files sorted by ID?
 
zhshqzyc (Author) Commented:
Yes. ID is a key.
 
wdosanjos Commented:
Do duplicate IDs only occur on file #2?
 
wdosanjos Commented:
Also, does file #2 contain any ID that is not in file #1?
 
zhshqzyc (Author) Commented:
I don't know. Usually duplicates are only in file 2, but I would like a general method that works whether they are in file 1, file 2, or both. If that is too hard, we can simplify the question and assume duplicate IDs occur only in file 2. That is okay.
 
wdosanjos Commented:
Please try the following code (it assumes the files are tab delimited and sorted by ID):

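using System.Collections.Generic;
using System.IO;
using System.Linq;

var fname1 = @"C:\temp\f1.txt";
var fname2 = @"C:\temp\f2.txt";

var lines1 = new List<string>(File.ReadAllLines(fname1));
var lines2 = new List<string>(File.ReadAllLines(fname2));

// One tab per data column; used to pad a row that is missing from a file.
var filler1 = string.Empty.PadRight(lines1[0].Count(ch => ch == '\t'), '\t');
var filler2 = string.Empty.PadRight(lines2[0].Count(ch => ch == '\t'), '\t');

// Sentinel line for an exhausted file: its key sorts after every real key
// in an ordinal comparison.
const string endOfFile = "\uFFFF\t";

using (var output = new StreamWriter(@"C:\temp\output.txt"))
{
	for (int i1 = 0, i2 = 0; i1 < lines1.Count || i2 < lines2.Count; i1++, i2++)
	{
		var line1 = (i1 < lines1.Count) ? lines1[i1] : endOfFile;
		var line2 = (i2 < lines2.Count) ? lines2[i2] : endOfFile;
		
		var key1 = line1.Substring(0, line1.IndexOf('\t'));
		var key2 = line2.Substring(0, line2.IndexOf('\t'));
		
		// Ordinal comparison: the culture-sensitive CompareTo can rank the
		// sentinel before some keys, which would make the loop never end.
		var cmp = string.CompareOrdinal(key1, key2);
		
		if (cmp < 0)
		{
			// key1 has no match in file 2: pad file 2's columns and stay on its current line.
			line2 = key1 + filler2;
			i2--;
		}
		else if (cmp > 0)
		{
			// key2 has no match in file 1: pad file 1's columns and stay on its current line.
			line1 = key2 + filler1;
			i1--;
		}
	
		// Append file 2's columns (everything after its ID) to file 1's line.
		output.WriteLine("{0}{1}", line1, line2.Substring(line2.IndexOf('\t')));
	}
}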
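Because the loop relies on both files being sorted by ID, it may be worth checking that before merging. The following is only a rough sketch of such a check: the names `SortCheck` and `IsSortedById` are made up for illustration, and it assumes ordinal ordering, a header in the first line, and tab-delimited rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortCheck
{
    // Returns true when the data rows (everything after the header line) are in
    // non-decreasing ordinal order of their first tab-separated field.
    public static bool IsSortedById(IEnumerable<string> lines)
    {
        var keys = lines.Skip(1).Select(l => l.Split('\t')[0]).ToList();

        // Compare each key with its successor; duplicates count as sorted.
        return keys.Zip(keys.Skip(1), (a, b) => string.CompareOrdinal(a, b) <= 0)
                   .All(ok => ok);
    }

    public static void Main()
    {
        var sample = new[] { "ID\th4\th5\th6", "a\t2\t4\t9", "b\t3\t6\t1", "b\t4\t1\t5" };
        Console.WriteLine(IsSortedById(sample));
    }
}
```

If the check fails, the file would need to be sorted (header excluded) before running the merge.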

 
zhshqzyc (Author) Commented:
It looks good in my test with two files. If I have three files, could you please modify the code?
Thanks again.
 
wdosanjos Commented:
The following version works with 2 or more files:

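// Requires: using System.IO; using System.Linq;

public void Merge(string[] fnames, string outfname)
{
	Merge(fnames[0], fnames[1], outfname);
	
	for (int i = 2; i < fnames.Length; i++)
	{
		// The running result goes first so that columns stay in file order (f1, f2, f3, ...).
		// Reading and writing outfname in one call is safe because both inputs
		// are fully read before the output is opened.
		Merge(outfname, fnames[i], outfname);
	}
}

private void Merge(string fname1, string fname2, string outfname)
{
	var lines1 = File.ReadAllLines(fname1);
	var lines2 = File.ReadAllLines(fname2);
	
	// One tab per data column; used to pad a row that is missing from a file.
	var filler1 = string.Empty.PadRight(lines1[0].Count(ch => ch == '\t'), '\t');
	var filler2 = string.Empty.PadRight(lines2[0].Count(ch => ch == '\t'), '\t');
	
	// Sentinel line for an exhausted file: its key sorts after every real key
	// in an ordinal comparison.
	const string endOfFile = "\uFFFF\t";
	
	using (var output = new StreamWriter(outfname))
	{
		for (int i1 = 0, i2 = 0; i1 < lines1.Length || i2 < lines2.Length; i1++, i2++)
		{
			var line1 = (i1 < lines1.Length) ? lines1[i1] : endOfFile;
			var line2 = (i2 < lines2.Length) ? lines2[i2] : endOfFile;
			
			var key1 = line1.Substring(0, line1.IndexOf('\t'));
			var key2 = line2.Substring(0, line2.IndexOf('\t'));
			
			// Ordinal comparison: the culture-sensitive CompareTo can rank the
			// sentinel before some keys, which would make the loop never end.
			var cmp = string.CompareOrdinal(key1, key2);
			
			if (cmp < 0)
			{
				// key1 has no match in file 2: pad file 2's columns and stay on its current line.
				line2 = key1 + filler2;
				i2--;
			}
			else if (cmp > 0)
			{
				// key2 has no match in file 1: pad file 1's columns and stay on its current line.
				line1 = key2 + filler1;
				i1--;
			}
		
			// Append file 2's columns (everything after its ID) to file 1's line.
			output.WriteLine("{0}{1}", line1, line2.Substring(line2.IndexOf('\t')));
		}
	}
}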

Here is a sample way of calling it:
var fnames = new string[] {
	@"C:\temp\f1.txt", 
	@"C:\temp\f2.txt", 
	@"C:\temp\f3.txt"
};

Merge(fnames,  @"C:\temp\output.txt");
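Since the question asked for LINQ, here is a rough sketch of one possible LINQ-based variant. It is not the accepted solution, only an illustration: the names `LinqMerge`, `Table`, `Parse` and `CellsOrBlanks` are made up, and it assumes tab-delimited input, a header in the first line of every file, and data rows that have as many cells as their header. Rows are paired by ID and by their position among rows with the same ID (first "b" with first "b", second "b" with second "b"), so the input does not need to be sorted and duplicates may appear in any file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqMerge
{
    // One parsed file: header cells (without the ID column) and data rows
    // keyed by (ID, occurrence index of that ID within the file).
    private sealed class Table
    {
        public string IdName;
        public string[] Header;
        public Dictionary<(string Id, int Occ), string[]> Rows;
    }

    private static Table Parse(IEnumerable<string> lines)
    {
        var rows = lines.Where(l => l.Length > 0).Select(l => l.Split('\t')).ToList();
        return new Table
        {
            IdName = rows[0][0],
            Header = rows[0].Skip(1).ToArray(),
            Rows = rows.Skip(1)
                .GroupBy(r => r[0])
                .SelectMany(g => g.Select((r, occ) =>
                    new { Key = (Id: g.Key, Occ: occ), Cells = r.Skip(1).ToArray() }))
                .ToDictionary(x => x.Key, x => x.Cells)
        };
    }

    // The row's cells if the file has that key, otherwise one empty cell per column.
    private static IEnumerable<string> CellsOrBlanks(Table t, (string Id, int Occ) key)
    {
        string[] cells;
        return t.Rows.TryGetValue(key, out cells)
            ? cells
            : Enumerable.Repeat("", t.Header.Length);
    }

    // Merges any number of files, each given as its sequence of lines.
    public static List<string> Merge(params IEnumerable<string>[] files)
    {
        var tables = files.Select(f => Parse(f)).ToList();

        // Full outer join: every (ID, occurrence) key seen in any file, in ordinal ID order.
        var keys = tables.SelectMany(t => t.Rows.Keys)
            .Distinct()
            .OrderBy(k => k.Id, StringComparer.Ordinal)
            .ThenBy(k => k.Occ);

        var header = string.Join("\t",
            new[] { tables[0].IdName }.Concat(tables.SelectMany(t => t.Header)));
        var body = keys.Select(k => string.Join("\t",
            new[] { k.Id }.Concat(tables.SelectMany(t => CellsOrBlanks(t, k)))));

        return new[] { header }.Concat(body).ToList();
    }

    public static void Main()
    {
        var f1 = new[] { "ID\th1\th2\th3", "a\t1\t5\t7", "b\t3\t3\t5", "c\t0\t4\t8" };
        var f2 = new[] { "ID\th4\th5\th6", "a\t2\t4\t9", "b\t3\t6\t1", "b\t4\t1\t5" };

        foreach (var line in Merge(f1, f2))
            Console.WriteLine(line);
    }
}
```

To use it with files on disk, each file's lines could be read with File.ReadAllLines and the result written with File.WriteAllLines. Unlike the loop-based version, this keeps all rows in memory, which should be fine for small files but is a trade-off for large ones.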

 
zhshqzyc (Author) Commented:
Thank you very much!
Question has a verified solution.
