Solved

Combine text files according to the first column

Posted on 2011-04-20
Medium Priority
216 Views
Last Modified: 2012-05-11
Hi, I have two text files. The first text file has the format:

ID  h1 h2 h3
a   1  5  7
b   3  3  5
c   0  4  8


And the second file:

ID h4 h5 h6
a  2  4  9
b  3  6  1
b  4  1  5


Now I want to merge the two files. The output should look like this:

ID  h1  h2  h3  h4  h5  h6
a   1   5   7   2   4   9
b   3   3   5   3   6   1
b               4   1   5
c   0   4   8


The two files normally have the same rows and columns. When the IDs in the first column match, we just need to append the row. However, sometimes something went wrong during the experiment, so the IDs no longer line up. For example, in file 1 the first column is
a
b
c


In the second file, the first column is
a
b
b


The problem was solved on the MSDN Forum. However, I am uncomfortable with that solution because it looks complicated. I want to use LINQ to do it.
I had a very similar thread at expert-exchange.com.
I hope that somebody can use a similar method to finish it. Actually, I have three files to be merged.

Thanks for help.
Question by:zhshqzyc
11 Comments
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 35433118
How does the previous thread not work? I'm sure I'm missing something.
 

Author Comment

by:zhshqzyc
ID: 35433242
Yes, the solution in your previous thread does not work here. But never mind.
 
LVL 23

Expert Comment

by:wdosanjos
ID: 35435229
Are the files sorted by ID?

Author Comment

by:zhshqzyc
ID: 35435291
Yes. ID is a key.
 
LVL 23

Expert Comment

by:wdosanjos
ID: 35435419
Do duplicate IDs only occur on file #2?
 
LVL 23

Expert Comment

by:wdosanjos
ID: 35435480
Also, does file #2 contain any ID that is not in file #1?
 

Author Comment

by:zhshqzyc
ID: 35435496
I don't know. Usually the duplicates are only in file 2, but I hope there is a general method that works whether they occur in file 1, file 2, or both. If that is too hard, we can simplify the question and assume duplicate IDs occur only in file 2. That is okay.
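Since the asker wanted LINQ, a general rule for duplicates in either file, and more than two files, here is a minimal sketch under stated assumptions (it is not the MSDN solution or code from this thread). It works on in-memory tab-delimited tables whose first line is a header and whose ID is in column 0, pairs repeated IDs by occurrence (the first `b` in one file with the first `b` in another, the second with the second), and leaves empty cells where a file has no matching row. Sorted input is not required. `LinqTableMerge` and `MergeTables` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not from the thread): merges tab-delimited tables on the ID in column 0.
// Assumptions: each table's first line is a header; repeated IDs are paired by occurrence
// (1st "b" with 1st "b", 2nd with 2nd); a missing partner becomes empty cells.
public static class LinqTableMerge
{
    public static List<string> MergeTables(IList<string[]> tables)
    {
        var parsed = tables.Select(lines =>
        {
            string[] header = lines[0].Split('\t');
            // Key each data row by (ID, occurrence number of that ID within this table).
            var rows = lines.Skip(1)
                .Where(line => line.Length > 0)
                .Select(line => line.Split('\t'))
                .GroupBy(cells => cells[0])
                .SelectMany(g => g.Select((cells, n) =>
                    (Key: (Id: g.Key, N: n), Cells: cells.Skip(1).ToArray())))
                .ToList();
            return new
            {
                Header = header,
                Width = header.Length - 1,              // data columns, excluding the ID
                Keys = rows.Select(r => r.Key).ToList(),
                Rows = rows.ToDictionary(r => r.Key, r => r.Cells)
            };
        }).ToList();

        // Every (ID, occurrence) seen in any table; IDs keep first-appearance order and
        // all occurrences of one ID come out next to each other.
        var keys = parsed.SelectMany(p => p.Keys)
            .Distinct()
            .GroupBy(k => k.Id)
            .SelectMany(g => g.OrderBy(k => k.N));

        var output = new List<string>
        {
            string.Join("\t", parsed[0].Header.Take(1)
                .Concat(parsed.SelectMany(p => p.Header.Skip(1))))
        };
        foreach (var key in keys)
        {
            // Each table contributes its cells for this key, or Width empty cells if it has none.
            var cells = parsed.SelectMany(p => p.Rows.TryGetValue(key, out var found)
                ? found
                : Enumerable.Repeat(string.Empty, p.Width));
            output.Add(key.Id + "\t" + string.Join("\t", cells));
        }
        return output;
    }
}
```

For files, pass `File.ReadAllLines(path)` for each input and write the result with `File.WriteAllLines`. Everything is held in memory, which keeps the code short but suits files of moderate size.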
 
LVL 23

Expert Comment

by:wdosanjos
ID: 35437248
Please try the following code (it assumes the files are tab-delimited and sorted by ID):

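using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

var fname1 = @"C:\temp\f1.txt";
var fname2 = @"C:\temp\f2.txt";

var lines1 = new List<string>(File.ReadAllLines(fname1));
var lines2 = new List<string>(File.ReadAllLines(fname2));

// One tab per data column (counted on the header row), used to pad a missing row with empty cells.
var filler1 = string.Empty.PadRight(lines1[0].Count(ch => ch == '\t'), '\t');
var filler2 = string.Empty.PadRight(lines2[0].Count(ch => ch == '\t'), '\t');

using (var output = new StreamWriter(@"C:\temp\output.txt"))
{
	for (int i1 = 0, i2 = 0; i1 < lines1.Count || i2 < lines2.Count; i1++, i2++)
	{
		// Past the end of a file, use the sentinel key "\xFF", which sorts after any ASCII ID.
		var line1 = (i1 < lines1.Count) ? lines1[i1] : "\xFF\t";
		var line2 = (i2 < lines2.Count) ? lines2[i2] : "\xFF\t";
		
		var key1 = line1.Substring(0, line1.IndexOf('\t'));
		var key2 = line2.Substring(0, line2.IndexOf('\t'));
		
		// Ordinal comparison keeps the sentinel last (culture-aware CompareTo does not guarantee
		// this); the files must be sorted in the same ordinal order.
		var cmp = string.CompareOrdinal(key1, key2);
		
		if (cmp < 0)
		{
			// ID only in file 1: pad the file-2 side and revisit the same file-2 line next time.
			line2 = key1 + filler2;
			i2--;
		}
		else if (cmp > 0)
		{
			// ID only in file 2 (e.g. a duplicate): pad the file-1 side and hold file 1 back.
			line1 = key2 + filler1;
			i1--;
		}
	
		// Append the file-2 line without its ID column.
		output.WriteLine("{0}{1}", line1, line2.Substring(line2.IndexOf('\t')));
	}
}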

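For comparison, here is the same sort-merge idea written as a pure function over in-memory lines; it is a sketch, not a drop-in replacement. Like the code above, it assumes tab-delimited input with a header row and data sorted by ID, and additionally that the sort order is ordinal (the order `string.CompareOrdinal` uses). It pairs the header rows explicitly and advances the indices directly instead of decrementing them, so no sentinel key is needed. `SortedMerge` and `MergeSorted` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: sorted merge-join of two tab-delimited tables held in memory.
// Assumes line 0 is a header in both inputs and the data rows are sorted by ID in
// ordinal (char-code) order, the order string.CompareOrdinal compares in.
public static class SortedMerge
{
    public static List<string> MergeSorted(string[] lines1, string[] lines2)
    {
        // One tab per data column of each input, appended to stand in for a missing row.
        string pad1 = new string('\t', lines1[0].Count(ch => ch == '\t'));
        string pad2 = new string('\t', lines2[0].Count(ch => ch == '\t'));

        // Headers are paired directly; data rows start at index 1.
        var output = new List<string> { lines1[0] + Tail(lines2[0]) };
        int i1 = 1, i2 = 1;
        while (i1 < lines1.Length || i2 < lines2.Length)
        {
            // An exhausted input compares as "greater", so the other input drains.
            int cmp = i1 >= lines1.Length ? 1
                    : i2 >= lines2.Length ? -1
                    : string.CompareOrdinal(Key(lines1[i1]), Key(lines2[i2]));

            if (cmp == 0)
            {
                output.Add(lines1[i1++] + Tail(lines2[i2++]));   // same ID: join the rows
            }
            else if (cmp < 0)
            {
                output.Add(lines1[i1++] + pad2);                 // ID only in input 1
            }
            else
            {
                string row = lines2[i2++];                       // ID only in input 2
                output.Add(Key(row) + pad1 + Tail(row));
            }
        }
        return output;
    }

    // Text before the first tab (the whole line if there is no tab).
    static string Key(string line)
    {
        int tab = line.IndexOf('\t');
        return tab < 0 ? line : line.Substring(0, tab);
    }

    // Text from the first tab onward (empty if there is no tab): the row without its ID.
    static string Tail(string line)
    {
        int tab = line.IndexOf('\t');
        return tab < 0 ? string.Empty : line.Substring(tab);
    }
}
```

`File.WriteAllLines(outPath, SortedMerge.MergeSorted(File.ReadAllLines(path1), File.ReadAllLines(path2)))` would reproduce the file-based version. Duplicate IDs are paired in the order they appear, as in the loop above.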

 

Author Comment

by:zhshqzyc
ID: 35440376
It works when testing with two files. If I have three files, could you please modify the code?
Thanks again.
 
LVL 23

Accepted Solution

by:wdosanjos (earned 2000 total points)
ID: 35440719
The following version works with 2 or more files:

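using System.IO;
using System.Linq;

// Enclosing class added so the methods compile; the name FileMerger is illustrative only.
public class FileMerger
{
	public void Merge(string[] fnames, string outfname)
	{
		Merge(fnames[0], fnames[1], outfname);
		
		// Fold each remaining file into the output so far. The accumulated output is passed
		// first so the columns stay in file order (f1, f2, f3, ...). Reading and writing the
		// same path is safe here because ReadAllLines finishes before the StreamWriter opens it.
		for (int i = 2; i < fnames.Length; i++)
		{
			Merge(outfname, fnames[i], outfname);
		}
	}

	private void Merge(string fname1, string fname2, string outfname)
	{
		var lines1 = File.ReadAllLines(fname1);
		var lines2 = File.ReadAllLines(fname2);
		
		// One tab per data column (counted on the header row) to pad a missing row with empty cells.
		var filler1 = string.Empty.PadRight(lines1[0].Count(ch => ch == '\t'), '\t');
		var filler2 = string.Empty.PadRight(lines2[0].Count(ch => ch == '\t'), '\t');
		
		using (var output = new StreamWriter(outfname))
		{
			for (int i1 = 0, i2 = 0; i1 < lines1.Length || i2 < lines2.Length; i1++, i2++)
			{
				// "\xFF" is a sentinel key for an exhausted file; it sorts after any ASCII ID.
				var line1 = (i1 < lines1.Length) ? lines1[i1] : "\xFF\t";
				var line2 = (i2 < lines2.Length) ? lines2[i2] : "\xFF\t";
				
				var key1 = line1.Substring(0, line1.IndexOf('\t'));
				var key2 = line2.Substring(0, line2.IndexOf('\t'));
				
				// Ordinal comparison keeps the sentinel last; the files must be sorted in this order too.
				var cmp = string.CompareOrdinal(key1, key2);
				
				if (cmp < 0)
				{
					// ID only in file 1: pad the file-2 side and revisit the same file-2 line.
					line2 = key1 + filler2;
					i2--;
				}
				else if (cmp > 0)
				{
					// ID only in file 2 (e.g. a duplicate): pad the file-1 side and revisit file 1.
					line1 = key2 + filler1;
					i1--;
				}
			
				// Append the file-2 line without its ID column.
				output.WriteLine("{0}{1}", line1, line2.Substring(line2.IndexOf('\t')));
			}
		}
	}
}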



Here is a sample way of calling it:
var fnames = new string[] {
	@"C:\temp\f1.txt", 
	@"C:\temp\f2.txt", 
	@"C:\temp\f3.txt"
};

Merge(fnames,  @"C:\temp\output.txt");

 

Author Closing Comment

by:zhshqzyc
ID: 35441158
Thank you very muchhhhhh!
Question has a verified solution.
