zhshqzyc
asked on
Combine text files according to the first column
Hi, I have two text files. The first text file has the format:
ID h1 h2 h3
a 1 5 7
b 3 3 5
c 0 4 8
And the second file:
ID h4 h5 h6
a 2 4 9
b 3 6 1
b 4 1 5
Now I want to merge the two files. The output should look like this (a row missing from one file leaves that file's columns blank):
ID h1 h2 h3 h4 h5 h6
a 1 5 7 2 4 9
b 3 3 5 3 6 1
b       4 1 5
c 0 4 8
The two files have the same number of rows and columns. When the first-column values are the same, we just need to append the row. However, sometimes something went wrong during the experiment, so the first columns differ. In file 1 the first column is:
a
b
c
In the second file the first column is:
a
b
b
The problem was solved at the MSDN Forum. However, I am uncomfortable with the solution because it looks complicated. I want to use LINQ to do it. I had a very similar thread at expert-exchange.com
I hope that somebody can use a similar method to finish it. Actually, I have three files to be merged.
Thanks for your help.
How does the previous thread not work? I'm sure I'm missing something.
ASKER
Yes, the previous thread is not working, but never mind that.
Are the files sorted by ID?
ASKER
Yes. ID is a key.
Do duplicate IDs only occur on file #2?
Also, does file #2 contain any ID that is not in file #1?
ASKER
I don't know. Usually it is only in file 2, but I hope there is a general method that works whether the duplicates are in file 1, in file 2, or in both. If that is too hard, we can simplify the question and assume duplicate IDs occur only in file 2. That is okay.
Please try the following code (it assumes the files are tab delimited and sorted by ID):
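// Needs: using System.Collections.Generic; using System.IO; using System.Linq;
var fname1 = @"C:\temp\f1.txt";
var fname2 = @"C:\temp\f2.txt";
var lines1 = new List<string>(File.ReadAllLines(fname1));
var lines2 = new List<string>(File.ReadAllLines(fname2));
// One tab per data column, used to pad a row that is missing from one file.
var filler1 = string.Empty.PadRight(lines1[0].Count(ch => ch == '\t'), '\t');
var filler2 = string.Empty.PadRight(lines2[0].Count(ch => ch == '\t'), '\t');
using (var output = new StreamWriter(@"C:\temp\output.txt"))
{
    for (int i1 = 0, i2 = 0; i1 < lines1.Count || i2 < lines2.Count; i1++, i2++)
    {
        // Once a file is exhausted, "\xFF" serves as a sentinel key that sorts after the real IDs.
        var line1 = (i1 < lines1.Count) ? lines1[i1] : "\xFF\t";
        var line2 = (i2 < lines2.Count) ? lines2[i2] : "\xFF\t";
        var key1 = line1.Substring(0, line1.IndexOf('\t'));
        var key2 = line2.Substring(0, line2.IndexOf('\t'));
        // Ordinal comparison: CompareTo is culture-sensitive and may sort "\xFF" before "z".
        // This assumes the IDs in both files are sorted in ordinal order.
        var cmp = string.CompareOrdinal(key1, key2);
        if (cmp < 0)
        {
            // key1 is missing from file 2: pad file 2's columns and stay on the same line there.
            line2 = key1 + filler2;
            i2--;
        }
        else if (cmp > 0)
        {
            // key2 is missing from file 1: pad file 1's columns and stay on the same line there.
            line1 = key2 + filler1;
            i1--;
        }
        // Write line 1 in full, then line 2 without its ID (from the first tab onwards).
        output.WriteLine("{0}{1}", line1, line2.Substring(line2.IndexOf('\t')));
    }
}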
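Since the question asked for LINQ, here is a minimal sketch of the same two-file merge built on `ToLookup`. It is not the code from the MSDN thread. The names `ColumnMerge` and `Merge` are hypothetical, and two choices are assumptions: duplicate IDs are paired in order of appearance, and IDs are written in ordinal order. Unlike the loop above, it should not need sorted input and should cope with duplicates in either file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; the names are illustrative and do not come from the thread.
public static class ColumnMerge
{
    // Merges two tab-delimited tables (header row first) on the first column.
    public static List<string> Merge(IList<string> file1, IList<string> file2)
    {
        var head1 = file1[0].Split('\t');
        var head2 = file2[0].Split('\t');

        // Empty fields used when an ID has no matching row in one of the files.
        var blank1 = Enumerable.Repeat(string.Empty, head1.Length - 1).ToArray();
        var blank2 = Enumerable.Repeat(string.Empty, head2.Length - 1).ToArray();

        // ID -> data fields of every row with that ID, in file order.
        var rows1 = file1.Skip(1).Select(l => l.Split('\t'))
                         .ToLookup(r => r[0], r => r.Skip(1).ToArray());
        var rows2 = file2.Skip(1).Select(l => l.Split('\t'))
                         .ToLookup(r => r[0], r => r.Skip(1).ToArray());

        // Assumption: output IDs are sorted in ordinal order.
        var keys = rows1.Select(g => g.Key)
                        .Union(rows2.Select(g => g.Key))
                        .OrderBy(k => k, StringComparer.Ordinal);

        var result = new List<string> { string.Join("\t", head1.Concat(head2.Skip(1))) };
        foreach (var key in keys)
        {
            var a = rows1[key].ToList(); // empty when the ID is absent from file 1
            var b = rows2[key].ToList(); // empty when the ID is absent from file 2
            for (var i = 0; i < Math.Max(a.Count, b.Count); i++)
            {
                var left = i < a.Count ? a[i] : blank1;
                var right = i < b.Count ? b[i] : blank2;
                result.Add(string.Join("\t", new[] { key }.Concat(left).Concat(right)));
            }
        }
        return result;
    }
}
```

To use it with files, the inputs could come from `File.ReadAllLines` and the result could be written with `File.WriteAllLines`.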
ASKER
Looks good for testing two files. So if I have three files, would you please modify the code?
Thanks again.
ASKER CERTIFIED SOLUTION
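The accepted solution's text is not included in this thread. The following is only an illustration of how the merge might be extended to three or more files, and it is not the accepted answer. `MultiMerge` and `MergeAll` are hypothetical names. It assumes tab-delimited input with a header row in every file, at least one file, duplicate IDs paired in order of appearance, and output IDs in ordinal order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; the names are illustrative and do not come from the thread.
public static class MultiMerge
{
    // Merges any number of tab-delimited tables (header row first) on the first column.
    public static List<string> MergeAll(params IList<string>[] tables)
    {
        // Split every line of every table into fields.
        var parsed = tables.Select(t => t.Select(line => line.Split('\t')).ToList()).ToList();

        // Number of data columns per table, taken from its header row.
        var widths = parsed.Select(t => t[0].Length - 1).ToList();

        // Per table: ID -> data fields of every row with that ID, in file order.
        var lookups = parsed
            .Select(t => t.Skip(1).ToLookup(r => r[0], r => r.Skip(1).ToArray()))
            .ToList();

        // Every ID seen in any table, in ordinal order (an assumption).
        var keys = lookups.SelectMany(l => l.Select(g => g.Key))
                          .Distinct()
                          .OrderBy(k => k, StringComparer.Ordinal)
                          .ToList();

        // Header: the ID column name from the first table, then every table's data columns.
        var header = new[] { parsed[0][0][0] }.Concat(parsed.SelectMany(t => t[0].Skip(1)));
        var result = new List<string> { string.Join("\t", header) };

        foreach (var key in keys)
        {
            var groups = lookups.Select(l => l[key].ToList()).ToList();
            var depth = groups.Max(g => g.Count); // most occurrences of this ID in any table
            for (var i = 0; i < depth; i++)
            {
                var row = new List<string> { key };
                for (var f = 0; f < groups.Count; f++)
                {
                    // Use the i-th occurrence if this table has one, otherwise blank fields.
                    row.AddRange(i < groups[f].Count
                        ? groups[f][i]
                        : Enumerable.Repeat(string.Empty, widths[f]));
                }
                result.Add(string.Join("\t", row));
            }
        }
        return result;
    }
}
```

A call such as `MultiMerge.MergeAll(lines1, lines2, lines3)`, with each argument read by `File.ReadAllLines`, would return the merged lines ready for `File.WriteAllLines`.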
ASKER
Thank you very muchhhhhh!