Merge files with the ToDictionary method

I have three files that have many rows and columns. The following is a simplified extract from each file.
File 1
	rs1107199
LY-P1_A01	G
LY-P1_A02	G
LY-P1_A03	G


File 2
	rs10078294
LY-P1_A01	AG
LY-P1_A02	G	
LY-P1_A03	AG


File 3
	rs1117330
LY-P1_A01	C
LY-P1_A02	C
LY-P1_A03	C


Please note that the field delimiter is a tab and that the three files have the same number of rows.
The first column of the first row in each file is empty. I want to combine the files so that the final file looks like this:
	rs1107199	rs10078294	rs1117330
LY-P1_A01	G	AG	C
LY-P1_A02	G	G	C
LY-P1_A03	G	AG	C


So I used the following code:
         
string[] lines1 = File.ReadAllLines(fname1);
string[] lines2 = File.ReadAllLines(fname2);
string[] lines3 = File.ReadAllLines(fname3);
Dictionary<string, string> data =
    (from line1 in lines1
     let fields1 = line1.Split('\t')
     from line2 in lines2
     let fields2 = line2.Split('\t')
     from line3 in lines3
     let fields3 = line3.Split('\t')
     where (fields1[0] == fields2[0] && fields1[0] == fields3[0])
     select new
     {
         Key = fields1[0],
         Value = line1 + '\t' + line2 + '\t' + line3
     }).ToDictionary(p => p.Key, p => p.Value);


However, I am not confident in it, because the first field of the header row in each file is empty.

Thanks for your help.
Asked by: zhshqzyc
 
wdosanjos commented:
You can achieve your goal without a Dictionary, as follows:

string[] lines1 = File.ReadAllLines(fname1);
string[] lines2 = File.ReadAllLines(fname2);
string[] lines3 = File.ReadAllLines(fname3);

var query = from line1 in lines1 
			let fields1 = line1.Split('\t')
			from line2 in lines2
			let fields2 = line2.Split('\t')
			from line3 in lines3
			let fields3 = line3.Split('\t')
			where (fields1[0] == fields2[0] && fields1 [0]==fields3 [0])
			select new
			{
 				Content = fields1[0]+'\t'+fields1[1]+'\t'+fields2[1]+'\t'+fields3[1]
			};
			
using (var outFile = new StreamWriter(fnameout))
{
	foreach (var line in query)
	{
		outFile.WriteLine(line.Content);
	}
}
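
The nested `from` clauses in the query compare every line of each file with every line of the other two. As a rough sketch of an alternative, the same match can be written with LINQ `join` clauses, which look rows up by key instead. The class and method names here (`JoinSketch`, `MergeThree`) are hypothetical, and the sketch assumes every row has at least two tab-separated fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class JoinSketch
{
    // Hypothetical helper: merges three sets of tab-delimited lines on their first field.
    public static List<string> MergeThree(string[] lines1, string[] lines2, string[] lines3)
    {
        var query = from f1 in lines1.Select(l => l.Split('\t'))
                    join f2 in lines2.Select(l => l.Split('\t')) on f1[0] equals f2[0]
                    join f3 in lines3.Select(l => l.Split('\t')) on f1[0] equals f3[0]
                    // Key once, then the second field from each file.
                    select string.Join("\t", f1[0], f1[1], f2[1], f3[1]);

        return query.ToList();
    }
}
```

The header row needs no special handling here: its key is an empty string, which matches the empty key of the other header rows like any other value.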

 
käµfm³d 👽 commented:
"However I am lacking confidence for that because there is an empty fields in each file."
This is my take on what you would like to do. Please let me know if I misinterpreted the requirement =)
Dictionary<string, string> data = (from line1 in lines1
                                   let fields1 = line1.Split('\t')
                                   from line2 in lines2
                                   let fields2 = line2.Split('\t')
                                   from line3 in lines3
                                   let fields3 = line3.Split('\t')
                                   where (fields1[0] == fields2[0] && fields1[0] == fields3[0])
                                   select new
                                   {
                                       Key = fields1[0],
                                       // Every row, including the header with its empty key, contributes its second field.
                                       Value = fields1[1] + '\t' + fields2[1] + '\t' + fields3[1]
                                   }).ToDictionary(p => p.Key, p => p.Value);

File.WriteAllLines("output.txt", data.Select(item => item.Key + '\t' + item.Value).ToArray());

 
zhshqzyc (Author) commented:
Okay. Actually, my files have many columns rather than the two shown in the example.
How should I modify the code?
 
wdosanjos commented:
Please provide some sample data with a couple of other columns. Also, provide the expected output.

Thanks.
 
zhshqzyc (Author) commented:
I mean that there are many columns. The example showed only two columns to demonstrate.
	col1	col2	col3	col4	col5	col6	col7	col8	col9
LY-P1_A01	AG	G	GT	C	G	T	G	GA	GA
LY-P1_A02	G	GA	GT			T	G	A	A	


So you can't enumerate them all as fields[1], fields[2], etc., because you don't know how many columns there are.
 
zhshqzyc (Author) commented:
I guess something like this:
var query = from line1 in lines1
                            let fields1 = line1.Split('\t')
                            from line2 in lines2
                            let fields2 = line2.Split('\t')
                            from line3 in lines3
                            let fields3 = line3.Split('\t')
                            where (fields1[0] == fields2[0] && fields1[0] == fields3[0])
                            select new
                            {
                                Content = line1  + '\t' + fields2.Skip (1) + '\t' +   fields3.Skip (1)
                            };

 
wdosanjos commented:
It should be something like this:

string[] lines1 = File.ReadAllLines(fname1);
string[] lines2 = File.ReadAllLines(fname2);
string[] lines3 = File.ReadAllLines(fname3);

var query = from line1 in lines1 
			let fields1 = line1.Split('\t')
			from line2 in lines2
			let fields2 = line2.Split('\t')
			from line3 in lines3
			let fields3 = line3.Split('\t')
			where (fields1[0] == fields2[0] && fields1 [0]==fields3 [0])
			select new
			{
				Key = fields1[0],
 				Content = new string[][]{fields1, fields2, fields3}
			};
			
using (var outFile = new StreamWriter(fnameout))
{
	foreach (var line in query)
	{
		outFile.Write(line.Key);
		
		foreach (string[] fields in line.Content)
		{
			for (int i = 1; i < fields.Length; i++)
			{
				outFile.Write("\t{0}",fields[i].Trim());
			}
		}
		
		outFile.WriteLine();
	}
}

 
käµfm³d 👽 commented:
Here's my proposed modification:
Dictionary<string, string> data = (from line1 in lines1
                                   let fields1 = line1.Split('\t')
                                   from line2 in lines2
                                   let fields2 = line2.Split('\t')
                                   from line3 in lines3
                                   let fields3 = line3.Split('\t')
                                   where (fields1[0] == fields2[0] && fields1[0] == fields3[0])
                                   select new
                                   {
                                       Key = fields1[0],
                                       Value = string.Join("\t", fields1.Skip(1).ToArray()) + '\t' +
                                               string.Join("\t", fields2.Skip(1).ToArray()) + '\t' +
                                               string.Join("\t", fields3.Skip(1).ToArray())
                                   }).ToDictionary(p => p.Key, p => p.Value);

File.WriteAllLines("output.txt", data.Select(item => item.Key + '\t' + item.Value).ToArray());
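
For a variable number of files and columns, one possible generalization is sketched below: it indexes each file after the first by its key and then walks the first file in order. This is only an illustration, not a tested drop-in. `MergeSketch` and `MergeByKey` are hypothetical names, and the sketch assumes .NET 4 or later (for the `string.Join` overload that takes an `IEnumerable<string>`), unique keys within each file, and no trailing tabs on the lines.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MergeSketch
{
    // Hypothetical helper: 'files' holds the lines of each input file.
    public static List<string> MergeByKey(IList<string[]> files)
    {
        // For every file after the first, index each row's value columns by its key.
        // ToDictionary throws if a key occurs twice in the same file.
        var lookups = files.Skip(1)
            .Select(lines => lines
                .Select(l => l.Split('\t'))
                .ToDictionary(f => f[0], f => string.Join("\t", f.Skip(1))))
            .ToList();

        var merged = new List<string>();
        foreach (string line in files[0])
        {
            string key = line.Split('\t')[0];

            // Keep only rows whose key appears in every file.
            if (!lookups.All(d => d.ContainsKey(key))) continue;

            merged.Add(line + "\t" + string.Join("\t", lookups.Select(d => d[key])));
        }
        return merged;
    }
}
```

It could be called as `File.WriteAllLines(fnameout, MergeSketch.MergeByKey(new[] { lines1, lines2, lines3 }))`. Because the output follows the row order of the first file, it does not rely on the enumeration order of a Dictionary.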

 
zhshqzyc (Author) commented:
Thank you. Tomorrow I will post a similar but harder question. Hopefully I will see you there.