zhshqzyc

Merge files with the ToDictionary method

I have three files, each with many rows and columns. The following is a simple example extract from each file.
File 1
	rs1107199
LY-P1_A01	G
LY-P1_A02	G
LY-P1_A03	G

File 2
	rs10078294
LY-P1_A01	AG
LY-P1_A02	G	
LY-P1_A03	AG

File 3
	rs1117330
LY-P1_A01	C
LY-P1_A02	C
LY-P1_A03	C

Please note that the field delimiter is a tab. The three files have the same number of rows.
The first column of the first row in each file is empty. I want to combine the files. The final file should look like this:
	rs1107199	rs10078294	rs1117330
LY-P1_A01	G	AG	C
LY-P1_A02	G	G	C
LY-P1_A03	G	AG	C

Thus I used the code
         
string[] lines1 = File.ReadAllLines(fname1);
string[] lines2 = File.ReadAllLines(fname2);
string[] lines3 = File.ReadAllLines(fname3);
Dictionary<string, string> data =
    (from line1 in lines1
     let fields1 = line1.Split('\t')
     from line2 in lines2
     let fields2 = line2.Split('\t')
     from line3 in lines3
     let fields3 = line3.Split('\t')
     where (fields1[0] == fields2[0] && fields1[0] == fields3[0])
     select new
     {
         Key = fields1[0],
         Value = line1 + '\t' + line2 + '\t' + line3
     }).ToDictionary(p => p.Key, p => p.Value);

However, I am not confident in it, because each file has an empty field.

Thanks for help.
wdosanjos

You can achieve your goal without the need of a Dictionary, as follows:

string[] lines1 = File.ReadAllLines(fname1);
string[] lines2 = File.ReadAllLines(fname2);
string[] lines3 = File.ReadAllLines(fname3);

var query = from line1 in lines1 
			let fields1 = line1.Split('\t')
			from line2 in lines2
			let fields2 = line2.Split('\t')
			from line3 in lines3
			let fields3 = line3.Split('\t')
			where (fields1[0] == fields2[0] && fields1 [0]==fields3 [0])
			select new
			{
				Content = fields1[0]+'\t'+fields1[1]+'\t'+fields2[1]+'\t'+fields3[1]
			};
			
using (var outFile = new StreamWriter(fnameout))
{
	foreach (var line in query)
	{
		outFile.WriteLine(line.Content);
	}
}

kaufmed

"However, I am not confident in it, because each file has an empty field."
This is my take on what you would like to do. Please let me know if I misinterpreted the requirement  = )
Dictionary<string, string> data = (from line1 in lines1
                                   let fields1 = line1.Split('\t')
                                   from line2 in lines2
                                   let fields2 = line2.Split('\t')
                                   from line3 in lines3
                                   let fields3 = line3.Split('\t')
                                   where (fields1[0] == fields2[0] && fields1[0] == fields3[0])
                                   select new
                                   {
                                       Key = fields1[0],
                                       // Drop the key and the tab that follows it, keeping every remaining column.
                                       Value = line1.Substring(fields1[0].Length + 1) + '\t' +
                                               line2.Substring(fields2[0].Length + 1) + '\t' +
                                               line3.Substring(fields3[0].Length + 1)
                                   }).ToDictionary(p => p.Key, p => p.Value);

File.WriteAllLines("output.txt", data.Select(item => item.Key + '\t' + item.Value).ToArray());

zhshqzyc

ASKER

Okay. Actually, my files have many columns, not just the two in the example.
How should I modify the code?
Please provide some sample data with a couple other columns.  Also, provide the expected output.

Thanks.
I mean that there are many columns. The example showed only two columns to demonstrate.
	col1	col2	col3	col4	col5	col6	col7	col8	col9
LY-P1_A01	AG	G	GT	C	G	T	G	GA	GA
LY-P1_A02	G	GA	GT			T	G	A	A	

So you can't enumerate every field as fields[1], fields[2], etc., because you don't know how many columns there are.
My guess is:
var query = from line1 in lines1
            let fields1 = line1.Split('\t')
            from line2 in lines2
            let fields2 = line2.Split('\t')
            from line3 in lines3
            let fields3 = line3.Split('\t')
            where (fields1[0] == fields2[0] && fields1[0] == fields3[0])
            select new
            {
                Content = line1 + '\t' + fields2.Skip(1) + '\t' + fields3.Skip(1)
            };

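One possible way to handle an unknown number of columns, offered as a sketch under stated assumptions and not taken from the accepted answer: the guess above concatenates the `Skip(1)` sequence object itself, which produces its type name instead of the column values, so the elements need to be joined with `string.Join`. The names `TabFileMerger`, `IndexByKey` and `Merge` are hypothetical, introduced only for illustration. The sketch also swaps the triple `from` cross join for one dictionary lookup per file, and it assumes each key appears at most once per file (`ToDictionary` throws on a duplicate key).

```csharp
using System.Collections.Generic;
using System.Linq;

static class TabFileMerger
{
    // Hypothetical helper: index one file's lines by the first field and keep
    // the remaining fields as a single tab-joined string.
    static Dictionary<string, string> IndexByKey(IEnumerable<string> lines)
    {
        return lines
            .Where(line => line.Length > 0)            // ignore blank lines
            .Select(line => line.Split('\t'))
            .ToDictionary(
                fields => fields[0],                   // "" for the header row
                fields => string.Join("\t", fields.Skip(1)));
    }

    // Merge any number of files. Rows follow the order of the first file, and
    // only keys present in every file are emitted (inner-join behaviour).
    public static List<string> Merge(params IEnumerable<string>[] files)
    {
        var indexes = files.Select(f => IndexByKey(f)).ToList();
        var keysInOrder = files[0]
            .Where(line => line.Length > 0)
            .Select(line => line.Split('\t')[0]);

        return keysInOrder
            .Where(key => indexes.All(index => index.ContainsKey(key)))
            .Select(key => key + "\t" +
                           string.Join("\t", indexes.Select(index => index[key])))
            .ToList();
    }
}
```

With the variables used earlier in the thread, it could be called as `File.WriteAllLines(fnameout, TabFileMerger.Merge(File.ReadAllLines(fname1), File.ReadAllLines(fname2), File.ReadAllLines(fname3)));`. A trailing tab in an input row (as in the second data row of File 2) is kept as an empty column, so such rows may need trimming first.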

ASKER CERTIFIED SOLUTION
wdosanjos
Thank you. Tomorrow I will post a similar but harder question. Hopefully I will meet you there.