Solved

Extract email list from a csv file.

Posted on 2011-09-18
Last Modified: 2012-05-12
Hello,

I have hundreds of contacts in Gmail. I exported them all to a CSV file using Gmail's built-in export function. The contacts fall into several groups, such as "WTCDE New All_2". The CSV format is messy, but each contact occupies one line in the file.
For example:

Roger Sune,Roger,,Sune,,,,,,,,,,,,,,,,,,,,,,,WTCDE New All_2 ::: WTCDE Bible ::: WTCDE attendants ::: WTCDE formal members ::: sunday worship coworkers ::: * My Contacts,* ,rogerpkSune@aol.com,,,,,,,,,,,,,,,,,,,,,,,,,
Rollin Burwers,Rollin,,Burwers,,,,,,,,,,,,,,,,,,,,,,,,* ,burwers@mindspring.com,,,,,,,,,,,,,,,,,,,,,,,,,
aking@abcd.us,aking@abcd.us,,,,,,,,,,,,,,,,,,,,,,,,,,* ,aking@abcd.us,,,,,,,,,,,,,,,,,,,,,,,,,


Now I want to extract all the email addresses to a file.
The output file should look like this:
rogerpkSune@aol.com,burwers@mindspring.com,aking@abcd.us


The list may contain duplicates; I want only the unique addresses.
Thanks for your help.
Question by:zhshqzyc
 
LVL 23

Expert Comment

by:Jens Fiederer
ID: 36557391
Getting the input could be as easy as

var input = from line in (((new StreamReader(filename)).ReadToEnd()).Split(' ')) select  line[20];  // or maybe not 20, didn't feel like counting out the commas!  Whatever.

By duplicates, do you mean complete duplicates?
 
LVL 23

Expert Comment

by:Jens Fiederer
ID: 36557420
OK, here is the actual detailed code (I forgot to split out the separate lines in the above), assuming a "duplicate" is a complete duplicate of the whole string and your file is at c:/doc/input.txt:
// File.ReadLines closes the file once enumeration finishes and handles both \n and \r\n line endings.
var input = from line in File.ReadLines("c:/doc/input.txt") select line.Split(',');
// Field 28 is taken to be the email column; lines with too few fields are skipped.
var processed = from fields in input where fields.Length > 28 select fields[28];
// Drop exact duplicates.
var unique = processed.Distinct();
foreach (var x in unique)
{
    Console.WriteLine(x);
}

 
LVL 23

Expert Comment

by:Jens Fiederer
ID: 36557424
(Added the using directives and the class boilerplate, since you need those to compile.)
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace uniquecsv
{
    class Program
    {
        static void Main(string[] args)
        {
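            // File.ReadLines closes the file once enumeration finishes and handles both \n and \r\n line endings.
            var input = from line in File.ReadLines("c:/doc/input.txt") select line.Split(',');
            // Field 28 is taken to be the email column; lines with too few fields are skipped.
            var processed = from fields in input where fields.Length > 28 select fields[28];
            // Drop exact duplicates.
            var unique = processed.Distinct();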
            foreach (var x in unique)
            {
                Console.WriteLine(x);
            }

        }
    }
}

 

Author Comment

by:zhshqzyc
ID: 36557743
99.5% correct. A few are still wrong; I think using fields[28] is the cause.
Is there any way to extract those as well? Perhaps by looking for the @ symbol?
Or a regular expression for email addresses?

 
LVL 23

Accepted Solution

by:
Jens Fiederer earned 500 total points
ID: 36570463
I can't say what is up with the other 0.5% without actually seeing the offending data.

It is likely that some of your fields themselves contain commas or even newlines. You could handle that with fairly elaborate parsing that makes exceptions for special characters in certain places, but it is probably easier to just iterate through the fields and pick out only those that contain "@" (you can check a field with field.IndexOf("@") != -1) or, as you point out, to match each field against a regex. http://msdn.microsoft.com/en-us/library/ff650303.aspx suggests

 ^(?("")("".+?""@)|(([0-9a-zA-Z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-zA-Z])@))(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,6}))$

for that
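
To make the "pick out the fields that contain @" idea concrete, here is a minimal sketch. The class and method names (EmailFieldScan, ExtractEmails, WriteEmailList) are made up for illustration, and treating addresses as case-insensitive when removing duplicates is my assumption, not something the question specifies. It still splits naively on commas, which is fine as long as no email address sits inside a quoted field that contains a comma.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class EmailFieldScan
{
    // Returns every distinct field that contains '@', in first-seen order.
    // Assumption: two addresses differing only in letter case count as duplicates.
    public static IEnumerable<string> ExtractEmails(IEnumerable<string> lines)
    {
        return lines
            .SelectMany(line => line.Split(','))      // naive split, no quote handling
            .Select(field => field.Trim())
            .Where(field => field.IndexOf('@') != -1) // keep only fields with an '@'
            .Distinct(StringComparer.OrdinalIgnoreCase);
    }

    // Hypothetical helper: reads inputPath and writes a single
    // comma-separated line of addresses to outputPath.
    public static void WriteEmailList(string inputPath, string outputPath)
    {
        var emails = ExtractEmails(File.ReadLines(inputPath));
        File.WriteAllText(outputPath, string.Join(",", emails));
    }
}
```

Calling EmailFieldScan.WriteEmailList("c:/doc/input.txt", "c:/doc/emails.txt") (the output path is an invented example) would produce the single comma-separated line asked for in the question, regardless of which column an address happens to be in.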
 
LVL 23

Expert Comment

by:Jens Fiederer
ID: 36570480
Note that if you want to do proper parsing, you need specific details about the CSV format; not all CSV formats are created equal.

http://en.wikipedia.org/wiki/Comma-separated_values explains:

"Simple CSV implementations will not allow field values that contain a comma or other special characters such as newlines. More sophisticated CSV implementations permit commas and other special characters in a field value. Many implementations use " (double quote) characters around values that contain reserved characters (such as commas, double quotes, or newlines); embedded double quote characters may be represented by a pair of consecutive double quotes. (Creativyst 2010) Some CSV implementations may use an escape character such as a backslash to encode reserved characters as an escape sequence, such as Sybase Central."
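
As an illustration of the double-quote convention described in that quote, here is a rough sketch of a quote-aware field splitter. It assumes the file uses double quotes around fields with reserved characters and a pair of double quotes for an embedded quote; I have not checked that the Gmail export follows exactly these rules. The names CsvLine and SplitFields are invented for the example. It does not handle backslash escapes, and because it works on one line at a time it does not handle quoted fields that span several lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class CsvLine
{
    // Splits one CSV record into fields. Commas inside double-quoted
    // fields are kept, and "" inside a quoted field becomes a single ".
    public static List<string> SplitFields(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // doubled quote: literal "
                        i++;                 // skip the second quote
                    }
                    else
                    {
                        inQuotes = false;    // closing quote
                    }
                }
                else
                {
                    current.Append(c);       // commas are literal here
                }
            }
            else if (c == '"')
            {
                inQuotes = true;             // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // field boundary
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());      // last field has no trailing comma
        return fields;
    }
}
```

Swapping line.Split(',') for CsvLine.SplitFields(line) in the earlier code would keep the column positions stable when a name such as "Sune, Roger" is quoted in the export.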
 
LVL 23

Expert Comment

by:Jens Fiederer
ID: 36570496
While the coding is a bit awkward, if your CSV format is a fair match for some Microsoft format, you might be able to use one of the Data Providers to parse the file for you, as in:

http://www.switchonthecode.com/tutorials/csharp-tutorial-using-the-built-in-oledb-csv-parser
 

Author Comment

by:zhshqzyc
ID: 36570613
I got a solution on the MSDN forum. The person there used a regular expression, which really impressed me.

Thanks for your input anyway. The points are yours.
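
The MSDN forum answer is not quoted in this thread, so the following is only a sketch of what a regex-based extraction could look like, not a reconstruction of that answer. The pattern is a deliberately loose one of my own (not the MSDN validation pattern above), and the names EmailRegexScan and ExtractEmails are hypothetical. It scans the whole file text, so it does not depend on column positions or on how the CSV is quoted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class EmailRegexScan
{
    // Loose pattern (an assumption): a local part, '@', then a domain
    // containing at least one dot. It finds candidates; it does not
    // validate them against the full email address rules.
    private static readonly Regex EmailPattern =
        new Regex(@"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+");

    // Returns the distinct matches in first-seen order, ignoring case.
    public static IEnumerable<string> ExtractEmails(string text)
    {
        return EmailPattern.Matches(text)
            .Cast<Match>()
            .Select(m => m.Value)
            .Distinct(StringComparer.OrdinalIgnoreCase);
    }
}
```

For example, string.Join(",", EmailRegexScan.ExtractEmails(File.ReadAllText("c:/doc/input.txt"))) would build the comma-separated list from the same input file used earlier in the thread (this needs using System.IO as well).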