?
Solved

Import CSV or TXT to database - theoretical questions

Posted on 2003-03-08
8
Medium Priority
?
416 Views
Last Modified: 2012-06-27
Hey All,

I am building an e-mail list manager that should allow the user the ability to import a CSV or TXT file they exported from Excel or their e-mail client. I am using ASPUpload to upload the file to the server and then I use FileSystemObject to open the file, Split() to put the contents into an array and then ADO to get the contents into the database. The problem I can forsee, is with error validation on the CSV or TXT file. I have been researching quite a bit of code and they all validate different things. Some replace quotes, others replace spaces, others replace line breaks, some remove column headers, etc. The CSV file I am using to mock up the application has one easy to follow format, but that is not real-world.

Here are my requirements:

The file should have only one column. The column should contain all e-mail addresses and I need to make sure I don't import the column header if the file contains one. Each e-mail address can have no trailing spaces, line breaks, etc.

Should I use a regular expression to do all this validation? And if so, what are all the things I should check for? In what linear fashion should I do the validation?

Please ask questions if need be. Thanks in advance for all your help.

Brian
0
Comment
Question by:bjrcreations
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
8 Comments
 
LVL 13

Expert Comment

by:stefri
ID: 8095280
What is an email address (see RFC 822) ?
Left-hand-side: A string which can be made of:
characters from A-z and 0-9, special characteers such as: - (dash) or . (dot)
An @ sign followed by right-hand-side of the same kind of sequence as left hand side of the email address

You will check first if you find an @ sign: if not found ignore the string as email address
If found, check left hand side
Compare each character against valid characters using MID or regex expression
If LHS is valid, check right hand side
If RHS is valid, import the string as a valid email address.

Some code as follow may help you for a starting point

atSignPos = instr(1, email, "@")

if check(email, 1, atSignPos) then
   if check(email, atSignPos, len(email)) then
      ' Good Email Address
   end if
end if
.....
....

function check(str,fromHere, toThere)
dim i
check = false


if toThere > 1 then
    check =true
    for i = fromHere to toThere
      theChar= (mid(str, i,1)
      if asc(theChar) = asc("-") or _
         asc(theChar) = asc(".") or _
         (asc(theChar) >= asc("A") and _
         (asc(theChar) <= asc("z")) or _
         (asc(theChar) >= asc("0") and _
         (asc(theChar) <= asc("9"))

         check = check and true
       else
         check = false
         exit for
       end if
    next
end if
end function
0
 
LVL 4

Expert Comment

by:iozturk
ID: 8095985
If I dont misunderstand your question this may help you.

email= Replace(email , chr(34), "") 'for double quotas
email= Replace(email , " ", "") 'for spaces
email= Replace(email , vbcrlf, "") 'for linebreaks
email= Replace(email , "'", "") 'for single quotas
0
 
LVL 4

Accepted Solution

by:
whammy earned 500 total points
ID: 8096001
Hmm, this is probably specific to how your list manager works.

What I would do (and I DO use something like this to import data into tables) is to write a script that parses the file one line at a time.

Since you don't want any line breaks or spaces, I'd check to see that:

- the line isn't empty
- the email address is loosely validated (but of course with a regular expression), i.e.:

Function IsEmail(str)
   Dim ieRegEx
   Set ieRegEx = New RegExp
   ieRegEx.Pattern = "^[\w\+\'\.-]+@[\w\'\.-]+\.[a-zA-Z0-9]{2,}$"
   IsEmail = ieRegEx.Test(str)
End Function

Of course some valid email addresses might not pass this test (although that will allow for even some relatively obscure valid emails with "+" in them), but I cannot understand the relevant RFC at all (and I have been looking for a solid reference for months that actually makes sense), so I just loosely validate email addresses, and the above regex seems to work pretty darn well...

Once you do that, I'd have the script you're creating split the line you're reading by spaces into an array... if there are any spaces (an array was created), then you don't include that line in the new file you are creating to load into your list manager, you might want to write it to a "bad" text file instead so you can double-check the data initially, at least.

I'd try something like that with a test list and see how it works.

As for questions, I'd want to know if your clients were sending things in comma-delimited format with a double quote text identifier, etc.

You can do a lot of this stuff with a simple VBScript, but with huge lists, it won't be very fast.

I'm working on a .NET program (.exe) that will (hopefully) quickly and easily format different types of files into the correct format to load into an email database - unfortunately, even using regular expressions, I don't see a clear-cut way to determine if a file is in a certain format and doesn't contain errors, so it does require some user intervention to change formats, but so far it's very fast compared to VBScript (seconds instead of minutes)!

0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 4

Expert Comment

by:whammy
ID: 8096057
Actually I should say you can do ALL of this stuff with a simple VBScript file, not alot of it. ;-)

It's slow, but you can have a VBScript parse files at your leisure while you're doing other stuff (although they do take up system memory on your local computer).

An .exe is better for this kind of thing, but in a pinch you can always use VBScript:

http://www.w3schools.com/asp/asp_ref_filesystem.asp

That link helped me a LOT regarding different kinds of file manipulation (since none of our clients use XML, especially!).
0
 
LVL 3

Author Comment

by:bjrcreations
ID: 8096543
Whammy, in your regular expression example above you use:

^[\w\+\'\.-]+@[\w\'\.-]+\.[a-zA-Z0-9]{2,}$

as your pattern. Does this pattern include IP addresses as the domain? I want to use the most accurate regular expression I can find for e-mail validation. Thanks.

Brian
0
 
LVL 4

Expert Comment

by:whammy
ID: 8097666
Yeah, that's what the 0-9 is for.

That's probably not the most accurate (since like I said, I have been unable to find a human readable reference, the appropriate RFC doesn't fit that category, to me anyway ;-)), but it has been working on a site that gets hundreds of emails a day with no problems for months...

Althought it's loose, it still catches a lot of invalid email addresses...

There are a great number of email address regex's here that claim to adhere directly to the specs:

http://www.regexlib.com

...however most that I tried there didn't allow for some perfectly valid email addresses, like:

Jim.O'Brien+internal@someplace-else.com

However, this one submitted by Jason Howat _seems_ to work rather well from my initial testing:

^(([^<>;()[\]\\.,;:@"]+(\.[^<>()[\]\\.,;:@"]+)*)|(".+"))@((([a-z]([-a-z0-9]*[a-z0-9])?)|(#[0-9]+)|(\[((([01]?[0-9]{0,2})|(2(([0-4][0-9])|(5[0-5]))))\.){3}(([01]?[0-9]{0,2})|(2(([0-4][0-9])|(5[0-5]))))\]))\.)*(([a-z]([-a-z0-9]*[a-z0-9])?)|(#[0-9]+)|(\[((([01]?[0-9]{0,2})|(2(([0-4][0-9])|(5[0-5]))))\.){3}(([01]?[0-9]{0,2})|(2(([0-4][0-9])|(5[0-5]))))\]))$

:-|
0
 
LVL 3

Author Comment

by:bjrcreations
ID: 8108688
Thanks for your input Whammy, it helped me get started on the right track.
0
 
LVL 4

Expert Comment

by:whammy
ID: 8110397
No problem, let me know if you run into any specific problems with this, as I do this kind of stuff all the time. ;)
0

Featured Post

What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Hello, all! I just recently started using Microsoft's IIS 7.5 within Windows 7, as I just downloaded and installed the 90 day trial of Windows 7. (Got to love Microsoft for allowing 90 days) The main reason for downloading and testing Windows 7 is t…
Have you ever needed to get an ASP script to wait for a while? I have, just to let something else happen. Or in my case, to allow other stuff to happen while I was murdering my MySQL database with an update. The Original Issue This was written…
Add bar graphs to Access queries using Unicode block characters. Graphs appear on every record in the color you want. Give life to numbers. Hopes this gives you ideas on visualizing your data in new ways ~ Create a calculated field in a query: …
In this video you will find out how to export Office 365 mailboxes using the built in eDiscovery tool. Bear in mind that although this method might be useful in some cases, using PST files as Office 365 backup is troublesome in a long run (more on t…

752 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question