Solved

Retrieving a date from a "Free Text"

Posted on 2004-08-10
Last Modified: 2012-05-05
Greetings,

I am working on a project and have a problem retrieving a date from a string that contains a university and a graduation date in no fixed pattern.  By "free text" or "string without patterns" I mean that anything can appear in the string.

Here are some examples:

Community College of Phildadelphia, ASN 79
Temple University BSN 1999
De La Salle University 02
Harvard University, /85
Burlington College Ass. RN 5/1995
AMA Vocational Center, August - 1998
Univeristy of Santo Tomas, Philippines, BSN May 80

As you can see,

in the first example, the graduation date is the year 1979, at Community College of Philadelphia, with a degree of ASN.

in the second example, the graduation date is the year 1999, at Temple University.

in the third example, the graduation date is 2002, at De La Salle University.

Fifth example: Date: May 1995, School: Burlington College, Degree: RN

Seventh example: Date: May 1980, School: University of Santo Tomas, Philippines, Degree: BSN

My question is this:  is there a way to parse the dates from a string that doesn't have a fixed pattern?  If yes, how?  (It would be better if the degree and school were also parsed, but the date is the most important.)

Please feel free to ask questions if I am not clear.

Thanks,
Fred
Question by:insanekid
8 Comments
 
LVL 20

Expert Comment

by:TheAvenger
ID: 11760161
This is not possible in the general case. The best solution is to identify several different patterns, e.g. the year is the last 2 digits, or the last 4 digits, or sits in the middle separated by something, etc. Then you try to parse every line with every pattern you have found. If a line matches several patterns, you have a problem (you don't know which one is right), and the same applies if it matches no pattern at all. So after parsing all lines with all the patterns you can define, you show the lines that are uncertain (i.e. matched none or more than one pattern) and give the user the option to extract the date manually.
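The "try every pattern on every line" approach can be sketched roughly as follows. This is only an illustration in Python, not code from the thread: the three patterns and the names PATTERNS and classify are assumptions chosen to fit the sample strings in the question, and a real data set would need its own pattern list.

```python
import re

# Hypothetical pattern list; each entry is one "shape" a date can take
# at the end of a line. The names are made up for this sketch.
PATTERNS = {
    "four_digit_year_at_end": re.compile(r"(?<!\d)\d{4}\s*$"),
    "two_digit_year_at_end": re.compile(r"(?<!\d)\d{2}\s*$"),
    "month_slash_year_at_end": re.compile(r"(?<!\d)\d{1,2}/(?:\d{4}|\d{2})\s*$"),
}

def classify(line):
    """Try every pattern on the line and report how many matched.

    Returns (status, names) where status is "none", "unique" or
    "ambiguous" and names lists the patterns that matched.
    """
    matched = [name for name, rx in PATTERNS.items() if rx.search(line)]
    if not matched:
        status = "none"
    elif len(matched) == 1:
        status = "unique"
    else:
        status = "ambiguous"
    return status, matched

if __name__ == "__main__":
    for sample in ["Temple University BSN 1999",
                   "Burlington College Ass. RN 5/1995",
                   "no date here"]:
        print(sample, "->", classify(sample))
```

Lines reported as "none" or "ambiguous" are the ones that would be shown to the user. Making the patterns mutually exclusive would reduce the ambiguous cases, at the cost of more complicated expressions.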
 

Author Comment

by:insanekid
ID: 11760565
Hi Avenger,

I was thinking of using a regular expression split, but I am not that familiar with regular expressions.  Could you help me out with this one?

Thanks,
Fred
 
LVL 20

Accepted Solution

by:
TheAvenger earned 75 total points
ID: 11760626
Have a look at the Regex class: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfSystemTextRegularExpressionsRegexClassTopic.asp
Learn about regular expressions, e.g. from here: http://www.regular-expressions.info/
There are more tutorials available on the web; just search on Google.
You can also test regular expressions and even find some ready-made ones here: http://www.regexlib.com/Search.aspx
 
LVL 3

Expert Comment

by:primeMover2004
ID: 11760804
Yes, regular expressions are the way to go here. To me it looks like your input strings are structured like this: <university> <degree> <date>

The easiest approach would be to define 3 captures: one for the date, one for the degree and one for the university. I'd run these 3 expressions over the input file.

Some questions before we go for the expressions:

Can you expect some end of line pattern such as CR/LF?
Do you know all the strings representing a degree?
Can you give a rough estimation of the percentage of strings conforming to the <university> <degree> <date> format?
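A minimal sketch of the three-capture idea, written in Python for illustration (the .NET Regex class also supports named groups, with a slightly different syntax). Everything specific here is an assumption rather than something stated in the thread: a degree is taken to be 2 to 4 capital letters, a date is an optional month word followed by a 2- or 4-digit year, and the names RECORD and parse_record are hypothetical.

```python
import re

# One expression with three named captures for the
# <university> <degree> <date> layout.
RECORD = re.compile(
    r"""^\s*
    (?P<university>.+?)                             # everything up to the degree (lazy)
    [\s,]+
    (?P<degree>[A-Z]{2,4})                          # assumed: 2-4 capital letters
    [\s,]+
    (?P<date>(?:[A-Za-z]+\s+)?(?:\d{4}|\d{2}))      # optional month word, then year
    \s*$""",
    re.VERBOSE,
)

def parse_record(line):
    """Return a dict with university, degree and date, or None if the
    line does not follow the assumed layout."""
    m = RECORD.match(line)
    return m.groupdict() if m else None

if __name__ == "__main__":
    print(parse_record("Temple University BSN 1999"))
    print(parse_record("De La Salle University 02"))  # no degree, so no match
```

Lines that return None are the ones that do not conform to the layout and would need another pattern or manual handling.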
 

Author Comment

by:insanekid
ID: 11761929
Hi primeMover2004,

Comments to your question:
1. Can you explain what CR & LF are?
2. Nope, I don't know all the strings that represent a degree.
3. Yes, it is mostly <university> <degree> <date>, hmm... probably 75%.

Do you have any suggestions??  

Thanks,
Fred
 
LVL 3

Assisted Solution

by:primeMover2004
primeMover2004 earned 75 total points
ID: 11773503

1. CR & LF stand for carriage return & line feed. They are used to mark the end of a line, or of a record as in your case.
2. So it might be a good idea to construct a regular expression and extract the degree strings from that file. Do you think that's possible? Do you have access to the file?
3. This means your application has to rely on some additional information provided by users.

My suggestion is to find out more about the file using regular expressions and to design your application so that, if the input scanning finds an ambiguity, the user can provide more information. I don't think there's a reasonable solution that works fully automatically. Keep it simple.
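One way to picture the "scan first, ask the user for the rest" design is the sketch below, in Python for illustration. It is built on assumptions that are not in the thread: only a trailing 2- or 4-digit year is looked for, two-digit years are expanded with a cutoff year (2004 here, so 79 becomes 1979 and 02 becomes 2002), and the names TRAILING_YEAR, expand_year and triage are hypothetical.

```python
import re

# A 2- or 4-digit number at the very end of the line, not preceded by a digit.
TRAILING_YEAR = re.compile(r"(?<!\d)(\d{4}|\d{2})\s*$")

def expand_year(digits, latest_year=2004):
    """Turn "79" into 1979 and "02" into 2002; 4-digit years pass through.

    latest_year is an assumed cutoff: a two-digit year that would land
    after it in the 2000s is placed in the 1900s instead.
    """
    year = int(digits)
    if len(digits) == 4:
        return year
    candidate = 2000 + year
    return candidate if candidate <= latest_year else 1900 + year

def triage(lines, latest_year=2004):
    """Split lines into (line, year) pairs and lines needing user input."""
    parsed, needs_review = [], []
    for line in lines:
        m = TRAILING_YEAR.search(line)
        if m:
            parsed.append((line, expand_year(m.group(1), latest_year)))
        else:
            needs_review.append(line)
    return parsed, needs_review

if __name__ == "__main__":
    done, todo = triage(["De La Salle University 02", "Harvard University"])
    print(done)  # lines with a year found
    print(todo)  # lines to hand to the user
```

The needs_review list is what the application would present to the user for manual entry.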