Solved

Retrieving a date from a "Free Text"

Posted on 2004-08-10
Last Modified: 2012-05-05
Greetings,

I am currently working on a project and have run into a problem: retrieving a date from a string that contains a university and a graduation date but follows no fixed pattern. By "free text" or "string without patterns" I mean that anything goes in the string.

Here are some examples:

Community College of Phildadelphia, ASN 79
Temple University BSN 1999
De La Salle University 02
Harvard University, /85
Burlington College Ass. RN 5/1995
AMA Vocational Center, August - 1998
Univeristy of Santo Tomas, Philippines, BSN May 80

As you can see:

In the first example, the graduation date is 1979, from Community College of Philadelphia with an ASN degree.

In the second example, the graduation date is 1999, from Temple University.

In the third example, the graduation date is 2002, from De La Salle University.

In the Burlington College example: Date: May 1995, School: Burlington College, Degree: RN.

In the University of Santo Tomas example: Date: May 1980, School: University of Santo Tomas, Philippines, Degree: BSN.

My question is this: is there a way to parse the dates from a string, given that the string doesn't have a fixed pattern? If yes, how? (It would be better if the degree and school were parsed as well, but the date is the most important part.)

Please feel free to ask questions if I am not clear.

Thanks,
Fred
Question by:insanekid
8 Comments
 
LVL 20

Expert Comment

by:TheAvenger
ID: 11760161
This is not possible. The best solution is to identify several different patterns: for example, the year is the last 2 digits, or the last 4 digits, or sits in the middle separated by some delimiter. Then try to parse every line with every pattern you have found. If a line matches several patterns, you have a problem, because you don't know which one is right; the same goes for a line that matches no pattern. After parsing all lines with all the patterns you can define, show the uncertain ones (those that matched no pattern or more than one) and give the user the option to extract the date manually.
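The multi-pattern idea above can be sketched roughly as follows. This is only an illustration in Python, not code from the thread: the three patterns, the names `PATTERNS`, `expand_year` and `extract_year`, and the two-digit-year pivot of 30 are all assumptions made for the example.

```python
import re

# Hypothetical pattern set: each pattern captures a year in the group "year".
PATTERNS = {
    # a 4-digit year (19xx or 20xx) at the very end of the line
    "four_digit_at_end": re.compile(r"(?<!\d)(?P<year>(?:19|20)\d{2})\s*$"),
    # a 2-digit year at the very end of the line
    "two_digit_at_end": re.compile(r"(?<!\d)(?P<year>\d{2})\s*$"),
    # a 4-digit year in the middle, i.e. followed by more text
    "four_digit_in_middle": re.compile(
        r"(?<!\d)(?P<year>(?:19|20)\d{2})(?!\d)(?=.*\S)"
    ),
}


def expand_year(text, pivot=30):
    """Turn '79' into 1979 and '02' into 2002; the pivot is an assumption."""
    year = int(text)
    if len(text) == 4:
        return year
    return 2000 + year if year < pivot else 1900 + year


def extract_year(line):
    """Return (year, status); year is None unless exactly one pattern matched."""
    hits = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(line)
        if match:
            hits[name] = expand_year(match.group("year"))
    if len(hits) == 1:
        return next(iter(hits.values())), "ok"
    if not hits:
        return None, "no match"
    return None, "ambiguous"
```

Lines that come back as "no match" or "ambiguous" are the ones to hand to the user for manual review.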
 

Author Comment

by:insanekid
ID: 11760565
Hi Avenger,

I was thinking of using Regular Expression split but I am not that familiar with regular expressions.  Could you help me out with this one?

Thanks,
Fred
 
LVL 20

Accepted Solution

by:
TheAvenger earned 75 total points
ID: 11760626
Have a look at the Regex class: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfSystemTextRegularExpressionsRegexClassTopic.asp
You can learn about regular expressions here, for example: http://www.regular-expressions.info/
There are more tutorials available on the web; just search Google.
You can also test regular expressions, and even find some ready-made ones, here: http://www.regexlib.com/Search.aspx
LVL 3

Expert Comment

by:primeMover2004
ID: 11760804
Yes, regular expressions are the way to go here. To me it looks like your input strings are assembled like this: <university> <degree> <date>

The easiest approach would be to define 3 captures: one for the date, one for the degree and one for the university. I'd then run these 3 expressions over the input file.

Some questions before we go for the expressions:

Can you expect some end of line pattern such as CR/LF?
Do you know all the strings representing a degree?
Can you give a rough estimate of the percentage of strings conforming to the <university> <degree> <date> format?
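As a rough illustration of the three captures, here is a sketch in Python (in .NET the named groups would be written `(?<name>...)` instead of `(?P<name>...)`). The pattern `RECORD` and the function `parse_record` are hypothetical, and the pattern assumes that a degree is a run of 2 to 4 capital letters and that the date is an optional month name or number followed by a 2- or 4-digit year at the end of the line.

```python
import re

# Hypothetical pattern for the <university> <degree> <date> layout.
RECORD = re.compile(
    r"""
    ^\s*
    (?P<university>.+?)                  # shortest prefix that lets the rest match
    [\s,/-]+                             # separator before the degree or date
    (?:(?P<degree>[A-Z]{2,4})[\s,/-]+)?  # optional degree: ASSUMED 2-4 capitals
    (?P<date>
        (?:(?:[A-Za-z]{3,9}|\d{1,2})[\s/-]+)?  # optional month name or number
        (?:\d{4}|\d{2})                        # 4-digit or 2-digit year
    )
    \s*$
    """,
    re.VERBOSE,
)


def parse_record(line):
    """Return a dict with university, degree and date, or None if no match."""
    match = RECORD.match(line)
    return match.groupdict() if match else None
```

For "Temple University BSN 1999" this yields the university "Temple University", the degree "BSN" and the date "1999". It is far from complete: in "Burlington College Ass. RN 5/1995" the "Ass." ends up in the university capture, and lines that do not follow the layout return None and still need manual handling.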
 

Author Comment

by:insanekid
ID: 11761929
Hi primeMover2004,

Answers to your questions:
1. Can you explain what CR and LF are?
2. No, I don't know the strings that represent a degree.
3. Yes, most strings follow <university> <degree> <date>, probably about 75%.

Do you have any suggestions?

Thanks,
Fred
 
LVL 3

Assisted Solution

by:primeMover2004
primeMover2004 earned 75 total points
ID: 11773503

1. CR and LF stand for carriage return and line feed. They are used to mark the end of a line, or of a record in your case.
2. Then it might be a good idea to construct a regular expression and extract the degree strings from the file. Do you think that's possible? Do you have access to the file?
3. This means your application has to rely on some additional information provided by users.

My suggestion is to find out more about the file using regular expressions, and to design your application so that when the input scanning finds an ambiguity, the user can provide more information. I don't think there is a reasonable solution that works fully automatically. Keep it simple.
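One way to explore the file along these lines, sketched in Python: split the text into records at the line endings and count the short capitalized tokens that sit directly before a trailing date. The names `CANDIDATE` and `candidate_degrees` are made up for this example, and the guess that a degree is 2 to 4 capital letters is an assumption, not something known about the data.

```python
import re
from collections import Counter

# ASSUMPTION: a degree is a run of 2-4 capitals directly before the trailing date.
CANDIDATE = re.compile(
    r"\b([A-Z]{2,4})\b[\s,/-]+"                # candidate degree, then a separator
    r"(?:(?:[A-Za-z]{3,9}|\d{1,2})[\s/-]+)?"   # optional month name or number
    r"(?:\d{4}|\d{2})\s*$"                     # 4-digit or 2-digit year at the end
)


def candidate_degrees(raw_text):
    """Count candidate degree strings, one record per line."""
    counts = Counter()
    # str.splitlines() treats CR/LF, LF and CR alike as record separators.
    for line in raw_text.splitlines():
        match = CANDIDATE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

The resulting counts can then be reviewed by hand to build a list of known degree strings, which keeps the user in the loop as suggested above.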