Solved

Retrieving a date from a "Free Text"

Posted on 2004-08-10
Last Modified: 2012-05-05
Greetings,

I am working on a project and have a problem retrieving a date from a string that contains a university name and a graduation date in no fixed pattern.  By "free text" or "string without patterns" I mean that anything goes in the string.

Here are some examples:

Community College of Phildadelphia, ASN 79
Temple University BSN 1999
De La Salle University 02
Harvard University, /85
Burlington College Ass. RN 5/1995
AMA Vocational Center, August - 1998
Univeristy of Santo Tomas, Philippines, BSN May 80

As you can see:

In the first example, the graduation year is 1979, at Community College of Philadelphia, with a degree of ASN.

In the second example, the graduation year is 1999, at Temple University.

In the third example, the graduation year is 2002, at De La Salle University.

In the fifth example: Date: May 1995, School: Burlington College, Degree: RN

In the seventh example: Date: May 1980, School: University of Santo Tomas, Philippines, Degree: BSN

My question is this: is there a way to parse the dates from a string, given that the string doesn't have a fixed pattern?  If yes, how?  (It would be better if the degree and school were parsed as well, but the date is the most important part.)

Please feel free to ask questions if anything is unclear.

Thanks,
Fred
Question by:insanekid
8 Comments
 
LVL 20

Expert Comment

by:TheAvenger
ID: 11760161
This is not possible in the general case. The best approach is to identify several distinct patterns: the year is the last two digits, or the last four digits, or sits in the middle separated by something, and so on. Then try to parse every line with every pattern you have found. If a line matches more than one pattern, you have a problem, because you cannot tell which one is right; the same goes for a line that matches none. So after parsing all lines with all the patterns you can define, show the uncertain ones (those that matched no pattern or more than one) and let the user extract the date manually.
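
As a rough illustration of the try-every-pattern idea, here is a minimal sketch. It is written in Python rather than .NET to keep it short, and it only extracts the year. The three patterns, the names `YEAR_PATTERNS`, `expand_year` and `classify`, and the two-digit pivot of 30 (00-29 become 20xx, 30-99 become 19xx) are all assumptions made for the example, not something taken from the question.

```python
import re

def expand_year(digits, pivot=30):
    """Turn '79' into 1979 and '02' into 2002; the pivot of 30 is an assumption."""
    year = int(digits)
    if len(digits) == 4:
        return year
    return (2000 if year < pivot else 1900) + year

# Hypothetical patterns, one per guess about where the year sits.
YEAR_PATTERNS = {
    # exactly four digits at the end of the line
    "last_four_digits": re.compile(r"(?<!\d)(\d{4})\s*$"),
    # exactly two digits at the end of the line
    "last_two_digits": re.compile(r"(?<!\d)(\d{2})\s*$"),
    # four digits followed by more text later in the line
    "four_digits_in_middle": re.compile(r"(?<!\d)(\d{4})(?!\d)(?=.*[A-Za-z])"),
}

def classify(line):
    """Try every pattern; accept the line only if exactly one pattern matches."""
    hits = {}
    for name, pattern in YEAR_PATTERNS.items():
        match = pattern.search(line)
        if match:
            hits[name] = expand_year(match.group(1))
    if len(hits) == 1:
        return ("ok", next(iter(hits.values())))
    # Zero matches or several matches: leave the decision to the user.
    return ("review", hits)
```

With these patterns, each of the seven sample lines from the question matches exactly one pattern, while a made-up line such as "Graduated 1998 from AMA, room 12" matches two and would be handed to the user for review.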

Author Comment

by:insanekid
ID: 11760565
Hi Avenger,

I was thinking of using a regular expression split, but I am not that familiar with regular expressions.  Could you help me out with this one?

Thanks,
Fred
0
 
LVL 20

Accepted Solution

by:
TheAvenger earned 75 total points
ID: 11760626
Have a look at the Regex class: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfSystemTextRegularExpressionsRegexClassTopic.asp
You can learn about regular expressions here, for example: http://www.regular-expressions.info/
There are more tutorials available on the web; just search Google.
You can also test regular expressions, and even find some ready-made ones, here: http://www.regexlib.com/Search.aspx
LVL 3

Expert Comment

by:primeMover2004
ID: 11760804
Yes, regular expressions are the way to go here. To me it looks like your input strings are assembled like this: <university> <degree> <date>

The easiest approach, I think, would be to define three captures: one for the date, one for the degree and one for the university. I would then run these three expressions over the input file.

Some questions before we go for the expressions:

Can you expect some end-of-line pattern such as CR/LF?
Do you know all the strings representing a degree?
Can you give a rough estimate of the percentage of strings conforming to the <university> <degree> <date> format?
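
To make the three-capture idea concrete, here is a minimal sketch in Python (the .NET Regex class writes named groups as `(?<name>...)` instead of `(?P<name>...)`, so the patterns would need adjusting). It peels the date off the end of the line, then looks for a degree at the end of what is left, and treats the rest as the university. The `DATE` and `DEGREE` patterns, the helper names, the guess that a degree is a trailing token of two to four capital letters, and the two-digit-year pivot of 30 are all assumptions for illustration.

```python
import re

MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]
MONTHS = {}
for number, name in enumerate(MONTH_NAMES, start=1):
    MONTHS[name] = number        # full name, e.g. "august"
    MONTHS[name[:3]] = number    # three-letter abbreviation, e.g. "aug"

# Optional word or 1-2 digit number, then a 2- or 4-digit year at the end.
DATE = re.compile(
    r"(?:\b(?P<month>[A-Za-z]+|\d{1,2})\W+)?(?<!\d)(?P<year>\d{4}|\d{2})\s*$")
# Assumption: a degree is a trailing token of 2-4 capital letters.
DEGREE = re.compile(r"(?:^|[\s,])(?P<degree>[A-Z]{2,4})$")

def parse_month(text):
    """Return 1-12 for a month name or number, otherwise None."""
    if text is None:
        return None
    if text.isdigit():
        value = int(text)
        return value if 1 <= value <= 12 else None
    return MONTHS.get(text.lower())

def parse_record(line):
    """Split one line into university, degree, month and year (None if no year)."""
    date = DATE.search(line)
    if date is None:
        return None
    month = parse_month(date.group("month"))
    year = int(date.group("year"))
    if len(date.group("year")) == 2:
        year += 2000 if year < 30 else 1900   # assumed pivot for 2-digit years
    # If the word before the year was not a month, it belongs to the rest.
    cut = date.start("month") if month is not None else date.start("year")
    rest = line[:cut].rstrip(" ,/-")
    degree = DEGREE.search(rest)
    if degree:
        university = rest[:degree.start("degree")].rstrip(" ,")
        degree_text = degree.group("degree")
    else:
        university, degree_text = rest, None
    return {"university": university, "degree": degree_text,
            "month": month, "year": year}
```

On the Burlington sample this gives month 5, year 1995 and degree RN, but the university comes out as "Burlington College Ass.", because "Ass." does not fit the assumed degree shape. Lines like that are the ones that would still need a better degree list or a manual check.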

Author Comment

by:insanekid
ID: 11761929
Hi primeMover2004,

Answers to your questions:
1. Can you explain what CR and LF are?
2. No, I don't know all the strings that represent a degree.
3. Yes, most lines follow the <university> <degree> <date> format, probably about 75%.

Do you have any suggestions?

Thanks,
Fred
LVL 3

Assisted Solution

by:primeMover2004
primeMover2004 earned 75 total points
ID: 11773503

1. CR and LF stand for carriage return and line feed. They mark the end of a line, or of a record in your case.
2. Then it might be a good idea to construct a regular expression and squeeze the degree strings out of the file. Do you think that's possible? Do you have access to the file?
3. This means your application has to rely on some additional information provided by users.

My suggestion is to find out more about the file using regular expressions, and to design your application so that when the input scan finds an ambiguity, the user can provide more information. I don't think there is a reasonable solution that works fully automatically. Keep it simple.
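
One way to "squeeze" the degree strings out of the file could look like the sketch below (Python; the function name `degree_candidates`, the `min_count` threshold and the guess that degrees are written as two to four capital letters are assumptions). It splits the text into records at the line endings and counts how many records each candidate token appears in, so that frequent tokens can be reviewed as likely degrees.

```python
import re
from collections import Counter

def degree_candidates(text, min_count=2):
    """Count all-caps tokens of 2-4 letters per record and keep the frequent ones."""
    counts = Counter()
    # splitlines() splits on CR, LF and CR/LF (among other line boundaries).
    for line in text.splitlines():
        # set() so a token is counted once per record, not once per occurrence.
        counts.update(set(re.findall(r"\b[A-Z]{2,4}\b", line)))
    return [(token, n) for token, n in counts.most_common() if n >= min_count]
```

Run over the seven sample lines from the question, this keeps only BSN (seen in two records); ASN, RN and AMA each appear once and fall below the threshold. The resulting list would still need a human to confirm which tokens are really degrees.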