Solved

Retrieving a date from a "Free Text"

Posted on 2004-08-10
Last Modified: 2012-05-05
Greetings,

I am currently working on a project and I have a problem retrieving a date from a string that holds a university name and a graduation date in no fixed pattern.  By "free text" or "string without patterns" I mean that anything goes in the string.

Here are some examples:

Community College of Phildadelphia, ASN 79
Temple University BSN 1999
De La Salle University 02
Harvard University, /85
Burlington College Ass. RN 5/1995
AMA Vocational Center, August - 1998
Univeristy of Santo Tomas, Philippines, BSN May 80

As you can see:

In the first example, the graduation date is 1979, at Community College of Philadelphia, with an ASN degree.

In the second example, the graduation date is 1999, at Temple University.

In the third example, the graduation date is 2002, at De La Salle University.

In the Burlington College example: Date: May 1995, School: Burlington College, Degree: RN

In the Santo Tomas example: Date: May 1980, School: University of Santo Tomas, Philippines, Degree: BSN

My question is this:  is there a way to parse the dates from a string, given that the string doesn't have a fixed pattern?  If yes, how?  (It would be better if the degree and school were also parsed, but the date is the most important part.)

Please feel free to ask questions if anything is unclear.

Thanks,
Fred
Question by:insanekid
LVL 20

Expert Comment

by:TheAvenger
ID: 11760161
This is not possible in the general case. The best solution is to identify several distinct patterns: the year is the last 2 digits, or the last 4 digits, or sits in the middle separated by something, and so on. Then try to parse every line with every pattern you have found. If a line matches several patterns, you have a problem (you don't know which one is right), and the same goes if it matches none. After parsing all lines with all the patterns you can define, show the uncertain ones (i.e. those that matched no pattern or more than one) and give the user the option to extract the date manually.
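To make the idea concrete, here is a minimal sketch of the "try every pattern, flag the unsure lines" approach. It is written in Python purely for illustration, and everything in it is an assumption rather than a tested solution: the three patterns, the names PATTERNS, PIVOT, to_year and classify, and the rule that two-digit years below 30 belong to the 2000s.

```python
import re

# Hypothetical pattern set, one entry per layout mentioned above.
PATTERNS = {
    "last_4_digits": re.compile(r"(?<!\d)(\d{4})\s*$"),
    "last_2_digits": re.compile(r"(?<!\d)(\d{2})\s*$"),
    # A 4-digit year followed by more text, e.g. "1999 Temple University".
    "year_in_middle": re.compile(r"(?<!\d)((?:19|20)\d{2})(?!\d)(?=.*[A-Za-z])"),
}

PIVOT = 30  # assumption: two-digit years below 30 mean 20xx, the rest 19xx


def to_year(digits):
    """Turn the captured digits into a four-digit year."""
    value = int(digits)
    if len(digits) == 2:
        return 2000 + value if value < PIVOT else 1900 + value
    return value


def classify(line):
    """Return (year, matching pattern names); year is None when unsure."""
    hits = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(line)
        if match:
            hits[name] = to_year(match.group(1))
    if len(hits) == 1:
        return next(iter(hits.values())), list(hits)
    # Zero or several patterns matched: leave the decision to the user.
    return None, list(hits)


if __name__ == "__main__":
    for sample in ["Temple University BSN 1999",
                   "De La Salle University 02",
                   "Class of 1999, Temple University 02",
                   "Temple University"]:
        print(sample, "->", classify(sample))
```

Lines for which classify returns None as the year are the ones to put in front of the user. This sketch only extracts the year; a month, if present, is ignored.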

Author Comment

by:insanekid
ID: 11760565
Hi Avenger,

I was thinking of using a regular expression split, but I am not that familiar with regular expressions.  Could you help me out with this one?

Thanks,
Fred
LVL 20

Accepted Solution

by:
TheAvenger earned 300 total points
ID: 11760626
Have a look at the Regex class: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfSystemTextRegularExpressionsRegexClassTopic.asp
Learn about regular expressions, e.g. from here: http://www.regular-expressions.info/
There are more tutorials available on the web; just search Google.
You can also test regular expressions, and even find some ready-made ones, here: http://www.regexlib.com/Search.aspx
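As a starting point for experimenting, here is a small sketch of one expression that pulls an optional month and a year off the end of a line. It uses Python's re module rather than the .NET Regex class linked above (note that .NET writes named groups as (?<name>...) where Python uses (?P<name>...)). The names DATE_AT_END, MONTH_LOOKUP and parse_date_tail, and the pivot of 30 for two-digit years, are assumptions made for this example.

```python
import re

MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]

# Accept the full month name or its first three letters.
MONTH_LOOKUP = {}
for number, name in enumerate(MONTH_NAMES, start=1):
    MONTH_LOOKUP[name] = number
    MONTH_LOOKUP[name[:3]] = number

DATE_AT_END = re.compile(
    r"""
    (?:
        \b(?P<month_name>[A-Za-z]+)        # a word that may be a month name
      | (?<!\d)(?P<month_num>\d{1,2})      # or a month number such as "5"
    )?
    [\s/.,-]*                              # separators such as " ", "/" or " - "
    (?<!\d)(?P<year>\d{4}|\d{2})           # a 2- or 4-digit year
    \s*$                                   # at the very end of the line
    """,
    re.VERBOSE,
)


def parse_date_tail(line, pivot=30):
    """Return (month or None, four-digit year), or None if no year is found."""
    match = DATE_AT_END.search(line)
    if match is None:
        return None
    digits = match.group("year")
    year = int(digits)
    if len(digits) == 2:
        # Assumption: two-digit years below the pivot belong to the 2000s.
        year += 2000 if year < pivot else 1900
    month = None
    if match.group("month_name"):
        # Words that are not month names (e.g. "BSN") simply yield no month.
        month = MONTH_LOOKUP.get(match.group("month_name").lower())
    elif match.group("month_num") and 1 <= int(match.group("month_num")) <= 12:
        month = int(match.group("month_num"))
    return month, year


if __name__ == "__main__":
    print(parse_date_tail("Burlington College Ass. RN 5/1995"))
    print(parse_date_tail("AMA Vocational Center, August - 1998"))
```

The word in front of the year is captured loosely and validated in code afterwards, which keeps the expression short while you are still learning the syntax.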
LVL 3

Expert Comment

by:primeMover2004
ID: 11760804
Yes, regular expressions are the way to go here. To me it looks like your input strings are assembled like this: <university> <degree> <date>

The easiest approach would be to define three captures: one for the date, one for the degree and one for the university. I'd then run these three expressions over the input file.
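A rough sketch of what such captures could look like, combined here into a single expression with named groups and written in Python for illustration. The degree list is an assumption taken only from the examples in the question, and the names DEGREES, MONTH, RECORD and split_record are made up for this example.

```python
import re

# Assumption: only the degree abbreviations that appear in the question.
DEGREES = ["ASN", "BSN", "RN"]
DEGREE = "|".join(re.escape(d) for d in DEGREES)

MONTH = (r"jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?|"
         r"aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|"
         r"dec(?:ember)?")

RECORD = re.compile(
    r"^(?P<university>.+?)"                                # shortest prefix that works
    r"[\s,]*"
    r"(?:\b(?P<degree>" + DEGREE + r")\b)?"               # optional known degree
    r"[\s,/.-]*"
    r"(?:\b(?P<month>" + MONTH + r"|\d{1,2})[\s/.-]+)?"  # optional month
    r"(?<!\d)(?P<year>\d{4}|\d{2})\s*$",                  # 2- or 4-digit year
    re.IGNORECASE,
)


def split_record(line):
    """Return a dict with university, degree, month and year, or None."""
    match = RECORD.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(split_record("Temple University BSN 1999"))
    print(split_record("Univeristy of Santo Tomas, Philippines, BSN May 80"))
```

Anything in front of the date that is not one of the listed degrees stays in the university capture, so the quality of the split depends heavily on how complete the degree list is.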

Some questions before we go for the expressions:

Can you expect some end of line pattern such as CR/LF?
Do you know all the strings representing a degree?
Can you give a rough estimate of the percentage of strings conforming to the <university> <degree> <date> format?

Author Comment

by:insanekid
ID: 11761929
Hi primeMover2004,

Answers to your questions:
1. Can you explain what CR and LF are?
2. Nope, I don't know the strings that represent a degree.
3. Yes, it is mostly <university> <degree> <date>, hmmm... probably 75%.

Do you have any suggestions?

Thanks,
Fred
LVL 3

Assisted Solution

by:primeMover2004
primeMover2004 earned 300 total points
ID: 11773503

1. CR and LF stand for carriage return and line feed. They are used to mark the end of a line, or of a record as in your case.
2. So it might be a good idea to construct a regular expression and squeeze the degree strings out of that file. Do you think that's possible? Do you have access to the file?
3. This means your application has to rely on some additional information provided by users.

My suggestion is to find out more about the file using regular expressions, and to design your application so that if the input scanning finds an ambiguity, the user can provide more information. I don't think there's a reasonable solution that works fully automated. Keep it simple.
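One possible way to "find out more about the file" is to reduce the end of every record to a coarse shape and count how often each shape occurs. The following is only a sketch in Python; the names tail_shape and survey, and the choice to look at the last two whitespace-separated tokens, are assumptions for this example.

```python
import re
from collections import Counter


def tail_shape(line, tokens=2):
    """Reduce the last few tokens to a shape: letter runs -> A, digits -> 9."""
    tail = " ".join(line.split()[-tokens:])
    shape = re.sub(r"[A-Za-z]+", "A", tail)
    return re.sub(r"\d", "9", shape)


def survey(text):
    """Count the tail shapes of all non-empty records in the text."""
    # splitlines() treats CR/LF and a bare LF alike as record separators.
    lines = [line for line in text.splitlines() if line.strip()]
    return Counter(tail_shape(line) for line in lines)


if __name__ == "__main__":
    sample = ("Community College of Phildadelphia, ASN 79\r\n"
              "Temple University BSN 1999\r\n"
              "Burlington College Ass. RN 5/1995\r\n")
    for shape, count in survey(sample).most_common():
        print(count, shape)
```

The most frequent shapes tell you which expressions are worth writing first, and the rare ones are candidates for the "ask the user" path.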