Solved

Parsing text files and putting it in a table or CSV

Posted on 2007-11-28
5
411 Views
Last Modified: 2013-11-15
I am trying to parse a public domain dictionary.  The text files look at this.

Kamala (n.) The red dusty hairs of the capsules of an East Indian tree (Mallotus Philippinensis) used for dyeing silk. It is violently emetic, and is used in the treatment of tapeworm.

Kame (n.) A low ridge.

Kami (n. pl.) A title given to the celestial gods of the first mythical dynasty of Japan and extended to the demigods of the second dynasty, and then to the long line of spiritual princes still represented by the mikado.

All the words are broken up like 3 examples above: word (type of word) meaning.  There would be 3 columns parsed.   Everything after the (type of word) is the definitions and might or might not contain ().

Thanks for the help.
S
0
Comment
Question by:StaceyShindo
5 Comments
 
LVL 6

Expert Comment

by:cottsak
Comment Utility
in summary, i'd use regex to capture the different parts of each def and then iterate the groups collection for the columns.
use named/numbered regex groups to get the parts:
http://msdn2.microsoft.com/en-au/library/bs2twtah.aspx

the regex expression might look like this:
^(?<word>\w*)\s\((?<type>[a-zA-Z_. ]*)\)\s(?<meaning>.*)
ive tested the above on ur samples. it works assuming (1) that u dont break across lines in the meaning section and (2) that you have the .NET MultiLine Regex option enabled (just a parameter u pass in)
it accounts for ur last requirement with the () in the meaning section. u may need to expand the character class for the type section if further type sections contain other chatacters - do some Regex reading on Character Classes if necessary.

because i've explicitly named my groups in the expression u can reference the group collection item with the string name
hav fun ;D
0
 
LVL 18

Expert Comment

by:Jinesh Kamdar
Comment Utility
So where is it that you want to put them after parsing? In some DB?
0
 

Author Comment

by:StaceyShindo
Comment Utility
I want to ultimately put them in a db, but if we can get them in a Excel sheet or CSV file, that would be fine.  At the end of the day, we are going to put them in MS-SQL.

Thanks,
S
0
 
LVL 4

Accepted Solution

by:
origfla earned 125 total points
Comment Utility
Here is a solution - it will work in Excel as a quick way to do the job.
Paste your dictionary into a column in an excel spreadsheet - say column C. and say your first entry is in cell "C4".

TO GET THE   "WORD"   PART:
in cell D4, paste the following function:
=LEFT(C4,FIND("(",C4)-2)

TO GET THE   "TYPE OF WORD"   PART:
in cell E4, paste the following function:
=MID(C4,FIND("(",C4),FIND(")",C4)-FIND("(",C4)+1)

TO GET THE   "DEFINITION"   PART:
in cell F4, paste the following function:
=MID(C4,FIND(")",C4)+2,9999)

Now, all you have to do is copy the formula down along the D,E and F columns. You can do this by:
 - Highlighting cells D4,E4 & F4 all at once.
 - Place the cursor over the small square at the right bottom of cell F4 (cursor will become a "+" shape)
 - Double click while cursor is the "+" shape, it will copy the formulas all the way down the list of definitions.

FINALLY, if you want the columns D,E & F as vlaues instead of formulas, so you can use them as you wish, simply select the entire columns D,E & F and go Edit-> Copy.
Then, select any other Column that is empty - Say Column I, then go Edit->Paste Special
and paste Values only.

This is my first reply, so I hope this helps and that this gives you what you're looking for.
0
 

Author Closing Comment

by:StaceyShindo
Comment Utility
This is a great solution that is easy and works.  It will take some more work to copy and paste the dictionary, but that is fine, I am not afraid of a little work.  Thank you so much for your help.

S
0

Featured Post

What Should I Do With This Threat Intelligence?

Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

Join & Write a Comment

Using SQL Scripts we can save all the SQL queries as files that we use very frequently on our database later point of time. This is one of the feature present under SQL Workshop in Oracle Application Express.
Never store passwords in plain text or just their hash: it seems a no-brainier, but there are still plenty of people doing that. I present the why and how on this subject, offering my own real life solution that you can implement right away, bringin…
The viewer will learn how to simulate a series of sales calls dependent on a single skill level and learn how to simulate a series of sales calls dependent on two skill levels. Simulating Independent Sales Calls: Enter .75 into cell C2 – “skill leve…
This Micro Tutorial will demonstrate in Google Sheets how to use the HYPERLINK function to create live links inside your spreadsheet.

728 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

14 Experts available now in Live!

Get 1:1 Help Now