Solved

Parsing text files and putting it in a table or CSV

Posted on 2007-11-28
5
438 Views
Last Modified: 2013-11-15
I am trying to parse a public domain dictionary.  The text files look at this.

Kamala (n.) The red dusty hairs of the capsules of an East Indian tree (Mallotus Philippinensis) used for dyeing silk. It is violently emetic, and is used in the treatment of tapeworm.

Kame (n.) A low ridge.

Kami (n. pl.) A title given to the celestial gods of the first mythical dynasty of Japan and extended to the demigods of the second dynasty, and then to the long line of spiritual princes still represented by the mikado.

All the words are broken up like 3 examples above: word (type of word) meaning.  There would be 3 columns parsed.   Everything after the (type of word) is the definitions and might or might not contain ().

Thanks for the help.
S
0
Comment
Question by:StaceyShindo
5 Comments
 
LVL 6

Expert Comment

by:cottsak
ID: 20371407
in summary, i'd use regex to capture the different parts of each def and then iterate the groups collection for the columns.
use named/numbered regex groups to get the parts:
http://msdn2.microsoft.com/en-au/library/bs2twtah.aspx

the regex expression might look like this:
^(?<word>\w*)\s\((?<type>[a-zA-Z_. ]*)\)\s(?<meaning>.*)
ive tested the above on ur samples. it works assuming (1) that u dont break across lines in the meaning section and (2) that you have the .NET MultiLine Regex option enabled (just a parameter u pass in)
it accounts for ur last requirement with the () in the meaning section. u may need to expand the character class for the type section if further type sections contain other chatacters - do some Regex reading on Character Classes if necessary.

because i've explicitly named my groups in the expression u can reference the group collection item with the string name
hav fun ;D
0
 
LVL 18

Expert Comment

by:Jinesh Kamdar
ID: 20371434
So where is it that you want to put them after parsing? In some DB?
0
 

Author Comment

by:StaceyShindo
ID: 20371492
I want to ultimately put them in a db, but if we can get them in a Excel sheet or CSV file, that would be fine.  At the end of the day, we are going to put them in MS-SQL.

Thanks,
S
0
 
LVL 4

Accepted Solution

by:
origfla earned 125 total points
ID: 20371558
Here is a solution - it will work in Excel as a quick way to do the job.
Paste your dictionary into a column in an excel spreadsheet - say column C. and say your first entry is in cell "C4".

TO GET THE   "WORD"   PART:
in cell D4, paste the following function:
=LEFT(C4,FIND("(",C4)-2)

TO GET THE   "TYPE OF WORD"   PART:
in cell E4, paste the following function:
=MID(C4,FIND("(",C4),FIND(")",C4)-FIND("(",C4)+1)

TO GET THE   "DEFINITION"   PART:
in cell F4, paste the following function:
=MID(C4,FIND(")",C4)+2,9999)

Now, all you have to do is copy the formula down along the D,E and F columns. You can do this by:
 - Highlighting cells D4,E4 & F4 all at once.
 - Place the cursor over the small square at the right bottom of cell F4 (cursor will become a "+" shape)
 - Double click while cursor is the "+" shape, it will copy the formulas all the way down the list of definitions.

FINALLY, if you want the columns D,E & F as vlaues instead of formulas, so you can use them as you wish, simply select the entire columns D,E & F and go Edit-> Copy.
Then, select any other Column that is empty - Say Column I, then go Edit->Paste Special
and paste Values only.

This is my first reply, so I hope this helps and that this gives you what you're looking for.
0
 

Author Closing Comment

by:StaceyShindo
ID: 31411584
This is a great solution that is easy and works.  It will take some more work to copy and paste the dictionary, but that is fine, I am not afraid of a little work.  Thank you so much for your help.

S
0

Featured Post

Netscaler Common Configuration How To guides

If you use NetScaler you will want to see these guides. The NetScaler How To Guides show administrators how to get NetScaler up and configured by providing instructions for common scenarios and some not so common ones.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

CCModeler offers a way to enter basic information like entities, attributes and relationships and export them as yEd or erviz diagram. It also can import existing Access or SQL Server tables with relationships.
This article descibes how to create a connection between Excel and SAP and how to move data from Excel to SAP or the other way around.
This Micro Tutorial demonstrates how to create Excel charts: column, area, line, bar, and scatter charts. Formatting tips are provided as well.
Excel styles will make formatting consistent and let you apply and change formatting faster. In this tutorial, you'll learn how to use Excel's built-in styles, how to modify styles, and how to create your own. You'll also learn how to use your custo…

828 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question