Solved

Parsing text files and putting it in a table or CSV

Posted on 2007-11-28
5
445 Views
Last Modified: 2013-11-15
I am trying to parse a public domain dictionary.  The text files look at this.

Kamala (n.) The red dusty hairs of the capsules of an East Indian tree (Mallotus Philippinensis) used for dyeing silk. It is violently emetic, and is used in the treatment of tapeworm.

Kame (n.) A low ridge.

Kami (n. pl.) A title given to the celestial gods of the first mythical dynasty of Japan and extended to the demigods of the second dynasty, and then to the long line of spiritual princes still represented by the mikado.

All the words are broken up like 3 examples above: word (type of word) meaning.  There would be 3 columns parsed.   Everything after the (type of word) is the definitions and might or might not contain ().

Thanks for the help.
S
0
Comment
Question by:StaceyShindo
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
5 Comments
 
LVL 6

Expert Comment

by:cottsak
ID: 20371407
in summary, i'd use regex to capture the different parts of each def and then iterate the groups collection for the columns.
use named/numbered regex groups to get the parts:
http://msdn2.microsoft.com/en-au/library/bs2twtah.aspx

the regex expression might look like this:
^(?<word>\w*)\s\((?<type>[a-zA-Z_. ]*)\)\s(?<meaning>.*)
ive tested the above on ur samples. it works assuming (1) that u dont break across lines in the meaning section and (2) that you have the .NET MultiLine Regex option enabled (just a parameter u pass in)
it accounts for ur last requirement with the () in the meaning section. u may need to expand the character class for the type section if further type sections contain other chatacters - do some Regex reading on Character Classes if necessary.

because i've explicitly named my groups in the expression u can reference the group collection item with the string name
hav fun ;D
0
 
LVL 18

Expert Comment

by:Jinesh Kamdar
ID: 20371434
So where is it that you want to put them after parsing? In some DB?
0
 

Author Comment

by:StaceyShindo
ID: 20371492
I want to ultimately put them in a db, but if we can get them in a Excel sheet or CSV file, that would be fine.  At the end of the day, we are going to put them in MS-SQL.

Thanks,
S
0
 
LVL 4

Accepted Solution

by:
origfla earned 125 total points
ID: 20371558
Here is a solution - it will work in Excel as a quick way to do the job.
Paste your dictionary into a column in an excel spreadsheet - say column C. and say your first entry is in cell "C4".

TO GET THE   "WORD"   PART:
in cell D4, paste the following function:
=LEFT(C4,FIND("(",C4)-2)

TO GET THE   "TYPE OF WORD"   PART:
in cell E4, paste the following function:
=MID(C4,FIND("(",C4),FIND(")",C4)-FIND("(",C4)+1)

TO GET THE   "DEFINITION"   PART:
in cell F4, paste the following function:
=MID(C4,FIND(")",C4)+2,9999)

Now, all you have to do is copy the formula down along the D,E and F columns. You can do this by:
 - Highlighting cells D4,E4 & F4 all at once.
 - Place the cursor over the small square at the right bottom of cell F4 (cursor will become a "+" shape)
 - Double click while cursor is the "+" shape, it will copy the formulas all the way down the list of definitions.

FINALLY, if you want the columns D,E & F as vlaues instead of formulas, so you can use them as you wish, simply select the entire columns D,E & F and go Edit-> Copy.
Then, select any other Column that is empty - Say Column I, then go Edit->Paste Special
and paste Values only.

This is my first reply, so I hope this helps and that this gives you what you're looking for.
0
 

Author Closing Comment

by:StaceyShindo
ID: 31411584
This is a great solution that is easy and works.  It will take some more work to copy and paste the dictionary, but that is fine, I am not afraid of a little work.  Thank you so much for your help.

S
0

Featured Post

Microsoft Certification Exam 74-409

Veeam® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Recently I was talking with Tim Sharp, one of my colleagues from our Technical Account Manager team about MongoDB’s scalability. While doing some quick training with some of the Percona team, Tim brought something to my attention...
Your data is at risk. Probably more today that at any other time in history. There are simply more people with more access to the Web with bad intentions.
This Micro Tutorial will demonstrate on a Mac how to change the sort order for chart legend values and decrpyt the intimidating chart menu.
Many functions in Excel can make decisions. The most simple of these is the IF function: it returns a value depending on whether a condition you describe is true or false. Once you get the hang of using the IF function, you will find it easier to us…

726 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question