?
Solved

Parsing text files and putting it in a table or CSV

Posted on 2007-11-28
5
Medium Priority
?
451 Views
Last Modified: 2013-11-15
I am trying to parse a public domain dictionary.  The text files look at this.

Kamala (n.) The red dusty hairs of the capsules of an East Indian tree (Mallotus Philippinensis) used for dyeing silk. It is violently emetic, and is used in the treatment of tapeworm.

Kame (n.) A low ridge.

Kami (n. pl.) A title given to the celestial gods of the first mythical dynasty of Japan and extended to the demigods of the second dynasty, and then to the long line of spiritual princes still represented by the mikado.

All the words are broken up like 3 examples above: word (type of word) meaning.  There would be 3 columns parsed.   Everything after the (type of word) is the definitions and might or might not contain ().

Thanks for the help.
S
0
Comment
Question by:StaceyShindo
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
5 Comments
 
LVL 6

Expert Comment

by:cottsak
ID: 20371407
in summary, i'd use regex to capture the different parts of each def and then iterate the groups collection for the columns.
use named/numbered regex groups to get the parts:
http://msdn2.microsoft.com/en-au/library/bs2twtah.aspx

the regex expression might look like this:
^(?<word>\w*)\s\((?<type>[a-zA-Z_. ]*)\)\s(?<meaning>.*)
ive tested the above on ur samples. it works assuming (1) that u dont break across lines in the meaning section and (2) that you have the .NET MultiLine Regex option enabled (just a parameter u pass in)
it accounts for ur last requirement with the () in the meaning section. u may need to expand the character class for the type section if further type sections contain other chatacters - do some Regex reading on Character Classes if necessary.

because i've explicitly named my groups in the expression u can reference the group collection item with the string name
hav fun ;D
0
 
LVL 18

Expert Comment

by:Jinesh Kamdar
ID: 20371434
So where is it that you want to put them after parsing? In some DB?
0
 

Author Comment

by:StaceyShindo
ID: 20371492
I want to ultimately put them in a db, but if we can get them in a Excel sheet or CSV file, that would be fine.  At the end of the day, we are going to put them in MS-SQL.

Thanks,
S
0
 
LVL 4

Accepted Solution

by:
origfla earned 500 total points
ID: 20371558
Here is a solution - it will work in Excel as a quick way to do the job.
Paste your dictionary into a column in an excel spreadsheet - say column C. and say your first entry is in cell "C4".

TO GET THE   "WORD"   PART:
in cell D4, paste the following function:
=LEFT(C4,FIND("(",C4)-2)

TO GET THE   "TYPE OF WORD"   PART:
in cell E4, paste the following function:
=MID(C4,FIND("(",C4),FIND(")",C4)-FIND("(",C4)+1)

TO GET THE   "DEFINITION"   PART:
in cell F4, paste the following function:
=MID(C4,FIND(")",C4)+2,9999)

Now, all you have to do is copy the formula down along the D,E and F columns. You can do this by:
 - Highlighting cells D4,E4 & F4 all at once.
 - Place the cursor over the small square at the right bottom of cell F4 (cursor will become a "+" shape)
 - Double click while cursor is the "+" shape, it will copy the formulas all the way down the list of definitions.

FINALLY, if you want the columns D,E & F as vlaues instead of formulas, so you can use them as you wish, simply select the entire columns D,E & F and go Edit-> Copy.
Then, select any other Column that is empty - Say Column I, then go Edit->Paste Special
and paste Values only.

This is my first reply, so I hope this helps and that this gives you what you're looking for.
0
 

Author Closing Comment

by:StaceyShindo
ID: 31411584
This is a great solution that is easy and works.  It will take some more work to copy and paste the dictionary, but that is fine, I am not afraid of a little work.  Thank you so much for your help.

S
0

Featured Post

Get your Conversational Ransomware Defense e‑book

This e-book gives you an insight into the ransomware threat and reviews the fundamentals of top-notch ransomware preparedness and recovery. To help you protect yourself and your organization. The initial infection may be inevitable, so the best protection is to be fully prepared.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Microsoft Access is a place to store data within tables and represent this stored data using multiple database objects such as in form of macros, forms, reports, etc. After a MS Access database is created there is need to improve the performance and…
This post looks at MongoDB and MySQL, and covers high-level MongoDB strengths, weaknesses, features, and uses from the perspective of an SQL user.
Excel styles will make formatting consistent and let you apply and change formatting faster. In this tutorial, you'll learn how to use Excel's built-in styles, how to modify styles, and how to create your own. You'll also learn how to use your custo…
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question