We help IT Professionals succeed at work.

Finding Unique Keyword Phrases in a Text Field (Matching a predefined list)

mandrewmc
mandrewmc asked
on
589 Views
Last Modified: 2010-04-27
Here is the basic gist of what I would like to do.

I have a field with an article/text in it. What I would like to do is have filemaker be able to automatically look in that field and see if it can find any keywords that are already in predefined list stored in another table. So I might have text based information that contains references to different cities, locations, etc. Lets say "Atlanta," "New York," and "San Francisco"

I'd like to be able to look in the field that contains the text, scan it to see if it contains any references to the information, then have the script or calculation return those keywords in a field below it which would contain the matching locations for that article. I know that it can't be too difficult but I'm not sure what the best way to script it would be. So it might look at the article and say aha, this article mentions San Francisco and Atlanta.

I would want one script that could just run on that one field then perhaps a script that could go through a whole range of records looking and extracting the matching keywords in the text field that contains the article/text information. So that I could import a bunch of records and then be able to quickly classify them based upon the pre-defined list in another table of match phrases.

Still learning filemaker, big points to whomever can help me figure out an elegant solution since its quite helpful for my daily work!

thanks!

****Update****
I've figured out the basics from a solution elsewhere on the web to loop through and add they keywords to a field. What I do not know how to do is run a script for JUST the record I am on and also to have it so it only adds the NEW values found, and not necessarily keywords it found on any previous pass. I don't want to have to wipe out the value list each time I process it though that would be acceptable I guess. But I would want to be able to process on a single record instead of having to loop through the entire set.

Here is what I figured out that works for a set:
Script 1 (the loop which calls script 2):

Goto Layout ["keyword tag match" (Keyword Tag Match)]
Show All Records
Goto Record/Request/Page [First]
Loop
Perform Script["Script 2"; Parameter: Keyword Tag Match::KeyPhrase]
Goto Layout ["Keyword Tag Match" (Keyword Tag Match)]
Goto Record/Request/Page [Next; Exit After Last]
End Loop

Script 2:

Goto Layout ["News" (News)]
Show All Records
Goto Record/Request/Page [First]
Loop
If [Position (News::Article Content; Get (ScriptParameter); 1; 1) > 0 ]
Set Field [News::Keyword Matches; 1]
Insert Calculated Results [News::Keyword Match Text; Get(ScriptParameter) & ", "]
End If
Goto Record/Request/Page [Next; Exit After Last]







Comment
Watch Question

>What I do not know how to do is run a script for JUST the record I am on
This should be pretty simple. The current script is deliberately showing all records and then looping through them, so all you need to do is make a copy of Script2, then delete everything except the part inside the loop.

>also to have it so it only adds the NEW values found, and not necessarily keywords it found on any
>previous pass.

This should also be fairly straightforward. Basically, you just need an IF that tests to see if the currently found keyword already exists in the list, and only adds it on if it doesn't. There are a couple of ways you could approach this, either by using an IF in the script itself, or you could simply always append a value to the list of keywords, but base the value on an if() function. That approach is less intuitive, but might be faster if you're worried about performance.

I'd like to point out a possible improvement to your current approach. It seems that you are looping through a list of keywords, and then for each keyword looping through all the stories. It would probably be more efficient to reverse that, and loop through all the stories only one time, then in each story loop through all the keywords in turn. Unless you've got a huge list of keywords, this approach will probably be faster.

Something else you might want to think about...
In a situation like this, where each story can have multiple keywords, and each keyword can be in multiple stories, you've got what's called a many-to-many relationship. Often in these cases, it's best to create an intermediary table which links the two together.

It's more complicated to set up initially, but the advantage is much greater flexibility in the long run, especially when it comes to ading new features or creating sophisticated reports. It's also a more efficient way to store data than storing multiple copies of the keywords in lots of tables. It might be overkill to do this, depends on what your long-term plans are for this database.  


Author

Commented:
I'm likely to have more stories then keywords, though the list would grow over time.

I tried creating copies of the scripts then taking out the loop in script 2 so the script is just from the "If...EndIf" and tried running it from the record I was on. However, it didn't seem to do anything. Is there some step to tell filemaker what record to process?

It seems there must be since when I run script 1, it goes to the keyword layout first, runs through they keywords, calling the Script 2. How does it then know which layout/record to goto on the Script 2? Wouldn't I have to set it some way and tell it to go to the record/layout I was first on when I called the script? (in the current version of running through all the keywords first and checking the articles)


thanks!

Author

Commented:
Ok actually I got it to work when I told it to Goto the "News" layout in Script 2. I guess it just automatically goes back to whatever record you were last on and that is why it works?

Can you give an example of how to implement the "if" in the script to check if a keyword has been written? I'll award credit for that.

I would, however, appreciate an example of how you would design a many to many relationship for this example. I'm learning databases so every concrete example helps, especially if you think its an approach which is better and would scale more efficiently over time. I'm not sure I completely understand what you mean.

thanks!



>Ok actually I got it to work when I told it to Goto the "News" layout in Script 2. I guess it just automatically
>goes back to whatever record you were last on and that is why it works?

Basically, yes. If you are in the wrong layout, then your script may not do what you expect. Each layout in your file is associated with a specific table. Presumably, you have a table for stories, and one for keywords, and each one has at least one layout. When you switch from the stories layout to the keywords layout, you're effectively switching which table you are looking at as well. As long as you don't do anything in this window that alters your position or the found set of records, you will come back to the same place you left off.

This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION
With respect to the many-to-many relationship, each record in the table needs to have a unique identifier that can be used as an index to refer to that specific record. This is called the primary key. This value should be auto-entered by the system, not entered by a user, and users must not be allowed to edit or change it. In FileMaker, this is usually done by making the ID field an auto-enter serial number.

When each of your two main tables has such an id set up, you then create a third table, which then contains the two ID numbers from the other two tables. This way, each keyword is only stored in one place, and the numbers are used to refer to it, which saves storage space. Also, this makes it easy to find out things like what keyword is the most commonly found one in a set of stories, and that sort of thing.

Here's a very simple example. Open it up and look at the relationships, the definitions of the value lists, and how the fields on the Links layout are formatted. Also, notice that the Links layout can display values from the other two tables without having to store them.

http://www.milleniumhandandshrimp.com/upload/Many2many_keywords.fp7

Author

Commented:
Thanks for this Bill, I see what you mean by linking the tables.

I guess you would then apply a version of the script on the in between file that links the two? As far as I can tell this file just provides a way to assign by numbers. But I see what you mean on how this makes things more efficient, because then you are only storing keywords and articles in one place, and the numbers/id that match in another. So in the article file it would have numbers that represent the id of the keyword that matched, rather then entering the keyword again on every single article. So you would be pointing back to the other table instead of duplicating the content and storing the keyword again in the file.

I was able to modify my little script so that it checks the field to see if the keyword is there and if it is, does nothing, if not it enters the keyword into the field.

thanks for the help, learning about databases has been exciting because I can see how they can help to speed up and manage daily tasks. I am basically trying to build something to help classify and segment news stories for a website in which new content is added every day and there are a lot of stories to process. Searching for keywords helps to quickly classify without necessarily having to read and scan for every mention in the documents.

> As far as I can tell this file just provides a way to assign by numbers.
Yes, that's correct. But it's not just more efficient for storage, it's much more flexible. For example, if you want to know how many times a given keyword is linked to, you simply count the number of link records with that keyword id. And if you want to know how many times a story was linked to, you count the records with that story id.

Also, creating and deleting these link records can be done without having to make changes to the data in the other tables, which is important if you're tracking modification and creation timestamps. You don't want to flag a story as having been modified just because you changed the keywords list. This avoids that sort of problem.

Unlock the solution to this question.
Join our community and discover your potential

Experts Exchange is the only place where you can interact directly with leading experts in the technology field. Become a member today and access the collective knowledge of thousands of technology experts.

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.