Andrea Edwards
asked on
Building a glossary into your website
Hi
I store descriptions in a database table. Let's call the table Organisation and the field 'description'. Any word or phrase in this description may be an important word (call it a 'term') that links to another table, Glossary, which naturally contains definitions and information about the matching term. The Glossary also has a link to an external resource giving 'proper' information about that term on an external site.
In terms of displaying a word/phrase that is in the glossary: I want the webpage to visually highlight the terms in the description field that are in the glossary. There will probably be a pop-up when the user hovers the mouse over the term, or perhaps a link to an external site. That isn't really where I am stuck.
Another important point is that the glossary table is small compared to the number and size of the description fields and the words they contain.
I have seen a solution where you iterate over word tuples in the description field and see if they match an entry in the Glossary table. This article uses this solution, give or take (http://moinne.com/blog/ronald/mysql/mysql-creating-a-link-to-your-glossary-while-fetching-text-for-a-webpage), but there is a comment regarding speed and performance.
When I build the description field I could put some unique tag / regex around words that belong in the glossary, as I am writing the descriptions myself and I know which terms in the description have a glossary link. This way I only need to look up the words that match my chosen regex, so this might speed things up a bit. On the downside, if I added a new glossary entry I would have to run some batch job which looked through all description fields for this glossary term and added my tag to them.
I also read something about having the glossary on the client side in cache in this post (http://stackoverflow.com/questions/23159206/dynamic-glossary-based-on-php-and-mysql-performance-ideas) but I didn't really understand what they were talking about. Perhaps they mean to add the glossary on the client side and search it there before the page is loaded. But I am not really sure what they mean. They do mention storing the glossary in JSON.
I could use some sort of AJAX approach, fetching the glossary entry when the user hovers the mouse over the term, but I really need to visually indicate the terms as glossary terms in advance when I write the page; otherwise how would the user know to hover the mouse, or that there was a glossary entry related to the term?
So to summarise, would you:
a) do the lookup on the server with the database idea? Perhaps this can be optimised (if so, please indicate how)
b) have the glossary on the client side (if so, please provide pointers on how)
c) use the regex and build the glossary into the site from the start, and have 'jobs' to update the database if a new glossary term is added
d) use another option altogether that I have missed
Thanks in advance
ASKER
Thank you for your detailed reply. I will need a little time to assimilate it before I choose it as the solution, but your answers have never let me down in the past :)
Sure, please take your time and ask follow-on questions as needed. This is a really great question and it deserves thoughtful attention!
ASKER
good. :)
I have read your answer and it seems to rely on the queries being 'GET' queries. I don't have a problem with GET queries (I know some people who are sticklers for POST), but if I consider the large number of input variables in the query string and their range of values, the maths gives quite a large number of page permutations.
Very clever to consider suffixes and prefixes, or even plain old plurals. I hadn't considered them yet.
Does the large number of dynamic pages that can arise from query string permutations change your solution? I will re-read the cache article, but I don't know when you will be around so I am asking the best I can from what I have gleaned so far. I don't know whether this is correct design principle, but I thought I might wireframe the site first. Most people I know say this is unnecessary (true ontologists who want truly independent code would call this the interaction problem: when you write something from the standpoint of what it must do, you are not writing truly independent code).
I disagree with this, but I have not been eloquent enough on my feet to say why. Even if it's something as simple as: you need to know what a project does to make sure you have all bases covered.
In essence, does the large number of dynamic pages change your viewpoint on the solution? Some of the pages are based on date and time queries, which may not even need to be cached if they expire.
Many thanks
ASKER
Thanks Ray. I haven't used REST for about 5 years. I had not heard that we use GET to acquire and POST to change. Makes sense though.
I was indeed wireframing my site as I reloaded this page :)
I've not used CRC cards for wireframing. I use a diagrammatic tool, but I will give these a go.
I've only ever done OO in Java and .NET, not PHP. I don't need OO explaining, but what would you use objects for? To map a table to an object? I use an OR mapper called NotORM, which is very basic and which I can use with little learning curve.
Regarding budget - it's a personal project which I think my community needs so I am writing it. I can set the server how I want.
If you have OO experience in Java, you will pick up the PHP part of things very quickly.
http://php.net/manual/en/language.oop5.php
If you are willing to consider a framework (and I sincerely hope you will do so), please consider Laravel. It's mature, but currently maintained and very well thought-of in the PHP community. Basically, there is Laravel, and there is everything else. You might also take a look at Slim, but I would probably choose Laravel or Lumen.
https://laravel.com/docs/5.4
Laravel has its own ORM, called "Eloquent." Also mature, maintained currently, and very well loved by the community.
https://laravel.com/docs/5.4/eloquent
I would say, at a high level, these would probably be my important objects:
1. a Description in its original form
2. a Term and its associated Glossary information
3. a Description after the Terms are defined (and it's in the final form for presentation to the client)
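As a rough illustration of those three objects, here is a minimal sketch. It is written in Python only to keep it short and language-neutral; the same shapes map directly onto PHP classes or Eloquent models. The class and field names (`Description`, `Term`, `RenderedDescription`, `organisation_id`, `terms_used`) are assumptions made for this example, not names from your schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Description:
    """A description exactly as stored in the database (plain text)."""
    organisation_id: int
    text: str


@dataclass(frozen=True)
class Term:
    """A glossary term with its definition and external reference link."""
    term: str
    definition: str
    url: str


@dataclass(frozen=True)
class RenderedDescription:
    """A description after glossary terms are marked up, ready for the client."""
    organisation_id: int
    html: str
    terms_used: Tuple[Term, ...]
```

Keeping the original and the rendered form as separate objects means the stored text is never modified, so a glossary change only requires the rendered form to be rebuilt.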
Erm, I just read the 2014 StackOverflow question. Don't do that! But please post back if you still have any questions.
ASKER
Extremely thorough answer. Very helpful to a programmer working on her own without a team to ask questions to. My sincere thanks
You're serving up mostly static pages - information for readers. The design should use GET-method requests with all the request variables in the URL. This lends itself well to cache. Cached pages mitigate all of the issues associated with long-running database queries. This article explains a bit more about the concept.
https://www.experts-exchange.com/articles/18437/Improving-Web-Site-Performance-via-PHP-Cache.html
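To make the idea concrete, here is a minimal sketch of a file cache keyed on the GET URL. It is not the code from the article, and it is written in Python purely as a language-neutral illustration; `cache_key`, `get_page`, `build_page` and the two-week default lifetime are all assumptions for this example. Sorting the query parameters means that two URLs carrying the same variables in a different order share one cache entry.

```python
import hashlib
import time
from pathlib import Path
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url):
    """Derive a stable key from the path and the sorted query parameters."""
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    canonical = "{}?{}".format(parts.path, query)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def get_page(url, build_page, cache_dir, max_age_seconds=14 * 24 * 3600):
    """Return the cached page for url, rebuilding it only if missing or stale."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / (cache_key(url) + ".html")
    if cache_file.exists():
        age = time.time() - cache_file.stat().st_mtime
        if age < max_age_seconds:
            return cache_file.read_text(encoding="utf-8")
    # The slow part (database queries, glossary lookups) runs only on a miss.
    page = build_page(url)
    cache_file.write_text(page, encoding="utf-8")
    return page
```

Only the URLs that are actually requested ever get a cache file, so the number of theoretically possible query-string combinations does not by itself determine the size of the cache.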
So here is the design of the long-running job that builds each web page. Take the plain text, break it into words, and for each "important" word, try a lookup in the glossary. If there is a match in the glossary, replace the word with the appropriate links, popups, etc. The unimportant words are often called "stop words" and there is an example below. When the modified text is complete, generate the web page which will automatically get cached, according to the design in the article. Use a long-term cache duration (perhaps weeks).
Now let's say you've added to the glossary or created a new description and you need to rebuild the web pages to reflect the new content. Run a script that flushes the cache. Any new HTTP request that comes in will cause the web page to be rebuilt and the latest version will be cached. So you want to have a handy script that contains all of the links to the documents in the web site. Immediately after you flush the cache, visit each of those hyperlinks with file_get_contents(). Doing so will cause the page to get rebuilt and stored in the cache.
I've used this design for a church web site that had dozens of dynamic elements on the home page - bible readings, announcements, sermon information, music information, etc. The page itself took several seconds to generate as it gathered information from databases, files and APIs. But with the cache, it always produced sub-second response. You might consider a CRON job that will flush the cache and then visit all of the web pages in your site. You could run this CRON at an hour when it's unlikely that there will be a lot of visitors.
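The flush-then-revisit job could look roughly like the sketch below. It is in Python for brevity (in PHP the visit would be the file_get_contents() call mentioned above), and it assumes the cache is a directory of `.html` files; `flush_cache`, `warm_cache` and `fetch` are hypothetical names for this example.

```python
from pathlib import Path
from urllib.request import urlopen


def flush_cache(cache_dir):
    """Delete every cached page file and return how many were removed."""
    removed = 0
    for cached in Path(cache_dir).glob("*.html"):
        cached.unlink()
        removed += 1
    return removed


def fetch(url):
    """Request a page over HTTP so that the site rebuilds and caches it."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def warm_cache(urls, fetch_page=fetch):
    """Visit each URL once; return the URLs that failed so they can be retried."""
    failed = []
    for url in urls:
        try:
            fetch_page(url)
        except OSError:  # urllib's URLError and socket timeouts are OSErrors
            failed.append(url)
    return failed
```

A scheduled job would call `flush_cache` and then `warm_cache` with the list of site links; any URLs returned by `warm_cache` can simply be left to rebuild on the next real visit.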
Here's an example that illustrates stop words (see $exclusions).
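What follows is not that original snippet but a minimal stand-in sketch of the same idea, written in Python as a language-neutral illustration. The stop-word list, the glossary rows, the `example.org` URLs and the function name `link_terms` are all made up for this example; in practice the glossary would be loaded once from the Glossary table.

```python
import html
import re

# Hypothetical stop-word list; a real one would be much longer.
EXCLUSIONS = {"a", "an", "and", "the", "of", "in", "is", "to", "for"}

# Hypothetical glossary rows: lower-cased term -> (definition, external URL).
GLOSSARY = {
    "charity": ("An organisation set up for public benefit.",
                "https://example.org/charity"),
    "trustee": ("A person who governs a charity.",
                "https://example.org/trustee"),
}

WORD = re.compile(r"[A-Za-z']+")


def link_terms(text, glossary=GLOSSARY, exclusions=EXCLUSIONS):
    """Return HTML in which every glossary word is wrapped in a link."""
    out = []
    pos = 0
    for match in WORD.finditer(text):
        # Copy (and escape) whatever sits between the previous word and this one.
        out.append(html.escape(text[pos:match.start()]))
        word = match.group()
        key = word.lower()
        # Stop words are skipped without ever touching the glossary.
        entry = None if key in exclusions else glossary.get(key)
        if entry is None:
            out.append(html.escape(word))
        else:
            definition, url = entry
            out.append('<a class="glossary" href="{}" title="{}">{}</a>'.format(
                html.escape(url), html.escape(definition), html.escape(word)))
        pos = match.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

This sketch matches single words only. Multi-word phrases, plurals, prefixes and suffixes would need extra handling, for example normalising each word before the lookup.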