Approach to mitigating sloppy operator input or apparently extraneous search term constructs.

How much effort do you Experts think should go into defensively 'protecting' a text input field (which will be used to obtain key search terms for a text search) from possibly irrelevant operator input, such as too many space characters, punctuation and other tokens that might not help an effective search strategy? I'm relatively OK with implementing methods that are considered helpful, so this is not a request for coding assistance, but rather a question about approach and real-life search term management and oversight. (I suppose another way of putting the same point is that I wish to avoid second-guessing the inputter, whose attempts may well be valid but look odd when viewed as discrete standalone terms.) Thanks for any suggestions along these lines.
krakatoa Asked:

CEHJ Commented:
I would always assume the worst from a user. That doesn't mean you should sanitize the input at entry, because if you 'ban' certain characters users might tell you the GUI is 'broken'. You can always remove garbage after entry.
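As a rough illustration of "remove garbage after entry", here is a minimal Java sketch. Java is only assumed from the thread's wording; the class `SearchTerms` and method `normalize` are hypothetical names, and treating all ASCII punctuation as garbage is an assumption rather than something the thread recommends. The field's text is left as typed and only a cleaned copy is handed to the search.

```java
// Hypothetical helper, not from the thread: cleans a copy of the raw input
// after entry instead of blocking characters while the user types.
class SearchTerms {

    // Assumption: ASCII punctuation and surplus whitespace count as "garbage".
    static String normalize(String raw) {
        return raw
                .replaceAll("\\p{Punct}+", " ") // runs of ASCII punctuation become one space
                .replaceAll("\\s+", " ")        // collapse any whitespace run to one space
                .trim();                        // drop leading and trailing space
    }

    public static void main(String[] args) {
        System.out.println(normalize("  hello,,   world!! ")); // prints "hello world"
    }
}
```

Note that `\p{Punct}` also matches hyphens, apostrophes and plus signs, so a term such as "C++" or "don't" would be altered. If the concern is second-guessing valid input, a narrower character set may be the safer choice.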

krakatoa (Author) Commented:
Yes, your observations are, naturally, correct.

Part of the "problem" (which I admit to having created myself, of course) is that the post-entry correction route, which would be the best path, is complicated by the fact that I have a Listener on the input field which reflects the lexical correctness of the input, marrying it to the number of words that have been entered. So, if the user inputs "Hello World", the Listener picks up on the fact that there are two words, 'in real time' as it were. This count comes from a split() on the text, based on space characters. So if the user inadvertently inputs more than one space between words (terms), the split produces empty strings between them and the Listener reports more words than were actually entered. Of course, a later correction can be made, but it is the on-the-spot word count that I'm hoping to reflect more accurately, if that makes any sense.
CEHJ Commented:
Well, splitting on the pattern " +" would obviate that, if the split is being done by regex.
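To make the difference concrete, here is a minimal sketch, assuming the code is Java (where `String.split` takes a regex). The class `WordCount` and method `countWords` are hypothetical names, not taken from the asker's code. It trims first, because a leading space would otherwise yield an empty first token, and handles the empty string separately, because splitting it returns one empty element.

```java
// Hypothetical helper, not from the thread: counts space-separated terms
// while tolerating repeated, leading and trailing spaces.
class WordCount {

    static int countWords(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return 0; // "".split(" +") would give one empty element
        }
        return trimmed.split(" +").length; // one or more spaces act as one separator
    }

    public static void main(String[] args) {
        String typed = "Hello   World"; // three spaces between the words
        System.out.println(typed.split(" ").length); // prints 4: two empty tokens are counted
        System.out.println(countWords(typed));       // prints 2
    }
}
```

A method like this could be called from the field's listener on every change, so the live count stays at two however many spaces are typed. If tabs or other whitespace can reach the field, the pattern `\\s+` would cover those as well.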
krakatoa (Author) Commented:
I had that in there. But I have perhaps overlaid it with some other process. Will check again.
krakatoa (Author) Commented:
I think I'm there with it now. Not entirely just a question of removing white space, but I have a working result - albeit in an ugly exception handler. But I can work on that angle.
CEHJ Commented:
:)