I want to check an input email address in a MySQL table to see if a form input of an email address exists or not. I want to include in "exists" items either in the $_POST['var'] or the existing MySQL table that have either leading or trailing spaces.
Does this do the trick?
$qry = "SELECT * from users where TRIM(email) = '" . trim($_POST['email']) . "'";
If I get a match (meaning the email is already there with or without leading or trailing spaces, the logic will go path A, otherwise, path B.
Will this do what I want?
The index performance issue will occur only if there is an index on that column, right?
...(and it can be a good idea to also standardize the case, like make all of the values lowercase)...You haven't stated a reason for that. Why is it a good idea? SQL is usually case insensitive when SELECTing with WHERE, right?
In regards to your first question, yes, you would need an index to gain the benefits of that index. If a query can't find an index to use, it'll be the same as not having an index on that column at all.
There is also a point at which MySQL will deem it faster not to use an index even if one is available. For example, if the table only had a hundred records, it's faster to do a full table scan than incur the overhead of loading the index, but that's not something you typically have to worry about unless you're doing micro-optimizations and know the data don't grow.
>> "SQL is usually case insensitive "
> "ONLY when the table collation is case insensitive. There are case sensitive versions of all the collations I believe."
You may be right, which is why I said "usually". It's not likely to happen by default.
> "...I'll add that it's always a good idea to standardize data formats when there's no negative impact."
I understand that depending on the mail server, it is possible to have a case sensitive email address, so if the real address contained capitals, and you forced it to lower case, then that could create a problem. But I haven't encounted this issue yet, and it's probably very rare, e.g. when someone has their own dedicated & (badly?) customised mail server, but if there's one in a million, and you have a million rows, then you could easily hit one.
> "You never know when you might have to integrate with some other 3rd party system that might be case-sensitive, or display the value in a clean fashion (e.g. a report showing email addresses in mixed cases looks ugly)."
In the rare case that integrating with a case sensitive system happened, then the issue could be dealt with by standardising all data then (as you've suggested), or by forcing a case-insensitive compare, I guess.
Whether it's ugly is subjective, as not everyone would consider it to be, especially the people who supplied their address in mixed case, and it can help with reading people's names from their email addresses.
> "In regards to your first question, yes, you would need an index to gain the benefits of that index."
I was meaning the performance degredation caused by TRIMming the column, as a result of not being able to use an index. If there's no index on the column, then there's no degradation from not being able to use the index on the column, since the index doesn't exist. (Though there'd be a slight degradation from the fact the TRIM is being applied to each row that needs to be matched against.) But let's make my question rhetorical, at this point.
> "There is also a point at which MySQL will deem it faster not to use an index even if one is available."
True, and I've heard the threshold is in the thousands of rows for some DBMSs.
> when they supplied their address, they may not be happy
I suppose that's possible but I've never seen it. In my experience (and I do this stuff -all- the time), people are more likely to forget they even registered somewhere than remember the letter case in which they provided their email address. It may also be possible that some people do notice and feel upset but rarely ever speak up about it. -shrug-
Overall, I think the rest of the email convo is a moot point because it's impossible to claim either case with 100% certainty. For over 10 years, I've worked daily with the contact databases of some of the largest companies in the world, and I haven't yet seen a public case-sensitive email address, but it could just as easily be an anecdotal fallacy on my part.
> In the rare case that integrating with a case sensitive system happened...
This is actually somewhat common with SSO integrations. You'll often have a SAML assertion subject that is some kind of unique identifier and it has to match a case-sensitive user ID of some kind. Several of the bigger IdPs will provide data transformation rules for this purpose (to your point of standardizing at runtime).
In regards to the performance degradation, I'm not sure I follow what you're saying.
Let's say you have a table with 100,000 rows, and no indexes at all. If you perform a query against a column, that is your base performance. It's not "degraded" per se, it's just not optimized.
If you performed a query against a function applied to the same column, then you are experiencing degraded performance, because the overhead of that function being applied to all the rows is now bringing the performance lower than the base performance.
If you added a regular index to the column, then it (generally) speeds up the queries that are based on that column because now MySQL might only have to review a small handful of records out of 100,000 in order to find the 1 record you're looking for.
If you're on MySQL 8 or higher and have a functional index, then your query when using that combination (function applied to the column) will be sped up like the regular index situation above. The downside is that if you want to do a query against that column without the function applied, the functional index won't take effect. So you'd need to have a second, non-functional index for that purpose. Given that you only get 64 indexes per table on typical MySQL engine tables, and each index includes a slight performance hit to data inserts, updates, and deletes, the functional index is often a poor choice.