Hi, I need to test metadata keys for validity.
These formats are valid:
geo.0123.lowercaseword
AB.lowercaseword.lowercaseword
AB.lowercaseword.123
ABC.lowercaseword123
ABC.lowercaseword.lowercaseword
exception(s) or special cases where above rules don't apply
Not valid:
ab.UPPERcaseword123
AD.not132valid
so unless geo or exception
2 or 3 uppercase letters DOT lowercaseword DOT or numbers
for my $metaKey (sort keys %$metaHash) {
    # skip anything in the exceptions list (placeholder pattern)
    next if ($metaKey =~ m/(exceptions|list)/);
    if ($metaKey =~ m/^([A-Z]{2,3}\.[a-z0-9|\.]|geo\.\d.*[a-z])$/) {
        print "$metaKey good\n";
    }
    else {
        print "$metaKey bad\n";
    }
}
I suspect I'll need to do this in 2 goes, as the exceptions list might be quite long, but if the $metaKey is not in this list or it's in the wrong format, it's bad.
I'm assuming a hash key is case sensitive,
so "HashKey" is not the same as "hashkey" and one doesn't overwrite the other?
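On the case-sensitivity question: Perl hash keys are compared as exact strings, so keys that differ only in case are separate entries. A minimal sketch, using a made-up hash rather than the real data:

```perl
use strict;
use warnings;

# Made-up hash purely for illustration.
my %metaHash;
$metaHash{'HashKey'} = 'first';
$metaHash{'hashkey'} = 'second';

# Both entries survive because the keys differ in case.
print scalar(keys %metaHash), "\n";    # prints 2
print $metaHash{'HashKey'}, "\n";      # prints first
print $metaHash{'hashkey'}, "\n";      # prints second
```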
Perl
Last Comment
trevor1940
8/22/2022 - Mon
wilcoxon
I think this will do what you want. I changed the geo portion slightly to do better checking based on the example - if other geo cases are valid, you can change it back - the only critical change to that portion was adding + after [a-z]. Most of your problem was due to anchoring the regex to the end of the string ($) but not giving full string matching in the regex (I prefer full validation so changed the match rather than remove the end-of-string anchor).
for my $metaKey (sort keys %$metaHash) {
    # skip anything in the exceptions list (placeholder pattern)
    next if ($metaKey =~ m/(exceptions|list)/);
    if ($metaKey =~ m{^(?:[A-Z]{2,3}\.[a-z]+(?:[0-9]+|\.[a-z]+)|geo\.\d+\.[a-z]+)$}) {
        print "$metaKey good\n";
    }
    else {
        print "$metaKey bad\n";
    }
}
As I suspected, the list of exceptions is growing, so having it all in one regex will work, but it makes it unreadable.
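One way to keep a long exceptions list readable might be to hold it in a hash and do an exact lookup, leaving the regex for the format check only. This is a rough sketch, not a definitive solution: the exception names, the sample keys and the helper classify_key are all made up for illustration, and the format pattern is the one from the answer above.

```perl
use strict;
use warnings;

# Hypothetical exception keys; the real list is not given in this thread.
my %isException = map { $_ => 1 } qw(title author keywords);

# Format pattern copied from the answer above.
my $format = qr{^(?:[A-Z]{2,3}\.[a-z]+(?:[0-9]+|\.[a-z]+)|geo\.\d+\.[a-z]+)$};

# Hypothetical helper: 'good' for listed exceptions and for keys that match
# the format, 'bad' for everything else.
sub classify_key {
    my ($key) = @_;
    return 'good' if exists $isException{$key};    # exact, case-sensitive lookup
    return $key =~ $format ? 'good' : 'bad';
}

# Made-up sample data standing in for the real hash reference.
my $metaHash = { 'title' => 1, 'ABC.word123' => 1, 'ab.Word' => 1 };

for my $metaKey (sort keys %$metaHash) {
    print "$metaKey ", classify_key($metaKey), "\n";
}
```

The hash lookup only matches whole keys, whereas a regex alternation without anchors would also match keys that merely contain an exception word.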
ozo
geo.lowercaseword is already covered by "so unless geo or exception"
But if ABC.lowercaseword is valid, that could simplify it to /geo|exception|^[A-Z]{2,3}\.[a-z]+\.?(\d+|[a-z]+)$/
Is there anything containing "geo" or "exception" that is not valid?
trevor1940
ASKER
Is there anything containing "geo" or "exception" that is not valid?
"exception" is going to be a hard-coded list of valid metadata that doesn't fit the format in the opening remarks. Most are single words, so to answer your question: no!
Hence the reason to treat these separately and simply skip over them using next
in theory every thing starting with lower case "geo" should also be valid as long as every thing after the first dot is either [a-z] or [0-9] or \.
So for simplicity I may end up treating these separately
ozo
in theory every thing starting with lower case "geo" should also be valid as long as every thing after the first dot is either [a-z] or [0-9] or \.
That's a little different from how I originally interpreted the problem.
Is "every thing after the first dot is either [a-z] or [0-9] or \." meant to be the same condition as "DOT lowercaseword DOT or numbers"
or are there things that are valid following ^geo but not after ^[A-Z]{2,3}?
If the latter, then you might use
/exception|^(geo[^.]*(\.[a-z\d.]*)?|[A-Z]{2,3}\.[a-z]+\.?(\d+|[a-z]+))$/
Unless "every thing after the first dot is either [a-z] or [0-9] or \." is meant to imply that it is required that a first dot is present in which case
/exception|^(geo[^.]*\.[a-z\d.]*|[A-Z]{2,3}\.[a-z]+\.?(\d+|[a-z]+))$/,
could suffice
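To see how the second pattern behaves against the examples from the question, a small check along these lines may help. It is only a sketch: the "exception" branch is left out so that just the format rules are exercised, and the helper name is_valid_format is made up.

```perl
use strict;
use warnings;

# Second pattern from above, without the "exception" alternative.
my $format = qr/^(geo[^.]*\.[a-z\d.]*|[A-Z]{2,3}\.[a-z]+\.?(\d+|[a-z]+))$/;

# Examples taken from the question.
my @valid = qw(
    geo.0123.lowercaseword
    AB.lowercaseword.lowercaseword
    AB.lowercaseword.123
    ABC.lowercaseword123
    ABC.lowercaseword.lowercaseword
);
my @invalid = qw(ab.UPPERcaseword123 AD.not132valid);

# Hypothetical helper: 1 if the key matches the format, 0 otherwise.
sub is_valid_format {
    my ($key) = @_;
    return $key =~ $format ? 1 : 0;
}

for my $key (@valid, @invalid) {
    printf "%-35s %s\n", $key, is_valid_format($key) ? 'good' : 'bad';
}
```

With this pattern the five valid examples come out as good and the two invalid ones as bad.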
Although you only stated restrictions on what follows the first dot, and did not state any restrictions on what can be between "geo" and the first dot, in your valid examples there is nothing between "geo" and the first dot. Should that be a requirement?
You say
geo.datum = Transvers mercator
is valid, but " = Transvers mercator" is after the first dot, and is not either [a-z] or [0-9] or \.
Should " = .*" be added to the list of every thing after the first dot?
When you have
AB.docid and ab.docid or ab.docID etc.
and you need to locate the second 2 and remove them, should the result be
AB.docid and or etc.
Or do you mean that whenever the beginning of a key looks valid except for some extra stuff following, everything after the valid first part should be removed?
Do you want to remove invalid keys and leave valid keys instead of just printing "good" after valid keys and "bad" after invalid keys?
If you change an invalid key into a valid key by removing invalid parts, what should happen if that key then collides with another valid key? Should the data associated with both keys be combined somehow?
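Before deciding how to handle collisions, it could be useful to list which keys would clash in the first place. The sketch below assumes that "clash" means keys differing only by letter case, which is just one possible reading, and it uses made-up keys:

```perl
use strict;
use warnings;

# Made-up keys for illustration.
my @keys = ('AB.docid', 'ab.docid', 'ab.docID', 'ABC.title');

# Group keys by their lowercased form.
my %group;
push @{ $group{ lc($_) } }, $_ for @keys;

# Report every group with more than one member.
my @clashes;
for my $folded (sort keys %group) {
    my @members = @{ $group{$folded} };
    next if @members < 2;
    push @clashes, "@members";
    print "clash: @members\n";    # prints clash: AB.docid ab.docid ab.docID
}
```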
trevor1940
ASKER
Wow
"geo" is not in the list of exceptions
There is nothing between "geo" and the first DOT, the same as in the other valid examples, so only AB | KLM | XYZ | geo should come before the first DOT
"geo.datum = Transvers mercator" was an example of a valid key-value pair; I apologize for the confusion
I am still trying to establish what to do with the invalid metadata
The %metaHash is generated from an SQL query. My hunch is that once I've worked out what is 'good' and what is 'bad', I will run an update script to remove the 'bad' from the hash and the database
I am also still trying to establish what to do when %metaHash keys clash; this is being discussed by people above my pay scale
Combining the key values and updating the database is one option. My recommendation is going to be that if the key isn't valid, the value is probably invalid too; however, this may not hold for every metadata key-value pair
For now I'm just trying to separate the good from the bad
I'm hoping for a list of 'bad' meta keys so I can then do a visual check of the values to see whether they are right or wrong
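A list of 'bad' keys with their values could be produced roughly like this. It is a sketch under several assumptions: the sample data and the exceptions list are made up, is_bad_key is a hypothetical helper, and the pattern is adapted from the one suggested earlier with nothing allowed between "geo" and the first dot.

```perl
use strict;
use warnings;

# Made-up data standing in for the hash built from the SQL query.
my %metaHash = (
    'AB.docid'       => '12345',
    'ab.docID'       => '12345',
    'geo.0123.datum' => 'Transvers mercator',
    'AD.not132valid' => 'unknown',
);

# Hypothetical exceptions list.
my %isException = map { $_ => 1 } qw(title);

# Adapted pattern (assumption): "geo" must be followed directly by a dot.
my $format = qr/^(geo\.[a-z\d.]*|[A-Z]{2,3}\.[a-z]+\.?(\d+|[a-z]+))$/;

# Hypothetical helper: 1 if the key is neither an exception nor well formed.
sub is_bad_key {
    my ($key) = @_;
    return 0 if exists $isException{$key};
    return $key =~ $format ? 0 : 1;
}

my @badKeys = grep { is_bad_key($_) } sort keys %metaHash;

# Print each bad key with its value for a visual check.
for my $metaKey (@badKeys) {
    print "$metaKey => $metaHash{$metaKey}\n";
}
```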