chunky_uk


Regular expression for UK postcode.

Hi,
Does anyone have a current regular expression for checking UK postcodes?  I was using:

(GIR 0AA|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y]))|[0-9][A-HJKPS-UW]) [0-9][ABD-HJLNP-UW-Z]{2})

but according to http://www.cabinetoffice.gov.uk/govtalk/schemasstandards/e-gif/datastandards/address/postcode.aspx this is now out of date.

Thanks,

C
Rajkumar Gs


From http://www.regxlib.com/REDetails.aspx?regexp_id=260
^([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {1,2}[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$

---------------------------------------------------------------------------------------------
A regular expression is given in the comments of the schema, which implements full checking of all the stated BS 7666 postcode format rules. That regular expression can be restated as a "traditional" regular expression:

(GIR 0AA|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y]))|[0-9][A-HJKPS-UW]) [0-9][ABD-HJLNP-UW-Z]{2})

British Forces Post Office postcodes do not follow the BS 7666 rules, but have the format "BFPO NNN" or "BFPO c/o NNN", where NNN is 1 to 4 numerical digits. A regular expression to implement the BS 7666 rules:

(GIR 0AA)|((([A-Z-[QVX]][0-9][0-9]?)|(([A-Z-[QVX]][A-Z-[IJZ]][0-9][0-9]?)|(([A-Z-[QVX]][0-9][A-HJKSTUW])|([A-Z-[QVX]][A-Z-[IJZ]][0-9][ABEHMNPRVWXY])))) [0-9][A-Z-[CIKMOV]]{2})

Alternative short regular expression from BS7666 Schema is:

[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][ABD-HJLNP-UW-Z]{2}

Courtesy:- http://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom


Raj
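
As a rough check of how two of the patterns quoted above differ, the sketch below runs the short BS 7666 pattern and the fuller "traditional" pattern through Python's re module. Python is only an assumption made for illustration (the thread does not name a language at this point), and the names SHORT, FULL and accepts are hypothetical.

```python
import re

# Short pattern quoted above from the BS 7666 schema.
SHORT = r"[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][ABD-HJLNP-UW-Z]{2}"

# Fuller "traditional" pattern quoted above, split only for line length.
FULL = (r"(GIR 0AA|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|"
        r"[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y]))|[0-9][A-HJKPS-UW])"
        r" [0-9][ABD-HJLNP-UW-Z]{2})")


def accepts(pattern, postcode):
    """True when the whole string matches the pattern."""
    return re.fullmatch(pattern, postcode) is not None


# Print both verdicts for a few sample inputs.
for pc in ["EC1A 1BB", "W1A 1HQ", "GIR 0AA", "KI1 8SH"]:
    print(pc, accepts(SHORT, pc), accepts(FULL, pc))
```

The short pattern is much looser: it lets through KI1 8SH, which has an I in the second position, while the fuller pattern rejects it.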






chunky_uk

ASKER

Hi Raj - I quoted that regular expression in my question, it's out of date.

Thanks,

C
Yes, I know you are looking for an updated regular expression for UK postcodes.

I googled and found some different UK postcode regular expressions, which I posted above.

Did you try those two?

Thanks
Raj
Chris Bottomley
How about:

(GIR 0AA|(([A-PR-UWYZ]([0-9]([0-9]|[A-HJKS-UW]){0,1}|[A-HK-Y]([0-9]([0-9]|[ABEHMNPRV-Y]){0,1})))) [0-9][ABD-JLNP-UW-X]{2})

Seems ok on my test.

Chris
ASKER CERTIFIED SOLUTION
Chris Bottomley
I settled on this as a solution: -

^([A-PR-UWYZ][A-HK-Y0-9][A-HJKS-UW0-9]?[ABEHMNPRVWXY0-9]? {1,2}[0-9][ABD-HJLNP-UW-Z]{2})$
Hi chunky,

Congrats on figuring out the solution :-)

Raj
With the solution I posted, I tried to match the provided reference, including, for example, the traditional Girobank postcode.  That aside, the asker's own solution does not address why my proposal was wrong and, as a matter of interest, the author's solution passes the following invalid postcode whereas mine does not:

AAAA 1AA

I would appreciate some guidance as to why my solution is rejected and the author's should be accepted.

Chris
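
The difference described above can be reproduced with a small sketch in Python's re module. Python is an assumption made for illustration only, and the asker's pattern is written here with the space placed before the {1,2} quantifier, which is an assumed reading of the intended form. The names ASKER and CHRIS are hypothetical.

```python
import re

# Asker's pattern, with the space before {1,2} (assumed intended form).
ASKER = (r"^([A-PR-UWYZ][A-HK-Y0-9][A-HJKS-UW0-9]?[ABEHMNPRVWXY0-9]?"
         r" {1,2}[0-9][ABD-HJLNP-UW-Z]{2})$")

# Chris's pattern as posted earlier in the thread, split for line length.
CHRIS = (r"(GIR 0AA|(([A-PR-UWYZ]([0-9]([0-9]|[A-HJKS-UW]){0,1}|"
         r"[A-HK-Y]([0-9]([0-9]|[ABEHMNPRV-Y]){0,1})))) "
         r"[0-9][ABD-JLNP-UW-X]{2})")

sample = "AAAA 1AA"
asker_ok = re.search(ASKER, sample) is not None
chris_ok = re.search(CHRIS, sample) is not None
print(sample, "asker:", asker_ok, "chris:", chris_ok)
```

The asker's pattern allows any letter from its character classes in each of the first four positions independently, so four letters in a row slip through; Chris's pattern requires a digit after at most two leading letters.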
Hi Chris - thanks for your solution.  The truth is I ran out of time with this, I was just posting my solution to close the thread.  I'll give your expression a go today and update accordingly.

Thanks.
With reference to the supplied link of http://www.cabinetoffice.gov.uk/govtalk/schemasstandards/e-gif/datastandards/address/postcode.aspx the regex I supplied in http:#32353186 was tested by me to check it met all the examples.

I tested the author-supplied solution from http:#32631157 and noted it passed an invalid sample of AAAA 1AA.

Given the author's comment that they did not have time to test the regex I provided, and that they then sought to close the question over 4 days later based on a flawed solution of their own, I do not believe the author's explanation makes sense.

There being no indication that my prior post is incorrect, I believe the only correct course is to accept my own post of http:#32353186.

Chris
Apologies for the confusion and delay in closing this, thanks for the solution Chris...
Sorry I was difficult over the closure, but I'm glad you have a solution that meets your needs and I hope I didn't offend you in the process.

Chris
Hi Chris,
A couple of defects have arisen with this expression, now that we've had time to test fully.

These are: -

1. It's possible to enter I, J or Z in the second position (e.g. KI1 8SH), this should not be allowed.
2. The only letters to appear in the fourth position are A, B, E, H, M, N, P, R, V, W, X and Y, in fact only I,L,X and Z are excluded from the fourth position.

Any ideas?

Thanks...
Initially I would think the problem comes with word boundaries so see if:

(GIR 0AA|(\b([A-PR-UWYZ]([0-9]([0-9]|[A-HJKS-UW]){0,1}|[A-HK-Y]([0-9]([0-9]|[ABEHMNPRV-Y]){0,1})))) [0-9][ABD-HJLNP-UW-X]{2}\b)

Resolves both issues.

Chris
Hi Chris - no that didn't work, all postcodes fail now with the \b added?
A slight change in flavour then, as I am unfamiliar with the anchors in Oracle. I would expect, however, that the previous regex would also fail similarly:

(GIR 0AA|(\m([A-PR-UWYZ]([0-9]([0-9]|[A-HJKS-UW]){0,1}|[A-HK-Y]([0-9]([0-9]|[ABEHMNPRV-Y]){0,1})))) [0-9][ABD-HJLNP-UW-X]{2}$)

Chris
Hi Chris, no same result, \b is the word boundary anchor in Oracle though?
I've just found an Oracle reference and \b is definitely valid syntax for word boundary so:

(GIR 0AA|(\b([A-PR-UWYZ]([0-9]([0-9]|[A-HJKS-UW]){0,1}|[A-HK-Y]([0-9]([0-9]|[ABEHMNPRV-Y]){0,1})))) [0-9][ABD-HJLNP-UW-X]{2}\b)

Should have worked, and I'm afraid I can't do any more therefore.

Chris
I have just tried the original regex in your initial post and as expected that also fails the same way, which is as I surmised.

Chris
Hi Chris,
Indeed, I was aware mine didn't fully meet the new requirements, but I have no idea why the expression you provided doesn't meet the 2 requirements I mentioned above (sorry, I actually meant to quote the 3rd position, not the 4th position).

    * The letters I, J and Z are not used in the second position.
    * The only letters to appear in the third position are A, B, C, D, E, F, G, H, J, K, S, T, U and W.

So EC1I 8SH is allowed when it shouldn't be?

Thanks...
The supplied change does not accept the EC1I 8SH structure either, so it should still do the job.  Given the validity of the \b token for word boundary, there is no logic to the failure.  Are you sure, therefore, that there are no additional codes anywhere therein?

Chris
Do you mean that: -

(GIR 0AA|(\b([A-PR-UWYZ]([0-9]([0-9]|[A-HJKS-UW]){0,1}|[A-HK-Y]([0-9]([0-9]|[ABEHMNPRV-Y]){0,1})))) [0-9][ABD-HJLNP-UW-X]{2}\b)

satisfies: -

    * The letters I, J and Z are not used in the second position.
    * The only letters to appear in the third position are A, B, C, D, E, F, G, H, J, K, S, T, U and W.

in your non-Oracle environment?  The expression above runs OK in my Oracle environment, but doesn't validate any postcodes?

Thanks..
Some revision later, it looks as though there is no word boundary anchor, and I haven't been able to emulate one with a group either.

Is there anything else that can be used, i.e. will there always be a space before the postcode, for example?

Chris
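
For comparison, in an engine that does implement \b, the word-boundary version behaves as intended. The sketch below uses Python's re module purely as an assumed stand-in for such an engine; it says nothing about how Oracle treats \b. The name WITH_B is hypothetical.

```python
import re

# The \b version posted above, split only for line length.
WITH_B = (r"(GIR 0AA|(\b([A-PR-UWYZ]([0-9]([0-9]|[A-HJKS-UW]){0,1}|"
          r"[A-HK-Y]([0-9]([0-9]|[ABEHMNPRV-Y]){0,1})))) "
          r"[0-9][ABD-HJLNP-UW-X]{2}\b)")

# A valid code is found, even inside surrounding text.
plain = re.search(WITH_B, "EC1A 1BB")
embedded = re.search(WITH_B, "Postcode: EC1A 1BB.")

# KI1 8SH is rejected: there is no word boundary between K and I,
# so the match cannot start at the I.
bad = re.search(WITH_B, "KI1 8SH")

print(plain is not None, embedded is not None, bad is not None)
```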
Hi,
The postcode is entered through a web form, and then passed into a PL/SQL procedure to be validated, so I could append e.g. $ before and after the postcode before it is validated, would that help?
In that case assuming no extraneous characters try the following which anchors to the line start and end:

^(GIR 0AA|(([A-PR-UWYZ]([0-9]([0-9]|[A-HJKS-UW]){0,1}|[A-HK-Y]([0-9]([0-9]|[ABEHMNPRV-Y]){0,1})))) [0-9][ABD-HJLNP-UW-X]{2})$

Chris
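
A minimal sketch of why anchoring matters, using Python's re.search as an assumed stand-in for a match-anywhere regex call (the names BODY and ANCHORED are hypothetical). Without anchors, the invalid input KI1 8SH is reported as a match because the substring starting at the I is itself a well-formed postcode.

```python
import re

# Body of the expression above, without the ^ and $ anchors.
BODY = (r"(GIR 0AA|(([A-PR-UWYZ]([0-9]([0-9]|[A-HJKS-UW]){0,1}|"
        r"[A-HK-Y]([0-9]([0-9]|[ABEHMNPRV-Y]){0,1})))) "
        r"[0-9][ABD-HJLNP-UW-X]{2})")
ANCHORED = "^" + BODY + "$"

# Unanchored: the search skips the leading K and matches the rest.
loose = re.search(BODY, "KI1 8SH")
print(loose.group(0))

# Anchored: the whole input must be a postcode, so KI1 8SH fails.
strict = re.search(ANCHORED, "KI1 8SH")
print(strict)

# Well-formed inputs still pass when anchored.
print(re.search(ANCHORED, "EC1A 1BB") is not None)
```

This would account for the earlier report that KI1 8SH was being accepted even though the expression does not allow I in the second position.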
If not, a dollar prefix and suffix would be:

\$(GIR 0AA|(([A-PR-UWYZ]([0-9]([0-9]|[A-HJKS-UW]){0,1}|[A-HK-Y]([0-9]([0-9]|[ABEHMNPRV-Y]){0,1})))) [0-9][ABD-HJLNP-UW-X]{2})\$

Chris
Hi Chris,
Ah so close!  Thought that was it cracked, I'm using the \$, but this postcode is still accepted: -

EC1X 8SH
Looks to be just that 3rd character position that is not quite right now?
As far as I can see from the reference document, EC1X 8SH is valid.

Chris
Hi,
No it is a little confusing, the way they refer to 3rd/4th position.

This is the rule that I think is being broken: -

* The only letters to appear in the third position are A, B, C, D, E, F, G, H, J, K, S, T, U and W.

By third, they mean the third character position (e.g. EC1A 1BB).  So EC1X 8SH should fail?
To me, from the spec and common expectation:

AANA NAA EC1A 1BB

i.e. a numeric is an option for the third character, so a numeric is the only option for the third character when the first component has 4 characters, and the third-letter limitation applies only to postcodes with three characters in the first group (W1A 1HQ).

Chris
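
The reading above can be sketched as a small shape check in Python (an assumed language for illustration; shape, trailing_letter_ok and the two letter sets are hypothetical names). The third-position letters are those quoted in the thread, and the fourth-position letters are the ones listed earlier in the thread and used in the posted expressions. The sketch checks only the trailing letter of the first group, not the full postcode format.

```python
# Letters allowed in the third position of an "A N A" first group.
THIRD_LETTERS = set("ABCDEFGHJKSTUW")
# Letters allowed in the fourth position of an "A A N A" first group.
FOURTH_LETTERS = set("ABEHMNPRVWXY")


def shape(code):
    """Map letters to A and digits to N, e.g. 'EC1A 1BB' -> 'AANA NAA'."""
    return "".join(
        "A" if c.isalpha() else "N" if c.isdigit() else c for c in code
    )


def trailing_letter_ok(postcode):
    """Apply the third/fourth position rule to the first group only."""
    outward = postcode.split(" ")[0]
    s = shape(outward)
    if s == "ANA":       # e.g. W1A: the third character is a letter
        return outward[2] in THIRD_LETTERS
    if s == "AANA":      # e.g. EC1A: third is a digit, fourth is a letter
        return outward[3] in FOURTH_LETTERS
    return s in {"AN", "ANN", "AAN", "AANN"}


for pc in ["EC1A 1BB", "EC1X 8SH", "W1A 1HQ", "W1X 1HQ", "EC1I 8SH"]:
    print(pc, shape(pc), trailing_letter_ok(pc))
```

Under this reading EC1X 8SH passes, because X is a permitted fourth-position letter, whereas W1X 1HQ would fail the third-position rule.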
Yes that does make sense, thanks so much for your help :o)