Regular expression for UK postcode.

Hi,
Does anyone have a current regular expression for checking UK postcodes? I was using:

(GIR 0AA|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y]))|[0-9][A-HJKPS-UW]) [0-9][ABD-HJLNP-UW-Z]{2})

but according to http://www.cabinetoffice.gov.uk/govtalk/schemasstandards/e-gif/datastandards/address/postcode.aspx this is now out of date.

Thanks,

C

From http://www.regxlib.com/REDetails.aspx?regexp_id=260
^([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {1,2}[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$

---------------------------------------------------------------------------------------------
A regular expression is given in the comments of the schema, which implements full checking of all the stated BS 7666 postcode format rules. That regular expression can be restated as a "traditional" regular expression:

(GIR 0AA|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y]))|[0-9][A-HJKPS-UW]) [0-9][ABD-HJLNP-UW-Z]{2})

British Forces Post Office postcodes do not follow the BS 7666 rules, but have the format "BFPO NNN" or "BFPO c/o NNN", where NNN is 1 to 4 numerical digits. A regular expression to implement the BS 7666 rules:

(GIR 0AA)|((([A-Z-[QVX]][0-9][0-9]?)|(([A-Z-[QVX]][A-Z-[IJZ]][0-9][0-9]?)|(([A-Z-[QVX]][0-9][A-HJKSTUW])|([A-Z-[QVX]][A-Z-[IJZ]][0-9][ABEHMNPRVWXY])))) [0-9][A-Z-[CIKMOV]]{2})

Alternative short regular expression from BS7666 Schema is:

[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][ABD-HJLNP-UW-Z]{2}

Courtesy:- http://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom

Raj
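
The BFPO format quoted above is simple enough to check on its own. A minimal sketch in Python (an assumption, since the thread itself targets Oracle); `BFPO_RE` and `is_bfpo` are hypothetical names, and single spaces between the parts are assumed:

```python
import re

# "BFPO NNN" or "BFPO c/o NNN", where NNN is 1 to 4 digits.
# Assumption: single spaces and upper-case "BFPO", as written in the quoted text.
BFPO_RE = re.compile(r"BFPO (c/o )?[0-9]{1,4}")

def is_bfpo(postcode: str) -> bool:
    """Return True if the whole string has the BFPO shape."""
    return BFPO_RE.fullmatch(postcode) is not None

for sample in ["BFPO 12", "BFPO c/o 1234", "BFPO 12345", "BFPO"]:
    print(sample, is_bfpo(sample))
```

Using `fullmatch` rather than `search` means extra characters before or after the code cause a rejection.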

ASKER

Hi Raj - I quoted that regular expression in my question, it's out of date.

Thanks,

C

Yes, I know you are looking for an updated regular expression for UK postcodes.

I googled and found some different UK postcode regular expressions, which I posted above.

Did you try those two?

Thanks
Raj

Chris Bottomley

How about:

(GIR 0AA|(([A-PR-UWYZ]([0-9]([0-9]|[A-HJKS-UW]){0,1}|[A-HK-Y]([0-9]([0-9]|[ABEHMNPRV-Y]){0,1})))) [0-9][ABD-JLNP-UW-X]{2})

Seems ok on my test.

Chris

ASKER CERTIFIED SOLUTION

Chris Bottomley

ASKER

I settled on this as a solution: -

^([A-PR-UWYZ][A-HK-Y0-9][A-HJKS-UW0-9]?[ABEHMNPRVWXY0-9]?{1,2} [0-9][ABD-HJLNP-UW-Z]{2})$

Hi chunky,

Congrats on figuring out the solution :-)

Raj

Chris Bottomley

With the solution I posted, I tried to match the provided reference; that is why it includes, for example, the traditional Girobank postcode. That aside, the author's own solution does not address why my proposal was wrong and, out of interest, the author's solution passes the following invalid postcode whereas mine does not:

AAAA 1AA

I would appreciate some guidance as to why my solution is rejected and the author's should be accepted.

Chris
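
The difference Chris describes can be checked with a short script. This is only a sketch in Python, not the Oracle setup used in the thread. It assumes the `{1,2}` in the asker's expression was meant to follow the space (as in the regxlib version earlier in the thread), and the names `ASKER_RE` and `CHRIS_RE` are hypothetical:

```python
import re

# Asker's expression. Assumption: {1,2} is applied to the space, not to the "?".
ASKER_RE = re.compile(
    r"^([A-PR-UWYZ][A-HK-Y0-9][A-HJKS-UW0-9]?[ABEHMNPRVWXY0-9]? {1,2}"
    r"[0-9][ABD-HJLNP-UW-Z]{2})$"
)

# Chris's first expression, copied as posted and split across lines for readability.
CHRIS_RE = re.compile(
    r"(GIR 0AA|(([A-PR-UWYZ]([0-9]([0-9]|[A-HJKS-UW]){0,1}"
    r"|[A-HK-Y]([0-9]([0-9]|[ABEHMNPRV-Y]){0,1})))) "
    r"[0-9][ABD-JLNP-UW-X]{2})"
)

def asker_accepts(postcode: str) -> bool:
    return ASKER_RE.fullmatch(postcode) is not None

def chris_accepts(postcode: str) -> bool:
    return CHRIS_RE.fullmatch(postcode) is not None

for sample in ["AAAA 1AA", "EC1A 1BB"]:
    print(sample, asker_accepts(sample), chris_accepts(sample))
```

Under those assumptions the asker's expression lets AAAA 1AA through because each of its four outward character classes allows the letter A, while Chris's expression requires a digit after the second letter.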

ASKER

Hi Chris - thanks for your solution. The truth is I ran out of time with this, I was just posting my solution to close the thread. I'll give your expression a go today and update accordingly.

Thanks.

Chris Bottomley

With reference to the supplied link, http://www.cabinetoffice.gov.uk/govtalk/schemasstandards/e-gif/datastandards/address/postcode.aspx, I tested the regex I supplied in http:#32353186 to check that it met all the examples.

I tested the author-supplied solution from http:#32631157 and noted that it passed an invalid sample, AAAA 1AA.

Given the author's comment that they did not have time to test the regex I provided, and that they then sought to close the question over 4 days later based on a flawed solution of their own, I do not believe the author's explanation makes sense.

There being no indication that my prior post is incorrect, I believe the only correct course is to accept my own post of http:#32353186.

Chris

ASKER

Apologies for the confusion and delay in closing this, thanks for the solution Chris...

Chris Bottomley

Sorry I was difficult over the closure, but I'm glad you have a solution that meets your needs and I hope I didn't offend you in the process.

Chris

ASKER

Hi Chris,
A couple of defects have arisen with this expression, now that we've had time to test fully.

These are: -

1. It's possible to enter I, J or Z in the second position (e.g. KI1 8SH); this should not be allowed.
2. The only letters to appear in the fourth position should be A, B, E, H, M, N, P, R, V, W, X and Y, but in fact only I, L, X and Z are excluded from the fourth position.

Any ideas?

Thanks...

Chris Bottomley

Initially I would think the problem comes with word boundaries so see if:

(GIR 0AA|(\b([A-PR-UWYZ]([0-9]([0-9]|[A-HJKS-UW]){0,1}|[A-HK-Y]([0-9]([0-9]|[ABEHMNPRV-Y]){0,1})))) [0-9][ABD-HJLNP-UW-X]{2}\b)

Resolves both issues.

Chris

ASKER

Hi Chris - no, that didn't work; all postcodes fail now with the \b added.

Chris Bottomley

Slight change in flavour then, as I am unfamiliar with the anchors in Oracle. I would expect, however, that the previous regex would also fail similarly:

(GIR 0AA|(\m([A-PR-UWYZ]([0-9]([0-9]|[A-HJKS-UW]){0,1}|[A-HK-Y]([0-9]([0-9]|[ABEHMNPRV-Y]){0,1})))) [0-9][ABD-HJLNP-UW-X]{2}$)

Chris

ASKER

Hi Chris, no, same result. \b is the word boundary anchor in Oracle though?

Chris Bottomley

I've just found an Oracle reference and \b is definitely valid syntax for word boundary so:

(GIR 0AA|(\b([A-PR-UWYZ]([0-9]([0-9]|[A-HJKS-UW]){0,1}|[A-HK-Y]([0-9]([0-9]|[ABEHMNPRV-Y]){0,1})))) [0-9][ABD-HJLNP-UW-X]{2}\b)

should have worked, and I'm afraid I can't do any more.

Chris

Chris Bottomley

I have just tried the original regex in your initial post and as expected that also fails the same way, which is as I surmised.

Chris

ASKER

Hi Chris,
Indeed, I was aware mine didn't fully meet the new requirements, but I have no idea why the expression you provided doesn't meet the 2 requirements I mentioned above (sorry, I actually meant to quote the 3rd position, not the 4th position):

* The letters I, J and Z are not used in the second position.
* The only letters to appear in the third position are A, B, C, D, E, F, G, H, J, K, S, T, U and W.

So EC1I 8SH is allowed when it shouldn't be?

Thanks...

Chris Bottomley

The supplied change does not accept the EC1I 8SH structure either, so it should still do the job. Given the validity of the \b token for word boundary, there is no logic to the failure. Are you sure, therefore, that there are no additional codes anywhere in it?

Chris

ASKER

Do you mean that: -

(GIR 0AA|(\b([A-PR-UWYZ]([0-9]([0-9]|[A-HJKS-UW]){0,1}|[A-HK-Y]([0-9]([0-9]|[ABEHMNPRV-Y]){0,1})))) [0-9][ABD-HJLNP-UW-X]{2}\b)

satisfies: -

* The letters I, J and Z are not used in the second position.
* The only letters to appear in the third position are A, B, C, D, E, F, G, H, J, K, S, T, U and W.

in your non-Oracle environment? The expression above runs OK in my Oracle environment, but doesn't validate any postcodes?

Thanks..

Chris Bottomley

Some revision later, it looks as though there is no word boundary, and I haven't been able to create one with a group either.

Is there anything else that can be used, i.e. will there always be a space before the postcode, for example?

Chris

ASKER

Hi,
The postcode is entered through a web form, and then passed into a PL/SQL procedure to be validated, so I could append e.g. $ before and after the postcode before it is validated, would that help?

Chris Bottomley

In that case assuming no extraneous characters try the following which anchors to the line start and end:

^(GIR 0AA|(([A-PR-UWYZ]([0-9]([0-9]|[A-HJKS-UW]){0,1}|[A-HK-Y]([0-9]([0-9]|[ABEHMNPRV-Y]){0,1})))) [0-9][ABD-HJLNP-UW-X]{2})$

Chris
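
The effect of the anchors can be illustrated outside Oracle. A hedged sketch in Python, assuming the Oracle check behaves like an unanchored search (as Python's `re.search` does); `BODY`, `UNANCHORED` and `ANCHORED` are hypothetical names, and the pattern is the one posted above without its `^` and `$`:

```python
import re

# Body of the expression above, split across lines for readability.
BODY = (
    r"(GIR 0AA|(([A-PR-UWYZ]([0-9]([0-9]|[A-HJKS-UW]){0,1}"
    r"|[A-HK-Y]([0-9]([0-9]|[ABEHMNPRV-Y]){0,1})))) "
    r"[0-9][ABD-HJLNP-UW-X]{2})"
)

UNANCHORED = re.compile(BODY)            # may match anywhere in the input
ANCHORED = re.compile("^" + BODY + "$")  # must match the whole input

m = UNANCHORED.search("KI1 8SH")
print(m.group(0) if m else None)         # the matched substring, if any
print(ANCHORED.search("KI1 8SH"))        # None when the whole string is required
print(ANCHORED.search("EC1A 1BB") is not None)
```

Without anchors the search finds the substring I1 8SH inside KI1 8SH, which would explain why a postcode with I in the second position appeared to be accepted. With `^` and `$` the leading K has to be part of the match, so the string is rejected.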

Chris Bottomley

If not, a dollar prefix and suffix would be:

\$(GIR 0AA|(([A-PR-UWYZ]([0-9]([0-9]|[A-HJKS-UW]){0,1}|[A-HK-Y]([0-9]([0-9]|[ABEHMNPRV-Y]){0,1})))) [0-9][ABD-HJLNP-UW-X]{2})\$

Chris

ASKER

Hi Chris,
Ah so close! Thought that was it cracked, I'm using the \$, but this postcode is still accepted: -

EC1X 8SH

ASKER

Looks to be just that 3rd character position that is not quite right now?

Chris Bottomley

As far as I can see from the reference document EC1X 8SH is valid

Chris

ASKER

Hi,
No it is a little confusing, the way they refer to 3rd/4th position.

This is the rule that I think is being broken: -

* The only letters to appear in the third position are A, B, C, D, E, F, G, H, J, K, S, T, U and W.

By third, they mean the third character position (e.g. EC1A 1BB). So EC1X 8SH should fail?

Chris Bottomley

To me, from the spec and common expectation:

AANA NAA EC1A 1BB

i.e. a numeric is an option for the third 'character', so a numeric is the only option for the third character when the first 'component' has 4 characters. The third-letter limitation only applies to postcodes with three characters in the first group (W1A 1HQ).

Chris
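
The reading Chris gives can be written down as a small check. A sketch in Python under stated assumptions: the letter sets are the ones quoted earlier in this thread, the function names `shape` and `outward_letter_ok` are hypothetical, and only the third/fourth position rule is checked, not the full postcode:

```python
def shape(postcode: str) -> str:
    """Replace each letter with 'A' and each digit with 'N'; keep other characters."""
    return "".join(
        "A" if ch.isalpha() else "N" if ch.isdigit() else ch for ch in postcode
    )

# Letter sets as quoted in the thread.
THIRD_LETTERS = set("ABCDEFGHJKSTUW")   # third position, ANA outward codes only
FOURTH_LETTERS = set("ABEHMNPRVWXY")    # fourth position, AANA outward codes only

def outward_letter_ok(postcode: str) -> bool:
    """Check only the trailing-letter rule of the outward (first) part."""
    outward = postcode.split(" ")[0]
    s = shape(outward)
    if s == "ANA":
        return outward[2] in THIRD_LETTERS
    if s == "AANA":
        return outward[3] in FOURTH_LETTERS
    # Shapes without a trailing letter have nothing to check here.
    return s in {"AN", "ANN", "AAN", "AANN"}

print(shape("EC1A 1BB"))                # AANA NAA
print(outward_letter_ok("EC1X 8SH"))    # X is in the fourth position
print(outward_letter_ok("W1X 1HQ"))     # X is in the third position
```

On this reading EC1X 8SH passes, because its third character is the digit 1 and X falls under the fourth-position list, while a code such as W1X 1HQ would fail the third-position rule.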

ASKER

Yes that does make sense, thanks so much for your help :o)