Brendan asked:

Regular Expression Question

Hi,

I'm using OCR to convert a bunch of bills to text format. What I want to do then is transfer some of the data from the text outputs into a database.

I'm struggling with the regexp part of it, for identifying that a payment was made:

Payment Voucher - Cheque                0.00       19.97        0.00
is the text, and I want to extract the middle value (in this case 19.97).

As it can be "Payment Voucher - S/O" or many other things, I can really only work on the word 'Payment'.

How can I create a regexp to identify a row with the word 'Payment' and then return the second number it comes across (i.e. 19.97 in this case)?

Cheers

Brendan
ozo

Payment.*?\d\s+(\d+\.\d+)
Brendan

ASKER

Hi Ozo,

When I try that on regexr.com it brings back the full line, whereas I only want it to return '19.97'.
Inspect capture group 1 from ozo's pattern. It contains the 2nd value from the original string.
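
To illustrate the difference between the full match and capture group 1, here is a minimal sketch in Python (the language is an assumption; nothing in the thread says which one will be used for the database step). The names `PAYMENT_RE` and `second_amount` are hypothetical, made up for this example. The pattern itself is ozo's, unchanged.

```python
import re

# ozo's pattern. The parentheses form capture group 1, which holds the
# second amount on a line containing the word "Payment".
PAYMENT_RE = re.compile(r"Payment.*?\d\s+(\d+\.\d+)")


def second_amount(line):
    """Return the second amount on a Payment line as a string, or None."""
    match = PAYMENT_RE.search(line)
    return match.group(1) if match else None


line = "Payment Voucher - Cheque                0.00       19.97        0.00"
match = PAYMENT_RE.search(line)

# group(0) is the whole match: everything from "Payment" up to and
# including the second amount. This is what a tester highlights by default.
print(match.group(0))

# group(1) is only the text inside the parentheses.
print(match.group(1))  # 19.97
```

Most regex testers highlight the whole match and list the capture groups separately, so the value wanted here would appear under group 1 rather than in the highlighted text.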
Brendan

ASKER

Hi käµfm³d,

Can you explain what you mean?

Cheers

Brendan
What tool are you using to execute this regular expression? Is it a text editor? A programming language?
Brendan

ASKER

I've tested it on regexr.com (there's an editor on that website you can use to test regular expressions). I think it's written in Java.

Cheers

Brendan
ASKER CERTIFIED SOLUTION
kaufmed

This solution is only available to members.
Hi,

Try

Search: Payment.*?\d+\.\d+.*?(\d+\.\d+).*?\d+\.\d+
Replace: \1

Thanks,
Shail
Hi,

Sorry, please ignore my answer.

Thanks,
Shail