Regular expression used in Apache NiFi not matching correctly. Please advise.

Techies, can someone isolate where I'm dropping the ball on getting the regex matches I'm expecting in this dataflow? My goal is to move the matched records to one Kafka topic and the unmatched records to another Kafka topic. Attached is the client.csv test file.


Here's what the data flow looks like, with the regex used in the ExtractText config. NiFi uses Java's version of regular expressions.

Attachments: ExtractFileProcessorNiFiwithRegex, client.csv
Paula DiTallo (Integration developer) asked:

David Johnson, CD, MVP (Owner) commented:
The name will never match 0-9, and if the balance is NA then it will not match 0-9 either.
Terry Woods (IT Guru) commented:
I don't know how the tool works, but I would suggest trying the following configuration:
balance:
.*?,[0-9]+,([0-9]+(?:\.[0-9]+)?)
card:
.*?,([0-9]+),[0-9]+(?:\.[0-9]+)?
name:
(.*?),[0-9]+,[0-9]+(?:\.[0-9]+)?

Adding ?: just inside the round brackets used for the decimal place of the balance should make it non-capturing, in case that is causing a problem.
Additionally, I've added a + after the [0-9] used to capture the pre-decimal-point part of the balance, to allow for balances of more than 1 digit.
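
Both changes can be checked outside NiFi. The following is a minimal sketch that uses Python's `re` module purely for illustration (these particular constructs behave the same way in Java's regex engine). The "before" patterns are hypothetical stand-ins, since the original ExtractText patterns are not reproduced in this thread.

```python
import re

# Hypothetical "before" pattern: the decimal part is a capturing group.
capturing = re.compile(r"([0-9]+(\.[0-9]+)?)")
# Suggested form: (?: ... ) groups the decimal part without capturing it.
non_capturing = re.compile(r"([0-9]+(?:\.[0-9]+)?)")

print(capturing.groups)      # 2 capture groups
print(non_capturing.groups)  # 1 capture group

# Hypothetical single-digit pattern versus the version with + added.
one_digit = re.compile(r"[0-9](?:\.[0-9]+)?")
many_digits = re.compile(r"[0-9]+(?:\.[0-9]+)?")

print(one_digit.fullmatch("100.00"))            # None: only one leading digit is allowed
print(many_digits.fullmatch("100.00").group())  # 100.00
```

An extra capturing group matters whenever a tool numbers its output by group index, which is why making the decimal part non-capturing is a reasonable precaution.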

It may also look much tidier if you use \d in place of each [0-9], like this:

balance:
.*?,\d+,(\d+(?:\.\d+)?)
card:
.*?,(\d+),\d+(?:\.\d+)?
name:
(.*?),\d+,\d+(?:\.\d+)?
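
As a rough check of the three patterns above, here is a sketch that applies them to the sample rows quoted later in this thread. It uses Python's `re` module rather than NiFi itself, so it only confirms what the expressions capture, not how ExtractText is configured; the `extract` helper is a made-up name for illustration.

```python
import re

# The three \d patterns from above, keyed by the attribute they extract.
# re.ASCII keeps \d limited to 0-9, as in Java's default behaviour.
PATTERNS = {
    "name": re.compile(r"(.*?),\d+,\d+(?:\.\d+)?", re.ASCII),
    "card": re.compile(r".*?,(\d+),\d+(?:\.\d+)?", re.ASCII),
    "balance": re.compile(r".*?,\d+,(\d+(?:\.\d+)?)", re.ASCII),
}

def extract(line):
    """Return {attribute: captured text}, or None if any pattern fails to match."""
    result = {}
    for key, pattern in PATTERNS.items():
        m = pattern.match(line)
        if m is None:
            return None
        result[key] = m.group(1)
    return result

print(extract("Humpty Dumpty,5432,100.00"))
# {'name': 'Humpty Dumpty', 'card': '5432', 'balance': '100.00'}
print(extract("Peppa Pig,3231,NA"))
# None
```

The NA row fails every pattern because each one requires a numeric balance, which is the behaviour wanted for sending that row to the unmatched side.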

Paula DiTallo (Integration developer, author) commented:
David,
The NA value is intentional. That value should get thrown into the invalid-balance Kafka topic. Why would the client.csv file not be considered a CSV?
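
The intended split can be sketched independently of NiFi and Kafka. In the illustration below, `route` and the topic name `valid-balance` are hypothetical; only `invalid-balance` comes from this thread, and no Kafka client is involved. Rows are simply grouped by the topic they would be sent to.

```python
import re

# A row counts as valid here when it is name,digits,decimal-number.
VALID_ROW = re.compile(r"[^,]*,\d+,\d+(?:\.\d+)?", re.ASCII)

def route(lines, valid_topic="valid-balance", invalid_topic="invalid-balance"):
    """Group lines by the (hypothetical) topic each would be sent to."""
    routed = {valid_topic: [], invalid_topic: []}
    for line in lines:
        topic = valid_topic if VALID_ROW.fullmatch(line) else invalid_topic
        routed[topic].append(line)
    return routed

rows = [
    "Humpty Dumpty,5432,100.00",
    "Jack Sprat,4234,200.00",
    "Peppa Pig,3231,NA",
]
routed = route(rows)
print(routed["invalid-balance"])  # ['Peppa Pig,3231,NA']
```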
Paula DiTallo (Integration developer, author) commented:
Terry,
I will replace the old-style [0-9] with \d and try the added +.
David Johnson, CD, MVP (Owner) commented:
A CSV usually has a header row:
Name,Card,Balance
Humpty Dumpty,5432,100.00
Jack Sprat,4234,200.00
Peppa Pig,3231,NA

Your file doesn't have a header, so you have to split each line on the commas and assign the names manually, whereas a CSV with a header would already know the names and only the values would need to be assigned.

Humpty Dumpty,5432,100.00
Jack Sprat,4234,200.00
Peppa Pig,3231,NA

With a header, the following two files are equivalent:
Name,Card,Balance
Humpty Dumpty,5432,100.00
Jack Sprat,4234,200.00
Peppa Pig,3231,NA

Card,Balance,Name
5432,100.00,Humpty Dumpty
4234,200.00,Jack Sprat
3231,NA,Peppa Pig
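
The point about headers can be illustrated with Python's standard `csv` module. This is a sketch only, not how NiFi reads the file: with a header row the column order stops mattering, while a headerless file needs the names supplied by hand. Note that headerless files are still commonly treated as CSV, since the header line is optional in RFC 4180.

```python
import csv
import io

with_header = "Name,Card,Balance\nHumpty Dumpty,5432,100.00\nPeppa Pig,3231,NA\n"
reordered = "Card,Balance,Name\n5432,100.00,Humpty Dumpty\n3231,NA,Peppa Pig\n"
headerless = "Humpty Dumpty,5432,100.00\nPeppa Pig,3231,NA\n"

def read(text, fieldnames=None):
    """Parse CSV text into plain dicts; fieldnames=None means 'use the first row'."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=fieldnames)
    return [dict(row) for row in reader]

rows_a = read(with_header)
rows_b = read(reordered)
# Without a header, the column names must be assigned manually.
rows_c = read(headerless, fieldnames=["Name", "Card", "Balance"])

print(rows_a == rows_b)  # True: same records despite different column order
print(rows_a == rows_c)  # True: manual names reproduce the header case
```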
Terry Woods (IT Guru) commented:
Paula, how did you go with the suggested change?
Paula DiTallo (Integration developer, author) commented:
Terry,
Thanks so much for your corrections and suggestions to make the code base better. I really appreciate your efforts!