Regular expression used in Apache NiFi not processing correctly. Please advise.

Techies, can someone isolate where I'm dropping the ball on getting the regex matches I'm expecting in this dataflow? My goal is to move the matched records to one Kafka topic and the unmatched records to another Kafka topic. Attached is the client.csv test file.


Here's what the data flow looks like--with the regex used in the ExtractText config. NiFi uses Java's version of regular expressions.  

Attachments: ExtractFileProcessorNiFiwithRegex, client.csv
Asked by Paula DiTallo (Integration developer)
 
Terry Woods (IT Guru) commented:
I don't know how the tool works, but I would suggest trying the following configuration:
balance:
.*?,[0-9]+,([0-9]+(?:\.[0-9]+)?)
card:
.*?,([0-9]+),[0-9]+(?:\.[0-9]+)?
name:
(.*?),[0-9]+,[0-9]+(?:\.[0-9]+)?


Adding ?: just inside the round brackets around the decimal part of the balance makes that group non-capturing, in case the extra capture group is causing a problem.
Additionally, I've added a + after the [0-9] that matches the part of the balance before the decimal point, to allow for balances of more than one digit.

It may also look much tidier if you used \d in place of each [0-9] like this:

balance:
.*?,\d+,(\d+(?:\.\d+)?)
card:
.*?,(\d+),\d+(?:\.\d+)?
name:
(.*?),\d+,\d+(?:\.\d+)?
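
To check the three \d patterns outside NiFi, here is a minimal sketch in plain java.util.regex. It assumes rows shaped like name,card,balance (the sample rows are the ones quoted elsewhere in this thread) and that a "find"-style search is a fair stand-in for how the processor applies each expression. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractDemo {
    // The three suggested patterns as Java string literals (backslashes doubled).
    static final Pattern BALANCE = Pattern.compile(".*?,\\d+,(\\d+(?:\\.\\d+)?)");
    static final Pattern CARD = Pattern.compile(".*?,(\\d+),\\d+(?:\\.\\d+)?");
    static final Pattern NAME = Pattern.compile("(.*?),\\d+,\\d+(?:\\.\\d+)?");

    // Returns capture group 1 if the pattern is found in the line, otherwise null.
    static String extract(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] lines = {"Humpty Dumpty,5432,100.00", "Peppa Pig,3231,NA"};
        for (String line : lines) {
            // Each pattern has exactly one capturing group, so group 1 is the field.
            System.out.println(extract(NAME, line) + " | "
                    + extract(CARD, line) + " | "
                    + extract(BALANCE, line));
        }
    }
}
```

The row with a numeric balance yields all three fields, while the NA row yields null for every pattern, because each expression requires a numeric balance at the end.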

 
David Johnson, CD, MVP (Owner) commented:
The name will never match [0-9], and if the balance is NA it will not match [0-9] either.
 
Paula DiTallo (Integration developer, author) commented:
David,
The NA value is intentional.  That value should get thrown into the invalid-balance Kafka topic. Why would the client.csv file not be considered a csv?
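
The intended split can be sketched in plain Java as follows. This is only an illustration of the matched/unmatched decision, not NiFi or Kafka code: "invalid-balance" is the topic named above, while "valid-balance", the class name, and the method names are hypothetical. The sample rows are taken from elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class BalanceRouter {
    // Whole-line check: name, numeric card, numeric balance with optional decimals.
    static final Pattern VALID_ROW = Pattern.compile(".*?,\\d+,\\d+(?:\\.\\d+)?");

    // "valid-balance" is a hypothetical topic name; "invalid-balance" comes from the thread.
    static String topicFor(String line) {
        return VALID_ROW.matcher(line).matches() ? "valid-balance" : "invalid-balance";
    }

    // Groups the lines by the topic each one would be sent to.
    static Map<String, List<String>> route(List<String> lines) {
        Map<String, List<String>> byTopic = new LinkedHashMap<>();
        for (String line : lines) {
            byTopic.computeIfAbsent(topicFor(line), k -> new ArrayList<>()).add(line);
        }
        return byTopic;
    }

    public static void main(String[] args) {
        System.out.println(route(List.of("Jack Sprat,4234,200.00", "Peppa Pig,3231,NA")));
    }
}
```

Using matches() rather than find() requires the entire line to fit the pattern, so a row with NA in the balance column lands in the invalid-balance group.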
 
Paula DiTallo (Integration developer, author) commented:
Terry,
I will replace the old-style [0-9] with \d and try the added +.
 
David Johnson, CD, MVP (Owner) commented:
A CSV has a header
Name,Card,Balance
Humpty Dumpty,5432,100.00
Jack Sprat,4234,200.00
Peppa Pig,3231,NA

Your file doesn't have a header, so you have to split each line on the commas and assign the names manually, whereas a CSV parser would know the names from the header and only assign the values:

Humpty Dumpty,5432,100.00
Jack Sprat,4234,200.00
Peppa Pig,3231,NA

With a header, the following two files are equivalent, because values are assigned by column name rather than by position:
Name,Card,Balance
Humpty Dumpty,5432,100.00
Jack Sprat,4234,200.00
Peppa Pig,3231,NA

Card,Balance,Name
5432,100.00,Humpty Dumpty
4234,200.00,Jack Sprat
3231,NA,Peppa Pig
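
A minimal sketch of that idea in Java, assuming simple CSV text with no quoted fields or embedded commas (the class and method names are made up for illustration). Each row becomes a map keyed by the header names, so the two column orders above produce the same records.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HeaderCsv {
    // Parses simple CSV text whose first line is a header; no quoting support.
    static List<Map<String, String>> parse(String text) {
        String[] lines = text.trim().split("\\R");
        String[] header = lines[0].split(",");
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            String[] values = lines[i].split(",");
            Map<String, String> row = new LinkedHashMap<>();
            // Assign each value to the column name at the same position in the header.
            for (int c = 0; c < header.length && c < values.length; c++) {
                row.put(header[c], values[c]);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String a = "Name,Card,Balance\nHumpty Dumpty,5432,100.00\nPeppa Pig,3231,NA";
        String b = "Card,Balance,Name\n5432,100.00,Humpty Dumpty\n3231,NA,Peppa Pig";
        // Map equality ignores key order, so both layouts compare equal.
        System.out.println(parse(a).equals(parse(b)));
    }
}
```

Without a header line, the same loop would need the column names supplied by hand, which is the manual assignment described above.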
 
Terry Woods (IT Guru) commented:
Paula, how did you go with the suggested change?
 
Paula DiTallo (Integration developer, author) commented:
Terry,
Thanks so much for your corrections and suggestions to make the code base better. I really appreciate your efforts!
Question has a verified solution.