
RegEx to parse Cisco CSA security log into a SIEM

Hey folks, I looked through the KB and other sites but have not found anything I can use yet. We have a SIEM product that reads in security logs from all of our products and brings them into a single console. The only log we are having problems with is the Cisco CSA log. The log file itself is a CSV, which is not a problem, but each data field can be either null or double-quoted text. An example of the data we see could be something like this:
12345,Device A,Alert,1234,Attempted access,Low priority, 45, 001
12345,Device A,Alert,1234 ,No definition, check log file,Medium priority, 55, 002
12345,Device C,Info,,Admin login,System Information, external interface,,

Notice the trailing nulls and the commas in the text fields. Our SIEM gets the fields wrong whenever the text contains a comma; otherwise it works fine. I can't find a regex that handles both null fields and commas in the text. I have tried something like
(\d+),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(\W*\w*\s*\W*\w*\s*\W*.+)
which handles the nulls, but not the commas in the text. I have seen posts with answers that work in script engines, but this is a different type of interface: each set of parentheses is a fixed field that maps to the SIEM in one way or another. Any help with this is very much appreciated.
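A minimal sketch, in Python, of a per-field pattern that accepts both a null and a quoted value containing commas. It assumes (as described above) that text fields with commas are wrapped in double quotes, with any embedded quote doubled as in standard CSV. The quoted sample line is hypothetical, since the lines above show the data without quotes, and it is not confirmed that the SIEM's parser accepts the same syntax; the pattern only uses alternation and a non-capturing group, which Perl-style engines also support.

```python
import re

# One CSV field, captured as a single group:
#   "(?:[^"]|"")*"  a double-quoted value (commas allowed inside, "" = literal quote)
#   [^,]*           otherwise, any run of non-comma characters, including none (a null)
FIELD = r'("(?:[^"]|"")*"|[^,]*)'


def line_pattern(n_fields):
    """Anchor n_fields comma-separated fields to the whole line."""
    return re.compile("^" + ",".join([FIELD] * n_fields) + "$")


CSA_LINE = line_pattern(8)

# Hypothetical quoted version of the third sample line.
quoted = '12345,Device C,Info,,Admin login,"System Information, external interface",,'
m = CSA_LINE.match(quoted)
print(m.groups())
```

Each field is still exactly one group, so the group-to-field mapping stays fixed; the captured quoted value keeps its surrounding quotes.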

kaufmed

Do the null fields only occur at the end, or could a null field appear in the middle (or even at the beginning)?

kaufmed

Never mind. My question should be:

If a null is in the middle, can we assume that it will always be two adjoined commas (,,)?

ewest02

Do you mean to say that a text field might have an embedded comma? Or just that empty text fields are marked by two adjacent commas?

One thought that comes to mind is to replace adjacent commas with text. For example:

foo,,bar => foo,na,bar

Would that help???

--Eric
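Eric's substitution idea can be sketched in Python with zero-width matches, inserting a placeholder wherever a field is empty (between two commas, or at the start or end of a line). The placeholder "na" and the helper name fill_nulls are illustrative only. A plain replace of ",," with ",na," would miss runs of three or more commas, and this version would also rewrite ",," inside a quoted text field.

```python
import re

# A null field is an empty position that is:
#   between two commas              (?<=,)(?=,)
#   at the start, before a comma    ^(?=,)
#   at the end, after a comma       (?<=,)$
EMPTY_FIELD = re.compile(r"(?<=,)(?=,)|^(?=,)|(?<=,)$")


def fill_nulls(line, placeholder="na"):
    """Insert a placeholder into every empty CSV field (hypothetical helper)."""
    return EMPTY_FIELD.sub(placeholder, line)


print(fill_nulls("foo,,bar"))
print(fill_nulls("12345,Device C,Info,,Admin login,System Information, external interface,,"))
```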

Terry Woods

I can't understand why you've got this part of the regex in your pattern. What are you trying to match with it?
(\W*\w*\s*\W*\w*\s*\W*.+)

How about something like this?
\s*(.*?)\s*,\s*(.*?)\s*,\s*(.*?)\s*,\s*(.*?)\s*,\s*(.*?)\s*,\s*(.*?)\s*,\s*(.*?)\s*,\s*(.*)
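A quick check of how this pattern treats the second sample line, sketched in Python (assuming Python's re handles these constructs the way the SIEM's engine does, which is not confirmed). Because (.*?) is lazy, each group stops at the first comma it reaches, so a comma inside a text field acts as a separator, and the final (.*) absorbs whatever is left.

```python
import re

# The suggested pattern: seven lazy fields, each followed by a comma
# (whitespace around the comma trimmed), then one greedy group for the rest.
PATTERN = re.compile(r"\s*(.*?)\s*," * 7 + r"\s*(.*)")

line = "12345,Device A,Alert,1234 ,No definition, check log file,Medium priority, 55, 002"
for i, value in enumerate(PATTERN.match(line).groups(), start=1):
    print(i, repr(value))
```

Here "No definition, check log file" comes back as groups 5 and 6, and the last two values end up together in group 8.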

kaufmed

How about:

([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)(?:,(.*)?)?
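Trying this pattern in Python on a quoted version of the third sample line (the quotes are assumed, following the description in the question) shows both sides of its behavior: an empty field comes back as an empty group, but [^,]* cannot cross a comma, so a quoted value that contains one is split across two groups.

```python
import re

# Eight fields of "anything but a comma" (possibly empty), then an optional
# ninth group holding everything after the next comma.
PATTERN = re.compile(r"([^,]*)," * 7 + r"([^,]*)(?:,(.*)?)?")

line = '12345,Device C,Info,,Admin login,"System Information, external interface",,'
for i, value in enumerate(PATTERN.match(line).groups(), start=1):
    print(i, repr(value))
```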


VigilantServices

ASKER

Yes, nulls can be in the middle and they are always adjacent commas. The (\W*\w*\s*\W*\w*\s*\W*.+) was an attempt to match the trailing field in case it was null. Unfortunately, I can't change the output of the CSA text file because I don't have access to the device, only read access to the text file.

I will try these in the morning (San Diego time) and let you know:
([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)(?:,(.*)?)?
\s*(.*?)\s*,\s*(.*?)\s*,\s*(.*?)\s*,\s*(.*?)\s*,\s*(.*?)\s*,\s*(.*?)\s*,\s*(.*?)\s*,\s*(.*)
Thanks!

VigilantServices

ASKER

The first expression, \s*(.*?)\s*,\s*(.*?)\s*,\s*(.*?)\s*,\s*(.*?)\s*,\s*(.*?)\s*,\s*(.*?)\s*,\s*(.*?)\s*,\s*(.*), does not work. The second expression, ([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)(?:,(.*)?)?, handles the nulls okay but fails by splitting text fields that contain commas, such as "System Information, external interface", in two.

kaufmed

How are you splitting these fields, or better yet, which language are you using? The parentheses should act as capturing groups, so you should get back one captured value per set of parentheses.

VigilantServices

ASKER

The fields are split by the SIEM's parser interface, which most likely uses Perl on the back end (I can call support for the version if needed). It is still splitting the text fields that contain commas, and, oddly, the last two fields are being combined instead of showing up as a null and a field. Here are the last three fields; there should be a null between the two text strings:
["Security - Network Worms (Medium or High Security)"] [,"NT AUTHORITY\SYSTEM"]

VigilantServices

ASKER

Okay, now after testing on the production system I got the last two fields to show up as text-null-text, but the text fields with commas are still being split. I was using Regex Coach to test the strings before.

VigilantServices

ASKER

Since the information we need is at the beginning of each log entry, I think we can lump the last few fields together, including the fields that are being split. The pattern above does what I need, so should I open another question about modifying it to remove the text qualifiers (double quotes), or can you update the pattern for me and we can call it done?
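One way to do both in a single pattern, sketched in Python: keep one capture group per leading field, let a field that starts with a double quote run to its closing quote (commas included) with the quotes left outside the group, and lump everything after the last mapped field into one final group. This is an illustration, not the solution posted in this thread; it assumes the engine supports fixed-width lookbehind and lookahead (Perl and Python both do), and the number of leading fields (5) and the sample lines are hypothetical.

```python
import re

# One field, with any surrounding double quotes left outside the group:
#   "?                  an optional opening quote
#   (?<=")[^"]*(?=")    if a quote was just consumed, everything up to the
#                       closing quote, commas included
#   |[^,"]*             otherwise, a run of non-comma characters (empty = null)
#   "?                  an optional closing quote
FIELD = r'"?((?<=")[^"]*(?=")|[^,"]*)"?'


def lumped_pattern(n_fields):
    """Map the first n_fields fields to their own groups; lump the rest."""
    return re.compile("^" + ",".join([FIELD] * n_fields) + ",(.*)$")


LEADING = lumped_pattern(5)

line = '12345,Device A,Alert,1234,"No definition, check log file",Medium priority,55,002'
print(LEADING.match(line).groups())
```

Unquoted fields still cannot contain commas, and doubled quotes ("") inside a quoted value are not handled; both are limits of this sketch.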

ASKER CERTIFIED SOLUTION

kaufmed

VigilantServices

ASKER

The final solution was a combination of three expressions, depending on what I expected in the fields:
(\d+), -- if I know it will be a digit
([^,]*), -- if I am unsure of what it will be, keep the double-quotes and allow for nulls
"\"?([^\",]*)\"?, -- if I know it will be text (the double quotes are matched outside the group, so they are dropped)

Thanks!
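Combining the three building blocks for a hypothetical field layout (digits, two unknown fields, digits, a quoted description, then the rest lumped into one group) might look like this in Python; the layout and sample lines are assumptions for illustration. The pieces are copied as posted, including the leading literal quote in the text piece, which therefore expects that value to start with a double quote.

```python
import re

DIGITS = r"(\d+),"            # field known to be numeric
ANY = r"([^,]*),"             # unknown field: quotes kept, nulls allowed
TEXT = r'"\"?([^\",]*)\"?,'   # quoted text field: quotes matched outside the group

# Hypothetical layout: id, device, severity, event id, description, rest.
PATTERN = re.compile("^" + DIGITS + ANY + ANY + DIGITS + TEXT + r"(.*)$")

line = '12345,Device A,Alert,1234,"Attempted access",Low priority,45,001'
print(PATTERN.match(line).groups())

# A quoted description containing a comma is cut at that comma by [^",]*.
split_line = '12345,Device A,Alert,1234,"No definition, check log file",Medium priority,55,002'
print(PATTERN.match(split_line).group(5))
```

Because the text piece excludes commas, a description such as "No definition, check log file" yields only No definition in its group, and the remainder lands in the lumped last group.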

kaufmed

Well, I'm sorry it took so many posts to get you there, but I'm glad to be of assistance!