jazzIIIlove

pattern problem

java.util.regex.Pattern p = Pattern.compile("(?:^|,)\\s*(?:(?:(?=\")\"([^\"].*?)\")|(?:(?!\")(.*?)))(?=,|$)");
// Unescaped form of the pattern above:
// (?:^|,)\s*(?:(?:(?=")"([^"].*?)")|(?:(?!")(.*?)))(?=,|$)

// Other patterns tried:
// (?:\s*(?:\"([^\"]*)\"|([^,]+))\s*,?)+?
// java.util.regex.Pattern p = Pattern.compile("\\s*(?:\"[^\"]*\"|(?:^|(?<=,))[^,]*)");
Matcher m = p.matcher(rawLine);

while (m.find()) {
    if (!m.group().equals(",")) {
        // if (m.group().contains("\"")) {
        list.add(m.group().replaceAll("\"", ""));
        // }
    }
}


I am trying to parse the CSV below:

ID,User Name,Corporate Name
001,baz, bar corp
002,Doom,NTT
003,Kate,Some Systems
"004","Foo baz","Excorp"
"005",bar,"NTT"

A quoted string should be treated as a single field, and a value without quotes should still be treated as a single field.

But I couldn't solve it. Any help, please?
CEHJ

What exactly is the problem? There are three tokens per line, separated by commas. If there aren't, then it's invalid CSV.
jazzIIIlove (ASKER)

Lines like this one are not parsed properly:

"Aaa bbb", ccc, "ddd", eee fff
It should be parsed as:
Aaa bbb
ccc
ddd
eee fff

You were right in the other question. Sorry for not believing you.

Br.
Hi,

The original input is the correct one. Sorry for the confusion.

Br.
Then you should try:
("*)([\s\S]*?)\1,("*)([\s\S]*?)\3,("*)(.*)\5
The relevant groups are 2, 4, and 6. You can always make it more complicated and somewhat more elegant, but I think it shows how it can be done.

Best of luck.
-=Yuval=-
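
For anyone wanting to try the pattern above from Java, here is a minimal sketch. The class name ThreeFieldCsv and the parse helper are made up for illustration, and using matches() on the whole line is my assumption; the post does not say how the pattern is meant to be applied. It only handles lines with exactly three fields, and leading spaces in unquoted fields are kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreeFieldCsv {
    // The pattern from the post above, escaped for a Java string literal.
    // Groups 1, 3 and 5 capture an optional run of opening quotes; the
    // back-references \1, \3 and \5 require the same run to close each field.
    private static final Pattern LINE = Pattern.compile(
        "(\"*)([\\s\\S]*?)\\1,(\"*)([\\s\\S]*?)\\3,(\"*)(.*)\\5");

    // Hypothetical helper: returns the three field values (groups 2, 4, 6),
    // or an empty list when the line does not have the three-field shape.
    static List<String> parse(String rawLine) {
        List<String> fields = new ArrayList<>();
        Matcher m = LINE.matcher(rawLine);
        if (m.matches()) {
            fields.add(m.group(2));
            fields.add(m.group(4));
            fields.add(m.group(6));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("\"005\",bar,\"NTT\""));
        System.out.println(parse("003,Kate,Some Systems"));
    }
}
```

Because the number of fields is baked into the pattern, a file with a different column count would need a different expression.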
Nice effort, Yuval!

This is what I have done. It seems to be working fine. I would appreciate it if you could try to break it.

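        // Alternative 1: a quoted field (group 1). Alternative 2: an unquoted field
        // that starts at the beginning of the line or right after a comma (group 2).
        java.util.regex.Pattern p = Pattern.compile("\"([^\"]*)\"|(?<=,|^)([^,]*)");
        Matcher m = p.matcher(rawLine);

        while (m.find()) {
            int groupCount = m.groupCount();
            // Only one of the two groups is non-null for any given match.
            for (int i = 1; i <= groupCount; i++) {
                if (m.group(i) != null) {
                    System.out.println(m.group(i));
                }
            }
        }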
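
One input that does seem to break the pattern above is the earlier line with a space after each comma: the quoted field `"ddd"` is then picked up by the unquoted alternative, so it comes back with its leading space and its quotes. Below is a rough sketch that shows this and tries a whitespace-tolerant variant. The TOLERANT pattern, the class name and the fields helper are my own assumptions, not something from this thread, and doubled quotes ("") inside a quoted field are still not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CsvFieldDemo {
    // The pattern from the post above, unchanged.
    static final Pattern ORIGINAL =
        Pattern.compile("\"([^\"]*)\"|(?<=,|^)([^,]*)");

    // Hypothetical variant: every field starts at the line start or right
    // after a comma, may be preceded by whitespace, and is either quoted
    // (group 1) or unquoted (group 2).
    static final Pattern TOLERANT =
        Pattern.compile("(?<=,|^)\\s*(?:\"([^\"]*)\"|([^,]*))");

    // Collects one value per match; optionally trims unquoted values.
    static List<String> fields(Pattern p, String rawLine, boolean trimUnquoted) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(rawLine);
        while (m.find()) {
            if (m.group(1) != null) {
                out.add(m.group(1));
            } else {
                String value = m.group(2);
                out.add(trimUnquoted ? value.trim() : value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "\"Aaa bbb\", ccc, \"ddd\", eee fff";
        System.out.println(fields(ORIGINAL, line, false)); // third field keeps its quotes
        System.out.println(fields(TOLERANT, line, true));
    }
}
```

Putting the lookbehind in front of both alternatives is the main design choice here: it keeps a quoted field from being matched anywhere other than at a field boundary.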
What is it you need to do with the parsed values? It may be that you don't need to use a regular expression at all. Something as simple as scanning or reading in a line from the file into a string, splitting on the comma, and removing the quotes might get you what you need.
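
A minimal sketch of that split-and-strip idea, in case it helps. The class and method names are placeholders, and it assumes no field ever contains a comma inside quotes; if one does, the split will cut that field in two.

```java
import java.util.ArrayList;
import java.util.List;

class SplitCsv {
    // Hypothetical helper: splits on every comma, trims surrounding
    // whitespace, and removes all double-quote characters from each piece.
    static List<String> parse(String rawLine) {
        List<String> fields = new ArrayList<>();
        // A limit of -1 keeps trailing empty fields such as "a,b,".
        for (String part : rawLine.split(",", -1)) {
            fields.add(part.trim().replace("\"", ""));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("\"005\",bar,\"NTT\""));
        System.out.println(parse("\"Aaa bbb\", ccc, \"ddd\", eee fff"));
    }
}
```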