jazzIIIlove
asked on
pattern problem
java.util.regex.Pattern p = Pattern.compile("(?:^|,)\\s*(?:(?:(?=\")\"([^\"].*?)\")|(?:(?!\")(.*?)))(?=,|$)");
// java.util.regex.Pattern p = Pattern.compile("(?:^|,)\\s*(?:(?:(?=\")\"([^\"].*?)\")|(?:(?!\")(.*?)))(?=,|$)");
//(?:^|,)\s*(?:(?:(?=")"([^"].*?)")|(?:(?!")(.*?)))(?=,|$)
// (?:\s*(?:\"([^\"]*)\"|([^,]+))\s*,?)+?
//java.util.regex.Pattern p = Pattern.compile("\\s*(?:\"[^\"]*\"|(?:^|(?<=,))[^,]*)");
Matcher m = p.matcher(rawLine);
while (m.find()) {
    if (!m.group().equals(",")) {
        // if (m.group().contains("\"")) {
        list.add(m.group().replaceAll("\"", ""));
        // }
    }
}
I am trying to parse the CSV below:
ID,User Name,Corporate Name
001,baz, bar corp
002,Doom,NTT
003,Kate,Some Systems
"004","Foo baz","Excorp"
"005",bar,"NTT"
A quoted string should be treated as a single field, and a value without quotes should also be treated as a single field.
But I couldn't solve it. Any help, please?
What exactly is the problem? There are three tokens per line, separated by commas. If there aren't, then it's invalid CSV.
ASKER
Lines like this one are not parsed properly:
"Aaa bbb", ccc, "ddd", eee fff
It should be parsed as:
Aaa bbb
ccc
ddd
eee fff
You were right in the other question. Sorry for not believing you.
Br.
ASKER
Hi,
The original input is the correct one. Sorry for confusion.
Br.
then you should check:
("*)([\s\S]*?)\1,("*)([\s\S]*?)\3,("*)(.*)\5
and the relevant groups are 2,4,6. You can always make it more complicated and somewhat more elegant but I think it shows how it can be done.
best of luck.
-=Yuval=-
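For readers who want to try the expression above from Java, here is a minimal sketch. It assumes every line has exactly three fields, as in the original sample data, and uses `matches()` so the whole line must fit the pattern. The class name `ThreeFieldParser` and the method `parse` are hypothetical names chosen for illustration; they are not from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreeFieldParser {
    // Each field is: optional quote(s) captured in a group, lazily matched
    // content, then a backreference that requires the same quote(s) to close.
    private static final Pattern ROW = Pattern.compile(
            "(\"*)([\\s\\S]*?)\\1,(\"*)([\\s\\S]*?)\\3,(\"*)(.*)\\5");

    // Returns the three field values (groups 2, 4 and 6),
    // or an empty list if the line does not have three fields.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = ROW.matcher(line);
        if (m.matches()) {
            fields.add(m.group(2));
            fields.add(m.group(4));
            fields.add(m.group(6));
        }
        return fields;
    }
}
```

Calling `ThreeFieldParser.parse("\"005\",bar,\"NTT\"")` should give the three values `005`, `bar` and `NTT`. Whitespace around unquoted values is kept as-is, so a caller may want to `trim()` each field afterwards.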
ASKER
Nice effort, Yuval!
This is what I have done. It seems to be working fine. I would appreciate it if you could try to break it.
java.util.regex.Pattern p = Pattern.compile("\"([^\"]*)\"|(?<=,|^)([^,]*)");
Matcher m = p.matcher(rawLine);
while (m.find()) {
    int groupCount = m.groupCount();
    for (int i = 1; i <= groupCount; i++) {
        if (m.group(i) != null) {
            System.out.println(m.group(i));
        }
    }
}
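For anyone who wants to probe this pattern, a small harness along the following lines may help. It is only a sketch: the class name `FieldScanner` and the method `scan` are hypothetical, and it collects the non-null group of each match into a list instead of printing it, which should otherwise behave like the loop above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldScanner {
    // Alternative 1: a quoted field, capturing the text between the quotes.
    // Alternative 2: an unquoted field that starts at the line start or right after a comma.
    private static final Pattern FIELD =
            Pattern.compile("\"([^\"]*)\"|(?<=,|^)([^,]*)");

    static List<String> scan(String rawLine) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(rawLine);
        while (m.find()) {
            // Exactly one of the two groups takes part in each match.
            fields.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return fields;
    }
}
```

With `"005",bar,"NTT"` this yields `005`, `bar` and `NTT`. One input that seems to trip it up is a space between a comma and an opening quote, as in `"Aaa bbb", ccc, "ddd", eee fff`: the quoted alternative cannot start at the space, so the unquoted alternative takes over and the third field comes back as ` "ddd"`, with the leading space and the quotes still attached.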
What is it you need to do with the parsed values? It may be that you don't need to use a regular expression at all. Something as simple as scanning or reading in a line from the file into a string, splitting on the comma, and removing the quotes might get you what you need.
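A minimal sketch of that split-and-strip idea might look like the following. It assumes no field ever contains a comma inside its quotes, since a plain split cannot tell a separator from a comma that is part of the data. The class name `SimpleCsvSplitter` and the method `splitLine` are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class SimpleCsvSplitter {
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        for (String raw : line.split(",", -1)) {
            String field = raw.trim();
            // Remove one pair of surrounding double quotes, if present.
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                field = field.substring(1, field.length() - 1);
            }
            fields.add(field);
        }
        return fields;
    }
}
```

For the line `"Aaa bbb", ccc, "ddd", eee fff` this gives `Aaa bbb`, `ccc`, `ddd` and `eee fff`. If quoted fields can contain commas or escaped quotes, a dedicated CSV parser is probably a safer choice than either a split or a regular expression.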