tyweed420

Having a problem with StringTokenizer and separating strings on a line

I have this text file:

"CS157" "Databases" "3" "CS"
"Cs151" "OOP" "4" "CS"
"Anthro147" "Human Sexuality" "6" "Ap"

See "Human Sexuality" on the last line: I need that to come out as one string, but it is being split into two words. How would I make everything within quotes a single token?

As you can see below, I'm delimiting on spaces, tabs, and quotes, so it takes "Human Sexuality" as two words. Even if there are three words within the quotes, I'd like them stored as a single string.

I'm not sure how to do this. Can it be done with StringTokenizer, or is there a better tool for this?

while (line != null)
{
    StringTokenizer tokens = new StringTokenizer(line, " \" \t");
    list = new ArrayList();

    while (tokens.hasMoreTokens())
    {
        //System.out.println(tokens.nextToken());
        list.add(tokens.nextToken());
    }
CEHJ

Just remove the space from the delimiter String you pass to StringTokenizer
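A minimal sketch of how StringTokenizer could be made to keep quoted text together, assuming every field is wrapped in double quotes and no field is empty or blank. It goes one step further than dropping the space: it delimits on the quote character alone and skips the whitespace-only tokens left between fields. The class name QuoteTokenizer and the method parse are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class QuoteTokenizer {
    // Tokenizes on the double quote only, so spaces inside a field survive.
    // The separators between fields come back as whitespace-only tokens,
    // which are skipped. Assumption: real fields are never blank.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line, "\"");
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken();
            if (!token.trim().isEmpty()) {
                fields.add(token);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints [Anthro147, Human Sexuality, 6, Ap]
        System.out.println(parse("\"Anthro147\" \"Human Sexuality\" \"6\" \"Ap\""));
    }
}
```

Because the separators are discarded by a whitespace check rather than by the delimiter set, the same method should cope with fields separated by spaces or by tabs.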
You may use the following code:

       String[] s = x.split("\" ");
       for (int i = 0; i < s.length; i++)
          list.add(s[i].replaceAll("\"", ""));
Sorry, should be:

       String[] s = line.split("\" ");
       for (int i = 0; i < s.length; i++)
          list.add(s[i].replaceAll("\"", ""));
import java.util.regex.*;
import java.util.*;

public class X {
      public static void main(String[] args) {
            String s = "\"CS157\" \"Human sexuality\" \"3\" \"CS\"";
            Pattern p = Pattern.compile("\"([^\"]+)\"");
            Matcher m = p.matcher(s);
            List strings = new ArrayList();
            while (m.find()) {
                  strings.add(m.group(1));
            }
            System.out.println(strings);
      }
}
tyweed420

OK, one problem: there are actually two files I'm trying to split.

file1: fields are separated by spaces

"CS157" "Databases" "3" "CS"
"Cs151" "OOP" "4" "CS"
"Anthro147" "Human Sexuality" "6" "Ap"

file2: fields are separated by tabs

"5"      "CS157"       "Fall"       "2005"      "Pollett"
"6"       "CS151"      "Spring"      "2004"      "Pollet"
"1"      "CS172"      "Fall"      "2005"      "Kenneth"

Can both be handled by one split and replaceAll, or do I need two different cases? I'm still learning how split works.

Points raised to 100.
CEHJ wrote a regular expression that finds the text inside the quotes. It doesn't matter what separates the fields (spaces or tabs); he extracts whatever lies inside each matching pair of quotes.
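To illustrate that point, here is a small sketch that wraps the same pattern from CEHJ's post in a helper and runs it on one line from each file. The class name QuotedFields and the method extract are hypothetical; only the pattern itself comes from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedFields {
    // Same pattern as in CEHJ's post: a quote, one or more non-quote
    // characters (captured as group 1), then the closing quote.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]+)\"");

    static List<String> extract(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = QUOTED.matcher(line);
        while (m.find()) {
            fields.add(m.group(1));
        }
        return fields;
    }

    public static void main(String[] args) {
        // file1 style, separated by spaces
        System.out.println(extract("\"Anthro147\" \"Human Sexuality\" \"6\" \"Ap\""));
        // file2 style, separated by tabs
        System.out.println(extract("\"5\"\t\"CS157\"\t\"Fall\"\t\"2005\"\t\"Pollett\""));
    }
}
```

The first call prints [Anthro147, Human Sexuality, 6, Ap] and the second prints [5, CS157, Fall, 2005, Pollett], so no separate case for tabs is needed. Note that the + in the pattern means an empty field ("") is skipped rather than returned as an empty string.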
Since you're clearly not yet comfortable with regular expressions, the first thing to do is really learn them, because they are incredibly powerful.
But if you want a shorter, easier-to-understand solution, this might help get you started, though CEHJ's is clean.

String[] tokens = line.split("\"\\s+\"");

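\s+ matches any run of whitespace (space, tab, etc.); the + requires at least one character but accepts more. Inside a Java string literal the backslash has to be doubled.
So this splits the line on each quote-whitespace-quote sequence between fields. The problem is that the first element still has its opening quote and the last element still has its closing quote.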
tokens[0] = tokens[0].substring(1);                            // drop the opening quote
int n = tokens.length - 1;
tokens[n] = tokens[n].substring(0, tokens[n].length() - 1);    // drop the closing quote
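Put together, the split approach might look like the sketch below. It assumes each line is non-empty and begins and ends with a double quote once trimmed; the class name SplitFields and the method parse are invented for the example.

```java
import java.util.Arrays;
import java.util.List;

public class SplitFields {
    // Splits on quote + whitespace + quote, then strips the one leading
    // and one trailing quote that the split leaves behind.
    // Assumption: the trimmed line is non-empty and starts and ends with '"'.
    static List<String> parse(String line) {
        String[] tokens = line.trim().split("\"\\s+\"");
        tokens[0] = tokens[0].substring(1);
        int n = tokens.length - 1;
        tokens[n] = tokens[n].substring(0, tokens[n].length() - 1);
        return Arrays.asList(tokens);
    }

    public static void main(String[] args) {
        // Spaces between fields
        System.out.println(parse("\"Anthro147\" \"Human Sexuality\" \"6\" \"Ap\""));
        // Tabs (and mixed whitespace) between fields
        System.out.println(parse("\"5\"\t\"CS157\"  \t\"Fall\""));
    }
}
```

Both calls go through the same code path, so one split covers the space-separated and the tab-separated file. An empty line would throw from substring, so a caller would probably want to skip blank lines first.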

You should also consider cleaning up your input format. If you want spaces in the text, use tabs as the delimiter. If you want tabs too, pick a character such as a comma or colon and use that.
Then this is a one-liner. It's not as if this format is so convenient to process that you actually want the quotes.
:-)
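For comparison, this is roughly what the one-liner could look like if the file were rewritten without quotes and with a single tab between fields. That file layout is an assumption, not the asker's current format, and the class name TabDelimited and the method parse are made up for the example.

```java
import java.util.Arrays;
import java.util.List;

public class TabDelimited {
    // Assumes an unquoted line with exactly one tab between fields,
    // e.g. Anthro147<TAB>Human Sexuality<TAB>6<TAB>Ap
    static List<String> parse(String line) {
        return Arrays.asList(line.split("\t"));
    }

    public static void main(String[] args) {
        // prints [Anthro147, Human Sexuality, 6, Ap]
        System.out.println(parse("Anthro147\tHuman Sexuality\t6\tAp"));
    }
}
```

One thing to keep in mind is that String.split drops trailing empty strings by default; if a line can end with empty fields, split("\t", -1) keeps them.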