
How do I extract numbers from a string that also contains other characters?

Posted on 2009-05-18
Last Modified: 2012-06-27
I have an HTML file with a table in which each row contains some text and some numbers.
I would like to extract just the numbers from each row and put them into groups of five digits.

A number's format can be 12345-123, in which case I would like to get just the first five digits.

Each group of five digits should be concatenated into a String and then put into the ArrayList.

I have tried repeatedly but have not succeeded yet. My code is below; maybe it helps a little.
String[] parts = str.split("</?td>");   // get the text between the <td> and </td> tags
            char c;
            int calc=0;
            for(int i=1; i< parts.length; i += 2)
            {
                String nr = null;
                for(int j=1; j < parts[i].length(); j++)
                {
                    c = parts[i].charAt(j);
                    if (Character.isDigit(c))
                    {
                        nr += c;      // Concatenate each 5 digits to a string
                        System.out.print(c);
                        calc++;
                        if(calc%5==0)         // Make a new line each 5 digits
                        {
                            list.add(nr);         // Add the string with 5 digits to the list
                            System.out.println();
                            calc = 0;             // Reset the calculator
                        }
                        
                    }
 
                }
            }

Question by:Roxxor
8 Comments
 
LVL 9

Accepted Solution

by:
wellhole earned 600 total points
ID: 24416141
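First of all, always start the loops at index 0: i = 0 and j = 0.

Put the calc declaration, int calc = 0, after the nr declaration, String nr = null.

Do not use calc%5 == 0, because 0 % 5 is 0. Instead, compare the counter against a fixed value (calc == 5 if the check comes after calc++, calc == 4 if it comes before).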
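
A minimal sketch of these points for a single table cell, not the asker's final code. The names `DigitGroups` and `groupsOfFive` are hypothetical, and the sketch assumes each cell holds at most one number. It also starts from an empty buffer instead of `null`, since `String nr = null; nr += c;` produces text beginning with "null".

```java
import java.util.ArrayList;
import java.util.List;

class DigitGroups {
    // Collects the digits of one table cell into groups of five.
    // A trailing group with fewer than five digits is dropped.
    static List<String> groupsOfFive(String cell) {
        List<String> groups = new ArrayList<String>();
        StringBuilder nr = new StringBuilder(); // starts empty, not null
        for (int j = 0; j < cell.length(); j++) { // j starts at 0
            char c = cell.charAt(j);
            if (Character.isDigit(c)) {
                nr.append(c);
                if (nr.length() == 5) {      // fixed count instead of a modulo test
                    groups.add(nr.toString());
                    nr.setLength(0);         // reset the buffer for the next group
                }
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(groupsOfFive("text 12345-123 text")); // prints [12345]
    }
}
```

Using the buffer length as the counter removes the need for a separate calc variable, so the two can never get out of step.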
 
LVL 9

Assisted Solution

by:wellhole
wellhole earned 600 total points
ID: 24416175
Well, here are the changes. Changed lines are left-aligned.
            String[] parts = str.split("</?td>");   // get the text between the <td> and </td> tags
            char c;
for(int i=0; i< parts.length; i += 2)  // ------- Why += 2?
            {
String nr = "";      // start empty; null would be concatenated as the text "null"
int calc=0;
                for(int j=0; j < parts[i].length(); j++)
                {
                    c = parts[i].charAt(j);
                    if (Character.isDigit(c))
                    {
                        nr += c;      // Concatenate each 5 digits to a string
                        System.out.print(c);
                        calc++;
if(calc == 5)         // Make a new line each 5 digits
                        {
                            list.add(nr);         // Add the string with 5 digits to the list
                            System.out.println();
                            calc = 0;             // Reset the counter
nr = "";              // Reset the string for the next 5 digits
                        }
                        
                    }
 
                }
            }

 
LVL 86

Expert Comment

by:CEHJ
ID: 24416506
It's generally not a good idea to reinvent an HTML parser, particularly with regexes. Use an existing HTML parser; there are many, at varying levels of abstraction. For a high-level one where you can simply address the tables, rows and columns as arrays, try HtmlUnit.
 
LVL 1

Author Comment

by:Roxxor
ID: 24416549
Thanks. I have changed my mind a little: I think it's easier to pick up the whole number (e.g. 12345-1234)
and add it to the list. The code will be much shorter and simpler.

My problem:

I get a StringIndexOutOfBoundsException on the first iteration in the loop, but then it prints out all following numbers correctly. Why do I get the exception?
String s = "abcd 12345-1234 efgh";
System.out.println(s.substring(s.indexOf("-")-5, s.indexOf("-")+5)); // Prints 12345-1234 = works fine
 
String[] parts = str.split("</?td>");
for (int i = 1; i < parts.length; i += 2)
{
    try
    {
        System.out.println(parts[i].substring(parts[i].indexOf("-")-5, parts[i].indexOf("-")+5));
    }
    catch (StringIndexOutOfBoundsException err)
    {
        System.out.println(err.getMessage());
        err.printStackTrace();
    }
}

 
LVL 1

Author Comment

by:Roxxor
ID: 24416562
The StackTrace says:

java.lang.StringIndexOutOfBoundsException: String index out of range: -7
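
A negative value in that message suggests that the begin index passed to substring() fell below zero. One likely cause, offered as an assumption since the HTML was not posted: `indexOf("-")` returns -1 for a cell with no hyphen, and a hyphen closer than five characters to the start of the cell also makes `indexOf("-")-5` negative. A minimal sketch of a guarded version follows; `NumberSlice` and `sliceAroundHyphen` are hypothetical names, and returning null for "nothing usable here" is just one possible convention.

```java
class NumberSlice {
    // Returns the five characters before the first hyphen, the hyphen itself,
    // and up to four characters after it, or null if the cell has no hyphen
    // or fewer than five characters in front of it.
    static String sliceAroundHyphen(String cell) {
        int dash = cell.indexOf('-');
        if (dash < 5) {
            return null; // covers both "not found" (-1) and "too close to the start"
        }
        int end = Math.min(dash + 5, cell.length()); // do not run past the end
        return cell.substring(dash - 5, end);
    }

    public static void main(String[] args) {
        System.out.println(sliceAroundHyphen("abcd 12345-1234 efgh")); // prints 12345-1234
        System.out.println(sliceAroundHyphen("no number in this cell")); // prints null
    }
}
```

Note that this only guards the indices; it does not check that the characters around the hyphen are actually digits.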
 
LVL 86

Expert Comment

by:CEHJ
ID: 24416938
If you post your HTML as an attachment, that would probably help.
 
LVL 92

Assisted Solution

by:objects
objects earned 150 total points
ID: 24417199
Use split() to split the number instead of using substring():

String[] numbers = field.split("-");
 
LVL 1

Author Comment

by:Roxxor
ID: 24421125
Yes, I am also using the split() method, but I do need substring() as well to filter out all the other characters.

I now get an ArrayIndexOutOfBoundsException, even though I'm using split().

A <td> tag with this text: "<td>text text text 12345-123 text text text</td>" should result in only 12345-123 with the code below, but instead I get the exception (ArrayIndexOutOfBoundsException).

I still don't see what's wrong.
String s = "abcd 12345-1234 efgh";
System.out.println(s.substring(s.indexOf("-")-5, s.indexOf("-")+5)); // Prints 12345-1234 = works fine
 
String[] parts = str.split("</?td>");  
// each part[index] now contains a row like <td>text 12345-123 text</td>
 
String[] s = null;
for(int i=1; i < parts.length; i += 2)
{
       try
       {
              s = parts[i].split("-");
              String result = s[0].substring(s[0].length()-4) + "-" + s[1].substring(0,3);  
              System.out.println(result); // should print 12345-123
       }
       catch(StringIndexOutOfBoundsException err){System.out.println(err.getMessage()); err.printStackTrace();;}
}
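
Two things stand out in the snippet above. First, split("-") on a string with no hyphen returns a one-element array, so `s[1]` throws ArrayIndexOutOfBoundsException, which the catch clause for StringIndexOutOfBoundsException does not handle. Second, `s[0].length()-4` keeps four characters, not five. As an alternative sketch, a pattern match avoids the index arithmetic altogether. It assumes the numbers always look like five digits, a hyphen, then three or four digits (inferred from the examples 12345-123 and 12345-1234); `NumberFinder` and `findNumbers` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberFinder {
    // Assumed format: five digits, a hyphen, then three or four digits.
    private static final Pattern NUMBER = Pattern.compile("\\d{5}-\\d{3,4}");

    // Returns every match in the given text, in order of appearance.
    // Text without a match simply contributes nothing; no exception is thrown.
    static List<String> findNumbers(String text) {
        List<String> found = new ArrayList<String>();
        Matcher m = NUMBER.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String row = "<td>text text text 12345-123 text text text</td><td>no number</td>";
        System.out.println(findNumbers(row)); // prints [12345-123]
    }
}
```

This scans the text for the number pattern only; it does not interpret the HTML structure, so a real HTML parser is still the safer choice if the table layout matters.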
