Solved

How do I catch numbers from a string also containing characters?

Posted on 2009-05-18
8
483 Views
Last Modified: 2012-06-27
I have an HTML file with a table where each row contains some text and some numbers.
I would like to catch just the numbers from each row and put them into groups of five digits.

A number's format can be 12345-123, and in that case I would like to get just the first five digits.

Each group of five digits should be concatenated into a String and then put into the ArrayList.

I have tried and tried but have not succeeded yet. I have included my code; maybe it can help a little.
String[] parts = str.split("</?td>");   // get the text between the <td> and </td> tags
char c;
int calc=0;
for(int i=1; i< parts.length; i += 2)
{
    String nr = null;
    for(int j=1; j < parts[i].length(); j++)
    {
        c = parts[i].charAt(j);
        if (Character.isDigit(c))
        {
            nr += c;      // Concatenate each 5 digits to a string
            System.out.print(c);
            calc++;
            if(calc%5==0)         // Make a new line each 5 digits
            {
                list.add(nr);         // Add the string with 5 digits to the list
                System.out.println();
                calc = 0;             // Reset the calculator
            }
        }
    }
}

Question by:Roxxor
8 Comments
 
LVL 9

Accepted Solution

by:wellhole
wellhole earned 200 total points
First of all, start both loops at index 0: i = 0 and j = 0.

Move the calc declaration, int calc = 0, to just after the nr declaration, String nr = null, so the count restarts for every cell.

Do not use calc % 5 == 0, because 0 % 5 is also 0. Instead, compare against the group size directly with calc == 5.
 
LVL 9

Assisted Solution

by:wellhole
wellhole earned 200 total points
Well, here are the changes. Changed lines are left-aligned.
            String[] parts = str.split("</?td>");   // get the text between the <td> and </td> tags
            char c;
for(int i=0; i< parts.length; i += 2)  // ------- Why += 2?
            {
String nr = "";       // start empty; a null String would concatenate as "null"
int calc=0;
                for(int j=0; j < parts[i].length(); j++)
                {
                    c = parts[i].charAt(j);
                    if (Character.isDigit(c))
                    {
                        nr += c;      // Concatenate each 5 digits to a string
                        System.out.print(c);
                        calc++;
if(calc == 5)         // Make a new line each 5 digits
                        {
                            list.add(nr);         // Add the string with 5 digits to the list
                            System.out.println();
                            calc = 0;             // Reset the calculator
nr = "";              // Start collecting the next group
                        }
                    }
                }
            }
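
For reference, the grouping idea discussed above can be written as a small self-contained program. This is only a sketch: the class name FiveDigitGroups, the method groupDigits and the sample cells are made up for illustration, and it assumes the cell texts have already been pulled out of the HTML. It uses a StringBuilder so the collected string never starts as "null", and it restarts the group at every cell.

```java
import java.util.ArrayList;
import java.util.List;

public class FiveDigitGroups {

    // Collects the digits of each cell into strings of exactly five digits.
    // (Hypothetical helper; the cell texts are assumed to be extracted already.)
    static List<String> groupDigits(String[] cells) {
        List<String> groups = new ArrayList<>();
        for (String cell : cells) {
            StringBuilder current = new StringBuilder(); // starts empty for every cell
            for (int j = 0; j < cell.length(); j++) {
                char c = cell.charAt(j);
                if (Character.isDigit(c)) {
                    current.append(c);
                    if (current.length() == 5) {
                        groups.add(current.toString());
                        current.setLength(0); // begin the next group
                    }
                }
            }
            // Any leftover digits (fewer than five) in this cell are discarded.
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] cells = { "text 12345-123 text", "more 67890 text" };
        System.out.println(groupDigits(cells)); // prints [12345, 67890]
    }
}
```

Note that the trailing "123" of 12345-123 is dropped because the group is reset per cell; without that reset it would leak into the next cell's first group.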

 
LVL 86

Expert Comment

by:CEHJ
It's generally not a good idea to reinvent an HTML parser, particularly with regexes. Use an existing HTML parser; there are many available, at higher or lower levels of abstraction. For a high-level parser where you can simply address the tables, rows and columns as arrays, try HtmlUnit.
 
LVL 1

Author Comment

by:Roxxor
Thanks, I have changed my mind a little. I think it's easier to pick up the whole number (e.g. 12345-1234)
and put it in the list. The code will be much shorter and simpler.

My problem:

I get a StringIndexOutOfBoundsException on the first iteration of the loop, but then it prints all the following numbers correctly. Why do I get the exception?
String s = "abcd 12345-1234 efgh";
System.out.println(s.substring(s.indexOf("-")-5, s.indexOf("-")+5)); // Prints 12345-1234 = works fine

String[] parts = str.split("</?td>");
for(int i=1; i < parts.length; i += 2)
{
    try
    {
        System.out.println(parts[i].substring(parts[i].indexOf("-")-5, parts[i].indexOf("-")+5));
    }
    catch(StringIndexOutOfBoundsException err){System.out.println(err.getMessage()); err.printStackTrace();}
}

 
LVL 1

Author Comment

by:Roxxor
The StackTrace says:

java.lang.StringIndexOutOfBoundsException: String index out of range: -7
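
A negative index in the trace suggests that the start argument passed to substring() went below zero. One way that can happen (an assumption, since the HTML was not posted) is a cell with no "-" at all, where indexOf returns -1, or a cell with fewer than five characters before the dash. A minimal sketch that checks the bounds first; the class and method names are invented for illustration, and the offsets simply mirror the ones used in the loop above:

```java
public class SafeSubstring {

    // Returns the five characters before the first dash, the dash itself, and up to
    // four characters after it, or null when the cell does not have that shape.
    static String aroundDash(String cell) {
        int dash = cell.indexOf('-');
        if (dash < 5) {
            return null; // no dash (-1) or not enough characters in front of it
        }
        int end = Math.min(dash + 5, cell.length()); // never run past the end
        return cell.substring(dash - 5, end);
    }

    public static void main(String[] args) {
        System.out.println(aroundDash("abcd 12345-1234 efgh")); // prints 12345-1234
        System.out.println(aroundDash("no dash here"));         // prints null
    }
}
```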
 
LVL 86

Expert Comment

by:CEHJ
If you post your HTML as an attachment, that would probably help.
 
LVL 92

Assisted Solution

by:objects
objects earned 50 total points
Use split() to split the number instead of using substring():

String[] numbers = field.split("-");
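
A small sketch of that suggestion, with one caveat: split("-") returns a single-element array when the field contains no dash, so it is worth checking the length before reading numbers[1]. The variable name field comes from the line above; the class name, method name and sample values are assumptions.

```java
public class SplitNumber {

    // Splits a field such as "12345-123" into its two halves.
    // Returns null when the field does not consist of exactly two dash-separated parts.
    static String[] splitNumber(String field) {
        String[] numbers = field.split("-");
        if (numbers.length != 2) {
            return null;
        }
        return numbers;
    }

    public static void main(String[] args) {
        String[] numbers = splitNumber("12345-123");
        System.out.println(numbers[0]);           // prints 12345
        System.out.println(numbers[1]);           // prints 123
        System.out.println(splitNumber("12345")); // prints null
    }
}
```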
 
LVL 1

Author Comment

by:Roxxor
Yes, I am also using the split() method, but I need to use substring() as well to filter out all the other characters.

I still get an ArrayIndexOutOfBoundsException, and now I'm using split().

A <td> tag with this text: "<td>text text text 12345-123 text text text</td>" should result in only 12345-123 with the code below, but I get the exception (ArrayIndexOutOfBoundsException).

I still don't get what's wrong.
String s = "abcd 12345-1234 efgh";
System.out.println(s.substring(s.indexOf("-")-5, s.indexOf("-")+5)); // Prints 12345-1234 = works fine

String[] parts = str.split("</?td>");
// each part[index] now contains a row like <td>text 12345-123 text</td>

String[] s = null;
for(int i=1; i < parts.length; i += 2)
{
    try
    {
        s = parts[i].split("-");
        String result = s[0].substring(s[0].length()-4) + "-" + s[1].substring(0,3);
        System.out.println(result); // should print 12345-123
    }
    catch(StringIndexOutOfBoundsException err){System.out.println(err.getMessage()); err.printStackTrace();}
}
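
Two things stand out in the snippet above. Reading s[1] throws ArrayIndexOutOfBoundsException whenever a part contains no "-", and the catch block only handles StringIndexOutOfBoundsException, so that exception is not caught. Also, length()-4 keeps four characters, not five. As an alternative sketch, java.util.regex can match the number directly, with no index arithmetic. The pattern below assumes the format is five digits, a dash, then three or four digits, which is inferred only from the examples in this thread (12345-123 and 12345-1234); the class and method names are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberExtractor {

    // Five digits, a dash, then three or four digits (format assumed from the examples).
    private static final Pattern NUMBER = Pattern.compile("\\b\\d{5}-\\d{3,4}\\b");

    // Returns every number of that shape found in the text, in order of appearance.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBER.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String row = "<td>text text text 12345-123 text text text</td>";
        System.out.println(extract(row)); // prints [12345-123]
    }
}
```

Cells without a matching number simply produce an empty list, so no exception handling is needed for them.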
