Java/JavaEE Library To Parse String Content Based on Rules

Hello,

I want to add custom logic (a class) to my custom Jackson JsonSerialize implementation so that it strips HTML according to certain rules. For example, if the HTML is enclosed in single quotes, as in ''text'', the custom logic should accept the string as is. If it's not in single quotes, like text, I want the custom logic to return just the text. Additionally, a block of HTML enclosed in three single quotes, as in '''example''', should be accepted as is; if it's not quoted, only the example text should be returned and everything else stripped out. What is the best Java library to accomplish this? I thought about using AntiSamy, but that leaves me open to an XSS attack since I need to accept anything inside quotes.

Examples:

    input:<b>text</b>
    output:text
    
    input:'<b>'text'</b>'
    output:'<b>'text'</b>'
    
    input:<html><head><title>text</title></head></html>
    output:text
    
    input:'''<html><head><title>text</title></head></html>'''
    output:'''<html><head><title>text</title></head></html>'''


cgray1223 asked:

mccarl (IT Business Systems Analyst / Software Developer) commented:
In this simple case, I would think that a library might be a bit of overkill, especially when it can be done as simply as this:
public class TestTextProcessing {
    
    public static void main(String[] args) {
        runTest("<b>text</b>");        
        runTest("'<b>'text'</b>'");        
        runTest("<html><head><title>text</title></head></html>");        
        runTest("'''<html><head><title>text</title></head></html>'''");        
        runTest("<input type='text'>Input Text</input>");        
        runTest("'<input type='text'>'Input Text</input>");        
        runTest("'<input type='text'>Input Text</input>'");        
    }

    private static void runTest(String string) {
        System.out.println("Input: " + string);
        System.out.println("Output: " + parseString(string));
        System.out.println();
    }

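    private static String parseString(String string) {
        StringBuilder sb = new StringBuilder();
        boolean inQuotes = false;   // true while between an opening and closing single quote
        boolean inAngles = false;   // true while inside an unquoted <...> tag being stripped
        
        for (char inChar : string.toCharArray()) {
            // An unquoted '<' starts a tag that will be dropped
            if (inChar == '<' && !inQuotes) {
                inAngles = true;
            }
            // The matching '>' ends the dropped tag and is itself dropped
            if (inChar == '>' && inAngles) {
                inAngles = false;
                continue;
            }
            // Skip everything inside a dropped tag, including any quotes
            if (inAngles) {
                continue;
            }
            // Each single quote outside a dropped tag toggles the quoted state
            if (inChar == '\'') {
                inQuotes = !inQuotes;
            }
            
            sb.append(inChar);
        }
        return sb.toString();
    }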
}

Note that I have added a few more test cases to handle some other situations that I thought of. I would recommend adding many more to make sure the code covers what you want in as many cases as you can think of.
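
One way to follow that advice is to keep the cases as a table of input/expected pairs, so new ones are a one-line addition. The following is only a sketch: the class name ParseStringCases and the helpers defaultCases and countFailures are made up for illustration, parseString is copied from the code above so the example compiles on its own, and the expected values are simply what that code returns when traced by hand, not a statement of what the asker ultimately wants.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of the original answer.
class ParseStringCases {

    // Copy of parseString from the answer above, unchanged in behavior.
    static String parseString(String string) {
        StringBuilder sb = new StringBuilder();
        boolean inQuotes = false;
        boolean inAngles = false;
        for (char inChar : string.toCharArray()) {
            if (inChar == '<' && !inQuotes) {
                inAngles = true;
            }
            if (inChar == '>' && inAngles) {
                inAngles = false;
                continue;
            }
            if (inAngles) {
                continue;
            }
            if (inChar == '\'') {
                inQuotes = !inQuotes;
            }
            sb.append(inChar);
        }
        return sb.toString();
    }

    // Input -> expected output. The first seven inputs are the ones from the answer;
    // the last two (empty string, plain text) are extra assumptions added here.
    static Map<String, String> defaultCases() {
        Map<String, String> cases = new LinkedHashMap<>();
        cases.put("<b>text</b>", "text");
        cases.put("'<b>'text'</b>'", "'<b>'text'</b>'");
        cases.put("<html><head><title>text</title></head></html>", "text");
        cases.put("'''<html><head><title>text</title></head></html>'''",
                  "'''<html><head><title>text</title></head></html>'''");
        cases.put("<input type='text'>Input Text</input>", "Input Text");
        cases.put("'<input type='text'>'Input Text</input>",
                  "'<input type='text'>'Input Text");
        cases.put("'<input type='text'>Input Text</input>'",
                  "'<input type='text'>Input Text</input>'");
        cases.put("", "");
        cases.put("plain text", "plain text");
        return cases;
    }

    // Prints each mismatch and returns how many cases did not match.
    static int countFailures(Map<String, String> cases) {
        int failures = 0;
        for (Map.Entry<String, String> c : cases.entrySet()) {
            String actual = parseString(c.getKey());
            if (!actual.equals(c.getValue())) {
                failures++;
                System.out.println("FAIL input=" + c.getKey()
                        + " expected=" + c.getValue() + " actual=" + actual);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        Map<String, String> cases = defaultCases();
        System.out.println(countFailures(cases) + " of " + cases.size() + " cases failed");
    }
}
```

Note the sixth case: because every single quote outside a stripped tag toggles the quoted state, the quotes around the attribute value text flip it as well, so the trailing </input> falls outside the quotes and is removed. Whether that is the desired behavior is exactly the kind of thing more test cases would settle.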