awking00

How to create a text file from html

I have some html files that show the difference between two different schema definitions (xsds). The attached Word file is an excerpt of a sample html file and depicts how it looks. The attached .txt file shows the underlying html source code for that excerpt. What I would like to do is write a Java program that would create a new text file in the following format:
Line 1: OLD line(s): 235 [EMPTY] [becomes] New line(s): 236,246 [DATA] name=OtherSupportSumAmt
Line 2: OLD line(s): 1759 [DATA] name=OrganizationTypeDesc [becomes] New line(s): 1770 [DATA] name=OrganizationTypeCd
Line 3: OLD line(s): 1763 [N/A] [becomes] New line(s): 1774
Basically, I want
1) the old and new information on the same line of text with [becomes] (or some other delimiter) in between.
2) the word [DATA] where the substring "name=" exists followed by name=[whatever is in the quotes that follow].
3) the word [EMPTY] or [NULL] if no data follows the line(s) number.
4) the phrase "not applicable" or N/A where the substring "name=" does not exist.
In other words, one new line of text for each pair of old and new HTML data.
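For illustration, rules 1 to 4 above could be sketched in plain Java roughly as follows. This is only a sketch under assumptions: the class and method names (`DiffLineFormatter`, `describe`, `formatPair`) are made up for the example, the name value is assumed to sit in double quotes, and the step that pulls the line numbers and content out of the HTML is not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DiffLineFormatter {
    // Assumption: the attribute looks like name="value" with double quotes.
    private static final Pattern NAME_ATTR =
            Pattern.compile("name\\s*=\\s*\"([^\"]*)\"");

    // Classifies the content of one side (old or new) per rules 2-4.
    static String describe(String content) {
        if (content == null || content.trim().isEmpty()) {
            return "[EMPTY]";                      // rule 3: nothing after the line number(s)
        }
        Matcher m = NAME_ATTR.matcher(content);
        if (m.find()) {
            return "[DATA] name=" + m.group(1);    // rule 2: name= is present
        }
        return "[N/A]";                            // rule 4: data, but no name=
    }

    // Rule 1: old and new information on one line, separated by [becomes].
    static String formatPair(String oldLines, String oldContent,
                             String newLines, String newContent) {
        return "OLD line(s): " + oldLines + " " + describe(oldContent)
                + " [becomes] New line(s): " + newLines + " " + describe(newContent);
    }

    public static void main(String[] args) {
        // Hypothetical input values, modelled on the "Line 2" example.
        System.out.println(formatPair(
                "1759", "<xsd:element name=\"OrganizationTypeDesc\"/>",
                "1770", "<xsd:element name=\"OrganizationTypeCd\"/>"));
    }
}
```

With the hypothetical input in `main`, the output has the same shape as the "Line 2" example. Whether an empty new side should print [EMPTY] or nothing at all (as in "Line 3") is a choice this sketch does not try to settle.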
DiffWebpage.docx
DiffHTML.txt
JavaHTML


8/22/2022 - Mon
mrcoffee365

What have you tried so far?  Post what you have and tell us what isn't working.
awking00

ASKER
I really haven't tried anything yet, as I don't know where to begin. Perhaps I should start by asking, "Can I create one long string of text from the HTML?" Then I can figure out how to parse that string.
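As a rough answer to the "one long string" idea, something like the following should work with only the standard library. It is a sketch, not a recommendation: the names `HtmlToString`, `readAll`, and `stripTags` are made up here, the file is assumed to be UTF-8, and the regex tag stripping is crude (it does not decode entities such as `&lt;` and will trip over scripts or comments). A real HTML parser is the more robust route.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HtmlToString {
    // Reads the whole file into a single String (UTF-8 assumed).
    static String readAll(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Crude tag removal: replaces anything between < and > with a space,
    // then collapses runs of whitespace. Entities are left untouched.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ")
                   .replaceAll("\\s+", " ")
                   .trim();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java HtmlToString path/to/diff.html
        String html = readAll(Paths.get(args[0]));
        System.out.println(stripTags(html));
    }
}
```

Flattening everything to one string throws away the table structure that pairs each old cell with its new cell, so it may be easier to keep the raw HTML string and split it per row before stripping tags.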
ASKER CERTIFIED SOLUTION
mrcoffee365

awking00

ASKER
I just downloaded the jsoup libraries and that's pretty much what I was looking for. I haven't completed all the parsing I need to do yet, but that's a fairly straightforward exercise. Thanks a lot.