I have some HTML files that show the differences between two schema definitions (XSDs). The attached Word file is an excerpt of a sample HTML file and shows how it looks. The attached .txt file contains the underlying HTML source code for that excerpt. I would like to write a Java program that creates a new text file in the following format:
Line 1: OLD line(s): 235 [EMPTY] [becomes] New line(s): 236,246 [DATA] name=OtherSupportSumAmt
Line 2: OLD line(s): 1759 [DATA] name=OrganizationTypeDesc [becomes] New line(s): 1770 [DATA] name=OrganizationTypeCd
Line 3: OLD line(s): 1763 [N/A] [becomes] New line(s): 1774
Basically, I want:
1) the old and new information on the same line of text, with [becomes] (or some other delimiter) in between.
2) the word [DATA] where the substring "name=" exists, followed by name=[whatever is in the quotes that follow].
3) the word [EMPTY] or [NULL] if no data follows the line number(s).
4) the phrase "not applicable" or [N/A] where the substring "name=" does not exist.
In other words, I want one line of text for each pair of old and new HTML data.
DiffWebpage.docx DiffHTML.txt
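The four rules above could be sketched roughly as follows. This is only an illustration, not a finished solution: the class name `DiffLineFormatter`, the method names, and the sample XSD fragments in `main` are all hypothetical, and it assumes the line numbers and the content of each side have already been pulled out of the HTML.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: builds one output line per old/new pair.
class DiffLineFormatter {
    // Matches name="value" (optional spaces around '='), capturing the quoted value.
    private static final Pattern NAME_ATTR =
            Pattern.compile("\\bname\\s*=\\s*\"([^\"]*)\"");

    // Classifies one side of the diff as [EMPTY], [DATA] name=..., or [N/A].
    static String describe(String content) {
        if (content == null || content.trim().isEmpty()) {
            return "[EMPTY]";                       // rule 3: nothing follows the line number(s)
        }
        Matcher m = NAME_ATTR.matcher(content);
        if (m.find()) {
            return "[DATA] name=" + m.group(1);     // rule 2: name="..." is present
        }
        return "[N/A]";                             // rule 4: content without name=
    }

    // Rule 1: old and new information on one line, separated by [becomes].
    static String formatPair(String oldLines, String oldContent,
                             String newLines, String newContent) {
        return "OLD line(s): " + oldLines + " " + describe(oldContent)
                + " [becomes] New line(s): " + newLines + " " + describe(newContent);
    }

    public static void main(String[] args) {
        // The XSD fragments below are made-up sample inputs.
        System.out.println(formatPair("235", "",
                "236,246", "<xsd:element name=\"OtherSupportSumAmt\"/>"));
        System.out.println(formatPair("1759", "<xsd:element name=\"OrganizationTypeDesc\"/>",
                "1770", "<xsd:element name=\"OrganizationTypeCd\"/>"));
    }
}
```

The first call prints `OLD line(s): 235 [EMPTY] [becomes] New line(s): 236,246 [DATA] name=OtherSupportSumAmt`. The formatting step is kept separate from the HTML extraction so that either part can be changed independently.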
Java, HTML
Last Comment
awking00
8/22/2022 - Mon
mrcoffee365
What have you tried so far? Post what you have and tell us what isn't working.
awking00
ASKER
I really haven't tried anything yet, as I don't know where to begin. Perhaps I should start by asking, "Can I create one long string of text from the HTML?" Then I could figure out how to parse that string.
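For what it's worth, getting the HTML into one long string is possible with the standard library alone. The sketch below is an assumption-laden illustration: the class name `HtmlToString` and its methods are hypothetical, the file is assumed to be UTF-8, and the regex tag stripper is deliberately crude (it does not decode entities such as `&quot;` and can be fooled by `<` or `>` inside attribute values or scripts).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: loads an HTML file as one string and flattens it to text.
class HtmlToString {
    // Reads the whole file (assumed UTF-8) into a single string.
    static String readAll(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Crude tag stripper: replaces every <...> run with a space,
    // then collapses all whitespace runs to a single space.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ")
                   .replaceAll("\\s+", " ")
                   .trim();
    }

    public static void main(String[] args) throws IOException {
        // Pass the path to the HTML file as the first argument.
        String html = readAll(Paths.get(args[0]));
        System.out.println(stripTags(html));
    }
}
```

Flattening everything this way loses the table structure that separates the old column from the new one, so a real HTML parser is likely a better fit for pairing the two sides.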
I just downloaded the jsoup library and that's pretty much what I was looking for. Haven't completed all the parsing I need to do yet, but that's a fairly straightforward exercise. Thanks a lot.