How to extract these values from a text file using JavaScript

Hi,
I am working with ANT and have hit an impasse trying to read a file and extract values from it.  I believe I can solve this with simple JavaScript, which I am unfamiliar with. I am trying to read a large file that, among many things, has field sets that look like this:

<foo:fieldSet name="author" value="John Doe"/><foo:fieldSet name="versionMajor" value="One"/><foo:fieldSet name="versionMinor" value="TwoThree"/>

I basically want to create a properties file that looks like this:
author="John Doe"
versionMajor="One"
versionMinor="TwoThree"

Once that's created I'll be golden as I can then have ANT use this properties file and do the work it needs to do.
Asked by: TechBento

Kimputer commented:
Although I'm not sure why you need JavaScript when you're working with Java, here's some sample code that works, but ONLY IF IT IS THE SAME FORMAT (x times <foo:fieldSet name="x" value="x"/>)

<!DOCTYPE html>
<html>
<body>


<button onclick="myFunction()">Try it</button>

<p id="demo"></p>

<script>
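function myFunction() {
    // Sample input: fieldSet elements back to back, exactly as in the question.
    var fullstring = '<foo:fieldSet name="author" value="John Doe"/><foo:fieldSet name="versionMajor" value="One"/><foo:fieldSet name="versionMinor" value="TwoThree"/>';
    // Every piece after the first begins right after the text "name".
    var substring = fullstring.split("name");
    var finalstring = "";

    for (var i = 1; i < substring.length; i++) {
        // Remove the two quotes around the name, then split on "=":
        // index 1 starts with the name, index 2 starts with the quoted value.
        var substringa = substring[i].replace('"','').replace('"','').split("=");
        var substringb = substringa[1];
        var substringc = substringa[2];
        finalstring += substringb.split(" ")[0] + "=" + substringc.split("/")[0] + "\n";
    }

    alert(finalstring);
}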
</script>

</body>
</html>

TechBento (author) commented:
My mistake, what I can use is a "Java script" that I can call from ANT.   Not web-based JavaScript, but normal Java.   Back story is that I am reading an xml file using ANT that will not process using normal tools like xmltask or xmlparam.... So I am stuck using something like a script to get the values I need.

Kimputer commented:
Actually, in that case it still works (though you need to add the JavaScript ANT library, use only the script code, and discard the HTML parts).
Also make the script line more explicit:   <script language="javascript">

TechBento (author) commented:
Thanks, I am trying it now.  It seems too easy.  I like it.

TechBento (author) commented:
Ok, so I was looking at this from my mobile, now I am at my desk.  So the issue I have is that the XML file that contains those strings has a lot of stuff in it.  I need to first locate them, then get their values.  That's where the majority of my struggle comes from.  Also, they may not be next to each other.... so effectively the code has to

Look for the first parameter "author"..
Locate: <foo:fieldSet name="author" value="John Doe"/>
Extract "John Doe" as the value for parameter author.

Proceed to next parameter. Do that for all the ones I define.
Write them all to the properties file as indicated.
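
The steps above could be sketched in plain JavaScript roughly like this. It is only a sketch under a few assumptions: the helper name extractFields is made up for illustration, the file contents are already loaded into a string, and every field set is written as name="..." followed by value="..." with double quotes, as in the example.

```javascript
// Hypothetical helper: for each wanted name, locate its <foo:fieldSet .../> element
// and emit one name="value" line. Names that are not found are skipped.
function extractFields(xmlText, wantedNames) {
  var lines = [];
  wantedNames.forEach(function (name) {
    // Escape regex metacharacters so the name is matched literally.
    var safeName = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    var pattern = new RegExp(
      '<foo:fieldSet\\s+name="' + safeName + '"\\s+value="([^"]*)"\\s*/>'
    );
    var match = pattern.exec(xmlText);
    if (match) {
      lines.push(name + '="' + match[1] + '"');
    }
  });
  return lines.join("\n");
}

// Field sets separated by other content, as in the real file.
var sample =
  '<foo:fieldSet name="author" value="John Doe"/>' +
  '<foo:section type="Paragraph" id="c123">' +
  '<foo:fieldSet name="versionMajor" value="One"/>' +
  '<foo:p type="Title">Foo man shoo</foo:p>' +
  '<foo:fieldSet name="versionMinor" value="TwoThree"/>';

var propertiesText = extractFields(sample, ["author", "versionMajor", "versionMinor"]);
console.log(propertiesText);
```

With the sample string above this prints the three lines in the properties format shown in the question. Reading foo.xml and writing the result to disk are left out, since those depend on how the script is run.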

Kimputer commented:
Are you telling me the file is NOT

<foo:fieldSet name="xxx" value="xxx"/>

repeated multiple times ?

But looks totally different? My code was meant for a file with ONLY <foo:fieldSet name="xxx" value="xxx"/> repeated over and over again. It did NOT take into consideration ANY deviation from this file format.

If the file looks totally different, you should have posted that file structure to begin with.

TechBento (author) commented:
Right, I understand your code looks for precisely those strings, and it's opening the door for me, but the issue is more complex.  

Let me go back to the beginning.  What I said was "I am trying to read a large file that, among many things, has field sets that look like this:" and provided the example of the fields/values that are key to me that are in the file.   The file itself is a content xml file (let's call it foo.xml for this purpose)... with lots of content.   The content is defined by those field sets, but they are not necessarily next to each other (I cannot assume they will be).  So I need to be able to read that file and only grab those relevant values as it finds them.

To give you some background: I normally can do things like this with ANT directly, using xmltask and xmlparam on an original xml file. Due to a number of complexities, that's not possible here.  My "foo.xml" is a transformed version of an original.  Neither of those ANT tasks supports a content xml file in this format (albeit it is still xml).  

So, I am stuck doing something I am unfamiliar with...  opening the file... locating a string I need (example: <foo:fieldSet name="author" value="John Doe"/>) and grabbing the value to parse to something ANT can work with.

Kimputer commented:
If possible, I'd still like a file with sample data, and what you need extracted from it. How to identify what you need. Is at least "<foo:fieldSet" something that's always the same (signifying the start of something to grab)?

TechBento (author) commented:
Sure, here is the first portion. Keep in mind I had to clean it up to remove customer data... So this is basically the opening sequence. It contains the stuff I need.  After that it's all content.  Think of it as a usable xml version of a Word document.

<?xml version="1.0" encoding="UTF-8"?><!-- foo.xsl Build 1 --><foo:content xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tps="http://foo/cxml" schemaVersion="2.0" xsi:schemaLocation="http://foo/cxml http://foo/cxml.xsd" whiteSpaceMode="preserve"><foo:fieldSet name="author" value="John Doe"/><foo:section type="Paragraph" id="c123"><foo:fieldSet name="versionMajor" value="One"/><foo:fieldSet name="versionMinor" value="TwoThree/><foo:fieldSet name="stuff" value="10"/><foo:fieldSet name="copyright" value="2013"/><foo:fieldSet name="pagecount" value="1004"/><foo:p type="Title">Foo man shoo</foo:p>

Kimputer commented:
It seems you only need the "<foo:fieldSet" data then ?

TechBento (author) commented:
That's fair to say.  In fact, if it created a properties file with ALL of them that would be fine too. I can live with extra properties.

By the way, thank you for your time here.    I can do many neat things with content, this just isn't one of them :)

Kimputer commented:
Actually I just tested with the new data, and it still lists the correct items.
It missed this one though <foo:section type="Paragraph" id="c123">
Do you need this one too?

TechBento (author) commented:
Is it easy to just grab everything called "fieldset" and write to a text file that is

fieldsetName1="value"
fieldsetName2="value"
fieldsetName3="value"
etc

In hindsight, that would be awesome because I could re-use that code all the time.

I only need three specific fields for this project, I used examples of their names in my post, but I can modify them as needed.
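
A rough sketch of the "grab everything" idea, in plain JavaScript. The helper name allFieldSets and the sample string are made up for illustration, and the pattern assumes each element is written as name="..." followed by value="..." with double quotes.

```javascript
// Hypothetical helper: collect every <foo:fieldSet .../> in document order
// and return one name=value line per element.
function allFieldSets(xmlText) {
  var pattern = /<foo:fieldSet\s+name="([^"]*)"\s+value="([^"]*)"\s*\/>/g;
  var lines = [];
  var match;
  // With the g flag, exec() resumes after the previous match on each call.
  while ((match = pattern.exec(xmlText)) !== null) {
    lines.push(match[1] + "=" + match[2]);
  }
  return lines.join("\n");
}

var sampleAll =
  '<foo:fieldSet name="author" value="John Doe"/>' +
  '<foo:section type="Paragraph" id="c123">' +
  '<foo:fieldSet name="stuff" value="10"/>' +
  '<foo:fieldSet name="copyright" value="2013"/>' +
  '<foo:p type="Title">Foo man shoo</foo:p>';

console.log(allFieldSets(sampleAll));
```

One caveat: Java's properties format, which ANT uses when it loads a property file, treats quote characters as part of the value. That is why this sketch writes author=John Doe rather than author="John Doe".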

mccarl (IT Business Systems Analyst / Software Developer) commented:
I did this little bit of javascript script for you that should help you out. I hope you don't mind but I went a little further than just writing out to a properties file. Since I assume that you will subsequently read in that properties file, I've made a shortcut and I just directly set properties in the Ant project. Hopefully you can see what is going on in the below script. Also, since I noticed your previous question on trying to ignore the DTD specified in the file, this code sets the "feature" that disables loading of DTD so hopefully that will work for you (the only thing is that different parsers call the feature by different names, but I guess your Ant should use the same parser and so it should work)

The below is just a simple Ant script to demonstrate it working. Note that you configure the 3 properties at the top for your situation, they should be obvious, if not I can explain further. Also, note that those 3 properties could be called anything (as long as the javascript code is changed accordingly).

build.xml
<?xml version="1.0" encoding="UTF-8"?>
<project name="MyProject" default="main">
    <property name="xmlscript.fileName" value="input2.xml" />
    <property name="xmlscript.prop.prefix" value="xml.properties" />
    <property name="xmlscript.xml.namespace" value="ns:foo" />
    <target name="main">
        <script language="javascript"><![CDATA[
            importClass(java.lang.System);
            importPackage(java.io);
            importPackage(Packages.javax.xml.parsers);
            importPackage(Packages.org.w3c.dom);
            
            fileName = project.getProperty("xmlscript.fileName");
            propPrefix = project.getProperty("xmlscript.prop.prefix");
            namespace = project.getProperty("xmlscript.xml.namespace");
            
            is = new FileInputStream(fileName);
            
            factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            factory.setNamespaceAware(true);
            builder = factory.newDocumentBuilder();
            doc = builder.parse(is);
            
            fields = doc.getElementsByTagNameNS(namespace, "fieldSet");
            for (i = 0; i < fields.getLength(); i++) {
                fieldAttrs = fields.item(i).getAttributes();
                project.setProperty(propPrefix + "." + fieldAttrs.getNamedItem("name").getNodeValue(), fieldAttrs.getNamedItem("value").getNodeValue());
            }
        ]]></script>
        <echo message="Author: ${xml.properties.author}" />
        <echo message="Major: ${xml.properties.versionMajor}" />
        <echo message="Minor: ${xml.properties.versionMinor}" />
    </target>
</project>

This was tested on the following xml ("input2.xml" in my case)...
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo SYSTEM "Foo.dtd">
<container xmlns:foo="ns:foo">
<foo:fieldSet name="author" value="John Doe"/><foo:fieldSet name="versionMajor" value="One"/><foo:fieldSet name="versionMinor" value="TwoThree"/>
</container>

As you should see in the script, it sets one property for each "fieldSet" element found in the content, each named with the configured prefix followed by the element's name. That way you can use whichever properties you need and not worry about the rest. You could always tweak it to only look for certain fieldSet elements.


Hope this helps you out!

Kimputer commented:
I adjusted code from mccarl (hope you don't mind!).
Because:
a - xml input has to adhere to strict rules, while my code was based on text input. If you use his original code, you have to fix your input xml file (which you may or may not want)
b - only 3 lines of output. Mine outputs all fields.
Have a go, see if you like it:

<?xml version="1.0" encoding="UTF-8"?>
<project name="MyProject" default="main">
    <property name="xmlscript.fileName" value="input2.xml" />
    <property name="xmlscript.fileName2" value="output.xml" />
    <property name="xmlscript.prop.prefix" value="xml.properties" />
    <property name="xmlscript.xml.namespace" value="ns:foo" />
    <target name="main">
        <script language="javascript"><![CDATA[
            importClass(java.lang.System);
            importPackage(java.io);
            fileName = project.getProperty("xmlscript.fileName");
            fileName2 = project.getProperty("xmlscript.fileName2");

            fr = new FileReader(fileName);
            br = new BufferedReader(fr);
            fullstring = "";

            while ((inputLine = br.readLine()) != null) {
                fullstring += inputLine;
            }

            var substring = fullstring.split("name");
            var finalstring = "";

            for (i = 1; i < substring.length; i++) {
                var substringa = substring[i].replace('"','').replace('"','').split("=");
                var substringb = substringa[1];
                var substringc = substringa[2];
                finalstring += substringb.split(" ")[0] + "=" + substringc.split("/")[0] + "\n";
            }

            fw = new FileWriter(fileName2);
            bw = new BufferedWriter(fw);
            bw.write(finalstring);
            bw.close();

        ]]></script>

    </target>
</project>

TechBento (author) commented:
Wow, this is stunning.  Testing now.   Whether it works or not, the effort and assistance you both provided is beyond what I ever expected.   Posting here was a hail-mary pass for me, hoping to get myself out of a wacky situation.   I will update in a few hours.

TechBento (author) commented:
First update is that I cannot test yet: I hit what I believe is a JDK error.  

 javax.script.ScriptException: ReferenceError: "importClass" is not defined in <eval> at line number 2

Documented at https://bugs.openjdk.java.net/browse/JDK-8025132, I'll explore this a bit.

TechBento (author) commented:
Made progress with the script provided by mccarl but not the one by kimputer.    

I got past the JDK 8 compatibility issues by editing the import functions in both pieces of code.  Here is a sample for kimputer's code:
load('nashorn:mozilla_compat.js');
var FileInputStream = java.io.FileInputStream;
var FileReader = java.io.FileReader;
importPackage(Packages.javax.xml.parsers);
importPackage(Packages.org.w3c.dom);

The mccarl code ran and produced results which I am about to review.   The kimputer sample is slightly more appealing for its ability to grab everything, but it ran into a wall at:

build.xml:15: javax.script.ScriptException: TypeError: Cannot read property "split" from undefined in <eval> at line number 24

So far I've been unable to fix that one.

Kimputer commented:
Strange. I used the default ANT config in Eclipse, without modifying any code or configuration or plugin or addin.
Did you try with the recommended Rhino JS.jar?

mccarl (IT Business Systems Analyst / Software Developer) commented:
@TechBento,

"The mccarl code ran and produced results which I am about to review.   The kimputer sample is slightly more appealing for its ability to grab everything"
It was kimputer who asserted that my code could NOT "grab everything", when in fact "grab everything" is exactly what it does.

@kimputer,

"a - xml input has to adhere to strict rules, while my code was based on text input. If you use his original code, you have to fix your input xml file (which you may or may not want)"
You are correct that xml input has to adhere to rules, however TechBento's input IS already in valid xml form so there are no problems here.

In fact my code would be more beneficial as it would be robust in the face of slight changes that are perfectly valid (and normal) in xml files. As an example, whitespace in most of the input xml is not considered part of the actual content, so various parsers/transformers/outputs can vary the amount of whitespace in a number of places, which would cause a naive "text parsing" routine to fail. With the "xml parsing" of my code, this is handled transparently for you.

Also, your code isn't very selective in that it finds "name" wherever it may occur in the input, not just in fieldSet elements, which may very well upset the result you get. Perhaps TechBento does want the code to extract name/values from elements other than <foo:fieldSet> but that can be easily accounted for in the xml parsing version, if desired.
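
Both failure modes are easy to reproduce. The sketch below is plain JavaScript with a made-up input string. naiveExtract is a condensed copy of the split-on-"name" loop from earlier in this thread, and fieldSetExtract is a hypothetical, stricter text pattern that only looks at <foo:fieldSet> elements and tolerates whitespace between attributes. It is still a regular expression rather than an XML parser, so treat it as an illustration of the problem, not as a replacement for the DOM version.

```javascript
// Condensed copy of the split-on-"name" approach posted earlier in the thread.
function naiveExtract(text) {
  var parts = text.split("name");
  var out = "";
  for (var i = 1; i < parts.length; i++) {
    var pieces = parts[i].replace('"', "").replace('"', "").split("=");
    out += pieces[1].split(" ")[0] + "=" + pieces[2].split("/")[0] + "\n";
  }
  return out;
}

// Hypothetical stricter pattern: only fieldSet elements, any whitespace between attributes.
function fieldSetExtract(text) {
  var pattern = /<foo:fieldSet\s+name\s*=\s*"([^"]*)"\s+value\s*=\s*"([^"]*)"\s*\/>/g;
  var pairs = [];
  var match;
  while ((match = pattern.exec(text)) !== null) {
    pairs.push([match[1], match[2]]);
  }
  return pairs;
}

// Made-up input: the word "filename" appears in ordinary content, and the
// fieldSet attributes are spread over several lines (still valid xml).
var tricky =
  '<foo:p type="Title">See filename list</foo:p>' +
  '<foo:fieldSet\n    name="author"\n    value="John Doe"/>';

var naiveError = null;
try {
  naiveExtract(tricky);
} catch (e) {
  naiveError = e; // "filename" yields a piece with no "=", so pieces[1] is undefined
}

console.log(naiveError instanceof TypeError);
console.log(fieldSetExtract(tricky));
```

The TypeError here is the same kind of failure as the "Cannot read property "split" from undefined" message reported above: any stray occurrence of "name" produces a piece without an "=" in it.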

"b - only 3 lines of output. Mine outputs all fields."
"Only 3 lines of output" because that IS all the fields in the example input. There is nothing hard coded in my <script> that limits it to these 3 properties. If there were 300 <foo:fieldSet> elements in the input, then my code would set 300 properties to be used in the rest of the Ant script. No problem!

If you are referring to the 3 <echo> tasks in my Ant script, they are just there to demonstrate that the properties had been set. Those wouldn't be in the final script. You just use the properties in whatever places you need them. As I said in my comment above, what this does is save you from having to write out a .properties file and then read that file back in. It shortcuts that so that the properties are already usable anywhere in your Ant script.

Kimputer commented:
@mccarl: Apologies if my assumptions were not correct. I tested with your input file (works fine), but since @TechBento had already posted his input file, I of course tested with that file too (it threw xml errors). For this reason I modified your code. My apologies again.

TechBento (author) commented:
Hi @kimputer and @mccarl.

A few lessons learned and a successful solution developed.    I took everything you guys shared and got a final solution that's somewhat different, but I would have never arrived at it without all this support.

I found that using javax.xml.xpath.XPath (https://docs.oracle.com/javase/7/docs/api/javax/xml/xpath/package-summary.html) works slightly better for me.   The reason I kept digging is that loading the xml into a string and splitting it was not ideal for larger content files, although it worked well on small ones in testing.    I'll close the question for now.  

Thank you both.

TechBento (author) commented:
Fantastic people.

mccarl (IT Business Systems Analyst / Software Developer) commented:
You're welcome!