Solved

How to extract these values from a text file using JavaScript

Posted on 2014-12-07
Last Modified: 2014-12-10
Hi,
I am working with ANT and have hit an impasse reading a file and extracting values from it.  I believe I can solve this with simple JavaScript, which I am unfamiliar with. I am trying to read a large file that, among many other things, has field sets that look like this:

<foo:fieldSet name="author" value="John Doe"/><foo:fieldSet name="versionMajor" value="One"/><foo:fieldSet name="versionMinor" value="TwoThree"/>


I basically want to create a properties file that looks like this:
author="John Doe"
versionMajor="One"
versionMinor="TwoThree"


Once that's created I'll be golden as I can then have ANT use this properties file and do the work it needs to do.
Question by:TechBento
24 Comments
 
LVL 35

Assisted Solution

by:Kimputer
Kimputer earned 167 total points
ID: 40486416
Although I'm not sure why you need JavaScript when you're working with Java, here's some sample code that works, but ONLY IF THE INPUT IS IN THE SAME FORMAT (x times <foo:fieldSet name="x" value="x"/>)

<!DOCTYPE html>
<html>
<body>


<button onclick="myFunction()">Try it</button>

<p id="demo"></p>

<script>
function myFunction() {
    var fullstring = '<foo:fieldSet name="author" value="John Doe"/><foo:fieldSet name="versionMajor" value="One"/><foo:fieldSet name="versionMinor" value="TwoThree"/>';
    // Every chunk after the first starts right after a "name" attribute keyword
    var substring = fullstring.split("name");
    var finalstring = "";

    for (var i = 1; i < substring.length; i++) {
        // Strip the two quotes around the name, then split on "=":
        // [1] holds the name followed by " value", [2] starts with the quoted value
        var substringa = substring[i].replace('"', '').replace('"', '').split("=");
        var substringb = substringa[1];
        var substringc = substringa[2];
        finalstring += substringb.split(" ")[0] + "=" + substringc.split("/")[0] + "\n";
    }

    alert(finalstring);
}
</script>

</body>
</html>

 
LVL 2

Author Comment

by:TechBento
ID: 40486572
My mistake, what I can use is a "Java script" that I can call from ANT.   Not web-based JavaScript, but normal Java.   Back story is that I am reading an XML file using ANT that will not process using the normal tools like xmltask or xmlparam, so I am stuck using something like a script to get the values I need.
 
LVL 35

Expert Comment

by:Kimputer
ID: 40486635
Actually, in that case it still works, though you need to add the JavaScript ANT library, use only the script code, and discard the HTML parts.
Also make the script line more explicit:   <script language="javascript">
 
LVL 2

Author Comment

by:TechBento
ID: 40486662
Thanks, I am trying it now.  It seems too easy.  I like it.
 
LVL 2

Author Comment

by:TechBento
ID: 40486677
OK, so I was looking at this from my mobile; now I am at my desk.  The issue I have is that the XML file that contains those strings has a lot of other stuff in it.  I need to first locate them, then get their values.  That's where the majority of my struggle comes from.  Also, they may not be next to each other, so effectively the code has to:

Look for the first parameter "author"..
Locate: <foo:fieldSet name="author" value="John Doe"/>
Extract "John Doe" as the value for parameter author.

Proceed to next parameter. Do that for all the ones I define.
Write them all to the properties file as indicated.
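
The lookup steps listed above could be sketched in plain JavaScript with one regular expression per field name. This is only a rough sketch under stated assumptions, not a tested Ant script: the helper names findFieldValue and toPropertiesText are made up for illustration, and it assumes every tag is written with double-quoted attributes and with name before value. A real XML parser would be more robust.

```javascript
// Hypothetical helper: returns the value of the first
// <foo:fieldSet name="..." value="..."/> whose name matches, or null.
// Assumes double-quoted attributes with name before value.
function findFieldValue(text, fieldName) {
    // Escape characters that are special inside a regular expression.
    var safeName = fieldName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    var pattern = new RegExp(
        '<foo:fieldSet\\s+name\\s*=\\s*"' + safeName + '"\\s+value\\s*=\\s*"([^"]*)"');
    var match = pattern.exec(text);
    return match ? match[1] : null;
}

// Hypothetical helper: builds name="value" lines for the wanted fields,
// skipping any field that is not present in the text.
function toPropertiesText(text, wantedNames) {
    var lines = [];
    for (var i = 0; i < wantedNames.length; i++) {
        var value = findFieldValue(text, wantedNames[i]);
        if (value !== null) {
            lines.push(wantedNames[i] + '="' + value + '"');
        }
    }
    return lines.join("\n");
}

// The fields do not need to be next to each other in the input.
var sample = '<foo:content><foo:fieldSet name="author" value="John Doe"/>' +
    '<foo:section type="Paragraph" id="c123">' +
    '<foo:fieldSet name="versionMajor" value="One"/>' +
    '<foo:fieldSet name="versionMinor" value="TwoThree"/>';

var propertiesText = toPropertiesText(sample, ["author", "versionMajor", "versionMinor"]);
```

The resulting text would still have to be written to the properties file by whatever file API the hosting script engine offers.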
 
LVL 35

Expert Comment

by:Kimputer
ID: 40486695
Are you telling me the file is NOT

<foo:fieldSet name="xxx" value="xxx"/>

repeated multiple times?

But looks totally different? My code was meant for a file with ONLY <foo:fieldSet name="xxx" value="xxx"/> repeated over and over again. It did NOT take ANY deviation from this file format into consideration.

If the file looks totally different, you should have posted that file structure to begin with.
 
LVL 2

Author Comment

by:TechBento
ID: 40486745
Right, I understand your code looks for precisely those strings, and it's opening the door for me, but the issue is more complex.  

Let me go back to the beginning.  What I said was "I am trying to read a large file that, among many things, has field sets that look like this:" and provided the example of the fields/values in the file that are key to me.   The file itself is a content XML file (let's call it foo.xml for this purpose) with lots of content.   The content is defined by those field sets, but they are not necessarily next to each other (I cannot assume they will be).  So I need to be able to read that file and grab only those relevant values as it finds them.

To give you some background: I can normally do things like this with ANT directly, using xmltask and xmlparam on an original XML file. Due to a number of complexities, that's not possible here.  My "foo.xml" is a transformed version of an original.  Neither of those ANT tasks supports a content XML file in this format (although it is still XML).  

So, I am stuck doing something I am unfamiliar with...  opening the file... locating a string I need (example: <foo:fieldSet name="author" value="John Doe"/>) and grabbing the value to parse to something ANT can work with.
 
LVL 35

Expert Comment

by:Kimputer
ID: 40486753
If possible, I'd still like a file with sample data, an indication of what you need extracted from it, and how to identify what you need. Is at least "<foo:fieldSet" something that's always the same (signifying the start of something to grab)?
 
LVL 2

Author Comment

by:TechBento
ID: 40486799
Sure, here is the first portion. Keep in mind I had to clean it up to remove customer data... So this is basically the opening sequence. It contains the stuff I need.  After that it's all content.  Think of it as a usable xml version of a Word document.

<?xml version="1.0" encoding="UTF-8"?><!-- foo.xsl Build 1 --><foo:content xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tps="http://foo/cxml" schemaVersion="2.0" xsi:schemaLocation="http://foo/cxml http://foo/cxml.xsd" whiteSpaceMode="preserve"><foo:fieldSet name="author" value="John Doe"/><foo:section type="Paragraph" id="c123"><foo:fieldSet name="versionMajor" value="One"/><foo:fieldSet name="versionMinor" value="TwoThree/><foo:fieldSet name="stuff" value="10"/><foo:fieldSet name="copyright" value="2013"/><foo:fieldSet name="pagecount" value="1004"/><foo:p type="Title">Foo man shoo</foo:p>

 
LVL 35

Expert Comment

by:Kimputer
ID: 40486825
It seems you only need the "<foo:fieldSet" data, then?
 
LVL 2

Author Comment

by:TechBento
ID: 40486828
That's fair to say.  In fact, if it created a properties file with ALL of them that would be fine too. I can live with extra properties.

By the way, thank you for your time here.    I can do many neat things with content, this just isn't one of them :)
 
LVL 35

Expert Comment

by:Kimputer
ID: 40486895
Actually I just tested with the new data, and it still lists the correct items.
It missed this one though <foo:section type="Paragraph" id="c123">
Do you need this one too?
 
LVL 2

Author Comment

by:TechBento
ID: 40486918
Is it easy to just grab everything called "fieldset" and write to a text file that is

fieldsetName1="value"
fieldsetName2="value"
fieldsetName3="value"
etc

In hindsight, that would be awesome because I could re-use that code all the time.

I only need three specific fields for this project, I used examples of their names in my post, but I can modify them as needed.
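
A reusable "grab every fieldSet" routine could look roughly like the sketch below. It is written in plain JavaScript and is only an illustration: the function name allFieldSetsToProperties is hypothetical, and it assumes double-quoted attributes with name before value, which matches the sample data posted earlier but is not guaranteed for every file.

```javascript
// Hypothetical helper: collects every <foo:fieldSet name="..." value="..."/>
// in the text and returns one name="value" line per element.
// Assumes double-quoted attributes with name before value.
function allFieldSetsToProperties(text) {
    var pattern = /<foo:fieldSet\s+name\s*=\s*"([^"]*)"\s+value\s*=\s*"([^"]*)"\s*\/?>/g;
    var lines = [];
    var match;
    // exec() with the g flag resumes after the previous match on each call.
    while ((match = pattern.exec(text)) !== null) {
        lines.push(match[1] + '="' + match[2] + '"');
    }
    return lines.join("\n");
}

// Other elements, such as foo:p, are ignored.
var input = '<foo:fieldSet name="stuff" value="10"/>' +
    '<foo:p type="Title">Foo man shoo</foo:p>' +
    '<foo:fieldSet name="copyright" value="2013"/>' +
    '<foo:fieldSet name="pagecount" value="1004"/>';

var allProperties = allFieldSetsToProperties(input);
```

Because the pattern is anchored on the foo:fieldSet tag itself, extra properties from unrelated elements should not appear in the output.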
 
LVL 35

Accepted Solution

by:mccarl
mccarl earned 333 total points
ID: 40488245
I wrote this little bit of JavaScript for you that should help you out. I hope you don't mind, but I went a little further than just writing out to a properties file. Since I assume that you will subsequently read in that properties file, I've taken a shortcut and set the properties directly in the Ant project. Hopefully you can see what is going on in the script below. Also, since I noticed your previous question on trying to ignore the DTD specified in the file, this code sets the "feature" that disables loading of the DTD, so hopefully that will work for you (the only catch is that different parsers call the feature by different names, but I guess your Ant should use the same parser, so it should work).

The below is just a simple Ant script to demonstrate it working. Note that you configure the 3 properties at the top for your situation, they should be obvious, if not I can explain further. Also, note that those 3 properties could be called anything (as long as the javascript code is changed accordingly).

build.xml
<?xml version="1.0" encoding="UTF-8"?>
<project name="MyProject" default="main">
    <property name="xmlscript.fileName" value="input2.xml" />
    <property name="xmlscript.prop.prefix" value="xml.properties" />
    <property name="xmlscript.xml.namespace" value="ns:foo" />
    <target name="main">
        <script language="javascript"><![CDATA[
            importClass(java.lang.System);
            importPackage(java.io);
            importPackage(Packages.javax.xml.parsers);
            importPackage(Packages.org.w3c.dom);
            
            fileName = project.getProperty("xmlscript.fileName");
            propPrefix = project.getProperty("xmlscript.prop.prefix");
            namespace = project.getProperty("xmlscript.xml.namespace");
            
            // Open the XML file named by the xmlscript.fileName property
            is = new FileInputStream(fileName);
            
            // Build a namespace-aware, non-validating parser that skips the external DTD
            factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            factory.setNamespaceAware(true);
            builder = factory.newDocumentBuilder();
            doc = builder.parse(is);
            is.close();
            
            // Set one Ant property per fieldSet element, named prefix.name
            fields = doc.getElementsByTagNameNS(namespace, "fieldSet");
            for (i = 0; i < fields.getLength(); i++) {
                fieldAttrs = fields.item(i).getAttributes();
                project.setProperty(propPrefix + "." + fieldAttrs.getNamedItem("name").getNodeValue(), fieldAttrs.getNamedItem("value").getNodeValue());
            }
        ]]></script>
        <echo message="Author: ${xml.properties.author}" />
        <echo message="Major: ${xml.properties.versionMajor}" />
        <echo message="Minor: ${xml.properties.versionMinor}" />
    </target>
</project>


This was tested on the following XML ("input2.xml" in my case)...
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo SYSTEM "Foo.dtd">
<container xmlns:foo="ns:foo">
<foo:fieldSet name="author" value="John Doe"/><foo:fieldSet name="versionMajor" value="One"/><foo:fieldSet name="versionMinor" value="TwoThree"/>
</container>


As you should see in the script, it sets one property, starting with the configured prefix, for each "fieldSet" element found in the content. That way you can use whichever properties you need and not worry about the rest. You could always tweak it to look only for certain fieldSet elements.


Hope this helps you out!
 
LVL 35

Expert Comment

by:Kimputer
ID: 40488569
I adjusted code from mccarl (hope you don't mind!).
Because:
a - xml input has to adhere to strict rules, while my code was based on text input. If you use his original code, you have to fix your input xml file (which you may or may not want)
b - only 3 lines of output. Mine outputs all fields.
Have a go, see if you like it:

<?xml version="1.0" encoding="UTF-8"?>
<project name="MyProject" default="main">
    <property name="xmlscript.fileName" value="input2.xml" />
    <property name="xmlscript.fileName2" value="output.xml" />
    <property name="xmlscript.prop.prefix" value="xml.properties" />
    <property name="xmlscript.xml.namespace" value="ns:foo" />
    <target name="main">
        <script language="javascript"><![CDATA[
            importClass(java.lang.System);
            importPackage(java.io);
            fileName = project.getProperty("xmlscript.fileName");
            fileName2 = project.getProperty("xmlscript.fileName2");

            // Read the whole input file into one string
            fr = new FileReader(fileName);
            br = new BufferedReader(fr);
            fullstring = "";

            while ((inputLine = br.readLine()) != null) {
                fullstring += inputLine;
            }
            br.close();

            // Every chunk after the first starts right after the text "name"
            var substring = fullstring.split("name");
            var finalstring = "";

            for (var i = 1; i < substring.length; i++) {
                var substringa = substring[i].replace('"','').replace('"','').split("=");
                var substringb = substringa[1];
                var substringc = substringa[2];
                finalstring += substringb.split(" ")[0] + "=" + substringc.split("/")[0] + "\n";
            }

            // Write the name=value lines to the output file
            fw = new FileWriter(fileName2);
            bw = new BufferedWriter(fw);
            bw.write(finalstring);
            bw.close();

        ]]></script>

    </target>
</project>

 
LVL 2

Author Comment

by:TechBento
ID: 40489040
Wow, this is stunning.  Testing now.   Whether it works or not, the effort and assistance you both provided is beyond what I ever expected.   Posting here was a hail-mary pass for me, hoping to get myself out of a wacky situation.   I will update in a few hours.
 
LVL 2

Author Comment

by:TechBento
ID: 40489060
First update is that I cannot test yet, I hit  - what I believe is - a JDK error.  

 javax.script.ScriptException: ReferenceError: "importClass" is not defined in <eval> at line number 2

Documented at https://bugs.openjdk.java.net/browse/JDK-8025132, I'll explore this a bit.
 
LVL 2

Author Comment

by:TechBento
ID: 40489134
Made progress with the script provided by mccarl but not the one by kimputer.    

I got past the JDK 8 compatibility issues by editing the import statements in both pieces of code.  Here is a sample for kimputer:
load('nashorn:mozilla_compat.js');
var FileInputStream = java.io.FileInputStream;
var FileReader = java.io.FileReader;
importPackage(Packages.javax.xml.parsers);
importPackage(Packages.org.w3c.dom);


The mccarl code ran and produced results which I am about to review.   The kimputer sample is slightly more appealing for its ability to grab everything, but it ran into a wall at:

build.xml:15: javax.script.ScriptException: TypeError: Cannot read property "split" from undefined in <eval> at line number 24

So far I've been unable to fix that one.
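
One plausible cause of this error, assuming the full content file contains the word "name" somewhere other than in a fieldSet attribute (for example in body text), is that the piece following such an occurrence contains no "=", so the array produced by split("=") has no second or third element. The sketch below repeats the same split logic in plain JavaScript with a guard that skips such pieces. The function name extractPairs is hypothetical and the sketch has not been run inside Ant.

```javascript
// Same split-based approach as the script above, with a guard added.
// extractPairs is a hypothetical name used only for this sketch.
function extractPairs(fullstring) {
    var pieces = fullstring.split("name");
    var result = "";
    for (var i = 1; i < pieces.length; i++) {
        var parts = pieces[i].replace('"', '').replace('"', '').split("=");
        // A stray "name" in ordinary text can yield fewer than three parts;
        // without this check, parts[1] or parts[2] is undefined and .split() throws.
        if (parts.length < 3) {
            continue;
        }
        result += parts[1].split(" ")[0] + "=" + parts[2].split("/")[0] + "\n";
    }
    return result;
}

// The middle element contains the word "name" in plain text.
var text = '<foo:fieldSet name="author" value="John Doe"/>' +
    '<foo:p type="Title">A name in plain text</foo:p>' +
    '<foo:fieldSet name="copyright" value="2013"/>';

var pairs = extractPairs(text);
```

The guard only avoids the exception. A stray "name" followed by other attributes can still produce junk lines, which is one reason to match the fieldSet tag itself or to use an XML parser.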
 
LVL 35

Expert Comment

by:Kimputer
ID: 40489176
Strange. I used the default ANT config in Eclipse, without modifying any code or configuration or plugin or addin.
Did you try with the recommended Rhino JS.jar?
 
LVL 35

Assisted Solution

by:mccarl
mccarl earned 333 total points
ID: 40490279
@TechBento,

The mccarl code ran and produced results which I am about to review.   The kimputer sample is slightly more appealing for its ability to grab everything
It was kimputer who asserted that my code could NOT "grab everything", when in fact "grab everything" is exactly what it does.

@kimputer,

a - xml input has to adhere to strict rules, while my code was based on text input. If you use his original code, you have to fix your input xml file (which you may or may not want)
You are correct that xml input has to adhere to rules, however TechBento's input IS already in valid xml form so there are no problems here.

In fact, my code would be more beneficial as it would be robust in the face of slight changes that are perfectly valid (and normal) in XML files. As an example, whitespace in most of the input XML is not considered part of the actual content, so various parsers/transformers/outputs can vary the amount of whitespace in a number of places, which would cause a naive "text parsing" routine to fail. With the "XML parsing" of my code, this is handled transparently for you.

Also, your code isn't very selective in that it finds "name" wherever it may occur in the input, not just in fieldSet elements, which may very well upset the result you get. Perhaps TechBento does want the code to extract name/values from elements other than <foo:fieldSet>, but that can be easily accounted for in the XML parsing version, if desired.

b - only 3 lines of output. Mine outputs all fields.
"Only 3 lines of output" because that IS all the fields in the example input. There is nothing hard-coded in my <script> that limits it to these 3 properties. If there were 300 <foo:fieldSet> elements in the input, then my code would set 300 properties to be used in the rest of the Ant script. No problem!

If you are referring to the 3 <echo> tasks in my Ant script, they are just there to demonstrate that the properties have been set. Those wouldn't be in the final script. You just use the properties in whatever places you need them. As I said in my comment above, what this does is save you from having to write out a .properties file and then read that file back in. It shortcuts that so that the properties are already usable in your Ant script.
 
LVL 35

Expert Comment

by:Kimputer
ID: 40490978
@mccarl: Apologies if my assumptions were not correct. I tested with your input file (works fine), but since @TechBento had already posted his input file, I of course tested with that file too (which threw XML errors). For this reason I modified your code. My apologies again.
 
LVL 2

Author Comment

by:TechBento
ID: 40491478
Hi @kimputer and @mccarl.

A few lessons learned and a successful solution developed.    I took everything you guys shared and got a final solution that's somewhat different, but I would have never arrived at it without all this support.

I found that using javax.xml.xpath.XPath (https://docs.oracle.com/javase/7/docs/api/javax/xml/xpath/package-summary.html) works slightly better for me.   The reason I kept digging is that I found that loading the XML into a string and splitting it was not ideal for larger content files, although it worked well on small ones in testing.    I'll close the question for now.  

Thank you both.
 
LVL 2

Author Closing Comment

by:TechBento
ID: 40491486
Fantastic people.
 
LVL 35

Expert Comment

by:mccarl
ID: 40492753
You're welcome!