?
Solved

Combining XML files from a folder

Posted on 2004-08-12
7
Medium Priority
?
215 Views
Last Modified: 2010-04-23
I have a set of XHTML text files in a folder, I am reading in there text and writing it out into one large file.  The problem is that there is a lot of text that is getting repeated that should not be repeated.  I am refering to <html><head>....</head>  and the like.  There is just one long table in each file, I need to combine these tables also.

I am currently using  IO.StreamReader but I think I need to conver to Xml.XmlTextReader, so help with this would be nice too.

Any help would be nice.
0
Comment
Question by:JHalstead
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
7 Comments
 
LVL 24

Expert Comment

by:Justin_W
ID: 11788386
Unless you really need to parse the files as XML for some reason, I would probably do it this way:
1. Create a StringBuilder
2. Read the first file into a String variable.
3. Use String.IndexOf("<body"), String.LastIndexOf("</body>"), and String.SubString(...) to extract the content from the String variable.
4. Append the substring/content to your StringBuilder.
5. Repeat steps 2-4 for each additional file.

Also, here's a utility for reading the contents of any text file:
   Public Shared Function ReadTextFile(ByVal fileName As String) As String
       Dim reader As System.IO.StreamReader
       Try
           reader = System.IO.File.OpenText(fileName)
           Dim s As String = reader.ReadToEnd()
           Return s
       Finally
           If (Not IsNothing(reader)) Then
               Try
                   reader.Close()
               Catch
               End Try
           End If
       End Try
   End Function
0
 

Author Comment

by:JHalstead
ID: 11788735
Ok, I don't know what I am doing wrige here but now, only the last file in the array in being saved to the output file...


    Private Sub DoCombine()
        Dim myreader As New System.Text.StringBuilder
        Dim myjgs As New ArrayList
        Dim mystrings As New ArrayList

        If ListBox1.Items.Count <= 0 Then
            MsgBox("There are no files to load, Combine aborted!")
            Exit Sub
        End If

        For loff As Integer = 0 To ListBox1.Items.Count - 1
            If ListBox1.Items(loff).lastindexof("jg") <= 0 Then
                MsgBox("There is a non JG file in list, Combine aborted!", MsgBoxStyle.OKOnly, "Docsoft")
                Exit Sub
            End If

            Dim mystring As String = ListBox1.Items(loff).substring(0, ListBox1.Items(loff).lastindexof("jg"))
            If Not myjgs.Contains(mystring) Then myjgs.Add(mystring)
            mystrings.Add(ListBox1.Items(loff))
        Next

        For moff As Integer = 0 To myjgs.Count - 1
            Dim myfstrings As ArrayList = mystrings
            myfstrings = FilterArray(myjgs(moff) & "jg", mystrings.Clone)

            Dim mypath As String = rootpath & "\" & myjgs(moff) & ".toc.all.htm"
            '            Dim mywriter As New Xml.XmlTextWriter(mypath, System.Text.Encoding.UTF8)

            For x As Integer = 0 To myfstrings.Count - 1

                '------------------------------
                Dim sr As New IO.StreamReader(rootpath & "\" & myfstrings(x).ToString)
                Dim s As String = sr.ReadLine
                Dim buf As New System.Text.StringBuilder

                Do Until s Is Nothing
                    If s.StartsWith("<html") Then

                        Do Until s Is Nothing OrElse s.StartsWith("</html>")

                            If s.StartsWith("<EFFECT") Then
                                Dim ei As Integer = s.LastIndexOf(">")
                                s = s.Insert(ei, "/")
                            End If

                            buf.Append(s)

                            buf.Append(vbCrLf)
                            s = sr.ReadLine()
                        Loop

                        buf.Append(s)
                        Dim sw As New IO.StreamWriter(mypath)
                        sw.WriteLine(buf.ToString())

                        sw.Close()

                    End If

                    s = sr.ReadLine()

                    sr.Close()
                Loop


                '------------------------------

            Next
        Next

        MsgBox("DONE")
    End Sub
0
 
LVL 24

Expert Comment

by:Justin_W
ID: 11788763
The following will overwrite the file each time:
                        Dim sw As New IO.StreamWriter(mypath)
                        sw.WriteLine(buf.ToString())

                        sw.Close()
Move the preceding lines to after the end of the loop, and move the following line to before the FOR loop:
                        Dim buf As New System.Text.StringBuilder
0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 

Author Comment

by:JHalstead
ID: 11789020
Ok, it works to some degree, could you please do two things for me:

1)  give an example of how I would delete a table, not parse it, when it has a specific ID attribute
2)  is there a way to add white space to the output document to make it more readable
0
 
LVL 24

Accepted Solution

by:
Justin_W earned 1500 total points
ID: 11789069
Both of those operations would require more complicated string manipulation or XML parsing.  You could use XmlDocument.LoadXml() to parse a String.  XmlDocument.PreserveWhiteSpace = false should make the XmlDocument format the XML nicely when converted back to a string.  Locating a specific node with specific attributes would require XPath expressions or DOM navigation, and would be a separate question.
0
 

Author Comment

by:JHalstead
ID: 11789080
Alright then, thanks for the help...
0
 
LVL 24

Expert Comment

by:Justin_W
ID: 11789142
You're welcome.
0

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This tutorial demonstrates one way to create an application that runs without any Forms but still has a GUI presence via an Icon in the System Tray. The magic lies in Inheriting from the ApplicationContext Class and passing that to Application.Ru…
The ECB site provides FX rates for major currencies since its inception in 1999 in the form of an XML feed. The files have the following format (reducted for brevity) (CODE) There are three files available HERE (http://www.ecb.europa.eu/stats/exch…
If you’ve ever visited a web page and noticed a cool font that you really liked the look of, but couldn’t figure out which font it was so that you could use it for your own work, then this video is for you! In this Micro Tutorial, you'll learn yo…
In this video, Percona Director of Solution Engineering Jon Tobin discusses the function and features of Percona Server for MongoDB. How Percona can help Percona can help you determine if Percona Server for MongoDB is the right solution for …
Suggested Courses

752 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question