Solved

Combining XML files from a folder

Posted on 2004-08-12
Last Modified: 2010-04-23
I have a set of XHTML text files in a folder. I am reading in their text and writing it out into one large file.  The problem is that a lot of text is being repeated that should not be: I am referring to <html><head>....</head> and the like.  There is just one long table in each file, and I need to combine these tables as well.

I am currently using IO.StreamReader, but I think I need to convert to Xml.XmlTextReader, so help with that would be nice too.

Any help would be nice.
Question by:JHalstead
LVL 24

Expert Comment

by:Justin_W
ID: 11788386
Unless you really need to parse the files as XML for some reason, I would probably do it this way:
1. Create a StringBuilder
2. Read the first file into a String variable.
3. Use String.IndexOf("<body"), String.LastIndexOf("</body>"), and String.Substring(...) to extract the content from the String variable.
4. Append the substring/content to your StringBuilder.
5. Repeat steps 2-4 for each additional file.

Also, here's a utility for reading the contents of any text file:
   ' Reads an entire text file into a String and always closes the reader.
   Public Shared Function ReadTextFile(ByVal fileName As String) As String
       Dim reader As System.IO.StreamReader = Nothing
       Try
           reader = System.IO.File.OpenText(fileName)
           Return reader.ReadToEnd()
       Finally
           ' reader is still Nothing if OpenText threw
           If Not reader Is Nothing Then
               Try
                   reader.Close()
               Catch
                   ' ignore errors raised while closing
               End Try
           End If
       End Try
   End Function
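
Steps 1-5 above could be sketched roughly as follows. This is only an illustration: the module name, ExtractBody, CombineFolder, and the "*.htm" search pattern are assumptions, not part of the original suggestion. It skips past the opening <body ...> tag so that only the content between the tags is appended.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module CombineBodies
    ' Returns the text between the end of the opening <body ...> tag and the
    ' last </body>, or an empty string if either tag is missing.
    Function ExtractBody(ByVal html As String) As String
        Dim openTag As Integer = html.IndexOf("<body")
        If openTag < 0 Then Return String.Empty
        Dim contentStart As Integer = html.IndexOf(">", openTag)
        Dim contentEnd As Integer = html.LastIndexOf("</body>")
        If contentStart < 0 OrElse contentEnd <= contentStart Then Return String.Empty
        Return html.Substring(contentStart + 1, contentEnd - contentStart - 1)
    End Function

    ' Hypothetical driver: the folder argument and "*.htm" pattern are assumptions.
    Function CombineFolder(ByVal folder As String) As String
        Dim combined As New StringBuilder()
        For Each fileName As String In Directory.GetFiles(folder, "*.htm")
            Dim reader As StreamReader = File.OpenText(fileName)
            Try
                combined.Append(ExtractBody(reader.ReadToEnd()))
                combined.Append(Environment.NewLine)
            Finally
                reader.Close()
            End Try
        Next
        Return combined.ToString()
    End Function
End Module
```

The returned string holds only the body contents, so a single <html><head>...</head><body> wrapper would still need to be written around it once. Directory.GetFiles does not promise any particular order, so sort the file names first if the order of the tables matters.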
 

Author Comment

by:JHalstead
ID: 11788735
Ok, I don't know what I am doing wrong here, but now only the last file in the array is being saved to the output file...


    Private Sub DoCombine()
        Dim myreader As New System.Text.StringBuilder
        Dim myjgs As New ArrayList
        Dim mystrings As New ArrayList

        If ListBox1.Items.Count <= 0 Then
            MsgBox("There are no files to load, Combine aborted!")
            Exit Sub
        End If

        For loff As Integer = 0 To ListBox1.Items.Count - 1
            If ListBox1.Items(loff).lastindexof("jg") <= 0 Then
                MsgBox("There is a non JG file in list, Combine aborted!", MsgBoxStyle.OKOnly, "Docsoft")
                Exit Sub
            End If

            Dim mystring As String = ListBox1.Items(loff).substring(0, ListBox1.Items(loff).lastindexof("jg"))
            If Not myjgs.Contains(mystring) Then myjgs.Add(mystring)
            mystrings.Add(ListBox1.Items(loff))
        Next

        For moff As Integer = 0 To myjgs.Count - 1
            Dim myfstrings As ArrayList = mystrings
            myfstrings = FilterArray(myjgs(moff) & "jg", mystrings.Clone)

            Dim mypath As String = rootpath & "\" & myjgs(moff) & ".toc.all.htm"
            '            Dim mywriter As New Xml.XmlTextWriter(mypath, System.Text.Encoding.UTF8)

            For x As Integer = 0 To myfstrings.Count - 1

                '------------------------------
                Dim sr As New IO.StreamReader(rootpath & "\" & myfstrings(x).ToString)
                Dim s As String = sr.ReadLine
                Dim buf As New System.Text.StringBuilder

                Do Until s Is Nothing
                    If s.StartsWith("<html") Then

                        Do Until s Is Nothing OrElse s.StartsWith("</html>")

                            If s.StartsWith("<EFFECT") Then
                                Dim ei As Integer = s.LastIndexOf(">")
                                s = s.Insert(ei, "/")
                            End If

                            buf.Append(s)

                            buf.Append(vbCrLf)
                            s = sr.ReadLine()
                        Loop

                        buf.Append(s)
                        Dim sw As New IO.StreamWriter(mypath)
                        sw.WriteLine(buf.ToString())

                        sw.Close()

                    End If

                    s = sr.ReadLine()

                    sr.Close()
                Loop


                '------------------------------

            Next
        Next

        MsgBox("DONE")
    End Sub
 
LVL 24

Expert Comment

by:Justin_W
ID: 11788763
The following will overwrite the output file each time, because New IO.StreamWriter(path) replaces any existing file at that path:
                        Dim sw As New IO.StreamWriter(mypath)
                        sw.WriteLine(buf.ToString())

                        sw.Close()
Move the preceding lines to after the end of the inner For loop (the one over x), and move the following line to before that For loop:
                        Dim buf As New System.Text.StringBuilder
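
With both moves applied, the overall shape might look something like the sketch below. It is a simplified stand-in rather than the poster's exact routine: CombineFiles, inputPaths, and outputPath are hypothetical names, and the <EFFECT> rewrite is left out for brevity. The posted code also calls sr.Close() inside the read loop; the sketch closes the reader only after the file has been read.

```vbnet
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module CombineSketch
    ' Hypothetical helper: appends the <html>...</html> lines of every input
    ' file to one buffer, then writes the output file a single time.
    Sub CombineFiles(ByVal inputPaths As String(), ByVal outputPath As String)
        Dim buf As New StringBuilder()           ' created once, before the loop

        For Each inputPath As String In inputPaths
            Dim sr As New StreamReader(inputPath)
            Try
                Dim s As String = sr.ReadLine()
                Dim inside As Boolean = False
                Do Until s Is Nothing
                    If s.StartsWith("<html") Then inside = True
                    If inside Then buf.Append(s).Append(vbCrLf)
                    If s.StartsWith("</html>") Then inside = False
                    s = sr.ReadLine()
                Loop
            Finally
                sr.Close()                       ' close after the whole file is read
            End Try
        Next

        Dim sw As New StreamWriter(outputPath)   ' opened once, after the loop
        Try
            sw.Write(buf.ToString())
        Finally
            sw.Close()
        End Try
    End Sub
End Module
```

Like the posted code, this still copies each file's <html> and <head> lines, so it addresses only the overwrite problem, not the repeated headers.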
 

Author Comment

by:JHalstead
ID: 11789020
Ok, it works to some degree, could you please do two things for me:

1)  give an example of how I would delete a table, not parse it, when it has a specific ID attribute
2)  is there a way to add white space to the output document to make it more readable
 
LVL 24

Accepted Solution

by:
Justin_W earned 500 total points
ID: 11789069
Both of those operations would require more complicated string manipulation or XML parsing.  You could use XmlDocument.LoadXml() to parse a String.  Setting XmlDocument.PreserveWhitespace = False should make the XmlDocument indent the XML nicely when it is written back out with Save().  Locating a specific node with specific attributes would require XPath expressions or DOM navigation, and would be a separate question.
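
For readers who land here, a minimal sketch of that approach follows. It assumes the combined text is well-formed XML and has no DOCTYPE (with one, the parser may try to resolve the external DTD). RemoveTableById and its parameter names are made up for illustration.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module RemoveTableSketch
    ' Removes every <table> whose id attribute equals tableId and returns the
    ' document re-serialized with indentation.
    ' Assumption: tableId contains no apostrophe (it is pasted into the XPath).
    Function RemoveTableById(ByVal xhtml As String, ByVal tableId As String) As String
        Dim doc As New XmlDocument()
        doc.PreserveWhitespace = False      ' lets Save() indent the output
        doc.LoadXml(xhtml)

        ' local-name() sidesteps the XHTML default namespace, if one is declared.
        Dim xpath As String = "//*[local-name()='table'][@id='" & tableId & "']"

        ' Re-query after each removal instead of editing while enumerating.
        Dim node As XmlNode = doc.SelectSingleNode(xpath)
        Do Until node Is Nothing
            node.ParentNode.RemoveChild(node)
            node = doc.SelectSingleNode(xpath)
        Loop

        Dim output As New StringWriter()
        doc.Save(output)
        Return output.ToString()
    End Function
End Module
```

A call such as RemoveTableById(combinedText, "toc"), where "toc" is a placeholder ID, would return the document without that table.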
 

Author Comment

by:JHalstead
ID: 11789080
Alright then, thanks for the help...
 
LVL 24

Expert Comment

by:Justin_W
ID: 11789142
You're welcome.
