Solved

Combining XML files from a folder

Posted on 2004-08-12
Last Modified: 2010-04-23
I have a set of XHTML text files in a folder. I am reading in their text and writing it out into one large file. The problem is that a lot of text that should appear only once is getting repeated: I am referring to <html><head>....</head> and the like. There is just one long table in each file, and I need to combine these tables as well.

I am currently using IO.StreamReader, but I think I need to convert to Xml.XmlTextReader, so help with this would be nice too.

Any help would be nice.
Question by:JHalstead
 
LVL 24

Expert Comment

by:Justin_W
ID: 11788386
Unless you really need to parse the files as XML for some reason, I would probably do it this way:
1. Create a StringBuilder
2. Read the first file into a String variable.
3. Use String.IndexOf("<body"), String.LastIndexOf("</body>"), and String.Substring(...) to extract the content between the tags from the String variable (start just after the ">" that closes the opening <body ...> tag).
4. Append the substring/content to your StringBuilder.
5. Repeat steps 2-4 for each additional file.
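
The steps above can be sketched roughly as follows. This is a minimal illustration, not a tested drop-in: the module and function names (BodyCombiner, ExtractBody, CombineBodies) are hypothetical, it assumes the tags are lowercase as in XHTML, and it uses File.ReadAllText, which requires .NET 2.0 or later.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module BodyCombiner
    ' Returns the text between the end of the opening <body ...> tag and the
    ' last </body>, or an empty string if either marker is missing.
    Function ExtractBody(ByVal html As String) As String
        Dim tagStart As Integer = html.IndexOf("<body")
        If tagStart < 0 Then Return String.Empty
        ' Skip past the ">" that closes the opening tag (it may carry attributes).
        Dim contentStart As Integer = html.IndexOf(">", tagStart) + 1
        Dim contentEnd As Integer = html.LastIndexOf("</body>")
        If contentStart = 0 OrElse contentEnd < contentStart Then Return String.Empty
        Return html.Substring(contentStart, contentEnd - contentStart)
    End Function

    ' Concatenates the body content of every file, in the order given.
    Function CombineBodies(ByVal fileNames As String()) As String
        Dim combined As New StringBuilder()
        For Each fileName As String In fileNames
            combined.Append(ExtractBody(File.ReadAllText(fileName)))
            combined.Append(Environment.NewLine)
        Next
        Return combined.ToString()
    End Function
End Module
```

The combined string still needs a single <html><head>...</head><body> wrapper written around it before it is saved.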

Also, here's a utility for reading the contents of any text file:
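   ' Reads the entire contents of a text file and returns it as one String.
   Public Shared Function ReadTextFile(ByVal fileName As String) As String
       ' Initialized to Nothing so the Finally block can safely test it
       ' even if OpenText throws before the assignment completes.
       Dim reader As System.IO.StreamReader = Nothing
       Try
           reader = System.IO.File.OpenText(fileName)
           Return reader.ReadToEnd()
       Finally
           ' Always release the file handle, whether or not the read succeeded.
           If Not reader Is Nothing Then
               Try
                   reader.Close()
               Catch
                   ' Ignore errors raised while closing.
               End Try
           End If
       End Try
   End Function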
 

Author Comment

by:JHalstead
ID: 11788735
OK, I don't know what I am doing wrong here, but now only the last file in the array is being saved to the output file...


    Private Sub DoCombine()
        Dim myreader As New System.Text.StringBuilder
        Dim myjgs As New ArrayList
        Dim mystrings As New ArrayList

        If ListBox1.Items.Count <= 0 Then
            MsgBox("There are no files to load, Combine aborted!")
            Exit Sub
        End If

        For loff As Integer = 0 To ListBox1.Items.Count - 1
            If ListBox1.Items(loff).lastindexof("jg") <= 0 Then
                MsgBox("There is a non JG file in list, Combine aborted!", MsgBoxStyle.OKOnly, "Docsoft")
                Exit Sub
            End If

            Dim mystring As String = ListBox1.Items(loff).substring(0, ListBox1.Items(loff).lastindexof("jg"))
            If Not myjgs.Contains(mystring) Then myjgs.Add(mystring)
            mystrings.Add(ListBox1.Items(loff))
        Next

        For moff As Integer = 0 To myjgs.Count - 1
            Dim myfstrings As ArrayList = mystrings
            myfstrings = FilterArray(myjgs(moff) & "jg", mystrings.Clone)

            Dim mypath As String = rootpath & "\" & myjgs(moff) & ".toc.all.htm"
            '            Dim mywriter As New Xml.XmlTextWriter(mypath, System.Text.Encoding.UTF8)

            For x As Integer = 0 To myfstrings.Count - 1

                '------------------------------
                Dim sr As New IO.StreamReader(rootpath & "\" & myfstrings(x).ToString)
                Dim s As String = sr.ReadLine
                Dim buf As New System.Text.StringBuilder

                Do Until s Is Nothing
                    If s.StartsWith("<html") Then

                        Do Until s Is Nothing OrElse s.StartsWith("</html>")

                            If s.StartsWith("<EFFECT") Then
                                Dim ei As Integer = s.LastIndexOf(">")
                                s = s.Insert(ei, "/")
                            End If

                            buf.Append(s)

                            buf.Append(vbCrLf)
                            s = sr.ReadLine()
                        Loop

                        buf.Append(s)
                        Dim sw As New IO.StreamWriter(mypath)
                        sw.WriteLine(buf.ToString())

                        sw.Close()

                    End If

                    s = sr.ReadLine()

                    sr.Close()
                Loop


                '------------------------------

            Next
        Next

        MsgBox("DONE")
    End Sub
 
LVL 24

Expert Comment

by:Justin_W
ID: 11788763
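The following will overwrite the output file on every pass through the loop, because New IO.StreamWriter(mypath) replaces an existing file rather than appending to it: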
                        Dim sw As New IO.StreamWriter(mypath)
                        sw.WriteLine(buf.ToString())

                        sw.Close()
Move the preceding lines to after the end of the inner For x loop (after its Next), and move the following line to before that For loop, so the buffer accumulates every file and is written out once:
                        Dim buf As New System.Text.StringBuilder
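
With those two moves applied, the overall shape would look something like the sketch below. It is a simplified illustration rather than a rewrite of the posted DoCombine: the module name, the CombineFiles helper and its parameters are hypothetical, and the <html>/<EFFECT> line filtering is left out so that only the placement of buf and the StreamWriter is shown.

```vbnet
Imports Microsoft.VisualBasic
Imports System.IO
Imports System.Text

Module CombineOnce
    ' Appends every input file to one buffer, then writes the output file once.
    Sub CombineFiles(ByVal inputPaths As String(), ByVal outputPath As String)
        ' Declared before the loop so it accumulates across all files.
        Dim buf As New StringBuilder()

        For Each inputPath As String In inputPaths
            Dim sr As New StreamReader(inputPath)
            Try
                Dim s As String = sr.ReadLine()
                Do Until s Is Nothing
                    buf.Append(s)
                    buf.Append(vbCrLf)
                    s = sr.ReadLine()
                Loop
            Finally
                ' Close the reader only after the whole file has been read.
                sr.Close()
            End Try
        Next

        ' Opened after the loop, so the file is created (or overwritten) once.
        Dim sw As New StreamWriter(outputPath)
        Try
            sw.Write(buf.ToString())
        Finally
            sw.Close()
        End Try
    End Sub
End Module
```

Closing the reader after the read loop, rather than inside it as in the posted code, also avoids calling ReadLine on a reader that has already been closed.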

 

Author Comment

by:JHalstead
ID: 11789020
OK, it works to some degree. Could you please do two things for me:

1)  Give an example of how I would delete a table (rather than parse it) when it has a specific ID attribute.
2)  Is there a way to add white space to the output document to make it more readable?
 
LVL 24

Accepted Solution

by:
Justin_W earned 500 total points
ID: 11789069
Both of those operations would require more complicated string manipulation or XML parsing.  You could use XmlDocument.LoadXml() to parse a String, provided the content is well-formed XML.  Setting XmlDocument.PreserveWhitespace = False should make the XmlDocument indent the XML when it is written back out with XmlDocument.Save(); note that the OuterXml property does not add indentation.  Locating a specific node with specific attributes would require XPath expressions or DOM navigation, and would be a separate question.
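
A minimal sketch of that approach, under stated assumptions: the module and function names (TableCleanup, RemoveTableById) are hypothetical, the input must be well-formed XML with no DOCTYPE that needs to be resolved, and the id value must not contain a single quote. It uses XmlDocument.SelectNodes with an XPath expression to find the tables and XmlDocument.Save to get indented output.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

Module TableCleanup
    ' Parses xhtml, removes every <table> whose id attribute equals tableId,
    ' and returns the document re-serialized with indentation.
    Function RemoveTableById(ByVal xhtml As String, ByVal tableId As String) As String
        Dim doc As New XmlDocument()
        ' False is the default; set explicitly because Save() indents only in this mode.
        doc.PreserveWhitespace = False
        doc.LoadXml(xhtml)

        ' local-name() matches <table> whether or not the XHTML namespace is declared.
        Dim xpath As String = "//*[local-name()='table'][@id='" & tableId & "']"

        ' Copy the matches first so the document is not modified while iterating.
        Dim matches As New List(Of XmlNode)()
        For Each node As XmlNode In doc.SelectNodes(xpath)
            matches.Add(node)
        Next
        For Each node As XmlNode In matches
            node.ParentNode.RemoveChild(node)
        Next

        Dim writer As New StringWriter()
        doc.Save(writer)
        Return writer.ToString()
    End Function
End Module
```

For example, RemoveTableById(combinedText, "toc") would drop any table with id="toc" from the combined document; "toc" here is only a placeholder id.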
 

Author Comment

by:JHalstead
ID: 11789080
Alright then, thanks for the help...
 
LVL 24

Expert Comment

by:Justin_W
ID: 11789142
You're welcome.


Question has a verified solution.
