Solved

Combining XML files from a folder

Posted on 2004-08-12
Last Modified: 2010-04-23
I have a set of XHTML text files in a folder. I am reading in their text and writing it out into one large file.  The problem is that a lot of text is getting repeated that should not be repeated.  I am referring to <html><head>....</head> and the like.  There is just one long table in each file, and I need to combine these tables as well.

I am currently using IO.StreamReader, but I think I need to convert to Xml.XmlTextReader, so help with this would be nice too.

Any help would be nice.
Question by:JHalstead

Expert Comment

by:Justin_W
ID: 11788386
Unless you really need to parse the files as XML for some reason, I would probably do it this way:
1. Create a StringBuilder
2. Read the first file into a String variable.
3. Use String.IndexOf("<body"), String.LastIndexOf("</body>"), and String.Substring(...) to extract the content from the String variable.
4. Append the substring/content to your StringBuilder.
5. Repeat steps 2-4 for each additional file.

Also, here's a utility for reading the contents of any text file:
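   ' Reads the entire contents of a text file into a String.
   Public Shared Function ReadTextFile(ByVal fileName As String) As String
       ' Initialize to Nothing so the Finally block can test the reader
       ' safely if OpenText throws before it is assigned.
       Dim reader As System.IO.StreamReader = Nothing
       Try
           reader = System.IO.File.OpenText(fileName)
           Return reader.ReadToEnd()
       Finally
           If Not reader Is Nothing Then
               Try
                   reader.Close()
               Catch
                   ' Ignore errors raised while closing the reader.
               End Try
           End If
       End Try
   End Function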
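The five steps above can be sketched roughly as follows. This is only an illustration, not tested against the real files: the module and function names (BodyCombiner, ExtractBody, CombineBodies) are made up for the example, and it assumes each file has a single <body ...> tag with no ">" character inside its attribute values. It also skips past the opening tag, since IndexOf("<body") on its own points at the start of the tag rather than at the content.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module BodyCombiner
    ' Hypothetical helper: returns the text between the end of the opening
    ' <body ...> tag and the last </body> tag, or an empty string if either
    ' tag cannot be found.
    Function ExtractBody(ByVal html As String) As String
        Dim openTag As Integer = html.IndexOf("<body", StringComparison.OrdinalIgnoreCase)
        If openTag < 0 Then Return String.Empty

        ' Skip past the attributes of the opening tag.
        Dim contentStart As Integer = html.IndexOf(">"c, openTag)
        If contentStart < 0 Then Return String.Empty
        contentStart += 1

        Dim contentEnd As Integer = html.LastIndexOf("</body>", StringComparison.OrdinalIgnoreCase)
        If contentEnd < contentStart Then Return String.Empty

        Return html.Substring(contentStart, contentEnd - contentStart)
    End Function

    ' Hypothetical helper: concatenates the body content of every file,
    ' in the order given (steps 2-5).
    Function CombineBodies(ByVal fileNames As IEnumerable(Of String)) As String
        Dim combined As New StringBuilder()   ' step 1
        For Each fileName As String In fileNames
            combined.AppendLine(ExtractBody(File.ReadAllText(fileName)))
        Next
        Return combined.ToString()
    End Function
End Module
```

The combined text would still need to be wrapped in a single <html><head>...</head><body>...</body></html> shell before being written out, and the individual tables would remain separate tables inside it.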

Author Comment

by:JHalstead
ID: 11788735
OK, I don't know what I am doing wrong here, but now only the last file in the array is being saved to the output file...


    Private Sub DoCombine()
        Dim myreader As New System.Text.StringBuilder
        Dim myjgs As New ArrayList
        Dim mystrings As New ArrayList

        If ListBox1.Items.Count <= 0 Then
            MsgBox("There are no files to load, Combine aborted!")
            Exit Sub
        End If

        For loff As Integer = 0 To ListBox1.Items.Count - 1
            If ListBox1.Items(loff).lastindexof("jg") <= 0 Then
                MsgBox("There is a non JG file in list, Combine aborted!", MsgBoxStyle.OKOnly, "Docsoft")
                Exit Sub
            End If

            Dim mystring As String = ListBox1.Items(loff).substring(0, ListBox1.Items(loff).lastindexof("jg"))
            If Not myjgs.Contains(mystring) Then myjgs.Add(mystring)
            mystrings.Add(ListBox1.Items(loff))
        Next

        For moff As Integer = 0 To myjgs.Count - 1
            Dim myfstrings As ArrayList = mystrings
            myfstrings = FilterArray(myjgs(moff) & "jg", mystrings.Clone)

            Dim mypath As String = rootpath & "\" & myjgs(moff) & ".toc.all.htm"
            '            Dim mywriter As New Xml.XmlTextWriter(mypath, System.Text.Encoding.UTF8)

            For x As Integer = 0 To myfstrings.Count - 1

                '------------------------------
                Dim sr As New IO.StreamReader(rootpath & "\" & myfstrings(x).ToString)
                Dim s As String = sr.ReadLine
                Dim buf As New System.Text.StringBuilder

                Do Until s Is Nothing
                    If s.StartsWith("<html") Then

                        Do Until s Is Nothing OrElse s.StartsWith("</html>")

                            If s.StartsWith("<EFFECT") Then
                                Dim ei As Integer = s.LastIndexOf(">")
                                s = s.Insert(ei, "/")
                            End If

                            buf.Append(s)

                            buf.Append(vbCrLf)
                            s = sr.ReadLine()
                        Loop

                        buf.Append(s)
                        Dim sw As New IO.StreamWriter(mypath)
                        sw.WriteLine(buf.ToString())

                        sw.Close()

                    End If

                    s = sr.ReadLine()

                    sr.Close()
                Loop


                '------------------------------

            Next
        Next

        MsgBox("DONE")
    End Sub

Expert Comment

by:Justin_W
ID: 11788763
The following will overwrite the file each time:
                        Dim sw As New IO.StreamWriter(mypath)
                        sw.WriteLine(buf.ToString())

                        sw.Close()
Move the preceding lines to after the end of the loop, and move the following line to before the FOR loop:
                        Dim buf As New System.Text.StringBuilder
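Restructured that way, the inner part of the routine might look something like the sketch below. It is a simplified illustration only: CombineFiles and its parameters are hypothetical names, and the <html> and <EFFECT> handling from the original code is left out so the placement of the buffer, the reader's Close call, and the writer is easier to see.

```vbnet
Imports Microsoft.VisualBasic
Imports System.IO
Imports System.Text

Module CombineSketch
    ' Hypothetical helper: appends every input file to one buffer,
    ' then writes the buffer to the output file a single time.
    Sub CombineFiles(ByVal inputPaths As String(), ByVal outputPath As String)
        ' Created once, before the loop, so it accumulates all files.
        Dim buf As New StringBuilder()

        For Each inputPath As String In inputPaths
            Dim sr As New StreamReader(inputPath)
            Try
                Dim s As String = sr.ReadLine()
                Do Until s Is Nothing
                    buf.Append(s)
                    buf.Append(vbCrLf)
                    s = sr.ReadLine()
                Loop
            Finally
                ' Close only after the whole file has been read,
                ' not inside the read loop.
                sr.Close()
            End Try
        Next

        ' Opened once, after the loop, so earlier files are not overwritten.
        Dim sw As New StreamWriter(outputPath)
        Try
            sw.Write(buf.ToString())
        Finally
            sw.Close()
        End Try
    End Sub
End Module
```

An alternative would be to keep the writer inside the loop and open it in append mode, but building the text in one StringBuilder and writing once keeps the file handling simpler.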

Author Comment

by:JHalstead
ID: 11789020
OK, it works to some degree. Could you please do two things for me:

1)  Give an example of how I would delete a table, rather than parse it, when it has a specific ID attribute.
2)  Is there a way to add white space to the output document to make it more readable?

Accepted Solution

by:
Justin_W earned 500 total points
ID: 11789069
Both of those operations would require more complicated string manipulation or XML parsing.  You could use XmlDocument.LoadXml() to parse a String.  Setting XmlDocument.PreserveWhitespace = False should make the XmlDocument indent the XML nicely when it is written back out with Save().  Locating a specific node with specific attributes would require XPath expressions or DOM navigation, and would be a separate question.
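As a rough pointer in that direction, the sketch below combines LoadXml, an XPath lookup, and Save. It is an untested illustration under several assumptions: RemoveTableById is a hypothetical name, the input must be well-formed XML (a DOCTYPE or HTML entities such as &nbsp; may cause LoadXml to fail or to try resolving the DTD), and the ID value must not contain an apostrophe, since it is spliced directly into the XPath string.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module RemoveTableSketch
    ' Hypothetical helper: removes the first <table> whose id attribute
    ' equals tableId and returns the document as indented text.
    Function RemoveTableById(ByVal xhtml As String, ByVal tableId As String) As String
        Dim doc As New XmlDocument()
        ' With PreserveWhitespace = False, Save() indents the output.
        doc.PreserveWhitespace = False
        doc.LoadXml(xhtml)

        ' local-name() is used so the match works whether or not the
        ' document declares the XHTML namespace.
        Dim xpath As String = "//*[local-name()='table' and @id='" & tableId & "']"
        Dim table As XmlNode = doc.SelectSingleNode(xpath)
        If Not table Is Nothing AndAlso Not table.ParentNode Is Nothing Then
            table.ParentNode.RemoveChild(table)
        End If

        ' Save to a StringWriter to get the formatted markup back as a String.
        Dim writer As New StringWriter()
        doc.Save(writer)
        Return writer.ToString()
    End Function
End Module
```

Using SelectNodes instead of SelectSingleNode would allow every matching table to be removed rather than only the first.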

Author Comment

by:JHalstead
ID: 11789080
Alright then, thanks for the help...

Expert Comment

by:Justin_W
ID: 11789142
You're welcome.

Question has a verified solution.