Solved

Using PERL to extract information from an XML or HTML file

Posted on 2004-03-22
Last Modified: 2013-11-19
I am using a tool called GPMC to extract Group Policy Objects from the Active Directory setup at my client's site.
(See http://www.experts-exchange.com/Operating_Systems/Win2000/Q_20919835.html)

I can save this information to either an HTML file or an XML file.  I have about 37 files that are pretty detailed right now.

What I was hoping to do was either scrape the information from the HTML file or read it from the XML file, and put it into a coherent format in Excel where I can document, line by line, the policy file name, who it acts on, the location, the settings and so forth.  In some cases the XML data is 3 levels deep, and in some it goes down to 7 levels.  Any ideas how to put the data into a usable format in Excel?

Thanks.

Howard
Question by:hglobus
15 Comments
 
LVL 48

Assisted Solution

by:Tintin
Tintin earned 250 total points
ID: 10653438
Sounds like the Spreadsheet::WriteExcel::FromXML module could be just the thing.

See:

http://search.cpan.org/dist/Spreadsheet-WriteExcel-FromXML/lib/Spreadsheet/WriteExcel/FromXML.pm
 

Author Comment

by:hglobus
ID: 10654824
Tintin,

I looked at the module and it looks like it'll do what I need.  The only problem is that I am a relative newbie to Perl and I am having some real problems getting it to work.  Is there any place I can find some examples of working code, so I can work from there?

Thanks.

Howard
 
LVL 48

Expert Comment

by:Tintin
ID: 10654868
What code have you got so far?
 

Author Comment

by:hglobus
ID: 10654952
use strict;
use warnings;
use Spreadsheet::WriteExcel::FromXML;

my $fromxml = Spreadsheet::WriteExcel::FromXML->new( 'Policy.xml' );

$fromxml->parse;
$fromxml->buildSpreadsheet;
$fromxml->writeFile("1.xls");
 
I've tried different things and I'm getting all kinds of stuff:
        Spreadsheet::WriteExcel::FromXML::_processTree('Spreadsheet::WriteExcel::FromXML=HASH(0x183eeb0)', 'ARRAY(0x222cc88)', 'GPO', 'SCALAR(0x223d4b8)', 'SCALAR(0x223d4d0)') called at C:/Perl/site/lib/Spreadsheet/WriteExcel/FromXML.pm line 186
        Spreadsheet::WriteExcel::FromXML::parse('Spreadsheet::WriteExcel::FromXML=HASH(0x183eeb0)') called at C:\Perl\testcode\5.pl line 7
Workbook is uninitialized.  Did you call parse?
        Spreadsheet::WriteExcel::FromXML::buildSpreadsheet('Spreadsheet::WriteExcel::FromXML=HASH(0x183eeb0)') called at C:\Perl\testcode\5.pl line 8
 
LVL 48

Expert Comment

by:Tintin
ID: 10655045
How did you install the module?

There was a report of it failing most of its tests under Windows, so perhaps it isn't as mature as it could be.  It has passed on Solaris.
 

Author Comment

by:hglobus
ID: 10656995
I installed the module on a Win32 system using the ActiveState installation method of PPM from the command console.
 
LVL 48

Expert Comment

by:Tintin
ID: 10662854
Hmmm.

If you used PPM (I'm assuming it installed from the ActiveState site), then it must have been verified.

Unfortunately, I can't offer any more suggestions apart from trying some additional modules like XML::Simple and one of the Excel modules to do the task.
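
A minimal sketch of that XML::Simple plus Spreadsheet::WriteExcel route, assuming both modules are installed and that one spreadsheet row per GPO link (policy name, linked container, link enabled) is the layout wanted. The helper name gpo_rows, the choice of columns and the output name gpo_summary.xls are illustrative assumptions, and this has not been run against the real GPMC reports.

```perl
use strict;
use warnings;

# Turn one parsed GPO report (a hash such as XML::Simple returns) into
# rows of [policy name, linked container path, link enabled].
sub gpo_rows {
    my ($gpo) = @_;
    my @rows;
    for my $link ( @{ $gpo->{LinksTo} || [] } ) {
        push @rows, [ $gpo->{Name}, $link->{SOMPath}, $link->{Enabled} ];
    }
    return \@rows;
}

# The CPAN modules are only loaded when report files are named on the
# command line, e.g.  perl gpo_summary.pl Policy.xml Other.xml
if (@ARGV) {
    require XML::Simple;
    require Spreadsheet::WriteExcel;

    # ForceArray keeps LinksTo a list even when a GPO has a single link.
    my $xs = XML::Simple->new( ForceArray => ['LinksTo'], KeyAttr => [] );

    # Hypothetical output file name.
    my $workbook = Spreadsheet::WriteExcel->new('gpo_summary.xls')
        or die "Cannot create gpo_summary.xls: $!";
    my $sheet = $workbook->add_worksheet('GPOs');
    $sheet->write_row( 0, 0, [ 'Policy', 'Linked to', 'Link enabled' ] );

    my $r = 1;
    for my $file (@ARGV) {
        my $gpo = $xs->XMLin($file);
        $sheet->write_row( $r++, 0, $_ ) for @{ gpo_rows($gpo) };
    }
    $workbook->close;
}
```

Keeping the row-building in its own function means the column layout can be changed (for example to add the settings under Computer and User) without touching the parsing or the Excel output.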
 
LVL 5

Expert Comment

by:burtdav
ID: 10663748
You've probably already checked that 'Policy.xml' is there and readable by Perl (try opening it directly: open my $fh, '<', 'Policy.xml' or die "Cannot open Policy.xml: $!";).

Perhaps there's been a parsing error (although it should throw an exception in that case). What happens if you try using a stripped-down, very simple XML file for input?
 
LVL 5

Expert Comment

by:burtdav
ID: 10663756
# Have you tried this alternative?
use strict;
use warnings;
use Spreadsheet::WriteExcel::FromXML;
# Spreadsheet::WriteExcel produces the older binary .xls format, not .xlsx
Spreadsheet::WriteExcel::FromXML->XMLToXLS( "file.xml", "file.xls" );
 

Author Comment

by:hglobus
ID: 10664110
OK.  I can open the file policy.xml; the open succeeds.

I've tried the code from burtdav and I get exactly the same scrolling set of errors.

Here is the policy.xml file I am working with:

<?xml version="1.0" encoding="utf-16"?>
<GPO xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.microsoft.com/GroupPolicy/Settings">
  <Identifier>
    <Identifier xmlns="http://www.microsoft.com/GroupPolicy/Types">{5FEE37D0-DD01-4C2E-B35E-397495CFFEF5}</Identifier>
    <Domain xmlns="http://www.microsoft.com/GroupPolicy/Types">art-allianz.com</Domain>
  </Identifier>
  <Name>ART NY WS Policy</Name>
  <CreatedTime>2002-12-03T21:21:04.0000000-05:00</CreatedTime>
  <ModifiedTime>2004-01-19T13:50:38.0000000-05:00</ModifiedTime>
  <ReadTime>2004-03-17T19:59:38.8269128-05:00</ReadTime>
  <SecurityDescriptor>
    <SDDL xmlns="http://www.microsoft.com/GroupPolicy/Types/Security">O:DAG:DUD:PAI(OD;OICI;CR;edacfd8f-ffb3-11d1-b41d-00a0c968f939;;DA)(OD;OICI;CR;edacfd8f-ffb3-11d1-b41d-00a0c968f939;;EA)(A;OICI;LCRPLORC;;;AU)(A;OICIIO;CCDCLCSWRPWPDTLOSDRCWDWO;;;CO)(A;OICI;CCDCLCSWRPWPDTLOSDRCWDWO;;;DA)(A;OICI;LCRPRC;;;DC)(A;OICI;CCDCLCSWRPWPDTLOSDRCWDWO;;;EA)(A;OICI;CCDCLCSWRPWPDTLOSDRCWDWO;;;SY)(OA;OICI;CR;edacfd8f-ffb3-11d1-b41d-00a0c968f939;;AU)(OA;OICI;CR;edacfd8f-ffb3-11d1-b41d-00a0c968f939;;DC)S:AI(AU;CIIDSAFA;CCDCSWWPDTCRSDWDWO;;;WD)</SDDL>
    <Owner xmlns="http://www.microsoft.com/GroupPolicy/Types/Security">
      <SID xmlns="http://www.microsoft.com/GroupPolicy/Types">S-1-5-21-73586283-789336058-682003330-512</SID>
      <Name xmlns="http://www.microsoft.com/GroupPolicy/Types">ART-ALLIANZ\Domain Admins</Name>
    </Owner>
    <Group xmlns="http://www.microsoft.com/GroupPolicy/Types/Security">
      <SID xmlns="http://www.microsoft.com/GroupPolicy/Types">S-1-5-21-73586283-789336058-682003330-513</SID>
      <Name xmlns="http://www.microsoft.com/GroupPolicy/Types">ART-ALLIANZ\Domain Users</Name>
    </Group>
    <PermissionsPresent xmlns="http://www.microsoft.com/GroupPolicy/Types/Security">true</PermissionsPresent>
    <Permissions xmlns="http://www.microsoft.com/GroupPolicy/Types/Security">
      <InheritsFromParent>false</InheritsFromParent>
      <TrusteePermissions>
        <Trustee>
          <SID xmlns="http://www.microsoft.com/GroupPolicy/Types">S-1-5-21-73586283-789336058-682003330-519</SID>
          <Name xmlns="http://www.microsoft.com/GroupPolicy/Types">ART-ALLIANZ\Enterprise Admins</Name>
        </Trustee>
        <Type xsi:type="PermissionType">
          <PermissionType>Allow</PermissionType>
        </Type>
        <Inherited>false</Inherited>
        <Applicability>
          <ToSelf>true</ToSelf>
          <ToDescendantObjects>true</ToDescendantObjects>
          <ToDescendantContainers>true</ToDescendantContainers>
          <ToDirectDescendantsOnly>false</ToDirectDescendantsOnly>
        </Applicability>
        <Standard>
          <GPOGroupedAccessEnum>Custom</GPOGroupedAccessEnum>
        </Standard>
        <AccessMask>0</AccessMask>
      </TrusteePermissions>
      <TrusteePermissions>
        <Trustee>
          <SID xmlns="http://www.microsoft.com/GroupPolicy/Types">S-1-5-21-73586283-789336058-682003330-512</SID>
          <Name xmlns="http://www.microsoft.com/GroupPolicy/Types">ART-ALLIANZ\Domain Admins</Name>
        </Trustee>
        <Type xsi:type="PermissionType">
          <PermissionType>Allow</PermissionType>
        </Type>
        <Inherited>false</Inherited>
        <Applicability>
          <ToSelf>true</ToSelf>
          <ToDescendantObjects>true</ToDescendantObjects>
          <ToDescendantContainers>true</ToDescendantContainers>
          <ToDirectDescendantsOnly>false</ToDirectDescendantsOnly>
        </Applicability>
        <Standard>
          <GPOGroupedAccessEnum>Custom</GPOGroupedAccessEnum>
        </Standard>
        <AccessMask>0</AccessMask>
      </TrusteePermissions>
      <TrusteePermissions>
        <Trustee>
          <SID xmlns="http://www.microsoft.com/GroupPolicy/Types">S-1-5-18</SID>
          <Name xmlns="http://www.microsoft.com/GroupPolicy/Types">NT AUTHORITY\SYSTEM</Name>
        </Trustee>
        <Type xsi:type="PermissionType">
          <PermissionType>Allow</PermissionType>
        </Type>
        <Inherited>false</Inherited>
        <Applicability>
          <ToSelf>true</ToSelf>
          <ToDescendantObjects>true</ToDescendantObjects>
          <ToDescendantContainers>true</ToDescendantContainers>
          <ToDirectDescendantsOnly>false</ToDirectDescendantsOnly>
        </Applicability>
        <Standard>
          <GPOGroupedAccessEnum>Edit, delete, modify security</GPOGroupedAccessEnum>
        </Standard>
        <AccessMask>0</AccessMask>
      </TrusteePermissions>
      <TrusteePermissions>
        <Trustee>
          <SID xmlns="http://www.microsoft.com/GroupPolicy/Types">S-1-5-21-73586283-789336058-682003330-515</SID>
          <Name xmlns="http://www.microsoft.com/GroupPolicy/Types">ART-ALLIANZ\Domain Computers</Name>
        </Trustee>
        <Type xsi:type="PermissionType">
          <PermissionType>Allow</PermissionType>
        </Type>
        <Inherited>false</Inherited>
        <Applicability>
          <ToSelf>true</ToSelf>
          <ToDescendantObjects>true</ToDescendantObjects>
          <ToDescendantContainers>true</ToDescendantContainers>
          <ToDirectDescendantsOnly>false</ToDirectDescendantsOnly>
        </Applicability>
        <Standard>
          <GPOGroupedAccessEnum>Apply Group Policy</GPOGroupedAccessEnum>
        </Standard>
        <AccessMask>0</AccessMask>
      </TrusteePermissions>
      <TrusteePermissions>
        <Trustee>
          <SID xmlns="http://www.microsoft.com/GroupPolicy/Types">S-1-5-11</SID>
          <Name xmlns="http://www.microsoft.com/GroupPolicy/Types">NT AUTHORITY\Authenticated Users</Name>
        </Trustee>
        <Type xsi:type="PermissionType">
          <PermissionType>Allow</PermissionType>
        </Type>
        <Inherited>false</Inherited>
        <Applicability>
          <ToSelf>true</ToSelf>
          <ToDescendantObjects>true</ToDescendantObjects>
          <ToDescendantContainers>true</ToDescendantContainers>
          <ToDirectDescendantsOnly>false</ToDirectDescendantsOnly>
        </Applicability>
        <Standard>
          <GPOGroupedAccessEnum>Apply Group Policy</GPOGroupedAccessEnum>
        </Standard>
        <AccessMask>0</AccessMask>
      </TrusteePermissions>
    </Permissions>
    <AuditingPresent xmlns="http://www.microsoft.com/GroupPolicy/Types/Security">false</AuditingPresent>
  </SecurityDescriptor>
  <FilterDataAvailable>true</FilterDataAvailable>
  <Computer>
    <VersionDirectory>1</VersionDirectory>
    <VersionSysvol>1</VersionSysvol>
    <Enabled>true</Enabled>
    <ExtensionData>
      <Extension xmlns:q1="http://www.microsoft.com/GroupPolicy/Settings/Registry" xsi:type="q1:RegistrySettings">
        <q1:Policy>
          <q1:Name>Allow or Disallow use of the Offline Files feature</q1:Name>
          <q1:State>Disabled</q1:State>
          <q1:Explain>Determines whether the Offline Files feature is enabled.\n\nThis setting also disables the "Enable Offline Files" option on the Offline Files tab. This prevents users from trying to change the option while a setting controls it.\n\nOffline Files saves a copy of network files on the user's computer for use when the computer is not connected to the network.\n\nIf you enable this setting, Offline Files is enabled and users cannot disable it.\n\nIf you disable this setting, Offline Files is disabled and users cannot enable it.\n\nBy default, Offline Files is enabled on Windows 2000 Professional and is disabled on Windows 2000 Server.\n\nTip: To enable Offline Files without specifying a setting, in Windows Explorer, on the Tools menu, click Folder Options, click the Offline Files tab, and then click "Enable Offline Files."\n\nNote: To make changes to this setting effective, you must restart Windows 2000.</q1:Explain>
          <q1:Supported>At least Microsoft Windows 2000</q1:Supported>
          <q1:Category>Network/Offline Files</q1:Category>
        </q1:Policy>
      </Extension>
      <Name>Registry</Name>
    </ExtensionData>
  </Computer>
  <User>
    <VersionDirectory>3</VersionDirectory>
    <VersionSysvol>3</VersionSysvol>
    <Enabled>true</Enabled>
    <ExtensionData>
      <Extension xmlns:q2="http://www.microsoft.com/GroupPolicy/Settings/Registry" xsi:type="q2:RegistrySettings">
        <q2:Policy>
          <q2:Name>Active Desktop Wallpaper</q2:Name>
          <q2:State>Enabled</q2:State>
          <q2:Explain>Specifies the desktop background ("wallpaper") displayed on all users' desktops.\n\nThis setting lets you specify the wallpaper on users' desktops and prevents users from changing the image or its presentation. The wallpaper you specify can be stored in a bitmap (*.bmp), JPEG (*.jpg), or HTML (*.htm, *.html) file.\n\nTo use this setting, type the fully qualified path and name of the file that stores the wallpaper image. You can type a local path, such as C:\Windows\web\wallpaper\home.jpg or a UNC path, such as \\Server\Share\Corp.jpg. If the specified file is not available when the user logs on, no wallpaper is displayed. Users cannot specify alternative wallpaper. You can also use this setting to specify that the wallpaper image be centered, tiled, or stretched. Users cannot change this specification.\n\nIf you disable this setting or do not configure it, no wallpaper is displayed. However, users can select the wallpaper of their choice.\n\nAlso, see the "Allow only bitmapped wallpaper" in the same location, and the "Prevent changing wallpaper" setting in User Configuration\Administrative Templates\Control Panel.\n\nNote: You need to enable the Active Desktop to use this setting.\n\nNote: This setting does not apply to Terminal Server sessions.</q2:Explain>
          <q2:Supported>At least Microsoft Windows 2000</q2:Supported>
          <q2:Category>Desktop/Active Desktop</q2:Category>
          <q2:EditText>
            <q2:Name>Wallpaper Name:</q2:Name>
            <q2:State>Enabled</q2:State>
            <q2:Value>c:\winnt\winnt.bmp</q2:Value>
          </q2:EditText>
          <q2:Text>
            <q2:Name>Example: Using a local path:   C:\windows\web\wallpaper\home.jpg</q2:Name>
          </q2:Text>
          <q2:Text>
            <q2:Name>Example: Using a UNC path:     \\Server\Share\Corp.jpg</q2:Name>
          </q2:Text>
          <q2:DropDownList>
            <q2:Name>Wallpaper Style:</q2:Name>
            <q2:State>Enabled</q2:State>
            <q2:Value>
              <q2:Name>Center</q2:Name>
            </q2:Value>
          </q2:DropDownList>
        </q2:Policy>
      </Extension>
      <Name>Registry</Name>
    </ExtensionData>
    <ExtensionData>
      <Extension xmlns:q3="http://www.microsoft.com/GroupPolicy/Settings/IE" xsi:type="q3:InternetExplorerSettings">
        <q3:PreferenceMode>false</q3:PreferenceMode>
      </Extension>
      <Name>Internet Explorer Maintenance</Name>
    </ExtensionData>
  </User>
  <LinksTo>
    <SOMName>Desktops</SOMName>
    <SOMPath>art-allianz.com/NY-ParkAve/Computers/Desktops</SOMPath>
    <Enabled>true</Enabled>
    <NoOverride>false</NoOverride>
  </LinksTo>
  <LinksTo>
    <SOMName>Desktops</SOMName>
    <SOMPath>art-allianz.com/BM-PittsbayRoad/Computers/Desktops</SOMPath>
    <Enabled>true</Enabled>
    <NoOverride>false</NoOverride>
  </LinksTo>
</GPO>
 
LVL 5

Expert Comment

by:burtdav
ID: 10664245
Try parsing this (sorry, I can't test it myself; the Perl I'm using right now is really old and doesn't have this module):

<GPO></GPO>
 
LVL 48

Expert Comment

by:Tintin
ID: 10664369
burtdav.

I'm pretty sure Spreadsheet::WriteExcel::FromXML uses XML::Parser to parse the XML, so you need to use valid XML examples.
 
LVL 5

Accepted Solution

by:burtdav
burtdav earned 250 total points
ID: 10664434
You're right, it does use XML::Parser, and so you do need to use valid XML.

Isn't my example valid?

Add a <?xml version="1.0" encoding="utf-16"?> declaration at the top if you like, but make sure the actual encoding of the document matches the encoding attribute of that declaration.
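
One way to check that the declared and actual encodings agree is sketched below, using only core Perl. It compares the byte-order mark at the start of the file with the encoding named in the XML declaration. The function names are made up for this example, and the BOM test is a simplification that does not try to recognise UTF-32.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Guess the encoding from a leading byte-order mark; undef if none.
# Simplification: UTF-32 BOMs are not distinguished.
sub bom_encoding {
    my ($bytes) = @_;
    return 'UTF-8'    if $bytes =~ /^\xEF\xBB\xBF/;
    return 'UTF-16LE' if $bytes =~ /^\xFF\xFE/;
    return 'UTF-16BE' if $bytes =~ /^\xFE\xFF/;
    return undef;
}

# Pull the encoding="..." value out of the XML declaration, decoding the
# raw bytes first so that a UTF-16 declaration can be read at all.
sub declared_encoding {
    my ($bytes) = @_;
    my $text = decode( bom_encoding($bytes) || 'UTF-8', $bytes );
    return $text =~ /<\?xml[^>]*\bencoding\s*=\s*["']([^"']+)["']/ ? $1 : undef;
}

# Usage: perl check_encoding.pl Policy.xml   (script name is hypothetical)
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    read $fh, my $head, 200;    # the declaration sits in the first bytes
    close $fh;
    printf "BOM says: %s, declaration says: %s\n",
        bom_encoding($head)      || 'no BOM',
        declared_encoding($head) || 'nothing';
}
```

If the two disagree, for example a utf-16 declaration on a file that was re-saved as plain 8-bit text, the parser is likely to reject the file before the spreadsheet module gets to do anything.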

All I'm saying is, see if it can handle some really basic XML, without namespaces and kilobytes of data.

Then again, maybe it's not working on Windows yet, and you'll need to use XML::Parser directly for the parsing and find another way to write the Excel output (I don't know; maybe Excel::Template, DBD::Excel or Spreadsheet::WriteExcel::Simple::Save; or you could use Spreadsheet::WriteExcel directly).
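
A rough sketch of that fallback, assuming XML::Parser and Spreadsheet::WriteExcel are both installed: parse with XML::Parser's Tree style, flatten every element that carries text into a path/value pair, and write the pairs as two columns. Because each row holds a full path, it does not matter whether the data is 3 or 7 levels deep. The walk_tree helper and the default output name policy_dump.xls are assumptions made for this example.

```perl
use strict;
use warnings;

# Walk one element of an XML::Parser "Tree" style result and collect a
# [path, text] row for every element that carries text.
sub walk_tree {
    my ( $tag, $content, $path, $rows ) = @_;
    my ( $attrs, @kids ) = @$content;    # first item is the attribute hash
    $path = "$path/$tag";
    my $text = '';
    while (@kids) {
        my ( $kid_tag, $kid ) = splice @kids, 0, 2;
        if ( $kid_tag eq '0' ) { $text .= $kid }    # pseudo-tag 0 marks text
        else { walk_tree( $kid_tag, $kid, $path, $rows ) }
    }
    $text =~ s/^\s+|\s+$//g;
    push @$rows, [ $path, $text ] if length $text;
    return $rows;
}

# Usage: perl xml2xls.pl Policy.xml [out.xls]   (script name is hypothetical)
if (@ARGV) {
    require XML::Parser;
    require Spreadsheet::WriteExcel;

    my ( $xml_file, $xls_file ) = @ARGV;
    $xls_file = 'policy_dump.xls' unless defined $xls_file;

    my $tree = XML::Parser->new( Style => 'Tree' )->parsefile($xml_file);
    my $rows = walk_tree( @$tree, '', [] );

    my $workbook = Spreadsheet::WriteExcel->new($xls_file)
        or die "Cannot create $xls_file: $!";
    my $sheet = $workbook->add_worksheet();
    for my $r ( 0 .. $#$rows ) {
        # write_string stops values that start with "=" becoming formulas
        $sheet->write_string( $r, 0, $rows->[$r][0] );
        $sheet->write_string( $r, 1, $rows->[$r][1] );
    }
    $workbook->close;
}
```

Attributes such as the xmlns declarations are ignored here; they arrive in $attrs and could be added as extra rows if they turn out to matter.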