Solved

Using PERL to extract information from an XML or HTML file

Posted on 2004-03-22
Last Modified: 2013-11-19
I am using a tool called GPMC to extract Group Policy Objects from the Active Directory setup at my client.
(See http://www.experts-exchange.com/Operating_Systems/Win2000/Q_20919835.html)

I can save this information to either an HTML file or an XML file.  I have about 37 files that are pretty detailed right now.

What I was hoping to do was either scrape the information from the HTML file or access it from the XML file and put it into a coherent format in Excel, where I can document line by line the policy file name, who it acts on, the location, the settings and so forth.  In some cases the XML data is 3 levels deep, and in some it goes down to 7 levels.  Any ideas how to put the data into a usable format in Excel?

Thanks.

Howard
Question by:hglobus
LVL 48

Assisted Solution

by:Tintin
Tintin earned 250 total points
ID: 10653438
Sounds like the Spreadsheet::WriteExcel::FromXML module could be just the thing.

See:

http://search.cpan.org/dist/Spreadsheet-WriteExcel-FromXML/lib/Spreadsheet/WriteExcel/FromXML.pm
 

Author Comment

by:hglobus
ID: 10654824
Tintin,

I looked at the module and it looks like it'll do what I need.  The only problem is that I am a relative newbie with Perl and I am having some real problems getting it to work.  Is there any place I can find some examples of working code and try to work from there?

Thanks.

Howard
 
LVL 48

Expert Comment

by:Tintin
ID: 10654868
What code have you got so far?

Author Comment

by:hglobus
ID: 10654952
use strict;
use warnings;
use Spreadsheet::WriteExcel::FromXML;

my $fromxml = Spreadsheet::WriteExcel::FromXML->new( 'Policy.xml' );

$fromxml->parse;
$fromxml->buildSpreadsheet;
$fromxml->writeFile("1.xls");
 
I've tried different things and I'm getting all kinds of stuff:
        Spreadsheet::WriteExcel::FromXML::_processTree('Spreadsheet::WriteExcel::FromXML=HASH(0x183eeb0)', 'ARRAY(0x222cc88)', 'GPO', 'SCALAR(0x223d4b8)', 'SCALAR(0x223d4d0)') called at C:/Perl/site/lib/Spreadsheet/WriteExcel/FromXML.pm line 186
        Spreadsheet::WriteExcel::FromXML::parse('Spreadsheet::WriteExcel::FromXML=HASH(0x183eeb0)') called at C:\Perl\testcode\5.pl line 7
Workbook is uninitialized.  Did you call parse?
        Spreadsheet::WriteExcel::FromXML::buildSpreadsheet('Spreadsheet::WriteExcel::FromXML=HASH(0x183eeb0)') called at C:\Perl\testcode\5.pl line 8
 
LVL 48

Expert Comment

by:Tintin
ID: 10655045
How did you install the module?

There was a report of it failing most of the tests under Windows, so perhaps it isn't as mature as it could be.  It has passed on Solaris.
 

Author Comment

by:hglobus
ID: 10656995
I installed the module on a Win32 system using the ActiveState installation method of PPM from the command console.
 
LVL 48

Expert Comment

by:Tintin
ID: 10662854
Hmmm.

If you used PPM (I'm assuming it installed from the ActiveState site), then it must have been verified.

Unfortunately, I can't offer any more suggestions apart from trying some additional modules like XML::Simple and one of the Excel modules to do the task.
 
LVL 5

Expert Comment

by:burtdav
ID: 10663748
You've probably already checked that 'Policy.xml' is there and accessible to Perl (try to open it normally: "open FH, 'Policy.xml' or die 'error!';")

Perhaps there's been a parsing error (although it should throw an exception in that case). What happens if you try using a stripped-down, very simple XML file for input?
 
LVL 5

Expert Comment

by:burtdav
ID: 10663756
#have you tried this alternative?
use strict;
use warnings;
use Spreadsheet::WriteExcel::FromXML;
Spreadsheet::WriteExcel::FromXML->XMLToXLS( "file.xml", "file.xls" );
 

Author Comment

by:hglobus
ID: 10664110
OK.  I can open the file policy.xml; the open succeeds.

I've tried the code from burtdav and I get the same exact scrolling error set.

Here is the policy.xml file I am working with:

<?xml version="1.0" encoding="utf-16"?>
<GPO xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.microsoft.com/GroupPolicy/Settings">
  <Identifier>
    <Identifier xmlns="http://www.microsoft.com/GroupPolicy/Types">{5FEE37D0-DD01-4C2E-B35E-397495CFFEF5}</Identifier>
    <Domain xmlns="http://www.microsoft.com/GroupPolicy/Types">art-allianz.com</Domain>
  </Identifier>
  <Name>ART NY WS Policy</Name>
  <CreatedTime>2002-12-03T21:21:04.0000000-05:00</CreatedTime>
  <ModifiedTime>2004-01-19T13:50:38.0000000-05:00</ModifiedTime>
  <ReadTime>2004-03-17T19:59:38.8269128-05:00</ReadTime>
  <SecurityDescriptor>
    <SDDL xmlns="http://www.microsoft.com/GroupPolicy/Types/Security">O:DAG:DUD:PAI(OD;OICI;CR;edacfd8f-ffb3-11d1-b41d-00a0c968f939;;DA)(OD;OICI;CR;edacfd8f-ffb3-11d1-b41d-00a0c968f939;;EA)(A;OICI;LCRPLORC;;;AU)(A;OICIIO;CCDCLCSWRPWPDTLOSDRCWDWO;;;CO)(A;OICI;CCDCLCSWRPWPDTLOSDRCWDWO;;;DA)(A;OICI;LCRPRC;;;DC)(A;OICI;CCDCLCSWRPWPDTLOSDRCWDWO;;;EA)(A;OICI;CCDCLCSWRPWPDTLOSDRCWDWO;;;SY)(OA;OICI;CR;edacfd8f-ffb3-11d1-b41d-00a0c968f939;;AU)(OA;OICI;CR;edacfd8f-ffb3-11d1-b41d-00a0c968f939;;DC)S:AI(AU;CIIDSAFA;CCDCSWWPDTCRSDWDWO;;;WD)</SDDL>
    <Owner xmlns="http://www.microsoft.com/GroupPolicy/Types/Security">
      <SID xmlns="http://www.microsoft.com/GroupPolicy/Types">S-1-5-21-73586283-789336058-682003330-512</SID>
      <Name xmlns="http://www.microsoft.com/GroupPolicy/Types">ART-ALLIANZ\Domain Admins</Name>
    </Owner>
    <Group xmlns="http://www.microsoft.com/GroupPolicy/Types/Security">
      <SID xmlns="http://www.microsoft.com/GroupPolicy/Types">S-1-5-21-73586283-789336058-682003330-513</SID>
      <Name xmlns="http://www.microsoft.com/GroupPolicy/Types">ART-ALLIANZ\Domain Users</Name>
    </Group>
    <PermissionsPresent xmlns="http://www.microsoft.com/GroupPolicy/Types/Security">true</PermissionsPresent>
    <Permissions xmlns="http://www.microsoft.com/GroupPolicy/Types/Security">
      <InheritsFromParent>false</InheritsFromParent>
      <TrusteePermissions>
        <Trustee>
          <SID xmlns="http://www.microsoft.com/GroupPolicy/Types">S-1-5-21-73586283-789336058-682003330-519</SID>
          <Name xmlns="http://www.microsoft.com/GroupPolicy/Types">ART-ALLIANZ\Enterprise Admins</Name>
        </Trustee>
        <Type xsi:type="PermissionType">
          <PermissionType>Allow</PermissionType>
        </Type>
        <Inherited>false</Inherited>
        <Applicability>
          <ToSelf>true</ToSelf>
          <ToDescendantObjects>true</ToDescendantObjects>
          <ToDescendantContainers>true</ToDescendantContainers>
          <ToDirectDescendantsOnly>false</ToDirectDescendantsOnly>
        </Applicability>
        <Standard>
          <GPOGroupedAccessEnum>Custom</GPOGroupedAccessEnum>
        </Standard>
        <AccessMask>0</AccessMask>
      </TrusteePermissions>
      <TrusteePermissions>
        <Trustee>
          <SID xmlns="http://www.microsoft.com/GroupPolicy/Types">S-1-5-21-73586283-789336058-682003330-512</SID>
          <Name xmlns="http://www.microsoft.com/GroupPolicy/Types">ART-ALLIANZ\Domain Admins</Name>
        </Trustee>
        <Type xsi:type="PermissionType">
          <PermissionType>Allow</PermissionType>
        </Type>
        <Inherited>false</Inherited>
        <Applicability>
          <ToSelf>true</ToSelf>
          <ToDescendantObjects>true</ToDescendantObjects>
          <ToDescendantContainers>true</ToDescendantContainers>
          <ToDirectDescendantsOnly>false</ToDirectDescendantsOnly>
        </Applicability>
        <Standard>
          <GPOGroupedAccessEnum>Custom</GPOGroupedAccessEnum>
        </Standard>
        <AccessMask>0</AccessMask>
      </TrusteePermissions>
      <TrusteePermissions>
        <Trustee>
          <SID xmlns="http://www.microsoft.com/GroupPolicy/Types">S-1-5-18</SID>
          <Name xmlns="http://www.microsoft.com/GroupPolicy/Types">NT AUTHORITY\SYSTEM</Name>
        </Trustee>
        <Type xsi:type="PermissionType">
          <PermissionType>Allow</PermissionType>
        </Type>
        <Inherited>false</Inherited>
        <Applicability>
          <ToSelf>true</ToSelf>
          <ToDescendantObjects>true</ToDescendantObjects>
          <ToDescendantContainers>true</ToDescendantContainers>
          <ToDirectDescendantsOnly>false</ToDirectDescendantsOnly>
        </Applicability>
        <Standard>
          <GPOGroupedAccessEnum>Edit, delete, modify security</GPOGroupedAccessEnum>
        </Standard>
        <AccessMask>0</AccessMask>
      </TrusteePermissions>
      <TrusteePermissions>
        <Trustee>
          <SID xmlns="http://www.microsoft.com/GroupPolicy/Types">S-1-5-21-73586283-789336058-682003330-515</SID>
          <Name xmlns="http://www.microsoft.com/GroupPolicy/Types">ART-ALLIANZ\Domain Computers</Name>
        </Trustee>
        <Type xsi:type="PermissionType">
          <PermissionType>Allow</PermissionType>
        </Type>
        <Inherited>false</Inherited>
        <Applicability>
          <ToSelf>true</ToSelf>
          <ToDescendantObjects>true</ToDescendantObjects>
          <ToDescendantContainers>true</ToDescendantContainers>
          <ToDirectDescendantsOnly>false</ToDirectDescendantsOnly>
        </Applicability>
        <Standard>
          <GPOGroupedAccessEnum>Apply Group Policy</GPOGroupedAccessEnum>
        </Standard>
        <AccessMask>0</AccessMask>
      </TrusteePermissions>
      <TrusteePermissions>
        <Trustee>
          <SID xmlns="http://www.microsoft.com/GroupPolicy/Types">S-1-5-11</SID>
          <Name xmlns="http://www.microsoft.com/GroupPolicy/Types">NT AUTHORITY\Authenticated Users</Name>
        </Trustee>
        <Type xsi:type="PermissionType">
          <PermissionType>Allow</PermissionType>
        </Type>
        <Inherited>false</Inherited>
        <Applicability>
          <ToSelf>true</ToSelf>
          <ToDescendantObjects>true</ToDescendantObjects>
          <ToDescendantContainers>true</ToDescendantContainers>
          <ToDirectDescendantsOnly>false</ToDirectDescendantsOnly>
        </Applicability>
        <Standard>
          <GPOGroupedAccessEnum>Apply Group Policy</GPOGroupedAccessEnum>
        </Standard>
        <AccessMask>0</AccessMask>
      </TrusteePermissions>
    </Permissions>
    <AuditingPresent xmlns="http://www.microsoft.com/GroupPolicy/Types/Security">false</AuditingPresent>
  </SecurityDescriptor>
  <FilterDataAvailable>true</FilterDataAvailable>
  <Computer>
    <VersionDirectory>1</VersionDirectory>
    <VersionSysvol>1</VersionSysvol>
    <Enabled>true</Enabled>
    <ExtensionData>
      <Extension xmlns:q1="http://www.microsoft.com/GroupPolicy/Settings/Registry" xsi:type="q1:RegistrySettings">
        <q1:Policy>
          <q1:Name>Allow or Disallow use of the Offline Files feature</q1:Name>
          <q1:State>Disabled</q1:State>
          <q1:Explain>Determines whether the Offline Files feature is enabled.\n\nThis setting also disables the "Enable Offline Files" option on the Offline Files tab. This prevents users from trying to change the option while a setting controls it.\n\nOffline Files saves a copy of network files on the user's computer for use when the computer is not connected to the network.\n\nIf you enable this setting, Offline Files is enabled and users cannot disable it.\n\nIf you disable this setting, Offline Files is disabled and users cannot enable it.\n\nBy default, Offline Files is enabled on Windows 2000 Professional and is disabled on Windows 2000 Server.\n\nTip: To enable Offline Files without specifying a setting, in Windows Explorer, on the Tools menu, click Folder Options, click the Offline Files tab, and then click "Enable Offline Files."\n\nNote: To make changes to this setting effective, you must restart Windows 2000.</q1:Explain>
          <q1:Supported>At least Microsoft Windows 2000</q1:Supported>
          <q1:Category>Network/Offline Files</q1:Category>
        </q1:Policy>
      </Extension>
      <Name>Registry</Name>
    </ExtensionData>
  </Computer>
  <User>
    <VersionDirectory>3</VersionDirectory>
    <VersionSysvol>3</VersionSysvol>
    <Enabled>true</Enabled>
    <ExtensionData>
      <Extension xmlns:q2="http://www.microsoft.com/GroupPolicy/Settings/Registry" xsi:type="q2:RegistrySettings">
        <q2:Policy>
          <q2:Name>Active Desktop Wallpaper</q2:Name>
          <q2:State>Enabled</q2:State>
          <q2:Explain>Specifies the desktop background ("wallpaper") displayed on all users' desktops.\n\nThis setting lets you specify the wallpaper on users' desktops and prevents users from changing the image or its presentation. The wallpaper you specify can be stored in a bitmap (*.bmp), JPEG (*.jpg), or HTML (*.htm, *.html) file.\n\nTo use this setting, type the fully qualified path and name of the file that stores the wallpaper image. You can type a local path, such as C:\Windows\web\wallpaper\home.jpg or a UNC path, such as \\Server\Share\Corp.jpg. If the specified file is not available when the user logs on, no wallpaper is displayed. Users cannot specify alternative wallpaper. You can also use this setting to specify that the wallpaper image be centered, tiled, or stretched. Users cannot change this specification.\n\nIf you disable this setting or do not configure it, no wallpaper is displayed. However, users can select the wallpaper of their choice.\n\nAlso, see the "Allow only bitmapped wallpaper" in the same location, and the "Prevent changing wallpaper" setting in User Configuration\Administrative Templates\Control Panel.\n\nNote: You need to enable the Active Desktop to use this setting.\n\nNote: This setting does not apply to Terminal Server sessions.</q2:Explain>
          <q2:Supported>At least Microsoft Windows 2000</q2:Supported>
          <q2:Category>Desktop/Active Desktop</q2:Category>
          <q2:EditText>
            <q2:Name>Wallpaper Name:</q2:Name>
            <q2:State>Enabled</q2:State>
            <q2:Value>c:\winnt\winnt.bmp</q2:Value>
          </q2:EditText>
          <q2:Text>
            <q2:Name>Example: Using a local path:   C:\windows\web\wallpaper\home.jpg</q2:Name>
          </q2:Text>
          <q2:Text>
            <q2:Name>Example: Using a UNC path:     \\Server\Share\Corp.jpg</q2:Name>
          </q2:Text>
          <q2:DropDownList>
            <q2:Name>Wallpaper Style:</q2:Name>
            <q2:State>Enabled</q2:State>
            <q2:Value>
              <q2:Name>Center</q2:Name>
            </q2:Value>
          </q2:DropDownList>
        </q2:Policy>
      </Extension>
      <Name>Registry</Name>
    </ExtensionData>
    <ExtensionData>
      <Extension xmlns:q3="http://www.microsoft.com/GroupPolicy/Settings/IE" xsi:type="q3:InternetExplorerSettings">
        <q3:PreferenceMode>false</q3:PreferenceMode>
      </Extension>
      <Name>Internet Explorer Maintenance</Name>
    </ExtensionData>
  </User>
  <LinksTo>
    <SOMName>Desktops</SOMName>
    <SOMPath>art-allianz.com/NY-ParkAve/Computers/Desktops</SOMPath>
    <Enabled>true</Enabled>
    <NoOverride>false</NoOverride>
  </LinksTo>
  <LinksTo>
    <SOMName>Desktops</SOMName>
    <SOMPath>art-allianz.com/BM-PittsbayRoad/Computers/Desktops</SOMPath>
    <Enabled>true</Enabled>
    <NoOverride>false</NoOverride>
  </LinksTo>
</GPO>
 
LVL 5

Expert Comment

by:burtdav
ID: 10664245
try parsing this (sorry I can't test, the perl I'm using right now is really old and hasn't got this module):

<GPO></GPO>
 
LVL 48

Expert Comment

by:Tintin
ID: 10664369
burtdav.

I'm pretty sure Spreadsheet::WriteExcel::FromXML uses XML::Parser to parse the XML, so you need to use valid XML examples.
 
LVL 5

Accepted Solution

by:
burtdav earned 250 total points
ID: 10664434
You're right, it does use XML::Parser, and so you do need to use valid XML.

Isn't my example valid?

Add a <?xml version="1.0" encoding="utf-16"?> to the top if you like. Make sure the actual encoding of the document matches the encoding attribute of the <?xml> tag.
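One quick way to check the encoding point is to look at the first few bytes of the file for a byte-order mark. This is only a rough sketch using core Perl; the sub names sniff_encoding and read_head are made up for illustration, and a file with no BOM can still be in almost any encoding, so treat the result as a hint and nothing more.

```perl
use strict;
use warnings;

# Guess the on-disk encoding from a byte-order mark, if there is one.
sub sniff_encoding {
    my ($bytes) = @_;
    return 'UTF-16LE' if $bytes =~ /^\xFF\xFE/;
    return 'UTF-16BE' if $bytes =~ /^\xFE\xFF/;
    return 'UTF-8'    if $bytes =~ /^\xEF\xBB\xBF/;
    return 'no BOM found';
}

# Read the first four raw bytes of a file without any decoding.
sub read_head {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    binmode $fh;
    my $head = '';
    read $fh, $head, 4;
    close $fh;
    return $head;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    print "$ARGV[0]: ", sniff_encoding(read_head($ARGV[0])), "\n";
}
```

If this reports something other than UTF-16 for a file whose first line says encoding="utf-16" (for example after copying the XML out of a browser or editor), the declaration and the bytes disagree and the parser is likely to reject the file.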

All I'm saying is, see if it can handle some really basic XML, without namespaces and kilobytes of data.

Then again, maybe it's not working on Windows yet and you'll need to use XML::Parser directly for parsing and find another way to output to Excel (I don't know; maybe Excel::Template, DBD::Excel or Spreadsheet::WriteExcel::Simple::Save; or you could try Spreadsheet::WriteExcel other ways)
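For what it's worth, here is a rough sketch of the "XML::Parser directly plus Spreadsheet::WriteExcel" route. It is untested against the real GPO export, so take it as a starting point. The sub names collect_rows and write_xls and the two-column Path/Value layout are my own assumptions, and it only records elements that directly hold text (leaf elements), which seems to fit the policy.xml posted above. Because each row carries its full element path, it should not matter whether a value sits 3 or 7 levels deep.

```perl
use strict;
use warnings;
use XML::Parser;

# Walk the XML with stream handlers and return [path, text] pairs,
# one per element that directly contains non-blank text.
sub collect_rows {
    my ($source, %opt) = @_;
    my (@path, @rows);
    my $text = '';
    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub { push @path, $_[1]; $text = ''; },
            Char  => sub { $text .= $_[1]; },
            End   => sub {
                (my $value = $text) =~ s/^\s+|\s+$//g;
                push @rows, [ join('/', @path), $value ] if length $value;
                pop @path;
                $text = '';
            },
        },
    );
    # parsefile() takes a file name, parse() takes the XML as a string.
    $opt{file} ? $parser->parsefile($source) : $parser->parse($source);
    return \@rows;
}

# Write the pairs to an .xls workbook, one row per pair.
sub write_xls {
    my ($rows, $xls_file) = @_;
    require Spreadsheet::WriteExcel;   # loaded here so the parsing half can be tried alone
    my $workbook = Spreadsheet::WriteExcel->new($xls_file)
        or die "Cannot create $xls_file: $!";
    my $sheet = $workbook->add_worksheet('GPO');
    $sheet->write(0, 0, 'Path');
    $sheet->write(0, 1, 'Value');
    my $r = 1;
    for my $pair (@$rows) {
        # write_string() keeps values such as "0" or "true" as plain text.
        $sheet->write_string($r, 0, $pair->[0]);
        $sheet->write_string($r, 1, $pair->[1]);
        $r++;
    }
    $workbook->close();
}

# Only runs when both file names are given on the command line.
if (@ARGV == 2) {
    write_xls(collect_rows($ARGV[0], file => 1), $ARGV[1]);
}
```

Saved under a name of your choosing (say gpo2xls.pl, which is just a placeholder), it would be run as "perl gpo2xls.pl Policy.xml Policy.xls". For the 37 exports you could wrap the last call in a loop over the file names and add the policy file name as a third column.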
Question has a verified solution.
