
02.28.2008 at 06:23AM PST, ID: 23200283

Text dump created from RDF file by Perl MUCH larger than RDF file

I use a Perl script to create a text-file dump from an RDF file.

The RDF file is about 2 GB and contains a lot of content that I do NOT need, so the text file should be much SMALLER. However, I had to abort the script after the text file it created reached 240 GB!

Why is the output file so much larger? It should contain LESS data.

I thought the problem might be that the text file is UTF-8, so I changed the output layer to this:
        binmode(OUT, ":bytes");
And the output file was still larger!

Do you know why it might do this?
Thanks!
Question Stats
Zone: Programming
Question Asked By: hankknight
Solution Provided By: Adam314
Participating Experts: 1
Solution Grade: A
02.28.2008 at 07:22AM PST, ID: 21004549

Rank: Genius

I can't access the attached file; I get a permission denied error.
Can you post the code in a message using the "Attach Code Snippet" option?
 
02.28.2008 at 07:48AM PST, ID: 21004868
Thanks, here is the code.
#!/usr/bin/perl -w 
use strict;
use XML::Parser;
 
# Thanks Adam314 and Adam314!
 
binmode(STDOUT, ":bytes");
 
print "\n\nBegin...\n\n";
 
##### Create parser, and set handlers
my $parser = new XML::Parser(ErrorContext => 2, Style => 'Stream' );
 
$parser->setHandlers(
  End => \&handle_end,
  Start=>\&handle_start,
  Char=>\&handle_char,
  );
 
##### Open files and parse
open(OUT, ">/disk1/stuff/dump.txt") or die "output: $!\n";
open(IN, "<content.rdf.txt") or die "input: $!\n";
binmode(OUT, ":bytes");
 
$parser->parse(*IN);
close(IN);
close(OUT);
 
##### Variables needed by subroutines below
my $inExternalPage = 0;
my $Url;
my %data;
my $datakey;
 
sub handle_char
{
	return unless defined($datakey);
	$data{$datakey} .= $_[1];
}
 
sub handle_start {
	if($_[1] eq "ExternalPage") {
		$inExternalPage = 1;
		$Url = $_[3];
	}
	elsif( ($inExternalPage) and !defined($datakey) ){
		$datakey = $_[1];
	}
}
 
sub handle_end {
	if($_[1] eq 'ExternalPage') {
		$inExternalPage = 0;
		print OUT "$Url|\t$data{'d:Title'}|\t$data{'d:Description'}|\t$data{topic}\n";
	}
	$datakey = undef;
}
 
print "\n\nDone\n\n";
 
02.28.2008 at 07:51AM PST, ID: 21004902
I have tested this with both this:
     binmode(OUT, ":bytes");
And this:
     binmode(OUT, ":utf8");

Both have the problem, but binmode(OUT, ":utf8") is better.
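Since the same growth appears with both layers, the encoding is unlikely to be the cause. UTF-8 stores each character in one to four bytes, so the choice between `:bytes` and `:utf8` can change the output size by at most a small constant factor, nowhere near the jump from a 2 GB input to more than 240 GB of output. A minimal sketch using the core Encode module to measure the character-to-byte ratio of a made-up sample string (the sample text is hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample text mixing ASCII, Latin-1, CJK and an emoji,
# written with \x{...} escapes so this script itself is pure ASCII.
my $text = "Caf\x{e9} \x{6771}\x{4eac} \x{1F600} plain ASCII";

my $chars = length $text;                    # length in characters
my $bytes = length encode('UTF-8', $text);   # length after UTF-8 encoding

printf "%d characters -> %d UTF-8 bytes (ratio %.2f)\n",
    $chars, $bytes, $bytes / $chars;

# UTF-8 never uses more than 4 bytes per character.
die "ratio above 4\n" if $bytes > 4 * $chars;
```

Running it prints `21 characters -> 29 UTF-8 bytes (ratio 1.38)`: even text heavy in non-ASCII characters stays within a factor of four.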

 
02.28.2008 at 09:08AM PST, ID: 21005839

Rank: Genius

Make this change: use this version of handle_end:
sub handle_end {
	if($_[1] eq 'ExternalPage') {
		$inExternalPage = 0;
		print "\r$Url";
		print OUT "$Url|\t$data{'d:Title'}|\t$data{'d:Description'}|\t$data{topic}\n";
		%data = ();
	}
	$datakey = undef;
}
 
02.28.2008 at 03:07PM PST, ID: 21009189
Thanks, what was the problem?
 
02.28.2008 at 03:24PM PST, ID: 21009313

Rank: Genius

Each time it found a new ExternalPage, it appended the new data to the data it already had (%data was never cleared), instead of starting over with just the new data.

So each line in the output file contained the info for that line, plus the info for all previous lines.
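To make the accumulation concrete: `handle_char` appends with `.=`, so without `%data = ()` the k-th record's title holds the titles of all k records seen so far, and the total output grows roughly with the square of the number of records. The sketch below drives the thread's handler logic with a hand-written event sequence instead of XML::Parser, so it runs with core Perl only. The two sample records, the `run` helper, and the extra `$out`/`$reset` arguments to `handle_end` are hypothetical additions for illustration, not part of XML::Parser's handler interface.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# State shared by the handlers, as in the question's script.
my $inExternalPage = 0;
my $Url;
my %data;
my $datakey;

sub handle_char {
    return unless defined $datakey;
    $data{$datakey} .= $_[1];    # appends; never overwrites
}

sub handle_start {
    if ($_[1] eq 'ExternalPage') {
        $inExternalPage = 1;
        $Url = $_[3];            # value of the first attribute (about=...)
    }
    elsif ($inExternalPage and !defined $datakey) {
        $datakey = $_[1];
    }
}

# Hypothetical extra arguments: $out collects output lines instead of
# printing to OUT, and $reset selects the fixed behaviour.
sub handle_end {
    my (undef, $element, $out, $reset) = @_;
    if ($element eq 'ExternalPage') {
        $inExternalPage = 0;
        push @$out, "$Url|$data{'d:Title'}";
        %data = () if $reset;    # the fix from the answer
    }
    $datakey = undef;
}

# Feed two hypothetical ExternalPage records through the handlers by hand
# and return the lines that would have been written.
sub run {
    my ($reset) = @_;
    my @out;
    %data = ();
    for my $rec (['http://a.example/', 'First'], ['http://b.example/', 'Second']) {
        my ($url, $title) = @$rec;
        handle_start(undef, 'ExternalPage', 'about', $url);
        handle_start(undef, 'd:Title');
        handle_char(undef, $title);
        handle_end(undef, 'd:Title', \@out, $reset);
        handle_end(undef, 'ExternalPage', \@out, $reset);
    }
    return @out;
}

print "without reset:\n";
print "  $_\n" for run(0);
print "with %data = ():\n";
print "  $_\n" for run(1);
```

Without the reset, the second line comes out as `http://b.example/|FirstSecond`; with it, as `http://b.example/|Second`.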
 
02.29.2008 at 07:55AM PST, ID: 21014358
Thanks.

Now I get an error:
      Use of uninitialized value in concatenation (.) or string at makeDump.pl line 52.

Here is the problem line:
       print OUT "$Url|\t$data{'d:Title'}|\t$data{'d:Description'}|\t$data{topic}\n";

 
02.29.2008 at 09:10AM PST, ID: 21015258

Rank: Genius

That is because some of the ExternalPage elements don't have a description. You can either turn off warnings or give the missing fields an initial value.

To turn off warnings, add this just before the print:
    no warnings;

To give it an initial value:
    Change this:
        %data = ();
    To this:
        %data = ('d:Description' => '', 'd:Title' => '', topic => '');
Accepted Solution
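Another option, assuming Perl 5.10 or later for the `//` (defined-or) operator, is to supply the empty-string default only at print time, so fields missing from a record come out empty without raising the warning. A minimal sketch with a made-up record that lacks `d:Description`, capturing warnings through `$SIG{__WARN__}` to confirm none are emitted:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect any warnings instead of letting them go to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

# Hypothetical record: no 'd:Description' key, as with some ExternalPage entries.
my $Url  = 'http://example.com/';
my %data = ('d:Title' => 'Example', topic => 'Top/Example');

# Same "|\t"-separated layout as the original print; each missing field
# defaults to '' at the point of use.
my $line = join "|\t", $Url,
    map { $data{$_} // '' } 'd:Title', 'd:Description', 'topic';

print "$line\n";
print scalar(@warnings), " warnings\n";
```

It prints the record with an empty description field, followed by `0 warnings`.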
 
 