Solved

Contact Extraction from defunct Exchange priv1.edb database

Posted on 2004-10-13
Last Modified: 2010-03-05
Hello,

I have a 700 MB priv1.edb Exchange database that is corrupt. However, using tools such as BinText and various hex editors, I CAN see the pieces of Exchange information plain as day. The info is even in order memory-wise!

I notice a unique 7-digit number prior to the individual sections that I would like to extract.

What I need is a Perl script, or some other method, to find that 7-digit number and then extract certain fields after it. I would like to send these extractions to a separate file. A delimited text file would be fine; I will then import it into Excel or whatever.

Can this be done?


JM


tia
Question by:jmcnear
 

Author Comment

by:jmcnear
ID: 12298921
btw,

I have ActiveState Perl installed on my Windows XP Pro box.

jm
 
LVL 84

Expert Comment

by:ozo
ID: 12302740
So the beginnings of the fields are identified by a 7-digit number. How are the ends of the fields identified?
 
LVL 48

Expert Comment

by:Tintin
ID: 12304940
Could you please post a sample line with all the fields?

 

Author Comment

by:jmcnear
ID: 12376017
3124745  (binary crap)  John Q. Public 123 Main Street Acity Astate (binary crap) 3124745  (binary crap)  Person 2 info .............          (binary crap) 3124745 (binary crap) person 3 .......... (bc) .....etc...

I would like to find the number 3124745, then skip over the binary crap, extract info, and then find next number and repeat.

If I have to live with the bc that comes between the user info and the following #, that's fine, but I would like to be able to skip the bc that comes between the # and the user info.


jm
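
The find-the-number-then-repeat loop described above can be tried on a small in-memory string before touching the real file. This is only a rough sketch: the sub name `split_records` and the sample bytes are made up for illustration, with `\x{..}` escapes standing in for the binary data.

```perl
use strict;
use warnings;

# Split a buffer into the chunks that follow each occurrence of $marker.
# Everything before the first marker is discarded.
sub split_records {
    my ($buf, $marker) = @_;
    # \Q...\E quotes the marker so it is matched literally
    my @chunks = split /\Q$marker\E/, $buf, -1;
    shift @chunks;    # drop the text that precedes the first marker
    return @chunks;
}

# Hypothetical sample: the \x{..} bytes stand in for the binary data.
my $sample = "hdr\x{00}3124745\x{01}\x{02}John Q. Public 123 Main Street"
           . "\x{03}3124745\x{04}Person 2 info\x{05}";

my @records = split_records($sample, '3124745');
printf "%d records\n", scalar @records;
```

Each chunk still starts with whatever binary bytes followed the number, so skipping those is a separate step.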

 
LVL 84

Expert Comment

by:ozo
ID: 12378145
Is there any way to determine where the binary crap stops and where the user info starts?
 

Author Comment

by:jmcnear
ID: 12399568
ozo,

I would settle for something like (x non-binary-crap bytes in a row),

where x = maybe 10 or so, or 5, or whatever, basically to indicate that I am within usable ASCII information. I don't expect this to be a totally precise science. I will modify the code accordingly to see what fits. My initial goal is to extract "most" (as much as possible) of the human-readable text from this 700 MB file and save it into a separate, much smaller file. I am sure I will then run extractions against that file to fine-tune, etc...

So, I suppose something like /[a-zA-Z0-9]/ plus space and tab for x sequential bytes would be a starting place for the regex to isolate the readable text?

This initial code can be dirty (filthy even :) ), just to get a smaller working set which contains all the ASCII that I want to extract. I will then look at that file's structure and see what I should do from there.

Thanx

JM
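
The "x readable bytes in a row" idea can be sketched roughly as below. The sub name `readable_runs`, the exact character class, and the 1 MB block size are all assumptions to tune, not anything taken from the database format.

```perl
use strict;
use warnings;

# Return every run of at least $min readable characters found in $data.
# "Readable" is an assumption here: letters, digits, space, tab and a
# little punctuation. Widen or narrow the class to taste.
sub readable_runs {
    my ($data, $min) = @_;
    return $data =~ /([-A-Za-z0-9 \t.,'\@]{$min,})/g;
}

# Only runs when a file name is given: perl runs.pl priv1.edb > text.txt
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    local $/ = \(1 << 20);    # read fixed-size 1 MB blocks, not lines
    while (my $block = <$fh>) {
        # A run that straddles a block boundary is cut in two; for a
        # first rough pass that is probably acceptable.
        print "$_\n" for readable_runs($block, 10);
    }
    close $fh;
}
```

Raising or lowering the second argument is the "x" knob: a small value lets more noise through, a large one drops short fields such as state abbreviations.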

 
LVL 84

Accepted Solution

by:
ozo earned 125 total points
ID: 12408017
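# Usage: perl extract.pl priv1.edb > contacts.txt
open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
binmode $fh;        # needed on Windows so binary bytes are read untouched
$/ = "3124745";     # use the 7-digit marker as the record separator
<$fh>;              # discard everything before the first marker
while (<$fh>) {
    chomp;          # strip the trailing marker from the record
    # drop leading junk up to the first run of 10 readable characters
    s/^[\S\s]*?([\w\s.]{10})/$1/;
    print "$_\n";
}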
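
Since the question asked for a delimited file, each extracted record could be post-processed along these lines. This is a sketch only: the sub name `to_delimited`, the tab delimiter, and the set of characters treated as readable are assumptions.

```perl
use strict;
use warnings;

# Turn one extracted record into a tab-delimited line: every run of
# bytes outside the readable class is collapsed into a single tab.
sub to_delimited {
    my ($rec) = @_;
    $rec =~ s/[^-\w .,'\@]+/\t/g;    # binary runs become one tab each
    $rec =~ s/^\t+|\t+$//g;          # trim tabs at either end
    return $rec;
}

# Hypothetical record; the \x{..} bytes stand in for the binary data.
my $rec = "\x{01}\x{02}John Q. Public\x{00}\x{00}123 Main Street\x{03}Acity\x{04}";
print to_delimited($rec), "\n";
```

Stray readable bytes inside the binary parts will show up as short junk columns, so the result will likely still need a cleanup pass after import into Excel.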