• Status: Solved

Contact Extraction from defunct Exchange priv1.edb database

Hello,

I have a 700 MB priv1.edb Exchange database that is corrupt. However, using tools such as BinText and hex editors, I CAN see the pieces of Exchange information plain as day. The info is even in order memory-wise!

I notice a unique 7-digit number prior to the individual sections that I would like to extract.

What I need is a Perl script (or some other method) that finds that 7-digit number and then extracts certain fields after it. I would like to send these extractions to a separate file. A delimited text file would be fine; I will then import it into Excel or whatever.

Can this be done?


JM


tia
Asked by: jmcnear
 
jmcnear (Author) commented:
btw,

I have ActiveState Perl installed on my WinXP Pro box.

jm
 
ozo commented:
So the beginning of each field is identified by a 7-digit number; how are the ends of the fields identified?
 
Tintin commented:
Could you please post a sample line with all the fields.
 
jmcnear (Author) commented:
3124745  (binary crap)  John Q. Public 123 Main Street Acity Astate (binary crap) 3124745  (binary crap)  Person 2 info .............          (binary crap) 3124745 (binary crap) person 3 .......... (bc) .....etc...

I would like to find the number 3124745, then skip over the binary crap, extract the info, and then find the next number and repeat.

If I have to live with the bc that comes between the user info and the following #, that's fine, but I would like to be able to skip the bc that comes between the # and the user info.


jm

 
ozo commented:
Is there any way to determine where the binary crap stops and where the user info starts?
 
jmcnear (Author) commented:
ozo,

I would settle for something like (x non binary crap bytes in a row)

where x = maybe 10 or so, or 5, or whatever (basically to indicate that I am within usable ASCII information). I don't expect this to be a totally precise science. I will modify the code accordingly to see what fits. My initial goal is to extract "most" (as much as possible) of the human-readable text from this 700 MB file and save it into a separate, much smaller file. I am sure I will then run extractions against that file to fine-tune, etc...

So, I suppose something like /[a-zA-Z0-9]/ plus 'space'/'tab' for x sequential bytes would be a starting place for the regex to isolate the readable text?

This initial code can be dirty (filthy even :) ), just to get a smaller working set which contains all the ASCII that I want to extract. I will then look at that file's structure and see what I should do from there.

Thanx

JM
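
The "x non-binary bytes in a row" idea above could be sketched roughly like this. It is only an illustration, written in Python since the question allows methods other than Perl. The names `printable_runs` and `extract_file`, the default of 10, and the exact character set (letters, digits, space, tab) are assumptions to be tuned against the real file; punctuation such as "." would have to be added to the class to keep "John Q. Public" in one piece.

```python
import mmap
import re


def _pattern(min_len):
    # Runs of at least min_len letters, digits, spaces or tabs.
    return re.compile(rb"[A-Za-z0-9 \t]{%d,}" % min_len)


def printable_runs(data, min_len=10):
    """Return every run of readable ASCII at least min_len bytes long."""
    return [m.group().decode("ascii") for m in _pattern(min_len).finditer(data)]


def extract_file(src_path, dst_path, min_len=10):
    """Write each readable run found in src_path to dst_path, one per line."""
    with open(src_path, "rb") as src, open(dst_path, "w") as dst:
        # Memory-map the file so a large database is not read into RAM at once.
        with mmap.mmap(src.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for m in _pattern(min_len).finditer(mm):
                dst.write(m.group().decode("ascii") + "\n")
```

Something like `extract_file("priv1.edb", "readable.txt", 10)` would then produce the smaller working file, and lowering `min_len` trades more noise for fewer missed fragments.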

 
ozoCommented:
$/="3124745";
<>;
while( <> ){
    s/^[\S\s]*?([\w\s.]{10})/$1/;
    print "$_\n";
}
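
For the delimited output asked for in the question, the same marker-splitting approach could be taken one step further. The following is a rough Python sketch, not a tested recovery tool: the function name `records_to_lines`, the minimum run length of 5, and the set of characters counted as "readable" are all assumptions. It splits on the marker from the sample, keeps only the readable runs inside each record, and joins them with tabs.

```python
import re

MARKER = b"3124745"                   # marker taken from the sample in this thread
RUN = re.compile(rb"[\w .,'-]{5,}")   # assumed definition of "readable" text


def records_to_lines(data, marker=MARKER):
    """Return one tab-delimited line of readable fields per marker-led record."""
    lines = []
    # Everything before the first marker is skipped, as in the Perl version.
    for chunk in data.split(marker)[1:]:
        fields = [m.group().decode("ascii").strip() for m in RUN.finditer(chunk)]
        fields = [f for f in fields if f]   # drop runs that were only spaces
        if fields:
            lines.append("\t".join(fields))
    return lines
```

Each returned line can be written to a text file and opened in Excel as tab-delimited data. Binary junk between fields simply becomes a field boundary, so some records will split in the wrong places and need manual cleanup.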