Data recovery? Accepted answer gets an A grade!

I am not sure where to put this question, but since I would like to write programs related to data recovery, this seemed like a good place for it.

My question is: how do you go about doing data recovery, and what would I need to know to be able to do it? Any websites, resources, PDFs or Word documents that anyone can send me would be very much appreciated, especially Visual Basic or C++ data recovery source code examples or anything like that!

I have heard that you need prior knowledge of the file system you are using to be able to do data recovery. If so, what sort of knowledge of the file systems do you need?

If anyone answers with useful URLs and documents (which they can send to my email address, which is in my profile), then I will definitely give them an A grade!
Shane Russell, 2nd Line Desktop Support, asked:

You first have to answer a few questions before looking for a possible solution:

What kind of data? (data in RAM, data on disk, non-digital data, ...)

Recovery from what? (process crash, system crash, disk crash, power failure, fire, earthquake, ...)

Answering these questions will help narrow the answer to your needs. There are plenty of ways to secure data.
Shane Russell, 2nd Line Desktop Support (Author) commented:
I want to know anything at all about data recovery, so basically everything you mentioned. What is a process crash? Bear in mind that I want to write a program based on any useful information I get!

All types of file systems: FAT12, FAT16, FAT32 and NTFS. As well as data in RAM, on disk, and any type of data (audio, video, data files such as Word documents, Excel files, databases, basically anything the user wants to recover), whether they need to run the program I make from an external disk or run it from the same disk (I know that might overwrite data, though).

Information on how to recover from fires, earthquakes and that sort of thing would be very useful!

I know there are ways to secure it, but that was not what I was asking ;)
Shane Russell, 2nd Line Desktop Support (Author) commented:
I hope those were the answers you were looking for. Any information you could give me at all would be very much appreciated!

The reason one needs to know the scope of your investigation more precisely is that the strategy and technology will differ.

The problem is that your question is still too vague, because the crucial point is the cost of securing data. This cost is like insurance, so you want to balance the investment against the risk. The risk depends on the probability of the incident happening and on the loss if it does happen.

For instance, there is no point in securing your data against an atomic blast, because you won't survive it. But for some data it may be worth the cost, because the data may be critical. So you have to identify the risks you want to protect your data against, and balance the cost and benefit of saving it.

Looking at the most frequent incident, we have the process crash. A program that is currently running is called a process. If there is a bug in the program, a division by zero for instance, your program may stop. The user may also decide to kill the program at any time. You might be interested in recovering some data that was generated by the process just before it was terminated. All data in memory is erased by the system, so you have to store this data in a place from which it can be recovered later.

The obvious place is the disk. You can copy data to the disk, where it will remain after the program terminates. You can even recover it after the system is shut down and restarted. So to secure dynamic data in your program, you have to write it to disk. This doesn't come for free. Writing to disk takes around 10,000 times as long as writing the same information to RAM. So you have to find a balance between the cost of saving data and the cost of losing it. This is application and context specific. Word, for instance, provides a system that automatically saves the file at a regular time interval, which the user can choose. If the user works on a huge file with a lot of images, saving may take tens of seconds, which is a nuisance when it happens in the middle of writing a sentence. But for most uses, which involve small files, this strategy is fine. One can even choose a very short saving period.

OK, we have seen that a good strategy is to save data on a medium that will survive the accident. Let's extend this aspect. As you know, it is also possible to erase a disk, which means that in some rare but possible accident your data saved on disk might be lost. So you may use a non-erasable medium like a writable CD or DVD. This is used in some companies and lotteries for obvious reasons. The additional price to pay compared to a disk is that you can't reclaim the space used by saved data that you no longer need. You will also need to change the media when it is full. And a last problem: you may end up with useful data scattered over multiple DVDs, so you will need a system to locate it and get access to it. So this approach also has its limits and can be used only in particular contexts, for applications where you rarely discard saved data or need access to previously saved data.

But this doesn't protect against fire or earthquakes, because the CDs and DVDs can be destroyed. In this case one uses a network and a distributed saving system. You can distribute copies of your data among multiple computers in your company's building. This is often called backup or mirroring. Suppose your company is in California; you would then save your data in another part of the US where earthquakes are much less probable. Or at least the probability that both places are destroyed at the same time is very low. Some companies are dedicated to such services. The advantage is that they can be located in very cheap and remote locations. The only requirement is a fast network. This strategy would also protect your data against terrorist actions. Some locations are more subject to such risk than others. But this distributed copy of your data has a price. Sending your data across the network takes much more time than writing it directly to disk. So here, too, you have to find a balance.

Another aspect is securing data against malevolent destruction or access. Distributing copies decreases the risk of losing the data but increases the risk that an enemy gets a copy. Here, too, there are algorithms to secure your data against such threats. The technique is known as secret sharing (a shared secret). It splits your data into N different pieces so that you need to put together M pieces to restore the data. If M = N you are the most secure against robbery, since the enemy would have to break into N different safe places. But losing one of the pieces is enough to lose your data, so the risk of loss is higher than with storing your data in only one place. The shared secret algorithm allows M < N. Suppose N is 6; you could require only 3 pieces to restore the data. Then 4 pieces need to be destroyed to lose your data, and 3 pieces need to be collected to steal it. This is a shared secret.
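To make the N = 6, M = 3 case concrete, here is a minimal sketch of one well-known threshold scheme, Shamir's secret sharing. The answer does not name a specific algorithm, so treating it as Shamir's scheme is an assumption, and the function names and the choice of prime are purely illustrative. The secret is treated as an integer smaller than the prime.

```python
import secrets

# Assumption: arithmetic is done modulo a fixed prime (2**127 - 1 is a
# Mersenne prime); the secret must be an integer smaller than it.
PRIME = 2**127 - 1

def split_secret(secret, n, m):
    """Split `secret` into n shares so that any m of them restore it."""
    # Random polynomial of degree m-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):          # Horner evaluation mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the supplied shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = split_secret(123456789, n=6, m=3)
restored = recover_secret(shares[:3])        # any 3 of the 6 shares
restored_other = recover_secret(shares[3:])  # a different set of 3
```

With fewer than 3 shares the interpolation still returns a number, but it is unrelated to the secret, which is the point of the scheme.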

Now let's detail a bit what saving data implies. If your program is searching for prime numbers, you would simply have to append the latest prime number found. You don't have to modify previous data, and this is fine and easy. A DVD or CD would also be fine. Suppose instead that the data is constantly changing; you could simply write each new version in sequence and at some point reclaim the lost space. This is a valid solution as long as the size of the data we are considering is small. Suppose now that we have megabytes of data. Saving these megabytes takes a lot of time, so you can't simply save them all periodically.

Obviously one would save only the changes to the data. This is called journaling. You just write into a journal all the changes applied to the data, and you save the whole data only at very infrequent checkpoints. You can reconstruct the data at a later time by replaying the changes found in the journal. But as you can see, recovering data this way is a lengthy process. This is fine if crashes are very infrequent and you can afford the time to reconstruct the data. But in most applications this is not acceptable, because we also want the user to be able to restart the program at any time, and we want this operation to be fast. This would be needed for a database or for Word. If the user modifies an existing file and saves it, it should be very fast to load the existing file and to save the changes.
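As a rough illustration of the checkpoint-plus-journal idea, the sketch below keeps a small dictionary as "the data". The file names, the JSON line format and the function names are all assumptions made for the example, not part of any particular product.

```python
import json
import os
import tempfile

def append_change(journal_path, key, value):
    """Record one change at the end of the journal (one JSON line)."""
    with open(journal_path, "a", encoding="utf-8") as j:
        j.write(json.dumps({"key": key, "value": value}) + "\n")

def write_checkpoint(checkpoint_path, journal_path, data):
    """Save the whole data, then start a fresh, empty journal."""
    with open(checkpoint_path, "w", encoding="utf-8") as c:
        json.dump(data, c)
    open(journal_path, "w").close()

def recover_from_journal(checkpoint_path, journal_path):
    """Load the last checkpoint and replay every journaled change on it."""
    data = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as c:
            data = json.load(c)
    if os.path.exists(journal_path):
        with open(journal_path, encoding="utf-8") as j:
            for line in j:
                change = json.loads(line)
                data[change["key"]] = change["value"]
    return data

workdir = tempfile.mkdtemp()
checkpoint = os.path.join(workdir, "data.checkpoint")
journal = os.path.join(workdir, "data.journal")

write_checkpoint(checkpoint, journal, {"title": "draft"})
append_change(journal, "title", "final")
append_change(journal, "pages", 12)
# Imagine a crash here: memory is gone, only the two files remain.
state = recover_from_journal(checkpoint, journal)
```

The longer the journal grows between checkpoints, the longer the replay takes, which is the recovery cost described above.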

This means that the data is stored on disk and you load only the data you really need to access. And when saving, you save only the modifications. You don't load the whole file and save it all to disk. For big files this would not be acceptable. For databases, which are big, this requirement is even more obvious. The solution is to use the journal but to apply its changes to the data stored on disk very frequently. This reduces the number of changes accumulated in the journal before it is replayed.

It may happen that the program terminates in the middle of applying its changes to the data stored on disk. When we recover, we would restart replaying the journal without knowing where it was aborted. There is no safe and sure way to know where the journal replay was interrupted. If the computer stopped because of a power failure, you really don't know where this happened. This is why we constrain the journal to contain only operations that can be interrupted and replayed multiple times without risk or problems. And that is the write operation. So in the journal you store only write operations, with the data you write and the location in the data where you write it. This is a very low-level operation, but it is the only safe way to secure data. When you recover, you also need to know whether to discard the journal or to replay its write operations, which may have been interrupted in the middle. For this you have to set a flag in the file holding the data.
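The sketch below illustrates why plain "write these bytes at this offset" records are safe to replay. It assumes, purely for the example, that byte 0 of the data file is the flag and that the journal is a Python list; a real journal would itself be stored safely on disk before the flag is set.

```python
import os
import tempfile

# Assumption for this sketch: byte 0 of the data file is the flag.
FLAG_CLEAN, FLAG_PENDING = b"\x00", b"\x01"

def set_flag(f, flag):
    f.seek(0)
    f.write(flag)
    f.flush()
    os.fsync(f.fileno())

def replay(f, writes):
    """Apply (offset, payload) records; doing it twice changes nothing."""
    for offset, payload in writes:
        f.seek(offset)
        f.write(payload)
    f.flush()
    os.fsync(f.fileno())

def recover_data_file(path, writes):
    """Replay the journal only if the flag says a replay was in progress."""
    with open(path, "r+b") as f:
        f.seek(0)
        if f.read(1) == FLAG_PENDING:
            replay(f, writes)
            set_flag(f, FLAG_CLEAN)

path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(FLAG_CLEAN + b"AAAAAAAA")

journal = [(1, b"xx"), (5, b"yy")]  # offsets start at 1 to skip the flag

# Simulate a crash: flag set, but only the first write reached the file.
with open(path, "r+b") as f:
    set_flag(f, FLAG_PENDING)
    replay(f, journal[:1])

recover_data_file(path, journal)
with open(path, "rb") as f:
    content = f.read()
```

Because each record overwrites a fixed location with fixed bytes, it does not matter how far the interrupted replay got before the crash.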

This all sounds simple and obvious, but you have to know that disks are designed to optimize data access, so they may cache data in very fast memory. If you ask the disk for data that was read a short time before, you will get it much faster than data that really needs to be fetched from the platters. All disks have caches, because reading data from memory is ten thousand times faster than reading from disk. This read cache is not a problem for our journaling system. What would be a problem is a write cache. Such a cache delays writing to disk so that multiple changes to the same data are as fast as writing to memory. It saves the data to disk only when there has been no change to it for a certain amount of time. If the program terminates, this disk cache is not a critical problem, because the disk manages it itself. But if a power failure occurs, the write cache is a problem, because you may believe the data is saved on disk when it is not. One could imagine disks with a small battery allowing them to save their cache to disk even when there is no power anymore, but I don't think this exists. It would make the disk more expensive, and in fact one only needs a special command to force the disk to write the data held in its cache. This command is called commit or sync.
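From a program, the closest everyday equivalent of that sync command is asking the operating system to flush a file. A minimal sketch in Python follows; the helper name `durable_write` is made up for the example, and whether the request reaches the drive's own cache depends on the operating system and hardware.

```python
import os
import tempfile

def durable_write(path, payload):
    """Write payload and ask the OS to push it to the storage device."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()              # empty Python's own buffer into the OS
        os.fsync(f.fileno())   # ask the OS to flush its cache for this file

path = os.path.join(tempfile.mkdtemp(), "journal.bin")
durable_write(path, b"change #1")
```

A journal entry should only be considered saved once a call like this has returned.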

Another strategy could be to back up the data you plan to modify before changing it. If a crash occurs, you can still replace the data with the old version. The advantage of this method is that the data is modified in place, which may be advantageous for some applications. The disadvantage is that before you can modify the data, you have to be sure the backup is safe on disk. This makes it a not-so-good strategy in general, but in some applications it may be better than journaling.
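A small sketch of this backup-first approach, under the assumption that the backup file holds an 8-byte offset followed by the old bytes. The file layout and the function names are invented for illustration.

```python
import os
import tempfile

def save_backup(data_path, backup_path, offset, length):
    """Copy the bytes about to be overwritten into a backup file first."""
    with open(data_path, "rb") as f:
        f.seek(offset)
        old = f.read(length)
    with open(backup_path, "wb") as b:
        b.write(offset.to_bytes(8, "little") + old)
        b.flush()
        os.fsync(b.fileno())   # the backup must be safe before we modify

def modify_in_place(data_path, backup_path, offset, payload):
    save_backup(data_path, backup_path, offset, len(payload))
    with open(data_path, "r+b") as f:
        f.seek(offset)
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    os.remove(backup_path)     # change complete, backup no longer needed

def roll_back(data_path, backup_path):
    """After a crash: if a backup is still there, restore the old bytes."""
    if not os.path.exists(backup_path):
        return False
    with open(backup_path, "rb") as b:
        raw = b.read()
    offset, old = int.from_bytes(raw[:8], "little"), raw[8:]
    with open(data_path, "r+b") as f:
        f.seek(offset)
        f.write(old)
        f.flush()
        os.fsync(f.fileno())
    os.remove(backup_path)
    return True

workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "data.bin")
backup_path = os.path.join(workdir, "data.undo")
with open(data_path, "wb") as f:
    f.write(b"hello world")

modify_in_place(data_path, backup_path, 0, b"HELLO")
with open(data_path, "rb") as f:
    after_update = f.read()
```

Note the extra synchronous write of the backup before every change, which is the cost mentioned above.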

What we have discussed so far is a problem that operating systems also have to face. The file system is also a sort of database on disk. One has to secure it so that in case of a system crash one doesn't lose access to all the files and files don't get corrupted. For instance, FAT32 is a file system without journaling, so it is unsafe. You may lose files in case of a system crash or power failure. You even risk losing the whole disk. NTFS is a safer file system because it has a journal. But this journal doesn't protect user data. It only secures the file system so that you don't lose your files or disk.

On Linux there also exist unsafe and safe file systems. The safe file systems use journaling. ReiserFS (for Linux) uses a different method. It writes modified data in a new place. The particular data structure needed to manage such a file system makes this advantageous, but it also has its limits. The principle is that the system uses an index to the data. When data is modified, it doesn't get overwritten by the new data. The new data is written in a new location and the index is modified. But modifying the index also requires writing some of its data in a new place. As you can see, a small change of data may trigger many changes and data copies. With journaling at most two writes are needed, which makes it a fast and predictable system.
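The "write in a new place and update the index" principle can be modelled in a few lines. This is only an in-memory toy with invented names, not a description of how ReiserFS or any real file system is implemented.

```python
class CowStore:
    """Toy model: data blocks are never overwritten, only appended."""

    def __init__(self):
        self.blocks = []   # append-only block storage
        self.index = {}    # name -> position of the current block

    def write(self, name, payload):
        self.blocks.append(payload)       # new data goes to a new place
        new_index = dict(self.index)      # the index is copied as well
        new_index[name] = len(self.blocks) - 1
        self.index = new_index            # single switch-over step

    def read(self, name):
        return self.blocks[self.index[name]]

store = CowStore()
store.write("report", b"version 1")
old_index = store.index
store.write("report", b"version 2")
```

Until the last assignment in `write` happens, the old index still points at intact old data, so an interruption leaves a consistent state. The cost is the extra copying of index data on every change.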

So we have talked about copying data to a disk or to a remote place. Now suppose you just want to protect your data against a disk crash. RAID is a standard defining different techniques to secure data saved on disk. A basic method is to use two disks. Whenever you write to one disk, you write the same value to the other. This is called mirroring. One of the disks is the master and the other is a shadow disk, an exact copy. If one of the disks crashes, your data is not lost. You lose data only if both disks fail, which is far less likely than a single disk failing.

Can we do better? Yes! Suppose you can afford three disks. You then spread your data over two disks. To simplify the demonstration, suppose you save bytes at even positions on disk 1 and bytes at odd positions on disk 2. Of course, if you lose one of the two disks you lose your data, but you have gained in performance since both disks now operate in parallel (not in seek time, but in transfer time). So apparently we have lost in safety. But now we save some backup data on the third disk.

For each pair of corresponding bytes on disk 1 and disk 2, we save the result of XORing them. Let a and b be the two bytes at the same position on disk 1 and disk 2. We then save c, computed as a xor b, on the third disk. What is interesting about this xor operation is that if you then compute c xor a you get b, and if you compute c xor b you get a.
This is very interesting because you have now reduced the risk of losing your data. You would have to lose two disks out of three to lose your data. As you know, this is highly improbable.
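The three-disk scheme is easy to try out with byte strings standing in for disks. In this sketch the function names are illustrative, and padding odd-length data with a zero byte is an assumption made to keep the two halves the same length.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equally long byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def stripe(data):
    """Even positions go to disk 1, odd to disk 2, their XOR to disk 3."""
    if len(data) % 2:
        data += b"\x00"                  # assumption: pad to an even length
    disk1, disk2 = data[0::2], data[1::2]
    return disk1, disk2, xor_bytes(disk1, disk2)

def rebuild(disk1, disk2, parity):
    """Rebuild the data; pass None for the one disk that has failed."""
    if disk1 is None:
        disk1 = xor_bytes(parity, disk2)  # c xor b gives a
    if disk2 is None:
        disk2 = xor_bytes(parity, disk1)  # c xor a gives b
    out = bytearray()
    for a, b in zip(disk1, disk2):
        out += bytes((a, b))              # interleave the bytes again
    return bytes(out)

disk1, disk2, parity = stripe(b"recovery")
restored = rebuild(None, disk2, parity)   # disk 1 has crashed
```

Losing the parity disk costs nothing immediately, since disks 1 and 2 still hold all the data; the parity can be recomputed from them.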

This xor operation is a very simple example of a shared secret. You need access to two pieces out of three to reconstruct the data, or two pieces need to be destroyed to lose the data. The shared secret algorithm allows extending this to higher numbers. But this has a price: the required operations are much more complex than the simple xor.

This should give you an overview of the issues and strategies in securing data. What kind of program would you want to write, or would you need?

Shane Russell, 2nd Line Desktop Support (Author) commented:
A program that restores files from storage media such as a hard drive, on any Windows file system, regardless of what has been done to it: formatted, Windows reinstalled, or whatever. I was reading up on it and found out that if you reinstall Windows, any data that was in the 2 GB of the hard drive that Windows writes over will be unrecoverable, but any data outside that 2 GB boundary can still be recovered.

This was the site where I was reading about it:

The main things I need help with are:

* what APIs I would need to use to be able to make this program
* any sites that would show me or point me in the right direction
* what info I would need to know about the file systems (FAT12, FAT16, FAT32 and NTFS) to be able to recover files
* if the hard drive in question has been formatted or something like that, how I would go about recovering the data. I read that the data is still intact, because formatting only does something to the root of the drive and zeros that and the MBR or something like that, so it leaves the data intact. Would you have to run some restore command on the MBR or something?

Is there anything else I should be aware of to be able to make this program?
Shane Russell, 2nd Line Desktop Support (Author) commented:
If you think I am asking too much for the amount of points I am offering, then please let me know! I am willing to offer more points for help with this topic.

As I said in the previous post and in the question, those are the things I am looking for.
Shane Russell, 2nd Line Desktop Support (Author) commented:
meessen, are you still there? What happened?

I suggest that you accept meesen's answer on this one and that you raise one or several more focussed questions on the topic.
My contribution is to recommend that you look at some products that do what you are aiming at.

I use WinHex. You can try it for free:
Shane Russell, 2nd Line Desktop Support (Author) commented:
How do you use WinHex for data recovery? It only shows you your hard drive data as hexadecimal, if I remember correctly.
Shane Russell, 2nd Line Desktop Support (Author) commented:
Also, I wanted info on what knowledge I would need in terms of programming, such as which APIs and what things I would need to research, considering this question is under programming.

I even made a post above:

Comment from gecko_au2003
Date: 09/09/2004 09:47AM PDT

stating what I was looking for.

Anyway, when someone answers me with that info I will split my points between meesen's answer and their answer, because his input was useful!

Thank you, GrahamSkan, for suggesting that my answer be accepted. It took me some time to write this answer, and I was quite disappointed to see that it was not what was expected. The question was too ambiguous.

Here is a pointer to a place where you will find a lot of documentation and source code that might be of interest for your project.
You might be especially interested in NTFSInfo and SDelete, which are provided with their sources.

Another very interesting starting point is

In the same place you may find info about EFS (encrypted file system).

This should suit your needs better.

For source code that does exactly what you want, the A grade and these virtual points won't be enough ;-)
You may also check

for its recovery section.
Here you have freeware NTFS disk recovery tools

In this list you will find PCInspector, which is free:
Couldn't find sources.

Shane Russell, 2nd Line Desktop Support (Author) commented:
OK, here is the follow-up question, which is more specific as far as I can tell. I really hope meesen can help me more with this, as he has been a great help, and I just have to say thank you very much in advance!

Follow-up question (worth 115 points with an A grade):

Thank you for everyone's input here, though, considering the question was vague!

Kind regards, Shane