Link to home
Start Free TrialLog in
Avatar of ankursinha
ankursinha

asked on

Making my own Database Engine

I am trying to make my own database engine for a college project that implements its own storage
format and supports all SQL statements.

I need help with the following things:
1. How to implement efficient search techniques(whether to use B-Trees or Normal Indexes)
2. How to make the storage in flat files efficient that should make retrieval fast and it takes least
amount of space.

I know this is a very general question.I dont want code.Just Suggestions.Please.

Thanx


Avatar of sunnycoder
sunnycoder
Flag of India image

Hi ankursinha,

organization of data depends entirely on the data structures you have.... If you have small records and you dont expect to handle large number of records, then using single level index may be a very good idea ... for bigger databases you may have to to build mutilevel indexes such as B trees ... for still larger databases, it may be a better idea to incorporate some kind of hashing along with index....

If you can be more specific about your requirements may be I can help a bit further

Cheers!
Sunny
Avatar of grg99
grg99

There's probably no best solution that fits all the possible kinds of data and access modes.

You're looking in the right direction though, trees and/or hash tables are good ways of looking things up.

Avatar of ankursinha

ASKER

I guess i am going to implement indexes(possibly multi-level) for searching
My engine is supposed to handle sufficiently large volumes of data
I ran into the time-space tradeoff.need help.
If i use variable length records how do i jump a particular no. of records(i dont want to do
linear search of my data search)
If i use fixed length records ,then i know the offset.

Any links for multilevel Indexing.

Also ,any more suggestions???


Note:Just as an aside,Pls if u can,check out this statement on ur PC's and tell me the o/p with
your platform
1. printf("%f%f",3,4);
2. printf("%f%f",3.5,4);
3. printf("%f%f",4,3.5);

Any explanations for the above o/p's;

My o/p:
1. 0.00 0.00
2. 3.50 0.00
3. 0.00 0.00
Also,lets say i create an index on the primary key(i know indexex are supposed to be created
dynamically for a DB engine).
If the user gives a query having conditions on a non-ordering key,then what do i do
Do i create an index on that key to display the result.If yes,wouldnt that take longer
compared to if i did a linear search of my data file.

>If i use variable length records how do i jump a particular no. of records(i dont want to do
>linear search of my data search)
store an index ... suppose your records are of variable length and you have n records ... maintain an index (say for simplicity an array of n elements) where nth element points to the starting address/offset of the nth rtecord ... to access a record, find the offset from the index and jump

>Any links for multilevel Indexing
http://www.cs.rpi.edu/~zaki/cs4380/lectures/8
http://www.nist.gov/dads/HTML/bplustree.html     --- implementation is also available here
http://www.cs.duke.edu/~tavi/btree/
http://www.seanster.com/BplusTree/BplusTree.html
http://cs.clarku.edu/~achou/cs160/B+Trees/B+Trees.htm 

also check the links i posted in the previous post
I was actually planning to store the data in a file in binary format.How would i know the address
of the start of each record.Pls help.
copying from my previous post:

If the user gives a query having conditions on a non-ordering key,then what do i do
Do i create an index on that key to display the result.If yes,wouldnt that take longer
compared to if i did a linear search of my data file.I know that the linear search would be useful
if such a query occured only once & creating an index would speed up furthur retrievals

Also,i guess Hashing is not an option for me cos its very hard to search ranges using hashing
(& very slow)


Also ,is storing in a file my best option or is there any better method?

>How would i know the address of the start of each record.
There are more than one approaches to this ... you can store the length of the record as the first 4 or 8 bytes of the record ... or ... you can use some special string as record delimiter... if none of these suits you, then post the constraints and we will try to devise something that does... for most cases this should suffice

>If the user gives a query having conditions on a non-ordering key, then what do i do
All the fields on which you expect frequent search should be indexed, especially all the keys... If such a situation arises, then linear search is better than building an index as it will be less time consuming... If frequent searches occur in a field that you have not built an index for, then you must build an index for it

>Also,i guess Hashing is not an option for me cos its very hard to search ranges using hashing
yes you are right... range search is not done with hashing but normal search is faster.... there are situations where it is the best available option but it is not mandatory to use it

>Also ,is storing in a file my best option or is there any better method?
If your database is something linke 1-10 MB, then loading it entirely into memory may not be a bad idea... 256 MB ram is the norm now-a-days... so a few MB should not be much of a problem... however, you will ultimately need to store it in persistent storage and you will have to use files
I am trying to build my own database engine(something like MySQL)
so it will have to support large database sizes(>100 MB)
Also,for variable length records if i make an index at the time of insertion
i can specify the offset of that record from the start of file,that way i can index variable
length records,Am i in the right direction??

If i store the length of the record in the first four bytes,then too i wont be able to access
a particular record directly i.e. i wont know the offset to jump to reach a certain record.Right?

I am planning to store the user's tables in files where I'll store the catalog of the database(
Table details) in a separate file and an index in a separate file.

So,in the index if i specify the offset from the start of the data file, i can reach a record directly.
Any suggestions on any other method to store the user's data in files.
i.e. table details,data and the access structures(index)

>i can specify the offset of that record from the start of file
In the index yes, as a part of the record, you will be better off specifying length of the record... that will save you some memory

>that way i can index variable length records,Am i in the right direction??
yes you are in the right direction... just spare a thought about insertion and modification of records...

>Any suggestions on any other method to store the user's data in files.
>i.e. table details,data and the access structures(index)

I would recommend reading first five chapters of introduction to database systems by Navathe (oops I forgot the second author...) It discusses some good approaches
Sunnycoder,Thanx for ur avid interest in my question.

>i.e. table details,data and the access structures(index)

I am building my own database engine , remember
table details will be as the user enters them,data as the user enters according to the table
details,and the access structure will be a bplus tree.

and Hey,I have that book elmasri & navathe too.I have read it plenty of times.
Good Book Though.

ASKER CERTIFIED SOLUTION
Avatar of sunnycoder
sunnycoder
Flag of India image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hey Sunnycoder,

There was a question in my prev post.I guess u didnt spot it.
Ur post:

>Any suggestions on any other method to store the user's data in files.
>i.e. table details,data and the access structures(index)

My Post:

>i.e. table details,data and the access structures(index)
I am building my own database engine , remember
table details will be as the user enters them,data as the user enters according to the table
details,and the access structure will be a bplus tree.

But forget it.I know i've already stetched my 150 pts. for this one.
Just a little bit of help needed.
I've decided to go for fixed length records & Bplustree for Access structure.
Can u just give me links where i can find out how to store a bplus tree in persistent storage
(in a file).
I'd prefer doing this myself but i really dont have time.Pls help me for this little bit.
The pts. are already urs.
>Any suggestions on any other method to store the user's data in files.
>i.e. table details,data and the access structures(index)
flat files should be good ... they are easy to manipulate using numerous scripts which are already available

>Can u just give me links where i can find out how to store a bplus tree in persistent storage
I have already posted links for implementation of B+ trees... however, I could not locate one which stores them in persistent storage... I shall post a link, if I should find one

why would you like to store them in persistent storage? You can build an index when you "open" a database file ... Otherwise
1. what happens if a file is modified without using your program ... your index will be incorrect
2. any modifications to the database would require rewriting the entire thing back to disk
Dear sunnycoder or anybody,

Do you have any working code for all this was talked about above?  Or maybe something similar to it?
I will give you all the maximum points I can if you could get me compilable code that answers his first question to you.

Thanks!