Link to home
Start Free TrialLog in
Avatar of Magnet_
Magnet_

asked on

Selecting the first "blank" line in an index

Hi,

I'm having a slight problem with mysql 4.x.

I'm using a table with an integer value as an index, let's say :
Table : myTable
id    |  name   |  age
1         Jack        20
2         Lise        30
3         Peter      15
4         Ron        35
7         Jack        40

I want to use a sort of auto_increment that would first fill the blank lines. ie, if I want to add ('David', '12'), ('Rob', '36') and ('Laura', '14'), their respective id will be 5, 6, 8, and the table will be :

id    |  name   |  age
1         Jack        20
2         Lise        30
3         Peter      15
4         Ron        35
5         David      12
6         Rob        36
7         Jack        40
8         Laura      14

If the table doesn't have any empty "id" value, then the SELECT behavior should be exactly the same as AUTO_INCREMENT (but only in this case).

(I want to avoid my table to be full : this table changes dramatically fast and if I put my id AUTO_INCREMENT it may end up with a limit overload if BIGINT with only 20k entries !).

I want this processing to be done by MySQL server. I don't want to write an iteration algorithm in my main program.

The SELECT command which returns the first "free" index is enough.

Thank you for your help ! :)
Avatar of ikework
ikework
Flag of Germany image

depends on the definition of your id-field, is it nullable or not, a varchar??

if it's a varchar( not nuallable ):

select id from mytable where id='';


varchar nullable:

select id from mytable where id is null or id='';


hope it helps :)

maik
Avatar of Magnet_
Magnet_

ASKER

It is a BIGINT (I think I told it ;) and it is not nullable.

In your example, you are looking for a empty/null MySQL string.

I think you didn't understand what I want to do. I can't find a MySQL row which does not exist, but I want the number of the line.

Have a look at my example tables and read again my question because you're answering something else.

(btw I am not new at all with SQL, so I would not have asked for such a simple thing ;).

Thanks anyway
"I want to avoid my table to be full : this table changes dramatically fast and if I put my id AUTO_INCREMENT it may end up with a limit overload if BIGINT with only 20k entries !"

Believe me, having BIGINT overflow is the least of your worries.

A BIGINT UNSIGNED can hold a number as high as 2^64, or 18,446,744,073,709,551,616

So let me say that reusing auto_increment values is not a good idea, it can cause data corruption and orphans/adoptions. If you are really worries about overflowing BIGINT switch to BIGINT UNSIGNED, but this should not be a concern. So to sum up: don't do it.
If, however, you insist that your code will work through 18 quintillion id values, then use the approach of not deleting a line, but flipping a "deleted" flag. Then search for the first id with the deleted flag set true and update the row. In your select queries always check against the deleted flag being false to only return non-deleted rows.
Avatar of Magnet_

ASKER

Let me explain it again :

Look at this very simple table, with only a field (BIG INT not null, aimed to be used as an index).

id
--
1  <-- OK (first entry)
2  <-- OK (no blank line between 1 & 2)
3  <-- OK
4  <-- OK
8  <-- HEY ! there should be 5 first !
9
11

The SELECT command should return 5, so in my INSERT command I invoke SELECT which returns 5 as id.

Now, my table looks like this :

id
--
1  <-- OK (first entry)  
2  <-- OK (no blank line between 1 & 2)
3  <-- OK
4  <-- OK
5  <-- Ok, Just added ! :)
8  <-- HEY ! there should be 6 first !
9

and then... till it looks like :

id
--
1  
2  <-- OK (no blank line between 1 & 2)
3  <-- OK
4  <-- OK
5  <-- Ok, added some time ago
6  <-- OK
7  <-- OK
8  <-- Now OK ! :)
9  <-- OK
10 <-- Just added, OK
11 <-- OK
<-- WHERE IS 12 ? :)

It should now return 12.

That's why I'm comparing this algorithm with AUTO_INCREMENT. It does the same but does not always return MAX()+1. It returns the number corresponding to the first non-existing/blank entry.

Hope it makes my question clear now.

Any hint ? I am sure it is feasable (I remember having done it with Oracle at school some years ago but totally forgotten how, and anyway I want it for MySQL 4.x now)

Thanks for your help.
ahh ok, sorry for misunderstanding, you want the first value, which doesn't exist, to reuse
the values.
ok, but i guess with 20k of data the bigint field shouldn't overflow, it has a precision
of 64 bit -> 18 446 674 000 000 000 000... this should be enough for a long time, isn't it?
 
Avatar of Magnet_

ASKER

Just got your answers, Squeebee

OK even if BIGINT is enough, does anybody have my answer, for my personnal IT knowledge ? :)

I know I can do this and I think it is stupid to go to id numbers like 500000 when a table won't contain more than 20k entries, nope ?

I understand (and understood) exactly what you want to do, but I reiterate my advice: Don't do it.
Hey, 5000000 and 5 take up 64 bits when you use a bigint, so there is no storage benefit, the index doesn't compress the number, so there is no speed benefit. I can see why you may think 1 - 20000 is cleaner than 1 to 500000 with holes, but it is a sound database design practice to never reuse an index.
Avatar of Magnet_

ASKER

From Squeebee :

> So let me say that reusing auto_increment values is not a good idea,
> it can cause data corruption and orphans/adoptions. If you are really worries
> about overflowing BIGINT switch to BIGINT UNSIGNED, but this should not
> be a concern. So to sum up: don't do it.

I don't want my "id" to be AUTO_INCREMENT, I want my code to manage it alone. So there is no risk of data corruption if MySQL doesn't do it by itself.
I understand BIGINT is enough but the point is I really prefer having my MAX(id) equal to COUNT(id), just because it is a more logical approach to me.

> If, however, you insist that your code will work through 18 quintillion id values,
> then use the approach of not deleting a line, but flipping a "deleted" flag.
> Then search for the first id with the deleted flag set true and update the row.
> In your select queries always check against the deleted flag being false to only
> return non-deleted rows.

Nice idea indeed, but each query will have to parse, say, if "deleted = 'Y'", where in my question there would be more parsing but only when I add something.
Anyway if I can't find out how to do my thing, I may try this one.

> ok, but i guess with 20k of data the bigint field shouldn't overflow, it has a precision
> of 64 bit -> 18 446 674 000 000 000 000... this should be enough for a long time, isn't it?

Yep but like I said before, it is not a question of my db being able to contain everything, I'd like to have COUNT() equal to MAX().

That's how I'd like it. If you have any clue how to do it, please tell me. :)

Thanks anyway !
This could only be done purely in MySQL if we had stored procedures, which involves a programming loop anyway. If we were writing such a function in Oracle we could write a stored procedure to grab all data in a sorted array (can't find first available in an unsorted set), then use a cursor to compare each row to the last and break the loop when this id != lastid + 1

So, no, there is no way that I can see to do this with a single SQL query. As for wanting COUNT() = MAX(), I would let go of this need, there is no practical programming reason to want COUNT() to equal MAX() and I don't think there is a production database out there where that is true (anymore anyway, you may have found such a thing when ID values were 16 bit, but those days are gone).
Avatar of Magnet_

ASKER

OK Squeebee, I thought there was a way to do it with some not well-known command, but I trust you if you say I can't.

The matter is my table changes very fast, I'd prefer to have my index in an INT to have faster JOINs, and I don't want to support it once it's done : this table is modified by a daemon which is always running and modifying my table manually will desynch the daemon, well.. I think my method would have been the best way to do what I want to do.

I will not wait for mySql 5 and its stored procedures to get my work online so I will use your deleted flag method if nobody can't give me what I initially asked (since it seems there is no way to do it with mySql4, I'll be likely using your method). I want a table that can last years without having to check if everything's right or if i'm having too big numbers.

If nobody can't give me what I first asked for within few days, you will be rewarded the points Squeebee for your "deleted flag"/UPDATE method which keeps my index the lower possible.

Thanks ikework too :).
ASKER CERTIFIED SOLUTION
Avatar of Lowfatspread
Lowfatspread
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ps if you've only 20K row of data don't use a bigint just stick with int or even better smallint

.... save some bytes....
I was hoping to change the method and have nobody mention a theta join, but oh well, lets clean it up anyway:

One thing: Aliases can be case sensitive, so if you use A and B as table aliases, use A and B in column references (lowfatspread had mixed case).

select
min(A.id + 1) as id
 from table as A
 left join table as B
 on A.id + 1 = B.id
where B.id is null;

Ok, somebody went and let you know how to do it, but I still stress the dangers of this approach as it can cause problems with referential integrity. BE ABSOLUTELY SURE that when you delete a row you delete ALL references to the row from the other tables or you are on the road to data corruption. Also, remember that unles you use the deleted flag you will not have COUNT() = MAX() anyway. Why? because lets say you have your situation above:

"Now, my table looks like this :

id
--
1  <-- OK (first entry)  
2  <-- OK (no blank line between 1 & 2)
3  <-- OK
4  <-- OK
5  <-- Ok, Just added ! :)
8  <-- HEY ! there should be 6 first !
9
"

What is COUNT()? 7 What is MAX()? 9
You do not have a matching COUNT and MAX unless all holes are filled, so don't count on the match scenario in your code.
Avatar of Magnet_

ASKER

Thanks Lowfatspread ! That's the right answer I was waiting for !
I remember having done it with EXISTS in a sub-select with Oracle, now, but it won't work with mySQL even 4.x (I did look for EXIST in the manual index before posting here, but I thought my memory was kinda messed up and I had smoked too much because I found nothing in their SQL reference ;).

Anyway I have tested with the first method and it works, there is just slight errors :

- the table aliases do not need AS (it will produce an error)
- table aliases are case sensitive when used with an attribute

You get the points for your fast answer. Thank you very much :).
And about INT/SMALLINT, yep I will use these of course, now that I have the right SELECT command not to overload my database uselessly.

Thanks ikework and Squeebee too.

Avatar of Magnet_

ASKER

Well, AS works with table aliases too. It works with or without it anyway :).

Then, for Squeebee :
The whole COUNT/MAX story was there to explain how I wanted my table to be "in final", in fact it was not COUNT = MAX, but COUNT <= MAX, you're right here.

But if you had my answer and refused to give it to me, I can't point out why. I wanted something particular but be sure I will managed linked entries not to corrupt my database.
I appreciate your warnings though, and will focus on managing all references rightly.

Maybe it's not my place to choose what information to give or not to give, but re-using id values is very dangerous and therefore not something to be taken lightly. When you insisted you wanted to I gave you a solution that would be safer than simply finding the first available index value.

LowFatSpread chose to give you the information and that is his/her choice, but you were asking for dangerous information and I thought it best to withold it.
i agree with you squeebee, there is no point in worrying about any deletion holes that you have in
your table keys... the space is not wasted  ...

personally i'm in favour of generating randon numbers for primary key attribiutes
rather than using any auto number type of thing...

but if you're using RI or maintaining it yourself its still your database/model
and your resposibility to maintian your childrens relationships....

parental hat off...
time for bed...


;-)
Hehe, I will leave this with one last comment: It is probably a good idea to switch to InnoDB and use foreign key constraints just to be sure RI is maintained in this case.
Avatar of Magnet_

ASKER

If the table is empty, it will return NULL.
To avoid this, use

select
IFNULL(min(A.id + 1),1) as id
 from table as A
 left join table as B
 on A.id + 1 = B.id
where B.id is null;

(if you want the first value to be 1, if you want 0 you can let it as it was, since mySQL will convert NULL to 0 if in a non-nullable field).