Magnet_
asked on
Selecting the first "blank" line in an index
Hi,
I'm having a slight problem with mysql 4.x.
I'm using a table with an integer value as an index, let's say :
Table : myTable
id | name | age
1 Jack 20
2 Lise 30
3 Peter 15
4 Ron 35
7 Jack 40
I want to use a sort of auto_increment that would first fill the blank lines. ie, if I want to add ('David', '12'), ('Rob', '36') and ('Laura', '14'), their respective id will be 5, 6, 8, and the table will be :
id | name | age
1 Jack 20
2 Lise 30
3 Peter 15
4 Ron 35
5 David 12
6 Rob 36
7 Jack 40
8 Laura 14
If the table doesn't have any empty "id" value, then the SELECT behavior should be exactly the same as AUTO_INCREMENT (but only in this case).
(I want to avoid my table to be full : this table changes dramatically fast and if I put my id AUTO_INCREMENT it may end up with a limit overload if BIGINT with only 20k entries !).
I want this processing to be done by MySQL server. I don't want to write an iteration algorithm in my main program.
The SELECT command which returns the first "free" index is enough.
Thank you for your help ! :)
I'm having a slight problem with mysql 4.x.
I'm using a table with an integer value as an index, let's say :
Table : myTable
id | name | age
1 Jack 20
2 Lise 30
3 Peter 15
4 Ron 35
7 Jack 40
I want to use a sort of auto_increment that would first fill the blank lines. ie, if I want to add ('David', '12'), ('Rob', '36') and ('Laura', '14'), their respective id will be 5, 6, 8, and the table will be :
id | name | age
1 Jack 20
2 Lise 30
3 Peter 15
4 Ron 35
5 David 12
6 Rob 36
7 Jack 40
8 Laura 14
If the table doesn't have any empty "id" value, then the SELECT behavior should be exactly the same as AUTO_INCREMENT (but only in this case).
(I want to avoid my table to be full : this table changes dramatically fast and if I put my id AUTO_INCREMENT it may end up with a limit overload if BIGINT with only 20k entries !).
I want this processing to be done by MySQL server. I don't want to write an iteration algorithm in my main program.
The SELECT command which returns the first "free" index is enough.
Thank you for your help ! :)
ASKER
It is a BIGINT (I think I told it ;) and it is not nullable.
In your example, you are looking for a empty/null MySQL string.
I think you didn't understand what I want to do. I can't find a MySQL row which does not exist, but I want the number of the line.
Have a look at my example tables and read again my question because you're answering something else.
(btw I am not new at all with SQL, so I would not have asked for such a simple thing ;).
Thanks anyway
In your example, you are looking for a empty/null MySQL string.
I think you didn't understand what I want to do. I can't find a MySQL row which does not exist, but I want the number of the line.
Have a look at my example tables and read again my question because you're answering something else.
(btw I am not new at all with SQL, so I would not have asked for such a simple thing ;).
Thanks anyway
"I want to avoid my table to be full : this table changes dramatically fast and if I put my id AUTO_INCREMENT it may end up with a limit overload if BIGINT with only 20k entries !"
Believe me, having BIGINT overflow is the least of your worries.
A BIGINT UNSIGNED can hold a number as high as 2^64, or 18,446,744,073,709,551,616
So let me say that reusing auto_increment values is not a good idea, it can cause data corruption and orphans/adoptions. If you are really worries about overflowing BIGINT switch to BIGINT UNSIGNED, but this should not be a concern. So to sum up: don't do it.
Believe me, having BIGINT overflow is the least of your worries.
A BIGINT UNSIGNED can hold a number as high as 2^64, or 18,446,744,073,709,551,616
So let me say that reusing auto_increment values is not a good idea, it can cause data corruption and orphans/adoptions. If you are really worries about overflowing BIGINT switch to BIGINT UNSIGNED, but this should not be a concern. So to sum up: don't do it.
If, however, you insist that your code will work through 18 quintillion id values, then use the approach of not deleting a line, but flipping a "deleted" flag. Then search for the first id with the deleted flag set true and update the row. In your select queries always check against the deleted flag being false to only return non-deleted rows.
ASKER
Let me explain it again :
Look at this very simple table, with only a field (BIG INT not null, aimed to be used as an index).
id
--
1 <-- OK (first entry)
2 <-- OK (no blank line between 1 & 2)
3 <-- OK
4 <-- OK
8 <-- HEY ! there should be 5 first !
9
11
The SELECT command should return 5, so in my INSERT command I invoke SELECT which returns 5 as id.
Now, my table looks like this :
id
--
1 <-- OK (first entry)
2 <-- OK (no blank line between 1 & 2)
3 <-- OK
4 <-- OK
5 <-- Ok, Just added ! :)
8 <-- HEY ! there should be 6 first !
9
and then... till it looks like :
id
--
1
2 <-- OK (no blank line between 1 & 2)
3 <-- OK
4 <-- OK
5 <-- Ok, added some time ago
6 <-- OK
7 <-- OK
8 <-- Now OK ! :)
9 <-- OK
10 <-- Just added, OK
11 <-- OK
<-- WHERE IS 12 ? :)
It should now return 12.
That's why I'm comparing this algorithm with AUTO_INCREMENT. It does the same but does not always return MAX()+1. It returns the number corresponding to the first non-existing/blank entry.
Hope it makes my question clear now.
Any hint ? I am sure it is feasable (I remember having done it with Oracle at school some years ago but totally forgotten how, and anyway I want it for MySQL 4.x now)
Thanks for your help.
Look at this very simple table, with only a field (BIG INT not null, aimed to be used as an index).
id
--
1 <-- OK (first entry)
2 <-- OK (no blank line between 1 & 2)
3 <-- OK
4 <-- OK
8 <-- HEY ! there should be 5 first !
9
11
The SELECT command should return 5, so in my INSERT command I invoke SELECT which returns 5 as id.
Now, my table looks like this :
id
--
1 <-- OK (first entry)
2 <-- OK (no blank line between 1 & 2)
3 <-- OK
4 <-- OK
5 <-- Ok, Just added ! :)
8 <-- HEY ! there should be 6 first !
9
and then... till it looks like :
id
--
1
2 <-- OK (no blank line between 1 & 2)
3 <-- OK
4 <-- OK
5 <-- Ok, added some time ago
6 <-- OK
7 <-- OK
8 <-- Now OK ! :)
9 <-- OK
10 <-- Just added, OK
11 <-- OK
<-- WHERE IS 12 ? :)
It should now return 12.
That's why I'm comparing this algorithm with AUTO_INCREMENT. It does the same but does not always return MAX()+1. It returns the number corresponding to the first non-existing/blank entry.
Hope it makes my question clear now.
Any hint ? I am sure it is feasable (I remember having done it with Oracle at school some years ago but totally forgotten how, and anyway I want it for MySQL 4.x now)
Thanks for your help.
ahh ok, sorry for misunderstanding, you want the first value, which doesn't exist, to reuse
the values.
ok, but i guess with 20k of data the bigint field shouldn't overflow, it has a precision
of 64 bit -> 18 446 674 000 000 000 000... this should be enough for a long time, isn't it?
the values.
ok, but i guess with 20k of data the bigint field shouldn't overflow, it has a precision
of 64 bit -> 18 446 674 000 000 000 000... this should be enough for a long time, isn't it?
ASKER
Just got your answers, Squeebee
OK even if BIGINT is enough, does anybody have my answer, for my personnal IT knowledge ? :)
I know I can do this and I think it is stupid to go to id numbers like 500000 when a table won't contain more than 20k entries, nope ?
OK even if BIGINT is enough, does anybody have my answer, for my personnal IT knowledge ? :)
I know I can do this and I think it is stupid to go to id numbers like 500000 when a table won't contain more than 20k entries, nope ?
I understand (and understood) exactly what you want to do, but I reiterate my advice: Don't do it.
Hey, 5000000 and 5 take up 64 bits when you use a bigint, so there is no storage benefit, the index doesn't compress the number, so there is no speed benefit. I can see why you may think 1 - 20000 is cleaner than 1 to 500000 with holes, but it is a sound database design practice to never reuse an index.
ASKER
From Squeebee :
> So let me say that reusing auto_increment values is not a good idea,
> it can cause data corruption and orphans/adoptions. If you are really worries
> about overflowing BIGINT switch to BIGINT UNSIGNED, but this should not
> be a concern. So to sum up: don't do it.
I don't want my "id" to be AUTO_INCREMENT, I want my code to manage it alone. So there is no risk of data corruption if MySQL doesn't do it by itself.
I understand BIGINT is enough but the point is I really prefer having my MAX(id) equal to COUNT(id), just because it is a more logical approach to me.
> If, however, you insist that your code will work through 18 quintillion id values,
> then use the approach of not deleting a line, but flipping a "deleted" flag.
> Then search for the first id with the deleted flag set true and update the row.
> In your select queries always check against the deleted flag being false to only
> return non-deleted rows.
Nice idea indeed, but each query will have to parse, say, if "deleted = 'Y'", where in my question there would be more parsing but only when I add something.
Anyway if I can't find out how to do my thing, I may try this one.
> ok, but i guess with 20k of data the bigint field shouldn't overflow, it has a precision
> of 64 bit -> 18 446 674 000 000 000 000... this should be enough for a long time, isn't it?
Yep but like I said before, it is not a question of my db being able to contain everything, I'd like to have COUNT() equal to MAX().
That's how I'd like it. If you have any clue how to do it, please tell me. :)
Thanks anyway !
> So let me say that reusing auto_increment values is not a good idea,
> it can cause data corruption and orphans/adoptions. If you are really worries
> about overflowing BIGINT switch to BIGINT UNSIGNED, but this should not
> be a concern. So to sum up: don't do it.
I don't want my "id" to be AUTO_INCREMENT, I want my code to manage it alone. So there is no risk of data corruption if MySQL doesn't do it by itself.
I understand BIGINT is enough but the point is I really prefer having my MAX(id) equal to COUNT(id), just because it is a more logical approach to me.
> If, however, you insist that your code will work through 18 quintillion id values,
> then use the approach of not deleting a line, but flipping a "deleted" flag.
> Then search for the first id with the deleted flag set true and update the row.
> In your select queries always check against the deleted flag being false to only
> return non-deleted rows.
Nice idea indeed, but each query will have to parse, say, if "deleted = 'Y'", where in my question there would be more parsing but only when I add something.
Anyway if I can't find out how to do my thing, I may try this one.
> ok, but i guess with 20k of data the bigint field shouldn't overflow, it has a precision
> of 64 bit -> 18 446 674 000 000 000 000... this should be enough for a long time, isn't it?
Yep but like I said before, it is not a question of my db being able to contain everything, I'd like to have COUNT() equal to MAX().
That's how I'd like it. If you have any clue how to do it, please tell me. :)
Thanks anyway !
This could only be done purely in MySQL if we had stored procedures, which involves a programming loop anyway. If we were writing such a function in Oracle we could write a stored procedure to grab all data in a sorted array (can't find first available in an unsorted set), then use a cursor to compare each row to the last and break the loop when this id != lastid + 1
So, no, there is no way that I can see to do this with a single SQL query. As for wanting COUNT() = MAX(), I would let go of this need, there is no practical programming reason to want COUNT() to equal MAX() and I don't think there is a production database out there where that is true (anymore anyway, you may have found such a thing when ID values were 16 bit, but those days are gone).
So, no, there is no way that I can see to do this with a single SQL query. As for wanting COUNT() = MAX(), I would let go of this need, there is no practical programming reason to want COUNT() to equal MAX() and I don't think there is a production database out there where that is true (anymore anyway, you may have found such a thing when ID values were 16 bit, but those days are gone).
ASKER
OK Squeebee, I thought there was a way to do it with some not well-known command, but I trust you if you say I can't.
The matter is my table changes very fast, I'd prefer to have my index in an INT to have faster JOINs, and I don't want to support it once it's done : this table is modified by a daemon which is always running and modifying my table manually will desynch the daemon, well.. I think my method would have been the best way to do what I want to do.
I will not wait for mySql 5 and its stored procedures to get my work online so I will use your deleted flag method if nobody can't give me what I initially asked (since it seems there is no way to do it with mySql4, I'll be likely using your method). I want a table that can last years without having to check if everything's right or if i'm having too big numbers.
If nobody can't give me what I first asked for within few days, you will be rewarded the points Squeebee for your "deleted flag"/UPDATE method which keeps my index the lower possible.
Thanks ikework too :).
The matter is my table changes very fast, I'd prefer to have my index in an INT to have faster JOINs, and I don't want to support it once it's done : this table is modified by a daemon which is always running and modifying my table manually will desynch the daemon, well.. I think my method would have been the best way to do what I want to do.
I will not wait for mySql 5 and its stored procedures to get my work online so I will use your deleted flag method if nobody can't give me what I initially asked (since it seems there is no way to do it with mySql4, I'll be likely using your method). I want a table that can last years without having to check if everything's right or if i'm having too big numbers.
If nobody can't give me what I first asked for within few days, you will be rewarded the points Squeebee for your "deleted flag"/UPDATE method which keeps my index the lower possible.
Thanks ikework too :).
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ps if you've only 20K row of data don't use a bigint just stick with int or even better smallint
.... save some bytes....
.... save some bytes....
I was hoping to change the method and have nobody mention a theta join, but oh well, lets clean it up anyway:
One thing: Aliases can be case sensitive, so if you use A and B as table aliases, use A and B in column references (lowfatspread had mixed case).
select
min(A.id + 1) as id
from table as A
left join table as B
on A.id + 1 = B.id
where B.id is null;
Ok, somebody went and let you know how to do it, but I still stress the dangers of this approach as it can cause problems with referential integrity. BE ABSOLUTELY SURE that when you delete a row you delete ALL references to the row from the other tables or you are on the road to data corruption. Also, remember that unles you use the deleted flag you will not have COUNT() = MAX() anyway. Why? because lets say you have your situation above:
"Now, my table looks like this :
id
--
1 <-- OK (first entry)
2 <-- OK (no blank line between 1 & 2)
3 <-- OK
4 <-- OK
5 <-- Ok, Just added ! :)
8 <-- HEY ! there should be 6 first !
9
"
What is COUNT()? 7 What is MAX()? 9
You do not have a matching COUNT and MAX unless all holes are filled, so don't count on the match scenario in your code.
One thing: Aliases can be case sensitive, so if you use A and B as table aliases, use A and B in column references (lowfatspread had mixed case).
select
min(A.id + 1) as id
from table as A
left join table as B
on A.id + 1 = B.id
where B.id is null;
Ok, somebody went and let you know how to do it, but I still stress the dangers of this approach as it can cause problems with referential integrity. BE ABSOLUTELY SURE that when you delete a row you delete ALL references to the row from the other tables or you are on the road to data corruption. Also, remember that unles you use the deleted flag you will not have COUNT() = MAX() anyway. Why? because lets say you have your situation above:
"Now, my table looks like this :
id
--
1 <-- OK (first entry)
2 <-- OK (no blank line between 1 & 2)
3 <-- OK
4 <-- OK
5 <-- Ok, Just added ! :)
8 <-- HEY ! there should be 6 first !
9
"
What is COUNT()? 7 What is MAX()? 9
You do not have a matching COUNT and MAX unless all holes are filled, so don't count on the match scenario in your code.
ASKER
Thanks Lowfatspread ! That's the right answer I was waiting for !
I remember having done it with EXISTS in a sub-select with Oracle, now, but it won't work with mySQL even 4.x (I did look for EXIST in the manual index before posting here, but I thought my memory was kinda messed up and I had smoked too much because I found nothing in their SQL reference ;).
Anyway I have tested with the first method and it works, there is just slight errors :
- the table aliases do not need AS (it will produce an error)
- table aliases are case sensitive when used with an attribute
You get the points for your fast answer. Thank you very much :).
And about INT/SMALLINT, yep I will use these of course, now that I have the right SELECT command not to overload my database uselessly.
Thanks ikework and Squeebee too.
I remember having done it with EXISTS in a sub-select with Oracle, now, but it won't work with mySQL even 4.x (I did look for EXIST in the manual index before posting here, but I thought my memory was kinda messed up and I had smoked too much because I found nothing in their SQL reference ;).
Anyway I have tested with the first method and it works, there is just slight errors :
- the table aliases do not need AS (it will produce an error)
- table aliases are case sensitive when used with an attribute
You get the points for your fast answer. Thank you very much :).
And about INT/SMALLINT, yep I will use these of course, now that I have the right SELECT command not to overload my database uselessly.
Thanks ikework and Squeebee too.
ASKER
Well, AS works with table aliases too. It works with or without it anyway :).
Then, for Squeebee :
The whole COUNT/MAX story was there to explain how I wanted my table to be "in final", in fact it was not COUNT = MAX, but COUNT <= MAX, you're right here.
But if you had my answer and refused to give it to me, I can't point out why. I wanted something particular but be sure I will managed linked entries not to corrupt my database.
I appreciate your warnings though, and will focus on managing all references rightly.
Then, for Squeebee :
The whole COUNT/MAX story was there to explain how I wanted my table to be "in final", in fact it was not COUNT = MAX, but COUNT <= MAX, you're right here.
But if you had my answer and refused to give it to me, I can't point out why. I wanted something particular but be sure I will managed linked entries not to corrupt my database.
I appreciate your warnings though, and will focus on managing all references rightly.
Maybe it's not my place to choose what information to give or not to give, but re-using id values is very dangerous and therefore not something to be taken lightly. When you insisted you wanted to I gave you a solution that would be safer than simply finding the first available index value.
LowFatSpread chose to give you the information and that is his/her choice, but you were asking for dangerous information and I thought it best to withold it.
LowFatSpread chose to give you the information and that is his/her choice, but you were asking for dangerous information and I thought it best to withold it.
i agree with you squeebee, there is no point in worrying about any deletion holes that you have in
your table keys... the space is not wasted ...
personally i'm in favour of generating randon numbers for primary key attribiutes
rather than using any auto number type of thing...
but if you're using RI or maintaining it yourself its still your database/model
and your resposibility to maintian your childrens relationships....
parental hat off...
time for bed...
;-)
your table keys... the space is not wasted ...
personally i'm in favour of generating randon numbers for primary key attribiutes
rather than using any auto number type of thing...
but if you're using RI or maintaining it yourself its still your database/model
and your resposibility to maintian your childrens relationships....
parental hat off...
time for bed...
;-)
Hehe, I will leave this with one last comment: It is probably a good idea to switch to InnoDB and use foreign key constraints just to be sure RI is maintained in this case.
ASKER
If the table is empty, it will return NULL.
To avoid this, use
select
IFNULL(min(A.id + 1),1) as id
from table as A
left join table as B
on A.id + 1 = B.id
where B.id is null;
(if you want the first value to be 1, if you want 0 you can let it as it was, since mySQL will convert NULL to 0 if in a non-nullable field).
To avoid this, use
select
IFNULL(min(A.id + 1),1) as id
from table as A
left join table as B
on A.id + 1 = B.id
where B.id is null;
(if you want the first value to be 1, if you want 0 you can let it as it was, since mySQL will convert NULL to 0 if in a non-nullable field).
if it's a varchar( not nuallable ):
select id from mytable where id='';
varchar nullable:
select id from mytable where id is null or id='';
hope it helps :)
maik