Solved

Selecting the first "blank" line in an index

Posted on 2003-10-29
21
361 Views
Last Modified: 2012-06-27
Hi,

I'm having a slight problem with mysql 4.x.

I'm using a table with an integer value as an index, let's say :
Table : myTable
id    |  name   |  age
1         Jack        20
2         Lise        30
3         Peter      15
4         Ron        35
7         Jack        40

I want to use a sort of auto_increment that would first fill the blank lines. ie, if I want to add ('David', '12'), ('Rob', '36') and ('Laura', '14'), their respective id will be 5, 6, 8, and the table will be :

id    |  name   |  age
1         Jack        20
2         Lise        30
3         Peter      15
4         Ron        35
5         David      12
6         Rob        36
7         Jack        40
8         Laura      14

If the table doesn't have any empty "id" value, then the SELECT behavior should be exactly the same as AUTO_INCREMENT (but only in this case).

(I want to avoid my table to be full : this table changes dramatically fast and if I put my id AUTO_INCREMENT it may end up with a limit overload if BIGINT with only 20k entries !).

I want this processing to be done by MySQL server. I don't want to write an iteration algorithm in my main program.

The SELECT command which returns the first "free" index is enough.

Thank you for your help ! :)
0
Comment
Question by:Magnet_
  • 8
  • 8
  • 3
  • +1
21 Comments
 
LVL 20

Expert Comment

by:ikework
ID: 9645532
depends on the definition of your id-field, is it nullable or not, a varchar??

if it's a varchar( not nuallable ):

select id from mytable where id='';


varchar nullable:

select id from mytable where id is null or id='';


hope it helps :)

maik
0
 

Author Comment

by:Magnet_
ID: 9645581
It is a BIGINT (I think I told it ;) and it is not nullable.

In your example, you are looking for a empty/null MySQL string.

I think you didn't understand what I want to do. I can't find a MySQL row which does not exist, but I want the number of the line.

Have a look at my example tables and read again my question because you're answering something else.

(btw I am not new at all with SQL, so I would not have asked for such a simple thing ;).

Thanks anyway
0
 
LVL 17

Expert Comment

by:Squeebee
ID: 9645604
"I want to avoid my table to be full : this table changes dramatically fast and if I put my id AUTO_INCREMENT it may end up with a limit overload if BIGINT with only 20k entries !"

Believe me, having BIGINT overflow is the least of your worries.

A BIGINT UNSIGNED can hold a number as high as 2^64, or 18,446,744,073,709,551,616

So let me say that reusing auto_increment values is not a good idea, it can cause data corruption and orphans/adoptions. If you are really worries about overflowing BIGINT switch to BIGINT UNSIGNED, but this should not be a concern. So to sum up: don't do it.
0
 
LVL 17

Expert Comment

by:Squeebee
ID: 9645644
If, however, you insist that your code will work through 18 quintillion id values, then use the approach of not deleting a line, but flipping a "deleted" flag. Then search for the first id with the deleted flag set true and update the row. In your select queries always check against the deleted flag being false to only return non-deleted rows.
0
 

Author Comment

by:Magnet_
ID: 9645663
Let me explain it again :

Look at this very simple table, with only a field (BIG INT not null, aimed to be used as an index).

id
--
1  <-- OK (first entry)
2  <-- OK (no blank line between 1 & 2)
3  <-- OK
4  <-- OK
8  <-- HEY ! there should be 5 first !
9
11

The SELECT command should return 5, so in my INSERT command I invoke SELECT which returns 5 as id.

Now, my table looks like this :

id
--
1  <-- OK (first entry)  
2  <-- OK (no blank line between 1 & 2)
3  <-- OK
4  <-- OK
5  <-- Ok, Just added ! :)
8  <-- HEY ! there should be 6 first !
9

and then... till it looks like :

id
--
1  
2  <-- OK (no blank line between 1 & 2)
3  <-- OK
4  <-- OK
5  <-- Ok, added some time ago
6  <-- OK
7  <-- OK
8  <-- Now OK ! :)
9  <-- OK
10 <-- Just added, OK
11 <-- OK
<-- WHERE IS 12 ? :)

It should now return 12.

That's why I'm comparing this algorithm with AUTO_INCREMENT. It does the same but does not always return MAX()+1. It returns the number corresponding to the first non-existing/blank entry.

Hope it makes my question clear now.

Any hint ? I am sure it is feasable (I remember having done it with Oracle at school some years ago but totally forgotten how, and anyway I want it for MySQL 4.x now)

Thanks for your help.
0
 
LVL 20

Expert Comment

by:ikework
ID: 9645680
ahh ok, sorry for misunderstanding, you want the first value, which doesn't exist, to reuse
the values.
ok, but i guess with 20k of data the bigint field shouldn't overflow, it has a precision
of 64 bit -> 18 446 674 000 000 000 000... this should be enough for a long time, isn't it?
 
0
 

Author Comment

by:Magnet_
ID: 9645686
Just got your answers, Squeebee

OK even if BIGINT is enough, does anybody have my answer, for my personnal IT knowledge ? :)

I know I can do this and I think it is stupid to go to id numbers like 500000 when a table won't contain more than 20k entries, nope ?

0
 
LVL 17

Expert Comment

by:Squeebee
ID: 9645695
I understand (and understood) exactly what you want to do, but I reiterate my advice: Don't do it.
0
 
LVL 17

Expert Comment

by:Squeebee
ID: 9645718
Hey, 5000000 and 5 take up 64 bits when you use a bigint, so there is no storage benefit, the index doesn't compress the number, so there is no speed benefit. I can see why you may think 1 - 20000 is cleaner than 1 to 500000 with holes, but it is a sound database design practice to never reuse an index.
0
 

Author Comment

by:Magnet_
ID: 9645764
From Squeebee :

> So let me say that reusing auto_increment values is not a good idea,
> it can cause data corruption and orphans/adoptions. If you are really worries
> about overflowing BIGINT switch to BIGINT UNSIGNED, but this should not
> be a concern. So to sum up: don't do it.

I don't want my "id" to be AUTO_INCREMENT, I want my code to manage it alone. So there is no risk of data corruption if MySQL doesn't do it by itself.
I understand BIGINT is enough but the point is I really prefer having my MAX(id) equal to COUNT(id), just because it is a more logical approach to me.

> If, however, you insist that your code will work through 18 quintillion id values,
> then use the approach of not deleting a line, but flipping a "deleted" flag.
> Then search for the first id with the deleted flag set true and update the row.
> In your select queries always check against the deleted flag being false to only
> return non-deleted rows.

Nice idea indeed, but each query will have to parse, say, if "deleted = 'Y'", where in my question there would be more parsing but only when I add something.
Anyway if I can't find out how to do my thing, I may try this one.

> ok, but i guess with 20k of data the bigint field shouldn't overflow, it has a precision
> of 64 bit -> 18 446 674 000 000 000 000... this should be enough for a long time, isn't it?

Yep but like I said before, it is not a question of my db being able to contain everything, I'd like to have COUNT() equal to MAX().

That's how I'd like it. If you have any clue how to do it, please tell me. :)

Thanks anyway !
0
Zoho SalesIQ

Hassle-free live chat software re-imagined for business growth. 2 users, always free.

 
LVL 17

Expert Comment

by:Squeebee
ID: 9645817
This could only be done purely in MySQL if we had stored procedures, which involves a programming loop anyway. If we were writing such a function in Oracle we could write a stored procedure to grab all data in a sorted array (can't find first available in an unsorted set), then use a cursor to compare each row to the last and break the loop when this id != lastid + 1

So, no, there is no way that I can see to do this with a single SQL query. As for wanting COUNT() = MAX(), I would let go of this need, there is no practical programming reason to want COUNT() to equal MAX() and I don't think there is a production database out there where that is true (anymore anyway, you may have found such a thing when ID values were 16 bit, but those days are gone).
0
 

Author Comment

by:Magnet_
ID: 9645983
OK Squeebee, I thought there was a way to do it with some not well-known command, but I trust you if you say I can't.

The matter is my table changes very fast, I'd prefer to have my index in an INT to have faster JOINs, and I don't want to support it once it's done : this table is modified by a daemon which is always running and modifying my table manually will desynch the daemon, well.. I think my method would have been the best way to do what I want to do.

I will not wait for mySql 5 and its stored procedures to get my work online so I will use your deleted flag method if nobody can't give me what I initially asked (since it seems there is no way to do it with mySql4, I'll be likely using your method). I want a table that can last years without having to check if everything's right or if i'm having too big numbers.

If nobody can't give me what I first asked for within few days, you will be rewarded the points Squeebee for your "deleted flag"/UPDATE method which keeps my index the lower possible.

Thanks ikework too :).
0
 
LVL 50

Accepted Solution

by:
Lowfatspread earned 50 total points
ID: 9646032
select
min(A.id + 1) as id
 from table as A
 left join table as B
 on a.id + 1 = b.id
where b.id is null


should do it?

or possibly even better if your mysql version supports it.

select
min(A.id + 1) as id
 from table as A
 Where Not exists (select b.id from table as B
                   where b.id = a.id + 1)
0
 
LVL 50

Expert Comment

by:Lowfatspread
ID: 9646046
ps if you've only 20K row of data don't use a bigint just stick with int or even better smallint

.... save some bytes....
0
 
LVL 17

Expert Comment

by:Squeebee
ID: 9646182
I was hoping to change the method and have nobody mention a theta join, but oh well, lets clean it up anyway:

One thing: Aliases can be case sensitive, so if you use A and B as table aliases, use A and B in column references (lowfatspread had mixed case).

select
min(A.id + 1) as id
 from table as A
 left join table as B
 on A.id + 1 = B.id
where B.id is null;

Ok, somebody went and let you know how to do it, but I still stress the dangers of this approach as it can cause problems with referential integrity. BE ABSOLUTELY SURE that when you delete a row you delete ALL references to the row from the other tables or you are on the road to data corruption. Also, remember that unles you use the deleted flag you will not have COUNT() = MAX() anyway. Why? because lets say you have your situation above:

"Now, my table looks like this :

id
--
1  <-- OK (first entry)  
2  <-- OK (no blank line between 1 & 2)
3  <-- OK
4  <-- OK
5  <-- Ok, Just added ! :)
8  <-- HEY ! there should be 6 first !
9
"

What is COUNT()? 7 What is MAX()? 9
You do not have a matching COUNT and MAX unless all holes are filled, so don't count on the match scenario in your code.
0
 

Author Comment

by:Magnet_
ID: 9646207
Thanks Lowfatspread ! That's the right answer I was waiting for !
I remember having done it with EXISTS in a sub-select with Oracle, now, but it won't work with mySQL even 4.x (I did look for EXIST in the manual index before posting here, but I thought my memory was kinda messed up and I had smoked too much because I found nothing in their SQL reference ;).

Anyway I have tested with the first method and it works, there is just slight errors :

- the table aliases do not need AS (it will produce an error)
- table aliases are case sensitive when used with an attribute

You get the points for your fast answer. Thank you very much :).
And about INT/SMALLINT, yep I will use these of course, now that I have the right SELECT command not to overload my database uselessly.

Thanks ikework and Squeebee too.

0
 

Author Comment

by:Magnet_
ID: 9646258
Well, AS works with table aliases too. It works with or without it anyway :).

Then, for Squeebee :
The whole COUNT/MAX story was there to explain how I wanted my table to be "in final", in fact it was not COUNT = MAX, but COUNT <= MAX, you're right here.

But if you had my answer and refused to give it to me, I can't point out why. I wanted something particular but be sure I will managed linked entries not to corrupt my database.
I appreciate your warnings though, and will focus on managing all references rightly.

0
 
LVL 17

Expert Comment

by:Squeebee
ID: 9646313
Maybe it's not my place to choose what information to give or not to give, but re-using id values is very dangerous and therefore not something to be taken lightly. When you insisted you wanted to I gave you a solution that would be safer than simply finding the first available index value.

LowFatSpread chose to give you the information and that is his/her choice, but you were asking for dangerous information and I thought it best to withold it.
0
 
LVL 50

Expert Comment

by:Lowfatspread
ID: 9646768
i agree with you squeebee, there is no point in worrying about any deletion holes that you have in
your table keys... the space is not wasted  ...

personally i'm in favour of generating randon numbers for primary key attribiutes
rather than using any auto number type of thing...

but if you're using RI or maintaining it yourself its still your database/model
and your resposibility to maintian your childrens relationships....

parental hat off...
time for bed...


;-)
0
 
LVL 17

Expert Comment

by:Squeebee
ID: 9647008
Hehe, I will leave this with one last comment: It is probably a good idea to switch to InnoDB and use foreign key constraints just to be sure RI is maintained in this case.
0
 

Author Comment

by:Magnet_
ID: 9650662
If the table is empty, it will return NULL.
To avoid this, use

select
IFNULL(min(A.id + 1),1) as id
 from table as A
 left join table as B
 on A.id + 1 = B.id
where B.id is null;

(if you want the first value to be 1, if you want 0 you can let it as it was, since mySQL will convert NULL to 0 if in a non-nullable field).

0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

Suggested Solutions

Loading csv or delimited data files to MySQL database is a very common task frequently questioned about and almost every time LOAD DATA INFILE comes to the rescue. Here we will try to understand some of the very common scenarios for loading data …
I have been using r1soft Continuous Data Protection (http://www.r1soft.com/linux-cdp/) for many years now with the mySQL Addon and wanted to share a trick I have used several times. For those of us that don't have the luxury of using all transact…
In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any pl…
This video explains how to create simple products associated to Magento configurable product and offers fast way of their generation with Store Manager for Magento tool.

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now