?
Solved

MySQL: Inserting Unique Column Value

Posted on 2008-06-15
11
Medium Priority
?
973 Views
Last Modified: 2008-06-18
Hi,
Scenario: I have a database table that stores blog posts. It looks like this:

CREATE TABLE `posts` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`title` VARCHAR( 50 ) NOT NULL ,
`safe_title` VARCHAR( 60 ) NOT NULL ,
`content` TEXT NOT NULL ,
`user_id` INT UNSIGNED NOT NULL ,
UNIQUE (`safe_title`)
) ENGINE = MYISAM ;

The `safe_title` column contains the post title in a URL safe format. That is, the post title with all special characters removed. So a post title of "Hello World" becomes "hello_world". It's important that every post has a unique safe title, even if some posts have the same title. Post #1 with the title "Hello World" would have the safe title of "hello_world". Post #2 with the title "Hello World" would have the safe title "hello_world_2", and so on.

The problem: How to insert a unique safe title in only one or two db transactions? I don't want to create code like this:

$found = true;
$counter = 0;
do {
      ++$counter;
      $safe_title = getSafeTitle('Hello World') . '_' . $counter;
      $results = $db->execute("SELECT FROM `posts` WHERE `safe_title` = '$safe_title' LIMIT 1");
      if (count($results) == 0) {
            $found = false;
      }
} while ($found == true);
$db->insert("INSERT INTO ....") // Insert post with safe title

That little bit of code could loop through a dozen times before a unique title is found. Even worse, by the time $db->insert() is called, the safe title that I've found may no longer be unique (If the system is inserting thousands of new blog posts a minute).

So how do I avoid this problem?
0
Comment
Question by:headzoo
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
  • 2
  • +1
11 Comments
 
LVL 2

Expert Comment

by:pain_is_relative
ID: 21791106
you are painting yourself into a corner.

lets see...you could attach an md5 hash of the current date after the name.... ugly.

you could make the safe-title a unique primary key, but that could cause you problems... old posts getting over written, errors writing to table. etc.

Alternatively you could reference the page by its primary key ... (id), and let the posts share names. So your posts could be referebced by id=678, id=56789, id=3. that way you would not be limiting your posts to having unique names.
0
 

Author Comment

by:headzoo
ID: 21791270
Hi pain_is_relative,
I wish it could be that simple. But clients today demand SEO friendly URLs, and not URLs that simply work (and work well).

I figure I could ditch the loop, and change my code to something like this:

$safe_title = 'hello_world';
$results = $db->execute(
      "SELECT * FROM `posts` WHERE `safe_title` REGEXP '^$safe_title(_[0-9]+)?$' LIMIT 1"
);

/**
 * Check to see if we found another post with the same
 * safe title
 */
if (count($results) > 0) {
      preg_match('/^(.*?)(_([0-9]))?$/', $results->safe_title, $matches);
      
      /**
       * If the safe title already has a _[0-9] appended to
       * it, we need to increment the [0-9] by one. Otherwise
       * add _1 to the safe title.
       */
      if (!empty($matches[2])) {
            $safe_title = $matches[0] . '_' . ($matches[2] + 1);
      } else {
            $safe_title = $matches[0] . '_1';
      }
}

$db->insert("INSERT INTO ....") // Insert post with safe title

That still doesn't solve the problem of another blog post with the same safe title being inserted microseconds before I call $db->insert(). Also I've never used regular expressions in a SELECT statement before, and I don't know what the performance would be like on large data sets.
0
 
LVL 2

Expert Comment

by:pain_is_relative
ID: 21791322
You could always add the session id to the date and then md5 it then add it to the safe name... but yess.. its ugly...

you could invesitigate using modrewrite, but i am not a fan of that.

i still think that a url of : blah/blah/blogpost.php?id=678 still looks reasonable.
you could even do this:
blah/blah/blogpost.php?id=678&name=my_safe_name_url_is_ace
that way you preserve the page name, and display it in the url, without ever using it. (just faking it it)
0
Will your db performance match your db growth?

In Percona’s white paper “Performance at Scale: Keeping Your Database on Its Toes,” we take a high-level approach to what you need to think about when planning for database scalability.

 
LVL 3

Accepted Solution

by:
pzurowski earned 2000 total points
ID: 21791428
Count total pages which begins with safe hello_world. If there will be none, so you have good title, if not you can attach total number of existing similar titles to the title. (If there is one, it will be without number, so the number will be 1; if there is two, there willbe wihout number and with _1, so the total is 2 and you can attach _2; ...so on..)

You can change
   LIKE '$safe_title%'
below to
   RLIKE '^${safe_title}_?[0-9]*\$'
to get more accurate results.
$safe_title = getSafeTitle('Hello World');
$results = $db->execute("SELECT count(*) AS cnt FROM `posts` WHERE `safe_title` LIKE '$safe_title%'");
 
$count = $result->getOne(); // or other function, which get "cnt" from query above
 
if( $count > 0 ){
   $safe_title .= "_" . $count;
}
 
$db->insert("INSERT INTO ....") // Insert post with safe title

Open in new window

0
 

Author Comment

by:headzoo
ID: 21791524
@pzurowski -
You gave me an idea. I could try SQL like this:

INSERT INTO `posts` (`id`, `title`, `content`, `user_id`, `safe_title`)
SELECT
null,
'Hello World',
'This is a test',
1,
IF(COUNT(*) > 1, CONCAT('hello_world_', COUNT(*)), 'hello_world')
FROM `posts` WHERE `safe_title` REGEXP '^hello_world(_[0-9]+)?$'
LIMIT 1

That would insert the blog post, and generate the safe title all in one transaction. This would be very quick, and it becomes unlikely that another post with the same safe title would be inserted before the transaction is complete.

Any thoughts on this kind of SQL?
0
 
LVL 3

Assisted Solution

by:pzurowski
pzurowski earned 2000 total points
ID: 21791659
think about seperating count and safe url, i mean:

CREATE TABLE `posts` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`title` VARCHAR( 50 ) NOT NULL ,
`safe_title` VARCHAR( 60 ) NOT NULL ,
`which` INT NOT NULL, --- added here
`content` TEXT NOT NULL ,
`user_id` INT UNSIGNED NOT NULL ,
UNIQUE (`safe_title`, `which`) --- added here
) ENGINE = MYISAM ;

and next will be simplier and faster
INSERT INTO `posts` (`id`, `title`, `content`, `user_id`, `safe_title`, `which`)
SELECT
null,
'Hello World',
'This is a test',
1,
COUNT(*) WHERE `safe_title` = 'hello_world'
 
SELECT `id`, IF( `which` > 0, CONCAT( `safe_title`, "_", `which` ), `safe_title` ) FROM `posts`

Open in new window

0
 

Author Comment

by:headzoo
ID: 21791678
@pzurowski -
I assume this is better performance wise, because I'm not requesting a count of all matching rows twice?
0
 
LVL 3

Expert Comment

by:pzurowski
ID: 21791696
First of all, because there is no regexp, which are slower. Moreover you can quicker read it, and it will be faster to expand or bug fix in future :)
0
 
LVL 2

Expert Comment

by:raheel_lips
ID: 21792213
Hi headzoo,

$safe_title = getSafeTitle('Hello World'); // this will give you white spaces removed string.
 
$results = $db->execute("SELECT count(*)+1 AS total FROM `posts` WHERE `safe_title` LIKE '$safe_title%'");
 
if( $count > 2 ){
   $safe_title .= "_" . $count;
} 
 
$db->insert("INSERT INTO ....") // Insert post with safe title

Open in new window

0
 
LVL 2

Expert Comment

by:raheel_lips
ID: 21792225
I missed some comments in my last post so consider these too.

// Above mentioned code is utilizing only 2 hits in post table.
1) check existing
2) Insert valid information

** There is no way to sence the duplicate data during insertion of post in table.

/* You would use one table transaction for check duplication before inserting data in table.
0
 

Author Comment

by:headzoo
ID: 21792292
@raheel_lips

The problem with using LIKE 'hello_world%', is it would also match against safe titles like "hello_world_how_are_you", etc.

@pzurowski

I played around with your SQL, and I can't tell what it's supposed to do. It appears to be completely invalid and/or doesn't work no matter how I rearranged it.

@everyone

It dawned on me that the request URLs would look like this:

http://site.com/users/headzoo/posts/hello_world

that is:

http://site.com/usres/[name of user]/posts/[post safe title]

Which means the user and the safe title could be combined into a unique index. So I changed the table to this:

CREATE TABLE `posts` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`title` VARCHAR( 50 ) NOT NULL ,
`safe_title` VARCHAR( 60 ) NOT NULL ,
`content` TEXT NOT NULL ,
`user_id` INT UNSIGNED NOT NULL ,
UNIQUE (`safe_title`, `user_id`) -- changed here
) ENGINE = MYISAM ;

Because it's impossible for the same user to submit more than one post with the same title, at the same time, I can safely write my code like this:

$safe_title = 'hello_world';
$results = $db->execute("SELECT COUNT(*) FROM `posts` WHERE `safe_title` = '$safe_title'");
if (count($results) > 0) {
      $safe_title .= '_' . (count($results) + 1);
}
$db->insert("INSERT INTO ....") // Insert post with safe title

I don't have to worry about another post with the same safe title being inserted prior to calling $db->insert(), because I can have that. Just so long as it's a different user.

A couple people here gave me the ideas for my final code/sql. So I need to split the points up some way.
0

Featured Post

NFR key for Veeam Backup for Microsoft Office 365

Veeam is happy to provide a free NFR license (for 1 year, up to 10 users). This license allows for the non‑production use of Veeam Backup for Microsoft Office 365 in your home lab without any feature limitations.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Creating and Managing Databases with phpMyAdmin in cPanel.
Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
In this video, Percona Solution Engineer Dimitri Vanoverbeke discusses why you want to use at least three nodes in a database cluster. To discuss how Percona Consulting can help with your design and architecture needs for your database and infras…
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…
Suggested Courses

649 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question