Solved

MySQL: Inserting Unique Column Value

Posted on 2008-06-15
11
964 Views
Last Modified: 2008-06-18
Hi,
Scenario: I have a database table that stores blog posts. It looks like this:

CREATE TABLE `posts` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`title` VARCHAR( 50 ) NOT NULL ,
`safe_title` VARCHAR( 60 ) NOT NULL ,
`content` TEXT NOT NULL ,
`user_id` INT UNSIGNED NOT NULL ,
UNIQUE (`safe_title`)
) ENGINE = MYISAM ;

The `safe_title` column contains the post title in a URL safe format. That is, the post title with all special characters removed. So a post title of "Hello World" becomes "hello_world". It's important that every post has a unique safe title, even if some posts have the same title. Post #1 with the title "Hello World" would have the safe title of "hello_world". Post #2 with the title "Hello World" would have the safe title "hello_world_2", and so on.

The problem: How to insert a unique safe title in only one or two db transactions? I don't want to create code like this:

$found = true;
$counter = 0;
do {
      ++$counter;
      $safe_title = getSafeTitle('Hello World') . '_' . $counter;
      $results = $db->execute("SELECT FROM `posts` WHERE `safe_title` = '$safe_title' LIMIT 1");
      if (count($results) == 0) {
            $found = false;
      }
} while ($found == true);
$db->insert("INSERT INTO ....") // Insert post with safe title

That little bit of code could loop through a dozen times before a unique title is found. Even worse, by the time $db->insert() is called, the safe title that I've found may no longer be unique (If the system is inserting thousands of new blog posts a minute).

So how do I avoid this problem?
0
Comment
Question by:headzoo
  • 4
  • 3
  • 2
  • +1
11 Comments
 
LVL 2

Expert Comment

by:pain_is_relative
ID: 21791106
you are painting yourself into a corner.

lets see...you could attach an md5 hash of the current date after the name.... ugly.

you could make the safe-title a unique primary key, but that could cause you problems... old posts getting over written, errors writing to table. etc.

Alternatively you could reference the page by its primary key ... (id), and let the posts share names. So your posts could be referebced by id=678, id=56789, id=3. that way you would not be limiting your posts to having unique names.
0
 

Author Comment

by:headzoo
ID: 21791270
Hi pain_is_relative,
I wish it could be that simple. But clients today demand SEO friendly URLs, and not URLs that simply work (and work well).

I figure I could ditch the loop, and change my code to something like this:

$safe_title = 'hello_world';
$results = $db->execute(
      "SELECT * FROM `posts` WHERE `safe_title` REGEXP '^$safe_title(_[0-9]+)?$' LIMIT 1"
);

/**
 * Check to see if we found another post with the same
 * safe title
 */
if (count($results) > 0) {
      preg_match('/^(.*?)(_([0-9]))?$/', $results->safe_title, $matches);
      
      /**
       * If the safe title already has a _[0-9] appended to
       * it, we need to increment the [0-9] by one. Otherwise
       * add _1 to the safe title.
       */
      if (!empty($matches[2])) {
            $safe_title = $matches[0] . '_' . ($matches[2] + 1);
      } else {
            $safe_title = $matches[0] . '_1';
      }
}

$db->insert("INSERT INTO ....") // Insert post with safe title

That still doesn't solve the problem of another blog post with the same safe title being inserted microseconds before I call $db->insert(). Also I've never used regular expressions in a SELECT statement before, and I don't know what the performance would be like on large data sets.
0
 
LVL 2

Expert Comment

by:pain_is_relative
ID: 21791322
You could always add the session id to the date and then md5 it then add it to the safe name... but yess.. its ugly...

you could invesitigate using modrewrite, but i am not a fan of that.

i still think that a url of : blah/blah/blogpost.php?id=678 still looks reasonable.
you could even do this:
blah/blah/blogpost.php?id=678&name=my_safe_name_url_is_ace
that way you preserve the page name, and display it in the url, without ever using it. (just faking it it)
0
Ransomware-A Revenue Bonanza for Service Providers

Ransomware – malware that gets on your customers’ computers, encrypts their data, and extorts a hefty ransom for the decryption keys – is a surging new threat.  The purpose of this eBook is to educate the reader about ransomware attacks.

 
LVL 3

Accepted Solution

by:
pzurowski earned 500 total points
ID: 21791428
Count total pages which begins with safe hello_world. If there will be none, so you have good title, if not you can attach total number of existing similar titles to the title. (If there is one, it will be without number, so the number will be 1; if there is two, there willbe wihout number and with _1, so the total is 2 and you can attach _2; ...so on..)

You can change
   LIKE '$safe_title%'
below to
   RLIKE '^${safe_title}_?[0-9]*\$'
to get more accurate results.
$safe_title = getSafeTitle('Hello World');
$results = $db->execute("SELECT count(*) AS cnt FROM `posts` WHERE `safe_title` LIKE '$safe_title%'");
 
$count = $result->getOne(); // or other function, which get "cnt" from query above
 
if( $count > 0 ){
   $safe_title .= "_" . $count;
}
 
$db->insert("INSERT INTO ....") // Insert post with safe title

Open in new window

0
 

Author Comment

by:headzoo
ID: 21791524
@pzurowski -
You gave me an idea. I could try SQL like this:

INSERT INTO `posts` (`id`, `title`, `content`, `user_id`, `safe_title`)
SELECT
null,
'Hello World',
'This is a test',
1,
IF(COUNT(*) > 1, CONCAT('hello_world_', COUNT(*)), 'hello_world')
FROM `posts` WHERE `safe_title` REGEXP '^hello_world(_[0-9]+)?$'
LIMIT 1

That would insert the blog post, and generate the safe title all in one transaction. This would be very quick, and it becomes unlikely that another post with the same safe title would be inserted before the transaction is complete.

Any thoughts on this kind of SQL?
0
 
LVL 3

Assisted Solution

by:pzurowski
pzurowski earned 500 total points
ID: 21791659
think about seperating count and safe url, i mean:

CREATE TABLE `posts` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`title` VARCHAR( 50 ) NOT NULL ,
`safe_title` VARCHAR( 60 ) NOT NULL ,
`which` INT NOT NULL, --- added here
`content` TEXT NOT NULL ,
`user_id` INT UNSIGNED NOT NULL ,
UNIQUE (`safe_title`, `which`) --- added here
) ENGINE = MYISAM ;

and next will be simplier and faster
INSERT INTO `posts` (`id`, `title`, `content`, `user_id`, `safe_title`, `which`)
SELECT
null,
'Hello World',
'This is a test',
1,
COUNT(*) WHERE `safe_title` = 'hello_world'
 
SELECT `id`, IF( `which` > 0, CONCAT( `safe_title`, "_", `which` ), `safe_title` ) FROM `posts`

Open in new window

0
 

Author Comment

by:headzoo
ID: 21791678
@pzurowski -
I assume this is better performance wise, because I'm not requesting a count of all matching rows twice?
0
 
LVL 3

Expert Comment

by:pzurowski
ID: 21791696
First of all, because there is no regexp, which are slower. Moreover you can quicker read it, and it will be faster to expand or bug fix in future :)
0
 
LVL 2

Expert Comment

by:raheel_lips
ID: 21792213
Hi headzoo,

$safe_title = getSafeTitle('Hello World'); // this will give you white spaces removed string.
 
$results = $db->execute("SELECT count(*)+1 AS total FROM `posts` WHERE `safe_title` LIKE '$safe_title%'");
 
if( $count > 2 ){
   $safe_title .= "_" . $count;
} 
 
$db->insert("INSERT INTO ....") // Insert post with safe title

Open in new window

0
 
LVL 2

Expert Comment

by:raheel_lips
ID: 21792225
I missed some comments in my last post so consider these too.

// Above mentioned code is utilizing only 2 hits in post table.
1) check existing
2) Insert valid information

** There is no way to sence the duplicate data during insertion of post in table.

/* You would use one table transaction for check duplication before inserting data in table.
0
 

Author Comment

by:headzoo
ID: 21792292
@raheel_lips

The problem with using LIKE 'hello_world%', is it would also match against safe titles like "hello_world_how_are_you", etc.

@pzurowski

I played around with your SQL, and I can't tell what it's supposed to do. It appears to be completely invalid and/or doesn't work no matter how I rearranged it.

@everyone

It dawned on me that the request URLs would look like this:

http://site.com/users/headzoo/posts/hello_world

that is:

http://site.com/usres/[name of user]/posts/[post safe title]

Which means the user and the safe title could be combined into a unique index. So I changed the table to this:

CREATE TABLE `posts` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`title` VARCHAR( 50 ) NOT NULL ,
`safe_title` VARCHAR( 60 ) NOT NULL ,
`content` TEXT NOT NULL ,
`user_id` INT UNSIGNED NOT NULL ,
UNIQUE (`safe_title`, `user_id`) -- changed here
) ENGINE = MYISAM ;

Because it's impossible for the same user to submit more than one post with the same title, at the same time, I can safely write my code like this:

$safe_title = 'hello_world';
$results = $db->execute("SELECT COUNT(*) FROM `posts` WHERE `safe_title` = '$safe_title'");
if (count($results) > 0) {
      $safe_title .= '_' . (count($results) + 1);
}
$db->insert("INSERT INTO ....") // Insert post with safe title

I don't have to worry about another post with the same safe title being inserted prior to calling $db->insert(), because I can have that. Just so long as it's a different user.

A couple people here gave me the ideas for my final code/sql. So I need to split the points up some way.
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
RDBMS and No sql database 4 63
SubQuery link 4 35
Output in PHP throwing alignment of data off issue 12 43
Uploading a CSV Data Import via PHP & MySql 3 29
All XML, All the Time; More Fun MySQL Tidbits – Dynamically Generate XML via Stored Procedure in MySQL Extensible Markup Language (XML) and database systems, a marriage we are seeing more and more of.  So the topics of parsing and manipulating XM…
This guide whil teach how to setup live replication (database mirroring) on 2 servers for backup or other purposes. In our example situation we have this network schema (see atachment). We need to replicate EVERY executed SQL query on server 1 to…
Although Jacob Bernoulli (1654-1705) has been credited as the creator of "Binomial Distribution Table", Gottfried Leibniz (1646-1716) did his dissertation on the subject in 1666; Leibniz you may recall is the co-inventor of "Calculus" and beat Isaac…

856 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question