[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 989
  • Last Modified:

MySQL: Inserting Unique Column Value

Hi,
Scenario: I have a database table that stores blog posts. It looks like this:

CREATE TABLE `posts` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`title` VARCHAR( 50 ) NOT NULL ,
`safe_title` VARCHAR( 60 ) NOT NULL ,
`content` TEXT NOT NULL ,
`user_id` INT UNSIGNED NOT NULL ,
UNIQUE (`safe_title`)
) ENGINE = MYISAM ;

The `safe_title` column contains the post title in a URL safe format. That is, the post title with all special characters removed. So a post title of "Hello World" becomes "hello_world". It's important that every post has a unique safe title, even if some posts have the same title. Post #1 with the title "Hello World" would have the safe title of "hello_world". Post #2 with the title "Hello World" would have the safe title "hello_world_2", and so on.

The problem: How to insert a unique safe title in only one or two db transactions? I don't want to create code like this:

$found = true;
$counter = 0;
do {
      ++$counter;
      $safe_title = getSafeTitle('Hello World') . '_' . $counter;
      $results = $db->execute("SELECT FROM `posts` WHERE `safe_title` = '$safe_title' LIMIT 1");
      if (count($results) == 0) {
            $found = false;
      }
} while ($found == true);
$db->insert("INSERT INTO ....") // Insert post with safe title

That little bit of code could loop through a dozen times before a unique title is found. Even worse, by the time $db->insert() is called, the safe title that I've found may no longer be unique (If the system is inserting thousands of new blog posts a minute).

So how do I avoid this problem?
0
headzoo
Asked:
headzoo
  • 4
  • 3
  • 2
  • +1
2 Solutions
 
pain_is_relativeCommented:
you are painting yourself into a corner.

lets see...you could attach an md5 hash of the current date after the name.... ugly.

you could make the safe-title a unique primary key, but that could cause you problems... old posts getting over written, errors writing to table. etc.

Alternatively you could reference the page by its primary key ... (id), and let the posts share names. So your posts could be referebced by id=678, id=56789, id=3. that way you would not be limiting your posts to having unique names.
0
 
headzooAuthor Commented:
Hi pain_is_relative,
I wish it could be that simple. But clients today demand SEO friendly URLs, and not URLs that simply work (and work well).

I figure I could ditch the loop, and change my code to something like this:

$safe_title = 'hello_world';
$results = $db->execute(
      "SELECT * FROM `posts` WHERE `safe_title` REGEXP '^$safe_title(_[0-9]+)?$' LIMIT 1"
);

/**
 * Check to see if we found another post with the same
 * safe title
 */
if (count($results) > 0) {
      preg_match('/^(.*?)(_([0-9]))?$/', $results->safe_title, $matches);
      
      /**
       * If the safe title already has a _[0-9] appended to
       * it, we need to increment the [0-9] by one. Otherwise
       * add _1 to the safe title.
       */
      if (!empty($matches[2])) {
            $safe_title = $matches[0] . '_' . ($matches[2] + 1);
      } else {
            $safe_title = $matches[0] . '_1';
      }
}

$db->insert("INSERT INTO ....") // Insert post with safe title

That still doesn't solve the problem of another blog post with the same safe title being inserted microseconds before I call $db->insert(). Also I've never used regular expressions in a SELECT statement before, and I don't know what the performance would be like on large data sets.
0
 
pain_is_relativeCommented:
You could always add the session id to the date and then md5 it then add it to the safe name... but yess.. its ugly...

you could invesitigate using modrewrite, but i am not a fan of that.

i still think that a url of : blah/blah/blogpost.php?id=678 still looks reasonable.
you could even do this:
blah/blah/blogpost.php?id=678&name=my_safe_name_url_is_ace
that way you preserve the page name, and display it in the url, without ever using it. (just faking it it)
0
Veeam Disaster Recovery in Microsoft Azure

Veeam PN for Microsoft Azure is a FREE solution designed to simplify and automate the setup of a DR site in Microsoft Azure using lightweight software-defined networking. It reduces the complexity of VPN deployments and is designed for businesses of ALL sizes.

 
pzurowskiCommented:
Count total pages which begins with safe hello_world. If there will be none, so you have good title, if not you can attach total number of existing similar titles to the title. (If there is one, it will be without number, so the number will be 1; if there is two, there willbe wihout number and with _1, so the total is 2 and you can attach _2; ...so on..)

You can change
   LIKE '$safe_title%'
below to
   RLIKE '^${safe_title}_?[0-9]*\$'
to get more accurate results.
$safe_title = getSafeTitle('Hello World');
$results = $db->execute("SELECT count(*) AS cnt FROM `posts` WHERE `safe_title` LIKE '$safe_title%'");
 
$count = $result->getOne(); // or other function, which get "cnt" from query above
 
if( $count > 0 ){
   $safe_title .= "_" . $count;
}
 
$db->insert("INSERT INTO ....") // Insert post with safe title

Open in new window

0
 
headzooAuthor Commented:
@pzurowski -
You gave me an idea. I could try SQL like this:

INSERT INTO `posts` (`id`, `title`, `content`, `user_id`, `safe_title`)
SELECT
null,
'Hello World',
'This is a test',
1,
IF(COUNT(*) > 1, CONCAT('hello_world_', COUNT(*)), 'hello_world')
FROM `posts` WHERE `safe_title` REGEXP '^hello_world(_[0-9]+)?$'
LIMIT 1

That would insert the blog post, and generate the safe title all in one transaction. This would be very quick, and it becomes unlikely that another post with the same safe title would be inserted before the transaction is complete.

Any thoughts on this kind of SQL?
0
 
pzurowskiCommented:
think about seperating count and safe url, i mean:

CREATE TABLE `posts` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`title` VARCHAR( 50 ) NOT NULL ,
`safe_title` VARCHAR( 60 ) NOT NULL ,
`which` INT NOT NULL, --- added here
`content` TEXT NOT NULL ,
`user_id` INT UNSIGNED NOT NULL ,
UNIQUE (`safe_title`, `which`) --- added here
) ENGINE = MYISAM ;

and next will be simplier and faster
INSERT INTO `posts` (`id`, `title`, `content`, `user_id`, `safe_title`, `which`)
SELECT
null,
'Hello World',
'This is a test',
1,
COUNT(*) WHERE `safe_title` = 'hello_world'
 
SELECT `id`, IF( `which` > 0, CONCAT( `safe_title`, "_", `which` ), `safe_title` ) FROM `posts`

Open in new window

0
 
headzooAuthor Commented:
@pzurowski -
I assume this is better performance wise, because I'm not requesting a count of all matching rows twice?
0
 
pzurowskiCommented:
First of all, because there is no regexp, which are slower. Moreover you can quicker read it, and it will be faster to expand or bug fix in future :)
0
 
raheel_lipsCommented:
Hi headzoo,

$safe_title = getSafeTitle('Hello World'); // this will give you white spaces removed string.
 
$results = $db->execute("SELECT count(*)+1 AS total FROM `posts` WHERE `safe_title` LIKE '$safe_title%'");
 
if( $count > 2 ){
   $safe_title .= "_" . $count;
} 
 
$db->insert("INSERT INTO ....") // Insert post with safe title

Open in new window

0
 
raheel_lipsCommented:
I missed some comments in my last post so consider these too.

// Above mentioned code is utilizing only 2 hits in post table.
1) check existing
2) Insert valid information

** There is no way to sence the duplicate data during insertion of post in table.

/* You would use one table transaction for check duplication before inserting data in table.
0
 
headzooAuthor Commented:
@raheel_lips

The problem with using LIKE 'hello_world%', is it would also match against safe titles like "hello_world_how_are_you", etc.

@pzurowski

I played around with your SQL, and I can't tell what it's supposed to do. It appears to be completely invalid and/or doesn't work no matter how I rearranged it.

@everyone

It dawned on me that the request URLs would look like this:

http://site.com/users/headzoo/posts/hello_world

that is:

http://site.com/usres/[name of user]/posts/[post safe title]

Which means the user and the safe title could be combined into a unique index. So I changed the table to this:

CREATE TABLE `posts` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`title` VARCHAR( 50 ) NOT NULL ,
`safe_title` VARCHAR( 60 ) NOT NULL ,
`content` TEXT NOT NULL ,
`user_id` INT UNSIGNED NOT NULL ,
UNIQUE (`safe_title`, `user_id`) -- changed here
) ENGINE = MYISAM ;

Because it's impossible for the same user to submit more than one post with the same title, at the same time, I can safely write my code like this:

$safe_title = 'hello_world';
$results = $db->execute("SELECT COUNT(*) FROM `posts` WHERE `safe_title` = '$safe_title'");
if (count($results) > 0) {
      $safe_title .= '_' . (count($results) + 1);
}
$db->insert("INSERT INTO ....") // Insert post with safe title

I don't have to worry about another post with the same safe title being inserted prior to calling $db->insert(), because I can have that. Just so long as it's a different user.

A couple people here gave me the ideas for my final code/sql. So I need to split the points up some way.
0

Featured Post

Fill in the form and get your FREE NFR key NOW!

Veeam is happy to provide a FREE NFR server license to certified engineers, trainers, and bloggers.  It allows for the non‑production use of Veeam Agent for Microsoft Windows. This license is valid for five workstations and two servers.

  • 4
  • 3
  • 2
  • +1
Tackle projects and never again get stuck behind a technical roadblock.
Join Now