I need some assistance to find and use a sql server algorithm for categorising my list of products

Hi,

I have a table in my sql server database that contains the following columns...

ID
Name
CommonName
Description

Now, I need to add another column called, 'Categorisation' and this column will be used to filter out the different types of product.

Now my idea of doing this was to run some kind of algorithm over the table that will do some clever pattern matching for me over the 3 columns Name,Common Name and Description. I know it won't be 100% accurate but there are 1300 products and I was hoping to at least get some of these products categorised automatically categorised.

Using algorithms to analyse data is a complete new experience for me so a thorough explanation would be greatly appreciated. Thanks.
LVL 2
jazz__manAsked:
Who is Participating?
 
PortletPaulfreelancerCommented:
>>Now, I need to add another column called, 'Categorisation'

Don't you need another TABLE? e.g.
ProductCategory
ID, Product_ID (FK), Category

so for one row in the existing table you may have one or more rows in the new table

then your queries would be far easier and more efficient

select
*
from products p
inner join ProductCategory pc on p.id = pc.product_id
where pc.category ='fish'
or pc.category = 'chips'


+ You could take this a step further and have a table of Categories, so that only allowed categories can be added to a product.

sorry, but I just hate the idea of you evaluating strings (like '%something%') at each and every select statement - it will be a nightmare.

Normalization makes life easier in the long run.

If you go down this route, what you will need to generate would be a set of insert statements to the new table, and this could leverage the table of categories.

let's say you have a table of Categories like this

Fish
Chips
Salt
Vinegar

then you could do something along these lines (to generate the insert data, not as a daily activity)

select
p.*
c.category
from products p
inner join categories p.description like ('%' + c.category + '%')
0
 
Juan OcasioApplication DeveloperCommented:
You can use LIKE '%prodcut%' in your where clause.

For example:

SELECT * FROM table where Name LIKE '%product%'
OR CommonNam  LIKE '%product%'
OR Description LIKE '%product%'

This may take a LONG time depending on how much data is in your db, so you can do them separately as well

SELECT * FROM table where Name LIKE '%product%'


SELECT * FROM table where CommonNam  LIKE '%product%'


SELECT * FROM table where  Description LIKE '%product%'

the product you're looking for would go between the '% and the  %'
0
 
QlemoBatchelor, Developer and EE Topic AdvisorCommented:
Something like:
update tbl
set Categorisation = case
  when Name like '%Something%' and CommonName like '%Something%' and description like '%Something%' then 'Something'
  when Name ...
end

Open in new window

You can also write individual update commands for each pattern combination, and put the condition into the where clause. It is very similar to above.
0
Upgrade your Question Security!

Your question, your audience. Choose who sees your identity—and your question—with question security.

 
jazz__manAuthor Commented:
Qlemo,

Thanks very much, this is very helpful but I would have to write hundreds of case conditions to run this. This is why I was looking for something a bit more mathematical like an algorithm to do a bit of pattern matching for me. There are 900 distinct products in a database of 1300. Now if there were 100 distinct records in 1300 then life would be much easier but im not in this position. Hope this makes sense. Thanks
0
 
QlemoBatchelor, Developer and EE Topic AdvisorCommented:
To develop something smart we would have to know the details. SQL cannot guess, so you have to tell it something concrete.
You could e.g. build it stepwise:
update tbl set Categorisation = 'Something' where name like '\[A-F\]%' escape '\' and Categorisation is null;
update tbl set Categorisation = 'Somethong' where CommonName like '_\[0-9]%' escape '\' and Categorisation is null;

Open in new window

and so on. That will only update rows not already categorized. You should go for the most common categories first, and then treat the more special cases.
0
 
jazz__manAuthor Commented:
PortletPaul,

You are making me hungry!
0
 
PortletPaulfreelancerCommented:
LOL, maybe that's what's on my mind... [note to self, eat]
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.