Group same items on a single SKU

Hi Experts!

I have a listing of hundreds of thousands of transactions. Most of those transactions correspond to the same items, e.g. used iPhone 4G 8Gb GSM. However, there is no product catalogue, so you can find a record where the brand is Apple, Mac, or iPhone and the model is 4G 8Gb, 4Gb 8Gb, 4 8Gb, etc. I think I made my point: each field was filled in with whatever the user wanted.

I want to unify these items and build a starting item catalog, so that from now on all "identical" items are identified by a single SKU and the catalog can grow. However, I have no idea how to start. Of course I could assign someone to group similar items according to his/her understanding, but I would like to know if there is a programmatic way to classify most of these items, maybe using the Amazon API or data mining. Any help would be greatly appreciated.

softpro2k Commented:
A database would be the best solution: MySQL, Access, etc.

Then use the DISTINCT clause in your SQL query. That removes duplicate item names and returns only the unique items.
What format, system, RDBMS, etc. is the data in now?

Assuming the data values are consistent, this kind of grouping is pretty basic with SQL for an RDBMS, Perl for a text file (CSV), or XSL for XML.
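To make the "assuming consistency" caveat concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are made up for illustration; they are not the asker's actual schema. It shows that GROUP BY (like DISTINCT) merges only rows whose values match exactly.

```python
import sqlite3

# Hypothetical table and sample rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (brand TEXT, model TEXT)")
rows = [
    ("Apple", "4G 8Gb"),
    ("Apple", "4G 8Gb"),   # exact duplicate of the row above
    ("iPhone", "4 8Gb"),   # same product, written differently
    ("Mac", "4Gb 8Gb"),    # same product, written differently
]
conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)

# GROUP BY only collapses rows whose brand and model are identical strings.
groups = conn.execute(
    "SELECT brand, model, COUNT(*) FROM transactions "
    "GROUP BY brand, model ORDER BY brand, model"
).fetchall()

for brand, model, count in groups:
    print(brand, model, count)
```

Only the two identical rows are merged; the other spellings of the same phone stay in separate groups, so free-text values would need some cleanup before this kind of grouping becomes useful.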
degaray (Author) Commented:
Hi, I have it in SQL Server, and I can export it to CSV or XLS without a problem. The main point is that the grouping is not so basic, since the data values are anything but consistent.

You could write a simple VBA function that accepts the free-hand SKU description and returns the actual SKU ID.

It would search the description for specific words, e.g. 8GB, and then return the correct SKU ID, e.g. Apple iPod 8Gb.

The problem is that if two items both have 8gb in their descriptions, it could return the wrong value.
I said VBA function because I found the question on the Excel page, but given the choice I would create a database function to do this.
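As a rough sketch of that keyword idea (written in Python rather than VBA or a database function, purely for illustration), the lookup below requires several keywords per SKU instead of just one, which is one possible way to reduce the "two items both have 8gb" problem. The SKU IDs, the keyword sets, and the function names are all hypothetical.

```python
import re

# Hypothetical rules: each SKU lists keywords that must ALL be present.
# The SKU IDs and keywords are invented for illustration.
SKU_RULES = {
    "APPLE-IPHONE4-8GB": {"iphone", "8gb"},
    "APPLE-IPOD-8GB": {"ipod", "8gb"},
}

def tokenize(description):
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", description.lower()))

def match_skus(description):
    """Return every SKU whose required keywords all occur in the description."""
    tokens = tokenize(description)
    return sorted(sku for sku, required in SKU_RULES.items() if required <= tokens)

print(match_skus("Apple iPhone 4G 8Gb GSM"))
print(match_skus("Apple 8Gb"))
print(match_skus("iPhone and iPod bundle, 8gb each"))
```

A description that yields exactly one candidate can be assigned automatically. Zero candidates (such as "Apple 8Gb") or several candidates (such as the bundle) would still need a person to decide.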
degaray (Author) Commented:
Right, but I do not have SKUs yet. That is the point. I would like to know if there is any way I could send parameters such as 8gb and iphone and get back the SKU and any additional info.
degaray (Author) Commented:
I was looking for something more relevant, but anyway I think that can help a little, although it would not distinguish between lemon and lemons or lemon_.

The DISTINCT clause should distinguish between lemon and lemons or lemon_.
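If the goal is instead to treat lemon, lemons, and lemon_ as one item, the values would have to be normalized before any de-duplication. A minimal sketch in Python, assuming a very crude rule set: the `normalize` helper is hypothetical, and its plural handling is a naive assumption that will mangle words like "glass".

```python
import re

def normalize(name):
    """Build a crude comparison key: lowercase, keep only letters and digits,
    and drop one trailing 's' as a naive plural rule (an assumption)."""
    key = re.sub(r"[^a-z0-9]+", "", name.lower())
    if len(key) > 3 and key.endswith("s"):
        key = key[:-1]
    return key

names = ["lemon", "lemons", "lemon_"]

# Exact string comparison keeps all three spellings apart.
distinct_raw = sorted(set(names))

# Comparing normalized keys folds them into a single item.
distinct_keys = sorted({normalize(n) for n in names})

print(len(distinct_raw), distinct_keys)
```

The same idea could be applied inside the database by storing the normalized key in an extra column and grouping on that column.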

You can try a feature in Excel that creates a unique list from an existing list. Use 'Advanced Filter' under the 'Data' menu, and tick the 'Unique Records Only' box before proceeding.

Read 'Filter Unique Records' of this article.


A. Roy
Question has a verified solution.
