select the first row from similar matches

I have the table solicitudes

-------------------------------------------------------------------
|         enterprise                               |      state      |
|----------------------------------------------|----------------|
|     Saint Gobain Sekurit                  |   morelos    |
|     Saint Gobain Sekurit México     |   morelos    |
|     mabe                                          |   guerrero    |
|     mabe                                          |   guerrero    |
|     mabe México                             |   morelos     |
-------------------------------------------------------------------

I would like to do a query listing by  enterprise  with a condition by state, right now I use this query:

SELECT enterprise WHERE state = 'morelos' GROUP BY enterprise

Open in new window


and it give this result
-------------------------------------------------------------------
|         enterprise                               |      state      |
|----------------------------------------------|----------------|
|     Saint Gobain Sekurit                  |   morelos    |
|     Saint Gobain Sekurit México     |   morelos    |
|     mabe México                             |   morelos     |
-------------------------------------------------------------------

But What can I do to group enterprise names and just to list the first one when I found similar names of enterprises, for example
-------------------------------------------------------------------
|         enterprise                               |      state      |
|----------------------------------------------|----------------|
|     Saint Gobain Sekurit                  |   morelos    |
|     mabe México                             |   morelos     |
-------------------------------------------------------------------


Best regards
altariamx2003Asked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Julian HansenCommented:
What constitutes a similar name - in this case it might require human knowledge rather than a rule.

I would do the following

SELECT DISTINCT enterprise FROM solicitudes;

Put those in a table - then in a second column put the grouping value you want so for insance
---------------------------------------------------------------------
|         enterprise                  |      state                  |
|-------------------------------------|-----------------------------|
|    Saint Gobain Sekurit México      |   Saint Gobain Sekurit      |
|    Saint Gobain Sekurit             |   Saint Gobain Sekurit      |
---------------------------------------------------------------------

Open in new window

You can then do a grouping on the join of this table and your solicitudes table
altariamx2003Author Commented:
Hi Julian I try this

SELECT DISTINCT T.enterprise
FROM solicitudes T
   LEFT JOIN solicitudes T2 ON T.enterprise != T2.enterprise
WHERE T.enterprise LIKE CONCAT('%', T2.enterprise, '%') AND T.state = 'Morelos'
UNION 
SELECT DISTINCT T2.enterprise
FROM solicitudes T
   LEFT JOIN solicitudes T2 ON T.enterprise != T2.enterprise
WHERE T.enterprise LIKE CONCAT('%', T2.enterprise, '%') AND T.state = 'Morelos'
ORDER BY enterprise

Open in new window


And it give me the same result as I try before
-------------------------------------------------------------------
|         enterprise                               |      state      |
|----------------------------------------------|----------------|
|     Saint Gobain Sekurit                  |   morelos    |
|     Saint Gobain Sekurit México     |   morelos    |
|     mabe México                             |   morelos     |
-------------------------------------------------------------------

This table is filled in a form that I use a subscription of a public bulletin, and as you can see I receive a lot of data with diferent format, for example:
Saint Gobain Sekurit    
Saint Gobain Sekurit México
SAIN GOBAIN MEXICO
saint gobain
etc............

And In the private system when the guys from marketing try to do a list of the name of the enterprises in the state of 'Morelos' they have that result.

Thats way Im looking for how to show the first one when I found similar names of enterprises, like this
-------------------------------------------------------------------
|         enterprise                               |      state      |
|----------------------------------------------|----------------|
|     Saint Gobain Sekurit                  |   morelos    |
|     mabe México                             |   morelos     |
-------------------------------------------------------------------
Julian HansenCommented:
I understand the requirement. The problem is you are dealing with user input so you are potentially faced with situations where where a machine solution can lead to the wrong results.

You can use the SOUNDEX function to get "closesness" value for your strings
https://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_soundex - but I am not sure how well that will work on your data see here
http://www.databasejournal.com/features/mysql/mysql-fuzzy-text-searching-using-the-soundex-function.html

The other option is to calculate the  Levenshtein distance on each entry and use that as a means to group them.

I have tried both of these in the past and in some cases they work in others they do not.

I am not sure you understood my suggestion regarding the use of DISTINCT. The idea is to create a lookup table that contains all the possible entries users can make - and link each one to a known value. This would require manual effort to setup. Once that table exists you can do something like this
LookupTable
-----------------------------------------------------------------------
|         OrginalValue                           |      UseValue     |
|----------------------------------------------|---------------------|
|     Saint Gobain Sekurit                  |   Saint Gobain  |
|     Saint Gobain                               |   Saint Gobain  |
|              ...                                         |          ...             |
-----------------------------------------------------------------------
Each row is unique and maps a user input to a UseValue.
You then do a something like this
SELECT DISTINCT u.UseValue, s.State FROM solicitudes s LEFT JOIN LookupTable u
     ON s.Enterprise = u.OriginalValue

Open in new window

The table might require maintenance in terms of updates occasionally but these will become less and less over time as all combinations are entered.

The initial populating of the LookupTable would be something like
INSERT INTO LookupTable (OringalValue)
SELECT DISTINCT Enterprise FROM solicitudes 

Open in new window


If the above is not an option then you are left with SOUNDEX and LEVENSTEIN

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
altariamx2003Author Commented:
aahh

I see

I think I understand your suggestion, let me try it, perhaps it works for me

thanks
Julian HansenCommented:
You are welcome feel free to post back if you have questions.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
MySQL Server

From novice to tech pro — start learning today.