Combine rows into new column in MS Access SQL, based on match in another column

Samuel Baker
Samuel Baker used Ask the Experts™
on
Here is my data:

RAW_OEM_DATA
| OEMCurrentPartNumber | OEMDescription       | OEMSubNumber |
|----------------------|----------------------|--------------|
| 19M7796              | SCREW                |              |
| 19M8365              | USE PN 19M7796       | 19M7796      |
| AT256104             | USE PN 19M7796       | 19M7796      |
| CH12356              | USE PN 19M7796       | 19M7796      |
| M134162              | USE PN 19M7796       | 19M7796      |
| M74138               | USE PN 19M7796       | 19M7796      |
| AE42460              | BEARING WITH HOUSING |              |
| DA25307              | USE PN AE42460       | AE42460      |
| AH223030             | USE PN AH227807      | AH227807     |
| AH227807             | FRAME                |              |
| AH208565             | USE PN AH223030      | AH223030     |

Open in new window


FINAL_OEM_DATA
| OEMCurrentPartNumber | OEMDescription       | OEMSubNumber |
|----------------------|----------------------|--------------|
| 19M7796              | SCREW                | 19M7796      |
| 19M7796              | SCREW                | 19M8365      |
| 19M7796              | SCREW                | AT256104     |
| 19M7796              | SCREW                | AT257892     |
| 19M7796              | SCREW                | CH10599      |
| 19M7796              | SCREW                | CH12356      |
| 19M7796              | SCREW                | M134162      |
| AE42460              | BEARING WITH HOUSING | DA25307      |
| AE42460              | BEARING WITH HOUSING | AE42460      |
| AH227807             | FRAME                | AH223030     |
| AH227807             | FRAME                | AH208565     |
| AH227807             | FRAME                | AH227807     |

Open in new window


COMPANY_DATA
| Item       | OEMItem  | Description  |
|------------|----------|--------------|
| AM19M7796  | 19M7796  | Screw        |
| AMAE42460  | AE42460  | Bearing Assy |
| AMAH227807 | AH227807 | Frame        |

Open in new window



The purpose
I am cross referencing OEM part numbers with my companies part numbers.

In the RAW_OEM_DATA table, any rows without a number in OEMSubNumber are the "latest" numbers. All other numbers are sub numbers that reference the "latest" .

The [OEMItem] in COMPANY DATA is the "latest" OEM number. when a user searches a part number my query searches the OEMSubNumber column and brings back the OEMCurrentPartNumber to match with the [OEMItem] in my companies numbers.

To get the raw OEM data to the format in the FINAL_OEM_DATA is what i'm trying to achieve. But I don't need all 1.6 million records in that table. just the ones that match the [OEMItem] from COMPANY_DATA.

Final Result
Both the RAW_OEM_DATA and COMPANY_DATA tables reside on my web server so employees can access via the secure website. It is a aspx site with MS SQL server. Twice a year i have to run my companies new item numbers through this process and update my tables with the new data.


Couple of notes
Some numbers in RAW_OEM_DATA "piggy back" one another. example
AH208565 subs to AH223030 which subs to AH227807
That means not all [OEMSubNumber] will be the "latest" OEM number.

The description for the "latest" OEM number has to replace all the "USE PN ***" numbers


My process in the past
run quite a few queries to get the OEM raw data to the above desired result. then run a query to pull just "main" OEM numbers that match my companies table.
I then export that data to Excel and run the following macro to get it to the final OEM table

Sub test()
    Dim a, b, i As Long, ii As Long, iii As Long, n As Long
    With Range("a1").CurrentRegion.Offset(1)
        a = .Value
        ReDim b(1 To Application.CountA(.Columns(3).Resize(, .Columns.Count)), 1 To 3)
        .Columns(3).Resize(, .Columns.Count).ClearContents
        With CreateObject("Scripting.Dictionary")
            .CompareMode = 1
            For i = 1 To UBound(a, 1)
                If Not .exists(a(i, 1)) Then
                    Set .Item(a(i, 1)) = _
                    CreateObject("Scripting.Dictionary")
                    .Item(a(i, 1)).CompareMode = 1
                End If
                For ii = 3 To UBound(a, 2)
                    If a(i, ii) <> "" Then
                        If Not .Item(a(i, 1)).exists(a(i, ii)) Then
                            n = n + 1
                            For iii = 1 To 2
                                b(n, iii) = a(i, iii)
                            Next
                            b(n, iii) = a(i, ii)
                            .Item(a(i, 1))(a(i, ii)) = Empty
                        End If
                    End If
                Next
            Next
        End With
        .Resize(n, 3).Value = b
    End With
End Sub

Open in new window


If I already have this method working why am I asking this question? Well the OEM data i was using was from 2012 and I now have the 2018 data. I started this process and just seems like i'm doing way more than is needed. On top of the fact I can't figure out exactly how I did it. I have the original database i used but the queries are not working correctly.
testData.xlsx
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
David ToddSenior Database Administrator

Commented:
Hi,

Can you post data that corresponds to your desired result or vice versa?

Regards
  David
SharathData Engineer

Commented:
I don't understand the relation between input and desired result. OEMPartNumber - 19M7781 doesn't exist in the source data.
Can you explain what exactly you are trying to achieve?

Author

Commented:
I apologize. i posted the incorrect result data. I have updated the question with the correct data
Should you be charging more for IT Services?

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

Distinguished Expert 2017

Commented:
You are taking what looks like a normalized table and turning it into a spreadsheet.  Can you explain the purpose of this?  The data should be stored as you see it in the csv file although there does seem to be two record types (what are the screw records).
Mark WillsTopic Advisor, Page Editor
Distinguished Expert 2018

Commented:
I can do it in MS SQL - seems straight forward enough.

Can you post (at least part of) the CSV ?

That is, if you dont mind a t-sql solution :)

Author

Commented:
I have updated my question entirely to help make my goad more clear.
I also attached some data from all tables
Mark WillsTopic Advisor, Page Editor
Distinguished Expert 2018

Commented:
Have to go elsewhere, so decided to post a possible SQL Sever / T-SQL solution anyway...

First some test data
create table #OEM (OEMCurrentPartNumber varchar(100), OEMDescription varchar(100), OEMSubNumber varchar(100))
insert #oem values
('19M8365','USE PN 19M7796','19M7796'),
('AT256104','USE PN 19M7796','19M7796'),
('CH12356','USE PN 19M7796','19M7796'),
('M134162','USE PN 19M7796','19M7796'),
('M74138','USE PN 19M7796','19M7796'),
('19M7796','SCREW',''),
('LCA52959','USE PN 19M5039','19M5039'),
('LCA57932','USE PN 19M5039','19M5039'),
('LCA65582','USE PN 19M5039','19M5039'),
('LW407202608','USE PN 19M5039','19M5039'),
('636102041','USE PN 19M5039','19M5039'),
('19M5039','SCREW','')

Open in new window

And now the query - using PIVOT after some manipulations....
;with cte as
( select case when len(OEMSubNumber) < 1 then OEMCurrentPartNumber else OEMSubNumber end as OEMCurrentPartNumber
       , case when len(OEMSubNumber) < 1 then OEMDescription else '' end as OEMDescription
       , OEMCurrentPartNumber as OEMSubNumber
	   , case when len(OEMSubNumber) < 1 then 0 else row_number() over (partition by OEMSubNumber order by OEMCurrentPartNumber) end as rn
  from #OEM
) select OEMCurrentPartNumber, Max(OemDescription) as OemDescription, max([1]) as OEMSubNumber1, max([2]) as OEMSubNumber2, max([3]) as OEMSubNumber3, max([4]) as OEMSubNumber4, max([5]) as OEMSubNumber5, max([6]) as OEMSubNumber6  -- currently only 6 of the [10]  possible subnumbers per the pivot criteria 
  from
  ( select OEMCurrentPartNumber, oemdescription, oemsubnumber, rn
    from cte) src
  PIVOT
  (max(oemsubnumber) for rn in ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10])) pvt   -- currently supporting 10 different oemsubnumbers - add more if needed
  group by OEMCurrentPartNumber

Open in new window

And the results
OEMCurrentPartNumber	OemDescription	OEMSubNumber1	OEMSubNumber2	OEMSubNumber3	OEMSubNumber4	OEMSubNumber5	OEMSubNumber6
19M5039       	SCREW   	636102041	LCA52959	LCA57932	LCA65582	LW407202608	NULL
19M7796       	SCREW   	19M8365  	AT256104	CH12356 	M134162  	M74138         	NULL

Open in new window

Of course I am only selecting the first [1] to [6] out of a possible [10] columns so you need to adjust accordingly.

To explain PIVOT a bit more, please read : http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/SQL-Server-2005/A_653-Dynamic-Pivot-Procedure-for-SQL-Server.html
Mark WillsTopic Advisor, Page Editor
Distinguished Expert 2018

Commented:
*laughing* looks like we were both busy typing at the same time.

Still cant see the CSV file, but given the extra information, then why would you want multiple subpartnumbers across the page ?

Really running late for another appointment, and while I would love to discuss, I will have to check back in a few hours....

In the meantime, and if open to T-SQL have a look at my previous post.

Cheers,
Mark

Author

Commented:
Hey Mark, thanks for your responses.

"why would you want multiple subpartnumbers across the page ?"
Well if my goal can be achieved doing it a simpler way that would be ideal. That is just the only way i could figure out how to do it at that time.

Let me know if there is another way and if not I will take a look at your previous post and see if i can make that work.

I agree having multiple subpartnumbers across the page is not a good idea but I do not know how to do it any other way.
Mark WillsTopic Advisor, Page Editor
Distinguished Expert 2018

Commented:
How deep can the "piggy back" go ?

Do you really need :
| AH227807             | FRAME                | AH208565     |

Because  AH208565 really points to  AH223030 (which in turn points to the 'latest' AH227807)

So, should we be seeing those 'piggy back' entries at all in OEM_RAW_DATA ?

Author

Commented:
I would have to look deeper to see how deep a piggy back could go but it can go at least a couple times.

yes the piggy back entries would show up throughout. im uploading the raw file so you can see it. Its 323 mb's so will be just a few.

Author

Commented:
Here is the link to the raw oem table. Its in `.asc` format and has more columns that i'm acutally using

https://drive.google.com/file/d/1O621TPnUKj9FVi_KJns8CWTQJtxuBLyb/view?usp=sharing
Distinguished Expert 2017

Commented:
That is just the only way i could figure out how to do it at that time.
That doesn't answer the question of why you want to flatten the schema.  Your example shows 6 "numbers".  What if there are 7 or 355?  This is spreadsheet thinking and not how you hold data in a relational database.
Mark WillsTopic Advisor, Page Editor
Distinguished Expert 2018

Commented:
I am working on this and have found quite a few data errors.

Have (obviously) successfully loaded the raw data.

Will continue the analysis and get back to you.

What is with the pipe delimited OEMSubNumber and those with a number followed by a few spaces then a single digit ?

Anyway, still working with the data and trying to find the more robust solution taking into account the 300 or so errors (small percentage overall)

Not a great data file to have to work with - I feel your pain and frustration....

Author

Commented:
What is with the pipe delimited OEMSubNumber and those with a number followed by a few spaces then a single digit ?
Can you give me an example.

Also, for he asc file. I forgot to tell you even though you probably already figured out the first, second and last columns are the only one used from it.

Thank you for anything you can do. The data file was not meant for what i am doing and that is the reason for it not being a great data file. I pulled it from a pricing application but that is the only way to get the data.
Mark WillsTopic Advisor, Page Editor
Distinguished Expert 2018

Commented:
Okay,

Have been analysing the data to the detriment of responding in this thread.... Been very interesting :)

Loaded the asc data, found 1505267 records.

1 of which was malformed and duplicated : K1209

The vast majority with "USE PN XXX" in the description had the same value XXX in the OEMSubNumber - or was a header - 1504930 in total

Except for 336 rows where the "USE PN XXX" didnt match the OEMSubNumber (see attached)

They contained pipe delimited or had a single numeric digit to the right hand side (seperatde by spaces) or, a blank OEMSubNumber.

Of those, 193 were blank, 106 at least contained the OEMSubNumber

That leaves 37 rows that do have a legitimate formatted OEMSubNumber that doesnt match the "USE PN XXX"

We need a direction as to what to do with those - I have run models ignoring and using that number with mixed results.

The majority do ultimately resolve to the same result - just different paths to get there. Have yet to model joining on those rows effectively creating an almost duplicate entry - one using the "USE PN XXX" with XXX being the OEMSubNumber and the other using the original OEMSubNumber.

Now, doing a first pass to get OEMdescription, all but 222497 rows are resolved immediately. Those 222497 rows are the "chain" type OEMSubNumber's

Of which, 125827 are OEMSubNumbers that have no corresponding OEMCurrentPartNumber.

Leaving 96670 OEMSubNumbers. Those "chained" OEM's

Of those, 89 are circular - they ultimately resolve to themselves, or nonexistant OEMCurrentPartNumber's and/or repeat that pattern indefinitely.

All I have left to do is to generate the final query without all my crap in there so it becomes something you can actually use. Aiming for the raw format you need rather then the pivot.

Please see attached OEMSubNumber errata.
OEM_DATA_INCONSISTENCIES.csv

Author

Commented:
Wow, I cannot thank you enough for what you have done. It sounds exactly like what I need.

"We need a direction as to what to do with those"
You mean "the 336 rows where the "USE PN XXX" didnt match the OEMSubNumber"?
If so they can be ignored.

Can't wait to see what you have done. Will be waiting patiently.

Author

Commented:
Hey Mark, not trying to be pushy but wondered if you came up with anything?

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial