Link to home
Start Free TrialLog in
Avatar of developingprogrammer
developingprogrammer

asked on

how to join matching records into 1 line?

hi guys,

i've got a super intriguing SQL question to ask - at least for me it's tremendously interesting.

i've got a query that produces data for reconciliation. (kindly see attached excel sheet here joining.xlsx)

in it i have some credits that have a matching debit and some that don't. the 2 yellow rows in the below picture show a match.
User generated image
the rules are that for 2 records to match, their:
1) PhysicalSettlementPeriodRowID must be the same
2) Credit and Debit must be the same (though they're on different lines)
3) Source must be different
4) ReconciliationAccountID must be the same
5) MachineID must be the same

so i've got 2 matching records as shown in the yellow highlight. how do i join these 2 into a single table that shows this?

User generated image
the matched records are on 1 line and the unmatched records are as per normal also on one line?
Avatar of Bitsqueezer
Bitsqueezer
Flag of Germany image

Hi,

untested, but should work with this:

SELECT B1.PhysicalSettlementPeriodRowID, B1.Credit, B2.Debit,
       B2.Source, B1.ReconciliationAccountID, B1.MachineID
  FROM tblBookings as B1 INNER JOIN tblBookings AS B2 ON B1.Credit = B2.Debit
 WHERE B1.PhysicalSettlementPeriodRowID = B1.PhysicalSettlementPeriodRowID
   AND (B1.Source < B2.Source) OR (B1.Source > B2.Source)
   AND B1.ReconciliationAccountID = B2.ReconciliationAccountID
   AND B1.MachineID = B2.MachineID

Open in new window


This should get all matched records which you can also use as subquery to identify all other records if you also want to list the rest.

A typical example of a bad table design, it's the way a bookkeeper thinks: Two columns containing data as column name. A normalized table would use "Account" and "Type", so that you would get "900" as Account and 1 or -1 as Type or a named group or anything like this.

Cheers,

Christian
SOLUTION
Avatar of als315
als315
Flag of Russian Federation image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of developingprogrammer
developingprogrammer

ASKER

cool thanks Christian and als315!!

Christian i tried your SQL but couldn't get it to work (i only just tried changing the names, haven't fully tested it yet)

als315 thanks for your help too! = ))

als315 what do you mean by there are problems with the table design? how should i design the table properly?

and also for your qry1, the rows and columns of the query are hidden in the query designer. they only show up when i go to SQL view and then back to design view again. how do you hide the rows and columns in the query designer? thanks!! = ))
whao als315, i'm always amazed at how you can think of all these super complex advanced multiple step queries - and SO FAST.

i spent quite awhile analysing it (i'm really slow!! = ((    ) and now i see that you're forming 4 distinct things

1) matching
2) credits
3) debits
4) nulls

but to even know that you have to form these 4 things means that you already saw the execution path in all the outer joins, PLUS using them in concert with the is null and is not null criteria.

i've only recently started to understand SQL, inner outer joins, same name fields from different table in the same query and the different criteria.

it's really amazing what you can do with your querying abilities als315!! i have a lot to learn!! = ))
oops Christian and als315, apologies i missed out on Christian's comments about the bad table design! i was so engrossed with trying to understand the amazing SQL both of yall gave me haha = )

yup yup i totally agree! in my database i have it as CreditDebit and Amount. but in the query they want it split into 2 columns again ha = )

the reason why this query is 2 columns is cause it's a stacked query. i should consider only splitting it at the end. hrmm but that may entail more work. not sure at the moment ha = )

but even if i join it into 1 column i will still need the superb queries yall gave me = ) no running away from it!! = ))
hrmm guys i am 99.9% sure that even if the table design is single column with CreditDebit field and Amount field, i will still need the multiple queries als315 gave me. am i right?

i'm just trying to explore the ways that i could do things differently if i had a "good table design" --> in double quotes cause this is a stacked query on another query shown to the user already = )) but 100% agree with yall!! = ))
Can you can show also source table(s)? May be there are any short way to get result.
ok als315 here you go! i'm not sure if this is what is sufficient but i'm pulling all my data from these 3 tables and using these queries to form the querying base first.

i only included the structure though hope that is fine!! = ))
DBJoin.zip
als315 for the last union query you gave me to join all the 4 sub queries, should i use UNION or UNION ALL?

will there be a difference in this context? all along i've been using UNION ALL for all my other queries in this context cause i don't the DISTINCT keyword.

i'm thinking by analysing the queries that UNION or UNION ALL here doesn't matter. am i correct to say that als315?
hi als315, apologies i stumbled into quite a major problem here.

when i add 2 lines of manual to your database table, i get repeated matches. let me illustrate what i mean.

i added these two line (at the bottom of the screenshot)
User generated image
when i ran the last union query, i get this User generated image
however what i should get is something like this User generated image
it doesn't matter whether manual matches with claims or manual matches with physical.

so long as the amounts match for credit and debit that's ok already.

help help!!
also als315, when i just add 1 line like this, the credit doesn't show in the union query as a single sided credit

User generated imageUser generated image
so that means there are 2 issues with the queries i need to fix

1) adding 2 matching manual transaction create duplicate transactions.
2) adding 1 unmatched manual transaction does not show up in the query
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
thanks Christian!

hrmm on the alias part i'm trying to have descriptive names cause i'm hard time differentiating things and making sense of stuff ha. hrmm gimme a bit more experience getting my head around things and i will definitely make sure i use aliases which are shorter than the table names = )

edit: hrmm Christian i'm not sure if you're talking about the table name alias but i think you are - i was using the query designer and so i didn't go into SQL view to change the alias. but after using your alias it looks much better and i can actually see which tables the fields are coming from. much much better. i will adopt that practice from now on = ) intra query aliases for table names. sorry if i miss your points sometimes Christian my head is kinda like spinning round and round now haha = )

ok let me try and use your query again Christian, i'm not sure how to build the whole solution like what als315 did with the multiple queries but i'll try now = )
Hi,

you can set the alias with the query designer, too. Simply click on the table and open the properties window, there you'll find the alias property.

Cheers,

Christian
Whao Christian I didn't know that!! Whao it really helps so now I can actually see what in the world I'm working with! Haha, great!! I'm always going to use it now!! Thanks Christian!! = ))
YES Christian! I solved the joining thing completely! Now it totally works! But I couldn't have done it of course without your help and als315's help!! = )) will post my solution in awhile when I get back! It may be down and dirty but when I learn a better way, I'll definitely adopt it!! = ))
hi Christian!

i was trying to understand the difference between how you did the first query and how als315 did the first query.

you used criteria whilst als315 used multiple joins.

what do you recommend? is there a optimisation hit or is it a matter of personal preference?

also, you've shared with me to use Jet functions instead of VBA functions for optimisation and in my other questions boag2000 and PortletPaul concluded that there is no definitive documentation for Jet functions and a Jet only function doesn't exist.

Allen Browne is also of the opinion that we should use expressions (Is Null instead of IsNull()  ) and functions (iif() Jet version instead of iif() VBA version) native to Jet, and also the German article that you shared with me German Emergency Article hahaha

so as for yourself Christian, how do you know which functions are native to Jet and which are VBA functions? i'm trying to incorporate all the best practices you've been teaching me = ))

https://www.experts-exchange.com/questions/28230107/how-to-know-if-function-is-Access-based-or-SQL-based.html
1. Those are Access functions only (you can use them when you make queries in Access, (as the link states)
There is no such thing as an Access "SQL Only" function, ...but here we get into the fine distinctions between Jet and T-SQL
But those are Access function, you can use then in VBA, Expressions and in Access queries

2. Again, there is no such thing as a "Access SQL only function"
These are T-SQL function, and some cannot be used in Access created queries 'directly' (some can be used when you use a Pass-through query though)

But is your goal is to learn these "SQL ONLY Functions", then know that some will not work in Access.  
But here again, T-QSL is only one flavor of SQL.
MySQL and Oracle each have there own unique Functions as well.

3. No, because there is no such thing as "Access SQL", There is plenty of info available on the "T-SQL", which Access uses.
You can use the expression builder in Access to list all of the Access functions
(You can also use the object browser in VBA to see functions, ...but this opens a whole new can of worms...)

You can use the function browser in SQL Server to list all the functions

On your 'Conclusion"
It depends on where these functions appear....

The bottom line is this...  
You stated that the purpose of this Q was:
    "i'm just trying to learn how to use SQL better to optimise it instead of using VBA functions"

Then just learn T-SQL, and forget about the distinctions between Access SQL, and VBA functions.
90% of the time the distinction will be irreverent (when writing queries).

But your assertion that: "VBA functions which will slow it down"
...is not fully qualified, ...so this cannot be a definitive statement

JeffCoachman

https://www.experts-exchange.com/questions/28230099/SQL-equivalent-of-len-function.html
There are many syntax difference between Access and SQL Server, and SQL Server has many differences to Oracle which has many differences to MySQL which has....

you get the idea

Yes, a UDF can be called like LEN() but it won't be listed amongst the lists I suggested. I also think you are reaching for a purity level that Access itself cannot provide.

Just look at the parentheses that Access insists on using for Joins that every other SQL db I know of does not need.

IIF() by the way is  not in all T-SQL versions as I understand it. (btw: I don't use IIF in SQL Server anyway, I prefer the much more pure SQL: CASE ... END syntax).

{+ edit}
oh, and UDFs exist in other dbms's too :)
the solution i came up with to join everything together for my database was

1) i added a UID for each row consisting of tablename + meaningless key
2) i stored your query as the 1st base query.
3) i added to your query the CreditSourceUID and DebitSourceUID so i know which records they are coming from
4) i did a UNION ALL to join all the CreditSourceUIDs and DebitSourceUIDs so that i know which records are already used in the matched query
5) to get UNMATCHEDRECORDS, i did an outer join of the ALLRECORDS data set and the MATCHEDRECORDS data set, and set the criteria to where MATCHEDRECORDS is null --> so i only show those that are not matched
6) i joined the MATCHEDRECORDS with the UMATCHEDRECORDS to get the final list = ))

if you've got time Christian, what do you think of this workflow? sorry to take up so much of your time!! = )) feel free to just give me general direction pointers and i'll sort out the rest and ask you again if i can't figure it out!! = ))
Hi,

WHERE <-> ON: I quote a section of the often mentioned book from Itzik Ben-Gan, "Inside Microsoft SQL Server 2005: T-SQL Querying" to answer that:

"There's a logical difference between the ON and WHERE clauses only when using an outer join. When using an inner join, it doesn't matter where you specify your logical expressions because the step regarding outer joins is skipped."

As I think that in this case there is no big difference in T-SQL or Access SQL it should also here make no difference. The method using ON is maybe a little bit simpler as it is easier to be drawn in the query editor.


The match problem: Adding an ID is a good idea, but you only need to add one. If I use the demo database, I have added an AutoNumber column named "ID" to your table "Source". Then I used the following query as match query:

SELECT S1.ID AS S1ID, S2.ID AS S2ID, S1.PhysicalSettlementPeriodRowID, 
       S1.Credit, S2.Debit, S1.Source, S1.EditText, S1.ReconciliationAccountID, S1.MachineID
  FROM Source AS S1 INNER JOIN Source AS S2 ON S1.Credit = S2.Debit
 WHERE S1.PhysicalSettlementPeriodRowID = S2.PhysicalSettlementPeriodRowID
   AND (S1.Source < S2.Source OR S1.Source > S2.Source)
   AND S1.ReconciliationAccountID = S2.ReconciliationAccountID
   AND S1.MachineID = S2.MachineID

Open in new window


And to add the other rows to the matching records there's only one additional UNION ALL query needed:

SELECT 'Matching' AS Comment, M0.PhysicalSettlementPeriodRowID, M0.Credit, M0.Debit, 
       M0.Source, M0.EditText, M0.ReconciliationAccountID, M0.MachineID
  FROM qryMatchingIDs AS M0

UNION ALL

SELECT 'Not Matching' AS Comment, SRC.PhysicalSettlementPeriodRowID, SRC.Credit, 
       SRC.Debit, SRC.Source, SRC.EditText, SRC.ReconciliationAccountID,
       SRC.MachineID
  FROM Source AS SRC
 WHERE SRC.ID NOT IN (SELECT S1ID FROM qryMatchingIDs AS M1)
   AND SRC.ID NOT IN (SELECT S2ID FROM qryMatchingIDs AS M2);

Open in new window



Access SQL native functions: As you saw with the IIf function there is really a difference between SQL IIf and VBA IIf (i.e., SQL IIf doesn't evaluate the complete expression whereas VBA IIf does and SQL IIf can use "IS NULL" and VBA IIf not). So I'm sure there are more specific SQL functions which would also work if you would execute such query let's say from .NET where I think pure VBA functions would perhaps not work (because you can use the JET/ACE engine on any computer without having Access or Access runtime installed, it is really a complete standalone product).

Which are the pure SQL functions?: I don't know. I don't think about that, because I simply don't use Access SQL, as I stated earlier, for me Access is no choice for me as backend, I always say: A real database begins with a database server, Access as backend is a choice for home-made databases only (although I'm sure that others have other opinions and I don't want to start this discussion here again). So in my world backend is mainly SQL Server and there I don't need to think about that because I also only use T-SQL and no Access SQL as this has really not enough comfort for me anymore. So in that question you must search on your own if you find any documentation about that...:-)

Cheers,

Christian
ah Christian!!

WHERE SRC.ID NOT IN (SELECT S1ID FROM qryMatchingIDs AS M1)
   AND SRC.ID NOT IN (SELECT S2ID FROM qryMatchingIDs AS M2);


you did what i used 2 additional queries to do in 1 fell swoop!!!

NOW i know the use of subqueries!! = ))

when we join it with and IN or NOT IN, they can become really powerful. hrmm i considered leaving it the way i did before but i think your way is so, so, so, so, SO much more superior - and more importantly a skillset that i MUST take up so i want to put it in my project so that i will always remember it = ))

i also tried looking at the really good emergency article you gave me
Test not quantity
 NOT IN (3, 4, 5)
NOT IN (3, 4, 5) ¿ NOT (x = 3 OR x = 4 OR x = 5)
(Dissolving IN)
NOT (x = 3 OR x = 4 OR x = 5) = 3 ¿ NOT x AND NOT x = 4 x = 5 AND NOT
(De Morgan)
EMERGENCY AND NOT x = 3 x = 4 x = 5 AND NOT ¿ x <> 3 AND x <> 4 AND x <> 5
(Not dissolving the same in difference)
x <> 3 AND x <> 4 AND x <> 5 ¿ (x <3 OR x> 3) AND (x <4 OR x> 4) AND (x <5 OR x> 5)
(Dissolution of inequality in In or Out)
 (X <3 OR x> 3) AND (x <4 OR x> 4) AND (x <5 OR x> 5)
500% profit
Another example to replace NOT IN with the following set of criteria is a result, in the chapter subqueries.
but i realised that we're comparing to a data set here so i did more search and found this series of mini articles http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/ --> really, really, really, really, really super good!!

i read through the whole series and WHAO! now i understand IN, EXISTS, OUTER JOIN... Is Null much much better!!

that made me really curious about execution plans (cause i didn't have the bandwidth to check that out in detail previously - and i still don't now haha) and i went back to mossTools to check it out again.

whao Christian, is seems like the whole world just opened up to me. i finally found something that i build up my whole life - and it will always stay relevant, not like some presentation pack we put together and then after 1 time use it's over.

i so want to do the microsoft certs now MOS for excel and access i'm pretty sure i can pass it without studying (cause i'm doing it every day). just like so many people do degrees but end up none the wiser cause they don't use what they learn and they are just going through the motions. i didn't have formal education and was always not interested in it cause always felt it was quite pointless but now i want to collect all the certs in the areas that i'm working in - just to ensure that i cover all grounds. going cert crazy in a very relevant kind of way haha - then can you imagine since T-SQL and MySQL are more or less the same, the things i learn about execution path are almost completely relevant there even in a database language specific context. and of course execution path for any database is the same as well.

yup yup i fully agree with you that Access is a completely dead backend and you've got me so so excited to start on a database server - but i must hold on horses, completely this project first and survive till another day before i do that ha.

you know it's like nothing in this world is as cumulative as programming. everything is always starting from scratch. even billards - world champions have good and bad days. singing and vocal production is slightly better cause our voice is with us every day and it's always used. other than that NOTHING is as cumulative as programming from what i know. one person could build an entire mega system by himself (or herself for mbizup! = ))   ). and that's what you've done and that's what you're doing. human factors are finally removed, can work from anywhere, anytime, and our destiny is controlled by ourselves and freedom is restored to us!! = ))

ok i see the execution path very clearly in front of me.

your use of subqueries has opened the entire world of real databases to me.

1) finish my project
2) clean it up
3) get back on track with work
4) do MOS Excel and Access certification (i wanna see what's higher level too - no idea at the moment ha)
5) now that all that component of my past life is settled (past life of dead backends), move on to iphone AND MYSQL development!! and all my experience and the lessons you've taught me are relevant there too!!
6) build my first crappy lousy version of my app
7) refine, refine, learn
8) clean up and build a good version
9) certification in iOS and MySQL!!

and then.... finally i have relevance in life!!! haha = )))

your subquery usage and what you've shared with me about query optimisation opened up my life Christian!! = ))))
i feel i'm at a point where after i'm done with my current stuff to survive, i need to go through all your articles Christian and also now i'm at the level when i can and need to look at components and solutions / utilities that people post on their websites - instead of just the small learning rules of SQL and VBA already. that will increase my understand as a whole on programming. looking forward to that!! = )) especially reading your new article!! = ))
hrmm in the end i didn't change my query cause i would have to make some changes in the other queries (add another column to have both CreditSource and DebitSource and also CreditSourceUID and DebitSourceUID - which is very doable but takes time / mental concentration). that's why i always want to try and do this right, right from the start so i won't face these problems ha = ) but i guess in learning we can't help it - that's why the scrum approach instead of the waterfall approach!

but i've definitely digested the not exists, not in, unions, joins, outer joins a LOT better and i'm much more confident in SQL now! thanks Christian!! = ))
Hi,

I don't want to destroy your dreams of getting certified, but the knowledge you currently has estimating your threads until now would really not be enough for such exams without studying more about that. Such certification exams are full of boring questions which do not need much imagination, only a lot of memorizing of lexicalic knowledge (where you are right: The most of it will be away after 2 weeks because you never need that).

Having a certificate may be a good way of getting a better job if you find an employer which is impressed by such certifications. I personally think that certifications are only good for covering ugly holes on the wall and impressing people who come to visit you. If you ask me how many certifications I have I can say, not even one. They costs a lot of time, a lot of money and the knowledge you learn if you take courses about that is completely targeted to follow an exact path which someone found useful to teach - if you step away one step right or left mostly you are lost or many of the course teachers which are trained to follow this path only without thinking much about it. That is of course not in general the case, there are also good teachers which take such tries as challenge and help you find your own way. But their main job in case of certification courses is to help you pass an exam and not to help you getting a better programmer or database designer because you maybe would blame it on the teacher if you don't pass it.

My often used example is the normalization forms. I really NEVER think about the level of normalization when I design a database, I follow an intuitive way of thinking about the table design. Who is interested in if a table is in first or fifth normalization level? Nobody. Not the designer, not the user, not the customer. So that's category "useless knowledge". It is good if you know how to normalize a table structure and when to use it and when it's better to denormalize against the normalization rules. That's what you need in database design. But in exams you are forced to know exactly everything about all these "useless knowledge" categories which doesn't help you getting better in the end. It's often vice-versa: If you fill your brain with such things you will loose more and more the capability of free thinking, it is similar to what you currently try, the try to always find "the right way of doing it". There is no right way. There are a lot of wrong ways or not so good ways, but you can achieve one and the same things on SO many ways that it is really better to not persist on the "right way", the real good programmers learn ANY possible way of doing something and decide the best one in every new situation.

Back to the exam: Calling it "execution path" would be the first fault...:-) It is called execution plan and in an exam you would need to call it that way - although calling it "path" is not so wrong. That's what I meant, the difference between real life and an inflexible way of thinking you need in an exam.
If you think you got everything about execution plans: You can read whole books about that single theme as especially in database servers it is really very complex. To give you an idea, look here (6 pages):
SQL Server Execution Plans
Or here:
SQL Server Query Execution Plan Analysis
to only mention two of a LOT of pages around this complex theme.

By the way: MySQL and T-SQL are different, some things are possible in MySQL and not in T-SQL and vice-versa (meaning using simple SQL statements, you can of course do anything in both, but for example the LIMIT clause would need more effort in T-SQL where in MySQL it is part of the SELECT syntax). If you know one you cannot do an exam in the other, both have very special parts. And that's no wonder, MySQL is mainly used in web databases where for example a LIMIT clause is very much more needed than in SQL Server.

If you know Access SQL it is maybe 30% of the SQL you will learn if you use a database server.


EXIST and IN: You are right, using EXIST is in many situations the better choice, but I must say that I more often use IN than EXIST as IN is more straightforward to my brain than using EXIST. That's my personal problem, you can of course change the logic to use EXIST instead and you will get more performance in many cases as you have seen in your linked article.

Cheers,

Christian