Link to home
Start Free TrialLog in
Avatar of Davisro
DavisroFlag for United States of America

asked on

How to improve performance when joining on text field

Experts,

**I work in MS Access. So I don't have access to query performance tools**.

Anyone know a best practice for dealing with joins on text fields?

If I have many data sets in the 500k to 1mil record range with a Cost Center field that is Text data type because the Cost Centers are 10 char with 4 numbers, a letter,  and 5 numbers in the format xxxx?xxxxx where x is a number and ? is a letter. I always both index and do joins on this field. My understanding is that joining fields with the numeric data type is more efficient than on text fields,  so I'm considering, to improve performance, creating an additional CostCenterID field with the letter removed so it would be a 9 length number of the Long data type. It does not appear that removing the letter will create any duplicate values.

My question is what is the best way to do this?
Lets say I have two tables:
Table A has 1mil records and a Cost Center Field of Text data type.
Table B has 25k records and a Cost Center Field of Text data type.

I could add a calculated CostCenterID field to my source tables with an Update Query using a formula such as Left([tA]![Cost Center],4) & Right([tA]![Cost Center],5) AS CostCenterID. The new field would be a 9 length number of Long data type.

I don't like altering my source tables but this seems like it would work . How do db admins typically handle this kind of problem?

Thanks
ASKER CERTIFIED SOLUTION
Avatar of Jim Dettman (EE MVE)
Jim Dettman (EE MVE)
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Davisro

ASKER

Thanks Jim. I always appreciate your input.
I know you have your answer, but just a little concerned about your predicament....

The previous question, the Experts all had a SQL Server bent, and dont think "we" gave you the best advice.

So, if this is part of your previous problem, then the main issue is not so much joining two text columns as it is trying to extract the right hand most cost centre from a column which is a semi-colon delimited string of a few cost centres.

From : https://www.experts-exchange.com/questions/29124895/Join-Tables-on-Calculated-Field.html

You said : "In the tIQN table only, the Cost Center field has multiple values for some employees separated by semicolon. For example, 5545Q7654; 4675Q2332; 9087Q2090"

And : "My business rule is to only use the right-most Cost Center"

That is a big problem when dealing with a million rows.

Not so much if you can join on  Table1."9087Q2090" = Table2."9087Q2090"  but when you have to extract the right hand most cost centre, then that becomes a problem.

I dont know how entrenched that structure is, but should really consider formalising the Employee / Cost Centre relationship and use that as a basis for your joins.

So, create a  EmployeeCostCenters table (and really, need to fix this one aspect first and foremost)

CREATE TABLE tblEmployeeCostCenters
(EmployeeCostCenterID AutoIncrement PRIMARY KEY,
EmployeeID INTEGER,
CostCenter VARCHAR(10),                      
CONSTRAINT idxEmployeeCostCenter UNIQUE (EmployeeID, CostCenter) )

Have assumed Employee has ID's, but can adjust as needed....

Then populate that table with the individual EmployeeID + CostCenter codes

Your  Join then becomes an extra link in the chain (and very unsure of your tables other than that other question)

SELECT qIQN2.*, tGFRS.Rate,
FROM qIQN2
INNER JOIN tblEmployeeCostCenters as EC ON EC.EmployeeID = qIQN2.EmployeeID
INNER JOIN tGFRS  ON tGFRS.Bu = EC.costcenter


Now, the last join is NOT going to be heavily penalised just because it is a string

It doesnt sound like CostCenter has a master table, and yes, that would also be a good idea.

But if costcenter is used in quite a few places, it will take a bit of work. Again, the important one is to break up  the stringing together of cost centres for an employee per above.

CREATE TABLE tblCostCenters
(CostCenterID AutoIncrement PRIMARY KEY,
CostCenter VARCHAR(10))

And if creating the CostCenter Table, then use CostCenterID instead of CostCenter in the above examples.

You would populate the costcenters table first and then go about updating every other table which has CostCenter with the CostCenterID


It would be great to get Jim to have a look at your previous question

If I ever have a MS Access question, he is the one expert I would immediately reach out to :)