Advice on building a query with nearly 6 million records

Benyaw616
Benyaw616 used Ask the Experts™
on
Hi All

i'm after some advice on the best way to avoid building a query with 6 million records (or close to that)

i have a logistics matrix to build with 3 points, collection point (40,000 of these) Load Port (8 of these) and destination port (650 of these) from 6 different suppliers.

what i am trying to do is work out the cheapest price, there are 2 tables involved.

1st table has 167k records, it's a list of haulage prices from towns in the UK, each supplier will provide a price from most town to every load port, 2nd table has the price from Load Port to destination port.

however many suppliers subsides the haulage tarriff so the supplier with the cheapest sea freight isn't always the cheapest overall (as another company's haulage maybe alot cheaper)


for example..


1. ABC Ltd. will charge $300 to collect from London and $500 to ship to New York = $800
2. DEF Ltd will charge $400 to collect from London and and $450 to ship to New York = $850
3. GHI Ltd will charge $250 to collect from London and $525 to Ship to New York = $775

therefore GHI is the better option.

ideally the user will just put in from London to New York and my data will show option 3 as the one.
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Top Expert 2016

Commented:
a sample db with the correct table names and fields will help to achieve a solution to the problem.

upload a sample db.
ste5anSenior Developer

Commented:
Please post a concise and complete example (database file). It sounds like proper indexing should do it.
Owner, Dev-Soln LLC
Most Valuable Expert 2014
Top Expert 2010
Commented:
What if a shipper has a better rate from the load town to the destination port via a different shipping port?

Assuming Table1 looks something like:
Shipper
Town
ShipPort
Rate

and Table2 looks something like:
Shipper
ShipPort
DestPort
Rate

Then you should be able to write your query something like:

SELECT A.Shipper, A.Town, A.ShipPort, B.DestPort
, Val(NZ(A.Rate, 999)) + Val(NZ(B.Rate,999)) as TotalCost
FROM (
SELECT * FROM Table1
WHERE Town = "London") as A
INNER JOIN (
SELECT * FROM Table2
WHERE DestPort = "NewYork") as B
ON A.Shipper = B.Shipper
AND A.ShipPort = B.ShipPort
ORDER BY Val(NZ(A.Rate, 999)) + Val(NZ(B.Rate, 999))

Buy configuring the query with two sub-queries, you reduce the number of joins significantly.  Each subquery should focus on the records which start at your ship from Town or the DestPort.

I've used the NZ( ) function to set the value of the rates artificially high in case a record has been created but no rate has been entered.  You may want to increase those 999 values to 9999 to really exaggerate those values.
Top Expert 2014
Commented:
Depending on the size of the problem, you might pump your data into a graph database and use its query language to get your answer.  For instance, the cheapest solution might involve multiple hops and multiple carriers between the start and end points.

You can play with Neo4j (community edition), or one of its competitors.
http://neo4j.com/download/

==========
Here are some Neo4j shipping examples:
http://info.neotechnology.com/rs/neotechnology/images/ebaynow_final.pdf
Top Expert 2014

Commented:

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial