Solved

PHP most significant string match in array.

Posted on 2009-06-30
16
820 Views
Last Modified: 2013-11-19
Hello,

I am writing a small billing application to bill phone records. I would be having an array which would load and hold  the tariff information. The array would be something like:

$Tariff
(
[prefix] = 91, [rate] = 0.51
[prefix] = 92, [rate] = 0.80
[prefix] = 94, [rate] = 0.25
[prefix] = 947, [rate] = 0.33
[prefix] = 65, [rate] = 0.12
[prefix] = 66, [rate] = 0.15
[prefix] = 965, [rate] = 0.52
)

Although I've only shown about 4 entries, the array can have about 25,000 Entries having country code / prefix and rate information.

Now, what I would like to do is, when I pick up a phone number to bill, I'd like to find the most significant match to bill the number. For instance, lets say, my phone number to be billed is:

$called_number = '94777191934';

As its clear, this number can be billed using either prefixes 94 or 947 (based on the tariff array), however, the most significant match is, 947, so the billing rate would be 0.33.

I need to find a way, that would effectively be able to find which prefix to use from the array to bill the number. If I use functions which allow to find the needle in the haystack [i.e, strstr()], then possible matches in this case would be 94, 947 and even 91, but what I'm looking for is, something that would only match at the beginning of the number, and the most significant match.

The requirement here is for find out a way that would be able to achieve this in a SUPER FAST way. I'd be requiring to process thousands of records every minute and matching part has to be executed very fast to avoid any bottlenecks. So what would be the most effective way to do it?

------------------------------------------------------------------------------------------------

Part 2:

As I mentioned earlier, the Tariff array can have about 25,000 entries in it. I've been wondering, perhaps it would be not a wise idea to search the entire array for a match.

Assuming the number to be billed is '94777191934', what I thought of doing is, create a second temporary array [$Tariff_Temp()] and copy all entries from $Tariff() to $Tariff_Temp that starts with '9'. This way, I'd have a small array to do the search and matching. Once the match has been found, then I clear the temp array, check the next number to be billed, and populate the temp array based the number again.

Is it a good idea to work this way? The primary purpose is to keep the dataset small and manageable to achieve efficiency and speed in processing. If its a good idea, then what is a best and fastest way to copy data from the Tariff array to the Temp array based on the above condition?

Any help appreciated.

Cheers
Shaf.
0
Comment
Question by:shaf81
  • 6
  • 2
  • 2
  • +3
16 Comments
 
LVL 6

Expert Comment

by:twocandles
ID: 24743694
A solution that comes to me is to arrange the tarifications in a tree-like structure, subindexing arrays. Something like this

$tarif['9']['4'] = 0.25;
$tarif['9']['4']['7'] = 0.33;

Then you can check the matching tarif like this:

$called_number = '94777191934';

if( isset( $tarif[$called_number[1]] ) )
{
   if( isset( $tarif[$called_number[1]][$called_number[2]] ) )
   {
     // make as many inner if's as the maximum prefix length
   }
   {
      return $tarif[$called_number[1]][$called_number[2]]
   }
}
else
{
    return $tarif[$called_number[1]];
}

 It is a fast way since you go directly to the matching number.
0
 
LVL 49

Expert Comment

by:Roonaan
ID: 24746059
Faster would be to use any database engine, and use a query like this:

<?php

$calledNumber = '12345';

$len = strlen($calledNumber);

$searches = array();
for($i = 0; $i < $len; $i++) {
      $searches[] = substr($calledNumber, 0, $i+1);
}

$query = 'SELECT * FROM tariff WHERE prefix IN("'.implode('","', $searches).'") ORDER BY prefix DESC LIMIT 1';

print_r($query);

?>

Because 947 > 94 you get the best match by ordering on the prefix high to low, and taking only the first match.  This even works with arrays that have 9477 and 94, but not 947.

Kind regards

-r-
0
 
LVL 3

Author Comment

by:shaf81
ID: 24747378

twocandles - thanks for the suggestion, let me analyze it and see how well it can solve my problems and if it really performs better upto expectations under load and heavy datasets.\

roonan - well, infact, the tariff array is currently populated from a mysql database. the tariff table on the database is home to many tariffs (tariff groups / tariff sheets) which belong to many customers (which we identify from tariff_id - customer_id mapping). I opted to load the relevant tariff into an array and work on it as the operations would be happening in memory and I assume it would be much faster than working on the db with sql's.

For instance, the tariff table can have multiple tariffs, which means this table would have around 500,000+ entries in it. So, If i have to apply your query, it would be like:

$query = 'SELECT * FROM tariff WHERE prefix IN("'.implode('","', $searches).'") AND tariff_id = xxx ORDER BY prefix DESC LIMIT 1';

We would be replacing the xxx with the relevant tariff_id.


if the average length of the $called_number is 10, then the query IN() would have search 10 entries in the 500,000+ tariff table. Moreover, if I have 100,000 phone numbers to parse, it would be 100,000 queries against the DB, this is the reason I thought it would be better to load the relevant tariff into an array and do the operation on the array (on memory?) Don't you agree it would be a better option than working on the DB? Or do you still thing DB would be faster than the array operations?

0
 
LVL 29

Expert Comment

by:fibo
ID: 24747390
which is the maximum number of significant digits in the prefix? your examples give 3 obviously not using a country code.
I presume your tariff and its 25000 entries are in some MySQL table, with a column for prefixes that you have indexed ON.

Let's assume that the prefix is stored as a text string. '94738...'
Assuming a phone number of '93657897' I would add a '9' at the end, getting '936578979'.
And I would take the 2 leftmost digits, '93'
Then run

SELECT prefix FROM tariff where (prefix between '93' and '936578979') ORDER by prefix desc

Since table is indexed on prefix, the query "between" will be very fast. you now can place the resulting values in a php array and work directly from here.
0
 
LVL 3

Author Comment

by:shaf81
ID: 24748274
fibo - the maximum number of significant digits on any given tariff is probably around 8 digits long. Considering your query using BETWEEN, i'd use between '9' and '936578975', because if we use '93' and the most significant match for a given number is only 1 digit long (i.e: '9'), then the query would not return a valid result.

Btw, have you read my previous comment? We use mysql, however the reason we opted to work with array based offline datesets is to maximize the speed as they operate in the memory space. Dont u agree?
0
 
LVL 3

Author Comment

by:shaf81
ID: 24748430
elaborating more on the above comment, if i use BETWEEN '9' and '93657895' then wouldn't i be getting results like '9361111' or perhaps '9411111' etc? And if these are the longest, then the matches are not valid to bill the number.

Also, for this, the prefix should be stored as strings, whereas its currently stored as numeric.

I'm willing to be flexible and change things, as long as it gives unmatched processing performance under load and size.
0
 
LVL 49

Expert Comment

by:Roonaan
ID: 24748891
If this is for reporting purposes only, you could opt to create an somewhat optimized separate mysql database for this purpose. Then you would not have any problem running larger, longer taking queries. You could then look into combining some of the 100,000 queries using joins or larger IN() sets based on numbers that partially match within the same tariff_id. With some propery keys set, this shouldn't be a problem for mysql if not run continously.
0
Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

 
LVL 3

Author Comment

by:shaf81
ID: 24751911

Well, essentially this is a rating and reporting application. At the simplest form, the database would have customer tables, tariff tables, and call records tables. tariff tables are mapped to customers via tariff_id, and call record tables have customer_id to identify to which customer the call belongs to.

What I'm doing here is, every 15 mins or so, decide a block of call records to process (record_id x to record_id y). Once the block has seen decided, we do a SELECT on the records with a condition <b>Where customer_id = n</b>. What we do is, we process records one customer at a time. The purpose of this is to reduce the queries to multiple rate table lookups. Once we know all records belong to a given customer, all we gotta do is, just load up the rate table and work on it, until the next customer is selected. Since we've isolated the customer and rate, I decided it would be a good idea to work on arrays and keep that part of the processing away from SQL in a bid to achieve speed.

Everyone seems to be talking about SQL and DB, however, nobody seems to be commenting on array operations vs SQL operations performance (memory vs memory/disk). Both views would be appreciated, and would possibly help others in future, looking for something similar.
0
 
LVL 6

Expert Comment

by:twocandles
ID: 24751978
I think it's much faster to work in memory. The trade off between a first load of the values into memory (ie, constructing the array) is well worth compared to a querying each time you want to retrieve a value. We're talking about accesing local memory instead of accessing (a probably remote) database.

Talking about the solution I commented, there's another variant: use numbers instead of chars/strings. I don't know which one would be faster (probably indexing arrays by numbers). Then you can get the digit in a position like this:

$number_at_pos_x = ( $call_number / 10^(x-1) ) mod 10;
0
 
LVL 29

Expert Comment

by:fibo
ID: 24752043
Just brainstorming...
you might build a SQL function that would compute the legth of the leftmost common string between two strings, ie
would return 1 for ('9', '987543') and ('96', '987543'), would return 3 for ('987', '987543') and ('9876','987543') etc

If this function is optimized then running it on the sql server might be very efficient, because the volume of data you would get back from the (single) would be very low. In that case, the total result would probably be better than running the php on in-memory array (because you have to transfer the array from the SQL server to the php server, and then eat RAM on the php server)

Hmmm... I'm sure some regexp expert could work the corresponding regexp, and that would probably be very fast...
0
 
LVL 2

Expert Comment

by:binghu
ID: 25102721
I am confused.. For Question 1, you just need to have the array to be sorted (longest prefix before shorter ones).. Is that it? And when you go through the array, stop as soon as you get a match.

Part 2, you don't need to create the temp every time, you can just need to split the main array by the first N characters (if you want to get a smaller size). Then when a number comes in, you locate the array by substr(N, number)..
0
 
LVL 3

Author Comment

by:shaf81
ID: 25802834
the solution for this was derived from elsewhere, however, I will be reading through all the posts and then accept and award points to the ones which helped towards the better solution. This would be done in the next 48 hours. Apologize for the delay :(
0
 
LVL 3

Author Comment

by:shaf81
ID: 26045357
Below is the deployed solution for the problem and this solution worked much faster than any other methods tried or discussed above.

The solution takes the first digit of the called number and then filters out all prefixes that start with the $first_digit in ascending order. Then its a matter of finding the most significant match within that subset. STRPOS was quite fast in doing this.
$sql = "SELECT prefix,rate FROM tariff WHERE prefix LIKE '$first_digit%' ORDER BY prefix ASC";  
$query = mysql_query($sql);
while($row = mysql_fetch_assoc($query)){
$tariff_results[] = $row; 

$prefix_match = NULL;
foreach($tariff_results as $td){
    $prefix = $td['prefix'];
	if(strpos($called_number,$prefix) === 0){
		if($prefix > $prefix_match){
			$prefix_match = $prefix;
		}
	}
}

Open in new window

0
 

Accepted Solution

by:
ee_auto earned 0 total points
ID: 26070005
Question PAQ'd, 500 points refunded, and stored in the solution database.
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Suggested Solutions

Nothing in an HTTP request can be trusted, including HTTP headers and form data.  A form token is a tool that can be used to guard against request forgeries (CSRF).  This article shows an improved approach to form tokens, making it more difficult to…
Any business that wants to seriously grow needs to keep the needs and desires of an international audience of their websites in mind. Making a website friendly to international users isn’t prohibitively expensive and can provide an incredible return…
This tutorial walks through the best practices in adding a local business to Google Maps including how to properly search for duplicates, marker placement, and inputing business details. Login to your Google Account, then search for "Google Mapmaker…
The viewer will learn how to create and use a small PHP class to apply a watermark to an image. This video shows the viewer the setup for the PHP watermark as well as important coding language. Continue to Part 2 to learn the core code used in creat…

708 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

17 Experts available now in Live!

Get 1:1 Help Now