Solved

Computing whether two orders in a database come from the same person

Posted on 2004-10-28
582 Views
Last Modified: 2008-02-01

Basically, I have a database of orders.  I now want to find which orders have come from the same person (billing information).  

A seemingly straightforward problem, but consider that fact that people might order using different credit cards, might be using coporate credit cards that don't have their names on them, etc.  I think I'll have to use several functions (or one function) that computes a 'confidence' that two orders are similar.

.. And I get the very strong gut feeling that this will be a matrix problem.

Any suggestions to point me in the right direction?

- hbz
0
Question by:hbz
    27 Comments
     
    LVL 22

    Assisted Solution

    by:grg99
    I would use some ranking function that compares their billing information but makes allowances for the common variations:

    Missing "mr" or "Mrs" or "ms"

    Missing word on end of street address "drive", "st", "street"

    Extra words on end of street  "Apt 204"

    Extra digits on zip code (zip+9)

    Alternate city name (Queens instead of New York, Cambridge instead of Boston),
    easiest way is to check if alternate name maps to same or adjacent ZIP code.

    SLIGHT mispelling of name.  

    Missing middle name or initial instead of middle

    Alternate names-  Bob for Robert, etc...

    -------------------------
    Important fields to NOT consider similar include:

    Anything but a slight mispelling of the name

    Any change in numeric address.

    ----------------
    I dont see how anything as clean as a matrix is going to be of much use as this is a rather human quirk based problem.

    -------------------------------

    You'll have to resign yourself to some false positive matches.. For example our insurance agent had two clients whose medical files got badly mixed up, they had identical (and unusual) names,  same DOB, and same physician.
    0
     
    LVL 5

    Author Comment

    by:hbz

    grg99:

    "I would use some ranking function that compares their billing information..."

    Unless I can think of something better, I have these three string-based functions:

    f1: Leftmost 3 characters of first name and leftmost 4 characters of last name
    f2: Leftmost 10 characters of email
    f3: Lastname + state

    So that if f1(order #1) = f1(order #2), f1 calls them equal.  Now, these functions are not equal.  f2 has a very low false positive rate, for example.  So I will probably (?) use a weighted sum of these f = f1 + f2 + f3

    You'll have to forgive the rustiness/nonexistence of my math skills, but here is my reasoning regarding some sort of matrix is that if I'm comparing all pairs of orders against all others, I'd essentially have a 2-d matrix, N x N where N is the number of orders being compared.  Each element Xij would be the confidence returned by this function.  It would be symmetric about its diagonal.  Doing some Googling, I ran across some pages that mention "Cluster Analysis" and "distance matrices" which seem to be related to my problem.  ...But they are too dense for me to read.  Am I heading off the deep end here, perhaps making the problem too complicated?

    - hbz
    0
     
    LVL 22

    Expert Comment

    by:grg99
    Me too, my head hurts when I try to picture all this.  I'd stick with your idea of a weighted sum.

    BTW there's often a lot of variability involved in domain names parts, for example I get mail at abc@xyz.edu, abc@maincampus.xyz.edu, abc@kremvax.xyz.edu, which are all really the same thing.  You may want to allow some flexibility here.

    0
     
    LVL 26

    Assisted Solution

    by:aburr
    You basic idea of the three functions is a good one. The matrix idea is a good way of easily handling the calculation of the three functions. As you point out the matrix will be diagonal so you will only have to calculate half of the elements. Manipulation of the indices will simplify the programing. You can calculate the three functions, add them in weighted fasion, and have the program spit out all pairs with a high score for human decision.
    0
     
    LVL 45

    Expert Comment

    by:Kdo
    Hi hbz,

    What about starting with something simple and obvious, like shipping address?

    Kent
    0
     
    LVL 3

    Expert Comment

    by:Andrea Ercolino
    aburr explained your approach very well, but I don't think it fits here, mostly because it's strange the use of a mathematical method to get an intermediate result: <<have the program spit out all pairs with a high score for human decision>> (aburr), when there is a procedural algorithm to get a final result: have the program do it all for you.

    The main problem with your approach is the uncertainty introduced by the use of the hash functions; maybe they are useful for solving fuzzy problems like those described by grg99, but I think you never should take aaaabccccc@d.com = aaaabccccc1@d.com (f2) or Gene Hackman = Genevieve Hackerman (f1)!!

    Data Consolidation is a tricky field. In many respects it's similar to Data Compression: usually a less than optimal compression is better than too much compression. Specific knowledge can be very useful too: <<something simple and obvious, like shipping address>> (Kdo).
    0
     
    LVL 24

    Expert Comment

    by:SunBow
    My cents...
    this is not doable as phrased.

    > Any suggestions to point me in the right direction?
    Yes, but first let me suggest the wrong TA was selected, a database question should go to database TA.  There at least used to be some people in MS-Access TA who worked on similar problems

    > I think I'll have to use several functions (or one function) that computes a 'confidence' that two orders are similar.
    no

    >.. And I get the very strong gut feeling that this will be a matrix problem.
    no

    Not math, but identification, accuracy, and database.
    My IMO is that it is a problem for human, not computer or calculator
    Typically, this is resolved by creating a unique key that links the variety of records together.

    Unfortunately many people have too similar names, such as the Jans Doe example, John Smith.
    I had recent situation at work, person had over twenty letters in each of first and last name. Neith name had I seen before, so thought was the names were uncommon. Yet the two names were identical, as were their zipcodes, city, place of business/work, address, building, floor, hall, etc.  At their own recommendation, what we did was reduce records for the first name of one to be only three letters. Their interest was not same, it was for insuring that other people could tell them apart, such as for eMail and even paper mail.

    That is among reasons I say the problem needs human involvement to resolve.  Use the human interest to get the records associated properly. They are interested, more than you can imagine.

    I know some peolpe who have similar issue in banking management.  For only a single account at single bank, they would like to participate in handling transactions remotely. But since they have computers at home, and at work, the bank will still reject them based upon such things as different telephone numbers and changing of internet addresses.  They want resolution, the business seems to care less, forcing, restricting their options.

    >  might be using coporate credit cards that don't have their names on them, etc

    I don't accept that.

    > You'll have to resign yourself to some false positive matches..

    This is unacceptable,  Defining a final resolution using lawyers and courtrooms is too expensive an option

    >  I now want to find which orders have come from the same person (billing information).  

    Know well we are not faced with resolving a federal case of handling fraudulent activity and criminals or suspects trying to hide.

    Where a person is truely interested in continuing a business relationship, purchasing from a company using multiple IDs, the answer is to ask the persons themselves which IDs to match together.  Where some companies still restrict all billing to one address, others permit separate billings, to separate addresses. A person can, say, have the same 'home address' for each record, the same 'work address' for each, and yet a different 'billing address'. As far as the database goes. The reality could be that some or all addresses may be identical for many individuals.

    What ties them all together must be a unique database field for each individual, that can be associated with each transaction, bill, and payment.  The billing company can 'borrow' a number from any of the IDs associated with person to do this.  Better practise is to have the company invent its own ID for cross-referencing, one that is defined to best suit their purposes.

    The proof of the record match can include the payment of the bill without question.

    For the math part, it is not useful in grouping them together. This is easily accomplished by merel selecting the link to records, or sorting them and finding first best. Now, the math (eg matrix) does come in handy for resolving of duplications, and for making efficiencies inside database by eliminating or at least reducing any redundancies.

    But if you assume you can be confident in using computers to consistently match diverse records to individuals, you will fail.
    0
     
    LVL 24

    Expert Comment

    by:SunBow
    Please stop sending me bills for the purchases of SunShine!                          :-)
    Nevermind, I'll just take all future business elsewhere!                                  :-(
    (unless you give me address of SunShine, looks like lots of good stuff was ordered)              ;-)
    0
     
    LVL 3

    Assisted Solution

    by:Andrea Ercolino
    SunBow,
    You're right, maybe there are better places for this question. Nonetheless, the author wanted to use matrices to solve their problem. So we can leave it here, I think.

    Your suggestions are good, but I think there is no space for changing the boundaries of the problem. I mean, consolidation issues happen frequently in information processing, and much more if data creation is easy, like in the Internet, so we should try to solve it per se...

    Any two records can be put in the same group if they share at least one key information.
    And if two records don't share any key information then we cannot say anything about them, so we should keep them apart.

    For example we know that for each email or shipping address there is only one person receiving there (safe hipothesis), so these addresses are good candidates for key information... based on domain of interest there should be others (but you really need to find data pointing to only one person)

    Then it's possible to find an algorithm for grouping records together.
    0
     
    LVL 24

    Expert Comment

    by:SunBow
    RAPUTA >  Nonetheless, the author wanted to use matrices to solve their problem
    Not so, see here:
    hbz >  strong gut feeling that this will be a matrix problem. Any suggestions to point me in the right direction?

    It is not a statement desiring to find way to use matrix, it is a question of whether matrix can produce reasonable solution

    > but I think there is no space for changing the boundaries of the problem

    Only true in malinformed malconstructed material of classroom exercise.

    > consolidation issues happen frequently in information processing

    True, witness my reference to elimination of redundancies (although that is not practiced very well at all)
    Also there is potential for the finding of missing persons, or rather, criminal investigations of those seeking to hide identity

    > Any two records can be put in the same group if they share at least one key information

    well, the key is to identify keys. For sure. In the example as given I claim there are none. Two different credit cards have two different numbers.  Neither first name nor last is unique, and combining them does not solve the problem. Addresses are not unique.

    In crime solving as well, even upon making of matches, the presumption remains that it is not true, for being innocent until found guilty. So investigators seek further evidence to corroborate the identification process to make the match.  Again, it is a people issue, not one for a formula or a function or an algorithm, but it can be a process. The process, in business, includes the people involved or the business is subject to failure, as witnessed by the dot.com debacle.

    Here is a personal example. I had a credit card. I took a vacation and bought presents. In between shopping stops, my card was denied right to purchase. Upon denial, I asked what problem was. I was not informed, but quite likely some formula was used. I was not in my home town. Still, being disaffected, I cancelled the card and all future business with that company.  I changed scheme, and got credit card from bank where I had money deposited, effectively merging both accounts, so that all of the data fields line up perfectly between the two. One year I successfully use card for trip to city spending huge amount of money.  The next year I go to same city, spending money, but only one day instead of several. As I neared home I was low on fuel and stopped to fill it up. Once I had obtained the fuel, I found my card was no longer good for purchase. Fortunately I found enough change to make it out of there. Upon face to face confrontation with banker, a few phone calls were made and my card was reenabled and I was told that the reason it was denied was that my purchases did not match prior 'pattern'. It was either unknown or at least unexplained what that pattern was. But their pattern program denied them their ability to be useful in my fuel transaction.

    >  for each email or shipping address there is only one person receiving there (safe hipothesis),

    Not true at all in the real world. Many of same name even at same address, and many do share a common eMail.

    There are a couple of other similar situations regarding pattern matching in the area of fraud. One is for credit card theft (physical one) and the other is for identity theft.  Now in each there can be a pattern that could lead an algorithm to report 500 orders per day from same home, or 500 people claiming to live in same room, etc.  Now you have a use for the pattern, but the purpose is to identify which ones are different, to distinguish the valid holder of the identity from the fraudulent one, yet still to leave it for the human review to determine, or assess, the probability of match or fraud using other tools.

    > Then it's possible to find an algorithm for grouping records together.

    You can group, but you cannot succeed.

    People who deploy such methods are responsible for having workcrews declared dead, permitting dead people to receive checks and even to vote, and worse.  If done by government, too bad, they are sorry, you still stuck keeping them around,  if done by a business, the business will fail.

    I recall some ISP example, father and son having same name, somehow the son got added to a list, and the father was being sued for not paying for internet service. Father had no computer or even telephone line (they would uae a pay phone down the street if needed). Ends up with the government putting ISP out of business for cause of fraudulent billing.

    One must be duly careful about making assumptions for nearness of matches.  Personal contact always helps, and many card companies actually require a valid telephone number, and will not permit activation of card until receipt answers that very phone. Then, then they can associate the telephone number as a key, but as in by above example on banking, they may still need to add a flexibility to satisfy market conditions
    0
     
    LVL 5

    Author Comment

    by:hbz

    Wow you guys have been busy!  Sorry for the delay -- I've been busy, too.  Here is some more information about my problem.  The goal is to find past orders which belong to the same person so that we can recognize them as high-frequency customers and reward them with a little "holiday bonus" as a token of appreciation.  As I've said before, the buyer may be corporate, may use different credit cards (which may have different billing names/addresses on them), may input different shipping addresses when they buy things as gifts for others, etc.

    -----

    > Kdo: What about starting with something simple and obvious, like shipping address?

    As stated above, the shipping address would not be the solution by itself, since they may ship to different addresses.   However, a match on shipping address should be considered a strong match, I think.

    > RAPUTA: have the program do it all for you.

    I cannot let it be completely automated, since the client wants to get as close as possible to 100%.  This means that whatever algorithm I use must have a modest false positive rate.  There must be a human at the end of the process.

    > ... The main problem with your approach is the uncertainty introduced by the use of the hash functions.... I think you never should take aaaabccccc@d.com = aaaabccccc1@d.com (f2) or Gene Hackman = Genevieve Hackerman (f1)!!

    My functions are based on the data I have on hand, and I notice that, w.r.t. to email addresses and even their own names, that they tend to get the first few characters right, and that misspellings occur later on.  For example, they will type joe_schmoe@yahoocom (and leave out the second period in the address).  I realize that my functions are not very pure in the mathmatical sense.  As I've said before, I'm not accustomed to this type of analysis.  If you can suggest better ones, I'd love to hear them!

    > SunBow: Yes, but first let me suggest the wrong TA was selected, a database question should go to database TA

    Perhaps.  I thought about it, but it seems like a much more theoretical problem.  Tell you what:  I'll crosspost some points to the DB area.

    > ... Typically, this is resolved by creating a unique key that links the variety of records together.... Better practise is to have the company invent its own ID for cross-referencing.

    This would be the ideal.  Unfortunately, no such field exists in the database.

    -----

    Anyways, I feel like this has generated a lot of good dialogue, and may be a good intellectual exercise.  On the other hand, I don't feel any closer to any sort of satisfactory solution.  I'll crosspost to the DB area to get their input...

    - hbz
    0
     
    LVL 4

    Expert Comment

    by:chanito
    I saw your question in the DB area.

    I've spent lots of time working of similar projects.  The truth is there is no easy answer, it has to be a combination of some matching algorithms and some human work and you'll end with a somewhat satisfactory outcome.

    1. Match orders by name.  As said before you have to have some tolerance for titles, variations in spelling, etc.  There are functions like SOUNDEX() that make a phonetical comparison but they don't work that well in my experience.

    2. Find matches using the credit card.  Maybe they gave you a different name but the card matches.

    3. Find matches using the phone number or the e-mail.

    4. Find matches using the billing address and maybe the shipping address.

    5. Have people review all your matches, just make it easy for them, give them a list with all the info so they can compare easily.

    And as I said, having done many projects of this sort there are no universal answers, every database is different.  Some times I have a great deal of success with only the name and zip code.  Other times the phone numbers are good.  It all depends on what pieces of your database are reliable and which aren't.
    0
     
    LVL 3

    Expert Comment

    by:Andrea Ercolino
    >> but I think there is no space for changing the boundaries of the problem
    > Only true in malinformed malconstructed material of classroom exercise.
    real life is full of wrong or incomplete stored data, and many people want to use those data

    >>  for each email or shipping address there is only one person receiving there (safe hipothesis),
    > Not true at all in the real world. Many of same name even at same address, and many do share a common eMail.
    I don't know how a postal address works in the USA, but here in Spain an address is a key for those leaving there: first name + last name + street name + build number + flat number + door number + postal code.
    0
     
    LVL 15

    Expert Comment

    by:Raisor
    Hi,

    ... this is just a view from behind the curtains ...

    My customers have a unique ID ... when they order online they have to logon -> so I know who my customers are ...

    With the same unique ID they can order by phone, by fax, by mail ... by CtoB and BtoB and PtoB interfaces ...

    ... through that sole ID I know how many orders they've actually processed ... if it comes to a certain amount I can treat them like "benefited" clients/customers/partners ...

    ... I'd like to explain and to help but I'm not getting the point right now ...


    Best regards,
    Raisor
    0
     
    LVL 26

    Expert Comment

    by:Alan Warren
    Hi

    Would be really helpful to see some of database structure, table names, field names etc...

    Agree with Raisor, if your db is normalised, then I expect there will be a primary key in  the table Customers that is represented as a foreign key in the table Orders , you can use this key to do a join on the two tables, returning some summary info, hmmm, maybe count of orders grouped by customer in descending order

    using he the Northwind sample database you would use a query with source SQL like this.

    SELECT Customers.ContactName, Count(Orders.OrderID) AS CountOfOrderID
    FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
    GROUP BY Customers.ContactName
    ORDER BY Count(Orders.OrderID) DESC;


    Which returns output something like this:
    Contact Name      CountOfOrderID
    Jose Pavarotti      31
    Roland Mendel      30
    Horst Kloss                      28
    Maria Larsson      19
    Patricia McKenna      19
    Paula Wilson      18
    Carlos Hernández      18
    Christina Berglund      18


    Probably enough data to mail out the holiday bonus

    take care...


    Alan (PageEd Databases)
    0
     
    LVL 40

    Expert Comment

    by:Vadim Rapp
    I think you can safely assume that the order came from the same person if the shipping address is the same.|

    Another possible combination - name and zipcode. Better, if you had a function to calculate the proximity of zipcodes. This is for the case when I order something from home and then from work for the company.
    0
     
    LVL 3

    Accepted Solution

    by:
    >> RAPUTA: have the program do it all for you.
    > hbz: I cannot let it be completely automated, since the client wants to get as close as possible to 100%.  This means that whatever algorithm I use must have a modest false positive rate.  There must be a human at the end of the process.

    I think that if you and your client accept the following two rules, then it's all 100% decided...

    1. Any two records can be put in the same group if they share at least one key information.
    2. And if two records don't share any key information then we cannot say anything about them, so we should keep them apart.

    Let me explain it with a symbolic example. Let's say you have these data:

    name        email         ccardnum          address
    ------------------------------------------------------
    a               b              c                       d
    e               b              g                       h
    k               n              o                       h
    p               q              c                       s
    u               v              x                       y

    With the above rules, if email, ccardnum and address are our key info, then persons with name a, e, k, p are the "same individual" and person with name u is another one.
    0
     
    LVL 1

    Expert Comment

    by:mrnaz
    OK perhaps I'm completely off track, but what's wrogn with making your application require a user to first register to use it, and then log in each time they wish to place an order. That would make it easy to associate orders and completely eliminate the need for any "fuzzy logic" algorithms to compute the possibility of two orders coming from the same customer.

    You could also requre the use of a valid email address to verify each order. That way the bvast majority of customers would use the same email address each time, certainly giving you at least as good a matching as any confidence match approach with 1/10th the complexity.

    In short, you should get customers to identify themselves before allowing them to place an order. That way you could associate credit cards with paritcular customers, easily allowing muktiple cards per user. A sample table structure is as follows:

    USERS
    userid
    name
    surname
    address

    CREDIT_CARDS
    cardid
    userid -> USERS.userid
    carnumber
    expiry
    cardtype

    ORDERS
    orderid
    cardid CARDS -> cardid
    details

    As you can see users can have as many cards associated with them as they want, and orders can be associated with different cards and then linked to the owner by finding the owner of that card.

    Hope this helps.
    0
     
    LVL 4

    Expert Comment

    by:chanito
    I think many of you are missing the point that the database already exists and he wants to detect people that have made several orders.  Obviously he doesn't have a unique customer ID and putting it now would be a good idea but still wouldn't solve the problem.
    0
     
    LVL 24

    Expert Comment

    by:SunBow
    chanito  >I think many of you are missing the point that the database already exists and he wants to detect people that have made several orders. Obviously he doesn't have a unique customer ID

    :-))

    hbz > recognize them as high-frequency customers and reward them with a little "holiday bonus" as a token of appreciation

    Sorry, still apples & oranges, better to either give them duplicate discounts (or bonuses) or to advertise the coming of bonuses, suggesting they notify the business of their multiple accounts.

    Since we do not know these people, we also have to remember other rules of the game as well, such as the separation of work from pleasure.  So it goes, that businesses can and do ask for all such bonuses to be given to the business itself, and not to the worker. They justify by it was the nature of the business transactions that produced the reward, so they are entitles to it.  This has another little sense to it, that businesses do not want employees makinh transactions that may be more costly than necessary, just to enable their own gain through discounts and other persk. This includes the likes of bonus travel miles, for example, from a public carrier, a rather non-tangible asset.   I do wonder, however, for the time when the business offers a free cup of beverage, how best to turn some of these perks over to the boss for proper accounting.

    The general rule is, that these two forms of expense cannot be mixed. So it remains incumbent to have the customer perform the declaration that it is alright. How, for example, would a recipient declare it for income tax recordkeeping purposes?

    > must have a modest false positive rate.

    but guessing wrong just once, can be such a problem for a recipient that they raise a major lawsuit, such as class action.  It could be viewed as discriminatory to one of different religion, race, or gender, for example.  Many first names are not gender specific. Make a mistake on that, well most people are tolerant and will shrug it off, but others may have a chip on shoulder or just simply want to benefit from their friendship with a class action lawyer

    OTOH, for discounts or holiday bonus, the problem I raised for shared eMail and shared name and address is not as applicable. For even a grandfather, mother, and child to all share same name, they will typically agree that their bonusses can be pooled.  For one, it is usually the Senior's money that was used to make purchases, and for another, if they did not get along so well they should have moved out on their own. But if you have a choice of a Sr vs a Jr, you'd probably do better with merging benefits to the former, who can pass it on if so desired
    0
     
    LVL 2

    Assisted Solution

    by:skeeveswp
    hbz,

    I saw your request in the DB area.  I have worked with a similar problem ... finding and consolidating duplicate identities.  Nothing is really 100%, but I managed to pare down some of the duplicates using a combination of name, street address, city, state, and zip code entries.  Of course, I don't know what fields you have to work with, but I suspect that the billing address for the credit cards may be useful.  Granted, this won't help with those who are using corporate cards, but it should give you a starting place.

    If you could give me an idea what the required fields are for the orders, I may be able to give you some more specific assistance.

    SteveP
    0
     
    LVL 4

    Expert Comment

    by:chanito
    I think SunBow's advice is good and sound.  Just to add to my previous comment, if you want to do some programatic matching use whatever criteria, my advice is to first come up with an analysis.  For example:

    Using credit card number you find 400 suspects of making say 900 orders.  First thing you could try to do is see if other data match, for example phone number, e-mail, etc.  And the most important thing is to create a list with the 900 entries and group them by the matching criteria and have someone (or you in the worst case) go through the list and see if you're detecting John P. Smith is the same as Jon Smith which sound right to me, or in the other hand you're matching Jane Smith as the same person too (maybe the daughter of John P.)
    0
     
    LVL 3

    Assisted Solution

    by:Andrea Ercolino
    hbz >  For example, they will type joe_schmoe@yahoocom (and leave out the second period in the address).
    that is clearly a typo, that you can "easily" take care of... the "problem" is when you have something like the following

    1.  joe_schmoe@yahoo.com
    2.  joe_schmo@yahoo.com
    3.  joe_schnoe@yahoo.con
    4.  joe-schmoe@yahoo.com
    5.  joe_schmoe@yahoocom

    I think that you should recognize 4 different individuals here, because only 1. and 5. share (the key information represented by) the email address... the others seem typos, but you should be careful, and consider them different persons (here is why many websites try to validate emails with a message before activating accounts for customers)

    so, after correcting easily recognizable typos, you do not (need) have to use a "cofidence" function...


    The equivalence relation between personal data records and its transitive property will do!!

    DEF: Person1 = Person2 if and only if there exists at least one field F for which String( Person1.F ) = String( Person2.F )
    Transitive property: if Person1 = Person2 and Person2 = Person3 then Person1 = Person3

    Note that the Transitive property says not only that:
    -- if String( Person1.F ) = String( Person2.F ) and String( Person2.F ) = String( Person3.F ) then Person1 = Person3

    but also that:
    -- if String( Person1.F ) = String( Person2.F ) and String( Person2.G ) = String( Person3.G ) then Person1 = Person3
    where the field F is different from the field G <-- this is what makes possible to devise the 100% consolidation algorithm

    (maths is everywhere -- this looked like a database question but was finally demonstrated its very nature ;-)
    0
     
    LVL 5

    Expert Comment

    by:PointyEars
    Sorry for being late.  I was away for a while.  I would tend to think that the problem has no satisfactory mathematical solution.  I would make a different suggestion: identify the orders which clearly come from the same person.  For example because the names are identical, or differ by one letter but the addresses are identical, or were done with the same credit card number, etc.

    Calculate the number of people who are going to get a reward by dividing your budget for rewards by the cost of each reward (all included: packaging, shipping, what have you).  Then draw that number of names from the list you prepared.

    One risk will be that some of them will get more than one reward.  You can limit the damage by saying that only one price per person can be received (e.g. if you already have received a letter like this, please ignore it, etc.).  Some people might still cash in more than one reward, but it cannot then be helped.

    The other problem (a certainty) is that some people who ordered several times will not be included in the list.  But then, who said that a promotion must be perfect?  On all your bills or catalogues you could clearly state that if people buy with different means they will risk to miss out on rewards.

    I think that you should not [over]engineer a solution.  Ultimately, you do have a budget which you cannot exceed, or?  I would keep it simple.
    0
     
    LVL 26

    Expert Comment

    by:aburr
    The question has obviouly been answered but it will be difficult to pick out the relative comtributions of the responders.
    0
     
    LVL 5

    Author Comment

    by:hbz

    It's been a while... The project that prompted this question has since been cancelled, so there was little motivation to pursue an answer to this question.  I guess what I was looking for was something a little more theoretical than what was discussed.  The most interesting web page I saw was http://www.dcs.shef.ac.uk/~sam/stringmetrics.html, but again I had no time to dive into it.  Thank you, everyone, however, for the interesting discussion!

    - hbz
    0
     
    LVL 3

    Expert Comment

    by:Andrea Ercolino
    Hello hbz!!
    Thank you for accepting my comments!!
    0

    Write Comment

    Please enter a first name

    Please enter a last name

    We will never share this with anyone.

    Featured Post

    Course: From Zero to Hero with Nodejs & MongoDB

    Interested in Node.js, but don't know where to start or how to learn it properly? Confused about how the MEAN stack pieces of MongoDB, Expressjs, Angularjs, and Nodejs fit together? Or how it's even possible to run JavaScript outside of the browser?

    A Guide to the PMT, FV, IPMT and PPMT Functions In MS Excel we have the PMT, FV, IPMT and PPMT functions, which do a fantastic job for interest rate calculations.  But what if you don't have Excel ? This article is for programmers looking to re…
    Lithium-ion batteries area cornerstone of today's portable electronic devices, and even though they are relied upon heavily, their chemistry and origin are not of common knowledge. This article is about a device on which every smartphone, laptop, an…
    With the advent of Windows 10, Microsoft is pushing a Get Windows 10 icon into the notification area (system tray) of qualifying computers. There are many reasons for wanting to remove this icon. This two-part Experts Exchange video Micro Tutorial s…
    Migrating to Microsoft Office 365 is becoming increasingly popular for organizations both large and small. If you have made the leap to Microsoft’s cloud platform, you know that you will need to create a corporate email signature for your Office 365…

    884 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    24 Experts available now in Live!

    Get 1:1 Help Now