Solved

Find most common patterns with Lists of numbers

Posted on 2014-03-31
8
281 Views
Last Modified: 2014-04-07
Hi,

I'm trying to find the optimal way of identifying common patterns with groups of items (an item is identified by a number)

e.g. every transaction has a list (of size 1 - 20) of items. I can easily say which single items appear most often by keeping a count and incrementing per transaction. I guess i could use the same approach for combinations of items (2+) that appear in transactions, but i'm not sure if this is the best approach?

note that order of the item list is not important, but that i'm not looking to treat the full list as a unique transaction (I want to split each list into its sub combinations - ab,ac,bc, abc, etc and match on these)

any help appreciated,
thanks
0
Comment
Question by:basil365
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
8 Comments
 
LVL 27

Expert Comment

by:d-glitch
ID: 39966576
Need more information.

What is your goal in doing this?
How much data are we talking about?  Hundreds of transactions or millions or more??
What are the items?
How many different items are there?  
Can a single transaction have multiple copies of a particular item?

You probably want to sort the list of items in each transaction for easier comparison.
Then you could compare every pair of transactions, and keep track of the number items that overlap in an array.

Then you could work through the array to find the most common single items and most common subsets.
0
 
LVL 27

Expert Comment

by:d-glitch
ID: 39966590
If there is a finite number of items N, you could assign a 1xN array to each transaction, where the values in the array represent the count of particular items in each transaction.  This would also simplify the subsequent analysis.
0
 

Author Comment

by:basil365
ID: 39966647
Hi d-glitch,

The amount of data is variable, but will tend towards the millions. I intended to find a solution then optimise it by adding assumptions (e.g. only allow combinations up to size 4 to limit results set)

- The items are a complex object representing product, but in this case i only care about the id
- Each transaction can only have one copy of a particular item.
- There are n different items, but ~100000 at a given point in time (again this value will be restricted by applying logic to the data set e.g. older than x)


thanks for your responses
0
Salesforce Has Never Been Easier

Improve and reinforce salesforce training & adoption using WalkMe's digital adoption platform. Start saving on costly employee training by creating fast intuitive Walk-Thrus for Salesforce. Claim your Free Account Now

 
LVL 27

Assisted Solution

by:d-glitch
d-glitch earned 250 total points
ID: 39966751
Millions of transactions with up to 20 items per transaction isn't too bad.

What are you using for software??

You probably want to go through the data and make lots of lists:
  A list of unique items and the number it times they appear.

Then figure out which items are most popular and make more lists.  
  A list of the transactions that include Fav_1
  A list of the transactions that include Fav_2
  A list of the transactions that include Fav_3
  . . . .
  A list of the transactions that include Fav_M  (not N).

You can decide how deep you need to go into the Favorites.  Certainly you can eliminate any items that only appear once.  You will have to look at the data, or prototype a subsample to decide.  You would have to go through the entire data set at least twice.

Finally you would have to compare the Fav_ lists with each other to find how many transactions overlap.  This is a lot of work, but you can start at the top.  The most likely pairs (or triples or quads) of items are likely (but not guaranteed) to include the most common individual items.
0
 

Author Comment

by:basil365
ID: 39966774
Thanks d-glitch, thats the approach i've taken:

- store all combinations of ids when a transaction occurs, as well as the list within the transactions
- for each combination (up to a size of 6 at the moment) I'm iterating through all lists within each transaction and checking if the combination appears.


I'm using .net (programming with linq mainly) and just about to create a few unit tests to check performance

thanks
0
 

Author Comment

by:basil365
ID: 39974671
Speed is definitely the issue here - I had to further reduce my pattern size to combinations of 4 in order to keep up with transactions when using the above approach
0
 
LVL 84

Accepted Solution

by:
ozo earned 250 total points
ID: 39974713
Suffix trees can find the most frequently occurring substrings of a minimum length in O(n) time, O(n) space
http://en.wikipedia.org/wiki/Suffix_tree
0
 

Author Comment

by:basil365
ID: 39982634
Thanks ozo - i'll take a look to see if i can apply these to the problem.
0

Featured Post

Understanding Linux Permissions

Linux for beginners: How to view the permissions associated with files and directories and also how you can change them.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Real-time is more about the business, not the technology. In day-to-day life, to make real-time decisions like buying or investing, business needs the latest information(e.g. Gold Rate/Stock Rate). Unlike traditional days, you need not wait for a fe…
When there is a disconnect between the intentions of their creator and the recipient, when algorithms go awry, they can have disastrous consequences.
This is a video describing the growing solar energy use in Utah. This is a topic that greatly interests me and so I decided to produce a video about it.
I've attached the XLSM Excel spreadsheet I used in the video and also text files containing the macros used below. https://filedb.experts-exchange.com/incoming/2017/03_w12/1151775/Permutations.txt https://filedb.experts-exchange.com/incoming/201…
Suggested Courses
Course of the Month11 days, 16 hours left to enroll

623 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question