suhaib79

asked on

C#: Execute thousands of complex validations and rules against thousands of database records

I am looking for a way to execute thousands of complex validations and rules against thousands (or more) of database records spanning multiple tables, within a few seconds of a user action, for reporting purposes.

For example (an imaginary business, just to illustrate):
Assumption:
Suppose we have hundreds of stores, each store has thousands of customers, and each customer has thousands of orders. There is a stores union that watches how the stores sell products according to the union's rules and regulations. The union has access to all store data, including customers and orders, and the stores' data is imported into a central database.

1- Our application has to report which orders violated the stores union's rules and regulations (thousands of regulations checked against each order item, potentially spanning multiple entities/tables), and these regulations can change from time to time.
Ex. The stores union wants to generate reports and graphs on the fly to check which registered stores sold cigarettes to a customer who is under 18 years old AND has a medical condition AND the store is allowed to sell cigarettes AND the customer bought water and milk in the same order… and… or… not… etc.

2- In addition, we need a way to monitor store data and notify users when a violation happens, along with the reason.
For example:
The stores union wants to be notified immediately after a registered store sells cigarettes to a customer who is under 18 years old AND has a medical condition AND the store is allowed to sell cigarettes AND the customer bought water and milk in the same order… and… or… not… etc.

So, what is the best way to build and run this kind of changeable validation, and to monitor for violations? Keep in mind that performance is critical; let's avoid BI tools.
 
Technologies:
C#, Oracle
ASKER CERTIFIED SOLUTION

johnsone
I wouldn't avoid BI tools, because it sounds like a normal BI application.

Also, thousands of rows are no problem at all (mind the order of magnitude!). It only gets interesting when you have to check 10 million rows or more.

From your description: a normal data warehouse should do it. So get your own copies of Kimball and Inmon and have fun with dimensional modelling.
suhaib79

ASKER

Thanks johnsone,

But maintaining thousands of validations that access multiple tables in triggers might be a nightmare for developers. It would be better to have some visual tool to build these kinds of validations and logic outside the application.
If you have the right developers, it isn't an issue. We built it completely table driven: add a rule to the table and it gets checked. If you want to build a tool to maintain the table of validations you can, but I wouldn't imagine changes would be that frequent.
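The table-driven approach described above can be sketched roughly in C#. This is only an illustrative sketch under assumptions: the `Order`/`Rule`/`RuleEngine` names are invented for the example, and in a real system the rule rows would be loaded from an Oracle table rather than hard-coded (so adding a regulation means adding a row, not redeploying code):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch of a table-driven rule check. In production the rule
// rows would come from a rules table in Oracle; here they are hard-coded so
// the sketch is self-contained. All type and member names are assumptions.
public record Order(int OrderId, int CustomerAge, bool HasMedicalCondition,
                    List<string> Items, bool StoreLicensedForTobacco);

public record Rule(int RuleId, string Description, Func<Order, bool> Violates);

public static class RuleEngine
{
    // Stand-in for "SELECT * FROM rules" plus building each rule's condition.
    public static List<Rule> LoadRules() => new()
    {
        new Rule(1, "Tobacco sold to customer under 18",
                 o => o.Items.Contains("cigarettes") && o.CustomerAge < 18),
        new Rule(2, "Tobacco sold by unlicensed store",
                 o => o.Items.Contains("cigarettes") && !o.StoreLicensedForTobacco),
        new Rule(3, "Tobacco sold to customer with a medical condition",
                 o => o.Items.Contains("cigarettes") && o.HasMedicalCondition),
    };

    // Check every order against every rule; each hit becomes an
    // (orderId, ruleId) pair, ready to insert into a violation log.
    public static List<(int OrderId, int RuleId)> Check(
        IEnumerable<Order> orders, IReadOnlyList<Rule> rules) =>
        orders.SelectMany(o => rules.Where(r => r.Violates(o))
                                    .Select(r => (o.OrderId, r.RuleId)))
              .ToList();
}
```

Adding a regulation is then a data change (a new rule row), not a code change, which is the point being made about maintainability.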
Thanks ste5an,

From my limited knowledge of BI, I think we would need to build an ETL process to extract data based on these validations and logic. If I am right, do we need to modify the ETL each time the logic changes? If I am wrong, where would these validations and logic be applied?
Validation and ETL are two different processes in a data warehouse approach, given the information you've provided.

Your described validation rules are data warehouse queries.

Either T-SQL on the plain dimensional model or DAX/MDX on tabular/multidimensional analysis models.
First of all, do these validations need to be provided in real time, or as a nightly batch, or a weekly job, etc.?
I would advise against real time, since, assuming hundreds of stores with point of sale systems, running these validations with triggers would likely bring the system to its knees. A couple of simple validation checks is one thing, hundreds or thousands is something else.

A nightly batch job that runs the rule checks is very feasible, and is somewhat similar to what we do. A logging table containing the pertinent information would be my recommendation. Do you already have a data warehouse in place? If not, it would probably be something to look at in the near future, but I would assume a DW solution is not within the scope of this problem, as that in and of itself is a major undertaking (e.g. the scope, design, testing, and implementation of a DW is not an overnight task :-))

Avoiding BI, and assuming no DW, I would first create the logging table that contains the pertinent information (e.g. date/time of sale, date/time of job run, store, customer, order numbers, infraction). I would also create an infraction table with all possible infractions. That way you can add new ones easily, and all you have in the log table is an FK back to the infraction.

I would look into a way to break down infractions into logical groups, and create a stored procedure for each group. There are likely going to be a lot of CASE statements in your code. Of primary concern is to have proper indexes on your base tables, both for JOINs and for conditions in your WHERE clauses. Don't be afraid to have (actually, with violations logically grouped, you will) many INSERT/SELECT statements within each procedure, each one handling a different violation.
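As a rough C# sketch of what one of those grouped procedures does: in reality this would likely be a PL/SQL stored procedure running one INSERT/SELECT per infraction into the log table; all names here, and the second infraction, are invented for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of one "logical group" of infraction checks. Each check mirrors a
// single INSERT/SELECT statement; the log rows carry only IDs plus timestamps,
// with InfractionId acting as the FK into the infraction table.
public record Sale(int StoreId, int CustomerId, int OrderId, string Item,
                   int CustomerAge, DateTime SaleTime);

public record LogEntry(int StoreId, int CustomerId, int OrderId,
                       int InfractionId, DateTime SaleTime, DateTime RunTime);

public static class NightlyChecks
{
    // One method per logical group of infractions, mirroring the
    // one-stored-procedure-per-group layout suggested above.
    public static List<LogEntry> AgeRestrictionGroup(List<Sale> sales, DateTime runTime)
    {
        var log = new List<LogEntry>();

        // Infraction 1: tobacco sold to a customer under 18.
        log.AddRange(sales
            .Where(s => s.Item == "cigarettes" && s.CustomerAge < 18)
            .Select(s => new LogEntry(s.StoreId, s.CustomerId, s.OrderId, 1,
                                      s.SaleTime, runTime)));

        // Infraction 2: alcohol sold to a customer under 21 (invented example).
        log.AddRange(sales
            .Where(s => s.Item == "beer" && s.CustomerAge < 21)
            .Select(s => new LogEntry(s.StoreId, s.CustomerId, s.OrderId, 2,
                                      s.SaleTime, runTime)));

        return log;
    }
}
```

Each `AddRange` corresponds to one INSERT/SELECT, so a new infraction in an existing group is just one more small, individually tunable statement.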

We run something very similar (e.g. trading violations within the various securities markets) on a nightly basis: numerous tables, many joins, millions of rows, and terabyte-plus databases.

Tuning is going to be the biggest issue. Don't try to do everything in one statement. Do only the fewest number of JOINs necessary to get the job done. Keep the log table to a minimum. Only FK references when possible. For example, storeID, customerID, orderID, violationID and time stamps. Multiple statements (e.g. INSERT/SELECT) per procedure. You can then create views from this table with the appropriate JOIN statements (or create them on-the-fly with ad-hoc queries).
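The "narrow log, wide view" idea above can be sketched like this: the log keeps only foreign keys, and a view-style query joins the readable names back in at report time. All names here are illustrative assumptions (the dictionaries stand in for the store and violation reference tables):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The log row stores only IDs plus a timestamp (cheap to insert, small to
// index); descriptive text is joined back in only when someone reads a report.
public record LogRow(int StoreId, int CustomerId, int OrderId,
                     int ViolationId, DateTime RunTime);

public static class ReportView
{
    // In-memory equivalent of a reporting view that JOINs the log to the
    // store and violation reference tables (here, dictionaries keyed by ID).
    public static List<string> Render(IEnumerable<LogRow> log,
                                      IReadOnlyDictionary<int, string> stores,
                                      IReadOnlyDictionary<int, string> violations) =>
        log.Select(r => $"{stores[r.StoreId]}: order {r.OrderId} -> {violations[r.ViolationId]}")
           .ToList();
}
```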

Have fun!
Avoiding BI, assuming no DW

In that case, take a look at StreamInsight.
Don't reinvent the wheel. You need a data analytics tool. Of course you can create your own, but it will take time and money.
Some validations can also be done with database triggers, but in general you'll really need a data analytics tool.