Link to home
Create AccountLog in
Avatar of koshari
koshariFlag for United States of America

asked on

Relationship of DB structure, stored procedures, ETL, data warehouse, data content

There is a dispute among technical team members in the group I'm in. I want the parties to all understand each other's view points, but in the end a decision is usually made by the senior technical lead.

The discussion has concluded that the ETL designers are not getting sufficient consideration in the design. The design of the Database structure, ETL, stored procedures, and triggers need to be harmonious. So the group is building a complex design review process, and I'm not sure how that will affect developers and testers.

Would any of the wonderful experts reading this care to comment? I will award points for clarity, logic, and how pertinent the response is, and expect to have to divide them among several experts.
SOLUTION
Avatar of Kobe_Lenjou
Kobe_Lenjou
Flag of Belgium image

Link to home
membership
Create an account to see this answer
Signing up is free. No credit card required.
Create Account
SOLUTION
Avatar of Zberteoc
Zberteoc
Flag of Canada image

Link to home
membership
Create an account to see this answer
Signing up is free. No credit card required.
Create Account
ASKER CERTIFIED SOLUTION
Link to home
membership
Create an account to see this answer
Signing up is free. No credit card required.
Create Account
Avatar of koshari

ASKER

Kobe_Lenjou  says, "...Performance and design are issues to be handled by the poor DBA, who's hands are tied because of the extremely bad design of the ETL guys..." The notion of the review process is to make sure the design isn't bad because the ETL guys didn't get feedback from the poor DBAs who thought the ETL design stinks.

Zberteoc, the ETL is used frequently to add/update data in the warehouse, and is not limited to an initial load. What logic is in the ETL vs. what is in the stored procedures is part of the debate.

dbmullen, you hit the nail on the head: "...is no right answer.  Each problem, application, source database is different." "...one of those answers is WHAT are you going to do with it?  reporting, transactions, backup, one-time history, data sync.. "

My question said that we were "... building a complex design review process..." If that smells of a standard, then it is the process that is a standard.

The application of all the potential solutions you mention are the topic of the design that should result, be reviewed, and revised as a result of the review by those who contribute and have a stake.

So I reiterate my point: "The group is building a complex design review process, and I'm not sure how that will affect developers and testers." I thank each of you who have commented, and you three have not refuted the notion of a review and revision process, you have reinforced it.

Would anyone care to take the testers and developer's viewpoints into account?
SOLUTION
Link to home
membership
Create an account to see this answer
Signing up is free. No credit card required.
Create Account
SOLUTION
Link to home
membership
Create an account to see this answer
Signing up is free. No credit card required.
Create Account
"Zberteoc, the ETL is used frequently to add/update data in the warehouse, and is not limited to an initial load. What logic is in the ETL vs. what is in the stored procedures is part of the debate."

I dind't say it was, au contrary I said it can be that or an ongoing process. The point is that the database desigh is a sepparate thing that ETL and from you question it didn't seem you were aware of that.

The question is way too general anyway and little bit confusing. Every answer given here has it points that can be taken into consideration. A research on the subject of database desgn and ETL can give you far more insight then couple of fast and short answer. The subject is simply to vast for EE.
"Zberteoc, the ETL is used frequently to add/update data in the warehouse, and is not limited to an initial load. What logic is in the ETL vs. what is in the stored procedures is part of the debate."

I dind't say ETL was only a one time process, au contrary I said it can be that or an ongoing process. The point is that the database design is a separate thing from ETL and in the question you didn't seem you were aware of that.

The question is way too general anyway and a little bit confusing. Every answer given here has its points that can be taken into consideration. A research on the subjects of database desgn and ETL can give you far more insight then few fast and short answer. The subject is simply to vast for EE.


Then you say:
"What logic is in the ETL vs. what is in the stored procedures is part of the debate."

What exactly is the meaning of this? You can't put ETL one one side and stored procedures on the other side. ETL is a process that can have a lots of different components including stored procedures.
Avatar of koshari

ASKER

koshari: "What logic is in the ETL vs. what is in the stored procedures is part of the debate."

Zberteoc:: What exactly is the meaning of this? You can't put ETL one one side and stored procedures on the other side. ETL is a process that can have a lots of different components including stored procedures.

So if I understand what you are telling me is that If the design specifies that the Transform will expect a Stored Procedure to perform a function, there is a dependency formed between the ETL and Stored Procedure that would be better worked out during design than during build - agree? Is this not one of the reasons to have an agreed design spec?

dbmullen: ETL either works or it doesn't. So you are meaning that incorrect data in the database would be the fault of ETL, and that other components (3rd party tools with jdbc connections, stored procedures, SQL, the data itself...) are blameless? Do tests need to be constructed to verify each dependent component working together?

This discussion leads me to believe that the design process not only needs to yield an agreed-to (harmonious) design, but also an agreed-to test approach (how and what to test, not the specific test cases).
SOLUTION
Link to home
membership
Create an account to see this answer
Signing up is free. No credit card required.
Create Account
Avatar of koshari

ASKER

Thanks for kicking in your opinions. In summary:
1. A DBA, Test Manager, and ETL Developer are participants in a design. If the EXTRACT is complex, someone familiar with the source does it with views and/or procedures.
2. Use a good design to build a database an then use the ETL to populate it. Use
triggers, snapshots, queues, truncate/reload, soa, web-service, database links, oracle gateway, udb information intagrator, sql linked-servers, ftp flat files, logical standby, disk mirrors, SSIS, home-grown code, and 3rd party tools with jdbc connections where appropriate.
3. Consider in the design: Why are you duplicating the data somewhere else; What are you going to do with it once it's there; When does it need refreshed; How often does the "source" data change?