
Re-design an existing app's DB

If you need to completely re-design an existing app's database, how would the effort (as a percentage) be distributed among the following?

*Analyzing the existing structure, schema, data flow, business rules, and needed enhancements
*Designing the new schema
*Creating new code for data access
*Creating new reports
*Migrating data
*Testing
*Documenting

Is any other critical piece missing from the above?

thank you.

Commented:
It all depends on your application design, how complex the data is, how much data the database holds, and so on. The effort should be distributed based on those factors.

If the design is complex, then a larger percentage of the effort goes there; if the data volume is high, then you need to put more effort into that. So it all depends on the above factors.

At first glance, the points you have specified look perfectly fine.

Do let me know if you need any other information.
Top Expert 2014
Commented:
It also depends on the database, E/R, and UML tools you can use to leverage your efforts.
Commented:
To add to geek_vj's comments:
*Analyzing the existing structure, schema, data flow, business rules, and needed enhancements

You need to do at least some review of the existing database design and some analysis of how well it is performing.  If there are performance issues (which is often the case if you are considering redesigning it ;-), then you need to try to figure out whether this is due to the engine selected, the indexes that have been included, the way the application has been coded to access the data, or the basic design of the database itself (or some combination of all of those).
You definitely need to know the current and, preferably, any desired Business Rules that will come into play.  It may be that changes in the Business Rules and, therefore, in the expectations as to how the database is used are the cause of performance issues, or can be addressed in ways that will improve performance.
The data flow also needs to be analyzed, for the same basic reason as the Business Rules . . . it may be that there was one kind of data source in the past, e.g. someone was sending in spreadsheets with data, but now there is a different one, e.g. a CSV file being FTP'ed to a location.  The original data source may have pretty much dictated a particular approach to getting the data and, while that approach is still functional, there may now be a better option due to the newer data source.  Again, this could provide enhanced performance.
*Designing the new schema

As mentioned, before you design a new schema, learn what the old schema is . . . and make a decision about designing a new schema based on that information.  From where I sit, and with the ability I have to see your old schema, I can't really tell you whether you need to or how much effort you need to put into designing a new schema.  However, you probably have a better view.  ;-)
*Creating new code for data access
Also, whether or not to create new code for accessing the data may well depend on the existing code.  If your apps are all creating dynamic SQL from scratch, moving those data accesses to stored procedures may offer a lot of benefits.  It will take some effort to determine where all the dynamic SQL is being generated, and then not only which SPs are needed but also to generate their code (thus the need for analyzing the existing database and apps ;-).  However, that would very likely be a worthwhile investment of your efforts.
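To make that concrete, here is a minimal hypothetical sketch of moving one dynamic SQL access into a stored procedure (the table, column, and procedure names are invented for illustration, not taken from the asker's database):

-- Before: the app concatenates SQL strings, e.g.
--   "SELECT * FROM Orders WHERE CustomerID = " + custId
-- After: the app calls a stored procedure instead.
CREATE PROCEDURE dbo.usp_GetOrdersByCustomer
    @CustomerID INT
AS
BEGIN
    SET NOCOUNT ON;
    SELECT OrderID, OrderDate, TotalAmount
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;
END;
GO
-- App-side call:
-- EXEC dbo.usp_GetOrdersByCustomer @CustomerID = 42;

Besides giving you one place to tune each query, this also gets you plan reuse and takes string-built SQL out of the apps.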
*Creating new reports
At least initially, you can probably create views that can be used to drive the existing reports, so you need to determine whether the existing reports provide what is needed.  If so (or if, say, 85% of the need is answered by existing reports), then I'd set up the views to provide the data in the form expected by the reports and I'd fight the reports battle a little later. ;-)
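As a hypothetical sketch of that idea (view and table names invented for illustration): if an existing report expects the old flat shape, a view can present the redesigned tables in that shape:

-- The old report expected one wide "customer orders" table;
-- the redesigned schema splits it in two, so a view re-flattens it.
CREATE VIEW dbo.vw_CustomerOrders
AS
SELECT c.CustomerName,
       o.OrderDate,
       o.TotalAmount
FROM dbo.Customers AS c
JOIN dbo.Orders AS o
    ON o.CustomerID = c.CustomerID;
GO
-- Point the existing report at dbo.vw_CustomerOrders instead of the old table.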
*Migrating data
I would set up an ETL process to accomplish this.  Essentially, the initial use of the ETL process would be to create the test bed database for accomplishing any redesigns of schemas, data access, etc.  I would approach it by moving data to staging tables and then extracting the data from the staging tables and populating the tables in the new database.
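A minimal sketch of that two-step flow, assuming hypothetical staging and target tables (none of these names come from the asker's database):

-- Step 1: land the source data as-is in a staging table.
INSERT INTO staging.Customers_Stage (SourceCustomerKey, CustomerName, Region)
SELECT CustomerKey, CustomerName, Region
FROM OldAppDB.dbo.Customers;

-- Step 2: transform and load into the redesigned table,
-- skipping rows that are already present.
INSERT INTO dbo.Customers (CustomerName, RegionID)
SELECT s.CustomerName, r.RegionID
FROM staging.Customers_Stage AS s
JOIN dbo.Regions AS r
    ON r.RegionName = s.Region
WHERE NOT EXISTS (SELECT 1 FROM dbo.Customers AS c
                  WHERE c.CustomerName = s.CustomerName);

Keeping the land step and the transform step separate means you can rerun the transform against the staged data without going back to the source.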
When the testing is done, you should be able to say that:
  • The ETL process works correctly;
  • Any redesigned data access works correctly;
  • Any views designed for any existing reports that will be retained (even if only for a "transition period") are correct.
At that point, you have the ETL for migrating the existing data to the new database. ;-)
*Testing
Hmmm, how important is it that this redesign and migration are done correctly?  You need to design the tests and test based on that answer.  (Remember, failure of the new database to perform as expected [or at all? :-O ] may be directly related to the security of your job. ;-)
*Documenting
I use what I refer to as the "Squashed on the Freeway" rule with regard to documentation.  That rule, basically, is "Provide at least enough documentation so that, if you are pretty much squashed on the freeway and are in a coma in an ICU, the database/apps world won't come to an end also."
However, I recommend documenting your analysis of the old database/system as you are analyzing it (maybe not in full detail, but enough so that you know what it does, how it does it, and any new expectations regarding improvement in or enhancements of performance).  I also recommend documenting your planned approach to the new database, not only in the sense of the design and expectations but also how you plan to migrate and test.  Finally, I would recommend documenting the process and events of the test migration and performance testing and, especially, of the final design, interfaces/data access, and migration.
As to how much time/effort to spend where . . . well, that is going to depend on how much is needed.  However, I will say this much: the better the design of the whole process (including the test migration and testing), the fewer the surprises along the way; and the fewer the surprises, the less trouble along the way.
This is probably not a weekend activity in the sense that sometime Friday you get the go-ahead to do this and Monday morning it is completed and ready to go. ;-)  If the database is of any size at all, this needs to be carefully thought out, planned, and scheduled.  Also, everyone (i.e. all stakeholders, whether business, IT, or whatever) needs to be not only aware of this activity but, at least initially, included in the discussions (if only by department heads), and they need to understand and buy into the process.  (You don't want someone, about halfway through the migration, to start complaining that they didn't know it was happening, that it would take this long, or that they and/or their needs were never considered.)

Author

Commented:
geek_vj, the complexity and amount of data are not proportional, are they? i.e. huge data does not necessarily mean complex, and vice versa, right?

The DB I am looking at is 400 GB, in an OLAP environment. The concern is too much redundant data: the data refresh is putting some of the data 10 times across the database. So the logic may not be complex, but the database is getting huge.

Author

Commented:
aikimark, by E/R, do you mean Entity Relationships? (or the lack of them)

Can you please suggest an example of an efficient UML tool that can help in the redesign of a SQL Server database?

Author

Commented:
8080_Diver, you are amazing - thanks so kindly for taking the time to investigate each point!! Thanks for the examples and the humor also; I laughed out loud at least a couple of times!!

Thanks for confirming, along with geek_vj, that I am not missing any critical points in the big picture.

The performance issue is primarily the excessive time to refresh.

Just by looking at the schema (you mean the ERD?), how can you determine the complexity or how much effort is needed to redo it? Can you please suggest some thoughts there?

Regarding code: it is all in SPs, but with a lot of cursors - so that is a problem, too, right?

Reports also use SPs. You were suggesting views for those that do not need parameters to be passed, right?

by 'test bed database', do you mean the staging tables?

in your experience, what is the least amount of time you have taken to totally redo an app's DB? and the longest time? (just to get an idea of the kind of complexities and the time it takes for experts to 'squash' it)
Top Expert 2014
Commented:
@anushahanna

>>do you mean Entity Relationships? (or the lack of them)
Yes.  E/R is entity-relationship modeling software (as well as a DB modeling approach).

>>Can you please suggest an example of an efficient UML tool that can help in the redesign of a SQL Server database?
Here are some packages worth exploring:
* Sparx Enterprise Architect
http://www.sparxsystems.com/

* Red Gate software

* Toad software
http://www.quest.com/database-management/

* Various products from Embarcadero
http://www.embarcadero.com/products/er-studio (specifically)
http://www.embarcadero.com/products
Commented:
>>The performance issue is primarily the excessive time to refresh.
Define refresh.  What is the source for the data being used to refresh the database?

>>Just by looking at the schema (you mean the ERD?), how can you determine the complexity or how much effort is needed to redo it? Can you please suggest some thoughts there?
Well, you originally referred to the schema, so did you mean the ERD?
As for how one determines the complexity of the Schema/Database, one way is to consider things like the following (a catalog query along these lines is sketched after the list):
  • The number of FK/PK relationships in general;
  • The number of FK/PK relationships that occur in any given table;
  • The number of instances of cross-reference tables that handle the Many-to-Many conceptual relationships;
  • The number and variations of constraints in the various queries.
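As a hedged illustration of the first two bullets on SQL Server (this assumes SQL Server 2005 or later, where the sys.foreign_keys catalog view is available):

-- Count the foreign-key relationships between each pair of tables.
SELECT OBJECT_NAME(fk.parent_object_id) AS ChildTable,
       OBJECT_NAME(fk.referenced_object_id) AS ParentTable,
       COUNT(*) AS FKCount
FROM sys.foreign_keys AS fk
GROUP BY fk.parent_object_id, fk.referenced_object_id
ORDER BY FKCount DESC;

A database where this returns almost nothing (as appears to be the case here) tells you the relationships live only in the queries, not in the schema.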
>>Regarding code: it is all in SPs, but with a lot of cursors - so that is a problem, too, right?
Cursors will almost guarantee performance issues.  
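As a rough illustration (table and column names invented for this sketch), the usual cure is to replace row-by-row cursor logic with a single set-based statement:

-- Cursor version: open a cursor over dbo.Orders, FETCH each row,
-- and run one UPDATE per iteration -- thousands of round trips.
-- Set-based version: one statement, one pass over the table.
UPDATE o
SET o.Status = 'Expired'
FROM dbo.Orders AS o
WHERE o.Status = 'Open'
  AND o.OrderDate < DATEADD(DAY, -90, GETDATE());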

>>Reports also use SPs. You were suggesting views for those that do not need parameters to be passed, right?
Depending on how much is involved with migrating the database and how urgent the need is to get some of the other aspects working better, you may need to consider whether or not you want to rewrite the SP's that the reports depend on.  The other option is to create views that simulate the original tables (even knowing that parameterized queries against them and constraints used with them may mean a slight performance hit on them).  

>>by 'test bed database', do you mean the staging tables?

No, what I was referring to is a test database.  Prior to migrating for real, I would highly recommend migrating to a test environment so that you can work any kinks out of not only the migration ETL but also the ETL that is set up to handle data updates, and the reports.  You should also test the replacement stored procs and any other changes you are making.  As I said a bit later on in my reply, I would test as though your job depends on it, because it just may. ;-)
>>in your experience, what is the least amount of time you have taken to totally redo an app's DB? and the longest time? (just to get an idea of the kind of complexities and the time it takes for experts to 'squash' it)
Least amount of time: 4 hours
Most amount of time: Roughly 3 man-years
The former was a relatively simple database, needless to say, while the latter was a much more complex database that involved a team of 3 of us who were intimately familiar with the initial database.
You mention 400 GB; however, the quantity of data involved is not nearly as big an issue as the number of tables, indexes, SP's, triggers, and UDF's.  If you have 400 GB in 6 tables, that is entirely different from 400 GB in 400 tables.  For the most part, I wouldn't worry about the size of the database in terms of GB.  Instead, I would worry about the number of database objects to be converted, the complexity of their relationships to each other, and the complexity of the transactions affecting them.

Author

Commented:
8080_Diver

thanks for your helpful input:

refresh: the SPs that load the data from several sources into the reporting database.

schema-erd: Isn't the ERD the best way to get a quick picture of the database schema in general? The other way to do it is through system views (like sysobjects, etc.), right?

Can you please look at the attached ERD and suggest what category you would see it under?

As you can see, there are hundreds of tables, but also a lot of redundant data, because there are no relationships. So, regardless of size, we have a problem, right?!

"Depending on how much is involved with migrating the database and how urgent the need is to get some of the other aspects working better, you may need to consider whether or not you want to rewrite the SP's that the reports depend on."

Do you mean: if there is less time, don't mess with the code change, and use the other option of views? With the views, inserts should be OK, but what about deletes and updates? Will they be as consistent as inserts?

In your case where it took 3 man-years to do a re-do, how many tables/SPs had you touched/changed? What was the complexity that needed extra care and attention in the project?

thanks again for your very thoughtful answers...

Author

Commented:
forgot the attachment..
erd.jpg
Top Expert 2014

Commented:
where are the relationship lines?  I only see a few between tables in the lower left corner of the diagram.
Commented:
anushahanna,
I don't see an attached ERD . . . maybe I am missing something though.
When you refer to the refresh/loading of data from multiple sources, are you completely reloading or are you just working with deltas/updates?
With situations where you have hundreds of tables, IMHO, there basically isn't a good way to get a handle on the "design" by trying to look at the whole thing.  The approach I usually try to take is to pick a table, preferably one that appears to be reasonably important, and then, in effect, pick up the spider web by that point.  Then I identify the key relationships associated with that table.  Then I would move to the next primary table.  Eventually, you have to begin to take these "snowflakes" and figure out their interactions with each other.
That process is going to take time and, if you don't have much time, you may have to approach the problem using "band-aids" rather than major surgery.  In other words, if you are being pressed to get this done in, say, a month, then you need to treat the symptoms of the problem and plan how you are going to treat the problem itself later.  If you only have enough time to take a segment of the database, redesign it, and then build the views to simulate the original design, then I would take that route; however, you need to move all access to the database to stored procedures, ensure that those stored procs accept the old-version parameters, and then work on the new-version design.  You can progressively modify the database design in this manner but, at some point, there will come a time when you need to revise/redesign the applications.
In the case where it took the 3 man-years, we had something like 20 tables and the apps were all using dynamic SQL when we started the project.  The first step we took was moving all access to the database into SP's.  Once we had control of the access outside of the applications, we then moved on to the database design issues.  We basically took the approach I described above in that we identified the worst subset of tables and worked out a new design for them.  That resulted in our having to use different table names, which were then used to create views that matched the old table names.  We then created stored procs that accepted the same parameters that were previously being provided and simply handled the processing of the data differently, so that any INSERT/UPDATE/DELETE activity correctly impacted the new tables.  That way, any impacts on the database were, in effect, replicated through the views so that everything looked the same to the apps.
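A hypothetical sketch of that pattern (all names invented here; the poster routed writes through the stored procs, but an INSTEAD OF trigger is another way to redirect them when the view no longer maps one-to-one onto a single table):

-- The redesigned table gets a new name...
CREATE TABLE dbo.Customer_New
(
    CustomerID INT IDENTITY(1,1) PRIMARY KEY,
    CustomerName VARCHAR(100) NOT NULL,
    RegionID INT NOT NULL
);
GO
-- ...and a view keeps the old name alive for the apps
-- (assumes the original dbo.Customer has been renamed
-- out of the way, e.g. with sp_rename).
CREATE VIEW dbo.Customer
AS
SELECT CustomerID, CustomerName, RegionID
FROM dbo.Customer_New;
GO
-- Writes against the old name are redirected to the new table.
CREATE TRIGGER dbo.trg_Customer_Insert
ON dbo.Customer
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.Customer_New (CustomerName, RegionID)
    SELECT CustomerName, RegionID
    FROM inserted;
END;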
The final stage in the process was to figure out what the apps should have been doing and then reconfigure them.  That resulted in more SP revisions and a good bit of code changes in the apps. ;-)
I have to agree with aikimark, that looks a lot more like an ED than an ERD, i.e. an Entity Diagram vs Entity Relationship Diagram. ;-)
As a first step, I would look at those cursors and see if I couldn't eliminate them.  
Top Expert 2014

Commented:
Since this is an image, the details pixelate when zooming in to try and understand what is in the tables.

Author

Commented:
>>where are the relationship lines?  I only see a few between tables in the lower left corner of the diagram.

aikimark, yes, that is what it is!! So you would say it is certainly worth a re-design, if it is to be an enterprise DW reporting DB, right?

Author

Commented:
8080_Diver, thanks for your helpful post.

>>are you completely reloading or are you just working with deltas/updates?

only deltas, but it is not efficient - so looking at better options.

I'll focus on your idea of identifying key snowflakes - perhaps a star to begin with?

This is only a reporting DB, so all dynamic SQL is within SPs only.

I may be able to buy time till year end. So would you recommend a more holistic approach (major surgery!)? And do you feel a re-design is OK, or is a totally new design from scratch needed?

In your 3 man-years, what would have influenced a suggestion to create the design from scratch instead of re-doing it?

"different table names which were then used to create views that matched the old table names." - is this the "INSTEAD of TRIGGER" approach?


Top Expert 2014
Commented:
>>...so you would say it is certainly worth a re-design...

No.  I'm saying that what you are showing us lacks inter-table relationships and the image you posted lacks field information when zooming.

If this is the basis of your redesign decision, then I would say that you should not attempt it without some tools. Don't even attempt to redesign this without sufficient information.

Author

Commented:
aikimark, I meant to confirm that there are indeed no inter-table relationships at all as of now, except those bare few at the bottom left.

this is meant to be a DW hosting the reporting DB
Commented:
>>In your 3 man-years, what would have influenced a suggestion to create the design from scratch instead of re-doing it?
Essentially, that is what we undertook.  The reason we took the approach that we did was that, had we completely redesigned the entire process, it would have taken even longer.  By redesigning the database but using views to hide the fact that it was redesigned from the user-interface application, we could do all of the behind-the-scenes work on segments of the database while still leaving the overall application intact.  It took a little longer because we elected to do the redesign as a "behind the scenes" change and to handle the conversion from the old ways to the new ways via the stored procs; however, the other choice would have been to freeze all development on the existing product while we took the time to redo the database and the product.
>>this is meant to be a DW hosting the reporting DB
This is one of the fallacies that is often thought to be fact.  A Data Warehouse actually requires the relationships as much as an OLTP database does.  Whether you define the relationships in the diagram or by means of your queries, they exist.  However, if you rely on the "intuitive knowledge" of the developers for the proper relationships to be used, then you are more subject to erroneous or absent relationships.  If, on the other hand, you clearly define the relationships in the Database Diagram, they are not only there for everyone to see but are enforced/enforceable by the database.
When a report has to be developed, somehow, somewhere, someone has to figure out what data has to be reported and how to connect the various tables in order to do that.  The only conceivable way that using relationships can be avoided is if the database has a table for each report and that table contains everything that the report will require.  If that is the case, then the database is going to suffer from serious bloating because the same data will be stored in multiple tables.
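For illustration (the fact and dimension names here are hypothetical), declaring such a relationship in a SQL Server data warehouse is one statement, and it makes the join path both visible and enforceable:

-- The relationship now lives in the schema, not just in developers' heads.
ALTER TABLE dbo.FactSales
ADD CONSTRAINT FK_FactSales_DimCustomer
    FOREIGN KEY (CustomerKey)
    REFERENCES dbo.DimCustomer (CustomerKey);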

Author

Commented:
Thanks for your wise input, indeed.