
Thursday, June 12, 2014

Database Design Considerations

So, this is my 2nd blog, and it details various database design considerations that one should follow while designing a database. This is just a high-level overview of the guidelines that I normally follow. The list is not exhaustive, so there will be a few more considerations here and there depending upon the people involved and the environment in which a database is designed. What this blog doesn't include is any in-depth SQL syntax or full code walk-through (beyond a few small illustrative snippets), as the sole intention behind blogging on this topic is to share how the different bits and pieces fit together from a database development perspective.

Database design is one of the most important aspects of any application development life cycle. Performance issues caused by poor design & technical debt are not always visible while the system is small and in its early production days, but as the system grows over time they become more apparent and increasingly difficult to manage. It takes far more effort to improve the performance of a live application than to identify & fix issues during the early phases of the development cycle. It is always recommended to spend quality time (at least a couple of iterative cycles) thinking about, discussing & designing the database architecture up front rather than leaving it with a "let's see how it will perform" attitude.

OLTP/OLAP Systems & Normalization:

OLTP systems, also known as transactional systems, contain day-to-day data processing information. These systems are built on top of relational database modelling techniques with optimal use of normalization for faster insert/update/delete (I/U/D) operations. In these systems, maintaining data integrity & consistency with minimum redundancy takes precedence, so data is carefully normalized to make this task easier. Having said that, the tables still have to be optimally normalized, because an overly normalized database has its own performance bottlenecks when performing I/U/D operations.


Online Analytical Processing (OLAP) applications are designed for reporting & analysis purposes & are built using dimensional modelling techniques. Here the preference is to get data in the required format as quickly as possible, even at the cost of some redundant data & extra storage space. So some degree of de-normalization is commonly observed in OLAP systems, which means fewer joins among tables & faster retrievals.

The data between OLAP & OLTP systems is normally not in sync; the OLAP side usually lags behind by a few hours, or even a day in some cases. However, automated jobs run in the backend, typically nightly, to sync up the OLTP/OLAP systems, as these systems are designed to work that way anyway.

Based on the potential usage of the system, careful consideration is required to strike the right balance in the level of normalization.
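As a minimal illustration (the table and column names below are hypothetical), a normalized OLTP layout keeps customer details in one place and lets orders reference them by key, whereas an OLAP/reporting structure would typically carry some of those attributes redundantly on every row to avoid joins:

-- Normalized OLTP sketch: customer attributes stored once, referenced by key.
CREATE TABLE dbo.Customer
(
    CustomerId   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CustomerName VARCHAR(100)      NOT NULL,
    City         VARCHAR(50)       NOT NULL
);

CREATE TABLE dbo.CustomerOrder
(
    OrderId    INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CustomerId INT               NOT NULL,  -- reference only; no repeated name/city here
    OrderDate  DATE              NOT NULL
);

-- A de-normalized OLAP/reporting sketch would instead repeat CustomerName and
-- City on every order row so reports can be served without joins.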


Maintaining Data Integrity: 

There are certain business rules that can be designed & controlled at the database level, thereby providing the flexibility to not replicate them in the interfacing application or the feeder system. For example: log the current date & time as LastModifiedDate, log the current logged-in user's identity as LastModifiedBy, or require that the age of an employee be between 21 & 60, just to name a few. All of these rules at the database level help in maintaining data integrity. Primary keys, foreign keys, & default constraints are some of the out-of-the-box features that can be used to maintain data integrity.
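For instance, the employee age rule above can be pushed down to the database itself with a check constraint. This is just a sketch against a hypothetical dbo.Employee table with an Age column:

-- Enforce the business rule "age must be between 21 & 60" at the database level.
ALTER TABLE dbo.Employee
    ADD CONSTRAINT CK_Employee_Age CHECK (Age BETWEEN 21 AND 60);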

Primary Keys:
A primary key is something which uniquely identifies a record in a table. Spend good time choosing a primary key for a table, giving the future expansion plan due consideration. Once the primary key is decided and the database architecture is designed around it, it will take great effort to change it to something else in the future. Apart from this, a primary key also holds an index that speeds up search requests, so choosing one which doesn't change too often is another criterion for selecting the right key.
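A minimal sketch of such a key (hypothetical dbo.Product table): a narrow, system-generated integer that never changes once assigned, declared with an explicit constraint name:

CREATE TABLE dbo.Product
(
    ProductId   INT IDENTITY(1,1) NOT NULL,   -- surrogate key, never reused or updated
    ProductName VARCHAR(100)      NOT NULL,
    CONSTRAINT PK_Product PRIMARY KEY CLUSTERED (ProductId)
);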

Foreign Keys:
Another feature of data integrity is referential integrity, and foreign key constraints come in handy to enforce it. Putting foreign key constraints in place brings quality control as well as restricting data entry or deletion at the database level if the operation would result in orphaned and/or unaccounted-for records.
Consider a billing system where nobody would want to look at a bill or an order which doesn't have a customer associated with it. Proper use of foreign key constraints as per the business requirement will always take care of scenarios like this without much add-on validation.
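Assuming a hypothetical dbo.Bill table alongside the dbo.Customer sketch above, that rule could look like this:

-- A bill cannot reference a non-existent customer, and a customer with bills
-- cannot be deleted, so no orphaned bills can ever exist.
ALTER TABLE dbo.Bill
    ADD CONSTRAINT FK_Bill_Customer
        FOREIGN KEY (CustomerId) REFERENCES dbo.Customer (CustomerId);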

Default Constraints: 
It is always advisable to put a default constraint on a database field, if some business value can be derived from the default value, instead of making the field nullable & storing NULL values. This also takes the burden off the feeding system, which no longer has to worry about keying in information for NOT NULL fields when that information is not available at that very moment.
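As a sketch of the audit columns mentioned earlier (assuming LastModifiedDate and LastModifiedBy columns exist on the hypothetical dbo.Customer table), defaults can supply the values so the feeder system can simply omit them:

-- Stamp the current date/time and the logged-in user automatically.
ALTER TABLE dbo.Customer
    ADD CONSTRAINT DF_Customer_LastModifiedDate DEFAULT (GETDATE()) FOR LastModifiedDate;

ALTER TABLE dbo.Customer
    ADD CONSTRAINT DF_Customer_LastModifiedBy DEFAULT (SUSER_SNAME()) FOR LastModifiedBy;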

Optimal Use of Data Types:

Most important of them all is choosing the right data type. Remember, every data type consumes some storage space, and indexes take time to re-arrange data with every I/U/D operation, so choosing a data type of the right size does make a difference. Imagine how much difference every byte saved by choosing the right data type can make if we are dealing with millions & millions of records.

Considering the future expansion plan is another criterion that should be taken care of while deciding on data types. Choosing a data type which works at present but could potentially fail in the future is not good decision making. This is where data profiling comes into the picture; it has to be done carefully to analyse how the data is being used currently and how it may be used in the future.
For example, choosing the int data type to store 0s & 1s, or the datetime data type for storing a date of birth or joining date, is definitely not a good design approach. It would be wiser to use bit and date respectively.
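A sketch of that example (hypothetical dbo.Person table): on SQL Server, bit and date are considerably smaller than int and datetime for these values:

CREATE TABLE dbo.Person
(
    PersonId    INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    IsActive    BIT  NOT NULL,   -- flag stored in 1 byte (up to 8 bit columns share it) vs 4 bytes for int
    DateOfBirth DATE NOT NULL    -- 3 bytes vs 8 bytes for datetime, and no meaningless time portion
);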


Indexing Decisions:

Indexes do speed up data access, as they cut down the number of records that need to be scanned to return the requested data, but they do have a negative impact on insert, update & delete operations.


Creating an index is a trade-off: does the performance gain observed by creating a few indexes outweigh the cost of the extra storage required as well as the slowed-down I/U/D operations?

There is no golden rule for the number of indexes a table should have, and it also varies with whether you are designing an OLAP or an OLTP system, but trying out different combinations is the best way to figure out the optimum level of indexing your database tables should have.
For primary keys, Microsoft recommends using integers because SQL Server is optimized to index integer data types. Similarly, when joining multiple tables, it is advisable to create non-clustered indexes on the foreign key columns to enhance performance.
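Continuing the hypothetical billing sketch, SQL Server does not index foreign key columns automatically, so an index like this is typically added to support the join:

-- Supports joins/filters on Bill.CustomerId (the foreign key column).
CREATE NONCLUSTERED INDEX IX_Bill_CustomerId
    ON dbo.Bill (CustomerId);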

Utilizing Stored Procedures / Views / Triggers:
An enterprise database design is incomplete without the use of these. There are always workarounds available to avoid using any or all of them, but they are an integral part of any database project and should be used where appropriate, because the advantages of using them mostly outweigh the disadvantages of not using them.

Stored Procedures: Stored procedures are compiled only once & then stored in an executable form, so stored procedure calls are both efficient & quick. Apart from this, they also provide flexibility, scalability & easy maintainability. Remember, if you don't use stored procedures and instead embed all database-related business processing in the application, how difficult it becomes to make even the slightest change to that code block: it calls for code re-compilation & deployment of the entire application every time. Encapsulating business rules in stored procedures makes the code easier to maintain and also provides unit testability.
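A minimal sketch of such encapsulation (hypothetical procedure, tables and columns): the lookup logic lives in the database and can change without recompiling or redeploying the application:

CREATE PROCEDURE dbo.usp_GetCustomerBills
    @CustomerId INT
AS
BEGIN
    SET NOCOUNT ON;

    -- Query logic kept in one place; callers just pass the key.
    SELECT b.BillId, b.BillDate, b.Amount
    FROM   dbo.Bill AS b
    WHERE  b.CustomerId = @CustomerId;
END;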

Views: For an enterprise-level application, no one wants to expose each and every table & database object to everyone. This is where views come in handy. They provide the flexibility to hide business complexity, mask the sensitive data in your database, and expose only what you want others to see. They also provide code reusability. What if you build up a long query which contains join after join to get some data? You don't have to write the big SQL statement every time; just put it in a view & reuse the view wherever it is required.
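A sketch of such a reusable view (hypothetical tables and columns), where the join is written once and only the permitted columns are exposed:

CREATE VIEW dbo.vw_CustomerBillSummary
AS
    SELECT c.CustomerId, c.CustomerName, b.BillId, b.BillDate, b.Amount
    FROM   dbo.Customer AS c
    INNER JOIN dbo.Bill AS b
        ON b.CustomerId = c.CustomerId;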

Triggers: These can be used to maintain audit trail information. They are fully customizable & are executed automatically when data is inserted/updated/deleted, so in theory they are designed once and done for all. Apart from this, they can be used to encapsulate business rules at the database level, thereby preventing errors from bubbling through to the presentation layer or the feeder system.
Triggers do have their own disadvantages as well, like putting some extra overhead on the database server & being invisible to the outside world, which means at times it becomes difficult to track what went wrong with the data in the database. So the trade-offs have to be carefully analysed to decide whether or not to use them.
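A sketch of an audit-trail trigger (a hypothetical dbo.CustomerAudit table is assumed to exist): every update is recorded automatically, along with who made it and when:

CREATE TRIGGER dbo.trg_Customer_Audit
ON dbo.Customer
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- "inserted" holds the new row values for every updated row.
    INSERT INTO dbo.CustomerAudit (CustomerId, CustomerName, ModifiedBy, ModifiedOn)
    SELECT i.CustomerId, i.CustomerName, SUSER_SNAME(), GETDATE()
    FROM   inserted AS i;
END;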

Naming Conventions & Standards: 
Working on somebody else's code is almost always a nightmare if proper naming conventions & standards are not followed. Following proper naming conventions brings consistency and reflects the quality of the work being developed. In a multi-team environment, where more often than not the team that supports an application is different from the team that actually developed it, just imagine how difficult it becomes for the support personnel to dig deep into each table to figure out what each bit and piece is doing. Proper documentation & comments also play their part in the successful implementation of any data warehouse/front-end application.

In general, the bare minimum requirements for any database system are to:
1. Provide access to data in the agreed format as quickly as possible.
2. Maintain data in a consistent, accurate & reliable state.
3. Keep data redundancy to a minimum, which results in easier maintainability as well as lower storage space requirements.
4. Leave scope for scalability to meet future business requirements with minimum impact on the existing system.

One more tip: always remember to give unique & easily identifiable names to constraints; it makes it much easier to diagnose & understand where they are used when someone generates consolidated database scripts for any reason.
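For example (hypothetical constraint), an explicitly named check constraint is far easier to locate in a generated script than a system-generated name such as CK__Bill__Amount__3B75D760:

ALTER TABLE dbo.Bill
    ADD CONSTRAINT CK_Bill_Amount CHECK (Amount >= 0);   -- name says which table, column & rule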


Like I mentioned above, this is not an exhaustive list of guidelines but something I refer to quite often when designing my applications, so please do share your feedback & comments. I will be more than happy to amend/improve this post further.


Thanks,
Rishi Saxena



References:
  • Microsoft 
  • Wikipedia
  • Other Blogging sites
