Database Reengineering: Some Examples

| Vasyl Soloshchuk

Database Reengineering

There are several situations in which database reengineering may be required:

  • Software performance problems caused by an inefficient database.
  • Changed requirements to software products demanding updates to data structure.
  • Significant differences in the latest database version compared to earlier versions, where these differences demand changes in the database.

When database reengineering is on the agenda, there are a number of possible approaches. These are based on the following changes to the database:

  1. Redesign. If the existing data structure is deficient, its redesign can result in higher performance.
  2. Migration to or addition of a database of the same type. Using another database that has more acceptable indicators compared to the existing one can improve performance of the software system or solve other issues.
  3. Migration to a database of another type. In some cases, using a non relational database instead of a relational one may solve performance problems. In other cases, a relational database might be more efficient.

Today, I would like to discuss database reengineering, and to provide some examples of this.

Redesign

If, when analyzing a database structure, you see that it is not relevant to the first and second normal forms, it should be redesigned. This means that you may need to change essences, tables, or columns, or carry out decomposition and aggregation.

Under database redesign, significant code upgrades may be required. Any changes or redesign of the data structure will cause code modification on the server side. This is a complicated task that becomes easier if you are able to benefit from an ORM (e.g., Hibernate). Then, when you modify code, the database is automatically renewed and all the changes are represented there.

Example 1

A scheme in a database included about 450 tables. Data profiling showed problems with some of these tables. During reengineering, we provided the following changes:

  • added some columns and removed others;
  • cut all essentials that had been designed incorrectly, and added improved essentials instead.

As a result, the scheme included more than 450 tables. The system performance was improved as it was able to work more quickly thanks to the addition of new essentials (which included cascade operations, indexes, connections between tables, and primary keys), and the improvements to incorrect essentials (for example, where possible, simpler data types had been implemented).

Example 2

In this case, database reengineering was needed not due to performance problems, but to changes in business logic, which had caused new data and data types to appear. The database structure had thus changed, and some tables appeared to be redundant.

As a result of reengineering, the modified database became relevant to the new business logic.

Migration to or addition of a database of the same type

If some other database has better indicators, migrating to it may result in your software working more efficiently. The following table compares some of the indicators of popular relational databases.

Database Max DB size Max table size Max row size Average execution time(5) Average CPU utilization(5) Average memory usage(5)
MySQL Determined by operating system (1) MyISAM storage limits: 2TB;

Innodb storage limits: 64TB (1)

64KB (1) 1222.3 ms 24% 12.5 MB
Oracle Determined by operating system (2) 4GB * block size (2) 8KB (2) 1275.3 ms 34% 17.9 MB
SQL Server 524,272TB (3) 524,272TB (3) 8060 bytes (3) 1101.3 ms 27% 13.2 MB
BD2 248 bytes (4) 128TB (4) 32KB (4) 1032.3 ms 28% 22.6 MB

Data sources: (1) dev.mysql.com; (2) docs.oracle.com; (3) msdn.microsoft.com; (4) www.ibm.com; (5) Youssef Bassil. A Comparative Study on the Performance of the Top DBMS Systems. Journal of Computer Science & Research (JCSCR) – ISSN 2227-328X Vol. 1, No. 1, Pages. 20-31, February 2012

Depending on which parameters are the most important, the most suitable database can be selected.

Another reason for adding a new database to your software: the new software client stores data in a different database from the original. Thus, to enable the new client to use your software, the software product should work with the database.

Example 3

A software product was developed to process data stored in a Microsoft SQL Server. A new client had all data stored in PostgreSQL, and used the same data in own application, meaning that the data could not be migrated to another database. The software product was to be modified so that it could work with PostgreSQL as well. Each database has its own specific features. Thus, to minimize changes in code we had to withdraw some features and use only general ones. In addition, we changed the data structures in both databases to make them compatible. Still, we had to create some exclusive code for each database.

Migration to a database of another type

Because relational databases were not created for OOP languages, you may need to consider using a non-relational database.

Conclusion

Databases are one of the most important components of every software product, since they store all data used in the software. Database reengineering may be required for various reasons; depending on the need, one of the approaches mentioned above can be used. I hope the examples of database reengineering provided help to demonstrate some of these reasons, objectives, and challenges, as well as the results of our actions.