It does not happen every day that we, as developers, get to start a greenfield project. On many occasions we have maintenance work to do on existing projects, or we are rewriting an old application into microservices. What happens during these brownfield projects is that we have to cope with an existing database that is truly critical: it contains all the historical data, and we have to manage it for obvious reasons.
How can we efficiently work with an existing database? The answer is “it depends”. There is no one-size-fits-all solution for this. What we need to do is analyse the database and how it can fit into today’s way of working.
However, there are things we can use to help us determine what developers can do, and what they should avoid.
Keep the database as it is!
“The existing database must stay”. That is what we’re told, and we sometimes accept it blindly. What we really must do is understand what lies behind this requirement. Data is the truth: it is what the company has gathered for years, and it is very valuable. You can see that many hacking attempts are made precisely to steal data, because it is so precious.
So what does that statement really mean? It means: “it has taken so long to gather this data and it is the company’s lifeline. If we lose this data, we’re as good as dead”. It doesn’t mean we cannot touch the database, and it does not mean the data model has to stay the same. It simply means that whatever model or data repository we are going to use, from the user’s perspective the data she used to work with does not change. So it is a data integrity imperative rather than an order to keep working with an outdated database provider or format.
For some of us this is stating the obvious, but many others understand it as a direct order that has to be followed or face the consequences. In some organisations this is indeed the case, and there is not much we can do about it. There might be valid reasons behind it, but most of the time it is because IT people feel that data migration is a separate project, inevitably costly, that the company cannot afford.
As developers, we should never lose sight of one very important aspect of our job: advising. We all come with different skills and different experiences, but we are more than people who write code to please the product owner overlords. More often than not, the keen insight a developer can provide will be welcome. We just need the confidence to dare express ourselves.
Changing the data model
The new application you are going to write will be awesome and successful, but that does not mean you have to blindly follow the guidelines the legacy database gives you, with its format and its presence or absence of primary keys and foreign keys. What you can do is design your own model, one that better fits today’s technologies and architectures, and map it afterwards (at the very bottom of all the layers your architecture might have) to the old legacy database. That way, you keep having fun and deliver outstanding functionality while preserving the database.
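As a sketch of that mapping layer, here is what the bottom of the stack might look like. Everything below is invented for illustration: the `Customer` model, the cryptic legacy column names, and the shape of the legacy rows are all assumptions, not anyone's real schema.

```python
from dataclasses import dataclass

# Hypothetical new domain model: clean names, explicit types.
@dataclass
class Customer:
    customer_id: int
    full_name: str
    email: str

def to_legacy_row(customer: Customer) -> dict:
    """Map the new model onto the (invented) old schema at the data layer."""
    return {
        "CUST_NO": customer.customer_id,
        "CUST_NM": customer.full_name,
        "EMAIL_ADDR": customer.email,
    }

def from_legacy_row(row: dict) -> Customer:
    """Hydrate the new model from a legacy record, cleaning as we go."""
    return Customer(
        customer_id=int(row["CUST_NO"]),
        full_name=row["CUST_NM"].strip(),
        email=row["EMAIL_ADDR"].strip().lower(),
    )
```

The rest of the application only ever sees `Customer`; the cryptic legacy columns stay confined to these two functions, which is exactly where you want the weight of old decisions to sit.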
This way of working is particularly efficient when the legacy application keeps being used while you’re rewriting it. However, this means that in one case or another, you still have to live with decisions that were taken 10, 15 or 20 years ago, thousands of years in the computer age.
The good point, though, is that only your data layer has to worry about it, as the new model is the one used throughout your new application. It makes bearing the weight of ancient decisions a little bit easier.
Changing the database
With the new service being developed, and a model that better matches the company’s way of working, you will want a database or a data repository that uses the full power of your new model. The good news is that you can have one, as long as you preserve data integrity.
Data integrity means that whatever happens in your application, the existing data does not get corrupted and is preserved in its working state. While this can be difficult depending on the complexity of the existing database, it is not impossible, and it can be done while keeping costs under control.
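One cheap way to gain confidence in that integrity is to compare both sides after every migration run: same row counts, same content. A minimal sketch, assuming rows from either database can be loaded as plain dictionaries:

```python
import hashlib

def row_fingerprint(row: dict) -> str:
    """Order-independent fingerprint of a row's key/value pairs."""
    canonical = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(canonical.encode()).hexdigest()

def verify_migration(source_rows, target_rows) -> bool:
    """Compare row counts and per-row fingerprints between two databases."""
    if len(source_rows) != len(target_rows):
        return False
    return sorted(map(row_fingerprint, source_rows)) == \
           sorted(map(row_fingerprint, target_rows))
```

In real life you would do this per table and probably in batches, but even this naive version catches the scary failure mode: data silently dropped or altered on the way over.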
In the very likely event that the legacy application is still in use, you can still change the database, but you need to implement a synchronisation between the existing database and the new one, on top of your data migration strategy. This can be seen as redundancy, but it makes phasing out the legacy application so much easier, and it can proceed as fast as the new application is written. Without it, the refactoring or rewriting is not complete as long as the legacy database lives on.
So how to migrate?
You can either code it yourself or use tools like Flyway or Liquibase to do the deed. In the ideal scenario, you will have a data architect to advise you, or a full team of DBAs to actually perform the migration and the synchronisation.
In the real world, it can happen that these people don’t exist in your organisation and you are left alone with the task. Even then it is not a lost cause, because you can do it yourself. Writing the data migration script can take a long time, but you can also write your own tool using your favourite ORM (so you like Entity Framework too?). This approach puts more pressure and responsibility on the developer’s shoulders, but it can be really rewarding once everything is done and goes smoothly.
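If you do end up writing the tool yourself, the core loop can stay small. Here is a hedged sketch, independent of any particular ORM: `transform` and `load` stand in for whatever your stack provides to reshape a legacy row and persist it in the new store.

```python
def migrate(legacy_rows, transform, load):
    """Run each legacy row through a transform and load it into the
    new database; collect failures instead of aborting the whole batch."""
    migrated, failed = 0, []
    for row in legacy_rows:
        try:
            load(transform(row))
            migrated += 1
        except Exception as exc:
            failed.append((row, str(exc)))
    return migrated, failed
```

Collecting the failures instead of stopping at the first one matters with legacy data: you want a full report of every row that does not fit the new model, because those rows are precisely where the old application's hidden rules live.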
Data migration does not have to be costly or cumbersome. It is not something to be afraid of, and it is something that can greatly help the business in understanding their own way of working.
Fearing data migration often hides another, deeper fear: that very few people, or no one at all, still know the business processes of the organisation, and that is critical. By raising the point of data migration, the IT department can actually help highlight bigger and deeper issues in the company, and that is also part of our job.
How to synchronise data
When you have two different data repositories, you have to keep them in sync for the project to be viable. The concept is to have a tool monitoring the origin database for any change and sending that change to the new database.
You can use third-party software like Debezium to do so, or you can implement something yourself. If that is what you’re after, you can leverage SQL Server’s change tracking features (Change Tracking or Change Data Capture) and implement something against them.
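Whatever the capture mechanism, the replay loop on your side looks roughly the same: remember the last version you synced, pull everything newer, apply it, and advance the watermark. In this sketch the change-log entries and the `apply_change` callback are assumptions standing in for the real change feed:

```python
def sync_changes(change_log, last_version, apply_change):
    """Replay change-log entries newer than the last synced version
    against the new database, and return the new watermark."""
    new_version = last_version
    for entry in change_log:
        if entry["version"] > last_version:
            apply_change(entry)
            new_version = max(new_version, entry["version"])
    return new_version
```

You run this on a schedule, persisting the returned watermark between runs; because entries at or below the watermark are skipped, re-running after a crash does not replay old changes twice.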
To the valiant developers who decide to move on from the legacy database: beware of the stored procedure trap. In many older applications, a big piece of the business logic is written inside stored procedures, and that can be a real nightmare to deal with.
If your legacy database contains hundreds or thousands of stored procedures, you are facing terrible odds. This is the type of database that gets to live on, because the sheer number of stored procedures makes the legacy application so difficult to understand and the rewriting so hard.
A gradual approach is key. You cannot avoid doing some analysis to identify the stored procedures that are still in use and to understand what they do. If you are a big fan of SQL code, then you are in heaven; if not, have courage! Once the stored procedures are identified, you can start replacing them one by one until they are all gone! But this is an elusive target, and all over the world there are plenty of databases living a long and happy life just because they have so many stored procedures that they can never be replaced!
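Replacing a stored procedure usually means porting its logic into application code, where it can finally be unit tested. A toy example of what one such port might look like: the procedure name `usp_CalcOrderTotal`, the line-item shape and the VAT rate are all invented for illustration.

```python
def order_total(lines, vat_rate=0.21):
    """Application-side port of a hypothetical usp_CalcOrderTotal
    stored procedure: sum the line amounts and apply VAT."""
    net = sum(line["qty"] * line["unit_price"] for line in lines)
    return round(net * (1 + vat_rate), 2)
```

Once a port like this exists and is covered by tests, the legacy procedure can be retired, and the count goes down by one. It is slow, but every procedure moved is business logic recovered into code you can actually read and change.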