Why should we protect container workloads
Why should we protect container workloads, and why do so few people protect the container workloads they run in a Kubernetes environment? This is probably a question many vendors and developers ask themselves.
If you are reading this blog you are probably already familiar with Docker and Kubernetes, also known as microservices or, to use a more generic term, container workloads. If you want to know more, you can always go to this website: https://kubernetes.io/docs/concepts/overview/
Why do many not back up their containers
Over the last 2 years I have sat down with many developers to understand why so many are not protecting their workloads, especially now that ransomware attacks pop up everywhere and data gets encrypted on a daily basis all over the world.
The most common statements I get from developers are the following:
My containers are stateless, they only do a temporary job, and we don't need them afterwards
My containers are stateless and our data is located in an external database/bucket outside the Kubernetes world
I have no idea how to back up my containers
The first two comments are probably good reasons not to run a backup via Kubernetes, especially the first one: if you can delete your container, rebuild it and start from scratch with no history, then sure, you don't need to back up your container.
As for the second reason, I guess they have moved the database and storage out to some sort of external service such as Google Bigtable, AWS Aurora or maybe a virtual machine with MSSQL or similar, and for storage I guess they are using an S3 storage location or NFS storage.
In this example they are probably running separate local backups of the database (MongoDB, for example) and the NFS storage, so in case of a disaster, or if they need to rebuild the container services, they just deploy a new container pointing to the external database and storage. Voilà, we have now protected our container world.
My understanding is that designing your container application to keep the storage and database services outside the Kubernetes platform is the right way to go, for reasons of scale and performance. But is it the right way to protect your data? Maybe it is…
But wait… Why do many developers still back up their containers?
Here is what the developers who do back up their containers tell us about why, and this is probably the more interesting part for developers who don't.
We have migrated legacy applications that use persistent storage to containers
We develop applications that use persistent storage
For compliance reasons
We want a consistent backup of our applications where the database and storage are in sync
I guess the first two lines, where they use persistent storage, are a good reason: they need to back up that data. By design this is probably not the right way to build your application, but I guess all legacy companies want to be part of the container world and will do everything to have a containerized application, even if they have designed their application "wrong".
But then we come to the last two comments, "For compliance reasons" and "We want a consistent backup of our applications where the database and storage are in sync".
Compliance Reason:
I guess a CxO has demanded that the developers protect all data because of NIS2, DORA or a similar regulation. Is it correct to back up these containers? Again, this depends on what kind of data the microservices handle and if/where they save it:
whether the data is on persistent storage or, as in the design above, located outside Kubernetes' control. And this brings us to the next statement.
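As a rough illustration of the "if/where they save the data" question, the sketch below lists which pods mount a PersistentVolumeClaim. It assumes the pod manifests are already available as Python dictionaries shaped like the items returned by `kubectl get pods -o json`; the function name and the sample pods are hypothetical and only there for illustration.

```python
from typing import Dict, List


def pods_with_persistent_storage(pods: List[dict]) -> Dict[str, List[str]]:
    """Map pod name -> PVC claim names for pods that mount persistent volumes."""
    result: Dict[str, List[str]] = {}
    for pod in pods:
        name = pod.get("metadata", {}).get("name", "<unnamed>")
        volumes = pod.get("spec", {}).get("volumes") or []
        # A volume backed by a PVC carries a "persistentVolumeClaim" entry
        # with the claim name; emptyDir, configMap etc. are skipped.
        claims = [
            v["persistentVolumeClaim"]["claimName"]
            for v in volumes
            if "persistentVolumeClaim" in v
        ]
        if claims:
            result[name] = claims
    return result


# Hypothetical sample data, not taken from a real cluster.
sample_pods = [
    {"metadata": {"name": "web-frontend"},
     "spec": {"volumes": [{"name": "tmp", "emptyDir": {}}]}},
    {"metadata": {"name": "legacy-erp"},
     "spec": {"volumes": [{"name": "data",
                           "persistentVolumeClaim": {"claimName": "erp-data"}}]}},
]

stateful = pods_with_persistent_storage(sample_pods)
print(stateful)
```

Note that a check like this only finds data stored inside Kubernetes. A pod that writes to an external database or bucket will look stateless here, so it does not answer the compliance question on its own.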
Consistent backup of our application:
This was a really interesting statement a few developers made to me. If we back up the database on its own and our storage (S3 or NFS) on its own, we cannot guarantee that the two are consistent with each other. We can end up in a compliance situation where, for legal reasons, we must guarantee that the data looked a certain way at a certain point in time and that the metadata represents that data correctly. Or we can end up with ghost data on the storage side, or with lost data on the storage side, where the database expects a file that does not exist.
Another situation is that, with the containers backed up, we can trace in the container logs what really happened in the background: why our data got corrupted, or why the application deleted that data from the database. This has nothing to do with whether a container is stateless or not; it is just common sense, for the same reason we back up a normal VM in the cloud or on-prem.
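To make the ghost-data and missing-file cases from the consistency discussion concrete, here is a minimal sketch that compares the file references found in a restored database with the files actually present in a restored storage backup. The function name and the sample file names are hypothetical, and a real check would depend on how the application stores its file references.

```python
from typing import Iterable, List, Tuple


def compare_backup_sets(
    db_file_refs: Iterable[str], stored_files: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (ghost_files, missing_files) for two separately taken backups.

    ghost_files:   present on storage but not referenced by the database
    missing_files: referenced by the database but absent from storage
    """
    referenced = set(db_file_refs)
    stored = set(stored_files)
    ghost_files = sorted(stored - referenced)
    missing_files = sorted(referenced - stored)
    return ghost_files, missing_files


# Hypothetical example: the database backup and the storage backup were
# taken at different times, so they drifted apart in between.
db_refs = ["invoices/1001.pdf", "invoices/1002.pdf"]
storage_listing = ["invoices/1002.pdf", "invoices/1003.pdf"]

ghost, missing = compare_backup_sets(db_refs, storage_listing)
print("ghost files:", ghost)
print("missing files:", missing)
```

If both lists come back empty, the two backups agree with each other. Anything else is exactly the kind of drift that an application-aware backup, taken of the database and the storage together, is meant to avoid.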
Conclusion
There is probably nothing that says one way is better than the other, but backing up your containers with awareness of both the database and the storage layer is very important. If you have designed your microservices the right way, it really doesn't matter much whether you back up the containers themselves, least of all with a backup application that lacks this awareness. But if you have any data on persistent storage, or you want a consistent backup, make sure you are using a backup application that backs up all your data and not only a part of it.