Brand-New Ways of Managing Open-Source Databases, Compiled by Experts


    Not long ago, the database industry consisted of only a few major vendors. Databases were largely relational and ran on single servers, and high availability was achieved through active-standby clusters. Systems scaled vertically, and standby nodes were kept in sync either through shared or replicated storage, such as a SAN or DRBD, or through asynchronous replication of the transaction logs. A couple of decades ago, holding an entire database in main memory was an odd idea, and distributing a single database across several servers was a worrying prospect. Likewise, running mission-critical enterprise workloads on open-source databases was practically unthinkable.

    Over the last decade, however, a great deal has changed. There are now at least a dozen major open-source database vendors on the market, offering key-value, document, distributed, and graph data models. Keeping data resident in memory is now the norm for achieving high performance with low latency. At least three of the top five databases are now open source: MySQL, MongoDB, and PostgreSQL. Managing a fleet of database servers spread across several data centers is now realistic, and some of your databases may even be run for you by third-party cloud database providers. With all these advances in mind, let us explore what managing the new generation of databases will look like in 2021.

    Automation takes center stage

    With a wide range of tasks to handle and only so many hours in a day, DBAs and developers cannot do everything by hand. Automation has proven to be the most effective way to get things done efficiently. When there are only a few databases to look after, operating them is largely hands-on, with the main tasks scripted in Perl, Bash, or similar languages. For example, one script backs up the database and another moves the backup files to another location.
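    As a rough illustration of the hand-rolled scripts described above, here is a minimal Python sketch of the two jobs. One function takes an online backup of a database, and the other moves the finished file to another location. It uses SQLite from the Python standard library purely as a stand-in for MySQL or PostgreSQL. That substitution is an assumption made so the example is self-contained. The function names and directory layout are hypothetical.

```python
import shutil
import sqlite3
from datetime import datetime, timezone
from pathlib import Path

# Assumption: SQLite stands in for the production engine (MySQL, PostgreSQL, ...)
# so this sketch runs with the standard library alone. Function names are hypothetical.


def backup_database(db_path: Path, backup_dir: Path) -> Path:
    """Job 1: take a consistent copy of db_path and write it into backup_dir."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    # A UTC timestamp in the file name distinguishes backups taken at different times.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_dir / f"{db_path.stem}-{stamp}.db"
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(target)
    try:
        # Connection.backup copies the database while the source stays usable.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return target


def move_backup(backup_file: Path, archive_dir: Path) -> Path:
    """Job 2: move a finished backup file to another location (e.g. an archive mount)."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    destination = archive_dir / backup_file.name
    shutil.move(str(backup_file), str(destination))
    return destination
```

In practice, a scheduler such as cron would typically invoke each job. For a server database, the backup step would call the engine's own tool (for example `mysqldump` or `pg_dump`) instead of SQLite's `Connection.backup`.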

    Automation is a no-brainer in enterprise database administration today, and there are many IT configuration management and automation systems to choose from. Chef, Puppet, Ansible, and Salt, for example, all offer general-purpose frameworks on which automation for various database topologies can be built. Cluster management software written specifically for database setups includes ClusterControl and MongoDB Ops Manager. These tools give operations teams a ready-made system for managing their clusters. Building a fully featured cluster management system from scratch on top of a configuration management tool is a major undertaking. It requires significant expertise with the automation tools themselves and a solid understanding of management operations such as backup scheduling, automatic failover and reconfiguration, rolling out configuration changes, and version upgrades, downgrades, and patching.

    This is where advanced database-as-a-service (DBaaS) platforms come in. On these platforms, software controls deployment, failover, health monitoring, backups, and similar tasks. Cloud providers offer excellent automation features. Amazon RDS is a good example of enterprise database automation at scale: it automates deployment, patching and upgrades, backups, point-in-time restore, and replica scaling, and it provides high availability through automatic failover. For database-related questions, it is advisable to seek support from reliable providers like


    From the 1990s until the beginning of this century, Oracle and Sun Microsystems made a fortune selling scale-up databases on big SMP hardware. Database servers were few in number, but each grew more powerful as it scaled vertically. Much like pets, these servers were individually identified and carefully tended by DBAs.

    Today, however, databases are far cheaper and run on commodity hardware. There are many of them, and a new one can be obtained easily and at an affordable price. The new-generation databases are more like a herd of cattle, numerous and interchangeable, and much of that herd is open source. As noted earlier, three of the top five database engines are open source. Open-source databases are gradually eating into the market share of the two major proprietary vendors in the field. Open source is fast becoming the data center standard for databases, as it already is for operating systems.

    So, in the near future, you will very likely be working with an open-source database, if you are not already. Some organizations even run several open-source database engines side by side for their enterprise applications, each holding a different kind of data. In the modern world of microservices and polyglot persistence, the choice of underlying datastore is largely dictated by the nature of the data. Architecturally, single-instance databases with disk-based high availability are giving way to clusters distributed across multiple data centers.

    Is it necessary to have a DBA?

    The DBA's role has always been considered a specialized one, and it can take years of experience to become a seasoned DBA. In the past there were only a couple of proprietary DBMSs to consider, so DBAs with skill sets specific to one of them were ideal. Over time, though, SQL Server and Oracle have accumulated a considerable number of features and functions that are not easy to manage. They are typically deployed as the sole database for a given application. They need close monitoring and maintenance, their data must be backed up, and any issues must be resolved quickly. These are exactly the tasks a DBA is expected to focus on.

    As we have seen, a whole new kind of database industry has emerged over the past decade. There are dozens of open-source databases and cloud database services, so it is quite common for an application to use multiple datastores. Yet companies rarely have a database administrator who can manage every datastore they use. One could argue that the new-generation NoSQL databases are much simpler and lack the steep learning curve of their predecessors. However, that is not true either.