Amazon Web Services has grown into a juggernaut of cloud computing, storage, and networking. With almost two decades of refinement and a seven-year head start over its competitors, it’s the market leader in cloud solutions.
AWS offers more than 200 services and flexible pricing, making it a top choice for any company looking to enhance its projects with cloud technology. Among its many services, AWS can accommodate databases of almost any kind, so it’s natural to wonder: what are the benefits and limitations of the SQL and NoSQL databases offered through AWS?
What’s a database and why does it matter?
Let’s start small. A database is an organized collection of data, stored on a computer, that can be accessed, expanded, or manipulated. Much like a library, a database needs some kind of organization so it can be used efficiently: it’s easier and faster to find a book when the books are arranged in alphabetical order than when they are scattered haphazardly around the room.
A database is where a program stores the information it works with. To use a psychological analogy, if the CPU is the brain and the program is our cognitive functions, then a database is our memory: the place where we store all the information about the world that we constantly access and update.
Almost every piece of software needs a database, a place where information can be stored and retrieved. Without one, a program couldn’t save data, much like a person with global amnesia can neither remember their past nor form new memories.
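The library analogy can be made concrete with a short Python sketch, using made-up book titles. On a sorted shelf, binary search from the standard `bisect` module finds a title in O(log n) comparisons. An unsorted pile has to be scanned one book at a time, which takes O(n). Real databases usually get this benefit from indexes, such as B-trees, rather than from a plain sorted list. Treat this as an illustration of the principle, not of how any particular engine works.

```python
from bisect import bisect_left

def linear_search(books, title):
    """Scan every book in turn (the 'haphazard pile'): O(n) comparisons."""
    for position, book in enumerate(books):
        if book == title:
            return position
    return -1

def binary_search(sorted_books, title):
    """Halve the search range at each step; only valid on sorted data: O(log n)."""
    position = bisect_left(sorted_books, title)
    if position < len(sorted_books) and sorted_books[position] == title:
        return position
    return -1

pile = ["Moby-Dick", "Dracula", "Ulysses", "Emma", "Beloved"]
shelf = sorted(pile)  # alphabetical: Beloved, Dracula, Emma, Moby-Dick, Ulysses

print(linear_search(pile, "Emma"))   # 3
print(binary_search(shelf, "Emma"))  # 2
```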
Relational databases
Relational databases (usually queried with SQL) were first described by Edgar Frank Codd in his 1970 paper “A Relational Model of Data for Large Shared Data Banks”. Their core idea is that data can be stored in tables and that those tables can, in turn, be related to one another.
Much like a spreadsheet, a table can have any number of columns and rows, with each cell containing data. So, for example, we can have a table where each row is a user and each column a piece of information about that user (name, address, email, age, gender, and so on).
Tables can also be linked to one another. For example, if we have a table of transactions, we can link each row of that table to the user who made the transaction. That way, we only need one entry for each user, regardless of how many transactions they have made.
Of course, there is a lot of nuance involved but, at its core, the objective of relational databases is to structure data in a way that is organized and easy to handle.
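The users-and-transactions example can be sketched with Python’s built-in `sqlite3` module and an in-memory database. The table and column names are hypothetical, chosen only for illustration. Each user is stored once, and every transaction points back to its user through a foreign key (`user_id`). A `JOIN` then follows that relationship. Amounts are stored as integer cents to avoid floating-point rounding.

```python
import sqlite3

# In-memory database, used here only for illustration
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        email TEXT NOT NULL UNIQUE
    )
""")
conn.execute("""
    CREATE TABLE transactions (
        id           INTEGER PRIMARY KEY,
        user_id      INTEGER NOT NULL REFERENCES users(id),
        amount_cents INTEGER NOT NULL
    )
""")

# Each user is stored exactly once...
conn.execute("INSERT INTO users (id, name, email) VALUES (1, 'Ada', 'ada@example.com')")
conn.execute("INSERT INTO users (id, name, email) VALUES (2, 'Alan', 'alan@example.com')")

# ...while any number of transactions refer back to that user by id
conn.executemany(
    "INSERT INTO transactions (user_id, amount_cents) VALUES (?, ?)",
    [(1, 1999), (1, 500), (2, 4250)],
)

# A JOIN follows the relationship to combine the two tables
rows = conn.execute("""
    SELECT users.name, COUNT(transactions.id), SUM(transactions.amount_cents)
    FROM users
    JOIN transactions ON transactions.user_id = users.id
    GROUP BY users.id
    ORDER BY users.name
""").fetchall()
print(rows)  # [('Ada', 2, 2499), ('Alan', 1, 4250)]
```

With foreign keys switched on, inserting a transaction for a `user_id` that doesn’t exist raises `sqlite3.IntegrityError`. This is how the relationship keeps the two tables consistent.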
Non-relational databases
Relational databases have been the dominant paradigm in software development since their inception, but they are not perfect. Real-world data isn’t always tabular, and for complex, massive, or unstructured data, building a table-based model that accurately represents it can be challenging.
Non-relational databases (NoSQL) take a different approach to storing data. Common examples include storing information as key-value pairs or as graphs. The idea is that a less structured, less restrictive model is easier to manipulate and to scale as needed.
As a result, non-relational databases have grown rapidly in popularity in recent years and have become a trend in data science. They still have a long way to go before they rival SQL solutions in popularity, but as the volume of collected data grows, they are undeniably becoming more and more useful.
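To show what these two models look like, here is a toy Python sketch using plain dictionaries. It is not how any particular NoSQL engine stores data internally. The keys, names, and the `reachable` helper are hypothetical. In the key-value part, each value is fetched by a unique key, and values don’t need to share a shape. In the graph part, relationships are stored as edges in an adjacency list, and a breadth-first traversal follows them.

```python
from collections import deque

# Key-value model: look values up by a unique key; the store doesn't
# impose a schema, so values can have different fields.
sessions = {}
sessions["session:8f3a"] = {"user": "ada", "cart": ["book", "lamp"]}
sessions["session:91c2"] = {"user": "alan"}  # different fields are fine

# Graph model: entities are nodes, relationships are edges (adjacency list).
follows = {
    "ada": ["alan", "grace"],
    "alan": ["grace"],
    "grace": ["linus"],
    "linus": [],
}

def reachable(graph, start):
    """Return every node reachable from start by following edges (breadth-first)."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)  # don't report the starting node itself
    return seen

print(sessions["session:8f3a"]["cart"])   # ['book', 'lamp']
print(sorted(reachable(follows, "ada")))  # ['alan', 'grace', 'linus']
```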
Databases on AWS
AWS has extensive support for both SQL and NoSQL databases, offering a wide range of services, from simple hosting to data analytics solutions.
For SQL, Amazon offers Amazon Relational Database Service (RDS), a service that simplifies setting up, maintaining, and scaling relational databases. It supports several popular database engines, including PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server, as well as Amazon’s own relational engine, Aurora.
RDS automates or simplifies tasks such as failover, backups, disaster recovery, monitoring, access management, and performance optimization, either from the AWS Management Console or via API calls.
Aurora, on the other hand, is a relational database engine designed with cloud computing in mind. AWS states that it delivers up to five times the throughput of standard MySQL at a lower cost than commercial databases, and because it comes in MySQL- and PostgreSQL-compatible editions, migrating from those engines is usually straightforward.
With Aurora Serverless, you can let Aurora handle scaling for you: it starts up, shuts down, and scales capacity up or down depending on your needs. While the other options are perfectly serviceable, it’s clear that AWS is pushing its customers to try and adopt Aurora as their relational solution.
Finally, before we turn to NoSQL solutions, it’s worth considering Amazon Redshift, a data warehouse designed for running analytics on large volumes of structured, relational data. The basic idea is that the analysis tools are built into the service, so you don’t have to export the data to analyze it.
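The idea of analyzing data where it lives can be sketched in Python, with SQLite standing in for the warehouse. This only illustrates the principle. It is not Redshift, and the `sales` table is hypothetical. The first approach pulls every row into the application and aggregates there. The second sends a `GROUP BY` query and gets back only the summary, which matters more and more as tables grow to millions of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT NOT NULL, amount_cents INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO sales (region, amount_cents) VALUES (?, ?)",
    [("eu", 1200), ("us", 3000), ("eu", 800), ("apac", 500), ("us", 700)],
)

# Approach 1: fetch every row, then aggregate in the application.
totals_in_app = {}
for region, amount in conn.execute("SELECT region, amount_cents FROM sales"):
    totals_in_app[region] = totals_in_app.get(region, 0) + amount

# Approach 2: let the database aggregate and return only the summary rows.
totals_in_db = dict(conn.execute(
    "SELECT region, SUM(amount_cents) FROM sales GROUP BY region"
).fetchall())

print(sorted(totals_in_db.items()))  # [('apac', 500), ('eu', 2000), ('us', 3700)]
```

Both approaches give the same totals. Only the second avoids moving the raw rows out of the database.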
Amazon offers more than eight different non-relational database services, including Amazon ElastiCache for Memcached and ElastiCache for Redis for in-memory data stores, Amazon Neptune for graph databases, Amazon Timestream for time series, and Amazon QLDB for ledgers, among others. But let’s focus on its two most important services: DocumentDB and DynamoDB.
Amazon DocumentDB is a document database service built for managing JSON data at scale. It’s fully integrated with AWS, and Amazon says it can scale to millions of requests per second while maintaining performance.
One of DocumentDB’s greatest assets is its compatibility with the MongoDB API. If you have a project that uses MongoDB, you can often point it at the DocumentDB endpoint and keep the same calls with little or no code change, although not every MongoDB feature is supported, so it pays to check compatibility before migrating.
DynamoDB is Amazon’s proprietary key-value and document database. It’s serverless, which means there are no servers to manage and it can scale up and down dynamically as needed.
Another important point is that DynamoDB global tables replicate your data across multiple AWS Regions, so applications can read and write data in the Region closest to their users, wherever in the world they are.
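The document model behind DocumentDB can be illustrated with a toy Python sketch that uses only the standard `json` module. It borrows MongoDB’s dot notation for nested fields (for example `{"address.country": "UK"}`), but the `find` and `get_path` helpers are hypothetical. They support equality matches only, which is a tiny subset of the real query language. In this sketch, documents in the same collection don’t need identical fields.

```python
import json

# Documents in one collection may have different shapes.
raw = """
[
  {"_id": 1, "name": "Ada",   "address": {"city": "London", "country": "UK"}},
  {"_id": 2, "name": "Alan",  "address": {"city": "Manchester", "country": "UK"}, "tags": ["math"]},
  {"_id": 3, "name": "Grace", "rank": "Rear Admiral"}
]
"""
collection = json.loads(raw)

def get_path(document, dotted_path):
    """Follow a dotted path such as 'address.country' into nested dicts; None if absent."""
    value = document
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value

def find(docs, query):
    """Return documents whose fields equal every value in query (equality matches only)."""
    return [doc for doc in docs
            if all(get_path(doc, path) == expected for path, expected in query.items())]

uk = find(collection, {"address.country": "UK"})
print([doc["name"] for doc in uk])  # ['Ada', 'Alan']
```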
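DynamoDB’s data model can be pictured with a toy Python sketch. In DynamoDB, an item is identified by its primary key. That key is either a partition key alone or a partition key plus a sort key. A query reads one partition and can narrow results with a sort-key condition such as `begins_with`. The `ToyTable` class below is hypothetical and is not the DynamoDB or boto3 API. For simplicity, it assumes string sort keys kept in memory.

```python
from collections import defaultdict

class ToyTable:
    """Toy in-memory table with a composite primary key (partition key + sort key).

    Illustration only: NOT the DynamoDB API. Sort keys are assumed to be strings.
    """

    def __init__(self, partition_key, sort_key):
        self.partition_key = partition_key
        self.sort_key = sort_key
        self._partitions = defaultdict(dict)  # partition value -> {sort value: item}

    def put_item(self, item):
        # Writing an item whose primary key already exists replaces the old item.
        pk, sk = item[self.partition_key], item[self.sort_key]
        self._partitions[pk][sk] = dict(item)

    def get_item(self, pk, sk):
        # Direct lookup by the full primary key; None if absent.
        return self._partitions.get(pk, {}).get(sk)

    def query(self, pk, sort_key_prefix=""):
        # Read a single partition, return items in sort-key order,
        # keeping only sort keys that start with the given prefix.
        partition = self._partitions.get(pk, {})
        return [partition[sk] for sk in sorted(partition) if sk.startswith(sort_key_prefix)]

orders = ToyTable(partition_key="customer_id", sort_key="order_date")
orders.put_item({"customer_id": "c1", "order_date": "2024-01-15", "total": 30})
orders.put_item({"customer_id": "c1", "order_date": "2024-03-02", "total": 12})
orders.put_item({"customer_id": "c1", "order_date": "2023-11-20", "total": 50})
orders.put_item({"customer_id": "c2", "order_date": "2024-02-10", "total": 8})

print([o["order_date"] for o in orders.query("c1", sort_key_prefix="2024")])
# ['2024-01-15', '2024-03-02']
```

Grouping items by partition key means a query only touches one partition. That design is what allows this kind of store to spread partitions across many machines as it scales.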
Which service to choose?
If you are building a project from the ground up and you are sure you’ll be working with AWS for the long haul, then, regardless of whether your database is relational or not, you should stick with Amazon’s proprietary solutions. They tend to be cheaper and better supported than the alternatives.
On the other hand, if you are looking to migrate an existing database to the cloud, or want to keep your database as compatible as possible with other platforms, then services like RDS or DocumentDB are your best bet. Either way, you can’t go wrong.