Forem: DavidCockerill

What is Geo Redundancy?

DavidCockerill — Tue, 30 Mar 2021 14:54:51 +0000

Let's talk geo-redundancy! Ever heard of it? There are many benefits to redundancy, and it is important for IT teams and organizations in general to implement redundancy methodologies as a safety net. Having these systems in place can improve reliability and availability while reducing downtime. Businesses rely on certain resources that are not guaranteed and therefore need a backup plan: for example, Internet connectivity, hardware, power, and data storage. A failure in many of these can affect a large area. Such failures may not happen often, but one power failure that takes your application offline is one too many. Geo-redundancy can reduce or remove these risks and better prepare organizations to handle disaster recovery.

As the name implies, geo-redundancy refers to the practice of providing redundancy (extra or duplicate components) by physically separating infrastructure across multiple geographical locations. Because I work for a database company, I will discuss it in relation to databases.

Geo-redundancy is a powerful (and somewhat magical) tool that helps ensure high availability and disaster recovery. It replicates your data and stores it in other databases located in separate physical locations, so that if one location fails or simply needs to be taken offline, your other locations, which also store your data, are not affected.
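To make the failover idea concrete, here is a toy Python sketch, not HarperDB code: two hypothetical `Location` objects each hold a full copy of the data, and `read_with_failover` (also a made-up name) tries them in priority order, moving on when a site is unreachable. Real clients and load balancers do this with health checks and timeouts, which are left out here.

```python
class Location:
    """Toy stand-in for one geographic site that holds a full copy of the data."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)  # each site keeps its own replica
        self.online = True

    def get(self, key):
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        return self.data.get(key)


def read_with_failover(key, locations):
    """Return (site name, value) from the first reachable site, in priority order."""
    for loc in locations:
        try:
            return loc.name, loc.get(key)
        except ConnectionError:
            continue  # this site is down; fall through to the next one
    raise ConnectionError("all locations are offline")


# Usage: simulate a regional power failure at the primary site.
data = {"dog:1": {"name": "Penny"}}
us_east, eu_west = Location("us-east", data), Location("eu-west", data)
us_east.online = False
print(read_with_failover("dog:1", [us_east, eu_west]))  # → ('eu-west', {'name': 'Penny'})
```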

Geo-redundancy is super easy to implement in HarperDB through its clustering engine, which replicates data between instances of HarperDB using a highly performant, bi-directional pub/sub model. The first step is to install HarperDB in multiple geographical locations. A single instance (installation) of HarperDB constitutes a node. Once node subscriptions are configured via the HarperDB API, the nodes establish WebSocket connections with each other to replicate data. When two or more nodes are subscribed to each other, you have a cluster. Depending on how the nodes are subscribed to each other, a transaction on one node can be automatically published to another. And there you have it: the start of geo-redundancy!
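The subscription model can be pictured with a small in-memory simulation. This is an illustrative sketch only, not HarperDB's clustering engine: `Node` and `subscribe` are hypothetical names, delivery happens through synchronous function calls rather than WebSocket messages, and every node forwards a transaction to all of its peers, which may not match how a real cluster routes data. Transaction ids stop a change from bouncing back and forth across a bi-directional link.

```python
import itertools

_txn_ids = itertools.count(1)  # unique transaction ids for the toy model


class Node:
    """Toy node: a local store plus the peers it publishes transactions to."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.peers = []    # nodes that receive this node's transactions
        self.seen = set()  # transaction ids already applied on this node

    def insert(self, key, value):
        self._apply(next(_txn_ids), key, value)

    def _apply(self, txn_id, key, value):
        if txn_id in self.seen:
            return  # already applied here; stops echoes on bi-directional links
        self.seen.add(txn_id)
        self.store[key] = value
        for peer in self.peers:
            peer._apply(txn_id, key, value)  # "publish" to each subscriber


def subscribe(a, b, bidirectional=True):
    """a publishes to b; with bidirectional=True, b also publishes to a."""
    a.peers.append(b)
    if bidirectional:
        b.peers.append(a)


# Usage: a bi-directional pair plus a receive-only archive node.
us, eu, archive = Node("us-east"), Node("eu-west"), Node("archive")
subscribe(us, eu)
subscribe(us, archive, bidirectional=False)
eu.insert("dog:1", {"name": "Penny"})
print(us.store)            # → {'dog:1': {'name': 'Penny'}}
archive.insert("tmp", 1)
print("tmp" in us.store)   # → False
```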

In this case, HarperDB provides a “backup plan” to ensure that your organization is prepared for even the most unpredictable outcomes. The term redundancy means “too much” or “more than is needed.” Of course, that is not necessary or helpful in every situation in life or business, but when it comes to the world of data, IT, and engineering, you can bet your bottom dollar that you will regret not having a redundancy plan in place. If you fear that implementing redundancy is a waste of time, just think: what would happen if you lost your data? HarperDB enables you to implement geo-redundancy in a simple and cost-effective manner, so you can sleep easy knowing your data is safely stored across the globe.

Strict Schema Enforcement vs. Schemaless vs. Dynamic Schema

DavidCockerill — Thu, 17 Dec 2020 16:01:10 +0000

The debate over whether to use a schema or not has passionate support on both sides. One side appreciates data integrity constraints and predictability, while the other prefers more flexibility (or “agility”) and time effectiveness. The ultimate answer as to which is “better” most likely depends on the specific project, data used, and associated skill set.

In this post I will cover strict schema enforcement, schemaless, and dynamic schema, including the pros and cons of each one.

Strict Schema

A schema is a blueprint of how a database is constructed. It doesn’t actually hold the data; instead, it describes the shape of the data and how it might relate to other tables in the database. Schemas contain information on all the objects in a database, such as tables, attributes, data types, and relationships; they can also include triggers, views, indexes, and so on. Some common databases that use strict schemas are Oracle, MS SQL Server, and PostgreSQL.
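In a strict-schema database the engine itself rejects data that doesn't fit the table definition (for example, column types declared with `CREATE TABLE` in PostgreSQL). As a rough sketch of that rule, the Python below checks a record against a hypothetical `DOG_SCHEMA`; the names `validate` and `SchemaError` are illustrative, and real databases enforce far more (nullability, foreign keys, check constraints).

```python
from typing import Any, Dict

# Hypothetical table definition: attribute name -> required Python type.
DOG_SCHEMA: Dict[str, type] = {"id": int, "name": str, "age": int}


class SchemaError(ValueError):
    """Raised when a record does not match the declared schema."""


def validate(record: Dict[str, Any], schema: Dict[str, type]) -> Dict[str, Any]:
    """Reject records with missing or extra attributes, or wrong types."""
    missing = schema.keys() - record.keys()
    extra = record.keys() - schema.keys()
    if missing or extra:
        raise SchemaError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for attr, expected in schema.items():
        actual = type(record[attr])
        if actual is not expected:  # exact match, so True is not accepted as an int
            raise SchemaError(f"{attr!r} must be {expected.__name__}, got {actual.__name__}")
    return record


validate({"id": 1, "name": "Penny", "age": 3}, DOG_SCHEMA)  # accepted
try:
    validate({"id": 2, "name": "Kato", "age": "seven"}, DOG_SCHEMA)
except SchemaError as err:
    print(err)  # → 'age' must be int, got str
```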

Pros:

Gives a high level view of the structure and relationship of the tables in your database. Can make it easier to keep track of what information is and is not in the database.
Enforces data integrity constraints: a set of rules that keeps the formatting of all entries consistent.
More predictable, which can provide a more efficient storage and indexing structure.

Cons:

Takes time to design and build when starting a new project. Modifying the schema can be tricky. Can be a lot of work to maintain.
Rigid limits, not flexible.

Schemaless

As the name implies, schemaless does not use a schema. It means the database does not have any fixed structure. A schemaless database does not enforce any data type limitations and can store structured and unstructured data. Some common schemaless databases are MongoDB, CouchDB, and Google Cloud Datastore.

Pros:

Quick and easy to set up: there is no schema to model and no additional layers are required, so complexity is greatly reduced. With just a few clicks, a developer can have a working database.
Updates can be made on the fly without having to change a schema or shut the database down.
More flexibility when storing data. You don’t need to decide up front what you’re going to store, how it’s structured or related to other information in the database.
Less overhead, which can lead to better performance and scalability.

Cons:

With no fixed columns, the application has to parse every document to find the requested data.
No unified metadata: you end up looking at the application to understand the data, rather than having that information in the database itself.
No control over the data: you may be receiving garbage, but with no filters in place, bad data gets loaded either way. Data filtering is pushed out to the application layer.
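These trade-offs can be seen in a toy in-memory store. This is a simplified sketch, not how MongoDB, CouchDB, or Datastore work internally: `SchemalessCollection` is a hypothetical class that accepts any dictionary, so malformed records go in unchallenged, and in this toy a query walks every document while the application does its own type checking.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


class SchemalessCollection:
    """Toy schemaless store: any dict is accepted and nothing is checked."""

    def __init__(self) -> None:
        self.docs: List[Doc] = []

    def insert(self, doc: Doc) -> None:
        self.docs.append(dict(doc))  # no shape or type enforcement

    def find(self, predicate: Callable[[Doc], bool]) -> List[Doc]:
        # No fixed columns: every document is inspected to answer the query.
        return [doc for doc in self.docs if predicate(doc)]


dogs = SchemalessCollection()
dogs.insert({"name": "Penny", "age": 3})
dogs.insert({"name": "Kato", "age": "seven"})  # accepted, though age is a string
dogs.insert({"title": "not a dog at all"})     # also accepted

# The application must defend against bad shapes itself.
older = dogs.find(lambda d: isinstance(d.get("age"), int) and d["age"] >= 2)
print(older)  # → [{'name': 'Penny', 'age': 3}]
```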

Dynamic Schema

Described by many as the best of both worlds, a dynamic schema is one that changes as you add data. There is no need to define the schema beforehand. When data is inserted, updated, or removed, the database builds the schema dynamically. Popular dynamic schema databases include HarperDB and MongoDB.
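A dynamic schema can be sketched as a table that records each new attribute the first time it appears. The `DynamicTable` class below is hypothetical and only tracks attribute names; it does not reproduce HarperDB's storage or indexing. The `hash_attribute` parameter name borrows HarperDB's term for a table's primary key, used here purely for illustration.

```python
from typing import Any, Dict, List, Set


class DynamicTable:
    """Toy table whose attribute list grows as records are inserted."""

    def __init__(self, hash_attribute: str = "id") -> None:
        self.hash_attribute = hash_attribute  # primary key attribute
        self.attributes: Set[str] = {hash_attribute}
        self.records: Dict[Any, Dict[str, Any]] = {}

    def insert(self, record: Dict[str, Any]) -> List[str]:
        """Store the record and return any attributes it added to the schema."""
        key = record[self.hash_attribute]  # fail before touching the schema
        new_attrs = record.keys() - self.attributes
        self.attributes |= new_attrs  # the schema is extended, never enforced
        self.records[key] = dict(record)
        return sorted(new_attrs)


dogs = DynamicTable()
print(dogs.insert({"id": 1, "name": "Penny"}))           # → ['name']
print(dogs.insert({"id": 2, "name": "Kato", "age": 7}))  # → ['age']
print(sorted(dogs.attributes))                           # → ['age', 'id', 'name']
```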

Pros:

Easy to set up, requires no input from the user.
Provides the structure that comes with a schema, which enables a more efficient storage and indexing model.
Doesn’t enforce data constraints, so it can ingest unstructured data.
Flexible to develop with as the data model can easily evolve over time.
Can handle semistructured data.

Cons:

No data enforcement means developers must ensure data adheres to the data model.
Data model can get messy if proper processes are not followed.

As you can see, there are valid points on each side of the argument and numerous factors to consider when choosing which is right for your specific project. At the end of the day, this decision has a lot to do with the preference of the user and long-term project goals. For example, at HarperDB, we are big fans of the dynamic schema, which enables us to ingest any type of data at scale. HarperDB frees you from the hassle of defining data types, providing unlimited flexibility as your applications evolve and scale over time. Which type of schema do you prefer?

While it may not be top of mind, it’s important to get your schema right up front to avoid unnecessary headaches and additional time and costs later on. Foundation is key, and it’s much more difficult to go back and change that foundation once you’ve built on top of it. Take the time to weigh the pros and cons of strict schema enforcement vs. schemaless vs. dynamic schema before you start building; you won’t regret it.

How we Build and Deploy our Serverless HarperDB Studio Backend Application

DavidCockerill — Tue, 22 Sep 2020 16:00:49 +0000

The HarperDB Management Studio is a graphical user interface that enables users to install, design, cluster, and manage databases without having to write a line of code. The Studio’s backend is a serverless application that utilizes AWS Lambda and Amazon API Gateway. To build, test, debug and deploy this application we use the AWS Serverless Application Model (SAM), not to be confused with the ace HarperDB engineer named Sam!

SAM is an open-source framework that is used to build serverless applications on AWS. It is an extension of AWS CloudFormation. With just a few lines per resource, you can define the application you want and model it using YAML (a human-readable data-serialization language). During deployment, SAM transforms and expands the SAM syntax into AWS CloudFormation syntax and provisions all the AWS resources we need for the HarperDB Studio.

We decided to go serverless to shift more of our operational responsibilities to AWS. It allows for quick development, seamless scalability and reduced overhead. Our shift to serverless became a lot easier with SAM. SAM offers a single deployment configuration which makes it easy to organize related components and resources, and operate on a single stack. Our application has 30+ Lambdas, all integrated with API Gateway, along with multiple Lambda layers, Step Functions and SNS topics. We can build and deploy our complete application in around ten minutes overall.
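For readers new to Lambda behind API Gateway, each of those functions boils down to a handler that receives an event and returns a response. The post doesn't show the Studio's handlers or say which runtime they use, so this is a generic Python sketch of the Lambda proxy-integration response shape (`statusCode`, `headers`, and `body` as a string); the greeting logic is invented for illustration. The name `lambda_handler` matches the handler in SAM's Python hello-world template.

```python
import json


def lambda_handler(event, context):
    """Minimal handler for an API Gateway (REST, Lambda proxy) request."""
    # API Gateway sends null for queryStringParameters when there is no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello {name}"}),  # body must be a string
    }


print(lambda_handler({"queryStringParameters": {"name": "Sam"}}, None)["body"])
# → {"message": "hello Sam"}
```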

The SAM CLI is an essential tool when working with SAM; it provides an easy way to test and debug your application locally. We use the CLI in conjunction with the AWS Toolkit for JetBrains WebStorm, the IDE we predominantly use. It provides a Lambda-like execution environment locally, which helps us test the functions before we deploy them. It also lets us step through and debug our code to understand what it is doing, as you would with any other type of project.

I’ll run through a few steps we used to set up our SAM project in our environment (not all of the following steps are essential).

Install Requirements

Install and set up the SAM CLI

https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html

Install Docker for local testing

https://docs.docker.com/get-docker/

Pull the lambci lambda image for Docker

https://hub.docker.com/r/lambci/lambda/

Install and set up the WebStorm AWS Toolkit

https://docs.aws.amazon.com/toolkit-for-jetbrains/latest/userguide/key-tasks.html#key-tasks-install

Install the AWS CloudFormation plugin

https://plugins.jetbrains.com/plugin/7371-aws-cloudformation

Install a YAML plugin; I’m using this one:

https://plugins.jetbrains.com/plugin/7792-yaml-ansible-support

If you haven’t already, download and run the sample application

https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-getting-started-hello-world.html

Local Testing With WebStorm

Edit the Run/Debug Configurations and click the Templates tab. With the AWS Toolkit installed, you should see an AWS Lambda template. Click on it and then select Local.
In the template, make sure it points to your YAML file (by default it is called template.yaml). To the right of the template path, select the Function Name you are testing. If the function name doesn’t appear, something is likely wrong with your template; sometimes I need to save this setup, then close and reopen it for the name to appear in the dropdown. The Input is what is passed to the Lambda when it is invoked, and it must have a value.

If everything has been set up correctly (which it often isn’t; this process can be convoluted), you should be able to run your Lambda as you would any other project in WebStorm: select it in the Run/Debug Configurations and press Run or Debug.
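The Input value in the run configuration is just the event JSON your handler receives. A small helper like the hypothetical `make_api_event` below can generate a trimmed-down API Gateway proxy event; it includes only a subset of the fields API Gateway actually sends, so extend it if your handler reads others.

```python
import json
from typing import Any, Dict, Optional


def make_api_event(path: str = "/hello", method: str = "GET",
                   query: Optional[Dict[str, str]] = None,
                   body: Optional[Any] = None) -> Dict[str, Any]:
    """Build a trimmed-down API Gateway proxy event for local invocation."""
    return {
        "resource": path,
        "path": path,
        "httpMethod": method,
        "headers": {"Content-Type": "application/json"},
        "queryStringParameters": query,  # None becomes null, as with no query string
        "body": json.dumps(body) if body is not None else None,
        "isBase64Encoded": False,
    }


# Paste this output into the Input field, or save it as event.json.
print(json.dumps(make_api_event(query={"name": "Sam"}), indent=2))
```

Saved as `event.json`, the same event can be passed from the terminal with `sam local invoke HelloWorldFunction --event event.json` (`HelloWorldFunction` is the function name in the sample application). The SAM CLI can also produce a fuller sample event with `sam local generate-event apigateway aws-proxy`.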

Deploying the Application

While there are multiple ways to build and deploy the application, this is how we do it at HarperDB. Find out more here - https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-command-reference.html

Build the application

sam build

Package the application. This creates a ZIP file of your code and dependencies and uploads it to Amazon S3.

sam package --s3-bucket my-s3-bucket --output-template-file packaged.yaml

Deploy the application. I suggest adding the -g (guided) option the first time you deploy a new application.

sam deploy --template-file packaged.yaml --region us-east-2
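If you run these steps often, they can be scripted. The sketch below assumes the `sam` CLI is on your `PATH` and reuses the placeholder bucket and region from the commands above; `deploy_commands` and `run` are hypothetical helpers, and `run` defaults to a dry run that only prints. Note that `sam deploy` also needs a stack name, which a first guided (`-g`) deployment saves to `samconfig.toml` for later runs.

```python
import subprocess
from typing import List


def deploy_commands(bucket: str, region: str,
                    template: str = "packaged.yaml") -> List[List[str]]:
    """The build, package, and deploy steps above, as argv lists."""
    return [
        ["sam", "build"],
        ["sam", "package", "--s3-bucket", bucket, "--output-template-file", template],
        ["sam", "deploy", "--template-file", template, "--region", region],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failing step.
            subprocess.run(cmd, check=True)


run(deploy_commands("my-s3-bucket", "us-east-2"))  # dry run: prints the three commands
```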

Working with SAM can be a little confusing initially; it took me a while to get my head around adding resources to the template file (I found this document helpful). However, if you stick with it, I’m sure you’ll find great value. If you're curious to learn more about other aspects of HarperDB, feel free to check out our architectural overview, learn more about the product, or spin up a free instance!