<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Diogo Souza da Silva</title>
    <description>The latest articles on Forem by Diogo Souza da Silva (@diogok).</description>
    <link>https://forem.com/diogok</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F125218%2F20727db8-07b2-4114-84c4-7cce6999d73e.jpeg</url>
      <title>Forem: Diogo Souza da Silva</title>
      <link>https://forem.com/diogok</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/diogok"/>
    <language>en</language>
    <item>
      <title>Technologies I learned and that I do not use</title>
      <dc:creator>Diogo Souza da Silva</dc:creator>
      <pubDate>Wed, 01 Apr 2020 12:07:01 +0000</pubDate>
      <link>https://forem.com/diogok/technologies-i-learned-and-that-i-do-not-use-3k3</link>
      <guid>https://forem.com/diogok/technologies-i-learned-and-that-i-do-not-use-3k3</guid>
      <description>&lt;h1&gt;
  
  
  Context
&lt;/h1&gt;

&lt;p&gt;I was reflecting the other day on the tools I know and, especially, the ones I do not know. That led me to think: “why do I not know these tools by now?”&lt;/p&gt;

&lt;p&gt;One of the reasons is that I learned other things instead, and that is what led to this text.&lt;/p&gt;

&lt;p&gt;That might sound obvious, but I actually have to choose where to invest my time, and make trade-offs.&lt;/p&gt;

&lt;p&gt;Here I share some technologies I learned and invested in that did not really pay off, that I abandoned, or that simply took me the long way around.&lt;/p&gt;

&lt;p&gt;These are not the most recent or the oldest, the worst or the best; they are just the ones I remember best and find most interesting to share.&lt;/p&gt;

&lt;h2&gt;
  
  
  JavaFX
&lt;/h2&gt;

&lt;p&gt;Now, this was a bet. For those who do not know, JavaFX was a new UI toolkit for Java. For desktop, that is.&lt;/p&gt;

&lt;p&gt;Back when JavaFX came out it was already the era of “rich web apps”, so people had already moved on from desktop apps, which made this a risky bet to say the least.&lt;/p&gt;

&lt;p&gt;What I found interesting about it was that it had a functional programming style (or, at least, functions as first-class elements), two-way binding between properties and the UI, and a heavy focus on components and composability. You could even style components with what resembled CSS.&lt;/p&gt;

&lt;p&gt;That said, it was still Java on the desktop, it looked slow at the time, and people were already on the web. Apart from that, I had no desktop app to build.&lt;/p&gt;

&lt;h2&gt;
  
  
  Docker before v1.0
&lt;/h2&gt;

&lt;p&gt;There was a turning point where Docker became stable, sometime after v1.0.&lt;/p&gt;

&lt;p&gt;I began investing in Docker around v0.5 or so. At that time a lot of functionality was missing, and docker-compose was either nonexistent or very limited.&lt;/p&gt;

&lt;p&gt;Because it was so new and lacked functionality, I wrote and adopted several wrappers around Docker during that year to provide much-needed features.&lt;/p&gt;

&lt;p&gt;It was interesting that with each release of Docker or docker-compose I could remove one extra tool or one more script from my custom setup in favor of the official Docker way.&lt;/p&gt;

&lt;p&gt;Still, this one paid off in the end, as Docker is now the industry standard, but being an early adopter was a long and hard run.&lt;/p&gt;

&lt;h2&gt;
  
  
  CouchDB
&lt;/h2&gt;

&lt;p&gt;My favorite DB. CouchDB offered a schemaless, HTTP and map-reduce enabled database that was very resilient and easy to use. It even included replication over HTTP and a changes feed endpoint. It was really awesome.&lt;/p&gt;

&lt;p&gt;I actually got to build a major system of my career on top of it, paired with Elasticsearch. But after that, it did not gain enough traction.&lt;/p&gt;

&lt;p&gt;MongoDB basically killed it, becoming the go-to NoSQL database at the time. And the fact that CouchDB was slow, especially at view building, did not help.&lt;/p&gt;

&lt;p&gt;CouchDB is still improving: it now includes search, clustering is easier, and there are other goodies.&lt;/p&gt;

&lt;p&gt;Nowadays I mostly use PostgreSQL and Elasticsearch, paired with something from whatever cloud my system might be using (DynamoDB, Firestore…).&lt;/p&gt;

&lt;h2&gt;
  
  
  JS Games and Augmented Reality
&lt;/h2&gt;

&lt;p&gt;Now this was just for fun, but I always go back and forth on learning something related to games in the browser. I find it really fascinating, and I really liked working with the HTML5 Canvas.&lt;/p&gt;

&lt;p&gt;One of the experiments I spent the most time on was building AR experiences for the mobile browser. That was hard, a limited experience, and very much fun.&lt;/p&gt;

&lt;p&gt;This was obviously just a hobby, and it did take time away from learning tools that I would actually use on a daily basis. But it was worth it nevertheless.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lua
&lt;/h2&gt;

&lt;p&gt;My favorite language that I never used.&lt;/p&gt;

&lt;p&gt;Lua is an easy, simple and small language that you can embed anywhere and that has implementations on every platform. It is used a lot in games, in nginx via OpenResty (and in Kong, which is based on that), and who knows where else.&lt;/p&gt;

&lt;p&gt;I do know it well enough to safely use it in production. Even right now I am testing Fennel, a Lisp in Lua. But I have never used it for anything serious. Maybe a Kong plugin soon?&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;I do not regret learning any of these; they all added something interesting, I had fun playing with them as hobbies, and some were really useful for a time.&lt;/p&gt;

&lt;p&gt;Now, time to learn something new that might or might not stick.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Efficient Clojure multistage docker images, with java and native-image</title>
      <dc:creator>Diogo Souza da Silva</dc:creator>
      <pubDate>Wed, 15 Jan 2020 17:47:55 +0000</pubDate>
      <link>https://forem.com/diogok/efficient-clojure-multistage-docker-images-with-java-and-native-image-1e9i</link>
      <guid>https://forem.com/diogok/efficient-clojure-multistage-docker-images-with-java-and-native-image-1e9i</guid>
      <description>&lt;p&gt;Here I explore a few optimization when building docker images for your clojure apps.&lt;/p&gt;

&lt;h1&gt;
  
  
  Image versions
&lt;/h1&gt;

&lt;p&gt;One easy way to make things faster for your local development and for CI/CD is simply to use smaller images, and to reuse images.&lt;/p&gt;

&lt;p&gt;Using common public images makes it more likely that you will reuse the same image over and over again, and pinning to the most specific version helps ensure the base image has not changed between builds. Choosing alpine or slim images can reduce the image size.&lt;/p&gt;

&lt;p&gt;For the base images I use &lt;code&gt;clojure:openjdk-13-tools-deps-slim-buster&lt;/code&gt; and &lt;code&gt;openjdk:13-slim-buster&lt;/code&gt;. I prefer &lt;code&gt;buster&lt;/code&gt; images over &lt;code&gt;alpine&lt;/code&gt; due to better compatibility with most native libs, and rumor has it that, due to libc versions, they can be faster.&lt;/p&gt;

&lt;h1&gt;
  
  
  Build cache
&lt;/h1&gt;

&lt;p&gt;The next step is to leverage the Docker image build cache, so the order of the steps you use to build the image matters.&lt;/p&gt;

&lt;p&gt;You generally want to set non-changing configuration like ENV, WORKDIR and EXPOSE first.&lt;/p&gt;

&lt;p&gt;To decouple installing the deps from actually building the artifact, the next thing you add is your deps file, followed by a step that installs the dependencies.&lt;/p&gt;

&lt;p&gt;The code is what changes most, so it goes last, right before actually building the uberjar.&lt;/p&gt;

&lt;h1&gt;
  
  
  Garbage collector and heap size
&lt;/h1&gt;

&lt;p&gt;This is not an image optimization, just a tip.&lt;/p&gt;

&lt;p&gt;Current versions of OpenJDK support running in containers, so it is best to use relative memory limits and container support with &lt;code&gt;-XX:+UseContainerSupport&lt;/code&gt; and &lt;code&gt;-XX:MaxRAMPercentage=85&lt;/code&gt;, so we don't have to mess with Xmx or Xms at runtime anymore.&lt;/p&gt;

&lt;p&gt;Remember that the JVM uses a little more memory than the heap, so give it some extra space.&lt;/p&gt;

&lt;h1&gt;
  
  
  Multistage
&lt;/h1&gt;

&lt;p&gt;To further reduce the image size and remove clutter from the base image, we can start from a JVM-only image and copy the generated jar over. This removes Clojure-specific tools, source code, intermediate artifacts and more.&lt;/p&gt;

&lt;h1&gt;
  
  
  Resulting Dockerfile
&lt;/h1&gt;

&lt;p&gt;After applying these tips, here is the resulting Dockerfile:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
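&lt;p&gt;Putting the steps together, a sketch of such a Dockerfile might look like this (the deps file name, the &lt;code&gt;:uberjar&lt;/code&gt; alias, the jar path and the port are assumptions for illustration, not an exact reproduction):&lt;/p&gt;

```dockerfile
# Stage 1: build on the pinned tools-deps image.
FROM clojure:openjdk-13-tools-deps-slim-buster AS builder
WORKDIR /app

# Dependencies change less often than code: copy the deps file and
# resolve dependencies first (no-op evaluation), so this layer is cached.
COPY deps.edn /app/
RUN clojure -e nil

# The code changes most, so it goes last, right before the uberjar build
# (assumes an :uberjar alias that produces app.jar).
COPY src /app/src
RUN clojure -A:uberjar

# Stage 2: JVM-only image, without Clojure tooling or sources.
FROM openjdk:13-slim-buster
WORKDIR /app
COPY --from=builder /app/app.jar /app/app.jar
EXPOSE 8080
CMD ["java", "-XX:+UseContainerSupport", "-XX:MaxRAMPercentage=85", "-jar", "app.jar"]
```

&lt;p&gt;Each COPY/RUN pair forms a cache layer: editing only the code re-runs just the final build step, while the dependency layer stays cached.&lt;/p&gt;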


&lt;h1&gt;
  
  
  Native image
&lt;/h1&gt;

&lt;p&gt;Now, as a bonus, we can also set up a native image using GraalVM tooling. This reduces the image size by a lot, as it does not depend on the JVM, and potentially reduces memory usage.&lt;/p&gt;

&lt;p&gt;Note that native-image is only compatible with Linux x86-64, and it is new tech, so a lot of frameworks can break it. It also does not give better performance (latency, throughput, GC times…) compared to the JVM version.&lt;br&gt;
Some flags may change depending on your tools of choice.&lt;/p&gt;

&lt;p&gt;The native image builds upon the previous tips, but has to be based on Java 11 instead of the latest.&lt;/p&gt;

&lt;p&gt;Here is the dockerfile:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
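&lt;p&gt;A sketch of such a native-image Dockerfile (the image tags, the &lt;code&gt;:uberjar&lt;/code&gt; alias and the native-image flags are assumptions for illustration; Netty-based stacks typically need extra initialization flags):&lt;/p&gt;

```dockerfile
# Stage 1: uberjar build, pinned to Java 11 for native-image compatibility.
FROM clojure:openjdk-11-tools-deps-slim-buster AS builder
WORKDIR /app
COPY deps.edn /app/
COPY src /app/src
RUN clojure -A:uberjar

# Stage 2: compile the jar ahead-of-time with GraalVM's native-image.
FROM oracle/graalvm-ce:20.0.0-java11 AS native
WORKDIR /app
COPY --from=builder /app/app.jar /app/app.jar
RUN gu install native-image
# --no-fallback fails the build instead of silently keeping a JVM dependency.
# Netty-based stacks (like aleph) usually need extra --initialize-at-* flags.
RUN native-image --no-fallback -jar app.jar app

# Stage 3: minimal runtime with just the binary.
FROM debian:buster-slim
COPY --from=native /app/app /app
EXPOSE 8080
CMD ["/app"]
```

&lt;p&gt;The final image carries no JVM at all, which is where most of the size savings come from.&lt;/p&gt;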


&lt;p&gt;Note that there is a lot of Netty-specific config there, as I use Aleph for HTTP.&lt;/p&gt;

&lt;h1&gt;
  
  
  Open-source full example
&lt;/h1&gt;

&lt;p&gt;All these experiments and other framework choices are available in my &lt;a href="https://github.com/diogok/klj-api/"&gt;klj-api&lt;/a&gt; project on GitHub.&lt;/p&gt;

&lt;p&gt;Hope it helped.&lt;/p&gt;

</description>
      <category>docker</category>
      <category>clojure</category>
      <category>graalvm</category>
    </item>
    <item>
      <title>An incomplete comparison of geospatial file formats</title>
      <dc:creator>Diogo Souza da Silva</dc:creator>
      <pubDate>Wed, 17 Oct 2018 11:46:01 +0000</pubDate>
      <link>https://forem.com/diogok/an-incomplete-comparison-of-geospatial-file-formats-1b06</link>
      <guid>https://forem.com/diogok/an-incomplete-comparison-of-geospatial-file-formats-1b06</guid>
      <description>&lt;p&gt;A few years ago I was working with geospatial analysis and got to study and compare a few alternative file formats for storing and, especially, transferring such data.&lt;/p&gt;

&lt;p&gt;I worked mainly with vector data (polygons and points) and served it on the web. The motivation for this work was to test the gains and costs of TopoJSON, and how it fared in size (important for transfer time) and encoding time (important for server resource usage).&lt;/p&gt;

&lt;p&gt;A more complete description of the advantages of the available formats can be found at &lt;a href="http://switchfromshapefile.org/"&gt;Shapefile must die!&lt;/a&gt;, a good resource even if a bit strongly worded (FYI: as a developer, I really dislike working with shapefiles).&lt;/p&gt;

&lt;p&gt;Even though I never got to finish it, I guess I can share the results anyway.&lt;/p&gt;

&lt;h3&gt;
  
  
  The contender file formats
&lt;/h3&gt;

&lt;p&gt;First, let's make some groups. Here we will compare three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data structure&lt;/li&gt;
&lt;li&gt;Format encoding&lt;/li&gt;
&lt;li&gt;Compression&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For &lt;strong&gt;data structure&lt;/strong&gt;, I am comparing how data is organized inside the file:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shape&lt;/li&gt;
&lt;li&gt;GeoJSON structure&lt;/li&gt;
&lt;li&gt;TopoJSON structure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For &lt;strong&gt;format encoding&lt;/strong&gt; , I am comparing how this structure is serialized:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Binary (shapefile)&lt;/li&gt;
&lt;li&gt;JSON&lt;/li&gt;
&lt;li&gt;MessagePack&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For &lt;strong&gt;compression&lt;/strong&gt; , I am comparing compression algorithms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;None&lt;/li&gt;
&lt;li&gt;Deflate/GZIP&lt;/li&gt;
&lt;li&gt;XZ/LZMA&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The choices of what to test were made based on tooling availability in my programming languages (Java/Clojure) and ease of use from the web (JavaScript).&lt;/p&gt;

&lt;p&gt;A few notes on formats not tested:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CSV&lt;/strong&gt; was not tested because it is not as standard as it looks, with several combinations of separators, quoting and record separators. I would also have had to compare WKT, WKB and other geometry encodings. But it would probably fare well, as it can be streamed and compresses well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spatialite&lt;/strong&gt; is a bit more complex to handle, as you need not just an SQLite library but one with the Spatialite extension. I would also have had to define a table structure and so on.&lt;/p&gt;

&lt;p&gt;Given more time I would include more tests on both.&lt;/p&gt;

&lt;p&gt;Shapefile was only tested as a baseline.&lt;/p&gt;

&lt;h3&gt;
  
  
  Overall results
&lt;/h3&gt;

&lt;p&gt;The code and results can be found at &lt;a href="https://github.com/diogok/geof"&gt;my github&lt;/a&gt;.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| Structure | Format  | Compress | Size  | Time |
|-----------|---------|----------|-------|------|
| Shapefile | -       | -        | 5MB   | -    |
| Shapefile | -       | zip      | 3.2MB | -    |
| Geo       | JSON    | -        | 9MB   | 10s  |
| Geo       | JSON    | gz       | 2.5MB | 11s  |
| Geo       | JSON    | xz       | 1.4MB | 21s  |
| Geo       | MsgPack | -        | 5.2MB | 9s   |
| Geo       | MsgPack | gz       | 3.5MB | 11s  |
| Geo       | MsgPack | xz       | 1.7MB | 15s  |
| Topo      | JSON    | -        | 524KB | 22s  |
| Topo      | JSON    | gz       | 84KB  | 20s  |
| Topo      | JSON    | xz       | 64KB  | 22s  |
| Topo      | MsgPack | -        | 256KB | 21s  |
| Topo      | MsgPack | gz       | 76KB  | 20s  |
| Topo      | MsgPack | xz       | 60KB  | 22s  |
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Shapefiles are the baseline; they do not compress very well.&lt;/p&gt;

&lt;p&gt;As expected, raw GeoJSON files are huge, but, as text files, they compress very well and are reasonably fast to encode.&lt;/p&gt;

&lt;p&gt;TopoJSON files are minimal in size but take a long time to encode. Also not included in this test is the fact that topology encoding takes a lot of memory, as it has to hold the whole collection to iterate over it.&lt;/p&gt;

&lt;p&gt;MessagePack, being a binary format, offers reasonable space efficiency and encodes faster. It adds more complexity on the web side, and loses most of its gains after compression. It is faster to read/write on the server but slower in the browser.&lt;/p&gt;

&lt;p&gt;Deflate/GZ offers the expected compression results. It is standard on the web, which makes it an easy choice: your server already has it, and so does the browser.&lt;/p&gt;

&lt;p&gt;LZMA/XZ is a bit harder to use in the browser, but it is able to deliver even more.&lt;/p&gt;
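&lt;p&gt;Both compressors are available in most standard libraries. A minimal Python sketch of this kind of comparison (the payload is a hypothetical tiny GeoJSON, not the data set used above):&lt;/p&gt;

```python
import gzip
import json
import lzma

# Hypothetical tiny GeoJSON payload; the repeated features give the
# compressors something to work with, like shared borders do in real data.
geojson = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"name": "example"},
         "geometry": {"type": "Point", "coordinates": [-46.63, -23.55]}},
    ] * 100,
}).encode("utf-8")

sizes = {
    "none": len(geojson),
    "gzip": len(gzip.compress(geojson, compresslevel=9)),  # Deflate/GZIP
    "xz":   len(lzma.compress(geojson, preset=9)),         # XZ/LZMA
}
print(sizes)
```

&lt;p&gt;On real collections the ranking follows the table above: both algorithms shrink text formats dramatically, with XZ usually edging out GZIP at a higher CPU cost.&lt;/p&gt;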

&lt;h3&gt;
  
  
  A few conclusions
&lt;/h3&gt;

&lt;p&gt;This test is incomplete, and you should run your own on your data set to get results that are more applicable to your reality.&lt;/p&gt;

&lt;p&gt;But here is my take on it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shapefiles suck because they are a bunch of files with several limitations&lt;/li&gt;
&lt;li&gt;If nothing else, at least enable DEFLATE on your geojson serving&lt;/li&gt;
&lt;li&gt;TopoJSON is complex to deal with and expensive to encode&lt;/li&gt;
&lt;li&gt;TopoJSON offers insane compaction, especially on polygons with a lot of shared lines&lt;/li&gt;
&lt;li&gt;MsgPack offers nice compaction over text, but most of that is lost after compression&lt;/li&gt;
&lt;li&gt;LZMA/XZ adds a little complexity but gives good gains on bigger files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So, if you can afford to encode only once and have resources to spare: TopoJSON with XZ gives the most value. If you have to encode/decode on the fly: GeoJSON with XZ. If you can spare the disk, offer both: TopoJSON and GeoJSON, with XZ and GZ.&lt;/p&gt;

</description>
      <category>gis</category>
      <category>geojson</category>
      <category>geospatial</category>
      <category>maps</category>
    </item>
    <item>
      <title>Terraform workspaces and locals for environment separation</title>
      <dc:creator>Diogo Souza da Silva</dc:creator>
      <pubDate>Mon, 11 Sep 2017 12:01:01 +0000</pubDate>
      <link>https://forem.com/diogok/terraform-workspaces-and-locals-for-environment-separation-2cgd</link>
      <guid>https://forem.com/diogok/terraform-workspaces-and-locals-for-environment-separation-2cgd</guid>
      <description>&lt;p&gt;&lt;a href="https://terraform.io"&gt;Terraform&lt;/a&gt; is this amazing tool to provision and manage changes on your cloud infrastructure while following the great practice of keeping your infrastructure-as-code.&lt;/p&gt;

&lt;p&gt;One common need on infrastructure management is to build multiple environments, such as testing and production, with mostly the same setup but keeping a few variables different, like networking and sizing.&lt;/p&gt;

&lt;p&gt;The first tool to help us with that is &lt;a href="https://www.terraform.io/docs/state/workspaces.html"&gt;terraform workspaces&lt;/a&gt;. Previously called environments, workspaces allow you to create different and independent states on the same configuration. And since they are compatible with remote backends, these workspaces are shared with your team.&lt;/p&gt;

&lt;p&gt;As an example, let’s work with the following simple infrastructure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;provider "aws" {
 region= "us-east-1"
}

resource "aws\_instance" "my\_service" {
 ami="ami-7b4d7900"
 instance\_type="t2.micro"
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now we have defined a single AWS EC2 instance, and &lt;code&gt;terraform apply&lt;/code&gt; will have your testing server up.&lt;/p&gt;

&lt;p&gt;But that is only one environment. In this simple example one might think it would be OK to simply replicate the resource and call one “testing_my_service” and the other “prod_my_service”, but this approach will quickly lead to confusion as your setup grows in complexity and more resources are added.&lt;/p&gt;

&lt;p&gt;What you can do instead is use workspaces to separate them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform workspace new production
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;With this, you are now in the production workspace. It has the same configuration, since we are in the same folder and terraform module, but nothing created yet. Thus, if you &lt;code&gt;terraform apply&lt;/code&gt;, it will create another server with the same configuration, without changing the previous workspace.&lt;/p&gt;

&lt;p&gt;To go back to testing you can run &lt;code&gt;terraform workspace select default&lt;/code&gt;, since we are using the default workspace as the testing environment to make sure we are not working on production by mistake.&lt;/p&gt;
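&lt;p&gt;The day-to-day workspace commands look like this (a sketch; the exact output varies by terraform version):&lt;/p&gt;

```shell
terraform workspace list             # the current workspace is marked with '*'
terraform workspace select default   # switch back to the testing environment
terraform workspace show             # print the name of the current workspace
```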

&lt;p&gt;But, obviously, there are differences between testing and production. The first approach that comes to mind is to use variables and an if switch on the resources; a better approach is to use the recently introduced &lt;a href="https://www.terraform.io/docs/configuration/locals.html"&gt;terraform locals&lt;/a&gt; to keep resources lean of logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;provider "aws" {
 region= "us-east-1"
}

locals {
 env="${terraform.workspace}"

counts = {
 "default"=1
 "production"=3
 }

instances = {
 "default"="t2.micro"
 "production"="t4.large"
 }

instance\_type="${lookup(local.instances,local.env)}"
 count="${lookup(local.counts,local.env)}"
}

resource "aws\_instance" "my\_service" {
 ami="[ami-7b4d7900](https://console.aws.amazon.com/ec2/home?region=us-east-1#launchAmi=ami-7b4d7900)"
 instance\_type="${local.instance\_type}"
 count="${local.count}"
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The main difference from variables is that locals can contain logic, keeping it out of the resources, while variables allow only values and push the logic into the resources.&lt;/p&gt;

&lt;p&gt;One thing to keep in mind is that terraform is rapidly evolving, and it is worth keeping an eye on its changes to make sure you are making the most of it.&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>aws</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
