Forem: André Diego Piske

Ruby cryptographic gems

André Diego Piske — Sat, 04 Jun 2022 08:40:27 +0000

I'm not an expert in cryptography — I'm just a developer, and most developers are in this same boat of not being experts in cryptography.
This doesn't mean we're negligent by not studying this field, this just means it isn't our area of expertise.
What is negligent, however, is for someone like us to roll out their own cryptographic algorithms or libraries without knowing what they're doing.

Thus, never roll your own cryptography unless you absolutely know what you're doing. Because of that, I want to explore some cryptography gems that awesome people brought to the Ruby ecosystem.

bcrypt

The first one is bcrypt. It deals with a very common use case: storing user passwords in databases and checking them when signing a user in.

Here is some code demonstrating its usage. Explanation comes below.

The user_input variable is some password the user has typed in, perhaps through a sign-in form or through an account creation form. Let's assume the latter scenario for now.

So, the user is creating an account and they submitted the password they wish to use later to sign in. How to store such password in the database?

pwd = BCrypt::Password.create(user_input)

It is this simple. Now, just store pwd.to_s in a text field for the user password.
If you check the contents of pwd.to_s, you'll notice it looks something like
$2a$12$ccaJDXyKniehBeYgZM4wDOl91.zctTI03qPhOlDGVk5KZ1qcC9Hge. The value for you will likely be different, even though the password is the same, because there is a salt in the mix. Just to be clear: that funny string is the hashed version of the password.
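The salt mentioned above travels inside the hashed string itself, which is why a single text field is enough to store everything. As a rough sketch in plain Ruby (the regular expression and the field names are illustrative labels, not part of the bcrypt gem's API), the string can be split into its parts:

```ruby
# Split a bcrypt hash string into its parts.
# Layout: $<version>$<cost>$<22-character salt><31-character checksum>
BCRYPT_FORMAT = %r{
  \A\$(?<version>2[abxy]?)
  \$(?<cost>\d{2})
  \$(?<salt>[./A-Za-z0-9]{22})
  (?<checksum>[./A-Za-z0-9]{31})\z
}x

def bcrypt_parts(hash_string)
  match = BCRYPT_FORMAT.match(hash_string)
  return nil unless match

  {
    version: match[:version],
    cost: match[:cost].to_i,      # work factor: higher means slower hashing
    salt: match[:salt],
    checksum: match[:checksum]
  }
end

parts = bcrypt_parts('$2a$12$ccaJDXyKniehBeYgZM4wDOl91.zctTI03qPhOlDGVk5KZ1qcC9Hge')
puts parts[:version]  # 2a
puts parts[:cost]     # 12
puts parts[:salt]     # ccaJDXyKniehBeYgZM4wDO
```

Because the salt and the cost are part of the stored value, checking a password later needs nothing but that string and the user's input.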

Now, what to do with that funny string? You use it when the user is signing in, to check whether they typed the correct password. Remember that what will be stored in the "password" field in the database will be that funny value. So let's say a user is trying to sign in. They provide you their email address and the password. With the email address, you can fetch the entity in the database that has their hashed password (the funny string).

It's then just a matter of comparing the hashed password to what the user provided in the sign in form:

require 'bcrypt'

# User provided this in the sign in form
user_input = '9fn837nf'

# The hashed password as stored in your database
pwd_in_database = '$2a$12$ccaJDXyKniehBeYgZM4wDOl91.zctTI03qPhOlDGVk5KZ1qcC9Hge'

pwd = BCrypt::Password.new(pwd_in_database)
if pwd == user_input
  puts('user signed in successfully')
else
  puts('wrong password')
end

rbnacl

The other gem I want to explore is rbnacl.
This gem provides general purpose cryptography for many different scenarios and algorithms. They do so in a simplified way so that mortals like us don't have to become cryptography experts. Check out these docs to see what I'm talking about!

Symmetric encryption

Say you want to have a key that lets you encrypt some data and later decrypt it. It's as easy as:

require 'rbnacl'

# Generates a random key with the correct length
key = RbNaCl::Random.random_bytes(RbNaCl::SecretBox.key_bytes)

sbox = RbNaCl::SimpleBox.from_secret_key(key)
encrypted_data = sbox.encrypt('hello world!')

You'll have to store the key somewhere safe. Note that it is a binary string, so you might want to base64-encode it to send it around. I generated the key that when base64-encoded is: nveFTikebVaqd4SMzCZ5P7BAKv6BeNwAqowTFkJbHjY=.
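The base64 round trip for a key can be sketched with the standard library alone. The sketch below assumes a 32-byte key, which matches the length of the key shown above once decoded, and uses SecureRandom as a stand-in for RbNaCl's random generator:

```ruby
require 'base64'
require 'securerandom'

KEY_BYTES = 32  # assumption: the key length used in this post

key = SecureRandom.random_bytes(KEY_BYTES)   # binary string, not printable
encoded = Base64.strict_encode64(key)        # plain text, safe for a config value
decoded = Base64.strict_decode64(encoded)    # back to the original bytes

puts encoded.length   # 44
puts decoded == key   # true

# The key from this post decodes to the same number of bytes.
post_key = Base64.decode64('nveFTikebVaqd4SMzCZ5P7BAKv6BeNwAqowTFkJbHjY=')
puts post_key.bytesize  # 32
```

strict_encode64 is used here because it never inserts line breaks, which keeps the encoded key on a single line.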

The encrypted_data is also binary data. Encoding it in base64, it turns into:

LbmIxONTUZXQGzGQwK1gB29H0OxoS8bn0GE/QAzJXt/8K9WNFnaz6XX3F0ecFwZRb9mm9Q==

Now, to decrypt the data, one only needs that key and the encrypted payload. Let's take a look:

require 'rbnacl'
require 'base64'

encrypted_payload = Base64.decode64(
  'LbmIxONTUZXQGzGQwK1gB29H0OxoS8bn0GE/QAzJXt/8K9WNFnaz6XX3F0ecFwZRb9mm9Q==')
key = Base64.decode64('nveFTikebVaqd4SMzCZ5P7BAKv6BeNwAqowTFkJbHjY=')

sbox = RbNaCl::SimpleBox.from_secret_key(key)
data = sbox.decrypt(encrypted_payload)

puts(data)

This script will print hello world!, which was the original message.
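The reason only the key and the payload are needed is that, as far as I understand SimpleBox, it generates a random nonce for each message and prepends it to the ciphertext. Assuming the default XSalsa20-Poly1305 construction (24-byte nonce, 16-byte authenticator; both constants below are assumptions, not values read from the gem), the sizes of the payload above add up:

```ruby
require 'base64'

payload = Base64.decode64(
  'LbmIxONTUZXQGzGQwK1gB29H0OxoS8bn0GE/QAzJXt/8K9WNFnaz6XX3F0ecFwZRb9mm9Q=='
)

NONCE_BYTES = 24  # assumed nonce length
TAG_BYTES   = 16  # assumed authenticator length

nonce = payload.byteslice(0, NONCE_BYTES)
ciphertext = payload.byteslice(NONCE_BYTES, payload.bytesize)

puts payload.bytesize                 # 52
puts nonce.bytesize                   # 24
puts ciphertext.bytesize - TAG_BYTES  # 12, the length of 'hello world!'
```

This also explains why encrypting the same message twice gives different payloads: the nonce changes every time.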

I didn't use replit here because rbnacl requires libsodium to be installed in the OS and I didn't manage to get that installed in replit. The folks at rbnacl provide some help on how to install that.

That's a very simple way of encrypting data without having much knowledge about cryptography. The gem also provides ways to have more control over the encryption, like choosing the encryption algorithm. But then you'll have to dig into the documentation by yourself, which by the way is just great.

If you know other gems on cryptography for Ruby, drop a comment!

Cover image by Markus Spiske on Unsplash.

I'm creating Lets Code videos! [feedback wanted]

André Diego Piske — Sat, 20 Feb 2021 08:52:46 +0000

I came to this: I will create content for programmers (that's you!)

I want to create relevant content for those who either want to learn software development from scratch or improve their skills. I've been in the programming field for 10+ years and it's about time to share some knowledge.

Here is something I've been working on:

Next video on that series goes online later today 💪

👉 Would very much like to hear your feedback!

My content focuses on technologies I know very well and have been working with in the last few years: Ruby and JavaScript for programming languages, a backend-ish full-stack career and devops.

Let me know what you think!

How profiling my slow Ruby code led me to publish my first gem!

André Diego Piske — Tue, 24 Mar 2020 18:15:16 +0000

My current side project is an HTTP/2 server, fully written in Ruby. This protocol is a major improvement in comparison to its predecessor, HTTP/1.1, but it also brings along much higher complexity.

The major change is in the way data is transported over the wire. For instance, when downloading a file through HTTP/1.1, after the headers are exchanged, a server only has to read the file contents from disk and forward them to the client, without any processing involved.

This is not true in HTTP/2. Instead, a server has to send the file contents in chunks. Every one of those chunks has to be wrapped in what the protocol specification calls a frame.

A frame is composed of a header and a payload. Here is a depiction of a frame in HTTP/2, borrowed straight from the spec:

 +-----------------------------------------------+
 |                 Length (24)                   |
 +---------------+---------------+---------------+
 |   Type (8)    |   Flags (8)   |
 +-+-------------+---------------+-------------------------------+
 |R|                 Stream Identifier (31)                      |
 +=+=============================================================+
 |                   Frame Payload (0...)                      ...
 +---------------------------------------------------------------+

That is all binary data. The Length (24) there is a 24-bit field, and it stores the length of the payload inside the frame.
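To make the layout concrete, here is a minimal sketch of building and reading that 9-byte header with Ruby's pack and unpack. The method names build_frame and parse_frame_header are made up for this illustration and are not taken from my server:

```ruby
# Build an HTTP/2 frame: a 9-byte header followed by the payload.
# Pack directives: C = 8-bit unsigned, n = 16-bit big-endian, N = 32-bit big-endian.
def build_frame(type, flags, stream_id, payload)
  payload = payload.b                 # treat the payload as raw bytes
  length = payload.bytesize
  header = [
    length >> 16, length & 0xFFFF,    # Length (24 bits), written as 8 + 16 bits
    type,                             # Type (8)
    flags,                            # Flags (8)
    stream_id & 0x7FFF_FFFF           # R bit cleared, Stream Identifier (31)
  ].pack('CnCCN')
  header + payload
end

# Read the header fields back out of the first 9 bytes of a frame.
def parse_frame_header(frame)
  len_hi, len_lo, type, flags, stream_id = frame.unpack('CnCCN')
  { length: (len_hi << 16) | len_lo, type: type, flags: flags,
    stream_id: stream_id & 0x7FFF_FFFF }
end

# A DATA frame (type 0x0) with the END_STREAM flag (0x1) on stream 1.
frame = build_frame(0x0, 0x1, 1, 'hello')
p frame.bytes.first(9)  # [0, 0, 5, 0, 1, 0, 0, 0, 1]
header = parse_frame_header(frame)
puts header[:length]    # 5
```

The 24-bit length has no pack directive of its own, so the sketch writes it as one byte followed by a 16-bit integer.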

Where it starts

That binary thing, dealing with bits and bytes, that is where the problem starts.

Before I could optimize it, my server was slow: downloading a file from localhost ran at just 4.4 MB/s. That is painfully slow; it should be instantaneous! Downloading a file from a localhost server that is serving only one request should be really fast.

During the file download, the server was showing 100% CPU usage. This hinted that the reason for the bad performance could be that Ruby was really busy doing stuff. But why so busy? Doing what?

To the profiler!

The best way to know what keeps a program busy is to get to the data, and getting the data means profiling the code. Fortunately, and thanks to some very nice people who put in the effort to write that gem, Ruby has stackprof.

Setting up the gem is very straightforward. I added a StackProf.start call before the code I was suspecting to be slow and also a StackProf.stop after it. This is the code:

def iterate
  StackProf.start(mode: :cpu) if @config.enable_profiling?

  writables.each &:notify_writeable
  readables.each &:notify_readable
  new_clients.each do |io|
    accept_client_connection(io)
  end

  StackProf.stop if @config.enable_profiling?
end

The iterate method above is run multiple times in a loop that only stops when the server is shut down. I only wanted to profile that portion, because that's where most of the stuff happens.

I wanted to be able to run the server with and without the profiler, so I made it configurable via those @config.enable_profiling? probes.

I also had to add some code to finish up the profiler. I added that in the code that calls the method above. It looks like this:

def run_forever
  loop do
    iterate

    if @signaled
      Debug.info 'Signaled, exiting...'

      if @config.enable_profiling?
        Debug.info 'Dumping profile info...'

        StackProf.results @config.profiling_result_file

        Debug.info 'Done'
      end

      return
    end
  end
end

The run_forever method above runs the loop that only finishes once the server is shut down. When the @signaled variable evaluates to true, the server has to shut down. The important piece of code is the line that calls StackProf.results. It will dump all the profiler data collected so far to a file.

Then I ran the server, fired some downloads using curl and stopped the server. The profiler results were now dumped into a file. To visualize them, there is another awesome ruby gem called stackprof-webnav. It provides a command to view the profiler results. The following installs it and gets it running:

$ gem install stackprof-webnav
$ stackprof-webnav

That has to be run in the same folder where the profiler result files are stored. It opens a web server and then there is a web UI to view the profiler results.

Finding the bottleneck

Having done the first run with the profiler on, the results revealed the following:

This was a profile run of that file download operation. And that has a lot to say: 42% of the time just running String#unpack! It's almost half of everything that is being done there.

It struck me. Since HTTP/2 is a binary-based protocol, the server needs to deal with bitwise operations to deliver data. Remember how I said earlier that every chunk of data has to be wrapped in frames?

There are two important things there: chunk and wrapped. By default, the HTTP/2 spec limits the payload of a frame to only 16 KiB (a receiver may advertise a larger limit through the SETTINGS_MAX_FRAME_SIZE setting, but 16 KiB is all a server can count on). This means that every 16 KiB of data has to be wrapped into a frame. That is a lot of processing involved.

Wrapping the chunk is a simple but repetitive operation. And repetitions, namely loops, are the Achilles' heel of interpreted languages like Ruby. If the frames could be larger, it would be possible to wrap data less often, which would more easily allow for higher performance.
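To get a feeling for how often that wrapping happens, here is a small sketch that cuts a file into frame-sized pieces. The each_frame_payload helper is hypothetical, and a StringIO stands in for a real file:

```ruby
require 'stringio'

MAX_FRAME_PAYLOAD = 16 * 1024  # 16,384 bytes per frame payload

# Yield the contents of an IO in pieces no larger than one frame payload.
def each_frame_payload(io, max = MAX_FRAME_PAYLOAD)
  while (chunk = io.read(max))
    yield chunk
  end
end

# Stand-in for a file that is slightly larger than 1 MiB.
file = StringIO.new('x' * (1024 * 1024 + 10))

sizes = []
each_frame_payload(file) { |chunk| sizes << chunk.bytesize }

puts sizes.length  # 65
puts sizes.last    # 10
```

So every MiB of file content means 64 full frames, each needing its own header; a 100 MiB download needs 6,400 of them.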

Nevertheless, here is the code of that slowest method call, the BitWriter#write_bytes:

def write_bytes value
  value = value.unpack('C*') if value.is_a?(String)
  @buffer += value
  @cursor += value.length
end

The BitWriter class is responsible for writing binary data into http/2 frames. It has methods to write bytes, integers and strings. The write_bytes method just writes an array of bytes, so it only has to copy the data.

The BitWriter stores all the data internally as an Array in which each element is a number from 0 to 255, that is, a byte. The write_bytes method accepts either an array of bytes as input or a String. For the latter case, it has to first convert it to an array of numbers (bytes). That conversion should be a very simple operation. But that doesn't seem to be what the data is showing there.

My first suspect was String#unpack, because I knew that most of the calls were actually passing a String to the method. Performing a quick benchmark of the String#unpack method, I concluded it was in fact much slower than I had anticipated.

The benchmark took 6.8 seconds to run on my machine. And that's only 10 thousand calls!
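A benchmark along these lines can be sketched with Ruby's built-in Benchmark module. The input size (one 16 KiB chunk) and the iteration count are assumptions chosen for this sketch, so the timings will not match the numbers above; String#bytes is included only as a point of comparison:

```ruby
require 'benchmark'

chunk = 'x' * 16_384   # assumed input: one frame-sized chunk
iterations = 1_000     # assumed iteration count

unpacked = nil
as_bytes = nil

Benchmark.bm(14) do |bench|
  bench.report("unpack('C*')") { iterations.times { unpacked = chunk.unpack('C*') } }
  bench.report('bytes')        { iterations.times { as_bytes = chunk.bytes } }
end

# Both calls produce the same array of numbers between 0 and 255.
puts unpacked == as_bytes  # true
```

Results vary a lot between Ruby versions and machines, so it is worth running this on the setup that actually matters.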

Going back to the profiler results above, the second busiest method call was OpenSSL's write_nonblock method. That is the method that actually sends the data to the client. However, that is a third-party library that I have no control over, so I skipped it and went on to look at the bytes_array method of the BitWriter class.

This class of mine was causing trouble. Together, two of its methods were using almost 70% of the CPU time. And it shocked me again, since the code for the bytes_array method is even simpler than the other one. It is exactly this:

def bytes_array
  bytes.pack('C*')
end

Here, bytes is just an alias to the variable that stores the internal bytes. It is an Array of numbers. I had just found out that the unpack method is very slow, and now I was about to find out that pack is also very slow. I ran a benchmark in the same manner as before.

This one took about 7.3 seconds on my machine. Again, very slow.
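For reference, this is what the pack and unpack pair does with the 'C*' directive, shown on a tiny input as a plain illustration:

```ruby
bytes = [72, 84, 84, 80, 47, 50]   # numbers from 0 to 255, one per byte

packed = bytes.pack('C*')          # Array of numbers -> binary String
puts packed                        # HTTP/2
puts packed.bytesize               # 6

round_trip = packed.unpack('C*')   # binary String -> Array of numbers
puts round_trip == bytes           # true
```

Each direction has to visit every single byte and convert it between a Ruby Integer object and a raw byte, which is where the time goes on large inputs.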

I tried playing around with different ways of achieving the same results. I even went to look into Ruby's source code to check how that was implemented and why it was so slow. Hint: reading Ruby's C source code is not that easy. So I went to look for another solution.

The solution

I had played with buffers in JavaScript in the past using classes like the DataView and ArrayBuffer.

Then I thought to myself: What if I had those classes in Ruby?

I went to search for something like that, but couldn't find it. So I thought I should do it myself. Finding out that there was no such thing for Ruby was an additional motivation for me, since I'd be filling a gap!

There was another thing that motivated me. It had been quite a while since I wanted to learn how to integrate Ruby code with C code. That was it. I could make it in C, which would be great for achieving the performance needs, and use it in Ruby!

And that is how I launched the arraybuffer gem:

andrepiske / rb-arraybuffer

Low level byte operators and buffers for Ruby

arraybuffer gem

What?

Ruby lacks classes like array buffer or byte array.

This gem aims to solve that by implementing that natively in C so that a decent performance can be achieved.

The design of the classes follows standards that are implemented in the Web world, like ArrayBuffer and DataView.

The standards above are, however, not strictly followed.

Why?

The only way I found to do this was manipulating an array of numbers treating them as bytes and using buffer.pack('C*') to transform it into a String object. However, for some reason the Array#pack method is painfully slow.

Other gems exist out there, like the ByteBuffer class in nio4r. However, they didn't have a design I was satisfied with or they had different purposes that wouldn't suit my use case :)

What about JRuby?

Feel free to open a pull request for that!


After changing all of the BitWriter code to use the arraybuffer gem with the ArrayBuffer and DataView classes, I went to profile it again. The result was really encouraging:

Amazing! Now the next challenging bottleneck is the OpenSSL writing operation. I have no idea how to optimize that yet, but I will surely look for a way. Coming in second and third place is the mark-and-sweep garbage collector in action. My code is probably producing too much garbage. That will also be challenging to profile and optimize. I hope it gives me enough material to write another post!

Building my next HTTP server, part 2

André Diego Piske — Tue, 18 Feb 2020 20:14:49 +0000

This is the second post of a series about my HTTP server

In the first post of this series I explained how the plan for my HTTP server is for it to be asynchronous, so it makes the best use of I/O and CPU at the same time.

In order to be asynchronous, the problem I have to solve is that a call to TCPServer#accept or to TCPServer#read is a blocking call. This means that the method call will block the execution flow until there is something to be read from the other side.

If there are two clients connected to the server and one of the clients gets stuck, it may block the whole server. A single client must never be able to block a whole server!

There are a few technologies that can be used to solve this issue. But all of them have the same basic idea behind them. If we think about a blocking read operation, we can break it down into two pieces. Let's take those pieces to picture what could be the implementation of the TCPServer#read method:

class TCPServer
  def read(length)
    wait_until_bytes_available(length)
    read_available_bytes(length)
  end
end

Of course those two methods being called are fictitious, only for the purpose of illustration. The idea is that the wait_until_bytes_available method waits until length bytes are available to be read from the wire. This is where the actual blocking occurs, as this method will only return when there are enough bytes to be read. If the client never sends more data, the method would not return. In reality, the method would return with an error state if the connection is broken.

After that waiting, the read_available_bytes method call then does the actual reading. It reads length amount of bytes from the wire without blocking and then returns those bytes.

Now, in order to have it working asynchronously, it's just a matter of removing the waiting part. That is done by removing the wait_until_bytes_available method call. Now we're only left with the actual reading method. And Ruby, in fact, has a method just for that. It's IO#read_nonblock.

The #read_nonblock method takes one argument: the maximum number of bytes to read. That is, the method will never read more bytes than what was passed in the argument, but it can read fewer if there just aren't enough bytes available to be read.
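Here is a small sketch of that behaviour. It uses a pipe instead of a network connection so it can run anywhere, and it passes exception: false so that "nothing to read yet" comes back as a return value instead of an exception:

```ruby
reader, writer = IO.pipe

# Nothing has been written yet: instead of blocking, we are told to wait.
nothing_yet = reader.read_nonblock(1024, exception: false)

writer.write('abc')

# Only 3 bytes are available, so 3 bytes is what we get,
# even though we asked for up to 1024.
partial = reader.read_nonblock(1024, exception: false)

writer.close

# The other side is gone: nil signals the end of the stream.
at_eof = reader.read_nonblock(1024, exception: false)
reader.close

p nothing_yet  # :wait_readable
p partial      # "abc"
p at_eof       # nil
```

Without exception: false, the first call would raise an IO::WaitReadable error and the last one an EOFError, so the caller would need a rescue block instead of checking the return value.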

Now, the read method is not the only one that will block. We also have to solve the issue with the accept method. And that is very easy, because Ruby has the TCPServer#accept_nonblock method! Also, which will be needed later, there is the non-blocking counterpart for the write method, which is the, you guessed it, IO#write_nonblock.
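The same idea applied to accepting connections can be sketched like this. The sketch assumes the loopback interface is available and lets the operating system pick a free port:

```ruby
require 'socket'

server = TCPServer.new('127.0.0.1', 0)  # port 0: let the OS choose a free port
port = server.addr[1]

# No client has connected yet, so there is nothing to accept.
first_try = server.accept_nonblock(exception: false)

client = TCPSocket.new('127.0.0.1', port)

# Wait (at most one second) until the listening socket has a pending connection.
IO.select([server], nil, nil, 1)
connection = server.accept_nonblock(exception: false)

connection.write_nonblock("hi\n")
greeting = client.gets

[connection, client, server].each(&:close)

p first_try  # :wait_readable
p greeting   # "hi\n"
```

IO.select is used here only to keep the sketch deterministic; it is the piece that tells us when calling accept_nonblock is worth trying again.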

Those *_nonblock methods are the way to go from here. But they introduce a lot of other difficulties to be dealt with. For instance, what should be done if there is nothing to read from the wire right now? That doesn't mean there won't be anything in the near future. Also, since the read_nonblock method can read fewer bytes than specified in its argument, the data can be received in small pieces. How to manage those pieces and process them together afterwards?
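One common way to deal with the pieces, shown here only as a sketch and not necessarily how my server does it, is to keep a buffer per connection and only hand out data once a complete unit has arrived. The LineBuffer class below is hypothetical and treats a line ending in CRLF as the unit:

```ruby
# Collects chunks as they arrive and hands back complete lines only.
class LineBuffer
  def initialize
    # Bytes received so far that do not form a full line yet.
    @pending = String.new(encoding: Encoding::BINARY)
  end

  # Add a freshly read chunk; returns the lines that are now complete.
  def feed(chunk)
    @pending << chunk.b
    lines = []
    while (line = @pending.slice!(/\A.*?\r\n/m))
      lines << line.chomp
    end
    lines
  end
end

buffer = LineBuffer.new
first = buffer.feed('GET / HT')               # line is still incomplete
second = buffer.feed("TP/1.1\r\nHost: exa")   # first line is now complete
third = buffer.feed("mple.com\r\n")           # second line is now complete

p first   # []
p second  # ["GET / HTTP/1.1"]
p third   # ["Host: example.com"]
```

Each call to feed would receive whatever read_nonblock returned, no matter how small the piece.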

I intend to cover those issues in the next post of this series, so see you there!

Building my next HTTP server

André Diego Piske — Wed, 20 Nov 2019 20:27:39 +0000

This is my first post in what I intend to be a series. It's based on my experience of building an HTTP/(1.1|2) server in Ruby.

So, I decided to hop into my next side project. I wanted to create a simple thing. But I wanted to make that thing not in the traditional way that I already know how to, but in a different one. I wanted to do a simple thing using a different approach from what I would normally use.

What about an HTTP server? Seems like a good idea. I already know the nuts and bolts of the protocol, already did it in the past, so it would, in a sense, be a simple thing. You know, it's simple: open a TCP socket and wait for someone to perform a request. Once they're there, process the request and give back a nice response. Then, unless keep-alive is specified, close the client socket.

So far so good. The code would look something like this:

require "socket"

server = TCPServer.new 3000
loop do
  client = server.accept
  deal_with_http_stuff client
  client.close
end

(Oh, by the way, I'm using Ruby. It's kinda my favorite language nowadays.)

Then let the magic function deal_with_http_stuff just read the HTTP headers and send() back the result to the client.
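To make the magic a bit less magic, here is a deliberately naive sketch of what deal_with_http_stuff could look like. The body of the method is an assumption made for illustration: it reads the request line and the headers, ignores any request body, and always answers 200. A server thread and a client are added so the sketch runs on its own:

```ruby
require 'socket'

# Naive stand-in: read the request head, then send a fixed plain-text response.
def deal_with_http_stuff(client)
  request_line = client.gets("\r\n").to_s.strip

  # Headers end at the first empty line.
  while (line = client.gets("\r\n")) && line != "\r\n"
    # A real server would parse "Name: value" pairs here.
  end

  body = "You asked for: #{request_line}\n"
  client.write("HTTP/1.1 200 OK\r\n" \
               "Content-Type: text/plain\r\n" \
               "Content-Length: #{body.bytesize}\r\n" \
               "Connection: close\r\n" \
               "\r\n" \
               "#{body}")
end

server = TCPServer.new('127.0.0.1', 0)  # port 0: let the OS choose a free port
port = server.addr[1]

handler = Thread.new do
  client = server.accept
  deal_with_http_stuff(client)
  client.close
end

socket = TCPSocket.new('127.0.0.1', port)
socket.write("GET /hello HTTP/1.1\r\nHost: localhost\r\n\r\n")
response = socket.read  # reads until the server closes the connection
socket.close
handler.join
server.close

puts response.lines.first  # HTTP/1.1 200 OK
```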

Now, the deal_with_http_stuff function can be fairly simple. It can also get fairly complex, like when doing some FastCGI stuff or acting as a reverse proxy. Despite that, it is not where the "different approach" that I'm looking for would be. There is a different kind of complexity that can be added, and that is where I would like to experiment with a different approach.

But before diving that way, let's quickly analyze the solution above. Upon the connection of a client, the next steps are to receive the request information, process it and send back the results. It seems simple at first, but there already are some immediate issues.

The first one is that while one request is being read, processed and having its response sent back to the client, other requests will have to wait in line. That is, this server can only process one request at a time. It lacks any parallelism or concurrency capabilities.

A common and relatively easy solution for this would be to delegate the deal_with_http_stuff processing to a separate processing thread or even a different process (by using fork). The threaded version of the code could look a bit like this:

require "socket"

server = TCPServer.new 3000
loop do
  client = server.accept
  Thread.new do
    deal_with_http_stuff client
    client.close
  end
end

The fork approach would look similar to that. But those approaches have their own issues as well. Also, that is not the way I want to go. I have been there in the past already and I'm looking for a different kind of fun!

Enter asynchronicity

As you may already be familiar with, nginx is a well known HTTP server out there. Somewhere in its wiki page, it's written that:

Unlike traditional servers, NGINX doesn’t rely on threads to handle requests. Instead it uses a much more scalable event-driven (asynchronous) architecture.

Right there. That is the approach I want to go with: asynchronous architecture.

But how is that then different from the threaded approach? Because threads are asynchronous citizens, aren't they? In fact they are. One could argue, though, that in the MRI Ruby implementation threads are not really parallel, since its global VM lock lets only one thread run Ruby code at a time (in contrast to, for instance, JRuby, whose threads can run on several CPU cores simultaneously).

That is exactly where the difference lies. The threaded approach relies on the operating system and the CPU to manage the asynchronicity. The same applies to the fork approach. The developer writes code just as if everything were synchronous, and the OS, in partnership with the CPU, does the heavy lifting of scheduling different threads at different times on different CPU cores.

The approach I want to use, which is what nginx uses, sort of moves that heavy lifting into the server itself. It is then the developer of the server the one who has to deal with the asynchronicity at all times (and by all times, I do mean it!) This way, it is possible to serve multiple requests simultaneously using only one real thread -- that is, a single CPU core.
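A tiny sketch of that idea, under the assumption that IO.select is used to find out which connections are ready (nginx itself relies on more scalable mechanisms, and the details of my server come in later posts). Pipes simulate two clients so the sketch needs no network:

```ruby
# Two simulated clients: one never sends anything, the other sends and leaves.
slow_reader, _slow_writer = IO.pipe
fast_reader, fast_writer = IO.pipe
names = { slow_reader => 'slow', fast_reader => 'fast' }

fast_writer.write('hello')
fast_writer.close

received = Hash.new { |hash, key| hash[key] = +'' }
open_ios = [slow_reader, fast_reader]

# One thread, one loop: only touch the connections that are ready right now.
3.times do
  ready, = IO.select(open_ios, nil, nil, 0.1)  # wait at most 100 ms
  next unless ready

  ready.each do |io|
    chunk = io.read_nonblock(1024, exception: false)
    case chunk
    when :wait_readable then next           # false alarm, try again later
    when nil then open_ios.delete(io)       # this client is done
    else received[names[io]] << chunk
    end
  end
end

puts received['fast']      # hello
puts open_ios.length       # 1
```

The silent client never holds up the loop: the fast client's data is read and its disconnect is noticed while the slow one is simply left alone.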

Two things

The approach nginx uses, which I want to use, is really about two things: CPU and I/O. What its concurrency model really addresses is the fact that with just regular programming, one of those is busy while the other sits idle.

For instance, reading a file from disk means telling the disk driver to fetch the contents of that file and then waiting for it to return the file contents. It takes time. Likewise, sending a packet of data to the network means waiting for the network driver to do all the electrical signaling and stuff required to deliver that packet. Setting a timer, like in JavaScript's setTimeout function, means just waiting for a certain amount of time to pass without even doing something useful at all.

The issue then is that while some I/O is being performed, the CPU is not being used. By doing the following:

require "json"

content = File.read("/tmp/foo")

data = JSON.parse(content)

puts(data['foobar'])

the File.read call makes the disk busy and frees the CPU from doing any work. It asks the disk for the file data and waits for it to finish reading the whole file before proceeding to the next line of code. Once finished, the data is handed to the JSON module for it to parse the string in memory, producing Ruby objects along the way. The parsing itself is, of course, a CPU-intensive and I/O-free task. The last statement just prints the data. But pay attention here, because that puts instruction won't return until the output has been written, which also means waiting for the I/O to finish.

What if we could just minimize the total amount of time that both CPU and I/O are free? That is, keep them as busy as possible. When the CPU is already being used to the max, do some I/O. When I/O is being used to the max, do some CPU-intensive tasks.

This is exactly my plan, and I hope to explain how that works in future posts.