<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Exein</title>
    <description>The latest articles on Forem by Exein (@exein).</description>
    <link>https://forem.com/exein</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F317841%2F8fb8c43a-1747-4144-aadf-efdcce6e6e10.png</url>
      <title>Forem: Exein</title>
      <link>https://forem.com/exein</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/exein"/>
    <language>en</language>
    <item>
      <title>Pulsar preventing vulnerabilities #1 — polkit (CVE-2021-4034)</title>
      <dc:creator>Exein</dc:creator>
      <pubDate>Thu, 29 Aug 2024 12:44:22 +0000</pubDate>
      <link>https://forem.com/exein/pulsar-preventing-vulnerabilities-1-polkit-cve-2021-4034-1g1a</link>
      <guid>https://forem.com/exein/pulsar-preventing-vulnerabilities-1-polkit-cve-2021-4034-1g1a</guid>
      <description>&lt;p&gt;Pulsar is a security framework. Its primal goal is to prevent actual attacks in your IoT environment. One part of achieving that are generic rules which might help with detecting suspicious activity in your system, but that’s only one part of what we are focusing on. The other important part is studying known vulnerabilities in order to prevent them.&lt;/p&gt;

&lt;p&gt;In this series of articles we are going to show Pulsar on the battlefield — how it can be used to prevent actual known vulnerabilities.&lt;/p&gt;

&lt;h2&gt;Understanding CVE-2021-4034&lt;/h2&gt;

&lt;h2&gt;Introducing PwnKit&lt;/h2&gt;

&lt;p&gt;CVE-2021-4034, also known as “PwnKit,” is a security vulnerability discovered in polkit, a system service installed by default on many Linux distributions. Polkit (formerly PolicyKit) is used to manage system-wide privileges in Unix-like operating systems.&lt;/p&gt;

&lt;p&gt;To be precise, the vulnerability specifically affects pkexec, a program that is part of the polkit system. pkexec allows an authorized user to execute commands as another user, including the superuser (root). It’s responsible for communicating with polkit and deciding whether the user executing the command should be allowed to do so, even though that user normally cannot execute the wrapped command.&lt;/p&gt;

&lt;p&gt;You can think of pkexec as somewhat similar to sudo, although there is a difference: sudo simply allows certain groups of users (or specific users) to authenticate as root just by using their passwords, while polkit has more complex rules defining what can be done and under what circumstances.&lt;/p&gt;

&lt;p&gt;A common rule used in most graphical Linux installations concerns mounting external drives. Normally, mounting new filesystems with the mount syscall requires root privileges. But when you are using Linux with a desktop environment, you are not asked for a password after plugging in a USB stick. That’s because of &lt;a href="https://github.com/storaged-project/udisks/blob/master/data/org.freedesktop.UDisks2.policy.in" rel="noopener noreferrer"&gt;udisks, which ships this polkit rule!&lt;/a&gt; Without it, using Linux on a desktop would be quite annoying, wouldn’t it?&lt;/p&gt;

&lt;p&gt;That’s why the security of pkexec itself is extremely important. If there is any way of tricking pkexec into making a wrong decision, the game is over. And unfortunately, PwnKit won the game.&lt;/p&gt;

&lt;h2&gt;How PwnKit works&lt;/h2&gt;

&lt;p&gt;In Linux, every process receives an array of arguments, called &lt;code&gt;argv[]&lt;/code&gt;, which is followed by an array of environment variables, called &lt;code&gt;envp[]&lt;/code&gt;. Both arrays are &lt;code&gt;NULL&lt;/code&gt;-terminated. The length of &lt;code&gt;argv[]&lt;/code&gt; is provided in another variable, called &lt;code&gt;argc&lt;/code&gt;. For a simple &lt;code&gt;pkexec my-command&lt;/code&gt; call, the arrays and their memory layout would look like this:&lt;/p&gt;
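&lt;p&gt;To make this concrete, here is a tiny Rust sketch (an illustration; any language exposing the process arguments behaves the same way): every normally launched program sees its own name as &lt;code&gt;argv[0]&lt;/code&gt;, so &lt;code&gt;argc&lt;/code&gt; is at least 1.&lt;/p&gt;

```rust
// A normally launched process always has at least one argument:
// its own name, argv[0]. std::env::args exposes argv, and its
// count corresponds to argc.
fn main() {
    let argc = std::env::args().count();
    let argv0 = std::env::args().next();
    println!("argc = {argc}, argv[0] = {argv0:?}");
    assert!(argc >= 1); // holds for any normally executed program
}
```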

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F781vdbmoegdtkgi9totg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F781vdbmoegdtkgi9totg.png" alt="Image description" width="800" height="222"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are no requirements regarding the size of these arrays; they can be empty. In that case, &lt;code&gt;argc&lt;/code&gt; would have the value 0.&lt;/p&gt;

&lt;p&gt;The problem is that the vulnerable version of pkexec doesn’t check &lt;code&gt;argc&lt;/code&gt; in a safe manner. Instead, it assumes that &lt;code&gt;argv[1]&lt;/code&gt; is either &lt;code&gt;NULL&lt;/code&gt; or a command to execute, no matter what. But if &lt;code&gt;argv[]&lt;/code&gt; is empty, according to pointer arithmetic in C, reading &lt;code&gt;argv[1]&lt;/code&gt; results in reading &lt;code&gt;envp[0]&lt;/code&gt;. C doesn’t check pointer bounds; the language simply computes the memory location by treating the index (1 in &lt;code&gt;argv[1]&lt;/code&gt;) as an offset from the starting point (&lt;code&gt;argv[0]&lt;/code&gt;). This calculation leads directly to the memory location, without any validation of the pointer's limits.&lt;/p&gt;
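&lt;p&gt;The out-of-bounds read can be sketched without any unsafe code by modeling the contiguous layout as a plain vector (an illustration of the idea, not the real process memory):&lt;/p&gt;

```rust
// Model the memory right after an empty argv[]: the NULL terminator
// of argv, immediately followed by envp[0]. Reading "argv[1]" then
// lands on envp[0].
fn read_argv1_with_empty_argv(envp0: String) -> String {
    let memory = vec![None, Some(envp0), None]; // [argv NULL, envp[0], envp NULL]
    // The vulnerable code reads argv[1] regardless of argc...
    memory[1].clone().expect("argv[1] is actually envp[0], not NULL")
}

fn main() {
    let read = read_argv1_with_empty_argv(String::from("EVIL=payload"));
    println!("pkexec would treat {read:?} as the program to run");
}
```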

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgylgttmn6fyaufcr8pj8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgylgttmn6fyaufcr8pj8.png" alt="Image description" width="800" height="167"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If an attacker manages to make the argument array empty, pkexec will mistakenly use the content from the adjacent environment array as the application to execute. By adjusting these environment variables to include certain values and payloads, the attacker can exploit this to run commands as a privileged user. This execution occurs without needing any authentication.&lt;/p&gt;

&lt;p&gt;If you are curious about the details, you can look at &lt;a href="https://github.com/berdav/CVE-2021-4034" rel="noopener noreferrer"&gt;one of the example exploits&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;Polkit upstream fix&lt;/h3&gt;

&lt;p&gt;Polkit fixed this issue by &lt;a href="https://gitlab.freedesktop.org/polkit/polkit/-/commit/a2bf5c9c83b6ae46cbd5c779d3055bff81ded683" rel="noopener noreferrer"&gt;introducing explicit checks for argc. &lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To prevent such issues in general, Linux kernel 5.18 &lt;a href="https://gitlab.freedesktop.org/polkit/polkit/-/commit/a2bf5c9c83b6ae46cbd5c779d3055bff81ded683" rel="noopener noreferrer"&gt;introduced a change which:&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;For userspace processes, when &lt;code&gt;argv[]&lt;/code&gt; is empty, it adds an empty string and sets &lt;code&gt;argc&lt;/code&gt; to 1.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For kernel threads, it rejects those with an empty &lt;code&gt;argv[]&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Preventing polkit exploitation in vulnerable environments&lt;/h2&gt;

&lt;p&gt;To come up with an answer, let’s think about what’s unique about the PwnKit exploit in comparison to all the other processes in the system.&lt;/p&gt;

&lt;p&gt;The answer is simple — it’s executed with an empty array of arguments: &lt;em&gt;argc&lt;/em&gt; of value 0 and an empty &lt;em&gt;argv&lt;/em&gt;. We can already consider this simple fact a suspicious property of any process. Why would we even have processes launching without &lt;em&gt;argv&lt;/em&gt;? Every command that is executed correctly has at least its own name as an argument. Can you think of any legitimate use case for stripping &lt;em&gt;argv&lt;/em&gt; when executing a binary? Me neither!&lt;/p&gt;

&lt;p&gt;Could we then just disallow any process with an empty &lt;em&gt;argv&lt;/em&gt;, i.e. &lt;em&gt;argc&lt;/em&gt; of 0? Yes!&lt;/p&gt;

&lt;p&gt;When running Linux with a kernel older than 5.18, one can mitigate the issue with this simple Pulsar rule:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;- name: CVE-2021-4034, possible envp manipulation with empty argv
  type: Exec
  condition: payload.argc == 0
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This rule monitors all &lt;code&gt;Exec&lt;/code&gt; events (a new process launched through execution of a binary) and is triggered when argc (the number of arguments) is 0.&lt;/p&gt;
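&lt;p&gt;In plain Rust, the condition boils down to a one-line predicate over the event payload (a hypothetical, trimmed-down payload type; the real Pulsar event carries many more fields):&lt;/p&gt;

```rust
// Hypothetical, simplified Exec payload; the real event also carries
// the full argv, the binary path, and more.
struct ExecPayload {
    argc: usize,
}

// The rule's condition: payload.argc == 0.
fn is_suspicious(payload: ExecPayload) -> bool {
    payload.argc == 0
}

fn main() {
    assert!(is_suspicious(ExecPayload { argc: 0 }));  // empty argv: alert
    assert!(!is_suspicious(ExecPayload { argc: 2 })); // normal exec: pass
    println!("rule behaves as expected");
}
```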

&lt;p&gt;With the rule above, Pulsar will issue the following alert whenever someone tries to use a PwnKit exploit or, more generally, launches a process with an empty argv[], probing whether some software can be manipulated through argv[] and envp[].&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgt3okmii5n6nrebglyh6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgt3okmii5n6nrebglyh6.png" alt="Image description" width="800" height="24"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Final remarks&lt;/h2&gt;

&lt;p&gt;As we mentioned at the beginning, this post starts a whole series about using Pulsar on the battlefield against CVEs and known attack vectors. Stay tuned for more rules and examples in the coming weeks.&lt;/p&gt;

&lt;p&gt;If you like it, don’t forget to try out Pulsar and give it a star on &lt;a href="https://github.com/Exein-io/pulsar" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4p14kawlgbumyps4imkv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4p14kawlgbumyps4imkv.png" alt="Image description" width="800" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>iot</category>
      <category>security</category>
      <category>linux</category>
      <category>github</category>
    </item>
    <item>
      <title>A tale of TimescaleDB, SQLx and testing in Rust</title>
      <dc:creator>Exein</dc:creator>
      <pubDate>Thu, 02 May 2024 13:12:06 +0000</pubDate>
      <link>https://forem.com/exein/a-tale-of-timescaledb-sqlx-and-testing-in-rust-40h5</link>
      <guid>https://forem.com/exein/a-tale-of-timescaledb-sqlx-and-testing-in-rust-40h5</guid>
      <description>&lt;p&gt;At Exein, we use &lt;a href="https://www.timescale.com/?ref=blog.exein.io"&gt;TimescaleDB&lt;/a&gt; to store all kinds data throughout our services and rely on &lt;a href="https://docs.rs/sqlx/latest/sqlx/?ref=blog.exein.io"&gt;SQLx&lt;/a&gt; to interface with it. Throughout this article, we’ll explore how we recently changed our testing approach in relation to the database along with the problems we encountered and the solutions we came up with.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is TimescaleDB?&lt;/strong&gt;&lt;br&gt;
Timescale is a database platform engineered to deliver speed and scale to resource-intensive workloads, which makes it great for things like time series, event, and analytics data. Timescale is built on PostgreSQL, so you have access to the entire PostgreSQL ecosystem, with a user-friendly interface that simplifies database deployment and management.&lt;/p&gt;

&lt;p&gt;At the core of TimescaleDB sit hypertables: tables that automatically partition data into chunks by time, and on which policies such as chunk compression and retention can be set.&lt;/p&gt;

&lt;p&gt;Another interesting feature of TimescaleDB is continuous aggregates, which can be seen as a specialization of PostgreSQL materialized views for hypertables. Instead of refreshing the entire dataset every time, continuous aggregates store the aggregate of time-based data and update according to their refresh policy or, in the case of real-time continuous aggregates, additionally aggregate recent, non-materialized data to provide up-to-date results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQLx&lt;/strong&gt;&lt;br&gt;
SQLx is a popular Rust database framework which, at the time of the article, comes bundled with drivers for SQLite, MySQL and PostgreSQL. Since TimescaleDB is based on PostgreSQL, using the &lt;code&gt;sqlx-postgres&lt;/code&gt; driver allows us to seamlessly interact with the database. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Testing&lt;/strong&gt;&lt;br&gt;
In both Continuous Integration (CI) and local development environments, it’s often necessary to test parts of our system using a live database. This process can become complicated when tests are executed concurrently or in parallel, as it may lead to collisions between the database objects being accessed, such as tables that are being read from or written to.&lt;/p&gt;

&lt;p&gt;To avoid these issues, we would run tests sequentially against a single live database, wrapping them in &lt;code&gt;setup&lt;/code&gt; and &lt;code&gt;teardown&lt;/code&gt; functions, which prevented state from one test from spilling over into the next. While functionally sound, this approach takes significantly longer to run the entire test suite.&lt;/p&gt;

&lt;p&gt;SQLx provides a neat testing mechanism through the &lt;code&gt;sqlx::test&lt;/code&gt; procedural macro. The macro provides a harness that runs each test on its own database, removing collisions between objects since each test is now isolated. It can even run migrations and apply fixtures to get the database into the desired state before the test starts running.&lt;/p&gt;

&lt;p&gt;Just like SQLx’s compile-time query checks, the test harness connects to the database through the DATABASE_URL environment variable. This is generally fine, but we use some custom code to create the database connection pool and wanted to reuse it for testing as well, for consistency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrapper proc macros&lt;/strong&gt;&lt;br&gt;
Since &lt;code&gt;sqlx::test&lt;/code&gt; does mostly what we want, I couldn’t help but wonder whether we can slightly adjust the way it works and still leverage it for our testing.&lt;/p&gt;

&lt;p&gt;Rust’s attribute procedural macros have the following signature:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="err"&gt;`&lt;/span&gt;&lt;span class="nd"&gt;#[proc_macro_attribute]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;dummy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TokenStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TokenStream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;TokenStream&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;&lt;span class="err"&gt;`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;People who have worked with Rust’s procedural macros might know that parsing and processing their arguments is quite involved, and replicating what &lt;code&gt;sqlx::test&lt;/code&gt; does just to slightly change some inner workings would take a lot of boilerplate code, plus the effort to maintain it.&lt;/p&gt;

&lt;p&gt;When thinking about procedural macros, what comes to mind are annotations that take some code and produce new code, alter the annotated code, or both. But what if, among the modifications we make to the annotated code, we could include new procedural macro annotations? I’ll refer to these as wrapper proc macros.&lt;/p&gt;

&lt;p&gt;While I hadn’t realized it before, I had actually seen quite a few examples of this in the wild. One example that comes to mind is &lt;code&gt;serde&lt;/code&gt;’s derive macros, which add various attributes to the code they generate. The only question remaining, then, is how to do this correctly.&lt;/p&gt;

&lt;p&gt;Some information on this can be found on the &lt;code&gt;rust-lang&lt;/code&gt; forum, specifically &lt;a href="https://users.rust-lang.org/t/is-macro-attribute-execution-order-stable/51029?ref=blog.exein.io"&gt;this&lt;/a&gt; post and the issues listed within it. The bottom line is that while there’s no formal documentation on this yet, work has been done towards the general intent of having macro attributes expand in an outside-in fashion, meaning that top attributes can see and even add/remove/modify bottom attributes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The plan&lt;/strong&gt;&lt;br&gt;
Given all the information listed above, we can now envision a plan:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write a wrapper proc macro that adds the &lt;code&gt;sqlx::test&lt;/code&gt; annotation while passing all of its arguments through;&lt;/li&gt;
&lt;li&gt;Add our custom database connection code to the generated code.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first step of writing a wrapper proc macro is relatively straightforward if you have worked with proc macros before.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[proc_macro_attribute]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TokenStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TokenStream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;TokenStream&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;    
        &lt;span class="c1"&gt;// We only convert this to `proc_macro2::TokenStream` to allow using    &lt;/span&gt;
        &lt;span class="c1"&gt;// it with the `quote!` macro.    &lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;parse_macro_input!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;proc_macro2&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TokenStream&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;    
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;parse_macro_input!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ItemFn&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;    
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;ret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="py"&gt;.sig.output&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;    
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="py"&gt;.sig.ident&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;    
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="py"&gt;.sig.inputs&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;    
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="py"&gt;.block&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;    
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;attrs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="py"&gt;.attrs&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  

        &lt;span class="nd"&gt;quote!&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;        
                #&lt;span class="p"&gt;(&lt;/span&gt;#&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;        
                &lt;span class="nd"&gt;#[::sqlx_test(&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="nd"&gt;args)]&lt;/span&gt;        
                &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; #&lt;span class="nf"&gt;name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;#&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; #&lt;span class="n"&gt;ret&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;            
                    #&lt;span class="n"&gt;body&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;    
      &lt;span class="p"&gt;}&lt;/span&gt;    
      &lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All that is required is to destructure the input function so that we can add the &lt;code&gt;sqlx::test&lt;/code&gt; attribute at the end. Since this is a procedural macro, it will have to reside in a different crate in the workspace, say &lt;code&gt;db_macros&lt;/code&gt;. This doesn’t yet cover the initial goal we set off to achieve, which is reusing our custom connection pool setup code in tests, but it lays the groundwork for it.&lt;/p&gt;

&lt;p&gt;The next step is figuring out how to alter &lt;code&gt;sqlx::test&lt;/code&gt;’s usage of the DATABASE_URL variable to make it work with our custom code. Perhaps expanding a &lt;code&gt;sqlx::test&lt;/code&gt; annotated test could help with this.&lt;/p&gt;

&lt;p&gt;Say we have this test function as an integration test in the &lt;code&gt;db_internal&lt;/code&gt; crate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[sqlx::test(&lt;/span&gt;    
        &lt;span class="nd"&gt;migrator&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"db_internal::MIGRATOR"&lt;/span&gt;&lt;span class="nd"&gt;,&lt;/span&gt;    
        &lt;span class="nd"&gt;fixtures(&lt;/span&gt;        
                &lt;span class="nd"&gt;path&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"../fixtures"&lt;/span&gt;&lt;span class="nd"&gt;,&lt;/span&gt;        
                &lt;span class="nd"&gt;scripts(&lt;/span&gt;&lt;span class="s"&gt;"some_script.sql"&lt;/span&gt;&lt;span class="nd"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"some_other_script.sql"&lt;/span&gt;&lt;span class="nd"&gt;)&lt;/span&gt;    
        &lt;span class="nd"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;)]&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;dummy_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PgPool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;    
        &lt;span class="c1"&gt;// do stuff with `pool`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This would expand to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[::core::prelude::v1::test]&lt;/span&gt;
&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;dummy_test&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;dummy_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PgPool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="c1"&gt;// do stuff with `pool`&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;sqlx&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;testing&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;TestArgs&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;concat!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;module_path!&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s"&gt;"::"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nd"&gt;stringify!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dummy_test&lt;/span&gt;&lt;span class="p"&gt;)));&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="nf"&gt;.migrator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nn"&gt;db_internal&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;MIGRATOR&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="nf"&gt;.fixtures&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;sqlx&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;testing&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TestFixture&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"../fixtures/some_script.sql"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nd"&gt;include_str!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"../fixtures/some_script.sql"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;sqlx&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;testing&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TestFixture&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"../fixtures/some_other_script.sql"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nd"&gt;include_str!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"../fixtures/some_other_script.sql"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dummy&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;sqlx&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;testing&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;TestFn&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;run_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can see that a &lt;code&gt;TestArgs&lt;/code&gt; structure is constructed, containing the migrations (or, in our case, the provided migrator) and the fixtures. These are built from exactly those &lt;code&gt;sqlx::test&lt;/code&gt; arguments that we want to avoid parsing and processing ourselves.&lt;/p&gt;

&lt;p&gt;For our mission, the &lt;code&gt;TestFn&lt;/code&gt; trait is really the interesting part. Per the &lt;code&gt;sqlx::test&lt;/code&gt; &lt;a href="https://docs.rs/sqlx/latest/sqlx/attr.test.html?ref=blog.exein.io"&gt;docs&lt;/a&gt;, the macro can be used on async functions accepting a connection pool, a single connection or connection options and the trait implementation for each of these function signatures drives the test setup.&lt;/p&gt;

&lt;p&gt;Since &lt;code&gt;sqlx::test&lt;/code&gt; relies on DATABASE_URL for tests, we can try a naive implementation where we construct and set the environment variable from our custom connection code before the test runs, by injecting a line before the &lt;code&gt;TestFn&lt;/code&gt; trait method call.&lt;/p&gt;
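&lt;p&gt;A minimal sketch of that injected line, assuming a hypothetical &lt;code&gt;build_database_url&lt;/code&gt; helper standing in for our custom connection code:&lt;/p&gt;

```rust
// Hypothetical stand-in for the custom connection-pool configuration.
fn build_database_url() -> String {
    String::from("postgres://app:secret@localhost:5432/app_test")
}

fn main() {
    // The naive idea: export DATABASE_URL from our own configuration
    // before the sqlx test harness (the TestFn call) reads it.
    std::env::set_var("DATABASE_URL", build_database_url());
    assert_eq!(std::env::var("DATABASE_URL").unwrap(), build_database_url());
}
```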

&lt;p&gt;This raises another question: if we use our &lt;code&gt;test&lt;/code&gt; wrapper proc macro to add the &lt;code&gt;sqlx::test&lt;/code&gt; annotation, how are we going to manipulate the code expanded from the &lt;code&gt;sqlx::test&lt;/code&gt; macro? With other, “private”, procedural macros. We will again rely on the outside-in nature of proc macro expansion and add a second macro after &lt;code&gt;sqlx::test&lt;/code&gt; so it can see the expanded code and manipulate it.&lt;/p&gt;

&lt;p&gt;That macro will look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[proc_macro_attribute]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;__private_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TokenStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TokenStream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;TokenStream&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;parse_macro_input!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ItemFn&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Pop the last statement&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;last&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;
        &lt;span class="py"&gt;.block&lt;/span&gt;
        &lt;span class="py"&gt;.stmts&lt;/span&gt;
        &lt;span class="nf"&gt;.pop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;.expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"sqlx::test always generates statements"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Generate custom statement&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;quote!&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// custom connection code statement&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// Insert statement at the end of the block&lt;/span&gt;
    &lt;span class="n"&gt;input&lt;/span&gt;
        &lt;span class="py"&gt;.block&lt;/span&gt;
        &lt;span class="py"&gt;.stmts&lt;/span&gt;
        &lt;span class="nf"&gt;.push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;parse_macro_input!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stmt&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;Stmt&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

    &lt;span class="c1"&gt;// Add the last statement back&lt;/span&gt;
    &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="py"&gt;.block.stmts&lt;/span&gt;&lt;span class="nf"&gt;.push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nd"&gt;quote!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;#&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And so the test wrapper proc macro will now look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[proc_macro_attribute]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TokenStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TokenStream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;TokenStream&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;parse_macro_input!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;proc_macro2&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;TokenStream&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;parse_macro_input!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ItemFn&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;ret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="py"&gt;.sig.output&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="py"&gt;.sig.ident&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="py"&gt;.sig.inputs&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="py"&gt;.block&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;attrs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="py"&gt;.attrs&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nd"&gt;quote!&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        #&lt;span class="p"&gt;(&lt;/span&gt;#&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;
        &lt;span class="nd"&gt;#[::sqlx_test(&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="nd"&gt;args)]&lt;/span&gt;
        &lt;span class="nd"&gt;#[::db_macros::__private_test]&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; #&lt;span class="nf"&gt;name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;#&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; #&lt;span class="n"&gt;ret&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            #&lt;span class="n"&gt;body&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the usage of the &lt;code&gt;db_macros&lt;/code&gt; crate, which is the crate that houses our custom procedural macros.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trying it out&lt;/strong&gt;&lt;br&gt;
With the wrapper proc macro in place, it’s time to try it out. Naturally, the local environment is the place to start. We spin up a Docker container of TimescaleDB and try to run our &lt;code&gt;dummy&lt;/code&gt; test against it. The harness connects to the database and runs fixtures and migrations before the test function body, so it doesn’t matter that the test itself is empty.&lt;/p&gt;

&lt;p&gt;It all seems to work, great! Time to give it a go in the GitHub CI.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;failures:

---- dummy_test stdout ----
thread 'dummy_test' panicked at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sqlx-core-0.7.4/src/testing/mod.rs:248:14:
failed to apply migrations: Execute(Database(PgDatabaseError { severity: Error, code: "40P01", message: "deadlock detected", detail: Some("Process 104 waits for AccessExclusiveLock on advisory lock [21387,1005,0,29749]; blocked by process 188.\\nProcess 188 waits for ShareLock on relation 15242 of database 21387; blocked by process 104."), hint: Some("See server log for query details."), position: None, where: None, schema: None, table: None, column: None, data_type: None, constraint: None, file: Some("deadlock.c"), line: Some(1148), routine: Some("DeadLockReport") }))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Uh-oh… A deadlock? From what? Repeated runs in both the CI and the local environment always gave the same pattern: locally, tests run fine, but deadlocks occur almost every time in the CI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Next plan: no plan&lt;/strong&gt;&lt;br&gt;
I have to admit that I was caught off guard by this. I had used the &lt;code&gt;sqlx::test&lt;/code&gt; harness with PostgreSQL numerous times without issues, and the little trick we pulled off with the wrapper proc macro merely sets the &lt;code&gt;DATABASE_URL&lt;/code&gt; environment variable at runtime, so it was hard to consider it the cause. I started exploring whether the issue was simple, silly and easily fixable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speculations and observations: Part 1&lt;/strong&gt;&lt;br&gt;
It immediately felt like a TimescaleDB-related issue. But how different is TimescaleDB from vanilla PostgreSQL, and could those differences really throw things off so much?&lt;/p&gt;

&lt;p&gt;The answer lies in how all the neat features of TimescaleDB work: background workers. For each hypertable, continuous aggregate and policy on these objects, TimescaleDB uses background jobs to do the necessary processing.&lt;/p&gt;

&lt;p&gt;I strongly suspected that the deadlocks occurred between TimescaleDB background workers and the migration scripts. Of the two, we can’t directly alter the background workers’ behavior, but maybe our migrations have some incompatibility with how TimescaleDB works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How &lt;code&gt;SQLx&lt;/code&gt; migrations work&lt;/strong&gt;&lt;br&gt;
Without going into too much detail, analyzing the code behind &lt;code&gt;sqlx::test&lt;/code&gt; shows that migration scripts get run one by one, each in its own individual transaction. A deadlock happens between transactions, so it seemed fair to assume that the issue lies between one or more migration steps and the background workers.&lt;/p&gt;
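&lt;p&gt;A minimal mock of that behavior (illustrative types, not SQLx’s real internals) shows why a deadlock only aborts a single step: each script runs and commits in its own transaction before the next one starts.&lt;/p&gt;

```rust
// Mock types to illustrate the one-transaction-per-script behavior;
// this is not SQLx's actual API.
struct Tx;

impl Tx {
    fn begin() -> Tx {
        Tx
    }
    // A real implementation would send SQL to the server here; a
    // deadlock would surface as an Err for this transaction only.
    fn execute(&mut self, _sql: &str) -> Result<(), String> {
        Ok(())
    }
    fn commit(self) -> Result<(), String> {
        Ok(())
    }
}

fn apply_migrations(scripts: &[&str]) -> Result<usize, String> {
    let mut applied = 0;
    for sql in scripts {
        let mut tx = Tx::begin(); // each script gets its own transaction
        tx.execute(sql)?; // a failure aborts only this step
        tx.commit()?;
        applied += 1;
    }
    Ok(applied)
}

fn main() {
    let steps = ["-- step 1 SQL", "-- step 2 SQL"];
    assert_eq!(apply_migrations(&steps).unwrap(), 2);
}
```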

&lt;p&gt;&lt;strong&gt;Getting database logs&lt;/strong&gt;&lt;br&gt;
The interesting part of the testing step in our GitHub CI goes something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Setup database&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker compose up -d timescaledb&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run tests&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions-rs/cargo@v1&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test&lt;/span&gt;
    &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;--locked --workspace&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of getting the database logs post-mortem, I thought it would help to be able to get them closer to the events that generated them, so I combined these two steps into:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run tests&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;docker compose up timescaledb &amp;amp;&lt;/span&gt;
      &lt;span class="s"&gt;cargo test --locked --workspace&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running the CI now provides the database logs along the running tests, and some interesting bits started showing up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2024-04-01 08:10:19.415 UTC [443] ERROR:  deadlock detected
2024-04-01 08:10:19.415 UTC [443] DETAIL:  Process 443 waits for ShareLock on relation 16529 of database 24363; blocked by process 409.
        Process 409 waits for AccessExclusiveLock on advisory lock [24363,1001,0,29749]; blocked by process 443.
        Process 443: CALL _timescaledb_internal.policy_refresh_continuous_aggregate()
        Process 409: [ Migration step SQL here ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point tests were still running and none had failed yet. Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2024-04-01 08:22:53.991 UTC [104] ERROR:  deadlock detected
2024-04-01 08:22:53.991 UTC [104] DETAIL:  Process 104 waits for AccessExclusiveLock on advisory lock [21387,1005,0,29749]; blocked by process 188.
        Process 188 waits for ShareLock on relation 15242 of database 21387; blocked by process 104.
        Process 104: [ Migration step SQL here ]
        Process 188: CALL _timescaledb_internal.policy_refresh_continuous_aggregate()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is when the first test failure could be noticed. Shortly after, when all the remaining tests either finished or were skipped, the entire CI job failed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speculations and observations: Part 2&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The logs shed more light on what was happening. In the first log snippet, we see a deadlock between a migration step and a background worker; this does not result in a test failure. The second snippet shows the same thing with the processes reversed, and this one &lt;em&gt;does&lt;/em&gt; result in a test failure.&lt;/p&gt;

&lt;p&gt;PostgreSQL periodically checks for deadlocks and, if it finds one, terminates one of the processes involved so the other can continue. It looks like when the background worker gets killed, the migration step continues and the test moves on; but when the migration step’s process is killed (unblocking the background worker in the process), the test fails with a migration error.&lt;/p&gt;

&lt;p&gt;Up until now all the failures happened at one particular migration step. The step changed the data type of some columns in a hypertable (let’s call the table &lt;code&gt;data&lt;/code&gt;), including the column used as the time series. This requires re-creating the &lt;code&gt;data&lt;/code&gt; table, so the step does something along the lines of:&lt;br&gt;
1) Rename the &lt;code&gt;data&lt;/code&gt; table to &lt;code&gt;data_old&lt;/code&gt;;&lt;br&gt;
2) Create a new table named &lt;code&gt;data&lt;/code&gt;;&lt;br&gt;
3) Move data from &lt;code&gt;data_old&lt;/code&gt; to &lt;code&gt;data&lt;/code&gt;;&lt;br&gt;
4) Make &lt;code&gt;data&lt;/code&gt; a hypertable;&lt;br&gt;
5) Create indices, triggers and various TimescaleDB policies;&lt;br&gt;
6) Drop some obsolete continuous aggregates based on the &lt;code&gt;data_old&lt;/code&gt; table;&lt;br&gt;
7) Drop the &lt;code&gt;data_old&lt;/code&gt; table.&lt;/p&gt;

&lt;p&gt;Was there an underlying issue with how we did migrations, combining vanilla PostgreSQL and TimescaleDB-specific actions in some of the bigger steps, like the one above? The logs provide a SQL snippet containing the code that drops those obsolete continuous aggregates. It seemed like a long shot, but perhaps if we dropped these before this step we would observe different behavior?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Intermediary migrations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Modifying existing migrations is a no-go. SQLx keeps track of the migrations that have already run and does some integrity checks to ensure they have not changed. But could we maybe add a new step somewhere in the middle? It seems like we can!&lt;/p&gt;

&lt;p&gt;Given a migrations folder such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;migrations/
    20231129000000_first_step.sql
    20231202000000_second_step.sql
    20240105000000_third_step.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can add a migration step in the middle by giving it a timestamp that falls between the existing steps.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;migrations/
    20231129000000_first_step.sql
    20231202000000_second_step.sql
    20231202000001_intermediary_step.sql
    20240105000000_third_step.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The caveat here is how this behaves on databases that already went through the previous migrations versus test databases that start from scratch. For the former, the intermediary step actually gets run &lt;em&gt;at the end&lt;/em&gt;, because SQLx knows it already ran all the other migrations. For the latter, migrations run in order. This means our intermediary migration step has to be idempotent regardless of whether it runs in or out of order.&lt;/p&gt;
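&lt;p&gt;The ordering rules can be made concrete with a small sketch (plain filename sorting on mock data, not SQLx’s actual resolver): pending migrations run in timestamp order and already-applied ones are skipped, which is why the back-dated step runs last on an existing database but in the middle on a fresh one.&lt;/p&gt;

```rust
use std::collections::HashSet;

// Sketch: a SQLx-style migrator runs *pending* scripts in timestamp
// order and skips those already recorded as applied. The timestamp
// prefix makes plain lexicographic sorting equal chronological order.
fn pending_order<'a>(all: &[&'a str], applied: &HashSet<&str>) -> Vec<&'a str> {
    let mut pending: Vec<&str> = all
        .iter()
        .copied()
        .filter(|m| !applied.contains(m))
        .collect();
    pending.sort();
    pending
}

fn main() {
    let all = [
        "20231129000000_first_step.sql",
        "20231202000000_second_step.sql",
        "20231202000001_intermediary_step.sql",
        "20240105000000_third_step.sql",
    ];

    // Fresh test database: everything is pending and runs in order,
    // with the intermediary step in the middle.
    let fresh = pending_order(&all, &HashSet::new());
    assert_eq!(fresh[2], "20231202000001_intermediary_step.sql");

    // Existing database that already ran the original three steps:
    // only the intermediary step is pending, so it runs at the end.
    let applied: HashSet<&str> = [all[0], all[1], all[3]].iter().copied().collect();
    assert_eq!(
        pending_order(&all, &applied),
        vec!["20231202000001_intermediary_step.sql"]
    );
}
```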

&lt;p&gt;So with this in mind I added an intermediary step that dropped the obsolete continuous aggregates if they existed, to see whether that affected the deadlock occurrence. Since these objects were obsolete, we had the luxury of dropping them at any point in the migration process.&lt;/p&gt;

&lt;p&gt;The result was that deadlocks still occurred, but their frequency was reduced. If they previously happened in 80% of the runs, they would now happen in about 60%. So something seemed to have changed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Circling back to TimescaleDB&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At this point our CI test suite was running quite slowly, primarily due to the limited resources available in our GitHub runners. We were using the &lt;code&gt;timescale/timescaledb-ha:pg15-latest&lt;/code&gt; Docker image, so I figured I could take another shot in the dark and see if there was any difference with the latest image, &lt;code&gt;timescale/timescaledb-ha:pg16&lt;/code&gt;. Newer versions come with bug fixes and performance improvements, so it was worth trying, if only for comparison’s sake.&lt;/p&gt;

&lt;p&gt;The good news is that this further decreased the deadlock frequency, to around 40%. The bad news is that the CI was still inconsistent. Furthermore, migrations now failed at other steps too (including the intermediary step I added above), which ruled out the issue being tied to a single migration script and pointed instead to a broader problem with applying migrations containing TimescaleDB-specific syntax.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speculations and observations: Part 3&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It became obvious that this was not going anywhere and I needed to shift approaches. Waiting for the CI to run was slow and time-consuming, and I had too little control and access to properly understand what was going on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Constrained environment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Thinking about the slowness of the CI and its resource constraints, I thought I could simply limit the resources of a TimescaleDB Docker container and see whether I could replicate the issue locally.&lt;/p&gt;

&lt;p&gt;Eventually, after limiting the CPU usage of the container further and further, I started getting the issue locally too when running TimescaleDB like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; ts-container &lt;span class="nt"&gt;--env&lt;/span&gt; &lt;span class="nv"&gt;POSTGRES_USER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres &lt;span class="nt"&gt;--env&lt;/span&gt; &lt;span class="nv"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres &lt;span class="nt"&gt;--memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;512M &lt;span class="nt"&gt;--cpus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"0.125"&lt;/span&gt; timescale/timescaledb-ha:pg16
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So when resources are low, both migrations and background workers take too long to finish, and concurrency issues start occurring in the form of deadlocks. I started looking through TimescaleDB docs and forums and stumbled upon something interesting: it looks like TimescaleDB was primarily designed to work with a single database per instance.&lt;/p&gt;

&lt;p&gt;Documentation is sparse, but these are a few references:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.timescale.com/migrate/latest/troubleshooting/#only-one-database-per-instance"&gt;https://docs.timescale.com/migrate/latest/troubleshooting/#only-one-database-per-instance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.timescale.com/forum/t/does-timescale-cloud-support-multiple-databases/583"&gt;https://www.timescale.com/forum/t/does-timescale-cloud-support-multiple-databases/583&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.timescale.com/use-timescale/latest/configuration/advanced-parameters/#multiple-databases"&gt;https://docs.timescale.com/use-timescale/latest/configuration/advanced-parameters/#multiple-databases&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Isolation: Database vs Schema
&lt;/h3&gt;

&lt;p&gt;Since &lt;code&gt;sqlx::test&lt;/code&gt; creates a new database for every test, performance suffers due to the number of TimescaleDB background jobs that get started. This makes the overall database server slower and more prone to deadlocks. The alternative presented in the Timescale forums is isolating data through schemas rather than databases.&lt;/p&gt;

&lt;p&gt;When isolating tests through separate databases, TimescaleDB manages state, settings and configuration through various internal schemas, and its background jobs are dedicated to a single database. Creating multiple databases therefore creates a set of background jobs and internal schemas for each of them.&lt;/p&gt;

&lt;p&gt;If we were to isolate tests through schemas instead, we could increase performance because all the TimescaleDB related objects and background processes would be shared across all the tests, lowering the number of concurrent background workers.&lt;/p&gt;
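&lt;p&gt;As a sketch of what schema-based isolation boils down to (the naming scheme and SQL below are illustrative, not the exact code we ended up with): each test gets its own schema and a &lt;code&gt;search_path&lt;/code&gt; pointing at it, while the database, and therefore TimescaleDB’s shared internals, stays the same.&lt;/p&gt;

```rust
// Illustrative only: generate per-test setup/teardown SQL for
// schema-based isolation. The schema naming scheme is hypothetical.
fn isolation_sql(test_id: u64) -> (String, String) {
    let schema = format!("_sqlx_test_{test_id}");
    let setup = format!(
        "CREATE SCHEMA IF NOT EXISTS {schema}; SET search_path TO {schema};"
    );
    // CASCADE drops the schema's contents along with it.
    let teardown = format!("DROP SCHEMA IF EXISTS {schema} CASCADE;");
    (setup, teardown)
}

fn main() {
    let (setup, teardown) = isolation_sql(1);
    assert!(setup.contains("CREATE SCHEMA IF NOT EXISTS _sqlx_test_1"));
    assert!(teardown.starts_with("DROP SCHEMA IF EXISTS _sqlx_test_1"));
}
```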

&lt;p&gt;&lt;strong&gt;Further down the rabbit hole&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At this point it became clear that I’d have to either revert to the old sequential testing approach or get my hands dirty with some SQLx internals. Being the relentless individual that I am, I decided to give the latter a go. The goal still had not changed: avoiding the boilerplate of replicating the &lt;code&gt;sqlx::test&lt;/code&gt; proc macro. It just looks like we will need more custom code than initially anticipated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema based isolation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Fortunately for me, I was already somewhat familiar with SQLx’s internals due to the &lt;a href="https://docs.rs/sqlx-exasol/latest/sqlx_exasol"&gt;Exasol driver&lt;/a&gt; I wrote as a third party SQLx driver.&lt;/p&gt;

&lt;p&gt;For PostgreSQL, the most relevant part of the code is &lt;a href="https://github.com/launchbadge/sqlx/blob/main/sqlx-postgres/src/testing/mod.rs"&gt;here&lt;/a&gt;. With this in mind I changed some things around to rely on schemas instead of databases, and even simplified some parts of the implementation, as this was always meant to be for internal use only.&lt;/p&gt;

&lt;p&gt;Additionally, a custom implementation of the &lt;code&gt;TestFn&lt;/code&gt; trait was required and used in the &lt;code&gt;__private_test&lt;/code&gt; proc macro. The custom &lt;code&gt;TestFn&lt;/code&gt; impl looks something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[derive(Debug)]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;CustomTestFn&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Fut&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PgPool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Fut&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt;
    &lt;span class="n"&gt;Fut&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Future&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;impl&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Fut&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;TestFn&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;CustomTestFn&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Fut&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt;
    &lt;span class="n"&gt;Fut&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Future&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Fut&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Output&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;run_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TestArgs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Output&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nn"&gt;sqlx&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;test_block_on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;move&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;setup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="nf"&gt;.expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"test setup failed"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="na"&gt;.0&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="nf"&gt;.clone&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

            &lt;span class="nf"&gt;teardown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;.await&lt;/span&gt;
                &lt;span class="nf"&gt;.expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"test teardown failed"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

            &lt;span class="n"&gt;res&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes the &lt;code&gt;__private_test&lt;/code&gt; proc macro look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[proc_macro_attribute]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;__private_test_fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TokenStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TokenStream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;TokenStream&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;parse_macro_input!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ItemFn&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;last&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;
        &lt;span class="py"&gt;.block&lt;/span&gt;
        &lt;span class="py"&gt;.stmts&lt;/span&gt;
        &lt;span class="nf"&gt;.pop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;.expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"sqlx::test always generates statements"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Generate and store a new statement to override the local variable&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;test_fn_stmt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;quote!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;db_internal&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;CustomTestFn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;);)&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;input&lt;/span&gt;
        &lt;span class="py"&gt;.block&lt;/span&gt;
        &lt;span class="py"&gt;.stmts&lt;/span&gt;
        &lt;span class="nf"&gt;.push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;parse_macro_input!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_fn_stmt&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;Stmt&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

    &lt;span class="c1"&gt;// Add the last statement back&lt;/span&gt;
    &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="py"&gt;.block.stmts&lt;/span&gt;&lt;span class="nf"&gt;.push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nd"&gt;quote!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;#&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unfortunately, even with this change in isolation, the deadlock frequency would still be around 25%. That’s much lower than what we started with, but still unacceptably high for an automated process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Treating the symptoms&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At this point it’s apparent that we are focusing on the symptoms rather than the cause. We know this is a concurrency issue and it’s happening between TimescaleDB background jobs and some of the migration scripts. We’re missing something. The small and silly kind.&lt;/p&gt;

&lt;p&gt;I kept running test suites locally, looking through logs, trying to find a new lead. And then I saw it. The deadlock errors &lt;strong&gt;all&lt;/strong&gt; had something in common. Because of schema-based isolation, and because I ran the test suite multiple times against the same Docker container, all tests used the same TimescaleDB internal objects, and those objects did not change between runs since the container was the same.&lt;/p&gt;

&lt;p&gt;So let’s take another look at a log snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2024-04-01 08:22:53.991 UTC [104] ERROR:  deadlock detected
2024-04-01 08:22:53.991 UTC [104] DETAIL:  Process 104 waits for AccessExclusiveLock on advisory lock [20839,1005,0,29749]; blocked by process 188.
        Process 188 waits for ShareLock on relation 17261 of database 20839; blocked by process 104.
        Process 104: [ Migration step SQL here ]
        Process 188: CALL _timescaledb_internal.policy_refresh_continuous_aggregate()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What is relation 17261? I’m no PostgreSQL guru, so figuring this out required a short search. As it turns out, it’s actually very easy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="mi"&gt;17261&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;regclass&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which outputs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;          regclass
-----------------------------
 _timescaledb_config.bgw_job
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This table is always one of the contended objects: one transaction locks it and then waits on some other object, while a second transaction holds a lock on that other object and waits for this table!&lt;/p&gt;

&lt;h3&gt;
  
  
  Mitigating the cause
&lt;/h3&gt;

&lt;p&gt;I could not find a lot of information about this table specifically, but right away I could tell it’s internal to TimescaleDB. Querying it returns a list of jobs that exist for the database. This led to the following revelation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Background jobs run periodically, so background workers access this table to check whether and when they should run, which may involve updating the table;&lt;/li&gt;
&lt;li&gt;At the same time, migration scripts issue all kinds of TimescaleDB-specific statements, and some of them definitely interact with this table due to their nature (i.e., adding, removing, or modifying background jobs).&lt;/li&gt;
&lt;/ul&gt;
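&lt;p&gt;For context, the contents of this catalog can be inspected directly. A minimal sketch (the exact columns vary by TimescaleDB version, so &lt;code&gt;SELECT *&lt;/code&gt; is used here):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Internal catalog that background workers read and update
SELECT * FROM _timescaledb_config.bgw_job;

-- The supported, public view over the same information
SELECT * FROM timescaledb_information.jobs;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;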

&lt;p&gt;A less abstract scenario may look as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A background worker could update the &lt;code&gt;_timescaledb_config.bgw_job&lt;/code&gt; table in a transaction and then want to do something with the &lt;code&gt;data&lt;/code&gt; table;&lt;/li&gt;
&lt;li&gt;At the same time, a migration step would alter the &lt;code&gt;data&lt;/code&gt; table and also re-create some background jobs within the same transaction (which implies updating the &lt;code&gt;_timescaledb_config.bgw_job&lt;/code&gt; table).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When running in a constrained environment where things move slowly, neither of these transactions finishes in time; they end up running concurrently and locking each other. The scenario above is just an example, but the idea is always the same: a deadlock between two transactions involves two objects. To avoid deadlocks, one has to ensure that there is never contention on both objects at the same time.&lt;/p&gt;

&lt;p&gt;In our case, &lt;code&gt;_timescaledb_config.bgw_job&lt;/code&gt; is always one of the objects. The other object varies, because multiple migration steps use TimescaleDB-related syntax, and it seems to be only a matter of processing power whether and where a deadlock occurs.&lt;/p&gt;

&lt;p&gt;In conclusion, a permanent solution is to restrict access to the background jobs table while migrations run, allowing them to proceed undisturbed. Once the migrations are over, background workers can resume as usual without issues.&lt;/p&gt;

&lt;p&gt;This was incorporated into our custom implementation of &lt;code&gt;TestFn&lt;/code&gt;, with the relevant bit looking like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Start a transaction to lock various TimescaleDB specific resources to &lt;/span&gt;
&lt;span class="c1"&gt;// prevent deadlocks between migrations and TimescaleDB background workers.&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="nf"&gt;.begin&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="nf"&gt;.execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"LOCK TABLE _timescaledb_config.bgw_job IN SHARE ROW EXCLUSIVE MODE;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;migrator&lt;/span&gt;&lt;span class="nf"&gt;.run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="nf"&gt;.commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The primary objective is to start a transaction that spans &lt;em&gt;all&lt;/em&gt; migration steps and, first and foremost, locks the background jobs table. The chosen lock mode allows concurrent reads on the table, but permits only one transaction to write to it.&lt;/p&gt;

&lt;p&gt;We could alternatively lock the background jobs table before each migration step and run each step in its own transaction, but that would only increase contention on the table, because every test would try to lock the same object for every migration step. Locking it once and running all migrations at once is faster.&lt;/p&gt;

&lt;h2&gt;
  
  
  The long awaited victory
&lt;/h2&gt;

&lt;p&gt;Running the test suite now would still pose problems sometimes, but unlike in previous cases, deadlocks would now happen when &lt;code&gt;DROP SCHEMA&lt;/code&gt; is invoked. Therefore we had to add the background jobs table lock to our teardown routine as well. Many successive test suite runs were now successful. Given that many things here were experiments, I went ahead, removed the intermediary migration steps, and reverted the TimescaleDB Docker image to &lt;code&gt;timescale/timescaledb-ha:pg15-latest&lt;/code&gt;.&lt;/p&gt;
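&lt;p&gt;The teardown change can be sketched in plain SQL. This is an illustration rather than the exact code, and the per-test schema name is hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;BEGIN;
-- Take the same lock used during migrations so background workers
-- cannot touch the jobs catalog while the schema (and its jobs) is dropped
LOCK TABLE _timescaledb_config.bgw_job IN SHARE ROW EXCLUSIVE MODE;
DROP SCHEMA test_schema_1 CASCADE; -- hypothetical per-test schema name
COMMIT;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;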

&lt;p&gt;So what about test isolation? Well, I would never have seen the same relation ID in the logs if not for schema-based isolation, and that’s what led me to try locking the &lt;code&gt;_timescaledb_config.bgw_job&lt;/code&gt; table. This increased debuggability, together with the reduced number of background jobs, was reason enough to keep using this approach.&lt;/p&gt;

&lt;p&gt;After a couple of weeks, though, deadlocks between some TimescaleDB internals started happening every now and then when dropping test schemas. The frequency was very low in CI; so low that a subsequent run was pretty much guaranteed to succeed. Also, regardless of how few resources I gave a TimescaleDB Docker container, it would never happen locally. Therefore, we decided to switch back to database-based test isolation instead, hoping that it bypasses some, if not all, of the TimescaleDB internals involved here.&lt;/p&gt;

&lt;p&gt;The problem seems to be caused by contention on some advisory locks when dropping a schema, which also deletes background jobs while some of those jobs are already running. I reported it to the TimescaleDB team &lt;a href="https://github.com/timescale/timescaledb/issues/6848"&gt;here&lt;/a&gt;, and not only did they reply quickly, they even shared ideas to test further in our quest for consistent test suite runs in CI. We’ll experiment with them and see what comes out of it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This brings us to the end of our journey. It’s interesting how something so seemingly simple can raise such convoluted problems.&lt;/p&gt;

&lt;p&gt;It all started with not wanting to rely on the &lt;code&gt;DATABASE_URL&lt;/code&gt; environment variable and aiming to run tests concurrently, and we ended up writing wrapper proc macros, creating our own &lt;code&gt;TestFn&lt;/code&gt; SQLx test harness implementation, and debugging deadlocks on TimescaleDB internals. The result? Dramatically reduced testing time thanks to running tests concurrently!&lt;/p&gt;

&lt;p&gt;I hope you enjoyed reading this article and that the ideas contained within will come in handy at some point. If instead you want to find out more about IoT device cybersecurity, feel free to check out our project on &lt;a href="https://github.com/exein-io/pulsar"&gt;GitHub&lt;/a&gt; :) &lt;/p&gt;

</description>
      <category>security</category>
      <category>rust</category>
      <category>sql</category>
      <category>testing</category>
    </item>
    <item>
      <title>Elevating Security with Meta-Exein: Achieving Yocto Project Compatibility</title>
      <dc:creator>Exein</dc:creator>
      <pubDate>Thu, 14 Mar 2024 11:18:43 +0000</pubDate>
      <link>https://forem.com/exein/elevating-security-with-meta-exein-achieving-yocto-project-compatibility-2ik8</link>
      <guid>https://forem.com/exein/elevating-security-with-meta-exein-achieving-yocto-project-compatibility-2ik8</guid>
      <description>&lt;p&gt;In the changing field of cybersecurity, staying ahead of threats requires constant vigilance and the use of current, reliable technology. We’re excited to share that our high-tech software &lt;a href="https://github.com/Exein-io/pulsar?ref=blog.exein.io"&gt;Pulsar&lt;/a&gt; is now available through the meta-exein layer, which has been officially recognized as a Yocto Project Compatible layer. This important step shows our dedication to providing top-level security to the open source community and makes the Yocto Project system even stronger.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understanding the Yocto Project Compatibility&lt;/strong&gt;&lt;br&gt;
Before we talk about the impact of meta-exein, we need to understand what &lt;a href="https://www.yoctoproject.org/development/yocto-project-compatible-layers/?ref=blog.exein.io"&gt;Yocto&lt;/a&gt; Project Compatibility means. The Yocto Project provides tools and resources to help developers build custom Linux systems for any hardware. Being compatible with this project is a big deal; it means a product or software meets high standards and works well with others, making it reliable and easy to integrate.&lt;/p&gt;

&lt;p&gt;If a product is compatible, it has met certain standards that help make development and builds more efficient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Role and Importance of Meta-Exein&lt;/strong&gt;&lt;br&gt;
Meta-exein isn’t just another part of the Yocto Project. It’s a big step forward in making embedded systems safer from cyber threats. With cyber-attacks becoming more common and sophisticated, we need better security now more than ever. Meta-exein offers a strong, adaptable security system that fits right into any Yocto Project-based development, targeting x86_64, aarch64, and RISC-V.&lt;/p&gt;

&lt;p&gt;The fact that meta-exein is now an official Yocto Project Compatible layer shows that it’s reliable and effective. It proves that meta-exein offers good, compatible security solutions within the wider Yocto Project ecosystem. This means developers and organizations can use a checked, top-quality security layer in their projects. This layer can boost protection without giving up efficiency or flexibility. Meta-exein is compatible with other systems, facilitating its use by developers.&lt;/p&gt;

&lt;p&gt;Additionally, as part of the Yocto Project, meta-exein receives regular updates and support from a vast community of developers and security experts. This ensures rapid detection and resolution of any issues, maintaining your system security.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Looking Forward&lt;/strong&gt;&lt;br&gt;
Despite this achievement, we’re still focused on improving meta-exein to counter the constantly changing security threats.&lt;/p&gt;

&lt;p&gt;We encourage developers, groups, and enthusiasts to discover how meta-exein can benefit their Yocto Project-based projects. By working together, we can create a stronger and more secure system, ready for current and future challenges.&lt;/p&gt;

&lt;p&gt;In conclusion, meta-exein being an official Yocto Project Compatible layer is not just proof of its quality and dependability; it’s a call to join us in making a safer future for embedded systems worldwide.&lt;/p&gt;

&lt;p&gt;If you’re interested in seeing meta-exein in action and how it can improve your security, we invite you to check out our &lt;a href="https://github.com/Exein-io/pulsar?ref=blog.exein.io"&gt;GitHub page.&lt;/a&gt;&lt;/p&gt;

</description>
      <category>yocto</category>
      <category>yoctoproject</category>
      <category>security</category>
      <category>cybersecurity</category>
    </item>
    <item>
      <title>Container Support in Pulsar</title>
      <dc:creator>Exein</dc:creator>
      <pubDate>Fri, 01 Mar 2024 13:24:50 +0000</pubDate>
      <link>https://forem.com/exein/container-support-in-pulsar-jea</link>
      <guid>https://forem.com/exein/container-support-in-pulsar-jea</guid>
      <description>&lt;p&gt;Our Rust Engineer, Michal Rostecki, recently discussed about how Pulsar supports containers and enhances their security. This blog post is a simplified overview of his talk.&lt;/p&gt;

&lt;p&gt;Pulsar is an open-source project that stands out for people looking for a strong security observability framework. It was first made for IoT devices, but because of its flexible rule language and complete default rule set, it can be used more widely. This flexibility lets users make custom rules based on Pulsar's specific event hooks, making it useful beyond just IoT devices.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/IPgZDJMQ2E0"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Michal's presentation introduces Pulsar's features, focusing on its ability to protect container workloads. He explains the importance of Pulsar's rule system and engine, which allow users to set up custom security policies for their specific operations. Through hands-on examples, Michal shows how Pulsar uses the Linux kernel's eBPF technology to track and respond to different system events, from process activities to network operations.&lt;/p&gt;

&lt;p&gt;A key point in the presentation is the new container support in Pulsar, a big step for the project. This feature lets Pulsar monitor processes in containers, improving security in environments using containers. Michal gives information on how Pulsar can track events from containers, adding extra details like image URIs and checksums. This improvement makes it easier to create rules which target container workloads and even specific containers (recognized by their images).&lt;/p&gt;

&lt;p&gt;Additionally, Michal discusses usual threats to containers like escape attempts and unauthorized access to host resources. He shows how Pulsar's advanced rules can identify and stop these threats early, highlighting Pulsar's effectiveness in keeping containerized environments safe.&lt;/p&gt;

&lt;p&gt;Check out Pulsar's GitHub repository and website to learn more. There, you'll find guides and how-tos. It's a great place to share information and contribute.&lt;/p&gt;

&lt;p&gt;If you're interested in cybersecurity, Michal’s talk gives a detailed look at how Pulsar is leading the way in making container environments safer. As containers are now a common part of deploying applications, Pulsar's progress here is an important move towards a safer digital world.&lt;/p&gt;


</description>
      <category>iot</category>
      <category>security</category>
      <category>workshop</category>
      <category>programming</category>
    </item>
    <item>
      <title>Container Monitoring in Pulsar 0.7.0</title>
      <dc:creator>Exein</dc:creator>
      <pubDate>Thu, 21 Dec 2023 18:27:31 +0000</pubDate>
      <link>https://forem.com/exein/container-monitoring-in-pulsar-070-1bej</link>
      <guid>https://forem.com/exein/container-monitoring-in-pulsar-070-1bej</guid>
      <description>&lt;p&gt;&lt;strong&gt;Pulsar 0.7.0&lt;/strong&gt;&lt;br&gt;
The Exein team has released a new version of &lt;a href="https://github.com/Exein-io/pulsar?ref=blog.exein.io"&gt;Pulsar&lt;/a&gt;, 0.7.0. Pulsar is a runtime security agent for Linux systems, designed with specific focus on performance and low overhead. Powered by eBPF, it traces system activity from the Linux kernel and lets you apply security policies in real time on top of the events traced.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What’s in 0.7.0&lt;/strong&gt;&lt;br&gt;
The latest release of Pulsar marks a significant advancement in its security capabilities, introducing crucial features such as container monitoring support and SMTP integration for threat logging.&lt;/p&gt;

&lt;p&gt;For a comprehensive overview of all the new features implemented, please refer to the detailed &lt;a href="https://github.com/exein-io/pulsar/blob/main/CHANGELOG.md?ref=blog.exein.io"&gt;release notes&lt;/a&gt; for 0.7.0 and the &lt;a href="https://pulsar.sh/docs/?ref=blog.exein.io"&gt;documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Container monitoring support&lt;/strong&gt;&lt;br&gt;
Pulsar 0.7.0 now supports container monitoring. This latest feature empowers Pulsar to extend its monitoring capabilities to containerized environments, allowing users to monitor the health and status of individual containers and identify potential vulnerabilities and threats.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does container detection work&lt;/strong&gt;&lt;br&gt;
Pulsar employs eBPF to detect system activity. Containers are merely processes isolated by namespaces and cgroups. So, how can we determine if a new process belongs to a new container?&lt;/p&gt;

&lt;p&gt;We can identify new processes at the following tracepoints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;sched_process_exec&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;sched_process_fork&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both of these receive information about the new process in the form of a &lt;code&gt;task_struct&lt;/code&gt;. Digging into the members of that struct, we can retrieve the numeric IDs of its namespaces.&lt;/p&gt;

&lt;p&gt;The mount namespace is always created, whether Docker or Podman is used. Therefore, we can detect if a new process comes with a new mount namespace.&lt;/p&gt;

&lt;p&gt;Catching a process with a new mount namespace doesn’t provide all the needed information, such as the container engine (e.g., Docker, Podman), the unique container ID, and the container image details. &lt;code&gt;task_struct&lt;/code&gt; and tracepoints don’t offer these answers, but there’s an alternative method.&lt;/p&gt;

&lt;p&gt;To identify the created container, one approach is to send a user-space event for the new process, something Pulsar consistently does. Container engines typically assign container IDs to cgroups. By examining &lt;code&gt;/proc/&amp;lt;pid&amp;gt;/cgroup&lt;/code&gt; for any process within a container, the container ID can be extracted.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ cat /proc/18337/cgroup
15:misc:/libpod_parent/libpod-64cabea89c8766c5432d42ee9fe93c7f853bdc
[...]
2:cpuset:/libpod_parent/libpod-64cabea89c8766c5432d42ee9fe93c7f853bdc29055a30abdbe4e26c9001d904
1:name=openrc:/libpod_parent/libpod-64cabea89c8766c5432d42ee9fe93c7f853bdc29055a30abdbe4e26c9001d904
0::/libpod_parent/libpod-64cabea89c8766c5432d42ee9fe93c7f853bdc29055a30abdbe4e26c9001d904
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ podman ps
CONTAINER ID  IMAGE                            COMMAND  CREATED         STATUS         PORTS  NAMES
64cabea89c87  docker.io/library/alpine:latest  /bin/sh  55 seconds ago  Up 55 seconds         tender_pascal
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Once obtained, detailed information about the container and its image can be retrieved by referencing &lt;code&gt;/var/lib/docker/containers/&amp;lt;container-id&amp;gt;/config.v2.json&lt;/code&gt; for Docker or using &lt;code&gt;podman inspect --type=container &amp;lt;container-id&amp;gt;&lt;/code&gt; for Podman.&lt;br&gt;
For example, we can retrieve the container image URI with:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ podman inspect --type=container 64cabea89c87 --format "{{ .ImageName }}"
docker.io/library/alpine:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8viystwzk6rbtgxqqlhs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8viystwzk6rbtgxqqlhs.png" alt="Image description" width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SMTP Integration for threats logging&lt;/strong&gt;&lt;br&gt;
Pulsar now features a robust SMTP integration, further improving its threat-logging capabilities. With this new feature, Pulsar can seamlessly send threat alerts via email, ensuring timely and effective communication during security incidents.&lt;/p&gt;

&lt;p&gt;Security administrators and IT teams can now configure Pulsar to send detailed threat notifications directly to designated email addresses. This functionality ensures that crucial threat information is promptly delivered to the right personnel, enabling swift responses to potential security breaches.&lt;/p&gt;

&lt;p&gt;To set up the SMTP feature, simply configure the following fields in the Pulsar configuration file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[smtp-notifier]
enabled=true
receivers=&amp;lt;receivers email address&amp;gt;
server=&amp;lt;SMTP server&amp;gt;
username=&amp;lt;username&amp;gt;
password=&amp;lt;password&amp;gt;
sender=&amp;lt;sender email address&amp;gt;
port=587
encryption=starttls
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Full changelog&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Added support for monitoring containers within the core functionality &lt;a href="https://github.com/exein-io/pulsar/pull/213"&gt;#213&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Introduced a description field in the Threat structure, providing a human-readable description of the threat &lt;a href="https://github.com/exein-io/pulsar/pull/189"&gt;#189&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Added the namespaces field for events related to fork and exec operations &lt;a href="https://github.com/exein-io/pulsar/pull/207"&gt;#207&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Implemented SMTP integration within the threat-logging module so threats can also be sent via email &lt;a href="https://github.com/exein-io/pulsar/pull/201"&gt;#201&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Added the ability for modules to display warnings as part of their functionality &lt;a href="https://github.com/exein-io/pulsar/pull/203"&gt;#203&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Added syslog capabilities to the module &lt;a href="https://github.com/exein-io/pulsar/pull/212"&gt;#212&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Introduced the &lt;code&gt;enabled_by_default&lt;/code&gt; flag for every module, allowing the definition of default behavior &lt;a href="https://github.com/exein-io/pulsar/pull/220"&gt;#220&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Refactored preemption handling in the BPF probes &lt;a href="https://github.com/exein-io/pulsar/pull/188"&gt;#188&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Updated the Continuous Integration (CI) process to support OpenSSL &lt;a href="https://github.com/exein-io/pulsar/pull/202"&gt;#202&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Updated the GitHub workflow in the CI process &lt;a href="https://github.com/exein-io/pulsar/pull/217"&gt;#217&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Resolved the issue introduced by kernel changes affecting the layout of the struct &lt;code&gt;iov_iter&lt;/code&gt; &lt;a href="https://github.com/exein-io/pulsar/pull/226"&gt;#226&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Corrected the doctest in the validation module &lt;a href="https://github.com/exein-io/pulsar/pull/195"&gt;#195&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Implemented a check to verify the payload before applying the ruleset in the validatron module &lt;a href="https://github.com/exein-io/pulsar/pull/211"&gt;#211&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Disabled the stack protector in the build process &lt;a href="https://github.com/exein-io/pulsar/pull/205"&gt;#205&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Thank you for reading!&lt;br&gt;
If you like it, don't forget to try out Pulsar and give us a &lt;a href="https://github.com/Exein-io/pulsar?ref=blog.exein.io"&gt;star ⭐️&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbl9wtw3ntd9v9d0xhjmd.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbl9wtw3ntd9v9d0xhjmd.PNG" alt="Image description" width="793" height="608"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>opensource</category>
      <category>security</category>
      <category>github</category>
    </item>
  </channel>
</rss>
