<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: lostghost</title>
    <description>The latest articles on Forem by lostghost (@lostghost).</description>
    <link>https://forem.com/lostghost</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3045125%2F55db0a00-28d3-4dbb-87f6-4d5494ccfca8.png</url>
      <title>Forem: lostghost</title>
      <link>https://forem.com/lostghost</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/lostghost"/>
    <language>en</language>
    <item>
      <title>Distributed Applications</title>
      <dc:creator>lostghost</dc:creator>
      <pubDate>Tue, 28 Oct 2025 05:48:16 +0000</pubDate>
      <link>https://forem.com/lostghost/distributed-applications-4p87</link>
      <guid>https://forem.com/lostghost/distributed-applications-4p87</guid>
      <description>&lt;p&gt;This is a blog series on distributed applications - what goes into them, from start to finish. I spend less time recounting theory - instead prefferring to link to it - and more time describing my takeaways. Hope you enjoy it, and please leave feedback.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/lostghost/distributed-applications-part-1-overview-6g8"&gt;Part 1&lt;/a&gt; - Overview&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/lostghost/distributed-applications-part-15-api-design-36al"&gt;Part 1.5&lt;/a&gt; - Network API&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/lostghost/distributed-applications-part-2-distributed-design-patterns-2f38"&gt;Part 2&lt;/a&gt; - Distributed Design Patterns&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/lostghost/distributed-applications-part-3-distributed-state-11af"&gt;Part 3&lt;/a&gt; - Distributed State&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you enjoyed the series, consider also my series on &lt;a href="https://dev.to/lostghost/linux-deep-dive-introduction-5b8c"&gt;Linux&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>distributedsystems</category>
      <category>networking</category>
      <category>database</category>
    </item>
    <item>
      <title>Distributed Applications. Part 3 - Distributed State</title>
      <dc:creator>lostghost</dc:creator>
      <pubDate>Mon, 27 Oct 2025 07:11:14 +0000</pubDate>
      <link>https://forem.com/lostghost/distributed-applications-part-3-distributed-state-11af</link>
      <guid>https://forem.com/lostghost/distributed-applications-part-3-distributed-state-11af</guid>
      <description>&lt;p&gt;This blog is part of a &lt;a href="https://dev.to/lostghost/distributed-applications-4p87"&gt;series&lt;/a&gt; on designing distributed applications. In this chapter, we look at distributed state - and the applications that manage it, databases. We won't go too deep into the individual products, and instead try to keep the discussion about the tradeoffs of particular approaches.&lt;/p&gt;

&lt;p&gt;The simplest database is a file (or a block device). You can &lt;a href="https://man7.org/linux/man-pages/man2/lseek.2.html" rel="noopener noreferrer"&gt;seek&lt;/a&gt; around and &lt;a href="https://man7.org/linux/man-pages/man2/write.2.html" rel="noopener noreferrer"&gt;write&lt;/a&gt; to it, or you can &lt;a href="https://man7.org/linux/man-pages/man2/mmap.2.html" rel="noopener noreferrer"&gt;mmap&lt;/a&gt; it and write to memory.&lt;/p&gt;

&lt;p&gt;What are the issues with using a file as a database? Lack of a networked API - you can't expose a file on a TCP port without an intermediary (which is a shame: Linux has all kinds of low-level file-oriented primitives - why not this one?); lack of granular concurrency - file locking is coarse, nothing like row-level concurrency control; and lack of high availability - a file can't natively be replicated or sharded.&lt;/p&gt;

&lt;p&gt;We discussed network APIs in &lt;a href="https://dev.to/lostghost/distributed-applications-part-15-api-design-36al"&gt;part 1.5&lt;/a&gt;, so let's look at consensus and concurrency control. Reliable writes to a cluster can be leaderful or leaderless. Leadership can come with manual failover - as in Postgres - or be automatic, with elections - as in Paxos and Raft. I suggest watching the &lt;a href="https://youtu.be/vYp4LYbnnW8" rel="noopener noreferrer"&gt;original presentation&lt;/a&gt; on Raft, it's really insightful.&lt;/p&gt;

&lt;p&gt;Leaderful means that all writes have to go through the leader. That prevents scaling, but helps consistency - Postgres being a good example. The way to achieve scaling with leaders is sharding - have portions of the data controlled by independent processes, and thus different leaders. But you sacrifice consistency between shards - which is how you end up with Cassandra's lightweight transactions (&lt;a href="https://cassandra.apache.org/doc/latest/cassandra/architecture/guarantees.html#lightweight-transactions-with-linearizable-consistency" rel="noopener noreferrer"&gt;docs&lt;/a&gt;): they only work within one partition. As for Postgres - sharding an individual Postgres instance is awkward, so why not run multiple Postgres instances, and shard that way? What you get is &lt;a href="https://www.citusdata.com/blog/2023/08/04/understanding-partitioning-and-sharding-in-postgres-and-citus/" rel="noopener noreferrer"&gt;Citus&lt;/a&gt; or &lt;a href="https://cloudberry.apache.org/" rel="noopener noreferrer"&gt;Apache Cloudberry&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But what if you don't want leader-per-partition, but instead true multi-master writing? You will inevitably get conflicting updates, and you have to decide between merging and Last Writer Wins. Merging can be built in - like with CRDTs, which you should check out (for example, &lt;a href="https://youtu.be/x7drE24geUw" rel="noopener noreferrer"&gt;here&lt;/a&gt;). But CRDTs are not general-purpose. Conflicts can also be exposed to the user - like with Git, which is a kind of database. Or you can just do Last Writer Wins - which is what Cassandra does by default.&lt;/p&gt;
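
&lt;p&gt;A minimal sketch of the two strategies (the representations here are my own toy choices): a G-Counter CRDT, where each node increments only its own slot and merge takes the per-slot maximum, so concurrent increments always combine losslessly; versus a Last-Writer-Wins register that simply keeps the later timestamp:&lt;/p&gt;

```python
# G-Counter CRDT: a dict of node-id to that node's increment count.
# Merging is a per-slot maximum, so merges are commutative, associative,
# and idempotent - the defining properties of a state-based CRDT.

def gcounter_merge(a, b):
    keys = set(a) | set(b)
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in keys}

def gcounter_value(counter):
    return sum(counter.values())

# Last Writer Wins: a (timestamp, value) pair; merge keeps the later write,
# silently discarding the earlier one - simple, but lossy.
def lww_merge(a, b):
    return a if a[0] >= b[0] else b
```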

&lt;p&gt;Raft and Paxos give us atomic compare-and-swap within a shard - and this can be extended to full serializability, since any reads and writes can be made atomic under a single leader. The mechanism would be &lt;a href="https://en.wikipedia.org/wiki/Two-phase_locking" rel="noopener noreferrer"&gt;2PL&lt;/a&gt;, or &lt;a href="https://www.postgresql.org/docs/current/mvcc.html" rel="noopener noreferrer"&gt;MVCC&lt;/a&gt; with the &lt;a href="https://wiki.postgresql.org/wiki/SSI" rel="noopener noreferrer"&gt;SSI&lt;/a&gt; extension. But what about across shards, with different leaders? We can run one more Paxos or Raft instance, this time over the shards - each of which is itself a Paxos or Raft group - and achieve consensus across them. That costs a lot of round-trips - so the fewer shards a transaction involves, the better. The paper on distributed commit with Paxos is &lt;a href="https://lamport.azurewebsites.net/video/consensus-on-transaction-commit.pdf" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This is much better than the established two-phase commit protocol, &lt;a href="https://en.wikipedia.org/wiki/Two-phase_commit_protocol" rel="noopener noreferrer"&gt;2PC&lt;/a&gt;. It is a lot like state machine replication in Raft - the leader sends updates, a quorum confirms receiving them, the leader sends "commit" - except 2PC runs only one replica of the coordinator process, so if the coordinator fails, there is no automatic re-election; everyone just waits for it to come back alive. To me, Raft/Paxos looks strictly better. In fact, if you take 2PC, run multiple coordinator processes that come to a quorum, and add leader re-election - it starts to look a lot like Raft.&lt;/p&gt;

&lt;p&gt;I mentioned 2PL, Two-Phase Locking, and MVCC - Multi-Version Concurrency Control. These are the two main concurrency control systems within a shard. 2PL takes locks for readers and writers - blocking conflicting transactions - while MVCC gives each transaction a snapshot of state, which avoids most blocking, but needs garbage collection afterwards. MVCC is generally faster, especially for transactions with the relaxed isolation level of "snapshot isolation" - ideal for read-only transactions that simply need a consistent view of the data. Using snapshot isolation for writing transactions can lead to &lt;a href="https://en.wikipedia.org/wiki/Isolation_(database_systems)#Repeatable_reads" rel="noopener noreferrer"&gt;write skew&lt;/a&gt; - so to be safe, one should stick to serializable writing transactions.&lt;/p&gt;
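
&lt;p&gt;The classic write-skew scenario can be sketched like this (a toy model, not real MVCC): two doctors are on call, the business constraint is "at least one on call", and two transactions each read the same stale snapshot before writing:&lt;/p&gt;

```python
# Write skew under snapshot isolation, in miniature. Each transaction
# checks the constraint against its own snapshot - taken before either
# write - so each write looks safe in isolation, yet together they
# leave nobody on call.

def run_concurrently(db):
    snap_a = dict(db)  # T1's snapshot of the initial state
    snap_b = dict(db)  # T2's snapshot of the same initial state
    if sum(snap_a.values()) >= 2:  # T1: "someone else is still on call"
        db["alice"] = 0            # Alice goes off call
    if sum(snap_b.values()) >= 2:  # T2: same check, same stale snapshot
        db["bob"] = 0              # Bob goes off call too
    return db
```

A serializable scheduler would detect that each transaction wrote something the other one read, and abort one of them.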

&lt;p&gt;And lastly, there is one more subject to address - the interaction between application-level transactions and database-level transactions. You should watch &lt;a href="https://www.youtube.com/watch?v=5ZjhNTM8XU8" rel="noopener noreferrer"&gt;this&lt;/a&gt; video for full context. Basically, with every application having its own database and following the saga pattern, applications implement their own distributed transactions - just at the application level. This is not ideal, since these implementations are improvised and prone to errors. But we don't want distributed transactions across the whole internet either - certainly not with 2PC: we can't lock the world to wait for decisions from across the Internet. The real world is not consistent - if we are to model it, we need to deal with that fact.&lt;/p&gt;

&lt;p&gt;So what's the solution? We want distributed state to be handled by a dedicated system, making application logic as pure as possible (it doesn't matter if the language is functional or object-oriented - both can be modeled with pure computation in mind, and are equivalent in that respect). And we want distributed transactions. 2PC across the internet won't scale - we need Paxos Commit, or equivalents. Furthermore, current serializability enforcement is too coarse - it only checks which rows were read and written, not their contents, not business constraints. For example, consider that an entity/object/actor of "account" has the business constraint of being non-negative. Multiple long-lived transactions across services decrement the balance, each checking that it stays non-negative. If the DB only tracks the fact of reading and writing, these transactions would cancel each other, despite not actually violating the constraint. That doesn't scale.&lt;/p&gt;

&lt;p&gt;Let's imagine a world with distributed transactions, that do actually check business constraints. They could still cancel each other at unpredictable times. This is fine, except - applications are written under different assumptions. The source of truth is in-process state, which gets reflected into a database. A workflow has a notion of forward-progress - and cannot be rolled back several steps, if the distributed transaction gets cancelled. What I'm proposing is a different model - where the source of truth is the database, and application code is a pure function, making no assumptions about monotonic time, and instead gets driven entirely by the database. Under that paradigm, distributed transactions can scale. This is very similar to the &lt;a href="https://en.wikipedia.org/wiki/Actor_model" rel="noopener noreferrer"&gt;actor model&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thank you for reading this post! Next one will be about practical design of a distributed system.&lt;/p&gt;

</description>
      <category>distributedsystems</category>
      <category>database</category>
    </item>
    <item>
      <title>Distributed Applications. Part 1.5 - Network API</title>
      <dc:creator>lostghost</dc:creator>
      <pubDate>Sun, 26 Oct 2025 14:22:31 +0000</pubDate>
      <link>https://forem.com/lostghost/distributed-applications-part-15-api-design-36al</link>
      <guid>https://forem.com/lostghost/distributed-applications-part-15-api-design-36al</guid>
      <description>&lt;p&gt;This blog is part of a &lt;a href="https://dev.to/lostghost/distributed-applications-4p87"&gt;series&lt;/a&gt; on designing distributed applications. In this chapter, we will examine different technologies and approaches for a network API.&lt;/p&gt;

&lt;p&gt;We can split the different approaches to a network API into the following categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Binary over TCP/UDP&lt;/li&gt;
&lt;li&gt;Text over TCP/UDP&lt;/li&gt;
&lt;li&gt;JSON over HTTP&lt;/li&gt;
&lt;li&gt;gRPC&lt;/li&gt;
&lt;li&gt;GraphQL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Binary over a socket is a popular choice - Postgres, MySQL, MSSQL, Cassandra, Redis, Mongo and CockroachDB use it. You can tailor the protocol to your use-case and get the most performance out of it, and you don't lock yourself into a corporation's technology that you have no influence over. But there are downsides - with TCP, you pay its overhead, so the protocol-level optimization may not be worth it. With UDP you have to reinvent the wheel - using &lt;a href="https://en.wikipedia.org/wiki/QUIC" rel="noopener noreferrer"&gt;QUIC&lt;/a&gt; would be a good choice. You then need to evolve and version your protocol, as well as all of the custom tools and drivers for it.&lt;/p&gt;
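
&lt;p&gt;A common shape for such a protocol is length-prefixed framing - sketched here with a 4-byte big-endian length header (the header format is my own choice; real protocols vary):&lt;/p&gt;

```python
# Length-prefixed framing: each message on the TCP stream is preceded by
# a 4-byte big-endian length, so the receiver can split the byte stream
# back into discrete messages.
import struct

def encode_frame(payload):
    return struct.pack("!I", len(payload)) + payload

def decode_frames(buf):
    """Return (complete frames, leftover bytes awaiting more data)."""
    frames, offset = [], 0
    while len(buf) - offset >= 4:                  # enough for a header?
        (size,) = struct.unpack_from("!I", buf, offset)
        if len(buf) - offset - 4 >= size:          # full payload arrived?
            frames.append(buf[offset + 4 : offset + 4 + size])
            offset += 4 + size
        else:
            break                                  # partial frame: wait for more
    return frames, buf[offset:]
```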

&lt;p&gt;Text presents a lower barrier for tooling - but it is less efficient. And ultimately, you will be transmitting binary data - so you either base64-encode it, bloating its size, or you leave it as is, making your protocol's output unreadable. Or you use two sockets - one for "control-plane" text-only queries, and one for binary data. A potential footgun there is firewalls, as with &lt;a href="https://ncftp.com/ncftpd/doc/misc/ftp_and_firewalls.html" rel="noopener noreferrer"&gt;FTP&lt;/a&gt;.&lt;/p&gt;
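
&lt;p&gt;The base64 bloat is easy to quantify - every 3 raw bytes become 4 ASCII characters, roughly a one-third inflation:&lt;/p&gt;

```python
# base64 maps each 3-byte group to 4 output characters (padding the last
# group), so encoded size is about 4/3 of the raw size.
import base64

def b64_overhead(n_bytes):
    raw = bytes(n_bytes)          # n_bytes of zeroes; content is irrelevant
    encoded = base64.b64encode(raw)
    return len(encoded) / n_bytes
```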

&lt;p&gt;JSON over HTTP has much more infrastructure already in place, but it is ultimately still text over TCP - with all the problems that come with it. After all, there is a reason MongoDB uses BSON - a binary format.&lt;/p&gt;

&lt;p&gt;gRPC solves that problem, and was basically imagined as a binary JSON over HTTP - with a built-in schema. But there are issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It is controlled by Google, and your needs should align with Google's needs&lt;/li&gt;
&lt;li&gt;You have to use ProtoBuf for schema definition and, most often, binary encoding. Most tools assume ProtoBuf on the wire&lt;/li&gt;
&lt;li&gt;It demands HTTP/2, even though HTTP/3 is available. Google should get around to adopting HTTP/3 at some point, but it doesn't look like they are in a hurry.&lt;/li&gt;
&lt;li&gt;It is not natively supported in the browser, and proxies offer limited functionality&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And finally, GraphQL. It is a powerful query language - much like SQL. But exposing SQL to a client is challenging - databases are not built for that: ACLs are hard, versioned APIs are hard, rate limiting is hard. Some of it can be solved by proxies, such as Envoy and HAProxy, but not all. And you will still want an internal representation and a public API - so you may as well use a different technology for the latter. The issues with GraphQL are the same: ACLs are hard, versioned APIs are hard, rate limiting is hard. I don't think it's a good choice for anyone outside Meta (it is not an independent project). Read more &lt;a href="https://bessey.dev/blog/2024/05/24/why-im-over-graphql/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There is also the option of a language-level RPC library, such as &lt;a href="https://trpc.io/" rel="noopener noreferrer"&gt;tRPC&lt;/a&gt; or &lt;a href="https://rpyc.readthedocs.io/en/latest/" rel="noopener noreferrer"&gt;RPyC&lt;/a&gt;. The former is a great option if your code is JavaScript/TypeScript on both frontend and backend, with a bonus if your team is full-stack. If you can force clients to upgrade, and you have dedicated APIs for every client - you can change them at will, with no versioning and no coordination. As for RPyC, because it works on object proxies, if the server or the client restarts, all the proxies break - and if proxies are woven through your state, the whole system has to restart.&lt;/p&gt;

&lt;p&gt;So what do we do then? I'd stick with JSON over HTTP, based on the &lt;a href="https://docs.bump.sh/guides/openapi/specification/v3.2/introduction/what-is-openapi/" rel="noopener noreferrer"&gt;OpenAPI&lt;/a&gt; spec. Coupled with tools for code generation and input (and output) validation based on the spec, this is a good universal solution - a web standard, available on all platforms, and not tied to any company. JSON is not as efficient as binary, even when gzipped - that is the tradeoff.&lt;/p&gt;

&lt;p&gt;But there is a larger takeaway. Say you have one backend and many clients - browser, phone, desktop. You can create dedicated APIs for every client - and, if upgrades are mandated, not version them, not coordinate with anyone, just deploy new APIs. To me, this looks close to optimal. Another option is a universal JSON API for all clients. But clients have different needs, and may be developed by different teams - you need versioning. And it will be hard to avoid overfetching and underfetching. You may then want to introduce a query language - but you will just be reinventing GraphQL, with all its problems. And if you want the most universal API, one that can integrate with any possible application - you are looking at the "Semantic Web" protocol suite. It is worth a look - here are a &lt;a href="https://www.youtube.com/watch?v=HSFoxYC169o" rel="noopener noreferrer"&gt;couple&lt;/a&gt; of &lt;a href="https://www.youtube.com/watch?v=jfb04V8WY6Y" rel="noopener noreferrer"&gt;resources&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This is it, when it comes to networked API design - thank you for reading!&lt;/p&gt;

</description>
      <category>distributedsystems</category>
      <category>webdev</category>
      <category>api</category>
    </item>
    <item>
      <title>Distributed Applications. Part 2 - Distributed Design Patterns</title>
      <dc:creator>lostghost</dc:creator>
      <pubDate>Fri, 17 Oct 2025 17:30:14 +0000</pubDate>
      <link>https://forem.com/lostghost/distributed-applications-part-2-distributed-design-patterns-2f38</link>
      <guid>https://forem.com/lostghost/distributed-applications-part-2-distributed-design-patterns-2f38</guid>
      <description>&lt;p&gt;This blog is part of a &lt;a href="https://dev.to/lostghost/distributed-applications-4p87"&gt;series&lt;/a&gt; on designing distributed applications. In this chapter, we will examine the different design patterns, commonly employed in the industry for distributed applications.&lt;/p&gt;

&lt;p&gt;First let's address a widely heard, but poorly understood concept - microservices. What is a microservice, and how is it useful? For some reason, the term came to mean "your program should be many small programs, which send JSON over HTTP". What makes a small program? Why JSON and HTTP specifically? Seems like some sort of cargo cult.&lt;/p&gt;

&lt;p&gt;A microservice is a service owned as completely as possible by one team, that can be deployed without coordination with any other team. Microservices are about office politics (namely, &lt;a href="https://en.wikipedia.org/wiki/Conway's_law" rel="noopener noreferrer"&gt;Conway's law&lt;/a&gt;), and not any particular technical solution.&lt;/p&gt;

&lt;p&gt;With that out of the way, let's discuss CQRS, a widely adopted and useful pattern. CQRS stands for "Command-Query Responsibility Segregation", and it means that interactions with a software system should be separated into commands and queries. Commands change the state of the system and produce no results, while queries produce results but don't change the state. This has many implications.&lt;/p&gt;

&lt;p&gt;Commands require special handling - once accepted, they have to be processed, and in doing so, their effect should only be applied once. The first part means that a command is durably recorded under a unique ID, the second part means that a command is either carefully managed under that ID, or is &lt;a href="https://en.wikipedia.org/wiki/Idempotence" rel="noopener noreferrer"&gt;idempotent&lt;/a&gt;.&lt;/p&gt;
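
&lt;p&gt;A minimal sketch of that deduplication (in a real system, the processed-ID set and the state update would live in the same database transaction; the names here are hypothetical):&lt;/p&gt;

```python
# Each command carries a unique ID, recorded alongside its effect, so a
# redelivered command becomes a no-op - the effect is applied exactly once
# even under at-least-once delivery.

def apply_command(state, processed, cmd_id, amount):
    if cmd_id in processed:
        return state          # duplicate delivery: effect already applied
    processed.add(cmd_id)     # record the ID with the effect (atomically,
    return state + amount     # in a real system)
```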

&lt;p&gt;Queries require much less handling - they don't need to be recorded and then executed, they can be executed right away, and have their results returned on the same connection. If the connection breaks - they can be retried with no worries, since they are idempotent by their nature.&lt;/p&gt;

&lt;p&gt;And lastly, both commands and queries are in themselves stateless. State lives in a higher-level container called a "workflow" - a sequence of commands and queries that is responsible for an entire business-relevant operation, and exists either in a dedicated piece of software or scattered across different services.&lt;/p&gt;

&lt;p&gt;Let us now go over widespread communication patterns. They aren't as relevant if you only build the "stateless compute" part of a distributed system yourself - which needs no coordination within itself - and rely on the database for all state handling. But if you do need reliable communication over the network, how do you arrange for it?&lt;/p&gt;

&lt;p&gt;Direct &lt;a href="https://en.wikipedia.org/wiki/Remote_procedure_call" rel="noopener noreferrer"&gt;RPC&lt;/a&gt; is a good option - with your own framing over TCP, or even your own reliability over UDP. Maybe JSON over HTTP. Either way, you decide if state lives only within one session - then you can put the other end behind a load balancer, but you lose the benefit of data locality - or if state lives between sessions - then you need to expose all IPs and ports, and worry about correct failover.&lt;/p&gt;

&lt;p&gt;But direct RPC requires a degree of availability - at least one instance of every service should always be available. If that constraint isn't met, you need a sort of "socket activation" intermediary to hold on to messages. Further, with direct RPC, every service needs to durably record the commands it sends in its own database - what if they were recorded in a shared database instead? This brings us to &lt;a href="https://en.wikipedia.org/wiki/Message-oriented_middleware" rel="noopener noreferrer"&gt;queues and message brokers&lt;/a&gt;. They aren't strictly better than direct RPC, and they create a single point of failure, but they are a solid option, widely adopted in the industry.&lt;/p&gt;

&lt;p&gt;Both direct RPC and queues commonly offer at-least-once delivery guarantees - if reception of a message could not be confirmed, the message will be resent. This means commands need to be idempotent.&lt;/p&gt;

&lt;p&gt;Let's now scale this out to multiple independent services (possibly owned and operated by different teams). What if you need to perform an operation that spans multiple services? Check for a condition in two services, then perform a command on two services. But there is a race condition - while you are checking the second service, the condition in the first service may become false. You need a transaction. There are two options.&lt;/p&gt;

&lt;p&gt;First is a distributed transaction, with a transaction manager and the &lt;a href="https://en.wikipedia.org/wiki/X/Open_XA" rel="noopener noreferrer"&gt;X/Open XA&lt;/a&gt; protocol, using &lt;a href="https://en.wikipedia.org/wiki/Two-phase_commit_protocol" rel="noopener noreferrer"&gt;two-phase commit&lt;/a&gt;. The issue is that if you have no control over a third-party service, you can't force them to adopt the transaction manager. And they may not want to let you lock their database.&lt;/p&gt;

&lt;p&gt;The second option is the &lt;a href="https://microservices.io/patterns/data/saga.html" rel="noopener noreferrer"&gt;Saga pattern&lt;/a&gt;. Check the condition on the first service, perform the operation, and check the condition again. If the result is not what you want - roll back the operation with a compensating operation, which should be paired with every positive operation. You no longer need to stop the world to coordinate different services - letting go of the illusion of control gives better performance.&lt;/p&gt;
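
&lt;p&gt;The saga skeleton can be sketched in a few lines (a toy model - real saga engines persist progress durably between steps, so a crashed orchestrator can resume):&lt;/p&gt;

```python
# Saga pattern in miniature: each step pairs an action with a compensating
# action. On failure, the already-completed steps are undone in reverse
# order - no global locks, just best-effort rollback.

def run_saga(steps, ctx):
    done = []
    try:
        for action, compensate in steps:
            action(ctx)
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate(ctx)   # best-effort rollback of completed steps
        return False
    return True
```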

&lt;p&gt;The issue is that if a complex business process spans multiple services, its code may also get scattered across those services, and the whole process becomes hard to make sense of, especially when things go wrong. First, consider centralised logging and tracing, to follow a request between services. Second, factor the process out into a separate service, so the code is all in one place - keep the other services dumb. Third, you may want a centralised dashboard of logs and traces for all operations, with the option of cancelling or restarting any operation - a workflow engine like Temporal can help here. Fourth, you may want to allow non-technical people to program and observe workflows at a high level - a &lt;a href="https://en.wikipedia.org/wiki/Business_Process_Model_and_Notation" rel="noopener noreferrer"&gt;BPMN&lt;/a&gt; engine like Camunda or Flowable is a good option.&lt;/p&gt;

&lt;p&gt;Another pattern is &lt;a href="https://en.wikipedia.org/wiki/Eventual_consistency" rel="noopener noreferrer"&gt;eventual consistency&lt;/a&gt;. Programs rely on forward progress - on a monotonically increasing clock and instruction pointer. Programs shouldn't "jump backwards" - otherwise cause and effect break. This won't happen much if you have stateful, sticky sessions - the same node receives both reads and writes. But what if you use stateless backends behind a load balancer, and separate read replicas? It may take time for writes to propagate to where you're reading from - so you won't see the effects of your own commands. There are two options - either specify "wait for propagation" in all of your commands, or accept eventual consistency by way of versioning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Give every entity a version&lt;/li&gt;
&lt;li&gt;Show the user the entity, remember its version N&lt;/li&gt;
&lt;li&gt;If the user updates the entity - specify in the command "if version is still N", and additionally increment the version (in the same command, atomically)&lt;/li&gt;
&lt;li&gt;If the version is no longer N - it was updated by somebody else - either check with business logic that the update still makes sense, or ask the user again, based on the new data&lt;/li&gt;
&lt;/ul&gt;
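
&lt;p&gt;The steps above amount to a compare-and-swap on the version, sketched here against an in-memory store (a real database would do the check and write in one atomic statement):&lt;/p&gt;

```python
# Optimistic concurrency: the update names the version it was based on,
# and fails if the entity has moved on since the user last saw it.

def update_if_version(store, key, expected_version, new_value):
    version, _ = store[key]
    if version != expected_version:
        return False                        # somebody else updated it first
    store[key] = (version + 1, new_value)   # bump version with the write
    return True
```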

&lt;p&gt;Many databases allow you to specify conditions in the update - if you know that the user made an update based on specific conditions rather than the whole data, you can skip versioning and include the conditions in the command (if the price is still under $100, order the phone - no need to ask the user again just because the price changed, since it went down).&lt;/p&gt;

&lt;p&gt;Another thing to consider is the sharding of your backend. &lt;/p&gt;

&lt;p&gt;At a high level, the unit of sharding is either the user request or the application entity. Per user request means your entities are passive stores of data - and coordinating their manipulation requires locking. But if application entities are living participants in the communication, you get the &lt;a href="https://en.wikipedia.org/wiki/Actor_model" rel="noopener noreferrer"&gt;actor model&lt;/a&gt; - which does not need locks.&lt;/p&gt;

&lt;p&gt;At the low level, your application can be single-threaded, multi-process, multi-threaded, async with coroutines, async with virtual threads. Consider that Linux tracks memory (and performs OOM kill) per-process, and that async coroutines introduce function coloring.&lt;/p&gt;

&lt;p&gt;And lastly, &lt;a href="https://microservices.io/patterns/data/event-sourcing.html" rel="noopener noreferrer"&gt;Event Sourcing&lt;/a&gt;. Event Sourcing is the observation that the state of your application is the result of processing a series of inputs, and the conclusion is to separate state into a "source of truth" - the append-only event log - and many "read-friendly" representations of it. &lt;a href="https://en.wikipedia.org/wiki/Journaling_file_system" rel="noopener noreferrer"&gt;Journaled file systems&lt;/a&gt; and journaled databases are examples of event sourcing; sending the database &lt;a href="https://en.wikipedia.org/wiki/Write-ahead_logging" rel="noopener noreferrer"&gt;WAL file&lt;/a&gt; to a separate analytical database is an example of creating a new read-friendly representation for a particular need. The downside is that you need to evolve (and version) multiple sets of entities - the ones in the event log, and the ones in every read representation - which can quickly become overwhelming. The upside is that you can destroy any read-friendly representation - it can always be recreated from the event log.&lt;/p&gt;
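
&lt;p&gt;Event sourcing in miniature (the log format and projection here are my own toy invention): the log is the source of truth, and any read view is just a fold over it - destroy the view, replay the log, and it comes back:&lt;/p&gt;

```python
# The append-only log holds (kind, account, amount) events; the balances
# view is a pure, deterministic fold over that log. Any number of other
# read-friendly views can be derived from the same log the same way.

def replay(log):
    balances = {}
    for kind, account, amount in log:
        if kind == "deposit":
            balances[account] = balances.get(account, 0) + amount
        elif kind == "withdraw":
            balances[account] = balances.get(account, 0) - amount
    return balances
```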

&lt;p&gt;This concludes our exploration of application-level design patterns for distributed systems. Let me know if I missed any! In the next blog we will discuss the internal organisation and tradeoffs of different databases. Thanks for reading!&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>designpatterns</category>
      <category>microservices</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>Distributed Applications. Part 1 - Overview</title>
      <dc:creator>lostghost</dc:creator>
      <pubDate>Thu, 16 Oct 2025 10:55:56 +0000</pubDate>
      <link>https://forem.com/lostghost/distributed-applications-part-1-overview-6g8</link>
      <guid>https://forem.com/lostghost/distributed-applications-part-1-overview-6g8</guid>
      <description>&lt;p&gt;In this blog &lt;a href="https://dev.to/lostghost/distributed-applications-4p87"&gt;series&lt;/a&gt; we will examine distributed systems - and the modern practices associated with them. This will not be an academic series, but a practical one - there won't be citations of any papers, only evaluation of software products, many of which were originally based on a paper in a scientific journal. As for literature, I recommend "Designing Data-Intensive Applications" by Martin Kleppmann. He has a &lt;a href="https://www.youtube.com/@kleppmann" rel="noopener noreferrer"&gt;youtube channel&lt;/a&gt;, and a &lt;a href="https://www.youtube.com/playlist?list=PLeKd45zvjcDFUEv_ohr_HdUFe97RItdiB" rel="noopener noreferrer"&gt;lecture series&lt;/a&gt; on distributed systems. He also gives many &lt;a href="https://www.youtube.com/playlist?list=PLeKd45zvjcDHJxge6VtYUAbYnvd_VNQCx" rel="noopener noreferrer"&gt;talks&lt;/a&gt;. There also is a &lt;a href="https://www.distributed-systems.net/index.php/books/ds4/ds4-ebook/" rel="noopener noreferrer"&gt;book&lt;/a&gt; by the titan Andrew Tanenbaum - you can get it for &lt;em&gt;free&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;So what is a distributed system? The accepted definition is a software system with components on different networked computers. Now why is the networked part important - why can't threads or processes on the same computer form a distributed system? Because the network is unreliable, compared to memory. It can also be slower than disk once you add up all the overhead, even though the raw numbers are comparable - modern NVMe is ~7GB/s, 100Gb Ethernet is ~12.5GB/s.&lt;/p&gt;

&lt;p&gt;This unreliable network introduces the core challenges faced by distributed systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It's hard to know if your message was processed by the recipient&lt;/li&gt;
&lt;li&gt;A netsplit splits a shared understanding of state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is on top of challenges faced by a concurrent program - a good distributed program is also a good concurrent program. The distributed solution is more general.&lt;/p&gt;

&lt;p&gt;A separate, but related issue is horizontal scaling. A system's processing capacity should grow with the amount of hardware that gets added to it - while not getting eaten up by the extra overhead. For stateless compute, this is simple enough - spawn more copies of the application - while for the stateful database, sharding needs to be employed, and that involves careful coordination.&lt;/p&gt;

&lt;p&gt;And finally, fault tolerance. At any moment, there is a chance that your program hits a bug and breaks, or slows down to the point of being unresponsive. If your system consists of a large number of such components, something, somewhere is always broken - and your system needs to tolerate that. This would also make for a good local program, but a local program can more easily afford to crash completely when anything breaks. Once again, the problem of distributed computing is more general than that of local computing.&lt;/p&gt;

&lt;p&gt;And really, the central problem of distributed computing is distributed state - that's why there's so much talk of databases, since they are responsible for it. Each database presents its own tradeoffs, which you need to keep in mind for your actual application. This comes on top of the problems of local state - namely, modeling it correctly and being able to evolve it - you may want to look into &lt;a href="https://en.wikipedia.org/wiki/Domain-driven_design" rel="noopener noreferrer"&gt;Domain Driven Design&lt;/a&gt; for that.&lt;/p&gt;

&lt;p&gt;What is distributed state? To an individual program, running on an individual computer, there is no local or distributed state, there is only state - the only one it has. Much like to a person, there is no subjective or objective reality, there is only reality - the one they experience. Objective reality, much like distributed state, is a contrivance - it is simply whatever the individual participants can agree on. This means that for distributed state, you need an algorithm - one that, within pre-determined failure conditions, can make it so individual participants can always agree on an objective reality, on distributed state.&lt;/p&gt;

&lt;p&gt;Say there is a netsplit - 5 out of 9 nodes recorded an operation, 4 out of 9 didn't, and the two groups can't reach each other. What is the reality - did the operation happen? That, once again, is determined by the algorithm, which is usually based on a rule of thumb: if one node is unavailable, that node is most likely broken; if many nodes are unavailable, the majority decides what the distributed state is, because the majority, having more nodes, is the more fault-tolerant side. But deciding that in the moment there is no valid shared state - that we should stop accepting operations and wait for the other nodes to become available again - is also a valid design: you are trading availability for consistency, which is the dilemma posed by the &lt;a href="https://en.wikipedia.org/wiki/CAP_theorem" rel="noopener noreferrer"&gt;CAP theorem&lt;/a&gt;. A possible solution is replication - if the remaining 5 nodes can assume ownership over entities previously owned by the other 4, because they have all the necessary data, the system can continue accepting operations. The 4 nodes would need to realise that they are in the minority, stop accepting operations based on stale data, catch up on the data once it is available, and only then accept operations again.&lt;/p&gt;
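
&lt;p&gt;The majority rule can be sketched as a simple quorum check - a toy illustration, not any particular consensus protocol. The key property: two strict majorities of the same cluster always overlap in at least one node, so two partitions can never both commit.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;

/* A write is committed once a strict majority of the
   cluster acknowledges it. */
static int has_quorum(int acks, int cluster_size) {
    return acks &amp;gt; cluster_size / 2;
}

int main(void) {
    /* The 9-node netsplit from the text: 5 nodes vs 4 nodes. */
    printf("5/9 side: %s\n", has_quorum(5, 9) ? "may commit" : "must wait");
    printf("4/9 side: %s\n", has_quorum(4, 9) ? "may commit" : "must wait");
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;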

&lt;p&gt;Another common fault is a transient network outage - a message gets lost, and the sender gets no reply. Should the sender send the message again, at the risk of doing the same thing twice (like removing money from someone's bank account, which you don't want to do twice by accident)? This choice creates the at-least-once and at-most-once delivery guarantees of messaging frameworks, which one also needs to design around.&lt;/p&gt;
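
&lt;p&gt;One common design around at-least-once delivery is receiver-side deduplication: the sender retries freely, and the receiver remembers which request ids it has already applied. A minimal toy sketch (all names are hypothetical, not any framework's API):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;string.h&amp;gt;

/* Toy receiver that remembers applied request ids, so an
   at-least-once sender can retry without double-charging. */
#define MAX_SEEN 128

static const char *seen[MAX_SEEN];
static int nseen = 0;
static int balance = 100;

static int already_applied(const char *id) {
    for (int i = 0; i &amp;lt; nseen; i++)
        if (strcmp(seen[i], id) == 0)
            return 1;
    return 0;
}

/* Idempotent per request id: duplicates are silently ignored. */
static void withdraw(const char *request_id, int amount) {
    if (already_applied(request_id))
        return;
    seen[nseen++] = request_id;
    balance -= amount;
}

int main(void) {
    withdraw("req-42", 30);
    withdraw("req-42", 30); /* retry after a lost reply */
    printf("balance = %d\n", balance); /* 70, not 40 */
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;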

&lt;p&gt;To summarise, regular applications need to correctly model internal state, based on the real world, and make operations on that state, which are useful to the business. Distributed applications then need an algorithm to agree on shared state, and in doing so, decide on the tradeoff between consistency and availability, in the event of outages and netsplits.&lt;/p&gt;

&lt;p&gt;In the following blog, we will examine the common patterns employed by applications, to deal with distributed state. Thank you for reading!&lt;/p&gt;

</description>
      <category>distributedsystems</category>
      <category>programming</category>
    </item>
    <item>
      <title>Linux from the developer's perspective. Part 4 - strace and pmap</title>
      <dc:creator>lostghost</dc:creator>
      <pubDate>Sun, 20 Jul 2025 13:50:12 +0000</pubDate>
      <link>https://forem.com/lostghost/linux-from-the-developers-perspective-part-4-strace-and-pmap-1bph</link>
      <guid>https://forem.com/lostghost/linux-from-the-developers-perspective-part-4-strace-and-pmap-1bph</guid>
      <description>&lt;p&gt;This blog is part of a &lt;a href="https://dev.to/lostghost/linux-deep-dive-introduction-5b8c"&gt;series&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Getting the program to compile and run is a challenge. Getting it to run correctly is even more of a challenge. You would want to know exactly what the program is doing, and how it's different from what you intended for it to do. This is known as debugging.&lt;/p&gt;

&lt;p&gt;Debugging takes place at various stages of a program's lifecycle, starting from the programming stage. There are various linters and scanners that go over your source code and identify undesirable functionality. Most of them are intended for use from your editor-turned-IDE, such as Vim. As an example, for the programs &lt;code&gt;ctags&lt;/code&gt; and &lt;code&gt;cscope&lt;/code&gt;, here's a &lt;a href="https://youtu.be/9NcM-Tj2UZI" rel="noopener noreferrer"&gt;video&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;After programming and compilation, you get a binary program image, which you can also analyse. Analysis at the source code and binary image stages - done without running the program - is called static analysis. We already did static analysis, when going over program segments with &lt;code&gt;readelf&lt;/code&gt;. A popular analysis tool is &lt;a href="https://valgrind.org/" rel="noopener noreferrer"&gt;valgrind&lt;/a&gt; - though note that it instruments the program as it runs, making it a dynamic analysis tool.&lt;/p&gt;

&lt;p&gt;Now let's turn to dynamic analysis. Because a userspace process is a virtualized environment, any interaction with the outside world, any side-effect, has to go through the kernel, in the form of a syscall. So to get an idea for what the program is really doing, monitoring system calls is a good idea. &lt;code&gt;strace&lt;/code&gt; helps with that.&lt;/p&gt;

&lt;p&gt;Take the same program from the previous blog, compiled statically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ cat main.c 
#include &amp;lt;stdio.h&amp;gt;
int main(int argc, char** argv){
    if (argc&amp;lt;2) return 1;
    printf("%s\n",argv[1]);
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And let's see what it does:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ strace ./a.out 
execve("./a.out", ["./a.out"], 0x7ffeb1ab5720 /* 41 vars */) = 0
arch_prctl(ARCH_SET_FS, 0x405658)       = 0
set_tid_address(0x405790)               = 1631947
exit_group(1)                           = ?
+++ exited with 1 +++
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First, it &lt;code&gt;exec&lt;/code&gt;s the program. Then it sets the &lt;code&gt;FS&lt;/code&gt; register, used for pointing to thread-local variables. Then it sets the &lt;code&gt;TID&lt;/code&gt; - thread ID - address, used for multithreading. Finally, it exits with code 1 - our early return, since we passed no argument.&lt;br&gt;
And now with an argument:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ strace ./a.out hello
execve("./a.out", ["./a.out", "hello"], 0x7ffdf3a92938 /* 41 vars */) = 0
arch_prctl(ARCH_SET_FS, 0x405658)       = 0
set_tid_address(0x405790)               = 1632018
ioctl(1, TIOCGWINSZ, {ws_row=24, ws_col=80, ws_xpixel=0, ws_ypixel=0}) = 0
writev(1, [{iov_base="hello", iov_len=5}, {iov_base="\n", iov_len=1}], 2hello
) = 6
exit_group(0)                           = ?
+++ exited with 0 +++
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Besides what we already went over, it asks for the size of the terminal - 80x24 in this case - and writes "hello" to the terminal. The odd-looking "2hello" is the &lt;code&gt;writev&lt;/code&gt; call's final argument, the iovec count 2, with the program's own "hello" output interleaved into the middle of strace's output. It then exits successfully with code 0.&lt;/p&gt;

&lt;p&gt;Now let's compile our executable dynamically. To get the full trace, run it the following way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ strace -f /lib/ld-musl-x86_64.so.1 ./a.out hello

execve("/lib/ld-musl-x86_64.so.1", ["/lib/ld-musl-x86_64.so.1", "./a.out", "hello"], 0x7ffd9a5ff938 /* 41 vars */) = 0
arch_prctl(ARCH_SET_FS, 0x740a78ee5b68) = 0
set_tid_address(0x740a78ee5fd0)         = 1632581
open("./a.out", O_RDONLY|O_LARGEFILE)   = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0&amp;gt;\0\1\0\0\0@\20\0\0\0\0\0\0"..., 960) = 960
mmap(NULL, 20480, PROT_READ, MAP_PRIVATE, 3, 0) = 0x740a78e38000
mmap(0x740a78e39000, 4096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0x1000) = 0x740a78e39000
mmap(0x740a78e3a000, 4096, PROT_READ, MAP_PRIVATE|MAP_FIXED, 3, 0x2000) = 0x740a78e3a000
mmap(0x740a78e3b000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x2000) = 0x740a78e3b000
close(3)                                = 0
brk(NULL)                               = 0x5555572c6000
brk(0x5555572c8000)                     = 0x5555572c8000
mmap(0x5555572c6000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x5555572c6000
mprotect(0x740a78ee2000, 4096, PROT_READ) = 0
mprotect(0x740a78e3b000, 4096, PROT_READ) = 0
ioctl(1, TIOCGWINSZ, {ws_row=24, ws_col=80, ws_xpixel=0, ws_ypixel=0}) = 0
writev(1, [{iov_base="hello", iov_len=5}, {iov_base="\n", iov_len=1}], 2hello
) = 6
exit_group(0)                           = ?
+++ exited with 0 +++
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is because:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ file /lib/ld-musl-x86_64.so.1
/lib/ld-musl-x86_64.so.1: symbolic link to /usr/lib/musl/lib/libc.so
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With musl, the C library is its own loader! So you don't need to load the loader, then the C library, then the executable - the loader and the library are loaded together (being the same file), and only the executable is left. Compare this to glibc:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ strace ./a.out 
execve("./a.out", ["./a.out"], 0x7ffc43b5b910 /* 41 vars */) = 0
brk(NULL)                               = 0x5adf33995000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=138995, ...}) = 0
mmap(NULL, 138995, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7894a5b10000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0&amp;gt;\0\1\0\0\0px\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 840, 64) = 840
fstat(3, {st_mode=S_IFREG|0755, st_size=2006328, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7894a5b0e000
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 840, 64) = 840
mmap(NULL, 2030680, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7894a591e000
mmap(0x7894a5942000, 1507328, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x24000) = 0x7894a5942000
mmap(0x7894a5ab2000, 319488, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x194000) = 0x7894a5ab2000
mmap(0x7894a5b00000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e1000) = 0x7894a5b00000
mmap(0x7894a5b06000, 31832, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7894a5b06000
close(3)                                = 0
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7894a591b000
arch_prctl(ARCH_SET_FS, 0x7894a591b740) = 0
set_tid_address(0x7894a591ba10)         = 1632711
set_robust_list(0x7894a591ba20, 24)     = 0
rseq(0x7894a591b680, 0x20, 0, 0x53053053) = 0
mprotect(0x7894a5b00000, 16384, PROT_READ) = 0
mprotect(0x5adf31a26000, 4096, PROT_READ) = 0
mprotect(0x7894a5b67000, 8192, PROT_READ) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
munmap(0x7894a5b10000, 138995)          = 0
exit_group(1)                           = ?
+++ exited with 1 +++
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It has to go through the effort of finding the libc first. Let's now compare the actual memory maps. For that, modify the source code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ cat main.c 
#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;

int main() {
    printf("My PID is: %d\n", getpid());
    printf("Press Enter to exit...");
    calloc(2048,2048);
    getchar();
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the program in one terminal. In another, run &lt;code&gt;pmap &amp;lt;PID&amp;gt;&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux ~]$ pmap 1634216
1634216:   ./a.out
000055aa8a1c4000      4K r---- a.out
000055aa8a1c5000      4K r-x-- a.out
000055aa8a1c6000      4K r---- a.out
000055aa8a1c7000      4K r---- a.out
000055aa8a1c8000      4K rw--- a.out
000055aa8bdb0000    132K rw---   [ anon ]
00007be23b9b4000   4112K rw---   [ anon ]
00007be23bdb8000    144K r---- libc.so.6
00007be23bddc000   1472K r-x-- libc.so.6
00007be23bf4c000    312K r---- libc.so.6
00007be23bf9a000     16K r---- libc.so.6
00007be23bf9e000      8K rw--- libc.so.6
00007be23bfa0000     40K rw---   [ anon ]
00007be23bfcc000      4K r---- ld-linux-x86-64.so.2
00007be23bfcd000    164K r-x-- ld-linux-x86-64.so.2
00007be23bff6000     44K r---- ld-linux-x86-64.so.2
00007be23c001000      8K r---- ld-linux-x86-64.so.2
00007be23c003000      4K rw--- ld-linux-x86-64.so.2
00007be23c004000      4K rw---   [ anon ]
00007ffe8077a000    132K rw---   [ stack ]
00007ffe807b2000     16K r----   [ anon ]
00007ffe807b6000      8K r-x--   [ anon ]
ffffffffff600000      4K --x--   [ anon ]
 total             6644K
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can see that our program takes up 5 pages, with different permissions. The second one is definitely code, or &lt;code&gt;.text&lt;/code&gt; - it has "r-x" permissions. The last one, with "rw" permissions, holds the global variables. The rest are constants - read-only data.&lt;/p&gt;
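
&lt;p&gt;By the way, &lt;code&gt;pmap&lt;/code&gt; gets this data from &lt;code&gt;/proc/&amp;lt;PID&amp;gt;/maps&lt;/code&gt; - which means a process can dump its own mapping table too. A minimal, Linux-specific sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;

/* Print this process's own memory map - the same data that
   pmap reads from /proc/&amp;lt;PID&amp;gt;/maps. Linux-specific. */
int main(void) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f) {
        perror("fopen");
        return 1;
    }
    char line[512];
    while (fgets(line, sizeof line, f))
        fputs(line, stdout);
    fclose(f);
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;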

&lt;p&gt;Then comes the heap - it's the largest element in the output. I made it so by allocating a large amount of memory - so it stands out. It is boxed in by the libc and the program itself. Let's see how much room it has to grow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(0x00007be23bdb8000 - 0x000055aa8bdb0000) / 1024 / 1024/1024/1024
= 38.2175293266773223877
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;38 Terabytes. Should be enough :)&lt;/p&gt;

&lt;p&gt;After the libc and the linker comes the stack. It has room up until the &lt;code&gt;fff...&lt;/code&gt; address at the very top. Let's calculate how much that is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(0xffffffffff600000 - 0x00007ffe8077a000) / 1024 / 1024/1024/1024/1024/1024
= 15.99987793525954060669
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;15 Exabytes. So why is it that a program so easily crashes with "stack overflow", when it has 15 Exabytes of stack?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ ulimit -s
8192
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the answer. The stack is artificially limited - here to 8192 KB, i.e. 8 MB - to catch "endless recursion" bugs. If you remove that limit, you won't get a "stack overflow"; it will be a regular "out of memory" instead.&lt;/p&gt;
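
&lt;p&gt;The same limit can be read from inside the program with &lt;code&gt;getrlimit&lt;/code&gt;. One gotcha: &lt;code&gt;ulimit -s&lt;/code&gt; prints kilobytes, while &lt;code&gt;getrlimit&lt;/code&gt; returns bytes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;sys/resource.h&amp;gt;

/* Read the soft stack limit - the number `ulimit -s` reports,
   except getrlimit returns bytes, not kilobytes. */
int main(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &amp;amp;rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("stack: unlimited\n");
    else
        printf("stack: %llu KB\n",
               (unsigned long long)rl.rlim_cur / 1024);
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;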

&lt;p&gt;Let's compare this pmap to that of a static executable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux ~]$ pmap 1636409
1636409:   ./a.out
0000000000400000      4K r---- a.out
0000000000401000     20K r-x-- a.out
0000000000406000      4K r---- a.out
0000000000407000      8K rw--- a.out
0000000002036000   4096K rw---   [ anon ]
00007ffd8a010000    132K rw---   [ stack ]
00007ffd8a189000     16K r----   [ anon ]
00007ffd8a18d000      8K r-x--   [ anon ]
ffffffffff600000      4K --x--   [ anon ]
 total             4292K
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Much nicer!&lt;/p&gt;

&lt;p&gt;But &lt;code&gt;pmap&lt;/code&gt; doesn't tell us the full story - only the part that relates to our program. To learn the full story of the virtual memory mapping for a userspace executable on Linux, refer to this &lt;a href="https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt" rel="noopener noreferrer"&gt;document&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;An even better writeup can be found &lt;a href="https://gist.github.com/CMCDragonkai/10ab53654b2aa6ce55c11cfc5b2432a4" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Why is it important to know all this? Because dynamic libraries are a security nightmare. If the attacker tricks the executable into loading a malicious library, the executable is compromised, and can execute arbitrary code (with the privileges of the executable - made worse by &lt;a href="https://en.wikipedia.org/wiki/Setuid" rel="noopener noreferrer"&gt;SUID&lt;/a&gt;). How can an executable be tricked into loading a malicious library?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;With LD_PRELOAD&lt;/li&gt;
&lt;li&gt;With write access to folders in library search path&lt;/li&gt;
&lt;li&gt;With rpath pointing to a writable directory&lt;/li&gt;
&lt;li&gt;With replacing the libc loader&lt;/li&gt;
&lt;li&gt;With replacing the loader path in a binary&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Yeah, this is a broken system. Containers help, I guess. But I'd say, getting rid of shared libraries altogether is a good idea. &lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>programming</category>
    </item>
    <item>
      <title>Wow, everything is graph!</title>
      <dc:creator>lostghost</dc:creator>
      <pubDate>Sun, 13 Jul 2025 14:57:37 +0000</pubDate>
      <link>https://forem.com/lostghost/wow-everything-is-graph-56oc</link>
      <guid>https://forem.com/lostghost/wow-everything-is-graph-56oc</guid>
      <description>&lt;p&gt;There are two sides to programming, two approaches. &lt;/p&gt;

&lt;p&gt;One approach is low-level. A computer is a collection of circuits, some more programmable than others. A process sees a virtualized memory layout, which allows it to access devices over a bus - memory being just one of the devices. And what programming is about, is learning how to shove bits down the pipe - as fast as the hardware allows. Those who appreciate computers, myself included, can see how this view is appealing. It correlates with the imperative/procedural way of describing computations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8dp81j02n392pmkqp7vq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8dp81j02n392pmkqp7vq.png" alt=" " width="768" height="537"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.eetasia.com/debugging-ethernet-sata-and-pcie-for-iot-devices/" rel="noopener noreferrer"&gt;Image source&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The second approach is mathematical. A program is a transformation of inputs to outputs - it is a function. Functions operate on data - sets, vectors, numbers, sets of numbers. Functions can also depend on their own results - thus you get a finite state machine. There are queue networks, Petri nets - robust mathematical models. This approach is more in line with the functional way of describing computations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxszk298ylvjrum1s33j4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxszk298ylvjrum1s33j4.png" alt=" " width="800" height="448"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://en.wikipedia.org/wiki/Vector_addition_system#/media/File:Vector_addition_with_states.svg" rel="noopener noreferrer"&gt;Image source&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But there is a third approach, more oriented towards computational systems. In my view, a system is a collection of interacting parts, that together perform a higher-level functionality. And the natural way to represent such a system is with a graph, where each node is also a subgraph. This maps well to the object graph.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw4x2ixsczprkesm6ausa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw4x2ixsczprkesm6ausa.png" alt=" " width="735" height="533"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://ru.pinterest.com/pin/498421883754670891/" rel="noopener noreferrer"&gt;Image source&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The object graph unites two graphs - the execution graph, and the data graph, by the virtue of objects containing both fields and methods. To me, this leads to a natural question - if what we create is a graph, why are we programming text?&lt;/p&gt;

&lt;p&gt;So much of programming complexity is in keeping track of how things relate to each other - and, when changing a part of the system, which other parts will be affected. A graph answers that question much better than text - yet so far, its power has only been successfully utilized in quick diagrams on a whiteboard, and in BPMN engines. Is a program really that hard to represent as a graph - and is it too hard to actually program that way?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F01t5soie9tk24po3zz6f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F01t5soie9tk24po3zz6f.png" alt=" " width="800" height="420"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dishcuss.com/post/BD1465DC975FF84BABF4CF9F735C0ABD24CF0D48/Event-Subprocess-Bpmn" rel="noopener noreferrer"&gt;Image source&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The issue is that it's two graphs - execution and data - which makes for a disjointed view. Could we create one compelling graph, when data has non-trivial cross-references, while also participating in different executions? As a bonus, how do we represent metaprogramming - when the execution itself changes?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cjs3awbupc5x6w2yf4w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cjs3awbupc5x6w2yf4w.png" alt=" " width="800" height="246"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://en.wikipedia.org/wiki/Call_graph" rel="noopener noreferrer"&gt;Image source&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Data consists of entities referencing each other. But all references aren't equal. Some carry ownership information, others don't - like direct fields in a struct vs pointers. Some allow for mutation, others are read-only. Some allow for concurrent access, others have mutual exclusion.&lt;/p&gt;

&lt;p&gt;With modern programming practices, the following optimal design emerges. We want to avoid shared mutable data. Ownership assigns the responsibility of managing the lifecycle, and the memory. For mutation by a non-owner, borrowing needs to take place, to ensure that a reference stays valid for the duration of the mutation - kind of like in a database transaction. With this, we can represent data as squares, and the data it owns - as squares within. Data that it simply references - as squares without, connected with arrows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F44v1lby9x0rxjf2eaqlz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F44v1lby9x0rxjf2eaqlz.png" alt=" " width="736" height="257"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, Foo owns Bar, and Bar references Baz without ownership.&lt;/p&gt;
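
&lt;p&gt;In C terms, the same picture is the difference between a direct field and a pointer - a minimal, made-up example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;

/* The diagram in code: Foo owns Bar (direct field - same
   allocation, same lifetime), while Bar references Baz
   (pointer, no ownership). */
struct Baz { int value; };
struct Bar { struct Baz *baz; };  /* non-owning reference */
struct Foo { struct Bar bar; };   /* Bar lives inside Foo  */

int main(void) {
    struct Baz baz = { 42 };      /* owned by someone else */
    struct Foo foo = { { &amp;amp;baz } };
    printf("%d\n", foo.bar.baz-&amp;gt;value);
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;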

&lt;p&gt;Now for the call graph - methods belong to objects, making them squares within squares. Methods call each other - this is represented by arrows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fek27ii8zc1qeel8p2u9d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fek27ii8zc1qeel8p2u9d.png" alt=" " width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So how do we unite the two? By establishing that first there are methods, and data belongs to methods. But what about data belonging to different methods at different times? Who said our graph needs to be static - data can move as the system progresses.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgyq7avubxaw39wy51wx3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgyq7avubxaw39wy51wx3.png" alt=" " width="800" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And the data never needs to be in two executions at the same time - it's either copied or borrowed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa18hnatrp74zeggd6hee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa18hnatrp74zeggd6hee.png" alt=" " width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So this solves it! But wait. This is a pretty nice way of representing a snapshot of a live system - but what about actually programming a system? How do we represent a system for that goal?&lt;/p&gt;

&lt;p&gt;What's important to realise is that programming as we know it, isn't actually programming - it's meta-programming.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdp5l5aa5i0ydqv2zkg8q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdp5l5aa5i0ydqv2zkg8q.png" alt=" " width="800" height="731"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Imagine a live system. It has instructions - when a new order is placed, update and display the sum of the cost. When a user creates an order, that changes the state of the system - how is this not programming?&lt;/p&gt;

&lt;p&gt;Now imagine a program. It has instructions - when a system is started, register a repository for orders, a web form for creating them, and a processing pipeline, which responds to changes in the repository, and calculates the sum. It's a program over the live system - it is a meta-program.&lt;/p&gt;

&lt;p&gt;Now imagine a meta-program. It has instructions - when a web form for creating orders is registered, register copies of it for the user, an internal copy for an admin, and a demo copy for the sales department. This is a program over the meta program - making it meta^2.&lt;/p&gt;

&lt;p&gt;So in reality, you're always modifying a graph - it's just usually not the live system graph, but a graph above it, with instructions for how to modify the underlying graph. This meta-stack can be as high as you want - and the workflow for inspecting and modifying any level of those graphs is the same. And there is no "dead" code that is merely a template for constructing a system - all graphs are live graphs, and you are viewing their live state, as it is.&lt;/p&gt;

&lt;p&gt;Hopefully you found information in this post to be insightful, and it gave you a broader perspective on what programming is all about. I believe graphs to be the future, and if you agree or disagree - let me know what you think, by writing a comment. Thanks for reading.&lt;/p&gt;

</description>
      <category>programming</category>
    </item>
    <item>
      <title>Linux from the developer's perspective. Part 3 - Loading and running</title>
      <dc:creator>lostghost</dc:creator>
      <pubDate>Sat, 12 Jul 2025 10:53:12 +0000</pubDate>
      <link>https://forem.com/lostghost/linux-from-the-developers-perspective-part-3-loading-and-running-4bmh</link>
      <guid>https://forem.com/lostghost/linux-from-the-developers-perspective-part-3-loading-and-running-4bmh</guid>
      <description>&lt;p&gt;This blog is part of a &lt;a href="https://dev.to/lostghost/linux-deep-dive-introduction-5b8c"&gt;series&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We compiled ourselves a binary. How does it get loaded and run? And what does it do while running? Let's start with an example, in the previous post we ran:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ ./main 
Hello!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What exactly happens when we run &lt;code&gt;./main&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;An OS process gets created and started. A process is an independent unit of execution, with its own context and resources, such as file descriptors and memory.&lt;/p&gt;

&lt;p&gt;Firstly, it is important to understand that a process is never created for no reason - it is always started by another process. In this case, the chain of who started whom goes &lt;code&gt;./main&lt;/code&gt;&amp;lt;-&lt;code&gt;bash&lt;/code&gt;&amp;lt;-&lt;code&gt;xfce4-terminal&lt;/code&gt;&amp;lt;-&lt;code&gt;xfce4&lt;/code&gt;&amp;lt;-&lt;code&gt;X11&lt;/code&gt;&amp;lt;-&lt;code&gt;sx&lt;/code&gt;&amp;lt;-&lt;code&gt;bash&lt;/code&gt;&amp;lt;-&lt;code&gt;login&lt;/code&gt;&amp;lt;-&lt;code&gt;systemd&lt;/code&gt;&amp;lt;-&lt;code&gt;init (in initramfs)&lt;/code&gt;&amp;lt;-&lt;code&gt;systemd-boot&lt;/code&gt;&amp;lt;-&lt;code&gt;UEFI Firmware&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Next, a process is never started from "nothing" - a program can only start inside an already running process, via the &lt;a href="https://man7.org/linux/man-pages/man2/execve.2.html" rel="noopener noreferrer"&gt;exec&lt;/a&gt; system call, which replaces that process's image. But &lt;code&gt;exec&lt;/code&gt; alone can never increase the number of processes - that is what the &lt;a href="https://man7.org/linux/man-pages/man2/clone.2.html" rel="noopener noreferrer"&gt;clone&lt;/a&gt; syscall is for.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;clone&lt;/code&gt; copies the process - and, conceptually, all of its memory. In practice, not quite: a lot of the address space consists of shared libraries, which don't need to be copied. And the mappings that do need copying - the program stack and heap, for example - are initially copied with &lt;a href="https://en.wikipedia.org/wiki/Copy-on-write" rel="noopener noreferrer"&gt;Copy-on-Write (CoW)&lt;/a&gt;. This way, only minimal physical memory is needed for the running program, and if the process is immediately replaced by another via &lt;code&gt;exec&lt;/code&gt;, no effort was wasted.&lt;/p&gt;

&lt;p&gt;So, launching a program means &lt;code&gt;clone&lt;/code&gt; first, then &lt;code&gt;exec&lt;/code&gt;. &lt;code&gt;exec&lt;/code&gt; discards the current process memory map and maps the segments of the target ELF executable in its place. After that, control jumps to the ENTRY symbol (usually &lt;code&gt;_start&lt;/code&gt;).&lt;/p&gt;
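&lt;p&gt;You can watch the "replace, don't create" half of this with the shell's own &lt;code&gt;exec&lt;/code&gt; builtin - a minimal sketch, assuming &lt;code&gt;bash&lt;/code&gt; and &lt;code&gt;sh&lt;/code&gt; are available:&lt;/p&gt;

```shell
# clone creates a process; exec only replaces its memory image.
# The shell's 'exec' builtin shows the replacement half in isolation:
# the PID printed before and after is the same, because no new process
# is created - only the image changes.
bash -c 'echo "pid before exec: $$"; exec sh -c "echo pid after exec: \$\$"'
# both lines print the same PID
```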

&lt;p&gt;Speaking of mappings - they are performed by the &lt;a href="https://man7.org/linux/man-pages/man2/mmap.2.html" rel="noopener noreferrer"&gt;mmap&lt;/a&gt; syscall (or its internal kernel equivalent, when the kernel creates mappings implicitly during &lt;code&gt;exec&lt;/code&gt;). Programs usually create anonymous mappings for the heap, and stack mappings are grown automatically when the guard page is hit - but this time it's an actual file mapping, and both the portion of the file being mapped and the memory it is mapped to need to align at page boundaries. If the offset of a segment within the file is not aligned, the segment is mapped together with the adjacent portion of the file; if it doesn't fill a page exactly, the remainder is zero-filled.&lt;/p&gt;
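&lt;p&gt;The alignment arithmetic is easy to sketch with shell arithmetic - the page size of 4096 and the file offset of 5000 below are made-up example values:&lt;/p&gt;

```shell
# A segment at file offset 5000 cannot be mapped directly: mmap offsets
# must be multiples of the page size. The mapping starts at the containing
# page boundary instead, and the segment begins partway into that page.
page=4096
off=5000
aligned=$(( off - off % page ))   # 4096: start of the containing page
delta=$(( off - aligned ))        # 904: where the segment starts in the page
echo "mapped from $aligned, segment begins $delta bytes in"
```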

&lt;p&gt;This is how a statically-linked program is launched. What about a dynamically linked one? It requires an interpreter to be launched first. So, for our &lt;code&gt;main&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ readelf -a ./main
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Position-Independent Executable file)
  Machine:                           Advanced Micro Devices X86-64

...

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x0000000000000310 0x0000000000000310  R      0x8
  INTERP         0x00000000000003b4 0x00000000000003b4 0x00000000000003b4
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the last line - it means that the program &lt;code&gt;/lib64/ld-linux-x86-64.so.2&lt;/code&gt; is launched first, and it then loads ours. Typically this is the dynamic loader - but it can be any program, really!&lt;/p&gt;

&lt;p&gt;The linker that runs at compile time assigns addresses wherever they are needed, exports symbols, and creates relocations. Relocations are needed because libraries should be loadable at any address - so calls into them must be indirect. What you get is a &lt;a href="https://en.wikipedia.org/wiki/Global_Offset_Table" rel="noopener noreferrer"&gt;GOT&lt;/a&gt; - a table, patched at runtime, that points to the actual locations of the functions. Think of it this way: a library exports a table of function addresses, but not as direct addresses - it doesn't know where it will be loaded - but as offsets from a "base" address. When the library does get loaded and the base address is established, you only need to add each offset to that base to know where in memory the function lives. This is a simplification, but I feel it is a helpful one.&lt;/p&gt;
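&lt;p&gt;The base-plus-offset arithmetic can be sketched directly - both numbers below are made up for illustration:&lt;/p&gt;

```shell
# The library records only each function's offset from its own start;
# the loader picks the base address at load time. The absolute address,
# base + offset, is what ends up in the GOT after relocation.
base=$(( 0x7f3a00000000 ))   # hypothetical address where the library landed
offset=$(( 0x1484 ))         # hypothetical offset of a function inside it
printf 'resolved address: 0x%x\n' $(( base + offset ))
# prints: resolved address: 0x7f3a00001484
```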

&lt;p&gt;So, when a dynamically linked program is started, the dynamic linker/loader starts first, looks at the headers of the executable, and finds the list of dynamic libraries that are needed. Then it looks for those libraries - the rules for where to look are listed in &lt;code&gt;/etc/ld.so.conf&lt;/code&gt;, and there is a cache of library locations in &lt;code&gt;/etc/ld.so.cache&lt;/code&gt;. Finally, the search path can be overridden entirely with &lt;a href="https://en.wikipedia.org/wiki/Rpath" rel="noopener noreferrer"&gt;rpath&lt;/a&gt;. The libraries themselves may depend on other libraries - the search is recursive.&lt;/p&gt;

&lt;p&gt;There is one more thing to keep in mind with dynamic libraries - versioning. By convention, libraries have &lt;a href="https://unix.stackexchange.com/a/293782" rel="noopener noreferrer"&gt;3 version numbers&lt;/a&gt; - breaking ABI release, backwards-compatible ABI release, internal change - so the file on disk is &lt;code&gt;libfoo.so.X.Y.Z&lt;/code&gt;. But a program using the library doesn't care about Y and Z - it just needs a compatible X version, so its program header requests &lt;code&gt;libfoo.so.X&lt;/code&gt;. How, then, does the dynamic linker/loader match the requested &lt;code&gt;libfoo.so.X&lt;/code&gt; against the actually existing &lt;code&gt;libfoo.so.X.Y.Z&lt;/code&gt;? By the use of symlinks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;libfoo.so -&amp;gt; libfoo.so.X -&amp;gt; libfoo.so.X.Y -&amp;gt; libfoo.so.X.Y.Z
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These symlinks are created by &lt;code&gt;ldconfig&lt;/code&gt;. Typically it is run after any system upgrade.&lt;/p&gt;

&lt;p&gt;Why do we need the symlinks - why not just have &lt;code&gt;libfoo.so&lt;/code&gt; on disk? Because some programs may need different X versions of the same library. In practice, the package manager tracks all of these version numbers, so that situation rarely arises - but this provides a mechanism for when it does. Ok then, why keep Y and Z in the filename at all? So that system upgrades can be atomic: put the new file alongside the old one, switch the symlink, delete the old file. There is never a point when the library is missing. Of course, now that the industry leans on immutable images, this is largely redundant.&lt;/p&gt;
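&lt;p&gt;The chain is easy to reproduce by hand in a scratch directory - &lt;code&gt;libfoo&lt;/code&gt; and the version numbers below are made up. Opening any name in the chain lands on the real file, because the kernel follows symlinks on open:&lt;/p&gt;

```shell
# Recreate ldconfig's symlink chain by hand (names are made up):
dir=$(mktemp -d)
cd "$dir"
touch libfoo.so.1.2.3                 # the real file on disk
ln -s libfoo.so.1.2.3 libfoo.so.1.2
ln -s libfoo.so.1.2   libfoo.so.1    # what a built program requests at run time
ln -s libfoo.so.1     libfoo.so      # what the link editor looks for at build time
readlink -f libfoo.so                # resolves all the way to libfoo.so.1.2.3
```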

&lt;p&gt;And after all that, the dynamic linker &lt;code&gt;mmap&lt;/code&gt;s the libraries and our executable into memory, performs relocations, and resolves symbols, writing them into the GOT (or delays resolving them - in which case the &lt;a href="https://maskray.me/blog/2021-09-19-all-about-procedure-linkage-table" rel="noopener noreferrer"&gt;PLT&lt;/a&gt; is used: a special "default" place to jump for unresolved functions, which resolves the function, writes the address into the GOT, and jumps there - so subsequent calls go through the GOT right away).&lt;/p&gt;

&lt;p&gt;Do we actually need all of this machinery? I'd argue we don't - static binaries that &lt;code&gt;dlopen&lt;/code&gt; loadable modules are a much simpler mechanism. But it's still important to know how dynamic libraries work, even if purely for being a well-rounded professional.&lt;/p&gt;

&lt;p&gt;This is it for now - in the next blog, we will cover debugging a program. So stay tuned for that!&lt;/p&gt;

</description>
      <category>programming</category>
      <category>linux</category>
    </item>
    <item>
      <title>Linux from the developer's perspective. Part 2 - Compilation and linking</title>
      <dc:creator>lostghost</dc:creator>
      <pubDate>Tue, 01 Jul 2025 18:52:40 +0000</pubDate>
      <link>https://forem.com/lostghost/linux-from-the-developers-perspective-part-2-compilation-and-linking-5na</link>
      <guid>https://forem.com/lostghost/linux-from-the-developers-perspective-part-2-compilation-and-linking-5na</guid>
      <description>&lt;p&gt;This blog is part of a &lt;a href="https://dev.to/lostghost/linux-deep-dive-introduction-5b8c"&gt;series&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How does a C program get compiled? For C-like languages, compilation involves four steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Preprocessing, compile-time metaprogramming&lt;/li&gt;
&lt;li&gt;Compilation itself, translation of the source code to assembly&lt;/li&gt;
&lt;li&gt;Assembling, turning assembly into machine code in an object file&lt;/li&gt;
&lt;li&gt;Linking, turning the object file into an executable or a library&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course, all these categories, except for linking, are to some degree arbitrary. Preprocessing is an anomaly - a language within a language, a crutch that exposes the limited expressive power of the base language. Compilation is arbitrary to a degree because you can embed assembly directly into C code, and that part requires no compilation. Even the assembly isn't quite the vendor's assembly - it's GNU assembly. Originally, the assembly language was described in the &lt;a href="https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf" rel="noopener noreferrer"&gt;ISA Manual&lt;/a&gt;, and the manufacturer shipped the assembler itself, which read and compiled that assembly - the GNU assembler is not that. It is a higher-level, universal assembler. Still, the mental framework of these four steps is a net positive, even if past a point of experience you can see gaps in the structure.&lt;/p&gt;

&lt;p&gt;We already discussed the preprocessor in the previous blog, let's now turn our attention to compilation. Compile our test program like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ gcc -S main.c
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or rather, for more clean, unoptimized assembly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ gcc -S -O0 -fno-asynchronous-unwind-tables -fno-unwind-tables -fno-ident -fno-stack-protector main.c
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Resulting assembly with explanatory comments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    .file   "main.c"
    .text
    .globl  main
    .type   main, @function
main:
    pushq   %rbp                  # Prologue: save old base pointer
    movq    %rsp, %rbp            # Set new base pointer
    subq    $16, %rsp             # Allocate 16 bytes for local variables

    movl    %edi, -4(%rbp)        # Save argc (1st argument, int) at -4(%rbp)
    movq    %rsi, -16(%rbp)       # Save argv (2nd argument, char **) at -16(%rbp)

    cmpl    $1, -4(%rbp)          # Compare argc to 1
    jg      .L2                   # If argc &amp;gt; 1, jump to .L2 (print argument)

    movl    $1, %eax              # argc &amp;lt;= 1: set return value to 1
    jmp     .L3                   # Return

.L2:
    movq    -16(%rbp), %rax       # Load argv into %rax
    addq    $8, %rax              # Advance to argv[1] (first argument, skipping program name)
    movq    (%rax), %rax          # Dereference: load pointer to argument string
    movq    %rax, %rdi            # Move that pointer to %rdi (argument for puts)
    call    puts@PLT              # Print argv[1] with puts()
    movl    $0, %eax              # Set return value to 0

.L3:
    leave                         # Epilogue: restore frame pointer and stack
    ret                           # Return to caller

    .size   main, .-main
    .section    .note.GNU-stack,"",@progbits

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, many C constructs translate into assembly directly. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;int a = 10, b = 20, c;
c = a + b;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Translates to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mov eax, 10
mov ebx, 20
add eax, ebx
mov c, eax
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Another example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;int arr[4] = {1, 2, 3, 4};
int *p = &amp;amp;arr[2];
*p = 99;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Translates to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mov eax, [arr + 8]   ; access arr[2] (int, 4 bytes each)
mov dword ptr [arr + 8], 99
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So in a way, C is just higher-level assembly. But in other ways, it isn't - some constructs have no translation at all, producing undefined behavior; structs, enums and unions are higher-level datatypes without a direct assembly counterpart; and calling conventions vary between CPUs and OSes. If you want to explore exactly how code translates into assembly, there is a really useful website for that: &lt;a href="https://godbolt.org/z/qGo6z8azq" rel="noopener noreferrer"&gt;Godbolt&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;After compilation comes assembling, which translates assembly code into machine code for a given ISA. But it doesn't output just text - it outputs a binary image, specifically one in the &lt;a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format" rel="noopener noreferrer"&gt;ELF&lt;/a&gt; format.&lt;/p&gt;

&lt;p&gt;But the resulting artifact is an object file, which isn't the final process image. It contains information about sections (.text, .data, .bss) and their contents, as well as references to symbols imported from external libraries. The machine code in it uses section-relative addresses - offsets from the start of each section - and because we don't yet know at which addresses those sections will be loaded, we can't run the program yet. What lays out the sections in memory, turning them into segments, is the linker - and it does so with a linker script. On Arch Linux, these are at &lt;code&gt;/lib/ldscripts/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Let's examine one. Take &lt;code&gt;elf_x86_64.x&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", "elf64-x86-64") // self-explanatory
OUTPUT_ARCH(i386:x86-64)
ENTRY(_start) // which symbol is the entry point to the executable
SEARCH_DIR("/usr/x86_64-pc-linux-gnu/lib64"); SEARCH_DIR("/usr/lib"); SEARCH_DIR("/usr/local/lib"); SEARCH_DIR("/usr/x86_64-pc-linux-gnu/lib"); // which directories to look in for libraries while linking
SECTIONS
{
  /* Read-only sections, merged into text segment: */
  PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000));
  . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
  /* Place the build-id as close to the ELF headers as possible.  This
     maximises the chance the build-id will be present in core files,
     which GDB can then use to locate the associated debuginfo file.  */
  .note.gnu.build-id  : { *(.note.gnu.build-id) }
  .interp         : { *(.interp) }
  .hash           : { *(.hash) }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This shows the mapping of sections into segments, starting at address &lt;code&gt;0x400000&lt;/code&gt;.&lt;br&gt;
Let's now link the program manually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ gcc -c main.c 
[lostghost1@archlinux c]$ ld main.o --dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/lib/crt1.o -lc -o main
[lostghost1@archlinux c]$ ./main hello
hello
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When invoking &lt;code&gt;ld&lt;/code&gt;, our linker, we needed to specify the path to the dynamic loader (via the flag &lt;code&gt;--dynamic-linker&lt;/code&gt; - quite confusing), because we are producing a dynamic and not a static executable - more on the distinction later. &lt;code&gt;crt1.o&lt;/code&gt; is a special object file, part of the standard C library, which contains the entry point (the &lt;code&gt;_start&lt;/code&gt; symbol). &lt;code&gt;-lc&lt;/code&gt; is libc, &lt;a href="https://www.sourceware.org/glibc/" rel="noopener noreferrer"&gt;glibc&lt;/a&gt; in our case - alternatives such as &lt;a href="https://musl.libc.org/" rel="noopener noreferrer"&gt;musl&lt;/a&gt; libc exist.&lt;/p&gt;

&lt;p&gt;Now let's inspect the binary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ readelf -a main
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x401060
  Start of program headers:          64 (bytes into file)
  Start of section headers:          13088 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         12
  Size of section headers:           64 (bytes)
  Number of section headers:         24
  Section header string table index: 23

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         00000000004002e0  000002e0
       000000000000001c  0000000000000000   A       0     0     1
  [ 2] .hash             HASH             0000000000400300  00000300
       0000000000000018  0000000000000004   A       4     0     8
  [ 3] .gnu.hash         GNU_HASH         0000000000400318  00000318
       000000000000001c  0000000000000000   A       4     0     8
  [ 4] .dynsym           DYNSYM           0000000000400338  00000338
       0000000000000048  0000000000000018   A       5     1     8
  [ 5] .dynstr           STRTAB           0000000000400380  00000380
       0000000000000039  0000000000000000   A       0     0     1
  [ 6] .gnu.version      VERSYM           00000000004003ba  000003ba
       0000000000000006  0000000000000002   A       4     0     2
  [ 7] .gnu.version_r    VERNEED          00000000004003c0  000003c0
       0000000000000030  0000000000000000   A       5     1     8
  [ 8] .rela.dyn         RELA             00000000004003f0  000003f0
       0000000000000018  0000000000000018   A       4     0     8
  [ 9] .rela.plt         RELA             0000000000400408  00000408
       0000000000000018  0000000000000018  AI       4    18     8
  [10] .plt              PROGBITS         0000000000401000  00001000
       0000000000000020  0000000000000010  AX       0     0     16
  [11] .text             PROGBITS         0000000000401020  00001020
       0000000000000075  0000000000000000  AX       0     0     16
  [12] .rodata           PROGBITS         0000000000402000  00002000
       0000000000000004  0000000000000004  AM       0     0     4
  [13] .eh_frame         PROGBITS         0000000000402008  00002008
       0000000000000088  0000000000000000   A       0     0     8
  [14] .note.gnu.pr[...] NOTE             0000000000402090  00002090
       0000000000000040  0000000000000000   A       0     0     8
  [15] .note.ABI-tag     NOTE             00000000004020d0  000020d0
       0000000000000020  0000000000000000   A       0     0     4
  [16] .dynamic          DYNAMIC          0000000000403e60  00002e60
       0000000000000180  0000000000000010  WA       5     0     8
  [17] .got              PROGBITS         0000000000403fe0  00002fe0
       0000000000000008  0000000000000008  WA       0     0     8
  [18] .got.plt          PROGBITS         0000000000403fe8  00002fe8
       0000000000000020  0000000000000008  WA       0     0     8
  [19] .data             PROGBITS         0000000000404008  00003008
       0000000000000004  0000000000000000  WA       0     0     1
  [20] .comment          PROGBITS         0000000000000000  0000300c
       000000000000001b  0000000000000001  MS       0     0     1
  [21] .symtab           SYMTAB           0000000000000000  00003028
       0000000000000180  0000000000000018          22     5     8
  [22] .strtab           STRTAB           0000000000000000  000031a8
       00000000000000a6  0000000000000000           0     0     1
  [23] .shstrtab         STRTAB           0000000000000000  0000324e
       00000000000000cc  0000000000000000           0     0     1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We see that we still have the section headers - along with the program headers! Let's remove them, since we won't be debugging this executable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ strip --strip-section-headers main
[lostghost1@archlinux c]$ readelf -a main
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x401060
  Start of program headers:          64 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         12
  Size of section headers:           0 (bytes)
  Number of section headers:         0
  Section header string table index: 0

There are no sections in this file.

There are no section groups in this file.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Much better!&lt;/p&gt;

&lt;p&gt;Now on the difference between static and dynamic executables. Object files that call out to external functions produce unresolved symbols. They are resolved during linking - when the executable is laid out in program segments, the points where functions are called get replaced with jumps to the actual function addresses. This makes for a static executable. However, we can choose to postpone resolving the symbols until program start. In that case the executable declares which libraries it needs, and which symbols from them - and at program start, the dynamic linker runs first, finds those libraries, loads them, and resolves the symbols. This makes for a dynamic executable.&lt;/p&gt;

&lt;p&gt;Let's see which one our program is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ ldd main
    linux-vdso.so.1 (0x00007ffedcd23000)
    libc.so.6 =&amp;gt; /usr/lib/libc.so.6 (0x0000756dbada8000)
    /lib64/ld-linux-x86-64.so.2 =&amp;gt; /usr/lib64/ld-linux-x86-64.so.2 (0x0000756dbafc0000)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both libc and the loader are needed at runtime (&lt;code&gt;linux-vdso&lt;/code&gt; is a special pseudo-library). That makes the executable dynamic.&lt;/p&gt;

&lt;p&gt;Statically linking glibc is &lt;a href="https://stackoverflow.com/questions/57476533/why-is-statically-linking-glibc-discouraged" rel="noopener noreferrer"&gt;discouraged&lt;/a&gt;. To compile a static executable, install musl libc:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ yay -S musl clang
[lostghost1@archlinux c]$ musl-clang --static main.c -o main
[lostghost1@archlinux c]$ ldd main
    not a dynamic executable
[lostghost1@archlinux c]$ ./main hello
hello
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This executable has all its symbols resolved - no dynamic loader needed!&lt;/p&gt;

&lt;p&gt;Lastly, let's touch upon compiling dynamic and static libraries themselves. A static library is just an archived object file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ cat main.c
#include &amp;lt;stdio.h&amp;gt;
#include "sayhello.h"
int main(int argc, char** argv){
    sayhello();
    return 0;
}
[lostghost1@archlinux c]$ cat sayhello.h
#ifndef _SAYHELLO_H
#define _SAYHELLO_H
void sayhello();
#endif
[lostghost1@archlinux c]$ cat sayhello.c
#include &amp;lt;stdio.h&amp;gt;
void sayhello(){
    printf("Hello!\n");
}
[lostghost1@archlinux c]$ musl-clang -c sayhello.c
[lostghost1@archlinux c]$ musl-clang -c main.c
[lostghost1@archlinux c]$ ar q libsayhello.a sayhello.o
ar: creating libsayhello.a
[lostghost1@archlinux c]$ musl-clang --static main.o -L. -lsayhello -o main
[lostghost1@archlinux c]$ ldd main
    not a dynamic executable
[lostghost1@archlinux c]$ ./main 
Hello!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;-L.&lt;/code&gt; means "look in this directory", &lt;code&gt;-lsayhello&lt;/code&gt; means "look for a file libsayhello.a" (&lt;code&gt;.a&lt;/code&gt; because we specified &lt;code&gt;--static&lt;/code&gt;, otherwise it would be &lt;code&gt;.so&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;As for a dynamic library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ rm main
[lostghost1@archlinux c]$ gcc -shared sayhello.o -o libsayhello.so
[lostghost1@archlinux c]$ gcc main.o -L. -lsayhello -o main
[lostghost1@archlinux c]$ ldd main
    linux-vdso.so.1 (0x00007ffc384aa000)
    libsayhello.so =&amp;gt; not found
    libc.so.6 =&amp;gt; /usr/lib/libc.so.6 (0x000074e040a48000)
    /lib64/ld-linux-x86-64.so.2 =&amp;gt; /usr/lib64/ld-linux-x86-64.so.2 (0x000074e040c65000)
[lostghost1@archlinux c]$ ./main 
./main: error while loading shared libraries: libsayhello.so: cannot open shared object file: No such file or directory
[lostghost1@archlinux c]$ LD_LIBRARY_PATH=. ./main 
Hello!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Typically we don't look in the current directory - neither for executables (which is why we have to specify &lt;code&gt;./&lt;/code&gt; when running &lt;code&gt;./main&lt;/code&gt;), nor for libraries. This is for security reasons, so that we don't accidentally run what we didn't intend to - which is why we have to resort to specifying the environment variable.&lt;/p&gt;

&lt;p&gt;Of course, the shared library advertises its exported symbol:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ readelf -a libsayhello.so
...
Symbol table '.dynsym' contains 7 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterT[...]
     2: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND [...]@GLIBC_2.2.5 (2)
     3: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     4: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMC[...]
     5: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND [...]@GLIBC_2.2.5 (2)
     6: 0000000000001110    20 FUNC    GLOBAL DEFAULT   11 sayhello
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that's all I have to share when it comes to compiling and linking a C program. In the next blog we will examine loading and running an ELF executable. See ya then!&lt;/p&gt;

</description>
      <category>programming</category>
      <category>linux</category>
    </item>
    <item>
      <title>Linux from the developer's perspective. Part 1 - C language introduction</title>
      <dc:creator>lostghost</dc:creator>
      <pubDate>Sun, 29 Jun 2025 09:57:38 +0000</pubDate>
      <link>https://forem.com/lostghost/linux-from-the-developers-perspective-part-1-c-language-introduction-21lb</link>
      <guid>https://forem.com/lostghost/linux-from-the-developers-perspective-part-1-c-language-introduction-21lb</guid>
      <description>&lt;p&gt;This blog is part of a &lt;a href="https://dev.to/lostghost/linux-deep-dive-introduction-5b8c"&gt;series&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Unix is the programmer's OS. Let's see that for ourselves. A further exploration of the material shown here can be found in this &lt;a href="https://en.wikipedia.org/wiki/Advanced_Programming_in_the_Unix_Environment" rel="noopener noreferrer"&gt;book&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Linux and C share a lot of their DNA - the syscall API is a C API. Understanding Linux helps you understand C, and vice versa. To get an overview of how the whole mechanism functions, let's compile and run a simple C program, and carefully examine the process.&lt;/p&gt;

&lt;p&gt;We will develop in the terminal, on a minimal Linux installation. To get such an installation, refer to an earlier &lt;a href="https://dev.to/lostghost/linux-from-the-users-perspective-part1-installing-linux-2gea"&gt;blog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The minimal program will look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ cat main.c 
#include &amp;lt;stdio.h&amp;gt;
int main(int argc, char** argv){
    if (argc&amp;lt;2) return 1;
    printf("%s\n",argv[1]);
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I suggest using the &lt;code&gt;nano&lt;/code&gt; editor. Let's go over it line-by-line.&lt;/p&gt;

&lt;p&gt;On line 1 there is a preprocessor directive - those start with &lt;code&gt;#&lt;/code&gt;. The preprocessor is a separate language whose programs run at compile time. It's used for metaprogramming - modifying the way the program behaves before it is even compiled. Modern languages try to avoid creating a second language for metaprogramming, but that's not the case with C.&lt;/p&gt;

&lt;p&gt;In C, the compilation unit is a file - the compiler considers one file at a time. So if a file needs function signatures in order to call into libraries, the signatures need to be included into every file. This is what the &lt;code&gt;#include&lt;/code&gt; directive does - it takes the file that you pass to it, and copies its entire contents to the place where the &lt;code&gt;#include&lt;/code&gt; was written. We can see that for ourselves by running the C PreProcessor, &lt;code&gt;cpp&lt;/code&gt; (which requires installing the &lt;code&gt;gcc&lt;/code&gt; package).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ cpp main.c
# 0 "main.c"
# 0 "&amp;lt;built-in&amp;gt;"
# 0 "&amp;lt;command-line&amp;gt;"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 0 "&amp;lt;command-line&amp;gt;" 2
# 1 "main.c"
...
extern void funlockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__nonnull__ (1)));
# 949 "/usr/include/stdio.h" 3 4
extern int __uflow (FILE *);
extern int __overflow (FILE *, int);
# 973 "/usr/include/stdio.h" 3 4

# 2 "main.c" 2

# 2 "main.c"
int main(int argc, char **argv){
    if (argc&amp;lt;2) return 1;
    printf("%s\n",argv[1]);
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can see a lot of text, and at the end - our program. The mass of text that precedes the program is everything pulled in by our &lt;code&gt;#include&lt;/code&gt; directive - plus metadata, such as the files from which the included text came. Of course, &lt;code&gt;#include&lt;/code&gt; is a pretty crude way of being able to call libraries - modern languages use module systems instead of header files.&lt;/p&gt;

&lt;p&gt;What does a header file look like? You could find the &lt;code&gt;stdio.h&lt;/code&gt; header file - it should be at &lt;code&gt;/usr/include/stdio.h&lt;/code&gt;. Or we can write our own header file, for our program:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ cat main.h
#ifndef _MAIN_H
int main(int argc, char **argv);
#define _MAIN_H
#endif
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And include it into our program:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#include "main.h"
#include &amp;lt;stdio.h&amp;gt;
int main(int argc, char **argv){
    if (argc&amp;lt;2) return 1;
    printf("%s\n",argv[1]);
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Including with &lt;code&gt;&amp;lt;these&amp;gt;&lt;/code&gt; brackets means including from the system directories (such as &lt;code&gt;/usr/include/&lt;/code&gt;), while &lt;code&gt;"these"&lt;/code&gt; quotes search custom locations first - including the current directory.&lt;/p&gt;

&lt;p&gt;In the header file, there are weird preprocessor directives above and below the actual signature. That is called an include guard - it prevents the same header from being included twice. Which is another argument in favor of a proper module system - you don't need hacks like this one.&lt;/p&gt;

&lt;p&gt;Next comes the function header - it's our main function, from which the program starts running (more on that later). It returns an &lt;code&gt;int&lt;/code&gt;, and takes two arguments - an argument count &lt;code&gt;argc&lt;/code&gt;, and an array of strings &lt;code&gt;argv&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Let's first discuss the return value. Which values mean what is only a loose convention. If the function performs an operation - typically a return value of zero means success, and a positive number means an error code. If it returns an address - then the value 0, which is &lt;code&gt;NULL&lt;/code&gt;, means failure, and a nonzero value means a valid address. Sometimes numbers from 0 and up are all valid - like with file descriptors, returned by the &lt;code&gt;open&lt;/code&gt; syscall, in which case the value &lt;code&gt;-1&lt;/code&gt; means an error. When it comes to program exit status, which is what the &lt;code&gt;main&lt;/code&gt; function returns - despite the data type being a signed &lt;code&gt;int&lt;/code&gt;, only values 0-255 are allowed - where 0 means success, and any other number is an error code. Error codes aren't standard across programs either - you need to look at the documentation for a particular program to find out what a code means. Of course, a better system would be a standard set of exception attributes - whether the exception is fatal, whether it's transient, which module it relates to, what the cause is. Some exceptions would be more specific, others less. But for now, it's just numbers.&lt;/p&gt;

&lt;p&gt;Next let's take a look at datatypes. They in large part correspond to the machine data types that the processor operates on - C is not far removed from assembly. There are signed and unsigned numbers, a character (which is actually a single byte), bool, float, struct, union and pointer. Pointer is special - it is actually not a "real" datatype - it is just an unsigned number, wide enough to store an address for the current architecture. You can easily convert a pointer to a number and back. A pointer to address 0, a NULL pointer, is a special kind of pointer - it stands for "no value". Dereferencing it is undefined behavior in C, and on Linux it typically crashes the program - you almost certainly didn't intend to look at the value at address 0, and it is a programming mistake if you do.&lt;/p&gt;

&lt;p&gt;Next is another datatype that doesn't really exist - an array. An array is just a pointer, and to index into the array you use an offset. So these are equivalent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;*(argv+1)
argv[1]
1[argv]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;How then do you determine where the array ends? You use a separate variable. Or, you use a decades-old dirty trick.&lt;/p&gt;

&lt;p&gt;Another datatype that doesn't really exist - a character string. A character string is just an array, which is just a pointer. But there is no second variable to determine the length of the string. Instead, there is an implicit contract - the last character of a string will be a zero byte, the NUL character. So, if you print out a string - the printing function will start at the start address, and continue reading byte-by-byte until it finds the NUL character. If for some reason it's not there - the function will keep reading until it either crosses into an unmapped page, in which case there will be a &lt;a href="https://en.wikipedia.org/wiki/Segmentation_fault" rel="noopener noreferrer"&gt;SegFault&lt;/a&gt;, or hits some other random zero byte - and if, between the start and that byte, a password happened to be stored, it will be printed out as well. As you can imagine, historically, this has been a major source of vulnerabilities.&lt;/p&gt;

&lt;p&gt;When writing a program, you are doing computer maths. And you would prefer to operate with actual mathematical concepts, not concrete computer ones - you shouldn't care about how many bits are in a number, or whether an array is a pointer and an offset. That's not the role C plays - it's close to hardware. For a higher-level system, look into how integers are handled in Python - when small, they use the native datatype, but if they get too large - they implicitly convert to a big-integer representation, instead of just wrapping around.&lt;/p&gt;

&lt;p&gt;So then, what is &lt;code&gt;char**&lt;/code&gt;? It's an array of NUL-terminated character strings, the length of which is passed in the &lt;code&gt;argc&lt;/code&gt; variable. The array contains the command-line arguments with which the program was launched - the first entry conventionally holds the name the program was invoked with.&lt;/p&gt;

&lt;p&gt;So then, what does the program do? It prints out its first actual command-line argument. Compile it and run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[lostghost1@archlinux c]$ gcc main.c
[lostghost1@archlinux c]$ ./a.out hello
hello
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Success! But wait, what did this &lt;code&gt;gcc&lt;/code&gt; program actually do? And how exactly was the resulting executable, well, executed? That's for a later blog. See ya then!&lt;/p&gt;

</description>
      <category>programming</category>
      <category>linux</category>
    </item>
    <item>
      <title>Linux from the user's perspective - Part 3: Graphical Interface</title>
      <dc:creator>lostghost</dc:creator>
      <pubDate>Wed, 25 Jun 2025 16:51:37 +0000</pubDate>
      <link>https://forem.com/lostghost/linux-from-the-users-perspective-part-3-graphical-interface-kd8</link>
      <guid>https://forem.com/lostghost/linux-from-the-users-perspective-part-3-graphical-interface-kd8</guid>
      <description>&lt;p&gt;This blog is part of a series. More &lt;a href="https://dev.to/lostghost/linux-deep-dive-introduction-5b8c"&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before reading this blog, I recommend watching this &lt;a href="https://youtu.be/cj02_UeUnGQ" rel="noopener noreferrer"&gt;talk&lt;/a&gt;. I am very much influenced by it, and I don't want to rip it off.&lt;/p&gt;

&lt;p&gt;Graphics are pretty cool, right? So is having multiple running applications that you can interact with at the same time. But all applications need to be able to interact with the graphics system in a consistent way - which requires a common standard. Let's see what the industry came up with.&lt;/p&gt;

&lt;p&gt;The X11 protocol, and its open-source implementation X.Org, have a rich history. Originally designed for a network-transparent, mainframe-centric model, along with a vast zoo of incompatible displays, keyboards, mice, and the like, they had to be repurposed for the modern age. Over the course of that journey, X went through three stages of evolution, and now has a successor - Wayland.&lt;/p&gt;

&lt;p&gt;I won't touch on the history of the protocol's development - the talk linked above does an excellent job already. Let's discuss the technical side. In the original incarnation, the vision was as follows.&lt;/p&gt;

&lt;p&gt;You have a powerful mainframe, and a graphics-capable terminal. First you run the X server on the terminal - it takes control of the screen, the keyboard and the mouse. It loads fonts, bitmaps, themes, styles and configs from disk. It does so by virtue of the &lt;code&gt;startx&lt;/code&gt; script that runs the commands from &lt;code&gt;~/.xinitrc&lt;/code&gt;, which typically had the following contents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;xrdb -merge ~/.Xresources &amp;amp;    # load x resources
xsetroot -solid grey &amp;amp;         # background color
xclock -geometry 50x50+40+40 &amp;amp; # simple app
xterm -geometry 80x50+20+150 &amp;amp; # terminal
exec twm                       # window manager; 'exec' makes it keep X running until WM exits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But this is only a part of the configuration. Device configuration goes into an &lt;code&gt;xorg.conf&lt;/code&gt; file or an &lt;code&gt;xorg.conf.d/&lt;/code&gt; directory, located at &lt;code&gt;/etc/X11/xorg.conf.d/&lt;/code&gt;, &lt;code&gt;/usr/share/X11/xorg.conf.d/&lt;/code&gt;, or &lt;code&gt;~/.xorg.conf.d/&lt;/code&gt; - they are all merged together.&lt;/p&gt;

&lt;p&gt;Then the X server starts, and queries for available XDMCP remotes. The XDMCP daemon would run on a remote host - a mainframe - and connect to the X server via TCP. The X server would present a login screen. After login, a remote desktop environment would be started, and you could start apps - all of them would run remotely, on the mainframe.&lt;/p&gt;

&lt;p&gt;When a graphical app would start, it would:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;set up a TCP socket to the X server, via Xlib&lt;/li&gt;
&lt;li&gt;use themes, fonts, bitmaps, that were loaded by the X server&lt;/li&gt;
&lt;li&gt;get a window, which is a framebuffer to be rendered to in immediate mode&lt;/li&gt;
&lt;li&gt;send drawing commands - put pixel, draw line, draw square, draw text&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As for arranging windows on the screen, that was the job of a special privileged client connected to the X server - the window manager. Just as regular apps would subscribe to events, including keyboard events, the window manager would subscribe to window events - window created, destroyed, moved, resized. It would then tell the X server where to place windows. This gave a clean separation of concerns, kept the X server flexible, and allowed for different window management strategies, in keeping with the "mechanism, not policy" philosophy.&lt;/p&gt;

&lt;p&gt;Let's configure a server for exactly that experience. We will use the VM as the XDMCP server, so the X server should already be running on your host. Make sure it allows TCP connections, by editing &lt;code&gt;/etc/X11/xinit/xserverrc&lt;/code&gt;. Also authorize the remote server with the command&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;xhost +&amp;lt;IP&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Configuring the network and the unprivileged user&lt;/strong&gt;&lt;br&gt;
To prepare the host, let's configure the network with NetworkManager, and disable dhcpcd.&lt;br&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[root@archlinux ~]# pacman -S networkmanager
[root@archlinux ~]# systemctl enable NetworkManager
[root@archlinux ~]# nmtui # configure the interfaces here
[root@archlinux ~]# systemctl disable dhcpcd@enp1s0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Let's now create an unprivileged user.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[root@archlinux ~]# useradd -m -G adm,wheel,tty,sys -s /usr/bin/bash user
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Allow running commands as root without a password&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[root@archlinux ~]# pacman -S sudo
[root@archlinux ~]# nano /etc/sudoers # input the text below
root ALL=(ALL) ALL
Defaults targetpw
ALL ALL=(ALL) NOPASSWD: ALL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set up password, switch to the user&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[root@archlinux ~]# passwd user
[root@archlinux ~]# sudo su user
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;/p&gt;

&lt;p&gt;Now install the xfce4 desktop environment, xorg, and gdm - it will be our XDMCP daemon&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[user@archlinux ~]$ sudo pacman -S xfce4 xorg gdm
[user@archlinux ~]$ sudo systemctl enable gdm
[user@archlinux ~]$ sudo systemctl start gdm
[user@archlinux ~]$ nano ~/.xsession # set contents to
startxfce4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Edit &lt;code&gt;/etc/gdm/custom.conf&lt;/code&gt;, set contents to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[daemon]
WaylandEnable=false

[security]
DisallowTCP=false

[xdmcp]
Enable=true

[chooser]

[debug]
Enable=true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And now, on your local host, run from TTY:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[user@archlinux ~]$ sudo X :2 -query &amp;lt;ip&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or, from an emulated terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[user@archlinux ~]$ Xephyr :2 -query 192.168.100.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F66r6f49noz5b6516q3ib.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F66r6f49noz5b6516q3ib.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fow5et8nyn1515wsdsj8b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fow5et8nyn1515wsdsj8b.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(this screenshot is actually running GNOME, but the idea is the same)&lt;/p&gt;

&lt;p&gt;And while this currently works (even if over plain TCP - use SSH or TLS tunneling), the modern approach is remote desktop - RDP or VNC. The difference is in what gets transmitted: vector graphics (drawing commands) versus raster graphics (rendered pixels). And X11 does vector graphics poorly, at least from what I've surmised. I'd love to see a modern immediate-mode vector graphics rendering technology that works remotely, but alas, we are stuck with raster graphics for now. Speaking of remote desktop, let's set it up!&lt;/p&gt;

&lt;p&gt;First, install &lt;a href="https://github.com/Jguer/yay" rel="noopener noreferrer"&gt;yay&lt;/a&gt;&lt;br&gt;
Then run&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[user@archlinux ~]$ yay -S xrdp
[user@archlinux ~]$ nano ~/.xsession # set to text below
xfce4-session
[user@archlinux ~]$ sudo nano /etc/X11/Xwrapper.config # set to text below
allowed_users=anybody
needs_root_rights=no
[user@archlinux ~]$ sudo systemctl start xrdp.service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now connect to it, for example, with &lt;a href="https://wiki.archlinux.org/title/Remmina" rel="noopener noreferrer"&gt;Remmina&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuvj64zohi9b0bqd3vubi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuvj64zohi9b0bqd3vubi.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz9ee6ad824prlsx8by4k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz9ee6ad824prlsx8by4k.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is with xrdp - modern desktop environments, such as KDE and GNOME, provide their own built-in RDP servers, which you can use.&lt;/p&gt;

&lt;p&gt;Since we are discussing RDP, it would be useful to know how it actually works under the hood. On X, any client can read the contents of any window - even the entire screen - with XGetImage, and send them over. With a modern alternative like Wayland, there is a special protocol - you can only read the contents of the windows that the user authorizes you to read.&lt;/p&gt;

&lt;p&gt;The next step of evolution came with supporting hardware acceleration, through shared memory and graphics APIs - like OpenGL, via the GLX extension. Of course, none of this is possible over a network. So, local X was the new paradigm.&lt;/p&gt;

&lt;p&gt;We can also forward any single application. Simply run&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[user@archlinux ~]$ export DISPLAY=&amp;lt;hostIP&amp;gt;:0
[user@archlinux ~]$ xterm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The correct display number can be seen in &lt;code&gt;env&lt;/code&gt; on the host&lt;br&gt;
Result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xwqp4s1mx3jfhbeh029.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xwqp4s1mx3jfhbeh029.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The final step of evolution came with compositors, which broke an important architectural assumption in the X server. Namely - because every application drew to its window in immediate mode, you couldn't have some windows cast shadows on other windows, or have them be transparent. That would require baking such interactions into the protocol, and making every application support them - which would be impossible. So, a hack was found - applications would draw not to the screen, but to an off-screen buffer. After a buffer was drawn, the compositor - essentially a more powerful window manager - would be notified. It would then create its own buffer, into which the buffers of all active applications would be overlaid in the correct order, and with effects. Only this final buffer would be given to X for rendering.&lt;/p&gt;

&lt;p&gt;The trajectory of X11's development is as follows - it performed fewer and fewer functions. More and more drawing was done either by the app or by the compositor; X was just passing buffers back and forth. So, a more modern protocol was developed - Wayland. Its goal is to be minimal: let apps and the compositor do their job, and stay out of their way otherwise. It does raster graphics, and only on the current machine. Remote solutions exist, such as &lt;a href="https://github.com/neonkore/waypipe" rel="noopener noreferrer"&gt;waypipe&lt;/a&gt; - but they send over raster graphics for a window, not vector graphics.&lt;/p&gt;

&lt;p&gt;Let's do some reflection. Which is better - raster graphics or vector graphics? I say vector graphics, but you have to do them well, and they are more complicated. Next would be running applications remotely - how do you do that well? Ideally, you would run applications on a cluster, without caring which individual server they land on. But that requires that all resources an application may use are also available on the cluster - which is not a trivial problem. For now, it seems that local apps + RDP is what the industry settled on.&lt;/p&gt;

&lt;p&gt;That's it, when it comes to the Graphical Interface on Linux - next time, we will discuss system administration.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Linux from the user's perspective - Part 2: Terminal Interface</title>
      <dc:creator>lostghost</dc:creator>
      <pubDate>Thu, 19 Jun 2025 08:15:44 +0000</pubDate>
      <link>https://forem.com/lostghost/linux-from-the-users-perspective-part-2-terminal-interface-3k5i</link>
      <guid>https://forem.com/lostghost/linux-from-the-users-perspective-part-2-terminal-interface-3k5i</guid>
      <description>&lt;p&gt;This blog is the second in a series. More &lt;a href="https://dev.to/lostghost/linux-deep-dive-introduction-5b8c"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Linux traces its lineage to an era of computing when the command line was the primary means of interacting with the computer. Nowadays we use the keyboard to input text, but otherwise our primary means of interaction with programs is pointing at and activating interactive elements on the screen - with a touchscreen, a touchpad, a mouse. A further streamlining of that process would be eye-tracking, or a direct link to the brain.&lt;/p&gt;

&lt;p&gt;With that, you have two options for a text UI - the Command Line Interface (CLI) and the Terminal User Interface (TUI). A CLI allows you to input individual commands and get their responses printed out, whereas a TUI presents you with interactive visual elements placed around the screen - like a graphical interface, just with text instead. An example of a CLI would be our main shell - bash, while a TUI would be the text editor we used - nano.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjx8wx0srv2nenl1w7d2q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjx8wx0srv2nenl1w7d2q.png" alt="CLI example" width="800" height="560"&gt;&lt;/a&gt; CLI example - &lt;a href="https://kasheloff.ru/photos/bash-script-command/6" rel="noopener noreferrer"&gt;Source&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu4of21u0czhnkl3u1ptq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu4of21u0czhnkl3u1ptq.png" alt="Image description" width="620" height="506"&gt;&lt;/a&gt; TUI example - &lt;a href="https://obatherbal.top/set-static-centos-nmtui-mand-work-setting.html" rel="noopener noreferrer"&gt;Source&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The TUI offers a much richer way of interacting with the system, and yet the default UI for Linux is the CLI - for two reasons. Firstly, legacy - line-oriented terminals came before those with a screen you could draw anywhere on. Secondly, the CLI is a much easier protocol - you just read text and write text. TUI as the main UI would require programs to agree on a bigger protocol - and the Linux ecosystem has a poor record when it comes to agreeing on protocols.&lt;/p&gt;

&lt;p&gt;Protocols are difficult - this will be a recurring theme. Both CLI and TUI use the terminal. Now, how hard is it to agree on how to input and output text? Harder than it first appears. The terminal requires a control protocol - what kind of mode to switch to, where on the screen to place the cursor, which sound effect to play, and a data protocol - which character to output. Historically, control information is passed in-band, on the data channel - in the form of special ASCII characters, and escape sequences. For example, character number 7 means to ring a bell. And the sequence &lt;code&gt;\033[H&lt;/code&gt; means "move the cursor to the top left corner of screen". More information can be found &lt;a href="https://wiki.osdev.org/Terminals" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The problem is that, historically, these sequences were not standardized. That's why libraries such as &lt;a href="https://en.wikipedia.org/wiki/Termcap" rel="noopener noreferrer"&gt;termcap&lt;/a&gt; arose. At least the character encodings were standardised - you had Baudot code, ITA-2, EBCDIC, ASCII.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzyh3z05sqjh6t193nawd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzyh3z05sqjh6t193nawd.png" alt="Image description" width="800" height="649"&gt;&lt;/a&gt;&lt;br&gt;
Ascii table - &lt;a href="https://en.wikipedia.org/wiki/File:ASCII-Table.svg" rel="noopener noreferrer"&gt;Source&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Linux emulates &lt;a href="https://en.wikipedia.org/wiki/VT100#Variants" rel="noopener noreferrer"&gt;VT102&lt;/a&gt;/&lt;a href="https://en.wikipedia.org/wiki/VT220" rel="noopener noreferrer"&gt;VT220&lt;/a&gt; terminals, with the help of the &lt;a href="https://en.wikipedia.org/wiki/Getty_(software)" rel="noopener noreferrer"&gt;getty&lt;/a&gt; program.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftrlt59s48rarmj0wfqsf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftrlt59s48rarmj0wfqsf.png" alt="Image description" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
VT100 - &lt;a href="https://www.oreilly.com/radar/bots-in-the-enterprise/" rel="noopener noreferrer"&gt;Source&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the current technologies and standards, how would we implement a terminal? Its job is to take text input, and print out text at the correct screen position. But nowadays, our monitors are graphical, our physical connectors such as HDMI, DisplayPort, ThunderBolt are graphical - we don't need to send text over a wire, we can send the rendered graphics. Then we don't need special control characters in the encoding. To render text into graphics, the OS kernel would load the font, the application would configure the size of the terminal, and that's it - the application can input and output text. For cross-platform rendering of graphics, there is a UEFI standard - the &lt;a href="https://uefi.org/specs/UEFI/2.10/12_Protocols_Console_Support.html?highlight=graphics%20output#console-i-o-protocol" rel="noopener noreferrer"&gt;Console I/O Protocol&lt;/a&gt;, &lt;a href="https://uefi.org/specs/UEFI/2.10/12_Protocols_Console_Support.html?highlight=graphics%20output#simple-text-output-protocol" rel="noopener noreferrer"&gt;Text Output Protocol&lt;/a&gt;, &lt;a href="https://uefi.org/specs/UEFI/2.10/12_Protocols_Console_Support.html?highlight=graphics%20output#simple-text-input-protocol" rel="noopener noreferrer"&gt;Text Input Protocol&lt;/a&gt;, &lt;a href="https://uefi.org/specs/UEFI/2.10/12_Protocols_Console_Support.html?highlight=graphics%20output#graphics-output-protocol" rel="noopener noreferrer"&gt;Graphics Output Protocol&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Over the raw terminal runs a special program that allows you to input not just text, but commands - that program being a shell. Based on user commands, the shell interacts with the kernel - and through it, possibly with other programs. The original shell for Unix was the &lt;a href="https://en.wikipedia.org/wiki/Thompson_shell" rel="noopener noreferrer"&gt;Thompson Shell&lt;/a&gt;, while the one most commonly used nowadays is the &lt;a href="https://en.wikipedia.org/wiki/Bash_(Unix_shell)" rel="noopener noreferrer"&gt;Bourne Again Shell&lt;/a&gt;. There are other shells available, such as the slimmed-down &lt;a href="https://en.wikipedia.org/wiki/Almquist_shell#Dash" rel="noopener noreferrer"&gt;dash&lt;/a&gt;, the expanded &lt;a href="https://en.wikipedia.org/wiki/Z_shell" rel="noopener noreferrer"&gt;zsh&lt;/a&gt;, and the unique &lt;a href="https://en.wikipedia.org/wiki/Fish_(Unix_shell)" rel="noopener noreferrer"&gt;fish&lt;/a&gt;. But for myself and many others, returns diminish after bash.&lt;/p&gt;

&lt;p&gt;These shells support aliasing long commands to short names, embedding subcommands into larger commands, and general programming - with commands, variables, functions, conditions, and loops - making them Turing-complete. As with any Turing-complete system, there is a limit to how far programming in it remains the best tool for the job. Some Linux users push that limit with their shell scripting.&lt;/p&gt;
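&lt;p&gt;These features can be sketched in a short script (the names and values below are made up for illustration):&lt;/p&gt;

```shell
# Aliasing a long command to a short name (aliases expand in interactive shells)
alias ll='ls -l'

# Embedding a subcommand's output into a larger command
echo "this year is $(date +%Y)"

# Variables, a function, a condition, and a loop
greeting="hello"
greet() { echo "$greeting, $1"; }

for name in alice bob; do
    if [ "$name" = "alice" ]; then
        greet "$name"
    else
        echo "skipping $name"
    fi
done
```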

&lt;p&gt;A shell allows you to perform administrative tasks or launch an application. To launch an application, just input its name! Well, it's not quite that simple. The shell carries state that influences how the application is run, and whether it runs successfully at all. That state consists of, among other things, the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Current directory&lt;/li&gt;
&lt;li&gt;Environment variables&lt;/li&gt;
&lt;li&gt;Internal variables&lt;/li&gt;
&lt;li&gt;umask setting&lt;/li&gt;
&lt;li&gt;User that is logged in&lt;/li&gt;
&lt;li&gt;ulimits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of them influence whether and how an application runs. I would argue that only the currently logged-in user should influence the application, but that's not how it is today.&lt;/p&gt;
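&lt;p&gt;Two of these influences are easy to demonstrate - an environment variable inherited by a child process, and the umask deciding the permissions of a newly created file (the variable name and file path below are made up):&lt;/p&gt;

```shell
# An environment variable set here is inherited by child processes
export GREETING="hi from the shell"
sh -c 'echo "$GREETING"'               # the child process sees it

# umask masks out permission bits on newly created files
umask 077                              # new files: readable/writable by owner only
rm -f /tmp/umask-demo.txt
touch /tmp/umask-demo.txt
ls -l /tmp/umask-demo.txt              # permissions column shows -rw-------
```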

&lt;p&gt;The current application can be interrupted with &lt;code&gt;Ctrl+C&lt;/code&gt;, shut down with &lt;code&gt;Ctrl+\&lt;/code&gt;, suspended with &lt;code&gt;Ctrl+Z&lt;/code&gt;, resumed with &lt;code&gt;fg&lt;/code&gt;, or continued in the background with &lt;code&gt;bg&lt;/code&gt;. But what if you want multiple active programs in one terminal? For that you need a terminal multiplexer, a middle ground between a CLI and a TUI - options include GNU Screen and tmux.&lt;/p&gt;
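&lt;p&gt;A minimal sketch of this job control from a script - the feature is normally interactive, so the script has to enable it explicitly:&lt;/p&gt;

```shell
# Job control is normally an interactive feature; enable it in a script
set -m

sleep 30 &     # start a long-running job in the background
jobs           # list this shell's active jobs
kill %1        # signal job number 1, the same way fg/bg would address it
```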

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7cf6os0e3blmiyvazie.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7cf6os0e3blmiyvazie.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
tmux - &lt;a href="https://unix.stackexchange.com/questions/219296/how-to-go-mouseless?rq=1" rel="noopener noreferrer"&gt;Source&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Within an application, you often need to point at things - for example, to move the cursor to a desired position in text. Terminals originally had no mouse support, so programs relied on elaborate keyboard shortcuts to position the cursor - for example, &lt;a href="https://en.wikipedia.org/wiki/Vi_(text_editor)" rel="noopener noreferrer"&gt;vi&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/GNU_Emacs" rel="noopener noreferrer"&gt;emacs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fun1wfiypsnmsi5henmda.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fun1wfiypsnmsi5henmda.png" alt="Image description" width="800" height="500"&gt;&lt;/a&gt;&lt;br&gt;
Emacs - &lt;a href="https://www.xataka.com/aplicaciones/viviendo-vida-casi-interfaz-grafica-asi-trabajan-linuxeros-que-ven-todo-dentro-consola-modo-texto-1" rel="noopener noreferrer"&gt;Source&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But enough yapping - let's actually do something. One of the most common tasks on a computer is editing a document - so let's edit one in the console, like a person from the '80s would. We will use some modern tools, but the feeling will be the same.&lt;/p&gt;

&lt;p&gt;Start up the VM, remount the filesystem read-write, and configure the network:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo 'nameserver 8.8.8.8' &amp;gt; /etc/resolv.conf
mount -o remount,rw /
pacman -S nano
systemctl enable dhcpcd@enp1s0 # - replace with the correct interface
systemctl start dhcpcd@enp1s0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can make the remount happen automatically at boot by editing the "/etc/fstab" file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo '/ / none remount,rw 0 0' &amp;gt; /etc/fstab
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's set up telnet, for easier copy-pasting of commands.&lt;br&gt;
First, assign the VM an IP address on the host-only network. To find the right subnet, check the addresses of the two virtual interfaces on the host: the first is used for NAT, so the second is the host-only LAN (correlate the interfaces on the host with those in the VM). For me that was:&lt;/p&gt;

&lt;p&gt;On the host:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;4: virbr0: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc htb state UP group default qlen 1000
    link/ether 52:54:00:d3:77:04 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
5: virbr1: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc htb state UP group default qlen 1000
    link/ether 52:54:00:21:87:cf brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.1/24 brd 192.168.100.255 scope global virbr1
       valid_lft forever preferred_lft forever
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the VM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2: enp1s0: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:5d:53:69 brd ff:ff:ff:ff:ff:ff
    altname enx5254005d5369
    inet 192.168.122.241/24 brd 192.168.122.255 scope global dynamic noprefixroute enp1s0
       valid_lft 2630sec preferred_lft 2180sec
    inet6 fe80::865a:a008:b8a:d06b/64 scope link 
       valid_lft forever preferred_lft forever
&amp;lt;enp2s0 was empty - not shown&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first interface is the NAT one, so the second (enp2s0) is the LAN. The host's virbr1 is on 192.168.100.1/24, so on the VM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ip a a 192.168.0.2/24 dev enp2s0
ip link set up enp2s0
systemctl start telnet.socket
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Permit root login via telnet by adding the line "pts/0" to /etc/securetty.&lt;br&gt;
Then log in from the host:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;telnet 192.168.100.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, let's install the needed packages for the actual demo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pacman -Sy vim texlive-basic texlive-latex texlive-doc texlive-mathscience wget
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And download the example document:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wget https://github.com/mundimark/markdown-vs-latex/raw/refs/heads/master/samples/sample2e.tex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open it for editing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vim sample2e.tex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdj2ip12ts8s8av164680.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdj2ip12ts8s8av164680.png" alt="Image description" width="800" height="972"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, let's change the year from 1994 to 1995. Input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;:%!sed 's/1994/1995/g'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
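&lt;p&gt;&lt;code&gt;:%!&lt;/code&gt; filters the whole buffer through an external command - here sed. The same substitution can be tried directly in the shell:&lt;/p&gt;

```shell
printf 'Copyright 1994\n' | sed 's/1994/1995/g'
# prints: Copyright 1995
```

&lt;p&gt;vim's built-in &lt;code&gt;:%s/1994/1995/g&lt;/code&gt; would do the same in-process; the &lt;code&gt;:%!&lt;/code&gt; form shows how external text tools plug into the editor.&lt;/p&gt;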



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyog4bmsv5ptbrwzfkxhk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyog4bmsv5ptbrwzfkxhk.png" alt="Image description" width="277" height="60"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To save the changes, input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;:w
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compile it to DVI format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;:!latex %
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And preview the document:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;:!dvi2tty sample2e.dvi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3394hny1hxbhprs3dvhe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3394hny1hxbhprs3dvhe.png" alt="Image description" width="800" height="817"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Back in the day, you would then send the DVI file to the line printer.&lt;/p&gt;

&lt;p&gt;Scroll to the end with spacebar, then exit vim with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;:wq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This should give you an idea of how documents were edited back in the day.&lt;/p&gt;

&lt;p&gt;Now what does this say about the CLI ecosystem of end-user applications? It consists of large programs, such as vim and emacs, and small programs that each perform an individual task on a text buffer and can plug into the big editors. For scripting, the same small commands integrate with the shell. In this way, it forms a complete, coherent, text-based ecosystem. The ecosystem goes much further: hand-editable text-based config files, git as a text-oriented database, text-oriented logs, email stored as text files, network protocols that are just text over TCP (as with HTTP, &lt;a href="https://youtu.be/mrGfahzt-4Q" rel="noopener noreferrer"&gt;SMTP&lt;/a&gt;, FTP), and docs in the form of man pages and texinfo files - these are a few examples. Here is a &lt;a href="https://youtu.be/gd5uJ7Nlvvo" rel="noopener noreferrer"&gt;talk&lt;/a&gt; on the subject.&lt;/p&gt;
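&lt;p&gt;That composition is easiest to see in a pipeline - a few small text tools chained with pipes, here counting and ranking lines (the input is made up):&lt;/p&gt;

```shell
# sort groups identical lines, uniq -c counts each group,
# and sort -rn ranks the counts from highest to lowest
printf 'error\nwarn\nerror\ninfo\nerror\n' | sort | uniq -c | sort -rn
```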

&lt;p&gt;These small programs were originally (and still largely are, though at least grouped into packages) individual executable files, but nowadays tools such as &lt;a href="https://www.busybox.net/" rel="noopener noreferrer"&gt;busybox&lt;/a&gt; and &lt;a href="https://landley.net/toybox/faq.html" rel="noopener noreferrer"&gt;toybox&lt;/a&gt; exist - they provide a single binary that implements many of the commands, which keeps the system smaller and simpler to deploy. Speaking of which, do you know the original meaning of a &lt;a href="https://oldhacker.org/txt/Phreaks_Only/busybox.txt" rel="noopener noreferrer"&gt;busybox&lt;/a&gt;?&lt;/p&gt;

&lt;p&gt;But documents don't exist in a vacuum - they are stored as files, in a filesystem. The filesystem does many jobs, but from one angle it's a tree-oriented database. Both system files and user documents are stored in this database, and the two uses have different priorities. For system state, you want the mechanism for storage, modification, and retrieval to be specialised to the task, to guarantee consistency, and to hide its implementation. For documents, you want arbitrary grouping - say, a tag-oriented database, or a knowledge graph, or both - along with collaborative editing, saved previous versions, and synchronisation with other devices. The filesystem is a compromise that provides a good-enough interface for both use cases. But if an OS were implemented from scratch today - would we actually need a filesystem?&lt;/p&gt;
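&lt;p&gt;Seen that way, the usual small tools act as query operators over the tree - a sketch against a throwaway directory (the names and contents are made up):&lt;/p&gt;

```shell
# Build a tiny document tree and query it like a database
db="$(mktemp -d)"
mkdir -p "$db/docs" "$db/notes"
echo "meeting notes from 1995" > "$db/notes/jan.txt"
echo "a draft paper"           > "$db/docs/paper.tex"

find "$db" -name '*.txt'     # query by file name
grep -rl '1995' "$db"        # query by file content (a full scan)
```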

&lt;p&gt;In the next blog, we will take a look at the graphical interface.&lt;/p&gt;

</description>
      <category>linux</category>
    </item>
  </channel>
</rss>
