Forem: Albert Zeyer

OpenVPN userspace with tunsocks (without TUN devices)

Albert Zeyer — Mon, 03 Nov 2025 23:10:31 +0000

I want to connect to an OpenVPN server from a Linux client where I don't have root access and cannot (and do not want to) create a TUN device. I don't really need the TUN device: I just want to connect to an SSH server within the VPN.

This is possible. But it needs a patched openvpn. Specifically bendlas/openvpn-tuna or ValdikSS/openvpn-tunpipe. That adds the possibility to use --dev "|<pipetool>" for openvpn. So instead of using a TUN device, it would run some command and pipe all the data to it. Then, there is russdill/tunsocks which you can use as the program here. For example, tunsocks -L [bind_address:]bind_port:host_address:host_port.

It took me a while to get this working:

I first tried with bendlas/openvpn-tuna. The README suggested using Nix, so I tried that. Running nix directly did not really work well as non-root (I did not know about the custom --store), so I tried it within Docker/Apptainer/Singularity. I first tried with --fakeroot, which also does not work well with nix. But without --fakeroot, it worked. But then, the suggested commands did not really work. E.g. I tried nix run github:bendlas/openvpn-tuna#tunsocks -- config.ovpn. That asked for the login, but ended up in an endless loop of Connection reset, restarting [0], SIGUSR1[soft,connection-reset] received, process restarting, Restart pause, 1 second(s). I also tried the other commands but nothing really worked.

Then I also compiled bendlas/openvpn-tuna directly without Apptainer and Nix, by just using autoreconf, configure and make, and could also run it. But I got just the same behavior.

Btw, Gemini was not helpful at all for this generic task to use OpenVPN in user-space without a TUN device, just for some port forwarding. It basically said it is not possible. It also misunderstood the purpose of openvpn-tuna, openvpn-tunpipe, tunsocks, etc. It also misunderstood the instructions from the openvpn-tuna README. It also misunderstood any of the OpenVPN errors.

However, Gemini was quite helpful in debugging random Apptainer and Nix issues (e.g. the problem with --fakeroot, which was quite involved and non-trivial to figure out). It mostly understood the issues, or at least gave me very useful hints on where to look next.

Then I tried the slightly older ValdikSS/openvpn-tunpipe. Now without Nix. I again did autoreconf, configure and make, and could also run it. I first tried ./src/openvpn/openvpn config.ovpn. And that worked, up to ERROR: Cannot ioctl TUNSETIFF tun: Operation not permitted (errno=1), which was expected. So then I wanted to try the --dev "|tunsocks -L ...". For that, I also needed to clone russdill/tunsocks and build that, which was fairly straightforward. And then it just worked!

The final command:

./src/openvpn/openvpn --config config.ovpn --script-security 2 --dev "|../tunsocks/tunsocks -L 2222:<sshhost>:22"
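
If the command is launched from a script, it helps to pass the arguments as a list, so that the --dev value (which contains a pipe character and spaces) stays a single argument. A minimal Python sketch, assuming the same relative paths as in the command above. The helper name build_openvpn_argv and the host name sshhost.example are placeholders made up for this illustration:

```python
import shlex


def build_openvpn_argv(openvpn_bin, config, tunsocks_bin, local_port, ssh_host, ssh_port=22):
    """Build the argument list for the patched openvpn (hypothetical helper)."""
    # The whole "|<pipetool> ..." string must be passed as ONE argument to --dev.
    pipe_cmd = f"|{tunsocks_bin} -L {local_port}:{ssh_host}:{ssh_port}"
    return [openvpn_bin, "--config", config, "--script-security", "2", "--dev", pipe_cmd]


argv = build_openvpn_argv(
    "./src/openvpn/openvpn",
    "config.ovpn",
    "../tunsocks/tunsocks",
    local_port=2222,
    ssh_host="sshhost.example",  # placeholder for <sshhost>
)

# Print a copy-pasteable shell command. To actually start the tunnel from
# Python, pass the list to subprocess.run(argv) instead.
print(shlex.join(argv))
```

Once the tunnel is up, the SSH server should be reachable through the forwarded local port, e.g. with ssh -p 2222 localhost.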

Note, alternatively, you could also make a SOCKS proxy, or use other things from tunsocks.

Note, for WireGuard, there seem to exist easier-to-use solutions for the same functionality. There is whyvl/wireproxy. There is aramperes/onetun. There is noisysockets/noisysockets. There is cloudflare/boringtun.

Why game development is a great learning playground

Albert Zeyer — Wed, 25 Jun 2025 21:01:50 +0000

(This is a copy of the article from here.)

Game development is a great way to learn coding and computer science in general. You learn not only about coding itself, but also about advanced algorithms of various sorts and much more. Compared to working on a tool or some other application, you will in most cases learn more by developing a game, because games cover a very wide range of techniques, while most other applications involve only a limited set of theories and techniques.

Related resources

Coding Game Intro. Coding Introduction / Tutorial, mostly by writing small little fun games.
On Coding
On Debugging
On Profiling
Coding for kids
Quora: Is game development a good way to learn a programming language? (yes...)
Kshitij Dhar: Why Video Game Programming Is Awesome (And How to Get Started with Creating Your First Game)
What is a music player?. In a similar manner, this article shows that a music player also covers various different aspects.
Joshua Barretto: Writing Toy Software Is A Joy. List of toy programs rated by difficulty and time required.

Programming languages

Of course the most basic part is the coding itself.
That includes the programming language and getting used to it. Learning a programming language is not only learning the syntax and semantics - it is also about how you do it right.

In a medium/big sized game (and often even in small), there are often a bunch of different languages used together.

For example, in the OpenLieroX project, you will find applications of at least these languages:

C++. Most of the game and its core is written with it.
C. Some rare parts of the game are C code.
Assembler. Some performance critical parts of the game are written in Asm.
Lua. We use Lua for ingame scripting.
Python. The dedicated script and a few other scripts use Python.
Bash / Zsh. Many scripts to do some management (like preparing release packages and distributing them) are using either Bash or Zsh.
CMake. Our build system uses it.
PHP. Our homepage uses it and also the reference implementation of the HTTP masterserver.

But when you are just starting with coding,
for a simpler game project, any single language is enough.

The programming language doesn't really matter that much for learning. Don't be scared about C++. It isn't really that complicated. Or maybe Rust, or Nim, or some of the other newer native languages. Python is of course also a very good starting language.

Libraries

Knowing programming languages is not enough.
Most languages also come with their own standard library but in many cases, you also need some external libraries or frameworks.

Here is a list of libraries that the OpenLieroX core uses:

C standard library
C++ standard library (STL)
Boost
SDL. This is for initialising the window/screen, doing graphics operations and handling user input.
SDL_image. For some more image formats.
GD. Another lib for various image formats and image handlings.
HawkNL. For networking (TCP/UDP).
zlib. Compression.
libzip. ZIP support.
libxml2. XML support.
libcurl. HTTP support.
OpenAL. For sound.
ALUT. For initialising the OpenAL environment and WAV support.
libvorbis/libogg. For Vorbis OGG support.

For the other languages, mostly only the standard library is used. (Python has a "batteries included" philosophy.)

Also see this list about useful game libraries.

Cross platform

If you don't want to restrict yourself to only one platform (like Linux or MacOSX or even Windows), you need to write your code in a way that it is mostly cross platform.

One important point is that the dependencies and libraries you are using are all also cross platform, i.e. available for most or all platforms. If you take a look at the list of libraries from above, you will find that this is mostly the case.

In practice, this is not all. You will face the problem that some things just behave and work differently depending on the platform. The platforms themselves are also different on purpose and handle things differently. So you will still end up with different solutions and code paths for different platforms in these cases.
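
As a small illustration of such platform-specific code paths, here is a Python sketch that picks a per-user configuration directory. This is not taken from OpenLieroX. The function name user_config_dir and the app name "mygame" are made up, and the chosen directories follow common conventions rather than any fixed rule:

```python
import os
import sys
from pathlib import Path


def user_config_dir(app_name):
    """Return a per-user config directory, chosen per platform (illustrative)."""
    home = Path.home()
    if sys.platform.startswith("win"):
        # Windows: usually %APPDATA%
        base = Path(os.environ.get("APPDATA") or home / "AppData" / "Roaming")
    elif sys.platform == "darwin":
        # macOS convention
        base = home / "Library" / "Application Support"
    else:
        # Linux and other Unixes: XDG base directory convention
        base = Path(os.environ.get("XDG_CONFIG_HOME") or home / ".config")
    return base / app_name


config_dir = user_config_dir("mygame")
print(sys.platform, config_dir)
```

The rest of the program only calls user_config_dir and does not need to know about the differences.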

By working on a cross platform game like OpenLieroX, you will learn how to deal with this. When you use the right libraries,
this is not that much of a problem.

Architecture independence

This is somewhat related to cross platform. If you don't want to restrict yourself to one architecture (like x86 32bit), you need to write your code in a way that it is architecture independent.
As some platforms (like iPhone or Playstation) naturally run on a different architecture (like arm or ppc), you are often forced to write architecture-independent code if you want to support those.

OpenLieroX should run on pretty much every architecture.
Typical problems you are facing are big/little endianness and 32bit versus 64bit.
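
The endianness problem can be shown in a few lines. The following Python sketch (an illustration, not OpenLieroX code) serializes the same 32-bit integer with explicit little-endian and big-endian byte order. Using an explicit byte order in file formats and network protocols is what makes the data independent of the CPU it was written on:

```python
import struct
import sys

value = 0x12345678

little = struct.pack("<I", value)  # explicit little-endian, 4 bytes
big = struct.pack(">I", value)     # explicit big-endian (network order), 4 bytes
native = struct.pack("=I", value)  # byte order of the machine running this

print(sys.byteorder, little.hex(), big.hex(), native.hex())

# Reading back with the same explicit format gives the same value on any CPU.
restored_little = struct.unpack("<I", little)[0]
restored_big = struct.unpack(">I", big)[0]
```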

Protocols and file formats

In many cases, knowledge of protocols and file formats can be useful, even if a library is doing all the handling for you.
In some cases, you even need to implement some basic support for yourself.

Here is a list of protocols and file formats you will most likely come into contact with in OpenLieroX:

TCP, UDP
HTTP, SMTP, DNS
INI, XML, MP3

Of course, OpenLieroX introduces also some own protocols (for networking) and file formats (maps and mods).

Algorithms and data structures

You will need and learn all of the basic algorithms and data structures.
These are at least:

linked lists, hash maps, vectors, stacks, queues, sets, trees
sort, search, ...

Graphics

When writing graphics related code, you are dealing with a wide range of topics,
many of them being mathematical. Besides the mathematical part, you are often also dealing with performance problems and optimisations because this is often the most performance critical part. So you need to learn how to optimise your code and at the same time keep it maintainable.

Depending on whether your game is 2D or 3D, the type of mathematics you are applying differs a bit (and 3D graphics probably need some deeper knowledge of linear algebra), but there is also a big overlap.

User interface

The user interface is e.g. the menus, the buttons, the dialogs, etc.

Audio

You learn the basic principles and calculations for modifying sound when you want to create a 3D sound experience.

Physics

Most games need some kind of physics engine to simulate the physics of the game world. For simple games, this is usually kept very simple.

Artificial intelligence

The most obvious application of artificial intelligence in games is the computer enemy. AI in games, and AI in general, is really a topic in itself. It is maybe the biggest and most important of all the things you learn by game development, but that is up to your own focus.

Also, the subtopics of AI you can apply in a game depend a bit on the type of game, but in general, a lot of different AI topics can be applied in games. (The actual topics you will apply in a game are usually much simpler though.)
For example:

Planning and problem solving
Searching and pathfinding (e.g. A*)
Machine learning (all kinds: supervised, reinforcement and unsupervised), neural networks, genetic algorithms, etc.
Knowledge representation
Pattern matching and recognition
Cognitive science
Natural language processing

OpenLieroX has some advanced searching functions (variant of A* in an almost continuous space) and a bunch of hardcoded strategies and plans. There could be much more, like a self-learning bot (based on neural networks) or a genetically engineered bot. Or some basic support of natural language processing so that the bot can chat.
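
To give an idea of what pathfinding looks like in code, here is a minimal A* sketch in Python on a small grid. Note that this is the textbook grid version with a Manhattan-distance heuristic, not the almost-continuous variant used in OpenLieroX. The grid layout and the function name astar are chosen just for this example:

```python
import heapq


def astar(grid, start, goal):
    """Return a shortest 4-connected path from start to goal, or None.

    grid is a list of strings where '#' marks a wall. Cells are (row, col).
    """
    def heuristic(cell):
        # Manhattan distance never overestimates for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]  # (cost + heuristic, cost, cell)
    came_from = {start: None}
    best_cost = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:  # walk back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry, a cheaper route was found meanwhile
        row, col = cell
        for nxt in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            r, c = nxt
            if not (0 <= r < len(grid) and 0 <= c < len(grid[r])):
                continue  # outside the map
            if grid[r][c] == "#":
                continue  # wall
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                came_from[nxt] = cell
                heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None  # goal is not reachable


grid = [
    "....",
    ".##.",
    "....",
]
path = astar(grid, (0, 0), (2, 3))
blocked = astar([".#."], (0, 0), (0, 2))
print(path, blocked)
```

The heuristic is what distinguishes A* from a plain shortest-path search: it steers the search towards the goal, so fewer cells need to be visited.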

Game theory

Game theory deals with theoretical questions about strategies and states of games. If you want to answer questions like if there is a strategy for your game where you always will win - that is game theory.

Game theory also matters for some of the artificial intelligence applications.
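
For very small games, the question whether a winning strategy exists can be answered by brute force. The following Python sketch uses a toy game chosen only for illustration: two players alternately take 1, 2 or 3 stones from a pile, and whoever takes the last stone wins. A position is winning if some move leads to a position that is losing for the opponent:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def can_win(stones):
    """True if the player to move can force a win with `stones` left."""
    # With 0 stones left, the player to move has already lost (no move possible).
    return any(not can_win(stones - take) for take in (1, 2, 3) if take <= stones)


losing = [n for n in range(13) if not can_win(n)]
print(losing)  # [0, 4, 8, 12]
```

So in this toy game, the player to move always wins unless the number of stones is a multiple of 4. The same recursive idea (searching the game tree) is also the basis of many game-playing AIs, although real games need pruning and heuristics to stay tractable.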

Networking

If you want to support some sort of multiplayer over a network (like the Internet), you need to know a bit about networking.
It is not just about encoding and serializing your data and transferring it somehow over the network. A game often needs very low latency. You are usually dealing with problems like:

synchronisation of time (and events in time)
synchronisation of the game world state
unreliable data channel
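
One common way to cope with an unreliable channel (such as UDP) is to put a sequence number into each state update and to ignore everything that is older than what was already received. The following Python sketch simulates this without any real network. The packet layout and the class name LatestStateReceiver are made up for this example and are not the OpenLieroX protocol. Sequence number wrap-around is ignored for simplicity:

```python
import struct

HEADER = struct.Struct("!I")  # 32-bit sequence number in network byte order


def make_packet(seq, payload):
    """Prefix the payload with its sequence number."""
    return HEADER.pack(seq) + payload


class LatestStateReceiver:
    """Keeps only the newest state seen so far (hypothetical helper)."""

    def __init__(self):
        self.last_seq = -1
        self.state = None

    def receive(self, packet):
        (seq,) = HEADER.unpack_from(packet)
        if seq <= self.last_seq:
            return False  # duplicate or late packet: ignore it
        self.last_seq = seq
        self.state = packet[HEADER.size:]
        return True


receiver = LatestStateReceiver()
# Simulated channel: packet 1 is lost, packet 0 arrives late and twice.
incoming = [
    make_packet(2, b"x=30"),
    make_packet(0, b"x=10"),
    make_packet(0, b"x=10"),
    make_packet(3, b"x=40"),
]
accepted = [receiver.receive(packet) for packet in incoming]
print(accepted, receiver.last_seq, receiver.state)
```

This works well for data where only the latest value matters (like positions). Events that must not be lost need acknowledgements and retransmission on top.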

Operating systems

How operating systems work is also a huge topic in itself. You will touch many parts of it because a game often has many low-level parts in its code where you must know some basics about operating systems. Especially:

File operations and file systems
Pipes
Multithreading
Memory

Memory handling

In case you need to handle huge amounts of memory, you need to know about memory optimisations and how a computer program usually handles memory allocations on a low level. See the malloc() function,
or jemalloc (popular efficient malloc implementation).

OpenLieroX provides a build with a custom malloc implementation for debugging. See file memstats.cpp.

Parallel computing

Parallel computing is where many computations are done simultaneously in parallel.
Games often need to make a lot of very heavy and intense calculations. The trend in computers is to have multiple CPUs (or cores) instead of a single one. To use them all, it is mandatory to write your code in a way that several parts can be executed in parallel. Besides being necessary, this also often leads to cleaner code (if you do it right).

So, what you will learn is how to do parallel computing in general. That is both theoretical and practical knowledge. In practice, parallel computing is often done via multiple threads, i.e. multithreading.
The interesting parts are about how you synchronise your threads and how you communicate your data from one thread to another in a safe way.
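
A simple and safe pattern for passing data between threads is a thread-safe queue: the threads never touch shared variables directly, they only put and get messages. A minimal Python sketch of this pattern (the task of squaring numbers is just a stand-in for real work):

```python
import queue
import threading


def worker(tasks, results):
    """Take numbers from `tasks` until the sentinel None arrives."""
    while True:
        n = tasks.get()
        if n is None:  # sentinel: no more work for this thread
            break
        results.put((n, n * n))


tasks = queue.Queue()
results = queue.Queue()
threads = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(4)]
for thread in threads:
    thread.start()

for n in range(10):
    tasks.put(n)
for _ in threads:
    tasks.put(None)  # one sentinel per worker
for thread in threads:
    thread.join()  # wait until all workers are done

squares = dict(results.get() for _ in range(10))
print(sorted(squares.items()))
```

Note that in standard CPython builds, threads do not speed up CPU-bound Python code because of the global interpreter lock. The example is only meant to show the communication pattern, which carries over to other languages such as C++.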

Version control

You will learn how to work with a version control system, which manages all your source code files and often also other resources of your project. It keeps track of all changes in all files and makes it possible for multiple people to work on the same code. With the history of the changes, you can see who made which changes, and you can hunt down bugs to specific changes. You can also see who wrote which parts of a source file and thus ask them about some details.

Such a system is even very helpful and nice to have if you work alone on a project. And it is absolutely mandatory once you work in a team.

In OpenLieroX, we use Git - a very powerful distributed version control system.
Learning Git is really a whole topic in itself.
You might want to use a platform like GitHub to host your code and to make it available for others.

Testing

You might want to set up some kind of testing for your code, e.g. unit tests or integration tests.
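
As a minimal sketch of what a unit test looks like, here is an example using the unittest module from the Python standard library. The function clamp is a made-up example of a small game helper:

```python
import unittest


def clamp(value, low, high):
    """Limit value to the range [low, high], e.g. to keep a player on screen."""
    return max(low, min(high, value))


class ClampTest(unittest.TestCase):
    def test_inside_range_is_unchanged(self):
        self.assertEqual(clamp(5, 0, 10), 5)

    def test_below_range_is_raised(self):
        self.assertEqual(clamp(-3, 0, 10), 0)

    def test_above_range_is_lowered(self):
        self.assertEqual(clamp(42, 0, 10), 10)


# Run the tests explicitly here. In a real project you would typically run
# `python -m unittest` or a test runner such as pytest instead.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClampTest)
result = unittest.TextTestRunner(verbosity=1).run(suite)
```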

Continuous integration

Continuous integration is a way to automate the testing and building of your code.
E.g. GitHub supports this with GitHub Actions.

Debugging

You will stumble upon bugs and problems in your code. You will need to learn how to find and fix them. This is called debugging. Fortunately, there are many tools available to help you with debugging, such as debuggers. See here for a list.

Profiling

Your code might be too slow (or hang) or use too much memory.
Profiling is a way to find out where your code is slow or uses too much memory. There are many tools available to help you with profiling, usually called profilers. See here for a list.
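
As an example of how a profiler is used, here is a small Python sketch with cProfile from the standard library. The functions slow_sum and main are made up and stand in for real game code:

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately slow function that shows up in the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    return sum(slow_sum(20_000) for _ in range(20))


profiler = cProfile.Profile()
profiler.enable()
answer = main()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

The report lists how often each function was called and how much time was spent in it, which tells you where optimisation is worth the effort.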

Media design

A game usually needs graphics, sounds, levels or other media. Those need to be created and designed.

There are also many existing assets available online
(see this list).

Story and game design

Depending on the type of game, you need a story.
Game design is a whole topic in itself.

Teamwork

A project of the size of OpenLieroX needs teamwork. Various people doing various different tasks (coding, designing, distribution, support, testing) need to coordinate. This is even more important for the coding which can be the biggest of those tasks. Once there are multiple people working on the same code, they need to coordinate. The source code version control management system is very central for this
(see version control)
(but this is useful even if you are working alone on a project).

Project management

Once the project is getting bigger, you need to manage the project, e.g. keep track of the tasks, bugs, features, etc.
Platforms like GitHub provide some tools for this
such as an issue tracker.
Project management is again a whole topic in itself.

Where to start

See Coding Game Intro. You could start some own simple game project
(some ideas),
or take an existing open source game and modify it
(examples), or contribute to an existing open source game project.

Reading code of existing games

To be able to do something useful with the code, it is very helpful, if not mandatory, to have a rough overview of the code. Many projects have an overview document. Also see here.

Contributing to an existing project

Many projects have a document like CONTRIBUTING.md describing how to contribute. It usually also makes sense to speak to someone from the project first. Also see here.

Setting up a development environment

For a beginner, this is often the most complicated part.
It is often not straightforward to set up an IDE / editor / compiler to build the source code.
This is somewhat silly but you have to get through that.
(The compiler will convert the source code into an executable binary; on Windows, that is an EXE-file.)

The IDE is the editor where you edit the code. In theory, you could also use a simple text editor; however, an IDE will be much more helpful and easier to use.

In some cases (like MSVC), the IDE is bundled together with the compiler. In some other cases (on Linux), they are highly separate and independent from each other. That means that setting up the compiling is in some cases (like MSVC or Xcode) the same as setting up the IDE, in some cases (like on Linux) independent.

However, compiling the source code is the most important thing you need to be able to do. You cannot do anything if you are not able to do this.

Projects usually provide some documentation how to set up the development environment.

When you start a new project, it makes sense to see some simple examples how to set up the development environment.
Usually you would create a new version control repository, which is then an empty directory, and you would put your stuff into it. Also see here.

It might also make sense to set up some generic playground environment to test minimal ideas, code snippets, etc.
See here.

Playing around with Ultra HDR

Albert Zeyer — Sat, 27 Jan 2024 14:43:03 +0000

This is the ultrahdr.py script, which generates this HDR-demo.

(Check gregbenzphotography.com/hdr/ whether you can display HDR properly.)

High dynamic range (HDR) extends the usual color range (standard dynamic range, SDR) and usually also extends the common 8-bit color depth to 10 bits or more.

Some modern displays (~2021) (e.g. MacBook M1, some OLED TVs) support HDR, but it is still a rare feature.

There are multiple formats for HDR images, e.g.:

OpenEXR
AVIF
JPEG XT embedded in JPEG XL
JPEG XR
Ultra HDR (used here) embedded in standard JPEG

Ultra HDR uses the JPEG multi-picture format (MPF). It stores the normal SDR JPEG image as the first image, so all existing JPEG decoders can display the normal image. Then it stores an HDR gain map embedded in MPF which can be used to reconstruct the HDR image.

Currently (end of 2023), Google Chrome stable supports this format.
(Another alternative in Google Chrome is AVIF.)
(Firefox currently does not support it.)

Currently (end of 2023), Google Pixel phones can capture Ultra HDR images (e.g. when they use night mode).

(Note, many websites, e.g. Twitter, will reencode JPEGs after you upload them, and often they don't support Ultra HDR yet, so then it will be lost, and you will just see the normal SDR JPEG image.)

About the Ultra HDR format:
https://developer.android.com/media/platform/hdr-image-format

This document defines the behavior of a new file format that encodes a logarithmic range gain map image in a JPEG image file. Legacy readers that don't support the new format read and display the conventional low dynamic range image from the image file. Readers that support the format combine the primary image with the gain map and render a high dynamic range image on compatible displays.
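
To make the gain map idea a bit more concrete, here is a simplified per-pixel sketch in Python. It assumes that the gain map stores, normalized to [0, 1], the log2 ratio between HDR and SDR luminance within a range given by a minimum and a maximum boost. The actual format defines further parameters that are left out here, and all function and parameter names are made up for this illustration. See the specification linked above for the real definition.

```python
import math


def encode_gain(sdr, hdr, min_boost=1.0, max_boost=4.0):
    """Map the HDR/SDR luminance ratio to a gain map value in [0, 1].

    Simplified model (assumption): linear luminance values > 0, no offsets,
    no gamma. min_boost/max_boost are illustrative parameters.
    """
    log_min, log_max = math.log2(min_boost), math.log2(max_boost)
    normalized = (math.log2(hdr / sdr) - log_min) / (log_max - log_min)
    return min(1.0, max(0.0, normalized))  # clamp to the representable range


def apply_gain(sdr, gain, min_boost=1.0, max_boost=4.0):
    """Reconstruct the HDR luminance from the SDR pixel and its gain value."""
    log_min, log_max = math.log2(min_boost), math.log2(max_boost)
    log_boost = log_min + gain * (log_max - log_min)
    return sdr * 2.0 ** log_boost


gain = encode_gain(sdr=0.5, hdr=1.0)      # HDR pixel is twice as bright
restored = apply_gain(sdr=0.5, gain=gain)
print(gain, restored)
```

A legacy decoder would just show the SDR value, while an HDR-aware decoder multiplies it by the boost derived from the gain map.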

To use the simple script here, for preparation:

First, build this: https://github.com/google/libultrahdr
Make sure FFmpeg is installed

This script does nothing fancy: It just upscales the input JPEG color range (FFmpeg does that here currently) and then encodes the HDR gain map using Google's Ultra HDR encoder (libultrahdr). The effect is that the image will display brighter on HDR displays.

Python Preloaded

Albert Zeyer — Sun, 09 Oct 2022 12:01:51 +0000

Repository: https://github.com/albertz/python-preloaded

Problem:

The startup time of CPython including loading big libraries like PyTorch or TensorFlow is too slow. In case of slow file systems, I have seen startup times including such import of 10-20 seconds.

Very simple idea:

Keep the state of CPython right after we imported the big libraries and make it available instantly when needed. When loading the state, we can continue to run any random Python script (we can use runpy).

Install

pip install preloaded

Method 1: Fork server

Start CPython and import the libraries. Then keep the process running as a fork server. Whenever a new instance is needed, we make a fork (os.fork), and apply a similar logic as reptyr. Some technical details are here.

This solution is very portable across Unix. I tested it so far on Linux and MacOSX, but it should run on most other Unixes as well.
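
The core of the idea can be sketched in a few lines of Python. This is not the actual implementation from the repository: the real tool keeps the server running in the background and hands over a PTY via a socket, which is all left out here. The sketch only shows how a process that has already imported a library can fork and run an arbitrary script via runpy. The name run_in_fork is made up, and json stands in for a slow-to-import library:

```python
import json  # stand-in for a slow import such as tensorflow (assumption)
import os
import runpy
import sys
import tempfile


def run_in_fork(script_path, argv=()):
    """Fork the preloaded interpreter, run script_path in the child.

    Returns the exit code of the child. Unix only (uses os.fork).
    """
    sys.stdout.flush()
    sys.stderr.flush()
    pid = os.fork()
    if pid == 0:  # child: inherits all modules that are already imported
        status = 0
        try:
            sys.argv = [script_path, *argv]
            runpy.run_path(script_path, run_name="__main__")
        except SystemExit as exc:
            code = exc.code
            status = code if isinstance(code, int) else (0 if code is None else 1)
        except BaseException:
            import traceback
            traceback.print_exc()
            status = 1
        finally:
            sys.stdout.flush()
            sys.stderr.flush()
            os._exit(status)  # never return into the parent's code path
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) if os.WIFEXITED(wait_status) else -1


# Demo: write a tiny script, run it in a forked child, check what it did.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "demo.py")
    marker = os.path.join(tmp, "marker.txt")
    with open(script, "w") as f:
        f.write(
            "import json, sys\n"  # already loaded in the parent, so this is instant
            f"with open({marker!r}, 'w') as out:\n"
            "    out.write(json.dumps([1, 2]))\n"
            "sys.exit(3)\n"
        )
    exit_code = run_in_fork(script)
    with open(marker) as f:
        marker_text = f.read()
print(exit_code, marker_text)
```

Because the child is a copy of the parent process, the import inside the script costs almost nothing, which is where the speedup comes from.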

Example

Create the starter script python-tf.bin:

$ py-preloaded-bundle-fork-server.py tensorflow -o python-tf.bin

This starter script is supposed to be a drop-in replacement for python itself.

For testing, there is demo-import-tensorflow.py, with only the following content:

import tensorflow as tf
print("TF:", tf.__version__)

Now try to run it directly, and measure the time:

$ time python3 demo-import-tensorflow.py
TF: 2.3.0

________________________________________________________
Executed in    8.31 secs    fish           external
   usr time    3.39 secs  278.00 micros    3.39 secs
   sys time    0.67 secs   83.00 micros    0.67 secs

This is on a slow filesystem, NFS specifically. This is already after the files are cached (I just ran the same command immediately before). Otherwise, the startup time is even over 14 seconds.

The starter script was not run yet, so the first start is just as slow:

$ time ./python-tf.bin demo-import-tensorflow.py
Existing socket but can not connect: [Errno 111] Connection refused
Import module: tensorflow
TF: 2.3.0

________________________________________________________
Executed in    8.35 secs    fish           external
   usr time    3.19 secs  768.00 micros    3.19 secs
   sys time    0.72 secs  228.00 micros    0.72 secs

Now it is running in the background. It is in no way fixed to demo-import-tensorflow.py but could also run any other script now. However, we continue the demo with the same script:

$ time ./python-tf.bin demo-import-tensorflow.py
Existing socket, connected
Open new PTY
Send PTY fd to server
Wait for server to be ready
Entering PTY proxy loop
TF: 2.3.0

________________________________________________________
Executed in  261.56 millis    fish           external
   usr time   64.24 millis  542.00 micros   63.70 millis
   sys time   33.59 millis  163.00 micros   33.43 millis

As you see, the startup time is now very fast. This is also just as fast when executed at a later time, when the files are not cached anymore.

Interactively test the starter script environment:

$ ./python-tf.bin -m IPython

Method 2: Process pool

We always keep some pool (e.g. N=10 instances) of CPython + preloaded libraries alive in the background, and once we need a new instance, we just pick one from the pool.

This shares a lot of logic with the fork server. The main difference basically is that we use subprocess.Popen instead of os.fork.

(Currently not implemented)

Method 3: Program checkpoint on disk

Use some checkpointing tool (CRIU) to store the state of CPython right after we imported the libraries. Then later we can load this checkpoint (very fast).

CRIU currently needs root access for dump/restore. However, there is ongoing work to support a non-root option in https://github.com/checkpoint-restore/criu/pull/1930.

Or maybe DMTCP is a better alternative to CRIU?

(Currently incomplete)

Managing paper bibliography, reading list

Albert Zeyer — Sun, 23 Jan 2022 10:46:06 +0000

(Meta question: Maybe this community is not the right place to ask, as this question is more about academic research, less about software development. But I really don't know where to ask. StackExchange will close this as opinion-based. Quora is only for short questions (asked here). Reddit maybe but not sure where exactly. Twitter maybe, as this would reach some relevant people, but this is again only for short questions.)

So far, when I stumble upon a nice paper (which happens every time I open my Twitter timeline), I put it on a reading list, which is a Google Doc, or leave the paper open in a tab until I get time to read it later. That's how I easily end up with 100s of open tabs. Some of them I read, and then I also put them into other Google Docs for the specific research field I work on that they relate to. I also have a huge Bibtex file which I edit by hand, where I put the papers that I cite in my own papers. I further have some scripts which organize, unify and deduplicate entries in that Bibtex file.

Google Doc is online, and the Bibtex file in some Git repo, so everything is properly synchronized. This is an important requirement.

However, it doesn't scale well and is annoying to organize. I have all these tabs open because it is way too cumbersome to open the Google Doc and put a paper in there for potential later reading, maybe with some note on where I found it or why I think it is relevant for me. Then, every month or so, I go through all these tabs and put them into the Google Doc, which is pretty annoying.

So now I am considering using Zotero. But this lacks a lot of what I do currently, e.g. running my scripts to unify entries. I saw that there is a plugin architecture, and the Zotero-better-bibtex plugin, and a Python script using that debug bridge. So probably I can use that as a base to make my existing scripts work. But I am not sure if I should work on that now. I am also a bit surprised that I would still need to put so much of my own work into it, and I wonder how other people are doing this.

Also, importing a paper in Zotero is still not so easy. E.g. when I have some Arxiv page open, ideally I want to do a single click to import it. But now I need to click the "export as bibtex" button, copy that into clipboard, then import that in Zotero, and then currently manually edit the entry to be consistent. Edit: I just needed a reload of the Arxiv page after I enabled the Chrome extension. After that, the import properly works now with a single click, which is nice. It still might need some post-editing, e.g. changing "Journal article" to "Report", and maybe other things.

Maybe I have not found the right plugins yet. Or maybe there are also better solutions than Zotero. Or maybe I do things wrong. I would just like to know how others are doing this.

Fastest implementation of `ast.literal_eval`

Albert Zeyer — Thu, 02 Sep 2021 06:51:58 +0000

I asked this as a question on StackOverflow and then answered it myself with my own implementation.

I have some text (str, bytes; actually gzipped in a file on disk) which can be parsed via ast.literal_eval.

(It consists of a list of dicts, where the dict keys are strings, and the values are strings, ints or floats. But maybe this question could be generic for any string which can be parsed via ast.literal_eval.)

It is large: ~22MB uncompressed.

What is the fastest way to parse it?

Surely I can use ast.literal_eval, but this seems quite slow. Standard eval is slightly faster (interestingly, but probably as expected, depending how well you know Python; see the implementation of ast.literal_eval) but still slow.

In comparison, when I serialize the same data as JSON, and then load the JSON (json.loads), this is way faster (>10x). So this shows that in principle it should be possible to parse it just as fast.

Some statistics:

Gunzip + read time: 0.15111494064331055
Size: 22035943
compile: 3.1023156170000004
parse: 3.3381092380000004
eval: 3.0252232049999996
ast.literal_eval: 3.765798232
json.loads: 0.2657175249999994

This benchmark script and also a script to generate such a dummy text file can be found: here
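
For a quick impression without the full benchmark, here is a much smaller self-contained Python sketch of the same kind of comparison. It is not the linked benchmark script. The synthetic data (and its size) is made up to match the description above, and the absolute timings will differ from the numbers shown here:

```python
import ast
import json
import timeit

# Synthetic data: a list of dicts with str keys and str/int/float values.
data = [{"name": f"item{i}", "count": i, "score": i / 7} for i in range(20_000)]
py_text = repr(data)          # Python literal syntax
json_text = json.dumps(data)  # the same data as JSON

parsers = {
    "eval": lambda: eval(py_text),  # fast-ish, but unsafe on untrusted input
    "ast.literal_eval": lambda: ast.literal_eval(py_text),
    "json.loads": lambda: json.loads(json_text),
}
for name, func in parsers.items():
    seconds = timeit.timeit(func, number=1)
    print(f"{name}: {seconds:.3f}")
```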

(Maybe the answer is: "this needs a faster C implementation; no-one has implemented that yet")

After posting this, I found some related questions. I did not find them via Google though (maybe my search term "faster literal_eval" was bad).

This partly answers the question on why ast.literal_eval is slow.

Also, if you are wondering whether Python code (e.g. via repr) is a good human-readable serialization format, this basically tells you: better use JSON instead.

So, to the best of my knowledge, there was no faster implementation than ast.literal_eval at that time (well, eval itself is a bit faster, but unsafe).

So I implemented my own simple implementation, which converts the literal Python code into equivalent binary Pickle data.
So, for some bytes data, instead of ast.literal_eval(data.decode("utf8")), you would use pickle.loads(py_to_pickle(data)), and get a speedup of about 5.5x.

The repo is here.
This is a quite straight-forward implementation in C++, and you can easily directly use it with ctypes (there is an example in the repo).

New statistics:

Gunzip + read time: 0.1663219928741455
Size: 22540270
py_to_pickle: 0.539439306
pickle.loads+py_to_pickle: 0.7234611099999999
compile: 3.3440755870000003
parse: 3.6302585899999995
eval: 3.306765757000001
ast.literal_eval: 4.056752016000003
json.loads: 0.3230752619999997
pickle.loads: 0.1351051709999993
marshal.loads: 0.10351717500000035

Difference of FICLONE vs FICLONERANGE vs copy_file_range (for copy-on-write support)

Albert Zeyer — Thu, 02 Sep 2021 06:43:31 +0000

This was originally a StackOverflow question by me, although it was maybe way too specific for StackOverflow, and too long. I then later also answered it myself. So the answer comes below.
I feel this might be interesting for some developers, so this is why I post it here.

I wonder about an efficient way to copy files (on Linux, on a FS which supports copy-on-write (COW)).
Specifically, I want that my implementation uses copy-on-write if possible, but otherwise falls back to other efficient variants. Specifically, I also care about server-side copy (supported by SMB, NFS and others), and also zero-copy (i.e. bypassing the CPU or memory if possible).

(This question is not really specific to any programming language. It could be C or C++, but also any other like Python, Go or whatever has bindings to the OS syscalls, or has any way to do a syscall. If this is confusing to you, just answer in C.)

It looks like ioctl_ficlonerange, ioctl_ficlone (i.e. ioctl with FICLONE or FICLONERANGE) support copy-on-write (COW). Specifically FICLONE is used by GNU cp (here, via --reflink).

Then there is also copy_file_range, which also seems to support COW, and server-side-copy.
(LWN about copy_file_range.)

It sounds as if copy_file_range is more generic (e.g. it supports server-side-copy; not sure if that is supported by FICLONE).

However, copy_file_range seems to have some issues.
E.g. here, Paul Eggert comments:

[copy_file_range]'s man page
says it uses a size_t (not off_t) to count the number of bytes to be
copied, which is a strange choice for a file-copying API.

Are there situations where FICLONE would work better/different than copy_file_range?

Are there situations where FICLONE would work better/different than FICLONERANGE?

Specifically, assuming the underlying FS supports this, and assume you want to copy a file. I ask about the support of these functions for the functionality of:

Copy-on-write support
Server-side copy support
Zero-copy support

Are they (FICLONE, FICLONERANGE, copy_file_range) always performing exactly the same operation? (Assuming the underlying FS supports copy-on-write, and/or server-side copy.)

Or are there situations where it makes sense to use copy_file_range instead of FICLONE? (E.g. COW only works with copy_file_range but not with FICLONE. Or the other way around. Or can this never happen?)

Or formulating the same question differently: Would copy_file_range always be fine, or are there situations where I would want to use FICLONE instead?

Why does GNU cp use FICLONE and not copy_file_range? (Is there a technical reason, or is this just historic?)

Related: GNU cp originally did not use reflink by default (see comment by the GNU coreutils maintainer Pádraig Brady).
However, that was changed recently (this commit, bug report 24400), i.e. COW behavior is the default now (if possible) (--reflink=auto).

Related discussion about FICLONE vs copy_file_range by Python developers. I.e. this seems to be a valid question, and it's not totally clear whether to use FICLONE or copy_file_range.

Related Syncthing documentation about the choice of methods for copying data between files, and
Syncthing issue about copy_file_range and others for efficient file copying, e.g. with COW support.
It also suggests that it is not so clear that FICLONE would do the same as copy_file_range, so their solution is to just try all of them, and fall back to the next one, in this order:
ioctl (with FICLONE), copy_file_range, sendfile, duplicate_extents, standard.

Related issue by Go developers on the usage of copy_file_range.
It sounds as if they agree that copy_file_range is always to be preferred over sendfile.

The answer:

See the Linux vfs doc about copy_file_range, remap_file_range, FICLONERANGE, FICLONE and FIDEDUPERANGE.

Then see
vfs_copy_file_range. This first tries to call remap_file_range if possible.

FICLONE calls ioctl_file_clone (here),
and FICLONERANGE calls ioctl_file_clone_range.
ioctl_file_clone_range calls the more generic ioctl_file_clone (here).
ioctl_file_clone calls vfs_clone_file_range (here).
vfs_clone_file_range calls do_clone_file_range and that calls remap_file_range (here).

I.e. that answers the question. copy_file_range is more generic, and anyway tries to call remap_file_range (i.e. the same as FICLONE/FICLONERANGE) first internally.

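Both the copy_file_range syscall and the generic FICLONE ioctl first appeared in Linux 4.5. However, FICLONE was available before that as the Btrfs-private BTRFS_IOC_CLONE, i.e. it might be possible that copy_file_range is not available in your kernel but the clone ioctl is (on Btrfs).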

In any case, if copy_file_range is available, it should be the best solution.

The order done by Syncthing (ioctl (with FICLONE), copy_file_range, sendfile, duplicate_extents, standard) makes sense.
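
As a rough illustration of such a fallback chain, here is a minimal Python sketch. It is not Syncthing's code and not what GNU cp does. It tries FICLONE, then copy_file_range, then a plain read/write loop; sendfile and duplicate_extents are left out to keep it short. The names copy_file, _copy_fd, _rewind, CHUNK and BUF are made up for this example, and the FICLONE constant is hard-coded because Python's fcntl module does not export it.

```python
import fcntl
import os

# FICLONE is _IOW(0x94, 9, int) in <linux/fs.h>. The value is hard-coded here
# (valid on x86 and ARM; some architectures encode ioctl numbers differently).
FICLONE = 0x40049409
CHUNK = 1 << 30  # upper bound per copy_file_range call (arbitrary choice)
BUF = 1 << 20    # buffer size for the plain read/write fallback (arbitrary)


def _rewind(src_fd, dst_fd):
    """Undo a partial copy so the next method starts from a clean state."""
    os.lseek(src_fd, 0, os.SEEK_SET)
    os.lseek(dst_fd, 0, os.SEEK_SET)
    os.ftruncate(dst_fd, 0)


def _copy_fd(src_fd, dst_fd):
    # 1. Whole-file clone (reflink). Fails with OSError if the FS or the
    #    platform does not support it, or if the files are on different FSs.
    try:
        fcntl.ioctl(dst_fd, FICLONE, src_fd)
        return "ficlone"
    except OSError:
        pass

    # 2. copy_file_range (Python 3.8+, Linux). It may copy fewer bytes than
    #    requested, so it has to be called in a loop.
    if hasattr(os, "copy_file_range"):
        try:
            remaining = os.fstat(src_fd).st_size
            while remaining > 0:
                n = os.copy_file_range(src_fd, dst_fd, min(remaining, CHUNK))
                if n == 0:
                    break  # source ended before st_size bytes were copied
                remaining -= n
            if remaining == 0:
                return "copy_file_range"
        except OSError:
            pass
        _rewind(src_fd, dst_fd)

    # 3. Plain read/write loop as the last resort.
    while True:
        buf = os.read(src_fd, BUF)
        if not buf:
            return "read_write"
        view = memoryview(buf)
        while view:
            view = view[os.write(dst_fd, view):]  # handle short writes


def copy_file(src_path, dst_path):
    """Copy src_path to dst_path and return the name of the method that worked."""
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            return _copy_fd(src_fd, dst_fd)
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)
```

Since copy_file_range tries the clone path internally anyway, the explicit FICLONE step mostly matters on kernels where copy_file_range is missing. The return value only tells you which step succeeded, which can be handy for checking what a given FS actually supports.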

How to clean up a branch (PR) with a huge number of commits

Albert Zeyer — Thu, 02 Sep 2021 06:36:01 +0000

I was trying to implement a new feature in a larger, somewhat messy project (RETURNN, but that is not so relevant).

So I created a new branch, also made a GitHub draft PR (here), and started working on it.

While working on it, it turned out that I needed several other things fixed or extended first. There was not really a clear boundary between these other things (i.e. whether they are to be considered as totally independent), nor was it clear from the beginning what was actually needed. This only became clearer along the way.

In other cases, this was still not too much, and easy to manage. But for this particular thing, it became quite extreme.

So now I have 90 commits (when you look at the PR at some later point, maybe fewer, because I already cleaned up).

What is a good strategy now to handle them? Some of them can be squashed together. The PR can also be split up because they touch upon several things, so each thing maybe could be moved over into an individual branch (and PR). But they partly also depend on each other in the way that they use a newly introduced feature/function, or tests would fail without the other thing, or so.

Squashing alone can potentially clean up a lot, because there are some commits which introduce TODO comments, and later commits which implement them. Or sometimes I made some change, and then later I rewrote the code to do it differently.

Is there some tool which can automatically tell which commits are good candidates to be squashed together? E.g. because they modify code in the same function or in nearby code.

My usual strategy (on smaller PRs) is that I first reorder the commits logically as far as possible, and then in the next step I squash commits together.

When there are other unrelated changes in the PR, I reorder them to be first. And then I create a new branch (and PR) consisting of the first set of changes. Then I wait for the test suite to run through for the new PR. Then I merge the PR into master. Then I rebase the main PR on master. And I repeat doing this process. It can require further work when the tests do not run through on some individual new PR.

This way of working takes lots of time, and I need to wait often. The test cases take about 10 minutes to finish on GitHub CI. So for everything I do, I need to wait 10 minutes. So most of the time I would just wait. This often seems very unproductive to me.

For the reordering and squashing and potential further changes, I use the Git GUI in PyCharm, by interactively rebasing again and again.

Am I doing something suboptimally? How can I do it better?

Are there other tools which can help me somehow?

Are there good strategies in general how to deal with this situation?

Maybe I could have avoided the situation somehow by working in a different way? Usually the branches (PRs) are much smaller, so this is much simpler to handle then. Should I actively try to keep changes minimal in a branch, so I don't end up in such a situation?

I guess one thing which could avoid such a situation is to reduce the amount of technical debt in this project. The technical debt is one of the reasons why it was not clear from the beginning what was actually needed.

Actually, after formulating all this, I remembered that I had already asked a similar question on StackOverflow before, on "How to find pairs/groups of most related commits" (my memory is bad...).
And I also already implemented such a script here, with a description of the algorithm here. The algorithm is quadratic in the number of commits, and somewhat slow. So maybe it is already too slow here.
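
To make the idea of "related commits" concrete, here is a minimal Python sketch of one possible scoring scheme. It is not the script linked above, and it is much coarser than comparing functions or nearby lines: it only scores each pair of commits by the Jaccard similarity of the sets of files they touch. The function names touched_files and rank_pairs are made up for this example. Like the algorithm mentioned above, it is quadratic in the number of commits.

```python
import subprocess
from itertools import combinations


def touched_files(rev_range, repo="."):
    """Map each commit hash in rev_range to the set of paths it touches."""
    out = subprocess.run(
        # %x00 prefixes every commit hash with a NUL byte, which makes the
        # output easy to split into one chunk per commit.
        ["git", "-C", repo, "log", "--name-only", "--format=%x00%H", rev_range],
        check=True, capture_output=True, text=True,
    ).stdout
    commits = {}
    for chunk in out.split("\x00"):
        lines = [line for line in chunk.splitlines() if line]
        if lines:
            commits[lines[0]] = set(lines[1:])
    return commits


def rank_pairs(commits):
    """Return (score, a, b) for all commit pairs, most similar first.

    The score is the Jaccard similarity of the two sets of touched paths:
    1.0 means both commits touch exactly the same files, 0.0 means no overlap.
    """
    scored = []
    for a, b in combinations(commits, 2):
        union = commits[a] | commits[b]
        score = len(commits[a] & commits[b]) / len(union) if union else 0.0
        scored.append((score, a, b))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored
```

It could be used as rank_pairs(touched_files("master..HEAD")), and the top entries are then candidates to look at for squashing. File-level overlap will produce false positives in a project with a few large files, so this is only a starting point for a manual review.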

(Note that this is a cross-post from Reddit. But so far there is no really good solution.)

(I asked before whether dev.to is a good fit for a question like this. I usually use StackOverflow but usually they do not want such opinion-based questions.)

Is dev.to good for asking dev questions?

Albert Zeyer — Wed, 01 Sep 2021 12:14:04 +0000

When I browse dev.to, most posts are just articles. I rarely see that questions are being asked here.

If dev.to is not good for asking questions, what is a good place to ask questions?

Of course there is StackOverflow. But that is only good for questions which have simple, clear, non-opinionated answers. I use StackOverflow quite extensively and it is great for questions which are "allowed". However, I often also have questions about recommendations and opinions, which are strictly not allowed there. But where can I ask such questions?