<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Luca Barbato</title>
    <description>The latest articles on Forem by Luca Barbato (@luzero).</description>
    <link>https://forem.com/luzero</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F153809%2F870c14cf-a8ed-486a-9e26-0cd57543115b.png</url>
      <title>Forem: Luca Barbato</title>
      <link>https://forem.com/luzero</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/luzero"/>
    <language>en</language>
    <item>
      <title>Bringing up an AmpereOne system</title>
      <dc:creator>Luca Barbato</dc:creator>
      <pubDate>Sat, 28 Feb 2026 17:16:52 +0000</pubDate>
      <link>https://forem.com/luzero/bringing-up-an-ampereone-system-1ln6</link>
      <guid>https://forem.com/luzero/bringing-up-an-ampereone-system-1ln6</guid>
      <description>&lt;p&gt;I decided to buy something quite large since my workstations are aging and I wanted to build stuff quickly.&lt;/p&gt;

&lt;p&gt;I settled on an &lt;a href="https://www.asrockrack.com/general/productdetail.asp?Model=AMPONED8-2T/BCM" rel="noopener noreferrer"&gt;AMPONED8-2T/BCM&lt;/a&gt; with a nice 128-core CPU and a bit of RAM.&lt;/p&gt;

&lt;p&gt;It took a bit for the people at &lt;a href="https://www.mifcom.de/" rel="noopener noreferrer"&gt;https://www.mifcom.de/&lt;/a&gt; to source all the components, but eventually I got it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Initial Setup
&lt;/h2&gt;

&lt;p&gt;The board comes with a very pretty BMC, so all you need to do is connect the management port (or any other port, since they are bonded by default), let DHCP assign it an IP address, and point your browser at that address.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fboskg7izb53ibe9suqz8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fboskg7izb53ibe9suqz8.png" alt="BMC" width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since I'm using Gentoo, the &lt;a href="https://www.gentoo.org/downloads/arm64/" rel="noopener noreferrer"&gt;boot media&lt;/a&gt; just worked; you can feed it directly from the Remote KVM.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3do5pwnxg010tth40v6b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3do5pwnxg010tth40v6b.png" alt=" " width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And just boot it from the browser&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2hr12gt71ctr2yqasnw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2hr12gt71ctr2yqasnw.png" alt=" " width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then the process is the usual one&lt;/p&gt;

&lt;h3&gt;
  
  
  Partition and untar
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Partition&lt;/th&gt;
&lt;th&gt;Mount point&lt;/th&gt;
&lt;th&gt;Filesystem&lt;/th&gt;
&lt;th&gt;Recommended size&lt;/th&gt;
&lt;th&gt;gdisk type code&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;p1&lt;/td&gt;
&lt;td&gt;/boot/efi&lt;/td&gt;
&lt;td&gt;vfat&lt;/td&gt;
&lt;td&gt;1 GiB&lt;/td&gt;
&lt;td&gt;EF00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p2&lt;/td&gt;
&lt;td&gt;/&lt;/td&gt;
&lt;td&gt;btrfs&lt;/td&gt;
&lt;td&gt;500+ GiB&lt;/td&gt;
&lt;td&gt;8300&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p3&lt;/td&gt;
&lt;td&gt;/home&lt;/td&gt;
&lt;td&gt;btrfs&lt;/td&gt;
&lt;td&gt;500+ GiB&lt;/td&gt;
&lt;td&gt;8302&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mkfs.vfat &lt;span class="nt"&gt;-n&lt;/span&gt; EFI /dev/nvme0n1p1
&lt;span class="nb"&gt;mkdir&lt;/span&gt; /mnt/btrfs
mkfs.btrfs &lt;span class="nt"&gt;-L&lt;/span&gt; root &lt;span class="nt"&gt;--checksum&lt;/span&gt; xxhash /dev/nvme0n1p2
mount /dev/nvme0n1p2 /mnt/btrfs
btrfs subvolume create /mnt/btrfs/@
btrfs subvolume create /mnt/btrfs/@repos
btrfs subvolume create /mnt/btrfs/@snapshots
btrfs subvolume create /mnt/btrfs/@var_log
btrfs subvolume create /mnt/btrfs/@containers
umount /mnt/btrfs

mount &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;subvol&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;@ /dev/nvme0n1p2 /mnt/gentoo
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /mnt/gentoo/&lt;span class="o"&gt;{&lt;/span&gt;.snapshots,var/&lt;span class="o"&gt;{&lt;/span&gt;db/repos,log&lt;span class="o"&gt;}&lt;/span&gt;,var/lib/containers&lt;span class="o"&gt;}&lt;/span&gt;
mount &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;subvol&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;@snapshots /dev/nvme0n1p2 /mnt/gentoo/.snapshots
mount &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;subvol&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;@repos     /dev/nvme0n1p2 /mnt/gentoo/var/db/repos
mount &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;subvol&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;@var_log   /dev/nvme0n1p2 /mnt/gentoo/var/log
mount &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;subvol&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;@containers /dev/nvme0n1p2 /mnt/gentoo/var/lib/containers

&lt;span class="nb"&gt;cd&lt;/span&gt; /mnt/gentoo
wget &lt;span class="o"&gt;{&lt;/span&gt;stage3&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-xpf&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;stage3&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
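
&lt;p&gt;To get the same layout mounted at boot, the subvolumes need matching entries in &lt;code&gt;/mnt/gentoo/etc/fstab&lt;/code&gt;. A minimal sketch, assuming the &lt;code&gt;EFI&lt;/code&gt; and &lt;code&gt;root&lt;/code&gt; labels set above; the mount options are assumptions, not something taken from the actual setup:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# /etc/fstab sketch: mount options are assumptions, adjust to taste
LABEL=EFI   /boot/efi            vfat   defaults,noatime            0 2
LABEL=root  /                    btrfs  subvol=@,noatime            0 0
LABEL=root  /.snapshots          btrfs  subvol=@snapshots,noatime   0 0
LABEL=root  /var/db/repos        btrfs  subvol=@repos,noatime       0 0
LABEL=root  /var/log             btrfs  subvol=@var_log,noatime     0 0
LABEL=root  /var/lib/containers  btrfs  subvol=@containers,noatime  0 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;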



&lt;h3&gt;
  
  
  Enter the stage
&lt;/h3&gt;

&lt;p&gt;Usual chroot&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Mount special filesystems&lt;/span&gt;
mount &lt;span class="nt"&gt;--types&lt;/span&gt; proc /proc /mnt/gentoo/proc
mount &lt;span class="nt"&gt;--rbind&lt;/span&gt; /sys /mnt/gentoo/sys
mount &lt;span class="nt"&gt;--make-rslave&lt;/span&gt; /mnt/gentoo/sys
mount &lt;span class="nt"&gt;--rbind&lt;/span&gt; /dev /mnt/gentoo/dev
mount &lt;span class="nt"&gt;--make-rslave&lt;/span&gt; /mnt/gentoo/dev
mount &lt;span class="nt"&gt;--bind&lt;/span&gt; /run /mnt/gentoo/run
mount &lt;span class="nt"&gt;--make-slave&lt;/span&gt; /mnt/gentoo/run

&lt;span class="c"&gt;# copy the nameservers already present&lt;/span&gt;
&lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="nt"&gt;--dereference&lt;/span&gt; /etc/resolv.conf /mnt/gentoo/etc/

&lt;span class="c"&gt;# Enter the chroot&lt;/span&gt;
&lt;span class="nb"&gt;env&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;HOME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/root &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;TERM&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$TERM&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;PS1&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"(chroot) &lt;/span&gt;&lt;span class="nv"&gt;$PS1&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nb"&gt;chroot&lt;/span&gt; /mnt/gentoo /bin/bash &lt;span class="nt"&gt;--login&lt;/span&gt;

&lt;span class="c"&gt;# Inside chroot now:&lt;/span&gt;
&lt;span class="nb"&gt;source&lt;/span&gt; /etc/profile
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PS1&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"(chroot) &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PS1&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Initial bits to set up
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;emerge &lt;span class="nt"&gt;--sync&lt;/span&gt;
getuto
&lt;span class="c"&gt;# Install your favourite editor, even from the binhost&lt;/span&gt;
emerge &lt;span class="nt"&gt;-G&lt;/span&gt; vim
&lt;span class="c"&gt;# do your changes&lt;/span&gt;
vim /etc/portage/make.conf
&lt;span class="c"&gt;# minimal and simple setup to boot&lt;/span&gt;
emerge gentoo-sources grub dracut
eselect kernel &lt;span class="nb"&gt;set &lt;/span&gt;1
&lt;span class="nb"&gt;cd&lt;/span&gt; /usr/src/linux
&lt;span class="c"&gt;# the default config works well enough&lt;/span&gt;
zcat /proc/config.gz &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .config
make olddefconfig
&lt;span class="c"&gt;# add what you need for e.g. docker or lxd&lt;/span&gt;
make menuconfig
make &lt;span class="nt"&gt;-j&lt;/span&gt; 128 &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; make &lt;span class="nt"&gt;-j128&lt;/span&gt; modules &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; make modules_install &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; make &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;span class="c"&gt;# set up your user&lt;/span&gt;
emerge superadduser
superadduser
&lt;span class="c"&gt;# set up ssh&lt;/span&gt;
emerge openssh metalog
rc-update add sshd
rc-update add metalog
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
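
&lt;p&gt;What goes into &lt;code&gt;make.conf&lt;/code&gt; is a matter of taste. A minimal sketch for a 128-core machine; every value here is an assumption for illustration, not the configuration actually used:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# /etc/portage/make.conf (illustrative values only)
COMMON_FLAGS="-O2 -pipe"
CFLAGS="${COMMON_FLAGS}"
CXXFLAGS="${COMMON_FLAGS}"
# one build job per core
MAKEOPTS="-j128"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;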



&lt;p&gt;All things considered, the whole process was quite uneventful, and afterwards it took literally minutes to build additional toolchains and start running some tests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;qlop &lt;span class="nt"&gt;-c&lt;/span&gt; | egrep &lt;span class="s1"&gt;'llvm|rust'&lt;/span&gt;
app-eselect/eselect-rust: 2′38″ average &lt;span class="k"&gt;for &lt;/span&gt;1 merge
dev-lang/rust: 15′19″ average &lt;span class="k"&gt;for &lt;/span&gt;1 merge
dev-lang/rust-bin: 13′21″ average &lt;span class="k"&gt;for &lt;/span&gt;2 merges
dev-lang/rust-common: 42s average &lt;span class="k"&gt;for &lt;/span&gt;1 merge
dev-util/rustup: 1′42″ average &lt;span class="k"&gt;for &lt;/span&gt;1 merge
llvm-core/clang: 4′21″ average &lt;span class="k"&gt;for &lt;/span&gt;2 merges
llvm-core/clang-common: 12s average &lt;span class="k"&gt;for &lt;/span&gt;2 merges
llvm-core/clang-linker-config: 5′53″ average &lt;span class="k"&gt;for &lt;/span&gt;2 merges
llvm-core/clang-toolchain-symlinks: 5s average &lt;span class="k"&gt;for &lt;/span&gt;2 merges
llvm-core/llvm: 9′05″ average &lt;span class="k"&gt;for &lt;/span&gt;2 merges
llvm-core/llvm-common: 2′41″ average &lt;span class="k"&gt;for &lt;/span&gt;1 merge
llvm-core/llvm-toolchain-symlinks: 11s average &lt;span class="k"&gt;for &lt;/span&gt;2 merges
llvm-core/llvmgold: 7s average &lt;span class="k"&gt;for &lt;/span&gt;2 merges
llvm-runtimes/clang-rtlib-config: 5′54″ average &lt;span class="k"&gt;for &lt;/span&gt;2 merges
llvm-runtimes/clang-runtime: 6s average &lt;span class="k"&gt;for &lt;/span&gt;2 merges
llvm-runtimes/clang-stdlib-config: 5′56″ average &lt;span class="k"&gt;for &lt;/span&gt;2 merges
llvm-runtimes/clang-unwindlib-config: 5′58″ average &lt;span class="k"&gt;for &lt;/span&gt;2 merges
llvm-runtimes/compiler-rt: 26s average &lt;span class="k"&gt;for &lt;/span&gt;2 merges
llvm-runtimes/compiler-rt-sanitizers: 37s average &lt;span class="k"&gt;for &lt;/span&gt;2 merges
llvm-runtimes/openmp: 6′10″ average &lt;span class="k"&gt;for &lt;/span&gt;2 merges
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>gentoo</category>
      <category>ampere</category>
    </item>
    <item>
      <title>cargo-c common questions</title>
      <dc:creator>Luca Barbato</dc:creator>
      <pubDate>Mon, 02 Sep 2024 09:44:38 +0000</pubDate>
      <link>https://forem.com/luzero/cargo-c-common-questions-2cdc</link>
      <guid>https://forem.com/luzero/cargo-c-common-questions-2cdc</guid>
      <description>&lt;p&gt;I wrote and maintain &lt;a href="https://crates.io/crates/cargo-c" rel="noopener noreferrer"&gt;cargo-c&lt;/a&gt;, a &lt;a href="https://doc.rust-lang.org/cargo/index.html" rel="noopener noreferrer"&gt;cargo&lt;/a&gt; &lt;del&gt;applet&lt;/del&gt; custom subcommand that lets you build a &lt;a href="https://doc.rust-lang.org/book/ch07-01-packages-and-crates.html" rel="noopener noreferrer"&gt;rust crate&lt;/a&gt; sporting a C-API as a proper C library and install it along with C headers and &lt;a href="https://www.freedesktop.org/wiki/Software/pkg-config/" rel="noopener noreferrer"&gt;pkg-config&lt;/a&gt; files.&lt;/p&gt;

&lt;p&gt;When I notice some rust software sporting a C-API that I'd consider packaging in &lt;a href="https://gentoo.org" rel="noopener noreferrer"&gt;Gentoo&lt;/a&gt;, I tend to provide a patch adding the few metadata entries needed to have &lt;strong&gt;cargo-c&lt;/strong&gt; do all the work. Some other people politely ask whether &lt;strong&gt;cargo-c&lt;/strong&gt; is supported. Sometimes it is an uphill battle, usually because the project maintainer doesn't know that the required effort is minimal or is not aware that setting &lt;code&gt;crate-type=cdylib&lt;/code&gt; is not enough.&lt;/p&gt;

&lt;p&gt;I'm writing this to explain some of the problems &lt;strong&gt;cargo-c&lt;/strong&gt; solves and hopefully give a few pointers, since the &lt;a href="https://github.com/lu-zero/cargo-c/blob/master/README.md" rel="noopener noreferrer"&gt;README&lt;/a&gt;, which contains all the documentation, has perhaps grown fairly big.&lt;/p&gt;

&lt;h2&gt;
  
  
  Top 6 questions, and their answers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why does this exist? Why isn't cargo doing all of this on its own?
&lt;/h3&gt;

&lt;p&gt;It exists because &lt;strong&gt;cargo&lt;/strong&gt; itself does not have all the logic needed to deal with the platform specifics that come when dealing with &lt;a href="https://en.wikipedia.org/wiki/Dynamic_linker" rel="noopener noreferrer"&gt;dynamic linking&lt;/a&gt; and the details that come with &lt;strong&gt;installing&lt;/strong&gt;/&lt;strong&gt;packaging&lt;/strong&gt; a library.&lt;/p&gt;

&lt;p&gt;Depending on the platform, even among those that all use ELF as binary format, you may have different rules on how to encode the version information, e.g. Linux distributions have &lt;a href="https://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html" rel="noopener noreferrer"&gt;rules on setting the version&lt;/a&gt; while Android and FreeBSD follow others.&lt;/p&gt;

&lt;p&gt;macOS uses a different binary format and it has &lt;a href="https://developer.apple.com/library/archive/documentation/DeveloperTools/Conceptual/DynamicLibraries/100-Articles/DynamicLibraryUsageGuidelines.html" rel="noopener noreferrer"&gt;different rules&lt;/a&gt; and so does Windows.&lt;/p&gt;

&lt;p&gt;Linux distributions may have different preferred paths for the libraries, with Debian using a &lt;a href="https://wiki.debian.org/Multiarch/LibraryPathOverview" rel="noopener noreferrer"&gt;multiarch&lt;/a&gt; setup.&lt;/p&gt;

&lt;p&gt;Both dynamic library details and installing/packaging are currently out of &lt;strong&gt;cargo&lt;/strong&gt;'s scope.&lt;/p&gt;

&lt;h3&gt;
  
  
  I prefer doing it on my own, why should I use cargo-c?
&lt;/h3&gt;

&lt;p&gt;With &lt;strong&gt;cargo-c&lt;/strong&gt; I try to use the best practices to support as many platforms as possible, trying to stay in sync with what &lt;a href="https://mesonbuild.com" rel="noopener noreferrer"&gt;meson&lt;/a&gt; does. Sadly, what is conceptually trivial, installing a package, has lots of details that are platform-specific.&lt;/p&gt;

&lt;p&gt;If you want to use &lt;a href="https://crates.io/crates/cdylib-link-lines" rel="noopener noreferrer"&gt;cdylib-link-lines&lt;/a&gt; to solve the library creation problem, you still have to deal with how every platform expects the library to be installed and sadly it gets annoying.&lt;/p&gt;

&lt;h3&gt;
  
  
  I'm fine with just giving a static library, the user can copy it as they want
&lt;/h3&gt;

&lt;p&gt;You might, but the people distributing your software may then have to redo quite a bit of work, or at least maintain a patch to &lt;code&gt;Cargo.toml&lt;/code&gt; that adds the cargo-c metadata. Some might consider that it is not worth the effort and drop your package from the distribution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does cargo-c require lots to set up?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;cargo-c&lt;/strong&gt; is &lt;a href="https://repology.org/project/cargo-c/versions" rel="noopener noreferrer"&gt;widely distributed&lt;/a&gt; so all you need is to add to &lt;code&gt;Cargo.toml&lt;/code&gt; the following lines&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[features]&lt;/span&gt;
&lt;span class="py"&gt;capi&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;capi&lt;/code&gt; feature is used by cargo-c to know which crate in a workspace has to be made into a C library and can be used to keep the C-API within the main crate and enable it only when building with &lt;code&gt;cargo-c&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;There are lots of optional features to accommodate different needs and tune all the details regarding how the library, the headers and the pkg-config file need to be generated and installed.&lt;/p&gt;

&lt;p&gt;A fairly common one is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[package.metadata.capi.header]&lt;/span&gt;
&lt;span class="py"&gt;generation&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This disables header generation through &lt;a href="https://crates.io/crates/cbindgen" rel="noopener noreferrer"&gt;cbindgen&lt;/a&gt;; cargo-c will instead copy the header &lt;code&gt;{crate name}.h&lt;/code&gt; from the &lt;code&gt;assets/&lt;/code&gt; directory.&lt;/p&gt;
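
&lt;p&gt;The other optional tables follow the same pattern. A sketch, assuming a hypothetical crate that should be installed under the name &lt;code&gt;example&lt;/code&gt;; all values are made up and the README remains the authoritative list of keys:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;# Illustrative values only: "example" is a hypothetical name
[package.metadata.capi.header]
# header file name, defaults to the crate name
name = "example"
# subdirectory of the include dir the header is installed into
subdirectory = "example-1.0"

[package.metadata.capi.pkg_config]
# package name and description used in the generated .pc file
name = "example"
description = "An example library with a C-API"

[package.metadata.capi.library]
# library name, may get the lib prefix depending on the platform
name = "example"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;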

&lt;h3&gt;
  
  
  How secure is cargo-c?
&lt;/h3&gt;

&lt;p&gt;In the wake of the &lt;a href="https://cwe.mitre.org/data/definitions/506.html" rel="noopener noreferrer"&gt;xz utils supply chain attack&lt;/a&gt; some people started caring more, so I happened to be asked about it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I rely on &lt;a href="https://deps.rs/crate/cargo-c/0.10.3+cargo-0.81.0" rel="noopener noreferrer"&gt;deps.rs&lt;/a&gt; to ensure &lt;strong&gt;cargo-c&lt;/strong&gt; is not relying on compromised and/or outdated crates.&lt;/li&gt;
&lt;li&gt;I track cargo as my upstream and I try to cut a release every time a new one is released.&lt;/li&gt;
&lt;li&gt;The code is fully auditable on GitHub and the provided binaries are built by the CI, so even that process can be checked.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In general any method that can be used to ensure &lt;code&gt;cargo&lt;/code&gt; itself is not compromised applies to &lt;code&gt;cargo-c&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Since cargo-c is a cargo extension, why not move it into cargo?
&lt;/h3&gt;

&lt;p&gt;In general cargo extensions exist to keep cargo as lean as possible, so the decision to keep the &lt;code&gt;install&lt;/code&gt; subcommand as bare as possible and not deal with all the details I dealt with in &lt;code&gt;cargo-c&lt;/code&gt; is intentional.&lt;/p&gt;

&lt;p&gt;Making C libraries out of crates is not strictly core functionality, and until we have a stable Rust ABI, producing shared/dynamic libraries in a platform-correct fashion is not a priority.&lt;/p&gt;

&lt;p&gt;Probably everything will change once there is enough progress in that regard. If somebody feels that it should be prioritized, I'm open to sponsorship ;)&lt;/p&gt;

&lt;h2&gt;
  
  
  In closing
&lt;/h2&gt;

&lt;p&gt;I hope this article helps people convince maintainers of crates trying to replace a C library to adopt it, since it will spare distributors quite a few headaches and make it much simpler to foster adoption of memory-safer alternatives.&lt;/p&gt;

</description>
      <category>cargo</category>
      <category>rust</category>
    </item>
    <item>
      <title>Is cargo/rustc that slow?</title>
      <dc:creator>Luca Barbato</dc:creator>
      <pubDate>Fri, 30 Aug 2024 15:12:40 +0000</pubDate>
      <link>https://forem.com/luzero/is-cargorustc-that-slow-4mm9</link>
      <guid>https://forem.com/luzero/is-cargorustc-that-slow-4mm9</guid>
      <description>&lt;p&gt;As a side-discussion from the ongoing &lt;a href="https://lore.kernel.org/lkml/20240828211117.9422-1-wedsonaf@gmail.com/" rel="noopener noreferrer"&gt;mess&lt;/a&gt; I read &lt;a href="https://mstdn.party/@tragicomedy/113050650045175106" rel="noopener noreferrer"&gt;this&lt;/a&gt; about &lt;code&gt;Waiting 15 minutes for a small program, e.g. like Yazi to compile is wild.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;I tried, and on my laptop it took about 2 minutes to build in release mode, not too bad.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;❯ cargo build
...
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 27.51s
❯ cargo clean
❯ cargo build --release
...
    Finished `release` profile [optimized] target(s) in 2m 01s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;yazi isn't exactly a small program:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;❯ cargo tree | grep -v yazi | wc -l
     685
❯ tokei
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 JSON                    1            1            1            0            0
 Lua                    34         1973         1639           31          303
 Markdown                2          207            0          149           58
 Nix                     4          270          235            6           29
 Shell                   3           56           41            6            9
 TOML                   18         1819         1484          128          207
 YAML                    1           33           29            1            3
-------------------------------------------------------------------------------
 Rust                  463        27288        22926          280         4082
 |- Markdown             6           56            0           51            5
 (Total)                          27344        22926          331         4087
===============================================================================
 Total                 526        31647        26355          601         4691
===============================================================================

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I looked a bit further and I saw:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[profile.release]&lt;/span&gt;
&lt;span class="py"&gt;codegen-units&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="py"&gt;lto&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;panic&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"abort"&lt;/span&gt;
&lt;span class="py"&gt;strip&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Definitely not friendly if you want a fast release build&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;codegen-units = 1&lt;/code&gt; may generate slightly better code, but then each crate is compiled as a single unit that cannot be spread across cores, which can leave them idle if you have many.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;lto = true&lt;/code&gt; tends to cost a lot of build time for a small gain, but &lt;code&gt;lto = "thin"&lt;/code&gt; exists and its tradeoff tends to be very good: for &lt;code&gt;rav1e&lt;/code&gt; on x86_64 it even managed to produce a better binary for a long while.&lt;/li&gt;
&lt;/ul&gt;
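
&lt;p&gt;A minimal sketch of such a tweak, assuming the only changes are dropping &lt;code&gt;codegen-units = 1&lt;/code&gt; and switching to thin LTO (an assumption, since the exact edits are not spelled out here):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;[profile.release]
# codegen-units is left at its default so each crate can be split across cores
lto   = "thin"
panic = "abort"
strip = true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;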

&lt;p&gt;So what happens if I make those changes?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;❯ cargo build --release
...
    Finished `release` profile [optimized] target(s) in 47.05s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I guess sometimes it is better to not stop at the first program to evaluate if a toolchain is fast or slow :)&lt;/p&gt;

&lt;h2&gt;
  
  
  P.S.
&lt;/h2&gt;

&lt;p&gt;If you are on Linux, at least right now the quickest combination seems to be &lt;code&gt;clang&lt;/code&gt; as linker with &lt;code&gt;mold&lt;/code&gt; doing the actual work.&lt;br&gt;
&lt;code&gt;gcc + mold&lt;/code&gt; seems noticeably slower and &lt;code&gt;clang + ld.bfd&lt;/code&gt; is curiously a tad slower than &lt;code&gt;gcc + ld.bfd&lt;/code&gt;.&lt;br&gt;
Setting&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[target.{yourarch}]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;in &lt;code&gt;~/.cargo/config.toml&lt;/code&gt; may be optimal at least right now.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>cargo</category>
    </item>
    <item>
      <title>cross-stages experiments</title>
      <dc:creator>Luca Barbato</dc:creator>
      <pubDate>Fri, 02 Aug 2024 09:15:40 +0000</pubDate>
      <link>https://forem.com/luzero/cross-stages-experiments-lfm</link>
      <guid>https://forem.com/luzero/cross-stages-experiments-lfm</guid>
      <description>&lt;p&gt;I'm still playing with RISC-V and the &lt;a href="https://docs.banana-pi.org/en/BPI-F3/BananaPi_BPI-F3" rel="noopener noreferrer"&gt;bpi-f3&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Support status for BPI-F3 upstream
&lt;/h2&gt;

&lt;p&gt;The upstreaming effort at least for the kernel is &lt;a href="https://patchwork.kernel.org/project/linux-riscv/list/?series=874775" rel="noopener noreferrer"&gt;ongoing&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://gitee.com/bianbu-linux/" rel="noopener noreferrer"&gt;bsp&lt;/a&gt; seems actively maintained, with changes landing quite often in their forks of opensbi, u-boot and linux. Sadly, being a bsp that covers multiple boards, it tends to lump lots of changes together, so it can get annoying to follow what they are fixing or adding.&lt;/p&gt;

&lt;p&gt;Given that enough open-source developers are getting the board or similar hardware, I hope we'll have enough eyes and hands to clean up the code into a shape in which it can also be upstreamed to &lt;code&gt;u-boot&lt;/code&gt; and &lt;code&gt;opensbi&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  My experiments
&lt;/h2&gt;

&lt;p&gt;I wanted to see how different a whole system built to leverage RVV 1.0, at least a bit, through the compiler would feel. Sadly, until &lt;a href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115789" rel="noopener noreferrer"&gt;this bug&lt;/a&gt; is addressed it will be hard to have a fully working system compiled with the autovectorizer on, and I found that out the hard way.&lt;/p&gt;

&lt;p&gt;The Gentoo default way to build stages and images via &lt;a href="https://wiki.gentoo.org/wiki/Catalyst" rel="noopener noreferrer"&gt;catalyst&lt;/a&gt; relies on &lt;code&gt;qemu-user&lt;/code&gt; when no fast native system is available to avoid pitfalls of packages failing at cross-compilation. It is a bit slow but fairly reliable.&lt;/p&gt;

&lt;p&gt;Since things should improve over time, I tried to leverage &lt;a href="https://wiki.gentoo.org/wiki/Crossdev" rel="noopener noreferrer"&gt;crossdev&lt;/a&gt; and see if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[x] At least the stage1 packages could be cross-compiled and installed via crossdev (&lt;code&gt;perl&lt;/code&gt; seems the only package misbehaving)&lt;/li&gt;
&lt;li&gt;[x] Freshening a stage3 works. As seen with &lt;a href="https://github.com/chewi/cross-boss" rel="noopener noreferrer"&gt;cross-boss&lt;/a&gt; it is feasible; I wanted to see whether it can be made even more straightforward.&lt;/li&gt;
&lt;li&gt;[x] Adding packages and setting up daemons works&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All without using &lt;code&gt;qemu-user&lt;/code&gt;, so it can be done quickly. So far it mostly works.&lt;/p&gt;

&lt;p&gt;On top of it I just cross-compiled the bsp packages and copied the approach they used to assemble the final image using &lt;a href="https://github.com/pengutronix/genimage" rel="noopener noreferrer"&gt;genimage&lt;/a&gt;, trying to make it as simple as possible by using a single configuration file that does everything in a single invocation.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/lu-zero/crossdev-stages" rel="noopener noreferrer"&gt;crossdev-stages&lt;/a&gt; set of scripts should let anybody with a recent Gentoo and the listed dependencies build their images in a reasonable time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;

&lt;p&gt;I prepared 3 scripts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;chroot-stage.sh&lt;/code&gt; that makes it quite easy to fetch a stage3, unpack it and optionally enter it using &lt;code&gt;bubblewrap&lt;/code&gt; (thanks chewi for introducing me to it).
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# If you already have Gentoo you can just fetch the stage&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;bash chroot-stage.sh setup stage3/ riscv64
&lt;span class="c"&gt;# If you do not have Gentoo you can use the script to have a host chroot&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;bash chroot-stage.sh setup my-chroot
&lt;span class="nv"&gt;$ &lt;/span&gt;bash chroot-stage.sh enter my-chroot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;cross-stage.sh&lt;/code&gt; that takes care of all the building, you can go from-scratch or start from a pre-existing stage3
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# This will rebuild all the stage3 using the newer compiler and the CFLAGS you set in the script&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;bash cross-stage.sh update stage3/
&lt;span class="c"&gt;# Install the bare minimum needed to have the bootloader working&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;bash cross-stage.sh install_boot stage3/
&lt;span class="c"&gt;# Install additional packages you may find useful&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;bash cross-stage.sh install_more stage3/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;make-image.sh&lt;/code&gt; that puts everything together and builds the bsp packages.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# It will produce a gentoo-linux-k1_dev-sdcard.img.xz in the build-directory&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;bash make-image.sh /tmp/build-directory stage3/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So far it works for me. You are welcome to try it, help improve it, or just use the pre-made &lt;a href="https://dev.gentoo.org/~lu_zero/gentoo-linux-k1_dev-sdcard.img.xz" rel="noopener noreferrer"&gt;image&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Enjoy!&lt;/p&gt;

</description>
      <category>gentoo</category>
      <category>bpif3</category>
    </item>
    <item>
      <title>Bringing up BPI-F3 - Part 3</title>
      <dc:creator>Luca Barbato</dc:creator>
      <pubDate>Tue, 04 Jun 2024 17:45:32 +0000</pubDate>
      <link>https://forem.com/luzero/bringing-up-bpi-f3-part-3-101h</link>
      <guid>https://forem.com/luzero/bringing-up-bpi-f3-part-3-101h</guid>
      <description>&lt;h2&gt;
  
  
  Initramfs
&lt;/h2&gt;

&lt;p&gt;Initially I was hoping that it would not be needed, but since the SoC has a &lt;a href="https://www.kernel.org/doc/html/latest/staging/remoteproc.html" rel="noopener noreferrer"&gt;remote processor&lt;/a&gt; and the defconfig for it enables it, I guess it is simpler to use an initramfs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Remoteproc firmware
&lt;/h3&gt;

&lt;p&gt;As seen &lt;a href="https://github.com/BPI-SINOVOIP/armbian-build/commit/f4d657eda0400386bb2bf6d4db8798741afae963" rel="noopener noreferrer"&gt;here&lt;/a&gt;, the remoteproc needs a firmware bit, and if you happen to forget about it you'd be greeted by:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[    4.205609] remoteproc remoteproc0: rcpu_rproc is available
[    4.211421] remoteproc remoteproc0: Direct firmware load for esos.elf failed with error -2
[    4.214379] riscv-pmu-sbi: SBI PMU extension is available
[    4.219790] remoteproc remoteproc0: powering up rcpu_rproc
[    4.225306] riscv-pmu-sbi: 16 firmware and 18 hardware counters
[    4.230776] remoteproc remoteproc0: Direct firmware load for esos.elf failed with error -2
[    4.245106] remoteproc remoteproc0: request_firmware failed: -2
[    4.246235] es8326 2-0019: assuming static mclk
[    4.256170] enter spacemit_snd_sspa_pdev_probe
[    4.301833] usb 2-1: new high-speed USB device number 2 using xhci-hcd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you use &lt;a href="https://wiki.gentoo.org/wiki/Dracut" rel="noopener noreferrer"&gt;dracut&lt;/a&gt;, all you need to add to your &lt;code&gt;/etc/dracut.conf.d/firmware.conf&lt;/code&gt; is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;install_items+&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;" /lib/firmware/esos.elf "&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you use &lt;a href="https://wiki.gentoo.org/wiki/Genkernel" rel="noopener noreferrer"&gt;Genkernel&lt;/a&gt;, set in &lt;code&gt;/etc/genkernel.conf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add firmware(s) to initramfs&lt;/span&gt;
&lt;span class="nv"&gt;FIRMWARE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"yes"&lt;/span&gt;

&lt;span class="c"&gt;# Specify directory to pull from&lt;/span&gt;
&lt;span class="nv"&gt;FIRMWARE_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/lib/firmware"&lt;/span&gt;

&lt;span class="c"&gt;# Specify a comma-separated list of firmware files or directories to include,&lt;/span&gt;
&lt;span class="c"&gt;# relative to FIRMWARE_DIR.  If empty or unset, the full contents of &lt;/span&gt;
&lt;span class="c"&gt;# FIRMWARE_DIR will be included (if FIRMWARE option above is set to YES).&lt;/span&gt;
&lt;span class="nv"&gt;FIRMWARE_FILES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"esos.elf"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;as explained &lt;a href="https://wiki.gentoo.org/wiki/Genkernel#Firmware_loading" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Coming next
&lt;/h2&gt;

&lt;p&gt;The remaining bits I'd like to get done are a nicer u-boot configuration and, hopefully, wrapping everything up so we can have a Gentoo image that can simply be flashed to the SD/eMMC/NVMe.&lt;/p&gt;

</description>
      <category>riscv</category>
      <category>bpif3</category>
      <category>gentoo</category>
    </item>
    <item>
      <title>Bringing up BPI-F3 - Part 2.5</title>
      <dc:creator>Luca Barbato</dc:creator>
      <pubDate>Fri, 24 May 2024 07:31:29 +0000</pubDate>
      <link>https://forem.com/luzero/bringing-up-bpi-f3-part-25-27o4</link>
      <guid>https://forem.com/luzero/bringing-up-bpi-f3-part-25-27o4</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;this is a sort of intermission&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Getting perf to work up to a point
&lt;/h2&gt;

&lt;p&gt;Apparently the &lt;a href="https://github.com/riscv-software-src/opensbi/blob/master/docs/pmu_support.md" rel="noopener noreferrer"&gt;opensbi-mediated&lt;/a&gt; access to the performance counters is not mapped in a way that lets the usual &lt;code&gt;cycles&lt;/code&gt; and &lt;code&gt;instructions&lt;/code&gt; events work in &lt;code&gt;perf record&lt;/code&gt;. I got this board mainly to help with &lt;a href="https://code.videolan.org/videolan/dav1d" rel="noopener noreferrer"&gt;dav1d&lt;/a&gt; development efforts, so not having perf support would make it harder to reason about performance.&lt;/p&gt;

&lt;p&gt;The best workaround, after a &lt;a href="https://forum.banana-pi.org/t/perf-record-does-not-seem-to-work-is-the-device-tree-with-the-wrong-information/18051/11" rel="noopener noreferrer"&gt;discussion in the forums&lt;/a&gt;, is to build the &lt;code&gt;pmu-events&lt;/code&gt; so that they include the custom ones, and then rely on the overly precise cpu-specific events instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ perf list | grep cycle
  bus-cycles                                         [Hardware event]
  cpu-cycles OR cycles                               [Hardware event]
  ref-cycles                                         [Hardware event]
  stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
  stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
  m_mode_cycle
       [M-mode cycles]
  rtu_flush_cycle
  s_mode_cycle
       [S-mode cycles]
  stalled_cycle_backend
       [Stalled cycles backend]
  stalled_cycle_frontend
       [Stalled cycles frontend]
  u_mode_cycle
       [U-mode cycles]
  vidu_total_cycle
  vidu_vec0_cycle
  vidu_vec1_cycle
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ perf list | grep inst
  branch-instructions OR branches                    [Hardware event]
  instructions                                       [Hardware event]
  br_inst
       [Branch instructions]
  cond_br_inst
       [Conditional branch instructions]
  indirect_br_inst
       [Indirect branch instructions]
  taken_cond_br_inst
       [Taken conditional branch instructions]
  uncond_br_inst
       [Unconditional branch instructions]
instruction:
  alu_inst
       [ALU (integer) instructions]
  amo_inst
       [AMO instructions]
  atomic_inst
       [Atomic instructions]
  bus_fence_inst
       [Bus FENCE instructions]
  csr_inst
       [CSR instructions]
  div_inst
       [Division instructions]
  ecall_inst
       [ECALL instructions]
  failed_sc_inst
       [Failed SC instructions]
  fence_inst
       [FENCE instructions]
  fp_div_inst
       [Floating-point division instructions]
  fp_inst
       [Floating-point instructions]
  fp_load_inst
       [Floating-point load instructions]
  fp_store_inst
       [Floating-point store instructions]
  load_inst
       [Load instructions]
  lr_inst
       [LR instructions]
  mult_inst
       [Multiplication instructions]
  sc_inst
       [SC instructions]
  store_inst
       [Store instructions]
  unaligned_load_inst
       [Unaligned load instructions]
  unaligned_store_inst
       [Unaligned store instructions]
  vector_div_inst
       [Vector division instructions]
  vector_inst
       [Vector instructions]
  vector_load_inst
       [Vector load instructions]
  vector_store_inst
       [Vector store instructions]
  id_inst_pipedown
       [ID instruction pipedowns]
  id_one_inst_pipedown
       [ID one instruction pipedowns]
  issued_inst
       [Issued instructions]
  rf_inst_pipedown
       [RF instruction pipedowns]
  rf_one_inst_pipedown
       [RF one instruction pipedowns]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Building perf
&lt;/h3&gt;

&lt;p&gt;Perf's way of dealing with cpu-specific events is through some machinery called jevents.&lt;/p&gt;

&lt;p&gt;It lives in &lt;code&gt;tools/perf/pmu-events&lt;/code&gt; and you can manually trigger it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./jevents.py riscv &lt;span class="nb"&gt;arch &lt;/span&gt;pmu-events.c
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It produces C code from a bunch of JSON files and a CSV map file.&lt;/p&gt;

&lt;p&gt;The first time I built the sources I tried to cut the build down by setting most &lt;code&gt;NO_{}&lt;/code&gt; make variables and left &lt;code&gt;NO_JEVENTS=1&lt;/code&gt; in; luckily I fixed it after noticing the different output in the forum.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;## I assume you have here the custom linux sources&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; /usr/src/pi-linux/tools/perf
&lt;span class="c"&gt;## being lazy I disabled about everything instead of installing dependencies, one time I disabled too much.&lt;/span&gt;
make &lt;span class="nt"&gt;-j&lt;/span&gt; 8 &lt;span class="nv"&gt;V&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;VF&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;HOSTCC&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;riscv64-unknown-linux-gnu-gcc &lt;span class="nv"&gt;HOSTLD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;riscv64-unknown-linux-gnu-ld &lt;span class="nv"&gt;CC&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;riscv64-unknown-linux-gnu-gcc &lt;span class="nv"&gt;CXX&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;riscv64-unknown-linux-gnu-g++ &lt;span class="nv"&gt;AR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;riscv64-unknown-linux-gnu-ar &lt;span class="nv"&gt;LD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;riscv64-unknown-linux-gnu-ld &lt;span class="nv"&gt;NM&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;riscv64-unknown-linux-gnu-nm &lt;span class="nv"&gt;PKG_CONFIG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;riscv64-unknown-linux-gnu-pkg-config &lt;span class="nv"&gt;prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/usr &lt;span class="nv"&gt;bindir_relative&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;bin &lt;span class="nv"&gt;tipdir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;share/doc/perf-6.8 &lt;span class="s1"&gt;'EXTRA_CFLAGS=-O2 -pipe'&lt;/span&gt; &lt;span class="s1"&gt;'EXTRA_LDFLAGS=-Wl,-O1 -Wl,--as-needed'&lt;/span&gt; &lt;span class="nv"&gt;ARCH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;riscv &lt;span class="nv"&gt;BUILD_BPF_SKEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;BUILD_NONDISTRO&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;JDIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;CORESIGHT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;GTK2&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; feature-gtk2-infobar&lt;span 
class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;NO_AUXTRACE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;NO_BACKTRACE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;NO_DEMANGLE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;NO_JEVENTS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0 &lt;span class="nv"&gt;NO_JVMTI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;NO_LIBAUDIT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;NO_LIBBABELTRACE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;NO_LIBBIONIC&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;NO_LIBBPF&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;NO_LIBCAP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;NO_LIBCRYPTO&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;NO_LIBDW_DWARF_UNWIND&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;NO_LIBELF&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;NO_LIBNUMA&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;NO_LIBPERL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;NO_LIBPFM4&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;NO_LIBPYTHON&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;NO_LIBTRACEEVENT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;NO_LIBUNWIND&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;NO_LIBZSTD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;NO_SDT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;NO_SLANG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;NO_LZMA&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span 
class="nv"&gt;NO_ZLIB&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;TCMALLOC&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;WERROR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0 &lt;span class="nv"&gt;LIBDIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/usr/libexec/perf-core &lt;span class="nv"&gt;libdir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/usr/lib64 &lt;span class="nv"&gt;plugindir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/usr/lib64/perf/plugins &lt;span class="nt"&gt;-f&lt;/span&gt; Makefile.perf &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now I have a &lt;code&gt;perf&lt;/code&gt; where &lt;code&gt;cycles&lt;/code&gt; and &lt;code&gt;instructions&lt;/code&gt; still do not work with &lt;code&gt;perf record&lt;/code&gt;. I wonder if there is a way at the opensbi or kernel level to aggregate events to make it work properly, but I never had to look into perf internals, so I will probably poke at it much later if nobody addresses it first. Anyway,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;perf record --group -e u_mode_cycle,m_mode_cycle,s_mode_cycle
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;produces something close enough for cycles; in practice &lt;code&gt;u_mode_cycle&lt;/code&gt; alone is enough.&lt;/p&gt;

&lt;p&gt;For instructions the situation is a bit more annoying:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;perf record --group -e alu_inst,amo_inst,atomic_inst,fp_div_inst,fp_inst,fp_load_inst,fp_store_inst,load_inst,lr_inst,mult_inst,sc_inst,store_inst,unaligned_load_inst,unaligned_store_inst
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;comes close to counting all the scalar instructions, but trying to add &lt;code&gt;vector_div_inst,vector_inst,vector_load_inst,vector_store_inst&lt;/code&gt; somehow makes perf record silently stop collecting samples. Adding just 3 more events works though, so I guess I can be happy with &lt;code&gt;u_mode_cycle,alu_inst,atomic_inst,fp_inst,vector_inst&lt;/code&gt; at least.&lt;/p&gt;
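
&lt;p&gt;Since long event lists are easy to mistype, here is a minimal sketch of how the working set could be kept in one place and joined into the &lt;code&gt;-e&lt;/code&gt; argument. The &lt;code&gt;./my-benchmark&lt;/code&gt; name is a placeholder and not part of the original setup, and the script only prints the command instead of running it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Events that were observed to work together in the notes above
events="u_mode_cycle alu_inst atomic_inst fp_inst vector_inst"

# Join the space-separated names with commas, then drop the trailing comma
event_list=$(printf '%s,' $events)
event_list=${event_list%,}

# ./my-benchmark is a placeholder for the program being profiled
echo "perf record --group -e $event_list -- ./my-benchmark"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;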

</description>
      <category>bpif3</category>
    </item>
    <item>
      <title>Bringing up BPI-F3 - Part 2</title>
      <dc:creator>Luca Barbato</dc:creator>
      <pubDate>Mon, 20 May 2024 19:38:42 +0000</pubDate>
      <link>https://forem.com/luzero/bringing-up-bpi-f3-part-2-2ikj</link>
      <guid>https://forem.com/luzero/bringing-up-bpi-f3-part-2-2ikj</guid>
      <description>&lt;p&gt;This is part2, in &lt;a href="https://dev.to/luzero/bringing-up-bpi-f3-part-1-3bm4"&gt;part1&lt;/a&gt; I shown how to get a Gentoo system on the nvme taking some shortcuts. Here are the notes from the second day with some improvements.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up the eMMC
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://stijn.tintel.eu/blog/2024/05/19/compiling-uboot-bpi-f3/" rel="noopener noreferrer"&gt;stintel&lt;/a&gt; already documented how to build u-boot and install it; I'm summarizing it here as well.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dependencies
&lt;/h3&gt;

&lt;p&gt;We need &lt;code&gt;dtc&lt;/code&gt; to compile the device tree, &lt;code&gt;mkimage&lt;/code&gt;, and git if we want to clone the trees.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;emerge dtc u-boot-tools dev-vcs/git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Bootloader
&lt;/h3&gt;

&lt;p&gt;This board's boot process relies on &lt;a href="https://github.com/riscv/opensbi" rel="noopener noreferrer"&gt;opensbi&lt;/a&gt; + &lt;a href="https://u-boot.org" rel="noopener noreferrer"&gt;u-boot&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First build opensbi
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/BPI-SINOVOIP/pi-opensbi &lt;span class="nt"&gt;-b&lt;/span&gt; v1.3-k1
&lt;span class="nb"&gt;cd &lt;/span&gt;pi-opensbi
make &lt;span class="nv"&gt;PLATFORM&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;generic menuconfig
make &lt;span class="nt"&gt;-j&lt;/span&gt; 8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As &lt;a href="https://stijn.tintel.eu/blog/2024/05/19/compiling-uboot-bpi-f3/" rel="noopener noreferrer"&gt;stintel&lt;/a&gt; already reported, the default configuration is wrong, so we have to make sure that the &lt;code&gt;build/platform/generic/kconfig/.config&lt;/code&gt; has the support for K1x:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# CONFIG_PLATFORM_SPACEMIT_K1PRO is not set
CONFIG_PLATFORM_SPACEMIT_K1X=y
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
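
&lt;p&gt;A quick check of the generated configuration can save a rebuild. This is only a sketch: the &lt;code&gt;has_k1x&lt;/code&gt; helper is a name made up for illustration, and it assumes you run it from the top of the &lt;code&gt;pi-opensbi&lt;/code&gt; tree.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: succeeds only when K1X support is enabled in the given config file
has_k1x() {
    grep -qs '^CONFIG_PLATFORM_SPACEMIT_K1X=y$' "$1"
}

if has_k1x build/platform/generic/kconfig/.config; then
    echo "K1X support enabled"
else
    echo "K1X support missing, run menuconfig again"
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;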



&lt;p&gt;If everything goes well &lt;code&gt;build/platform/generic/firmware/fw_dynamic.bin&lt;/code&gt; will exist.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Then build u-boot
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENSBI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;../pi-opensbi/build/platform/generic/firmware/fw_dynamic.bin
git clone https://github.com/BPI-SINOVOIP/pi-u-boot &lt;span class="nt"&gt;-b&lt;/span&gt; v2022.10-k1
&lt;span class="nb"&gt;cd &lt;/span&gt;pi-u-boot
make k1_defconfig
make &lt;span class="nt"&gt;-j8&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If everything goes well it will produce the following files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FSBL.bin  
bootinfo_emmc.bin  
bootinfo_sd.bin  
bootinfo_spinand.bin  
bootinfo_spinor.bin 
u-boot.bin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Partitions
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://docs.banana-pi.org/en/BPI-F3/GettingStarted_BPI-F3" rel="noopener noreferrer"&gt;upstream instructions&lt;/a&gt; suggest copying over the layout, so the &lt;code&gt;uboot&lt;/code&gt;, &lt;code&gt;bootfs&lt;/code&gt; and &lt;code&gt;rootfs&lt;/code&gt; partitions stay the same, but on the eMMC &lt;code&gt;fsbl&lt;/code&gt; and &lt;code&gt;opensbi&lt;/code&gt; now live in the dedicated &lt;code&gt;/dev/mmcblk2boot0&lt;/code&gt;. So it is up to you to do what you prefer with them.&lt;br&gt;
You definitely have to set up &lt;code&gt;/dev/mmcblk2boot0&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo 0 &amp;gt; /sys/block/mmcblk2boot0/force_ro
dd if=bootinfo_emmc.bin of=/dev/mmcblk2boot0
dd if=FSBL.bin of=/dev/mmcblk2boot0 bs=512 seek=1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
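
&lt;p&gt;It may be worth reading the data back before rebooting. The following is a sketch under a few assumptions: &lt;code&gt;verify_written&lt;/code&gt; is a helper name made up here, and it relies on GNU &lt;code&gt;stat&lt;/code&gt; and GNU &lt;code&gt;cmp&lt;/code&gt; (the &lt;code&gt;-n&lt;/code&gt; limit and the trailing skip operands). The offsets mirror the &lt;code&gt;dd&lt;/code&gt; calls above: 0 for the bootinfo and 512 bytes (one sector) for the FSBL.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: compare an image with what sits on a device at a byte offset
# Usage: verify_written IMAGE DEVICE OFFSET_BYTES
verify_written() {
    image=$1
    device=$2
    offset=$3
    # Only compare as many bytes as the image contains
    size=$(stat -c %s "$image")
    # GNU cmp: skip 0 bytes of the image and OFFSET_BYTES of the device
    cmp -n "$size" "$image" "$device" 0 "$offset"
}

# Intended use on the board (commented out, it needs the real device):
# verify_written bootinfo_emmc.bin /dev/mmcblk2boot0 0
# verify_written FSBL.bin /dev/mmcblk2boot0 512
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;cmp&lt;/code&gt; stays silent and returns 0 when the bytes match, so the helper can be chained before re-enabling &lt;code&gt;force_ro&lt;/code&gt;.&lt;/p&gt;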



&lt;p&gt;As &lt;strong&gt;stintel&lt;/strong&gt; tested, all that's needed is a partition labeled &lt;code&gt;uboot&lt;/code&gt; that contains &lt;code&gt;u-boot.bin&lt;/code&gt; and a &lt;code&gt;bootfs&lt;/code&gt; partition that contains &lt;code&gt;env_k1-x.txt&lt;/code&gt;. It is a good idea to have &lt;code&gt;env&lt;/code&gt; and &lt;code&gt;rootfs&lt;/code&gt; as well.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kernel
&lt;/h2&gt;

&lt;p&gt;Building the kernel is straightforward enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/BPI-SINOVOIP/pi-linux &lt;span class="nt"&gt;-b&lt;/span&gt; linux-6.1.15-k1
&lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; pi-linux/ linux
&lt;span class="nb"&gt;cd &lt;/span&gt;linux
zcat /proc/config.gz &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .config
make oldconfig &lt;span class="c"&gt;# (or menuconfig)&lt;/span&gt;
make &lt;span class="nt"&gt;-j&lt;/span&gt; 10 &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; make modules_install
mount /boot &lt;span class="c"&gt;# make sure it picks the right partition&lt;/span&gt;
make &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Make sure to enable &lt;code&gt;HWMON&lt;/code&gt; and &lt;code&gt;NVME_HWMON&lt;/code&gt; so you'll have the thermal sensors available.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Edit &lt;code&gt;boot/env_k1-x.txt&lt;/code&gt; accordingly and your new kernel is ready.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: The initial kernel has at least one known annoying bug and it is missing &lt;code&gt;HWMON&lt;/code&gt; support for its thermal sensors.&lt;br&gt;
If you use Rust, you'll immediately notice some strange issues with &lt;a href="https://bugzilla.kernel.org/show_bug.cgi?id=217923" rel="noopener noreferrer"&gt;futex()&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/BPI-SINOVOIP/pi-linux/pull/4" rel="noopener noreferrer"&gt;here&lt;/a&gt; is a patch addressing the bug, backported to the 6.1.15 vendored kernel.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/BPI-SINOVOIP/pi-linux/pull/3" rel="noopener noreferrer"&gt;here&lt;/a&gt; is my tiny patch to get something going regarding HWMON.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Coming next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Do not use an initrd at all (right now, if you are following along, you'd be using the initrd from the starting distro)&lt;/li&gt;
&lt;li&gt;Other u-boot tweaks&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>riscv</category>
      <category>gentoo</category>
      <category>bpif3</category>
    </item>
    <item>
      <title>Bringing up BPI-F3 - Part 1</title>
      <dc:creator>Luca Barbato</dc:creator>
      <pubDate>Sun, 19 May 2024 15:05:09 +0000</pubDate>
      <link>https://forem.com/luzero/bringing-up-bpi-f3-part-1-3bm4</link>
      <guid>https://forem.com/luzero/bringing-up-bpi-f3-part-1-3bm4</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;I like non-x86 architectures a lot: I still enjoy writing software for PowerPC even nowadays, and I'm looking forward to seeing how SVE will be once there is hardware in the right price range. &lt;br&gt;
Today it is the turn of RISC-V, since there is finally a, hopefully, nice board sporting a CPU with the RISC-V Vector Extension 1.0, which is now part of the &lt;a href="https://riscv.org/technical/specifications/" rel="noopener noreferrer"&gt;RISC-V&lt;/a&gt; specifications.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Where to start
&lt;/h2&gt;

&lt;p&gt;The official documentation is available &lt;a href="https://docs.banana-pi.org/en/BPI-F3/BananaPi_BPI-F3" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The board comes with an empty eMMC, so either you start with an SD card, or you need to upload a valid image using &lt;a href="https://docs.u-boot.org/en/latest/usage/dfu.html" rel="noopener noreferrer"&gt;dfu&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Sadly the USB-C port does not also multiplex the serial console, so you'll need to connect the usual 3 pins to a 3.3V USB serial adapter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ERROR:   CMD8
ERROR:   sd f! l:76
bm:0
ERROR:   emmc: invalid bootinfo magic code:0x0
ERROR:   invalid bootinfo image.
ERROR:   entering download mode.
ROM: usb download handler
usb2d_initialize : enter
Force usb FS mode!
Controller Run
usb rst int
SETUP: 0x0 0x5 0x1
SETUP: 0x80 0x6 0x100
SETUP: 0x80 0x6 0x100
SETUP: 0x80 0x6 0x302
SETUP: 0x80 0x6 0x302
SETUP: 0x80 0x6 0x301
SETUP: 0x80 0x6 0x301
SETUP: 0x80 0x6 0x30a
SETUP: 0x80 0x6 0x30a
SETUP: 0x80 0x6 0x200
SETUP: 0x80 0x6 0x200
SETUP: 0x80 0x6 0x300
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Their images are Debian derivatives. I prefer &lt;a href="https://gentoo.org" rel="noopener noreferrer"&gt;Gentoo&lt;/a&gt;, and since we have &lt;a href="http://distfiles.gentoo.org/releases/riscv/autobuilds/" rel="noopener noreferrer"&gt;stage3&lt;/a&gt; for &lt;code&gt;rv64_lp64d-openrc&lt;/code&gt;, I guess I can give it a try.&lt;/p&gt;

&lt;p&gt;Vendor sources for the kernel and bootloaders are available, and I can piggyback on that image if I stumble over &lt;a href="https://github.com/BPI-SINOVOIP/pi-opensbi/tree/v1.3-k1" rel="noopener noreferrer"&gt;opensbi&lt;/a&gt; or &lt;a href="https://github.com/BPI-SINOVOIP/pi-u-boot/tree/v2022.10-k1" rel="noopener noreferrer"&gt;u-boot&lt;/a&gt;, or if I run out of time sorting out the &lt;a href="https://github.com/BPI-SINOVOIP/pi-linux/tree/linux-6.1.15-k1" rel="noopener noreferrer"&gt;patches for linux-6.1.15&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/wez/wezterm" rel="noopener noreferrer"&gt;wezterm&lt;/a&gt; or &lt;a href="https://www.gnu.org/software/screen/" rel="noopener noreferrer"&gt;screen&lt;/a&gt; to see what's going on&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://packages.gentoo.org/packages/dev-embedded/u-boot-tools" rel="noopener noreferrer"&gt;u-boot-tools&lt;/a&gt; are needed since the kernel comes in a &lt;a href="https://docs.u-boot.org/en/latest/usage/fit/index.html" rel="noopener noreferrer"&gt;FIT&lt;/a&gt; image.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;a href="https://github.com/BPI-SINOVOIP/armbian-build/commits/v24.04.30/" rel="noopener noreferrer"&gt;armbian&lt;/a&gt; tree or any of the pre-made builds provided.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A recent &lt;a href="http://distfiles.gentoo.org/releases/riscv/autobuilds/" rel="noopener noreferrer"&gt;stage3&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Lazy Setup
&lt;/h2&gt;

&lt;p&gt;The base images share this layout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Number  Start (sector)    End (sector)  Size       Code  Name
   1             256             767   256.0 KiB   8300  fsbl
   2             768             895   64.0 KiB    8300  env
   3            2048            4095   1024.0 KiB  8300  opensbi
   4            4096            8191   2.0 MiB     8300  uboot
   5            8192          532479   256.0 MiB   8300  bootfs
   6          532480              &amp;lt;&amp;gt;   &amp;lt;&amp;gt; GiB     8300  rootfs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the &lt;code&gt;rootfs&lt;/code&gt; taking the space it needs.&lt;/p&gt;

&lt;p&gt;The easiest way is to copy the first 5 partitions to the eMMC:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# dd if=/dev/mmcblk0 of=/dev/mmcblk2 bs=1M count=260
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
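
&lt;p&gt;The count can be double-checked against the partition table: the fifth partition ends at sector 532479, so the copy has to cover 532480 sectors. A small worked example, assuming the usual 512-byte sectors (the table itself does not state the sector size):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Values taken from the partition table; the 512-byte sector size is an assumption
last_sector=532479   # end of partition 5 (bootfs), inclusive
sector_size=512

# Number of 1 MiB blocks needed to reach the end of bootfs
mib=$(( (last_sector + 1) * sector_size / 1048576 ))
echo "first five partitions end at ${mib} MiB"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This comes out at 260 MiB, so with &lt;code&gt;bs=1M&lt;/code&gt; a count of 260 covers the whole &lt;code&gt;bootfs&lt;/code&gt; partition.&lt;/p&gt;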



&lt;p&gt;And then recreate the &lt;code&gt;rootfs&lt;/code&gt; partition.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: Neither the initramfs nor the u-boot configuration supports btrfs. Until kernel and initramfs are replaced it is safer to stick to ext4 (I made this mistake).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then the usual manual unpack of a &lt;code&gt;stage3&lt;/code&gt; and chroot described in the &lt;a href="https://wiki.gentoo.org/wiki/Handbook:Main_Page" rel="noopener noreferrer"&gt;Gentoo Handbook&lt;/a&gt; works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pitfalls known so far
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Make sure to enable the serial ports in &lt;code&gt;/etc/inittab&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;s0:12345:respawn:/sbin/agetty -L 115200 ttyS0 vt100
s1:12345:respawn:/sbin/agetty -L 115200 ttyS1 vt100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The default &lt;code&gt;u-boot&lt;/code&gt; command script is incomplete: if you want to use the nvme, you have to enter the u-boot console (press &lt;code&gt;s&lt;/code&gt;) and run:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=&amp;gt; nvme scan
=&amp;gt; run nor_boot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Coming next
&lt;/h2&gt;

&lt;p&gt;I haven't yet tried to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;update u-boot and change a bit its configuration&lt;/li&gt;
&lt;li&gt;build a new kernel&lt;/li&gt;
&lt;li&gt;put everything together to have btrfs on nvme&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>riscv</category>
      <category>gentoo</category>
      <category>bpif3</category>
    </item>
    <item>
      <title>Quick notes on using insta</title>
      <dc:creator>Luca Barbato</dc:creator>
      <pubDate>Fri, 16 Dec 2022 14:29:35 +0000</pubDate>
      <link>https://forem.com/luzero/quick-notes-on-using-insta-44hg</link>
      <guid>https://forem.com/luzero/quick-notes-on-using-insta-44hg</guid>
      <description>&lt;p&gt;&lt;a href="//insta.rs"&gt;Insta&lt;/a&gt; is a nice snapshot testing tool for Rust.&lt;/p&gt;

&lt;h2&gt;
  
  
  Snapshot testing
&lt;/h2&gt;

&lt;p&gt;Snapshot tests are a more general extension of the normal &lt;a href="https://doc.rust-lang.org/book/ch11-01-writing-tests.html" rel="noopener noreferrer"&gt;tests&lt;/a&gt; we have in Rust.&lt;/p&gt;

&lt;p&gt;Usually we use &lt;code&gt;assert_eq!&lt;/code&gt; and similar macros. Insta lets you deal with the reference values in a richer way and store them inside or outside your codebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pitfalls to keep in mind
&lt;/h2&gt;

&lt;p&gt;I spent way too much time stumbling upon a bunch of annoyances while converting some tests in &lt;a href="https://github.com/mozilla/rust-code-analysis" rel="noopener noreferrer"&gt;rust-code-analysis&lt;/a&gt;, so it is better to list them here so that you can avoid the same mistakes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;cargo insta&lt;/code&gt; is git-aware: make sure to pass &lt;code&gt;--no-ignore&lt;/code&gt; if your snapshots directory is populated by different means. Even better if you can use &lt;strong&gt;git submodules&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;If you deal with floating point values, round them to the precision you care about. Different architectures or libc implementations may give slightly different precision.&lt;/li&gt;
&lt;li&gt;If you save paths, keep in mind that Windows is special in this regard and you may have to serialize them manually.&lt;/li&gt;
&lt;/ul&gt;
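
&lt;p&gt;The last two pitfalls can be sketched with plain standard-library code. This is a minimal illustration, not code from rust-code-analysis: the helper names &lt;code&gt;round_to&lt;/code&gt; and &lt;code&gt;portable_path&lt;/code&gt; are made up here, and the idea is simply to normalize values before they reach the snapshot.&lt;/p&gt;

```rust
use std::path::PathBuf;

/// Hypothetical helper: round `value` to `digits` decimal places so that
/// tiny platform-dependent differences do not end up in the stored snapshot.
fn round_to(value: f64, digits: i32) -> f64 {
    let scale = 10f64.powi(digits);
    (value * scale).round() / scale
}

/// Hypothetical helper: render a path with forward slashes only, so a
/// snapshot recorded on Windows looks the same as one recorded elsewhere.
fn portable_path(path: PathBuf) -> String {
    path.to_string_lossy().replace('\\', "/")
}
```

&lt;p&gt;With something like this in place, &lt;code&gt;round_to(0.1 + 0.2, 3)&lt;/code&gt; yields &lt;code&gt;0.3&lt;/code&gt; and a path written with backslashes comes out with forward slashes, and those normalized values are what you would hand to a snapshot macro such as &lt;code&gt;assert_snapshot!&lt;/code&gt;.&lt;/p&gt;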

&lt;h2&gt;
  
  
  Nice things to keep in mind
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;cargo test -- --nocapture&lt;/code&gt; provides enough information to see what went wrong.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cargo insta test --review&lt;/code&gt; makes it easy to add more tests and to make sure you do not break anything by mistake.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's all for today&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Templating inline asm in Rust</title>
      <dc:creator>Luca Barbato</dc:creator>
      <pubDate>Mon, 05 Sep 2022 11:51:23 +0000</pubDate>
      <link>https://forem.com/luzero/templating-inline-asm-in-rust-37ee</link>
      <guid>https://forem.com/luzero/templating-inline-asm-in-rust-37ee</guid>
      <description>&lt;p&gt;Kostya wrote &lt;a href="https://codecs.multimedia.cx/2022/09/rust-inline-assembly-experience/" rel="noopener noreferrer"&gt;about it&lt;/a&gt; and he asked me to figure out whether what he is missing isn't already available one way or another.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before we start
&lt;/h2&gt;

&lt;p&gt;For those not used to multimedia, Rust or assembly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multimedia is all about processing data and serving/rendering it to the user at the right moment.&lt;/li&gt;
&lt;li&gt;That requires good control over latency and using as little CPU as possible.&lt;/li&gt;
&lt;li&gt;This leads to using architecture-specific extensions such as &lt;code&gt;x86_64 AVX2&lt;/code&gt; or &lt;code&gt;ARM NEON&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Higher level languages such as C (or Rust) offer access to those extensions via intrinsics, but quite often they are cumbersome enough that writing assembly as-is ends up being more pleasant.&lt;/li&gt;
&lt;li&gt;You may look at &lt;a href="https://code.videolan.org/videolan/dav1d" rel="noopener noreferrer"&gt;dav1d&lt;/a&gt; and &lt;a href="https://github.com/xiph/rav1e/" rel="noopener noreferrer"&gt;rav1e&lt;/a&gt; for examples.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Rust and assembly
&lt;/h2&gt;

&lt;p&gt;Assembly support has been a fairly weak point of Rust:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;rustc&lt;/code&gt; does not compile &lt;code&gt;.s&lt;/code&gt; or &lt;code&gt;.S&lt;/code&gt; as &lt;code&gt;gcc&lt;/code&gt; and &lt;code&gt;clang&lt;/code&gt; do, and that makes you rely on &lt;a href="https://crates.io/crates/cc" rel="noopener noreferrer"&gt;cc-rs&lt;/a&gt; or &lt;a href="https://crates.io/crates/nasm-rs" rel="noopener noreferrer"&gt;nasm-rs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;rustc&lt;/code&gt; until recently did not have stable support for inline assembly, and right now some useful parts are still nightly-only.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How far are we?
&lt;/h2&gt;

&lt;p&gt;Kostya tried the current stable and he managed to write some assembly for his h264 decoder and get a 20% gain, but he had 3 issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;He couldn't figure out how to manage sub-registers, and the compiler warning wasn't that helpful for him since it assumes you know the &lt;code&gt;format!&lt;/code&gt; terminology and that &lt;code&gt;asm!&lt;/code&gt; relates to it. &lt;a href="https://github.com/rust-lang/rust/pull/101253" rel="noopener noreferrer"&gt;This is being addressed&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;asm!&lt;/code&gt; operands in stable do not include &lt;a href="https://github.com/rust-lang/rust/issues/93333" rel="noopener noreferrer"&gt;sym&lt;/a&gt; and &lt;a href="https://github.com/rust-lang/rust/issues/93332" rel="noopener noreferrer"&gt;const&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Kostya couldn't figure out how to deal with templating the assembly using &lt;code&gt;macro_rules!()&lt;/code&gt; the way he is used to doing with &lt;code&gt;gcc&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
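
&lt;p&gt;To make the sub-register point concrete, here is a minimal sketch. It is not Kostya's code: the function name &lt;code&gt;add_low32&lt;/code&gt; is made up for illustration, and it assumes an x86_64 target (with a plain Rust fallback elsewhere). The &lt;code&gt;:e&lt;/code&gt; after the operand name is a template modifier, written in the same position as a &lt;code&gt;format!&lt;/code&gt; specifier, and it asks for the 32-bit name of the register (&lt;code&gt;eax&lt;/code&gt; rather than &lt;code&gt;rax&lt;/code&gt;).&lt;/p&gt;

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Hypothetical example: add two 32-bit values using the 32-bit views of
/// whatever general purpose registers the compiler picks.
#[cfg(target_arch = "x86_64")]
fn add_low32(a: u32, b: u32) -> u32 {
    let mut acc = a;
    unsafe {
        asm!(
            // `{acc:e}` expands to e.g. `eax`; a bare `{acc}` would expand
            // to the 64-bit name and trigger the sub-register warning.
            "add {acc:e}, {b:e}",
            acc = inout(reg) acc,
            b = in(reg) b,
            options(pure, nomem, nostack),
        );
    }
    acc
}

/// Fallback so the sketch still builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn add_low32(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}
```

&lt;p&gt;Calling &lt;code&gt;add_low32(40, 2)&lt;/code&gt; returns &lt;code&gt;42&lt;/code&gt;. On x86_64 the other modifiers for the &lt;code&gt;reg&lt;/code&gt; class are &lt;code&gt;l&lt;/code&gt;, &lt;code&gt;h&lt;/code&gt;, &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;r&lt;/code&gt;, for the 8-bit low, 8-bit high, 16-bit and 64-bit names.&lt;/p&gt;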

&lt;p&gt;I asked on &lt;a href="https://rust-lang.zulipchat.com/#narrow/stream/216763-project-inline-asm/topic/Do.20we.20have.20a.20mean.20to.20support.20custom.20mnemonics.2Ftemplating.3F" rel="noopener noreferrer"&gt;zulip&lt;/a&gt; since I couldn't think of better ways than &lt;a href="https://danielkeep.github.io/tlborm/book/pat-incremental-tt-munchers.html" rel="noopener noreferrer"&gt;munching tokens&lt;/a&gt; and usually this means I'm missing something much simpler and obvious.&lt;br&gt;
Luckily Amanieu helped us and this blog post is more or less about keeping notes.&lt;/p&gt;
&lt;h3&gt;
  
  
  Use cases for inline asm templating
&lt;/h3&gt;

&lt;p&gt;In multimedia software you often write tiny kernels that operate over blocks of pixels (&lt;code&gt;4x4&lt;/code&gt;, &lt;code&gt;8x8&lt;/code&gt;, &lt;code&gt;16x16&lt;/code&gt; and so on). Usually the same inner logic is shared across block sizes and ideally you would like not to repeat yourself, even more so if the very same logic can be shared across the many, many extensions x86 has.&lt;/p&gt;

&lt;p&gt;Kostya used this as an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;avg_4&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;dstride&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;sstride&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bh&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nd"&gt;asm!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"2:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"movd   xmm1, [{src}]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"movd   xmm3, [{src} + {sstride}]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"movd   xmm0, [{dst}]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"movd   xmm2, [{dst} + {dstride}]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"lea    {src}, [{src} + {sstride} * 2]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"pavgb  xmm0, xmm1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"pavgb  xmm2, xmm3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"movd   [{dst}], xmm0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"movd   [{dst} + {dstride}], xmm2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"lea    {dst}, [{dst} + {dstride} * 2]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"sub    {h}, 2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"jnz    2b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;inout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="nf"&gt;.as_ptr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;sstride&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;sstride&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;dst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;inout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="nf"&gt;.as_mut_ptr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;dstride&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;dstride&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;inout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;bh&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm2"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm3"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;avg_8&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;dstride&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;sstride&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bh&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nd"&gt;asm!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"2:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"movq   xmm0, [{src}]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"movq   xmm1, [{src} + {sstride}]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"movq   xmm2, [{dst}]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"movq   xmm3, [{dst} + {dstride}]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"lea    {src}, [{src} + {sstride} * 2]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"pavgb  xmm0, xmm2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"pavgb  xmm1, xmm3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"movq   [{dst}], xmm0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"movq   [{dst} + {dstride}], xmm1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"lea    {dst}, [{dst} + {dstride} * 2]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"sub    {h}, 2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"jnz    2b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;inout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="nf"&gt;.as_ptr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;sstride&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;sstride&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;dst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;inout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="nf"&gt;.as_mut_ptr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;dstride&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;dstride&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;inout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;bh&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm2"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm3"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;avg_16&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;dstride&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;sstride&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bh&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nd"&gt;asm!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"2:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"movaps xmm0, [{src}]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"movaps xmm1, [{src} + {sstride}]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"pavgb  xmm0, [{dst}]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"pavgb  xmm1, [{dst} + {dstride}]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"lea    {src}, [{src} + {sstride} * 2]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"movq   [{dst}], xmm0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"movq   [{dst} + {dstride}], xmm1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"lea    {dst}, [{dst} + {dstride} * 2]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"sub    {h}, 2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"jnz    2b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;inout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="nf"&gt;.as_ptr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;sstride&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;sstride&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;dst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;inout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="nf"&gt;.as_mut_ptr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;dstride&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;dstride&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;inout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;bh&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Between &lt;code&gt;avg_4&lt;/code&gt; and &lt;code&gt;avg_8&lt;/code&gt; the only difference is &lt;code&gt;movd&lt;/code&gt; vs &lt;code&gt;movq&lt;/code&gt;, and for this Amanieu suggested using the &lt;code&gt;concat!&lt;/code&gt; pattern he uses in &lt;a href="https://github.com/Amanieu/corosensei/blob/master/src/arch/riscv.rs#L135" rel="noopener noreferrer"&gt;corosensei&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;avg_4&lt;/code&gt; and &lt;code&gt;avg_8&lt;/code&gt; would end up being:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;macro_rules!&lt;/span&gt; &lt;span class="n"&gt;avg&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ident&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$mov:literal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nv"&gt;$name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;dstride&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;sstride&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bh&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nd"&gt;asm!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="s"&gt;"2:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="nd"&gt;concat!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$mov&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;" xmm1, [{src}]"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="nd"&gt;concat!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$mov&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;" xmm3, [{src} + {sstride}]"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="nd"&gt;concat!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$mov&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;" xmm0, [{dst}]"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="nd"&gt;concat!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$mov&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;" xmm2, [{dst} + {dstride}]"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="s"&gt;"lea    {src}, [{src} + {sstride} * 2]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s"&gt;"pavgb  xmm0, xmm1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s"&gt;"pavgb  xmm2, xmm3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="nd"&gt;concat!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$mov&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;" [{dst}], xmm0"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="nd"&gt;concat!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$mov&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;" [{dst} + {dstride}], xmm2"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="s"&gt;"lea    {dst}, [{dst} + {dstride} * 2]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s"&gt;"sub    {h}, 2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s"&gt;"jnz    2b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;inout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="nf"&gt;.as_ptr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;sstride&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;sstride&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;dst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;inout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="nf"&gt;.as_mut_ptr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;dstride&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;dstride&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;inout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;bh&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm2"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm3"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

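&lt;span class="nd"&gt;avg!&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;avg_4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"movd"&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="nd"&gt;avg!&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;avg_8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"movq"&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;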
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To factor out what &lt;code&gt;avg_8&lt;/code&gt; and &lt;code&gt;avg_16&lt;/code&gt; have in common, you need a way to deal with the &lt;code&gt;operands&lt;/code&gt;: while the asm statements are &lt;code&gt;literals&lt;/code&gt;, the &lt;code&gt;operands&lt;/code&gt; can only be expressed as &lt;code&gt;tt&lt;/code&gt; in a &lt;code&gt;macro_rules!&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In C, the normal way to deal with this use case is to rely on the preprocessor &lt;code&gt;#if&lt;/code&gt; directives; with &lt;code&gt;macro_rules!&lt;/code&gt; you have to be a bit more creative.&lt;/p&gt;

&lt;p&gt;Amanieu gave me &lt;a href="https://github.com/Amanieu/corosensei/blob/81a1a84a1f801efe66e55412a59716947c3a2bdf/src/arch/arm.rs#L78" rel="noopener noreferrer"&gt;another example&lt;/a&gt; and I ended up crafting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;macro_rules!&lt;/span&gt; &lt;span class="n"&gt;avg_common&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$name:ident&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$load:literal&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$store:literal&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="nv"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$out:tt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nv"&gt;$name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;dstride&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;sstride&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bh&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nd"&gt;asm!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="s"&gt;"2:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="nv"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$load&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s"&gt;"lea    {src}, [{src} + {sstride} * 2]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s"&gt;"pavgb  xmm0, xmm2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s"&gt;"pavgb  xmm1, xmm3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="nv"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$store&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s"&gt;"lea    {dst}, [{dst} + {dstride} * 2]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s"&gt;"sub    {h}, 2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s"&gt;"jnz    2b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;inout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="nf"&gt;.as_ptr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;sstride&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;sstride&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;dst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;inout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="nf"&gt;.as_mut_ptr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;dstride&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;dstride&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;inout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;bh&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="nv"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;macro_rules!&lt;/span&gt; &lt;span class="n"&gt;avg&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;avg_8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nd"&gt;avg_common!&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;avg_8&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="s"&gt;"movq   xmm0, [{src}]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;"movq   xmm1, [{src} + {sstride}]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;"movq   xmm2, [{dst}]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;"movq   xmm3, [{dst} + {dstride}]"&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="s"&gt;"movq   [{dst}], xmm0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;"movq   [{dst} + {dstride}], xmm1"&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm2"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm3"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;avg_16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nd"&gt;avg_common!&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;avg_16&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="s"&gt;"movaps xmm0, [{src}]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;"movaps xmm1, [{src} + {sstride}]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
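                &lt;span class="s"&gt;"movaps xmm2, [{dst}]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;"movaps xmm3, [{dst} + {dstride}]"&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="s"&gt;"movaps [{dst}], xmm0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;"movaps [{dst} + {dstride}], xmm1"&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm2"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"xmm3"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;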
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This more or less solves Kostya's problem, although supporting multiple blocks of operands would require extra care.&lt;/p&gt;

&lt;h2&gt;
  
  
  Coming next
&lt;/h2&gt;

&lt;p&gt;I have been quite busy with the &lt;a href="https://www.sifis-home.eu/" rel="noopener noreferrer"&gt;SIFIS-Home&lt;/a&gt; project, in particular writing a new implementation of &lt;a href="https://github.com/w3c/wot" rel="noopener noreferrer"&gt;WebOfThings&lt;/a&gt; in Rust. Soon we'll release the first version supporting wot-1.1, and I'll probably write a bit about it.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>A month in rav1e - February</title>
      <dc:creator>Luca Barbato</dc:creator>
      <pubDate>Wed, 10 Mar 2021 10:33:06 +0000</pubDate>
      <link>https://forem.com/luzero/a-month-in-rav1e-february-3cfm</link>
      <guid>https://forem.com/luzero/a-month-in-rav1e-february-3cfm</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://github.com/xiph/rav1e" rel="noopener noreferrer"&gt;rav1e&lt;/a&gt; is an AV1 encoder written in &lt;a href="https://rust-lang.org" rel="noopener noreferrer"&gt;Rust&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here is a quick summary of what happened last month. I will try to write a recap every month.&lt;/p&gt;

&lt;h2&gt;
  
  
  February Summary
&lt;/h2&gt;

&lt;p&gt;We poured a lot of work into improving the encoding speed; you can read some details of the journey:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Analyze our memory access patterns and &lt;a href="https://dev.to/barrbrain/video-encoder-rollback-optimization-in-rav1e-4d5k"&gt;improve the layout and the update strategy&lt;/a&gt; of a structure accessed a lot in our hottest code path.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/luzero/temporal-rdo-update-optimization-2pf1"&gt;Parallelize one of the remaining bottlenecks&lt;/a&gt; so we improve the average thread usage and improve both speed and latency.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/master_of_zen/per-speed-rdo-lookahead-frames-optimization-5ai2"&gt;Add the temporal rdo lookahead to our speed levels&lt;/a&gt;, measure its quality-vs-speed impact and retune them accordingly.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;The benchmarks are prepared using &lt;a href="https://crates.io/crates/speed-levels-rs" rel="noopener noreferrer"&gt;speed-levels-rs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The encoder is using the following settings:&lt;/p&gt;


&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--threads 16 --tiles 16 -l 100 &amp;lt;file&amp;gt; -o &amp;lt;encoded&amp;gt; -s &amp;lt;level&amp;gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;The source file is &lt;strong&gt;Bosphorus&lt;/strong&gt; from the &lt;a href="http://ultravideo.fi/#testsequences" rel="noopener noreferrer"&gt;ultravideo test sequences&lt;/a&gt;. The 1080p 10bit version is the 4k 10bit version scaled down, since a 1080p 10bit version is not available on the website.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fos6dw65b9m5seno2515n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fos6dw65b9m5seno2515n.png" alt="alt text" width="800" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Overall our &lt;strong&gt;aarch64&lt;/strong&gt; support is getting fairly good, but there is still a lot of room for improvement on 8bit. &lt;/p&gt;

&lt;p&gt;On the other hand, it has 10bit optimizations that aren't yet available for x86_64. Help in improving our SIMD coverage is very welcome :)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9879wj71zq8e9ep97wbg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9879wj71zq8e9ep97wbg.png" alt="alt text" width="800" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Digging deeper
&lt;/h2&gt;

&lt;h3&gt;
  
  
  x86_64
&lt;/h3&gt;

&lt;p&gt;As expected, the memory layout optimization that happened between &lt;code&gt;p20210209&lt;/code&gt; and &lt;code&gt;p20210216&lt;/code&gt; had the largest impact on speed levels 0 and 1, while optimizing and tuning the temporal rdo lookahead computation had the largest impact on speed levels 9 and 10.&lt;/p&gt;

&lt;center&gt;

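&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Speed Level&lt;/th&gt;&lt;th&gt;p20210209&lt;/th&gt;&lt;th&gt;p20210216&lt;/th&gt;&lt;th&gt;p20210223&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;&lt;strong&gt;x1.23&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;x1.29&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;x1.30&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;&lt;strong&gt;x1.20&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;x1.24&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;x1.33&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;x1.08&lt;/td&gt;&lt;td&gt;x1.11&lt;/td&gt;&lt;td&gt;x1.22&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;3&lt;/td&gt;&lt;td&gt;x1.04&lt;/td&gt;&lt;td&gt;x1.07&lt;/td&gt;&lt;td&gt;x1.25&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;4&lt;/td&gt;&lt;td&gt;x1.04&lt;/td&gt;&lt;td&gt;x1.06&lt;/td&gt;&lt;td&gt;x1.24&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;x1.05&lt;/td&gt;&lt;td&gt;x1.07&lt;/td&gt;&lt;td&gt;x1.27&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;6&lt;/td&gt;&lt;td&gt;x1.04&lt;/td&gt;&lt;td&gt;x1.05&lt;/td&gt;&lt;td&gt;x1.37&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;7&lt;/td&gt;&lt;td&gt;x1.03&lt;/td&gt;&lt;td&gt;x1.06&lt;/td&gt;&lt;td&gt;x1.36&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;8&lt;/td&gt;&lt;td&gt;x1.04&lt;/td&gt;&lt;td&gt;x1.06&lt;/td&gt;&lt;td&gt;x1.39&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;9&lt;/td&gt;&lt;td&gt;x1.00&lt;/td&gt;&lt;td&gt;x1.02&lt;/td&gt;&lt;td&gt;&lt;strong&gt;x1.52&lt;/strong&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;10&lt;/td&gt;&lt;td&gt;x1.00&lt;/td&gt;&lt;td&gt;x1.01&lt;/td&gt;&lt;td&gt;&lt;strong&gt;x1.94&lt;/strong&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;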

&lt;/center&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rwru7ueagf17nkxgtrn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rwru7ueagf17nkxgtrn.png" alt="alt text" width="800" height="990"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;The &lt;code&gt;x86_64&lt;/code&gt; &lt;strong&gt;10bit&lt;/strong&gt; encoding is behaving similarly. Our SIMD support for it received a &lt;a href="https://github.com/xiph/rav1e/commit/8b930d2a" rel="noopener noreferrer"&gt;large&lt;/a&gt; &lt;a href="https://github.com/xiph/rav1e/commit/a420bc3" rel="noopener noreferrer"&gt;boost&lt;/a&gt; in January and there is an ongoing effort to improve it even further in March.&lt;/p&gt;

&lt;center&gt;

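&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Speed Level&lt;/th&gt;&lt;th&gt;p20210209&lt;/th&gt;&lt;th&gt;p20210216&lt;/th&gt;&lt;th&gt;p20210223&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;&lt;strong&gt;x1.12&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;x1.12&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;x1.17&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;&lt;strong&gt;x1.10&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;x1.11&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;x1.26&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;x1.04&lt;/td&gt;&lt;td&gt;x1.04&lt;/td&gt;&lt;td&gt;x1.23&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;3&lt;/td&gt;&lt;td&gt;x1.00&lt;/td&gt;&lt;td&gt;x1.02&lt;/td&gt;&lt;td&gt;x1.28&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;4&lt;/td&gt;&lt;td&gt;x1.01&lt;/td&gt;&lt;td&gt;x1.02&lt;/td&gt;&lt;td&gt;x1.27&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;x1.02&lt;/td&gt;&lt;td&gt;x1.02&lt;/td&gt;&lt;td&gt;x1.29&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;6&lt;/td&gt;&lt;td&gt;x1.00&lt;/td&gt;&lt;td&gt;x1.01&lt;/td&gt;&lt;td&gt;x1.37&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;7&lt;/td&gt;&lt;td&gt;x1.01&lt;/td&gt;&lt;td&gt;x1.01&lt;/td&gt;&lt;td&gt;x1.37&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;8&lt;/td&gt;&lt;td&gt;x1.00&lt;/td&gt;&lt;td&gt;x1.01&lt;/td&gt;&lt;td&gt;x1.38&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;9&lt;/td&gt;&lt;td&gt;x0.99&lt;/td&gt;&lt;td&gt;x1.00&lt;/td&gt;&lt;td&gt;&lt;strong&gt;x1.50&lt;/strong&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;10&lt;/td&gt;&lt;td&gt;x0.99&lt;/td&gt;&lt;td&gt;x1.00&lt;/td&gt;&lt;td&gt;&lt;strong&gt;x1.95&lt;/strong&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;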

&lt;/center&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4eya69ra0no06fzhqdei.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4eya69ra0no06fzhqdei.png" alt="alt text" width="800" height="990"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  Aarch64
&lt;/h3&gt;

&lt;p&gt;The impact of the optimizations on &lt;strong&gt;aarch64&lt;/strong&gt; has been more radical, with a fairly large relative improvement on speed level 10.&lt;/p&gt;

&lt;center&gt;

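&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Speed Level&lt;/th&gt;&lt;th&gt;p20210209&lt;/th&gt;&lt;th&gt;p20210216&lt;/th&gt;&lt;th&gt;p20210223&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;&lt;strong&gt;x1.14&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;x1.15&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;x1.31&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;&lt;strong&gt;x1.11&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;x1.10&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;x1.59&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;x1.03&lt;/td&gt;&lt;td&gt;x1.03&lt;/td&gt;&lt;td&gt;x1.63&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;3&lt;/td&gt;&lt;td&gt;x1.03&lt;/td&gt;&lt;td&gt;x1.01&lt;/td&gt;&lt;td&gt;x1.76&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;4&lt;/td&gt;&lt;td&gt;x1.01&lt;/td&gt;&lt;td&gt;x1.01&lt;/td&gt;&lt;td&gt;x1.77&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;x1.02&lt;/td&gt;&lt;td&gt;x1.01&lt;/td&gt;&lt;td&gt;x1.88&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;6&lt;/td&gt;&lt;td&gt;x1.02&lt;/td&gt;&lt;td&gt;x1.00&lt;/td&gt;&lt;td&gt;x2.07&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;7&lt;/td&gt;&lt;td&gt;x1.01&lt;/td&gt;&lt;td&gt;x1.00&lt;/td&gt;&lt;td&gt;x2.07&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;8&lt;/td&gt;&lt;td&gt;x1.02&lt;/td&gt;&lt;td&gt;x1.00&lt;/td&gt;&lt;td&gt;x2.10&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;9&lt;/td&gt;&lt;td&gt;x1.00&lt;/td&gt;&lt;td&gt;x0.99&lt;/td&gt;&lt;td&gt;x2.45&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;10&lt;/td&gt;&lt;td&gt;x1.01&lt;/td&gt;&lt;td&gt;x0.98&lt;/td&gt;&lt;td&gt;&lt;strong&gt;x4.75&lt;/strong&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;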

&lt;/center&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8spw1xib28cjyh998gm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8spw1xib28cjyh998gm.png" alt="alt text" width="800" height="990"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;The 10bit boost is not as extreme, but still substantial.&lt;/p&gt;

&lt;center&gt;

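&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Speed Level&lt;/th&gt;&lt;th&gt;p20210209&lt;/th&gt;&lt;th&gt;p20210216&lt;/th&gt;&lt;th&gt;p20210223&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;&lt;strong&gt;x1.13&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;x1.17&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;x1.30&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;x1.08&lt;/td&gt;&lt;td&gt;x1.11&lt;/td&gt;&lt;td&gt;x1.54&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;x1.02&lt;/td&gt;&lt;td&gt;x1.05&lt;/td&gt;&lt;td&gt;x1.57&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;3&lt;/td&gt;&lt;td&gt;x1.00&lt;/td&gt;&lt;td&gt;x1.03&lt;/td&gt;&lt;td&gt;x1.66&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;4&lt;/td&gt;&lt;td&gt;x1.00&lt;/td&gt;&lt;td&gt;x1.02&lt;/td&gt;&lt;td&gt;x1.67&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;x1.00&lt;/td&gt;&lt;td&gt;x1.03&lt;/td&gt;&lt;td&gt;x1.74&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;6&lt;/td&gt;&lt;td&gt;x1.00&lt;/td&gt;&lt;td&gt;x1.02&lt;/td&gt;&lt;td&gt;x1.87&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;7&lt;/td&gt;&lt;td&gt;x1.00&lt;/td&gt;&lt;td&gt;x1.03&lt;/td&gt;&lt;td&gt;x1.87&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;8&lt;/td&gt;&lt;td&gt;x1.00&lt;/td&gt;&lt;td&gt;x1.02&lt;/td&gt;&lt;td&gt;x1.89&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;9&lt;/td&gt;&lt;td&gt;x0.99&lt;/td&gt;&lt;td&gt;x1.01&lt;/td&gt;&lt;td&gt;&lt;strong&gt;x2.12&lt;/strong&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;10&lt;/td&gt;&lt;td&gt;x0.98&lt;/td&gt;&lt;td&gt;x1.02&lt;/td&gt;&lt;td&gt;&lt;strong&gt;x2.96&lt;/strong&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;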

&lt;/center&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1f6y793e3n8hejmiw44f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1f6y793e3n8hejmiw44f.png" alt="alt text" width="800" height="988"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I tested on a few different aarch64 systems to see whether the behavior differs much between them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvrmuqfia60wa6frbz6q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvrmuqfia60wa6frbz6q.png" alt="alt text" width="800" height="791"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Apple M1&lt;/strong&gt; is fairly different, but that's something I would expect. I will probably talk a bit more about it in other blog posts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Coming next
&lt;/h2&gt;

&lt;p&gt;We already landed additional SIMD for both &lt;strong&gt;x86_64&lt;/strong&gt; and &lt;strong&gt;aarch64&lt;/strong&gt;, &lt;a href="https://dev.to/barrbrain/"&gt;David Barr&lt;/a&gt; started working on improving the &lt;a href="https://github.com/xiph/rav1e/pull/2682" rel="noopener noreferrer"&gt;segment selection&lt;/a&gt;, and I have &lt;a href="https://github.com/lu-zero/demo-mt" rel="noopener noreferrer"&gt;eventually&lt;/a&gt; come up with an internal architecture that should give us better thread pool usage without impacting the overall latency much.&lt;/p&gt;

&lt;p&gt;March is going to be exciting.&lt;/p&gt;

</description>
      <category>rav1e</category>
      <category>rust</category>
      <category>av1</category>
    </item>
    <item>
      <title>Temporal RDO update optimization</title>
      <dc:creator>Luca Barbato</dc:creator>
      <pubDate>Sun, 28 Feb 2021 15:26:55 +0000</pubDate>
      <link>https://forem.com/luzero/temporal-rdo-update-optimization-2pf1</link>
      <guid>https://forem.com/luzero/temporal-rdo-update-optimization-2pf1</guid>
      <description>&lt;p&gt;This is the second story of a specific optimization that gave a significant speed-up in rav1e. For the first one go &lt;a href="https://dev.to/barrbrain/video-encoder-rollback-optimization-in-rav1e-4d5k"&gt;here&lt;/a&gt;. For the next one go &lt;a href="https://dev.to/master_of_zen/per-speed-rdo-lookahead-frames-optimization-5ai2"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For readers unfamiliar with AV1 and rav1e, we will start with some background. If you already read the first article you may skip ahead to the implementation details.&lt;/p&gt;

&lt;p&gt;The Alliance for Open Media is an effort founded by Google, Cisco and Mozilla, with many other companies joining them over the past few years. The Alliance for Open Media AV1 video codec is the first to be released as a result of this effort. We will refer to it as AV1.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/xiph/rav1e" rel="noopener noreferrer"&gt;rav1e is an AV1 encoder&lt;/a&gt; -- written primarily in the &lt;a href="https://www.rust-lang.org/" rel="noopener noreferrer"&gt;Rust programming language&lt;/a&gt;, its goal is to be safe and fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture of an AV1 encoder
&lt;/h2&gt;

&lt;p&gt;AV1 has a conventional block transform video coding architecture with motion compensation. A video is a sequence of frames which are divided into one or more tiles. Tiles are divided into large squares called superblocks -- either 64 or 128 pixels in width and height for AV1. Superblocks are partitioned recursively into smaller square or rectangular blocks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://commons.wikimedia.org/wiki/File:AV1_Partitioning.svg" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2Ff%2Ffd%2FAV1_Partitioning.svg" alt="AV1 Partitioning"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each block may be predicted by neighboring blocks in the same tile or by a motion projection of regions in other frames. There are many modes to choose from for both of these cases. The core of the video encoder is deciding how to partition the blocks, which prediction mode and reference to use for each block and how to transform the difference. The following diagram illustrates the range of techniques available in AV1.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://commons.wikimedia.org/wiki/File:The_Technology_Inside_Av1.svg" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2F2%2F2e%2FThe_Technology_Inside_Av1.svg" alt="The technology inside AV1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Rate-distortion optimization (RDO)
&lt;/h2&gt;

&lt;p&gt;The process for guiding encoding choices is called rate-distortion optimization, commonly abbreviated as RDO. For each set of choices there is a trade-off between the &lt;strong&gt;rate&lt;/strong&gt; of bits that describe the decisions and how much the decoded frame is &lt;strong&gt;distorted&lt;/strong&gt; compared to the original. The ratio of this trade-off can be estimated for a given quality range.&lt;/p&gt;
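
&lt;p&gt;A common way to express this trade-off is a Lagrangian cost, &lt;code&gt;distortion + lambda * rate&lt;/code&gt;, where the candidate with the lowest cost wins. The sketch below only illustrates that idea and is not rav1e code: the function names and all the numbers are invented.&lt;/p&gt;

```rust
// Lagrangian rate-distortion cost: lower is better.
fn rd_cost(distortion: f64, rate_bits: f64, lambda: f64) -> f64 {
    distortion + lambda * rate_bits
}

// Returns the index of the cheapest (distortion, rate_bits) candidate.
fn best_candidate(candidates: [(f64, f64); 3], lambda: f64) -> usize {
    let mut best = 0;
    for i in 1..candidates.len() {
        let (d, r) = candidates[i];
        let (best_d, best_r) = candidates[best];
        if rd_cost(best_d, best_r, lambda) > rd_cost(d, r, lambda) {
            best = i;
        }
    }
    best
}

fn main() {
    // Invented numbers: 0 = coarse and cheap, 1 = middle ground, 2 = fine and expensive.
    let candidates = [(100.0, 10.0), (60.0, 40.0), (30.0, 120.0)];
    for lambda in [0.1, 1.0, 2.0] {
        println!("lambda {}: candidate {}", lambda, best_candidate(candidates, lambda));
    }
}
```

&lt;p&gt;A small &lt;code&gt;lambda&lt;/code&gt; makes bits cheap, so the low-distortion candidate wins; a large one makes bits expensive and pushes the choice toward the coarse candidate.&lt;/p&gt;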

&lt;h2&gt;
  
  
  Temporal RDO
&lt;/h2&gt;

&lt;p&gt;The RDO logic is normally applied over a small set of frames within a larger group. We call the smaller group sub-GOP and the larger group GOP.&lt;/p&gt;

&lt;p&gt;The temporal RDO analyzes a larger set of frames to find areas that recur outside a single sub-GOP, and computes a bias that guides the normal RDO decisions to invest more bits in the blocks that will matter more in the future.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5iuhsx6wx6xbpclmtjz.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5iuhsx6wx6xbpclmtjz.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The idea of temporal RDO is to compress objects well-predicted in future frames with higher quality. For example, a car moving across the screen can be encoded finely on the first frame, which will improve its visual quality on all subsequent frames.&lt;br&gt;
On the other hand, an object that soon disappears can be encoded coarsely to save bits. This can also apply to multi-view images: spend more bits on objects visible in multiple views and less bits on objects visible in only a single view.&lt;br&gt;
-- Ivan Molodetskikh&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The temporal RDO needs to see moderately far in the future in order to be effective.&lt;/p&gt;

&lt;p&gt;Every time a decision is made for a block, it has to propagate -- both within the frame and over all the past ones.&lt;/p&gt;

&lt;p&gt;This caused its implementation to be one of the largest bottlenecks in the encoder:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Like every lookahead, it adds latency, potentially a lot of it.&lt;/li&gt;
&lt;li&gt;The update process is inherently serial over the frames.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Temporal RDO implementation
&lt;/h2&gt;

&lt;p&gt;In rav1e the bulk of the code implementing it lives in &lt;code&gt;compute_block_importances()&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9u5cix6x3j9bpcylipru.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9u5cix6x3j9bpcylipru.png" alt="hawktrace of encoding"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since it is completely serial, once the encoding process runs on a machine with enough threads it takes a fairly large share of the overall encoding time.&lt;/p&gt;

&lt;p&gt;When encoding with 16 threads and 16 tiles, it takes a little over 30% of the time spent in &lt;code&gt;receive_packet&lt;/code&gt; at speed level 10.&lt;/p&gt;

&lt;p&gt;The rest of the encoding workload is spread nearly evenly per tile; the computation of the block importance bias is not.&lt;/p&gt;

&lt;h2&gt;
  
  
  Making it faster
&lt;/h2&gt;

&lt;p&gt;The update process is fairly serial if we consider frames and the per-block updates. But the analysis itself has good parallelism opportunities since it happens per block.&lt;/p&gt;
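
&lt;p&gt;As a rough illustration of that split, here is a std-only sketch (no rayon) where the per-block costs are computed with one scoped thread per row, while a toy propagation pass stays serial. The cost formula, the propagation rule and the grid size are assumptions made up for the example; they are not what rav1e computes.&lt;/p&gt;

```rust
use std::thread;

const WIDTH: usize = 4; // blocks per row (hypothetical)
const ROWS: usize = 3;
const BLOCKS: usize = WIDTH * ROWS;

// Toy per-block analysis: it only looks at its own block, so blocks are independent.
fn block_cost(intra_cost: u32, importance: u32) -> u32 {
    intra_cost * 2 + importance
}

// Pass 1: every row is analyzed on its own scoped thread.
fn parallel_costs(intra: [u32; BLOCKS], importance: [u32; BLOCKS]) -> [u32; BLOCKS] {
    let mut costs = [0u32; BLOCKS];
    thread::scope(|scope| {
        let rows = costs
            .chunks_exact_mut(WIDTH)
            .zip(intra.chunks_exact(WIDTH))
            .zip(importance.chunks_exact(WIDTH));
        for ((out_row, intra_row), imp_row) in rows {
            scope.spawn(move || {
                for x in 0..WIDTH {
                    out_row[x] = block_cost(intra_row[x], imp_row[x]);
                }
            });
        }
    });
    costs
}

// Pass 2: a made-up propagation rule where each block depends on the previous
// result, so this loop cannot be split across threads.
fn propagate(costs: [u32; BLOCKS]) -> [u32; BLOCKS] {
    let mut acc = costs;
    for i in 1..BLOCKS {
        acc[i] += acc[i - 1] / 2;
    }
    acc
}

fn main() {
    let intra = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12];
    let costs = parallel_costs(intra, [0; BLOCKS]);
    println!("{:?}", propagate(costs));
}
```

&lt;p&gt;Spawning a thread per row is wasteful for real workloads; a worker pool avoids that overhead, which is one reason to reach for a library instead.&lt;/p&gt;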

&lt;h3&gt;
  
  
  Enter rayon
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.rs/rayon" rel="noopener noreferrer"&gt;rayon&lt;/a&gt; is a data-parallelism crate that makes it incredibly easy to take an iterator and execute it in parallel using a pool of workers. We use it extensively in rav1e.&lt;/p&gt;

&lt;p&gt;After a refactor&lt;sup id="fnref1"&gt;1&lt;/sup&gt; that split the update loop into one loop that computes the per-block costs and one that propagates the importance biases to the neighboring blocks, all it took was an apparently small change:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gh"&gt;diff --git a/src/api/internal.rs b/src/api/internal.rs
index 395a66e4..b2ab16cd 100644
&lt;/span&gt;&lt;span class="gd"&gt;--- a/src/api/internal.rs
&lt;/span&gt;&lt;span class="gi"&gt;+++ b/src/api/internal.rs
&lt;/span&gt;&lt;span class="p"&gt;@@ -21,6 +21,7 @@&lt;/span&gt; use crate::rate::{
   RCState, FRAME_NSUBTYPES, FRAME_SUBTYPE_I, FRAME_SUBTYPE_P,
   FRAME_SUBTYPE_SEF,
 };
&lt;span class="gi"&gt;+use crate::rayon::prelude::*;
&lt;/span&gt; use crate::scenechange::SceneChangeDetector;
 use crate::stats::EncoderStats;
 use crate::tiling::Area;
&lt;span class="p"&gt;@@ -809,14 +810,14 @@&lt;/span&gt; impl&amp;lt;T: Pixel&amp;gt; ContextInner&amp;lt;T&amp;gt; {
     let plane_org = &amp;amp;frame.planes[0];
     let plane_ref = &amp;amp;reference_frame.planes[0];
     let lookahead_intra_costs_lines =
&lt;span class="gd"&gt;-      fi.lookahead_intra_costs.chunks_exact(fi.w_in_imp_b);
&lt;/span&gt;&lt;span class="gi"&gt;+      fi.lookahead_intra_costs.par_chunks_exact(fi.w_in_imp_b);
&lt;/span&gt;     let block_importances_lines =
&lt;span class="gd"&gt;-      fi.block_importances.chunks_exact(fi.w_in_imp_b);
&lt;/span&gt;&lt;span class="gi"&gt;+      fi.block_importances.par_chunks_exact(fi.w_in_imp_b);
&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;     let costs: Vec&amp;lt;_&amp;gt; = lookahead_intra_costs_lines
       .zip(block_importances_lines)
       .enumerate()
&lt;span class="gd"&gt;-      .flat_map(|(y, (lookahead_intra_costs, block_importances))| {
&lt;/span&gt;&lt;span class="gi"&gt;+      .flat_map_iter(|(y, (lookahead_intra_costs, block_importances))| {
&lt;/span&gt;         lookahead_intra_costs
           .iter()
           .zip(block_importances.iter())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On speed level 10, the time spent on &lt;code&gt;compute_block_importances()&lt;/code&gt; is reduced to nearly 1/3.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9el7bn7vgxjd2dtfx006.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9el7bn7vgxjd2dtfx006.png" alt="hawktrace after the change"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Across the speed levels the impact is the following:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fymgy6ts1s9mhacykrolw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fymgy6ts1s9mhacykrolw.png" alt="alt text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As expected, making the &lt;code&gt;temporal-rdo update&lt;/code&gt; parallel made our quicker speed levels noticeably faster.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4iu3tz0k6dj5ln0e1f1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4iu3tz0k6dj5ln0e1f1.png" alt="alt text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Given that the &lt;a href="https://dev.to/barrbrain/video-encoder-rollback-optimization-in-rav1e-4d5k"&gt;RDO rollback optimization&lt;/a&gt; had a larger impact on our slower speed levels, it compounds nicely.&lt;/p&gt;

&lt;p&gt;After this optimization we &lt;a href="https://dev.to/master_of_zen/per-speed-rdo-lookahead-frames-optimization-5ai2"&gt;retuned&lt;/a&gt; the speed levels a little, changing the default rdo-lookahead depth.&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;&lt;a href="https://github.com/xiph/rav1e/commit/705fb22f191c8468313a9b32ac9796887c7f93b6" rel="noopener noreferrer"&gt;https://github.com/xiph/rav1e/commit/705fb22f191c8468313a9b32ac9796887c7f93b6&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>rav1e</category>
      <category>rust</category>
      <category>rayon</category>
      <category>optimization</category>
    </item>
  </channel>
</rss>
