<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Anmol Sarma</title>
    <description>The latest articles on Forem by Anmol Sarma (@anmolsarma).</description>
    <link>https://forem.com/anmolsarma</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F188394%2F7b75c4e4-d8cc-49bc-8aba-0db479ae8132.jpeg</url>
      <title>Forem: Anmol Sarma</title>
      <link>https://forem.com/anmolsarma</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/anmolsarma"/>
    <language>en</language>
    <item>
      <title>File Creation Time in Linux</title>
      <dc:creator>Anmol Sarma</dc:creator>
      <pubDate>Sun, 23 Jun 2019 13:54:22 +0000</pubDate>
      <link>https://forem.com/anmolsarma/file-creation-time-in-linux-4i0d</link>
      <guid>https://forem.com/anmolsarma/file-creation-time-in-linux-4i0d</guid>
      <description>&lt;p&gt;The &lt;a href="http://man7.org/linux/man-pages/man1/stat.1.html"&gt;&lt;code&gt;stat&lt;/code&gt;&lt;/a&gt; utility can be used to retrieve the Unix file timestamps namely &lt;code&gt;atime&lt;/code&gt;, &lt;code&gt;ctime&lt;/code&gt; and &lt;code&gt;mtime&lt;/code&gt;. Of these, the benefit of &lt;code&gt;mtime&lt;/code&gt; which records the last time when the file was modified is immediately apparent. On the other hand, &lt;code&gt;atime&lt;/code&gt;&lt;sup id="fnref:1"&gt;1&lt;/sup&gt; which records the last time the file was accessed has been called &lt;a href="https://lore.kernel.org/lkml/20070804210351.GA9784@elte.hu/"&gt;“perhaps the most stupid Unix design idea of all times”&lt;/a&gt;. Intuitively, one might expect &lt;code&gt;ctime&lt;/code&gt; to record the creation time of a file. However, &lt;code&gt;ctime&lt;/code&gt; records the last time when the metadata of a file was changed.&lt;/p&gt;

&lt;p&gt;Typically, Unices do not record file creation times. While some individual filesystems do record file creation times&lt;sup id="fnref:2"&gt;2&lt;/sup&gt;, until recently Linux lacked a common interface to actually expose them to userspace applications. As a result, the output of &lt;code&gt;stat&lt;/code&gt; (GNU coreutils v8.30) on an ext4 filesystem (which does record creation times) looks something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ stat .
  File: .
  Size: 4096 Blocks: 8 IO Block: 4096 directory
Device: 803h/2051d Inode: 3588416 Links: 18
Access: (0775/drwxrwxr-x) Uid: ( 1000/ anmol) Gid: ( 1000/ anmol)
Access: 2019-06-23 10:49:04.056933574 +0000
Modify: 2019-05-19 13:29:59.609167627 +0000
Change: 2019-05-19 13:29:59.609167627 +0000
 Birth: -
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;With the “&lt;code&gt;Birth&lt;/code&gt;” field, meant to show the creation time, sporting a depressing “&lt;code&gt;-&lt;/code&gt;”.&lt;/p&gt;

&lt;p&gt;The fact that &lt;code&gt;ctime&lt;/code&gt; does not mean creation time but change time coupled with the absence of a real creation time interface does lead to quite a bit of confusion. The confusion seems so pervasive that the &lt;code&gt;msdos&lt;/code&gt; driver in the Linux kernel &lt;a href="https://elixir.bootlin.com/linux/v5.1.14/source/fs/fat/inode.c#L883"&gt;happily clobbers&lt;/a&gt; the FAT creation time with the Unix change time!&lt;/p&gt;

&lt;p&gt;The limitations of the current &lt;code&gt;stat()&lt;/code&gt; system call have been known for some time. A new system call providing extended attributes was &lt;a href="https://www.spinics.net/lists/linux-fsdevel/msg33831.html"&gt;first proposed in 2010&lt;/a&gt; with the new &lt;a href="https://lwn.net/Articles/685791/#statx"&gt;&lt;code&gt;statx()&lt;/code&gt;&lt;/a&gt; interface finally &lt;a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a528d35e8bfcc521d7cb70aaf03e1bd296c8493f"&gt;being merged into Linux 4.11 in 2017&lt;/a&gt;. It took so long at least in part because kernel developers quickly ran into one of the hardest problems in Computer Science: &lt;a href="https://lkml.org/lkml/2010/7/22/249"&gt;naming things&lt;/a&gt;. Because there was no standard to guide them, each filesystem took to calling creation time by a different name. &lt;a href="https://elixir.bootlin.com/linux/v5.1.14/source/fs/ext4/ext4.h#L744"&gt;Ext4&lt;/a&gt; and &lt;a href="https://elixir.bootlin.com/linux/v5.1.14/source/fs/xfs/libxfs/xfs_inode_buf.h#L40"&gt;XFS&lt;/a&gt; called it &lt;code&gt;crtime&lt;/code&gt; while &lt;a href="https://elixir.bootlin.com/linux/v5.1.14/source/fs/btrfs/btrfs_inode.h#L187"&gt;Btrfs&lt;/a&gt; and &lt;a href="https://elixir.bootlin.com/linux/v5.1.14/source/fs/jfs/jfs_incore.h#L46"&gt;JFS&lt;/a&gt; called it &lt;code&gt;otime&lt;/code&gt;. Implementations also have slightly different semantics with JFS storing creation time only with the &lt;a href="https://elixir.bootlin.com/linux/v5.1.14/source/fs/jfs/jfs_imap.c#L3166"&gt;precision of seconds&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Glibc took a while to add a wrapper for statx() with support landing in &lt;a href="https://www.sourceware.org/ml/libc-alpha/2018-08/msg00003.html"&gt;version 2.28&lt;/a&gt; which was released in 2018. Fast forward to March 2019 when GNU &lt;a href="https://lists.gnu.org/archive/html/coreutils-announce/2019-03/msg00000.html"&gt;coreutils 8.31&lt;/a&gt; was released with &lt;code&gt;stat&lt;/code&gt; finally gaining support for reading the file creation time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ stat .
  File: .
  Size: 4096 Blocks: 8 IO Block: 4096 directory
Device: 803h/2051d Inode: 3588416 Links: 18
Access: (0775/drwxrwxr-x) Uid: ( 1000/ anmol) Gid: ( 1000/ anmol)
Access: 2019-06-23 10:49:04.056933574 +0000
Modify: 2019-05-19 13:29:59.609167627 +0000
Change: 2019-05-19 13:29:59.609167627 +0000
 Birth: 2019-05-19 13:13:50.100925514 +0000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
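
&lt;p&gt;If only the creation time is of interest, GNU &lt;code&gt;stat&lt;/code&gt; also offers the &lt;code&gt;%w&lt;/code&gt; format sequence for the time of file birth. The following is a minimal sketch, assuming a recent GNU coreutils and a filesystem that records creation times; the &lt;code&gt;birth_time&lt;/code&gt; helper is a made-up name used only for illustration:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical helper: print only the birth time of a path.
# GNU stat prints "-" when the birth time is not available.
birth_time() {
    stat --format='%w' -- "$1"
}

birth_time .
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;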






&lt;ol&gt;
&lt;li&gt;The impact of &lt;code&gt;atime&lt;/code&gt; on disk performance is mitigated by the use of &lt;a href="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/power_management_guide/relatime"&gt;&lt;code&gt;relatime&lt;/code&gt;&lt;/a&gt; on modern Linux systems.
&lt;/li&gt;
&lt;li&gt;For ext4, one can get the &lt;code&gt;crtime&lt;/code&gt; of a file using the &lt;code&gt;stat&lt;/code&gt; subcommand of the confusingly named &lt;a href="https://linux.die.net/man/8/debugfs"&gt;&lt;code&gt;debugfs&lt;/code&gt;&lt;/a&gt; utility.
&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>linux</category>
      <category>filesystem</category>
      <category>todayilearned</category>
    </item>
    <item>
      <title>Network Redirections in Bash</title>
      <dc:creator>Anmol Sarma</dc:creator>
      <pubDate>Sat, 04 May 2019 15:29:15 +0000</pubDate>
      <link>https://forem.com/anmolsarma/network-redirections-in-bash-40i1</link>
      <guid>https://forem.com/anmolsarma/network-redirections-in-bash-40i1</guid>
      <description>&lt;p&gt;A few months ago, while reading the man page for &lt;code&gt;recvmmsg()&lt;/code&gt;, I came across this snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ while true; do echo $RANDOM &amp;gt; /dev/udp/127.0.0.1/1234;
     sleep 0.25; done
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;And as advertised, it sends a UDP datagram containing a random number to port 1234 every 250 ms. I didn’t recall ever seeing a &lt;code&gt;/dev/udp&lt;/code&gt; and so was a bit surprised that it worked. And as it happens, &lt;code&gt;ls&lt;/code&gt; was not able to access the file that I had just written to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ls: cannot access '/dev/udp/127.0.0.1/1234': No such file or directory
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Puzzled and intrigued, I &lt;code&gt;echo&lt;/code&gt;ed &lt;em&gt;Foo Bar Baz&lt;/em&gt; to &lt;code&gt;/dev/udp/127.0.0.1/1337&lt;/code&gt; and reached for &lt;code&gt;strace&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
12423 socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP) = 4
12423 connect(4, {sa_family=AF_INET, sin_port=htons(1337), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
12423 fcntl(1, F_GETFD) = 0
12423 fcntl(1, F_DUPFD, 10) = 10
12423 fcntl(1, F_GETFD) = 0
12423 fcntl(10, F_SETFD, FD_CLOEXEC) = 0
12423 dup2(4, 1) = 1
12423 close(4) = 0
12423 fstat(1, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
12423 write(1, "Foo Bar Baz\n", 12) = 12
...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Seemingly, a normal UDP socket was being created and written to using the regular syscall interface. That refuted my initial suspicion that some kind of a special file backed by the kernel was involved. But who was actually creating the socket?&lt;/p&gt;

&lt;p&gt;A peek at Bash’s code answered that question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;redir.c:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* A list of pattern/value pairs for filenames that the redirection
   code handles specially. */
static STRING_INT_ALIST _redir_special_filenames[] = {
#if !defined (HAVE_DEV_FD)
  { "/dev/fd/[0-9]*", RF_DEVFD },
#endif
#if !defined (HAVE_DEV_STDIN)
  { "/dev/stderr", RF_DEVSTDERR },
  { "/dev/stdin", RF_DEVSTDIN },
  { "/dev/stdout", RF_DEVSTDOUT },
#endif
#if defined (NETWORK_REDIRECTIONS)
  { "/dev/tcp/*/*", RF_DEVTCP },
  { "/dev/udp/*/*", RF_DEVUDP },
#endif
  { (char *)NULL, -1 }
};
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;So, redirection involving &lt;code&gt;/dev/udp/&lt;/code&gt; is handled specially by Bash&lt;sup id="fnref:1"&gt;1&lt;/sup&gt; and it uses the BSD Sockets API to create a socket:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;lib/sh/netopen.c:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/*
 * Open a TCP or UDP connection to HOST on port SERV. Uses the
 * traditional BSD mechanisms. Returns the connected socket or -1 on error.
 */
static int
_netopen4(host, serv, typ)
     char *host, *serv;
     int typ;
{
  struct in_addr ina;
  struct sockaddr_in sin;
  unsigned short p;
  int s, e;

  if (_getaddr(host, &amp;amp;ina) == 0)
    {
      internal_error (_("%s: host unknown"), host);
      errno = EINVAL;
      return -1;
    }

  if (_getserv(serv, typ, &amp;amp;p) == 0)
    {
      internal_error(_("%s: invalid service"), serv);
      errno = EINVAL;
      return -1;
    }

  memset ((char *)&amp;amp;sin, 0, sizeof(sin));
  sin.sin_family = AF_INET;
  sin.sin_port = p;
  sin.sin_addr = ina;

  s = socket(AF_INET, (typ == 't') ? SOCK_STREAM : SOCK_DGRAM, 0);
  if (s &amp;lt; 0)
    {
      sys_error ("socket");
      return (-1);
    }

  if (connect (s, (struct sockaddr *)&amp;amp;sin, sizeof (sin)) &amp;lt; 0)
    {
      e = errno;
      sys_error("connect");
      close(s);
      errno = e;
      return (-1);
    }

  return(s);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Which means we can actually make HTTP requests using Bash:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;exec 3&amp;lt;&amp;gt; /dev/tcp/checkip.amazonaws.com/80
printf "GET / HTTP/1.1\r\nHost: checkip.amazonaws.com\r\nConnection: close\r\n\r\n" &amp;gt;&amp;amp;3
tail -n1 &amp;lt;&amp;amp;3
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;No &lt;code&gt;curl&lt;/code&gt; needed! /jk&lt;/p&gt;

&lt;p&gt;Apart from Bash, in the versions and configurations packaged in Ubuntu 18.04, only &lt;code&gt;ksh&lt;/code&gt; supports network redirections – &lt;code&gt;ash&lt;/code&gt;, &lt;code&gt;csh&lt;/code&gt;, &lt;code&gt;dash&lt;/code&gt;, &lt;code&gt;fish&lt;/code&gt;, and &lt;code&gt;zsh&lt;/code&gt; do not.&lt;/p&gt;

&lt;p&gt;I don’t think I will actually have any use for network redirections but this was a fun little rabbit hole to dive into.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; Code snippets from Bash are licensed under GPLv3, the snippet from the man page is licensed &lt;a href="http://man7.org/linux/man-pages/man2/recvmmsg.2.license.html"&gt;differently&lt;/a&gt;&lt;/p&gt;




&lt;ol&gt;
&lt;li&gt;At least on Linux, the other special patterns handled by bash like &lt;a href="http://www.informit.com/articles/article.aspx?p=99706&amp;amp;seqNum=15"&gt;&lt;code&gt;/dev/fd&lt;/code&gt;&lt;/a&gt; and &lt;code&gt;/dev/stdin&lt;/code&gt; actually are special files backed by the kernel. The &lt;a href="https://www.gnu.org/software/bash/manual/html_node/Redirections.html"&gt;Bash manual&lt;/a&gt; notes that it may emulate them internally on platforms that do not support them.
&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>linux</category>
      <category>bash</category>
    </item>
    <item>
      <title>Single-stepping through the Kernel</title>
      <dc:creator>Anmol Sarma</dc:creator>
      <pubDate>Sun, 03 Feb 2019 13:27:45 +0000</pubDate>
      <link>https://forem.com/anmolsarma/single-stepping-through-the-kernel-4pk8</link>
      <guid>https://forem.com/anmolsarma/single-stepping-through-the-kernel-4pk8</guid>
      <description>&lt;p&gt;There may come a time in a system programmer’s life when she needs to leave the civilized safety of the userland and confront the unspeakable horrors that dwell in the depths of the Kernel space. While &lt;a href="https://lkml.org/lkml/2000/9/6/65" rel="noopener noreferrer"&gt;higher beings might pour scorn&lt;/a&gt; on the very idea of a Kernel debugger, us lesser mortals may have no other recourse but to single-step through Kernel code when the rivers begin to run dry. This guide will help you do just that. We hope you never actually have to.&lt;/p&gt;

&lt;p&gt;Ominous sounding intro-bait notwithstanding, setting up a virtual machine for Kernel debugging isn’t really that difficult. It only needs a bit of preparation. If you just want a copypasta, skip to the end. If you’re interested in the predicaments involved and how to deal with them, read on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;N.B.:&lt;/strong&gt; “But which kernel are you talking about?”, some heathens may invariably ask when it is obvious that Kernel with a capital K refers to the &lt;a href="https://www.kernel.org/" rel="noopener noreferrer"&gt;One True Kernel&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Building the Kernel
&lt;/h3&gt;

&lt;p&gt;Using a minimal Kernel configuration instead of the kitchen-sink one that distributions usually ship will make life a lot easier. You will first need to grab the source code for the Kernel you are interested in. We will use the latest Kernel release tarball from &lt;a href="https://www.kernel.org/" rel="noopener noreferrer"&gt;kernel.org&lt;/a&gt;, which at the time of writing is &lt;a href="https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.20.6.tar.xz" rel="noopener noreferrer"&gt;4.20.6&lt;/a&gt;. Inside the extracted source directory, invoke the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;make defconfig
make kvmconfig
make -j4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will build a minimal Kernel image that can be booted in QEMU like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;qemu-system-x86_64 -kernel linux-4.20.6/arch/x86/boot/bzImage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This should bring up an ancient-looking window with a cryptic error message:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.anmolsarma.in%2Fimages%2Fkernel_panic.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.anmolsarma.in%2Fimages%2Fkernel_panic.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You could try pasting the error message into &lt;del&gt;Google&lt;/del&gt; a search engine: Except for the fact that you can’t select the text in the window. And frankly, the window just looks annoying! So, ignoring the actual error for a moment, let’s try to get QEMU to print to the console instead of spawning a new graphical window:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;qemu-system-x86_64 -nographic -kernel linux-4.20.6/arch/x86/boot/bzImage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;QEMU spits out a single line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://hisham.hm/htop/" rel="noopener noreferrer"&gt;Htop&lt;/a&gt; tells me QEMU is using 100% of a CPU and my laptop fan agrees. But there is no output whatsoever and &lt;code&gt;Ctrl-c&lt;/code&gt; doesn’t work! What &lt;a href="https://superuser.com/a/1211516" rel="noopener noreferrer"&gt;does work&lt;/a&gt;, however, is pressing &lt;code&gt;Ctrl-a&lt;/code&gt; and then hitting &lt;code&gt;x&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;QEMU: Terminated
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Turns out that by passing &lt;code&gt;-nographic&lt;/code&gt;, we have unplugged QEMU’s &lt;em&gt;virtual&lt;/em&gt; monitor. Now, to actually see any output, we need to tell the Kernel to write to a &lt;a href="https://www.kernel.org/doc/html/v4.20/admin-guide/serial-console.html" rel="noopener noreferrer"&gt;serial port&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;qemu-system-x86_64 -nographic -kernel linux-4.20.6/arch/x86/boot/bzImage -append "console=ttyS0"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It worked! Now we can read the error message in all its glory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[1.333008] VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
[1.334024] Please append a correct "root=" boot option; here are the available partitions:
[1.335152] 0b00 1048575 sr0 
[1.335153] driver: sr
[1.335996] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[1.337104] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.20.6 #1
[1.337901] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[1.339091] Call Trace:
[1.339437] dump_stack+0x46/0x5b
[1.339888] panic+0xf3/0x248
[1.340295] mount_block_root+0x184/0x248
[1.340838] ? set_debug_rodata+0xc/0xc
[1.341357] mount_root+0x121/0x13f
[1.341837] prepare_namespace+0x130/0x166
[1.342378] kernel_init_freeable+0x1ed/0x1ff
[1.342965] ? rest_init+0xb0/0xb0
[1.343427] kernel_init+0x5/0x100
[1.343888] ret_from_fork+0x35/0x40
[1.344526] Kernel Offset: 0x1200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[1.345956] ---[end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)]---
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, the Kernel didn’t find a root filesystem to kick off the user mode and panicked. Let’s fix that by creating a root filesystem image.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating a Root Filesystem
&lt;/h3&gt;

&lt;p&gt;Start by creating an empty image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;qemu-img create rootfs.img 1G
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then format it as &lt;a href="https://en.wikipedia.org/wiki/Ext4" rel="noopener noreferrer"&gt;&lt;code&gt;ext4&lt;/code&gt;&lt;/a&gt; and mount it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkfs.ext4 rootfs.img
mkdir mnt
sudo mount -o loop rootfs.img mnt/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can populate it using &lt;a href="https://wiki.debian.org/Debootstrap" rel="noopener noreferrer"&gt;&lt;code&gt;debootstrap&lt;/code&gt;&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo debootstrap bionic mnt/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will create a root filesystem based on Ubuntu 18.04 Bionic Beaver. Of course, feel free to replace &lt;code&gt;bionic&lt;/code&gt; with any release that you prefer.&lt;/p&gt;

&lt;p&gt;And unmount the filesystem once we’re done. &lt;strong&gt;This is important if you want to avoid corrupted images!&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo umount mnt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now boot the Kernel with our filesystem. We need to tell QEMU to use our image as a virtual hard drive and we also need to tell the Kernel to use the hard drive as the root filesystem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;qemu-system-x86_64 -nographic -kernel linux-4.20.6/arch/x86/boot/bzImage -hda rootfs.img -append "root=/dev/sda console=ttyS0"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This time the Kernel shouldn’t panic and you should eventually see a login prompt. We could have set up a user while creating the filesystem but it’s annoying to have to log in each time we boot up the VM. Let’s enable auto login as root instead.&lt;/p&gt;

&lt;p&gt;Terminate QEMU (&lt;code&gt;Ctrl-a&lt;/code&gt;, &lt;code&gt;x&lt;/code&gt;), mount the filesystem image again and then create the configuration folder structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo mount -o loop rootfs.img mnt/
sudo mkdir -p mnt/etc/systemd/system/serial-getty@ttyS0.service.d
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add the following lines to &lt;code&gt;mnt/etc/systemd/system/serial-getty@ttyS0.service.d/autologin.conf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Service]
ExecStart=
ExecStart=-/sbin/agetty --noissue --autologin root %I $TERM
Type=idle
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
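
&lt;p&gt;The same file can also be written from the host shell without opening an editor. This is only a rough sketch, assuming the image is still mounted at &lt;code&gt;mnt/&lt;/code&gt;; the &lt;code&gt;write_autologin&lt;/code&gt; function and the &lt;code&gt;SUDO&lt;/code&gt; variable are illustrative names and not part of any tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical helper: write the autologin drop-in under the given root directory.
# SUDO defaults to "sudo"; set SUDO="" when the target is writable without root.
write_autologin() {
    dir="$1/etc/systemd/system/serial-getty@ttyS0.service.d"
    ${SUDO-sudo} mkdir -p "$dir"
    # Each argument after the format string becomes one line of the file.
    printf '%s\n' \
        '[Service]' \
        'ExecStart=' \
        'ExecStart=-/sbin/agetty --noissue --autologin root %I $TERM' \
        'Type=idle' | ${SUDO-sudo} tee "$dir/autologin.conf"
}

# Usage: write_autologin mnt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;tee&lt;/code&gt; echoes what it writes, so the contents of the new file show up on the terminal as a quick sanity check.&lt;/p&gt;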



&lt;p&gt;Make sure to unmount the filesystem and then boot the Kernel again. This time you should be automatically logged in.&lt;/p&gt;

&lt;p&gt;Gracefully shut down the VM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;halt -p
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Attaching a debugger
&lt;/h3&gt;

&lt;p&gt;Let’s rebuild the Kernel with debugging symbols enabled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./scripts/config -e CONFIG_DEBUG_INFO
make -j4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, boot the Kernel again, this time passing the &lt;code&gt;-s&lt;/code&gt; flag which will make QEMU listen on TCP port 1234:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;qemu-system-x86_64 -nographic -kernel linux-4.20.6/arch/x86/boot/bzImage -hda rootfs.img -append "root=/dev/sda console=ttyS0" -s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, in another terminal start gdb and attach to QEMU:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gdb ./linux-4.20.6/vmlinux 
...
Reading symbols from ./linux-4.20.6/vmlinux...done.
(gdb) target remote :1234
Remote debugging using :1234
0xffffffff95a2f8f4 in ?? ()
(gdb)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can set a breakpoint on a Kernel function, for instance &lt;code&gt;do_sys_open()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(gdb) b do_sys_open 
Breakpoint 1 at 0xffffffff811b2720: file fs/open.c, line 1049.
(gdb) c
Continuing.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now try opening a file in the VM, which should result in &lt;code&gt;do_sys_open()&lt;/code&gt; getting invoked… And nothing happens?! The breakpoint in gdb is not hit. This is due to a Kernel security feature called &lt;a href="https://lwn.net/Articles/569635/" rel="noopener noreferrer"&gt;KASLR&lt;/a&gt;, which randomizes the address at which the Kernel is loaded so that the addresses gdb reads from &lt;code&gt;vmlinux&lt;/code&gt; no longer match the running Kernel. KASLR can be disabled at boot time by adding &lt;code&gt;nokaslr&lt;/code&gt; to the Kernel command line arguments. But, let’s actually rebuild the Kernel without KASLR. While we are at it, let’s also disable loadable module support, which will save us the trouble of copying the modules to the filesystem.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./scripts/config -e CONFIG_DEBUG_INFO -d CONFIG_RANDOMIZE_BASE -d CONFIG_MODULES
make olddefconfig # Resolve dependencies
make -j4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
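
&lt;p&gt;For a quick experiment without a rebuild, the boot-time route mentioned above works too. A sketch that reuses the earlier QEMU invocation with &lt;code&gt;nokaslr&lt;/code&gt; appended to the Kernel command line; &lt;code&gt;boot_nokaslr&lt;/code&gt; is just an illustrative wrapper name, and the paths are assumed to match the ones used so far:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Illustrative wrapper: boot the existing image with KASLR turned off for this boot only.
boot_nokaslr() {
    qemu-system-x86_64 -nographic \
        -kernel linux-4.20.6/arch/x86/boot/bzImage \
        -hda rootfs.img \
        -append "root=/dev/sda console=ttyS0 nokaslr" \
        -s
}

# Usage: boot_nokaslr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;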



&lt;p&gt;Reboot the Kernel again, attach gdb, set a breakpoint on &lt;code&gt;do_sys_open()&lt;/code&gt; and run &lt;code&gt;cat /etc/issue&lt;/code&gt; in the guest. This time the breakpoint should be hit. But probably not where you expected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Breakpoint 1, do_sys_open (dfd=-100, filename=0x7f96074ad428 "/etc/ld.so.cache", flags=557056, mode=0) at fs/open.c:1049
1049 {
(gdb) c
Continuing.

Breakpoint 1, do_sys_open (dfd=-100, filename=0x7f96076b5dd0 "/lib/x86_64-linux-gnu/libc.so.6", flags=557056, mode=0) at fs/open.c:1049
1049 {
(gdb) c
Continuing.

Breakpoint 1, do_sys_open (dfd=-100, filename=0x7ffe9e630e8e "/etc/issue", flags=32768, mode=0) at fs/open.c:1049
1049 {
(gdb)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Congratulations! From this point, you can single-step away to your heart’s content.&lt;/p&gt;
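
&lt;p&gt;The attach-and-break steps can also be handed to gdb on the command line so that they need not be retyped after every reboot. A minimal sketch using gdb’s &lt;code&gt;-ex&lt;/code&gt; option; &lt;code&gt;debug_kernel&lt;/code&gt; is a made-up name and the breakpoint is only an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Illustrative wrapper: attach to QEMU's gdb stub and stop at do_sys_open().
debug_kernel() {
    gdb ./linux-4.20.6/vmlinux \
        -ex 'target remote :1234' \
        -ex 'break do_sys_open' \
        -ex 'continue'
}

# Usage, in a second terminal once QEMU is running with -s: debug_kernel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Once the breakpoint is hit, &lt;code&gt;next&lt;/code&gt; and &lt;code&gt;step&lt;/code&gt; move one source line at a time and &lt;code&gt;bt&lt;/code&gt; prints the call stack.&lt;/p&gt;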

&lt;p&gt;By default, the root filesystem is mounted read only. If you want to be able to write to it, add &lt;code&gt;rw&lt;/code&gt; after &lt;code&gt;root=/dev/sda&lt;/code&gt; in the Kernel parameters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;qemu-system-x86_64 -nographic -kernel linux-4.20.6/arch/x86/boot/bzImage -hda rootfs.img -append "root=/dev/sda rw console=ttyS0" -s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Bonus: Networking
&lt;/h3&gt;

&lt;p&gt;You can create a point to point link between the QEMU VM and the host using a &lt;a href="https://en.wikipedia.org/wiki/TUN/TAP" rel="noopener noreferrer"&gt;TAP interface&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;First install &lt;code&gt;tunctl&lt;/code&gt; and create a persistent TAP interface to avoid running QEMU as root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt install uml-utilities
sudo tunctl -u $(id -u)
Set 'tap0' persistent and owned by uid 1000
sudo ip link set tap0 up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now launch QEMU with a virtual &lt;code&gt;e1000&lt;/code&gt; interface connected to the host’s tap0 interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;qemu-system-x86_64 -nographic -device e1000,netdev=net0 -netdev tap,id=net0,ifname=tap0 -kernel linux-4.20.6/arch/x86/boot/bzImage -hda rootfs.img -append "root=/dev/sda rw console=ttyS0" -s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the guest boots up, bring the network interface up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ip link set enp0s3 up
ip a
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s3: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fe12:3456/64 scope link 
       valid_lft forever preferred_lft forever
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;QEMU and the host can now communicate using their IPv6 Link-local addresses. After all, it is 2019.&lt;/p&gt;

&lt;h3&gt;
  
  
  Copypasta
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Building a minimal debuggable Kernel
make defconfig
make kvmconfig
./scripts/config -e CONFIG_DEBUG_INFO -d CONFIG_RANDOMIZE_BASE -d CONFIG_MODULES
make olddefconfig
make -j4

# Create root filesystem
qemu-img create rootfs.img 1G
mkfs.ext4 rootfs.img
mkdir mnt
sudo mount -o loop rootfs.img mnt/
sudo debootstrap bionic mnt/

# Add following lines to mnt/etc/systemd/system/serial-getty@ttyS0.service.d/autologin.conf
# START
[Service]
ExecStart=
ExecStart=-/sbin/agetty --noissue --autologin root %I $TERM
Type=idle
# END

# Unmount the filesystem
sudo umount mnt

# Boot Kernel with root file system in QEMU
qemu-system-x86_64 -nographic -kernel linux-4.20.6/arch/x86/boot/bzImage -hda rootfs.img -append "root=/dev/sda rw console=ttyS0" -s

# Attach gdb
gdb ./linux-4.20.6/vmlinux 
(gdb) target remote :1234
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>linux</category>
      <category>kernel</category>
    </item>
    <item>
      <title>DCCP: The socket type you probably never heard of</title>
      <dc:creator>Anmol Sarma</dc:creator>
      <pubDate>Tue, 13 Dec 2016 17:40:50 +0000</pubDate>
      <link>https://forem.com/anmolsarma/dccp-the-socket-type-you-probably-never-heard-of-1gh8</link>
      <guid>https://forem.com/anmolsarma/dccp-the-socket-type-you-probably-never-heard-of-1gh8</guid>
      <description>&lt;p&gt;&lt;em&gt;TL;DR: DCCP is a relatively newer transport layer protocol which draws from both TCP and UDP. Jump straight to the example C code.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;Historically, the majority of the traffic on the Internet has been over &lt;a href="https://en.wikipedia.org/wiki/Transmission_Control_Protocol"&gt;TCP&lt;/a&gt; which provides a reliable connection-oriented stream between two hosts. &lt;a href="https://en.wikipedia.org/wiki/User_Datagram_Protocol"&gt;UDP&lt;/a&gt; has been mainly used by applications whose brief transfers would be unacceptably slowed by TCP’s connection establishment overhead or those for which timeliness is more important than reliability. However, the increasing use of UDP for applications such as internet telephony and streaming media which transfer a large amount of data can lead to significant &lt;a href="https://en.wikipedia.org/wiki/Network_congestion"&gt;network congestion&lt;/a&gt;. Since unlike TCP, UDP provides no inherent congestion control mechanism, an application can send UDP datagrams at a much higher rate than the available path capacity and cause congestion along the path. Increased congestion may lead to delays, packet loss and the degradation of the network’s quality of service.&lt;/p&gt;

&lt;p&gt;Applications and protocols that choose to use UDP as their transport must, therefore, employ mechanisms to prevent congestion and to establish some degree of fairness with concurrent traffic so that the network remains usable. A prominent example of such a congestion control scheme is &lt;a href="https://en.wikipedia.org/wiki/LEDBAT"&gt;LEDBAT&lt;/a&gt;, employed by &lt;a href="https://en.wikipedia.org/wiki/BitTorrent"&gt;BitTorrent&lt;/a&gt;. However, implementing a congestion control scheme is difficult, time-consuming and error-prone. Multiple non-standard implementations also make it difficult to reason about how applications would respond to network congestion. &lt;a href="https://en.wikipedia.org/wiki/Datagram_Congestion_Control_Protocol"&gt;DCCP&lt;/a&gt;, the Datagram Congestion Control Protocol, is intended to mitigate this problem as a transport for unreliable datagrams with built-in congestion control.&lt;/p&gt;

&lt;p&gt;From an application programmer’s perspective, DCCP differs from UDP by providing four additional features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explicit connection establishment between hosts&lt;/li&gt;
&lt;li&gt;Selectable congestion control schemes&lt;/li&gt;
&lt;li&gt;Path MTU discovery to avoid fragmentation&lt;/li&gt;
&lt;li&gt;Service Codes for identifying applications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DCCP makes use of Explicit Congestion Notification, but this is transparent to the application. DCCP is designed to leave additional functionality such as reliability or Forward Error Correction (FEC) to be layered on top as and when required, rather than providing it at the protocol level itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Explicit connection establishment
&lt;/h2&gt;

&lt;p&gt;The connection establishment semantics of DCCP mirror those of TCP, with a client that actively connects to a server that is passively listening on a port. DCCP connections are bidirectional. Logically, however, a DCCP connection consists of two separate unidirectional connections, called half-connections. Each half-connection is a one-way, unreliable datagram pipe. The rationale for this is explained in the next section.&lt;/p&gt;

&lt;h2&gt;
  
  
  Selectable congestion control schemes
&lt;/h2&gt;

&lt;p&gt;TCP implements congestion control entirely transparently to the application. While it is possible to configure the host to use a specific variant, there is no way for the application to discover which congestion control scheme is in force, let alone negotiate one. DCCP, however, can cater to the different needs of applications by allowing applications to negotiate the congestion control schemes. In fact, each of the half-connections can use a different scheme, allowing for greater control.&lt;/p&gt;
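
&lt;p&gt;As a rough sketch of what requesting a different scheme for each half-connection might look like on Linux, the hypothetical helper below sets the sending and receiving CCIDs separately. The helper name is made up for illustration, the constants are the ones from the &lt;strong&gt;dccp.h&lt;/strong&gt; header in this post, and it assumes that the kernel accepts a &lt;em&gt;uint8_t&lt;/em&gt; CCID value for these options and that they are set before &lt;em&gt;connect()&lt;/em&gt; or &lt;em&gt;listen()&lt;/em&gt;.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#include &amp;lt;stdint.h&amp;gt;
#include &amp;lt;sys/socket.h&amp;gt;

// Constants as listed in this post's dccp.h
#ifndef SOL_DCCP
#define SOL_DCCP 269
#endif
#define DCCP_SOCKOPT_TX_CCID 14
#define DCCP_SOCKOPT_RX_CCID 15

// Hypothetical helper: request one CCID for the sending half-connection
// and another for the receiving half-connection.
// Assumption: each option takes a uint8_t CCID value.
// Returns 0 on success and -1 on failure (errno is set by setsockopt).
int set_half_connection_ccids(int sock_fd, uint8_t tx_ccid, uint8_t rx_ccid)
{
    if (setsockopt(sock_fd, SOL_DCCP, DCCP_SOCKOPT_TX_CCID, &amp;amp;tx_ccid, sizeof(tx_ccid)))
        return -1;
    if (setsockopt(sock_fd, SOL_DCCP, DCCP_SOCKOPT_RX_CCID, &amp;amp;rx_ccid, sizeof(rx_ccid)))
        return -1;
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A media sender might, for example, call &lt;em&gt;set_half_connection_ccids(fd, 3, 2)&lt;/em&gt; to ask for a smooth sending rate in one direction and TCP-like behaviour in the other. Whether the request succeeds depends on the CCIDs available on both hosts.&lt;/p&gt;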

&lt;p&gt;Sending data faster than the slowest link between the endpoints can carry it will congest the network. This may lead to packet loss and retransmissions which may, in turn, lead to further congestion. The solution to this problem is to start transmitting data at a slow rate on a new connection and then ramp up the speed until packet loss is detected. The transmission rate may then be scaled back until no further packet loss occurs. The optimum speed at which to transfer data will change with network conditions over the life of the connection. Congestion control schemes differ in how packet loss is estimated and in the rate at which the transmission speed is ramped up or scaled back. DCCP congestion control schemes are denoted by Congestion Control Identifiers (CCIDs). Currently, three CCIDs have been formally specified:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://tools.ietf.org/html/rfc4341"&gt;CCID 2&lt;/a&gt; - TCP-like Congestion Control:&lt;/strong&gt; A quick reacting scheme modelled after TCP which will rapidly ramp up speed to take advantage of available bandwidth and also rapidly scale back when congestion is detected. Suitable for applications that can handle large swings in transmission rates.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://tools.ietf.org/html/rfc5348"&gt;CCID 3&lt;/a&gt; - TCP-Friendly Rate Control (TFRC):&lt;/strong&gt; A slower reacting scheme intended to be friendly to concurrent TCP flows in the network. Provides a relatively smoother sending rate at the expense of possibly not utilising all available bandwidth. Suitable for media streaming applications that prefer to minimise abrupt changes in the sending rate.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://tools.ietf.org/html/rfc4828"&gt;CCID 4&lt;/a&gt; - TCP-Friendly Rate Control for Small Packets (TFRC-SP):&lt;/strong&gt; An experimental scheme for applications that use a small datagram size and those that change their sending rate by varying the datagram size.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition, the Linux kernel’s &lt;a href="https://github.com/uoaerg/linux-dccp"&gt;DCCP Test Tree&lt;/a&gt; contains an experimental implementation of a scheme modelled after &lt;a href="https://en.wikipedia.org/wiki/CUBIC_TCP"&gt;TCP CUBIC&lt;/a&gt;. There is also a mode that disables congestion control altogether for &lt;em&gt;UDP-like&lt;/em&gt; behaviour.&lt;/p&gt;
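
&lt;p&gt;The general ramp-up and scale-back idea described in this section can be illustrated with a toy additive-increase/multiplicative-decrease step. This is only a simplified sketch and not the algorithm used by any particular CCID. The function name and the choice of “packets per round trip” as the unit are assumptions made for the example.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Hypothetical, simplified additive-increase/multiplicative-decrease step.
// rate:          current sending rate in packets per round trip
// loss_detected: non-zero if packet loss was seen during the last round trip
unsigned next_rate(unsigned rate, int loss_detected)
{
    if (loss_detected)
        return rate &amp;gt; 1 ? rate / 2 : 1; // Halve the rate, but never go below 1

    return rate + 1; // No loss seen, so probe for a little more bandwidth
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Calling this once per round trip grows the rate slowly while the path has spare capacity and cuts it sharply as soon as loss is seen. Real schemes differ mainly in how they detect loss and how aggressively they make these two adjustments.&lt;/p&gt;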

&lt;h2&gt;
  
  
  PMTU discovery
&lt;/h2&gt;

&lt;p&gt;Data between two internet hosts is transmitted as a series of IP packets that pass through intermediate links. Each of these links has a maximum packet size or maximum transmission unit (MTU) that it can transmit without having to break it up into smaller fragments. The largest packet size that does not require fragmentation anywhere along a path is referred to as the path maximum transmission unit or PMTU. Applications can usually get better error tolerance by producing packets smaller than the PMTU. DCCP defines a maximum packet size (MPS) based on the PMTU and the congestion control scheme used for each connection. DCCP implementations will not send any packet bigger than the MPS and will instead return an appropriate error to the application. The application can query the DCCP stack for the current MPS, restrict itself to sending datagrams no larger than this value and thereby avoid &lt;a href="https://en.wikipedia.org/wiki/IP_fragmentation"&gt;fragmentation&lt;/a&gt;.&lt;/p&gt;
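
&lt;p&gt;To make the arithmetic concrete, here is a small sketch of the two calculations involved: the PMTU as the smallest MTU along a path, and the number of datagrams an application would need if it splits a message so that no datagram exceeds the MPS. Both function names are hypothetical, and in practice the MPS would come from the DCCP stack rather than being computed by hand.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#include &amp;lt;stddef.h&amp;gt;

// The PMTU is the smallest MTU of any link along the path.
// Returns 0 if the path has no links.
unsigned path_mtu(const unsigned *link_mtus, size_t n_links)
{
    unsigned pmtu = 0;
    for (size_t i = 0; i &amp;lt; n_links; i++)
        if (i == 0 || link_mtus[i] &amp;lt; pmtu)
            pmtu = link_mtus[i];
    return pmtu;
}

// Number of datagrams needed to carry message_len bytes when no single
// datagram may be larger than mps bytes. Returns 0 if mps is 0.
size_t datagrams_needed(size_t message_len, size_t mps)
{
    if (mps == 0)
        return 0;
    return (message_len + mps - 1) / mps; // Round up
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For a path with link MTUs of 1500, 1400 and 9000 bytes, the PMTU is 1400 bytes. With an MPS of 1400 bytes, a 3000 byte message would have to be split by the application into three datagrams.&lt;/p&gt;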

&lt;h2&gt;
  
  
  Service Codes
&lt;/h2&gt;

&lt;p&gt;DCCP defines a 32-bit Service Code to disambiguate between multiple applications associated with a single server port. The client specifies the Service Code it wants to connect to and this is used to identify the intended service or application to process a DCCP connection request. Essentially, Service Codes provide an additional level of indirection for connection multiplexing. A server listening on a port may be associated with multiple Service Codes but a client may have only one Service Code, indicating the application it wishes to connect to.&lt;/p&gt;
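
&lt;p&gt;Conceptually, the extra level of indirection works like a lookup table sitting behind the port. The sketch below is purely illustrative: the struct, the function and the idea of mapping codes to application names are all made up for this example, and the real matching is done by the DCCP stack during connection establishment.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;

// Hypothetical table entry mapping a Service Code to an application name
struct service {
    uint32_t code;
    const char *name;
};

// Return the application registered for service_code on this port,
// or NULL if no application is associated with that code.
const char *find_service(const struct service *table, size_t n, uint32_t service_code)
{
    for (size_t i = 0; i &amp;lt; n; i++)
        if (table[i].code == service_code)
            return table[i].name;
    return NULL;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A server port associated with Service Codes 42 and 43 would have two entries in such a table. A client asking for any other code would get no match and its connection request would not be handed to either application.&lt;/p&gt;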

&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;

&lt;p&gt;The mainline Linux kernel has included DCCP support since &lt;a href="https://lwn.net/Articles/149756/"&gt;2.6.14&lt;/a&gt; and mainstream distributions like Ubuntu enable it by default. However, to get the newer experimental features, you will have to build the kernel from the DCCP Test Tree. Alternatively, you can grab the latest stable kernel release merged with the experimental DCCP changes from &lt;a href="https://github.com/unmole/linux-dccp/releases/latest"&gt;here&lt;/a&gt;. Be sure to enable all the CCIDs in the kernel configuration in &lt;em&gt;Networking Support&lt;/em&gt; –&amp;gt; &lt;em&gt;Networking Options&lt;/em&gt; –&amp;gt; &lt;em&gt;The DCCP Protocol&lt;/em&gt; –&amp;gt; &lt;em&gt;DCCP CCIDs Configuration&lt;/em&gt;. As the Debian Installation Guide says, “&lt;em&gt;Don’t be afraid to try compiling the kernel. It’s fun and profitable.&lt;/em&gt;” For now, Linux is the only operating system supporting native DCCP, unless you count the patch for an ancient version of FreeBSD.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example in C
&lt;/h2&gt;

&lt;p&gt;The server and client look almost exactly the same as their TCP counterparts, with the exception of the socket type and the setting of the Service Code. The client uses &lt;em&gt;getsockopt()&lt;/em&gt; to read the current maximum packet size. Reading the available CCIDs on the host is shown in &lt;strong&gt;probe.c&lt;/strong&gt;. As libc still doesn’t have a &lt;strong&gt;netinet/dccp.h&lt;/strong&gt; header, you will have to get the required constants from the kernel sources or directly use the &lt;strong&gt;dccp.h&lt;/strong&gt; header below. &lt;a href="https://www.anmolsarma.in/dl/dccp_socket_example.tar.gz"&gt;Download Code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;server.c&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;
#include &amp;lt;sys/socket.h&amp;gt;
#include &amp;lt;netinet/in.h&amp;gt;
#include &amp;lt;arpa/inet.h&amp;gt;
#include &amp;lt;errno.h&amp;gt;

#include "dccp.h"

#define PORT 1337
#define SERVICE_CODE 42

int error_exit(const char *str)
{
    perror(str);
    exit(errno);
}

int main(int argc, char **argv)
{
    int listen_sock = socket(AF_INET, SOCK_DCCP, IPPROTO_DCCP);
    if (listen_sock &amp;lt; 0)
        error_exit("socket");

    struct sockaddr_in servaddr = {
        .sin_family = AF_INET,
        .sin_addr.s_addr = htonl(INADDR_ANY),
        .sin_port = htons(PORT),
    };

    if (setsockopt(listen_sock, SOL_SOCKET, SO_REUSEADDR, &amp;amp;(int) {
               1}, sizeof(int)))
        error_exit("setsockopt(SO_REUSEADDR)");

    if (bind(listen_sock, (struct sockaddr *)&amp;amp;servaddr, sizeof(servaddr)))
        error_exit("bind");

    // DCCP mandates the use of a 'Service Code' in addition to the port
    if (setsockopt(listen_sock, SOL_DCCP, DCCP_SOCKOPT_SERVICE, &amp;amp;(int) {
               htonl(SERVICE_CODE)}, sizeof(int)))
        error_exit("setsockopt(DCCP_SOCKOPT_SERVICE)");

    if (listen(listen_sock, 1))
        error_exit("listen");

    for (;;) {

        printf("Waiting for connection...\n");

        struct sockaddr_in client_addr;
        socklen_t addr_len = sizeof(client_addr);

        int conn_sock = accept(listen_sock, (struct sockaddr *)&amp;amp;client_addr, &amp;amp;addr_len);
        if (conn_sock &amp;lt; 0) {
            perror("accept");
            continue;
        }

        printf("Connection received from %s:%d\n",
               inet_ntoa(client_addr.sin_addr), ntohs(client_addr.sin_port));

        for (;;) {
            char buffer[1024];
            // Each recv() will read only one individual message.
            // Datagrams, not a stream!
            // Leave room for a terminator in case the sender did not send one
            ssize_t ret = recv(conn_sock, buffer, sizeof(buffer) - 1, 0);
            if (ret &lt;= 0)
                break;

            buffer[ret] = '\0';
            printf("Received: %s\n", buffer);
        }

        close(conn_sock);
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;client.c&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;string.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;
#include &amp;lt;sys/socket.h&amp;gt;
#include &amp;lt;netinet/in.h&amp;gt;
#include &amp;lt;arpa/inet.h&amp;gt;
#include &amp;lt;errno.h&amp;gt;

#include "dccp.h"

int error_exit(const char *str)
{
    perror(str);
    exit(errno);
}

int main(int argc, char *argv[])
{
    if (argc &amp;lt; 5) {
        printf("Usage: ./client &amp;lt;server address&amp;gt; &amp;lt;port&amp;gt; &amp;lt;service code&amp;gt; &amp;lt;message 1&amp;gt; [message 2] ... \n");
        exit(-1);
    }
    struct sockaddr_in server_addr = {
        .sin_family = AF_INET,
        .sin_port = htons(atoi(argv[2])),
    };

    // inet_pton() returns 1 on success, 0 for a malformed address and -1 on error
    if (inet_pton(AF_INET, argv[1], &amp;amp;server_addr.sin_addr) != 1) {
        printf("Invalid address %s\n", argv[1]);
        exit(-1);
    }

    int socket_fd = socket(AF_INET, SOCK_DCCP, IPPROTO_DCCP);
    if (socket_fd &amp;lt; 0)
        error_exit("socket");

    if (setsockopt(socket_fd, SOL_DCCP, DCCP_SOCKOPT_SERVICE, &amp;amp;(int) {htonl(atoi(argv[3]))}, sizeof(int)))
        error_exit("setsockopt(DCCP_SOCKOPT_SERVICE)");

    if (connect(socket_fd, (struct sockaddr *) &amp;amp;server_addr, sizeof(server_addr)))
        error_exit("connect");

    // Get the maximum packet size
    uint32_t mps;
    socklen_t res_len = sizeof(mps);
    if (getsockopt(socket_fd, SOL_DCCP, DCCP_SOCKOPT_GET_CUR_MPS, &amp;amp;mps, &amp;amp;res_len))
        error_exit("getsockopt(DCCP_SOCKOPT_GET_CUR_MPS)");
    printf("Maximum Packet Size: %u\n", mps);

    for (int i = 4; i &amp;lt; argc; i++) {
        if (send(socket_fd, argv[i], strlen(argv[i]) + 1, 0) &amp;lt; 0)
            error_exit("send");
    }

    // Wait for a while to allow all the messages to be transmitted
    usleep(5 * 1000);

    close(socket_fd);
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;probe.c&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;sys/socket.h&amp;gt;
#include &amp;lt;netinet/in.h&amp;gt;

#include "dccp.h"

int main()
{
    int sock_fd = socket(AF_INET, SOCK_DCCP, IPPROTO_DCCP);
    if (sock_fd &lt; 0) {
        perror("socket");
        return -1;
    }

    // Check the congestion control schemes available
    uint8_t ccids[6];
    socklen_t res_len = sizeof(ccids);
    if (getsockopt(sock_fd, SOL_DCCP, DCCP_SOCKOPT_AVAILABLE_CCIDS, ccids, &amp;amp;res_len)) {
        perror("getsockopt(DCCP_SOCKOPT_AVAILABLE_CCIDS)");
        return -1;
    }

    // On return, res_len holds the number of CCIDs written to the array
    printf("%u CCIDs available:", (unsigned)res_len);
    for (socklen_t i = 0; i &lt; res_len; i++)
        printf(" %d", ccids[i]);
    printf("\n");

    return res_len;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;dccp.h&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* This file only contains constants necessary for user space to call
 * into the kernel and thus, contains no copyrightable information. */

#ifndef DCCP_DCCP_H
#define DCCP_DCCP_H

// From the kernel's include/linux/socket.h
#define SOL_DCCP 269

// From kernel's include/uapi/linux/dccp.h
#define DCCP_SOCKOPT_SERVICE 2
#define DCCP_SOCKOPT_CHANGE_L 3
#define DCCP_SOCKOPT_CHANGE_R 4
#define DCCP_SOCKOPT_GET_CUR_MPS 5
#define DCCP_SOCKOPT_SERVER_TIMEWAIT 6
#define DCCP_SOCKOPT_SEND_CSCOV 10
#define DCCP_SOCKOPT_RECV_CSCOV 11
#define DCCP_SOCKOPT_AVAILABLE_CCIDS 12
#define DCCP_SOCKOPT_CCID 13
#define DCCP_SOCKOPT_TX_CCID 14
#define DCCP_SOCKOPT_RX_CCID 15
#define DCCP_SOCKOPT_QPOLICY_ID 16
#define DCCP_SOCKOPT_QPOLICY_TXQLEN 17
#define DCCP_SOCKOPT_CCID_RX_INFO 128
#define DCCP_SOCKOPT_CCID_TX_INFO 192

#endif //DCCP_DCCP_H
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h2&gt;
  
  
  Caveats and Conclusion
&lt;/h2&gt;

&lt;p&gt;DCCP is not mainstream. It is not widely deployed or even supported. Documentation is sparse. Although Linux DCCP NAT is functional, many intermediate boxes will probably just drop DCCP traffic. DCCP is the fixed-gear bicycle of Layer 4: the ultimate hipster transport.&lt;/p&gt;

</description>
      <category>network</category>
      <category>protocol</category>
    </item>
  </channel>
</rss>
