<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: keaukraine</title>
    <description>The latest articles on Forem by keaukraine (@keaukraine).</description>
    <link>https://forem.com/keaukraine</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F335640%2F40783790-a067-4b28-9859-fe89c6f5bd4f.png</url>
      <title>Forem: keaukraine</title>
      <link>https://forem.com/keaukraine</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/keaukraine"/>
    <language>en</language>
    <item>
      <title>Automatically picking the best ASTC textures quality</title>
      <dc:creator>keaukraine</dc:creator>
      <pubDate>Fri, 29 Aug 2025 14:20:26 +0000</pubDate>
      <link>https://forem.com/keaukraine/automatically-picking-the-best-astc-textures-quality-2ke8</link>
      <guid>https://forem.com/keaukraine/automatically-picking-the-best-astc-textures-quality-2ke8</guid>
      <description>&lt;p&gt;In this small article I’ll describe the simple script I’ve created while working on the Cartoon Lighthouse live wallpaper. You can find a live web demo of this app &lt;a href="https://keaukraine.github.io/webgl-kmp-cartoonlighthouse/index.html" rel="noopener noreferrer"&gt;here&lt;/a&gt; and take a look at its source code &lt;a href="https://github.com/keaukraine/webgl-kmp-cartoonlighthouse" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I’ve encountered a slight problem with encoding 31 textures into ASTC format. Usually, in all our apps, geometries are batched and merged to reduce the number of draw calls, so typically we use a couple of large texture atlases for all geometries. In this scene, however, geometries were not batched, and I decided to keep it that way since the total number of draw calls is still rather small (around 30). This resulted in 31 textures that had to be converted into ASTC format.&lt;/p&gt;

&lt;p&gt;Usually I just encoded textures with a few ASTC block sizes (4x4, 6x6, 8x8 and 10x10) and manually picked the one which looked good enough to my liking. While this was OK for a handful of textures, it was quite a cumbersome process for 31 of them. This needed some automation, and fortunately I’ve encountered this &lt;a href="https://x.com/castano/status/1953247742941380822" rel="noopener noreferrer"&gt;wonderful tweet&lt;/a&gt; about quality improvements in the Spark real-time texture encoder (make sure to check it out, the project is fascinating in itself). What was interesting in it for me was the usage of a tool called SSIMULACRA2. This &lt;a href="https://github.com/cloudinary/ssimulacra2" rel="noopener noreferrer"&gt;open-source&lt;/a&gt; tool compares a distorted image against the original and gives it a quality score. And the algorithm is specifically tuned to look for both user-perceived and blocky compression-related artifacts in the images! That’s exactly what I needed!&lt;/p&gt;

&lt;p&gt;Over the weekend I’ve created a bash script (bash is not my native language so I’ve used ChatGPT to help me here and there) and tested it. The algorithm of the script can be described with this pseudocode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for each input file  
    for each ASTC block size from the lowest to the highest quality  
        encode image and save its decoded image  
        compare quality between original and decoded images  
        if quality is within threshold encode with this block size
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
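&lt;p&gt;The actual script is bash, but the selection logic can be sketched in TypeScript; here the astcenc encode/decode and SSIMULACRA2 scoring steps are abstracted behind a caller-supplied &lt;code&gt;scoreFor&lt;/code&gt; callback (a hypothetical name, not taken from the script itself):&lt;/p&gt;

```typescript
// All 14 2D block sizes supported by ASTC, ordered from the lowest
// quality (best compression) to the highest quality.
const BLOCK_SIZES = [
  "12x12", "12x10", "10x10", "10x8", "10x6", "10x5", "8x8",
  "8x6", "8x5", "6x6", "6x5", "5x5", "5x4", "4x4",
];

// In the real script, scoreFor(block) would encode the image with astcenc,
// decode it back, and run ssimulacra2 against the original.
function pickBlockSize(scoreFor: (block: string) => number, threshold = 70): string {
  for (const block of BLOCK_SIZES) {
    // The first block size that meets the threshold wins: it is the most
    // compressed one that still looks acceptable.
    if (scoreFor(block) >= threshold) {
      return block;
    }
  }
  return "4x4"; // nothing passed the threshold; use the best-quality blocks anyway
}
```

&lt;p&gt;The chosen block size is then used for the final encode of that file.&lt;/p&gt;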



&lt;p&gt;You can optionally specify a target SSIMULACRA2 score; the default is 70, which can be described as “there are some compression artifacts if you look for them specifically”.&lt;/p&gt;

&lt;p&gt;There’s additional logic for the detection of mipmaps: all levels must be encoded with the same block size, so only the highest-resolution level is tested for quality.&lt;/p&gt;

&lt;p&gt;There’s a limitation of the SSIMULACRA2 tool: it doesn’t accept images smaller than 8x8 pixels, so these are encoded with the fallback 4x4 block size.&lt;/p&gt;
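&lt;p&gt;That fallback can be sketched as a small guard applied after the image dimensions are queried (the real script is bash; the helper name here is illustrative):&lt;/p&gt;

```typescript
// SSIMULACRA2 refuses images smaller than 8x8, so such textures skip the
// scoring loop and simply take the highest-quality 4x4 block size.
function effectiveBlockSize(width: number, height: number, chosen: string): string {
  return Math.min(width, height) >= 8 ? chosen : "4x4";
}
```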

&lt;p&gt;I’ve also played around with the script’s output a little so it is not so silent while working: it reports which image it is currently processing and shows scores for the processed images:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqiukerjivwivk3ranmeh.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqiukerjivwivk3ranmeh.gif" alt=" " width="720" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This saved not only my time (I didn’t have to check the quality of each file manually) but file sizes too! Out of laziness I used to check only a subset of ASTC block sizes manually, but this script relentlessly runs through all 14 block sizes supported by ASTC, choosing the best suitable one. And for some smallish textures with quite uniform content, even the 12x12 block size was good enough.&lt;/p&gt;

&lt;p&gt;You can check the source code of the script &lt;a href="https://github.com/keaukraine/astc-compression" rel="noopener noreferrer"&gt;here on GitHub&lt;/a&gt; and adapt it to your needs. It also includes an ImageMagick script to create mipmaps.&lt;/p&gt;
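&lt;p&gt;For reference, a mip chain is just repeated halving of the texture size down to 1x1; a sketch of that calculation (the actual resizing is done by ImageMagick in the repository’s script):&lt;/p&gt;

```typescript
// Side lengths of successive mip levels for a square power-of-two texture.
function mipSizes(base: number): number[] {
  const sizes: number[] = [];
  let size = base;
  while (size >= 1) {
    sizes.push(size);
    size = Math.floor(size / 2);
  }
  return sizes;
}
```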

</description>
      <category>astc</category>
      <category>texture</category>
      <category>codequality</category>
      <category>ssimulacra2</category>
    </item>
    <item>
      <title>How to improve MSAA performance of MTKView</title>
      <dc:creator>keaukraine</dc:creator>
      <pubDate>Fri, 10 Jan 2025 18:42:26 +0000</pubDate>
      <link>https://forem.com/keaukraine/how-to-improve-msaa-performance-of-mtkview-25da</link>
      <guid>https://forem.com/keaukraine/how-to-improve-msaa-performance-of-mtkview-25da</guid>
      <description>&lt;p&gt;The easiest and fastest way to use Metal in MacOS app is to use &lt;code&gt;MTKView&lt;/code&gt;. It is a handy wrapper which initializes all low-level stuff under the hood so you can get right to the fun part — implementing actual rendering.&lt;/p&gt;

&lt;p&gt;However, because of its simplicity it does have a couple of shortcomings and, for some reason, doesn’t provide access to everything under its hood. One of these minor inconveniences is the way it initializes multisampled render targets.&lt;/p&gt;

&lt;p&gt;To understand why this is important, let’s look at how Metal handles MSAA. It supports multiple ways of implementing it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You can have a multisampled render target and then resolve it to the on-screen render target automatically.&lt;/li&gt;
&lt;li&gt;You can use this multisampled render target with custom resolve shaders.&lt;/li&gt;
&lt;li&gt;On supported hardware, you can skip allocating the multisampled render target in memory and resolve automatically directly to the final render target. This will still use a multisampled render target, but a memoryless one.&lt;/li&gt;
&lt;li&gt;The same approach, but with tile shaders to resolve (apply custom tone mapping, etc.).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can find a more detailed explanation of these methods with a sample Xcode project in this official Apple documentation article — &lt;a href="https://developer.apple.com/documentation/metal/metal_sample_code_library/improving_edge-rendering_quality_with_multisample_antialiasing_msaa" rel="noopener noreferrer"&gt;https://developer.apple.com/documentation/metal/metal_sample_code_library/improving_edge-rendering_quality_with_multisample_antialiasing_msaa&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of particular interest to us are the memoryless multisampled render targets. They are very efficient since they are transient and reside only in the (extremely fast and tiny) temporary tile memory of the GPU. Because of this they don’t require main memory allocations and don’t eat into precious VRAM access bandwidth.&lt;/p&gt;

&lt;p&gt;Here is the typical 4x MSAA rasterization process with default render pass created by &lt;code&gt;MTKView&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqs5rq7vyyubumkkrxd0y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqs5rq7vyyubumkkrxd0y.png" alt="Default MSAA pipeline" width="800" height="180"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And here is the same one but using efficient memoryless render target:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy96oy6cb3tjbbme09u0c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy96oy6cb3tjbbme09u0c.png" alt="Image description" width="800" height="187"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Basically, the only difference is that we substitute the transient multisampled render target with a memoryless one, and this results in a huge improvement in memory allocation and bandwidth. Please note that according to the &lt;a href="https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf" rel="noopener noreferrer"&gt;Metal Feature Set&lt;/a&gt; tables, memoryless render targets are not supported by older devices. Namely, Intel-based Macs don’t support tiled rendering and cannot use them. But if you target Apple-silicon devices, you definitely should use them because they are extremely efficient.&lt;/p&gt;

&lt;p&gt;The thing is that (possibly for better support of all hardware) &lt;code&gt;MTKView&lt;/code&gt; initializes MSAA only with classic in-memory render targets: a multisampled one for rendering and the final one for resolving and presenting the result on the screen.&lt;/p&gt;

&lt;p&gt;In the aforementioned official Metal MSAA example you can find the proper way of initializing a memoryless MSAA resolve, but it doesn’t use the handy &lt;code&gt;MTKView&lt;/code&gt;; instead, there’s quite a lot of glue code to make it work.&lt;/p&gt;

&lt;p&gt;However, I’ve found a hacky yet relatively simple and perfectly working way of initializing an efficient memoryless MSAA resolve using the default &lt;code&gt;MTKView&lt;/code&gt; wrapper view.&lt;br&gt;
Let’s take a look at what configuration options &lt;code&gt;MTKView&lt;/code&gt; does provide.&lt;br&gt;
Obviously there’s a &lt;code&gt;sampleCount&lt;/code&gt; which will initialize MSAA render targets. There are also &lt;code&gt;depthStencilPixelFormat&lt;/code&gt; and &lt;code&gt;depthStencilStorageMode&lt;/code&gt; fields. You can switch depth+stencil to memoryless storage by setting &lt;code&gt;depthStencilStorageMode=.memoryless&lt;/code&gt;, which also saves a lot of RAM and bandwidth in case you don’t need the depth information of your frames afterwards.&lt;br&gt;
Here’s a typical &lt;code&gt;MTKView&lt;/code&gt; initialization code (for Apple-silicon GPUs, which support memoryless textures):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="n"&gt;_metalView&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;depthStencilPixelFormat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;depth32Float&lt;/span&gt;
&lt;span class="n"&gt;_metalView&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;depthStencilStorageMode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memoryless&lt;/span&gt;
&lt;span class="n"&gt;_metalView&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;preferredFramesPerSecond&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;
&lt;span class="n"&gt;_metalView&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sampleCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="c1"&gt;// hard-coded 4 samples but you can query max available samples for GPU and set it accordingly&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s cool, so let’s switch the color render target to the memoryless mode too! Unfortunately, for the color render target only &lt;code&gt;colorPixelFormat&lt;/code&gt; is available (typically set up automatically) and there is no &lt;code&gt;colorStorageMode&lt;/code&gt;. So there’s no easy way to simply set it up to use the memoryless MSAA mode.&lt;/p&gt;

&lt;p&gt;Still there’s a relatively simple way of &lt;em&gt;switching&lt;/em&gt; it to the memoryless mode after it has been initialized!&lt;/p&gt;

&lt;p&gt;The thing is that the Metal API allows you to change the multisampled color texture of the current render pass. The descriptor of this render pass is provided to you by &lt;code&gt;MTKView&lt;/code&gt; and, obviously, it is pre-initialized with an in-memory texture.&lt;br&gt;
So all you need to do is, on the first frame you draw (and whenever the drawable size changes), create a memoryless render target and substitute the default multisampled texture with the new one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="c1"&gt;// New memoryless MSAA texture&lt;/span&gt;
&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="nv"&gt;textureMsaa&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;MTLTexture&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;

&lt;span class="o"&gt;.................&lt;/span&gt;

&lt;span class="kd"&gt;func&lt;/span&gt; &lt;span class="nf"&gt;yourCodeToDrawStuff&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Before rendering, create and replace MSAA resolve RTT.&lt;/span&gt;
    &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;resolveTexture&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;view&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;currentRenderPassDescriptor&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;colorAttachments&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resolveTexture&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;resolveTexture&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="kc"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;width&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resolveTexture&lt;/span&gt;&lt;span class="o"&gt;!.&lt;/span&gt;&lt;span class="n"&gt;width&lt;/span&gt;
            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;height&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resolveTexture&lt;/span&gt;&lt;span class="o"&gt;!.&lt;/span&gt;&lt;span class="n"&gt;height&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;textureMsaa&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;nil&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;textureMsaa&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;width&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;textureMsaa&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;height&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;height&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="c1"&gt;// Auto-purge the old unused resolve texture&lt;/span&gt;
                &lt;span class="n"&gt;renderPassDescriptor&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;colorAttachments&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;texture&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setPurgeableState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;volatile&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;textureMsaa&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="nf"&gt;create2DRenderTargetMemoryless&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;height&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;pixelFormat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bgra8Unorm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;metalDevice&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;textureMsaa&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Main pass RTT"&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;fatalError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Cannot create MSAA texture: &lt;/span&gt;&lt;span class="se"&gt;\(&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c1"&gt;// Use new memoryless texture&lt;/span&gt;
    &lt;span class="n"&gt;view&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;currentRenderPassDescriptor&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;colorAttachments&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;texture&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;textureMsaa&lt;/span&gt;

    &lt;span class="c1"&gt;// Do you rendering here as usual&lt;/span&gt;
    &lt;span class="o"&gt;.....................&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;func&lt;/span&gt; &lt;span class="nf"&gt;create2DRenderTargetMemoryless&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;height&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;pixelFormat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;MTLPixelFormat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;metalDevice&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;MTLDevice&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throws&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="kt"&gt;MTLTexture&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;descriptor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;MTLTextureDescriptor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;texture2DDescriptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;pixelFormat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pixelFormat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;height&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;mipmapped&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;descriptor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;textureType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;type2DMultisample&lt;/span&gt;
    &lt;span class="n"&gt;descriptor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sampleCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="c1"&gt;// Yes I use hard-coded 4 samples here too :)&lt;/span&gt;
    &lt;span class="n"&gt;descriptor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;renderTarget&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;descriptor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resourceOptions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;storageModeMemoryless&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;metalDevice&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;makeTexture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;descriptor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;descriptor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="kc"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="kt"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Cannot create texture with pixelFormat &lt;/span&gt;&lt;span class="se"&gt;\(&lt;/span&gt;&lt;span class="n"&gt;pixelFormat&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="s"&gt; of size &lt;/span&gt;&lt;span class="se"&gt;\(&lt;/span&gt;&lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="se"&gt;\(&lt;/span&gt;&lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This simple trick effectively substitutes the auto-created in-memory color render target with the memoryless one.&lt;/p&gt;

&lt;p&gt;There is one important step: you must set the purgeable state of the old unused render target to volatile in order for it to free its memory. Otherwise, even though it will never be used again, it will still keep a large amount of memory allocated. This is an extremely powerful and easy-to-use feature of the Metal API which I love: if you don’t use some resource, the API can get rid of it for you automagically. You don’t have to delete resources manually as in OpenGL.&lt;/p&gt;

&lt;p&gt;Here are some final memory usage comparisons on a MacBook Air M1 with a full-screen 2560x1600 render target:&lt;/p&gt;

&lt;p&gt;First, a default approach — &lt;code&gt;MTKView&lt;/code&gt; with 4x MSAA in-memory resolve texture:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fonkhqof7uk610krf6imf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fonkhqof7uk610krf6imf.png" alt="Image description" width="800" height="365"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This multisampled texture uses 78 MB of memory which is being accessed (both write and read) on every frame!&lt;/p&gt;

&lt;p&gt;And here is the memoryless one:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9fmka7y9ebmhkdaojnqr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9fmka7y9ebmhkdaojnqr.png" alt="Image description" width="800" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice that the 78 MB texture is now listed among the unused resources. It actually uses 0 bytes and is only listed as a “dormant” 78 MB resource which could be re-allocated in case it is ever used again.&lt;br&gt;
This can be confirmed in the Activity Monitor. Before the optimization:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwuww60q18w4x90ktbzf6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwuww60q18w4x90ktbzf6.png" alt="Image description" width="749" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And after:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fahoe8rutn0zh7hgkhdlk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fahoe8rutn0zh7hgkhdlk.png" alt="Image description" width="747" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now my app uses just under 80 MB of total RAM instead of 150! This is a good result for a full-screen 3D app; in fact, it is about as much as just two stock macOS Calculators! (Yes, you can check it yourself: &lt;em&gt;Calculator&lt;/em&gt; uses ~40 MB of RAM, which seems a bit excessive.)&lt;/p&gt;

&lt;p&gt;I hope this little tutorial is useful and makes your Metal app more memory- and power-efficient!&lt;/p&gt;

</description>
      <category>metal</category>
      <category>3d</category>
      <category>msaa</category>
    </item>
    <item>
      <title>Stylized Castle WebGL demo</title>
      <dc:creator>keaukraine</dc:creator>
      <pubDate>Sat, 28 Oct 2023 07:07:08 +0000</pubDate>
      <link>https://forem.com/keaukraine/stylized-castle-webgl-demo-1k29</link>
      <guid>https://forem.com/keaukraine/stylized-castle-webgl-demo-1k29</guid>
      <description>&lt;p&gt;This is my first attempt to create a kitbashed scene from tiling assets. I have used assets by &lt;a href="https://kenney.itch.io/"&gt;Kenney&lt;/a&gt; and his &lt;a href="https://kenney.itch.io/assetforge-deluxe"&gt;Asset Forge&lt;/a&gt; Deluxe as an editor to create the scene. Result was exported to OBJ files and then converted into ready-to-use buffers with vertex data and indices.&lt;/p&gt;

&lt;p&gt;The scene has a distinct stylized look, with no textures used except for the animated characters: knights and birds. To add depth to the scene, simple linear vertex fog and real-time shadows are applied.&lt;/p&gt;

&lt;p&gt;The total scene polycount, with both static and dynamic objects, is 95k triangles.&lt;/p&gt;

&lt;h2&gt;
  
  
  Static geometry
&lt;/h2&gt;

&lt;p&gt;All static objects in the scene are merged into 2 large meshes to reduce the number of draw calls. These assets don’t use textures; vertex colors are used instead. The colors are stored as indices, which allows for easier customization of color themes. Shaders accept up to 32 colors set via uniforms.&lt;/p&gt;
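&lt;p&gt;The palette idea can be sketched like this; the helper and uniform names are illustrative, not taken from the demo’s source:&lt;/p&gt;

```typescript
const MAX_PALETTE = 32; // must match the size of the uniform array in the shader

// Flatten an RGB color theme into the flat array that would be passed
// to gl.uniform3fv; each vertex then carries only a palette index.
function paletteToUniforms(colors: [number, number, number][]): Float32Array {
  if (colors.length > MAX_PALETTE) {
    throw new Error("palette too large");
  }
  const out = new Float32Array(MAX_PALETTE * 3);
  colors.forEach((rgb, i) => out.set(rgb, i * 3));
  return out;
}
```

&lt;p&gt;Swapping the whole array re-themes the scene without touching any vertex data.&lt;/p&gt;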

&lt;p&gt;The stride of the static objects’ vertex data is 12 bytes: 3 FP16 values are used for position, 3 normalized signed bytes for normals, and 1 unsigned byte for color. 2 unused bytes pad the data to 4-byte alignment:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7qaANOvj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/msxcs4ss4zu5t8ow7z2j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7qaANOvj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/msxcs4ss4zu5t8ow7z2j.png" alt="static geometries vertex data" width="695" height="155"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I also tried using the more compact packed &lt;code&gt;GL_INT_2_10_10_10_REV&lt;/code&gt; type for vertex positions to fit the data into 8 bytes. Unfortunately, its precision was just not enough for this purpose: a 10-bit component gives only 1024 discrete steps, i.e. roughly 1 meter of precision per 1 km of range. And since the scene batches quite large geometries into 2 meshes, this precision was clearly not enough.&lt;/p&gt;
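&lt;p&gt;Packing one vertex into this 12-byte layout can be sketched as follows (little-endian byte order, as WebGL buffers are typically filled on little-endian hosts; the FP16 conversion truncates the mantissa instead of rounding, which is fine for a sketch):&lt;/p&gt;

```typescript
const STRIDE = 12; // 3 x FP16 position + 3 x s8 normal + 1 x u8 color + 2 pad

// Convert a JS number to IEEE 754 half-float bits.
function toFloat16Bits(value: number): number {
  const f32 = new Float32Array(1);
  const u32 = new Uint32Array(f32.buffer);
  f32[0] = value;
  const bits = u32[0];
  const sign = (bits >>> 31) * 0x8000;
  const exp = (bits >>> 23) % 256;
  const mant = bits % (2 ** 23);
  if (exp === 255) return sign + 0x7c00 + (mant ? 0x200 : 0); // Inf / NaN
  const e = exp - 127 + 15; // re-bias the exponent for FP16
  if (e >= 31) return sign + 0x7c00; // overflow: clamp to infinity
  if (0 >= e) return sign; // underflow: flush to signed zero
  return sign + e * 1024 + Math.floor(mant / 8192); // keep top 10 mantissa bits
}

// Write one vertex into an interleaved buffer at index i.
function packVertex(
  view: DataView, i: number,
  px: number, py: number, pz: number, // position, FP16
  nx: number, ny: number, nz: number, // unit normal, normalized signed bytes
  colorIndex: number // palette index, 0..31
): void {
  const o = i * STRIDE;
  view.setUint16(o + 0, toFloat16Bits(px), true);
  view.setUint16(o + 2, toFloat16Bits(py), true);
  view.setUint16(o + 4, toFloat16Bits(pz), true);
  view.setInt8(o + 6, Math.round(nx * 127));
  view.setInt8(o + 7, Math.round(ny * 127));
  view.setInt8(o + 8, Math.round(nz * 127));
  view.setUint8(o + 9, colorIndex);
  // bytes 10-11 stay zero: padding for 4-byte alignment
}
```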

&lt;h2&gt;
  
  
  Shadow maps
&lt;/h2&gt;

&lt;p&gt;Lighting in the scene is not baked: shadow maps are used for all objects to cast dynamic shadows. The shadow map has no cascades since the scene is rather small. However, the detail of shadows is adjusted in a different way: the light source FOV is slightly adjusted per camera to get more detailed shadows in close-up shots and less detailed ones in overviews.&lt;/p&gt;

&lt;p&gt;Shadow map resolution is 2048x2048 which is sufficient to create detailed enough shadows.&lt;/p&gt;

&lt;p&gt;To smooth out hard shadow edges, hardware bilinear filtering of the shadow map is used. Please note that OpenGL ES 3.0+ / WebGL 2.0 is required for this. OpenGL ES 2.0 supports only unfiltered sampling from shadow maps, which results in a boolean-like comparison of whether a fragment is in shadow or not. Hardware filtering is combined with 5-tap sub-texel percentage closer filtering (PCF). This results in smooth shadow edges with a relatively small number of texture samples.&lt;/p&gt;
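&lt;p&gt;The idea of combining hardware filtering with a 5-tap PCF can be sketched on the CPU like this (the tap pattern and the half-texel offset are assumptions for illustration, not the exact code from the app’s shaders):&lt;/p&gt;

```typescript
// Mimics a hardware-filtered comparison fetch from a shadow sampler:
// returns 0..1, the lit fraction of the filtered footprint.
type ShadowSample = (u: number, v: number, compareDepth: number) => number;

function pcf5(sample: ShadowSample, u: number, v: number,
              depth: number, texelSize: number): number {
  const o = texelSize * 0.5; // sub-texel offset between taps
  let lit = sample(u, v, depth);
  lit += sample(u - o, v - o, depth);
  lit += sample(u + o, v - o, depth);
  lit += sample(u - o, v + o, depth);
  lit += sample(u + o, v + o, depth);
  return lit / 5; // average of 5 already-filtered comparisons
}
```

Because each tap is itself bilinearly filtered by the hardware, 5 taps cover a footprint that would otherwise need many more unfiltered samples.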

&lt;p&gt;I also considered a more expensive 9-tap PCF. It improved image quality with an unfiltered shadow texture, but with the filtered texture the improvement over the 5-tap version was negligible. So, following the golden rule of real-time graphics programming (“it looks good enough”), the final filtering used in the app is 5-tap PCF with hardware filtering.&lt;/p&gt;

&lt;p&gt;Here you can see a comparison of different shadow filtering modes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qnldub0U--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/e0std482ecsdi96mpm3h.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qnldub0U--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/e0std482ecsdi96mpm3h.gif" alt="shadowmaps filtering" width="600" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance
&lt;/h3&gt;

&lt;p&gt;To improve performance, a couple of optimizations are used.&lt;/p&gt;

&lt;p&gt;The first one is quite typical, simple, and widely used — the shadow map is updated at half framerate. This is almost unnoticeable since nothing in the scene moves too fast. When the camera and light direction are about to switch to a new position, the shadow map is rendered at full framerate to prevent 1-frame flickering caused by rendering with the old light source.&lt;/p&gt;
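&lt;p&gt;The half-rate update logic boils down to a tiny bit of bookkeeping. This is a hypothetical helper, not the app’s actual code:&lt;/p&gt;

```typescript
// Re-render the shadow map every other frame, except when a camera/light
// switch is pending -- then render at full rate to avoid 1-frame flicker
// of shadows drawn with the old light source.
class ShadowUpdateScheduler {
  private frame = 0;

  constructor(private switchPending: () => boolean) {}

  // Returns true when the shadow map should be re-rendered this frame.
  shouldUpdate(): boolean {
    this.frame += 1;
    if (this.switchPending()) return true;
    return this.frame % 2 === 0;
  }
}
```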

&lt;p&gt;The second trick is that the PCF is not applied for distant fragments — instead a single sample of shadow texture is used. It is impossible to spot any difference in image quality in the distance because shadows are still hardware-filtered but the performance and efficiency are improved. But isn’t it considered a bad practice to use branching in the shaders? Yes and no. In general, it is not so bad on modern hardware — if &lt;a href="https://solidpixel.github.io/2021/12/09/branches_in_shaders.html"&gt;used properly&lt;/a&gt; and not in an attempt to create some all-in-one uber-shader. Actually, it is quite often used in raymarching where it can provide a measurable performance improvement by branching out empty/occluded parts. In this particular case branching helps to save one of the most critical resources on mobile GPUs — memory bandwidth.&lt;/p&gt;

&lt;p&gt;So how can we test if this branching actually improved things or only unnecessarily complicated shaders?&lt;/p&gt;

&lt;p&gt;First, let’s perform a static analysis of the shaders. To do this, I use the Mali Offline Compiler tool. Here are the results:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Non-optimized:&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Work registers: 25
Uniform registers: 16
Stack spilling: false
16-bit arithmetic: 66%
                              FMA     CVT     SFU      LS       V       T    Bound
Total instruction cycles:    0.31    0.14    0.06    0.00    1.00    1.25        T
Shortest path cycles:        0.31    0.11    0.06    0.00    1.00    1.25        T
Longest path cycles:         0.31    0.14    0.06    0.00    1.00    1.25        T

FMA = Arith FMA, CVT = Arith CVT, SFU = Arith SFU, LS = Load/Store, V = Varying, T = Texture
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Conditional PCF:&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Work registers: 21
Uniform registers: 16
Stack spilling: false
16-bit arithmetic: 68%
                              FMA     CVT     SFU      LS       V       T    Bound
Total instruction cycles:    0.31    0.22    0.06    0.00    1.00    1.50        T
Shortest path cycles:        0.17    0.11    0.06    0.00    1.00    0.25        V
Longest path cycles:         0.31    0.20    0.06    0.00    1.00    1.25        T

FMA = Arith FMA, CVT = Arith CVT, SFU = Arith SFU, LS = Load/Store, V = Varying, T = Texture
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So according to these results, the new version is no longer texture-bound in the shortest path and still has the same cycle counts for the longest path. Also, the number of used registers is reduced. Looks good on paper, doesn’t it?&lt;/p&gt;

&lt;p&gt;But of course both versions of the shaders perform identically in the Android app on my Pixel 7a — it always runs at a stable 90 fps. So to see if the GPU is less loaded, let’s run Android GPU Inspector on the two versions of the app and compare some metrics from the profiles:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WqzgV43c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rwte2izrmhqumfwtp9fg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WqzgV43c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rwte2izrmhqumfwtp9fg.png" alt="Image description" width="800" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As expected, it doesn’t affect overall GPU cycles much but reduces the load on texture units. As a result, the GPU is now less busy — it consumes less power and has more free resources to smoothly render the home screen UI on top of the live wallpaper.&lt;/p&gt;

&lt;h2&gt;
  
  
  Animation
&lt;/h2&gt;

&lt;p&gt;All animated objects in the scene are animated procedurally. No baked skeletal or vertex animations are used. The simple shapes of these objects allow them to be animated in vertex shaders relatively easily.&lt;/p&gt;

&lt;p&gt;I have found some inspiration for procedural animations in the rats of “&lt;em&gt;King, Witch and Dragon&lt;/em&gt;” (&lt;a href="https://torchinsky.me/shader-animation-unity/"&gt;https://torchinsky.me/shader-animation-unity/&lt;/a&gt;) and the fish of &lt;em&gt;ABZU&lt;/em&gt; (&lt;a href="https://www.youtube.com/watch?v=l9NX06mvp2E"&gt;https://www.youtube.com/watch?v=l9NX06mvp2E&lt;/a&gt;). Animations in our scene are of course simpler than the ones in these games because the animated objects have a stylized boxy look, and therefore movements are also stylized and simplified.&lt;/p&gt;

&lt;h3&gt;
  
  
  Knights
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FkKjMce8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mtlngqhn1dcdcyv8liuj.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FkKjMce8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mtlngqhn1dcdcyv8liuj.gif" alt="knight" width="330" height="330"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The scene has 16 knights, with each model made of just 48 triangles.&lt;/p&gt;

&lt;p&gt;Animation is done in the &lt;a href="https://github.com/keaukraine/webgl-stylized-castle/blob/main/src/shaders/KnightAnimatedShader.ts"&gt;KnightAnimatedShader.ts&lt;/a&gt;. Let’s take a look at how it animates the model. First, it needs to detect which vertices belong to which body parts of the knight. But the vertex data doesn’t have a special “bone id” attribute for this. It cannot be done by testing vertex positions because some body parts overlap. For example, the head has 4 bottom coordinates identical to the body’s. Some texture coordinates also overlap, so we cannot rely on them either, as was done for the &lt;a href="https://torchinsky.me/shader-animation-unity/"&gt;rats animation&lt;/a&gt; in the “King, Witch and Dragon” game. So I grouped the vertices of each body part in the buffer, and the vertex shader determines body parts simply by comparing &lt;code&gt;gl_VertexID&lt;/code&gt;. Of course this is not optimal because it introduces branching, but the branching is not overly excessive and it is done in a vertex shader for a very low-poly model. The model is grouped to have the body vertices first, then the head, and then the arms:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zlzyo2n3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uvmdmi74uwjwemkylnht.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zlzyo2n3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uvmdmi74uwjwemkylnht.png" alt="knight vertex data" width="773" height="152"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And here is the knight model with applied test coloring to visualize body parts:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4tp-tnV9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9znzz4z1j7h47oguhr4f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4tp-tnV9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9znzz4z1j7h47oguhr4f.png" alt="knight colored" width="268" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that the shader knows which vertex belongs to which body part, it applies rotations to them, provided via uniforms. Rotation pivot points are hard-coded in the shader. You may notice that only the head and arms are animated. Because the models don’t have separate legs, those are not animated. Instead, bobbing is applied to the whole model to add a rather convincing “walking” effect. The bobbing is simply the absolute value of a sine wave:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jhlKspT9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/48kfvke2stc8fel88ska.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jhlKspT9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/48kfvke2stc8fel88ska.png" alt="bobbing" width="595" height="265"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Birds
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LgNpsGiB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6mdvxydw0gchdf6pqa84.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LgNpsGiB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6mdvxydw0gchdf6pqa84.gif" alt="bird animation" width="244" height="244"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are 6 eagles soaring in the sky, each model made of 70 triangles. Each flies along a different circular path.&lt;/p&gt;

&lt;p&gt;Birds are rendered with &lt;a href="https://github.com/keaukraine/webgl-stylized-castle/blob/main/src/shaders/EagleAnimatedShader.ts"&gt;EagleAnimatedShader.ts&lt;/a&gt;. Animations are done in the same way as for the knights but are simpler, since only the wings are animated and they rotate synchronously. So only a single rotation timer is passed into the shader via a uniform to control the animation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Flags
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_BCqMJqM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oena1ldry4s44n59d2v8.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_BCqMJqM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oena1ldry4s44n59d2v8.gif" alt="flag animation" width="440" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The scene has 3 different flags, all animated with the same &lt;a href="https://github.com/keaukraine/webgl-stylized-castle/blob/main/src/shaders/FlagSmShader.ts"&gt;FlagSmShader.ts&lt;/a&gt;. The technique is inspired by the wavy animation of the fish in &lt;em&gt;ABZU&lt;/em&gt;. A simple sine wave is applied to the vertices, with the amplitude reduced closer to the flagpole and increased near the free end of the flag. To correctly apply lighting, the normals are also bent. To bend them, a cosine wave of the same frequency is used, since cosine is the derivative of sine.&lt;/p&gt;
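&lt;p&gt;A minimal sketch of this idea (the constants are illustrative; the real &lt;code&gt;FlagSmShader.ts&lt;/code&gt; works on actual vertex data):&lt;/p&gt;

```typescript
// u is the normalized distance from the flagpole: 0 at the pole, 1 at the
// free end. Displacement is a sine wave whose amplitude grows with u;
// the normal is tilted using the cosine at the same phase, because cosine
// is the derivative of sine, so lighting matches the displaced surface.
function flagWave(u: number, time: number, frequency = 6, amplitude = 0.2) {
  const phase = u * frequency + time;
  const envelope = amplitude * u; // no motion at the flagpole
  return {
    displacement: envelope * Math.sin(phase),
    normalTilt: envelope * frequency * Math.cos(phase),
  };
}
```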

&lt;h3&gt;
  
  
  Wind Stripes
&lt;/h3&gt;

&lt;p&gt;A small detail added to the scene as a last touch is mostly inspired by the &lt;em&gt;Sea Of Thieves&lt;/em&gt; wind effect. In &lt;em&gt;Sea Of Thieves&lt;/em&gt; these stripes serve the purpose of showing the wind direction for aligning sails, so they are mostly straight. In our scene they are purely for looks, so they bend and twist much more.&lt;/p&gt;

&lt;p&gt;Let’s take a look at the shader that draws them — &lt;a href="https://github.com/keaukraine/webgl-stylized-castle/blob/main/src/shaders/WindShader.ts"&gt;WindShader.ts&lt;/a&gt;. It is even “more procedural” than the ones used for animated objects: it doesn’t use geometry buffers at all and generates triangles based on &lt;code&gt;gl_VertexID&lt;/code&gt;. Indeed, as you can see in the source of its &lt;code&gt;draw()&lt;/code&gt; method, it doesn’t set up any buffers with vertex data. Instead it uses the hard-coded &lt;code&gt;VERTICES&lt;/code&gt; array of two triangles declaring a single square segment. So if we need to draw 50 segments, we issue a &lt;code&gt;glDrawArrays&lt;/code&gt; call for 50 * 2 * 3 = 300 vertices. The vertex shader offsets each segment based on &lt;code&gt;gl_VertexID&lt;/code&gt; and tapers both ends of the final stripe using &lt;code&gt;smoothstep&lt;/code&gt;. Then the coordinates of the resulting spline shape are shifted by an offset timer so they appear to be moving. Next, the shape is deformed by two sine-wave noises in world space. Color is animated to fade in and out, and the offset is animated to move the stripe. All this results in random snake-like movement of the stripes while keeping them aligned to a world-space path.&lt;/p&gt;
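&lt;p&gt;The buffer-less approach can be sketched like this (the corner coordinates only mirror the idea of the hard-coded &lt;code&gt;VERTICES&lt;/code&gt; array; the real values and offsets live in &lt;code&gt;WindShader.ts&lt;/code&gt;):&lt;/p&gt;

```typescript
// One unit quad segment as two triangles (6 vertices).
const SEGMENT: ReadonlyArray<readonly [number, number]> = [
  [0, 0], [1, 0], [1, 1],
  [0, 0], [1, 1], [0, 1],
];

// What the vertex shader can derive from gl_VertexID alone, with no
// vertex buffers bound: which segment this vertex is in and its corner.
function segmentVertex(vertexId: number): { x: number; y: number } {
  const segment = Math.floor(vertexId / 6);
  const [cx, cy] = SEGMENT[vertexId % 6];
  return { x: cx + segment, y: cy }; // segments laid out along the stripe
}

// Drawing N segments takes N * 2 triangles * 3 vertices:
const vertexCountFor = (segments: number) => segments * 2 * 3;
```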

&lt;p&gt;Here is a breakdown of rendering a short 20-segment wind stripe, with test coloring of segments to clearly visualize the geometry:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YTYUhBfx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vb88l2dmw7xqnxyw6old.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YTYUhBfx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vb88l2dmw7xqnxyw6old.gif" alt="wind stripe steps" width="450" height="270"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Depth-only shaders
&lt;/h2&gt;

&lt;p&gt;All objects in the scene except the wind stripes cast shadows, so they have to be rendered to a depth map texture using the light source camera. For this, simplified versions of the corresponding animated and static shaders are used. They perform all the same vertex transformations but don’t calculate lighting.&lt;/p&gt;

&lt;p&gt;For example, let’s take a look at &lt;a href="https://github.com/keaukraine/webgl-stylized-castle/blob/main/src/shaders/KnightDepthShader.ts"&gt;KnightDepthShader.ts&lt;/a&gt;. It performs all the same vertex transformations to animate the head and arms but does not calculate lighting based on normals. More than that, you may notice that its fragment shader is empty — it provides no color output at all. These are perfectly valid shaders in GLSL ES 3.00, since their only purpose is to write to the depth (shadow map) attachment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results and possible additional optimizations
&lt;/h2&gt;

&lt;p&gt;Live web demo: &lt;a href="https://keaukraine.github.io/webgl-stylized-castle/index.html"&gt;https://keaukraine.github.io/webgl-stylized-castle/index.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Source code: &lt;a href="https://github.com/keaukraine/webgl-stylized-castle"&gt;https://github.com/keaukraine/webgl-stylized-castle&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Android live wallpaper app: &lt;a href="https://play.google.com/store/apps/details?id=org.androidworks.livewallpaper.cartooncastle3d"&gt;https://play.google.com/store/apps/details?id=org.androidworks.livewallpaper.cartooncastle3d&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The final web demo is ~2.2 MB, which is not quite optimal because all objects are exported as 2 huge batched meshes. There are also a lot of repetitive objects, like trees and cannons, which are good candidates for instanced rendering.&lt;/p&gt;

</description>
      <category>webgl</category>
      <category>3d</category>
      <category>webdev</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Voxel Airplanes 3D WebGL demo</title>
      <dc:creator>keaukraine</dc:creator>
      <pubDate>Mon, 19 Dec 2022 19:10:59 +0000</pubDate>
      <link>https://forem.com/keaukraine/voxel-airplanes-3d-webgl-demo-1k65</link>
      <guid>https://forem.com/keaukraine/voxel-airplanes-3d-webgl-demo-1k65</guid>
      <description>&lt;p&gt;While working on Floating Islands live wallpaper I stumbled upon these cute voxel &lt;a href="https://maxparata.itch.io/voxel-plane"&gt;3D models of airplanes by Max Parata&lt;/a&gt;. I already wanted to bring life to some stylized low-fi 3D art so in my mind I immediately saw how to create a stylized old-skool low-fi scene with these assets. You can watch a &lt;a href="https://keaukraine.github.io/webgl-voxel-airplanes/index.html"&gt;live WebGL demo here&lt;/a&gt;.&lt;br&gt;
This project was quite fast to implement - the time span between the first commit with a rough WIP layout and the final version is about 20 days. Yet it was quite fun to create because during development I got some fresh ideas on how to improve the scene. All additions were really minor, to keep the scene as simple as possible in accordance with its art design. My brother provided valuable feedback on how to improve it and also helped with the optimization of some geometries.&lt;/p&gt;




&lt;h2&gt;
  
  
  Scene composition
&lt;/h2&gt;

&lt;p&gt;The scene is aesthetically simple, so it contains just four kinds of objects: planes, ground, clouds and wind:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zWJ_8MQ6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jmdrnp2xjjlyokwtpxbl.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zWJ_8MQ6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jmdrnp2xjjlyokwtpxbl.gif" alt="Scene render order" width="575" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, let’s take a look at what shaders are used to render models, and how geometries of these models are optimized.&lt;/p&gt;

&lt;h2&gt;
  
  
  Planes
&lt;/h2&gt;

&lt;p&gt;Technically, planes are rendered not as voxels (each voxel cube individually) but as a ready mesh exported from &lt;a href="https://www.voxelmade.com/magicavoxel/"&gt;MagicaVoxel&lt;/a&gt;. They are not simplified with &lt;a href="https://www.thestrokeforge.xyz/vox-cleaner"&gt;VoxCleaner&lt;/a&gt; to use texture atlases and reduce polycount - I decided to use them as is because this makes it easier to create alternate palettes, and the vertex data for planes has a ridiculously small memory footprint anyway.&lt;/p&gt;

&lt;p&gt;Each plane consists of 3 parts - the plane body, a glass cockpit and rotating propellers. A plane is rendered using 2 shaders - a simple directionally lit &lt;a href="https://github.com/keaukraine/webgl-voxel-airplanes/blob/main/src/shaders/PlaneBodyLitShader.ts"&gt;PlaneBodyLitShader.ts&lt;/a&gt; for the body and props, and its variation &lt;a href="https://github.com/keaukraine/webgl-voxel-airplanes/blob/main/src/shaders/GlassShader.ts"&gt;GlassShader.ts&lt;/a&gt; for the glass with stylized reflections.&lt;/p&gt;

&lt;p&gt;The specifics of the plane models allow vertex data to be packed using really small data types. All plane models are small and fit in a -127…+127 bounding box. And since vertices represent voxels, they are always snapped to a 1x1 grid. So I chose to store vertex positions in signed bytes, which have just enough precision for the job.&lt;br&gt;
Also, since the palette textures used by the models are 256x1 in size, the V texture coordinate is omitted and hardcoded to 0.5 - the center of the texel. The remaining U coordinate fits in one unsigned byte. One byte of padding is used to align the data to 8 bytes. Here is the vertex data stride for plane and prop models:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zKkfru1I--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6t2283xssl3f1vo88nyw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zKkfru1I--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6t2283xssl3f1vo88nyw.png" alt="Plane vertex data" width="704" height="145"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In a separate draw call, the glass with a scrolling stylized fake reflection is drawn:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jCE9nXdj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zh1yumuidcgn1m48ol5p.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jCE9nXdj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zh1yumuidcgn1m48ol5p.gif" alt="Glass" width="316" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Glass is rendered without the palette texture - its color is set via a uniform. The texture passed to this shader is a mask for the reflection. Its UV coordinates are calculated in the vertex shader based on model-space vertex coordinates. Of course, it is unfiltered for artistic purposes. The stride for glass models is the same but without texture coordinates:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--g8oPf1S2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ik8nx8lgrot9o5iw38o3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--g8oPf1S2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ik8nx8lgrot9o5iw38o3.png" alt="Glass vertex data" width="704" height="144"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/keaukraine/webgl-voxel-airplanes/blob/main/src/shaders/GlassShader.ts"&gt;GlassShader&lt;/a&gt; samples texture using &lt;code&gt;textureLod&lt;/code&gt; with 0 mipmap level. This is done to explicitly tell the OpenGL ES driver that we access the texture without mipmaps and to reduce some overheads. You can read more about this and some other texture sampling optimizations tricks in Pete Harris blog - &lt;a href="https://solidpixel.github.io/2022/03/27/texture_sampling_tips.html"&gt;https://solidpixel.github.io/2022/03/27/texture_sampling_tips.html&lt;/a&gt;&lt;br&gt;
Also glass models have small polycount so they use unsigned byte indices which also reduces memory bandwidth.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;UPDATE:&lt;/em&gt; In the newer version of &lt;code&gt;GlassShader&lt;/code&gt; I have found a way to optimize strides for glass models even more. Now they fit in just 4 bytes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_dpfYyn2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rlaov9dnazdcm7l0w8j4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_dpfYyn2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rlaov9dnazdcm7l0w8j4.png" alt="Glass vertex data" width="356" height="143"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So how is a normal stored in a single byte? Since voxels can have only 6 possible normals, each can be stored as an index into an array of actual normal vectors hard-coded in the vertex shader. The updated shader uses this technique to reduce the amount of vertex data.&lt;/p&gt;
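&lt;p&gt;The lookup itself is trivial (the ordering of the six normals below is an assumption; the real table is hard-coded in the updated &lt;code&gt;GlassShader&lt;/code&gt; vertex shader):&lt;/p&gt;

```typescript
// The six axis-aligned normals a voxel face can have; a vertex stores
// only a 1-byte index into this table instead of 3 normal components.
const VOXEL_NORMALS = [
  [1, 0, 0], [-1, 0, 0],
  [0, 1, 0], [0, -1, 0],
  [0, 0, 1], [0, 0, -1],
] as const;

// What the vertex shader effectively does with the per-vertex index.
function normalFromIndex(i: number): readonly [number, number, number] {
  return VOXEL_NORMALS[i];
}
```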

&lt;h2&gt;
  
  
  Wind stripes
&lt;/h2&gt;

&lt;p&gt;For the wind stripes I decided to create a shader which doesn’t perform any memory reads at all. A wind stripe has very simple geometry - a 100x100 units quad, stretched into an appropriately thin line by the model matrix. Because of this simplicity, all the geometry can be hardcoded in the vertex shader code. It doesn’t use any textures either - the fragment color is passed via a uniform. You can find the implementation in &lt;a href="https://github.com/keaukraine/webgl-voxel-airplanes/blob/main/src/shaders/WindStripeShader.ts"&gt;WindStripeShader.ts&lt;/a&gt;. It uses &lt;code&gt;gl_VertexID&lt;/code&gt; to get the position of a given vertex. When this shader is used to draw a wind stripe, no buffers or textures are bound.&lt;/p&gt;

&lt;p&gt;Technically, it could even be used as a “building block” to draw more complex shapes by issuing multiple draw calls that rotate/scale/shear its hard-coded base quad, but this would be too inefficient.&lt;/p&gt;

&lt;h2&gt;
  
  
  Terrain
&lt;/h2&gt;

&lt;p&gt;Terrain textures are 256x256 tiling images. They are based on aerial photos with some GIMP magic sprinkled over them - contrast adjustments and colors reduced to just 10-12. This gives them a more old-skool look and makes each texel more pronounced.&lt;/p&gt;

&lt;p&gt;The shader that renders the terrain is in the &lt;a href="https://github.com/keaukraine/webgl-voxel-airplanes/blob/main/src/shaders/DiffuseScrollingFilteredShader.ts"&gt;DiffuseScrollingFilteredShader.ts&lt;/a&gt; file. Let’s take a look at it.&lt;br&gt;
It is a rather simple shader which pans UV coordinates to create an illusion of the ground moving beneath the airplane. However, there is one additional thing it does, and that is texture filtering. You may be wondering what filtering could possibly be used here: the ground clearly looks unfiltered, as if it used &lt;code&gt;GL_NEAREST&lt;/code&gt; sampling! In fact, a custom antialiased blocky filtering is used. The thing is that regular &lt;code&gt;GL_NEAREST&lt;/code&gt; sampling produces a lot of aliasing on the edges of texels. This becomes especially noticeable at certain angles of the continuously rotating camera. The &lt;code&gt;textureBlocky()&lt;/code&gt; function alleviates these aliasing artifacts while preserving that extra-crispy old-skool look of unfiltered textures. The ground texture actually uses &lt;code&gt;GL_LINEAR&lt;/code&gt; filtering, and &lt;code&gt;textureBlocky()&lt;/code&gt; calculates the sampling point to get either an interpolated filtered value at texel edges or an exact unfiltered one from the texel center everywhere else.&lt;br&gt;
The original author of this filtering is &lt;a href="https://www.shadertoy.com/user/Permutator"&gt;Permutator&lt;/a&gt;, and the code is used under the CC0 license from this Shadertoy - &lt;a href="https://www.shadertoy.com/view/ltfXWS"&gt;https://www.shadertoy.com/view/ltfXWS&lt;/a&gt; (you may find a deeper explanation of the math behind this filtering technique there).&lt;br&gt;
Here is a comparison (at 4x zoom) of regular &lt;code&gt;GL_NEAREST&lt;/code&gt; filtering vs the custom blocky filtering. As you can see, both are pixelated but the latter is not aliased.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3i7eLfBz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0d12zrlhirlrbf33o16s.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3i7eLfBz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0d12zrlhirlrbf33o16s.gif" alt="Unfiltered" width="300" height="300"&gt;&lt;/a&gt; &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yQOxliZb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2prn7yv3422xsxvgnkc0.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yQOxliZb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2prn7yv3422xsxvgnkc0.gif" alt="Filtered" width="300" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One of the last additions to the scene is the transition between two different terrain textures. When you switch them, they don’t just toggle - instead, a cute pixelated transition effect smoothly switches between the textures.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--l2EO4epR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vqblnb38cgh83fztebwb.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--l2EO4epR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vqblnb38cgh83fztebwb.gif" alt="Terrain transition" width="600" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can find the code for this transition in the &lt;a href="https://github.com/keaukraine/webgl-voxel-airplanes/blob/main/src/shaders/DiffuseScrollingFilteredTransitionShader.ts"&gt;DiffuseScrollingFilteredTransitionShader.ts&lt;/a&gt; file. The transition uses a tiling blue noise texture so that square blocks appear uniformly across the ground. To make the transition smoother, &lt;code&gt;smoothstep()&lt;/code&gt; is used. However, there is a commented-out line with &lt;code&gt;step()&lt;/code&gt; which makes the transition more abrupt, if you prefer that.&lt;/p&gt;
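&lt;p&gt;The per-block decision can be sketched like this (the &lt;code&gt;edge&lt;/code&gt; width is an illustrative value; in the actual shader the noise value comes from sampling the blue noise texture per ground block):&lt;/p&gt;

```typescript
// GLSL-style smoothstep: cubic Hermite ramp clamped to 0..1.
function smoothstep(e0: number, e1: number, x: number): number {
  const t = Math.max(0, Math.min(1, (x - e0) / (e1 - e0)));
  return t * t * (3 - 2 * t);
}

// Pixelated crossfade: each ground block flips from the old texture (0)
// to the new one (1) when the transition progress passes its blue-noise
// value; smoothstep gives each block a short soft edge. Swapping it for
// a hard step() makes the flip abrupt, as in the commented-out line.
function transitionMix(noise: number, progress: number, edge = 0.05): number {
  return smoothstep(noise - edge, noise + edge, progress);
}
```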

&lt;h2&gt;
  
  
  Clouds
&lt;/h2&gt;

&lt;p&gt;Clouds don’t use the antialiased blocky filtering because they don’t rotate, are quite transparent and move relatively fast. This makes it quite hard to spot aliasing artifacts on them, so they use the cheapest option available - &lt;code&gt;GL_NEAREST&lt;/code&gt; sampling. Clouds use a custom mesh with cutouts where the texture is empty. This significantly reduces overdraw compared to a regular quad mesh. Here it is visualized by disabling blending:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--O1F3CasE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4oia5ug7ejcuia82fany.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--O1F3CasE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4oia5ug7ejcuia82fany.png" alt="Clouds geometry" width="720" height="549"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Result
&lt;/h2&gt;

&lt;p&gt;You can see a live web demo &lt;a href="https://keaukraine.github.io/webgl-voxel-airplanes/index.html"&gt;here&lt;/a&gt;, and if you’d like to have it on the home screen of your Android phone you can get the live wallpaper app on &lt;a href="https://play.google.com/store/apps/details?id=org.androidworks.livewallpaper.voxelairplanes"&gt;Google Play&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Source code is available on &lt;a href="https://github.com/keaukraine/webgl-voxel-airplanes"&gt;GitHub&lt;/a&gt;, feel free to play around with it.&lt;/p&gt;

&lt;p&gt;As always, the web demo is heavily optimized for the fastest downloading of resources, and the Android app for the best efficiency and performance. The size of data transferred during the initial loading of the web demo is just 189 kB, and the size of all models and textures is 1.4 MB, so you could fit this data on a floppy disk.&lt;/p&gt;

&lt;h2&gt;
  
  
  P.S.
&lt;/h2&gt;

&lt;p&gt;This WebGL demo and Android app have been made during the war in Ukraine, despite regular power outages caused by the deliberate destruction of the country’s electric infrastructure. Please support Ukraine however you can and boycott your local russian aka terrorist businesses.&lt;/p&gt;

</description>
      <category>webgl</category>
      <category>3d</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Floating Islands WebGL demo 🇺🇦</title>
      <dc:creator>keaukraine</dc:creator>
      <pubDate>Sat, 29 Oct 2022 15:59:37 +0000</pubDate>
      <link>https://forem.com/keaukraine/floating-islands-webgl-demo-995</link>
      <guid>https://forem.com/keaukraine/floating-islands-webgl-demo-995</guid>
      <description>&lt;h2&gt;
  
  
  Idea and inspiration
&lt;/h2&gt;

&lt;p&gt;The idea for this &lt;a href="https://keaukraine.github.io/webgl-rock-pillars/index.html"&gt;3D scene&lt;/a&gt; comes from the magnificent Zhangjiajie National Forest Park in China. You can clearly see where the inspiration originates from - this majestic real-life location also has grassy rock pillars covered in dense clouds, and when observed from above their bottom parts disappear into dense fog. To add a magical feeling to the scene, we decided to make some rocks float mid-air. This additional inspiration comes from the map Gateway to Na Pali from my favorite game, Unreal. This location has floating rocks in the distant background and is itself placed inside a huge floating rock. We decided to create a scene with a lot of similar floating islands densely packed into one area.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI generated concept art
&lt;/h2&gt;

&lt;p&gt;We also tried to use Stable Diffusion to generate some concept art for the scene in the hope that the AI would hallucinate some unusual points of view or incorporate details we might find fitting. However, all images turned out to be virtually identical. The AI created a series of rather dull images - the same rocks in the same fog without any additional details. We only used a couple of them as a reference for vivid sunrise color palettes, which could just as well have been picked from any other source.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scene composition
&lt;/h2&gt;

&lt;p&gt;To create this scene we’ve used and reused some stylized hand-painted 3D models from packs we purchased quite some time ago for our previous projects. No new assets were purchased for this project. The scene uses just 3 rock models, some generic ferns and trees, and birds flying in the sky.&lt;/p&gt;

&lt;p&gt;Render order is the following: depth pre-pass, rocks, birds, sky, soft cloud particles.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CmAqDqtZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8j5a0qpr4wnkt3xrj6rt.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CmAqDqtZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8j5a0qpr4wnkt3xrj6rt.gif" alt="Scene rendering order" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Camera path and objects placement
&lt;/h2&gt;

&lt;p&gt;To create the impression of an endless random scene there were 2 options: a truly random scene or a looped generated path. The first option requires placing objects on the fly in front of the camera, which means these positions have to be transferred to the GPU dynamically. So the better option is to generate a static looped path once and draw objects along it as the camera moves.&lt;br&gt;
You can find the function that generates the base spline, &lt;code&gt;positionOnSpline&lt;/code&gt;, in the &lt;a href="https://github.com/keaukraine/webgl-rock-pillars/blob/main/src/ObjectsPlacement.ts"&gt;ObjectsPlacement.ts&lt;/a&gt; file. It creates a circular looped path for the camera with an oscillating radius. A couple of harmonics are applied to randomize the circle radius so it appears random but still loops perfectly. Then all objects are placed around this path - trees under the camera, rocks above and to the sides.&lt;/p&gt;
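&lt;p&gt;The looped path idea can be sketched as follows (a simplified TypeScript illustration; the amplitudes, frequencies, and phases here are made up, not the actual values from &lt;code&gt;positionOnSpline&lt;/code&gt;):&lt;/p&gt;

```typescript
// A circular path whose radius is modulated by sine harmonics. Because each
// harmonic frequency is an integer multiple of the base angle, the path is
// guaranteed to loop seamlessly between t = 0 and t = 1.
function positionOnLoop(t: number, baseRadius = 100): [number, number] {
  const a = t * Math.PI * 2; // t in [0..1) wraps around the loop
  const radius =
    baseRadius +
    12 * Math.sin(3 * a + 1.7) + // low-frequency wobble
    5 * Math.sin(7 * a + 0.4);   // higher-frequency detail
  return [radius * Math.cos(a), radius * Math.sin(a)];
}
```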

&lt;p&gt;Object positions and rotations are stored in a typed Float32Array and uploaded to the GPU in the form of textures.&lt;br&gt;
The &lt;code&gt;drawInstances&lt;/code&gt; method in &lt;a href="https://github.com/keaukraine/webgl-rock-pillars/blob/main/src/Renderer.ts"&gt;Renderer.ts&lt;/a&gt; renders only the objects visible from a certain point on the spline. Because of the scene's simplicity there's no need for frustum culling - objects are drawn within a certain distance in front of and behind the camera. This visibility distance is slightly larger than the fog start distance, so new objects appear fully covered in fog and don't pop in. Instances are ordered front-to-back so that, when drawn, they make use of Z-buffer culling.&lt;/p&gt;
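&lt;p&gt;Conceptually, picking which instances to draw boils down to taking a window of path positions around the camera, with wraparound since the path is looped (a simplified sketch with hypothetical parameter names):&lt;/p&gt;

```typescript
// Instances are pre-sorted along the looped path; draw only those within
// `behind` slots behind and `ahead` slots ahead of the camera's slot.
function visibleRange(cameraIndex: number, total: number, behind: number, ahead: number): number[] {
  return Array.from({ length: behind + ahead + 1 }, (unused, i) =>
    ((cameraIndex - behind + i) % total + total) % total // wrap around the loop
  );
}
```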

&lt;p&gt;Only rock and tree models are placed this way along the camera path. Bird flocks use hand-picked linear paths to cover the whole area of the scene with a minimal number of paths.&lt;/p&gt;

&lt;p&gt;Here is the camera path visualized, with only a subset of objects rendered in its vicinity:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rZ1lXB8t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mj6lfhv2v9k3mnqdjf1n.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rZ1lXB8t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mj6lfhv2v9k3mnqdjf1n.gif" alt="Objects culling" width="610" height="610"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Fog cubemaps
&lt;/h2&gt;

&lt;p&gt;The initial implementation used a uniform fog color, which looked rather bland. To add more color variation from different directions (like a sun halo) we decided to use cubemaps for fog. This allows great flexibility for the artist (my brother) - he can completely change the look of the whole scene by creating a cubemap and tweaking a couple of colors in the scene preset. Cubemaps were initially created as equirectangular images since those are easy to paint. Then we used an &lt;a href="https://jaxry.github.io/panorama-to-cubemap/"&gt;online tool&lt;/a&gt; to convert the equirectangular source image into 6 cubemap faces, and a simple ImageMagick script to fix their rotations to suit our Z-up coordinate system.&lt;/p&gt;

&lt;p&gt;You can find the cubemap fog implementation in the static constants of &lt;a href="https://github.com/keaukraine/webgl-rock-pillars/blob/main/src/shaders/FogShader.ts"&gt;FogShader.ts&lt;/a&gt;, which all fog shaders share. The final fog coefficient used by the vertex shader for color mixing also includes a height fog term.&lt;/p&gt;

&lt;p&gt;In the web demo UI you can adjust various fog parameters - start distance, transition distance, height offset and multiplier. Changing the scene’s time of day is done by swapping in a different cubemap texture and a couple of colors for each preset.&lt;/p&gt;

&lt;p&gt;Interestingly, after implementing this I found out that fog cubemaps are widely used in the Source engine, and of course this technique has been incorporated in some indie games too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Grass on rocks
&lt;/h2&gt;

&lt;p&gt;To make the rocks less dull we also apply a grass texture on top of them. This technique is commonly used to simulate surfaces covered by snow or soaked by rain. The grass texture is mixed with the rock texture based on the vertex normal. You can play around with the &lt;code&gt;grassAmount&lt;/code&gt; slider in the UI to see how it affects the spread of grass on the rocks.&lt;/p&gt;
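&lt;p&gt;The mixing idea can be sketched in TypeScript like this (the real blend happens in the GLSL shader; the formula and names here are illustrative assumptions, not the exact shader code):&lt;/p&gt;

```typescript
type Vec3 = [number, number, number];

const dot = (a: Vec3, b: Vec3) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];

// Returns a 0..1 grass weight: surfaces facing up (normal close to +Z in this
// Z-up scene) get grass, steep ones keep the rock texture. `grassAmount`
// widens the range of slopes that receive grass, like the UI slider.
function grassWeight(normal: Vec3, grassAmount: number): number {
  const up = dot(normal, [0, 0, 1]); // cosine of the slope angle
  const t = (up - (1 - grassAmount)) / Math.max(grassAmount, 1e-6);
  return Math.min(Math.max(t, 0), 1); // clamp to [0..1], like GLSL clamp()
}
```

&lt;p&gt;The final fragment color would then be something like &lt;code&gt;mix(rockColor, grassColor, weight)&lt;/code&gt;.&lt;/p&gt;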

&lt;p&gt;The source code of the shader that applies the grass texture on top of the rocks is in &lt;a href="https://github.com/keaukraine/webgl-rock-pillars/blob/main/src/shaders/FogVertexLitGrassShader.ts"&gt;FogVertexLitGrassShader.ts&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Soft clouds shader
&lt;/h2&gt;

&lt;p&gt;Clouds are not instanced but are drawn one by one, because the transformation matrices for these objects have to be adjusted to always face the camera. There are not that many of them, so this doesn’t add too many draw calls. Actually, if the GPU state is not changed between calls (no uniforms updated, no blending mode switches, etc.), even non-instanced rendering is quite fast on modern mobile and desktop GPUs. For testing purposes we had a quick and dirty visualization of the camera spline with non-instanced rendering of 5000 small spheres, and it caused no slowdowns.&lt;/p&gt;

&lt;p&gt;There’s also one minor trick in this shader. As the camera flies through clouds, they can be abruptly culled by the near clipping plane. To prevent this, a simple &lt;code&gt;smoothstep&lt;/code&gt; fade is applied right in front of the camera. You can find the code in &lt;a href="https://github.com/keaukraine/webgl-rock-pillars/blob/main/src/shaders/FogSpriteShader.ts"&gt;FogSpriteShader.ts&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Result
&lt;/h2&gt;

&lt;p&gt;You can see a live web demo &lt;a href="https://keaukraine.github.io/webgl-rock-pillars/index.html"&gt;here&lt;/a&gt;, and if you’d like to have it on the home screen of your Android phone you can get the &lt;a href="https://play.google.com/store/apps/details?id=org.androidworks.livewallpaper.floatingislands"&gt;live wallpaper app on Google Play&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/keaukraine/webgl-rock-pillars"&gt;Source code&lt;/a&gt; is available on GitHub, feel free to play around with it.&lt;/p&gt;

&lt;p&gt;As always, the web demo is heavily optimized for the smallest data size and the Android app for the best efficiency and performance. The web version uses WebP for textures, which offers better compression than PNG, better image quality than JPEG, and supports an alpha channel even with lossy compression. Mipmaps are generated for all textures. The total gzipped size of the web demo is just 374 kB, so you can copy it to a floppy disk to show to friends who have no Internet :)&lt;/p&gt;

</description>
      <category>webgl</category>
      <category>webgl2</category>
      <category>3d</category>
      <category>demo</category>
    </item>
    <item>
      <title>Efficient WebGL vegetation rendering 🇺🇦</title>
      <dc:creator>keaukraine</dc:creator>
      <pubDate>Wed, 10 Aug 2022 08:30:29 +0000</pubDate>
      <link>https://forem.com/keaukraine/efficient-webgl-vegetation-rendering-4g2g</link>
      <guid>https://forem.com/keaukraine/efficient-webgl-vegetation-rendering-4g2g</guid>
      <description>&lt;p&gt;In this article I’ll explain the rendering pipeline of Spring Flowers WebGL Demo and its corresponding Android app. Also I will describe what problems we’ve encountered and what solutions we used to overcome during development and testing of the &lt;a href="https://play.google.com/store/apps/details?id=org.androidworks.livewallpaper.flowers"&gt;Android live wallpaper app&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can check out the &lt;a href="https://keaukraine.github.io/webgl-flowers/index.html"&gt;live demo page&lt;/a&gt; and play with various configuration options in the top right controls section.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;The scene is composed of the following main objects: the sky, the ground, and 3 types of vegetation: flowers (each containing individual instances of leaves, petals and stems), small round grass, and tall animated grass. To make the scene more alive, a sphere for sun glare and moving ants and butterflies are also drawn.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--b5ZgXG2G--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/q2tjdts7dd2wewpxh2w9.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--b5ZgXG2G--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/q2tjdts7dd2wewpxh2w9.gif" alt="Scene draw order" width="730" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The draw order is the following: first the objects closer to the camera and the larger ones, to use the z-buffer efficiently; then objects closer to the ground; then the sky and the ground. The ground plane has transparent edges which blend with the background sky sphere, so it is drawn last, after the sky.&lt;/p&gt;

&lt;p&gt;For the sun glare effect we draw a sphere object with a specular highlight. It is drawn last, over the whole geometry, without depth testing. This way everything is slightly over-brightened when viewed against the sun, and the glare is less prominent when the camera is not facing the sun.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tiled culling of instances
&lt;/h3&gt;

&lt;p&gt;Grass and flowers are drawn using similar shaders whose common part is instanced positioning. These instanced objects get their transformations from an FP32 RGB texture.&lt;/p&gt;

&lt;p&gt;All instanced shaders use the same include, &lt;a href="https://github.com/keaukraine/webgl-flowers/blob/master/src/shaders/InstancedTexturePositionsShader.ts#L12"&gt;COMMON_TRANSFORMS&lt;/a&gt;, which takes 2 samples from the texture to retrieve translation in the XY plane, scale, and rotation. Please note that rotation is stored as the sine and cosine of an angle to save on rotation math.&lt;/p&gt;

&lt;p&gt;The original transformations are stored in the &lt;code&gt;FLOWERS&lt;/code&gt;, &lt;code&gt;GRASS1&lt;/code&gt; and &lt;code&gt;GRASS2&lt;/code&gt; arrays declared in &lt;a href="https://github.com/keaukraine/webgl-flowers/blob/master/src/GrassPositions.ts"&gt;GrassPositions.ts&lt;/a&gt;. However, these arrays contain coordinates for all instances of the objects; they are not split into tiles yet. For this, they are processed by the &lt;a href="https://github.com/keaukraine/webgl-flowers/blob/master/src/GrassPositions.ts#L43"&gt;sortInstancesByTiles&lt;/a&gt; function. It creates a new FP32 array with rearranged positions and rotations, and an array of tiles which specify the instance count and start offset in the final texture used by the shader. This ready-to-use information is stored in a &lt;a href="https://github.com/keaukraine/webgl-flowers/blob/master/src/Utils.ts#L19"&gt;TiledInstances&lt;/a&gt; object. The function can split all instances spread across the square ground area into an arbitrary N×N grid of tiles. In both the web demo and the Android app all instances are split into a reasonable 4x4 grid of 16 tiles. Tiles have a small padding which allows them to overlap slightly. This padding is the size of a grass model, so instances placed at the very edge of a tile won’t disappear abruptly when the tile gets culled.&lt;/p&gt;

&lt;p&gt;To visualize how instances are split into tiles, let’s imagine a sample area with 20 randomly placed objects which we would like to cull per tile, and rearrange these instances into a 2x2 grid with 4 tiles in total:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--AoFPoqET--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ts2c71s9si5lfsolv5nb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AoFPoqET--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ts2c71s9si5lfsolv5nb.png" alt="Sample scene" width="606" height="609"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is the structure of texture containing these objects, showing tiling and data stored in each component per instance:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RTQJuJfZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xgiowxy5kuiaw44rk87a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RTQJuJfZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xgiowxy5kuiaw44rk87a.png" alt="Texture format" width="832" height="319"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here instances for tile 0 have offset=0 and count=5, for tile 1 offset=5 and count=4, and so on.&lt;/p&gt;

&lt;p&gt;This structure allows us to draw all 20 instances in 4 draw calls and cull them in batches per tile without updating any data on the GPU.&lt;/p&gt;
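&lt;p&gt;A minimal sketch of this rearrangement in TypeScript (simplified: only XY positions, no rotations, and the real &lt;code&gt;sortInstancesByTiles&lt;/code&gt; writes into an FP32 texture array rather than returning plain arrays):&lt;/p&gt;

```typescript
interface Tile { offset: number; count: number; }
type Vec2 = [number, number];

// Bucket instances into an n x n grid over a square area of the given size,
// then concatenate the buckets so each tile owns a contiguous [offset, count)
// range - exactly what a per-tile instanced draw call needs.
function sortByTiles(positions: Vec2[], size: number, n: number): { sorted: Vec2[]; tiles: Tile[] } {
  const tileSize = size / n;
  const idx = (v: number) => Math.min(Math.floor(v / tileSize), n - 1);
  const buckets: Vec2[][] = Array.from({ length: n * n }, () => []);
  for (const p of positions) buckets[idx(p[1]) * n + idx(p[0])].push(p);

  const sorted: Vec2[] = [];
  const tiles: Tile[] = [];
  for (const bucket of buckets) {
    tiles.push({ offset: sorted.length, count: bucket.length });
    for (const p of bucket) sorted.push(p);
  }
  return { sorted, tiles };
}
```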

&lt;p&gt;Culling the tiles’ bounding boxes on the CPU is also relatively cheap. It is done every frame, and you can see how many tiles and individual instances are currently rendered in the “Stats” section of the controls.&lt;/p&gt;

&lt;p&gt;Reducing grass density to scale performance is also really easy with this approach, because instances within each tile are in random order. All we have to do is proportionally reduce the number of instances per draw call (you can use the density slider in the controls to test it):&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BmUGi3Fn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3t4hysqehoss3ddbumdd.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BmUGi3Fn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3t4hysqehoss3ddbumdd.gif" alt="Changing grass density" width="829" height="195"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Different instanced shaders are used to draw the different objects. Small grass and flower petals are the simplest ones - they use simple diffuse colored shading. Dandelion stems and leaves add specular highlights, and the &lt;a href="https://github.com/keaukraine/webgl-flowers/blob/master/src/shaders/InstancedTexturePositionsGrassAnimatedShader.ts"&gt;shader&lt;/a&gt; used to render tall grass blades also uses vertex animation for wind simulation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Random ants
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ARUphoWr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cz6ekc9zscz5md61tzd8.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ARUphoWr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cz6ekc9zscz5md61tzd8.gif" alt="Ants on the ground" width="600" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To make the ground more alive, we draw some ants on it. They are also instanced - a total of 68 ants is rendered in 2 draw calls.&lt;/p&gt;

&lt;p&gt;They move in circles with random radii and centers, and are drawn in two draw calls - one for clockwise and one for counterclockwise movement along these circular paths. You can examine the math for positioning vertices in the shader’s &lt;a href="https://github.com/keaukraine/webgl-flowers/blob/master/src/shaders/AntsShader.ts"&gt;source code&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It would be almost impossible to notice any animation on these small, fast-moving objects, so we don’t animate them at all.&lt;/p&gt;

&lt;h3&gt;
  
  
  Butterflies
&lt;/h3&gt;

&lt;p&gt;No summer can be imagined without butterflies, so we added them too. They are positioned similarly to the ants, but a sine wave is added to their height. Each instance gets its color from a texture atlas with 4 different variants.&lt;/p&gt;

&lt;p&gt;Unlike the ants, butterflies must be animated. We don’t use any kind of baked animation for them. Instead, a really cheap trick is used in the &lt;a href="https://github.com/keaukraine/webgl-flowers/blob/master/src/shaders/ButterflyShader.ts#L59"&gt;vertex shader&lt;/a&gt;: the wings are animated by simply moving the wing tips up and down. Wing tips are identified as vertices with high absolute values of the X coordinate. Of course this is not a correct circular movement of the wings around the butterfly’s body - the wings elongate noticeably at higher amplitudes - but it simplifies the shader math and looks convincing enough in motion, as can be seen in this image:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1JdFSOvM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jwikp8v77pt55cxrwakr.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1JdFSOvM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jwikp8v77pt55cxrwakr.gif" alt="Butterfly animation" width="278" height="278"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Android-specific optimizations
&lt;/h2&gt;

&lt;p&gt;As always, our web demos are optimized for the smallest possible network data size and the fastest loading times, so the web demo doesn’t use compressed or supercompressed textures. The &lt;a href="https://play.google.com/store/apps/details?id=org.androidworks.livewallpaper.flowers"&gt;Android app&lt;/a&gt; is optimized for power efficiency, so it uses compressed textures (ASTC or ETC2) depending on hardware capabilities.&lt;/p&gt;

&lt;p&gt;To further improve efficiency it uses variable rate shading (VRS) on &lt;a href="https://opengles.gpuinfo.org/listreports.php?extension=GL_QCOM_shading_rate"&gt;supported hardware&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And when the app detects that the device is in energy saving mode (triggered manually or when the battery is low) it will reduce FPS and use a simplified grass shader without animation to significantly reduce power draw. Additionally, in this mode the app will apply more aggressive VRS.&lt;/p&gt;

&lt;p&gt;We encountered general performance issues with rendering lots of instanced geometry on low-end Android phones - the bottleneck turned out to be the vertex shaders. So when the app detects it is running on a low-end device it renders grass with slightly reduced density and without wind animation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Failed implementations
&lt;/h2&gt;

&lt;p&gt;Before implementing this tiled rendering pipeline, a couple of more naive, less performant implementations were tried and tested.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fully randomized
&lt;/h3&gt;

&lt;p&gt;The very first version of the grass used fully randomized positioning of instances. It didn’t use a texture to store pre-calculated random transformations for instances but calculated them in the shader instead. This introduced more complexity in the vertex shaders (random and noise functions involve quite some math). Additionally, the random values were different on different GPUs, which made it impossible to finely hand-pick camera paths. Take a look at this photo where we tested this version on different devices - while the code is identical, the placement of instances is different:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zmPs_tHC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jd28udd05ge9b1fxlwiq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zmPs_tHC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jd28udd05ge9b1fxlwiq.jpg" alt="Random positions" width="880" height="662"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This version also had no visibility calculation or frustum culling, which further hurt performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Per-instance culling
&lt;/h3&gt;

&lt;p&gt;The first naive implementation of culling worked per instance. This version already used a texture to reduce vertex shader math, but each instance was tested for visibility and the texture was then updated with only the visible instances.&lt;/p&gt;

&lt;p&gt;This worked just fine on PCs and high-end Android devices but proved to be way too slow on low-end phones - the CPU took about 10 ms to calculate the visibility of instances. Updating the texture on the fly with &lt;code&gt;glTexSubImage2D()&lt;/code&gt; was also unacceptably slow - it took ~20 ms. For comparison, tiled culling takes ~1 ms of CPU time on low-end devices.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final result
&lt;/h2&gt;

&lt;p&gt;The total size of the demo page is just 741 kB, so you can carry it on a floppy disk.&lt;/p&gt;

&lt;p&gt;You can play around with different parameters on the &lt;a href="https://keaukraine.github.io/webgl-flowers/index.html"&gt;live demo page&lt;/a&gt; - alter the time of day, grass density, and other settings. Double click toggles free camera mode with WASD movement and rotation while holding the right mouse button (similar to viewport navigation in Unreal Engine).&lt;/p&gt;

&lt;p&gt;And as usual you can get the &lt;a href="https://github.com/keaukraine/webgl-flowers"&gt;source code&lt;/a&gt;, which is licensed under the MIT license, so feel free to play around with it.&lt;/p&gt;

</description>
      <category>opengl</category>
      <category>webgl</category>
      <category>3d</category>
      <category>rendering</category>
    </item>
    <item>
      <title>Variable Rate Shading on Adreno GPUs 🇺🇦</title>
      <dc:creator>keaukraine</dc:creator>
      <pubDate>Sun, 03 Jul 2022 18:17:45 +0000</pubDate>
      <link>https://forem.com/keaukraine/variable-rate-shading-on-adreno-gpus-279m</link>
      <guid>https://forem.com/keaukraine/variable-rate-shading-on-adreno-gpus-279m</guid>
      <description>&lt;p&gt;“With high screen DPI doesn’t come high GPU fillrate” — that’s the main problem of GPUs nowadays. Modern consoles struggle to sustain stable 30, let alone 60 fps on large 4k screens. The common technique to increase FPS is rendering at lower resolution with fancy upscaling techniques like DLSS and FSR. But modern VR-capable hardware has to be able to target both very high frame rates and high image quality, and upscaling does show its limitations here — depending on implementation the image will be either blurry, too sharpened or will introduce ghosting artifacts. Variable rate shading (VRS) is a temporally stable approach of improving performance with (if applied correctly) virtually unnoticeable quality reduction.&lt;/p&gt;

&lt;p&gt;Modern mobile Adreno GPUs by Qualcomm support Variable Rate Shading, and phones with these GPUs have been available since autumn 2021. Because our live wallpapers have to be power-efficient, we got a test device with an Adreno 642L to implement this feature in our apps.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Variable Rate Shading
&lt;/h2&gt;

&lt;p&gt;The idea behind VRS is to rasterize a single fragment and then interpolate its color across adjacent pixels on screen.&lt;/p&gt;

&lt;p&gt;A good explanation of how VRS is implemented on Adreno GPUs can be found in the official &lt;a href="https://developer.qualcomm.com/blog/variable-rate-shading-has-arrived-mobile-impressive-results"&gt;Qualcomm Developer blog&lt;/a&gt;. You can see how simple the idea is in this image from the aforementioned blog post:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Y5WYR3tF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/g9ca4byro439b8iu91zt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Y5WYR3tF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/g9ca4byro439b8iu91zt.png" alt="VRS - image by Qualcomm" width="663" height="207"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;VRS is better than a generic downsample of the whole frame because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It preserves geometry edges (except in cases where the shape is determined by discarding fragments).&lt;/li&gt;
&lt;li&gt;It can be adjusted per draw call - one object can be rendered at full detail while another has reduced quality.&lt;/li&gt;
&lt;li&gt;It can be applied dynamically to keep a target FPS by gradually reducing image quality.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;On Snapdragon SoCs it is implemented with the &lt;a href="https://www.khronos.org/registry/OpenGL/extensions/QCOM/QCOM_shading_rate.txt"&gt;QCOM_shading_rate&lt;/a&gt; extension. Adreno GPUs support blocks of 1x1, 1x2, 2x1, 2x2, 4x2, and 4x4 pixels. Note that some useful dimensions like 2x4 or 4x1 are simply not supported by the hardware.&lt;/p&gt;

&lt;p&gt;To apply VRS to certain objects you simply call &lt;code&gt;glShadingRateQCOM&lt;/code&gt; with the desired rate before the corresponding draw calls.&lt;/p&gt;

&lt;p&gt;To disable VRS for geometries which should preserve details and be rendered at the native shading rate, simply call &lt;code&gt;glShadingRateQCOM&lt;/code&gt; with a 1x1 block size.&lt;/p&gt;

&lt;p&gt;One of the first apps we added VRS support to is &lt;a href="https://play.google.com/store/apps/details?id=org.androidworks.livewallpaperbonsai"&gt;Bonsai Live Wallpaper&lt;/a&gt;. This is a good example because it has 3 very different types of geometry, ranging from perfect candidates for VRS optimization to very unsuitable ones.&lt;/p&gt;

&lt;p&gt;Let’s take a look at a typical scene from the app and how different parts of the image can benefit from a reduced shading rate:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HmD5zzft--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ysnn3maci68rqhhg44yn.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HmD5zzft--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ysnn3maci68rqhhg44yn.jpg" alt="Bonsai 3D live wallpaper screenshot" width="880" height="1496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The best type of geometry to optimize with VRS is one that is blurred and has little color variation between fragments. So for the sky background we apply a quite heavy 4x2 VRS, which still introduces virtually no quality degradation, especially with a constantly moving camera.&lt;/p&gt;

&lt;p&gt;On the opposite end of the spectrum is the leaf geometry. In the screenshot below we applied 4x4 VRS to the whole scene to showcase the issue with alpha-testing. Note that the branches, while also using the same heavy 4x4 reduction in this example, keep smooth, anti-aliased edges, clearly showing a benefit of VRS over traditional upscaling.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---mScipec--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/poob5p4okk6mxo5txvn2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---mScipec--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/poob5p4okk6mxo5txvn2.png" alt="VRS distortions on geometries with discarded fragments" width="800" height="550"&gt;&lt;/a&gt;&lt;br&gt;
Needless to say, VRS is clearly not suitable for geometries with discarded fragments.&lt;/p&gt;

&lt;p&gt;Also, because VRS is applied in screen space, it introduces significant distortions to transparent dust particles. Their size is comparable to the VRS block size and they start flickering during movement. I’ve noticed a somewhat similar rendering technique used in the COD:MW game on PC when enabling half-resolution particles — sparks and other small particles flicker way too much and look very blocky.&lt;/p&gt;

&lt;p&gt;And somewhere between these two geometries lies the ground plane. This is where we apply a 2x1 rate reduction. It results in OK image quality because there’s a larger color difference between vertically adjacent pixels than between horizontally adjacent ones.&lt;/p&gt;

&lt;p&gt;Where VRS definitely shines is when it is applied to geometries with very little color difference between adjacent fragments, and the Bonsai wallpaper has a stylized silhouette mode where fragments use literally a single color:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QkOBYUfc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/e1hvu4xi4n9huwdn6v1b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QkOBYUfc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/e1hvu4xi4n9huwdn6v1b.png" alt="Bonsai live wallpaper, silhouette mode" width="880" height="1496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here we have 3 types of shaders:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Alpha-testing for leaves. We already know that we should not apply VRS to these geometries.&lt;/li&gt;
&lt;li&gt;Solid black silhouette and ground. The heaviest 4x4 VRS introduces literally zero quality degradation.&lt;/li&gt;
&lt;li&gt;For the sky gradient we use 2x1 blocks. Technically it would be perfect to have 4x1 or even 16x1 blocks, because the gradient changes vertically and horizontally adjacent fragments have identical colors, but Adreno hardware supports only 2x1 ones.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Applying all of these to the scene results in identical rendering (a screenshot comparison found a difference of 0 pixels) and a 1.5x shading speed improvement.&lt;/p&gt;
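
&lt;p&gt;The per-shader policy above (for the silhouette mode) can be sketched as a simple lookup. The rate names below are valid Adreno block sizes, but the function and material names are illustrative, not from the app’s actual code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Illustrative sketch: picking a VRS block size per material type.
// Material names and this mapping mirror the article; the real app code differs.
type ShadingRate = "1x1" | "2x1" | "2x2" | "4x2" | "4x4";

function pickShadingRate(material: string): ShadingRate {
  switch (material) {
    case "leaves": return "1x1";     // alpha-tested, VRS causes artifacts
    case "silhouette": return "4x4"; // solid black, zero quality loss
    case "sky": return "2x1";        // vertical gradient, horizontal pixels identical
    default: return "1x1";           // full rate when in doubt
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;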

&lt;h2&gt;
  
  
  Dynamic quality
&lt;/h2&gt;

&lt;p&gt;All our wallpapers have ways of reducing GPU load when the battery is low. Usually this is done by limiting FPS and omitting a couple of effects.&lt;/p&gt;

&lt;p&gt;For more efficient power usage we apply stronger VRS to certain objects in low battery mode. Tree trunks are shaded with 2x1 blocks, and the sky and transparent effects (light shafts and vignette) are shaded with 4x4 instead of 4x2 or 2x2 blocks. This reduction of quality is still almost unnoticeable but reduces GPU load by an additional 3%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance gains vs quality tradeoff
&lt;/h2&gt;

&lt;p&gt;You will be hard-pressed to find any difference between the original and VRS-optimized rendering — color deviation is negligible, and blocky artifacts are really hard to spot. Only ImageMagick was able to find the differing pixels:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vae0GTwH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/iauwc2ik509x00ls21pa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vae0GTwH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/iauwc2ik509x00ls21pa.png" alt="Image quality comparison" width="880" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both the VRS-enabled and regular rendering pipelines result in a steady 120 FPS on our test device (Samsung Galaxy A52s). So we used Snapdragon Profiler to analyze the performance and efficiency of the optimized build. Here are the numbers:&lt;/p&gt;

&lt;p&gt;Bonsai 3D Live Wallpaper, regular mode:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FXPS4X_B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/d9fjh818l6uuynq6irb2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FXPS4X_B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/d9fjh818l6uuynq6irb2.png" alt="VRS performance table" width="880" height="234"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bonsai 3D Live Wallpaper, battery saving mode:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yiH1SzFk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/q1kbf2vqy093w9v6rn0j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yiH1SzFk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/q1kbf2vqy093w9v6rn0j.png" alt="VRS performance table" width="880" height="233"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bonsai 3D Live Wallpaper, silhouette mode:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7LvsxamC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/eb89qqe2hc07hnkt4t6z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7LvsxamC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/eb89qqe2hc07hnkt4t6z.png" alt="VRS performance table" width="880" height="232"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the silhouette scene we don’t use different VRS blocks for regular and power saving modes because it already uses the maximum block size and still renders an image identical to the non-VRS one.&lt;/p&gt;




&lt;p&gt;Long story short, we’ve improved rendering efficiency by approximately 30% with little to (literally) no image quality reduction.&lt;/p&gt;

</description>
      <category>3d</category>
      <category>qualcomm</category>
      <category>android</category>
      <category>opengl</category>
    </item>
    <item>
      <title>WebGL Grim Reaper demo 🇺🇦</title>
      <dc:creator>keaukraine</dc:creator>
      <pubDate>Mon, 07 Mar 2022 09:18:47 +0000</pubDate>
      <link>https://forem.com/keaukraine/webgl-grim-reaper-demo-14kf</link>
      <guid>https://forem.com/keaukraine/webgl-grim-reaper-demo-14kf</guid>
      <description>&lt;p&gt;A couple weeks before Halloween 2021 I browsed Sketchfab and encountered a cool 3D model of &lt;a href="https://sketchfab.com/3d-models/3drt-grim-reaper-d8c7ec2429b643958603937bed6533e8" rel="noopener noreferrer"&gt;Grim Reaper&lt;/a&gt; by 3DRT. It has a reasonable polycount, a set of different colours and smooth animations. So the decision was made to create a Halloween-themed live wallpaper with this model. However, I was not able to finish it before Halloween because I gradually added some new effects and features which took quite some time to implement and then tweak.&lt;/p&gt;

&lt;p&gt;You can find a live web demo &lt;a href="https://keaukraine.github.io/webgl-reaper/index.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;, and for people sensitive to flickering lights there is a version without lightning &lt;a href="https://keaukraine.github.io/webgl-reaper/index.html#nolights" rel="noopener noreferrer"&gt;here&lt;/a&gt;. You can interact with it by clicking on the screen — this will change the animation. You can also press the Enter key to enter a free-camera mode with WASD navigation.&lt;/p&gt;

&lt;p&gt;As usual, source code is &lt;a href="https://github.com/keaukraine/webgl-reaper" rel="noopener noreferrer"&gt;available on Github&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And of course you can get an &lt;a href="https://play.google.com/store/apps/details?id=org.androidworks.livewallpaper.reaper" rel="noopener noreferrer"&gt;Android live wallpaper app&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Scene Composition
&lt;/h2&gt;

&lt;p&gt;The scene is pretty simple so it doesn’t require any sorting of objects — a carefully chosen hardcoded render order achieves minimal overdraw:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flf8x6t4nkylxht2kab8r.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flf8x6t4nkylxht2kab8r.gif" alt="Scene rendering stages&amp;lt;br&amp;gt;
"&gt;&lt;/a&gt;&lt;br&gt;
First, opaque geometries are rendered (the cloth is alpha-masked, so it is also opaque). These animated objects use vertex animation with data stored in FP16 textures, so WebGL 2 is required for the demo.&lt;br&gt;
After rendering opaque geometries, writing to depth is disabled with &lt;code&gt;glDepthMask(false)&lt;/code&gt; and then transparent effects (smoke, dust and ghosts) are drawn over them with blending. The sky is also drawn at this stage: because it is the most distant object, it doesn’t have to contribute to depth and is basically treated as a far clipping plane.&lt;/p&gt;
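
&lt;p&gt;The ordering described above can be summarized as data. This is a minimal illustrative sketch, with pass names that are mine, not the demo’s actual code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Minimal sketch of the render order described above; names are illustrative.
interface RenderPass { name: string; depthWrite: boolean; blending: boolean; }

const renderOrder: RenderPass[] = [
  { name: "opaque", depthWrite: true,  blending: false }, // reaper, alpha-masked cloth
  { name: "sky",    depthWrite: false, blending: false }, // most distant, acts as a far plane
  { name: "effects", depthWrite: false, blending: true }, // smoke, dust, ghosts
];
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;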
&lt;h2&gt;
  
  
  Effects
&lt;/h2&gt;

&lt;p&gt;That’s where most of the time was spent — thinking of, creating, tweaking and rejecting various effects for a really simple scene with literally a single character in it.&lt;/p&gt;

&lt;p&gt;Every time I had an idea for improving the look, I added it to the Trello board. That gave me some time to think it over: how it would fit the scene, how to implement it, etc. So here is a breakdown of all the effects used.&lt;/p&gt;

&lt;p&gt;First, particles are added to the reaper. Half of them rise upwards and half of them sink down from roughly the centre of the reaper model, which fluctuates a little depending on the animation. To get the best visual appearance they are rendered as soft particles, hence the depth pre-pass. You can read about the implementation of soft particles in one of my &lt;a href="https://dev.to/keaukraine/implementing-soft-particles-in-webgl-and-opengl-es-3l6e"&gt;previous articles&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Then some flickering dust is rendered. You may notice that its brightness is synchronized with the lightning strikes — usually the dust slowly fades in and out, but during lightning strikes it is more visible.&lt;/p&gt;

&lt;p&gt;As a final touch, a rather heavy vignette is applied. This effect blends nicely with the gloomy atmosphere, helps to draw attention to the centre of the screen and visually conceals the bland void in the corners of the screen.&lt;/p&gt;

&lt;p&gt;There are still a couple of effect ideas noted in my Trello board, but I think that adding them would only clutter the scene without adding any more noticeable eye candy.&lt;/p&gt;
&lt;h2&gt;
  
  
  Sky shader
&lt;/h2&gt;

&lt;p&gt;The sky is used to fill in the void around the main character. To add some dynamics and movement to these empty parts of the scene, it is rendered with a shader which applies simple distortion and lightning to a static clouds texture.&lt;/p&gt;

&lt;p&gt;Let’s analyse the &lt;a href="https://github.com/keaukraine/webgl-reaper/blob/main/src/shaders/SkyShader.ts" rel="noopener noreferrer"&gt;shader code&lt;/a&gt;. It combines three simple effects to create a dynamic sky:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;It starts by applying colour to the rather bland-looking greyscale base sky texture:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0g88rnf67lx0gtsy7azg.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0g88rnf67lx0gtsy7azg.jpg" alt="Colorized sky"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Then, waves from a small distortion texture are applied (a similar but more pronounced effect can be used for water ripples). The effect is subtle but noticeably improves the overall look:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feddsgixycuc29r3h875b.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feddsgixycuc29r3h875b.gif" alt="Distortion applied&amp;lt;br&amp;gt;
"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;And the final touch is lightning. To recreate somewhat realistic-looking lightning, which cannot get through dense clouds but shines through clear areas, brightness is increased exponentially — darker parts get very little increase in brightness while bright areas are strongly highlighted. The final result with all effects combined looks like this:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgaore4cz8nxem0t83pfw.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgaore4cz8nxem0t83pfw.gif" alt="Combined sky effects"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
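
&lt;p&gt;The exponential highlight from the last step can be modelled with a tiny function. This is an illustrative approximation of the idea, not the shader’s actual code (the exponent is a made-up value):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Illustrative approximation: a power-law boost keeps dark clouds dark
// while bright (clear) areas light up strongly. The exponent 4.0 is made up.
function lightningBoost(brightness: number, flash: number): number {
  // brightness and flash are expected to be in [0..1]
  return Math.min(1.0, brightness + flash * Math.pow(brightness, 4.0));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;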

&lt;p&gt;The timer for the lightning strikes is a periodic function of several combined sine waves, clamped to the range [0…2]. I’ve used the really handy &lt;a href="https://www.desmos.com/calculator" rel="noopener noreferrer"&gt;Desmos graphing calculator&lt;/a&gt; to visualize and tweak the coefficients of this function — you can clearly see that the “spikes” of positive values create short, periodic, randomized bursts:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzg1mho8o1lpf1orre0ai.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzg1mho8o1lpf1orre0ai.png" alt="Lightning intensity graph&amp;lt;br&amp;gt;
"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Additionally, the sky sphere slowly rotates to make the background less static.&lt;/p&gt;
&lt;h2&gt;
  
  
  Ghosts shader
&lt;/h2&gt;

&lt;p&gt;Ghostly trails floating around the grim reaper are inspired by this Unreal Engine 4 Niagara tutorial — &lt;a href="https://www.artstation.com/artwork/ba4mNn" rel="noopener noreferrer"&gt;https://www.artstation.com/artwork/ba4mNn&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The initial idea was to use a geometry shaped like a cut-out from the side of a cylinder and rotate it around the centre of the reaper model. However, my brother created &lt;a href="https://github.com/keaukraine/webgl-reaper/blob/main/src/shaders/BendShader.ts" rel="noopener noreferrer"&gt;a shader&lt;/a&gt; for a more flexible approach: a single geometry which can be bent at an arbitrary radius and stretched to an arbitrary length.&lt;/p&gt;

&lt;p&gt;To achieve this, the vertex shader changes the geometry of the original mesh. It modifies the X and Y coordinates of the input model, bending them around a circle of the given radius. The Z coordinate gets no additional transformation — it is responsible for scaling the final effect vertically (world space is Z-up). The shader is tailored to work with a specific model: a tessellated sheet in the XZ plane (all Y coordinates are zero):&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxni0pcqr4c74ppprrtsw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxni0pcqr4c74ppprrtsw.png" alt="Ghost geometry"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Later, the geometry was optimized to tightly fit our sprite texture in order to reduce overdraw:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzs8gc41k1aau87t2ur1t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzs8gc41k1aau87t2ur1t.png" alt="Ghost geometry optimized"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Based on the math of chord length, the X and Y coordinates of the bent model are:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x = R * sin(theta);
y = R * cos(theta);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;where &lt;code&gt;theta = rm_Vertex.x / R&lt;/code&gt;, and &lt;code&gt;R&lt;/code&gt; is the bend radius. However, &lt;code&gt;theta&lt;/code&gt; is calculated differently in the shader:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;float theta = rm_Vertex.x * lengthToRadius;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;lengthToRadius&lt;/code&gt; value is a uniform, but it is not just the reciprocal of &lt;code&gt;R&lt;/code&gt; — we can pass values greater than &lt;code&gt;1/R&lt;/code&gt; to scale the length of the effect (because it essentially is a pre-multiplication of &lt;code&gt;rm_Vertex.x&lt;/code&gt;).&lt;br&gt;
This minor change is done in order to eliminate redundant uniform-only math in the shader: the division of length by radius is done once on the CPU, and the result is passed into the shader via the &lt;code&gt;lengthToRadius&lt;/code&gt; uniform.&lt;br&gt;
I’ve tried to improve this effect by applying displacement distortion in the fragment shader, but it turned out to be virtually unnoticeable in motion. So we kept the original, simpler version with a static texture, which is also cheaper for the GPU.&lt;/p&gt;
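
&lt;p&gt;The same bend can be prototyped on the CPU. This is an illustrative sketch of the math described above (the function itself is mine; only the formulas and the &lt;code&gt;lengthToRadius&lt;/code&gt; idea come from the shader):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// CPU-side sketch of the bend: wraps vertex.x around a circle of radius R.
// lengthToRadius is pre-divided on the CPU (1/R, or a larger value to
// stretch the trail). The flat source sheet lies in the XZ plane (y = 0).
function bendVertex(x: number, z: number, R: number, lengthToRadius: number): number[] {
  const theta = x * lengthToRadius;
  return [R * Math.sin(theta), R * Math.cos(theta), z]; // z only scales vertically
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;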

&lt;h2&gt;
  
  
  Reduced colours filter
&lt;/h2&gt;

&lt;p&gt;Not implemented in the web version, but present in the &lt;a href="https://play.google.com/store/apps/details?id=org.androidworks.livewallpaper.reaper" rel="noopener noreferrer"&gt;Android app&lt;/a&gt;, is a reduced-colours post-processing effect. This gritty effect perfectly fits the overall atmosphere and adds the right mood to the scene. It is implemented not as a separate post-processing render pass but directly in the fragment shader, so rendering is still essentially single-pass.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgip0zabwokkmqgty65dz.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgip0zabwokkmqgty65dz.jpeg" alt="Reduced colours filter"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is based on code from the Q1K3 WebGL game &lt;a href="https://github.com/phoboslab/q1k3" rel="noopener noreferrer"&gt;https://github.com/phoboslab/q1k3&lt;/a&gt;, and I highly recommend reading the blog post about the making of the seemingly impossible Q1K3 — &lt;a href="https://phoboslab.org/log/2021/09/q1k3-making-of" rel="noopener noreferrer"&gt;https://phoboslab.org/log/2021/09/q1k3-making-of&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Texture compression
&lt;/h2&gt;

&lt;p&gt;The Android live wallpaper targets OpenGL ES 3.0+ and uses efficient ETC2 and ASTC compressed textures. The WebGL demo, however, is optimized only for the fastest possible loading time. I really hate it when a simple WebGL demo takes forever to load its unjustifiably huge resources. Because of this, the decision was made not to use hardware-compressed textures; instead, textures are compressed as lossy WebP. The total size of all assets, including HTML/CSS/JS, is just 2.7 MB, so it loads pretty fast.&lt;br&gt;
Recently, our &lt;a href="https://github.com/keaukraine/webgl-mountains" rel="noopener noreferrer"&gt;mountains WebGL demo&lt;/a&gt; has also been updated with smaller resources, but it is still way larger than the Reaper one — it downloads 10.8 MB of data on initial load.&lt;/p&gt;

</description>
      <category>webgl</category>
      <category>webgl2</category>
      <category>javascript</category>
      <category>3d</category>
    </item>
    <item>
      <title>Filtering of half-float textures on different mobile GPUs</title>
      <dc:creator>keaukraine</dc:creator>
      <pubDate>Sun, 01 Aug 2021 18:18:24 +0000</pubDate>
      <link>https://forem.com/keaukraine/filtering-of-half-float-textures-on-different-mobile-gpus-40bl</link>
      <guid>https://forem.com/keaukraine/filtering-of-half-float-textures-on-different-mobile-gpus-40bl</guid>
      <description>&lt;p&gt;In this article I’ll describe what issues we’ve encountered during development of our latest Reunion &lt;a href="https://play.google.com/store/apps/details?id=org.androidworks.livewallpaperreunion"&gt;live wallpaper&lt;/a&gt; and its &lt;a href="https://keaukraine.github.io/webgl-reunion/"&gt;WebGL demo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;All the gorgeous hand-painted artwork is by Conrad Justin, with only minor additions by me and my brother (lights for candles, dust particles, etc.). You can compare our scene with the original 3D model on Sketchfab &lt;a href="https://sketchfab.com/3d-models/reunion-animated-7b8228daa20748e680e430703dfa706a"&gt;here&lt;/a&gt;, and also make sure to take a look at the artist’s full portfolio &lt;a href="https://conradjustin.com/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Usage of FP16 textures
&lt;/h2&gt;

&lt;p&gt;In both the web demo and the Android app we have animated objects for characters and decorations. For the new art style of this wallpaper scene, we introduced new rendering techniques and custom shaders. In particular, we used vertex animations stored in floating-point textures (so the web demo requires WebGL 2).&lt;/p&gt;

&lt;p&gt;For our use case, half-float precision is good enough to store vertex positions. And according to the &lt;a href="https://www.khronos.org/registry/OpenGL-Refpages/es3.0/html/glTexImage2D.xhtml"&gt;OpenGL ES 3.0 specs&lt;/a&gt;, FP16 textures can even be filtered, which is really handy for animations - linear interpolation between animation frames will be handled by hardware virtually for free, without extra calculations in the shader.&lt;/p&gt;

&lt;h2&gt;
  
  
  Issues with texture filtering
&lt;/h2&gt;

&lt;p&gt;Well, using texture filtering for interpolating animations was a good option only in theory - all you have to do is enable &lt;code&gt;GL_LINEAR&lt;/code&gt; for the texture and you’re good to go. In practice, however, the arithmetic precision of filtering is not perfect and is even somewhat hardware-specific.&lt;/p&gt;

&lt;p&gt;During development on PC everything was fine, but testing on different mobile phones with various GPUs revealed noticeable visual differences in hardware FP16 texture filtering. I suspect that the ANGLE wrapper for WebGL always uses full-precision floating-point values in both shaders and textures, because we had already encountered precision issues in shader calculations during development of the &lt;a href="https://github.com/keaukraine/webgl-mountains"&gt;Iceland WebGL demo&lt;/a&gt; - they were visible only on mobile devices with native WebGL-to-OpenGL ES translation.&lt;/p&gt;

&lt;p&gt;Here is the reference rendering of the squirrel character on PC without issues (Windows 10, Chrome or Firefox with ANGLE renderer):&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--oVFqak9D--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/n3umfdybor1ge1ex2t6x.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--oVFqak9D--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/n3umfdybor1ge1ex2t6x.gif" alt="Animation on PC"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And here are some results from different phones:&lt;/p&gt;

&lt;p&gt;Galaxy A21s with a Mali GPU. Vertex shivering with some gaps between triangles:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6rarZ3ff--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rqrscajckjtln5vy3r9a.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6rarZ3ff--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rqrscajckjtln5vy3r9a.gif" alt="Animation on Mali"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Moto E6s with a PowerVR GPU. This one is closest to the reference rendering on PC, but still has some gaps:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GMU-IlU9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mccxjww8t3czexgwvdz4.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GMU-IlU9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mccxjww8t3czexgwvdz4.gif" alt="Animation on PowerVR"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And Google Pixel 3 with Adreno 630 GPU. Shivering is also present:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--G33dbuZP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wnc74sl8aoh173o1z8b9.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--G33dbuZP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wnc74sl8aoh173o1z8b9.gif" alt="Animation on Adreno"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution
&lt;/h2&gt;

&lt;p&gt;The most obvious fix would be to use 32-bit &lt;code&gt;GL_FLOAT&lt;/code&gt; textures, which have better precision and should be interpolated more correctly. Unfortunately they are &lt;a href="https://www.khronos.org/registry/OpenGL-Refpages/es3.0/html/glTexImage2D.xhtml"&gt;not filterable at all&lt;/a&gt;, and animations look like Quake 1 - without any interpolation:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--piGCE20T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/do5m55r3rufhdaub7a74.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--piGCE20T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/do5m55r3rufhdaub7a74.gif" alt="Animation without filtering"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So we decided to implement custom linear interpolation in the vertex shader. In our implementation, textures store animation frames in the direction of the increasing Y coordinate, so we only have to sample two adjacent texels in this direction.&lt;/p&gt;

&lt;p&gt;You can examine the source code of this shader &lt;a href="https://github.com/keaukraine/webgl-reunion/blob/master/src/shaders/DiffuseAnimatedTextureChunkedShader.ts"&gt;here&lt;/a&gt;. As you can see, it uses &lt;code&gt;highp&lt;/code&gt; precision for floats for precise and smooth interpolation of vertex positions.&lt;/p&gt;

&lt;p&gt;First, this shader has a function, &lt;a href="https://github.com/keaukraine/webgl-reunion/blob/master/src/shaders/DiffuseAnimatedTextureChunkedShader.ts#L25"&gt;getCenter()&lt;/a&gt;, which returns the center of the texel containing an arbitrary coordinate. It is used to get the colors of two points.&lt;/p&gt;

&lt;p&gt;The actual filtering is done in the &lt;a href="https://github.com/keaukraine/webgl-reunion/blob/master/src/shaders/DiffuseAnimatedTextureChunkedShader.ts#L29"&gt;linearFilter()&lt;/a&gt; function, which samples two values half a texel higher and half a texel lower and linearly interpolates them based on how far the centers of their texels are from the actually sampled coordinate.&lt;/p&gt;

&lt;p&gt;Please note that the shader samples colors not exactly half a texel higher and lower: the texel height is multiplied by 0.49, a value slightly smaller than 0.5. This is done because floating-point precision is limited, and sampling exactly at the edge of two texels might hit texels we don’t need. That would result in broken animation - interpolation with the previous or next frame instead of the current one. Sampling at offsets slightly smaller than half the texel height eliminates this issue.&lt;/p&gt;
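
&lt;p&gt;To make the idea concrete, here is a CPU model of this filtering along a one-texel-wide column. It is an illustrative re-implementation of the approach, not the shader’s exact code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// CPU model of the manual filtering along a texture column of "height" texels.
// getCenter snaps a coordinate to the center of its texel (clamped to the
// edges, standing in for GL_CLAMP_TO_EDGE); linearFilter samples 0.49 texels
// up and down and blends by distance, mimicking the shader described above.
function getCenter(v: number, height: number): number {
  const idx = Math.min(height - 1, Math.max(0, Math.floor(v * height)));
  return (idx + 0.5) / height;
}

function linearFilter(data: number[], v: number, height: number): number {
  const texel = 1.0 / height;
  const vA = getCenter(v - 0.49 * texel, height);
  const vB = getCenter(v + 0.49 * texel, height);
  const a = data[Math.floor(vA * height)];
  const b = data[Math.floor(vB * height)];
  const t = vB === vA ? 0.0 : (v - vA) / (vB - vA);
  return a + (b - a) * t;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;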

&lt;p&gt;This implementation of custom texture filtering results in smooth interpolation, and it is identical on all tested platforms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Result
&lt;/h2&gt;

&lt;p&gt;You can take a look at the live demo page with this custom interpolation here - &lt;a href="https://keaukraine.github.io/webgl-reunion/"&gt;https://keaukraine.github.io/webgl-reunion/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And of course full source code is available on Github - &lt;a href="https://github.com/keaukraine/webgl-reunion"&gt;https://github.com/keaukraine/webgl-reunion&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webgl</category>
      <category>opengl</category>
      <category>gpu</category>
    </item>
    <item>
      <title>More efficient ASTC decoding</title>
      <dc:creator>keaukraine</dc:creator>
      <pubDate>Sun, 04 Jul 2021 16:38:34 +0000</pubDate>
      <link>https://forem.com/keaukraine/more-efficient-astc-decoding-3do7</link>
      <guid>https://forem.com/keaukraine/more-efficient-astc-decoding-3do7</guid>
      <description>&lt;p&gt;This will be a really short but still useful article, explaining a couple lines of code which improve performance on compatible devices.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.khronos.org/opengl/wiki/ASTC_Texture_Compression" rel="noopener noreferrer"&gt;ASTC&lt;/a&gt; is a very efficient texture compression format - it combines decent image quality with high compression. It helps saving a lot of memory bandwidth on modern mobile GPUs.&lt;br&gt;
But can we crank it to 11 and make it run even faster? It appears we can (in certain scenarios and on supported hardware).&lt;/p&gt;

&lt;p&gt;ARM Mali GPUs support the &lt;a href="https://www.khronos.org/registry/OpenGL/extensions/EXT/EXT_texture_compression_astc_decode_mode.txt" rel="noopener noreferrer"&gt;GL_EXT_texture_compression_astc_decode_mode&lt;/a&gt; extension. According to the ASTC specification, even LDR textures are decoded into 16-bit floating-point values. This extension makes it possible to switch the hardware ASTC decoder into a faster mode, decoding textures into lower-precision normalized 8-bit unsigned integers. This is good enough for most real-life applications, since source textures are usually 24-bit RGB or 32-bit RGBA bitmaps.&lt;br&gt;
Using the extension is as simple as one &lt;code&gt;glTexParameteri&lt;/code&gt; call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GLES30.glTexParameteri(GLES30.GL_TEXTURE_2D, TEXTURE_ASTC_DECODE_PRECISION_EXT, GLES30.GL_RGBA8);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Visually there was no image quality degradation, and performance stayed at the same steady 60 fps. However, looking at certain metrics in a GPU profiler, we can see a reduced load on the compressed texture decoder and improved texture cache access.&lt;br&gt;
Here are measurements from the ARM Streamline profiler for our &lt;a href="https://play.google.com/store/apps/details?id=org.androidworks.livewallpaperbuddha" rel="noopener noreferrer"&gt;3D Buddha Live Wallpaper&lt;/a&gt;, taken on a mid-range Galaxy A21s phone:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmhd7zi7faqfre584r1xh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmhd7zi7faqfre584r1xh.png" alt="Performance improvement"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, this simple trick has noticeably improved texture lookups. As a result, it reduced memory bandwidth usage and device power consumption. So don’t be lazy: if this extension is detected, use it. It is a minor code change that gives a virtually free performance boost on Mali GPUs.&lt;/p&gt;

&lt;p&gt;This optimization was suggested in the 3-part ARM webcast “&lt;a href="https://www.brighttalk.com/webcast/17792/475587" rel="noopener noreferrer"&gt;Optimizing Android Graphics&lt;/a&gt;”, which I highly recommend watching.&lt;/p&gt;

</description>
      <category>opengl</category>
      <category>astc</category>
      <category>android</category>
      <category>optimization</category>
    </item>
    <item>
      <title>Creating mountains landscape in OpenGL ES</title>
      <dc:creator>keaukraine</dc:creator>
      <pubDate>Mon, 26 Apr 2021 19:43:27 +0000</pubDate>
      <link>https://forem.com/keaukraine/creating-mountains-landscape-in-opengl-es-4jaa</link>
      <guid>https://forem.com/keaukraine/creating-mountains-landscape-in-opengl-es-4jaa</guid>
      <description>&lt;p&gt;A few days ago we released a new nature-themed app — &lt;a href="https://play.google.com/store/apps/details?id=org.androidworks.livewallpapermountains"&gt;Iceland 3D Live Wallpaper&lt;/a&gt;. It has an interactive WebGL demo too, which you can find &lt;a href="https://keaukraine.github.io/webgl-mountains/index.html"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The terrain is based on this beautiful and detailed &lt;a href="https://sketchfab.com/3d-models/iceland-landscape-world-machine-d19e67af2292493a822dc3becd14efb4"&gt;3D model by Sergey Kuydin&lt;/a&gt;. Interestingly, this is not a real landscape from some part of Iceland. Even though it looks like the real thing, it was actually generated in World Machine. After analyzing the model on Sketchfab, we decided to create a live wallpaper from it, adding dynamic time of day. You should check out more of Sergey’s work; he has some high-quality models and 3D scans too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scene Composition
&lt;/h2&gt;

&lt;p&gt;The scene is built from the purchased 3D terrain model and other assets, such as textures and models for the sky hemisphere, birds, and sprites. They were created and tailored to fit the scene by my brother, who also proposed recommendations on how to optimize certain aspects of the scene and tweaked shaders when needed. As usual, the web demo was created before the Android app, because it is faster to create a web prototype than an Android app, and it is much easier for me and my brother to collaborate on a web project.&lt;/p&gt;

&lt;p&gt;To analyze scene rendering I will refer to the source code. You can clone it from &lt;a href="https://github.com/keaukraine/webgl-mountains/"&gt;this repository&lt;/a&gt; or examine the code on GitHub using the links to files I will provide below.&lt;/p&gt;

&lt;p&gt;The scene is rendered in 35 draw calls in total. The order of rendering is carefully chosen to use z-buffer culling efficiently: the nearest objects are drawn first and the most distant ones last. After that we render transparent objects:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vSKJ_9nM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ohhgvixyq0w9q2u8zxcp.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vSKJ_9nM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ohhgvixyq0w9q2u8zxcp.gif" alt="Scene draw order"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All actual draw calls are issued in the &lt;a href="https://github.com/keaukraine/webgl-mountains/blob/main/src/MountainsRenderer.ts#L690"&gt;drawSceneObjects()&lt;/a&gt; method of &lt;code&gt;MountainsRenderer.ts&lt;/code&gt;. Let’s analyze how the objects are rendered.&lt;/p&gt;
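
&lt;p&gt;In this scene the draw order is fixed by hand, but the general principle can be sketched as sorting opaque objects by their distance to the camera, nearest first. This is a minimal illustration with hypothetical names, not code from the actual renderer:&lt;/p&gt;

```typescript
// Hypothetical scene object: only the data needed for ordering.
interface SceneObject {
    name: string;
    distanceToCamera: number;
}

// Sort opaque objects nearest-first so early depth testing can reject
// fragments of distant objects hidden behind closer ones.
function sortFrontToBack(objects: SceneObject[]): SceneObject[] {
    return objects
        .slice()
        .sort((a, b) => a.distanceToCamera - b.distanceToCamera);
}
```

&lt;p&gt;For this scene such an ordering yields birds, then terrain, then the sky.&lt;/p&gt;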

&lt;p&gt;Birds are rendered first because they can occlude both terrain and sky. They are rendered in 12 draw calls.&lt;/p&gt;

&lt;p&gt;Next, we render the terrain. The original high-poly model is simplified in Blender with the Decimate modifier down to ~30k triangles, which still results in sufficiently detailed geometry.&lt;/p&gt;

&lt;p&gt;And of course, to create a vast mountainous landscape by reusing a single terrain model, we use the same terrain skirt technique as in the Dunes wallpaper (described in our previous article &lt;a href="https://dev.to/keaukraine/rendering-dunes-terrain-in-webgl-30k2"&gt;here&lt;/a&gt;; the &lt;a href="https://youtu.be/In1wzUDopLM?t=2586"&gt;original implementation is from Halo Wars&lt;/a&gt;). The basic idea of this technique is to draw the same terrain tile mirrored at every edge of the main terrain. However, in the Dunes live wallpaper this had one flaw: on mirrored tiles, the shadows from pre-rendered lightmaps ended up on the wrong slopes, the ones lit by the sun. Because of the overall simplicity of the dunes terrain and the low camera placement, this was concealed and virtually unnoticeable. I must give huge credit to &lt;em&gt;u/icestep&lt;/em&gt; from Reddit, who &lt;a href="https://www.reddit.com/r/opengl/comments/kq64df/interactive_dunes_desert_scene_webgl/"&gt;found this and suggested a fix&lt;/a&gt;: create 4 different lightmaps for the 4 possible tile orientations. But because mountains have deep, sharp shadows, this cheap trick becomes clearly visible from almost any place in the scene, so we had to fix it. Luckily, with clever placement of the sun (along one of the axes) we only have to render 2 lightmaps: one for sunlight in the correct direction and one for the flipped direction. While the actual tiles are still mirrored (the cameras avoid certain angles where seams are too obvious), proper lighting conceals this cheap geometry trick from the human eye.&lt;/p&gt;

&lt;p&gt;Here you can see that with correct lightmaps, shadows appear on the correct side of both flipped and regular tiles:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hcJkf2y0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/03iwalom0h98ze0fojjy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hcJkf2y0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/03iwalom0h98ze0fojjy.gif" alt="Skirt lightmaps"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After the terrain we draw a sky hemisphere object with the basic &lt;a href="https://github.com/keaukraine/webgl-framework/blob/ts/src/shaders/DiffuseShader.ts"&gt;DiffuseShader&lt;/a&gt;, then 11 cloud sprites, and finally a sun sprite. These transparent objects are rendered without writing to the depth buffer. Clouds and the sun have trimmed geometries for less overdraw; you can read about this optimized sprite technique &lt;a href="https://dev.to/keaukraine/implementing-soft-particles-in-webgl-and-opengl-es-3l6e"&gt;here&lt;/a&gt;. We decided not to use soft particles for the clouds because the scene size allowed us to place them so that they don’t intersect with other geometries while still partially occluding some peaks. Avoiding soft particles is good for performance because they require an additional render pass to render the scene depth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Terrain shaders breakdown
&lt;/h2&gt;

&lt;p&gt;The main object in the scene is obviously the terrain. And it must look good while maintaining acceptable performance. Here I’ll explain some optimizations and tricks used to achieve a balance between these two mutually exclusive goals.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/keaukraine/webgl-mountains/blob/main/src/shaders/TerrainWaterShader.ts"&gt;Terrain shader&lt;/a&gt; applies the following effects to the base diffuse color:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Water reflection&lt;/li&gt;
&lt;li&gt;Baked lightmap&lt;/li&gt;
&lt;li&gt;Fog&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This allows the terrain to have crisp shadows, subtle atmospheric fog, and the sun reflecting in the water creeks and puddles left by thawed snow. The last one is a small detail, but it really improves the overall scene quality when viewed against the sun:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YsZnfzP8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tnzy6oqaij8mc1bi7foa.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YsZnfzP8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tnzy6oqaij8mc1bi7foa.gif" alt="Specular water reflections"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So in addition to the diffuse texture and two lightmaps (for regular and flipped tiles), this requires a separate specular channel for the water. And these textures are really large, 4096x4096 pixels, so that’s quite a lot of data. To store this information optimally, we use only two large textures plus one small auxiliary one. The first texture is, necessarily, the diffuse map. The second is a combined lightmap which contains the two lightmaps for regular and flipped tiles in the red and green channels. The blue channel stores the water specular reflection map. But wait, you may say: in the sunrise and sunset scenes the lightmaps are clearly colored! How can RGB data be stored in a single channel? That’s where the auxiliary texture comes in. It is a small color ramp, a 256x1 gradient used to colorize the grayscale lightmap.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--i6nFzm3A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zm2ncsw28sim08cd3m6i.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--i6nFzm3A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zm2ncsw28sim08cd3m6i.gif" alt="Different stages of terrain rendering&amp;lt;br&amp;gt;
"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Assuming that the virtual sun is positioned along the same axis of the scene as the flipped skirt tiles, we can optimize this even further: we only need two lightmaps, one for a high and one for a low sun position. We treat the regular lightmap channel as the sun direction and the flipped one as the “sun is on the opposite side of the sky” direction. This lets us reuse the same “high sun” lightmap for day/night and the “low sun” lightmap for sunrise/sunset by merely swapping the regular and flipped channels for different times of day.&lt;/p&gt;
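
&lt;p&gt;The channel-swapping rule can be sketched as a tiny lookup on the CPU side (the names here are illustrative, not the actual uniforms used by the app):&lt;/p&gt;

```typescript
type TimeOfDay = "day" | "night" | "sunrise" | "sunset";

// The combined lightmap stores the "high sun" map in the red channel and
// the "low sun" map in the green channel; day/night and sunrise/sunset
// simply swap which channel regular and flipped tiles sample.
function lightmapChannels(time: TimeOfDay): { regular: string; flipped: string } {
    if (time === "day" || time === "night") {
        return { regular: "r", flipped: "g" };
    }
    return { regular: "g", flipped: "r" };
}
```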

&lt;p&gt;Let’s take a look at the shader source code in the file &lt;a href="https://github.com/keaukraine/webgl-mountains/blob/main/src/shaders/TerrainWaterShader.ts"&gt;TerrainWaterShader.ts&lt;/a&gt;. At the very end of the fragment shader you can uncomment one of 6 lines to visualize the intermediate passes shown in the GIF above. You may notice that the shader doesn’t consume normals from any attribute; instead, the specular reflection is calculated with a &lt;a href="https://github.com/keaukraine/webgl-mountains/blob/main/src/shaders/TerrainWaterShader.ts#L58"&gt;constant normal&lt;/a&gt;. This is another optimization to reduce geometry size: the geometry indeed has no normals, because the water occupies an almost perfectly flat part of the terrain, and the accurate vertex normal can be substituted with a constant upward normal.&lt;/p&gt;

&lt;p&gt;For skirt terrain we use a simplified version of the shader without water reflection — &lt;a href="https://github.com/keaukraine/webgl-mountains/blob/main/src/shaders/TerrainShader.ts"&gt;TerrainShader.ts&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://github.com/keaukraine/webgl-mountains/blob/main/src/MountainsRenderer.ts#L498"&gt;initShaders()&lt;/a&gt; method of &lt;code&gt;MountainsRenderer&lt;/code&gt; you can see that we create a pair of each terrain shader (with water and simplified), both in regular and flipped variants.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shader precision
&lt;/h2&gt;

&lt;p&gt;You may notice that the terrain fragment shader for the skirt has &lt;a href="https://github.com/keaukraine/webgl-mountains/blob/main/src/shaders/TerrainShader.ts#L37"&gt;reduced floating-point precision&lt;/a&gt;. For the main terrain we need &lt;code&gt;highp&lt;/code&gt; precision to render the water correctly; since the skirt version doesn’t have these reflections, &lt;code&gt;mediump&lt;/code&gt; is enough.&lt;/p&gt;

&lt;p&gt;At first this may look like a minor optimization, but it is actually quite an important one: the shader runs noticeably faster, so GPU load is reduced. Even when tested on a not-quite-state-of-the-art Pixel 3 phone, both precisions result in a steady 60 fps. However, reducing GPU load leaves more headroom for drawing a smooth UI and lowers overall power consumption, which is very important for live wallpapers.&lt;/p&gt;

&lt;p&gt;In general, &lt;code&gt;highp&lt;/code&gt; instructions on modern mobile GPUs are twice as slow as &lt;code&gt;mediump&lt;/code&gt; or &lt;code&gt;lowp&lt;/code&gt; ones. Of course the shader has a bunch of other non-math instructions to run, so what impact does reducing precision actually have? While the number differs between GPUs, we can use tools to measure it. For example, the offline PowerVR shader compiler can analyze a shader for that specific hardware. Targeting PowerVR Series6 GPUs, we get 18 cycles for the &lt;code&gt;highp&lt;/code&gt; shader and 13 cycles for the &lt;code&gt;mediump&lt;/code&gt; one: a roughly 28% reduction in cycle count for a shader that draws quite a significant share of the scene’s fragments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Targeting different OpenGL ES versions for Android
&lt;/h2&gt;

&lt;p&gt;This is our first Android live wallpaper that doesn’t support OpenGL ES 2.0 at all. &lt;a href="https://developer.android.com/about/dashboards"&gt;Only 10%&lt;/a&gt; of Android devices are limited to OpenGL ES 2.0, and those must be really old, outdated devices. So we support only OpenGL ES 3.0 and up; the app has two sets of resources, for ES 3.0 and ES 3.2. For devices with ES 3.0 we use ETC2 textures, which provide acceptable image quality at the same size as ETC1. However, the compression is still not enough to keep the textures small, so we had to downsample them for ES 3.0. On devices with ES 3.2 we use the more advanced ASTC compression, with better quality and better compression. This allows us to use high-resolution textures on modern devices. Here are some sample texture sizes:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ftbAZWf2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/euhnso50c34706n0lyf4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ftbAZWf2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/euhnso50c34706n0lyf4.png" alt="Texture size comparison"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Color ramp textures are uncompressed because color accuracy is critical here, but since they are really tiny they don’t use a lot of memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; About a week after releasing the app, I compared the diffuse texture compressed with ASTC 8x8 and 10x10 blocks. Higher compression definitely introduces some distortions. However, on fuzzy images such as aerial terrain imagery it is really hard to tell the difference between compression artifacts and actual random features of the terrain. You can see very similar results when compressing different images into medium-quality JPEG, which also uses fixed 8x8-pixel blocks. Images with thin, sharp lines (like text and diagrams) will show the notorious blocky JPEG artifacts, but you won’t tell the difference between compressed and original photos of nature. So I’ve updated the app to use an even more compressed diffuse texture.&lt;/p&gt;

&lt;p&gt;For geometry, both vertex positions and texture coordinates use half floats. This precision is enough for vertex coordinates; and because we use textures significantly larger than 256 pixels, we can’t use bytes for texture coordinates: 8-bit precision on a 4096x4096 diffuse texture means a step of 16 texels.&lt;/p&gt;
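
&lt;p&gt;The arithmetic behind that 16-texel figure is easy to verify: an n-bit normalized coordinate can only address 2^n distinct positions across the texture:&lt;/p&gt;

```typescript
// Grid step (in texels) of a quantized normalized texture coordinate:
// the coordinate can take 2^bits distinct values across textureSize texels.
function texelStep(textureSize: number, bits: number): number {
    return textureSize / Math.pow(2, bits);
}
```

&lt;p&gt;With 8-bit coordinates on a 4096x4096 texture the step is 16 texels, which is why half floats are used instead.&lt;/p&gt;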

&lt;h2&gt;
  
  
  Final result
&lt;/h2&gt;

&lt;p&gt;Full source code is available on GitHub &lt;a href="https://github.com/keaukraine/webgl-mountains/"&gt;here&lt;/a&gt; and the live demo page is &lt;a href="https://keaukraine.github.io/webgl-mountains/index.html"&gt;here&lt;/a&gt;. Click on the scene to change the time of day (it may take a couple of seconds to load textures), and press Enter to switch to free-camera mode. Press and hold the right mouse button to look around, and use the WASD keys to move.&lt;/p&gt;

&lt;p&gt;And of course you can get the Android live wallpaper app from Google Play &lt;a href="https://play.google.com/store/apps/details?id=org.androidworks.livewallpapermountains"&gt;here&lt;/a&gt;; it is free.&lt;/p&gt;

</description>
      <category>webgl</category>
      <category>javascript</category>
      <category>opengl</category>
      <category>3d</category>
    </item>
    <item>
      <title>Rendering dunes terrain in WebGL</title>
      <dc:creator>keaukraine</dc:creator>
      <pubDate>Tue, 19 Jan 2021 09:41:23 +0000</pubDate>
      <link>https://forem.com/keaukraine/rendering-dunes-terrain-in-webgl-30k2</link>
      <guid>https://forem.com/keaukraine/rendering-dunes-terrain-in-webgl-30k2</guid>
      <description>&lt;p&gt;We’ve released a new &lt;a href="https://play.google.com/store/apps/details?id=org.androidworks.livewallpaperdunes" rel="noopener noreferrer"&gt;live wallpaper for Android&lt;/a&gt; and simultaneously published a live demo page showcasing all the features of the app. You can check out the &lt;a href="https://keaukraine.github.io/webgl-dunes/index.html" rel="noopener noreferrer"&gt;webpage here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Historically, the web demo was created first: it was used as a prototyping playground to compose the scene and fine-tune the shaders. It also really helps in sharing work within a team of two without both of us having to learn Android Studio. When everything was polished and looked good enough, the Android app was created quite quickly from the web demo code. Porting the code to Android is a straightforward and easy process because our &lt;a href="https://github.com/keaukraine/webgl-framework" rel="noopener noreferrer"&gt;WebGL framework&lt;/a&gt; has the same method signatures as the framework used in our Android apps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scene composition
&lt;/h2&gt;

&lt;p&gt;The scene is quite simple and contains just six objects: terrain, sky, dust particles, the sun, birds, and palm trees.&lt;/p&gt;

&lt;p&gt;To examine how the objects are rendered, take a look at the &lt;code&gt;drawScene()&lt;/code&gt; method in &lt;a href="https://github.com/keaukraine/webgl-dunes/blob/master/src/DunesRenderer.ts" rel="noopener noreferrer"&gt;DunesRenderer.ts&lt;/a&gt;: first we render the depth map to a texture (this is needed for soft particles), then render the on-screen objects in front-to-back order (the closest and largest objects first, then the distant ones) to efficiently utilize z-buffer culling.&lt;br&gt;
The terrain in the scene is represented as a single square tile. The base for the terrain is &lt;a href="https://www.cgtrader.com/3d-models/exterior/landscape/desert-fa02d784-9991-4458-9a01-95cf91ec0178" rel="noopener noreferrer"&gt;this model purchased on CGTrader&lt;/a&gt;. Its polycount was reduced to 31k faces so that the geometry doesn’t have to be split and can be drawn with a single draw call, while still producing reasonably good quality. However, the tile’s area is not large enough to create the feel of an infinite sand desert: when the camera is placed slightly above the terrain, the boundaries of the square tile are clearly visible:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F5vgef4ym63by6zrkh360.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F5vgef4ym63by6zrkh360.jpg" alt="Clearly visible boundaries of terrain&amp;lt;br&amp;gt;
"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This reduces the range of camera movement and creates an unwanted feeling of the terrain “floating” in space. To eliminate this effect and improve the immersiveness of the scene, we use a technique called a “terrain skirt”. We learned about it from this great &lt;a href="https://youtu.be/In1wzUDopLM?t=2586" rel="noopener noreferrer"&gt;GDC talk about terrain in Halo Wars&lt;/a&gt;. You should definitely watch the whole video, as it explains a lot of other interesting and unique techniques which might come in handy. The idea behind the terrain skirt is to render the same tile at the edges of the main tile, but mirrored away from the center of the scene. This significantly expands the terrain area. This screenshot shows all 8 additional tiles rendered (with gaps added to separate the tiles):&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F5usuy4twg5eri7d82nyy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F5usuy4twg5eri7d82nyy.jpg" alt="Additional skirt tiles&amp;lt;br&amp;gt;
"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see the mirroring at the edges where the duplicate tiles connect with the main one, but it is not noticeable in the final app because the camera stays within the main tile and avoids looking at those edges directly. We render the additional tiles 1.5 times larger than the original one, effectively increasing the perceived dimensions of the terrain 4 times. This short clip shows how the final extended terrain looks with and without the skirt:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6fqssrhdcu0sa56p19qa.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6fqssrhdcu0sa56p19qa.gif" alt="Terrain with and without skirt&amp;lt;br&amp;gt;
"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, this simple trick creates a vast, seemingly endless terrain stretching to the horizon with very little effort, reusing existing geometry.&lt;/p&gt;
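
&lt;p&gt;The placement of the skirt tiles can be sketched as follows. This is an illustration of the idea rather than the app’s actual code: each of the 8 neighbors is the same tile, enlarged 1.5 times and mirrored away from the center by a negative scale, with offsets chosen so that tile edges meet:&lt;/p&gt;

```typescript
// Transform of one skirt tile, in units of the main tile size.
interface TileTransform {
    offsetX: number;
    offsetY: number;
    scaleX: number;
    scaleY: number;
}

function skirtTiles(skirtScale: number = 1.5): TileTransform[] {
    const tiles: TileTransform[] = [];
    // Centers sit half a main tile plus half a skirt tile away from the origin.
    const offset = (1 + skirtScale) / 2;
    for (const dx of [-1, 0, 1]) {
        for (const dy of [-1, 0, 1]) {
            if (dx !== 0 || dy !== 0) { // skip the main tile itself
                tiles.push({
                    offsetX: dx * offset,
                    offsetY: dy * offset,
                    // A negative scale mirrors the tile along that axis.
                    scaleX: dx === 0 ? skirtScale : -skirtScale,
                    scaleY: dy === 0 ? skirtScale : -skirtScale,
                });
            }
        }
    }
    return tiles;
}
```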

&lt;h2&gt;
  
  
  Dust particles
&lt;/h2&gt;

&lt;p&gt;For the dust effect, soft particles are used. You can read more about this technique in our previous article: &lt;a href="https://dev.to/keaukraine/implementing-soft-particles-in-webgl-and-opengl-es-3l6e"&gt;https://dev.to/keaukraine/implementing-soft-particles-in-webgl-and-opengl-es-3l6e&lt;/a&gt;.&lt;br&gt;
The only object rendered to the depth texture for soft particles is the main terrain tile, because that’s the only geometry the particles intersect with. To make this pass faster, the simplest possible fragment shader is used for it instead of the complex one used to render the on-screen terrain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dunes shader
&lt;/h2&gt;

&lt;p&gt;To simulate the effect of wind creating sand waves on the dune surfaces, we developed a quite complex shader. Let’s take a look inside it. Note that while we will walk through the GLSL code of the shader, the generic techniques and approaches used in it can also be applied to recreate a similar material in Unity or Unreal.&lt;br&gt;
The code of the shader can be found in &lt;a href="https://github.com/keaukraine/webgl-dunes/blob/master/src/shaders/DunesShader.ts" rel="noopener noreferrer"&gt;DunesShader.ts&lt;/a&gt;. Let’s analyze it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Diffuse color and lightmaps
&lt;/h3&gt;

&lt;p&gt;The terrain uses a quite large texture: 2048x2048 for the web demo, and up to 4096x4096 in the Android app. Obviously it takes quite some memory, so a few tricks are used to make the most of it. The main diffuse color of the dunes is actually stored as a single-channel grayscale value in the red channel of the terrain texture. The actual color of the sand is specified by the &lt;code&gt;uColor&lt;/code&gt; uniform, which is multiplied by the grayscale diffuse value. The other two channels contain lightmaps for a high sun (day and night) and a low sun (sunrise and sunset). Since it is not possible to select a texture channel with a uniform, two versions of the shader are compiled, one per lightmap. The final diffuse color is multiplied with the shadow color.&lt;/p&gt;
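
&lt;p&gt;A minimal sketch of this channel layout in TypeScript. The blend with the shadow color here is an illustrative interpretation, not the exact math of the shader; see DunesShader.ts for the real thing:&lt;/p&gt;

```typescript
// One texel of the packed terrain texture: grayscale diffuse in r,
// "high sun" lightmap in g, "low sun" lightmap in b (all in [0, 1]).
interface PackedTexel { r: number; g: number; b: number; }

function shadeDunes(
    texel: PackedTexel,
    uColor: [number, number, number],     // sand tint
    highSun: boolean,                     // which compiled shader variant
    shadowColor: [number, number, number] // tint applied in shadowed areas
): [number, number, number] {
    const diffuse = texel.r;
    const light = highSun ? texel.g : texel.b;
    // Fully lit texels keep the sand color; shadowed ones are tinted.
    const shade = (c: number) => light + (1 - light) * c;
    return [
        uColor[0] * diffuse * shade(shadowColor[0]),
        uColor[1] * diffuse * shade(shadowColor[1]),
        uColor[2] * diffuse * shade(shadowColor[2]),
    ];
}
```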

&lt;h3&gt;
  
  
  Moving sand effect
&lt;/h3&gt;

&lt;p&gt;Next, let’s take a look at how the moving wind effect is created. You may notice that it is different for the windward and leeward slopes of the dunes. To determine which effect to apply to which slope, we calculate blending coefficients from the surface normal. These coefficients are calculated per vertex and passed into the fragment shader via the &lt;code&gt;vSlopeCoeff&lt;/code&gt; and &lt;code&gt;vSlopeCoeff2&lt;/code&gt; varyings. You can uncomment the corresponding lines in the fragment shader to visualize the windward and leeward parts in different colors:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsirwl5qaar5hguobc1sy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsirwl5qaar5hguobc1sy.jpg" alt="Slopes visualized&amp;lt;br&amp;gt;
"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both slopes use the same texture, but on the windward one it is more stretched. Texture coordinates for both slopes are also calculated in the vertex shader to prevent dependent texture reads. The wind movement is achieved by adding an offset derived from the &lt;code&gt;uTime&lt;/code&gt; uniform to the texture coordinates.&lt;/p&gt;
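
&lt;p&gt;One plausible way to derive two such coefficients is to project the vertex normal onto the horizontal wind direction and split the positive and negative parts; the actual shader may compute them differently:&lt;/p&gt;

```typescript
type Vec3 = [number, number, number];

// Windward slopes face the wind (positive projection of the normal on the
// wind direction), leeward slopes face away (negative projection).
function slopeCoeffs(normal: Vec3, windDir: Vec3): { windward: number; leeward: number } {
    const d =
        normal[0] * windDir[0] +
        normal[1] * windDir[1] +
        normal[2] * windDir[2];
    return {
        windward: Math.max(d, 0),
        leeward: Math.max(-d, 0),
    };
}
```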

&lt;h3&gt;
  
  
  Fog
&lt;/h3&gt;

&lt;p&gt;The next important ingredient of a realistic result is atmospheric fog. For performance reasons we use simple linear fog, calculated in the vertex shader. The fog range is controlled by two uniforms, &lt;code&gt;fogStartDistance&lt;/code&gt; and &lt;code&gt;fogDistance&lt;/code&gt;, and the value to be used in the fragment shader is stored in the &lt;code&gt;vFogAmount&lt;/code&gt; varying. The fragment shader applies the fog color from the &lt;code&gt;uFogColor&lt;/code&gt; uniform based on the value of this varying.&lt;br&gt;
The fog color is adjusted at the far terrain edges to blend with the sky texture. The sky texture itself is also edited to have a distant haze of the same fog color in the places where it should blend with the terrain.&lt;/p&gt;
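
&lt;p&gt;The linear fog factor itself boils down to a clamped ramp between the two uniforms (a minimal sketch of the same formula):&lt;/p&gt;

```typescript
// Linear fog: 0 before fogStartDistance, then a linear ramp over
// fogDistance units, clamped to 1 (fully fogged).
function linearFogFactor(
    distance: number,
    fogStartDistance: number,
    fogDistance: number
): number {
    const t = (distance - fogStartDistance) / fogDistance;
    return Math.min(Math.max(t, 0), 1);
}
```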

&lt;h3&gt;
  
  
  Detail texture
&lt;/h3&gt;

&lt;p&gt;Even though the overall terrain texture is quite large, it covers a large area and is therefore still not detailed enough for close-ups. To make the dunes less blurry and more realistic when observed from the ground, we apply a detail texture. It is a small 256x256 texture with two different sand-ripple patterns stored in two channels, one per slope type. The detail texture can either darken or lighten the diffuse color. To achieve this, we first subtract 0.5 from the detail sample so it can take negative values, and then add this value to the final color. This way a 50% gray value in the detail texture doesn’t affect the diffuse color, darker values darken it, and brighter values brighten it. The detail texture is applied in a similar way to the fog: two uniforms adjust the cutoff distance beyond which the detail texture is not needed. You can uncomment a line in the fragment shader to visualize the detail texture range in the red channel:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fh5numt4jcyjcxkomomtx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fh5numt4jcyjcxkomomtx.jpg" alt="Detail texture range"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Result
&lt;/h2&gt;

&lt;p&gt;You can see the live demo page &lt;a href="https://keaukraine.github.io/webgl-dunes/index.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;. It is interactive: click to change the time of day. On desktop, you can examine the scene from any position by entering free-flight mode with the Enter key. In this mode, hold the right mouse button to rotate the camera, use the WASD keys to move, Space to go up, and C to go down. Hold Shift while moving to accelerate.&lt;br&gt;
Full source code is available on &lt;a href="https://github.com/keaukraine/webgl-dunes/blob/master/src/shaders/DunesShader.ts" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;; if you are interested in recreating similar effects, you can clone it and use it for your needs, as it is licensed under the permissive MIT license.&lt;/p&gt;

</description>
      <category>webgl</category>
      <category>javascript</category>
      <category>3d</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
