<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Adam</title>
    <description>The latest articles on Forem by Adam (@adamazuddin).</description>
    <link>https://forem.com/adamazuddin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1435156%2F579ff683-bca8-4155-90ee-14589a982793.jpg</url>
      <title>Forem: Adam</title>
      <link>https://forem.com/adamazuddin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/adamazuddin"/>
    <language>en</language>
    <item>
      <title>I Built a Chess AI That Plays Like Me—Here’s How It (Almost) Failed</title>
      <dc:creator>Adam</dc:creator>
      <pubDate>Wed, 26 Mar 2025 03:56:47 +0000</pubDate>
      <link>https://forem.com/adamazuddin/i-built-a-chess-ai-that-plays-like-me-heres-how-it-almost-failed-4lgb</link>
      <guid>https://forem.com/adamazuddin/i-built-a-chess-ai-that-plays-like-me-heres-how-it-almost-failed-4lgb</guid>
      <description>&lt;p&gt;Do you like playing chess? With the chess boom that has been happening throughout the last few years, many new people got into it including me. Have you ever played against those celebrity bots on Chess.com and wondered, why are there limited amount of them at a very limited time? I mean, can't we just built our own chess AI? Stockfish and other engines are much better at chess than us anyway. If Chess.com can adjust it to play like someone, why not anyone? That's what I've thought too, and my main drive to build DeezChess!&lt;/p&gt;

&lt;h1&gt;
  
  
  First plan, first failure
&lt;/h1&gt;

&lt;p&gt;I had just learned Unity, so why not build a chess game with it? You can even build it for the web via WebGL and publish it on itch.io. Plus, I'm sure most chess and programming fans have watched &lt;a href="https://www.youtube.com/watch?v=U4ogK0MIzqk" rel="noopener noreferrer"&gt;this&lt;/a&gt; chess bot video by Sebastian Lague and got interested in building their own bot. I mean, how hard could it be? (Famous last words.)&lt;/p&gt;

&lt;p&gt;The plan was simple: use Unity as the frontend and C++ as the backend because it's fast (definitely not a good reason), connected to Unity via a DLL. Could be an awesome learning experience, right? Well, yes, until you realize a C++ DLL won't build for WebGL!&lt;/p&gt;

&lt;p&gt;But then comes the other problem: how do I make the core 'AI plays like me' feature work? My first thought was to do something like fine-tuning &lt;a href="https://en.wikipedia.org/wiki/Leela_Chess_Zero" rel="noopener noreferrer"&gt;Leela Chess Zero&lt;/a&gt;, but since I am still an unemployed student with no GPU, experimenting on Google Colab just wasn't worth it.&lt;/p&gt;

&lt;p&gt;Fine, I thought, I'll just use Stockfish and tune it to play from an opening book at a certain strength. But stockfish.exe, again, simply won't build for WebGL!&lt;/p&gt;

&lt;h1&gt;
  
  
  Screw it, let's just get it working
&lt;/h1&gt;

&lt;p&gt;So in the end, I settled on a FastAPI backend that handles all the legal move generation and hosts the Stockfish engine. It's a simple design that's definitely not very scalable, but it got the job done. The final flow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user uploads a PGN file along with the name of the player they want to mimic.&lt;/li&gt;
&lt;li&gt;The FastAPI server responds with two files: a .bin file for the opening book, and a .json config file containing information we can use to 'fine tune' Stockfish, such as an estimated Elo and a contempt score.&lt;/li&gt;
&lt;li&gt;Unity downloads those files and stores them in its persistent data path.&lt;/li&gt;
&lt;li&gt;Every time a bot move needs to be made, Unity makes a request with those 2 files and gets a UCI move string as a response.&lt;/li&gt;
&lt;li&gt;The server replies with either a move found in the opening book, or a move from Stockfish prompted with the given config.&lt;/li&gt;
&lt;/ol&gt;
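&lt;p&gt;To make the last step concrete, here's a minimal sketch of the server's decision logic in plain Python. The names are hypothetical, a dict stands in for the .bin opening book, and a stub function stands in for the actual Stockfish call:&lt;/p&gt;

```python
# Sketch of the book-first, engine-fallback move selection (hypothetical names).
# A plain dict stands in for the Polyglot-style .bin book, and a stub
# function stands in for the real Stockfish query.

def pick_move(fen, opening_book, config, engine_move):
    """Return a UCI move string: a book move if available, else an engine move."""
    book_move = opening_book.get(fen)
    if book_move is not None:
        return book_move
    # Fall back to the engine, 'fine tuned' with the uploaded config
    # (e.g. an estimated Elo and a contempt score).
    return engine_move(fen, config)

# Toy usage: only the starting position is in our tiny "book".
book = {"startpos": "e2e4"}
config = {"estimated_elo": 1500, "contempt": 20}
stub_engine = lambda fen, cfg: "g1f3"  # stand-in for a real Stockfish call

print(pick_move("startpos", book, config, stub_engine))   # book hit: e2e4
print(pick_move("other_fen", book, config, stub_engine))  # fallback: g1f3
```

&lt;p&gt;In the real backend this lookup would go through proper opening-book and engine libraries; the sketch only shows the flow.&lt;/p&gt;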

&lt;h1&gt;
  
  
  Holy crap it actually works?!
&lt;/h1&gt;

&lt;p&gt;After months of implementing it (it probably could have been shorter, but I was also busy with exams and studying) and countless hours of debugging, I actually managed to get it working in both the WebGL and the regular Windows Unity builds!&lt;/p&gt;

&lt;p&gt;I then containerized the backend Python code using Docker and deployed it on &lt;a href="https://render.com/" rel="noopener noreferrer"&gt;render.com&lt;/a&gt; because of how fast and easy it is. I also built everything for WebGL one last time, uploaded it to itch.io, and felt a huge sense of accomplishment when I could actually play against my own bot!&lt;/p&gt;

&lt;h1&gt;
  
  
  Lessons, regrets, and what's next
&lt;/h1&gt;

&lt;p&gt;The biggest lesson for me was definitely that your plan can, and definitely will, change. Your final product will be vastly different from what you initially had in mind, especially in your first few big projects. Just iterate, change what you can, and actually get it working first. There's no need to use fancy technology and tools you don't understand (like I did with C++) unless you're absolutely sure you need them.&lt;/p&gt;

&lt;p&gt;In the end, I did learn a lot and enjoyed the journey. After six months, I think it's time to move on. There are so many things I could improve, but sometimes you just have to say 'it's done' and move on to the next challenge.&lt;/p&gt;

&lt;p&gt;Anyway, I hope you enjoyed this article, learned something, or were at least entertained. Do let me know in the comments what you think of this project and any advice you'd like to give!&lt;/p&gt;

</description>
      <category>gamedev</category>
      <category>csharp</category>
      <category>python</category>
      <category>chess</category>
    </item>
    <item>
      <title>Image super-resolution using GAN (SRGAN)</title>
      <dc:creator>Adam</dc:creator>
      <pubDate>Sat, 25 May 2024 12:51:05 +0000</pubDate>
      <link>https://forem.com/adamazuddin/image-super-resolution-using-gan-srgan-kgc</link>
      <guid>https://forem.com/adamazuddin/image-super-resolution-using-gan-srgan-kgc</guid>
      <description>&lt;p&gt;Hi and welcome to my other blog post on the series on image super-resolution using GAN. This post is heavily based on &lt;a href="https://arxiv.org/pdf/1609.04802" rel="noopener noreferrer"&gt;this research paper&lt;/a&gt; that proposed a better way to solve image super-resolution problem compared to other methods available at the time&lt;/p&gt;

&lt;p&gt;Before we start, I will assume you have a basic understanding of super-resolution using CNNs; if you don't, you can check out &lt;a href="https://dev.to/adamazuddin/series/27213"&gt;this series&lt;/a&gt; I made explaining it. I will also assume you already have a basic understanding of GANs, or Generative Adversarial Networks. If not, you can check out my previous post on them &lt;a href="https://dev.to/adamazuddin/generative-adversarial-network-gan-1425"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The problem with SRCNN
&lt;/h3&gt;

&lt;p&gt;One big problem the paper proposes a solution for is the loss function used in SRCNN: Mean Squared Error (MSE). Although it works fine as a loss function, it doesn't preserve the perception we humans have of images; it just tries to make the pixel values match the label as closely as possible. But the human eye doesn't view images on a per-pixel basis, so this loss function makes the model miss opportunities to capitalize on the factors that matter more to humans in a high-resolution image.&lt;/p&gt;

&lt;p&gt;The paper proposes a new loss function, called the perceptual loss, which is a combination of an adversarial loss and a content loss. The exact equation used in the paper is shown below, although we won't dive into the math here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk0t7c1pusjcs0t9b21pm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk0t7c1pusjcs0t9b21pm.png" alt="Perceptual loss function"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So although the perceptual loss is a combination of content loss and adversarial loss, the content loss is weighted roughly 1000x more heavily than the adversarial loss. You can think of it like this: the content loss keeps the generated image faithful to the original, while the adversarial loss pushes it toward looking realistic. If the adversarial loss were given a high priority, or in this case weight, the generator could produce images that look plausible but stray from the original content, which misses the point of super-resolving that particular image.&lt;/p&gt;
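&lt;p&gt;As a rough illustration (my own sketch, not the paper's code), the weighting can be expressed in a couple of lines:&lt;/p&gt;

```python
# Sketch of the perceptual loss weighting: the adversarial term is scaled
# by 1e-3, so the content term dominates the total.

def perceptual_loss(content_loss, adversarial_loss, adv_weight=1e-3):
    return content_loss + adv_weight * adversarial_loss

# With equal raw losses, the content term contributes 1000x more.
print(perceptual_loss(0.5, 0.5))  # 0.5 + 0.0005 = 0.5005
```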

&lt;h2&gt;
  
  
  Content loss
&lt;/h2&gt;

&lt;p&gt;So basically, this loss function is a form of MSE loss, but modified so it doesn't depend directly on pixel values, giving more priority to the perception of the image itself. This is achieved through the VGG loss, where VGG is a model pre-trained on millions of images that has a good understanding of what images are made up of: instead of comparing raw pixels, we compare the feature maps VGG extracts from the generated and original images. This way, the model has a better sense of perceptual image quality than with a pixel-wise loss function.&lt;/p&gt;
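&lt;p&gt;Here's a toy sketch of that idea (with a crude stand-in 'edge' feature I made up for illustration, instead of real VGG feature maps): the loss is an MSE computed on features rather than raw pixels, so a change that doesn't affect the features doesn't affect the loss:&lt;/p&gt;

```python
import numpy as np

def features(img):
    # Stand-in for a pre-trained VGG: crude horizontal "edge" features.
    # A real VGG would produce much richer feature maps.
    return img[:, 1:] - img[:, :-1]

def content_loss(sr_img, hr_img):
    # MSE in feature space rather than pixel space.
    diff = features(sr_img) - features(hr_img)
    return float(np.mean(diff ** 2))

hr = np.array([[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
sr = hr + 0.1  # a uniform brightness shift changes every pixel...
print(round(content_loss(sr, hr), 12))  # ...but not the edge features: 0.0
```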

&lt;h2&gt;
  
  
  Adversarial loss
&lt;/h2&gt;

&lt;p&gt;This loss is added to the perceptual loss equation to favor images that look like natural, real images. It drives the adversarial portion of the GAN, where the output of the generator acts as the input to the discriminator.&lt;/p&gt;

&lt;h3&gt;
  
  
  Discriminator
&lt;/h3&gt;

&lt;p&gt;The discriminator's job here is to determine whether an image is an original image or a super-resolved one. Its architecture is as follows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftr22aybi8aft1w7vha6k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftr22aybi8aft1w7vha6k.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that k = kernel size, n = number of feature maps, and s = stride.&lt;br&gt;
BN is batch normalization, which further normalizes the data within a batch so that if an outlier value exists during training, it won't affect the network much. Note that the last four layers flatten the previous layer's output into a single dimension, like a normal ANN. They also introduce non-linearity through Leaky ReLU, combine all the weights into a single dense output layer, and finally apply a sigmoid function to produce the final output, which classifies whether an image is real or super-resolved.&lt;/p&gt;
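&lt;p&gt;As a small illustration of those last layers (my own numpy sketch, not the paper's implementation): flatten, dense, Leaky ReLU, dense, sigmoid:&lt;/p&gt;

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x >= 0, x, alpha * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
feature_maps = rng.normal(size=(4, 4, 8))  # pretend output of the conv stack
flat = feature_maps.reshape(-1)            # flatten to a single dimension
w1 = rng.normal(size=(flat.size, 16))      # dense layer
h = leaky_relu(flat @ w1)                  # non-linearity via Leaky ReLU
w2 = rng.normal(size=(16, 1))              # single-output dense layer
p_real = float(sigmoid(h @ w2)[0])         # probability the image is real

print(round(p_real, 3))  # a single value between 0 and 1
```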

&lt;h2&gt;
  
  
  Generator
&lt;/h2&gt;

&lt;p&gt;Meanwhile, the generator's job is to convince the discriminator that the image it produced is a real image, not a super-resolved one. The architecture for the generator, as proposed in the paper, is:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgwtzx0wkghp21xv49jt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgwtzx0wkghp21xv49jt.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Also note that the actual number of residual blocks is 16, which is mentioned in the paper but omitted from the image for simplicity. The generator is trained with the perceptual loss, as proposed by the researchers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;So that's the overview of super-resolution using GANs. I hope you liked it and learned something new. If you would like to learn more, I link video resources from YouTube in the resources section below, which give a more detailed explanation as well as show how to implement the model in code. That's all from me for now, and see you next time!&lt;/p&gt;

&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;p&gt;If you want a complete walkthrough of the paper, I found these videos helpful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=nbRkLE2fiVI&amp;amp;list=PLZsOBAyNTZwboR4_xj-n3K6XBTweC4YVD&amp;amp;index=11" rel="noopener noreferrer"&gt;Single image super-resolution​ using SRGAN by DigitalSreeni&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=KsNLxBvJBKo&amp;amp;t=3s" rel="noopener noreferrer"&gt;SRGAN Explained| Super-Resolution Generative Adversarial Network by 
Code With Aarohi&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>algorithms</category>
    </item>
    <item>
      <title>Generative Adversarial Network (GAN)</title>
      <dc:creator>Adam</dc:creator>
      <pubDate>Sat, 25 May 2024 11:34:16 +0000</pubDate>
      <link>https://forem.com/adamazuddin/generative-adversarial-network-gan-1425</link>
      <guid>https://forem.com/adamazuddin/generative-adversarial-network-gan-1425</guid>
      <description>&lt;p&gt;In the field of machine learning, there's all kinds of models and architecture proposed by researchers around the world every year to solve a particular problem. One such model architecture are called Generative Adversarial Network, or GAN for short. Today we are going to dive into it and learn what is it, how it works, as well as it's application in the real world.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is it?
&lt;/h2&gt;

&lt;p&gt;So first of all, I'll assume you are familiar with Convolutional Neural Networks (CNNs), because GANs are built on top of them with a few more modifications. If you don't know what a CNN is yet, you can read my blog post series about it &lt;a href="https://dev.to/adamazuddin/series/27213"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now that that's out of the way, let's dive in. What does GAN stand for? Generative Adversarial Network. To put it simply, the network contains adversarial, in other words competing, parts that generate something. For there to be a competition, there need to be at least two people or things, right? In this context, the two are the Generator model and the Discriminator model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discriminator
&lt;/h2&gt;

&lt;p&gt;Let's start with the discriminator. Its main job is to differentiate fake data from real data. What does that mean? Suppose we're using a GAN to generate fake human faces. The discriminator's job is simple: it tells whether an image is a real human face or not. You can imagine the structure of this model as a normal CNN with a sigmoid function on the output layer that gives the probability that the image is a real human face. Pretty simple, right?&lt;/p&gt;

&lt;h2&gt;
  
  
  Generator
&lt;/h2&gt;

&lt;p&gt;Now, the generator's job is to create fake images that it inputs into the discriminator to fool it. Returning to the human face example, the generator's role is to create fake human faces, starting from random noise and outputting an image to pass as input to the discriminator. Structurally, the generator is similar to a CNN that outputs pixel values of an image.&lt;/p&gt;

&lt;h2&gt;
  
  
  Combining them together
&lt;/h2&gt;

&lt;p&gt;First, we train the discriminator model. We train it with a combination of real and fake human face images, labeled so it can learn to differentiate them. It does this by extracting features of the images, like recognizing that human faces have two eyes and a nose. Once the discriminator gets good at its job, we start training the generator. Initially, the generator produces random images that don't look like faces at all. These images are passed to the discriminator, which correctly identifies them as fake.&lt;/p&gt;

&lt;p&gt;Based on the results, the model that loses (incorrectly identifies or generates) updates itself to improve. For example, if the discriminator correctly identifies a fake face, the generator learns from this feedback and adjusts to produce more realistic faces. Conversely, if the discriminator mistakenly identifies a real face as fake, it updates itself to improve its accuracy. This process continues iteratively until the generator produces convincingly realistic images.&lt;/p&gt;
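&lt;p&gt;Structurally, the alternating training described above looks something like this sketch (stub update functions stand in for real gradient steps):&lt;/p&gt;

```python
# Structural sketch of the alternating GAN training loop.

def train_gan(steps, d_step, g_step):
    """Alternate discriminator and generator updates, recording each phase."""
    history = []
    for _ in range(steps):
        # 1) Train the discriminator on labeled real and fake images.
        history.append(d_step())
        # 2) Train the generator using the discriminator's feedback.
        history.append(g_step())
    return history

# Toy stubs: a real implementation would compute losses and backpropagate.
log = train_gan(2, d_step=lambda: "D", g_step=lambda: "G")
print(log)  # ['D', 'G', 'D', 'G']
```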

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;That's the basic idea of GANs. There are many use cases, ranging from computer vision and natural language processing to game development and virtual reality. In the next post, though, we will see how a GAN is implemented for the task of image super-resolution (SRGAN for short), based on a fairly recent research paper from 2017. Until then, I hope you liked the post and learned something from it. See you!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>algorithms</category>
      <category>ai</category>
    </item>
    <item>
      <title>Super Resolution using CNN</title>
      <dc:creator>Adam</dc:creator>
      <pubDate>Mon, 29 Apr 2024 14:41:41 +0000</pubDate>
      <link>https://forem.com/adamazuddin/super-resolution-using-cnn-2g9c</link>
      <guid>https://forem.com/adamazuddin/super-resolution-using-cnn-2g9c</guid>
      <description>&lt;p&gt;Hello everyone! Now that I've explained the basics of CNN in the last post of this series, now I want to further expand on it's usage, specifically in super resolution or SR for short. If you guys haven't read about it, you can do so &lt;a href="https://dev.to/adamazuddin/convolutional-neural-network-577n"&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Super-resolution is a type of problem where a low-resolution image needs to be converted into a high-resolution one. There are many ways to solve this problem, but in this post I want to explain how a CNN can be used, as that's the method &lt;a href="https://arxiv.org/abs/1501.00092"&gt;this paper&lt;/a&gt; uses to solve SR.&lt;/p&gt;

&lt;h3&gt;
  
  
  Types of methods for solving SR
&lt;/h3&gt;

&lt;p&gt;There are two main ways SR can be achieved: internal-based SR or external-based SR.&lt;/p&gt;

&lt;p&gt;Internal-based SR means that the high-resolution (HR) image is constructed directly from the input image. This could be achieved by, say, adding an extra pixel between each pair of pixels and filling that 'gap' pixel with a predicted value, for example via interpolation techniques such as bicubic interpolation.&lt;/p&gt;

&lt;p&gt;Meanwhile, external-based SR uses external data, meaning images other than the input image, to find characteristics and features, and tries to reconstruct a higher-resolution version of the input image from the patterns collected across many other images. One type of solution in this category is sparse-coding SR.&lt;/p&gt;

&lt;p&gt;The CNN-based SR method we're discussing is another type of external-based SR: it basically improves on sparse-coding SR by optimizing the parts that sparse coding didn't. To understand this further, let's look at the sparse-coding-based SR solution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sparse coding
&lt;/h3&gt;

&lt;p&gt;So this method is basically divided into four parts: patch extraction and representation, sparse representation, dictionary learning, and reconstruction.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Patch extraction and representation
&lt;/h2&gt;

&lt;p&gt;Let's say we have an 8x8-pixel image of the number 2. We can take a patch matrix; let's give it a dimension of 2x2.&lt;/p&gt;

&lt;p&gt;This patch matrix overlaps our image, basically copying the overlapped pixel values and storing them as a patch vector. Then we move the patch a few pixels along, say with a step of 1, and repeat this process until the whole image is covered.&lt;/p&gt;

&lt;p&gt;A patch could, for example, represent the horizontal straight line at the bottom of the number 2.&lt;/p&gt;
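&lt;p&gt;The extraction step can be sketched in a few lines of numpy (using a 4x4 stand-in image instead of the 8x8 example, just to keep the output small):&lt;/p&gt;

```python
import numpy as np

# Slide a 2x2 window over an image with step 1 and collect each
# overlapped region as a flat patch vector.

def extract_patches(img, size=2, step=1):
    h, w = img.shape
    patches = []
    for i in range(0, h - size + 1, step):
        for j in range(0, w - size + 1, step):
            patches.append(img[i:i + size, j:j + size].reshape(-1))
    return np.array(patches)

img = np.arange(16).reshape(4, 4)  # a tiny 4x4 stand-in image
patches = extract_patches(img)
print(patches.shape)  # (9, 4): nine 2x2 patches of 4 values each
print(patches[0])     # [0 1 4 5], the top-left patch
```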

&lt;h2&gt;
  
  
  2. Sparse representation
&lt;/h2&gt;

&lt;p&gt;The patches then enter some kind of function that makes them sparse, meaning most of the values become zeros. This makes each patch focus only on the feature it extracted.&lt;/p&gt;

&lt;p&gt;In our example of the patch that represents the horizontal line, a sparse representation could mean the vector now better represents just the line: there are no extra values connecting the lower part to the upper curve of the shape 2, only the horizontal line.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Dictionary learning
&lt;/h2&gt;

&lt;p&gt;Here, we update our dictionary with the sparse patch vectors, so we can use them for reconstructing new images later.&lt;/p&gt;

&lt;p&gt;So, we can add the horizontal line to our dictionary, making it available to use when reconstructing another image later.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Reconstruction
&lt;/h2&gt;

&lt;p&gt;Now that we have the patch in our dictionary, when we receive a new image as input, we try to represent it using our dictionary.&lt;/p&gt;

&lt;p&gt;For example, say we now get the number 7. We can reuse the horizontal-line feature we extracted earlier from the number 2 and use it to form the image of the number 7.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advantages of using CNN for this problem
&lt;/h3&gt;

&lt;p&gt;The CNN solution for SR shares some similarity with the sparse-coding method, but instead of optimizing the algorithm by optimizing the dictionary-learning part, the CNN directly learns a mapping between low-resolution and high-resolution images, essentially eliminating the dictionary altogether.&lt;/p&gt;

&lt;p&gt;This is possible because of the nature of neural networks, which can learn complex patterns via weights and biases, different numbers of layers, activation functions, and so on.&lt;/p&gt;

&lt;p&gt;As a result, the CNN method for SR is much more lightweight, making it faster and more suitable for production. Furthermore, more complex patterns can be found and used in reconstruction as more training data is added, making it a better model overall. This method can also be easily tuned to the most suitable combination of advantages and trade-offs by adjusting hyperparameters like the learning rate, number of layers, kernel dimensions, convolution stride, and so on.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;p&gt;So now that we know why SR using a CNN can be better than other existing external-based SR methods, let's dive deeper into how it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Patch extraction and representation
&lt;/h2&gt;

&lt;p&gt;Unlike in sparse coding, where we extract the patches manually, here features are extracted as the image is fed through the neural network, via the convolution operations and so on.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Non-linear mapping
&lt;/h2&gt;

&lt;p&gt;This step is important, as it introduces non-linearity to our features. That means a feature can be formed from more than one type of pattern, giving us more possibilities to pick and choose features for reconstructing the image later. This is achieved through activation functions; commonly used ones are the rectified linear unit (ReLU), sigmoid, and hyperbolic tangent functions.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Reconstruction
&lt;/h2&gt;

&lt;p&gt;After training is complete, when the loss has been minimized and all the weights, biases, and hyperparameters have been optimized, we can feed a low-resolution image forward through the network and get a higher-resolution image as the output.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;We have discussed what super-resolution is and how the problem is tackled via internal- and external-based methods. We also expanded on the external-based methods by explaining the details of sparse coding and relating it to CNN-based SR, which basically improves on it by directly mapping the low-resolution image to the high-resolution one.&lt;/p&gt;

&lt;p&gt;I hope this post has been helpful to anyone reading it. If I made some mistakes or overlooked some important points, please point them out in the comments below so we can all learn more about it together. Thank you and see you next time!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>learning</category>
    </item>
    <item>
      <title>Convolutional Neural Network</title>
      <dc:creator>Adam</dc:creator>
      <pubDate>Sun, 28 Apr 2024 14:09:26 +0000</pubDate>
      <link>https://forem.com/adamazuddin/convolutional-neural-network-577n</link>
      <guid>https://forem.com/adamazuddin/convolutional-neural-network-577n</guid>
      <description>&lt;p&gt;Hello everyone! If you've been following my post for the past week, I would like to announce that I need to pause my AI Chess project since a new exciting offer just came to me and I need to learn a new technology! So, this series would be about image enhancement, which basically means changing an image from low-resolution to a high-resolution(usually called super resolution, or SR for short). I am doing this series as a way to both teach you guys about it and also strengthen my understanding of it. So, without further ado, let's jump right into it&lt;/p&gt;

&lt;h3&gt;
  
  
  Assumptions
&lt;/h3&gt;

&lt;p&gt;First of all, I'll start with the assumption that you know what a traditional neural network is, since it's the basis of the convolutional neural network, which adds further tweaks and adjustments for a different kind of task. If you don't know what a neural network is, I highly recommend &lt;a href="https://youtu.be/aircAruvnKk?si=rh0oDLutc3MzONyq"&gt;this video series&lt;/a&gt; on YouTube by 3Blue1Brown.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why CNN
&lt;/h3&gt;

&lt;p&gt;Now that that's out of the way, why not just use a regular neural network, with nodes in layers and weights connecting the nodes of different layers? Wouldn't that achieve a similar result?&lt;/p&gt;

&lt;h2&gt;
  
  
  Enormous computing power
&lt;/h2&gt;

&lt;p&gt;A normal color image of dimensions 1000x1000 pixels consists of 3 channels, each representing red, green, and blue. So you say, okay, now we have 3x1000x1000 nodes in our input layer. That's fine. But what about the hidden layers? Let's say the first hidden layer has 100 nodes. That's 3x1000x1000x100 = 300,000,000 weights, and that's just the connections between the input layer and the first hidden layer! As a result, image processing would be so slow that, for example, an image detection model running on camera footage would lag noticeably.&lt;/p&gt;

&lt;p&gt;To learn more complex patterns in images, the network would need more hidden layers and far more nodes in each layer, so the traditional artificial neural network (ANN) just doesn't cut it for image-related tasks.&lt;/p&gt;
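&lt;p&gt;A quick sanity check of that arithmetic, plus a comparison with a single convolutional layer (assuming, purely for illustration, 100 kernels of size 3x3 over 3 channels):&lt;/p&gt;

```python
# Parameter count of a dense layer from a 1000x1000 RGB input to 100
# hidden nodes, vs. 100 small convolution kernels (illustrative numbers).

dense_weights = 3 * 1000 * 1000 * 100
conv_weights = 3 * 3 * 3 * 100  # 100 kernels of 3x3 over 3 channels

print(dense_weights)  # 300000000
print(conv_weights)   # 2700
```

&lt;p&gt;The difference comes from weight sharing: a kernel reuses the same handful of weights at every position in the image.&lt;/p&gt;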

&lt;h3&gt;
  
  
  Convolution operation
&lt;/h3&gt;

&lt;p&gt;So how does a CNN differ from an ANN? A CNN performs convolution operations on images instead of relying on the fully connected weights-and-nodes architecture of an ANN. The basic idea is that you have a filter (or kernel) of fixed dimensions, commonly ranging from 1x1 to 5x5. Let's start with an example of an image of size 4x4 pixels (yes, I know it's pretty small) for simplicity's sake, and a filter of size 2x2:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv7zdiebosjjeihycqlqa.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv7zdiebosjjeihycqlqa.jpg" alt="Example of image and kernel array" width="675" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For now, let's just put simple random numbers inside the kernel and the image.&lt;/p&gt;

&lt;p&gt;The convolution operation usually starts by sliding the kernel over the image array; the dot product of the overlapped arrays is calculated and put into an output array:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbvcx0g8gjx4f547a391.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbvcx0g8gjx4f547a391.jpg" alt="First convolution operation" width="800" height="307"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then the process is repeated by sliding the kernel one step to the right (or any number of steps, depending on your architecture) until the whole output array is filled. After all the elements in a row are completed, the kernel slides down and back to the left to continue with the next row:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0672sc5vfgvp6n2ppjh1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0672sc5vfgvp6n2ppjh1.jpg" alt="Second convolution operation" width="800" height="329"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of course, we gave the kernel random values for this example, but in an actual CNN architecture, the values inside the kernels are weights that need to be adjusted from training data, through a process similar to an ANN's: forward propagation, calculating the loss with a suitable loss function, then performing backpropagation to update the weights and biases.&lt;/p&gt;
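&lt;p&gt;The sliding-window process above can be sketched directly in numpy (with simple illustrative values, all ones, rather than the pictured arrays):&lt;/p&gt;

```python
import numpy as np

# A 2x2 kernel slid over a 4x4 image with step 1 gives a 3x3 output.

def convolve2d(img, kernel, step=1):
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros(((h - kh) // step + 1, (w - kw) // step + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            region = img[i * step:i * step + kh, j * step:j * step + kw]
            out[i, j] = np.sum(region * kernel)  # element-wise product, summed
    return out

img = np.ones((4, 4))
kernel = np.array([[1.0, 0.0], [0.0, 1.0]])
out = convolve2d(img, kernel)
print(out.shape)  # (3, 3)
print(out[0, 0])  # 2.0: two ones of the image overlap the kernel's ones
```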

&lt;h3&gt;
  
  
  Small problems
&lt;/h3&gt;

&lt;p&gt;However, there are a few problems with this. First, the output array's dimensions are significantly reduced compared to the original input image, resulting in a lot of information loss over many iterations.&lt;/p&gt;

&lt;p&gt;Also, not every pixel in the image array is used the same number of times in the convolution operation. For example, the top-left corner pixel undergoes the convolution operation only once, in the very first step. As a result, features that happen to be in the corners may be given less importance, which isn't appropriate for every image.&lt;/p&gt;

&lt;p&gt;To combat this, a concept called padding is introduced.&lt;/p&gt;

&lt;h3&gt;
  
  
  Padding
&lt;/h3&gt;

&lt;p&gt;Padding is added to the image array by putting 0's all around the border of the array, like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuussli609rp3qsk3awak.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuussli609rp3qsk3awak.jpg" alt="Padded image array" width="800" height="802"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With this, the convolution operation captures features more evenly across the image and the output array is larger, so it solves both of the problems we discussed above.&lt;/p&gt;
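&lt;p&gt;In NumPy, adding this zero frame is a one-liner with &lt;code&gt;np.pad&lt;/code&gt; (the 3×3 array is just a toy example):&lt;/p&gt;

```python
import numpy as np

img = np.arange(9, dtype=float).reshape(3, 3)
# Surround the array with a one-pixel frame of zeros
padded = np.pad(img, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)  # (5, 5)
```

&lt;p&gt;Convolving this padded 5×5 array with a 3×3 kernel gives a 3×3 output, the same size as the original image, and the corner pixels now fall inside several windows instead of just one.&lt;/p&gt;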

&lt;h3&gt;
  
  
  Overfitting
&lt;/h3&gt;

&lt;p&gt;However, we may encounter another problem: overfitting, where our model does great on training data but horribly on data it has never seen before. To help counter this, an optional layer called the pooling layer is added after the convolutional layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pooling
&lt;/h3&gt;

&lt;p&gt;Pooling works by sliding a fixed-size window over the image array, like a kernel. But instead of holding values and taking a dot product over the overlapped part of the image array, it applies a fixed operation that depends on the type of pooling. There are 2 main types of pooling: max pooling and average pooling.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Max pooling
&lt;/h2&gt;

&lt;p&gt;This is performed by simply taking the maximum value of the part of the image array overlapped by the pooling window, and storing that value in the output array.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Average pooling
&lt;/h2&gt;

&lt;p&gt;This is similar to max pooling, but instead of the maximum value, the average of all the overlapped elements is stored in the output array.&lt;/p&gt;
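&lt;p&gt;Both pooling types can be sketched with one small NumPy helper. The 4×4 array and the 2×2 window with stride 2 are made-up values for illustration:&lt;/p&gt;

```python
import numpy as np

def pool2d(image, size=2, stride=2, mode="max"):
    """Slide a size x size window over the image and keep either the
    max or the average of each overlapped region."""
    ih, iw = image.shape
    oh = (ih - size) // stride + 1
    ow = (iw - size) // stride + 1
    op = np.max if mode == "max" else np.mean
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            out[i, j] = op(image[r:r + size, c:c + size])
    return out

img = np.array([[1., 3., 2., 4.],
                [5., 6., 1., 2.],
                [7., 2., 9., 0.],
                [4., 8., 3., 5.]])
print(pool2d(img, mode="max"))  # [[6. 4.] [8. 9.]]
print(pool2d(img, mode="avg"))  # [[3.75 2.25] [5.25 4.25]]
```

&lt;p&gt;Either way, the 4×4 input shrinks to 2×2: pooling downsamples the feature map while keeping a summary of each region.&lt;/p&gt;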

&lt;h3&gt;
  
  
  Activation functions
&lt;/h3&gt;

&lt;p&gt;Finally, in a CNN we also use activation functions, just like in an ANN, to introduce non-linearity. Common activation functions are ReLU and sigmoid.&lt;/p&gt;

&lt;h3&gt;
  
  
  Putting it all together
&lt;/h3&gt;

&lt;p&gt;So in a CNN, we start with an input of 3 channels for color images. The first layer consists of a convolutional layer, where the convolution operation with kernels is performed, followed by an optional pooling layer; then comes the second layer, and so on until the output layer.&lt;/p&gt;
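&lt;p&gt;One way to sanity-check such a stack is to trace the shapes with the standard output-size formula, floor((n + 2p - k) / s) + 1, where n is the input size, k the kernel or window size, p the padding, and s the stride. The layer sizes below are hypothetical, chosen only for illustration:&lt;/p&gt;

```python
def conv_out(n, k, p=0, s=1):
    """Output size along one dimension after a convolution or pooling layer."""
    return (n + 2 * p - k) // s + 1

# Hypothetical stack: 32x32 RGB input -> [3x3 conv (pad 1) -> 2x2 max pool] x 2
size = 32
for _ in range(2):
    size = conv_out(size, k=3, p=1)  # padded convolution keeps the size
    size = conv_out(size, k=2, s=2)  # 2x2 pooling with stride 2 halves it
print(size)  # 8: the final 8x8 feature maps feed the output layers
```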

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;So that wraps up the basics of CNNs, or Convolutional Neural Networks. If you would like to learn more, I highly recommend this &lt;a href="https://www.youtube.com/watch?v=E5Z7FQp7AQQ&amp;amp;list=PLuhqtP7jdD8CD6rOWy20INGM44kULvrHu"&gt;video series&lt;/a&gt; by CodingLane; it's highly underrated in my opinion!&lt;/p&gt;

&lt;p&gt;That's all from me for this post. Stay tuned for more, and happy coding!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>learning</category>
      <category>ai</category>
    </item>
    <item>
      <title>My first Mega Project: Chess Website</title>
      <dc:creator>Adam</dc:creator>
      <pubDate>Sun, 21 Apr 2024 14:00:00 +0000</pubDate>
      <link>https://forem.com/adamazuddin/my-first-mega-project-chess-website-2n8b</link>
      <guid>https://forem.com/adamazuddin/my-first-mega-project-chess-website-2n8b</guid>
      <description>&lt;p&gt;Hello everyone! Today I'm excited to announce my very first series, where I share everything I learned from completing my very first mega project(yes, I made up that term just now): DeezChess.&lt;/p&gt;

&lt;p&gt;DeezChess is a full-stack web application whose core functionality lets users build their own custom bots by uploading game files, play against those bots, and save and share them so that other people can play them too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why
&lt;/h2&gt;

&lt;p&gt;I'm the type of person who loves to get hands-on when learning new skills. As I was thinking about what project would familiarize me with machine learning, I immediately thought of Chess.com's custom bots. There are bots of various strengths, but strangely, new bots only appear whenever there's an event or a major update. In other words, there's no direct way for users to create their own bots. So, I decided to take on this challenge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Project Description
&lt;/h2&gt;

&lt;p&gt;My objective for this project is to create a chess website where players can either play chess normally, or create their own custom bots by uploading the data needed to fine-tune a trained model in the cloud. I want to push this project to production (a first for me), and make the code as maintainable and readable as possible. This is because I plan to make it open source once I'm satisfied with the main functionality, as a way to give back to the community.&lt;/p&gt;

&lt;h2&gt;
  
  
  My plan
&lt;/h2&gt;

&lt;p&gt;I set myself a due date to complete this project: the 31st of December 2024. I chose this date because by then I'll be entering my first year as an undergraduate (hopefully in my dream course and university), and I want to have something to talk about when finding like-minded friends there, while also equipping myself with enough skills to be considered for an internship.&lt;/p&gt;

&lt;p&gt;I know that sounds like a really ambitious plan, as I literally only barely know how to use ReactJs, but hey, if I don't challenge myself, then who will, right? I do realize I won't stay motivated the whole year, though, so I aim to write blog posts in this series whenever I have new findings or just general updates. I plan to post weekly, though we'll see how that goes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technologies and tools I plan to use
&lt;/h2&gt;

&lt;p&gt;Most of the things I list here I don't fully understand or haven't even touched yet, but I know I'll need to pick them up eventually, so why not now, right?&lt;/p&gt;

&lt;h2&gt;
  
  
  Front-end
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;I want to use the latest NextJs App Router framework for React, as I've already completed several projects with it (although none are deployed) and I really like the way it handles routing; the folder-based routing is so easy to understand, right? I also aim to fully utilize NextJs's server-side rendering. I didn't care much about it before, but if I'm pushing this to production, it had better be as optimized as possible.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Back-end
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;I want to use Firebase as the backend and database because of its ease of setup and use. I don't think this project will get traffic that exceeds the free tier, so I'll only switch databases if it's worth my time and effort. I'm also using GraphQL so the web application can interact with the pre-trained model in the cloud through a separate API.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Machine Learning
&lt;/h2&gt;

&lt;p&gt;Since I already know quite a lot of Python, I decided to just use that, but I also want to use the PyTorch library whenever possible, since I just learned it and want more practice with it. I plan to host my pre-trained model in the cloud with a provider like AWS, Google Cloud, or Microsoft Azure (I haven't decided which one yet; any recommendations are appreciated!).&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges
&lt;/h2&gt;

&lt;p&gt;For me, at least, the biggest concern is the budget needed to run this properly in the cloud. I'll try to stay within the free tiers during development, and maybe a couple of bucks a month won't break the bank after that. Maybe I'm way too ambitious and optimistic about this, but we'll see.&lt;/p&gt;

&lt;p&gt;Of course, I also need to address the technical challenges. Thankfully, I recently learned about GitHub Projects and Issues to better keep track of my progress, so I'll meticulously plan every detail and milestone of my project so that I always know what I need to do each session.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I've never completed something of this scale before, let alone while learning so many new technologies and tools to get it done by the due date. I'm not feeling overwhelmed yet, but we'll see how my feelings develop over time. If you have any suggestions, advice, or thoughts on this, feel free to comment and share this post! Until the next post, see you guys next time!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>nextjs</category>
      <category>python</category>
      <category>fullstack</category>
    </item>
    <item>
      <title>Hello World</title>
      <dc:creator>Adam</dc:creator>
      <pubDate>Fri, 19 Apr 2024 13:50:47 +0000</pubDate>
      <link>https://forem.com/adamazuddin/hello-world-54o9</link>
      <guid>https://forem.com/adamazuddin/hello-world-54o9</guid>
      <description>&lt;p&gt;Welcome everyone to my very first post! So excited to connect and contribute to the community!&lt;/p&gt;

&lt;p&gt;My name is Adam Azuddin. I'm a student from Malaysia, currently learning new technologies that interest me. I have a passion for learning and understanding things on a deeper level, as I value not only the knowledge itself but also the practical skills needed to apply it. That sounds like a lot of jargon coming from a newbie, but writing this feels quite fun, honestly.&lt;/p&gt;

&lt;p&gt;As for my skills and background, I've learned a lot through personal web dev projects using React and NextJs. I also joined a competitive programming competition in high school, which is what first got me into programming and where I first learned Python.&lt;/p&gt;

&lt;p&gt;Currently, I'm focusing on building projects that integrate both web dev and AI, as I recently became heavily invested in Artificial Intelligence and Machine Learning. I completed Andrew Ng's first &lt;a href="https://www.coursera.org/learn/machine-learning?specialization=machine-learning-introduction"&gt;course&lt;/a&gt; on Coursera about machine learning, which gave me the fundamentals, but I felt it wasn't enough. So, I decided to dive deeper into neural networks, first through the &lt;a href="https://www.3blue1brown.com/topics/neural-networks"&gt;3Blue1Brown playlist on neural networks&lt;/a&gt;, which also links to an &lt;a href="http://neuralnetworksanddeeplearning.com/index.html"&gt;awesome book by Michael Nielsen&lt;/a&gt; on the mathematical details of neural networks. I also learned the basics of PyTorch from &lt;a href="https://youtu.be/Z_ikDlimN6A?si=72pA6_w_7QsHN5ZN"&gt;here&lt;/a&gt;. Although I didn't finish the whole video (it's literally about a day long), I decided to get my hands dirty and work on projects using PyTorch, which is how I love to gain any new skill.&lt;/p&gt;

&lt;p&gt;Currently, I'm planning my first mega project, which integrates both web development and machine learning models to create a website I can take pride in. I know it's an enormous mountain to climb, but even if I don't reach the top, at least I'll get higher than if I hadn't tried at all, right?&lt;/p&gt;

&lt;p&gt;I would love any opportunity to connect with anyone, as I'm just getting my foot in the door. You can contact me through my &lt;a href="https://my.linkedin.com/in/adam-azuddin"&gt;Linkedin&lt;/a&gt; or &lt;a href="https://github.com/AdamAzuddin"&gt;Github&lt;/a&gt;, or email me at &lt;a href="mailto:azuddinadam@gmail.com"&gt;azuddinadam@gmail.com&lt;/a&gt;. Thank you, and see you guys next time!&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>machinelearning</category>
      <category>community</category>
    </item>
  </channel>
</rss>
