<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Samarth BC</title>
    <description>The latest articles on Forem by Samarth BC (@samarth_bc).</description>
    <link>https://forem.com/samarth_bc</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3012574%2F1a7bce6f-162c-4ae9-bfbf-2f790010c5f6.jpg</url>
      <title>Forem: Samarth BC</title>
      <link>https://forem.com/samarth_bc</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/samarth_bc"/>
    <language>en</language>
    <item>
      <title>How a neural network can learn any function</title>
      <dc:creator>Samarth BC</dc:creator>
      <pubDate>Sun, 20 Apr 2025 05:12:39 +0000</pubDate>
      <link>https://forem.com/samarth_bc/how-a-neural-network-can-learn-any-function-26oa</link>
      <guid>https://forem.com/samarth_bc/how-a-neural-network-can-learn-any-function-26oa</guid>
      <description>&lt;p&gt;This blog focuses on showing how a neural network can learn any continuous function, The Universal Approximation Theorem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Universal Approximation Theorem
&lt;/h2&gt;

&lt;p&gt;Formally, the Universal Approximation Theorem states:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa6gm9u8jzyqd2r85kp90.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa6gm9u8jzyqd2r85kp90.png" alt="Universal Approximation Theorem" width="641" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In simple words, it means:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A feedforward neural network with a single hidden layer containing a finite number of neurons can approximate any continuous function on compact subsets of ℝⁿ, under mild assumptions on the activation function.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This blog aims to give a visual intuition for how the theorem works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Perceptron
&lt;/h2&gt;

&lt;p&gt;A perceptron is the most fundamental unit of deep learning.&lt;/p&gt;

&lt;p&gt;A perceptron takes in ‘n’ inputs, say x1, x2, …, xn, where each input has an associated weight, say w1, w2, …, wn, along with a bias term ‘b’.&lt;/p&gt;

&lt;p&gt;The perceptron has two parts: a summation part that adds up all the weight × input products (plus the bias), and an activation function that applies a non-linearity to determine the final output.&lt;/p&gt;

&lt;p&gt;Let’s consider a simple activation function (the step function):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3f32sfvswn91mt38ac0c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3f32sfvswn91mt38ac0c.png" alt="Step funtion" width="347" height="94"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s consider a simple perceptron with two inputs x1, x2 ∈ {0,1} and corresponding weights w1, w2.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzlzuarbz3eqeyuld66ja.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzlzuarbz3eqeyuld66ja.png" alt="Perceptron" width="313" height="392"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A perceptron can be used to learn two-input Boolean functions. There are 16 such functions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm43y7n99cbdt4yjkiiro.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm43y7n99cbdt4yjkiiro.png" alt="2 input Boolean functions" width="800" height="97"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Learning two-input Boolean functions using a perceptron:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F69nwmmlbiuef4afiv9wn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F69nwmmlbiuef4afiv9wn.png" alt="Perceptron learning 2 input boolean functions" width="688" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the above image,&lt;br&gt;
θ : b&lt;br&gt;
x1, x2, x3 : inputs&lt;br&gt;
the step function is used for σ&lt;/p&gt;

&lt;p&gt;We can notice that a perceptron with the step function as its activation essentially separates the plane into two halves, outputting 0 for one half and 1 for the other.&lt;/p&gt;
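The perceptron described above can be sketched in a few lines of Python. Note this is only an illustrative sketch: the weight and bias values below are hand-picked, not learned.

```python
# A minimal perceptron with a step activation.

def step(z):
    # Step activation: fires (1) when the weighted sum reaches the threshold.
    return 1 if z >= 0 else 0

def perceptron(inputs, weights, bias):
    # Weighted sum of inputs plus bias, passed through the step function.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)

# Hand-picked weights: AND fires only when both inputs are 1,
# OR fires when at least one input is 1.
AND = lambda x1, x2: perceptron((x1, x2), weights=(1, 1), bias=-1.5)
OR  = lambda x1, x2: perceptron((x1, x2), weights=(1, 1), bias=-0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", AND(x1, x2), "OR:", OR(x1, x2))
```

Geometrically, the weights and bias define the line w1·x1 + w2·x2 + b = 0 that splits the plane, exactly as in the AND/OR plots below.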

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgdon95iebo758l5se281.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgdon95iebo758l5se281.png" alt="Learning AND" width="267" height="250"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiqo4ynhce5zz7ssxfxm9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiqo4ynhce5zz7ssxfxm9.png" alt="Learning OR" width="271" height="243"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Essentially, a perceptron can represent any ‘linearly separable’ function.&lt;br&gt;
Now, are all 16 two-input Boolean functions linearly separable?&lt;br&gt;
No.&lt;/p&gt;

&lt;p&gt;Let us consider the case of the XOR function:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flopo6b6sf2yqb8on5iey.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flopo6b6sf2yqb8on5iey.png" alt="XOR" width="654" height="275"&gt;&lt;/a&gt;&lt;br&gt;
We can see that the function is not linearly separable. Hence, a single perceptron cannot learn XOR. Among all 16 two-input Boolean functions, only XOR and XNOR are not linearly separable.&lt;/p&gt;
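A small brute-force check makes the non-separability concrete. The search grid below is an arbitrary choice, but since XOR is not linearly separable, no weights exist at all, so any grid returns zero matches.

```python
# Exhaustively search a grid of weights and biases for a single step-activated
# perceptron that reproduces XOR; none exists, because XOR is not linearly separable.

def fires(x1, x2, w1, w2, b):
    return 1 if w1 * x1 + w2 * x2 + b >= 0 else 0

xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

grid = [v / 2 for v in range(-8, 9)]  # -4.0 .. 4.0 in steps of 0.5
solutions = [
    (w1, w2, b)
    for w1 in grid for w2 in grid for b in grid
    if all(fires(x1, x2, w1, w2, b) == y for (x1, x2), y in xor_table.items())
]
print("perceptrons matching XOR:", len(solutions))  # prints 0
```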

&lt;h2&gt;
  
  
  How a network of perceptrons can solve this issue
&lt;/h2&gt;

&lt;p&gt;As per the Universal Approximation Theorem, a network with a single hidden layer of perceptrons should be able to learn all such functions.&lt;/p&gt;

&lt;p&gt;Consider the below network:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7eazkhp3g87l3rm5gx5q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7eazkhp3g87l3rm5gx5q.png" alt="Perfect perceptron network" width="452" height="438"&gt;&lt;/a&gt;&lt;br&gt;
Let’s understand what’s going on.&lt;br&gt;
Let us name the perceptrons P1, P2, P3, P4.&lt;br&gt;
P1 is activated only if x1, x2 are -1, -1&lt;br&gt;
P2 is activated only if x1, x2 are -1, 1&lt;br&gt;
P3 is activated only if x1, x2 are 1, -1&lt;br&gt;
P4 is activated only if x1, x2 are 1, 1&lt;/p&gt;

&lt;p&gt;So, for any given input, exactly one perceptron is activated.&lt;br&gt;
In a sense, the input space is divided into multiple parts, one per perceptron.&lt;br&gt;
This network can learn any two-input Boolean function, linearly separable or not.&lt;/p&gt;

&lt;p&gt;Similarly, this is the network for three-input Boolean functions:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcj4d6lzlwbyoe6axwrrn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcj4d6lzlwbyoe6axwrrn.png" alt="Perfect 3 input network" width="800" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Feedforward neural networks
&lt;/h2&gt;

&lt;p&gt;So, how can we relate what we have understood so far to a real-valued continuous function learnt by a feedforward neural network?&lt;/p&gt;

&lt;p&gt;Let us try to build a layer that can learn the function using the observations we made before.&lt;/p&gt;

&lt;p&gt;In a feedforward neural network, sigmoidal neurons are used, i.e., the sigmoid function is used as the activation.&lt;/p&gt;

&lt;p&gt;We somehow have to arrange things so that a neuron activates only if the input lies in its part of the input space.&lt;/p&gt;

&lt;p&gt;Even if that is taken care of, we also have to make sure that the neuron learns only the inputs in that part of the input space.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvbm6sqd9uw1zyd0wra8s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvbm6sqd9uw1zyd0wra8s.png" alt="Function with bars" width="708" height="202"&gt;&lt;/a&gt;&lt;br&gt;
Considering an arbitrary function, we can divide it into multiple parts, where the number of parts is proportional to the number of neurons in the layer.&lt;br&gt;
Notice that however many finite parts we divide the function into, there is always an approximation error, which the Universal Approximation Theorem denotes as ε.&lt;/p&gt;

&lt;p&gt;How can a sigmoidal neuron be used to replicate the above required scenario?&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frc7cp48k4xfng1d9ox7t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frc7cp48k4xfng1d9ox7t.png" alt="Sigmoidal Neuron" width="800" height="280"&gt;&lt;/a&gt;&lt;br&gt;
This is the sigmoid function, and we know that tweaking ‘w’ changes the slope while tweaking ‘b’ shifts the function along the x-axis.&lt;/p&gt;

&lt;p&gt;Increasing the value of w, we get something like this,&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmmmfvnq03dwr19h3ldu2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmmmfvnq03dwr19h3ldu2.png" alt="Sigmoid with high w" width="800" height="326"&gt;&lt;/a&gt;&lt;/p&gt;
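The effect of increasing w can be checked numerically. This is just a sketch; the sample points and weight values are arbitrary. With b = 0 the transition sits at x = 0, and larger w squeezes the transition into an ever narrower band.

```python
import math

# Sigmoid of (w*x + b): increasing w sharpens the transition toward a step,
# and b shifts where the transition happens (at x = -b/w).

def sigmoid(x, w, b):
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

for w in (1, 10, 100):
    # Evaluate just left and right of the transition point x = 0 (b = 0).
    left, right = sigmoid(-0.1, w, 0), sigmoid(0.1, w, 0)
    print(f"w={w:3d}: sigma(-0.1)={left:.4f}  sigma(0.1)={right:.4f}")
```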

&lt;p&gt;Consider two sigmoidal functions y1, y2,&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyp3vv8td4xojjd2mz8af.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyp3vv8td4xojjd2mz8af.png" alt="Two neurons" width="620" height="399"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Considering y3 = y2 - y1,&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwb9mjcv243g7r063p4wt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwb9mjcv243g7r063p4wt.png" alt="Tower function" width="655" height="452"&gt;&lt;/a&gt;&lt;br&gt;
We got what we wanted!!!&lt;br&gt;
This creates a tower-like structure over the required input range (by tweaking b). By using multiple sigmoidal neurons like this, we can approximately create the required function.&lt;/p&gt;

&lt;p&gt;Let us call this function, which creates the required structure, the “tower function”.&lt;/p&gt;
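The tower function y3 = y2 - y1 can be sketched numerically. The choices of w = 100 (an "almost step" sigmoid) and the interval [0.4, 0.6] below are arbitrary illustrations.

```python
import math

# Tower function: the difference of two steep sigmoids with shifted biases
# is close to 1 between the two transition points and close to 0 elsewhere.

def sigmoid(x, w, b):
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def tower(x, left=0.4, right=0.6, w=100):
    # y2 turns on at x = left, y1 turns on at x = right;
    # subtracting leaves a tower over [left, right].
    y2 = sigmoid(x, w, -w * left)
    y1 = sigmoid(x, w, -w * right)
    return y2 - y1

for x in (0.0, 0.5, 1.0):
    print(f"tower({x}) = {tower(x):.4f}")
```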

&lt;p&gt;Representing the tower function as a network.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Figxul4arx3y5ddnip8i0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Figxul4arx3y5ddnip8i0.png" alt="Tower function using network" width="351" height="203"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally,&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmqb887i07qly59eiuo8y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmqb887i07qly59eiuo8y.png" alt="Final representation" width="273" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using multiple of these tower functions, each tower is activated only if the input lies in its input range, just as in the case of perceptrons learning two-input Boolean functions.&lt;/p&gt;
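Summing many such towers, each scaled by the target function's value at that tower's centre, sketches the full approximation. The target f(x) = x² on [0, 1] and the choice of 50 towers below are arbitrary; more towers shrink the error ε.

```python
import math

# Approximate a 1D function as a weighted sum of thin tower functions.

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def tower(x, left, right, w=1000):
    return sigmoid(w * (x - left)) - sigmoid(w * (x - right))

def approximate(f, x, n_towers=50):
    # Split [0, 1] into n_towers slices; each tower contributes the value
    # of f at its centre whenever x falls inside its slice.
    width = 1.0 / n_towers
    total = 0.0
    for i in range(n_towers):
        left, right = i * width, (i + 1) * width
        centre = (left + right) / 2
        total += f(centre) * tower(x, left, right)
    return total

f = lambda x: x * x
for x in (0.25, 0.5, 0.75):
    print(f"f({x}) = {f(x):.4f}, approx = {approximate(f, x):.4f}")
```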

&lt;p&gt;This concept can be extended to 3D, where 3D tower functions can approximately represent a function of two inputs.&lt;/p&gt;

&lt;p&gt;Creating 3D tower functions is visualized below,&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9gnf10fp7sm9na3z17dw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9gnf10fp7sm9na3z17dw.png" alt="3D tower p1" width="702" height="369"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzlunx684rkwkwdurb37t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzlunx684rkwkwdurb37t.png" alt="3D tower p2" width="280" height="311"&gt;&lt;/a&gt;&lt;br&gt;
To get the tower, we have to extract the red part, i.e., keep the region where the value is 1 and discard the rest. We can achieve this by passing the sum through a sigmoid that is close to a step function.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4ritrnc07z2ragummre.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4ritrnc07z2ragummre.png" alt="3D tower p3" width="263" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Putting everything together, these are the steps to create a 3D tower function.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcfr0wkfz2souz1xpmauz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcfr0wkfz2souz1xpmauz.png" alt="Final 3D" width="500" height="315"&gt;&lt;/a&gt;&lt;/p&gt;
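The 3D construction can be sketched the same way: build a 1D tower per input, add them, and threshold the sum with a steep sigmoid. The ranges and threshold below are illustrative choices; the sum is about 2 where both 1D towers are active, about 1 in the strips where only one is, and about 0 elsewhere, so a threshold between 1 and 2 keeps only the rectangular region.

```python
import math

# Two-input (3D) tower: sum of two 1D towers, thresholded by a steep sigmoid.

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def tower_1d(t, left, right, w=100):
    return sigmoid(w * (t - left)) - sigmoid(w * (t - right))

def tower_2d(x, y, x_range=(0.3, 0.7), y_range=(0.3, 0.7), w=100):
    s = tower_1d(x, *x_range, w) + tower_1d(y, *y_range, w)
    # Threshold at 1.5: only the region where BOTH 1D towers are active survives.
    return sigmoid(w * (s - 1.5))

print(f"inside:  {tower_2d(0.5, 0.5):.4f}")   # both towers active, ~1
print(f"strip:   {tower_2d(0.5, 0.9):.4f}")   # only the x tower active, ~0
print(f"outside: {tower_2d(0.9, 0.9):.4f}")   # neither active, ~0
```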

&lt;p&gt;Using multiple of these tower functions, and given enough neurons, we can approximately represent any continuous function within an error ε.&lt;/p&gt;

&lt;p&gt;Thank you !!!&lt;/p&gt;

&lt;p&gt;Upcoming blog: Representing sin(x) using tower functions, with visualization&lt;/p&gt;

&lt;p&gt;References:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deep Learning - IIT Ropar (NPTEL) by Mitesh M Khapra&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.deep-mind.org/2023/03/26/the-universal-approximation-theorem/#Universal_Approximation_Theorem" rel="noopener noreferrer"&gt;https://www.deep-mind.org/2023/03/26/the-universal-approximation-theorem/#Universal_Approximation_Theorem&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://neuralnetworksanddeeplearning.com/chap4.html" rel="noopener noreferrer"&gt;http://neuralnetworksanddeeplearning.com/chap4.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Universal_approximation_theorem#Arbitrary-width_case" rel="noopener noreferrer"&gt;https://en.wikipedia.org/wiki/Universal_approximation_theorem#Arbitrary-width_case&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>deeplearning</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
