Gradient Descent Algorithm is an iterative algorithm to find a Global Minimum of an objective function (cost function) J(?). The categorization of GD algorithm is for accuracy and time consuming factors that are discussed below in detail. This algorithm is widely used in machine learning for minimization of functions. Here,the algorithm to achieve objective goal of picture below is in this tutorial below.
We use gradient descent to minimize the functions like J(?). In gradient descent, our first step is to initialize the parameters by some value and keep changing these values till we reach the global minimum. In this algorithm, we calculate the derivative of cost function in every iteration and update the values of parameters simultaneously using the formula:
where ‘?’ is the learning rate.
We will consider linear regression for algorithmic example in this article while talking about gradient descent, although the ideas apply to other algorithms too, such as
In linear regression we have a hypothesis function:\(H(X)=\theta_0+\theta_1X_1+\ldots +\theta_nX_n\)
Where \(\theta_0,\theta_1,\ldots, \theta_n\) are parameters and \(X_1,\ldots, X_n\) are input features. In order to solve the model, we try to find the parameter, such that the hypothesis fits the model in the best possible way.To find the value of parameters we develop a cost function J(\(\theta\)) and use gradient descent to minimize this function.
The following pseudo-code explains the working :
Various variants of gradient descent are defined on the basis of how we use the data to calculate derivative of cost function in gradient descent. Depending upon the amount of data used, the time complexity and accuracy of the algorithms differs with each other.
It is the first basic type of gradient descent in which we use the complete dataset available to compute the gradient of cost function.
As we need to calculate the gradient on the whole dataset to perform just one update, batch gradient descent can be very slow and is intractable for datasets that don’t fit in memory. After initializing the parameter with arbitrary values we calculate gradient of cost function using following relation:
Batch gradient descent is not suitable for huge datasets. The code below explains implementing gradient descent in python.
Batch Gradient Descent turns out to be a slower algorithm. So, for faster computation, we prefer to use stochastic gradient descent.
The first step of algorithm is to randomize the whole training set. Then, for updation of every parameter we use only one training example in every iteration to compute the gradient of cost function. As it uses one training example in every iteration this algo is faster for larger data set. In SGD, one might not achieve accuracy, but the computation of results are faster.
After initializing the parameter with arbitrary values we calculate gradient of cost function using following relation:
Following is the pseudo code for stochastic gradient descent:
SGD Never actually converges like batch gradient descent does,but ends up wandering around some region close to the global minimum.
Mini batch algorithm is the most favorable and widely used algorithm that makes precise and faster results using a batch of ‘ m ‘ training examples. In mini batch algorithm rather than using the complete data set, in every iteration we use a set of ‘m’ training examples called batch to compute the gradient of the cost function. Common mini-batch sizes range between 50 and 256, but can vary for different applications.
In this way, algorithm
After initializing the parameter with arbitrary values we calculate gradient of cost function using following relation:
where ‘ b ‘ is number of batches and ‘ m ‘ is number training examples.
Some of the important points to remember are:
” J(?) should decrease after every iteration and should become constant (or converge ) after some iterations.”
Above statement is because after every iteration of gradient descent and ? takes values such that J(?) moves towards depth i.e. value of J(?) decreases after every iteration.
In this article, we learned about the basics of gradient descent algorithm and its types. These optimization algorithms are being widely used in neural networks these days. Hence, it’s important to learn. The image below shows a quick comparison in all 3 types of gradient descent algorithms:
A SWOT analysis is a business strategy to assess the Strengths, Weaknesses, Opportunities and Threats…
Seasonal hiring can be a daunting task, whether it is peak accounting season for finance…
The hiring landscape has seen a paradigm shift in terms of diversity in people, talent,…
The introduction of AI in recruitment has revolutionized how hiring workflows are designed. It paved…
Today’s era has seen a steep increase in the use of technology in hiring and…
Today’s job landscape has become increasingly competitive for both job seekers and recruiters. One of…