Gradient Descent is an algorithm for finding the minimum of a function. It iteratively calculates partial derivatives (gradients) of the function and descends in steps proportional to those partial derivatives. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.
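The fitting application described above can be sketched in a few lines. This is a minimal illustration, not any particular question's code: the data is synthetic (y = 2x + 1 plus noise) and the step size is an arbitrary choice.

```python
import numpy as np

# Synthetic data: y = 2x + 1 plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2 * x + 1 + rng.normal(0, 0.05, size=x.shape)

# Parameterized model: y_hat = w*x + b; error function: mean squared error
w, b = 0.0, 0.0
lr = 0.5  # step size (learning rate)

for _ in range(1000):
    y_hat = w * x + b
    # Partial derivatives of the MSE with respect to w and b
    dw = 2 * np.mean((y_hat - y) * x)
    db = 2 * np.mean(y_hat - y)
    # Descend in steps proportional to the partial derivatives
    w -= lr * dw
    b -= lr * db

print(w, b)  # should land near the true parameters 2 and 1
```

The loop stops after a fixed number of iterations for simplicity; several questions below discuss better stopping criteria.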

880 questions

### Neural evolution combined with gradient descent

To optimize a model, we can either use gradient descent to train it on a dataset, or we can use neural evolution and generate only the initial weights without training them and use ...
35 views

### How does backpropagation work in a CNN? [on hold]

Can somebody please explain how backpropagation works for a CNN with a max pooling layer and convolutional layers with stride greater than 1 and padding? And is there a way to compute these gradients ...
44 views

### Neural Network Not Converging “anywhere”

In the past 2 weeks I've been trying to implement a Hand-Written-Digit Classifier with a Feed-Forward Neural Network, using the MNIST Database. The neural network uses Cross-Entropy loss, and Softmax ...
37 views

### Why would I choose a loss function that differs from my metrics?

When I look through tutorials on the internet or at models posted here on SO, I often see that the loss function differs from the metrics used to evaluate the model. This might look like: model....
34 views

### Tensorflow Haskell Linear Regression diverges

I've been looking into the tensorflow haskell bindings. However, I struggle to get the basic linear regression example from the readme to work properly: it diverges on what seems to be a very easy ...
13 views

### Method to bound/constrain a matrix elementwise in TensorFlow?

In TensorFlow, I currently have a matrix that represents the weights between layers of a Neural Network. However, I am trying to implement some kind of projected/constrained gradient descent, where ...
23 views

### Steepest Descent Trace Behavior

I've written code that performs steepest descent on a quadratic form given by the formula: 1/2 * (x1^2 + gamma * x2^2). Mathematically, I am taking the equations given in Boyd's Convex Optimization ...
32 views
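Since the quadratic form is stated explicitly, the steepest-descent trace can be sketched as below. The value of gamma, the starting point, and the exact-line-search step size are assumptions modeled on Boyd's classic example, not the asker's actual code.

```python
import numpy as np

gamma = 10.0                    # conditioning of the quadratic form
x = np.array([gamma, 1.0])      # Boyd's classic starting point
trace = [x.copy()]

for _ in range(50):
    # Gradient of f(x) = 1/2 * (x1^2 + gamma * x2^2)
    g = np.array([x[0], gamma * x[1]])
    if np.linalg.norm(g) < 1e-10:
        break
    # Exact line search for a quadratic: t = (g.g) / (g.H.g), H = diag(1, gamma)
    t = (g @ g) / (g[0] ** 2 + gamma * g[1] ** 2)
    x = x - t * g
    trace.append(x.copy())

print(len(trace), x)  # the trace zig-zags toward the minimum at the origin
```

The characteristic zig-zag of the trace is exactly what plotting `trace` should show for ill-conditioned gamma.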

### My vectorization implementation of gradient descent does not get me the right answer

I'm currently working on Andrew Ng's gradient descent exercise using Python, but it keeps giving me the wrong optimal theta. I followed this vectorization cheat sheet for gradient descent --- https://...
22 views

### Time of execution in Tensorflow higher than in Numpy

I'm reproducing a Gradient Descent example from Geron's Hands-On Machine Learning in both Tensorflow and Numpy. It's odd, but even with the GPU enabled, Numpy seems to be 5 to 6 times faster than ...
30 views

### Unable to apply a condition on the output of a custom layer using the Keras layers module

I want to apply a condition to the output of a dense layer. For this, I tried to customize the Dense layer of Keras, but when I run my code I get the error ValueError: No gradients provided for any ...
27 views

### Short Definition of Backpropagation and Gradient Descent

I need to write a very short definition of backpropagation and gradient descent, and I'm a bit confused about what the difference is. Is the following definition correct? For calculating the weights of ...
20 views

### Questions around XGBoost

I am trying to understand the XGBoost algorithm and have a few questions about it. I have read various blogs, but they all seem to tell a different story. Below is a snippet from the code that I am using (...
23 views

### Should I exit my gradient descent loop as soon as the cost increases?

I'm trying to learn machine learning so I'm taking a course and currently studying gradient descent for linear regression. I just learned that if the learning rate is small enough, the value returned ...
41 views
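The check this question asks about can be sketched as an early-exit condition in the descent loop. The function names and the toy objective f(theta) = (theta - 3)^2 here are hypothetical, chosen only to make the sketch self-contained; a cost increase usually signals that the learning rate is too large.

```python
def gradient_descent(grad, cost, theta, lr=0.1, max_iters=1000):
    """Plain gradient descent that exits early if the cost ever increases."""
    prev_cost = cost(theta)
    for _ in range(max_iters):
        theta = theta - lr * grad(theta)
        c = cost(theta)
        if c > prev_cost:   # cost went up: the step overshot the minimum
            break
        prev_cost = c
    return theta

# Toy example: minimize f(theta) = (theta - 3)^2, whose minimum is at theta = 3
theta = gradient_descent(lambda t: 2 * (t - 3), lambda t: (t - 3) ** 2, 0.0)
print(theta)
```

With a small enough learning rate the cost decreases monotonically and the break never fires, which is the behavior the course statement describes.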

### Linear Regression - Implementing Feature Scaling

I was trying to implement Linear Regression in Octave 5.1.0 on a data set relating the GRE score to the probability of admission. The data set looks like: `337 0.92`, `324 0.76`, `316 0.72`, ...
31 views

### First gradient descent: how to normalize X and Y?

I'm doing my first gradient descent ever, following a course about Machine Learning. But it doesn't seem to work correctly, as it oscillates (converges, then diverges, then converges ...) and at the ...
25 views

### Overflow of square function during gradient descent calculation

I wrote linear regression (in one variable) along with gradient descent. It works fine for a smaller dataset, but for a larger data set it gives the error: OverflowError: (34, 'Numerical ...
15 views

### Why can't Forward Stagewise Additive Modeling work with absolute loss function?

In Forward Stagewise Additive Modeling, if the loss function is squared loss, the next weak learner fits the residual error. Why can't we do the same when the loss function is absolute error or ...
33 views

### Numpy Concatenation different results

I wrote a little script that performs polynomial gradient descent, and I am trying to reduce the size of the code, even if it diminishes readability. This piece of code works fine for fitting a curve ...
20 views

### Why does using RMSE as the loss function take a non-convex form in logistic regression but not in linear regression?

I am taking this deep learning course from Andrew Ng. In the 3rd lecture of the 2nd week of the first course, he mentions that we can use RMSE for logistic regression as well, but it will take a nonconvex ...
45 views

### Multiple linear regression with gradient descent

Hello, I'm new to machine learning and Python, and I want to predict the Kaggle House Sales in King County dataset with my gradient descent. I'm splitting 70% (15k rows) training and 30% (6k rows) ...
36 views

### Initialize neural network weights with Tensorflow

I am developing a neural network model using Tensorflow. In the LOSO cross validation, I need to train a model for 10 folds, since I have data from 10 different subjects. Taking this into account, I ...
19 views

### Odd behavior of cost over time with SGD

I am relatively new to ML/DL and have been trying to improve my skills by making a model that learns the MNIST data set without TF or keras. I have 784 input nodes, 2 hidden layers of 16 neurons each, ...
45 views

### Exploding gradient for gpflow SVGP

When optimizing an SVGP with a Poisson likelihood for a big data set, I see what I think are exploding gradients. After a few epochs I see a spiky drop in the ELBO, which then very slowly recovers after ...
36 views

### Curve fitting with gradient descent

I wrote some code that performs gradient descent on a couple of data points. For some reason the curve is not converging correctly, but I have no idea why that is. I always end up with an exploding ...
24 views

### SGDClassifier not giving as optimal a result as logistic regression

I am training a dataset with sklearn's LogisticRegression and SGDClassifier with log as the loss function, and I am using log loss as my evaluation metric. But with SGDClassifier it is giving a very high ...
26 views

### How do I properly train and predict value like biomass using GRU RNN?

My first time trying to train a dataset containing 8 variables in a time-series of 20 years or so using GRU RNN. The biomass value is what I'm trying to predict based on the other variables. I'm ...
13 views

### Gradient descent with momentum formula

The following is from Andrew Ng's deep learning course, on SGD with momentum. In the implementation details, the professor gives: v(dw) = beta * v(dw) + (1 - beta) * dw, v(db) = beta * v(db) + (1 - beta) * db, W ...
36 views
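The update rule quoted in that excerpt can be written as a small, self-contained function. The scalar demo values below are made up; in practice W, b, and the gradients would be arrays.

```python
def momentum_step(W, b, dW, db, vW, vb, lr=0.01, beta=0.9):
    """One SGD-with-momentum update, following the formulas quoted above."""
    # Exponentially weighted moving average of the gradients ("velocity")
    vW = beta * vW + (1 - beta) * dW
    vb = beta * vb + (1 - beta) * db
    # Parameters move along the velocity, not the raw gradient
    W = W - lr * vW
    b = b - lr * vb
    return W, b, vW, vb

# Demo with made-up scalars: starting from zero velocity and unit gradients
W, b, vW, vb = momentum_step(W=1.0, b=1.0, dW=1.0, db=1.0, vW=0.0, vb=0.0)
print(W, vW)
```

On the first step the velocity is (1 - beta) times the gradient, which is why some implementations add bias correction early in training.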

### Problem while trying to write a vectorized matrix notation for the gradient descent algorithm

I was trying to write a vectorized notation for the iterative process of converging theta values in gradient descent algorithm. I found the vector notation but for some reason, the values are not ...
41 views

### How do I get the right amount of change to the slope for my linear regression?

I want to program a linear regression in Processing, but I got mixed up about which parameters I have to multiply and then add to or subtract from my slope. I have tried changing the parameters (make them ...
115 views

### Correct backpropagation in simple perceptron

Given the simple OR gate problem: or_input = np.array([[0,0], [0,1], [1,0], [1,1]]) or_output = np.array([[0,1,1,1]]).T If we train a simple single-layered perceptron (without backpropagation), we ...
25 views
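Using the OR-gate arrays quoted in that excerpt, a single-layer model can be sketched as below. Note the assumption: this sketch trains a sigmoid unit with cross-entropy by gradient descent, not the classic step-activation perceptron rule the question mentions.

```python
import numpy as np

# OR-gate data, as given in the question
or_input = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
or_output = np.array([[0, 1, 1, 1]]).T

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 1))
b = 0.0
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(2000):
    pred = sigmoid(or_input @ W + b)
    # For sigmoid + cross-entropy, the pre-activation gradient is simply (pred - target)
    err = pred - or_output
    W -= lr * or_input.T @ err / len(or_input)
    b -= lr * err.mean()

print((sigmoid(or_input @ W + b) > 0.5).astype(int).ravel())  # expect [0 1 1 1]
```

OR is linearly separable, so a single unit suffices; no backpropagation through hidden layers is needed here.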

### What will be the value of the subgradient at 0 for the function |x|?

I am learning about Lasso Regression and came across taking the gradient at 0. I came to know about the subgradient but could not understand what its value will be at 0. In lasso regression, we ...
41 views
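For reference, the subdifferential of |x| (the set of all valid subgradient slopes at each point) is single-valued away from 0, but at 0 any value in [-1, 1] is a subgradient:

```latex
\partial\,|x| =
\begin{cases}
  \{-1\}   & x < 0 \\
  [-1,\,1] & x = 0 \\
  \{+1\}   & x > 0
\end{cases}
```

This interval at 0 is what lets the lasso's soft-thresholding update set coefficients exactly to zero.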

### Why isn't my gradient descent algorithm working?

I made a gradient descent algorithm in Python and it doesn't work. My m and b values keep increasing and never stop until I get the -inf error or the overflow encountered in square error. import ...
10 views

### Reason for convergence of gradient descent with standard scaler

I have two queries: (1) Will the interpretation of the coefficients be the same before and after sklearn's StandardScaler()? (2) How do things change after using the standard scaler such that gradient ...
19 views

### How to do a Gradient Search to Minimize the Coherence of this Matrix?

Thanks for looking into my question. I'm trying to create a function that takes in a matrix and, after a little bit, returns a minimal value of a specific calculation. To be precise, I need something ...
31 views

### Best Way to Overcome Early Convergence for Machine Learning Model

I have a machine learning model built that tries to predict weather data, and in this case I am doing a prediction on whether or not it will rain tomorrow (a binary prediction of Yes/No). In the ...
37 views

### Gradient descent loop producing NaN in Matlab

I'm running a gradient descent loop to minimize a function, but my parameter vector w is calculated as NaN while it should be a numerical vector. This means that at some point the function is going to ...
30 views

### Parameter representation python

I know this can be interpreted as vague but I have to ask. I am working on a problem where theta is denoted by θ = {W,C,b} where W is a matrix of size d*d, C is a vector belonging to R^d and b is a ...
28 views

### gensim Word2Vec - how to apply stochastic gradient descent?

To my understanding, batch (vanilla) gradient descent makes one parameter update for all training data. Stochastic gradient descent (SGD) allows you to update parameters for each training sample, ...
34 views
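The batch-versus-stochastic distinction described in that excerpt can be sketched on made-up linear data. Everything below is an assumption for illustration; gensim's actual Word2Vec internals are not shown.

```python
import numpy as np

# Synthetic noiseless linear data: y = X @ true_w
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
lr = 0.1

# Batch ("vanilla") gradient descent: ONE update per pass over ALL samples
w = np.zeros(3)
for _ in range(200):
    grad = X.T @ (X @ w - y) / len(X)
    w -= lr * grad

# Stochastic gradient descent: one update PER SAMPLE, in shuffled order
w_sgd = np.zeros(3)
for _ in range(20):
    for i in rng.permutation(len(X)):
        xi, yi = X[i], y[i]
        w_sgd -= lr * (xi @ w_sgd - yi) * xi

print(np.round(w, 2), np.round(w_sgd, 2))
```

Both runs recover the same weights here; the difference is that SGD makes 100 cheap, noisy updates per epoch instead of one exact one.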

### How can I supply custom gradient to torch.optim.LBFGS?

I have some research task for logistic-like regression: import torch import math import numpy as np from sklearn.datasets import make_moons from matplotlib import pyplot from pandas import DataFrame ...
44 views

### Stopping criteria/rule for ADAM optimization in pytorch?

In the code below, we define two functions and then do some optimization using Adam and PyTorch. The code seems to work. However, we do a pre-defined number of iterations for the Adam optimization (...
36 views

### How is a local minimum possible in gradient descent?

Gradient descent works on the mean squared error, which is the equation of a parabola, y = x^2. We often say that weight adjustment in a neural network by the gradient descent algorithm can hit a local ...
40 views

### Why are gradient descent results so far off lm results?

I'm playing around with the gradDescent package on some made-up data to get a feel for it. As I understand it, I should be getting similar results from both linear regression and gradient descent when ...
19 views

### backpropagation with more than one node per layer

I read this article about how backpropagation works, and I understood everything it said. It said that to find the gradient we have to take the partial derivative of the cost function with respect to each weight/...
30 views

### Why am I getting a negative cost function for logistic regression using gradient descent in python?

I'm trying to apply what I've learned in Andrew Ng's Coursera course. I've successfully implemented this same algorithm the same way I'm doing it here on the Kaggle Titanic Dataset, but now with this ...
40 views

### Minimizing Function with vector valued input in MATLAB

I want to minimize a function like the one below. Here, n can be 5, 10, 50, etc. I want to use Matlab, and I want to use Gradient Descent and the Quasi-Newton Method with the BFGS update to solve this problem, along with ...
40 views

### If one captures the gradient with the Optimizer, will it calculate the gradient twice?

I recently hit a training performance bottleneck. I always add a lot of histograms to the summary. I want to know if, by calculating gradients first and then minimizing the loss, it will calculate the gradient twice ...
42 views

### feed forward neural network fails to classify due to dimensionality of biases

I'm making a basic feed forward neural network to solve XOR gate problem. Standard settings: input layer + hidden layer + output layer, constant learning rate of 0.01 and number of epochs is 500. ...
46 views

### How exactly does this simple calculation of an ML gradient descent cost function work, using Octave/MATLAB?

I am following a machine learning course on Coursera and I am doing the following exercise using Octave (MATLAB should be the same). The exercise is related to the calculation of the cost function ...