# Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding the minimum of a function. It iteratively calculates partial derivatives (gradients) of the function and descends in steps proportional to those partial derivatives. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.
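As a minimal illustration of the procedure described above (a hand-rolled sketch in Python; the function f(x) = (x - 3)^2 is an invented example), each step simply moves against the derivative:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Minimize a function by repeatedly stepping against its gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # step proportional to the derivative
    return x

# f(x) = (x - 3)^2 has derivative 2*(x - 3) and its minimum at x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Fitting a model works the same way, with x replaced by the parameter vector and f by the model's error function.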

**0** votes · **0** answers · 13 views

### Neural evolution combined with gradient descent

To optimize a model, we can either use gradient descent to train it on a dataset, or we can use neural evolution and generate only the initial weights without training them and use ...

**0** votes · **2** answers · 35 views

### How does backpropagation work in a CNN? [on hold]

Can somebody please explain how backpropagation works for a CNN with a max-pooling layer and some convolutional layers with a stride greater than 1 and padding? And is there a way to compute these gradients ...

**1** vote · **0** answers · 44 views

### Neural Network Not Converging “anywhere”

In the past 2 weeks I've been trying to implement a Hand-Written-Digit Classifier with a Feed-Forward Neural Network, using the MNIST Database.
The neural network uses Cross-Entropy loss, and Softmax ...

**0** votes · **2** answers · 37 views

### Why would I choose a loss-function differing from my metrics?

When I look through tutorials on the internet or at models posted here on SO, I often see that the loss function differs from the metrics used to evaluate the model. This might look like:
model....

**0** votes · **0** answers · 34 views

### Tensorflow Haskell Linear Regression diverges

I've been looking into the TensorFlow Haskell bindings. However, I struggle to get the basic linear regression example from the README to work properly: it diverges on what seems to be a very easy ...

**0** votes · **0** answers · 13 views

### Method to bound/constrain a matrix elementwise in TensorFlow?

In TensorFlow, I currently have a matrix that represents the weights between layers of a Neural Network. However, I am trying to implement some kind of projected/constrained gradient descent, where ...

**1** vote · **1** answer · 23 views

### Steepest Descent Trace Behavior

I've written code that performs steepest descent on a quadratic form given by the formula: 1/2 * (x1^2 + gamma * x2^2). Mathematically, I am taking the equations given in Boyd's Convex Optimization ...
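For reference, steepest descent on that quadratic form can be sketched as follows (a fixed step size is used here rather than the exact line search in Boyd, so the trace differs; the starting point is invented):

```python
import numpy as np

def steepest_descent(gamma, x0, lr=0.1, steps=200):
    """Steepest descent on f(x) = 0.5*(x1^2 + gamma*x2^2), recording the trace."""
    x = np.asarray(x0, dtype=float)
    trace = [x.copy()]
    for _ in range(steps):
        g = np.array([x[0], gamma * x[1]])  # gradient of the quadratic form
        x = x - lr * g                      # requires lr < 2/gamma for stability
        trace.append(x.copy())
    return np.array(trace)

trace = steepest_descent(gamma=10.0, x0=[10.0, 1.0])
```

The classic zig-zag appears when gamma is far from 1, because the two coordinates shrink at very different rates.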

**0** votes · **0** answers · 32 views

### My vectorized implementation of gradient descent does not give me the right answer

I'm currently working on Andrew Ng's gradient descent exercise using Python, but it keeps giving me the wrong optimal theta. I followed this vectorization cheatsheet for gradient descent --- https://...

**1** vote · **0** answers · 22 views

### Execution time in TensorFlow higher than in NumPy

I'm reproducing a Gradient Descent example from Géron's Hands-On Machine Learning in both TensorFlow and NumPy. It's odd, but even with a GPU enabled, NumPy seems to be 5 to 6 times faster than ...

**1** vote · **0** answers · 30 views

### unable to apply condition on output of custom layer using keras layers module

I want to apply a condition on the output of a dense layer. For this, I tried to customize the Dense layer of Keras but when I run my code I get the error
ValueError: No gradients provided for any ...

**-1** votes · **1** answer · 27 views

### Short Definition of Backpropagation and Gradient Descent

I need to write a very short definition of backpropagation and gradient descent, and I'm a bit confused about what the difference is.
Is the following definition correct?:
For calculating the weights of ...

**1** vote · **0** answers · 20 views

### Questions around XGBoost

I am trying to understand the XGBoost algorithm and have a few questions around it.
I have read various blogs but all seem to tell a different story. Below is a snippet from the code that I am using (...

**0** votes · **1** answer · 23 views

### Should I exit my gradient descent loop as soon as the cost increases?

I'm trying to learn machine learning so I'm taking a course and currently studying gradient descent for linear regression. I just learned that if the learning rate is small enough, the value returned ...
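The usual answer is that a rising cost signals the learning rate is too large; a sketch of the guard being asked about (the cost and gradient functions are invented toy examples):

```python
def descend_with_guard(cost, grad, x0, lr=0.5, steps=100):
    """Gradient descent that stops as soon as the cost would increase."""
    x, prev = x0, cost(x0)
    for _ in range(steps):
        nxt = x - lr * grad(x)
        if cost(nxt) > prev:  # overshoot: stop (or shrink lr) instead of diverging
            break
        x, prev = nxt, cost(nxt)
    return x

# toy cost f(x) = x^2 with gradient 2x
x_min = descend_with_guard(lambda x: x * x, lambda x: 2 * x, x0=4.0)
```

A common refinement is to halve the learning rate and retry the step instead of exiting outright.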

**1** vote · **1** answer · 41 views

### Linear Regression - Implementing Feature Scaling

I was trying to implement Linear Regression in Octave 5.1.0 on a data set relating the GRE score to the probability of Admission.
The data set is of the sort,
337 0.92
324 0.76
316 0.72
...
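With a feature on a ~300 scale next to a target on a 0-1 scale, gradient descent struggles to converge with a single learning rate; mean normalization is the standard fix. A sketch in Python (the three rows are taken from the snippet above):

```python
import numpy as np

def feature_scale(x):
    """Mean normalization: center each column and divide by its standard deviation."""
    mu, sigma = x.mean(axis=0), x.std(axis=0)
    return (x - mu) / sigma, mu, sigma

scores = np.array([[337.0], [324.0], [316.0]])  # GRE column from the data set
scaled, mu, sigma = feature_scale(scores)
```

Remember to apply the same mu and sigma to any new inputs at prediction time.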

**0** votes · **1** answer · 31 views

### First gradient descent : how to normalize X and Y?

I'm doing my first gradient descent ever, following a course about machine learning.
But it doesn't seem to work correctly, as it oscillates (converges, then diverges, then converges ...) and at the ...

**1** vote · **0** answers · 25 views

### Overflow of square function during gradient descent calculation

I have written linear regression (in one variable) along with gradient descent. It works fine for a smaller dataset, but for a larger dataset it gives the error:
OverflowError: (34, 'Numerical ...

**0** votes · **0** answers · 15 views

### Why can't Forward Stagewise Additive Modeling work with absolute loss function?

In Forward Stagewise Additive Modeling, if the loss function is squared loss, the next weak learner fits to the residual error.
Why can't we do the same when the loss function is absolute error or ...
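For squared loss the residual fit falls directly out of the derivative; a short standard derivation (not from the question itself):

$$L(y, f) = \tfrac{1}{2}\,(y - f)^2 \quad\Rightarrow\quad -\frac{\partial L}{\partial f} = y - f,$$

so the negative gradient the next weak learner fits is exactly the residual. For absolute loss $L = |y - f|$, the negative gradient is $\operatorname{sign}(y - f)$, so the learner can only fit the *sign* of the residual, not its magnitude; gradient boosting handles this by fitting the sign and choosing step sizes by line search.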

**0** votes · **0** answers · 33 views

### Numpy Concatenation different results

I wrote a little script that performs polynomial gradient descent, and I am trying to reduce the size of the code, even if it diminishes readability.
This piece of code works fine for fitting a curve ...

**0** votes · **0** answers · 20 views

### Why does using RMSE as the loss function take a nonconvex form in logistic regression but not in linear regression?

I am taking the deep learning course from Andrew Ng. In the 3rd lecture of the 2nd week of the first course, he mentions that we can use RMSE for logistic regression as well, but it will take a nonconvex ...

**1** vote · **1** answer · 45 views

### Multiple linear regression with gradient descent

Hello,
I'm new to machine learning and Python, and I want to predict the Kaggle House Sales in King County dataset with my own gradient descent.
I'm splitting 70% (15k rows) training and 30% (6k rows) ...

**1** vote · **1** answer · 36 views

### Initialize neural network weights with Tensorflow

I am developing a neural network model using Tensorflow. In the LOSO cross validation, I need to train a model for 10 folds, since I have data from 10 different subjects.
Taking this into account, I ...

**-2** votes · **1** answer · 19 views

### Odd behavior of cost over time with SGD

I am relatively new to ML/DL and have been trying to improve my skills by making a model that learns the MNIST data set without TF or keras. I have 784 input nodes, 2 hidden layers of 16 neurons each, ...

**0** votes · **2** answers · 45 views

### Exploding gradient for gpflow SVGP

When optimizing an SVGP with a Poisson likelihood for a big data set, I see what I think are exploding gradients.
After a few epochs I see a spiky drop of the ELBO, which then very slowly recovers after ...

**1** vote · **1** answer · 36 views

### Curve fitting with gradient descent

I wrote some code that performs gradient descent on a couple of data points.
For some reason the curve is not converging correctly, but I have no idea why that is. I always end up with an exploding ...

**0** votes · **0** answers · 24 views

### SGDClassifier not giving as optimal a result as logistic regression

I am training a dataset with sklearn's LogisticRegression and SGDClassifier with log loss as the loss function.
And I am using log loss as my evaluation metric.
But with SGDClassifier it is giving a very high ...

**0** votes · **1** answer · 26 views

### How do I properly train and predict value like biomass using GRU RNN?

My first time trying to train a dataset containing 8 variables in a time-series of 20 years or so using GRU RNN. The biomass value is what I'm trying to predict based on the other variables. I'm ...

**0** votes · **1** answer · 13 views

### gradient descent with momentum formula

The following is from Andrew Ng's deep learning course, on SGD with momentum. In the implementation details, the professor gave these updates:
v(dw) = beta * v(dw) + (1-beta)dw
v(db) = beta * v(db) + (1-beta)db
W ...
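Those update rules can be sketched as a scalar toy example (the gradient 2w of w² is invented for illustration; beta = 0.9 is the course's usual default):

```python
def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    """One SGD-with-momentum update; v is an exponential average of gradients."""
    v = beta * v + (1 - beta) * grad  # v(dw) = beta * v(dw) + (1-beta) * dw
    w = w - lr * v                    # W = W - alpha * v(dw)
    return w, v

w, v = 5.0, 0.0
for _ in range(500):
    w, v = momentum_step(w, v, grad=2 * w)  # gradient of f(w) = w^2
```

The averaging damps oscillation across steps while reinforcing any consistent gradient direction.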

**0** votes · **1** answer · 36 views

### Problem while trying to write a vectorized matrix notation for the gradient descent algorithm

I was trying to write a vectorized notation for the iterative process of converging theta values in gradient descent algorithm. I found the vector notation but for some reason, the values are not ...

**1** vote · **1** answer · 41 views

### How do I get the right amount of change to the slope for my linear regression?

I want to program a linear regression with Processing. But I got mixed up which parameters I have to multiply and then add or subtract from my slope.
I have tried to change the parameters (make them ...

**7** votes · **0** answers · 115 views

### Correct backpropagation in simple perceptron

Given the simple OR gate problem:
or_input = np.array([[0,0], [0,1], [1,0], [1,1]])
or_output = np.array([[0,1,1,1]]).T
If we train a simple single-layered perceptron (without backpropagation), we ...
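The snippet can be completed into a runnable single-layer perceptron; this is my own minimal completion using a sigmoid unit and the cross-entropy gradient, which may differ from the asker's exact setup:

```python
import numpy as np

np.random.seed(0)
or_input = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
or_output = np.array([[0, 1, 1, 1]]).T

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.random.randn(2, 1), 0.0
for _ in range(5000):
    pred = sigmoid(or_input @ w + b)
    err = pred - or_output          # cross-entropy gradient w.r.t. the pre-activation
    w -= 0.5 * or_input.T @ err     # single layer, so no backpropagation chain needed
    b -= 0.5 * err.sum()

labels = (sigmoid(or_input @ w + b) > 0.5).astype(int)
```

With a single layer, this gradient step *is* the whole of backpropagation: there is no earlier layer to propagate the error to.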

**0** votes · **0** answers · 25 views

### What will be the value of the subgradient at 0 for the function |x|?

I am learning about Lasso Regression and came across taking the gradient at 0. I came to know about subgradients but could not understand what the value will be at 0.
In lasso regression, we ...
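For the record, the subdifferential of $|x|$ is standard convex analysis (not specific to the question):

$$\partial\,|x| \;=\; \begin{cases} \{-1\} & x < 0,\\ [-1,\,1] & x = 0,\\ \{+1\} & x > 0. \end{cases}$$

At $x = 0$ any value in $[-1, 1]$ is a valid subgradient; this interval is what lets lasso's optimality conditions set a coefficient exactly to zero.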

**-1** votes · **2** answers · 41 views

### Why isn't my gradient descent algorithm working?

I made a gradient descent algorithm in Python and it doesn't work. My m and b values keep increasing and never stop until I get the -inf error or the overflow encountered in square error.
import ...

**0** votes · **0** answers · 10 views

### Reason for convergence of gradient descent with standard scaler

I have two queries:
Will the interpretation of the coefficients be the same before and after applying sklearn's StandardScaler()?
How do things change after using the standard scaler such that gradient ...

**0** votes · **0** answers · 19 views

### How to do a Gradient Search to Minimize the Coherence of this Matrix?

Thanks for looking into my question. I'm trying to create a function that takes in a matrix and, after some iterations, returns a minimal value of a specific calculation. To be precise, I need something ...

**-1** votes · **2** answers · 31 views

### Best Way to Overcome Early Convergence for Machine Learning Model

I have a machine learning model built that tries to predict weather data, and in this case I am doing a prediction on whether or not it will rain tomorrow (a binary prediction of Yes/No).
In the ...

**1** vote · **0** answers · 37 views

### Gradient descent loop producing NaN in Matlab

I'm running a gradient descent loop to minimize a function, but my parameter vector w is calculated as NaN while it should be a numerical vector. This means that at some point the function is going to ...

**1** vote · **0** answers · 30 views

### Parameter representation python

I know this can be interpreted as vague but I have to ask. I am working on a problem where theta is denoted by θ = {W,C,b} where W is a matrix of size d*d, C is a vector belonging to R^d and b is a ...

**0** votes · **1** answer · 28 views

### gensim Word2Vec - how to apply stochastic gradient descent?

To my understanding, batch (vanilla) gradient descent makes one parameter update using all training data. Stochastic gradient descent (SGD) allows you to update the parameters for each training sample, ...
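The two regimes can be sketched for plain linear regression (a toy illustration with invented data; gensim's actual Word2Vec training loop is optimized C code and is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w  # noiseless targets so both methods can recover true_w

def batch_gd(X, y, lr=0.1, epochs=200):
    """Batch GD: one update per epoch, averaging the gradient over ALL samples."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

def sgd(X, y, lr=0.05, epochs=200):
    """SGD: one update PER SAMPLE, visiting samples in random order each epoch."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            w -= lr * (X[i] @ w - y[i]) * X[i]
    return w
```

Both reach the same solution here; SGD simply makes many noisier, cheaper updates per pass over the data.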

**0** votes · **0** answers · 34 views

### How can I supply custom gradient to torch.optim.LBFGS?

I have some research task for logistic-like regression:
import torch
import math
import numpy as np
from sklearn.datasets import make_moons
from matplotlib import pyplot
from pandas import DataFrame
...

**0** votes · **0** answers · 44 views

### Stopping criteria/rule for ADAM optimization in pytorch?

In the code below, we define two functions and then do some optimization using Adam and PyTorch. The code seems to work. However, we do a pre-defined number of iterations for the Adam optimization (...

**0** votes · **0** answers · 36 views

### How is a local minimum possible in gradient descent?

Gradient descent works on the equation of mean squared error, which is the equation of a parabola, y = x^2.
We often say weight adjustment in a neural network by the gradient descent algorithm can hit a local ...

**1** vote · **0** answers · 40 views

### Why are gradient descent results so far off lm results?

I´m playing around with the gradDescent package on some made up data to get a feel for it. As I understand it, I should be getting similar results from both linear regression and gradient descent when ...

**-2** votes · **1** answer · 19 views

### backpropagation with more than one node per layer

I read this article about how backpropagation works, and I understood everything it said. It said that to find the gradient we have to take the partial derivative of the cost function with respect to each weight/...

**0** votes · **1** answer · 30 views

### Why am I getting a negative cost function for logistic regression using gradient descent in python?

I'm trying to apply what I've learned in Andrew Ng's Coursera course. I've successfully implemented this same algorithm the same way I'm doing it here on the Kaggle Titanic Dataset, but now with this ...

**1** vote · **1** answer · 40 views

### Minimizing Function with vector valued input in MATLAB

I want to minimize a function like below:
Here, n can be 5, 10, 50, etc. I want to use MATLAB, with gradient descent and a quasi-Newton method with the BFGS update, to solve this problem along with ...

**0** votes · **1** answer · 40 views

### If one captures the gradient with the Optimizer, will it calculate the gradient twice?

I recently hit a training performance bottleneck. I always add a lot of histograms to the summary. I want to know whether calculating the gradients first and then minimizing the loss will calculate the gradients twice ...

**0** votes · **0** answers · 42 views

### feed forward neural network fails to classify due to dimensionality of biases

I'm making a basic feed forward neural network to solve XOR gate problem.
Standard settings: input layer + hidden layer + output layer, constant learning rate of 0.01 and number of epochs is 500.
...

**0** votes · **0** answers · 46 views

### How exactly does this simple calculation of an ML gradient descent cost function work in Octave/MATLAB?

I am following a machine learning course on Coursera and I am doing the following exercise using Octave (MATLAB should be the same).
The exercise is related to the calculation of the cost function ...

**1** vote · **0** answers · 96 views

### Gradient Descent Overshooting and Cost Blowing Up when used for Regularized Logistic Regression

I'm using MATLAB to code Regularized Logistic Regression and am using Gradient Descent to discover the parameters. All is based on Andrew Ng's Coursera Machine Learning course. I am trying to code the ...

**1** vote · **1** answer · 66 views

### Getting gradient descent to work in Octave (Andrew Ng's machine learning course, exercise 1)

So I am trying to implement/solve the first programming exercise from Andrew Ng's machine learning course on Coursera.
I have trouble implementing linear gradient descent (for one variable) in Octave. I ...