Gradient Descent is an algorithm for finding the minimum of a function. It iteratively calculates partial derivatives (gradients) of the function and descends in steps proportional to those partial derivatives. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.
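The update rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions (toy noise-free data, a fixed learning rate, plain NumPy), not a reference implementation:

```python
import numpy as np

# Toy data from the line y = 2x + 1 (noise-free, illustrative only)
X = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * X + 1.0

w, b = 0.0, 0.0   # model parameters
lr = 0.1          # step size (learning rate)

for _ in range(2000):
    err = w * X + b - y
    # Partial derivatives of the mean squared error
    dw = 2.0 * np.mean(err * X)
    db = 2.0 * np.mean(err)
    # Descend in steps proportional to the gradient
    w -= lr * dw
    b -= lr * db

print(round(w, 3), round(b, 3))  # → 2.0 1.0
```

Here the "error function for the model" is the mean squared error of a fitted line; the loop recovers the generating parameters w = 2, b = 1.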

880 questions

Neural evolution combined with gradient descent

To optimize a model, we can either use gradient descent to train it on a dataset, or we can use neural evolution to generate only the initial weights without training them and use ...
35 views

How does backpropagation work in a CNN? [on hold]

Can somebody please explain how backpropagation works for a CNN with a max pooling layer and some convolutional layers with a stride greater than 1 and padding? And is there a way to compute these gradients ...
44 views

Neural Network Not Converging “anywhere”

In the past 2 weeks I've been trying to implement a Hand-Written-Digit Classifier with a Feed-Forward Neural Network, using the MNIST Database. The neural network uses Cross-Entropy loss, and Softmax ...
37 views

Why would I choose a loss function that differs from my metrics?

When I look through tutorials on the internet or at models posted here on SO, I often see that the loss function differs from the metrics used to evaluate the model. This might look like: model....
34 views

I've been looking into the tensorflow-haskell bindings. However, I struggle to get the basic linear regression example from the README to work properly: it diverges on what seems to be a very easy ...
13 views

Method to bound/constrain a matrix elementwise in TensorFlow?

In TensorFlow, I currently have a matrix that represents the weights between layers of a Neural Network. However, I am trying to implement some kind of projected/constrained gradient descent, where ...
23 views
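Projected gradient descent, as asked about above, alternates a gradient step with a projection back onto the feasible set; for an elementwise box constraint the projection is just clipping. In TensorFlow the projection would typically be `w.assign(tf.clip_by_value(w, lo, hi))` after each optimizer step; the NumPy sketch below (all names illustrative) only shows the idea:

```python
import numpy as np

def projected_gd(grad, w0, lo, hi, lr=0.1, steps=100):
    """Gradient step followed by projection onto the box [lo, hi]."""
    w = w0.copy()
    for _ in range(steps):
        w -= lr * grad(w)          # unconstrained gradient step
        w = np.clip(w, lo, hi)     # elementwise projection
    return w

# Minimize ||w - t||^2 where the target t lies partly outside the box
t = np.array([2.0, -3.0, 0.5])
w = projected_gd(lambda w: 2 * (w - t), np.zeros(3), lo=-1.0, hi=1.0)
# converges to [1, -1, 0.5]: out-of-box components stick to the bounds
```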

Steepest Descent Trace Behavior

I've written code that performs steepest descent on a quadratic form given by the formula 1/2 * (x1^2 + gamma * x2^2). Mathematically, I am taking the equations given in Boyd's Convex Optimization ...
32 views
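For a quadratic like the one above, exact-line-search steepest descent can be sketched as follows. This assumes Boyd's setup with starting point (γ, 1); the step size is the standard exact line search for quadratics, t = gᵀg / gᵀHg:

```python
import numpy as np

def steepest_descent(x0, gamma, iters=30):
    """Exact line search on f(x) = 0.5 * (x1**2 + gamma * x2**2)."""
    H = np.diag([1.0, gamma])      # Hessian of the quadratic form
    x = np.array(x0, dtype=float)
    trace = [x.copy()]
    for _ in range(iters):
        g = H @ x                  # gradient: (x1, gamma * x2)
        t = (g @ g) / (g @ H @ g)  # exact line-search step size
        x = x - t * g
        trace.append(x.copy())
    return np.array(trace)

# Boyd's classic example: start at (gamma, 1) to see the zig-zag trace
trace = steepest_descent([10.0, 1.0], gamma=10.0)
```

Plotting `trace` reproduces the well-known zig-zag: the larger gamma is, the worse the conditioning and the slower the linear convergence.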

My vectorization implementation of gradient descent does not get me the right answer

I'm currently working on Andrew Ng's gradient descent exercise using Python, but it keeps giving me the wrong optimal theta. I followed this vectorization cheatsheet for gradient descent --- https://...
22 views

Execution time in TensorFlow higher than in NumPy

I'm reproducing a Gradient Descent example from Géron's Hands-On Machine Learning with both TensorFlow and NumPy. It's odd, but even with the GPU enabled, NumPy seems to be 5 to 6 times faster than ...
30 views

Unable to apply a condition on the output of a custom layer using the Keras layers module

I want to apply a condition to the output of a dense layer. For this, I tried to customize Keras's Dense layer, but when I run my code I get the error ValueError: No gradients provided for any ...
27 views

Short Definition of Backpropagation and Gradient Descent

I need to write a very short definition of backpropagation and gradient descent, and I'm a bit confused about what the difference is. Is the following definition correct?: For calculating the weights of ...
20 views

Questions around XGBoost

I am trying to understand the XGBoost algorithm and have a few questions about it. I have read various blogs, but they all seem to tell a different story. Below is a snippet from the code that I am using (...
23 views

Should I exit my gradient descent loop as soon as the cost increases?

I'm trying to learn machine learning so I'm taking a course and currently studying gradient descent for linear regression. I just learned that if the learning rate is small enough, the value returned ...
41 views
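A stopping rule like the one asked about above can be sketched as below. Breaking on the very first cost increase is a heuristic that suits full-batch gradient descent; for stochastic variants, where the cost jitters, a patience window is usually preferred. All names here are illustrative:

```python
def gd_with_cost_check(grad, cost, w, lr, max_iters=10_000, tol=1e-9):
    """Stop when the cost rises (step likely too large) or barely changes."""
    prev = cost(w)
    for _ in range(max_iters):
        w = w - lr * grad(w)
        c = cost(w)
        if c > prev:           # cost increased: stop (or shrink lr and retry)
            break
        if prev - c < tol:     # change below tolerance: converged
            break
        prev = c
    return w

# Minimize (w - 3)^2, whose gradient is 2 * (w - 3)
w = gd_with_cost_check(lambda w: 2 * (w - 3), lambda w: (w - 3) ** 2,
                       w=0.0, lr=0.1)
# w ends up close to the minimizer 3
```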

Linear Regression - Implementing Feature Scaling

I was trying to implement linear regression in Octave 5.1.0 on a data set relating GRE scores to the probability of admission. The data set looks like this: 337 0.92 324 0.76 316 0.72 ...
31 views

First gradient descent : how to normalize X and Y?

I'm doing my first gradient descent ever, following a course about machine learning. But it doesn't seem to work correctly, as it oscillates (converges, then diverges, then converges ...) and at the ...
25 views
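A common answer to the question above is to standardize both X and y before running gradient descent (which tames oscillation by improving conditioning) and to undo the scaling at prediction time. A minimal sketch with toy data and illustrative names:

```python
import numpy as np

def standardize(a):
    """Zero-mean, unit-variance scaling; keep mu/sigma to undo it later."""
    mu, sigma = a.mean(axis=0), a.std(axis=0)
    return (a - mu) / sigma, mu, sigma

X = np.array([100.0, 200.0, 300.0, 400.0])
y = np.array([10.0, 20.0, 30.0, 40.0])

Xs, x_mu, x_sd = standardize(X)
ys, y_mu, y_sd = standardize(y)

# Plain gradient descent on the scaled data
w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    err = w * Xs + b - ys
    w -= lr * 2 * np.mean(err * Xs)
    b -= lr * 2 * np.mean(err)

# Undo the scaling when predicting for a raw input (e.g. x = 250)
x_new = (250.0 - x_mu) / x_sd
y_pred = (w * x_new + b) * y_sd + y_mu
print(y_pred)  # ≈ 25.0
```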

Overflow of square function during gradient descent calculation

I wrote linear regression (in one variable) with gradient descent. It works fine for a smaller dataset, but for a larger dataset it gives the error: OverflowError: (34, 'Numerical ...
15 views

Why can't Forward Stagewise Additive Modeling work with absolute loss function?

In Forward Stagewise Additive Modeling, if the loss function is squared loss, the next weak learner fits the residual error. Why can't we do the same when the loss function is absolute error or ...
33 views

NumPy concatenation gives different results

I wrote a little script that performs polynomial gradient descent, and I am trying to reduce the size of the code, even if it diminishes readability. This piece of code works fine for fitting a curve ...
20 views

Why does using RMSE as the loss function take a non-convex form in logistic regression but not in linear regression?

I am taking the deep learning course from Andrew Ng. In the 3rd lecture of the 2nd week of the first course, he mentions that we can use RMSE for logistic regression as well, but it will take a non-convex ...
45 views

Multiple linear regression with gradient descent

Hello, I'm new to machine learning and Python, and I want to predict on the Kaggle House Sales in King County dataset with my own gradient descent. I'm splitting 70% (15k rows) training and 30% (6k rows) ...
36 views

Initialize neural network weights with Tensorflow

I am developing a neural network model using Tensorflow. In the LOSO cross validation, I need to train a model for 10 folds, since I have data from 10 different subjects. Taking this into account, I ...
19 views

Odd behavior of cost over time with SGD

I am relatively new to ML/DL and have been trying to improve my skills by making a model that learns the MNIST data set without TF or Keras. I have 784 input nodes, 2 hidden layers of 16 neurons each, ...
45 views

When optimizing an SVGP with a Poisson likelihood for a big data set, I see what I think are exploding gradients. After a few epochs I see a spiky drop in the ELBO, which then recovers very slowly after ...
36 views

I wrote some code that performs gradient descent on a couple of data points. For some reason the curve is not converging correctly, but I have no idea why. I always end up with an exploding ...
24 views

SGDClassifier not giving as good a result as logistic regression

I am training a dataset with sklearn's LogisticRegression and SGDClassifier with log as the loss function, and I am using log loss as my evaluation metric. But SGDClassifier is giving a very high ...
26 views

How do I properly train and predict value like biomass using GRU RNN?

My first time trying to train a dataset containing 8 variables in a time-series of 20 years or so using GRU RNN. The biomass value is what I'm trying to predict based on the other variables. I'm ...
13 views

The following is from Andrew Ng's deep learning course, on SGD with momentum. In the implementation details, the professor mentioned the updates below: v_dW = beta * v_dW + (1 - beta) * dW, v_db = beta * v_db + (1 - beta) * db, W ...
36 views
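The update quoted above can be sketched directly. Note that Ng's (1 − β) scaling differs from the classical momentum form v = βv + ∇ only by a constant factor, which can be absorbed into the learning rate; the values here are illustrative:

```python
def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    """One update of the quoted rule: v = beta*v + (1-beta)*grad; w -= lr*v."""
    v = beta * v + (1 - beta) * grad
    w = w - lr * v
    return w, v

# Minimize f(w) = w^2, whose gradient is 2w
w, v = 5.0, 0.0
for _ in range(200):
    w, v = momentum_step(w, v, grad=2 * w)
# w spirals in toward the minimum at 0, damped by the velocity average
```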

Problem while trying to write a vectorized matrix notation for the gradient descent algorithm

I was trying to write vectorized notation for the iterative process of converging theta values in the gradient descent algorithm. I found the vector notation, but for some reason the values are not ...
41 views

How do I get the right amount of change to the slope for my linear regression?

I want to program a linear regression with Processing. But I got mixed up about which parameters I have to multiply and then add to or subtract from my slope. I have tried changing the parameters (making them ...
115 views

Correct backpropagation in simple perceptron

Given the simple OR gate problem: or_input = np.array([[0,0], [0,1], [1,0], [1,1]]) or_output = np.array([[0,1,1,1]]).T If we train a simple single-layered perceptron (without backpropagation), we ...
25 views
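For reference, the OR problem above is solvable by the classic perceptron learning rule alone, with no backpropagation, since there are no hidden layers. A minimal sketch (learning rate and epoch count are illustrative):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)   # OR gate targets

w = np.zeros(2)
b = 0.0
lr = 0.1

# Perceptron rule: nudge weights by (target - prediction) * input
for _ in range(20):
    for xi, target in zip(X, y):
        pred = 1.0 if xi @ w + b > 0 else 0.0
        w += lr * (target - pred) * xi
        b += lr * (target - pred)

preds = [1.0 if xi @ w + b > 0 else 0.0 for xi in X]
print(preds)  # → [0.0, 1.0, 1.0, 1.0]
```

Because OR is linearly separable, the perceptron convergence theorem guarantees this loop finds a separating line in finitely many updates.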

What will be the value of the subgradient at 0 for the function |x|?

I am learning about lasso regression and came across taking the gradient at 0. I came to know about the subgradient but could not understand what its value will be at 0. In lasso regression, we ...
41 views
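The question above has a crisp answer: since |x| has a kink at 0, there is no single gradient there; instead, every slope g with g·x ≤ |x| for all x is a subgradient, and the subdifferential is the whole interval [−1, 1]:

```latex
\partial\,|x| \;=\;
\begin{cases}
\{\operatorname{sign}(x)\}, & x \neq 0,\\[2pt]
[-1,\,1], & x = 0.
\end{cases}
```

This interval is what makes the lasso solution sparse: the optimality condition can be satisfied with the coefficient sitting exactly at 0 whenever the corresponding correlation term falls inside [−λ, λ], which is where the soft-thresholding update comes from.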

Why isn't my gradient descent algorithm working?

I made a gradient descent algorithm in Python and it doesn't work. My m and b values keep increasing and never stop until I get the -inf error or the 'overflow encountered in square' error. import ...
10 views

Reason for convergence of gradient descent with standard scaler

I have two queries: Will the interpretation of the coefficients be the same before and after sklearn's StandardScaler()? And how do things change after using the standard scaler such that gradient ...
19 views

How to do a Gradient Search to Minimize the Coherence of this Matrix?

Thanks for looking into my question. I'm trying to create a function that takes in a matrix and, after a little while, returns the minimal value of a specific calculation. To be precise, I need something ...
31 views

Best Way to Overcome Early Convergence for Machine Learning Model

I have a machine learning model built that tries to predict weather data, and in this case I am doing a prediction on whether or not it will rain tomorrow (a binary prediction of Yes/No). In the ...
37 views

Gradient descent loop producing NaN in Matlab

I'm running a gradient descent loop to minimize a function, but my parameter vector w is calculated as NaN while it should be a numerical vector. This means that at some point the function is going to ...
30 views

Parameter representation in Python

I know this can be interpreted as vague, but I have to ask. I am working on a problem where theta is denoted by θ = {W, C, b}, where W is a matrix of size d*d, C is a vector belonging to R^d, and b is a ...
28 views

gensim Word2Vec - how to apply stochastic gradient descent?

To my understanding, batch (vanilla) gradient descent makes one parameter update for all training data. Stochastic gradient descent (SGD) allows you to update the parameters for each training sample, ...
34 views
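gensim keeps its Word2Vec training loop internal, but the batch-versus-stochastic distinction the question describes can be sketched on a toy least-squares problem (all names and hyperparameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

def batch_gd(w, lr=0.1, epochs=100):
    """One parameter update per full pass over the data."""
    for _ in range(epochs):
        grad = 2 * np.mean((X[:, 0] * w - y) * X[:, 0])
        w -= lr * grad
    return w

def sgd(w, lr=0.02, epochs=100):
    """One parameter update per training sample, in shuffled order."""
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            grad = 2 * (X[i, 0] * w - y[i]) * X[i, 0]
            w -= lr * grad
    return w

w_batch = batch_gd(0.0)
w_sgd = sgd(0.0)
print(w_batch, w_sgd)  # both approach the true slope, w ≈ 3
```

Both reach the same neighborhood; SGD simply trades noisier steps for far more of them per pass, which is why it dominates for large corpora like Word2Vec's.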

How can I supply custom gradient to torch.optim.LBFGS?

I have some research task for logistic-like regression: import torch import math import numpy as np from sklearn.datasets import make_moons from matplotlib import pyplot from pandas import DataFrame ...
44 views

Stopping criteria/rule for ADAM optimization in pytorch?

In the code below, we define two functions and then do some optimization using Adam and PyTorch. The code seems to work. However, we do a pre-defined number of iterations for the Adam optimization (...
36 views

How is a local minimum possible in gradient descent?

Gradient descent works on the mean squared error, which is the equation of a parabola, y = x^2. We often say that weight adjustment in a neural network by the gradient descent algorithm can hit a local ...
40 views
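The resolution to the question above is that the MSE is a parabola in the model's *output* (and in the weights of a linear model), but not in the *weights* of a multi-layer network: composing the loss with a nonlinear model makes the surface in weight space non-convex. A tiny demonstration, assuming a one-hidden-unit tanh network with hypothetical weights (w1, w2):

```python
import numpy as np

X = np.array([-1.0, 0.5, 2.0])
t = np.tanh(X)            # targets generated by the net with w1 = w2 = 1

def loss(w1, w2):
    """MSE of the two-weight network w2 * tanh(w1 * x) in weight space."""
    return np.mean((w2 * np.tanh(w1 * X) - t) ** 2)

print(loss(1, 1), loss(-1, -1), loss(0, 0))
# loss(1, 1) and loss(-1, -1) are both zero (sign-flip symmetry of tanh),
# yet loss(0, 0) between them is positive: the surface has two separate
# global minima with a ridge in between, so it cannot be a single parabola
```

Deeper networks multiply these symmetries (and add genuine saddle points and local minima), which is why gradient descent on the weights can get stuck even though the loss is convex in the prediction.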

Why are gradient descent results so far off lm results?

I'm playing around with the gradDescent package on some made-up data to get a feel for it. As I understand it, I should get similar results from both linear regression and gradient descent when ...
19 views

backpropagation with more than one node per layer

I read this article about how backpropagation works, and I understood everything it said. It said that to find the gradient we have to take a partial derivative of the cost function with respect to each weight/...
30 views

Why am I getting a negative cost function for logistic regression using gradient descent in python?

I'm trying to apply what I've learned in Andrew Ng's Coursera course. I've successfully implemented this same algorithm the same way I'm doing it here on the Kaggle Titanic Dataset, but now with this ...
40 views

Minimizing Function with vector valued input in MATLAB

I want to minimize a function like the one below. Here, n can be 5, 10, 50, etc. I want to use MATLAB, with gradient descent and a quasi-Newton method with the BFGS update, to solve this problem along with ...
40 views

If one captures the gradient with the optimizer, will it calculate the gradient twice?

I recently ran into a training performance bottleneck. I always add a lot of histograms to the summary. I want to know whether calculating gradients first and then re-minimizing the loss will calculate twice ...
42 views

Feed-forward neural network fails to classify due to dimensionality of biases

I'm making a basic feed-forward neural network to solve the XOR gate problem. Standard settings: input layer + hidden layer + output layer, a constant learning rate of 0.01, and the number of epochs is 500. ...
46 views

How exactly does this simple calculation of an ML gradient descent cost function work, using Octave/MATLAB?

I am following a machine learning course on Coursera and am doing the following exercise using Octave (MATLAB should be the same). The exercise is related to the calculation of the cost function ...