Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding the minimum of a function. It iteratively computes the gradient (the vector of partial derivatives) of the function and steps in the direction of the negative gradient, with a step length proportional to the gradient's magnitude. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is the model's error function.
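For illustration, here is a minimal sketch of that model-fitting use case in Python (the linear model, synthetic data, and learning rate are assumptions chosen for the example, not part of the tag definition):

```python
import numpy as np

# Toy data for a hypothetical example: y ≈ 3x + 0.5 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=100)

w, b = 0.0, 0.0   # model parameters
lr = 0.1          # step size (learning rate)

for _ in range(500):
    err = w * x + b - y              # per-sample model error
    grad_w = 2.0 * np.mean(err * x)  # partial derivative of MSE w.r.t. w
    grad_b = 2.0 * np.mean(err)      # partial derivative of MSE w.r.t. b
    w -= lr * grad_w                 # descend against the gradient
    b -= lr * grad_b

print(w, b)  # approaches 3.0 and 0.5
```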

0
votes
0 answers
13 views

Neuroevolution combined with gradient descent

To optimize a model, we can either use gradient descent to train it on a dataset, or we can use neuroevolution and generate only the initial weights without training them and use ...
0
votes
2 answers
35 views

How does backpropagation work in a CNN? [on hold]

Can somebody please explain how backpropagation works for a CNN with a max pooling layer and a convolutional layer with a stride greater than 1 and padding? And is there a way to compute these gradients ...
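For the max-pooling part of this question, a common way to route gradients is to pass the upstream gradient only to the element that won each pooling window. A minimal NumPy sketch, assuming 2x2 windows with stride 2 (not the asker's code):

```python
import numpy as np

def maxpool_backward(x, dout):
    """Route dout back through 2x2/stride-2 max pooling: only the
    argmax element of each window receives the upstream gradient."""
    dx = np.zeros_like(x)
    H, W = x.shape
    for i in range(0, H, 2):
        for j in range(0, W, 2):
            window = x[i:i+2, j:j+2]
            mask = window == window.max()          # winner of this window
            dx[i:i+2, j:j+2] += mask * dout[i//2, j//2]
    return dx

x = np.arange(16.0).reshape(4, 4)
dout = np.ones((2, 2))
print(maxpool_backward(x, dout))  # gradient lands on each window's max
```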
1
vote
0 answers
44 views

Neural Network Not Converging “anywhere”

In the past 2 weeks I've been trying to implement a Hand-Written-Digit Classifier with a Feed-Forward Neural Network, using the MNIST Database. The neural network uses Cross-Entropy loss, and Softmax ...
0
votes
2 answers
37 views

Why would I choose a loss function that differs from my metrics?

When I look through tutorials on the internet or at models posted here at SO, I often see that the loss function differs from the metrics used to evaluate the model. This might look like: model....
0
votes
0 answers
34 views

Tensorflow Haskell Linear Regression diverges

I've been looking into the tensorflow haskell bindings. However, I struggle to get the basic linear regression example from the README to work properly: it diverges on what seems to be a very easy ...
0
votes
0 answers
13 views

Method to bound/constrain a matrix elementwise in TensorFlow?

In TensorFlow, I currently have a matrix that represents the weights between layers of a Neural Network. However, I am trying to implement some kind of projected/constrained gradient descent, where ...
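One common approach to the kind of projection this question describes is to take an ordinary gradient step and then clip the weights elementwise back into the feasible set. A minimal TensorFlow 2 sketch, where the stand-in loss and the box bounds [-1, 1] are placeholders:

```python
import tensorflow as tf

W = tf.Variable(tf.random.normal([4, 4]))  # weight matrix to constrain

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(tf.square(W))     # stand-in loss
grad = tape.gradient(loss, W)

W.assign(W - 0.01 * grad)                  # ordinary gradient step
W.assign(tf.clip_by_value(W, -1.0, 1.0))   # project back into the box
```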
1
vote
1 answer
23 views

Steepest Descent Trace Behavior

I've written code that performs steepest descent on a quadratic form given by the formula: 1/2 * (x1^2 + gamma * x2^2). Mathematically, I am taking the equations given in Boyd's Convex Optimization ...
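For reference, a minimal Python sketch of steepest descent on this quadratic form with exact line search (the value of gamma and the starting point are assumptions, chosen in the spirit of Boyd's example):

```python
import numpy as np

gamma = 10.0                 # assumed conditioning parameter

def grad(x):                 # gradient of f(x) = 1/2 (x1^2 + gamma x2^2)
    return np.array([x[0], gamma * x[1]])

x = np.array([gamma, 1.0])   # assumed starting point
trace = [x.copy()]
for _ in range(50):
    g = grad(x)
    # exact line search: minimize f(x - t g) over t for this quadratic
    t = (g @ g) / (g[0] ** 2 + gamma * g[1] ** 2)
    x = x - t * g
    trace.append(x.copy())

print(x)  # approaches (0, 0); the trace zigzags when gamma != 1
```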
0
votes
0 answers
32 views

My vectorized implementation of gradient descent does not give me the right answer

I'm currently working on Andrew Ng's gradient descent exercise using Python, but it keeps giving me the wrong optimal theta. I followed this vectorization cheat sheet for gradient descent --- https://...
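For comparison, a commonly used vectorized form of the batch gradient descent update for linear regression (a sketch, not the cheat sheet's exact code; X is assumed to carry a leading column of ones):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1500):
    """Vectorized batch gradient descent for linear regression.
    X is (m, n) with a leading column of ones; y has shape (m,)."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
    return theta

# Tiny usage example on y = 2x + 1:
X = np.column_stack([np.ones(5), np.arange(5.0)])
y = 2.0 * np.arange(5.0) + 1.0
print(gradient_descent(X, y, alpha=0.1, num_iters=5000))  # ~[1., 2.]
```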
1
vote
0 answers
22 views

Execution time in Tensorflow higher than in Numpy

I'm reproducing a Gradient Descent example from Geron's Hands-On Machine Learning both with Tensorflow and Numpy. It's odd but even with GPU enabled, Numpy seems to be 5 to 6 times faster than ...
1
vote
0 answers
30 views

Unable to apply a condition on the output of a custom layer using the Keras layers module

I want to apply a condition on the output of a dense layer. For this, I tried to customize the Dense layer of Keras, but when I run my code I get the error ValueError: No gradients provided for any ...
-1
votes
1 answer
27 views

Short Definition of Backpropagation and Gradient Descent

I need to write a very short definition of backpropagation and gradient descent, and I'm a bit confused about what the difference is. Is the following definition correct? For calculating the weights of ...
1
vote
0 answers
20 views

Questions around XGBoost

I am trying to understand the XGBoost algorithm and have a few questions about it. I have read various blogs, but all seem to tell a different story. Below is a snippet from the code that I am using (...
0
votes
1 answer
23 views

Should I exit my gradient descent loop as soon as the cost increases?

I'm trying to learn machine learning so I'm taking a course and currently studying gradient descent for linear regression. I just learned that if the learning rate is small enough, the value returned ...
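A sketch of the stopping rule this question asks about, on a toy one-dimensional cost (compute_cost and step below are illustrative stand-ins, not course code); in practice a tolerance on the improvement is often combined with the cost-increase check:

```python
def compute_cost(theta):
    return (theta - 2.0) ** 2                  # toy convex cost (assumption)

def step(theta, lr=0.1):
    return theta - lr * 2.0 * (theta - 2.0)    # one gradient update

theta, prev_cost, tol = 0.0, float("inf"), 1e-12
for _ in range(10000):
    theta = step(theta)
    cost = compute_cost(theta)
    if cost > prev_cost:          # cost rose: learning rate too large
        break
    if prev_cost - cost < tol:    # negligible improvement: converged
        break
    prev_cost = cost

print(theta)  # close to the minimizer 2.0
```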
1
vote
1 answer
41 views

Linear Regression - Implementing Feature Scaling

I was trying to implement Linear Regression in Octave 5.1.0 on a data set relating the GRE score to the probability of Admission. The data set is of the sort: 337 0.92, 324 0.76, 316 0.72, ...
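For reference, a minimal feature-scaling sketch, in Python rather than Octave (mean normalization; the three GRE rows are taken from the excerpt above):

```python
import numpy as np

def feature_scale(X):
    """Mean normalization: zero mean, unit standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma   # keep mu/sigma to scale new inputs

X = np.array([[337.0], [324.0], [316.0]])  # GRE scores from the excerpt
X_norm, mu, sigma = feature_scale(X)
print(X_norm.ravel())
```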
0
votes
1 answer
31 views

First gradient descent: how to normalize X and Y?

I'm doing my first gradient descent ever, following a course about Machine Learning. But it doesn't seem to work correctly, as it oscillates (converges, then diverges, then converges...) and at the ...
1
vote
0 answers
25 views

Overflow of the square function during gradient descent calculation

I wrote linear regression (in one variable) along with gradient descent. It works fine for a smaller dataset, but for a larger dataset it gives the error: OverflowError: (34, 'Numerical ...
0
votes
0 answers
15 views

Why can't Forward Stagewise Additive Modeling work with absolute loss function?

In Forward Stagewise Additive Modeling, if the loss function is squared loss, the next weak learner is fit to the residual error. Why don't we do the same when the loss function is absolute error or ...
0
votes
0 answers
33 views

Numpy concatenation gives different results

I wrote a little script that performs polynomial gradient descent, and I am trying to reduce the size of the code, even if it diminishes readability. This piece of code works fine for fitting a curve ...
0
votes
0 answers
20 views

Why does using RMSE as the loss function take a non-convex form in logistic regression but not in linear regression?

I am taking the deep learning course from Andrew Ng. In the 3rd lecture of the 2nd week of the first course, he mentions that we can use RMSE for logistic regression as well, but it will take a non-convex ...
1
vote
1 answer
45 views

Multiple linear regression with gradient descent

Hello, I'm new to machine learning and Python, and I want to predict on the Kaggle House Sales in King County dataset with my own gradient descent. I'm splitting 70% (15k rows) for training and 30% (6k rows) ...
1
vote
1 answer
36 views

Initialize neural network weights with Tensorflow

I am developing a neural network model using Tensorflow. In the LOSO cross validation, I need to train a model for 10 folds, since I have data from 10 different subjects. Taking this into account, I ...
-2
votes
1 answer
19 views

Odd behavior of cost over time with SGD

I am relatively new to ML/DL and have been trying to improve my skills by making a model that learns the MNIST data set without TF or keras. I have 784 input nodes, 2 hidden layers of 16 neurons each, ...
0
votes
2 answers
45 views

Exploding gradient for gpflow SVGP

When optimizing an SVGP with a Poisson likelihood for a big data set, I see what I think are exploding gradients. After a few epochs I see a spiky drop of the ELBO, which then very slowly recovers after ...
1
vote
1 answer
36 views

Curve fitting with gradient descent

I wrote some code that performs gradient descent on a couple of data points. For some reason the curve is not converging correctly, but I have no idea why that is. I always end up with an exploding ...
0
votes
0 answers
24 views

SGDClassifier not giving as good a result as logistic regression

I am training a dataset with sklearn's LogisticRegression and SGDClassifier with log as the loss function, and I am using log loss as my evaluation metric. But with SGDClassifier it gives a very high ...
0
votes
1 answer
26 views

How do I properly train and predict value like biomass using GRU RNN?

My first time trying to train a dataset containing 8 variables in a time-series of 20 years or so using GRU RNN. The biomass value is what I'm trying to predict based on the other variables. I'm ...
0
votes
1 answer
13 views

Gradient descent with momentum formula

The following is from Andrew Ng's deep learning course, SGD with momentum. In the implementation details, the professor gave the updates below: v(dW) = beta * v(dW) + (1 - beta) dW; v(db) = beta * v(db) + (1 - beta) db; W ...
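A minimal, self-contained Python sketch of exactly that update rule on a toy one-parameter problem (the quadratic objective and hyperparameter values are assumptions):

```python
beta, lr = 0.9, 0.1          # beta as in the quoted update rule

def grad(w):                 # toy gradient: minimizes (w - 3)^2
    return 2.0 * (w - 3.0)

w, v_dw = 0.0, 0.0
for _ in range(300):
    dw = grad(w)
    v_dw = beta * v_dw + (1.0 - beta) * dw   # exponentially weighted average
    w = w - lr * v_dw                        # step along the smoothed gradient

print(w)  # approaches 3.0
```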
0
votes
1 answer
36 views

Problem while trying to write a vectorized matrix notation for the gradient descent algorithm

I was trying to write a vectorized notation for the iterative process of converging theta values in gradient descent algorithm. I found the vector notation but for some reason, the values are not ...
1
vote
1 answer
41 views

How do I get the right amount of change to the slope for my linear regression?

I want to program a linear regression with Processing. But I got mixed up about which parameters I have to multiply and then add to or subtract from my slope. I have tried to change the parameters (make them ...
7
votes
0 answers
115 views

Correct backpropagation in simple perceptron

Given the simple OR gate problem: or_input = np.array([[0,0], [0,1], [1,0], [1,1]]) or_output = np.array([[0,1,1,1]]).T If we train a simple single-layered perceptron (without backpropagation), we ...
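For reference, a single sigmoid unit trained by gradient descent learns the OR data quoted above; a minimal sketch (the learning rate, initialization, and cross-entropy-style update are assumptions, not the question's code):

```python
import numpy as np

or_input = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
or_output = np.array([[0, 1, 1, 1]], dtype=float).T

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 1))
b = 0.0
lr = 1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(1000):
    y_hat = sigmoid(or_input @ W + b)
    err = y_hat - or_output            # gradient of cross-entropy w.r.t. z
    W -= lr * or_input.T @ err / 4     # average over the 4 samples
    b -= lr * err.mean()

print(np.round(sigmoid(or_input @ W + b)))  # expect [[0], [1], [1], [1]]
```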
0
votes
0 answers
25 views

What is the value of the subgradient at 0 for the function |x|?

I am learning about Lasso Regression and came across taking the gradient at 0. I came to know about the subgradient but could not understand what its value will be at 0. In lasso regression, we ...
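For reference, the standard subdifferential of the absolute value, which is what this question is after:

```latex
\[
\partial\,|x| \;=\;
\begin{cases}
\{\operatorname{sign}(x)\}, & x \neq 0,\\
[-1,\,1], & x = 0,
\end{cases}
\qquad
\text{since } g \in \partial f(0) \iff |x| \ge g\,x \text{ for all } x.
\]
```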
-1
votes
2 answers
41 views

Why isn't my gradient descent algorithm working?

I made a gradient descent algorithm in Python and it doesn't work. My m and b values keep increasing and never stop until I get -inf or the 'overflow encountered in square' error. import ...
0
votes
0 answers
10 views

Reason for convergence of gradient descent with standard scaler

I have two queries: (1) Will the interpretation of the coefficients be the same before and after sklearn's StandardScaler()? (2) How do things change after using the standard scaler such that gradient ...
0
votes
0 answers
19 views

How to do a Gradient Search to Minimize the Coherence of this Matrix?

Thanks for looking into my question. I'm trying to create a function that takes in a matrix and, after a little while, returns the minimal value of a specific calculation. To be precise, I need something ...
-1
votes
2 answers
31 views

Best Way to Overcome Early Convergence for Machine Learning Model

I have a machine learning model built that tries to predict weather data, and in this case I am doing a prediction on whether or not it will rain tomorrow (a binary prediction of Yes/No). In the ...
1
vote
0 answers
37 views

Gradient descent loop producing NaN in Matlab

I'm running a gradient descent loop to minimize a function, but my parameter vector w is calculated as NaN while it should be a numerical vector. This means that at some point the function is going to ...
1
vote
0 answers
30 views

Parameter representation in Python

I know this can be interpreted as vague but I have to ask. I am working on a problem where theta is denoted by θ = {W,C,b} where W is a matrix of size d*d, C is a vector belonging to R^d and b is a ...
0
votes
1 answer
28 views

gensim Word2Vec - how to apply stochastic gradient descent?

To my understanding, batch (vanilla) gradient descent makes one parameter update for all training data. Stochastic gradient descent (SGD) allows you to update the parameters for each training sample, ...
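A toy least-squares contrast of the two schemes described above (illustrative only; this is not gensim's internal training code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])
theta = np.zeros(3)
lr = 0.1

# Batch (vanilla) gradient descent: one update per pass over all data.
theta = theta - lr * X.T @ (X @ theta - y) / len(y)

# Stochastic gradient descent: one update per training sample.
for xi, yi in zip(X, y):
    theta = theta - lr * (xi @ theta - yi) * xi
```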
0
votes
0 answers
34 views

How can I supply custom gradient to torch.optim.LBFGS?

I have some research task for logistic-like regression: import torch import math import numpy as np from sklearn.datasets import make_moons from matplotlib import pyplot from pandas import DataFrame ...
0
votes
0 answers
44 views

Stopping criteria/rule for Adam optimization in PyTorch?

In the code below, we define two functions and then do some optimization using Adam and PyTorch. The code seems to work. However, we do a pre-defined number of iterations of the Adam optimization (...
0
votes
0 answers
36 views

How are local minima possible in gradient descent?

Gradient descent works on the mean squared error, which is the equation of a parabola, y = x^2. We often say that weight adjustment in a neural network by the gradient descent algorithm can hit a local ...
1
vote
0 answers
40 views

Why are gradient descent results so far off lm results?

I'm playing around with the gradDescent package on some made-up data to get a feel for it. As I understand it, I should be getting similar results from both linear regression and gradient descent when ...
-2
votes
1 answer
19 views

Backpropagation with more than one node per layer

I read this article about how backpropagation works, and I understood everything they said. They said that to find the gradient we have to take a partial derivative of the cost function with respect to each weight/...
0
votes
1 answer
30 views

Why am I getting a negative cost function for logistic regression using gradient descent in python?

I'm trying to apply what I've learned in Andrew Ng's Coursera course. I've successfully implemented this same algorithm the same way I'm doing it here on the Kaggle Titanic Dataset, but now with this ...
1
vote
1 answer
40 views

Minimizing Function with vector valued input in MATLAB

I want to minimize a function like the one below. Here, n can be 5, 10, 50, etc. I want to use MATLAB, and I want to solve this problem with Gradient Descent and a Quasi-Newton method with BFGS updates, along with ...
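Since the objective itself is not shown, here is a hedged Python analogue using SciPy's BFGS on a placeholder function (the question asks for MATLAB, where fminunc plays a similar role):

```python
import numpy as np
from scipy.optimize import minimize

n = 10                                  # the question's n (5, 10, 50, ...)

def f(x):
    return np.sum((x - 1.0) ** 2)       # placeholder objective (assumption)

def grad_f(x):
    return 2.0 * (x - 1.0)              # its gradient

res = minimize(f, np.zeros(n), method="BFGS", jac=grad_f)
print(res.x)                            # minimizer of the placeholder
```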
0
votes
1 answer
40 views

If one captures the gradient with the Optimizer, will it calculate the gradient twice?

I recently hit a training performance bottleneck. I always add a lot of histograms to the summary. I want to know whether calculating the gradients first and then minimizing the loss will calculate the gradient twice ...
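In TF1, Optimizer.minimize() is documented as compute_gradients() followed by apply_gradients(), so computing the gradients yourself and reusing them does not double the work, as long as you do not also call minimize(). A minimal sketch with a stand-in loss:

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()
w = tf.compat.v1.get_variable("w", initializer=1.0)
loss = tf.square(w - 2.0)                          # stand-in loss

opt = tf.compat.v1.train.GradientDescentOptimizer(0.01)
grads_and_vars = opt.compute_gradients(loss)       # gradients built once
for g, v in grads_and_vars:
    tf.compat.v1.summary.histogram(v.op.name + "/gradient", g)
train_op = opt.apply_gradients(grads_and_vars)     # reuses those gradients
```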
0
votes
0 answers
42 views

Feed-forward neural network fails to classify due to dimensionality of biases

I'm making a basic feed-forward neural network to solve the XOR gate problem. Standard settings: input layer + hidden layer + output layer, a constant learning rate of 0.01, and 500 epochs. ...
0
votes
1 answer
46 views

How exactly does this simple calculation of an ML gradient descent cost function work, using Octave/MATLAB?

I am following a machine learning course on Coursera and I am doing the following exercise using Octave (MATLAB should be the same). The exercise is related to the calculation of the cost function ...
1
vote
0 answers
96 views

Gradient Descent Overshooting and Cost Blowing Up when used for Regularized Logistic Regression

I'm using MATLAB to code Regularized Logistic Regression and am using Gradient Descent to discover the parameters. All is based on Andrew Ng's Coursera Machine Learning course. I am trying to code the ...
1
vote
1 answer
66 views

Getting gradient descent to work in Octave (Andrew Ng's machine learning course, exercise 1)

So I am trying to implement/solve the first programming exercise from Andrew Ng's machine learning course on Coursera. I have trouble implementing linear gradient descent (for one variable) in Octave. I ...