# Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding the minimum of a function. It iteratively calculates partial derivatives (gradients) of the function and descends in steps proportional to those partial derivatives. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.
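The description above can be sketched in a few lines of plain Python (all names here are illustrative, not from any particular question): fitting y = w*x + b to data by stepping each parameter against its partial derivative of the mean-squared error.

```python
# Minimal gradient descent sketch: fit y = w*x + b by descending the
# mean-squared error. Learning rate and step count are arbitrary choices.

def fit_line(xs, ys, lr=0.01, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of MSE = (1/n) * sum((w*x + b - y)^2)
        dw = (2.0 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        db = (2.0 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        # Descend in steps proportional to the partial derivatives
        w -= lr * dw
        b -= lr * db
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # exactly y = 2x + 1
w, b = fit_line(xs, ys)
```

Here the "error function for the model" is the mean-squared error of a one-feature linear model; the same loop shape carries over to logistic regression and neural networks, only the gradient computation changes.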

**0**

votes

**1** answer

21 views

### Why am I getting a negative cost function for logistic regression using gradient descent in Python?

I'm trying to apply what I've learned in Andrew Ng's Coursera course. I've successfully implemented this same algorithm the same way I'm doing it here on the Kaggle Titanic Dataset, but now with this ...

**1**

vote

**1** answer

37 views

### Minimizing a function with vector-valued input in MATLAB

I want to minimize a function like below:
Here, n can be 5, 10, 50, etc. I want to use MATLAB with Gradient Descent and a Quasi-Newton method with a BFGS update to solve this problem, along with ...

**0**

votes

**1** answer

37 views

### If one captures the gradient with the Optimizer, will it calculate the gradient twice?

I recently hit a training performance bottleneck. I always add a lot of histograms to the summary. I want to know whether calculating gradients first and then minimizing the loss will calculate twice ...

**-2**

votes

**0** answers

29 views

### Do I have to have a velocity for each weight and bias in Nesterov accelerated gradient descent?

I'm coding a Neural Network in C++ and I got into trouble while initializing Nesterov accelerated gradient descent, because I don't know if each bias and weight in my whole NN has its own velocity (v).
...
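The usual convention, sketched here with illustrative names rather than the question's code: every trainable parameter (each weight and each bias) gets its own velocity entry, so the velocity container mirrors the parameter container shape-for-shape. The step below uses classical momentum for brevity; the Nesterov variant evaluates the gradient at the look-ahead point but keeps the same one-velocity-per-parameter bookkeeping.

```python
# One momentum update; `velocities` has exactly one entry per parameter.

def momentum_step(params, velocities, grads, lr=0.1, mu=0.9):
    for i in range(len(params)):
        velocities[i] = mu * velocities[i] - lr * grads[i]
        params[i] += velocities[i]
    return params, velocities

params = [0.5, -1.2, 3.0]          # e.g. two weights and a bias, flattened
velocities = [0.0] * len(params)   # same shape as params
grads = [0.1, -0.2, 0.3]
params, velocities = momentum_step(params, velocities, grads)
```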

**0**

votes

**1** answer

33 views

### How exactly does this simple calculation of an ML gradient descent cost function work in Octave/MATLAB?

I am following a machine learning course on Coursera and I am doing the following exercise using Octave (MATLAB should be the same).
The exercise is related to the calculation of the cost function ...

**1**

vote

**0** answers

61 views

### Gradient Descent Overshooting and Cost Blowing Up when used for Regularized Logistic Regression

I'm using MATLAB to code Regularized Logistic Regression and am using Gradient Descent to discover the parameters. All is based on Andrew Ng's Coursera Machine Learning course. I am trying to code the ...

**1**

vote

**1** answer

46 views

### Getting gradient descent to work in Octave (Andrew Ng's machine learning course, exercise 1)

So I am trying to implement/solve the first programming exercise from Andrew Ng's machine learning course on Coursera.
I have trouble implementing linear gradient descent (for one variable) in Octave. I ...

**0**

votes

**0** answers

21 views

### Scipy: Desired Error Not Achieved Due To Precision Loss From Absolute Values

I am attempting to solve a matrix factorisation problem with Scipy's nonlinear conjugate gradient descent implementation.
My problem attempts to solve for A and B in:
A @ np.transpose(B) = Y while ...

**1**

vote

**0** answers

32 views

### How to implement mini-batch gradient descent for maximum likelihood estimation in Python?

Currently, I have some code written that finds the combination of parameters that maximizes the log-likelihood function with some field data. The model now randomly selects the parameters out of a ...

**1**

vote

**0** answers

39 views

### Stochastic Gradient Descent in Python

I'm trying to implement stochastic gradient descent from scratch in Python in order to predict a specific polynomial function. I feel like I got the correct overall structure, but my weights (thetas) ...

**-3**

votes

**2** answers

15 views

### When do weights stop updating?

I'm implementing gradient descent for an assignment and am confused about when the weights are supposed to stop updating. Do I stop updating the weights when they don't change very much, i.e. when the ...
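A common convention (an assumption here, not taken from the question's assignment): stop when the update to the weights falls below a small tolerance, or after a maximum iteration cap, whichever comes first.

```python
# Descend a 1-D function via its gradient until the steps become negligible.

def minimize(grad, w, lr=0.1, tol=1e-8, max_iter=10000):
    for i in range(max_iter):
        step = lr * grad(w)
        w -= step
        if abs(step) < tol:      # weights have effectively stopped changing
            return w, i
    return w, max_iter           # fall back to the iteration cap

# f(w) = (w - 3)^2 has gradient 2*(w - 3) and its minimum at w = 3
w_min, iters = minimize(lambda w: 2.0 * (w - 3.0), w=0.0)
```

Checking the change in the loss instead of the change in the weights is an equally common stopping rule; either way, a `max_iter` guard prevents an infinite loop when the tolerance is never reached.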

**0**

votes

**1** answer

32 views

### Gradient Descent without derivative

So I'm trying to understand Gradient Descent and I'm confused. Suppose you have a parabola of the loss as you change a weight. Instead of taking the derivative at the point x we are at, why not ...
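One standard derivative-free workaround for the parabola setup above (a generic sketch, not the question's code): estimate the slope with a central finite difference and step against the estimate, exactly as ordinary gradient descent would step against the true derivative.

```python
# Gradient descent driven by a numeric slope estimate instead of an
# analytic derivative.

def numeric_slope(f, x, h=1e-5):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def descend(f, x, lr=0.1, steps=200):
    for _ in range(steps):
        x -= lr * numeric_slope(f, x)
    return x

loss = lambda w: (w - 2.0) ** 2   # parabola with its minimum at w = 2
w = descend(loss, x=10.0)
```

The cost is two extra function evaluations per parameter per step, which is why analytic or autodiff gradients are preferred when available.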

**0**

votes

**2** answers

27 views

### Gradient Descent for Linear Regression not finding optimal parameters

I am trying to implement the gradient descent algorithm to fit a straight line to noisy data, following the image taken from Andrew Ng's course.
First, I am declaring the noisy straight line ...

**0**

votes

**1** answer

30 views

### Sklearn implementation for batch gradient descent

What is the way of implementing batch gradient descent using sklearn for classification?
We have SGDClassifier for Stochastic GD, which takes a single instance at a time, and Linear/Logistic ...

**0**

votes

**0** answers

23 views

### How do I visualize the costs with different batch sizes in stochastic gradient descent, and what is wrong with the code?

I need to modify the code so that I can visualize the costs with different batch sizes. Each batch size would be a separate plot of average cost (y axis) versus number of trained samples (x axis).
...

**0**

votes

**1** answer

69 views

### Linear Regression model (using Gradient Descent) does not converge on Boston Housing Dataset

I've been trying to find out why my linear regression model performs poorly when compared to sklearn's linear regression model.
My linear regression model (update rules based on gradient descent)
w0 ...

**-2**

votes

**0** answers

16 views

### About machine learning and projects

I'm a beginner in Machine Learning. I just completed Andrew Ng's ML course.
Questions:
1) I was working on the Titanic dataset recently and noticed that in most of the kernels that were submitted, ...

**0**

votes

**0** answers

15 views

### Some questions about tensorflow GradientDescentOptimizer

I tried to test my code using tensorflow. I set the GradientDescentOptimizer learning rate equal to 1 and output the old weight, new weight, and gradients for Wx1, but the new weight != old weight - 1*gradients. I ...

**0**

votes

**0** answers

25 views

### Why is my implementation of gradient descent in Python producing outputs so slowly?

Why are the outputs from the code getting slower with every successive iteration?
I want to write working code that implements gradient descent and Newton's method on the same function, and I want to ...

**0**

votes

**1** answer

28 views

### Feature scaling in gradient descent with a single feature

I am writing code for linear regression in which my model will predict the price of houses on the basis of their area. So, I have only one feature, the area of the house, and my output is the price. My ...
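One common scaling choice for a setup like this (an illustrative sketch, not the question's code): standardize the single feature to zero mean and unit variance, which typically lets gradient descent take a larger learning rate without diverging.

```python
# Standardize a feature column: subtract the mean, divide by the
# standard deviation.

def standardize(xs):
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    std = var ** 0.5
    return [(x - mean) / std for x in xs]

areas = [1000.0, 1500.0, 2000.0]   # hypothetical house areas
scaled = standardize(areas)
```

Any parameters learned on the scaled feature must be mapped back (or new inputs scaled the same way) before predicting prices in original units.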

**0**

votes

**0** answers

28 views

### Getting NaN in total loss after backprop on first batch

I have a total_loss which is a sum of:
a BCELoss,
a cross-entropy loss, and
a custom loss function for the image gradient.
The problem I am facing is that after the 1st batch, some weights are updated to NaN, which ...

**0**

votes

**1** answer

32 views

### How to solve logistic regression using gradient descent in octave?

I am learning the Machine Learning course on Coursera by Andrew Ng. I have written code for logistic regression in Octave. But it is not working. Can someone help me?
I have taken the dataset ...

**1**

vote

**1** answer

14 views

### Loss over pixels

During backpropagation, will these cases have different effects:
sum up the loss over all pixels, then backpropagate;
average the loss over all pixels, then backpropagate;
backpropagate individually over all ...
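For the first two cases, the difference is only a constant factor: averaging divides the summed gradient by the pixel count N, so the descent direction is identical and the factor can be absorbed into the learning rate. A tiny sketch with a squared-error "pixel loss" (values here are made up for illustration):

```python
# Per-pixel gradients of a squared-error loss, reduced two ways.

preds  = [0.2, 0.9, 0.4]
labels = [0.0, 1.0, 1.0]
n = len(preds)

# d/dpred of (pred - label)^2 is 2 * (pred - label)
grads_sum  = [2.0 * (p - t) for p, t in zip(preds, labels)]   # sum reduction
grads_mean = [g / n for g in grads_sum]                        # mean reduction
```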

**2**

votes

**1** answer

61 views

### Machine learning gradient descent Python implementation

Problem
I have written this code, but it is giving these errors:
RuntimeWarning: overflow encountered in multiply
t2_temp = sum(x*(y_temp - y))
RuntimeWarning: overflow encountered in ...

**0**

votes

**0** answers

9 views

### How do I print the best score and the optimal values from the Hyperparameter Tuning for SGD Regressor (sklearn)?

Using hyperparameter tuning for SGD Regressor, I want to tune the following hyperparameters using the following values.
alpha: 0.1, 0.01, 0.001
learning_rate: "constant", "optimal"
l1_ratio': from ...

**-1**

votes

**1** answer

40 views

### Are there alternatives to backpropagation?

I know a neural network can be trained using gradient descent and I understand how it works.
Recently, I stumbled upon other training algorithms: conjugate gradient and quasi-Newton algorithms.
I ...

**0**

votes

**0** answers

26 views

### Using tf.py_func as loss function to implement gradient descent

I'm trying to use tf.train.GradientDescentOptimizer().minimize(loss) to get the minimum value of the loss function. But the loss function is very complicated and I need to use numpy to calculate the ...

**0**

votes

**0** answers

15 views

### Unaggregated gradients in tensorflow again

My question is about tf.gradients(ys, xs), which always returns sum(dy/dx) over all y in ys. The summing is implicit and there does not seem to be an official way to get a list of gradients for ...

**3**

votes

**0** answers

82 views

### How can I implement this L1 norm Robust PCA equation in a more efficient way?

I recently learned in class that the Principal Component Analysis method aims to approximate a matrix X by a product of two matrices Z*W. If X is an n x d matrix, Z is an n x k matrix and W is a k x d ...

**-1**

votes

**0** answers

13 views

### Implementation of the Natasha2 algorithm by Allen-Zhu?

I wanted to ask if anyone knows of an implementation of the Natasha2 algorithm (introduced in this paper by Zeyuan Allen-Zhu: https://arxiv.org/abs/1708.08694)
Natasha2 uses Oja's algorithm to ...

**3**

votes

**0** answers

64 views

### How can I add custom gradients in the Haskell autodifferentiation library “ad”?

If I want to give a custom or known gradient for a function, how can I do that in the ad library? (I don't want to autodifferentiate through this function.) I am using the grad function in this ...

**2**

votes

**3** answers

41 views

### Linear regression using gradient descent algorithm, getting unexpected results

I'm trying to create a function which returns the values of θ0 and θ1 of the hypothesis function of linear regression. But I'm getting different results for different initial (random) values ...

**-1**

votes

**0** answers

58 views

### Problem with calculating Gradient Descent with Python Numpy

I had trouble when trying to find the Gradient Descent of some data.
I have two lists, x and y:
x = [4512. 3738. 4261. 3777. 4177. 3585. 3785. 3559. 3613. 3982. 3443. 3993........ ](size is 200)
y =...

**2**

votes

**2** answers

108 views

### Gradient Descent implementation in Python?

I have tried to implement gradient descent, and it was working properly when I tested it on a sample dataset, but it's not working properly for the Boston dataset.
Can you verify what's wrong with the code? ...

**0**

votes

**0** answers

20 views

### 'numpy.int64' object has no attribute 'dot'

Earlier my error was
'numpy.float64' object has no attribute 'dot'
so I modified my code as follows:
def nonlin(X,deriv=False):
    if(deriv==True):
        return x*(1-x)
    return 1/(1+np.exp(-x))
np.random.seed(...

**0**

votes

**1** answer

33 views

### Reinforcement learning cost function

Newbie question:
I am writing an OpenAI Gym pong player with TensorFlow and thus far have been able to create the network based on a random initialization so that it would randomly return to move the ...

**0**

votes

**1** answer

30 views

### Tensorflow with gradient descent results in wrong coefficients

Currently, I am trying to construct a linear regression that uses the birth rate (x) as a predictor of life expectancy (y).
y=w*x+b
The dataset could be found here: Dataset
Here is an online link ...

**-1**

votes

**1** answer

44 views

### How to do a gradient descent problem (machine learning)?

Could somebody please explain how to do a gradient descent problem WITHOUT the context of the cost function? I have seen countless tutorials that explain gradient descent using the cost function, but ...

**0**

votes

**2** answers

65 views

### Why doesn't the learning rate (LR) go below 1e-08 in pytorch?

I am training a model. To overcome overfitting I have done optimization, data augmentation, etc. I have an updated LR (I tried both SGD and Adam), and when there is a plateau (I also tried step), ...

**0**

votes

**0** answers

8 views

### Parameter values theta keep increasing in gradient descent

Below is the code I am using for gradient descent. When initializing x as the 1st row, the code works fine and I get the proper thetas, but when using the x as defined in the 2nd row, all the rest ...

**1**

vote

**1** answer

56 views

### Neural Network makes same predictions for different instances with different features

Out of interest, I created (or at least tried to create) an Artificial Neural Network with four layers as a classifier for the famous Iris flower data set. The target values vary from 0 to 2 as labels ...

**1**

vote

**0** answers

33 views

### nesterov momentum gradient calculation at predicted point

In Nesterov momentum, the gradient of the error function with respect to the parameters is calculated at a point different from where the cost was calculated; that is, the model jumps ahead a ...
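The look-ahead described above can be sketched in a few lines (illustrative names, one scalar parameter for clarity): the gradient is evaluated at the predicted point theta + mu*v rather than at theta itself.

```python
# One Nesterov momentum step on a scalar parameter.

def nesterov_step(theta, v, grad, lr=0.1, mu=0.9):
    lookahead = theta + mu * v          # jump ahead along the velocity
    v = mu * v - lr * grad(lookahead)   # gradient at the predicted point
    theta = theta + v
    return theta, v

# quadratic loss (theta - 4)^2, gradient 2*(theta - 4)
g = lambda t: 2.0 * (t - 4.0)
theta, v = 0.0, 0.0
for _ in range(300):
    theta, v = nesterov_step(theta, v, g)
```

Classical momentum would call `grad(theta)` instead of `grad(lookahead)`; that single change is the whole difference between the two methods.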

**0**

votes

**0** answers

29 views

### Problem with gradient checking in deep neural network

I'm currently writing code for a deep neural network. I've implemented forward prop and backprop. To check that my backpropagation was done correctly, I implemented gradient checking. The difference ...
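The usual check, sketched generically rather than from the question's code: compare the backprop gradient against a centered finite difference and inspect the relative difference. Values around 1e-7 or smaller typically indicate a correct implementation; the thresholds are conventions, not hard rules.

```python
# Relative difference between an analytic gradient and a
# central-difference estimate.

def grad_check(f, analytic_grad, x, h=1e-5):
    numeric = (f(x + h) - f(x - h)) / (2.0 * h)
    denom = max(abs(numeric) + abs(analytic_grad), 1e-12)  # avoid div by 0
    return abs(numeric - analytic_grad) / denom

f = lambda x: x ** 3            # derivative is 3x^2
diff = grad_check(f, analytic_grad=12.0, x=2.0)   # 3 * 2^2 = 12
```

In a real network the same comparison is run coordinate-by-coordinate over the flattened parameter vector, perturbing one parameter at a time.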

**0**

votes

**0** answers

23 views

### Gradient Descent cost function explosion

I am writing code for linear regression and using Gradient Descent to minimize the RSS. The cost function seems to explode to infinity within 12 iterations. I know this is not supposed to happen. ...

**0**

votes

**2** answers

760 views

### How to properly do gradient clipping in pytorch?

What is the correct way to perform gradient clipping in pytorch?
I have an exploding gradients problem, and I need to program my way around it.
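A framework-agnostic sketch of what norm clipping does for an exploding-gradients problem like this: if the overall gradient norm exceeds a chosen max_norm, rescale every gradient by max_norm / total_norm before the update. (In PyTorch this is bundled up in `torch.nn.utils.clip_grad_norm_`, applied between `backward()` and the optimizer step.)

```python
import math

# Rescale a flat list of gradients so their L2 norm is at most max_norm.

def clip_by_norm(grads, max_norm=1.0):
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)   # norm was 5.0
```

Clipping by norm preserves the gradient's direction, which is why it is usually preferred over clipping each component to a fixed range.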

**-1**

votes

**1** answer

35 views

### Why does the intercept parameter increase in an unexpected direction?

I'm doing 2 gradient descent iterations (initial conditions: learning_rate = 0.1 and [w0,w1] = [0,0]) to find the 2 parameters (y_hat = w0 + w1*x) of a linear model that fits a simple dataset, x=[0,1,2,...

**0**

votes

**0** answers

24 views

### Which layers are more intolerant to error in neural networks?

I am doing research and am curious about the impact of gradient descent on layers individually. As we all know, gradient descent always tries to take us to the global minimum of the valley. However, ...

**-1**

votes

**1** answer

32 views

### What is the value of the cost function J(0,1) with a particular training set?

I am going through a Machine Learning class on Coursera and I have trouble getting the correct answer on the following task:
For this question, assume that we are using the training set:
x, y
3, 2
1, ...

**0**

votes

**0** answers

32 views

### Tensorflow: generate input to obtain desired output

I am trying to apply gradient descent on the input variable in my TF model to make the model output an arbitrary value. I first train the model with real data, then generate a random array to obtain a ...

**1**

vote

**0** answers

42 views

### How do I correctly define a custom STE gradient in Flux?

I am trying to write a custom STE gradient using Flux. The activation is basically just the sign() function, and its gradient is the incoming gradient as is iff its absolute value is <=1, and ...