Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding the minimum of a function. It iteratively calculates partial derivatives (gradients) of the function and descends in steps proportional to those partial derivatives. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.
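
For example, a minimal sketch in Python of fitting y = w*x + b to data by gradient descent on the mean squared error (the learning rate and step count are illustrative):

    import numpy as np

    def fit(x, y, lr=0.01, steps=1000):
        # Fit y = w*x + b by gradient descent on the mean squared error.
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        w, b = 0.0, 0.0
        n = len(x)
        for _ in range(steps):
            err = (w * x + b) - y                # residuals
            grad_w = (2.0 / n) * np.dot(err, x)  # d(MSE)/dw
            grad_b = (2.0 / n) * err.sum()       # d(MSE)/db
            w -= lr * grad_w                     # step against the gradient
            b -= lr * grad_b
        return w, b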

0 votes · 1 answer · 21 views

Why am I getting a negative cost function for logistic regression using gradient descent in Python?

I'm trying to apply what I've learned in Andrew Ng's Coursera course. I've successfully implemented this same algorithm the same way I'm doing it here on the Kaggle Titanic Dataset, but now with this ...
1 vote · 1 answer · 37 views

Minimizing a function with vector-valued input in MATLAB

I want to minimize a function like the one below. Here, n can be 5, 10, 50, etc. I want to use MATLAB, with gradient descent and a quasi-Newton method with the BFGS update, to solve this problem along with ...
0 votes · 1 answer · 37 views

If one captures the gradient with the Optimizer, will it calculate the gradient twice?

I recently hit a training performance bottleneck. I always add a lot of histograms to the summary. I want to know whether calculating the gradients first and then minimizing the loss will calculate twice ...
-2 votes · 0 answers · 29 views

Do I have to have a velocity for each weight and bias in Nesterov accelerated gradient descent?

I'm coding a neural network in C++ and I got into trouble while initializing Nesterov accelerated gradient descent, because I don't know if each bias and weight in my whole NN has its own velocity (v). ...
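
For what it's worth, in the standard formulation every parameter does get its own velocity, with the same shape as the parameter it updates. A NumPy sketch of the idea (the shapes and the grad_fn callback are illustrative, not taken from the question's C++ code):

    import numpy as np

    W = np.random.randn(4, 3)  # example weight matrix
    b = np.zeros(3)            # example bias vector
    vW = np.zeros_like(W)      # one velocity entry per weight
    vb = np.zeros_like(b)      # one velocity entry per bias

    def nesterov_step(param, vel, grad_fn, lr=0.01, mu=0.9):
        g = grad_fn(param + mu * vel)  # gradient at the look-ahead point
        vel[:] = mu * vel - lr * g     # in-place velocity update
        param += vel                   # move the parameter

    nesterov_step(W, vW, grad_fn=lambda p: 2 * p)  # e.g. gradient of sum(p**2)
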
0 votes · 1 answer · 33 views

How exactly does this simple calculation of an ML gradient descent cost function work in Octave/MATLAB?

I am following a machine learning course on Coursera and I am doing the following exercise using Octave (MATLAB should be the same). The exercise is related to the calculation of the cost function ...
1 vote · 0 answers · 61 views

Gradient Descent Overshooting and Cost Blowing Up when used for Regularized Logistic Regression

I'm using MATLAB to code Regularized Logistic Regression and am using Gradient Descent to discover the parameters. All is based on Andrew Ng's Coursera Machine Learning course. I am trying to code the ...
1 vote · 1 answer · 46 views

Getting gradient descent to work in Octave (Andrew Ng's machine learning course, exercise 1)

So I am trying to implement/solve the first programming exercise from Andrew Ng's machine learning course on Coursera. I am having trouble implementing linear gradient descent (for one variable) in Octave. I ...
0 votes · 0 answers · 21 views

Scipy: Desired Error Not Achieved Due To Precision Loss From Absolute Values

I am attempting to solve a matrix factorisation problem with Scipy's nonlinear conjugate gradient descent implementation. My problem attempts to solve for A and B in: A @ np.transpose(B) = Y while ...
1 vote · 0 answers · 32 views

How to implement mini-batch gradient descent for maximum likelihood estimation in Python?

Currently, I have some code written that finds the combination of parameters that maximizes the log-likelihood function with some field data. The model now randomly selects the parameters out of a ...
1 vote · 0 answers · 39 views

Stochastic Gradient Descent in Python

I'm trying to implement stochastic gradient descent from scratch in Python in order to predict a specific polynomial function. I feel like I got the correct overall structure, but my weights (thetas) ...
-3 votes · 2 answers · 15 views

When do weights stop updating?

I'm implementing gradient descent for an assignment and am confused about when the weights are supposed to stop updating. Do I stop updating the weights when they don't change very much, i.e. when the ...
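
A common convention (a sketch of one option, not necessarily what the assignment requires) is to stop when the update falls below a tolerance, with a cap on the number of iterations:

    import numpy as np

    def descend(w, grad_fn, lr=0.01, tol=1e-6, max_iter=10000):
        for _ in range(max_iter):
            step = lr * grad_fn(w)
            w = w - step
            if np.linalg.norm(step) < tol:  # weights barely changed: stop
                break
        return w
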
0 votes · 1 answer · 32 views

Gradient Descent without derivative

So I'm trying to understand gradient descent and I'm confused. Suppose you have a parabola that plots the loss as you change a weight. Instead of taking the derivative at the point x we are at, why not ...
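
For comparison, one derivative-free variant approximates the slope numerically with central finite differences, at the cost of extra loss evaluations per step; a sketch with an illustrative one-dimensional loss:

    def numerical_slope(f, x, h=1e-5):
        # Central difference: no analytic derivative needed.
        return (f(x + h) - f(x - h)) / (2 * h)

    loss = lambda w: (w - 3.0) ** 2  # illustrative parabola
    w = 0.0
    for _ in range(1000):
        w -= 0.1 * numerical_slope(loss, w)
    # w ends up close to the minimum at 3.0
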
0 votes · 2 answers · 27 views

Gradient Descent for Linear Regression not finding optimal parameters

I am trying to implement the gradient descent algorithm to fit a straight line to noisy data, following the image below taken from Andrew Ng's course. First, I am declaring the noisy straight line ...
0 votes · 1 answer · 30 views

Sklearn implementation for batch gradient descent

What is the way of implementing batch gradient descent using sklearn for classification? We have SGDClassifier for stochastic GD, which takes a single instance at a time, and Linear/Logistic ...
0 votes · 0 answers · 23 views

How do I visualize the costs with different batch sizes in stochastic gradient descent, and what is wrong with the code?

I need to modify the code so that I can visualize the costs with different batch sizes. Each batch size would be a separate plot of average cost (y-axis) versus number of trained samples (x-axis). ...
0 votes · 1 answer · 69 views

Linear Regression model (using Gradient Descent) does not converge on Boston Housing Dataset

I've been trying to find out why my linear regression model performs poorly when compared to sklearn's linear regression model. My linear regression model (update rules based on gradient descent) w0 ...
-2 votes · 0 answers · 16 views

about machine learning and projects

I'm a beginner in machine learning. I just completed Andrew Ng's ML course. Questions: 1) I was working on the Titanic dataset recently and noticed that in most of the kernels that were submitted, ...
0 votes · 0 answers · 15 views

Some questions about the TensorFlow GradientDescentOptimizer

I tried to test my code using TensorFlow. I set the GradientDescentOptimizer learning rate equal to 1 and output the Wx1 old weight, new weight, and gradients, but the new weight != old weight - 1*gradients. I ...
0 votes · 0 answers · 25 views

Why is my implementation of gradient descent in Python producing outputs so slowly?

Why are the outputs from the code getting slower with every successive iteration? I want to write working code that implements gradient descent and Newton's method on the same function, and I want to ...
0 votes · 1 answer · 28 views

Feature scaling in gradient descent with a single feature

I am writing code for linear regression in which my model will predict the price of houses on the basis of area. So, I have only one feature, the area of the house, and my output is the price. My ...
0 votes · 0 answers · 28 views

Getting nan in total loss after backprop on first batch

I have a total_loss which is the sum of: a BCELoss, a cross-entropy loss, and a custom loss function for the image gradient. The problem I am facing is that after the 1st batch, some weights are updated to nan, which ...
0 votes · 1 answer · 32 views

How to solve logistic regression using gradient descent in Octave?

I am taking the Machine Learning course on Coursera by Andrew Ng. I have written code for logistic regression in Octave, but it is not working. Can someone help me? I have taken the dataset ...
1 vote · 1 answer · 14 views

Loss over pixels

During backpropagation, will these cases have a different effect: (1) sum the loss over all pixels, then backpropagate; (2) average the loss over all pixels, then backpropagate; (3) backpropagate individually over all ...
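
By linearity of the gradient, the sum and the mean differ only by the constant factor 1/N, which can be absorbed into the learning rate; a PyTorch sketch (the shapes are illustrative):

    import torch

    pred = torch.rand(8, 8, requires_grad=True)
    target = torch.rand(8, 8)

    loss = ((pred - target) ** 2).mean()  # mean over all pixels
    loss.backward()
    # pred.grad now equals the sum-over-pixels gradient divided by
    # pred.numel(); backpropagating each pixel's loss separately and
    # adding the results reproduces the sum case.
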
2 votes · 1 answer · 61 views

Machine learning gradient descent Python implementation

Problem: I have written this code, but it is giving errors: RuntimeWarning: overflow encountered in multiply t2_temp = sum(x*(y_temp - y)) RuntimeWarning: overflow encountered in ...
0 votes · 0 answers · 9 views

How do I print the best score and the optimal values from hyperparameter tuning for SGD Regressor (sklearn)?

Using hyperparameter tuning for SGD Regressor, I want to tune the following hyperparameters using the following values: alpha: 0.1, 0.01, 0.001; learning_rate: "constant", "optimal"; l1_ratio: from ...
-1 votes · 1 answer · 40 views

Are there alternatives to backpropagation?

I know a neural network can be trained using gradient descent and I understand how it works. Recently, I stumbled upon other training algorithms: conjugate gradient and quasi-Newton algorithms. I ...
0 votes · 0 answers · 26 views

Using tf.py_func as loss function to implement gradient descent

I'm trying to use tf.train.GradientDescentOptimizer().minimize(loss) to get the minimum value of the loss function. But the loss function is very complicated and I need to use numpy to calculate the ...
0 votes · 0 answers · 15 views

Unaggregated gradients in TensorFlow again

My question is about tf.gradients(ys, xs), which always returns sum(dy/dx) for all y in ys. The summing is implicit, and there does not seem to be an official way to get a list of gradients for ...
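
One workaround sketch (TF 1.x graph style; the tensors are illustrative) is simply to call tf.gradients once per element of ys, trading extra graph ops for separated gradients:

    import tensorflow as tf  # TF 1.x graph-style API

    x = tf.placeholder(tf.float32, shape=[3])
    ys = [x[0] * x[1], x[1] + x[2]]

    # tf.gradients(ys, [x]) would return the implicit sum over ys;
    # one call per y keeps the gradients separate.
    per_y_grads = [tf.gradients(y, [x])[0] for y in ys]
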
3 votes · 0 answers · 82 views

How can I implement this L1 norm Robust PCA equation in a more efficient way?

I recently learned in class that the Principal Component Analysis method aims to approximate a matrix X as a multiplication of two matrices Z*W. If X is an n x d matrix, Z is an n x k matrix and W is a k x d ...
-1 votes · 0 answers · 13 views

Implementation of the Natasha2 algorithm by Allen-Zhu?

I wanted to ask if anyone knows of an implementation of the Natasha2 algorithm (introduced in this paper by Zeyuan Allen-Zhu: https://arxiv.org/abs/1708.08694). Natasha2 uses Oja's algorithm to ...
3 votes · 0 answers · 64 views

How can I add custom gradients in the Haskell autodifferentiation library “ad”?

If I want to give a custom or known gradient for a function, how can I do that in the ad library? (I don't want to autodifferentiate through this function.) I am using the grad function in this ...
2 votes · 3 answers · 41 views

Linear regression using gradient descent algorithm, getting unexpected results

I'm trying to create a function which returns the values of θ0 and θ1 of the hypothesis function of linear regression, but I'm getting different results for different initial (random) values ...
-1 votes · 0 answers · 58 views

Problem with calculating gradient descent with Python/NumPy

I ran into trouble when trying to compute gradient descent on some data. I have two lists, x and y: x = [4512. 3738. 4261. 3777. 4177. 3585. 3785. 3559. 3613. 3982. 3443. 3993........ ](size is 200) y =...
2 votes · 2 answers · 108 views

Gradient descent implementation in Python?

I have tried to implement gradient descent and it was working properly when I tested it on a sample dataset, but it's not working properly for the Boston dataset. Can you verify what's wrong with the code? ...
0 votes · 0 answers · 20 views

'numpy.int64' object has no attribute 'dot'

Earlier my error was 'numpy.float64' object has no attribute 'dot', so I modified my code as follows: def nonlin(X,deriv=False): if(deriv==True): return x*(1-x) return 1/(1+np.exp(-x)) np.random.seed(...
0 votes · 1 answer · 33 views

Reinforcement learning cost function

Newbie question: I am writing an OpenAI Gym Pong player with TensorFlow and thus far have been able to create the network based on a random initialization so that it would randomly return to move the ...
0 votes · 1 answer · 30 views

TensorFlow with gradient descent results in wrong coefficients

Currently, I am trying to construct a linear regression that uses birth rate (x) as a predictor of life expectancy (y): y = w*x + b. The dataset can be found here: Dataset. Here is an online link ...
-1 votes · 1 answer · 44 views

How to do a gradient descent problem (machine learning)?

Could somebody please explain how to do a gradient descent problem WITHOUT the context of the cost function? I have seen countless tutorials that explain gradient descent using the cost function, but ...
0 votes · 2 answers · 65 views

Why doesn't the learning rate (LR) go below 1e-08 in PyTorch?

I am training a model. To overcome overfitting I have done optimization, data augmentation, etc. I have an updated LR (I tried both SGD and Adam), and when there is a plateau (I also tried step), ...
0 votes · 0 answers · 8 views

Parameter values theta keep increasing in gradient descent

Below is the code I am using for gradient descent. When initializing x as the 1st row, the code works fine and I get the proper thetas, but when using x as defined in the 2nd row, all the rest ...
1 vote · 1 answer · 56 views

Neural Network makes same predictions for different instances with different features

Out of interest, I created (or at least tried to create) an Artificial Neural Network with four layers as a classifier for the famous Iris flower data set. The target values vary from 0 to 2 as labels ...
1 vote · 0 answers · 33 views

Nesterov momentum gradient calculation at the predicted point

In Nesterov momentum, the gradient of the error function with respect to the parameters is calculated at a point different from the one where the cost was calculated - that is, the model jumps ahead a ...
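
For reference, one common way to write the update (notation illustrative: mu is the momentum coefficient, lr the learning rate, E the error function):

    v_{t+1} = mu * v_t - lr * grad E(w_t + mu * v_t)   (gradient at the look-ahead point)
    w_{t+1} = w_t + v_{t+1}
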
0 votes · 0 answers · 29 views

Problem with gradient checking in deep neural network

I'm currently writing code for a deep neural network. I've implemented forward prop and backprop. To check that my backpropagation was done correctly, I implemented gradient checking. The difference ...
0 votes · 0 answers · 23 views

Gradient Descent cost function explosion

I am writing this code for linear regression and trying Gradient Descent to minimize the RSS. The cost function seems to explode to infinity within 12 iterations. I know this is not supposed to happen....
0 votes · 2 answers · 760 views

How to properly do gradient clipping in PyTorch?

What is the correct way to perform gradient clipping in PyTorch? I have an exploding gradients problem, and I need to program my way around it.
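
The utilities usually pointed to are torch.nn.utils.clip_grad_norm_ and clip_grad_value_, applied between backward() and step(); a minimal sketch (the model and threshold are illustrative):

    import torch

    model = torch.nn.Linear(10, 1)  # illustrative model
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    loss = model(torch.randn(4, 10)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    # Rescale all gradients so their global L2 norm is at most 1.0.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    opt.step()
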
-1 votes · 1 answer · 35 views

Why does the intercept parameter increase in an unexpected direction?

I'm doing 2 gradient descent iterations (initial condition: learning_rate = 0.1, and [w0,w1] = [0,0]) to find the 2 parameters (y_hat = w0 + w1*x) for linear model that fits a simple dataset, x=[0,1,2,...
0 votes · 0 answers · 24 views

Which layers are more intolerant to error in neural networks?

I am doing research and am curious about the impact of gradient descent on layers individually. As we all know, gradient descent always tries to take us to the global minimum of the valley. However, ...
-1 votes · 1 answer · 32 views

What is the value of the cost function J(0,1) with a particular training set?

I am going over a machine learning class on Coursera and I am having trouble getting the correct answer on the following task: For this question, assume that we are using the training set: x, y: 3, 2; 1, ...
0 votes · 0 answers · 32 views

TensorFlow: generate input to obtain desired output

I am trying to apply gradient descent on the input variable in my TF model to make the model output an arbitrary value. I first train the model with real data, then generate a random array to obtain a ...
1 vote · 0 answers · 42 views

How do I correctly define a custom STE gradient in Flux?

I am trying to write a custom STE gradient using Flux. The activation is basically just the sign() function, and its gradient is the incoming gradient as is iff its absolute value is <=1, and ...