UFLDL Softmax Regression Exercise: Programming Solution
Published: 2019-05-25


For the principles behind softmax regression, please refer to the UFLDL tutorial:

Step 0: Initialize constants and parameters

We've provided the code for this step in softmaxExercise.m.

Two constants, inputSize and numClasses, corresponding to the size of each input vector and the number of class labels have been defined in the starter code. This will allow you to reuse your code on a different data set in a later exercise. We also initialize lambda, the weight decay parameter here.
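For concreteness, the initialization in the starter code amounts to something like the sketch below (the specific values are the ones implied by this MNIST exercise, not a general requirement):

% Sketch of Step 0, assuming the MNIST setup described in this exercise
inputSize = 28 * 28;   % size of each input vector (28x28 pixel images)
numClasses = 10;       % number of class labels (digits, with 0 remapped to 10)
lambda = 1e-4;         % weight decay parameter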

Step 1: Load data

The starter code loads the MNIST images and labels into inputData and labels respectively. The images are pre-processed to scale the pixel values to the range [0,1], and the label 0 is remapped to 10 for convenience of implementation, so that the labels take values in \{1, 2, \ldots, 10\}. You will not need to change any code in this step for this exercise, but note that your code should be general enough to operate on data of arbitrary size belonging to any number of classes.
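The remapping of the 0 label, for example, can be done with a single indexing assignment; the sketch below just illustrates what the starter code already does:

labels(labels == 0) = 10;   % remap digit 0 to class 10 so labels lie in {1, ..., 10}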

Step 2: Implement softmaxCost

In softmaxCost.m, implement code to compute the softmax cost function J(θ). Remember to include the weight decay term in the cost as well. Your code should also compute the appropriate gradients, as well as the predictions for the input data (which will be used in the cross-validation step later).

It is important to vectorize your code so that it runs quickly. We also provide several implementation tips below:

Note: In the provided starter code, theta is a matrix where the jth row is \theta_j^T.

Implementation Tip: Computing the ground truth matrix - In your code, you may need to compute the ground truth matrix M, such that M(r, c) is 1 if y(c) = r and 0 otherwise. This can be done quickly, without a loop, using the MATLAB functions sparse and full. Specifically, the command M = sparse(r, c, v) creates a sparse matrix such that M(r(i), c(i)) = v(i) for all i. That is, the vectors r and c give the position of the elements whose values we wish to set, and v the corresponding values of the elements. Running full on a sparse matrix gives a "full" representation of the matrix for use (meaning that Matlab will no longer try to represent it as a sparse matrix in memory). The code for using sparse and full to compute the ground truth matrix has already been included in softmaxCost.m.
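For reference, the sparse/full idiom described above boils down to a one-liner along these lines (essentially what softmaxCost.m already contains):

% groundTruth(r, c) = 1 if labels(c) == r, and 0 otherwise
groundTruth = full(sparse(labels, 1:numCases, 1));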

Implementation Tip: Preventing overflows - in softmax regression, you will have to compute the hypothesis

\begin{align} h(x^{(i)}) = \frac{1}{ \sum_{j=1}^{k}{e^{ \theta_j^T x^{(i)} }} }\begin{bmatrix} e^{ \theta_1^T x^{(i)} } \\e^{ \theta_2^T x^{(i)} } \\\vdots \\e^{ \theta_k^T x^{(i)} } \\\end{bmatrix}\end{align}

When the products \theta_j^T x^{(i)} are large, the exponential function e^{\theta_j^T x^{(i)}} will become very large and possibly overflow. When this happens, you will not be able to compute your hypothesis. However, there is an easy solution - observe that we can multiply the top and bottom of the hypothesis by some constant without changing the output:

\begin{align} h(x^{(i)}) &= \frac{1}{ \sum_{j=1}^{k}{e^{ \theta_j^T x^{(i)} }} }\begin{bmatrix} e^{ \theta_1^T x^{(i)} } \\e^{ \theta_2^T x^{(i)} } \\\vdots \\e^{ \theta_k^T x^{(i)} } \\\end{bmatrix} \\&=\frac{ e^{-\alpha} }{ e^{-\alpha} \sum_{j=1}^{k}{e^{ \theta_j^T x^{(i)} }} }\begin{bmatrix} e^{ \theta_1^T x^{(i)} } \\e^{ \theta_2^T x^{(i)} } \\\vdots \\e^{ \theta_k^T x^{(i)} } \\\end{bmatrix} \\&=\frac{ 1 }{ \sum_{j=1}^{k}{e^{ \theta_j^T x^{(i)} - \alpha }} }\begin{bmatrix} e^{ \theta_1^T x^{(i)} - \alpha } \\e^{ \theta_2^T x^{(i)} - \alpha } \\\vdots \\e^{ \theta_k^T x^{(i)} - \alpha } \\\end{bmatrix} \\\end{align}

Hence, to prevent overflow, simply subtract some large constant value from each of the \theta_j^T x^{(i)} terms before computing the exponential. In practice, for each example, you can use the maximum of the \theta_j^T x^{(i)} terms as the constant. Assuming you have a matrix M containing these terms such that M(r, c) is \theta_r^T x^{(c)}, then you can use the following code to accomplish this:

% M is the matrix as described in the text
M = bsxfun(@minus, M, max(M, [], 1));

max(M) yields a row vector with each element giving the maximum value in that column. bsxfun (short for binary singleton expansion function) applies minus along each row of M, hence subtracting the maximum of each column from every element in the column.

Implementation Tip: Computing the predictions - you may also find bsxfun useful in computing your predictions - if you have a matrix M containing the e^{\theta_j^T x^{(i)}} terms, such that M(r, c) contains the e^{\theta_r^T x^{(c)}} term, you can use the following code to compute the hypothesis (by dividing all elements in each column by their column sum):

% M is the matrix as described in the text
M = bsxfun(@rdivide, M, sum(M));

The operation of bsxfun in this case is analogous to the earlier example.

Code implementation:

% the hypothesis
M = theta * data;
M = bsxfun(@minus, M, max(M, [], 1)); % to prevent overflow
ExpM = exp(M);
h = bsxfun(@rdivide, ExpM, sum(ExpM));
cost = -1/numCases * sum(sum(groundTruth .* log(h))) + lambda/2 * sum(theta(:).^2);
thetagrad = -1/numCases .* (groundTruth - h) * data.' + lambda .* theta;
PS: When I ran gradient checking on this code, the diff always came out around 0.020, when it should be on the order of 1e-9. I rewrote the cost and thetagrad expressions in every way I could think of without getting anywhere, and after a long time I found I had made a stupid mistake: I had typed theta(:).*2 instead of theta(:).^2. Maddening.

Step 3: Gradient checking

Once you have written the softmax cost function, you should check your gradients numerically. In general, whenever implementing any learning algorithm, you should always check your gradients numerically before proceeding to train the model. The norm of the difference between the numerical gradient and your analytical gradient should be small, on the order of 10^{-9}.

Implementation Tip: Faster gradient checking - when debugging, you can speed up gradient checking by reducing the number of parameters your model uses. In this case, we have included code for reducing the size of the input data, using the first 8 pixels of the images instead of the full 28x28 images. This code can be used by setting the variable DEBUG to true, as described in step 1 of the code.
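A minimal sketch of the check, assuming computeNumericalGradient.m from the earlier exercises is on your path and that softmaxCost has the starter-code signature:

% Numerical gradient check (sketch); diff should be on the order of 1e-9
[cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, inputData, labels);
numGrad = computeNumericalGradient( ...
    @(t) softmaxCost(t, numClasses, inputSize, lambda, inputData, labels), theta);
diff = norm(numGrad - grad) / norm(numGrad + grad);
disp(diff);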

Step 4: Learning parameters

Now that you've verified that your gradients are correct, you can train your softmax model using the function softmaxTrain in softmaxTrain.m. softmaxTrain uses the L-BFGS algorithm, provided by the function minFunc. Training the model on the entire MNIST training set of 60000 28x28 images should be rather quick, taking less than 5 minutes for 100 iterations.

Factoring softmaxTrain out as a function means that you will be able to easily reuse it to train softmax models on other data sets in the future by invoking the function with different parameters.

Use the following parameter when training your softmax classifier:

lambda = 1e-4
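The training call itself, assuming the softmaxTrain interface from the starter code, looks roughly like:

options.maxIter = 100;
softmaxModel = softmaxTrain(inputSize, numClasses, lambda, ...
                            inputData, labels, options);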

Step 5: Testing

Now that you've trained your model, you will test it against the MNIST test set, comprising 10000 28x28 images. However, to do so, you will first need to complete the function softmaxPredict in softmaxPredict.m, a function which generates predictions for input data under a trained softmax model.

Code implementation:

M = theta * data;
M = bsxfun(@minus, M, max(M, [], 1)); % to prevent overflow
ExpM = exp(M);
h = bsxfun(@rdivide, ExpM, sum(ExpM));
[hmax, pred] = max(h);
clear hmax;

Once that is done, you will be able to compute the accuracy (the proportion of correctly classified images) of your model using the code provided. Our implementation achieved an accuracy of 92.6%. If your model's accuracy is significantly less (less than 91%), check your code, ensure that you are using the trained weights, and that you are training your model on the full 60000 training images. Conversely, if your accuracy is too high (99-100%), ensure that you have not accidentally trained your model on the test set as well.
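Once pred is available from softmaxPredict, the accuracy computation is a one-liner (this mirrors what the provided code does):

acc = mean(labels(:) == pred(:));
fprintf('Accuracy: %0.3f%%\n', acc * 100);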

Final classification accuracy: 92.640%

Source code download:

