classification - Is my Matlab code correct for applying PCA to data?
I have the following code for calculating PCA in Matlab:
train_out = train';
test_out = test';

% subtract off the mean for each dimension
mn = mean(train_out, 2);
train_out = train_out - repmat(mn, 1, train_size);
test_out = test_out - repmat(mn, 1, test_size);

% calculate the covariance matrix
covariance = 1 / (train_size - 1) * (train_out * train_out');

% find the eigenvectors and eigenvalues
[PC, V] = eig(covariance);

% extract the diagonal of the matrix as a vector
V = diag(V);

% sort the variances in decreasing order
[junk, rindices] = sort(-1 * V);
V = V(rindices);
PC = PC(:, rindices);

% project the original datasets
out = PC' * train_out;
train_out = out';
out = PC' * test_out;
test_out = out';
The rows of the train and test matrices hold the observations and the columns hold the feature variables. When I classify the original data (without PCA), I get better results than with PCA, even when I keep all the dimensions. When I tried to run PCA on the whole dataset (train + test), I noticed that the correlations between these new principal components and the previous ones are all either about 1 or about -1, which strikes me as strange. I'm probably doing something wrong, but I can't figure out what.
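For reference, a minimal sketch of the kind of correlation check described above, using synthetic low-rank data and the Statistics Toolbox functions princomp and corr (the data and variable names here are illustrative, not taken from the question):

rng(0);
X = randn(140, 2) * randn(2, 5) + 0.1 * randn(140, 5); % synthetic low-rank data
train = X(1:100, :);
test = X(101:end, :);

[~, scores_train] = princomp(train); % component scores from the training set alone
[~, scores_all] = princomp(X);       % scores from PCA on the whole dataset (train + test)

% Correlate each train-only component with the corresponding whole-data one.
% Values near +1 or -1 mean both PCAs found essentially the same direction
% (the sign of a principal component is arbitrary).
C = corr(scores_train, scores_all(1:100, :));
disp(diag(C));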
The code is correct, although using the princomp function might be easier:
train_out = train; % save the original data
test_out = test;

mn = mean(train_out);
train_out = bsxfun(@minus, train_out, mn); % subtract the mean
test_out = bsxfun(@minus, test_out, mn);

[coefs, scores, variances] = princomp(train_out, 'econ'); % PCA

pervar = cumsum(variances) / sum(variances);
dims = max(find(pervar < var_frac)); % var_frac - e.g. 0.99 - fraction of explained variance

train_out = train_out * coefs(:, 1:dims); % dims - keep this many dimensions
test_out = test_out * coefs(:, 1:dims);   % the result is in train_out and test_out
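Note that in newer MATLAB releases princomp is deprecated in favor of pca, which centers the data itself and also returns the mean it subtracted. A minimal sketch of the same pipeline with pca (assuming var_frac is set as above, e.g. 0.99):

[coefs, scores, variances, ~, ~, mu] = pca(train); % pca centers train internally

pervar = cumsum(variances) / sum(variances);
dims = find(pervar >= var_frac, 1, 'first'); % smallest number of components reaching var_frac

train_out = scores(:, 1:dims); % projections of the (centered) training data
test_out = bsxfun(@minus, test, mu) * coefs(:, 1:dims); % center test with the TRAIN mean, then project

One design point carried over from the answer above: the test set is always centered with the training mean and projected with the training coefficients, so no information from the test set leaks into the model.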