python - Calculate special correlation distance matrix faster


I want to create a distance matrix using the Pearson correlation distance. I first tried scipy.spatial.distance.pdist(df, 'correlation') on my dataset of 5000 rows and 20 attributes.

Since I want to build a recommender, I wanted to change the distance a little: for each pair of users, only the features that are non-NaN for both users should be taken into account. By default, scipy.spatial.distance.pdist(df, 'correlation') outputs NaN whenever any feature value is float('nan').
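A minimal illustration of the NaN propagation described above (toy data of my own, not from the original post):

```python
import numpy as np
from scipy.spatial.distance import pdist

# Two users rating four items; the second user has one missing rating.
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, np.nan, 6.0, 8.0]])

# The NaN propagates through the mean/covariance computation,
# so the whole pairwise correlation distance comes out as NaN.
print(pdist(x, 'correlation'))
```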

Here is my code; df is my 5000 × 20 pandas DataFrame:

    import math
    import scipy.spatial.distance

    dist_mat = []
    d = df.shape[1]
    for i, row_i in enumerate(df.itertuples()):
        for j, row_j in enumerate(df.itertuples()):
            if i < j:
                print(i, j)
                # keep only the features that are non-NaN for both rows
                # (offset t + 1 because itertuples() puts the index at position 0)
                ind = [t for t in range(d)
                       if not (math.isnan(row_i[t + 1]) or math.isnan(row_j[t + 1]))]
                dist_mat.append(scipy.spatial.distance.correlation(
                    [row_i[t + 1] for t in ind],
                    [row_j[t + 1] for t in ind]))

This code works, but it is astonishingly slow compared to scipy.spatial.distance.pdist(df, 'correlation'). My question is: how can I improve my code so that it runs much faster? Or is there a library that computes the correlation between two vectors using only the features present in both?

Thank you for your reply.

I think you need to compute this yourself; here is an example:
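The definition of pair_correlation did not survive in this copy of the post; only the two checks below did. The following is my own sketch of a vectorized implementation that is consistent with those checks (the matrix-product formulation of the pairwise Pearson statistics is my assumption, not necessarily the original author's code):

```python
import numpy as np

def pair_correlation(x):
    """Pairwise Pearson correlation distance between rows of x,
    computed over the columns that are non-NaN in both rows.
    NOTE: a reconstruction/sketch, not the original answer's code."""
    mask = np.isfinite(x).astype(float)    # 1.0 where a value is present
    x0 = np.where(np.isfinite(x), x, 0.0)  # NaNs replaced by 0 so sums ignore them

    n = mask @ mask.T             # number of columns valid in both rows
    sx = x0 @ mask.T              # sum of row i over the shared columns
    sy = mask @ x0.T              # sum of row j over the shared columns
    sxy = x0 @ x0.T               # sum of products over shared columns
    sxx = (x0 ** 2) @ mask.T      # sum of squares of row i over shared columns
    syy = mask @ (x0 ** 2).T      # sum of squares of row j over shared columns

    # Pearson r from raw sums; distance = 1 - r, as in pdist 'correlation'
    cov = n * sxy - sx * sy
    den = np.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return 1.0 - cov / den
```

Because everything is expressed as matrix products over the full dataset, the per-pair Python loop disappears, which is where the original code spent its time.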

To check the output without NaNs:

    import numpy as np
    from scipy.spatial.distance import pdist, squareform, correlation

    x = np.random.rand(2000, 20)
    np.allclose(pair_correlation(x), squareform(pdist(x, "correlation")))
To check the output with NaNs:

    x = np.random.rand(2000, 20)
    x[x < 0.3] = np.nan
    r = pair_correlation(x)

    i, j = 200, 60  # change this
    mask = ~(np.isnan(x[i]) | np.isnan(x[j]))
    u = x[i, mask]
    v = x[j, mask]
    assert abs(correlation(u, v) - r[i, j]) < 1e-8
