hadoop - Multidimensional analysis in Hive/Impala


I have a denormalized table, say Sales, that looks like this:

SalesKey and a few numeric measures (sales amounts and cost of sales), plus Industry, country, state, sales area, device ID, customer ID, year of sale, month of sale and some more similar dimensions (12 dimensions in total).

I need to support aggregation queries such as the total number of sales in a year or a month, their total cost, etc. Apart from this, these aggregates need to be filterable, e.g. the total sales in 2013 belonging to the manufacturing industry for customer XYZ.
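To make the kind of query concrete, here is a minimal sketch of a filtered aggregation. It uses an in-memory SQLite database only as a stand-in so the example runs anywhere; the table layout, the column names (`sale_year`, `cost_of_sales`, and so on) and the sample rows are all assumptions for illustration, not the real schema. The SELECT itself is plain SQL and should carry over to Hive or Impala with little or no change.

```python
import sqlite3

# Stand-in for the denormalized Sales table (hypothetical columns and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE sales (
           sales_key INTEGER, industry TEXT, customer_id TEXT,
           sale_year INTEGER, sale_month INTEGER, cost_of_sales REAL)"""
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Manufacturing", "XYZ", 2013, 1, 100.0),
        (2, "Manufacturing", "XYZ", 2013, 1, 50.0),
        (3, "Manufacturing", "XYZ", 2013, 2, 25.0),
        (4, "Retail", "XYZ", 2013, 1, 70.0),
        (5, "Manufacturing", "ABC", 2013, 1, 10.0),
        (6, "Manufacturing", "XYZ", 2012, 1, 30.0),
    ],
)

# Monthly sales count and total cost for 2013, filtered by industry and customer.
QUERY = """
SELECT sale_year, sale_month,
       COUNT(*)           AS num_sales,
       SUM(cost_of_sales) AS total_cost
FROM sales
WHERE sale_year = 2013
  AND industry = 'Manufacturing'
  AND customer_id = 'XYZ'
GROUP BY sale_year, sale_month
ORDER BY sale_month
"""
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Dropping `sale_month` from the SELECT and GROUP BY clauses gives the yearly totals instead.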

I have these fact and dimension tables in Hive/Impala.

I do not think I can build a cube over all the dimensions. I read a paper on how to do OLAP over many dimensions:

It basically suggests materializing small fragment cubes and doing some sort of runtime computation when a query spans several cubes.
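As a rough illustration of that idea (a simplified reading, not the paper's actual algorithm), the sketch below splits the dimensions into small fragments, materializes every cuboid inside each fragment as a map from dimension values to row ids, and answers a query that spans fragments by intersecting the row-id sets at query time. The dimension names, the fragment grouping and the sample rows are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample rows; "cost" is the only measure.
ROWS = [
    {"industry": "Manufacturing", "country": "US", "year": 2013, "month": 1, "cost": 100.0},
    {"industry": "Manufacturing", "country": "DE", "year": 2013, "month": 2, "cost": 50.0},
    {"industry": "Retail", "country": "US", "year": 2013, "month": 1, "cost": 70.0},
    {"industry": "Manufacturing", "country": "US", "year": 2012, "month": 1, "cost": 30.0},
]

# Assumed grouping of dimensions into small fragments.
FRAGMENTS = [("industry", "country"), ("year", "month")]


def build_fragment(rows, dims):
    """Materialize every cuboid of one fragment: {dim subset: {values: row ids}}."""
    cube = {}
    for size in range(1, len(dims) + 1):
        for subset in combinations(dims, size):
            cells = defaultdict(set)
            for row_id, row in enumerate(rows):
                cells[tuple(row[d] for d in subset)].add(row_id)
            cube[subset] = dict(cells)
    return cube


def query(rows, fragments, cubes, filters):
    """Return (count, total cost) for equality filters spanning several fragments."""
    matching = set(range(len(rows)))
    for dims, cube in zip(fragments, cubes):
        # Pick the cuboid covering exactly the filtered dimensions of this fragment.
        subset = tuple(d for d in dims if d in filters)
        if not subset:
            continue
        key = tuple(filters[d] for d in subset)
        matching &= cube[subset].get(key, set())
    return len(matching), sum(rows[i]["cost"] for i in matching)


CUBES = [build_fragment(ROWS, dims) for dims in FRAGMENTS]
count, total = query(ROWS, FRAGMENTS, CUBES, {"industry": "Manufacturing", "year": 2013})
print(count, total)
```

Each fragment stays small because only the cuboids within a fragment are materialized, never those across all dimensions; the price is the set intersection at query time. In Hive or Impala the per-fragment cuboids could be stored as ordinary aggregate tables, but that mapping is left open here.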

I'm not sure how to implement this model in Hive/Impala. Any pointers or suggestions would be greatly appreciated.

Edit: I have about 10 million rows in the sales table, and the number of dimensions is nowhere near 100 but about 12 (it may go up to 15), each with good cardinality.

I would build the cubes using third-party software. For example, an in-memory OLAP server should have no problem handling all 10 million rows over 12 dimensions, and the response time would then be sub-second across all the dimensions. Getting the 10 million rows out of Hive is not an issue (you can use the JDBC driver for that purpose). Such a server is specifically designed for high-speed querying.

