Low Latency Aggregations on Cassandra


We use a lot of time series metrics in our system. We cannot handle the write load with our existing infrastructure, so we are evaluating Cassandra.

More information about our current Realtime system

  • We collect application specific time series metrics
  • We write them to a DB
  • Here is a sample of the data
  {appId: 'applicationId', route: 'route name', time: 1406845866304, resTime: 500, dbTime: 200}  
  • After the data is written, we aggregate it at various time resolutions
  • For example, we keep pre-aggregated data per metric at 1 minute, 30 minute, 3 hour, and 1 day resolutions
  • Our front-end app then asks questions like the ones below. We do ad hoc aggregations to answer them.

  • Give me the average resTime for this X time period
  • List me the top 10 routes that have a high resTime
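To make the two questions above concrete, here is a minimal in-memory sketch in Python of how they can be answered from pre-aggregated buckets. It assumes each bucket stores a count and a sum of resTime per (appId, route, bucket start); that layout and the function names `rollup`, `average_res_time`, and `top_routes` are illustrative assumptions, not our actual code.

```python
from collections import defaultdict

MINUTE_MS = 60_000


def rollup(samples, resolution_ms=MINUTE_MS):
    """Group raw samples into (appId, route, bucket_start) -> [count, resTime sum]."""
    buckets = defaultdict(lambda: [0, 0])
    for s in samples:
        # Align the timestamp down to the start of its bucket.
        bucket_start = s["time"] - s["time"] % resolution_ms
        cell = buckets[(s["appId"], s["route"], bucket_start)]
        cell[0] += 1
        cell[1] += s["resTime"]
    return buckets


def average_res_time(buckets, app_id, start_ms, end_ms):
    """Average resTime over all buckets that start in [start_ms, end_ms)."""
    count = total = 0
    for (app, _route, bucket_start), (n, res_sum) in buckets.items():
        if app == app_id and start_ms <= bucket_start < end_ms:
            count += n
            total += res_sum
    return total / count if count else None


def top_routes(buckets, app_id, start_ms, end_ms, limit=10):
    """Routes with the highest average resTime in [start_ms, end_ms)."""
    per_route = defaultdict(lambda: [0, 0])
    for (app, route, bucket_start), (n, res_sum) in buckets.items():
        if app == app_id and start_ms <= bucket_start < end_ms:
            per_route[route][0] += n
            per_route[route][1] += res_sum
    averages = {route: res_sum / n for route, (n, res_sum) in per_route.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:limit]
```

Storing count and sum rather than a ready-made average is what lets buckets be merged across any time period without going back to the raw samples.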

Concerns with Cassandra

We can pre-aggregate with some background jobs. But we need to run ad hoc queries with low latency (less than 5 ms). With our pre-aggregated data, that looks trivial.
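One reason the pre-aggregated data keeps ad hoc queries cheap is that a query period can be covered with a small number of coarse buckets plus a few fine ones at the edges. The sketch below shows one possible way to pick those buckets in Python. The greedy strategy, the name `cover_range`, and the assumption that buckets are aligned to multiples of their resolution since the epoch (so day buckets are UTC days) are illustrative assumptions, not something we have built.

```python
MINUTE = 60_000  # milliseconds

# The four resolutions we pre-aggregate at, coarsest first.
RESOLUTIONS_MS = [24 * 60 * MINUTE, 3 * 60 * MINUTE, 30 * MINUTE, MINUTE]


def cover_range(start_ms, end_ms, resolutions=RESOLUTIONS_MS):
    """Split [start_ms, end_ms) into aligned buckets, preferring coarse ones.

    Returns a list of (resolution_ms, bucket_start_ms) pairs. Both bounds
    must be multiples of the finest resolution.
    """
    finest = resolutions[-1]
    if start_ms % finest or end_ms % finest:
        raise ValueError("bounds must be aligned to the finest resolution")

    cover = []
    cursor = start_ms
    while cursor < end_ms:
        for res in resolutions:
            # Take the coarsest bucket that starts here and fits in the range.
            if cursor % res == 0 and cursor + res <= end_ms:
                cover.append((res, cursor))
                cursor += res
                break
    return cover
```

For a period of one day, three hours, and 31 minutes starting on a day boundary, this reads four buckets instead of 1,651 one-minute buckets. Each selected bucket still has to be fetched and merged, which is the part that needs an aggregation step outside the database.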

But since Cassandra has no aggregation support, we need to find some other solution. We tried Spark (without Cassandra) with some in-memory data sources, but it took too long to aggregate even a small amount of data.

Therefore, the only viable option seems to be building an aggregation engine on top of Cassandra. I would like to know whether there are other ways to do this, or whether there are existing solutions.

