0

I am trying to figure out how to store time series data for an ad platform I am working on.

Basically I want to know some strategies/solutions for storing billions of rows of data so that I can easily search it (about 6-8 indexes on the table) and get fast counts based on queries.

I tried mySQL with the tokuDB engine and this seems to be very fast but is extremely slow when I try to do a COUNT query when the rows reached about 5-8 million.

I was looking at some noSQL alternatives but since I want to be able to search this data this is probably not the best solution. I was using dynamoDB. I would have had to store the data is many places in order to account for all the searching on the data.

What I am storing is a row in the database for each click on an AD that occurs. This table will grow very fast, especially when this site gets large.

Another solution would be to separate this data per advertiser. This means each advertiser will have their own table where all their data goes into. This means it will be much smaller and the COUNT queries will be much faster. I can even split it up by advertiser and month.

My goal is to give an advertiser the ability to search and display in a paginated way all their clicks. They should be able to get data between a time period and filter by about 5-8 other indexes if they want to.

gprime
  • 101
  • 1
  • What does your current schema look like? – Mark Storey-Smith May 16 '13 at 19:27
  • Can you post the table definition (desc ) and the current indexes you are using ? - Because you are currently working on (as in still developing ?) on an ad platform, don't try to over think the solution. Check "OpenX" (open source ad server) and get an ideea on how others solved the problems you're facing now. - "each advertiser will have their own table where all their data goes into" .. -> Why not using one table, and use partitioning by advertiser_id. And later switch to sharding ...
  • – MTIhai May 17 '13 at 10:16
  • "My goal is to give an advertiser the ability to search and display in a paginated way all their clicks" -> Do you really want that ? Talk to your customers, and propose an "Download to Excel" click-data option. No one (of the big sites I've worked on) cares that much of the raw data. They will probably be more interested in features like click fraud prevention, revenue / costs forecasts ...
  • – MTIhai May 17 '13 at 10:16