We receive real-time data from over 1,000 sensors, each of which sends a data point every 10 seconds on average, amounting to about 1 million rows of data every day. Our system lets users select any sensor and a date range and download the data points as an Excel file.
We have seen that our users are mostly interested in data that is less than 30 days old. Data older than 30 days has most likely already been downloaded, and only about 1% of our retrieval requests are for it. However, this older data is not useless: users sometimes want to download data that is more than a year old, so deleting old data is not an option.
Currently we use a MySQL database to store the data, with everything in a single table that now has over 60 million rows. We run on SSDs and have the right indices, so data retrieval is still reasonably fast.
An example query we use to select per-minute data is:

-- One row per sensor per minute; aggregates keep the grouping valid
-- under ONLY_FULL_GROUP_BY
SELECT AVG(data_value) AS value,
       param_id,
       MIN(data_timestamp) AS ts
  FROM tbl_data_log
 WHERE param_id IN (?)
   AND data_timestamp >= ?
   AND data_timestamp <= ?
 GROUP BY DATE(data_timestamp), HOUR(data_timestamp),
          MINUTE(data_timestamp), param_id
 ORDER BY ts ASC
Currently, this query takes less than 5 seconds to retrieve data that is more than 30 days old for a specific sensor.
As more data comes in, this table will keep growing, perhaps to 2 billion rows within the next year (we are also adding more sensors every day). I do not know what query performance will look like at that scale. Storing all of this data in MySQL does not seem right to me, because it is accessed very rarely, and keeping data that is more than 4 months old indexed seems unnecessary.
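(One built-in MySQL feature relevant to this concern is range partitioning on the timestamp column, so that 30-day queries only touch the newest partitions and old partitions can be archived or dropped cheaply. A minimal sketch, with hypothetical monthly boundaries and assuming the table has no unique key that excludes data_timestamp, which MySQL partitioning requires:)

```sql
-- Sketch only: hypothetical monthly RANGE partitions for tbl_data_log.
-- Queries filtered on data_timestamp are pruned to matching partitions.
ALTER TABLE tbl_data_log
PARTITION BY RANGE (TO_DAYS(data_timestamp)) (
    PARTITION p2017_04 VALUES LESS THAN (TO_DAYS('2017-05-01')),
    PARTITION p2017_05 VALUES LESS THAN (TO_DAYS('2017-06-01')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);
```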
One approach I have considered is to keep only the last 30 days of data in MySQL and move older data to flat files with a folder structure like
/old_data/%YEAR%/%MONTH%/%DATE%/%PARAM_ID%.dat. This way the table size stays manageable, while the old data remains effectively indexed by the directory structure on disk.
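(The move-to-flat-files step could be sketched with MySQL's SELECT ... INTO OUTFILE, which writes a result set to a file on the database server. The sensor id, date, and path below are hypothetical; the target file must not already exist, and the server's secure_file_priv setting must permit writing there:)

```sql
-- Sketch: export one hypothetical sensor/day into the proposed layout;
-- the exported rows would then be deleted from the live table using
-- the same WHERE predicate.
SELECT data_timestamp, data_value
  FROM tbl_data_log
 WHERE param_id = 42
   AND data_timestamp >= '2017-04-01'
   AND data_timestamp <  '2017-04-02'
  INTO OUTFILE '/old_data/2017/04/01/42.dat'
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n';
```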
Will the current approach scale? Would moving old data to flat files help? Is storing all the data in a single table correct, or do we need to change the database engine itself? Please share your thoughts on this architecture. Thank you very much in advance!