I am working on a data warehouse that needs to have responsive look ups as it is used as the source for an operator console. The target is to achieve sub 1 Second lookups (ideal would be 100ms).
The size of the data is expected to increase into the millions, but this can be limited as older data is not as relevant.
Tables either contain data as an append only log, or act as an identifier as to which records are the “latest” for each resource.
First some information on the relevant table:
activity: has 3.5 million rows, 34 columns (append only log)
activitylatest: has 1.1 million rows, 4 columns (identifier for activity)
round: 3.5 million rows, 17 columns (append only log)
roundlatest: 0.7 million rows, 4 columns (identifier for round)
Indexes are available on all columns (except for large texts) since users may filter with any fields and “updateId” is the primary key for data tables.
I’m trying to get the data within activity depending on the user’s filtering which may include a single column from round table.
select activity.update_id, ... activity.activity_id, round.round_external_ref from activity join round on activity.round_id = round.round_id join roundlatest on round.update_id = roundlatest.update_id join activitylatest on activity.update_id = activitylatest.update_id where activity.community_id = 1 order by activity.update_id desc limit 1001
Executing this query will deliver the following plan (
Limit (cost=1.71..12367.37 rows=1001 width=2056) (actual time=2324.231..2341.835 rows=1001 loops=1) -> Nested Loop (cost=1.71..14219657.65 rows=1151081 width=2056) (actual time=2324.231..2341.721 rows=1001 loops=1) -> Nested Loop (cost=1.29..6270077.60 rows=5932256 width=2064) (actual time=2324.204..2333.877 rows=5158 loops=1) -> Nested Loop (cost=0.86..3654699.78 rows=1187917 width=2019) (actual time=0.024..1765.586 rows=390452 loops=1) -> Index Only Scan Backward using "IDX_datawarehouse_gameActivitylatest_update_id" on datawarehouse_gameactivitylatest (cost=0.43..60604.50 rows=1199068 width=8) (actual time=0.015..165.819 rows=390452 loops=1) Heap Fetches: 239183 -> Index Scan using "PK_gameActivity_update_id" on datawarehouse_gameactivity (cost=0.43..2.99 rows=1 width=2019) (actual time=0.002..0.003 rows=1 loops=390452) Index Cond: (update_id = datawarehouse_gameactivitylatest.update_id) Filter: (community_id = 1) -> Index Scan using "IDX_datawarehouse_gameRound_round_id" on datawarehouse_gameround (cost=0.43..2.15 rows=5 width=53) (actual time=0.001..0.001 rows=0 loops=390452) Index Cond: (round_id = datawarehouse_gameactivity.round_id) -> Index Only Scan using "IDX_datawarehouse_gameRoundlatest_update_id" on datawarehouse_gameroundlatest (cost=0.42..1.33 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=5158) Index Cond: (update_id = datawarehouse_gameround.update_id) Heap Fetches: 1067 Planning time: 1.153 ms Execution time: 2341.989 ms
The first time running the query it runs for over 30 seconds (up to 60 seconds). Successive runs seem to maintain approx 5 seconds durations. I assume this is due to system caching, I do need to fix both the first and successive queries to sub 1 second.
During my testing I noticed that simply switching ordering from DESC to ASC improves the query performance to approx 2.5 seconds with merge join being used. This has left me puzzled and I’m not sure as to the cause and I do need DESC.
ASC Explain Analyze:
Limit (cost=184.88..10004.46 rows=1001 width=2056) (actual time=0.064..23.341 rows=1001 loops=1) -> Nested Loop (cost=184.88..11301988.03 rows=1152097 width=2056) (actual time=0.063..23.265 rows=1001 loops=1) -> Nested Loop (cost=184.46..3332810.27 rows=5940024 width=2064) (actual time=0.054..14.535 rows=5227 loops=1) -> Merge Join (cost=184.03..710946.78 rows=1189471 width=2019) (actual time=0.046..7.915 rows=1001 loops=1) Merge Cond: (datawarehouse_gameactivity.update_id = datawarehouse_gameactivitylatest.update_id) -> Index Scan using "PK_gameActivity_update_id" on datawarehouse_gameactivity (cost=0.43..630047.01 rows=3522681 width=2019) (actual time=0.010..5.513 rows=3007 loops=1) Filter: (community_id = 1) -> Index Only Scan using "IDX_datawarehouse_gameActivitylatest_update_id" on datawarehouse_gameactivitylatest (cost=0.43..60662.53 rows=1200637 width=8) (actual time=0.007..0.204 rows=1001 loops=1) Heap Fetches: 0 -> Index Scan using "IDX_datawarehouse_gameRound_round_id" on datawarehouse_gameround (cost=0.43..2.15 rows=5 width=53) (actual time=0.002..0.005 rows=5 loops=1001) Index Cond: (round_id = datawarehouse_gameactivity.round_id) -> Index Only Scan using "IDX_datawarehouse_gameRoundlatest_update_id" on datawarehouse_gameroundlatest (cost=0.42..1.33 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=5227) Index Cond: (update_id = datawarehouse_gameround.update_id) Heap Fetches: 88 Planning time: 1.124 ms Execution time: 23.475 ms
(Explain analyse says 23 ms, but actually running the query takes 2.5 seconds according to pgadmin)
We are using PostgreSQL 9.6 on SSD cloud based VMs.
✓ Extra quality
ExtraProxies brings the best proxy quality for you with our private and reliable proxies
✓ Extra anonymity
Top level of anonymity and 100% safe proxies – this is what you get with every proxy package
✓ Extra speed
1,ooo mb/s proxy servers speed – we are way better than others – just enjoy our proxies!
USA proxy location
We offer premium quality USA private proxies – the most essential proxies you can ever want from USA
Our proxies have TOP level of anonymity + Elite quality, so you are always safe and secure with your proxies
Use your proxies as much as you want – we have no limits for data transfer and bandwidth, unlimited usage!
Superb fast proxy servers with 1,000 mb/s speed – sit back and enjoy your lightning fast private proxies!
99,9% servers uptime
Alive and working proxies all the time – we are taking care of our servers so you can use them without any problems
No usage restrictions
You have freedom to use your proxies with every software, browser or website you want without restrictions
Perfect for SEO
We are 100% friendly with all SEO tasks as well as internet marketing – feel the power with our proxies
Buy more proxies and get better price – we offer various proxy packages with great deals and discounts
We are working 24/7 to bring the best proxy experience for you – we are glad to help and assist you!