I have two tables in psql that I am trying to join based on certain attributes – they are very large – 17 M rows and 2.7 M rows respectively
mydb=> SELECT reltuples::bigint AS estimate FROM pg_class where relname='foo'; estimate ---------- 17087196 (1 row) mydb=> SELECT reltuples::bigint AS estimate FROM pg_class where relname='bar'; estimate ---------- 2763829 (1 row)
On both, I have created spatial indices with
CREATE INDEX foo_gix ON foo USING GIST (the_geom);
The query I am running is a point in polygon analysis based on data collected at historical time intervals – counting points in polygons when the timestamps on both tables match. Points are locations of mobile phone connections (bar_history
and bar
) and polygons are buffers around certain areas (foo_history
, foo
)
It looks like this:
select s.id, s.place_id, s.time, count(l.location) as total FROM foo_history as s LEFT JOIN foo as p ON s.place_id = p.id LEFT JOIN bar_history l ON ST_Contains(ST_Buffer(ST_Transform(ST_SetSRID(ST_Centroid(p.polygon), 4326), 32615), 100), ST_Transform(l.location, 32615)) LEFT JOIN bar ON bar.phone_id = l.id WHERE bar.network_id = 2 group by s.id LIMIT 5
This query returns results successfully in less than a few seconds. However, when I add join by timestamps:
select s.id, s.place_id, s.time, count(l.location) as total FROM foo_history as s LEFT JOIN foo as p ON s.place_id = p.id LEFT JOIN bar_history l ON ST_Contains(ST_Buffer(ST_Transform(ST_SetSRID(ST_Centroid(p.polygon), 4326), 32615), 100), ST_Transform(l.location, 32615)) LEFT JOIN bar ON bar.phone_id = l.id WHERE bar.network_id = 2 AND to_timestamp(floor((extract('epoch' from l.time::timestamp) / 600 )) * 600) = to_timestamp(floor((extract('epoch' from s.time::timestamp) / 600 )) * 600) group by s.id LIMIT 5
the query runs endlessly, even though nearly all of the timestamps will match each other from both tables. I am not a database expert and would appreciate any suggestions on how to speed up this query or add some kind of index that might speed it up.