I have data that is structured like the following:
users:

    id | name  | parent_id
    1  | Bob   | NULL
    2  | Jan   | 1
    3  | Mat   | 2
    4  | Irene | 2
    5  | Ellie | 2
    6  | Laura | 5
    7  | Uma   | 6

user_sales:

    user_id | sales_period | total_volume | total_revenue | ....
    1       | Jan-2017     | 1000         | 56000
    1       | Feb-2017     | 1500         | 65000
    2       | Jan-2017     | 650          | 45500
    5       | Jan-2017     | 800          | 49005
    6       | Jan-2017     | 1000         | 56000

add a bunch more tables that use the core users tree structure...
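To make the hierarchy concrete, here is a minimal Python sketch of the kind of traversal involved. It is illustrative only and not our actual ETL code: it uses just the sample rows above, the function names `build_children` and `subtree_totals` are hypothetical, and rolling up `total_volume` over each user's subtree is an assumed example of a calculation that depends on the tree.

```python
from collections import defaultdict

# Sample rows copied from the users table above: (id, name, parent_id).
USERS = [
    (1, "Bob", None), (2, "Jan", 1), (3, "Mat", 2), (4, "Irene", 2),
    (5, "Ellie", 2), (6, "Laura", 5), (7, "Uma", 6),
]

# Jan-2017 rows from user_sales: user_id -> total_volume.
JAN_VOLUME = {1: 1000, 2: 650, 5: 800, 6: 1000}


def build_children(users):
    """Return (root_id, {parent_id: [child_id, ...]}) from parent_id rows."""
    children = defaultdict(list)
    root = None
    for user_id, _name, parent_id in users:
        if parent_id is None:
            root = user_id  # the single row with a NULL parent is the root
        else:
            children[parent_id].append(user_id)
    return root, children


def subtree_totals(root, children, volume):
    """Sum volume over each user's subtree (the user plus all descendants)."""
    # Collect nodes parent-first, then walk the list backwards so every
    # child is totalled before its parent.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    totals = {}
    for node in reversed(order):
        totals[node] = volume.get(node, 0) + sum(
            totals[child] for child in children.get(node, [])
        )
    return totals


root, children = build_children(USERS)
totals = subtree_totals(root, children, JAN_VOLUME)
```

Each parent's total depends on its children's totals, which is the dependency that makes us unsure how to split the work.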
Our client databases range in size from ~60GB to ~1TB, and scaling up the database servers indefinitely to support large ETL operations isn’t an option. From our research, the best bet looks like parallel processing, but one fundamental question keeps coming up: can you use parallel processing when every operation requires traversing a tree structure like ours?
Can anyone say whether a rooted tree data structure can be processed in parallel and, if so, suggest how it should be done?