using Postgres 9.6, pgAdmin 4 (for running the queries) on Windows 10 64-bit with Intel Core i7-6700 CPU @ 3.40GHz, 16GB RAM, 1 TB C drive HDD , 2 TB D drive HDD. Postgres data is stored in D drive
I’m using Postgres to process some dataset that I receive in csv format- the steps involved are as follows
- Create table schema – 4 tables are created, one for each set of csv files with the same schema as per the fields in the input csv files
- Import CSVs to Postgres DB – Import around 4 sets of 500 CSV files each of around 4 million rows into own tables – say
table04– I use a batch file to get the list of csv files in each set and use
psql -c "\copy ..command to import them. This process takes around 30s per csv file to import – so altogether 1000 minutes or ~17hrs . Each table has ~2bill rows and around 200MB per table.
- Create a subset of valid records and select relevant columns – For each table in #2 above, create a new table (say
table04_valid) which has a subset of columns and exclude some rows based on a criteria – after this operation the number of rows comes down to ~1.5bill per table with each table ~150MB in size – this operation takes ~20hrs to execute for the 4 tables
- Select distinct records and merge – in this step the distinct rows from each table is selected, merge them via
union alland a final distinct rows from the merged records is selected – this is saved into a new table say
table_valid_distinct_1234– the table size is ~500mill rows and ~50GB size – this operation takes ~12hrs
- Run transformations on some columns – in this step for each row, two columns are concatenated and selected as
idfield , two other columns are hashed, then concatenated and selected as
namefield. A final select is performed on this set to remove any
ids with more than one unique
name– I’m yet to run this query.
Given the above steps, I wanted to collect feedback on improving processing speed given the number of records, data size and the hardware I’m using. Ideas on improving performance and disk usage in any of the steps from 1 to 5 would be helpful. I could provide additional info needed. Also if there is some sort of standard process to investigate this, that would also be very helpful. If there are hardware related enhancements required – should I be looking at adding more memory, more RAM or something else ?
✓ Extra quality
ExtraProxies brings the best proxy quality for you with our private and reliable proxies
✓ Extra anonymity
Top level of anonymity and 100% safe proxies – this is what you get with every proxy package
✓ Extra speed
1,ooo mb/s proxy servers speed – we are way better than others – just enjoy our proxies!
USA proxy location
We offer premium quality USA private proxies – the most essential proxies you can ever want from USA
Our proxies have TOP level of anonymity + Elite quality, so you are always safe and secure with your proxies
Use your proxies as much as you want – we have no limits for data transfer and bandwidth, unlimited usage!
Superb fast proxy servers with 1,000 mb/s speed – sit back and enjoy your lightning fast private proxies!
99,9% servers uptime
Alive and working proxies all the time – we are taking care of our servers so you can use them without any problems
No usage restrictions
You have freedom to use your proxies with every software, browser or website you want without restrictions
Perfect for SEO
We are 100% friendly with all SEO tasks as well as internet marketing – feel the power with our proxies
Buy more proxies and get better price – we offer various proxy packages with great deals and discounts
We are working 24/7 to bring the best proxy experience for you – we are glad to help and assist you!