I have a relatively large SQLite database that I’m building. The general access profile of this database is:
- Initial load of 300 million rows.
- Initial read of each row, and UPDATE on approximately 30 million rows
- Ongoing read-only access through a python script with a MyDatabaseReader class.
The data structure is a flat denormalized table and queries are of the form:
WHERE chromosome=TEXT, position=INT, reference_sequence=TEXT, alternate_sequence=TEXT
I would like to validate that an incoming query requests a valid chromosome name, based on the contents of the database. Requesting chromosome=’chr13′ is valid, but in the case of ’13’ or ‘chr31’ I would like my python script to throw an error, instead of silently returning zero rows. To enable this, my during the init of my python class, it makes an initial query to get the set of valid chromosome names:
SELECT DISTINCT chromosome from dbsnp;
This query takes a LONG time, and bogs down the execution. I have tried both a compound index on (chromosome, position, reference_sequence, alternate_sequence), as well as a single index on chromosome, and have verified with EXPLAIN QUERY PLAN that in both cases the index gets used.
sqlite> EXPLAIN QUERY PLAN SELECT DISTINCT chromosome FROM dbsnp; order|from|detail 0|0|TABLE dbsnp WITH INDEX chromosome ORDER BY
My question: Is there some SQL trick to query directly from the chromosome index. I don’t care about anything else in the row, and it seems that the index is a pre-built version of the data that I want to return.
Alternatively, I’m considering constructing a chromosome_names table that I populate with my SELECT DISTINCT query above after database load and update. Because it scares me to construct a static table that can fall out of sync with the main table I’m considering triggers to update the chromosome_names table on change of the main table. However, I’m concerned that this may cause significant churn should I update rows in the main table, and more importantly, that I’m reinventing what is essentially contained in an index.
Is there a good way to get my distinct values query from the index directly, or alternatively, is there a way to have SQLite throw an error should the queried value for chromosome be outside of the set of contained values (note: position, ref, and alt sequences are expected to sometimes query for unexpected values so erroring on zero rows returned will not work).
✓ Extra quality
ExtraProxies brings the best proxy quality for you with our private and reliable proxies
✓ Extra anonymity
Top level of anonymity and 100% safe proxies – this is what you get with every proxy package
✓ Extra speed
1,ooo mb/s proxy servers speed – we are way better than others – just enjoy our proxies!
USA proxy location
We offer premium quality USA private proxies – the most essential proxies you can ever want from USA
Our proxies have TOP level of anonymity + Elite quality, so you are always safe and secure with your proxies
Use your proxies as much as you want – we have no limits for data transfer and bandwidth, unlimited usage!
Superb fast proxy servers with 1,000 mb/s speed – sit back and enjoy your lightning fast private proxies!
99,9% servers uptime
Alive and working proxies all the time – we are taking care of our servers so you can use them without any problems
No usage restrictions
You have freedom to use your proxies with every software, browser or website you want without restrictions
Perfect for SEO
We are 100% friendly with all SEO tasks as well as internet marketing – feel the power with our proxies
Buy more proxies and get better price – we offer various proxy packages with great deals and discounts
We are working 24/7 to bring the best proxy experience for you – we are glad to help and assist you!