Couldn’t find a newbie forum; hope this is in the right place.
In my quest to learn more about databases, analytics, and creating service APIs to work with them, I decided to scrape an online archive of Jeopardy data and plan to reconstruct the data into a functional db. Pretty sure this has been done before, but learning is the primary objective here :)
I’d like to get the data structured so I can run some advanced statistics on it. To do so, I’ll need to query things like a player’s score at any given point in the game, see who picked which clues and in what order, identify (and grade) players’ wagering on daily doubles & Final Jeopardy, etc. If all goes to plan, it would be neat to expose this via a public API so others can play with it too.
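To make the "score at any given point" idea concrete, here is a rough sketch of the kind of query I have in mind. The table and column names (`clue_response`, `clue_order`, `delta`) and the sample rows are placeholders I made up for this example, not necessarily what ends up in the model. It also assumes each response is stored as a signed score change, and that SQLite is 3.25 or newer so window functions are available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clue_response (          -- hypothetical layout, for illustration only
    game_id    INTEGER NOT NULL,
    clue_order INTEGER NOT NULL,      -- order in which the clue was picked in the game
    player_id  INTEGER NOT NULL,
    delta      INTEGER NOT NULL       -- +value if correct, -value if incorrect
);
INSERT INTO clue_response VALUES      -- made-up sample rows
    (1, 1, 10,  200),
    (1, 2, 11,  400),
    (1, 3, 10, -600),
    (1, 4, 10, 1000),
    (1, 5, 11, -200);
""")

def score_at(conn, game_id, player_id, clue_order):
    """Player's score after the given clue (0 if they have not responded yet)."""
    row = conn.execute(
        """SELECT COALESCE(SUM(delta), 0)
           FROM clue_response
           WHERE game_id = ? AND player_id = ? AND clue_order <= ?""",
        (game_id, player_id, clue_order),
    ).fetchone()
    return row[0]

# Running score for every response in one game, using a window function.
running = conn.execute(
    """SELECT clue_order, player_id,
              SUM(delta) OVER (
                  PARTITION BY game_id, player_id
                  ORDER BY clue_order
              ) AS score
       FROM clue_response
       WHERE game_id = 1
       ORDER BY clue_order"""
).fetchall()

for clue_order, player_id, score in running:
    print(clue_order, player_id, score)
```

If responses were stored as a correct/incorrect flag plus the clue value instead of a signed delta, the same queries should still work with a `CASE` expression in place of `delta`.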
There’s a decent amount of data here, though it’s not very big compared to enterprise dbs:
- 6000 distinct games
- ~60 clues per game, so ~360k total unique clues.
- 12000 players, which includes duplicates (ex: Ken1, Ken2, Ken3)
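On the duplicate players: one option I'm considering is to keep one row per person and a separate row per game appearance, roughly like the sketch below. The table names (`player`, `game_player`), the `podium` column, and every sample name other than Ken are made up for illustration. Matching the scraped rows (Ken1, Ken2, Ken3) back to a single person is a separate problem this doesn't solve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player (                 -- one row per person (hypothetical)
    player_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE game_player (            -- one row per appearance in a game (hypothetical)
    game_id   INTEGER NOT NULL,
    player_id INTEGER NOT NULL REFERENCES player(player_id),
    podium    INTEGER NOT NULL,       -- position 1..3, assumed column
    PRIMARY KEY (game_id, player_id)
);
INSERT INTO player VALUES (1, 'Ken'), (2, 'Ana'), (3, 'Bo'), (4, 'Cy'), (5, 'Di');
INSERT INTO game_player VALUES        -- Ken appears in both games
    (1, 1, 1), (1, 2, 2), (1, 3, 3),
    (2, 1, 1), (2, 4, 2), (2, 5, 3);
""")

# Appearances count a returning player once per game; people count them once.
appearances = conn.execute("SELECT COUNT(*) FROM game_player").fetchone()[0]
people = conn.execute("SELECT COUNT(*) FROM player").fetchone()[0]

# Number of games each person played in.
games_per_player = dict(conn.execute(
    """SELECT p.name, COUNT(*)
       FROM game_player AS gp
       JOIN player AS p ON p.player_id = gp.player_id
       GROUP BY p.player_id, p.name"""
))

print(appearances, people, games_per_player["Ken"])
```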
Here’s the latest draft of my data model (SVG) based on the data available from j-archive, the general structure of Jeopardy games, and my limited understanding of database design. Appreciate any feedback — especially on the clue_response table, which is the only table with transactional data.
Data is currently stored in an sqlite3 db. Will probably migrate it to MySQL once I settle on a structure.