I have never designed a database schema in NoSQL, nor in RDBMS before but for my recent big data project I need to store and locate large amount of data in MongoDB(3.6). I have done it myself, but the querying was hard and speed was slow and I need some advice. Please help me out.
The first collection is 'news'
.
It has 'oid', 'aid', 'date', 'section', 'rank', 'title'
.
'oid', 'aid'
is their unique ID and I need to be able to find 'date', 'section', 'rank', 'title'
by querying them.
My current schema design is:
{ "1234567890123": { // oid + aid "date": "20180126", "section": "100", "rank": "1", "title": "Linus: Intel patch is a piece of shit ", "oid": "123", "aid": "4567890123" }, "1234567890124": { // oid + aid "date": "20180126", "section": "101", "rank": "2", "title": "Intel sues Torvalds for claiming fact", "oid": "123", "aid": "4567890124" } }
About 180 of these individual news info will be stored in a day.
The second collection is 'comments'
. Obviously, it has comments for items in 'news'
.
Each comment object consists of comments specific to items('oid', 'aid'
) in the 'news'
section.
The included data will be 'oid', 'aid', 'date', 'commentCount', 'deletedCommentsCount'
and it has an array of comments 'uid', 'id', 'time', 'upvotes', 'downvotes', 'contents'
My current schema design is:
{ "date": "20180126", //Same with one in 'news' "oid": "123", "aid": "4567890123", "commentCount": "7", "deletedCommentsCount": "0", "reply": { "12345678901230": { // oid + aid + commentid "id": "john_skeet", "time": "2018-01-26 07:32:45", "upvotes": "2", "downvotes": "0", "contents": "So true!" } "12345678901231": { // oid + aid + commentid "id": "sundar_pichai", "time": "2018-01-26 07:35:46", "upvotes": "5", "downvotes": "0", "contents": "@john_skeet Get back to work!" } } }
In a day for the 180 news * 1160 comments(approx) so about 200,000 comments will be stored. (65MB / day)
I need to:
- query all comments in specific
'oid'
- query all comments in specific
'oid'
and'aid'
- query all comments for specific word in comment
'contents'
- query all comments for specific id in comment
'id'
…. so on
Can you tell me the proper way to design a database containing these data? And if you can, the query statements too. ANY comments or answers will be appreciated. Thanks.