I'm writing an application that gathers statistics about users across multiple social network accounts. I have a collection of users, and I would like to store the statistics for each user.

Now, I have two options:

  1. Create a separate collection of statistics documents, and add a reference to each user document that links it to the corresponding document in the statistics collection.
  2. Embed a statistics document in each user document.

Aside from query performance (which I'm less concerned about):

  1. What are the pros and cons of each of these approaches?
  2. What should I take into account if I choose to use references rather than embedding the information inside the user document?
omer
  • In my view, embedding is good if you have one-to-one or one-to-many relationships between entities, and reference is good if you have many-to-many relationships. – Priyanka Kariya May 30 '19 at 11:27
  • Also, refer to https://stackoverflow.com/questions/5373198/mongodb-relationships-embed-or-reference – Priyanka Kariya May 30 '19 at 11:34

1 Answer

The shape of the data is determined by the application itself.

When you work with user data, there is a good chance you will also need the statistics details.

What goes into a document is largely determined by how the application uses the data.

Data that is read together with the user documents is a good candidate to be pre-joined, that is, embedded.
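
As a rough illustration of the embedded shape, here is a minimal sketch using plain Python dicts in place of real MongoDB documents. The field names (`statistics`, `followers`, `posts`) and the helper `total_posts` are hypothetical, not taken from the question:

```python
# Hypothetical user document with per-network statistics embedded in it.
user = {
    "_id": 1,
    "name": "alice",
    "statistics": {
        "twitter": {"followers": 120, "posts": 45},
        "facebook": {"friends": 300, "posts": 12},
    },
}


def total_posts(user_doc):
    """Sum the post counts across all networks in one embedded document."""
    return sum(s["posts"] for s in user_doc["statistics"].values())


print(total_posts(user))  # 57
```

Everything needed to answer the question lives in the one document, so a single read is enough.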

One limitation of this approach is document size: a single BSON document cannot exceed 16 MB.
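
If the embedded statistics can grow without bound, it may be worth checking how close a document is to that limit. The sketch below only approximates the size by serializing to JSON, which is not the same as the BSON size the server measures; the helper names and the 80% threshold are assumptions for illustration. With PyMongo installed, `len(bson.encode(doc))` should give the real figure.

```python
import json

# MongoDB's documented maximum BSON document size (16 mebibytes).
MAX_BSON_BYTES = 16 * 1024 * 1024


def approx_size_bytes(doc):
    """Rough size estimate via JSON; the true BSON size will differ."""
    return len(json.dumps(doc).encode("utf-8"))


def is_near_limit(doc, threshold=0.8):
    """True if the estimated size reaches the given fraction of the limit."""
    return approx_size_bytes(doc) >= threshold * MAX_BSON_BYTES


small_doc = {"_id": 1, "statistics": {"twitter": {"followers": 120}}}
print(is_near_limit(small_doc))  # False
```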

Another approach is to split data between multiple collections.

One limitation of this approach is that MongoDB does not enforce constraints between collections, so there are no foreign key constraints.

The database does not guarantee the consistency of the data. It is up to you as a programmer to make sure that your data has no orphans.
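
To make the orphan problem concrete, here is a small sketch with in-memory lists standing in for the two collections. The `stats_id` reference field and both helper functions are hypothetical; in a real application the same checks would run as queries against the database.

```python
# In-memory stand-ins for the users and statistics collections.
users = [
    {"_id": 1, "stats_id": 10},
    {"_id": 2, "stats_id": 11},
    {"_id": 3, "stats_id": 99},  # points at a statistics document that is gone
]
statistics = [
    {"_id": 10, "followers": 120},
    {"_id": 11, "followers": 300},
    {"_id": 12, "followers": 7},  # no user references this document
]


def find_orphans(users, statistics):
    """Return ids of statistics documents that no user references."""
    referenced = {u["stats_id"] for u in users if "stats_id" in u}
    return [s["_id"] for s in statistics if s["_id"] not in referenced]


def find_dangling(users, statistics):
    """Return ids of users whose reference points at nothing."""
    existing = {s["_id"] for s in statistics}
    return [u["_id"] for u in users if u.get("stats_id") not in existing]


print(find_orphans(users, statistics))   # [12]
print(find_dangling(users, statistics))  # [3]
```

Nothing in the database stops either situation from arising, which is why a check like this (or careful delete logic) has to live in the application.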

Data from multiple collections can be joined with the $lookup aggregation stage. However, each collection is a separate file on disk, so reading from multiple collections means seeking in multiple files, and that is, as you are probably guessing, slow.
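
For reference, a join of this kind might look roughly like the pipeline below, written as a Python list so it could be passed to PyMongo. The collection name `statistics` and the field `stats_id` are assumptions; the pipeline is only built here, not executed, so no server is needed to run the snippet.

```python
# Aggregation pipeline joining each user with its statistics document.
# "statistics" and "stats_id" are hypothetical names.
pipeline = [
    {
        "$lookup": {
            "from": "statistics",      # collection to join with
            "localField": "stats_id",  # reference stored on the user
            "foreignField": "_id",     # key in the statistics collection
            "as": "stats",             # output array field on each user
        }
    }
]

# With a PyMongo database handle `db`, this would be run as:
#   for user in db.users.aggregate(pipeline):
#       print(user["stats"])
print(pipeline[0]["$lookup"]["from"])  # statistics
```

Note that `$lookup` always produces an array in the `as` field, even when at most one document matches.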

Generally speaking, embedded data is the preferable approach.