Mongoose Population

Mongoose is an object modeler for MongoDB. One of its most useful features is population, which lets you include documents from other collections in the document you are querying. Instead of looking up each reference by hand, you ask Mongoose to fill it in for you.
Document Design
Document databases provide a lot of flexibility in how data is modeled. Just as the name suggests, data will often mimic real-world documents. MongoDB is a document database, but related data often ends up split across collections rather than embedded in one document. Mongoose population can help.
A document is often optimized for reads. All the data is included, just as it would be on a real-world piece of paper.
In Mongo, data is divided into collections. Mongoose helps define a discrete schema for the data in each one, which keeps the collections separate and makes their fields explicit. This helps with data organization, storage, and writing. Population helps bring data from different collections back together again easily. From the docs:
Population is the process of automatically replacing the specified paths in the document with document(s) from other collection(s).
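To make that definition concrete, here is a rough in-memory sketch of the idea in plain JavaScript. It is not how Mongoose implements population. The `leagues` map, the `storedPlayer` object, and the `populatePath` helper are all made up for illustration, and no database is involved.

```javascript
// Hypothetical in-memory stand-in for a collection, keyed by _id.
const leagues = new Map([
  ['l1', { _id: 'l1', display_name: 'Monday Foosball' }],
  ['l2', { _id: 'l2', display_name: 'Friday Foosball' }],
]);

// A stored player document only holds references (ids) to leagues.
const storedPlayer = { _id: 'p1', display_name: 'Ann', leagues: ['l1', 'l2'] };

// Replace the ids found at `path` with the matching documents from `collection`.
// Handles both a single reference and an array of references.
function populatePath(doc, path, collection) {
  const value = doc[path];
  const resolved = Array.isArray(value)
    ? value.map((id) => collection.get(id))
    : collection.get(value);
  // Return a copy so the stored document is left untouched.
  return { ...doc, [path]: resolved };
}

const populatedPlayer = populatePath(storedPlayer, 'leagues', leagues);
console.log(populatedPlayer.leagues.map((league) => league.display_name));
```

The stored document keeps only ids. The populated copy carries the full league objects in the same path.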
So, let's try it out with a couple of examples.
Data Definition
Mongo adds a surrogate primary key to each document: the _id field, which holds an ObjectId by default. This is the key used to refer to documents in other collections. If I wanted to model players in leagues (foosball leagues, of course), I might create schemas like this:

    # Player model
    mongoose = require 'mongoose'
    Schema = mongoose.Schema

    playerSchema = new mongoose.Schema
      display_name: String
      # array of references to League documents
      leagues: [
        type: Schema.Types.ObjectId
        ref: 'League'
      ]

    Player = mongoose.model 'Player', playerSchema

    # League model
    mongoose = require 'mongoose'
    Schema = mongoose.Schema

    leagueSchema = new mongoose.Schema
      display_name: String
      # single reference to the Player who created the league
      created_by:
        type: Schema.Types.ObjectId
        ref: 'Player'
      # array of references to Player documents
      players: [
        type: Schema.Types.ObjectId
        ref: 'Player'
      ]

    League = mongoose.model 'League', leagueSchema

A few points of interest:
- Since we are using _id to refer to other objects, we use the ObjectId type in the Mongoose definition. You can use other fields. Just make sure the type matches.
- The ref attribute must match exactly the model name in your model definition. Otherwise you'll get something like this little beauty: MissingSchemaError: Schema hasn't been registered for model "Player".
- Note that League.players is an array. Just surround the field definition in square brackets to get this functionality.
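The point about ref needing an exact match can be pictured as a name lookup. The sketch below is plain JavaScript with a made-up registry (`registeredModels`, `registerModel`, `resolveRef`). It only imitates the behaviour described above and is not Mongoose's own code.

```javascript
// Hypothetical registry standing in for the model lookup done on a ref.
const registeredModels = new Map();

function registerModel(name, schema) {
  registeredModels.set(name, schema);
}

// Look up the model named by a ref; the match is exact and case-sensitive.
function resolveRef(ref) {
  if (!registeredModels.has(ref)) {
    throw new Error(`Schema hasn't been registered for model "${ref}".`);
  }
  return registeredModels.get(ref);
}

registerModel('Player', { display_name: String });

resolveRef('Player'); // found, because the name matches exactly

let lookupError = null;
try {
  resolveRef('player'); // wrong case, so the lookup fails
} catch (err) {
  lookupError = err;
  console.log(err.message);
}
```

If you hit this error in a real project, check the spelling and case of the ref first, then check that the file defining the model has actually been loaded.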
Data Read
Reading data is where Mongoose population really shines. It makes reading related documents straightforward. The magic is in the populate() function.
Populate One Field
If I want to populate a single field in the query for an object, I specify the name of that field in a string to the populate function:

    Player
      .findOne({ _id: 'abc123' })
      .populate('leagues')
      .exec (err, player) -> #...

leagues will be populated with an array of full League objects when the resulting JSON is returned, just like magic.
Populate Multiple Fields
Populating multiple referenced objects is similarly easy:

    League
      .find()
      .populate('created_by players')
      .exec (err, leagues) -> #...

Just separate the field names in your populate parameter with spaces. This query will return an array of League objects with the created_by and players fields populated with the associated Player objects.
Populate Partial Objects
Populating objects like this can quickly bloat your payload size. To limit included objects to only a subset of fields, you can specify exactly what parts you want populated. For instance, if my client UI only needed to show a list of leagues that a player belongs to, I could ask for just the display_name of the included League object by using a second parameter:

    Player
      .findOne({ _id: 'abc123' })
      .populate('leagues', 'display_name')
      .exec (err, player) -> #...

To list multiple fields, separate the field names with spaces.
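Here is a rough picture of what the trimmed result looks like, again as an in-memory JavaScript sketch and not Mongoose itself. The `leagueDocs` map and the `pickFields` and `populateSelected` helpers are made up for illustration. The sketch assumes the _id of each included object is kept alongside the selected fields.

```javascript
// Hypothetical in-memory league collection, keyed by _id.
const leagueDocs = new Map([
  ['l1', { _id: 'l1', display_name: 'Monday Foosball', created_by: 'p1', players: ['p1', 'p2'] }],
]);
const playerDoc = { _id: 'p1', display_name: 'Ann', leagues: ['l1'] };

// Copy only the space-separated fields named in `select`, plus _id.
function pickFields(doc, select) {
  const picked = { _id: doc._id };
  for (const field of select.split(' ')) {
    if (field in doc) picked[field] = doc[field];
  }
  return picked;
}

// Resolve the array of ids at `path`, keeping only the selected fields.
function populateSelected(doc, path, select, collection) {
  const resolved = doc[path].map((id) => pickFields(collection.get(id), select));
  return { ...doc, [path]: resolved };
}

const slimPlayer = populateSelected(playerDoc, 'leagues', 'display_name', leagueDocs);
console.log(JSON.stringify(slimPlayer.leagues));
// prints [{"_id":"l1","display_name":"Monday Foosball"}]
```

The created_by and players fields of the league never reach the payload, which is the whole point when the client only needs names.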
Forget to Populate
You might get so used to having objects populated for some of your queries that you wonder why they're not populated in your latest query. You probably just forgot to call populate(). You must call it explicitly to get the inclusions you want. Otherwise, the data you query will just include the _id values.
Explicitly Exclude Field
It might not be that you forgot to populate, but that on some queries you don't want to populate. In these cases, you might not want to be sending around unneeded _id values, especially if they make up a large portion of your data size. You can explicitly exclude such fields. For instance, if you wanted all Player models but weren't going to populate leagues, you might query:

    Player
      .find()
      .select('-leagues')
      .exec (err, players) -> #...

Note the - sign in the select clause. This removes the field from the results.
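The effect on the returned data can be sketched in memory like this. The `storedPlayers` array and the `applyExclusions` helper are made up for illustration. They only imitate the minus-sign convention described above, without touching a database.

```javascript
// Hypothetical stored player documents holding league references.
const storedPlayers = [
  { _id: 'p1', display_name: 'Ann', leagues: ['l1', 'l2'] },
  { _id: 'p2', display_name: 'Bob', leagues: ['l1'] },
];

// Drop every field that `select` lists with a leading minus sign.
function applyExclusions(doc, select) {
  const excluded = select
    .split(' ')
    .filter((field) => field.startsWith('-'))
    .map((field) => field.slice(1));
  const result = {};
  for (const [key, value] of Object.entries(doc)) {
    if (!excluded.includes(key)) result[key] = value;
  }
  return result;
}

const trimmedPlayers = storedPlayers.map((p) => applyExclusions(p, '-leagues'));
console.log(JSON.stringify(trimmedPlayers));
// prints [{"_id":"p1","display_name":"Ann"},{"_id":"p2","display_name":"Bob"}]
```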
Data Write
When you go to read the data, it's quick, easy, and automatic. But that's because some work was done previously to reference the correct objects and make sure these references are saved. Therefore, the naturally more work-intensive part of the population story is the data writing.
When we write to our example models, we need to save the proper references. For instance, when a new League is created, let's say it automatically needs a created_by Player reference saved and the creating Player will automatically join the league:
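
    league =
      display_name: myLeagueName
      created_by: currentPlayer._id
      # the creator joins the league, so list them as its first player
      players: [currentPlayer._id]

    League.create league, (err, league) ->
      if err?
        # ... do smart things that are never shown in a tutorial
        return
      else
        # record the new league on the creating player as well
        Player.update { _id: currentPlayer._id },
          $push:
            leagues: league._id
        , (err, numberAffected, raw) ->
          if err?
            # you know...
            return
          else
            res.json league
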
A few points of interest:
- Lowercase league is just the json to populate a League model. league is also the shadowed variable name in the League.create callback.
- currentPlayer is just an imagined Player reference that has an _id that you will use to associate Player to this League.
- $push is a special update attribute that appends new elements to a model's array.
Depending on how complex your model relationships become, you may opt for a different code strategy besides nesting callbacks. Don't like your pyramid of doom? Try the awesome async.js.
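As one possible alternative to nesting, here is a sketch of the same two-step write using promises with async/await in plain JavaScript. Note that this is a different technique from async.js. `FakeLeague`, `FakePlayer`, `pushLeague`, and `createLeagueFor` are hypothetical in-memory stand-ins, not Mongoose models or Mongoose methods, so the sketch runs without a database.

```javascript
// Hypothetical in-memory stand-ins for the League and Player models.
// Each method returns a promise, the way a promise-based data layer would.
const leagueStore = [];
const playerStore = [{ _id: 'p1', display_name: 'Ann', leagues: [] }];

const FakeLeague = {
  async create(fields) {
    const league = { _id: `l${leagueStore.length + 1}`, ...fields };
    leagueStore.push(league);
    return league;
  },
};

const FakePlayer = {
  // Append leagueId to the leagues array of the player with the given _id.
  async pushLeague(playerId, leagueId) {
    const player = playerStore.find((p) => p._id === playerId);
    if (!player) throw new Error(`No player with _id ${playerId}`);
    player.leagues.push(leagueId);
    return player;
  },
};

// The two writes read top to bottom. Any failure rejects the returned promise,
// so error handling can live in a single catch at the call site.
async function createLeagueFor(currentPlayer, leagueName) {
  const league = await FakeLeague.create({
    display_name: leagueName,
    created_by: currentPlayer._id,
    players: [currentPlayer._id],
  });
  await FakePlayer.pushLeague(currentPlayer._id, league._id);
  return league;
}

const created = createLeagueFor(playerStore[0], 'Monday Foosball');
created
  .then((league) => console.log(league, playerStore[0].leagues))
  .catch((err) => console.error('write failed:', err.message));
```

The order of the writes is the same as in the callback version. Only the shape of the control flow changes.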
I don't know if Mongoose population is going to change your life, but I was very happy when I found this feature. I had been doing junk like this manually. What are some other great use cases that you've found?