Mongoose Population
Mongoose is an object modeler for MongoDb. It has a particularly great feature, called population. Population allows you to include other objects in a specified object. It makes the process a lot easier and more automatic.
Document Design
Document databases provide a lot of flexibility in how data is modeled. Just as the name suggests, data will often mimic real-world documents. MongoDb isn't a document database per se, but it has enough flexibility to be treated as one. Mongoose population can help.
A document is often optimized for reads. All the data is included, such as it would be on a real-world piece of paper.
In Mongo, data is divided into collections. Mongoose helps define discrete schemas for data in each. This separates and makes data fields explicit. This helps with data organization, storage, and writing. Population helps bring data from different collections back together again easily. From the docs:
Population is the process of automatically replacing the specified paths in the document with document(s) from other collection(s).
So, let's try it out with a couple examples.
Data Definition
Mongo will add a primary surrogate key to each object, called ObjectId, referenced as _id
in the data. This will be the key used to refer to objects in other collections. If I wanted to model players in leagues -- foosball leagues of course -- I might create schemas like this:
mongoose = require 'mongoose'
Schema = mongoose.Schema
playerSchema = new mongoose.Schema
display_name: String
leagues: [
type: Schema.Types.ObjectId
ref: 'League'
]
Player = mongoose.model 'Player', playerSchema
mongoose = require 'mongoose'
Schema = mongoose.Schema
leagueSchema = new mongoose.Schema
display_name: String
created_by:
type: Schema.Types.ObjectId
ref: 'Player'
players: [
type: Schema.Types.ObjectId
ref: 'Player'
]
League = mongoose.model 'League', leagueSchema
A few points of interest:
- Since we are using
_id
to refer to other objects, we use theObjectId
type in the Mongoose definition. You can use other fields. Just make sure the type matches. - The
ref
attribute must match exactly the model name in your model definition. Otherwise you'll get something like this little beauty:MissingSchemaError: Schema hasn't been registered for model "Player".
. - Note that
League.players
is an array. Just surround the field definition in square brackets to get this functionality.
Data Read
Reading data is where Mongoose population really shines. This is the magic that makes reads of documents very straightforward, easy, and fast. The magic is in the populate()
function.
Populate One Field
If I want to populate a single field in the query for an object, I specify the name of that field in a string to the populate
function:
Player
.findOne({ _id: 'abc123' })
.populate('leagues')
.exec (err, player) -> #...
leagues
will be populated with an array of full League
objects when the resulting json returned, just like magic.
Populate Multiple Fields
Populating multiple referenced objects is similarly easy:
League
.find()
.populate('created_by players')
.exec (err, league) -> #...
Just separate the field names in your populate
parameter with spaces. This query will return an array of League
with the created_by
and player
fields populated with the associated Player
objects.
Populate Partial Objects
Populating objects like this can quickly bloat your payload size. To limit included objects to only a subset of fields, you can specify exactly what parts you want populated. For instance, if my client UI only needed to show a list of leagues that a player belongs to, I could ask for just the display_name
of the included League
object by using a 2nd parameter:
Player
.findOne({ _id: 'abc123' })
.populate('leagues', 'display_name')
.exec (err, player) -> #...
For listing multiple parts, separate the attribute names with spaces.
Forget to Populate
You might get so used to having objects populated for some of your queries that you might wonder why they're not populated in your latest query. You probably just forgot to call populate()
in your query. You must do this explicitly to get the inclusions you desire. Otherwise, you the data you query will just include the _id
values.
Explicitly Exclude Field
It might not be that you forgot to populate, but that on some queries you don't want to populate. In these cases, you might not want to be sending around unneeded _id
values, especially if they make up a large portion of your data size. You can explicitly exclude such fields. For instance, if you wanted all Player
models but weren't going to populate leagues
, you might query:
Player
.find()
.select('-leagues')
.exec (err, players) -> #...
Note the -
sign in the select
clause. This removes the field from the results.
Data Write
When you go to read the data, it's quick, easy, and automatic. But that's because some work was done previously to reference the correct objects and make sure these references are saved. Therefore, the naturally more work-intensive part of the population story is the data writing.
When we write to our example models, we need to save the proper references. For instance, when a new League
is created, let's say it automatically needs a created_by
Player
reference saved and the creating Player
will automatically join the league:
league =
display_name: myLeagueName
created_by: currentPlayer._id
League.create league, (err, league) ->
if err?
# ... do smart things that are never shown in a tutorial
else
Player.update { _id: currentPlayer._id },
$push:
leagues: league._id
, (err, numberAffected, raw) ->
if err?
# you know...
else
res.json league
A few points of interest:
- Lowercase
league
is just the json to populate aLeague
model.league
is also the shadowed variable name in theLeague.create
callback. currentPlayer
is just an imaginedPlayer
reference that has an_id
that you will use to associatePlayer
to thisLeague
.$push
is a specialupdate
attribute that appends new elements to a model's array.
Depending on how complex your model relationships become, you may opt for a difference code strategy besides nesting callbacks. Don't like your pyramid of doom? Try the awesome async.js.
I don't know if Mongoose population is going to change your life, but I was very happy when I found this feature. I had been doing junk like this manually. What are some other great use cases that you've found?