DEV Community

Cover image for $elemMatch and Multi-Key Indexes
Franck Pachot for MongoDB

Posted on

5 2 2 2 2

$elemMatch and Multi-Key Indexes

In the previous post, I used the following index on the daily views data, which is an array of integers for each video:

db.youstats.createIndex({ 
 "views.daily.data": -1      , // for Sort on maximum daily view
 commentsNumber: 1           , // for additional filter on comments
}); 
Enter fullscreen mode Exit fullscreen mode

The index was used to find the videos that had more than ten million views in a day:

db.youstats.find({
  "views.daily.data": { $gt: 1e7 } ,
}).explain("executionStats").executionStats
Enter fullscreen mode Exit fullscreen mode

Such filter optimization is easy to understand as there's one index key for each value in the array, and the search simply looks for the values within the bound defined by $gt: 1e7:

      direction: 'forward',
      indexBounds: {
        'views.daily.data': [ '[inf.0, 10000000)' ],
        commentsNumber: [ '[MinKey, MaxKey]' ]
      },
Enter fullscreen mode Exit fullscreen mode

If you use multiple filters in the find() command, they apply to the same document but not to the same array value. For example, the following query will not find videos with daily views between ten and twenty million. Instead, it retrieves videos that had at least one day with views over ten million and one day with views under two million:

db.youstats.find({  
  $and: [  
    { "views.daily.data": { $gt: 1e7 } },  
    { "views.daily.data": { $lt: 2e7 } }  
  ]  
}).explain("executionStats").executionStats
Enter fullscreen mode Exit fullscreen mode

This is visible in the execution as the most selective filter used for the index scan and the other as a filter after fetching the document:

   stage: 'FETCH',
    filter: { 'views.daily.data': { '$lt': 20000000 } },
    nReturned: 8,
...
      stage: 'IXSCAN',
      nReturned: 8,
      indexBounds: {
        'views.daily.data': [ '[inf.0, 10000000)' ],
        commentsNumber: [ '[MinKey, MaxKey]' ]
      },
      keysExamined: 44,
      seeks: 1,
Enter fullscreen mode Exit fullscreen mode

Note that the following is exactly the same, with two filters that may apply to different key, and there's no 'between' operator in MongoDB:

db.youstats.find({  
  $and: [  
    { "views.daily.data": { $gt: 1e7 , $lt: 2e7} },  
  ]  
}).explain("executionStats").executionStats
Enter fullscreen mode Exit fullscreen mode

If you want to apply multiple filters to the same key (the same array element) you must use $elemMatch so that the filters apply to an array element rather than the document:

db.youstats.find({
  "views.daily.data": {   
    $elemMatch: { $gt: 1e7 , $lt: 2e7 }   
  } ,
}).explain("executionStats").executionStats
Enter fullscreen mode Exit fullscreen mode

In case of doubt, the execution plan makes it clear in the index bounds:

      indexBounds: {
        'views.daily.data': [ '(20000000, 10000000)' ],
        commentsNumber: [ '[MinKey, MaxKey]' ]
      },
      keysExamined: 38,
Enter fullscreen mode Exit fullscreen mode

There's no 'between' operator in MongoDB but you don't need it because the MongoDB query planner can combine the two bounds [ '(20000000, -inf.0]' ] and [ '[inf.0, 10000000)' ]' to [ '(20000000, 10000000)' ] with is effectively a between. It has also the advantage to be implicit about the bounds inclusion with $gt/$lt or $gte/$lte.

This query planner transformation is known as index bound intersection

Once again, the same index was used to serve different queries. On this field, daily views data, each array had a single value and my filters applied on the same field.

My sample dataset has also an array with entries being objects with multiple fields, to record the video sharing activity:

...
    gplus: [
      {
        activityLanguage: 'en',
        activityReshared: 'z120it0xupygjt2hm04cctoodsjmttkwrow0k',
        authorID: '118074003327949301125',
        activityType: 'share',
        authorName: 'Liz Lyon',
        activityID: 'z12hu5cgnwrscznhb04ccrtprnbeupqwicc',
        activityTimestamp: '1391094295'
      },
      {
        activityLanguage: 'en',
        activityReshared: 'z120it0xupygjt2hm04cctoodsjmttkwrow0k',
        authorID: '118074003327949301125',
        activityType: 'share',
        authorName: 'Liz Lyon',
        activityID: 'z12hu5cgnwrscznhb04ccrtprnbeupqwicc',
        activityTimestamp: '1391094295'
      },
...
Enter fullscreen mode Exit fullscreen mode

I have some use cases that needs to find what a user has shared and create the following index for this access pattern:

db.youstats.createIndex({ 
 "gplus.activityType": 1       ,  
 "gplus.authorName": 1         ,  
});
Enter fullscreen mode Exit fullscreen mode

This index can be used to list the activity types, as we have seen on a previous post:

db.youstats.distinct("gplus.activityType")

[ null, 'reshare', 'share' ]
Enter fullscreen mode Exit fullscreen mode

I can filter for the 'share' activity type and the author name. I use a regular expression to find a prefix in the name. I use $elemMatch as the two filters must apply on the same array element:

db.youstats.aggregate([  
  { $match: { 
      gplus: { $elemMatch: { 
                             activityType: "share",
                             authorName: { $regex: /^Franck.*/ } 
      } } 
  } },  
  { $unwind: "$gplus" },  
  { $group: { _id: { video: "$title", author: "$gplus.authorName" }, shareCount: { $sum: 1 } } },  
  { $project: { _id: 0, videoTitle: "$_id.video", "share.author": "$_id.author", shareCount: 1 } },  
  { $sort: { shareCount: -1 } }, 
  { $limit: 5 }                   
]).explain("executionStats").stages[0]['$cursor'].executionStats
Enter fullscreen mode Exit fullscreen mode

The query planner has combined the filters into two index bounds, to get fast access to the index entries for the desired document:

        direction: 'forward',
        indexBounds: {
          'gplus.activityType': [ '["share", "share"]' ],
          'gplus.authorName': [ '["Franck", "Francl")', '[/^Franck.*/, /^Franck.*/]' ]
        },
        keysExamined: 17,
        seeks: 2,
Enter fullscreen mode Exit fullscreen mode

This is known as compound index bounds

In this post, I continued to add new use cases to a document model that was initially designed without specific access patterns in mind. Although optimized for a particular domain, this general-purpose database can adapt to various access patterns, thanks to its powerful multi-key indexes and query planner.
You can reproduce this on a MongoDB database and the first post of this series explains how to setup this lab and load data. If you encounter a database that claims MongoDB compatibility, you can try the same queries but will not observe the same performance because MongoDB is unique in providing multi-key indexes that can cover equality, sort and range efficiently.

Top comments (1)

Collapse
 
duncan_true profile image
Dun

Great explanation of how $elemMatch works with multi-key indexes. The examples make it easy to understand the importance of using the right operators for array fields in MongoDB. Thanks for sharing these practical queries and insights.