Franck Pachot for MongoDB

Sampling Without Index

I won't create any indexes in this post yet, but I can still run a query in milliseconds by limiting how much data it reads. However, since I haven't indexed any values, the only efficient filter is a random sample, which does not select documents by any meaningful criterion.

Data was imported in the first post of this series.

Without indexes, queries on a sample

Without indexes, scanning the whole collection would be inefficient, but I can scan a sample to get some insights about the data. For example, I list and count the categories of videos from a random sample of ten thousand:

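db.youstats.aggregate([
  // read a random sample of 10,000 documents
  { $sample: { size: 10000 } },
  // one result per category, with its document count and maximum duration
  { $group: { _id: "$category", count: { $sum: 1 }, maxDuration: { $max: "$duration" } } },
  // most frequent categories first
  { $sort: { count: -1 } }
])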

[
  { _id: 'Music', count: 1841, maxDuration: '99' },
  { _id: 'People', count: 1358, maxDuration: '99' },
  { _id: 'Entertainment', count: 1024, maxDuration: '99' },
  { _id: 'Education', count: 949, maxDuration: '99' },
  { _id: 'Tech', count: 698, maxDuration: '99' },
  { _id: 'Howto', count: 668, maxDuration: '99' },
  { _id: 'Sports', count: 498, maxDuration: '99' },
  { _id: 'News', count: 492, maxDuration: '99' },
  { _id: 'Animals', count: 477, maxDuration: '99' },
  { _id: 'Comedy', count: 434, maxDuration: '98' },
  { _id: 'Games', count: 415, maxDuration: '99' },
  { _id: 'Film', count: 384, maxDuration: '99' },
  { _id: 'Travel', count: 294, maxDuration: '97' },
  { _id: 'Autos', count: 276, maxDuration: '99' },
  { _id: 'Nonprofit', count: 151, maxDuration: '96' },
  { _id: '3', count: 28, maxDuration: '97' },
  { _id: '4', count: 10, maxDuration: '60' },
  { _id: '5', count: 3, maxDuration: '71' }
]

An aggregation pipeline is like a SQL query where you have more control over the processing flow. The first stage filters (here a simple $sample), the second stage defines the group by ($group, with an identifier built from the grouping fields, plus the calculated fields), and the last stage is an order by ($sort, with -1 for descending).
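To make the three stages concrete, here is a minimal sketch that emulates them in plain JavaScript on an in-memory array. It is only an illustration of what each stage computes, not how MongoDB executes the pipeline. The function names and the `videos` documents are hypothetical. The durations are kept as strings, as in the output above, so the maximum is a string comparison rather than a numeric one.

```javascript
// Emulation of $sample, $group and $sort on an in-memory array (illustration only).

// $sample: pick `size` documents at random with a partial Fisher-Yates shuffle.
function sample(docs, size, random = Math.random) {
  const copy = docs.slice();
  const n = Math.min(size, copy.length);
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(random() * (copy.length - i));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy.slice(0, n);
}

// $group: one output document per distinct category,
// with count ($sum: 1) and maxDuration ($max: "$duration").
function groupByCategory(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const g = groups.get(doc.category) ||
      { _id: doc.category, count: 0, maxDuration: undefined };
    g.count += 1;
    // Strings compare character by character, so '99' > '100'.
    if (g.maxDuration === undefined || doc.duration > g.maxDuration) {
      g.maxDuration = doc.duration;
    }
    groups.set(doc.category, g);
  }
  return [...groups.values()];
}

// $sort: { count: -1 } orders the groups by descending count.
function sortByCountDesc(groups) {
  return groups.slice().sort((a, b) => b.count - a.count);
}

// Hypothetical documents, shaped like the fields used in the pipeline.
const videos = [
  { category: "Music", duration: "99" },
  { category: "Music", duration: "100" },
  { category: "Music", duration: "42" },
  { category: "People", duration: "7" },
  { category: "People", duration: "60" },
  { category: "Tech", duration: "15" },
];

// Sampling every document keeps this small example deterministic.
const result = sortByCountDesc(groupByCategory(sample(videos, videos.length)));
console.log(result);
```

With string durations, the Music group reports '99' as its maximum even though a '100' is present, which suggests why every maxDuration in the real output has at most two digits.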

This lets you take a quick look at the data to get an idea of its shape. You can also use MongoDB Compass to do so. And when connected to Atlas, you can type your query in your own language and get the aggregation pipeline generated.

In the next posts, we will create some indexes to serve more use cases.