MongoDB – Geospatial Queries

Most of the data we work with has geographic longitude/latitude information so we have recently started exploring doing some work with geospatial queries using different database technologies (MySQL, Oracle). We currently use MongoDB as our primary data store and we were surprised to find out that MongoDB already has this functionality built in. After doing some reading we also realized that it handles each of our use cases. Even though we use MongoDB everyday, we just never thought a document store would come with this functionality. For more information go to MongoDB Geospatial Indexing.

Download and Setup MongoDB

For this tutorial, we will just use the MongoDB interactive shell for interacting with the database because it is simple. There is no need to complicate this example by using the MongoDB Scala or Python drivers. They are simple to use once you gain an understanding of MongoDB. The basics are that MongoDB is a document store, data is stored as JSON documents, and queries are made by using subsets of the JSON from the documents you wish to match, with query commands mixed in.

Download MongoDB (Downloads):

wget http://fastdl.mongodb.org/osx/mongodb-osx-x86_64-2.0.0.tgz
tar -zxvf mongodb-osx-x86_64-2.0.0.tgz

Create a folder for MongoDB to store it’s database files (We are just using the default location for this example):

mkdir -p /data/db

NOTE: On windows this would be “C:\data\db”.

Start the MongoDB database:

cd mongodb-osx-x86_64-2.0.0/bin
./mongod

MongoDB provides an interactive shell which can be used to query your MongoDB database. For the rest of this tutorial, we will use the interactive shell.

Start the interactive shell:

cd mongodb-osx-x86_64-2.0.0/bin
./mongo
MongoDB shell version: 2.0.0
connecting to: test
>

You should see the shell start up and display “connecting to: Test”. This means you are connected to the default database “Test” which will be fine for this tutorial.

That’s all, we are ready to explore geospatial querying in MongoDB!

Geospatial indexes and queries

Let’s for these examples assume we are creating a database for a website where a customer can browse and search for automotive dealerships in a given area by different types of map configurations. Let’s explore the most common use cases for retrieving dealership information by a geospatial query.

Defining documents with geospatial co-ordinates

The latitude/longitude elements in a document must be stored in a field called “loc” and follow a certain format. Either it can be stored as an array of two elements such as “loc:[51,-114]” or as a dictionary with two elements such as “loc:{lat:51,lon:-114}”. I decided to use the dictionary since it more closely matches our existing data. So let’s create some documents for car dealerships that each contain latitude/longitude information.

db.dealerships.save({"name":"Frank's Fords", "affiliation":"Ford", "loc":{"lon":51.10682735591432,"lat":-114.11773681640625}})
db.dealerships.save({"name":"Steve's Suzukis", "affiliation":"Suzuki", "loc":{"lon":51.09144802136697,"lat":-114.11773681640625}})
db.dealerships.save({"name":"Charlie's Chevrolets", "affiliation":"Chevrolet", "loc":{"lon":51.08282186160978,"lat":-114.10400390625}})
db.dealerships.save({"name":"Nick's Nissans", "affiliation":"Nissan", "loc":{"lon":51.12076493195686,"lat":-113.98040771484375}})
db.dealerships.save({"name":"Tom's Toyotas", "affiliation":"Toyota", "loc":{"lon":50.93939251390387,"lat":-113.98040771484375}})

Create the geospatial index

Now, inorder to query by geo co-ordinates, we need to create an index over the “loc” field of our dealership documents.

db.dealerships.ensureIndex({loc:"2d"})

Common Use Cases

What if I want the two dealerships closest to a specific co-ordinate?

This can be done using the “near” and “limit” query options. This query finds the points closest to the co-ordinate provided and returns them sorted by distance from the point given (Yep, MongoDB handles that all for you, returns the data exactly as you would expect).

db.dealerships.find({loc: {$near:[51,-114]}}).limit(2)

Query returns:

{ "_id" : ObjectId("4e8927066f9caf7713a8421b"), "name" : "Tom's Toyotas", "affiliation" : "Toyota", "loc" : { "lon" : 50.93939251390387, "lat" : -113.98040771484375 } }
{ "_id" : ObjectId("4e8926f96f9caf7713a8421a"), "name" : "Nick's Nissans", "affiliation" : "Nissan", "loc" : { "lon" : 51.12076493195686, "lat" : -113.98040771484375 } }
What if I want to filter by dealership affiliation in the query?

No problem, the MongoDB people have thought of that as well. They call these “Compound Indexes”. When creating the geospatial index you can also include other fields in your document in that index. So for example if you wanted to have your application query for all “Ford” affiliated dealerships available close to the co-ordinates provided, you would create the following index:

Add the Compound index:

db.dealerships.ensureIndex({loc:"2d", affiliation:1})

Then your application would be able to query by dealership affiliation as well:

db.dealerships.find({loc: {$near:[51,-114]}, "affiliation":"Ford"})

Query returns:

{ "_id" : ObjectId("4e8926696f9caf7713a84215"), "name" : "Frank's Fords", "affiliation" : "Ford", "loc" : { "lon" : 51.10682735591432, "lat" : -114.11773681640625 } }

You can see the value in being able to do these geospatial queries so easily. For example, if your website has a map, showing dealership locations, the customer can click on and zoom in on any area of the map. When they do, the items displayed on the map will be refreshed based on a geospatial query, returning the N number of items closest to the point selected. Of course, this can also be filtered further by allowing the customer to select filter criteria such as “Affiliation”.

What if I want to search for all dealerships within a given area of town?

Well MongoDB handles that as well with “Bounded Queries”. With bounded queries you can use either a rectangle, circle, or polygon. Since areas of cities are best represented by a polygon, we will use that for this example.

Let’s define a polygon for a specific area of town:

areaoftown = { a : { x : 51.12335082548444, y : -114.19052124023438 }, b : { x : 51.11904092252057, y : -114.05593872070312 }, c : { x : 51.02325750523972, y : -114.02435302734375 }, d : { x : 51.01634653617311, y : -114.1644287109375 } }

Once this polygon has been defined, we can then search our dealerships collection for dealers that fall within this boundary.

db.dealerships.find({ "loc" : { "$within" : { "$polygon" : areaoftown } } })

NOTE: Polygon searches are only available in versions >=1.9

Query returns:

{ "_id" : ObjectId("4e892d8c7f369ee980a3662b"), "name" : "Charlie's Chevrolets", "affiliation" : "Chevrolet", "loc" : { "lon" : 51.08282186160978, "lat" : -114.10400390625 } }
{ "_id" : ObjectId("4e892d797f369ee980a36629"), "name" : "Frank's Fords", "affiliation" : "Ford", "loc" : { "lon" : 51.10682735591432, "lat" : -114.11773681640625 } }
{ "_id" : ObjectId("4e892d837f369ee980a3662a"), "name" : "Steve's Suzukis", "affiliation" : "Suzuki", "loc" : { "lon" : 51.09144802136697, "lat" : -114.11773681640625 } }

MongoDB makes this simple and easy to use, good job!

MongoDB – Negative Regex Query in Mongo shell

MongoDB’s interactive shell, supports the use of Regular Expressions as one of their Advanced Query options. The other day it came up that I needed to do a query for all documents where the text of a field did NOT start with a given pattern. It took a little bit of trial and error to get the regex right in the mongo shell, so here is what worked for us.

So let’s say for example we have a MongoDB database with a collection called “cars”. The primary key for each document is “make:model”, for example “Ford:Fusion”.

To find all cars made by “Ford”, you would just query for all documents where the id starts with the text “Ford”:

db.cars.find({"_id":/^Ford/})

But, if you want to query for all cars that are NOT made by “Ford”, you can do this with a negative regex such as:

db.cars.find({"_id":/^((?!Ford).)/})

Obviously, this type of regex is not efficient, but if you just need to do a quick query from the Mongo shell, it can come in handy.

I hope that saves you some time!

PyMongo-Frisk – Now supports Replica Sets

Yep, we have updated the open source project PyMongo-Frisk (0.0.4) to now support MongoDB Replica Sets! Our library extends the PyMongo Connection class and adds an additional method called “check_health()”. This method when called returns who is the master, a list of all slave nodes, verifies read and write connectivity with the master, and finally verifies it can read from ALL slaves in the Replica Set. The intended usage for this connection class extension would be an internal monitoring page that is pinged regularly by an automated monitor and sets off an alarm if anything fails.

We created this project because we wanted to ensure that our app server had the ability to reach all failover slave nodes. We did not want to wait until a failure of the master occurred to find out the app server could not reach one of our slaves!

To use our extension of the connection class:

from pymongo_frisk import FriskConnection as Connection
connection = Connection(['host1','host2','host3'])
results = connection.check_health()

When called the method returns a dictionary containing info about the health of ALL nodes in a Replica Set:
{'db_master_host': 'host1:27017',
'db_slave_hosts': ['host2:27018','host3:27019'],
'db_master_can_write': True,
'db_master_can_read': True,
'db_slaves_can_read': [('host2',True),('host3',True)] }

Note, the library still supports Replica Pairs using authentication. We have moved on to using version 1.6+ of MongoDB, so we decided to make the jump recently from Replica Pairs to Replica Sets. Replica Pairs will be removed in MongoDB version 1.8.

Creating a REST API in Python using Bottle and MongoDB

I have been using Bottle and MongoDB for a REST API project for almost a year now. I frequently get asked the question “Bottle, what is Bottle, I have never heard of it” and “Ok, I have read the Bottle tutorial, it is simple, but how do I use Bottle with MongoDB?”. So I decided to write up a simple “Getting Started” tutorial for my friends curious about these technologies.

Prerequisites: Python 2.7 or higher

Setup MongoDB

MongoDB is a document store, a schema less database. What makes MongoDB unique (and why we decided to use it) is that MongoDB is a hybrid document store. What this means is that MongoDB offers the freedom of storing data in a schema less fashion, while still providing a flexible query syntax that will make anyone familiar with SQL feel comfortable.

You will need to download the latest version of MongoDB from the downloads page. Let’s download MongoDB and get it up and running.

First, setup the data folder. By default MongoDB stores all data files in the folder /data/db, so let’s create the folder:

sudo mkdir -m 777 -p /data/db

Download MongoDB:

wget http://fastdl.mongodb.org/osx/mongodb-osx-x86_64-1.6.5.tgz
tar -zxvf mongodb-osx-x86_64-1.6.5.tgz

Now let’s start MongoDB:

cd mongodb-osx-x86_64-1.6.5/bin
./mongod

NOTE: We have not created a database, or even a collection (you can think of a collection as a schema free table). When a document is being saved, MongoDB will lazily create the collection if it does not exist, and if the database does not exist, it will also create the database. It’s just that simple!

If you would like a more detailed introduction to using MongoDB, the query syntax, etc, please refer to my previous post: Getting Started with MongoDB

Setup Bottle

Bottle is a very simple Python web framework that is contained in a single Python file (bottle.py). We have been using Bottle for creating REST APIs and have been very impressed with just how easy it is to use.

Install the Bottle package into your Python site-packages folder:

pip install bottle

Setup PyMongo

PyMongo is a library for interacting with MongoDB from a Python application. This package bridges the gap between Bottle and MongoDB.

Install the PyMongo package into your Python site-packages folder:

pip install pymongo

Putting it all together: A Simple REST API

Now, on to the fun stuff! Let’s write a simple Python REST API, with a GET and a PUT, using Bottle, that allows us to save and retrieve documents to and from MongoDB using PyMongo.

Create a new file:

touch myrestapi.py

Cut and paste the following code:

import json
import bottle
from bottle import route, run, request, abort
from pymongo import Connection

connection = Connection('localhost', 27017)
db = connection.mydatabase

@route('/documents', method='PUT')
def put_document():
data = request.body.readline()
if not data:
abort(400, 'No data received')
entity = json.loads(data)
if not entity.has_key('_id'):
abort(400, 'No _id specified')
try:
db['documents'].save(entity)
except ValidationError as ve:
abort(400, str(ve))

@route('/documents/:id', method='GET')
def get_document(id):
entity = db['documents'].find_one({'_id':id})
if not entity:
abort(404, 'No document with id %s' % id)
return entity

run(host='localhost', port=8080)

Start the application:

python myrestapi.py

Using our new REST API

Now let’s try saving and retrieving some documents. If you do not have a preferred REST client handy and are using the Google Chrome browser, you should try the extension Simple REST Client.

Save a document
  • Open up your favorite REST client and point it at the following URL:
  • Now select PUT and enter the following data:
    • {"_id": "doc1", "name": "Test Document 1"}

NOTE: The primary key of a document in a MongoDB collection is always the field “_id”. If you specify an “_id” in your document it will be used, if you do not, MongoDB will generate a unique id.

Retrieve a document
  • In your REST client enter the URL for retrieving the document we just saved:
  • Now select GET and you should see the same document returned:
    • {"_id": "doc1", "name": "Test Document 1"}

That is it! You now have a basic, working, REST API written in Python using Bottle and MongoDB!

PyMongo – Checking the health of Replica Pair connections

We are currently using MongoDB setup with Replica Pairs (We are waiting until version 1.8.0 to switch to Replica Sets). Our application that uses MongoDB is a Python REST API using PyMongo. Once we started development on this project the first thing we wanted to do was sort out how we were going to monitor the application. We ended up deciding to add a monitoring page to our REST API and used it to display information showing the health of the application. We setup monitors to verify each instance of MongoDB was up and running, monitors to verify that both instances of MongoDB were pairing correctly, and we setup monitors to call our new monitoring page to verify our application had read and write connectivity to the Master in our Replica Pair. However, there was a very important piece not being verified by our monitors:

In the event of a failure of the Master in our Replica Pair, would our application be able to successfully connect to the Slave instance when it took over as the Master?

The issue is that when PyMongo connects to a Replica Pair, if it is able to connect to the first MongoDB instance in the pair, it just connects and never verifies that it can also connect to the Slave instance. So for example if there were some firewall rule that would prevent the application from connecting to the second instance of MongoDB in the pair, we would prefer to find this out BEFORE a failure of the Master occurs!

So our solution was to extend the PyMongo connection class to add an additional method called “check_health()”. This method verifies that it can connect to both MongoDB instances in the Replica Pair, verifies it can write to the Master, read from the Master, and can read from the Slave.

My co-worker James and have created a small open source project called PyMongo-Frisk which you can use to get our additional connection functionality. You use it just like the connection class in PyMongo:

import pymongo_frisk as pymongo
connection = pymongo.connection.Connection.from_uri("mongo://username:password@host1,host2/database")
results = connection.check_health()

The check_health() call returns the results as a dictionary:
{'db_master_url': 'host2', 'db_slave_url': 'host1', 'db_master_can_write': True, 'db_slave_can_read': True, 'db_master_can_read': True}

You can get a copy of the PyMongo-Frisk code from any of the following sources:

Softpedia
pypi
Github

MongoDB – Setup a Replica Pair running in auth mode

If you are using MongoDB and need a Master/Slave configuration that will give you a Slave that will automatically be promoted to Master in the event of a Master DB failure, then MongoDB Replica Pairs will do the job for you. In a Replica Pair the Slave checks the Master for updates every few seconds. If the Master fails to respond, the Slave automatically takes over as the Master. So as far as your application is concerned, everything is still functioning correctly. In the event that the failed pair comes back online, it will see the other pair is currently the Master, will start as a Slave and sync up with the Master.

Testing the Replica Pair setup, it works very well. However, we wanted to be able to run Replica Pairs with MongoDB auth mode turned on so we could password protect our databases. We figured out how to do this setup, after a couple of attempts, so here are our instructions. Hopefully this will help!

Replica Pair using auth mode

Server1 = Your server that has all of the data you want to use
Server2 = Your server that current has no data (This is our Failover server)

Assuming both servers have a data file location of /data/db

  • Server1: delete all files with local.*
    • rm -f /data/db/local.*
  • Server2: ensure your /data/db folder is empty
  • Start Server1
    • mongod –pairwith Server2 –dbpath /data/db
    • Server1 will become the current Master
  • Start Server2
    • mongod –pairwith Server1 –dbpath /data/db
    • Server2 will become the current Slave
  • With the mongo shell connect to Server1
    • mongo –host Server1
    • Add credentials to the admin database
      • use admin
      • db.addUser(“admin”,”adminpassword”)
      • db.auth(“admin”,”adminpassword”)
    • Add replication credentials to the admin database
      • use local
      • db.addUser(“repl”,”replpassword”)
      • exit
  • Stop Server1 (ctrl+c)
    • Server2 should now switch to being the new Master
  • With the mongo shell connect to Server2
    • mongo –host Server2
    • Authenticate with the admin credentials, they were copied from Server1(Master) to Server2 (Slave)
      • use admin
      • db.auth(“admin”,”adminpassword”)
    • Add the replication credentials to the “local” database, these were not copied from Server1 to Server2 automatically while running as replica pairs
      • use local
      • db.addUser(“repl”,”replpassword”)
      • exit
  • Stop Server2 (ctrl+c)
  • Start Server1 in auth mode (It will be the Master)
    • mongod –pairwith Server2 –auth –dbpath /data/db
  • Start Server2 in auth mode (It will be the Slave)
    • mongod –pairwith Server1 –auth –dbpath /data/db

Python Replica Pair Connection

Example connection string for Python connecting to a replica pair (Server1, Server2)

import pymongo
connection=pymongo.Connection.from_uri("mongodb://username:password@Server1,Server2/database")

MongoDB – Connecting a Slave to a Master running in auth mode

We have been running MongoDB in a test environment with a Master and a Slave working fine. However, for production we want to run our MongoDB Master and Slave using auth mode. To do this, according to the documentation on the MongoDB site, you need to create an account on both the Master and the Slave in the “Local” database that has the username “repl” (i.e., replication user). This common user on both instances of MongoDB is what is used to authenticate a Slave .The documentation on the MongoDB site on how to set up a Master-Slave configuration in auth mode is kind of vague. So to help out anyone who may be attempting this setup for the first time, here are our step by step instructions.

To setup a Master and Slave running in auth mode:

Setup Master

  • Create a directory to store your mongo DB database files
    • mkdir /data/db
  • Go to the bin folder of where your MongoDB code was extracted
    • e.g., /Users/me/mongodb/mongodb-osx-x86_64-1.4.0/bin
  • Start the Master DB
    • mongod –dbpath /data/db
  • Open another command prompt in the same folder and run the MongoDB shell
    • mongo
    • NOTE: You can also connect to a remote mongo server using the shell: mongo [remotehostname]
  • Create an admin user on the admin database
    • use admin
    • db.addUser(“admin”,”adminpassword”)
    • exit
  • Stop the mongodb server, use ctrl+c in the command prompt where it was started
  • Start the mongodb in “auth” mode
    • mongod –master –auth –dbpath /data/db
  • Now let’s login to the admin database using the admin credentials and add the “repl” (replication) user
    • mongo admin -u admin -p adminpassword
    • use local
    • db.addUser(“repl”,”replpassword”)
    • exit
  • Just to make sure it works, let’s login to our mongo db server with the “repl” credentials
    • mongo local -u repl -p replpassword
    • If you get a command prompt, then the setup was all successful!
    • exit

Setup Slave

Follow the same setup as Master on your Slave server. A MongoDB instance is always configured as a Master. Which db is a Master and which is a Slave is determined when the database is started.

NOTE: The above instructions assume you are setting up the Master and the Server on two physical machines

  • If you want to setup both on the same machine, you will need to use different “dbpath” folders for each
  • For Example:
    • mongodb –master –dbpath /data/masterdb
    • mongodb –slave –dbpath /data/slavedb

Start the Master and the Slave

  • On Master server: mongod –master –auth –dbpath /data/masterdb
  • On Slave Server: mongod –slave –auth –source [masterhostname] –dbpath /data/slavedb/
  • The Slave should start now and successfully connect to the Master running in auth mode

Test your configuration

  • Open a mongo shell to the Master database
    • mongo –host [masterhostname] admin -u admin -p adminpassword
  • Now lets add a new database and add a user account
    • use foo
    • db.addUser(“foouser”,”foopassword”)
  • Now let’s check to make sure the foo database and user have been replicated to the slave
    • Open a mongo shell to the Slave database
    • mongo –host [slavehostname] foo -u foouser -p foopassword
    • show collections
    • If you get a command prompt and can type “show collections”, everything is working fine!

NOTE: If your configuration was setup incorrectly you will probably see the following error:

        replauthenticate: no user in local.system.users to use for authentication

Getting Started with Mongo DB

Why are we considering Mongo DB?

In our current system we receive data from our customers, store it in a relational database, and then when we make that data available via a REST service we change very little. The data we are storing is all related, each chunk is really a document. So lately we have been questioning why we go through so much work to break the data down into a schema we have created, to only put the data back into the same format when we use it. This extra work has made us wonder if a Document Store is a better fit for our needs. There is nothing wrong with a relational database, it is just that as the needs of data storage change, a one size fits all solution is not necessarily the best solution in every case.

To just give an example of some of the current problems we have been running into: constant changes in the data we have to store. Each time a customer requires that we store a new field, we must also modify our database schema to include this new field and update our application to map data received to this new field. For us, we really don’t want to define a schema, we just want to store the data given to us by our customers, and surface it as we received it. Trying to force the data into some sort of database schema we define based on what we “think” our customers “might” send us is time consuming and in this case just does not feel right.

Our team has been considering document style database solutions such as Couch DB for the last few months, but we still have not been ready to make the jump. Switching from a relational to a document style database is a big shift, not just in technology, but also in how we approach the design of our application. Moving to Couch DB is a more drastic shift than switching to Mongo DB (You can read about the differences here). Recently we have become more interested in Mongo DB and decided to spend some time exporing it.

A Blend of Document and Relational Databases

We do not want to have a predefined schema, but rather we want the database to just take what data we give it and store it. However, we are not quite ready to give up all of the flexibility in querying that is provided by a relational database. This is what peaked our interest in Mongo DB. What makes Mongo DB unique amongst document style databases is that it allows you to store full documents in collections (Think of a collection as a schema free Table) in JSON format, BUT it still gives you the ability to query over any fields in those documents and create indexes for fields often queried. This is what made us so interested in Mongo DB.

Alright, to the fun stuff! Let’s try out Mongo DB

  • Download the software: http://www.mongodb.org/display/DOCS/Downloads
  • Unzip the Mongo DB software to a folder such as: C:\mongodb-win32-x86_64-1.2.2
  • Create a folder to store database files:
    • The default folder is C:\data\db on windows and “/data/db” on unix systems.
    • Make sure to create this folder before starting Mongo DB.
  • Start Mongo DB
    • cd C:\mongodb-win32-x86_64-1.2.2\bin
    • mongod.exe
    • NOTE: If you get an error saying “Assertion: dpath (/data/db/) does not exist” then you have either not created the directory, or permissions have not been set appropriately.

Interactive Shell

Mongo DB provides a shell interface for querying the database directly. This is very useful when you want to try the database for the first time.

To start the Interactive shell:
cd C:\mongodb-win32-x86_64-1.2.2\bin
mongo.exe

Just to get started let’s try out a few basic statements

Insert a document into a new collection
Note that when we insert the document “car” we are also creating a brand new collection called “cars”. You can think of a collection as being like a table in a relational database, but without any defined schema. Also note that creation of collections is on demand, if you try to insert a document with a collection name that does not exist, a new collection will be create and the document will be inserted.

> car = { make: "Ford" , model: "Galaxy"};
{ "make" : "Ford", "model" : "Galaxy" }
> db.cars.save(car);
> db.cars.find();
{ "_id" : ObjectId("4b7789f2fb5c000000006faa"), "make" : "Ford", "model" : "Galaxy" }

Query for the document
In the above statement we inserted a new document and just did a “find” which returns ALL documents in the collection. But what if we want to query for a specific document? Mongo DB providers a function called “findOne” which returns a single document based on search criteria provided.

> db.cars.findOne({ make: "Ford" });
{
"_id" : ObjectId("4b7789f2fb5c000000006faa"),
"make" : "Ford",
"model" : "Galaxy"
}

Update the existing document
In this update statement we are saying for every document in collection “cars” with a “model” type of “Galaxy”, set the “status” field to “InStock”. In an update if the “status” field already exists it will be updated, but also if the field does not exist on that document it will also be added.

> db.cars.update({ model: "Galaxy"}, {make: "Ford", model: "Galaxy", status: "InStock"});
> db.cars.findOne({ make: "Ford" });
{
"_id" : ObjectId("4b7789f2fb5c000000006faa"),
"make" : "Ford",
"model" : "Galaxy",
"status" : "InStock"
}

Query for a list of documents
When you use the “find” command you can specify criteria to search by (which feels very much like a select statement in a relational database). The find command returns a cursor which allows you to call functions like “next()” and “hasNext()” to retrieve documents. The interesting thing is that the cursor is not a snapshot (a list of all documents that meet the search criteria at that moment in time), but instead it does a live query for the next item each time “next()” is called. So if you create a cursor for all “cars” matching the criteria of “make” being “Ford” and then as your loop is running, new documents matching this criteria are added to the “cars” collection, they will also be included. This feature is especially interesting to us because in our current system we provide data feeds. To be able to start the data transfer and know that any new documents added while the job is running, will not be missed, is a big plus.

> var cursor = db.cars.find({ make: "Ford" });
> cursor.length()
1
> car = { make: "Ford" , model: "Fairlane"};
{ "make" : "Ford", "model" : "Fairlane" }
> db.cars.save(car);
> cursor.length()
2
> db.cars.find({ make: "Ford" });
{ "_id" : ObjectId("4b7789f2fb5c000000006faa"), "make" : "Ford", "model" : "Galaxy", "status" : "InStock" }
{ "_id" : ObjectId("4b79c916fb5c000000006fab"), "make" : "Ford", "model" : "Fairlane" }

Import JSON data from a file

Also, if you have data in valid JSON format in a file and would like to import this data, you can use the import from file utility:

mongoimport.exe –file [PathToMyFile] –collection [NameOfNewCollection]

For our initial test of Mongo DB we created a test file with 200,000 JSON records of real production data and loaded it into Mongo DB with this file import utitlity. The performance was very good, it did the import in about 1000 records per second.

Conclusion

As with any technology change, it is important to think about the problem you are trying to solve and pick the right tool for the job. Avoid looking at cool technologies then trying to apply them to every problem as some sort of silver bullet solution, which is all too common in our industry. MongoDB is not a one size fits all solution just as a relation database is not a one size fits all solution. For most of our applications a relational database is still the best choice, but we found in this case a document store was more appropriate, which lead us to start exploring a number of document store solutions, and out of that evaluation MongoDB was the best fit for our needs. So evaluate MongoDB and see if it is the right tool to solve your problem, or just use it because MongoDB is Web Scale (haha).

If you are interested in learning more about Mongo DB and especially in how the query syntax works, check out the tutorial provided on the Mongo DB website.