My Adventures in Coding

February 15, 2010

Getting Started with Mongo DB

Filed under: MongoDB,NOSQL — Brian @ 4:28 pm
Tags: , , , ,

Why are we considering Mongo DB?

In our current system we receive data from our customers, store it in a relational database, and then when we make that data available via a REST service we change very little. The data we are storing is all related, each chunk is really a document. So lately we have been questioning why we go through so much work to break the data down into a schema we have created, to only put the data back into the same format when we use it. This extra work has made us wonder if a Document Store is a better fit for our needs. There is nothing wrong with a relational database, it is just that as the needs of data storage change, a one size fits all solution is not necessarily the best solution in every case.

To just give an example of some of the current problems we have been running into: constant changes in the data we have to store. Each time a customer requires that we store a new field, we must also modify our database schema to include this new field and update our application to map data received to this new field. For us, we really don’t want to define a schema, we just want to store the data given to us by our customers, and surface it as we received it. Trying to force the data into some sort of database schema we define based on what we “think” our customers “might” send us is time consuming and in this case just does not feel right.

Our team has been considering document style database solutions such as Couch DB for the last few months, but we still have not been ready to make the jump. Switching from a relational to a document style database is a big shift, not just in technology, but also in how we approach the design of our application. Moving to Couch DB is a more drastic shift than switching to Mongo DB (You can read about the differences here). Recently we have become more interested in Mongo DB and decided to spend some time exporing it.

A Blend of Document and Relational Databases

We do not want to have a predefined schema, but rather we want the database to just take what data we give it and store it. However, we are not quite ready to give up all of the flexibility in querying that is provided by a relational database. This is what peaked our interest in Mongo DB. What makes Mongo DB unique amongst document style databases is that it allows you to store full documents in collections (Think of a collection as a schema free Table) in JSON format, BUT it still gives you the ability to query over any fields in those documents and create indexes for fields often queried. This is what made us so interested in Mongo DB.

Alright, to the fun stuff! Let’s try out Mongo DB

  • Download the software: http://www.mongodb.org/display/DOCS/Downloads
  • Unzip the Mongo DB software to a folder such as: C:\mongodb-win32-x86_64-1.2.2
  • Create a folder to store database files:
    • The default folder is C:\data\db on windows and “/data/db” on unix systems.
    • Make sure to create this folder before starting Mongo DB.
  • Start Mongo DB
    • cd C:\mongodb-win32-x86_64-1.2.2\bin
    • mongod.exe
    • NOTE: If you get an error saying “Assertion: dpath (/data/db/) does not exist” then you have either not created the directory, or permissions have not been set appropriately.

Interactive Shell

Mongo DB provides a shell interface for querying the database directly. This is very useful when you want to try the database for the first time.

To start the Interactive shell:
cd C:\mongodb-win32-x86_64-1.2.2\bin
mongo.exe

Just to get started let’s try out a few basic statements

Insert a document into a new collection
Note that when we insert the document “car” we are also creating a brand new collection called “cars”. You can think of a collection as being like a table in a relational database, but without any defined schema. Also note that creation of collections is on demand, if you try to insert a document with a collection name that does not exist, a new collection will be create and the document will be inserted.

> car = { make: "Ford" , model: "Galaxy"};
     { "make" : "Ford", "model" : "Galaxy" }
> db.cars.save(car);
> db.cars.find();
     { "_id" : ObjectId("4b7789f2fb5c000000006faa"), "make" : "Ford", "model" : "Galaxy" }

Query for the document
In the above statement we inserted a new document and just did a “find” which returns ALL documents in the collection. But what if we want to query for a specific document? Mongo DB providers a function called “findOne” which returns a single document based on search criteria provided.

> db.cars.findOne({ make: "Ford" });
     {
             "_id" : ObjectId("4b7789f2fb5c000000006faa"),
             "make" : "Ford",
             "model" : "Galaxy"
     }

Update the existing document
In this update statement we are saying for every document in collection “cars” with a “model” type of “Galaxy”, set the “status” field to “InStock”. In an update if the “status” field already exists it will be updated, but also if the field does not exist on that document it will also be added.

> db.cars.update({ model: "Galaxy"}, {make: "Ford", model: "Galaxy", status: "InStock"});
> db.cars.findOne({ make: "Ford" });
     {
             "_id" : ObjectId("4b7789f2fb5c000000006faa"),
             "make" : "Ford",
             "model" : "Galaxy",
             "status" : "InStock"
     }

Query for a list of documents
When you use the “find” command you can specify criteria to search by (which feels very much like a select statement in a relational database). The find command returns a cursor which allows you to call functions like “next()” and “hasNext()” to retrieve documents. The interesting thing is that the cursor is not a snapshot (a list of all documents that meet the search criteria at that moment in time), but instead it does a live query for the next item each time “next()” is called. So if you create a cursor for all “cars” matching the criteria of “make” being “Ford” and then as your loop is running, new documents matching this criteria are added to the “cars” collection, they will also be included. This feature is especially interesting to us because in our current system we provide data feeds. To be able to start the data transfer and know that any new documents added while the job is running, will not be missed, is a big plus.

> var cursor = db.cars.find({ make: "Ford" });
> cursor.length()
     1
> car = { make: "Ford" , model: "Fairlane"};
     { "make" : "Ford", "model" : "Fairlane" }
> db.cars.save(car);
> cursor.length()
     2
> db.cars.find({ make: "Ford" });
     { "_id" : ObjectId("4b7789f2fb5c000000006faa"), "make" : "Ford", "model" : "Galaxy", "status" : "InStock" }
     { "_id" : ObjectId("4b79c916fb5c000000006fab"), "make" : "Ford", "model" : "Fairlane" }

Import JSON data from a file

Also, if you have data in valid JSON format in a file and would like to import this data, you can use the import from file utility:

mongoimport.exe –file [PathToMyFile] –collection [NameOfNewCollection]

For our initial test of Mongo DB we created a test file with 200,000 JSON records of real production data and loaded it into Mongo DB with this file import utitlity. The performance was very good, it did the import in about 1000 records per second.

Conclusion

As with any technology change, it is important to think about the problem you are trying to solve and pick the right tool for the job. Avoid looking at cool technologies then trying to apply them to every problem as some sort of silver bullet solution, which is all too common in our industry. MongoDB is not a one size fits all solution just as a relation database is not a one size fits all solution. For most of our applications a relational database is still the best choice, but we found in this case a document store was more appropriate, which lead us to start exploring a number of document store solutions, and out of that evaluation MongoDB was the best fit for our needs. So evaluate MongoDB and see if it is the right tool to solve your problem, or just use it because MongoDB is Web Scale (haha).

If you are interested in learning more about Mongo DB and especially in how the query syntax works, check out the tutorial provided on the Mongo DB website.

Advertisements

6 Comments »

  1. Отличная статья! Спасибо!

    Comment by Михаил — December 28, 2010 @ 1:19 am | Reply

    • Спасибо, я рад, что вы сочли полезным!

      Comment by Brian — December 28, 2010 @ 8:19 am | Reply

  2. […] If you would like a more detailed introduction to using MongoDB, the query syntax, etc, please refer to my previous post: Getting Started with MongoDB […]

    Pingback by Creating a REST API in Python using Bottle and MongoDB « My Adventures in Coding — January 2, 2011 @ 10:15 pm | Reply

  3. […] If you would like a more detailed introduction to using MongoDB, the query syntax, etc, please refer to my previous post: Getting Started with MongoDB […]

    Pingback by Bottle与MongoDB制作REST API - 扫巧网 — June 20, 2012 @ 10:40 am | Reply

  4. […] Getting Started with Mongo DB […]

    Pingback by MongoDB – A NoSQL Database : SL Tech Umbrella — November 12, 2012 @ 3:28 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: