Most of the data we work with has geographic longitude/latitude information so we have recently started exploring doing some work with geospatial queries using different database technologies (MySQL, Oracle). We currently use MongoDB as our primary data store and we were surprised to find out that MongoDB already has this functionality built in. After doing some reading we also realized that it handles each of our use cases. Even though we use MongoDB everyday, we just never thought a document store would come with this functionality. For more information go to MongoDB Geospatial Indexing.
Download and Setup MongoDB
For this tutorial, we will just use the MongoDB interactive shell for interacting with the database because it is simple. There is no need to complicate this example by using the MongoDB Scala or Python drivers. They are simple to use once you gain an understanding of MongoDB. The basics are that MongoDB is a document store, data is stored as JSON documents, and queries are made by using subsets of the JSON from the documents you wish to match, with query commands mixed in.
Download MongoDB (Downloads):
wget http://fastdl.mongodb.org/osx/mongodb-osx-x86_64-2.0.0.tgz tar -zxvf mongodb-osx-x86_64-2.0.0.tgz
Create a folder for MongoDB to store it’s database files (We are just using the default location for this example):
mkdir -p /data/db
NOTE: On windows this would be “C:\data\db”.
Start the MongoDB database:
cd mongodb-osx-x86_64-2.0.0/bin ./mongod
MongoDB provides an interactive shell which can be used to query your MongoDB database. For the rest of this tutorial, we will use the interactive shell.
Start the interactive shell:
cd mongodb-osx-x86_64-2.0.0/bin ./mongo MongoDB shell version: 2.0.0 connecting to: test >
You should see the shell start up and display “connecting to: Test”. This means you are connected to the default database “Test” which will be fine for this tutorial.
That’s all, we are ready to explore geospatial querying in MongoDB!
Geospatial indexes and queries
Let’s for these examples assume we are creating a database for a website where a customer can browse and search for automotive dealerships in a given area by different types of map configurations. Let’s explore the most common use cases for retrieving dealership information by a geospatial query.
Defining documents with geospatial co-ordinates
The latitude/longitude elements in a document must be stored in a field called “loc” and follow a certain format. Either it can be stored as an array of two elements such as “loc:[51,-114]” or as a dictionary with two elements such as “loc:{lat:51,lon:-114}”. I decided to use the dictionary since it more closely matches our existing data. So let’s create some documents for car dealerships that each contain latitude/longitude information.
db.dealerships.save({"name":"Frank's Fords", "affiliation":"Ford", "loc":{"lon":51.10682735591432,"lat":-114.11773681640625}})
db.dealerships.save({"name":"Steve's Suzukis", "affiliation":"Suzuki", "loc":{"lon":51.09144802136697,"lat":-114.11773681640625}})
db.dealerships.save({"name":"Charlie's Chevrolets", "affiliation":"Chevrolet", "loc":{"lon":51.08282186160978,"lat":-114.10400390625}})
db.dealerships.save({"name":"Nick's Nissans", "affiliation":"Nissan", "loc":{"lon":51.12076493195686,"lat":-113.98040771484375}})
db.dealerships.save({"name":"Tom's Toyotas", "affiliation":"Toyota", "loc":{"lon":50.93939251390387,"lat":-113.98040771484375}})
Create the geospatial index
Now, inorder to query by geo co-ordinates, we need to create an index over the “loc” field of our dealership documents.
db.dealerships.ensureIndex({loc:"2d"})
Common Use Cases
What if I want the two dealerships closest to a specific co-ordinate?
This can be done using the “near” and “limit” query options. This query finds the points closest to the co-ordinate provided and returns them sorted by distance from the point given (Yep, MongoDB handles that all for you, returns the data exactly as you would expect).
db.dealerships.find({loc: {$near:[51,-114]}}).limit(2)
Query returns:
{ "_id" : ObjectId("4e8927066f9caf7713a8421b"), "name" : "Tom's Toyotas", "affiliation" : "Toyota", "loc" : { "lon" : 50.93939251390387, "lat" : -113.98040771484375 } }
{ "_id" : ObjectId("4e8926f96f9caf7713a8421a"), "name" : "Nick's Nissans", "affiliation" : "Nissan", "loc" : { "lon" : 51.12076493195686, "lat" : -113.98040771484375 } }
What if I want to filter by dealership affiliation in the query?
No problem, the MongoDB people have thought of that as well. They call these “Compound Indexes”. When creating the geospatial index you can also include other fields in your document in that index. So for example if you wanted to have your application query for all “Ford” affiliated dealerships available close to the co-ordinates provided, you would create the following index:
Add the Compound index:
db.dealerships.ensureIndex({loc:"2d", affiliation:1})
Then your application would be able to query by dealership affiliation as well:
db.dealerships.find({loc: {$near:[51,-114]}, "affiliation":"Ford"})
Query returns:
{ "_id" : ObjectId("4e8926696f9caf7713a84215"), "name" : "Frank's Fords", "affiliation" : "Ford", "loc" : { "lon" : 51.10682735591432, "lat" : -114.11773681640625 } }
You can see the value in being able to do these geospatial queries so easily. For example, if your website has a map, showing dealership locations, the customer can click on and zoom in on any area of the map. When they do, the items displayed on the map will be refreshed based on a geospatial query, returning the N number of items closest to the point selected. Of course, this can also be filtered further by allowing the customer to select filter criteria such as “Affiliation”.
What if I want to search for all dealerships within a given area of town?
Well MongoDB handles that as well with “Bounded Queries”. With bounded queries you can use either a rectangle, circle, or polygon. Since areas of cities are best represented by a polygon, we will use that for this example.
Let’s define a polygon for a specific area of town:
areaoftown = { a : { x : 51.12335082548444, y : -114.19052124023438 }, b : { x : 51.11904092252057, y : -114.05593872070312 }, c : { x : 51.02325750523972, y : -114.02435302734375 }, d : { x : 51.01634653617311, y : -114.1644287109375 } }
Once this polygon has been defined, we can then search our dealerships collection for dealers that fall within this boundary.
db.dealerships.find({ "loc" : { "$within" : { "$polygon" : areaoftown } } })
NOTE: Polygon searches are only available in versions >=1.9
Query returns:
{ "_id" : ObjectId("4e892d8c7f369ee980a3662b"), "name" : "Charlie's Chevrolets", "affiliation" : "Chevrolet", "loc" : { "lon" : 51.08282186160978, "lat" : -114.10400390625 } }
{ "_id" : ObjectId("4e892d797f369ee980a36629"), "name" : "Frank's Fords", "affiliation" : "Ford", "loc" : { "lon" : 51.10682735591432, "lat" : -114.11773681640625 } }
{ "_id" : ObjectId("4e892d837f369ee980a3662a"), "name" : "Steve's Suzukis", "affiliation" : "Suzuki", "loc" : { "lon" : 51.09144802136697, "lat" : -114.11773681640625 } }
MongoDB makes this simple and easy to use, good job!
Nice examples, thanks Brian.
When I saw that MongoDB came with spatial queries built in, I knew that spatial functionality had really hit the mainstream! Thanks for pointing out the new-ish point-in-polyon functionality, I’m excited to try that out.
regards,
-Frank
Comment by Frank Hardisty (@frankhardisty) — October 3, 2011 @ 12:10 pm |
Thanks for the feedback Frank. I was really surprised to see a document store offer this functionality, which is very interesting. I have been using MongoDB in a production environment for a while now and never realized this functionality was available. The one piece that still seems to be missing for me (Unless they have it and I just did not see it in the documentation) is the ability to query in terms of layers. So at the moment you can query by saying “Give me all documents close to the point I have provided”, “Give me all documents within a specified polygon”, but so far I don’t see “Give me all polygons that a specific point fits inside”. Basically having polygons that overlap, or you can think about it as multiple layers of polygons. So I want to take a point and find the polygon at every layer that it cuts through and return that list of polygon names. I can mimic this functionality at the application level with multiple queries, but if this functionality was built in, would be very cool. Oracle and MySQL already have this.
Comment by Brian — October 3, 2011 @ 1:17 pm |
I think you are correct that Mongo does point-in-polygon but not polygon-in-point.
If you have heavy spatial needs, I would favor PostGIS (the spatial extension to PostgreSQL) over MySQL. As I recall, MySQL works only with Minimum Bounding Rectangles (MBRs) when finding intersections. Which is super-speedy, but insufficient if you want to find the intersection of two polygons. Oracle also has excellent spatial support, but PostGIS is fully caught up and is much cheaper, so that’s what our group favors. However, neither is document-oriented. Lucene and Solr offer spatial search (http://wiki.apache.org/solr/SpatialSearch) which seems similar-ish to what Mongo provides. And then there’s GeoCouch (https://github.com/couchbase/geocouch) to consider as well. Exciting times for geo-text!
Comment by Frank Hardisty (@frankhardisty) — October 4, 2011 @ 10:46 am |
Thanks for the info Frank! I was not aware there was a package for CouchDB offering spatial support. I will have to try that out. It is also interesting to know that PostGIS is the preferred relational database for handling polygon intersections. We had played with Oracle, it’s spatial support does exactly what we need, but as you mentioned, it is very expensive. We had not thought to try out PostgreSQL, I did not realize it had the same spatial support as Oracle.
Comment by Brian — October 5, 2011 @ 1:13 pm |