Tuesday, March 4, 2014

How to Deploy a Sharded MongoDB Cluster

Introduction

In this tutorial we will show how to deploy a sharded MongoDb cluster. We assume that you already have MongoDB and Node.js installed.


Before moving forward, let's explain some key concepts that will make this tutorial easier to follow:
- Sharding: It is the fact of adding more machines to support data growth and greater demand of read and write operations.
- Cluster: It is a set of connected servers that can be viewed as one system.


Our cluster will be composed of the following components:
- 2 shards: these are just regular mongodb databases to store the data   
- 1 query router: also named mongos, it interfaces with client applications and direct operations to the appropriate shard.
- 1 config server: this is a mongodb database that stores the cluster's metadata including the mapping of the cluster's data set to the shards. The query router uses this metadata to route requests to shards.


Installation

Let's start by creating our 2 shards.
The shard needs a folder that will house its files. You can create it like so (-p is to create all the nested folders in the path if they don't exist):
~~~~
mkdir -p /data/db1
~~~~


Now you can start the first shard by issuing this command:
~~~~
mongod --shardsvr --port 27016 --dbpath /data/db1
~~~~


Let's repeat the same steps for the second shard:
~~~~
mkdir -p /data/db2
mongod --shardsvr --port 27017 --dbpath /data/db2
~~~~


Now let's configure the config server.
Issue the following command to create the folder that will house the files of the config server:
~~~~
mkdir -p /data/configdb
~~~~


then start the config server like so:
~~~~
mongod --configsvr --dbpath /data/configdb --port 27018
~~~~


The only component left is the query router server (mongos). You can start it by typing the following command:
~~~~
mongos --configdb localhost:27018 --port 27019
~~~~


All the components of our cluster are now installed and running. Let's move to the next step and configure our system to use sharding.


Sharding configuration

For this step, we will be using the "test" database.

In order to start the configuration, we need to connect to the mongos instance. The connection can be established like so:
~~~~
mongo --host localhost --port 27019
~~~~


Add the first shard to the cluster by typing this command from the mongos console prompt:
~~~~
mongos> sh.addShard("localhost:27016")
~~~~


You should see an output matching this line:
~~~~
{ "shardAdded" : "shard0000", "ok" : 1 }
~~~~


Add the second shard:
~~~~
mongos> sh.addShard("localhost:27017")
~~~~


The output should match this:
~~~~
{ "shardAdded" : "shard0001", "ok" : 1 }
~~~~


To verify that the two shards have been created, type the following command:
~~~~
mongos> sh.status()
~~~~


You should see two shards listed in the output like so:
~~~~
chunks:
                   shard0000 1
                   shard0001 1
~~~~


To enable sharding for the db, you can type the following command:
~~~~
mongos> sh.enableSharding("test")
~~~~


The output should be something like this:
~~~~
output: { "ok" : 1 }
~~~~


To enable sharding for a collection, issue this command (the collection will be created if it does not exist):
~~~~
mongos> sh.shardCollection("test.location", { "zipcode": 1} )
~~~~


You should see this output:
~~~~
{ "collectionsharded" : "test.location", "ok" : 1 }
~~~~


Verification and Testing

Now it is time to test our setup and verify that the whole system works as expected. To achieve that, we need to write some code to insert 300000 records into the database and then verify that this data has actually been distributed over the two shards we have.



For our test, let's create a file called "sharded_cluster_test.js" and add the below Node.js snippet to it. This code will insert some data into our sharded cluster, :
~~~~
var MongoClient = require('mongodb').MongoClient
    , format = require('util').format;


MongoClient.connect('mongodb://127.0.0.1:27019/test', function(err, db) {
    if(err) throw err;


    var collection = db.collection('location');
      for(i=0;i < 300000;i++){
           collection.insert({ "zipcode" : i}, function(){});
      }
      db.close();
});
~~~~


Run this file like so:
~~~~
node sharded_cluster_test.js
~~~~


Verify that the data made it to the database:
~~~~
mongos> use test
mongos> db.location.count()
~~~~


The above command should return: 300000


Let's now verify it has been distributed over our two shards.
Log into the first shard like so:
~~~~
mongo --host localhost --port 27016
~~~~


and then count the items that have been added:
~~~~
db.location.count()
~~~~


Repeat the same steps for the second shard:
~~~~
mongo --host localhost --port 27017
db.location.count()
~~~~


The total of the items added to both shards should be equal to 300000


Congratulations! you have successfully deployed your first MongoDb sharded cluster.


Resources
http://docs.mongodb.org/manual/sharding/


No comments: