Sunday, 21 October 2012

A First Look at node.js, A Logging Application

I had a chance this week to take a first look at node.js. I'd been meaning to do this at some point, as it's a critical component in my ongoing investigation of full-stack JavaScript, so I was glad to get this opportunity.

I've already covered a bit about MongoDB, but obviously there's a lot more to go over here. One thing worth noting is that Mongo seems to be emerging as the persistent database of choice in the full-stack frameworks (with Redis often being used for caching). [Just FYI, I'll be at the Mongo Meetup in Boston this coming Wednesday.]

But, while one has many choices on the datastore side of things, there's one technology that seems to be indispensable for doing full-stack JavaScript, and server-side JavaScript in particular, and that is node.js. In short, Node is a JavaScript runtime, built on Google's super-fast, open-source V8 engine, with particular emphasis on network application capabilities, a CommonJS module implementation, and package management (via npm). Although it's possible to fork child processes, and there's even a cluster API, a node application is generally designed to work in a single event loop, using non-blocking, asynchronous I/O to achieve great efficiency in processing requests.

Point is, it's fast.

In the benchmarks I've seen, it's nearly as fast as Java, and orders of magnitude faster than Ruby.

It's "event-driven," using callback functions to deliver asynchronous functionality, very much like the way JavaScript works in the browser. It requires a very different way of thinking when programming server-side code, though, and my first pass at understanding it took a little effort. I used the setTimeout function to test some assumptions about how events are dealt with. Initially, I was quite confused: I would trigger two requests in short succession in two Firefox tabs, each of which would start a 10 second wait using setTimeout, and the requests were served serially, one after the other, taking a total of 20 seconds to process both. As it turns out, this was an artifact of the way Firefox was managing the duplicate requests. Using curl on the command line and executing multiple requests more or less simultaneously revealed that these "waits" do, in fact, run in parallel.

This first attempt can be seen here:
 var http = require("http");  
 function do_thing(response, callback){  
  var d=new Date();  
  console.log('calling setTimeout '+d.toLocaleTimeString());  
  setTimeout(function(){  
   var d=new Date();  
   console.log('timeout callback '+d.toLocaleTimeString());  
   callback(response);  
  }, 10000);  
  var d2=new Date();  
  console.log('setTimeout called '+d2.toLocaleTimeString());  
 }  
 function respond(response){  
  var d=new Date();  
  console.log('sending response '+d.toLocaleTimeString());  
  response.writeHead(200, {"Content-Type": "text/plain"});  
  response.write("hello at "+d.toLocaleTimeString());  
  response.end();  
 }  
 http.createServer(function(request, response) {  
  var d=new Date();  
  console.log('got request '+d.toLocaleTimeString());  
  do_thing(response, respond);  
 }).listen(8888);  
 var d=new Date();  
 console.log('running... '+d.toLocaleTimeString());  
 // UPDATE: Firing two requests curl in two Terminal windows resolves the issue below  
 // This code is good. The test (using Firefox) was bad.  
 //Two web requests in close succession, still have to wait 20 secs for 2nd request to complete...  
 //OUTPUT AS FOLLOWS:  
 //  
 //running... 12:12:00  
 //got request 12:12:06  
 //calling setTimeout 12:12:06  
 //setTimeout called 12:12:06  
 //timeout callback 12:12:16  
 //sending response 12:12:16  
 //got request 12:12:16  
 //calling setTimeout 12:12:16  
 //setTimeout called 12:12:16  
 //timeout callback 12:12:26  
 //sending response 12:12:26  
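The same behaviour can be demonstrated without HTTP at all. Here's a sketch (plain Node, no server, and shorter waits) that schedules two overlapping 100 ms timers; both fire after roughly 100 ms total, not 200 ms, because they share one event loop:

```javascript
// Two overlapping timers on one event loop: the second is not queued
// behind the first, so both complete after ~100 ms, not ~200 ms.
var start = Date.now();
var done = 0;

function onTimeout(label) {
  return function () {
    done += 1;
    console.log(label + ' fired after ' + (Date.now() - start) + ' ms');
  };
}

setTimeout(onTimeout('first'), 100);
setTimeout(onTimeout('second'), 100); // registered immediately; does not wait
```

This is exactly what the curl test against the server above showed, minus the HTTP plumbing.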

So, the next order of business was to use this knowledge to implement an actual application. The task at hand: a logging server. The basic requirement is this: an HTTP POST with some arbitrary data writes to a Mongo database. Any application (client) can post data in a format of its own choosing (given a few required fields, so that the source can be identified and filtering done). This essentially fulfills an "audit" requirement (details unimportant). This is about as simple as it gets, folks, for a real-life application.
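The required-field check that falls out of this requirement is simple enough to sketch in plain JavaScript (the field names mirror the ones the server below actually requires):

```javascript
// Returns the name of the first missing required field, or null if
// the posted body is acceptable.
var REQUIRED = ['appname', 'logname', 'source', 'logentry'];

function missingField(body) {
  for (var i = 0; i < REQUIRED.length; i++) {
    if (!body[REQUIRED[i]]) return REQUIRED[i];
  }
  return null;
}

var good = { appname: 'demo', logname: 'audit', source: 'web', logentry: { foo: 'bar' } };
console.log(missingField(good));                 // null
console.log(missingField({ appname: 'demo' }));  // logname
```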

I initially looked at Mongoose for interacting with the database. It's a pretty clean interface, and it worked well. However, soon after getting that more or less working, I came across Winston (part of the flatiron framework), a "multi-transport" logging library. It made sense to use a logging library to do logging. The ability to use various storage mechanisms is very nice, and the fact that it supports Mongo out of the box was great.

One thing to note about Mongoose, in case this helps anyone who's looking at it, is that it uses the concept of a Schema, which at first blush seems kind of silly when dealing with Mongo (which is schemaless, duh). In particular, I wanted to have a schemaless "logentry" which could be formatted by the application submitting the data. Fortunately, Mongoose supports a datatype called "Mixed" which essentially lets you pass an arbitrary object, like so:

 mongoose.model('AuditLog', new Schema({ application_name: String, log_entry: Schema.Types.Mixed, timestamp: Date }));  

In any case, Mongoose was not part of the final solution. Winston was.

There's one other important piece which came up, though: the issue of "frameworks." Flatiron, for example, where Winston comes from, is a framework. A bit of Googling, though, and I decided that the framework to use was Express. It seems to be the node framework with the most traction. It is also very lightweight, and is the foundation upon which many of the more fully-formed frameworks seem to be built (e.g. Tower.js and Derby, both of which I may try to look at for future applications).

Express certainly helped dealing with posted data and routing, which otherwise could have been tricky.
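To get a feel for what that body-parsing middleware is doing under the hood, here is the JSON branch in miniature. This is a toy, not Express's actual implementation: it collects the raw body chunks and parses them, handing either the parsed object or a parse error to a callback:

```javascript
// A toy version of JSON body parsing (Express's bodyParser also handles
// urlencoded and multipart bodies): join the chunks, then JSON.parse.
function parseJsonBody(chunks, callback) {
  var raw = chunks.join('');
  try {
    callback(null, JSON.parse(raw));
  } catch (e) {
    callback(e);
  }
}

// A body typically arrives in more than one chunk:
parseJsonBody(['{"appname":', '"demo"}'], function (err, body) {
  if (err) throw err;
  console.log(body.appname); // demo
});
```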

Putting all the pieces together, one gets this (complete with URL-based versioning, which I chose to implement despite the controversy on the subject):
 // requirements  
 var winston = require('winston');  
 var MongoDB = require('winston-mongodb').MongoDB;  
 var express = require('express');  
 // set up winston  
 winston.add(MongoDB, {db:'winston'});  
 winston.remove(winston.transports.Console);  
 winston.emitErrs = true;  
 winston.handleExceptions(new winston.transports.File({ filename: 'exceptions.log' }));  
 // set up express app  
 var app = express();  
 // found out the hard way that this middleware is required to parse data  
 app.use(express.bodyParser());  
 // routing  
 // note: urls not implemented will serve 404 not found  
 app.all('/api/log', function(req, res){  
  // note: res.status(301).redirect() still sends a 302 in Express 3;  
  // a true permanent redirect would be res.redirect(301, '/api/v1/log')  
  res.redirect('/api/v1/log');  
 });  
 app.get('/api/v1/log', function(req, res){  
  res.send(500, { error: 'GET not implemented in v1' });  
 });  
 app.post('/api/v1/log', function(req, res){  
  var required = ['appname', 'logname', 'source', 'logentry'];  
  for(var i = 0; i < required.length; i++){  
   if(! req.param(required[i])){  
    res.send(500, { error: required[i]+' is required' });  
    return;  
   }  
  }  
  winston.info('Winston Logging', req.body);  
  res.send('Thanks for that, '+req.body.appname);  
 });  
 // start server  
 app.listen(8888);  
 console.log('Listening on port 8888');  

This must seem incredibly naive, and, indeed, it is. However, it gets the job done. Future plans include responding to GET requests to return log data (for querying).

Now, one final note: testing. Oh, jeez. This was not a simple thing.

It's not clear to me that this aspect of node development is entirely fully formed; I didn't have a lot of luck finding consensus in this area. I chose, in the end, to use Vows. I'm not entirely certain I got this right, and I plan to spend some time fleshing it out at a later date. Problems with this solution, as it currently stands: environment management (i.e. do I really want to write test data to the production db?), and the fact that I have to have the Mongo and Node servers running in order to run the tests. Not to mention that I'm not testing the asynchronous code, i.e. callbacks. This seems like it should be possible; I just haven't figured it out yet. There are probably better ways of going about testing a node app, but I am a novice, and have much to learn.

For the record, though, here's what I've got for tests:
 var vows = require('vows'),  
  assert = require('assert'),  
  request = require('request'),  
  apiUrl = 'http://127.0.0.1:8888';  
 var apiTest = {  
  general: function( method, url, data, callback ){  
   //console.log(data);  
   request(  
    {  
    method: method,  
    url: apiUrl+(url||''),  
    followRedirect: false,  
    json: data || {}  
    },  
    function(err, res){  
     callback( res );  
    }  
   )},  
  get: function( url, data, cb ){ apiTest.general( 'GET', url, data, cb ) },  
  post: function( url, data, cb ){ apiTest.general( 'POST', url, data, cb  ) }  
 }  
 function assertStatus(code) {  
  return function (res) {  
   assert.equal(res.statusCode, code);  
  };  
 }  
 var log_entry = { foo: 'bar', baz: 'blah' };  
 var log_good = { logentry: log_entry, appname: 'my appname', logname: 'my logname', source: 'my source' };  
 var log_no_appname = { logentry: log_entry, logname: 'my logname', source: 'my source' };  
 var log_no_logname = { logentry: log_entry, appname: 'my appname', source: 'my source' };  
 var log_no_source = { logentry: log_entry, appname: 'my appname', logname: 'my logname' };  
 var log_no_logentry = { appname: 'my appname', logname: 'my logname', source: 'my source' };  
 // Create a Test Suite  
 vows.describe('Server').addBatch({  
  'when doing get request on root url': {  
   topic: function () { apiTest.get('', log_good, this.callback ) },  
   'returns error': assertStatus(404)  
  },  
  'when doing get request with badly formatted url': {  
   topic: function () { apiTest.get('/foo', log_good, this.callback ) },  
   'returns error': assertStatus(404)  
  },  
  'when doing get request with default url': {  
   topic: function () { apiTest.get('/api/log', log_good, this.callback ) },  
   'returns redirect': assertStatus(302) // Express's redirect actually sends a 302 here (the chained .status(301) is ignored), so we assert 302  
  },  
  'when doing post with incorrect data: no appname': {  
   topic: function () { apiTest.post('/api/v1/log', log_no_appname, this.callback ) },  
   'returns error': assertStatus(500)  
  },  
  'when doing post with incorrect data: no logname': {  
   topic: function () { apiTest.post('/api/v1/log', log_no_logname, this.callback ) },  
   'returns error': assertStatus(500)  
  },  
  'when doing post with incorrect data: no source': {  
   topic: function () { apiTest.post('/api/v1/log', log_no_source, this.callback ) },  
   'returns error': assertStatus(500)  
  },  
  'when doing post with incorrect data: no logentry': {  
   topic: function () { apiTest.post('/api/v1/log', log_no_logentry, this.callback ) },  
   'returns error': assertStatus(500)  
  },  
  'when doing post with correct data': {  
   topic: function () { apiTest.post('/api/v1/log', log_good, this.callback ) },  
   'is success': assertStatus(200)  
  }  
 }).run(); // Run it  

And that's it! (Other than the package.json file, which handles dependency management via npm, similar to the way a Gemfile and Bundler do in Rails.)
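For completeness, a minimal package.json for an app like this might look something like the following. Note that the name and version ranges here are illustrative, not copied from my actual file:

```json
{
  "name": "logging-server",
  "version": "0.0.1",
  "dependencies": {
    "express": "3.x",
    "winston": "0.6.x",
    "winston-mongodb": "0.2.x"
  },
  "devDependencies": {
    "vows": "0.6.x",
    "request": "2.x"
  }
}
```

With that in place, a single `npm install` pulls everything down.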

Not much more to say, really, other than that this was a fun and interesting first crack at node.js, which I hope to get an opportunity to work more with in future.

I look forward to continuing to get to grips with full-stack JavaScript, and I hope to be able to implement a "real" application soon, and will blog about it here. In the meantime, and, in lieu of gifts, please enjoy this comic.
