Why Do All the Great Node.js Developers Hate CoffeeScript?

Take a look at the following GitHub repositories of well-known Node.js developers:

Did you look at them? Not one of them has a project (that isn’t forked) that is written in CoffeeScript. So does the absence of CoffeeScript on GitHub imply these developers hate it? Absolutely not. But listen to episode 18 or 19 of NodeUp (I don’t remember which one): there are a couple of instances where the expert Node.js devs joke and laugh about writing in CoffeeScript. Is this offensive? Of course not. But the attitude is curious to me.

One of the aforementioned developers said the following about a technology:

What if we could omit braces? How about semi-colons?

Sounds like the developer is talking about CoffeeScript, doesn’t it? No, it was TJ Holowaychuk describing Stylus, his CSS replacement language. Look at Stylus and see how CoffeeScript-esque it is. This is the same TJ that doesn’t like CoffeeScript. This is meant to be partially tongue-in-cheek, but it does lend credence to my point.

Can you guess what the second most depended-upon package is on NPM? If you guessed CoffeeScript, you’d be right!

So if it’s the second most depended-upon package, it must be in use by us mere-mortal developers. Having defected from Rails, I love CoffeeScript. But, I ask again, why do the greats have a haughty attitude towards CoffeeScript? This isn’t meant to be a crusade to convert people to the holier-than-thou CoffeeScript; I genuinely don’t understand why the disdain exists, especially given the acceptance of Haml, SASS, SCSS, Jade, etc. I mean, when it comes down to it, write in whatever makes you happy, but I feel like I’m missing something. If you’re part of the Node.js community, you’ll know what I’m talking about.

Looking over the CoffeeScript page, I think you can safely conclude that, in general, you’ll write fewer lines of code using CoffeeScript. Code is our enemy, so that’s a good thing.

What do you think about CoffeeScript? Why do you think these developers don’t like CoffeeScript?

More fun CoffeeScript hatred:

If you use Git with others, you should check out Gitpilot to make collaboration with Git simple. We would love your advice.

If you made it this far, follow me on Twitter: @jprichardson

-JP

Quick and Dirty Screen Scraping with Node.js using Request and Cheerio

I wrote my own screen scraping module built on PhantomJS, but unfortunately it’s too slow for most screen scraping tasks that don’t require browser-side JavaScript. One easy way to scrape pages with Node.js is to use Request and Cheerio.

Here is an example of scraping Bing to get all of the search results:

var request = require('request');
var cheerio = require('cheerio');

var searchTerm = 'screen+scraping';
var url = 'http://www.bing.com/search?q=' + searchTerm;

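request(url, function(err, resp, body){
  if (err) return console.error(err); //bail out on network errors
  var $ = cheerio.load(body); //declare with var to avoid leaking globals
  var links = $('.sb_tlst h3 a'); //use your CSS selector here
  links.each(function(i, link){
    console.log($(link).text() + ':\n  ' + $(link).attr('href'));
  });
});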

Cheerio acts as a jQuery replacement for a lot of jQuery tasks. It doesn’t replicate jQuery in every way, and most importantly it’s not meant for the browser but for the server. But it beats the pants off of the jsdom/jQuery combo for screen scraping.

-JP

Submitting/Posting Files and Fields to an HTTP Form using C#/.NET

A while back, I had to integrate a C# program with a web system that allowed the user to upload a few files and include some miscellaneous data. I Googled around and didn't find a comprehensive solution.

I did use some code I found on the internet. Unfortunately, I don't remember where, so I can't give proper attribution. If you know, please let me know; it's the code relevant to the MimePart class. I added the form values code and packaged it up into the HttpForm sugar.

Here is the code:

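using System;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.IO;
using System.Net;
using System.Text;

public class HttpForm {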

    private Dictionary<string, string> _files = new Dictionary<string, string>();
    private Dictionary<string, string> _values = new Dictionary<string, string>();

    public HttpForm(string url) {
        this.Url = url;
        this.Method = "POST";
    }

    public string Method { get; set; }
    public string Url { get; set; }

    //return self so that we can chain
    public HttpForm AttachFile(string field, string fileName) {
        _files[field] = fileName;
        return this;
    }

    public HttpForm ResetForm(){
        _files.Clear();
        _values.Clear();
        return this;
    }

    //return self so that we can chain
    public HttpForm SetValue(string field, string value) {
        _values[field] = value;
        return this;
    }

    public HttpWebResponse Submit() {
        return this.UploadFiles(_files, _values);
    }


    private HttpWebResponse UploadFiles(Dictionary<string, string> files, Dictionary<string, string> otherValues) {
        var req = (HttpWebRequest)WebRequest.Create(this.Url);

        req.Timeout = 10000 * 1000;
        req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        req.AllowAutoRedirect = false;

        var mimeParts = new List<MimePart>();
        try {
            if (otherValues != null) {
                foreach (var fieldName in otherValues.Keys) {
                    var part = new MimePart();

                    part.Headers["Content-Disposition"] = "form-data; name=\"" + fieldName + "\"";
                    part.Data = new MemoryStream(Encoding.UTF8.GetBytes(otherValues[fieldName]));

                    mimeParts.Add(part);
                }
            }

            if (files != null) {
                foreach (var fieldName in files.Keys) {
                    var part = new MimePart();

                    part.Headers["Content-Disposition"] = "form-data; name=\"" + fieldName + "\"; filename=\"" + files[fieldName] + "\"";
                    part.Headers["Content-Type"] = "application/octet-stream";
                    part.Data = File.OpenRead(files[fieldName]);

                    mimeParts.Add(part);
                }
            }

            string boundary = "----------" + DateTime.Now.Ticks.ToString("x");

            req.ContentType = "multipart/form-data; boundary=" + boundary;
            req.Method = this.Method;

            long contentLength = 0;

            byte[] _footer = Encoding.UTF8.GetBytes("--" + boundary + "--\r\n");

            foreach (MimePart part in mimeParts) {
                contentLength += part.GenerateHeaderFooterData(boundary);
            }

            req.ContentLength = contentLength + _footer.Length;

            byte[] buffer = new byte[8192];
            byte[] afterFile = Encoding.UTF8.GetBytes("\r\n");
            int read;

            using (Stream s = req.GetRequestStream()) {
                foreach (MimePart part in mimeParts) {
                    s.Write(part.Header, 0, part.Header.Length);

                    while ((read = part.Data.Read(buffer, 0, buffer.Length)) > 0)
                        s.Write(buffer, 0, read);

                    part.Data.Dispose();

                    s.Write(afterFile, 0, afterFile.Length);
                }

                s.Write(_footer, 0, _footer.Length);
            }

            var res = (HttpWebResponse)req.GetResponse();

            return res;
        } catch (Exception ex) {
            Console.WriteLine(ex.Message);
            //release any file handles that are still open
            foreach (MimePart part in mimeParts)
                if (part.Data != null)
                    part.Data.Dispose();

            throw; //rethrow; calling GetResponse() again here would most likely just fail the same way
        }
    }

    private class MimePart {
        private NameValueCollection _headers = new NameValueCollection();
        public NameValueCollection Headers { get { return _headers; } }

        public byte[] Header { get; protected set; }

        public long GenerateHeaderFooterData(string boundary) {
            StringBuilder sb = new StringBuilder();

            sb.Append("--");
            sb.Append(boundary);
            sb.AppendLine();
            foreach (string key in _headers.AllKeys) {
                sb.Append(key);
                sb.Append(": ");
                sb.AppendLine(_headers[key]);
            }
            sb.AppendLine();

            Header = Encoding.UTF8.GetBytes(sb.ToString());

            return Header.Length + Data.Length + 2;
        }

        public Stream Data { get; set; }
    }
}

You can easily use it like so:

var file1 = @"C:\file";
var file2 = @"C:\file2";

var yourUrl = "http://yourdomain.com/process.php";
var httpForm = new HttpForm(yourUrl);
httpForm.AttachFile("file1", file1).AttachFile("file2", file2);
httpForm.SetValue("foo", "some foo").SetValue("blah", "rarrr!");
httpForm.Submit();
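
If you are curious what HttpForm actually puts on the wire, here is a rough sketch of the same multipart/form-data layout in JavaScript (Node.js). The helper name buildMultipartBody and the shape of its arguments are made up for illustration and are not part of the class above; file contents are passed in as Buffers instead of being read from disk.

```javascript
// Sketch of the multipart/form-data layout. buildMultipartBody is a
// hypothetical helper, not part of HttpForm.
function buildMultipartBody(boundary, values, files) {
  const chunks = [];
  // Plain fields: one part each, with only a Content-Disposition header.
  for (const [name, value] of Object.entries(values)) {
    chunks.push(Buffer.from(
      '--' + boundary + '\r\n' +
      'Content-Disposition: form-data; name="' + name + '"\r\n\r\n' +
      value + '\r\n', 'utf8'));
  }
  // File fields: add a filename and a Content-Type, then the raw bytes.
  for (const [name, file] of Object.entries(files)) {
    chunks.push(Buffer.from(
      '--' + boundary + '\r\n' +
      'Content-Disposition: form-data; name="' + name + '"; filename="' + file.fileName + '"\r\n' +
      'Content-Type: application/octet-stream\r\n\r\n', 'utf8'));
    chunks.push(file.data);
    chunks.push(Buffer.from('\r\n', 'utf8'));
  }
  // Closing delimiter: the boundary followed by a trailing "--".
  chunks.push(Buffer.from('--' + boundary + '--\r\n', 'utf8'));
  return Buffer.concat(chunks);
}

const body = buildMultipartBody('----------abc123',
  { foo: 'some foo' },
  { file1: { fileName: 'file', data: Buffer.from('hello') } });
console.log(body.toString('utf8'));
```

The byte length of the resulting buffer is the value that belongs in the Content-Length header, which is what the C# code adds up part by part before it opens the request stream.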

-JP Richardson

Installing Node.js on Ubuntu 10.04 LTS

Installing Node.js on Ubuntu 10.04 LTS is pretty straightforward.

You will want a Node.js version manager. Node.js has a quick release cycle, and point releases happen quite frequently. A Node.js version manager will help you keep all of your versions isolated from each other.

As it stands today, there are four Node.js version managers. They are:

  1. NVM – NVM works like RVM. It must be sourced in your ~/.bashrc or ~/.profile file. Some people don’t like this. It’s my understanding that some find this to be a bit of hackery.
  2. Nave – Nave doesn’t need to be sourced or loaded up into your bash profile. But when you use Nave, it executes commands in a subshell. It’s my understanding that if any process in a subshell modifies the environment, then these changes won’t persist to the parent process. It’s not entirely clear to me whether these changes persist or not, but the rhetoric from some regarding using subshells for version management was enough to drive me away.
  3. n – I love the simplicity of ‘n’. It doesn’t use subshells and it doesn’t require that you modify your bash profile. I would use ‘n’ if it installed NPM (the Node.js package manager) with each release, but it doesn’t.
  4. nodeenv - I never seriously considered this one as it requires Python to be installed. I haven’t read about anyone using this. But I wanted to list it so that you’d be informed about its existence.

Use NVM. Seriously, it just works.

On your clean Ubuntu machine, make sure that Git is installed:

sudo apt-get install git-core

Then install NVM:

git clone git://github.com/creationix/nvm.git ~/.nvm
. ~/.nvm/nvm.sh # <------ be sure to add this line to the end of your ~/.profile or ~/.bashrc file

Now install all of the packages needed to build Node.js:

sudo apt-get install build-essential openssl libssl-dev pkg-config

Now install the latest version of Node.js. At the time of this writing, it’s v0.6.9:

nvm install v0.6.9

You now have a Node.js environment on your machine! Just run node on the command line to experiment with the Node.js REPL. You can also run npm to install Node.js packages. Read more about NPM here.

Follow me on Twitter: @jprichardson and read my blog on entrepreneurship: Techneur.

-JP Richardson

Comparing Two JavaScript Objects

Recently, I was faced with a problem where I needed to compare two JavaScript objects. My initial strategy was to convert them to JSON and compare the JSON strings.

Sort of like this:

var a = JSON.stringify(person1);//'{"firstName":"JP","lastName":"Richardson"}'
var b = JSON.stringify(person2);//'{"firstName":"JP","lastName":"Richardson"}'

assert(a === b);

Simple enough, right?

Not so fast. I encountered a case like this:

var a = JSON.stringify(person1);//'{"firstName":"JP","lastName":"Richardson"}'
var b = JSON.stringify(person2);//'{"lastName":"Richardson","firstName":"JP"}'

assert(a === b); //fails: same data, different key order

The data is the same, but the string is different. Fortunately, Stack Overflow had a nice JavaScript object comparison algorithm to dump into my app.

Object.prototype.equals = function(x)
{
  var p;
  for(p in this) {
      if(typeof(x[p])=='undefined') {return false;}
  }

  for(p in this) {
      if (this[p]) {
          switch(typeof(this[p])) {
              case 'object':
                  if (!this[p].equals(x[p])) { return false; } break;
              case 'function':
                  if (typeof(x[p])=='undefined' ||
                      (p != 'equals' && this[p].toString() != x[p].toString()))
                      return false;
                  break;
              default:
                  if (this[p] != x[p]) { return false; }
          }
      } else {
          if (x[p])
              return false;
      }
  }

  for(p in x) {
      if(typeof(this[p])=='undefined') {return false;}
  }

  return true;
}

Test passed. I eventually hit a situation where I had some code with an Object that had a Person prototype and some data that came from JSON. Kinda like this:

var person1 = new Person('JP', 'Richardson');
var person2 = JSON.parse('{"firstName":"JP","lastName":"Richardson"}');

//equals is the code snippet above ^
person1.equals(person2); // <--- THIS FAILS

I only cared about comparing the data. The methods associated with the object (Prototype) didn’t matter. Let’s modify the above algorithm. I use CoffeeScript. Here’s the modification:

Object::jsonEquals = (x) ->
  #we do this because two objects may have the same data fields and data but different prototypes
  x1 = JSON.parse(JSON.stringify(this))
  x2 = JSON.parse(JSON.stringify(x))

  p = null
  for p of x1
    return false if typeof (x2[p]) is 'undefined'
  for p of x1
    if x1[p]
      switch typeof (x1[p])
        when 'object'
          return false unless x1[p].jsonEquals(x2[p])
        when 'function'
          return false if typeof (x2[p]) is 'undefined' or (p isnt 'equals' and x1[p].toString() isnt x2[p].toString())
        else
          return false  unless x1[p] is x2[p]
    else
      return false if x2[p]
  for p of x2
    return false if typeof (x1[p]) is 'undefined'
  true

This makes the situation I described above pass. Essentially, it converts both objects to JSON and back to strip the prototype. I suppose you could make this more efficient by just manually setting the prototype to Object before doing the comparison, but oh well, this works for the time being.
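
For comparison, here is a rough JavaScript sketch of a data-only comparison that skips the JSON round trip and doesn't touch Object.prototype. The name dataEquals and the Person constructor below are made up for illustration; the sketch looks only at own enumerable properties, ignores functions, and doesn't care about key order.

```javascript
// Sketch: compare only the data of two values, ignoring prototypes and key order.
// dataEquals is a hypothetical name; nothing is added to Object.prototype.
function dataEquals(a, b) {
  // Primitives (and identical references) compare directly.
  if (a === b) return true;
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  // Object.keys returns own enumerable properties only, so inherited methods are skipped.
  const keysA = Object.keys(a).filter((k) => typeof a[k] !== 'function');
  const keysB = Object.keys(b).filter((k) => typeof b[k] !== 'function');
  if (keysA.length !== keysB.length) return false;
  return keysA.every((k) =>
    Object.prototype.hasOwnProperty.call(b, k) && dataEquals(a[k], b[k]));
}

// Hypothetical Person type with a method on its prototype.
function Person(firstName, lastName) {
  this.firstName = firstName;
  this.lastName = lastName;
}
Person.prototype.fullName = function () {
  return this.firstName + ' ' + this.lastName;
};

const person1 = new Person('JP', 'Richardson');
const person2 = JSON.parse('{"lastName":"Richardson","firstName":"JP"}');
console.log(dataEquals(person1, person2)); // true
```

Keeping the comparison in a standalone function also avoids adding an enumerable property to every object, which is what extending Object.prototype does.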

-JP Richardson

Node.js Exec Like Ruby Exec and Writing a Node.js Native Add On Module

Recently, I was faced with a problem that required my Node.js program’s process to execute another process and have the process that’s passed to the exec function completely replace the Node.js process. In short, I wanted an ‘exec’ function like Ruby’s ‘exec’ function. Unfortunately, out of the box, Node.js doesn’t support this functionality. I asked on Stack Overflow, and someone responded that I should use the POSIX exec functions to solve my problem and consider writing a native Node.js extension.


npm install kexec

You can then use it like:

var kexec = require('kexec');
kexec('top'); //you can pass any process that you want here

Here is the C++ source for Node Kexec:


#include <v8.h>
#include <node.h>
#include <cstdio>

//#ifdef __POSIX__
#include <unistd.h>
/*#else
#include <process.h>
#endif*/

using namespace node;
using namespace v8;

static Handle<Value> kexec(const Arguments& args) {
    String::Utf8Value v8str(args[0]);
    char* argv2[] = {"", "-c", *v8str, NULL};

    execvp("/bin/sh", argv2);      
    return Undefined();
}

extern "C" {
    static void init (Handle<Object> target) {
        NODE_SET_METHOD(target, "kexec", kexec);
    }

    NODE_MODULE(kexec, init);
}

As you can see, writing a C++ add-on for Node.js isn’t too difficult. You can use it in your Node.js JavaScript like so:

var kexec;

try {
  kexec = require("./build/default/kexec.node"); //Node.js v0.4
} catch(e) {
  kexec = require("./build/Release/kexec.node"); //Node.js v0.6
}

module.exports = kexec.kexec; //function of kexec module is named kexec

Don’t forget your wscript file, which ironically is Python code:

def set_options(opt):
  opt.tool_options("compiler_cxx")

def configure(conf):
  conf.check_tool("compiler_cxx")
  conf.check_tool("node_addon")

def build(bld):
  obj = bld.new_task_gen("cxx", "shlib", "node_addon") 
  obj.cxxflags = ["-g", "-D_FILE_OFFSET_BITS=64", "-D_LARGEFILE_SOURCE","-Wall"]
  obj.target = "kexec"
  obj.source = "src/node_kexec.cpp"

In your package.json, include this bit:

"scripts": { "install": "node-waf configure build" }

GitHub source code: Node.js kernel exec

I’ve also included other resources for writing a Node.js Native Add On Module:

  1. Google V8 Engine Getting Started
  2. Google V8 Embedder’s Guide
  3. How to Roll Your Own Javascript API with V8
  4. How to Write Your Own Native Node.js Extension
  5. Writing Node.js Native Extensions
  6. Node.js Native Extension with Hammer and a Prayer
  7. Mastering Node; Add Ons
  8. Node.js Documentation: Add Ons
  9. Postgres Node.js Module
  10. V8 Sample: shell.cc
  11. V8 Objects
  12. There’s C in My JavaScript
  13. Converting V8 Arguments to C++ Types

-JP Richardson

Using OCMock with Mac OS X Lion, Xcode 4, to Mock and Unit Test Cocoa Desktop Apps

If you’re trying to learn how to use OCMock, you’ll encounter a number of articles dedicated to using it with iOS. You won’t find very many related to writing tests, mocks, and stubs for your Cocoa desktop applications for OS X Lion. If you’re writing your apps to exclusively target OS X Lion (10.7), then this article will be of use to you. I’m not sure if this technique will work for Snow Leopard apps or not. But since OS X Lion 10.7 is a 64-bit OS, the distributable library in the downloadable package (1.77) will not work, which is the reason for writing this article.

Here are the steps that you need to follow. You will create a brand new demo application that uses OCMock. You should be able to apply part of these instructions to your own project.

I should preface these instructions by stating that I am not an Xcode or Objective C expert. Also, I’m using Xcode version 4.1 despite version 4.2 being available. I just haven’t upgraded yet.

Building 64-bit OCMock Library

This is the first necessary task. As stated earlier, the libOCMock.a file found in the downloadable package is 32 bit only.

  1. Download the latest OCMock package. At the time of this writing, it’s version 1.77. It’s conceivable that later versions will include the 64 bit version and you’ll be able to skip these steps entirely.
  2. Mount the package and extract the Source and Release directories.
  3. Navigate to Source/ocmock-1.77 and open up OCMock.xcodeproj
  4. When Xcode opens up, click the root node “5 targets, multiple platforms” of the project navigator. The project’s build settings will show up. Observe that there are 5 targets: OCMock, OCMockTests, OCMockPhoneSim, OCMockPhoneDevice, OCMockLib
  5. Notice how there are 4 projects schemes: OCMockPhoneSim, OCMockPhoneDevice, OCMock, and OCMockLib
  6. Notice how there are four products: OCMock.framework, OCMockTests.octest, libOCMock.a, and libOCMock.a
  7. Recall, that we are most interested in a 64 bit libOCMock.a
  8. Delete the target OCMockPhoneSim. Select it. Right click and hit ‘Delete’ You should notice that one of the libOCMock.a products disappears, leaving us with one left.
  9. You should still be in the OCMockPhoneSim target with “My Mac 64-bit”. Select the target OCMockPhoneDevice. Change the Base SDK to Mac OS X 10.7 on every dropdown that you can.
  10. Change Architecture to 64-bit on every drop down that you can.
  11. Remove i386 from Valid Architecture.
  12. Change scheme to “OCMockLib – My Mac 64-bit”
  13. Click the ‘OCMockLib’ target. Click ‘Build Phases’, then expand ‘Run Script’ and remove the following text:
    # combine lib files for device and simulator platforms into one
    
    lipo -create "${BUILD_DIR}/${BUILD_STYLE}-iphoneos/libOCMock.a" "${BUILD_DIR}/${BUILD_STYLE}-iphonesimulator/libOCMock.a" -output "${TARGET_BUILD_DIR}/Library/libOCMock.a"
    
    # copy the headers (we could have used a copy files build phase, too)
    
    cp -R "${BUILD_DIR}/${BUILD_STYLE}-iphoneos/Headers" "${TARGET_BUILD_DIR}/Library"
    
  14. Click ‘Build’ from the ‘Product’ menu. The 64-bit libOCMock.a should be built now. Right click libOCMock.a in the project navigator under the ‘Products’ group. Click ‘Show in Finder’. Copy libOCMock.a and the OCMock directory found in the Headers directory, which is located in the same directory as libOCMock.a. Copy these two items to a location where you can find them later.

Adding OCMock to the Demo Project

Now we’ll add the library and header to a demo project.

  1. Open Xcode. Click File…New Project. Select Cocoa Application.
  2. Name it whatever you want. Make sure that you click “Include Unit Tests”
  3. Verify that the default test is working: click Product…Test. You should get a test error in the testing file. If so, it works as expected.
  4. Navigate to your project directory. Create a directory in it called ‘TestLibraries’
  5. Copy libOCMock.a and the folder ‘OCMock’ containing the header files into the ‘TestLibraries’ directory.
  6. You’ll have a group (folder) that is named like so: (YOUR_PROJECT_NAME)Tests. We’ll refer to this as the testing group. Right click it and click ‘Add Files to..’
  7. Select the folder in your project directory that you created: ‘TestLibraries’. Make sure that “Copy items into destination group’s folder” is NOT checked, since these files already exist at the project root. Select ‘Create groups for any added folders’. Uncheck your project target and make sure that your testing target is checked.
  8. Go to the Test target build settings. Make sure that “Library Search Paths” has “TestLibraries” in it. If not, you’ll need to add the following string WITH THE QUOTES: "${SRCROOT}/TestLibraries"
  9. In the Test target build settings, make sure that “Header Search Paths” has “TestLibraries” with recursive selected. If not, add the following string WITH THE QUOTES: "${SRCROOT}/TestLibraries" and select the ‘recursive’ option.
  10. In the Test target build settings, locate ‘Other Linker Flags’ and add: -ObjC -all_load
  11. In your implementation test file, locate the testExample method. Put this snippet in its place:
    #include <OCMock/OCMock.h> //put this at the top
    id mockString = [OCMockObject mockForClass:[NSString class]];
    
    [[[mockString stub] andReturn:@"MOCKS UP IN"] lowercaseString];
    
    STAssertEqualObjects([mockString lowercaseString], @"MOCKS UP IN", nil);
    
  12. Click “Product” menu and “Build For Testing”
  13. Then click “Product” and “Test” All should pass.

If you get the following error: ‘unrecognized selector sent to instance’, then you didn’t add the ‘Other Linker Flags’.

Hope this helps. References:

-JP Richardson

Synchronous File Copy in Node.js

Sometimes, asynchronous operations can be a burden, especially when you’re writing small console utilities, such as one that batch processes files.

There are many asynchronous ways to copy a file. Here is a synchronous version (CoffeeScript):

fs = require('fs')

copyFileSync = (srcFile, destFile) ->
  BUF_LENGTH = 64*1024
  buff = new Buffer(BUF_LENGTH)
  fdr = fs.openSync(srcFile, 'r')
  fdw = fs.openSync(destFile, 'w')
  bytesRead = 1
  pos = 0
  while bytesRead > 0
    bytesRead = fs.readSync(fdr, buff, 0, BUF_LENGTH, pos)
    fs.writeSync(fdw,buff,0,bytesRead)
    pos += bytesRead
  fs.closeSync(fdr)
  fs.closeSync(fdw)

You can view the converted version in JavaScript.
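
For reference, here is a rough hand-written JavaScript sketch of the same read/write loop. It is not the converted output mentioned above; it assumes a Node.js version that has Buffer.alloc, and the try/finally around the loop is my own addition so the file descriptors get closed even if a read or write throws.

```javascript
const fs = require('fs');

// Sketch of the same synchronous read/write loop in JavaScript.
function copyFileSync(srcFile, destFile) {
  const BUF_LENGTH = 64 * 1024;
  const buff = Buffer.alloc(BUF_LENGTH);
  const fdr = fs.openSync(srcFile, 'r');
  const fdw = fs.openSync(destFile, 'w');
  try {
    let pos = 0;
    let bytesRead;
    // readSync returns 0 at end of file, which ends the loop.
    while ((bytesRead = fs.readSync(fdr, buff, 0, BUF_LENGTH, pos)) > 0) {
      fs.writeSync(fdw, buff, 0, bytesRead);
      pos += bytesRead;
    }
  } finally {
    fs.closeSync(fdr);
    fs.closeSync(fdw);
  }
}
```

Newer versions of Node.js ship a built-in fs.copyFileSync, so a hand-rolled loop like this is mostly useful on old runtimes or when you want to process the bytes as they pass through.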

-JP Richardson

Buzz: A Node.js Command Line Program to Keep Your App Running Indefinitely; Like the Program Forever

Buzz is a command line program that can kill your app routinely and restart it.
It’ll also restart your app if it dies. It’s a lot like the other Node.js
program Forever.

It’s much simpler than Forever: approximately 50 lines of CoffeeScript code.
It displays your app’s output on STDOUT and also displays any of your app’s
STDERR output in red.

Usage

Install it via npm:

npm install buzz

Then run:

buzz 240 your_cool_app param1 param2

The first parameter to buzz is the time in seconds after which your app is killed and
restarted. So, `your_cool_app` would be killed and restarted every four minutes.

If you don’t want buzz to kill your app, but you want it to bring it back to
life if it dies, run:

buzz your_cool_app param1 param2

You can test buzz by running the app `buzz_test`:

buzz_test

`buzz_test` runs the app `smarty_pants` that spews out random facts to you and
taunts you. Occasionally `smarty_pants` will commit suicide, but buzz will
bring him back to life.

`buzz_test` ends up actually just running the following command:

buzz 10 smarty_pants 2000 0.15

Which will kill smarty pants every 10 seconds and bring him back to life. Also,
every two seconds, smarty pants will spit out a random fact. Approximately, every
13 seconds smarty pants will take his own life, but Buzz will bring him back.
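
The "approximately every 13 seconds" figure can be reproduced with a little arithmetic, assuming that 2000 is the number of milliseconds between facts and 0.15 is the chance that smarty pants exits on each tick. Both meanings are my reading of the description above, and the function name is made up for illustration.

```javascript
// Expected seconds between self-inflicted exits, assuming one independent
// chance to exit per tick. Both parameter meanings are assumptions.
function expectedSecondsBetweenExits(tickMs, exitProbability) {
  return (tickMs / 1000) / exitProbability;
}

console.log(expectedSecondsBetweenExits(2000, 0.15).toFixed(1)); // "13.3"
```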

Motivation

I have a command line app that is nasty to debug. It’s working fine for the first
five minutes or so. Thus, Buzz was born. Instead of fixing the bug, I wanted
to make this. =)

But really, its utility is that it’s a much simpler Forever.

The name comes from Buzz Lightyear in the movie Toy Story. His popular phrase was: To infinity and beyond!

-JP Richardson

A Node.js Experiment: Thinking Asynchronously, Using Recursion to Calculate the Total File Size in a Directory

I recently picked up Node.js/CoffeeScript; I figured that since JavaScript can run on about every modern computing device, it’s about time that I accept JavaScript instead of side-stepping it by using dying technologies such as GWT and Silverlight.

I’ve always felt that the best way to learn a new language/platform is to start by writing a simple program that solves a simple problem.

My problem involved traversing the filesystem and performing some tasks. For the sake of this blog post and for the sake of your attention span, the problem can be reduced to a simple algorithm that computes the total space that a directory and its contents use.

Let’s start by creating a simple synchronous version:

fs = require('fs')
path = require('path')

du = (dir) ->
  total = 0
  try 
    stat = fs.lstatSync(dir)
    if stat.isFile()
      total += stat.size
    else if stat.isDirectory()
      files = fs.readdirSync(dir)
      for file in files
        total += du(path.join(dir, file))
  catch e
  
  total

DIR = '/'
total_bytes = du(DIR)
total_kb = total_bytes / 1024.0
total_mb = total_kb / 1024.0

console.log("#{DIR}: #{total_mb.toFixed(3)} MB")

This code works fine and as expected. It displays the total size of your entire directory in MiB. Ya, I know, I wrote “MB”.
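
To make the MiB versus MB distinction concrete, here is a tiny JavaScript sketch. The helper names toMiB and toMB are made up for illustration: dividing by 1024 twice gives mebibytes, dividing by 1000 twice gives megabytes.

```javascript
// Binary vs. decimal units for a byte count.
const toMiB = (bytes) => bytes / (1024 * 1024); // mebibytes
const toMB = (bytes) => bytes / (1000 * 1000);  // megabytes

const bytes = 5 * 1024 * 1024;
console.log(toMiB(bytes).toFixed(3) + ' MiB'); // 5.000 MiB
console.log(toMB(bytes).toFixed(3) + ' MB');   // 5.243 MB
```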

But… we are using Node.js here. The asynchronous nature should be embraced. Let’s rewrite this algorithm in an asynchronous form.

fs = require('fs')
path = require('path')

duAsync = (dir, cb) ->
  total = 0
  fs.lstat dir, (err, stat) ->
    if err then return
    if stat.isFile()
      total += stat.size
    else if stat.isDirectory()
      fs.readdir dir, (err, files) ->
        if err then return
        for file in files
          duAsync path.join(dir,file), cb
    cb(null,total)

DIR = '/'
duAsync DIR, (err, total_bytes) ->
  total_kb = total_bytes / 1000.0
  total_mb = total_kb / 1000.0

  console.log("#{DIR}: #{total_mb.toFixed(3)} MB")

Hmm, this doesn’t output the correct values. I’m not passing the totals up the callback chain.

Also, from here on out, I’m only going to show the algorithm.

Let’s take advantage of closures and modify this a bit. If we could remove the recursion, that may simplify things a bit.

duAsync2 = (dir,cb) ->
  total = 0
  all_files = []
  all_files.push(dir)

  while all_files.length > 0
    current_dir = all_files.pop()
    fs.lstat current_dir, (err,stat) ->
      if err then return
      if stat.isFile()
        total += stat.size
      else if stat.isDirectory()
        fs.readdir current_dir, (err,files) ->
          if err then return
          for file in files
            all_files.push(path.join(current_dir, file))
      cb(null,total)

On the surface, this looks fairly simple. We have removed the recursive aspect to simplify it a bit. The code in the while block will always see ‘total’ so we don’t run into the same problem as the last implementation.

One major problem though, this doesn’t work. This exits almost right away. Ah yes… we are doing an asynchronous implementation. The all_files array is empty by the time the while loop goes to the second iteration.
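
The timing problem can be shown in isolation with a small JavaScript sketch. Here setImmediate stands in for fs.lstat and fs.readdir, and the queue and log names are made up for illustration. The callback only runs after the synchronous loop has already found the queue empty and exited.

```javascript
// Sketch: why a while loop over a work queue exits before async callbacks run.
const queue = ['/'];
const log = [];

while (queue.length > 0) {
  const item = queue.pop();
  log.push('loop popped ' + item);
  // Stand-in for fs.lstat/fs.readdir: the callback is deferred until
  // the current synchronous code has finished.
  setImmediate(() => {
    queue.push(item + 'child');
    log.push('callback ran for ' + item);
  });
}
log.push('loop finished');

// Print the log once the deferred callback has had its turn.
setImmediate(() => console.log(log.join('\n')));
```

The log shows "loop finished" before "callback ran for /": by the time the callback adds new work to the queue, nobody is looping over it any more.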

Maybe recursion is unavoidable? Let’s still leverage closures though.

This version is very similar to the last, I’ve just managed to use recursion within a function. The ‘again’ function is called recursively.

duAsync3 = (dir,cb) ->
  total = 0

  again = (current_dir) ->
    fs.lstat current_dir, (err, stat) ->
      if err then return
      if stat.isFile()
        total += stat.size
      else if stat.isDirectory()
        fs.readdir current_dir, (err,files) ->
          if err then return
          for file in files
            again(path.join(current_dir, file))
      cb(null, total)

  again(dir)

It works! Consider this: what if you only want the results at the very end? That is, you only want the callback to occur once, and at the end… then what do you do?

This was a dilemma that I faced for a bit. For this particular problem, it might not really matter much, especially considering that this is a console utility. However, I considered figuring this out a rite of passage as a Node.js/JavaScript noob. So I didn’t want to use any utilities such as Async.js, Seq, etc.

I started doing research. Fortunately, I stumbled upon two great articles:

  1. “Asynchronous JavaScript: The Tale of Harry”
  2. Currying the Callback the Essence of Futures

The first article seemed to have an almost identical problem, except that the author didn’t impose the additional constraint of only executing the callback upon completion. The solution in that article works as expected, but seems a bit more complex than necessary.

I kept researching and found an article, Deriving the Y-Combinator in 7 Easy Steps (JavaScript). My mind was exploding learning some of these functional programming concepts!

But, I still wasn’t closer to a solution. I finally made my way into #node.js on freenode (IRC). Fortunately, AvianFlu was able to lend me a tip. He suggested the following:

  • Create three variables: started, finished, running
  • At the beginning of the callback, increment started and running.
  • At the end, decrement running and increment finished.
  • When (started === finished) && (running === 0) You should be done.

I experimented with this for a while. Sometimes, it felt that I was close. But it never quite worked. Then I thought about it a bit more and kept the concept of a ‘running’ variable and added a variable to denote the number of files left to process.

duAsync4 = (dir,cb) ->
  total = 0
  file_counter = 1 #starts at one because of the initial directory
  async_running = 0

  again = (current_dir) ->
    fs.lstat current_dir, (err, stat) ->
      if err then file_counter--; return
      if stat.isFile()
        file_counter--
        total += stat.size
      else if stat.isDirectory()
        file_counter--
        async_running++
        fs.readdir current_dir, (err,files) ->
          async_running--
          if err then return #console.log err.message
          file_counter += files.length
          for file in files
            again path.join(current_dir, file)
      else
        file_counter--
      if file_counter is 0 and async_running is 0
        cb(null, total)

  again dir

This works. What’s important to note is that there are many ways to solve problems using Node.js. On my Quad-Core MBP 8 GB Ram, this is almost twice as fast as the synchronous version!
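
As one other way to get a single callback at the end, here is a rough JavaScript sketch that uses one counter for every outstanding lstat or readdir call. The names duOnce, visit, and done are made up for illustration, and this is a variation on the idea rather than a translation of the code above. One difference: as far as I can tell, the version above only checks for completion inside the lstat callback, so it may never call back when the last thing to finish is a readdir of an empty directory. This sketch checks after every completed operation.

```javascript
const fs = require('fs');
const path = require('path');

// Sketch: one "pending" counter tracks every outstanding lstat/readdir call.
// The callback fires exactly once, when the counter returns to zero.
function duOnce(dir, cb) {
  let total = 0;
  let pending = 0;

  function done() {
    pending -= 1;
    if (pending === 0) cb(null, total);
  }

  function visit(current) {
    pending += 1; // one outstanding entry
    fs.lstat(current, (err, stat) => {
      if (err) return done(); // unreadable entries are skipped
      if (stat.isFile()) {
        total += stat.size;
        return done();
      }
      if (!stat.isDirectory()) return done();
      fs.readdir(current, (err2, files) => {
        // Queue the children first so pending cannot hit zero too early.
        if (!err2) files.forEach((file) => visit(path.join(current, file)));
        done(); // the directory itself is finished once its children are queued
      });
    });
  }

  visit(dir);
}

duOnce('.', (err, totalBytes) => {
  console.log('.: ' + (totalBytes / 1024 / 1024).toFixed(3) + ' MiB');
});
```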

Try it out and let me know your results. Also, can you think of any other ways to solve this problem?

-JP Richardson
