Close to the metal, Part 1: Building an HTTP API with members and collections

Published by Lorenzo Planas on April 21, 2014

When building HTTP APIs I put a lot of care into separating behavior from persistence, and into keeping my tests fast. Usually one leads to the other, and you can get both if you spend some time thinking about good abstractions for your app.

Some years ago Backbone.js popularized the abstraction of resources as a duality of collections and members. I thought that could also work in the backend, for those APIs that mainly deal with collections of items. There are many domains where we handle sets of items, where an item may belong to several collections at once, or where an item will transition from one collection to another along its lifecycle.

An example could be an application with multiple event timelines - events may appear in several timelines at once, or move from one to another (e.g. from 'unread' to 'read').

Comments on an item also fit this abstraction, and since I'm lacking a comments engine in this blog, I thought I could build one and provide a real, albeit simplistic example of using collections and members in the backend.

For a first version, I'm going to focus on these features:

  1. Anybody will be able to post comments, no authentication needed.
  2. When a comment is created, it will go to a 'pending' collection, where it will wait for moderation.
  3. From time to time the blog admin will go through the pending collection, approving or deleting comments as needed.
  4. The admin will need some authentication means, however simple.
  5. When a comment is approved, it will be moved from the 'pending' collection to a specific collection for the URL that comment is tied to.
  6. There will be an endpoint to list all approved comments for a URL.

This is going to get a bit long, so feel free to browse the full code and specs while we move forward.

Members

Let's skip the outside-in BDD flow this time, and jump directly to the resource objects. We are going to implement single comments as instances of Comment::Member. In this first version we won't be sending notifications to contributors on further updates in a comment thread, so we will just store the name and e-mail of the author, the actual text of the comment, and the URL of the blog post.

Resources will delegate persistence to a repository object. The repository object will only care about a key and some data associated with that key.

Let's start by implementing a #sync method to tell the repository to store data for a comment:

def sync
  validate!
  Comment.repository.store(id, attributes)
  self
end

To get data back from the repository we will use #fetch:

def fetch
  set_attributes(Comment.repository.fetch(id))
  self
rescue KeyError, Errno::ENOENT
  raise ResourceNotFound
end

and to tell the repository to delete a comment, we will use #delete:

def delete
  Comment.repository.delete(id)
  self
end

You can see the code for the member is quite barebones. I'm using a plain Ruby object and some help from the standard library. I rolled my own naive attributes and validations, though for more complex settings I will happily use and recommend Virtus and Aequitas. I'm also using UUIDs, since they are quite handy when using key / value pairs for storage.
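To make the pieces above easier to picture, here is a minimal sketch of what such a plain Ruby member could look like. It is not the code from the repo: the field names, the "every field is required" validation rule, and the use of a plain Hash as the repository (it already responds to #store, #fetch and #delete) are all assumptions made for illustration.

```ruby
require "securerandom"

class InvalidResource < StandardError; end
class ResourceNotFound < StandardError; end

module Comment
  class << self
    # Anything responding to #store, #fetch and #delete will do.
    # Assumption: a plain Hash qualifies, which keeps this sketch short.
    attr_writer :repository

    def repository
      @repository ||= {}
    end
  end

  class Member
    # Assumed field names: the post only mentions name, e-mail, text and URL.
    FIELDS = [:name, :email, :text, :url].freeze

    attr_reader :id, :attributes

    def initialize(attributes = {})
      # Reuse the given id, or generate a UUID for a new comment.
      @id = attributes.fetch(:id) { SecureRandom.uuid }
      set_attributes(attributes)
    end

    # Assumed rule: every field must be present and non-blank.
    def validate!
      missing = FIELDS.select { |field| attributes[field].to_s.strip.empty? }
      raise InvalidResource, "missing: #{missing.join(', ')}" unless missing.empty?
      self
    end

    def sync
      validate!
      Comment.repository.store(id, attributes)
      self
    end

    def fetch
      set_attributes(Comment.repository.fetch(id))
      self
    rescue KeyError, Errno::ENOENT
      raise ResourceNotFound
    end

    def delete
      Comment.repository.delete(id)
      self
    end

    private

    # Keep only the known fields, plus the id.
    def set_attributes(attributes)
      @attributes = { id: id }
      FIELDS.each { |field| @attributes[field] = attributes[field] }
    end
  end
end

comment = Comment::Member.new(
  name: "Ann", email: "ann@example.com",
  text: "Nice post!", url: "http://example.com/post"
).sync

Comment::Member.new(id: comment.id).fetch.attributes[:text]  # "Nice post!"
```

The member never learns where its data ends up; it only needs the repository to answer those three messages.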

Still here? Good, let's move on to the collection code.

Collections

We are going to use a similar approach for collection objects, although collection methods have a bit more logic. Check the #add and #remove methods below that we will use to, well, add items to and remove items from the collection:

def add(member)
  member.validate!
  members.add(member.attributes)
  operations.push([:add, member.attributes])
  self
end

def remove(member)
  members.delete(member.attributes)
  operations.push([:remove, member.attributes])
  self
end

We are operating on two mirroring sets: the members instance variable, which acts as a buffer to the underlying data in the repository, and a set of operations that will run as a batch in the repository once we sync changes. The #sync method will delegate to the repository and tell it to apply the operations in a batch:

def sync
  Comment.repository.apply(id, operations)
  self
end

This extra complexity is a bit of a premature abstraction at this point - since we aren't going to paginate records yet, we don't need to keep a buffer in the collection object. Still, when we get to the repository implementations, we will take advantage of this approach for some cheap optimizations. Besides, in the same way we don't persist member attributes one by one upon multiple changes, it doesn't hurt to batch operations on collections when there's a chance we'll make several changes to their contents before syncing them.
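A small sketch may help to see what the batching buys us. Everything here is hypothetical: CountingRepository only exists to count writes, the collection takes plain hashes instead of members, and clearing the operations after a sync is my own assumption so that a second sync doesn't replay the batch.

```ruby
require "set"

# Hypothetical repository that counts how many times a collection is written.
class CountingRepository
  attr_reader :writes

  def initialize
    @storage = {}
    @writes = 0
  end

  # Run the whole batch against the stored data, then count a single write.
  def apply(id, operations)
    data = (@storage[id] ||= [])
    operations.each do |command, item|
      case command
      when :add    then data.push(item)
      when :remove then data.delete(item)
      end
    end
    @writes += 1
  end

  def fetch(id)
    @storage.fetch(id)
  end
end

class Collection
  attr_reader :id, :members, :operations

  def initialize(id, repository)
    @id = id
    @repository = repository
    @members = Set.new   # in-memory buffer
    @operations = []     # pending batch for the repository
  end

  def add(attributes)
    members.add(attributes)
    operations.push([:add, attributes])
    self
  end

  def remove(attributes)
    members.delete(attributes)
    operations.push([:remove, attributes])
    self
  end

  def sync
    @repository.apply(id, operations)
    operations.clear  # assumption: avoid replaying the batch on the next sync
    self
  end
end

repository = CountingRepository.new
pending = Collection.new("comments:pending", repository)

pending.add({ text: "first" }).add({ text: "second" }).remove({ text: "first" })
repository.writes  # 0, nothing has reached the repository yet

pending.sync
repository.writes  # 1, three operations applied in a single write
```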

Past this point, we just need a #fetch method to retrieve collection data from the repository.

def fetch
  self.members = Comment.repository.fetch(id)
    .map { |attributes| member_klass.new(attributes) }
  self
rescue KeyError, Errno::ENOENT
  raise ResourceNotFound
end

While masking exceptions for nonexistent records with a more semantic name, this method is giving away some details of the two repository implementations included in the repo: in-memory and serialization to JSON files on the local filesystem. Oh well, so much for abstracting from persistence.

The last bit worth mentioning about the collection is the great support Ruby provides for this kind of object. Just by including Enumerable and implementing #each, we have access to a plethora of methods like #map, #reduce, #first, #sort_by, etc. that will make handling collections a breeze.

include Enumerable

# code abridged ...

def each(&block)
  members.each(&block)
end
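As a quick, self-contained illustration of what Enumerable gives us for free, here is a toy example. Entry and EntryList are made-up names standing in for comments and their collection, and created_at is a plain integer to keep things short.

```ruby
# Hypothetical value object standing in for a comment.
Entry = Struct.new(:author, :created_at)

class EntryList
  include Enumerable

  def initialize(members)
    @members = members
  end

  # Enumerable builds everything else on top of #each.
  def each(&block)
    @members.each(&block)
  end
end

list = EntryList.new([Entry.new("bob", 20), Entry.new("ann", 10)])

list.map(&:author)                                       # ["bob", "ann"]
list.first.author                                        # "bob"
list.sort_by(&:created_at).first.author                  # "ann"
list.reduce(0) { |sum, entry| sum + entry.created_at }   # 30
```

Note that Enumerable provides #first but not #last; if you need the latter, go through #to_a or define it on the collection.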

Persistence

While going through the code we have been making leaps of faith about the repository, assuming it would take care of everything related to persistence.

Beyond the many possible implementations for a repository, my main interest is the ability to replace it as needed: an in-memory one to run fast unit tests, or serialization to JSON files to achieve rough, non-volatile persistence, with minimum dependencies. Due to the reduced API, many other implementations can be made with no need to touch the code of the actual resources.
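To give an idea of how small that API is, an in-memory repository could look roughly like this. It's a sketch under my own assumptions (the class name, the Hash-backed storage and the error raised for unknown operations are invented here), not the implementation shipped in the repo.

```ruby
# Hypothetical in-memory repository: a Hash keyed by resource id.
class MemoryRepository
  def initialize
    @storage = {}
  end

  # Members: one key, one blob of data.
  def store(id, data)
    @storage[id] = data
  end

  # Hash#fetch raises KeyError for unknown ids, which the resources
  # translate into ResourceNotFound.
  def fetch(id)
    @storage.fetch(id)
  end

  def delete(id)
    @storage.delete(id)
  end

  # Collections: apply a batch of [:add, member] / [:remove, member] pairs.
  def apply(id, operations)
    data = (@storage[id] ||= [])
    operations.each do |command, *members|
      case command
      when :add    then data.push(*members)
      when :remove then members.each { |member| data.delete(member) }
      else raise ArgumentError, "unknown operation: #{command}"
      end
    end
    data
  end
end

repository = MemoryRepository.new
repository.store("some-uuid", { text: "hello" })
repository.apply("comments:pending", [[:add, { id: "some-uuid" }]])
repository.fetch("comments:pending")  # [{ id: "some-uuid" }]
```

Since nothing touches the disk or the network, unit tests that swap this in should stay fast.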

I'm just going to stop briefly on the internals of the JsonSerializer repository, since I mentioned earlier some optimizations to the #apply method. Check the collection-related methods #apply, #add and #remove below:

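def apply(id, operations)
  # Load the file once, run every operation in memory, write it back once.
  overwrite_data_in(File.join(basedir, id)) { |data|
    operations.each { |operation|
      command, args = operation[0], operation[1..-1]
      # Passing false as the id makes #add / #remove return their lambda
      # instead of touching the file themselves.
      self.send(command, false, *args).call(data)
    }
  }
end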

def add(id, *members)
  transformation = lambda { |data| data.push(*members) }
  return transformation unless id

  overwrite_data_in(File.join(basedir, id), &transformation) 
end

def remove(id, *members)
  transformation = lambda { |data| 
    members.each { |member| data.delete(member) }
  }
  return transformation unless id

  overwrite_data_in(File.join(basedir, id), &transformation)
end

The code in #apply, #add and #remove when serializing to JSON files is certainly more involved than for the in-memory version. All collection operations need to load the whole JSON file for the collection, alter the collection in-memory, and dump it again to the filesystem. I have extracted that logic to the private method #overwrite_data_in, to reduce the number of times we do this dance when applying a batch of operations. The methods #add and #remove will also return a lambda with their transformation operation when no id is passed, to be executed by #apply.

Even if this implementation is a bit convoluted, it shows that many persistence-related features and optimizations (in this case, to avoid extra file system calls and file-content rewriting operations) are orthogonal to the behavior implemented in the resources. This means another programmer may provide a repository implementation that is less naive and less flawed than mine, and still be a drop-in replacement to the application.
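The private #overwrite_data_in method isn't shown above, so here is a rough guess at the load / transform / dump cycle it performs. Treat it as an assumption-laden sketch: in particular, starting from an empty array when the file doesn't exist yet is my choice, and the real method may behave differently.

```ruby
require "json"
require "tmpdir"

# Assumed behavior: read the JSON array stored at `path` (or start with an
# empty one), let the block mutate it in memory, then write it back once.
def overwrite_data_in(path)
  data = File.exist?(path) ? JSON.parse(File.read(path)) : []
  yield data
  File.write(path, JSON.generate(data))
  data
end

Dir.mktmpdir do |basedir|
  path = File.join(basedir, "pending")

  add    = lambda { |data| data.push({ "text" => "first" }, { "text" => "second" }) }
  remove = lambda { |data| data.delete({ "text" => "first" }) }

  # One read and one write, however many transformations run in between.
  overwrite_data_in(path) do |data|
    [add, remove].each { |transformation| transformation.call(data) }
  end

  JSON.parse(File.read(path))  # [{ "text" => "second" }]
end
```

One detail worth keeping in mind with this kind of storage: JSON.parse returns string keys by default, so a member serialized with symbol keys won't compare equal to its parsed counterpart unless the keys are normalized somewhere.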

HTTP API

And we arrive at the HTTP API. Following the philosophy of being close to the metal while building this component, Sinatra is a natural choice to implement the HTTP endpoints.

Let's dive into the route that handles comment creation:

post "/comments" do
  begin
    comment = Comment::Member.new(payload)
    pending_comments = Comment::Collection.new("comments:pending")
    pending_comments.add(comment)

    comment.sync
    pending_comments.sync

    [201, comment.to_json]
  rescue InvalidResource
    [400, comment.to_json]
  end
end

We hydrate a new comment, add it to the 'pending' collection, and sync both the comment and the collection. Now it's the admin's turn to review the pending comments.

get "/comments/pending" do
  begin
    comments = Comment::Collection.new("comments:pending").fetch
      .sort_by { |comment| comment.created_at }
    [200, comments.to_json]
  rescue ResourceNotFound
    [200, [].to_json]
  end
end

I'm sorting the collection in quite a rough way for now. Bear with me until a future installment, where we'll see how to pass filtering and sorting criteria to the repositories.

Anyway, let's approve some legitimate comments:

put "/comments/approved/:comment_id" do
  return [401] unless admin?

  begin
    comment = Comment::Member.new(id: params[:comment_id]).fetch
    pending_comments = Comment::Collection.new("comments:pending")
    entry_comments = Comment::Collection.new("comments:#{comment.entry}")

    pending_comments.remove(comment).sync
    entry_comments.add(comment).sync

    [200, comment.to_json]
  rescue ResourceNotFound
    [404]
  end
end

At last we get a bit of a semantic advantage from this approach of splitting resources into collections and members. I think the route above shows it's quite straightforward to move a comment from one collection to another, signalling a change of state in the comment. The last route lists the approved comments:

get "/comments" do
  return [204] unless params[:url]
  entry = Comment.entry_from(CGI.unescape(params[:url]))

  begin
    comments = Comment::Collection.new("comments:#{entry}").fetch
      .sort_by { |comment| comment.created_at }
    [200, comments.to_json]
  rescue ResourceNotFound
    [200, [].to_json]
  end
end

And we arrive at our bread and butter, the route that will return the collection of approved comments for a given URL. If you dig into the code for the Comment.entry_from method, you'll see I'm turning URLs for posts into MD5 hashes. The JsonSerializer repository will use the MD5 hashes as file names to persist the collection of comments for a given URL. The hash will not only identify the collection uniquely, but will also keep long URLs from going over the character limit for a filename.
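The idea behind Comment.entry_from can be sketched in a couple of lines with the standard library. This is an assumed implementation, since all we know from the text is that URLs become MD5 hashes; the real method may normalize the URL first.

```ruby
require "digest/md5"

module Comment
  # Assumed implementation: hash the URL as-is into 32 hex characters.
  def self.entry_from(url)
    Digest::MD5.hexdigest(url)
  end
end

long_url = "http://example.com/" + "a-very-long-slug/" * 30

long_url.length                      # 529, too long for most filesystems
Comment.entry_from(long_url).length  # 32, always
```

Without some normalization, two spellings of the same URL (a trailing slash, a query string) would end up in two different collections.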

Exceptions for control flow?!?!

A final note on the implementation of the HTTP endpoints. You may be cringing at those rescue blocks catching ResourceNotFound and InvalidResource exceptions, which I'm using to render responses for error conditions. I'm indeed using exceptions for flow control, well aware that it is anathema in some circles. However, the flow is so simple in this case that I prefer the clarity of the code over the performance penalty. And this is Ruby after all ;)

Just this time, I'll argue using either this approach or a more orthodox if-else branching is a matter of different personal tastes, and you're welcome to differ.
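For the sake of comparison, here are both styles side by side in a toy lookup outside of Sinatra. COMMENTS, fetch! and the two show_* methods are invented for this sketch; both variants return the same status for the same input, so it really comes down to which one you find easier to read.

```ruby
class ResourceNotFound < StandardError; end

# Hypothetical data store for the example.
COMMENTS = { "42" => { text: "hello" } }.freeze

# Exception-based lookup, in the spirit of the routes above.
def fetch!(id)
  COMMENTS.fetch(id)
rescue KeyError
  raise ResourceNotFound
end

def show_with_rescue(id)
  [200, fetch!(id)]
rescue ResourceNotFound
  [404]
end

# Branching alternative: the lookup returns nil and the caller checks.
def show_with_branch(id)
  comment = COMMENTS[id]
  comment ? [200, comment] : [404]
end

show_with_rescue("42")       # [200, { text: "hello" }]
show_with_rescue("missing")  # [404]
show_with_branch("missing")  # [404]
```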

Wrap up

You will surely find quicker, simpler ways to build an API to store comments. Or you may even spend your time on more profitable pursuits. On the other hand, maybe you can try this approach while building an app that makes extensive use of collections of items. Try to separate the behavior of your objects from their persistence needs, aiming for sub-second running times when testing a whole component.

[Figure: total running time for tests, acceptance tests included]

The commenting engine isn't complete yet, though. None of the provided repository implementations will work on Heroku, for a start (and for good reasons). And the API lacks a UI! In the next part of this series I'll discuss how to implement a repository for Redis, and how to build pagination, querying and sorting into the repositories, which will bring us closer to real-world code. If you stay with me till the third part, we'll see how to build the JavaScript client with vanilla JavaScript and a bit of help from Riot.JS.

While I finish the commenting engine, feel free to reach me through Twitter.