Tech_Veer: November 2011

This post summarizes a presentation by Simon Guest from Microsoft on their cloud computing architecture. Although no new concept or idea was introduced, Simon provided an excellent summary of the major patterns of cloud computing.

I have to admit that I am not familiar with Azure, and this was my first time attending a Microsoft cloud computing presentation. I felt Microsoft explained the Azure platform in a very comprehensible way, and I am quite impressed.

Simon talked about five patterns of cloud computing. Let me summarize them (and mix in a lot of my own thoughts).

1. Use Cloud for Scaling

The key idea is to spin machine resources up and down according to workload so the user only pays for actual usage. There are two types of access patterns: the passive listener model and the active worker model.

The passive listener model uses a synchronous communication pattern where the client pushes a request to the server and synchronously waits for the processing result.

In the passive listener model, machine instances typically sit behind a load balancer. To scale the resources according to the workload, we can use a monitor service that sends a NULL (no-op) client request and uses the measured response time to grow or shrink the pool of machine instances.
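The monitor loop described above could be sketched roughly as follows. This is only an illustration: the function names, the latency thresholds and the instance limits are hypothetical and do not come from Simon's talk or from any Azure API.

```python
import time
from typing import Callable


def probe_latency(send_null_request: Callable[[], None]) -> float:
    """Time a single no-op request and return the elapsed seconds."""
    start = time.perf_counter()
    send_null_request()
    return time.perf_counter() - start


def desired_instances(current: int, latency_s: float,
                      high_s: float = 0.5, low_s: float = 0.1,
                      min_n: int = 1, max_n: int = 10) -> int:
    """Decide the next pool size from one latency measurement.

    The thresholds (high_s, low_s) and limits (min_n, max_n) are
    made-up defaults; a real monitor would tune them per application.
    """
    if latency_s > high_s:   # responses are slow: add one instance
        return min(current + 1, max_n)
    if latency_s < low_s:    # responses are fast: remove one instance
        return max(current - 1, min_n)
    return current           # within the comfortable band: no change
```

A real monitor would probably average several probes before acting, so that a single slow response does not trigger scaling.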

On the other hand, the active worker model uses an asynchronous communication pattern where the client puts the request into a queue, which is periodically polled by the server. After queuing the request, the client does some other work and comes back later to pick up the result. The client can also provide a callback address to which the server can push the result once processing is done.

In the active worker model, the monitor can measure the number of requests sitting in the queue and use that to determine whether machine instances (at the consuming end) need to be spun up or down.
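Here is a minimal in-process sketch of the active worker model, using Python's standard queue module in place of a cloud queue service. The names submit, poll_once and workers_needed, and the figure of 100 requests per worker, are assumptions made for illustration.

```python
import math
import queue

requests = queue.Queue()   # stands in for the cloud queue service
results = {}               # where the client later picks up results


def submit(request_id: str) -> None:
    """Client side: enqueue the request and return immediately."""
    requests.put(request_id)


def poll_once() -> bool:
    """Worker side: process one queued request if there is one."""
    try:
        request_id = requests.get_nowait()
    except queue.Empty:
        return False
    results[request_id] = "processed " + request_id
    return True


def workers_needed(queue_depth: int, per_worker: int = 100,
                   min_n: int = 1, max_n: int = 20) -> int:
    """Monitor side: size the worker pool from the queue backlog.

    per_worker is a hypothetical backlog each worker can absorb.
    """
    wanted = math.ceil(queue_depth / per_worker)
    return max(min_n, min(wanted, max_n))
```

The monitor only needs the queue depth (requests.qsize() here), which is why this model is usually easier to scale than probing response times.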

2. Use Cloud for Multi-tenancy

Multi-tenancy is more of a SaaS provider usage scenario than an enterprise one. The key idea is to use the same code base to host the application for different customers (tenants) who may have slightly different requirements in:

UI branding
Business rules for decision criteria
Data schema

The approach is to provide sufficient “customization” capability for the customers. The most challenging part is to determine which aspects should be opened up for customization and which shouldn’t. After identifying these configurable parameters, it is straightforward to define configuration metadata to capture them.
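To make the idea of configuration metadata concrete, here is a small sketch in which each tenant overrides only the parameters it cares about. The tenant names and configuration keys are invented for this example.

```python
# Defaults shared by every tenant (hypothetical keys, one per
# customization area: UI branding, business rules, data schema).
DEFAULTS = {
    "logo": "default.png",      # UI branding
    "approval_limit": 1000,     # business rule for decision criteria
    "extra_fields": [],         # data schema extensions
}

# Per-tenant overrides; tenants are made up for illustration.
TENANTS = {
    "acme": {"logo": "acme.png", "approval_limit": 5000},
    "globex": {"extra_fields": ["cost_center"]},
}


def tenant_config(tenant_id: str) -> dict:
    """Merge the shared defaults with one tenant's overrides."""
    if tenant_id not in TENANTS:
        raise KeyError("unknown tenant: " + tenant_id)
    return {**DEFAULTS, **TENANTS[tenant_id]}
```

The same application code then reads tenant_config(tenant_id) at request time instead of branching on the customer.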

3. Use Cloud for Batch processing

This is about running things like statistics computation, report generation, machine learning, analytics, etc. These tasks are done in batch mode, so it is more economical to use the “pay as you go” model. In addition, batch processing has a very high tolerance for latency, which makes it a perfect candidate for running in the cloud.

The Map/Reduce framework is one example of batch processing that can run in the cloud. Microsoft hasn’t provided a Map/Reduce solution at this moment, but Simon mentioned that Dryad from Microsoft Research may be a future Microsoft solution. Interestingly, Simon also recommended Hadoop.

Of course, one challenge is how to move the data into the cloud in the first place. In an earlier blog post, I described some best practices for this.
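For readers new to Map/Reduce, the following is a toy, single-process word count that shows the three phases (map, shuffle, reduce). It is only a sketch of the programming model; it is not Hadoop or Dryad code, and the function names are my own.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str) -> list:
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs) -> dict:
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict) -> dict:
    """Reduce: sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents: list) -> dict:
    pairs = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle(pairs))
```

In a real framework the map and reduce calls run on many machines in parallel, which is where elastic cloud resources pay off.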

4. Use Cloud for Storage

The idea is to store data in the cloud so that there is no need to worry about DBA tasks. Most cloud vendors provide large-scale key/value stores as well as RDBMS services. Their data storage services also take care of data partitioning, replication, etc. Building cloud storage is a big topic involving many distributed computing concepts and techniques; I have covered it in a separate blog post.

5. Use Cloud for Communication

A queue (or mailbox) service provides a mechanism for different machines to communicate asynchronously via message passing.

Azure also provides a relay service in the cloud, which is quite useful for letting machines behind different firewalls communicate. In a typical firewall setup, incoming connections are not allowed, so these machines cannot directly establish a socket to each other. In order for them to communicate, each needs to open a long-lived outbound socket connection to the cloud relay, which will route traffic between these connections.

I have used the same technique in a previous P2P project where users’ PCs behind their firewalls needed to communicate, and I know this relay approach works very well.
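The routing role of the relay can be illustrated with a small in-memory simulation. Real relays work over sockets; here each "connection" is just a queue, and the Relay class and its methods are hypothetical names, not the Azure relay API.

```python
import queue


class Relay:
    """Toy relay: every client connects outward, the relay routes."""

    def __init__(self):
        self._connections = {}

    def connect(self, name: str) -> queue.Queue:
        """Simulate a client opening its outbound connection.

        The returned queue plays the role of the open socket on which
        the client receives traffic routed by the relay.
        """
        inbox = queue.Queue()
        self._connections[name] = inbox
        return inbox

    def send(self, sender: str, recipient: str, payload: str) -> None:
        """Route a message to a recipient that is already connected."""
        if recipient not in self._connections:
            raise LookupError(recipient + " is not connected")
        self._connections[recipient].put((sender, payload))
```

Neither client ever accepts an incoming connection; both only talk to the relay, which is why this works through firewalls.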

REST API design is relatively new to me, and trust me, it is much more than designing a class with a few methods. It is all the more difficult for someone with an RPC background; you need a different mindset altogether.

REST is an architectural style rather than a standard, and there is no well-defined specification for it. It has been widely adopted since it emerged around 2000. When defining a highly scalable API, we need to think carefully about non-functional requirements such as scalability, manageability, availability, extensibility and backward compatibility.

Overall, the thing to keep in mind is that REST is about exposing resources through URIs, not services through messaging interfaces. REST, in theory, is not limited to the web, but for the purposes of this blog post we assume we are discussing REST in the context of the World Wide Web.

In order to design REST APIs, we simply need to answer the following four questions.

1. What are the URIs? [“resources"]

a. Identify resources (generally the nouns in the system) and break your problem down into the types of resources you want to manipulate. Try to list all the resources you could possibly need at this step, e.g. Student, Order, LineItem, etc.

b. Two places to consider when looking for potential resources are collections and search interfaces. A “collection of resources” may, in itself, be a whole new resource. A search interface always returns a collection of resources that meet the search or filter criteria.

c. The thing to remember is that each resource should have its own URI.

2. What’s the format? ["representation"]

a. Decide what the representations are going to look like (XML and JSON are the most common exchange formats for RESTful services).

3. What methods are supported at each URI? [GET,PUT,POST,DELETE]

a. Identifying the methods to support seems relatively straightforward. Think about the Create, Retrieve, Update and Delete operations on the identified resource. Based on the operations the application performs on the resource, you can decide which of GET, PUT, POST and DELETE to support.

b. There is generally no confusion about GET and DELETE (the Retrieve and Delete operations).

c. POST is used for non-idempotent creation. This is when the entity identifier is not passed during creation but needs to be generated by the back-end. For instance, new student creation operations are non-idempotent. The entity identifier (primary key) for the created entity is returned in the response if the creation was successful. Since the operation is non-idempotent, issuing it twice will create two similar entities.

d. PUT is used for idempotent creation. PUT should be used whenever the resource's unique id is known and passed during creation. The general philosophy here is that if the resource does not exist it will be created, and if it exists it will be updated.

e. Typically each entity type only supports either idempotent or non-idempotent creation. Usually most content is non-idempotent and is created with HTTP POST.
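The difference between POST and PUT creation can be shown with a small in-memory store. This is only a sketch: the StudentStore class is hypothetical, and the 201 Created status it returns is the standard HTTP code for a newly created resource, which is not part of the status list in this post.

```python
import uuid


class StudentStore:
    """In-memory stand-in for the back-end behind a /student resource."""

    def __init__(self):
        self._students = {}

    def post(self, data: dict):
        """Non-idempotent create: the back-end generates the identifier."""
        student_id = uuid.uuid4().hex
        self._students[student_id] = dict(data)
        return 201, student_id

    def put(self, student_id: str, data: dict) -> int:
        """Idempotent create-or-update: the client supplies the identifier."""
        created = student_id not in self._students
        self._students[student_id] = dict(data)
        return 201 if created else 200

    def count(self) -> int:
        return len(self._students)
```

Calling post twice with the same data leaves two students in the store, while calling put twice with the same id leaves one.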

4. What status codes could be returned?

a. 200 OK: Success!

b. 304 Not Modified: There was no new data to return (think: cache).

c. 400 Bad Request: Invalid Request. Error message will be returned to provide further details.

d. 401 Unauthorized: Authentication credentials were missing or incorrect.

e. 403 Forbidden: Valid request that was refused. Attempt to access a resource that the client does not have permission to. Error message will be returned to provide further details.

f. 404 Not Found: The URL requested is invalid or the resource requested, such as a story or a user, does not exist.

g. 406 Not Acceptable: Used in this design when the parameters passed are individually valid but cannot be satisfied in combination because the combination makes no sense (e.g. a cart_id from one user is used with a user_id from another user). If possible, an error message will be returned to provide further details. Note that the HTTP specification defines 406 for content negotiation failures (the server cannot produce a representation matching the request's Accept headers), so this usage is an application convention.

h. 500 Internal Server Error: Something went wrong on the server side.

i. 503 Service Unavailable: Servers are offline for maintenance or went down under load (oops).
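One way to keep status codes consistent is to centralize the decision in a single function. The sketch below uses the standard library's http.HTTPStatus enum; the function name status_for and the order of the checks are my own assumptions, not a rule from any specification.

```python
from http import HTTPStatus


def status_for(valid: bool, authenticated: bool,
               permitted: bool, found: bool) -> HTTPStatus:
    """Pick a response status from the outcome of four checks.

    The check order (validity, authentication, permission, existence)
    is one reasonable choice; APIs differ on this.
    """
    if not valid:
        return HTTPStatus.BAD_REQUEST      # 400
    if not authenticated:
        return HTTPStatus.UNAUTHORIZED     # 401
    if not permitted:
        return HTTPStatus.FORBIDDEN        # 403
    if not found:
        return HTTPStatus.NOT_FOUND        # 404
    return HTTPStatus.OK                   # 200
```

Because HTTPStatus members are integers, the result can be compared directly with the numeric codes listed above.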

Important REST Design Principles

1. RESTful HTTP approach

Proper RESTful APIs make extensive use of the HTTP protocol. Using HTTP methods for CRUD, standard HTTP response codes, common HTTP headers and MIME types is common practice.

Using HTTP is not enough; using HTTP in a standard way is what matters most!

2. Every resource has identifier

On the Web, there is a unified concept for IDs: the URI. URIs make up a global namespace, and using URIs to identify your key resources means they get a unique, global ID. Each resource in the CVP program (vehicle, user, etc.) will have an identifier; for example, the VIN is the unique identifier for a vehicle.

RESTful URL Format

RESTful APIs are semantic and resource-centric, and have a general structure that looks something like the following:

http://api.myservice.com/{ver}/{lang}/{resource_type}/{resource_id}.{output_format}?{filters and api_key as arguments}

where:

{ver} indicates the version of the API and allows changing API syntax/behavior without breaking legacy clients (not necessarily providing “new” functionality for them, however). Usually version can be omitted, and it defaults to the latest stable version of the API.

{lang} in multilingual APIs, is a two-letter ISO abbreviation for a language, e.g. “en” for English, “es” for Spanish, etc. Typically defaults to “en” for English if omitted.

{resource_type} is the name of the resource, e.g.: user, story, cart, line item etc.

{resource_id} is a unique identifier of the resource. It can be numeric or alphanumeric (e.g. in usernames). Sequence numbers from the internal database schema are often used as identifiers; however, using universally unique identifiers (UUIDs) leads to better-designed APIs.

{output_format} lets the API know which response format is requested: xml, html, json, bson, rss and atom are some of the common formats implemented. Frequently the format can also be indicated using the HTTP Accept header.
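The URL template above can be filled in with a small helper. This is a sketch only: the build_url function, the "v1" version string and the default values are assumptions for illustration, and api.myservice.com is the placeholder host from the template.

```python
from urllib.parse import urlencode


def build_url(resource_type: str, resource_id, output_format: str = "json",
              ver: str = "v1", lang: str = "en", **params) -> str:
    """Fill in the URL template; filters and api_key go in **params."""
    path = "http://api.myservice.com/{}/{}/{}/{}.{}".format(
        ver, lang, resource_type, resource_id, output_format)
    # urlencode escapes the query arguments and keeps their given order.
    return path + "?" + urlencode(params) if params else path
```

For example, build_url("story", 42, api_key="abc") fills the template with the default version, language and format.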

3. Do not overuse POST: POST is in some senses the “most flexible” of HTTP’s methods. It has a slightly looser definition than the other methods, and it supports sending information in and getting information out at the same time. Therefore there is a tendency to want to use POST for everything. You should only use POST when you are creating a new URI. Simply ask yourself whether you are using POST to do something that is really a GET, DELETE or PUT, or that could be decomposed into a combination of methods.

4. Link things together

Each resource also carries links to its sub-resources and to its associations with other resources. For example, a GET request for a Student also returns a link to the associated Address of that Student.

Resources in RESTful URLs are often chained. For instance, to access an item in a user’s order, the resource part of the URL may look like “user/2323/order/54234/line_item/73321”. The important thing to remember about resource chaining is that it represents a hierarchy: the line item belongs to the order, and the order belongs to the user.
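A sketch of both ideas, linked representations and chained URIs, might look like this. The "links" key and the function names are hypothetical; the document does not prescribe a particular link format.

```python
def student_representation(student_id: int, name: str) -> dict:
    """Build a Student representation that links to its Address."""
    base = "/student/{}".format(student_id)
    return {
        "id": student_id,
        "name": name,
        # The client follows these links instead of building URIs itself.
        "links": {"self": base, "address": base + "/address"},
    }


def line_item_uri(user_id: int, order_id: int, line_item_id: int) -> str:
    """Chain resources to express the user > order > line item hierarchy."""
    return "user/{}/order/{}/line_item/{}".format(
        user_id, order_id, line_item_id)
```

With the ids from the example above, line_item_uri(2323, 54234, 73321) reproduces the chained path.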

5. Stateless communication

REST mandates that state be either turned into resource state or kept on the client. In other words, a server should not have to retain any communication state for the clients it talks to beyond a single request. The most obvious reason for this is scalability: the number of clients interacting with the server would seriously impact its footprint if it had to keep client state. (Note that this usually requires some re-design.)

6. Idempotent Operation

Requests that use the GET, PUT and DELETE methods are idempotent according to the HTTP specification, meaning that repeating the same request leaves the back-end in the same state as making it once. The POST method is not guaranteed to be idempotent, so it needs special attention.

7. Avoid actions in URIs.

This follows naturally from the previous point. A particularly pernicious abuse of URIs is to have query strings like “someuri?action=delete”. First, you are using GET to do something unsafe. Second, the “action=” convention has no formal relationship to the HTTP methods; it is something specific to the application. REST is about driving as many “application conventions” out of the protocol as possible. Also avoid using verbs in URIs.

8. Sessions are irrelevant.

There should be no need for a client to “log in” or “start a connection.” HTTP authentication is done automatically on every message. Client applications are consumers of resources, not services, so there is nothing to log in to! For example, when placing an order on a REST web service, we don’t create a new “session” connection to the service. Rather, we ask the “order creator object” to create a new order for us.
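As one illustration of authenticating on every message, the sketch below builds an HTTP Basic Authorization header that a client would attach to each request. Basic authentication is just one standard scheme chosen for the example; the helper name is my own, and in practice it should only be used over HTTPS.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header for HTTP Basic authentication."""
    credentials = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": "Basic " + token}
```

Since the credentials travel with each request, the server has no login session to remember.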

9. Request Caching

REST encourages caching of responses that can be cached, to minimize network bandwidth utilization. APIs should implicitly or explicitly mark the response to a request as cacheable or non-cacheable. If a response is cacheable, clients are encouraged to cache it locally and reuse the data for later, equivalent requests. HTTP headers are typically used to label cacheable content and indicate how long it may be cached.
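The following sketch shows how the standard Cache-Control and ETag headers could be used on the server side, including the 304 Not Modified response from the status list above. The function names and the 60-second lifetime are assumptions for illustration.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a validator from the response body (quoted, as ETags are)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, if_none_match=None, max_age: int = 60):
    """Return (status, headers, payload) for a cacheable GET.

    if_none_match is the ETag the client sent back from its cache, if any.
    """
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age={}".format(max_age)}
    if if_none_match == etag:
        return 304, headers, b""   # client's cached copy is still valid
    return 200, headers, body      # send the full representation
```

The client stores the ETag with its cached copy and sends it back in If-None-Match once max-age has expired.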

Monday 28 November 2011

Five Cloud Computing Patterns

RESTful API Design