At the outset of the Kraken project I believe it is essential to have a good understanding of the required architecture, even if the exact business requirements are still forming in my mind.

Whatever the specific requirements turn out to be, a minimum requirement of the system will be a duplex messaging environment that allows a web application to talk to a service layer via a messaging protocol and then receive pushed messages back from the service layer. The service layer will need to be scalable to meet demand and must be able to persist data in some format. Hardware failures should not lead to service outages, and data retrieval and persistence must be fast and scalable so that data bottlenecks do not occur.
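
As a concrete illustration of what I mean by duplex messaging, here is a minimal sketch using Python and the websockets package: the client holds one persistent connection open, sends requests over it, and the server can push messages back down the same socket at any time. The transport and names here are illustrative assumptions only; the actual messaging technology will be chosen in a later post.

```python
# A minimal duplex sketch, assuming a WebSocket transport (illustrative only).
import asyncio
import websockets

async def handler(websocket):
    # Messages arrive from the browser over the open socket...
    async for message in websocket:
        # ...and the service layer can push a response (or any later,
        # server-initiated update) back down the same connection.
        await websocket.send(f"processed: {message}")

async def main():
    # Recent versions of the websockets library pass only the connection
    # object to the handler.
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()  # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```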

My main aims for the system architecture can be simply stated as SCALABILITY, AVAILABILITY / RESILIENCE and PERFORMANCE.

Scalability

I need to be confident that the hardware supporting Kraken Office can be scaled to cope with an increased demand for the service. Scalability in this sense relates to horizontal scalability, i.e. the addition of extra hardware units to increase capacity to an existing system without needing changes to underlying code.

In my experience scalability is best achieved by creating stateless, disconnected software components that talk to each other via a message broker or enterprise service bus (ESB). This removes the requirement for systems to have direct communication with each other. It is this direct communication between system components which often limits the ability to simply add more hardware as required. Message based systems (either simple brokers or ESBs) take care of communication delivery and enable additional publishers and subscribers to be quickly and easily added to a system. This means that multiple servers running the same software can easily be configured and injected into a system to provide the ability to cater for more demand.
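
As a rough sketch of the idea, the snippet below uses RabbitMQ via the pika client (a stand-in chosen purely for illustration, as are the exchange and queue names): the publisher only knows about an exchange, not about any subscriber, so additional consumer nodes can be started on new hardware without touching the publishing code.

```python
# Illustrative publish/subscribe sketch using RabbitMQ via the 'pika' client.
# The publisher broadcasts to an exchange and has no knowledge of subscribers,
# so new consumer nodes can be added or removed without code changes.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.exchange_declare(exchange="kraken.events", exchange_type="fanout")

# Publisher side: broadcast a message to whoever happens to be listening.
channel.basic_publish(exchange="kraken.events", routing_key="", body=b"order-updated")

# Subscriber side: each new node declares its own queue and binds it to the
# exchange; adding capacity is just a matter of starting another process.
result = channel.queue_declare(queue="", exclusive=True)
channel.queue_bind(exchange="kraken.events", queue=result.method.queue)
channel.basic_consume(
    queue=result.method.queue,
    on_message_callback=lambda ch, method, props, body: print("received", body),
    auto_ack=True,
)
# channel.start_consuming()  # blocks; run this in the subscriber process
```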

[Figure: An example of an Enterprise Service Bus]

Enterprise service buses are higher-level entities than pure message brokers and generally offer features such as routing, security, reporting, queue interfaces and data transformations. ESBs are commonly used in service-oriented (SOA) systems. Message brokers are lower level and enable raw messages to be broadcast from publishers to subscribers.

I have yet to make a decision on the exact messaging implementation to be used by Kraken Office (this will be covered in a later post), but I will be using some form of messaging to decouple the different components of the system. The chosen communication system will need to be scalable across multiple nodes to support flexible levels of demand and therefore also increase resilience.

In effect, the ability to easily drop in more nodes to an existing system with simple configuration is the key to scalability. If designed correctly a system under strain can be improved by simply installing the necessary software components onto a new hardware server which is then added to the system in the correct location via configuration.

Availability / Resilience

Distributed, disconnected systems also offer the benefit of resilience as a by-product of scalability. The ability to add multiple nodes running the same services means that the failure of one node will not result in a total loss of service. If a system is designed to be scalable then a service layer will generally consist of two or more hardware servers. If one of these fails, the system's ability to cope with heavy demand may be reduced, but the system will not automatically fail and lead to a total loss of service. If enough servers are used to ensure an adequate service level from a particular software component, even under the heaviest of usage, then the failure of one server should not lead to an obvious degradation in system performance.

The goals of scalability ultimately lead to the removal of any single point of failure in a system. This ensures that all elements of a system can be scaled across multiple nodes, thus removing the possibility of one server failure bringing the system to its knees.

High availability is becoming increasingly important as more business processes and elements of our personal lives rely on computer systems. Users no longer tolerate downtime, so availability and resilience should be key aims in any system architecture.

As long as I successfully design Kraken Office to be scalable, with no single points of failure, I believe availability will follow closely behind.

Performance Bottlenecks

We all want lightning fast websites these days. Users have more and more choice when selecting which web applications to use and are unwilling to accept any waiting time when using websites. Website performance is therefore a very important consideration and I am going to design Kraken Office for optimal performance from the outset.

In my experience performance bottlenecks generally occur in the following main areas:

  1. Large, inefficient web pages
  2. Excessive server communication with no client-side caching
  3. Long running, inefficient server actions
  4. Excessive data-store access
  5. Inappropriate use of relational databases
  6. Slow, inefficient database access code (scripts, stored procedures, etc.)

All of the above are obvious candidates for performance issues and I’m sure pretty much all developers will have experienced these at some stage during their careers.

Some of these problems are associated with inefficient coding practices and over-use of large resources (images, videos etc.) in web applications. Modern broadband speeds are solving, or indeed masking, some of these file size download issues but size optimization should always be an important consideration when developing web projects.

Excessive server communication is an issue I have encountered numerous times, especially with LOB (line of business) applications which often manage and display large sets of data. Client side caching can be invaluable when focusing on these issues.

These issues are more a matter of implementation than design, so I will not consider them here. When I come to implement elements of the Kraken Office system, these issues will be discussed in detail.

Designing For Performance

Designing for performance should address the issues raised above. This is a separate concern from implementing for performance.

I believe that systems designed for performance should consider the following:

  1. Minimise the number of calls between the client and server – generally this is achieved by the use of client-side caching technologies.
  2. Minimise server processes and interaction between server components, especially where these are synchronous communications.
  3. Favour asynchronous communications where possible – this stops components and user interface elements from becoming unresponsive while waiting for a process to complete.
  4. Utilise in-memory data stores (NoSQL concepts) on the server side for real-time data access rather than relational databases. NoSQL systems generally scale horizontally much better than RDBMS systems and, as they generally run in memory, have faster access times.
  5. Relational database interaction should use asynchronous operations based on message queues to persist and request data. If possible the RDBMS should not be relied upon for real-time usage (see the sketch after this list).
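
To make points 4 and 5 concrete, here is a minimal sketch of the pattern I have in mind: real-time reads are served from an in-memory store, while relational writes are pushed onto a queue and persisted offline. Redis, RabbitMQ (via the pika client) and the key and queue names are illustrative assumptions rather than technology decisions.

```python
# Sketch: synchronous reads from an in-memory store, asynchronous (queued)
# writes towards the relational database. Libraries and names are assumptions.
import json
import pika
import redis

cache = redis.Redis(host="localhost", port=6379)
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="kraken.persist", durable=True)

def read_customer(customer_id):
    # Real-time reads come straight from the in-memory cache.
    raw = cache.get(f"customer:{customer_id}")
    return json.loads(raw) if raw else None

def update_customer(customer):
    # Update the cache synchronously so subsequent reads see the change...
    cache.set(f"customer:{customer['id']}", json.dumps(customer))
    # ...but persist to the RDBMS asynchronously via a queued message, so the
    # caller never waits on the relational database.
    channel.basic_publish(exchange="", routing_key="kraken.persist", body=json.dumps(customer))
```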

Obviously many of the points above are open to debate and I am not prescribing a one-size-fits-all approach, merely what has worked for me when designing previous projects.

Basic Overall Design

The diagram below gives an overall picture of how I intend to architect the system. This should be considered as only one take on an asynchronous, duplex communication based web application with offline data storage. I imagine significant elements of the system will change and grow as the project progresses but this is a starting point from which I will begin.

The design of the system enables horizontal scalability of all elements of the system, given that distributable ESB and data caching systems are used.

The following points describe the salient elements of the system:

  1. A persisted duplex communication socket will exist between the web applications running in client web browsers and one of the application servers in the application layer. If one of the application servers fails then any clients connected to that server will lose their connection. A simple refresh of the browser would reconnect the user to a different application server via the load balancer.
  2. As the application layer sits behind a load balancer it can be easily scaled by simply adding new hardware with the correct components installed behind the load balancer.
  3. The application layer does not talk directly to any other part of the system, but all communication is asynchronous via an enterprise service bus or message broker. The messaging system then talks to any relevant systems and relays messages back to the application layer. The ESB or message broker will need to be distributable across multiple nodes to ensure scalability and resilience as described above.
  4. The data layer is scalable as it has no direct communication with the application layer and a correctly configured server can simply be injected into the data layer, picking up messages from the ESB or message broker.
  5. The data layer has direct communication with the cached data store as this will present one interface to the outside world, even though the store should be distributed across multiple nodes in implementation. The data layer will have the responsibility of checking whether data is available in the cache, loading it from the persistent RDBMS if not, and ensuring that updates are correctly persisted in both the cached data store and the RDBMS (see the sketch after this list). Reads from the RDBMS will be synchronous but all writes will be made offline via the ESB or message broker.
  6. The RDBMS will be scalable vertically and, to some extent, horizontally, but the system will not rely on it for real-time data availability if the data has been pre-loaded into the cache. Exactly how data is to be cached is yet to be decided. The options include a full data cache on system start, data caching on demand or an intelligent pre-caching of commonly used data.
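
The sketch below illustrates the read path described in point 5: the data layer checks the cache first, falls back to a synchronous RDBMS read on a miss, and warms the cache so later reads are served from memory. Redis and SQLite are stand-ins used purely for illustration, as are the table and key names.

```python
# Sketch of the data layer's cache-aside read path (illustrative stand-ins).
import json
import sqlite3
import redis

cache = redis.Redis()
db = sqlite3.connect("kraken.db")

def get_document(doc_id):
    cached = cache.get(f"doc:{doc_id}")
    if cached:
        return json.loads(cached)  # served entirely from memory
    # Synchronous relational read happens only on a cache miss.
    row = db.execute(
        "SELECT id, body FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        return None
    doc = {"id": row[0], "body": row[1]}
    cache.set(f"doc:{doc_id}", json.dumps(doc))  # warm the cache for next time
    return doc
```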

With the exception of synchronous data reads from the RDBMS and data cache, the application is strictly asynchronous. Even the UI will not update until a data update message is received from the application layer. A typical process flow, sketched in code after the list, would be:

  1. A user makes a change to data.
  2. An asynchronous message is sent to the application layer informing the system of the update.
  3. A message is sent to the ESB or message broker that data has been updated.
  4. A node within the data layer picks up the message and makes the relevant changes to the data cache, sending a message to the ESB or message broker informing it that the persistent data needs to be updated.
  5. A message is sent from the data layer to the ESB informing other systems that a data update has been made.
  6. All servers within the application layer pick up the message from the data layer and distribute it to the relevant web applications via their duplex connection.
  7. The user interface on the web applications is updated on receipt of the data update message.
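
Below is a sketch of steps 4 to 6 from the data layer's point of view, again using Redis and RabbitMQ (via pika) as illustrative stand-ins: a worker consumes an update message, applies it to the cache, and re-publishes a broadcast message that each application server would then relay to its connected browsers over the duplex connection. Queue, exchange and key names are assumptions for illustration only.

```python
# Sketch of a data-layer worker: consume an update, apply it to the cache,
# then broadcast a "data changed" event for the application layer to relay.
import json
import pika
import redis

cache = redis.Redis()
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="kraken.updates")
channel.exchange_declare(exchange="kraken.broadcast", exchange_type="fanout")

def on_update(ch, method, properties, body):
    change = json.loads(body)
    # Step 4: apply the change to the in-memory cache (queuing the RDBMS
    # write is omitted here for brevity).
    cache.set(f"record:{change['id']}", body)
    # Step 5: tell every interested system (the application layer in
    # particular) that the data has changed; each application server then
    # pushes the update to its clients over the duplex connection (step 6).
    ch.basic_publish(exchange="kraken.broadcast", routing_key="", body=body)

channel.basic_consume(queue="kraken.updates", on_message_callback=on_update, auto_ack=True)
channel.start_consuming()
```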

This methodology enables truly asynchronous operations and also enables data changes to be broadcast to a range of web applications in one go, keeping data synchronised across multiple users.

Conclusion

At this stage of the process I think I have the basics of an architecture to enable me to begin further design.

The next round of design will focus on the technologies available to develop such a system and a decision on which of them I am going to employ.

Goodbye till then!