Selecting Technologies For Kraken Office

My last, rather long, post was concerned with the overall architecture of the Kraken Office project. I am now settled enough on the initial architecture of a duplex, message-based web application that I can start selecting technologies for the project.

Most of my career has involved developing software for the Microsoft platform (ASP, VB6, VBScript, ASP.NET, C#, SQL, Silverlight etc.), so I am naturally going to approach the problem with .NET as the technology for the majority of the middleware. That is not to say that I won’t consider alternative technologies, but this is to be my starting point.

Web Application Technologies

The client facing user interface will be a web application. This is to maximise the audience for Kraken Office.

Having recent experience with Silverlight and the MVVM methodology, this would be a logical choice. I am becoming increasingly concerned, however, about Silverlight’s suitability as an internet-facing technology due to its reliance on a plug-in. The fact that there is no plug-in support for most mobile devices is especially disconcerting. Another issue is Microsoft’s lack of vocal support for the technology, which is currently worrying the entire Microsoft development community (see Microsoft developers horrified).

I want Kraken Office to appeal to as large an audience as possible and I don’t think this goal would be advanced by setting system pre-requisites or by excluding the mobile market.

The obvious alternative for creating a web-facing interface with rich LOB features is HTML 5. While this is a relatively new technology and not yet universally supported across all browsers, the development community has generally welcomed it and take-up has been good. Microsoft appear to have invested heavily in its usage in Windows 8, although the full details are yet to emerge.

I have therefore decided upon HTML 5 as the main web technology for the application, as it currently appears to be the only viable alternative to Silverlight (Flash also appears to be going the way of Silverlight) for Kraken Office, which I am planning to develop into a fairly complex LOB application. In many ways this is a shame, as the toolset for Silverlight and MVVM is impressive, easy to use and easy to debug. Moving back to HTML with Javascript, albeit with the comprehensive Javascript libraries available these days, does seem a bit daunting.

It is important to remember, however, that the end product of the development process is a useful product with the biggest audience possible. If HTML 5 is the best way of achieving this goal then I feel this is the approach to take.

While HTML 5 may be the technology, I also need to decide on a development environment. For me this is a pretty easy decision – I will use the Microsoft MVC 3 platform for the website framework and will work within Visual Studio, with which I have a lot of experience.

I’m going to need to get more familiar with the available Javascript libraries, starting with the ubiquitous JQuery. It has a small footprint and a large plug-in community.

The last main issue is communication with the application server. I have used duplex communication via Duplex WCF services, but these utilise a long-poll mechanism rather than a real push mechanism. Web sockets are the new HTML 5 technology for duplex communication and offer real two-way communication between server and client. Web sockets have limited browser support at present but are being implemented in all major new browser releases. For further information, refer to this interesting comparison of the pros and cons of long polling vs web sockets.

Unfortunately web sockets are not yet supported in Microsoft’s WCF platform, although support is on the way at some stage (WCF WebSocket preview).

In view of the lack of WCF support, my first thought was to write a web socket server in C#. I did some initial prototyping and developed a working web socket server but found the solution verbose and complex. I also have reservations about network-based coding in general as I have always found it one of the most difficult types of component to get stable and reliable. In addition to this I want to re-use as many existing frameworks and toolkits as possible, especially when it comes to project infrastructure, so that I can concentrate on developing business functionality.

I came across a new project called XSockets, which has only just been released at the time of writing and which looks like it could offer WebSockets hosted in .NET. As a brand new release with very little documentation, however, it has been discounted for now.

The other framework that caught my eye was the node.js project. After reading this great node.js websocket introduction I was immediately intrigued. Node.js is a Javascript-based framework for creating scalable, distributable network servers. Although it does not directly support web sockets, a plugin, WebSocket-Node, is available which appears to make the creation of web socket servers a relatively trivial task. Allied with excellent scalability and a dedicated community, I believe node.js is an ideal framework for hosting my web socket servers and will relieve me of the task of creating my own server application.
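
To give a flavour of why this appeals, the sketch below shows roughly how a web socket server looks with the WebSocket-Node plugin, based on its documented usage. The port number and the simple echo behaviour are placeholders for illustration only, not part of the Kraken Office design.

    // Minimal web socket server sketch using the WebSocket-Node plugin.
    // The port and the echo behaviour are placeholders for illustration.
    var WebSocketServer = require('websocket').server;
    var http = require('http');

    // WebSocket-Node attaches to a standard node.js HTTP server.
    var httpServer = http.createServer(function (request, response) {
        response.writeHead(404);
        response.end();
    });
    httpServer.listen(8080, function () {
        console.log('Listening on port 8080');
    });

    var wsServer = new WebSocketServer({ httpServer: httpServer });

    wsServer.on('request', function (request) {
        // Accept the connection (a real server would validate request.origin first).
        var connection = request.accept(null, request.origin);

        connection.on('message', function (message) {
            if (message.type === 'utf8') {
                // For now simply echo the message back to the client.
                connection.sendUTF(message.utf8Data);
            }
        });

        connection.on('close', function (reasonCode, description) {
            console.log('Client disconnected');
        });
    });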

Client-side web sockets will be handled using Javascript. JQuery does not currently support easy scripting of web sockets but there is a JQuery plugin, jquery-websocket, which is designed specifically for client-side interaction with web socket servers.
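
As an illustration of the client side, the sketch below uses the browser’s native WebSocket API, which the jquery-websocket plugin wraps; the exact plugin syntax differs, and the URL and message format here are assumptions for illustration only.

    // Bare-bones client-side connection using the browser's native WebSocket API.
    // The URL and the JSON message shape are placeholders for illustration.
    var socket = new WebSocket('ws://localhost:8080/');

    socket.onopen = function () {
        // Messages are sent as strings; JSON keeps the payload structured.
        socket.send(JSON.stringify({ type: 'chat', text: 'Hello Kraken' }));
    };

    socket.onmessage = function (event) {
        // Messages pushed from the server arrive here without any polling.
        console.log('Server push received:', JSON.parse(event.data));
    };

    socket.onclose = function () {
        console.log('Connection closed');
    };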

In conclusion the following sums up my initial web platform technology choices:

  1. HTML 5 developed in Visual Studio with MVC 3
  2. Javascript scripting with JQuery
  3. Node.js hosted web socket server using WebSocket-Node plugin
  4. Client-side javascript web socket interaction using the jquery-websocket JQuery extension

Middleware / Data Storage Technologies

My initial system designs for Kraken Office, which can be found at Designing A Scalable, Resilient System, assumed that I would be creating a .NET-based web socket server. With my decision to implement the web socket server with node.js, this will now not be the case. What is clear, however, is that the node.js server will be the bridge between the client web application and the middleware. In order to ensure that the system remains scalable, any communication between node.js and the middleware must use a message broker of some description.

While investigating message brokers, Enterprise Service Buses (ESBs) and in-memory data stores I came across a range of options. ESBs seem too heavyweight for the system I have in mind, although at a later date such power may be required. Currently, however, I don’t want to spend my precious time trying to configure a system that is not adding value to the project.

One of the most frequently recommended, and fastest, in-memory data caches is Redis. A Windows port of the project is available here. In addition to being immensely fast, it provides support for publish/subscribe messaging. Redis is open source and would enable me to kill two birds with one stone: meeting both the in-memory data storage and message broker requirements in one system. This would minimise the amount of new technology I need to learn and also the amount of software installation and configuration.

Redis clients exist for both node.js (Redback) and C# (ServiceStack).
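
The sketch below shows, in rough terms, how Redis covers both roles from node.js. It uses the plain node_redis client (which Redback is built on) rather than Redback itself, and the key and channel names are invented purely for illustration.

    // Rough sketch of Redis doing double duty: in-memory cache and message broker.
    // Uses the underlying node_redis client; key and channel names are invented.
    var redis = require('redis');

    // Caching: store and retrieve a value held in memory.
    var cache = redis.createClient();
    cache.set('user:42:name', 'Alice');
    cache.get('user:42:name', function (err, value) {
        console.log('Cached value:', value);
    });

    // Messaging: a subscriber receives anything published on a channel.
    // A client in subscriber mode cannot issue other commands, hence separate clients.
    var subscriber = redis.createClient();
    subscriber.subscribe('app-channel');
    subscriber.on('message', function (channel, message) {
        console.log('Received on ' + channel + ': ' + message);
    });

    var publisher = redis.createClient();
    publisher.publish('app-channel', JSON.stringify({ action: 'userUpdated', id: 42 }));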

Using Redis and node.js will necessitate a change to the architecture of the system. The diagram below documents v2 of the architecture.

The system now uses a scalable node.js layer as the initial application layer. This layer will manage web socket instances for the client web applications. The node.js layer will relay messages to the rest of the system via the Redis messaging layer. A .NET application layer will receive messages from the node.js layer and will decode and act upon the messages. If a message requires data interaction then the application layer will again submit this to the Redis messaging layer, in a different message channel. The data layer (another .NET based layer) will subscribe to data channel messages in Redis and will action any relevant messages. Data will be returned from the data layer via the Redis messaging layer back to the application layer and then onto the node.js layer.
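
To make this flow more concrete, the sketch below shows the node.js layer relaying between web socket connections and Redis channels. The channel names (‘app-requests’ and ‘client-updates’) and message handling are assumptions for illustration, not the final design; the connection objects are those produced by the WebSocket-Node server sketched earlier.

    // Simplified sketch of the node.js layer bridging web sockets and Redis.
    // Channel names and message shapes are assumptions, not a final design.
    var redis = require('redis');
    var publisher = redis.createClient();
    var subscriber = redis.createClient();

    // Track open web socket connections (as produced by WebSocket-Node).
    var connections = [];

    function handleNewConnection(connection) {
        connections.push(connection);

        // Client -> middleware: forward each web socket message onto the Redis
        // channel that the .NET application layer subscribes to.
        connection.on('message', function (message) {
            if (message.type === 'utf8') {
                publisher.publish('app-requests', message.utf8Data);
            }
        });

        connection.on('close', function () {
            connections.splice(connections.indexOf(connection), 1);
        });
    }

    // Middleware -> clients: anything the application layer publishes on
    // 'client-updates' is pushed out over every open web socket.
    subscriber.subscribe('client-updates');
    subscriber.on('message', function (channel, message) {
        connections.forEach(function (connection) {
            connection.sendUTF(message);
        });
    });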

This should prove to be a scalable and resilient system.

In conclusion the following sums up my initial middleware technology choices:

  1. .NET middleware layers (application and data) written in Visual Studio using C#
  2. Messaging provided by Redis and accessed by node.js and .NET clients using Redback and ServiceStack respectively
  3. Fast in-memory data caching using Redis

Data Persistence

While data cached in Redis can be persisted to disk using a variety of configurable persistence strategies, I do not consider it suitable for a permanent data storage solution for two main reasons:

  1. Persisting data on a regular basis to disk, which would be necessary for data integrity, will degrade the performance of Redis
  2. The Kraken Office system will need to archive certain types of data but retain the ability to retrieve the data on demand, for which Redis would be unsuitable

As a result I am going to stick with my decision to persist data to an RDBMS for permanent storage, in addition to retaining data in Redis for real-time application usage. As I am most familiar with Microsoft SQL Server I will use this as the initial database, although I will investigate MySQL in the near future due to its open source status.

Designing A Scalable, Resilient System

At the outset of the Kraken project I believe it is fundamental to have a good understanding of the required architecture even if the exact business requirements are still forming in my mind.

A minimum requirement of the system, regardless of the specific business requirements, will be a duplex, message-based environment that allows a web application to talk to a service layer via a messaging protocol and then receive pushed messages back from the service layer. The service layer will need to be scalable to meet demand and must be able to persist data in some format. Hardware failures should not lead to service outages, and data retrieval and persistence must be fast and scalable so that data bottlenecks do not occur.

My main aims for the system architecture can be simply stated as SCALABILITY, AVAILABILITY / RESILIENCE and PERFORMANCE.

Scalability

I need to be confident that the hardware supporting Kraken Office can be scaled to cope with increased demand for the service. Scalability in this sense relates to horizontal scalability, i.e. the addition of extra hardware units to increase the capacity of an existing system without needing changes to the underlying code.

In my experience scalability is best achieved by creating stateless, disconnected software components that talk to each other via a message broker or enterprise service bus (ESB). This removes the requirement for systems to have direct communication with each other. It is this direct communication between system components which often limits the ability to simply add more hardware as required. Message based systems (either simple brokers or ESBs) take care of communication delivery and enable additional publishers and subscribers to be quickly and easily added to a system. This means that multiple servers running the same software can easily be configured and injected into a system to provide the ability to cater for more demand.

An example of an Enterprise Service Bus

Enterprise service buses are higher-level entities than pure message brokers and generally offer features such as routing, security, reporting, queue interfaces, data transformations and so on. They are widely used in the SOA (service-oriented architecture) world. Message brokers are lower level and enable raw messages to be broadcast from publishers to subscribers.

I have yet to make a decision on the exact messaging implementation to be used by Kraken Office (this will be covered in a later post), but I will be using some messaging implementation to disconnect the different components of the system. The chosen communication system will need to be scalable across multiple nodes to provide support for flexible levels of demand and therefore also increase resilience.

In effect, the ability to easily drop in more nodes to an existing system with simple configuration is the key to scalability. If designed correctly a system under strain can be improved by simply installing the necessary software components onto a new hardware server which is then added to the system in the correct location via configuration.

Availability / Resilience

Distributed, disconnected systems also offer the benefit of resilience as a by-product of scalability. The ability to add multiple nodes running the same services to a system means that the failure of one of the nodes will not result in a total loss of service. If a system is designed to be scalable then a service layer will generally consist of two or more hardware servers. If one of these fails then the ability of the system to cope with excessive demand for its services may be reduced, but the system will not automatically fail and lead to a total loss of service. If enough servers are used to ensure an adequate service level from a particular software component, even under the heaviest of usage, then the failure of one of the servers should not lead to an obvious degradation in system performance.

The goals of scalability ultimately lead to the removal of any single point of failure in a system. This ensures that all elements of a system can be scaled across multiple nodes, thus removing the possibility of one server failure bringing a system to its knees.

High availability is becoming increasingly important in software systems as more business processes and elements of our personal lives rely on computer systems. Users no longer tolerate a lack of availability, so availability and resilience should be key aims in any system architecture.

As long as I successfully design Kraken Office to be scalable with no single points of failure in the system, I believe availability will follow closely behind.

Performance Bottlenecks

We all want lightning-fast websites these days. Users have more and more choice when selecting which web applications to use and are unwilling to accept any waiting time when using websites. Website performance is therefore a very important consideration and I am going to design Kraken Office for optimal performance from the outset.

In my experience performance bottlenecks generally occur in the following main areas:

  1. Large, inefficient web pages
  2. Excessive server communication with no client-side caching
  3. Long running, inefficient server actions
  4. Excessive data-store access
  5. Inappropriate use of relational databases
  6. Slow, inefficient database access code (scripts, stored procedures etc.)

All of the above are obvious candidates for performance issues and I’m sure pretty much all developers will have experienced these at some stage during their careers.

Some of these problems are associated with inefficient coding practices and over-use of large resources (images, videos etc.) in web applications. Modern broadband speeds are solving, or indeed masking, some of these file size download issues but size optimization should always be an important consideration when developing web projects.

Excessive server communication is an issue I have encountered numerous times, especially with LOB (line of business) applications which often manage and display large sets of data. Client side caching can be invaluable when focusing on these issues.

These issues are more a matter of implementation than design and therefore I will not consider them here. When I come to implement elements of the Kraken Office system these issues will be discussed in detail.

Designing For Performance

Designing for performance should address the issues raised above. This is a separate issue from implementing for performance.

I believe that systems designed for performance should consider the following:

  1. Minimise the number of calls between client and server – generally this is achieved by the use of client-side caching technologies (see the sketch after this list).
  2. Minimise the server processes and interaction between server components, especially where these are synchronous communications.
  3. Favour asynchronous communications where possible – this stops components and user interface elements from becoming unresponsive while waiting for a process to complete.
  4. Utilise in-memory data stores (NoSQL concepts) on the server side for real-time data access rather than relational databases. NoSQL systems generally scale horizontally much better than RDBMS systems and, as they generally run in memory, have faster access times.
  5. Relational database interaction should use asynchronous operations based on message queues to persist and request data. If possible, the RDBMS should not be relied upon for real-time usage.
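
As a simple illustration of the first point, the sketch below wraps a JQuery AJAX call in a naive client-side cache. The URL and the use of $.getJSON are assumptions for illustration, and a real implementation would also need cache invalidation and expiry.

    // Naive client-side cache around a JQuery AJAX call (point 1 above).
    // The URL is a placeholder; real code would also handle invalidation and expiry.
    var responseCache = {};

    function getDataCached(url, callback) {
        if (responseCache[url]) {
            // Serve from the cache and avoid a round trip to the server.
            callback(responseCache[url]);
            return;
        }
        $.getJSON(url, function (data) {
            responseCache[url] = data;
            callback(data);
        });
    }

    // The first call hits the server; subsequent calls for the same URL do not.
    getDataCached('/api/employees', function (employees) {
        console.log(employees);
    });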

Obviously many of the points above are open to debate and I do not prescribe an approach that fits all but merely what has worked for me when designing previous projects.

Basic Overall Design

The diagram below gives an overall picture of how I intend to architect the system. This should be considered as only one take on an asynchronous, duplex communication based web application with offline data storage. I imagine significant elements of the system will change and grow as the project progresses but this is a starting point from which I will begin.

The design of the system enables horizontal scalability of all elements of the system, given that distributable ESB and data caching systems are used.

The following points describe the salient elements of the system:

  1. A persisted duplex communication socket will exist between the web applications running in client web browsers and one of the application servers in the application layer. If one of the application servers fails then any clients connected to that server will lose their connection. A simple refresh of the browser would reconnect the user to a different application server via the load balancer.
  2. As the application layer sits behind a load balancer it can be easily scaled by simply adding new hardware with the correct components installed behind the load balancer.
  3. The application layer does not talk directly to any other part of the system, but all communication is asynchronous via an enterprise service bus or message broker. The messaging system then talks to any relevant systems and relays messages back to the application layer. The ESB or message broker will need to be distributable across multiple nodes to ensure scalability and resilience as described above.
  4. The data layer is scalable as it has no direct communication with the application layer and a correctly configured server can simply be injected into the data layer, picking up messages from the ESB or message broker.
  5. The data layer has direct communication with the cached data store as this will present one interface to the outside world, even though the system should be distributed across multiple nodes in implementation. The data layer will have the responsibility of checking whether data is available in the cache, loading it from the persistent RDBMS if not, and ensuring that updates are correctly persisted in both the cached data layer and the RDBMS (see the sketch after this list). Reads from the RDBMS will be synchronous but all writes will be made offline via the ESB or message broker.
  6. The RDBMS will be scalable vertically and, to some extent, horizontally, but the system will not rely on it for real-time data availability if the data has been pre-loaded into the cache. Exactly how data is to be cached is yet to be decided. The options include a full data cache on system start, data caching on demand or an intelligent pre-caching of commonly used data.
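
As a rough illustration of point 5, the sketch below shows a cache-aside read: check the cache first and fall back to the database on a miss. It is written in Javascript for consistency with the earlier sketches, although the real data layer will be .NET; loadFromDatabase is a hypothetical placeholder for the RDBMS read.

    // Cache-aside read: check the cache first, fall back to the RDBMS on a miss.
    // loadFromDatabase is a hypothetical placeholder for the database read.
    var redis = require('redis');
    var cache = redis.createClient();

    function readEntity(key, loadFromDatabase, callback) {
        cache.get(key, function (err, cached) {
            if (cached) {
                // Cache hit: no database access required.
                callback(JSON.parse(cached));
                return;
            }
            // Cache miss: read from the RDBMS, then prime the cache for next time.
            loadFromDatabase(key, function (entity) {
                cache.set(key, JSON.stringify(entity));
                callback(entity);
            });
        });
    }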

With the exception of synchronous data reads from the RDBMS and data cache the application is strictly asynchronous. Even the UI will not update until a data update message is received from the application layer. A typical process flow would be:

  1. A user makes a change to data.
  2. An asynchronous message is sent to the application layer informing the system of the update.
  3. A message is sent to the ESB or message broker that data has been updated.
  4. A node within the data layer picks up the message and makes the relevant changes to the data cache, sending a message to the ESB or message broker informing it that the persistent data needs to be updated.
  5. A message is sent from the data layer to the ESB informing other systems that a data update has been made.
  6. All servers within the application layer pick up the message from the data layer and distribute it to the relevant web applications via their duplex connection.
  7. The user interface on the web applications is updated on receipt of the data update message.

This methodology enables truly asynchronous operations and also enables data changes to be broadcast to a range of web applications in one go, keeping data synchronised across multiple users.

Conclusion

At this stage of the process I think I have the basics of an architecture to enable me to begin further design.

The next round of design will focus on the available technologies which can be used to develop such a system and a decision on which I am going to employ.

Goodbye till then!

What On Earth is ‘Kraken Office’?

Kraken Office is the name I have given to a new office management tool I am writing.

The product has two main aims:

1. To enable me to experiment with a range of web and middle-ware technologies
2. To provide a viable and exciting office management tool, simplifying and shortening tasks such as employee communication, HR processes and general office tasks too numerous to mention here.

Basically, if you have ever been frustrated by the absence of a centralised process in your office during your working life, I am aiming to tackle it! This is obviously a vast remit and I will have overachieved if I even make a dent in it, but it does encapsulate what I am trying to achieve.

Stick with me for the journey – it may take some time and often be painful but I hope we can all learn from and enjoy the process…

The Kraken Awakes – Beginnings

Welcome to the world of Kraken!

To be frank I’m not quite sure exactly where I got the name from, but I liked the sound of it and the concept of summoning a leviathan from the deep…

I have been a software developer for the last 10 years or so and currently work with Microsoft technologies such as C#, Silverlight and ASP.NET MVC. I like working on personal development projects and have decided to start a new project to develop some office management software. In a first for me, however, I intend to document the process and make it available to anyone who is interested. I’m sure I’ll encounter a whole range of challenges along the way and will do my best to document them in as useful a format as I can.

I am embarking upon this project for a number of reasons:

1. I like to have something to do in the evenings and on the train to and from work
2. I am very keen to get involved in some of the newer web technologies
3. I want to have a product in use by people which I think would be a great feeling

What is this Kraken nonsense?

Kraken is the name I have given to the office management system I am developing. One of the things that has surprised me over the last few years is the lack of office co-ordination I have encountered. By office co-ordination I mean:

1. The ability for employees to communicate with one another effectively
2. The ability for employees to access company organisational charts and to put ‘names to faces’ and roles in larger organisations
3. A centralised tool for recording holidays / stationery requests / help-desk queries etc
4. A centralised personnel system to track objectives, performance, reviews etc
5. Notice boards, company and departmental news postings, social club information, lost and found notices, company-wide classified advertisements etc

I’m sure there are hundreds of other ideas that could be added to this list and I’m sure that many of these will evolve and be added to the list with time.

I intend to approach the project using a ‘hybrid’ agile system. I will have no written specifications and will not even know the full scope of the project. I know what I am trying to achieve but not the exact system end-point. I have at least determined the first few tasks I am going to tackle:

1. System architecture plan
2. A bare bones system that is capable of supporting a range of business functions
3. The ability to create and manage users
4. The ability to authenticate against the system
5. An office communication tool to enable users to talk to one another

Once these initial tasks are complete I will in essence have a sophisticated chat application running on a scalable and resilient framework. I’m hoping that by this time I will know what I want to implement next in order to keep the application moving in the right direction.

Manpower Issues & Cutting Corners!

Obviously as I am working on the application on my own in my spare time I will have a limited amount of resources. I am hoping that the lack of manpower will be somewhat offset by my ability to make all decisions on the fly without having to defer to a higher authority.

In order to progress at a fairly rapid rate, however, I am going to allow myself to cut some corners and to focus on delivering functionality (which, in my experience, is often forgotten about).

So if I once in a while upset you purist developers, go easy on me and remember that software development is simply a tool to facilitate business and money-making and not an art form…