Aug 20, 2013

Streaming real-time stock market feeds with websockets

This is something related to my first job. I was lucky enough to work on an amazing project where I had to write a stand-alone server component that streamed stock market data to Rich Internet Applications (RIAs). To do so we used a workaround called long-polling, and then websockets. This article explains the what, why, and when of websockets and how they solved the problem that we were facing.


What are websockets?

It looks like a new type of TCP socket, doesn't it? But it's not. It is a new protocol, arriving as part of HTML5, that is designed to stream data over port 80 and port 443. In other words, a websocket is a protocol layered on top of a TCP connection, with some added constraints. Unlike raw TCP sockets on arbitrary ports, websockets operate on the standard web ports. Therefore there is no hassle of opening extra ports for applications, and much less need to worry about firewalls, virus guards and so on. This made it ideal for us, as we had to stream data to our RIAs.

Why websockets?

In the early days websites were just used to share information. Almost all websites contained only text and images, and there was no such thing as a commercial website. Moreover, when it came to hardware, web servers had limited memory and storage, yet they had to scale to serve a large number of users. To cope with these requirements engineers came up with a protocol called HTTP, which was based purely on the request/response paradigm. Whenever a new request arrives, the web server creates a new socket and serves the request with a response. As soon as the response is sent, the server closes the socket and reclaims the resources that were allocated.

This was fine until people came up with RIAs, which gave birth to a large number of commercial websites. As time went on, both software and hardware kept evolving. With the help of AJAX and SOAP, web applications are now able to send asynchronous requests to web servers. In addition, web servers now have plenty of memory, storage and processing power. This provided the perfect platform for RIAs, and consequently RIAs became more and more like stand-alone applications. This is the point where HTTP became inadequate, and this gave birth to websockets. In our case we had an RIA that presents stock market data and then allows users to trade their stocks.

How do websockets work?

Before the websocket protocol was invented there was no straightforward way of streaming data over the standard web ports. Therefore we used something called long-polling. As you may know, polling is the action of repeatedly checking some condition to see whether an event has occurred. Long-polling is a slight variation of this: the server holds each request open until it has something to send (or a timeout expires), and the client sends a new request as soon as it receives a response. Instead of implementing this ourselves we used a framework called CometD. But long-polling has a major problem. Though it manages to create the illusion of streaming data on top of a request/response model, it consumes a lot of bandwidth, because every update travels in its own HTTP request and response. So as a solution we moved to websockets, which CometD also supports. A lot of people mistakenly think that websocket is a replacement for HTTP. Actually it is not. It is a complement to HTTP (at least for now). The following diagram depicts how a websocket communication is initiated.
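To make the long-polling idea a bit more concrete, here is a minimal in-process sketch in Python. It is not how CometD does it, and there is no real HTTP involved: the `updates` queue, `handle_long_poll`, `publish_later` and `client_loop` are all hypothetical names made up for illustration. The only point is to show the server holding a request until data arrives or a timeout expires, and the client asking again after every response.

```python
import queue
import threading
import time

# Hypothetical in-memory feed; a real server would sit behind HTTP.
updates = queue.Queue()

def handle_long_poll(timeout_seconds):
    """Hold the 'request' open until an update arrives or the timeout expires."""
    try:
        return updates.get(timeout=timeout_seconds)
    except queue.Empty:
        return None  # empty response; the client simply polls again

def publish_later(delay_seconds, message):
    """Simulate a price tick arriving some time after the client asked."""
    def _publish():
        time.sleep(delay_seconds)
        updates.put(message)
    threading.Thread(target=_publish, daemon=True).start()

def client_loop(max_requests, timeout_seconds):
    """Issue a fresh request after every response, with or without data."""
    received = []
    requests_made = 0
    while requests_made < max_requests:
        requests_made += 1
        response = handle_long_poll(timeout_seconds)
        if response is not None:
            received.append(response)
    return received, requests_made
```

Notice that the client made a request even for the round that returned nothing. Over real HTTP each of those rounds carries a full set of headers, which is where the bandwidth goes.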




As you may have noticed, the initial websocket handshake is done over HTTP. Upon a successful handshake, both the web client and the server upgrade their communication protocol to websocket. Thereafter communication is done using websocket frames. The other thing to notice from the diagram is that websocket communication does not have to be request/response. Frames can flow in either direction at any time, as on a normal TCP socket connection. The initial handshake request, which is sent over HTTP, has to include certain header fields such as the Origin header, a compliant HTTP version, the websocket version header and so on. These headers and frames can be seen with the help of Google Chrome's developer tools, and you can try this out using link [3], which is a demo site for checking websocket communication. Now let's look at a websocket frame. I took this image from link [2].
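Before getting to the frame itself, the handshake can be made a bit more concrete. The sketch below, in Python, builds the HTTP upgrade request a client would send and computes the `Sec-WebSocket-Accept` value the server has to answer with, following RFC 6455. The function names and the host, path and origin in the usage note are my own placeholders, not anything from our project.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def compute_accept(sec_websocket_key):
    """Value the server returns in Sec-WebSocket-Accept for a given client key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

def build_upgrade_request(host, path, origin, key):
    """Plain HTTP/1.1 GET asking the server to switch to the websocket protocol."""
    lines = [
        "GET " + path + " HTTP/1.1",
        "Host: " + host,
        "Upgrade: websocket",
        "Connection: Upgrade",
        "Sec-WebSocket-Key: " + key,
        "Origin: " + origin,
        "Sec-WebSocket-Version: 13",
        "",
        "",
    ]
    return "\r\n".join(lines)
```

For example, `build_upgrade_request("example.com", "/feed", "http://example.com", "dGhlIHNhbXBsZSBub25jZQ==")` produces the request text, and `compute_accept` on the same key gives the value a compliant server would send back with its 101 Switching Protocols response.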



You can find more information about websocket frames in link [2]. However, websocket is not a silver bullet. There are several issues that the protocol had to overcome. One of the main problems is security vulnerabilities such as cross-origin attacks and proxy cache poisoning. Cross-origin attacks are tackled by adding an Origin header, which indicates the origin of the request. Proxy cache poisoning attacks are handled by masking the frames that are sent by the client.

Apart from these issues there is a different type of problem: most existing proxy servers do not support websockets. When we hosted the server in the United Kingdom and tried to establish a websocket connection using a browser in Sri Lanka, it failed with a Bad Request error. This was because the intermediary proxy servers didn't understand the protocol. As a solution we tried to initiate the websocket connection over HTTPS, which in turn upgrades to WSS, and as we expected the connection was successfully established. This has the overhead of encryption, but we were happy to live with it as websockets still saved a lot of bandwidth. Link [2] also reports a success rate of around 80 percent when wss is used instead of ws. Lastly, only the latest browsers support websockets, such as the latest versions of Firefox and Chrome, and IE10.
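The client-side masking mentioned above is simple enough to sketch. The Python below builds a single masked text frame the way RFC 6455 describes it: a FIN bit and opcode, a length field with the mask bit set, a 4-byte masking key, and the payload XORed with that key. The function names are mine, and this is only an illustration of the frame layout, not a full websocket client.

```python
import os
import struct

def mask_payload(payload, masking_key):
    """XOR every payload byte with the 4-byte key; applying it twice unmasks."""
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(payload))

def build_client_text_frame(text, masking_key=None):
    """Build one unfragmented, masked text frame as a client would send it."""
    payload = text.encode("utf-8")
    if masking_key is None:
        masking_key = os.urandom(4)  # a fresh random key for each frame
    first_byte = 0x80 | 0x1  # FIN bit set, opcode 0x1 (text)
    length = len(payload)
    if length < 126:
        header = struct.pack("!BB", first_byte, 0x80 | length)
    elif length < 65536:
        header = struct.pack("!BBH", first_byte, 0x80 | 126, length)
    else:
        header = struct.pack("!BBQ", first_byte, 0x80 | 127, length)
    return header + masking_key + mask_payload(payload, masking_key)
```

Because the key is random and changes per frame, a malicious page cannot choose the exact bytes that appear on the wire, so it cannot make a frame look like an HTTP request to a confused proxy. Note also that a short message costs only 6 bytes of framing here, compared with the full set of headers a long-polling response carries.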

When to use websockets?

I think the answer to this question is pretty clear by now if you have read the article from the beginning. Though websockets are great when it comes to streaming data, the protocol is not as mature as HTTP. Beyond streaming, websockets can also be used to create highly interactive web applications such as chat applications and online web-based IDEs. However, when it comes to streaming data such as stock market updates over standard web ports, websockets are the only feasible solution, as they save a lot of bandwidth and offer lower latency.


References



