
Deciphering the signal from the noise: what languages should you write trading applications in?

There is a myriad of platforms, tools and languages to choose from when writing trading applications. Depending on your background and the type of trading you wish to do, it can be hard to know where to start. Additionally, the tools used in an institutional setting can differ greatly from those that are a sensible choice for individuals or small-scale operations.

Whether you come from a scientific computing background, a development background, or have no grounding in either will affect the path you choose. What follows is a brief overview of some of the main languages, along with why you may or may not wish to consider them.

Building your models (scientific computing)

Pure mathematicians will often have a grounding in MATLAB or Mathematica, and statisticians in R. These languages have extensive catalogs of ready-made maths and stats libraries, making them great for building models; however, their performance tends to suffer compared to other languages.

Julia was designed to help address these shortcomings. However, it's still a relatively new language and does not yet have as rich a set of numerical libraries as the established scientific languages. One of its most frequently highlighted features is that it is especially fast at processing loops, which avoids the need to vectorise your code.
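To make the loop versus vectorised distinction concrete, here is a minimal sketch of the kind of explicit loop a modeller might write first: a simple moving average. It is written in plain Python purely for illustration, and the function name and sample prices are made up. In MATLAB or R you would typically be pushed to rewrite a loop like this as whole-array operations to get acceptable speed, whereas Julia's pitch is that the loop form is already fast.

```python
def rolling_mean(prices, window):
    """Simple moving average computed with an explicit loop."""
    if window <= 0:
        raise ValueError("window must be positive")
    means = []
    running_sum = 0.0
    for i, price in enumerate(prices):
        running_sum += price
        if i >= window:
            # Drop the value that has just left the window.
            running_sum -= prices[i - window]
        if i >= window - 1:
            # Only emit a value once a full window is available.
            means.append(running_sum / window)
    return means


prices = [10.0, 11.0, 12.0, 13.0, 14.0]  # made-up sample prices
print(rolling_mean(prices, 3))  # [11.0, 12.0, 13.0]
```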

If you don't know any scientific languages, either MATLAB (or Octave if you don't have a MATLAB license) or R are worth familiarising yourself with, as a lot of math/stats/econometrics books and papers reference them.

Python also provides great support for scientific computing via a number of different libraries, the main one being the SciPy group of projects. There are companies providing scientific Python distributions with SciPy and other libraries pre-configured, avoiding some of the headaches that can be encountered when setting them up manually (especially on Windows platforms). The most prominent are Enthought Canopy and Anaconda.

Automating your trading (server side programming)

The majority of institutions rely heavily on the statically typed languages C/C++ or Java for their server-side trading applications (market connectivity, order management/execution, settlement and risk systems), although some do use C#/.NET too. There are never-ending flame wars about whether C/C++ or Java is the superior language. As a rule of thumb, if you want the absolute fastest possible algorithmic and order execution applications, you'll want to go with C/C++ optimised for your CPU architecture, provided you are also optimising every other part of your stack: custom Linux kernel builds, specialist network devices with onboard programmable FPGA controllers, and co-location in the exchange (all of which is very costly in both time and money). If you just want fast, Java is still an excellent choice, especially when you follow the advice of blogs dedicated to high-performance Java.

The issue with using these languages for your own applications is that, if you're not already well versed in either of them, the learning curve is much steeper than it is in a dynamically typed language such as Python or Ruby. Additionally, development times tend to be longer due to the greater complexity of the languages. Both Python and Ruby have QuickFIX implementations, making them suitable candidates for building your server-side components, although you're better off going with Python unless you have a very compelling reason to go with Ruby. Some vendors even provide Python implementations of their APIs. However, do keep in mind that if you're going to be processing tens of thousands of events per second (i.e. a real market data feed), you're likely to be better off going with one of the higher-performance languages from the get-go.
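One way to find out whether a dynamically typed language can keep up with your feed is to measure it. The following is a rough sketch, not a rigorous benchmark: it pushes synthetic ticks through a trivial handler and reports events per second. The handler, symbols and prices are all hypothetical, and a real handler doing parsing, model evaluation and order logic will be far slower than this one.

```python
import time


def handle_tick(state, symbol, price):
    """Hypothetical event handler: keep the last price and a tick count per symbol."""
    _, count = state.get(symbol, (None, 0))
    state[symbol] = (price, count + 1)


def measure_throughput(n_events=100_000):
    """Push n_events synthetic ticks through handle_tick; return (events/sec, state)."""
    symbols = ["AAA", "BBB", "CCC"]  # made-up symbols
    state = {}
    start = time.perf_counter()
    for i in range(n_events):
        handle_tick(state, symbols[i % len(symbols)], 100.0 + (i % 50) * 0.01)
    elapsed = time.perf_counter() - start
    return n_events / elapsed, state


rate, state = measure_throughput()
print(f"{rate:,.0f} events/second")
```

If the figure you get with your real handler is not comfortably above the peak rate of your feed, that is a sign you have hit the kind of limitation where a faster language is worth considering.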

Functional programming languages have also gained a great deal of attention in the last few years, the main ones being Haskell, Scala (which runs on the Java Virtual Machine), Erlang and OCaml. Learning to develop in functional languages requires a paradigm shift in your thinking, but the resulting code tends to be much more concise (and many would argue more elegant) than its counterparts in non-functional languages. Python provides support for functional semantics, and both Java and C++ are starting to incorporate more functional features.
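As a small taste of the functional style in Python, the sketch below computes a volume-weighted average price using `filter` and `functools.reduce` rather than a loop that mutates running totals. The fills and the `vwap` helper are made up for illustration.

```python
from functools import reduce

# Made-up fills: (symbol, quantity, price)
fills = [
    ("AAA", 100, 10.0),
    ("BBB", 50, 20.0),
    ("AAA", 300, 11.0),
]


def vwap(fills, symbol):
    """Volume-weighted average price for one symbol, with no mutable state."""
    matching = filter(lambda fill: fill[0] == symbol, fills)
    # Fold the fills into a (notional, quantity) pair.
    notional, quantity = reduce(
        lambda acc, fill: (acc[0] + fill[1] * fill[2], acc[1] + fill[1]),
        matching,
        (0.0, 0),
    )
    return notional / quantity if quantity else None


print(vwap(fills, "AAA"))  # 10.75
```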

One firm has invested very heavily in OCaml and actually runs its entire trading stack on it, which has been of great benefit to the OCaml community. It stands out from most firms in having made such a significant investment in one language. More generally, functional languages are good for scaling up quantitative models in a distributed environment, where management of physical resources is left to the language interpreter or virtual machine, allowing the modeller to focus on the implementation of their model.

Data storage

At some point you're going to require somewhere to store records of your trading activities and the historical data used in backtests. Up until a few years ago, the choice was between the different relational systems: MySQL, PostgreSQL, Oracle, Sybase and SQL Server. MySQL and PostgreSQL are still excellent (free) choices which can scale up to the enterprise to store order/execution data. However, you now also have the option of NoSQL and time-series databases which retain all available data in RAM (provided they have enough at their disposal). This differs from the relational stores, which historically have cached some data in RAM but rely on fast physical disk access using SANs for the bulk of their storage.
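As a minimal sketch of storing execution records relationally, the example below uses SQLite via Python's built-in `sqlite3` module. SQLite is swapped in here only because it needs no server; it is not one of the databases named above, although the SQL is similar in spirit to what you would run on MySQL or PostgreSQL. The table layout and sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory database for illustration; pass a file path to persist to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE executions (
        id INTEGER PRIMARY KEY,
        executed_at TEXT NOT NULL,   -- ISO 8601 timestamp
        symbol TEXT NOT NULL,
        side TEXT NOT NULL CHECK (side IN ('BUY', 'SELL')),
        quantity INTEGER NOT NULL,
        price REAL NOT NULL
    )
    """
)

# Made-up executions.
rows = [
    ("2020-01-02T09:30:00", "AAA", "BUY", 100, 10.0),
    ("2020-01-02T09:31:00", "AAA", "SELL", 40, 10.5),
    ("2020-01-02T09:32:00", "BBB", "BUY", 10, 20.0),
]
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT INTO executions (executed_at, symbol, side, quantity, price) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )

# Net position per symbol: buys add, sells subtract.
positions = dict(
    conn.execute(
        "SELECT symbol, SUM(CASE side WHEN 'BUY' THEN quantity ELSE -quantity END) "
        "FROM executions GROUP BY symbol ORDER BY symbol"
    ).fetchall()
)
print(positions)  # {'AAA': 60, 'BBB': 10}
```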

NoSQL databases such as MongoDB and Cassandra are great at storing and retrieving unstructured data (Mongo uses a JSON-like format, BSON), hence some firms have chosen them to store their time-series data.

In the institutional and HFT setting, tick databases have gained a lot of popularity in the last few years as a place to store time-series data (such as market data). One of the most prominent is the column-oriented time-series database kdb+. Organisations will build their own "ticker plants", potentially capturing every single tick across a number of different exchanges (which can easily be gigabytes of data per day), and use this for running their backtests. kdb+ uses the q programming language for queries, although connectors exist to pull data into regular programming languages if required.
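To illustrate what "column-oriented" means, here is a toy sketch in Python. It is not how kdb+ is implemented and does not use q; it only shows the idea of keeping each field in its own contiguous, typed array, so that a query over time and price never has to touch the other fields. The tick data and the `average_price` helper are made up.

```python
from array import array

# Row-oriented: one record per tick (made-up data).
rows = [
    (1, "AAA", 10.0, 100),
    (2, "AAA", 10.5, 200),
    (3, "AAA", 11.0, 300),
]

# Column-oriented: one contiguous, typed array per field.
columns = {
    "time": array("q", (r[0] for r in rows)),   # 64-bit signed integers
    "price": array("d", (r[2] for r in rows)),  # 64-bit floats
    "size": array("q", (r[3] for r in rows)),
}


def average_price(columns, start, end):
    """Mean price for ticks with start <= time <= end, reading only two columns."""
    selected = [
        price
        for t, price in zip(columns["time"], columns["price"])
        if start <= t <= end
    ]
    return sum(selected) / len(selected) if selected else None


print(average_price(columns, 2, 3))  # 10.75
```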

The cost of a simple setup with a few exchange feeds, combined with licensing costs for a time-series database, hardware and data centre fees, can easily run over $1,000,000 annually before you've even written any code, which is why the barriers to entry in this space are out of the ordinary person's grasp.

Additionally, there are "big data" solutions such as Apache Hadoop, which can be considered for running large-scale distributed processing jobs across a number of nodes. However, these tend to be used when working with more varied data sets, not the simple time-series data you will be using for your backtests.

User interfaces

No doubt there will come a point where some sort of user interface is required. Again, there is a whole suite of different languages that can be used here. The main decision is whether you wish to create a web-based solution for visualising what your system is doing, utilising one of the many web frameworks out there, or write a fat client to run on your own machine in C#/.NET, Java or Objective-C.

In summary

There's no reason why you can't use the language you already know best for developing your applications. However, it's best to retain a flexible mindset with respect to which tools you use for which parts of your system. For instance, when prototyping new models, you certainly want to use a language with good scientific computing capabilities, otherwise you'll end up wasting time reinventing the wheel instead of finding a decent trading opportunity. But also do keep in mind that the established firms have huge sums of money they can throw at teams to hire quants, developers and engineers separately, each being experts in the tools and technology of their chosen field.

If you don't know where to start, you won't go too far wrong starting with Python, as it provides decent capabilities in all of the above areas. Your primary focus is to have an application that works; optimisation and rewriting can come later, when you hit some genuine limitations of the language.

The universe of programming languages is ever expanding, and there are plenty of options that I've excluded, such as CUDA, Clojure, D, Google's Go and Groovy, to name a few. Hopefully this has given you some insight into what you have at your disposal for the different parts of your application.

Categories: Development

