In her paper on ETNs on VIX futures, Carol Alexander demonstrates how principal component analysis can be used to identify the main variance factors in the term structure of the VIX. Over the next couple of posts I am going to demonstrate how you can implement this.

Principal component analysis (PCA) is a useful tool when working with multiple sets of data to identify the main drivers (*principal components*) of the variability of the system. In this instance our system is the returns of VIX futures (30, 60, 90, 120, 150, 180 day). PCA decomposes these returns into uncorrelated factors and ranks them by how much of the variability of returns each one is responsible for.
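To make the idea of ranking factors by explained variance a bit more concrete, here is a minimal sketch using plain NumPy. The data is randomly generated (it is not VIX data), and the function name `explained_variance_ratio` and the factor loadings are my own assumptions for illustration only.

```python
import numpy as np


def explained_variance_ratio(returns):
    """Share of total variance explained by each principal component,
    largest first. `returns` has one row per day, one column per series."""
    cov = np.cov(returns, rowvar=False)
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending
    # order, so reverse them to get the largest first.
    eigenvalues = np.linalg.eigvalsh(cov)[::-1]
    return eigenvalues / eigenvalues.sum()


# Hypothetical returns for six maturities: one common factor moves every
# series, plus a little independent noise on each.
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
loadings = np.array([[1.0, 0.9, 0.8, 0.7, 0.6, 0.5]])
returns = common @ loadings + 0.1 * rng.normal(size=(500, 6))

print(explained_variance_ratio(returns).round(3))
```

Because the made-up series share a single common driver, the first component should account for nearly all of the variance, with the remaining five picking up the noise.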

As all futures have an expiry, we need to create constant maturity synthetic futures prices based on the real underlying futures prices, such that on any given date we have a price for a future that expires 30, 60, ..., 180 days from now. This post demonstrates how to obtain the relevant data and produce these synthetic prices.

You can obtain the daily VIX futures prices directly from the CBOE here. You will need to do a little manipulation to get all prices in line (details of which follow), but the end result is a matrix containing synthetic prices of the VIX futures 30, ..., 180 days into the future.

```
                 30.0       60.0       90.0      120.0      150.0      180.0
2009-01-02  39.143667  39.429000  37.445000  36.195917  36.038000  35.935778
2009-01-05  40.053333  39.538667  37.558000  36.401667  36.018000  35.667389
2009-01-06  39.540000  38.969167  37.339556  36.621000  36.058933  36.017000
2009-01-07  42.976000  41.777000  39.765556  38.496167  37.849000  37.387722
2009-01-08  42.828333  41.770000  39.412000  38.150833  37.114400  37.217778
...
2014-12-23  15.951667  16.752500  17.300000  17.745417  18.079000  18.382778
2014-12-24  16.271667  16.955000  17.405000  17.851667  18.230500  18.510417
2014-12-26  15.918333  16.577500  17.096667  17.608333  17.985000  18.290556
2014-12-29  16.226667  16.785000  17.186111  17.668333  18.041000  18.344444
2014-12-30  16.585000  17.083333  17.435000  17.865833  18.193000  18.499444
```

To start with, you will need to load all historic futures prices. The following code will let you do this. You will need to specify where you have stored the futures prices in the *FUTURES_DATA_DIR* variable.

```python
import glob
import os

FUTURES_MONTHS = ['F', 'G', 'H', 'J', 'K', 'M', 'N', 'Q', 'U', 'V', 'X', 'Z']
FUTURES_DATA_DIR = '/path/to/data'


def load_vix_futures_prices(source_dir, price='Close',
                            start_year=2005, end_year=2099):
    """
    Dictionary of dataframe price data in format CFE_[M][YY]_VX.csv
    where M is a month code from FUTURES_MONTHS

    start_year and end_year parameters refer to futures we are interested
    in, not dates we have price data for.

    :return data[YYYY][M] = dataframe
    Where YYYY is expiry year, M is expiry month in range [0, 11]
    """
    data = {}
    files = glob.glob(os.path.join(source_dir, 'CFE_*'))
    for f in files:
        filename = os.path.basename(f)

        # Character 4 is the month code, characters 5-6 the two digit year
        month = FUTURES_MONTHS.index(filename[4])
        year = int('20' + filename[5] + filename[6])

        if year < start_year or year > end_year:
            continue

        # load_symbol_data is a helper that is not shown in this post. If
        # reading with the header on the first row fails, retry with the
        # header one row further down.
        try:
            df = load_symbol_data(f, index=0, header_row=0)
        except IndexError:
            df = load_symbol_data(f, index=0, header_row=1)

        if year not in data:
            data[year] = 12 * [None]
        data[year][month] = df[price]

    return data
```
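The filename parsing is the fiddly part of the loader, so it is worth checking on its own. Here is a small standalone sketch of the same logic. The helper name `parse_contract_filename` is hypothetical (it is not part of the code above), and the example filenames are made up to follow the CFE_[M][YY]_VX.csv pattern.

```python
FUTURES_MONTHS = ['F', 'G', 'H', 'J', 'K', 'M', 'N', 'Q', 'U', 'V', 'X', 'Z']


def parse_contract_filename(filename):
    """Return (expiry year, month index in [0, 11]) for a file named
    like CFE_[M][YY]_VX.csv."""
    month_index = FUTURES_MONTHS.index(filename[4])
    year = int('20' + filename[5:7])
    return year, month_index


print(parse_contract_filename('CFE_F09_VX.csv'))  # (2009, 0)
print(parse_contract_filename('CFE_Z14_VX.csv'))  # (2014, 11)
```

Note that the month index is zero based, which is why the pricing code later looks prices up with `month - 1`.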

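Once you have your futures prices, you need to create synthetic prices based on the following calculation (this is the weighting used in *get_vix_future_ndays* below):

$$P_n = \frac{d_1}{n} P_1 + \frac{n - d_1}{n} P_2$$

Where $P_n$ is the synthetic futures price expiring $n$ days in the future, $n$ is the number of days until the synthetic future expiry, $d_1$ is the number of days until the nearest future expires, and $P_1$ and $P_2$ are the prices of the nearest and second nearest futures.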
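As a quick worked example of this weighting, here is a minimal sketch that mirrors the return expression in *get_vix_future_ndays*. The function name `synthetic_price` and the input numbers are made up purely for illustration.

```python
def synthetic_price(n, days_to_near_expiry, near_price, next_price):
    """Weight the nearest and second nearest futures prices by how many
    of the n days fall before the nearest expiry."""
    near_weight = days_to_near_expiry / n
    next_weight = (n - days_to_near_expiry) / n
    return near_weight * near_price + next_weight * next_price


# Hypothetical inputs: a 30 day synthetic future, 12 days until the nearest
# contract expires, nearest contract at 40.0 and the next one at 39.0.
# The weights are 12/30 = 0.4 and 18/30 = 0.6.
print(synthetic_price(30, 12, 40.0, 39.0))
```

The closer we get to the nearest expiry, the more weight shifts onto the second contract, so the synthetic price does not jump when we roll.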

We can determine when the next future expires by building on the code from my last post. Not only do we need to know when the next future expires, but we also need to know the one after that, starting from a point in time *n* days into the future.

We use a function, *get_vix_future_ndays*, which performs these calculations for us given *n* (the number of days into the future), our futures prices and the current date.

```python
import datetime as dt

import numpy as np
import pandas as pd


def get_vix_future_ndays(n, futures_prices, curr_date):
    """
    Weighted future expiring n days from now.
    """
    # We use n-1 as we roll to the next month; if our target date falls on
    # an expiry date, we want that month's contract, not the next
    target_date = curr_date + dt.timedelta(days=n-1)

    prev_expiry_date = get_next_expiry_date(curr_date)
    next_expiry_date = get_next_expiry_date(
        prev_expiry_date + dt.timedelta(days=1))
    while next_expiry_date < target_date:
        prev_expiry_date = next_expiry_date
        next_expiry_date = get_next_expiry_date(
            prev_expiry_date + dt.timedelta(days=1))

    prices = get_futures_prices(
        futures_prices, curr_date, prev_expiry_date, 2)
    near_date_maturity = (prev_expiry_date - curr_date).days

    # pd.isna catches NaNs read from the data as well as our own fill value
    if pd.isna(prices[0]) or pd.isna(prices[1]):
        return np.nan
    else:
        return (near_date_maturity / n) * prices[0] + \
            ((n - near_date_maturity) / n) * prices[1]


def get_futures_prices(futures_prices, curr_date, expiry_date, month_count):
    def get_price_value(prices, curr_date):
        # prices is None when no file was loaded for that contract month
        if prices is not None and curr_date in prices:
            return prices[curr_date]
        else:
            return np.nan

    year = expiry_date.year
    month = expiry_date.month
    prices = []
    prices.append(
        get_price_value(futures_prices[year][month - 1], curr_date))

    remaining = month_count - 1
    next_month = month
    next_year = year
    while remaining > 0:
        if next_month == 12:
            next_month = 1
            next_year += 1
        else:
            next_month += 1
        prices.append(
            get_price_value(
                futures_prices[next_year][next_month - 1], curr_date))
        remaining -= 1

    return prices


def get_next_expiry_date(curr_date):
    expiry_date = get_expiry_date_for_month(curr_date)
    # It must be less than the expiry date, as on expiry date we only have
    # a settlement price
    if curr_date < expiry_date:
        return expiry_date
    else:
        # Move to the first day of the following month. Simply adding 30
        # days can skip a month (e.g. 31 January lands in March).
        first_of_next_month = (
            curr_date.replace(day=1) + dt.timedelta(days=32)).replace(day=1)
        return get_expiry_date_for_month(first_of_next_month)


def get_expiry_date_for_month(curr_date):
    """
    http://cfe.cboe.com/products/spec_vix.aspx

    TERMINATION OF TRADING:

    Trading hours for expiring VIX futures contracts end at 7:00 a.m. Chicago
    time on the final settlement date.

    FINAL SETTLEMENT DATE:

    The Wednesday that is thirty days prior to the third Friday of the
    calendar month immediately following the month in which the contract
    expires ("Final Settlement Date"). If the third Friday of the month
    subsequent to expiration of the applicable VIX futures contract is a
    CBOE holiday, the Final Settlement Date for the contract shall be thirty
    days prior to the CBOE business day immediately preceding that Friday.
    """
    # Date of third friday of the following month
    if curr_date.month == 12:
        third_friday_next_month = dt.date(curr_date.year + 1, 1, 15)
    else:
        third_friday_next_month = dt.date(curr_date.year,
                                          curr_date.month + 1, 15)

    one_day = dt.timedelta(days=1)
    thirty_days = dt.timedelta(days=30)
    # The third Friday falls on the 15th to the 21st, so step forward from
    # the 15th until we reach a Friday
    while third_friday_next_month.weekday() != 4:
        third_friday_next_month = third_friday_next_month + one_day

    # TODO: Incorporate check that it's a trading day, if not move the 3rd
    # Friday back by one day before subtracting
    return third_friday_next_month - thirty_days
```
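As a sanity check on the expiry rule quoted in the docstring above, here is a small standalone sketch that works out the settlement dates for the first three contracts of 2009. The function name `vix_expiry_for_month` is my own, and like the code above it assumes the third Friday is never a CBOE holiday.

```python
import datetime as dt


def vix_expiry_for_month(year, month):
    """Wednesday thirty days before the third Friday of the month that
    follows (year, month). Holidays are ignored in this sketch."""
    if month == 12:
        day = dt.date(year + 1, 1, 15)
    else:
        day = dt.date(year, month + 1, 15)
    # The third Friday always falls on the 15th to the 21st.
    while day.weekday() != 4:
        day = day + dt.timedelta(days=1)
    return day - dt.timedelta(days=30)


for month in (1, 2, 3):
    print(vix_expiry_for_month(2009, month))
# 2009-01-21
# 2009-02-18
# 2009-03-18
```

Each of these dates is a Wednesday, which is what the contract specification calls for.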

We now have all of the building blocks to produce our synthetic futures prices, which we generate using the following code:

```python
import pandas as pd


def main():
    futures_prices = load_vix_futures_prices(FUTURES_DATA_DIR)
    n_futures = [30., 60., 90., 120., 150., 180.]

    # ISO dates avoid any day/month ambiguity when parsing
    start = '2009-01-01'
    end = '2014-12-30'
    dates = pd.date_range(start, end, normalize=True)
    prices = pd.DataFrame(index=dates)

    for n in n_futures:
        # load_vix_future is not shown in this post; it is expected to
        # return the n day synthetic price for each date in dates
        result = load_vix_future(n, futures_prices, dates)
        # You may want to add some additional validation here looking for
        # NaNs...
        prices[str(n)] = result
    prices = prices.dropna()
```
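On the NaN validation mentioned in the comment above: `pd.date_range` generates every calendar day by default, so weekends and holidays will come through as NaN rows and be removed by `dropna()`. Here is a rough sketch of the sort of check you might run before dropping anything. The helper name `nan_report` and the sample frame are invented for illustration.

```python
import numpy as np
import pandas as pd


def nan_report(prices):
    """Count missing values per column, and the number of rows that
    dropna() would remove."""
    per_column = prices.isna().sum()
    rows_dropped = int(prices.isna().any(axis=1).sum())
    return per_column, rows_dropped


# Hypothetical frame: one day with no prices at all (think of a weekend),
# and one day where only the 60 day price is missing.
dates = pd.date_range('2009-01-02', periods=4, freq='D')
prices = pd.DataFrame(
    {'30.0': [39.1, np.nan, 39.5, 43.0],
     '60.0': [39.4, np.nan, 39.0, np.nan]},
    index=dates)

per_column, rows_dropped = nan_report(prices)
print(per_column)
print(rows_dropped)
```

A row that is missing in every column is most likely a non-trading day, whereas a gap in a single column is worth investigating before it silently disappears.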

Finally, we plot our price data to make sure we have something sensible generated.

```python
import matplotlib.pyplot as plt

prices.plot()
plt.show()
```

Now that we have our synthetic data set, we have overcome one of our significant obstacles: getting a clean dataset. In the next post, I'll demonstrate the magic of running principal component analysis on our dataset.


Perhaps give Quandl a whirl?

Might be a little more standardised.

Actually I just found this, which looks nice at first glance:

https://www.quandl.com/data/CHRIS?keyword=vix%20future

Quandl's great. Although their roll algorithm simply concatenates the prices - there's no weighted mean depending on how close we are to expiry with the near month versus the second.

https://www.quandl.com/collections/futures/continuous

Yes I bumped into that before. Unsure whether that applies to this specific set or not...