
Variance Factors on VIX Futures I - Synthetic Futures

In her paper on ETNs on VIX futures, Carol Alexander demonstrates how principal component analysis can be used to identify the main variance factors in the term structure of the VIX. Over the next couple of posts I am going to demonstrate how you can implement this.

Principal component analysis (PCA) is a useful tool for identifying the main drivers (principal components) of variability in a system made up of multiple data series. In this instance our system is the returns of VIX futures with 30, 60, 90, 120, 150 and 180 days to maturity. PCA lets us rank the principal components by how much of the variability of returns each one explains.
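
As a rough illustration of what PCA reports, the sketch below runs it on random, made-up return series (not VIX data) using an eigendecomposition of the covariance matrix in NumPy. The function name `explained_variance_ratios` and the synthetic inputs are assumptions for illustration only, not the method used in the paper.

```python
import numpy as np


def explained_variance_ratios(returns):
    """Share of total variance explained by each principal component.

    returns: 2-D array, one row per date and one column per series.
    """
    # Covariance between the columns (series)
    cov = np.cov(returns, rowvar=False)
    # eigh suits symmetric matrices; eigenvalues come back in ascending order
    eigenvalues = np.linalg.eigh(cov)[0]
    eigenvalues = eigenvalues[::-1]
    return eigenvalues / eigenvalues.sum()


# Made-up data: three series driven mostly by one common factor
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
noise = 0.1 * rng.normal(size=(500, 3))
fake_returns = common + noise

ratios = explained_variance_ratios(fake_returns)
print(ratios)  # the first component should dominate
```

Because the three made-up series share one common factor, almost all of the variance lands on the first component.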

As all futures have an expiry, we need to create constant maturity synthetic futures prices based on the real underlying futures prices, such that on any given date we have a price for a future that expires 30, 60, ..., 180 days from now. This post demonstrates how to obtain the relevant data and produce these synthetic prices.

You can obtain the daily VIX futures prices direct from the CBOE here. You will need to do a little manipulation to get all prices in line (details of which follow), but the end result is a matrix containing synthetic prices of the VIX futures 30, ..., 180 days into the future.

                 30.0       60.0       90.0      120.0      150.0      180.0
2009-01-02  39.143667  39.429000  37.445000  36.195917  36.038000  35.935778
2009-01-05  40.053333  39.538667  37.558000  36.401667  36.018000  35.667389
2009-01-06  39.540000  38.969167  37.339556  36.621000  36.058933  36.017000
2009-01-07  42.976000  41.777000  39.765556  38.496167  37.849000  37.387722
2009-01-08  42.828333  41.770000  39.412000  38.150833  37.114400  37.217778
...
2014-12-23  15.951667  16.752500  17.300000  17.745417  18.079000  18.382778
2014-12-24  16.271667  16.955000  17.405000  17.851667  18.230500  18.510417
2014-12-26  15.918333  16.577500  17.096667  17.608333  17.985000  18.290556
2014-12-29  16.226667  16.785000  17.186111  17.668333  18.041000  18.344444
2014-12-30  16.585000  17.083333  17.435000  17.865833  18.193000  18.499444

To start with, you will need to load all of the historic futures prices. The following code does this; set the FUTURES_DATA_DIR variable to the directory where you have stored the futures price files.

import datetime as dt
import glob
import os

import numpy as np
import pandas as pd

FUTURES_MONTHS = ['F', 'G', 'H', 'J', 'K', 'M', 'N', 'Q', 'U', 'V', 'X', 'Z']
FUTURES_DATA_DIR = '/path/to/data'

def load_vix_futures_prices(source_dir, price='Close',
                            start_year=2005, end_year=2099):
    """
    Dictionary of price data loaded from files named
    CFE_[M][YY]_VX.csv, where M is a month code from FUTURES_MONTHS.

    start_year and end_year parameters refer to futures we are interested in,
    not dates we have price data for.

    :return data[YYYY][M] = price series (the `price` column of the file)
    Where YYYY is expiry year, M is expiry month in range [0, 11]
    """
    data = {}

    files = glob.glob(os.path.join(source_dir, 'CFE_*'))
    for f in files:
        filename = os.path.basename(f)
        month = FUTURES_MONTHS.index(filename[4])
        year = int('20' + filename[5] + filename[6])

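        if year < start_year or year > end_year:
            continue

        # load_symbol_data is a helper that is not shown in this post. The
        # retry with header_row=1 presumably handles files that carry an
        # extra line above the header row.
        try:
            df = load_symbol_data(f, index=0, header_row=0)
        except IndexError:
            df = load_symbol_data(f, index=0, header_row=1)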

        if year not in data:
            data[year] = 12 * [None]
        data[year][month] = df[price]

    return data
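
To make the file naming convention concrete, here is a small standalone sketch of how the month code and two-digit year are read from a file name. The helper name `parse_futures_filename` and the example file names are hypothetical; the character positions mirror the loader above.

```python
import os

FUTURES_MONTHS = ['F', 'G', 'H', 'J', 'K', 'M', 'N', 'Q', 'U', 'V', 'X', 'Z']


def parse_futures_filename(path):
    """Return (month_index, year) for a file named CFE_[M][YY]_VX.csv."""
    filename = os.path.basename(path)
    # Character 4 is the month code, characters 5-6 the two-digit year
    month_index = FUTURES_MONTHS.index(filename[4])
    year = int('20' + filename[5:7])
    return month_index, year


print(parse_futures_filename('/path/to/data/CFE_F09_VX.csv'))  # (0, 2009)
print(parse_futures_filename('CFE_Z14_VX.csv'))                # (11, 2014)
```

Note that the month index is zero-based, which is why the lookup code later uses month - 1.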

Once you have your futures prices, you need to create synthetic prices based on the following calculation:

P_{t_n} = \dfrac{t_{ex_1}}{t_n} P_{ex_1} + \dfrac{t_n - t_{ex_1}}{t_n} P_{ex_2}

Where P_{t_n} is the synthetic futures price expiring n days in the future, t_n is the number of days until the synthetic future expires, t_{ex_1} is the number of days until the nearest future expires, and P_{ex_1} and P_{ex_2} are the prices of the nearest and second nearest futures to expire.
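
As a worked example of the weighting, with made-up numbers rather than market data: suppose we want the 30-day synthetic price, the nearest future expires in 12 days and trades at 40.0, and the next one trades at 38.0. The function name `synthetic_price` is hypothetical and simply restates the formula.

```python
def synthetic_price(n, days_to_near_expiry, near_price, next_price):
    """Weighted price for a synthetic future expiring n days from now."""
    near_weight = days_to_near_expiry / n
    next_weight = (n - days_to_near_expiry) / n
    return near_weight * near_price + next_weight * next_price


# Weights are 12/30 = 0.4 on the near future and 18/30 = 0.6 on the next
price = synthetic_price(30, 12, 40.0, 38.0)
print(round(price, 2))  # 38.8
```

The two weights always sum to one, so when both futures trade at the same price the synthetic price equals it.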

We can determine when the next future expires by building on the code from my last post. Not only do we need to know when the next future expires, but we also need to know the one after that, starting at a point of time t, n days into the future.

We use a function, get_vix_future_ndays, which performs these calculations for us given the number of days n, the current date and our futures prices.

def get_vix_future_ndays(n, futures_prices, curr_date):
    """Weighted future expiring n days from now."""
    # We use n-1 because if our target date falls on an expiry date we want
    # that month's contract, not the next one
    target_date = curr_date + dt.timedelta(days=n-1)

    prev_expiry_date = get_next_expiry_date(curr_date)
    next_expiry_date = get_next_expiry_date(
        prev_expiry_date + dt.timedelta(days=1))

    while next_expiry_date < target_date:
        prev_expiry_date = next_expiry_date
        next_expiry_date = get_next_expiry_date(
            prev_expiry_date + dt.timedelta(days=1))

    prices = get_futures_prices(
        futures_prices, curr_date, prev_expiry_date, 2)
    near_date_maturity = (prev_expiry_date - curr_date).days

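    # np.isnan also catches NaN values stored in the price series, which an
    # identity check against np.nan would miss
    if np.isnan(prices[0]) or np.isnan(prices[1]):
        return np.nan
    else:
        return (near_date_maturity / n) * prices[0] + \
               ((n - near_date_maturity) / n) * prices[1]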

def get_futures_prices(futures_prices, curr_date, expiry_date, month_count):

    def get_price_value(prices, curr_date):
        # A contract with no data file loaded is stored as None
        if prices is not None and curr_date in prices:
            return prices[curr_date]
        else:
            return np.nan

    year = expiry_date.year
    month = expiry_date.month

    prices = []
    prices.append(
        get_price_value(futures_prices[year][month - 1], curr_date))

    remaining = month_count - 1
    next_month = month
    next_year = year
    while remaining > 0:
        if next_month == 12:
            next_month = 1
            next_year += 1
        else:
            next_month += 1

        prices.append(
            get_price_value(
                futures_prices[next_year][next_month - 1], curr_date))

        remaining -= 1

    return prices

def get_next_expiry_date(curr_date):
    expiry_date = get_expiry_date_for_month(curr_date)

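    # It must be less than the expiry date, as on expiry date we only have
    # a settlement price
    if curr_date < expiry_date:
        return expiry_date
    else:
        # Step to the first day of the following month; adding a flat 30 days
        # can skip February (e.g. 31 January + 30 days lands in March)
        first_of_month = curr_date.replace(day=1)
        next_month = (first_of_month + dt.timedelta(days=32)).replace(day=1)
        return get_expiry_date_for_month(next_month)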

def get_expiry_date_for_month(curr_date):
    """
    Return the VIX futures expiry date for the month of curr_date, based on
    the following rules:

    Trading hours for expiring VIX futures contracts end at 7:00 a.m. Chicago
    time on the final settlement date.

    The Wednesday that is thirty days prior to the third Friday of the
    calendar month immediately following the month in which the contract
    expires ("Final Settlement Date"). If the third Friday of the month
    subsequent to expiration of the applicable VIX futures contract is a
    CBOE holiday, the Final Settlement Date for the contract shall be thirty
    days prior to the CBOE business day immediately preceding that Friday.
    """
    # Date of third friday of the following month
    if curr_date.month == 12:
        third_friday_next_month = dt.date(curr_date.year + 1, 1, 15)
    else:
        third_friday_next_month = dt.date(curr_date.year,
                                          curr_date.month + 1, 15)

    one_day = dt.timedelta(days=1)
    thirty_days = dt.timedelta(days=30)
    while third_friday_next_month.weekday() != 4:
        # Using += results in a timedelta object
        third_friday_next_month = third_friday_next_month + one_day

    # TODO: Incorporate check that it's a trading day, if not move the 3rd
    # Friday back by one day before subtracting
    return third_friday_next_month - thirty_days
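
As a quick sanity check of the rule quoted in the docstring, the standalone sketch below recomputes the expiry date for a given contract month using only the standard library. The function name `vix_expiry_for_month` is hypothetical and, like the code above, it ignores the holiday adjustment.

```python
import datetime as dt


def vix_expiry_for_month(year, month):
    """Expiry date of the VIX future for the given contract month.

    Thirty days before the third Friday of the following month.
    Holiday adjustments are not handled.
    """
    # Move to the following month, rolling over the year in December
    if month == 12:
        year, month = year + 1, 1
    else:
        month += 1

    # The third Friday always falls between the 15th and the 21st
    third_friday = dt.date(year, month, 15)
    while third_friday.weekday() != 4:  # Monday is 0, Friday is 4
        third_friday += dt.timedelta(days=1)

    return third_friday - dt.timedelta(days=30)


print(vix_expiry_for_month(2015, 1))   # 2015-01-21
print(vix_expiry_for_month(2014, 12))  # 2014-12-17
```

Both dates fall on a Wednesday, as the rule requires when no holiday is involved.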

We now have all of the building blocks to produce our synthetic futures prices, which we generate using the following code:

def main():
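    futures_prices = load_vix_futures_prices(FUTURES_DATA_DIR)

    n_futures = [30., 60., 90., 120., 150., 180.]

    # ISO dates avoid any day/month ambiguity when parsing
    start = '2009-01-01'
    end = '2014-12-30'
    dates = pd.date_range(start, end, normalize=True)

    prices = pd.DataFrame(index=dates)

    for n in n_futures:
        # One synthetic price per date. Passing d.date() keeps comparisons
        # with the datetime.date expiry dates valid; this assumes the loaded
        # price series are also indexed by datetime.date.
        result = [get_vix_future_ndays(n, futures_prices, d.date())
                  for d in dates]
        # You may want to add some additional validation here looking for NaNs...
        prices[str(n)] = result

    prices = prices.dropna()
    return prices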

Finally, we plot our price data to make sure we have something sensible generated.



Now that we have our synthetic dataset, we have overcome one of our significant obstacles: getting a clean dataset. In the next post, I'll demonstrate the magic of running principal component analysis on it.

Categories: Development Python Volatility
