Wednesday, June 15, 2011

Part Two of Pair Trading Using Python 3.1!!

Today I am going to walk through the methodology, from a coding standpoint, that we would use to determine whether two securities are good candidates for pair trading. The next posting will show the actual source code that I wrote to do the analysis.

Step 1: We need to get the data. The analysis is worthless without sufficient data. For this exercise, I pulled roughly 3 years' worth of daily closing prices for AAPL. This data is freely available at Yahoo! Finance. Make sure you set the parameters at the top of the page to indicate that you want daily prices, and make sure you select the same time frame for each set of data you want to compare. In this example, I pulled Jan 1, 2008 to yesterday (Jun 14, 2011). At the bottom of the historical prices page there is a link labeled "Download to Spreadsheet." Select this link and the data will open in a spreadsheet. Next, open a blank ".txt" file and type the ticker symbol for the security you just pulled data for on the top line of the file, e.g. "AAPL". Lastly, copy the closing price column from the spreadsheet you just downloaded and paste it into the .txt file below the ticker symbol, one price per line.

Step 2: After you have pulled one data set, repeat the procedure from Step 1 for the security you wish to compare it with. MAKE SURE YOU SELECT THE SAME TIME FRAME! We should now have two .txt files, each named after one of the two ticker symbols we wish to compare, for example, KO.txt and PEP.txt.
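As a rough sketch of how the two files from Steps 1 and 2 could be read back in, here is a small helper. The function name load_closing_prices is my own placeholder, and the sketch assumes the file layout described above: ticker symbol on the first line, then one closing price per line.

```python
def load_closing_prices(path):
    """Read a ticker file: first line is the symbol, the rest are closing prices.

    Assumes the layout from Steps 1 and 2 (one price per line).
    Blank lines are skipped.
    """
    with open(path) as handle:
        lines = [line.strip() for line in handle if line.strip()]
    ticker = lines[0]
    prices = [float(value) for value in lines[1:]]
    return ticker, prices


# Hypothetical usage with the example files named above:
# ticker1, prices1 = load_closing_prices("KO.txt")
# ticker2, prices2 = load_closing_prices("PEP.txt")
```

If the two lists come back with different lengths, the time frames probably did not match and the data should be pulled again.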

Step 3: Now that we have two data sets that represent the closing prices of two securities for a set period of time, we are ready to do some analysis. The first step in analysis is to calculate an average difference in price for the two securities during the time frame. For example, if we select a time frame of the last 50 closing prices, we want to know the difference in closing price between the two securities for every day in that 50 day window. We will store those "difference data points" in a list. Then we will calculate the average difference by taking the average of the "difference data points list." This overall average difference is going to be critical in determining whether or not the two securities represent a "good" pair.
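Step 3 could be sketched roughly as follows. The function names are placeholders of mine, and the sketch assumes both price lists are ordered oldest first, so the most recent closes sit at the end of each list. If your files run newest first, take the first entries instead of the last. The difference is taken as Security2 minus Security1.

```python
def price_differences(prices1, prices2, window):
    """Return Security2 - Security1 for each of the last `window` closes.

    Assumes both lists cover the same dates and are ordered oldest first.
    """
    if len(prices1) != len(prices2):
        raise ValueError("price lists must cover the same time frame")
    if window < 1 or window > len(prices1):
        raise ValueError("window must be between 1 and the number of prices")
    recent1 = prices1[-window:]
    recent2 = prices2[-window:]
    return [p2 - p1 for p1, p2 in zip(recent1, recent2)]


def average(values):
    """Plain arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Made-up prices, just to show the shape of the result.
differences = price_differences([10.0, 11.0, 12.0, 13.0],
                                [12.0, 13.0, 15.0, 16.0], 3)
average_difference = average(differences)
```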

Step 4: The next step in the analysis is to determine how many of those "difference data points" are "close" to the average difference. In other words, we are checking whether the average resulted from consistent differences in price or from wildly varying ones. This matters because, in order to profit from price movement, we want the difference in price to be consistent. That way, when the prices move away from this average difference, we can make "bets" that they will once again return to it. We approach this problem by setting a "variance" (here this simply means a fixed tolerance that we choose, not the statistical variance). We then check how many of the "difference data points" in the list fall within the range of the "average difference" plus/minus that "variance." The result is a simple "count" of the data points in the "difference data points list" that fall inside this range.
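A minimal sketch of the Step 4 count might look like this. The name count_within_tolerance is a placeholder, and treating the edges of the range as "inside" is my assumption, since the description does not say which way the boundary goes.

```python
def count_within_tolerance(differences, average_difference, tolerance):
    """Count the differences that fall within average +/- tolerance.

    `tolerance` plays the role of the "variance" in the text.
    Values exactly on the boundary are counted (an assumption).
    """
    low = average_difference - tolerance
    high = average_difference + tolerance
    return sum(1 for d in differences if low <= d <= high)


# Made-up differences around an average of 3.0 with a tolerance of 1.0.
count = count_within_tolerance([2.0, 3.0, 3.5, 6.0], 3.0, 1.0)
```

Here three of the four sample points land inside the 2.0 to 4.0 range, and the 6.0 outlier does not.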

Step 5: Now that we have this "count" value from Step 4, we will now do some more analysis to determine if that is a sufficiently large count value. Simply, we set an overall "threshold percentage" that represents the percentage of data points that need to fall within the variance range to represent a "good" pair. If the "count" value divided by the total number of data points we selected is greater than the "threshold percentage" that we established, then we say that the two securities are a "Good Match."
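The Step 5 check is only a few lines. In this sketch the threshold is written as a fraction (0.80 for 80%), which is my own convention, and the comparison is strictly "greater than" to follow the wording above.

```python
def is_good_match(count, total, threshold_percentage):
    """Return True when count / total exceeds the threshold.

    `threshold_percentage` is a fraction such as 0.80 (an assumption
    about how the threshold is expressed).
    """
    if total <= 0:
        raise ValueError("total must be a positive number of data points")
    return count / total > threshold_percentage


# Made-up numbers: 42 of 50 points inside the range, 80% threshold.
decision = "Good Match" if is_good_match(42, 50, 0.80) else "Bad Match"
```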

Step 6: Lastly, if the two securities resulted in a "Good Match" decision, we need to do some analysis to determine which security should be shorted and which one should be bought. If the decision was a "Bad Match", the analysis is done and we do nothing with that pair. In the case of a "Good Match", the logic to determine which one is the short and which is the buy is relatively simple. First, note that we calculate each difference as Security2 closing price - Security1 closing price. In other words, the price of the first security you gathered data on is subtracted from the price of the second. So, if the overall average difference is greater than or equal to 0, then on average Security2's price is greater than or equal to Security1's price. With this information we are ready to create adjusted prices. If the average difference is greater than or equal to 0, the adjusted price for Security1 is the current price of Security1 plus the average difference, and the adjusted price for Security2 is the current price of Security2 minus the average difference. If the average difference is less than 0 (meaning that on average the price of Security1 is greater than that of Security2), we do the opposite: we subtract the average difference from the current price of Security1 and add it to the current price of Security2. Finally, if the adjusted price of Security1 is greater than the adjusted price of Security2, we should short Security1 and buy Security2. If the adjusted price of Security2 is greater than that of Security1, we would short Security2 and buy Security1.
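Here is one possible reading of Step 6 in code. Two things are assumptions on my part: in the negative case the sketch uses the size (absolute value) of the average difference so that the two branches mirror each other, and when the adjusted prices are exactly equal it returns None, because the description does not cover a tie. The function names are placeholders.

```python
def adjusted_prices(price1, price2, average_difference):
    """Return (adjusted1, adjusted2) following the Step 6 description.

    average_difference is the average of Security2 - Security1.
    Assumption: the negative branch uses the magnitude of the average.
    """
    if average_difference >= 0:
        return price1 + average_difference, price2 - average_difference
    magnitude = abs(average_difference)
    return price1 - magnitude, price2 + magnitude


def pick_sides(price1, price2, average_difference):
    """Return (security_to_short, security_to_buy), or None on a tie."""
    adjusted1, adjusted2 = adjusted_prices(price1, price2, average_difference)
    if adjusted1 > adjusted2:
        return ("Security1", "Security2")
    if adjusted2 > adjusted1:
        return ("Security2", "Security1")
    return None  # tie: not covered by the description (assumption)


# Made-up prices: Security2 usually trades 3.00 above Security1,
# but today the gap is 10.00.
sides = pick_sides(60.0, 70.0, 3.0)
```

With these sample numbers the adjusted prices are 63.0 and 67.0, so the sketch says to short Security2 and buy Security1.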

In summary, we get the data points for two different securities and make a list of the difference in prices for each day. We then take the average of that list of differences and calculate the percentage of data points that are close to that average. If the percentage is high, we say the two securities are a good match, because it means the difference in price between them is usually about the same. Last, we do some math to see which one is priced too high, and we short that security and buy the other one.

Hopefully, this explanation of the methodology is sufficient for your understanding. If you still do not fully understand, the next posting containing the source code should get the point across. Do keep in mind that my Inbox is always open for questions ( progamtotrade@gmail.com ).

A preview of the agenda for upcoming posts:

Next Post: Source code for pair trading analysis

Next Series: How do the delta value of an option and its premium relate to the premium of a VIX (Volatility Index) option for the same time frame? Is there a relationship? If so, what is the underlying relationship and how can it be used for profit? We will explore these questions through the use of functional programming in Python 3.1!

Take Care!
