Random Investment Co.

Beating the S&P 500 is something companies always brag about. Most funds can't do it, so when it happens consistently over time, it's kind of a big deal.

Or is it? Companies like Fidelity or Vanguard have dozens or even hundreds of funds. Some of them are bound to beat the S&P 500 by random chance alone. Below is an example that I'll call Random Investment Co. (RandCo for short).

RandCo features 25 funds, each consisting of 50 stocks. But instead of categories like biotech or transportation, the 50 stocks are randomly selected from the S&P 500.  In other words, they are 25 random subsets of size 50 each.

Almost by construction (unless I have extremely bad luck), some of these 25 funds will have to outperform the S&P 500.
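Here is a minimal sketch of the setup in Python. It is not the code behind the numbers in this post: the tickers and gains are made up (drawn from a normal distribution I picked arbitrarily), and the function name `make_randco` is just for illustration. It also assumes an equal-weight buy-and-hold fund, whose gain is simply the average gain of its holdings.

```python
import random

def make_randco(returns, n_funds=25, fund_size=50, seed=0):
    """Build n_funds random equal-weight funds from a {ticker: gain} map."""
    rng = random.Random(seed)
    tickers = sorted(returns)
    funds = []
    for _ in range(n_funds):
        holdings = rng.sample(tickers, fund_size)  # no repeats within a fund
        gain = sum(returns[t] for t in holdings) / fund_size
        funds.append((holdings, gain))
    return funds

# Made-up gains for 500 placeholder tickers (NOT real market data).
data_rng = random.Random(42)
returns = {f"STOCK{i:03d}": data_rng.gauss(0.58, 0.60) for i in range(500)}

# Equal-weight benchmark: the average gain across all 500 stocks.
benchmark = sum(returns.values()) / len(returns)

funds = make_randco(returns)
n_beating = sum(gain > benchmark for _, gain in funds)
best_gain = max(gain for _, gain in funds)

print(f"benchmark gain: {benchmark:.1%}")
print(f"funds beating the benchmark: {n_beating} of {len(funds)}")
print(f"best fund gain: {best_gain:.1%}")
```

With real data, `returns` would map each S&P 500 ticker to its gain over the period, and everything else stays the same.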

Let's verify this claim with real numbers. To make calculations easy, we place equal weighting on the holdings. The real S&P 500 is weighted by market capitalization, but fortunately there is an equal-weight ETF: RSP. From Jan. 27, 2012 to Jan. 30, 2015, RSP gained 58%. Let's see how RandCo compares.

Overall, RandCo gained 60% over the same time period, a slight beat over RSP. But one of its 25 funds gained 76%. That's 18 percentage points above RSP, or 31% higher in relative terms (76/58 ≈ 1.31). Here are its holdings: LUV ZMH MNST STX SYK PM AGN WFM CMI LYB KR NWL CMA ORCL CTL ABT SNDK SWK BBT EIX CELG BLL DPS R AMAT ROST ROK WFC AMZN AEE ESS PFG PPL LNC CHRW GM FLIR SRCL APD SCG LEN V BA TXT D VMC MO RF CA.

Interestingly, about half (13) of the 25 funds outperformed RSP and the rest underperformed. How statistically likely. But we won't share the underperformers with CNBC :-)

I also simulated 100,000 different incarnations of RandCo. The average gain over the same time period? 58%.
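To illustrate why the average lands right on the benchmark, here is a rough sketch of that kind of simulation. Again, this is not the original code: the stock gains are synthetic, `randco_gain` is a name I made up, and I use only 2,000 incarnations so it runs quickly in plain Python.

```python
import random

def randco_gain(gains, rng, n_funds=25, fund_size=50):
    """Overall gain of one RandCo incarnation: the mean of its fund gains."""
    fund_gains = []
    for _ in range(n_funds):
        holdings = rng.sample(gains, fund_size)  # one random equal-weight fund
        fund_gains.append(sum(holdings) / fund_size)
    return sum(fund_gains) / n_funds

rng = random.Random(7)

# Synthetic gains for 500 hypothetical stocks (NOT real market data).
stock_gains = [rng.gauss(0.58, 0.60) for _ in range(500)]
index_gain = sum(stock_gains) / len(stock_gains)

# The post used 100,000 incarnations; 2,000 keeps this sketch fast.
n_incarnations = 2_000
sim_gains = [randco_gain(stock_gains, rng) for _ in range(n_incarnations)]
average_gain = sum(sim_gains) / n_incarnations

print(f"equal-weight index gain: {index_gain:.1%}")
print(f"average RandCo gain:     {average_gain:.1%}")
print(f"range across incarnations: {min(sim_gains):.1%} to {max(sim_gains):.1%}")
```

Individual incarnations land above or below the index, but their average converges to it, since every stock is equally likely to end up in any fund.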

Every now and then, I'll run into a list of mutual funds that beat the S&P 500 over 5 years or over 10 years. Whenever I see it, I think about RandCo and its 31% beat. Is it all just random chance? My bet is on yes.

The Data-Driven Offensive Coordinator

I was listening to the Tony Kornheiser podcast a while ago, and he suggested that the Redskins go after a sabermetrics-minded coach. NFL scouting is less data-oriented than baseball and basketball, and Kornheiser was suggesting that maybe a Billy Beane-like figure could shake things up for the Redskins.

The suggestion got me thinking. Sabermetrics for scouting is easy. But could a machine do play calling better than an offensive coordinator? Not something like the play calling in Madden, but a real data-analytics-based solution. The nature of football, with stoppages between plays, makes it a natural fit.

Here is how it could work. It requires a lot of data that isn't readily available today, but maybe one day. First, offline, gather as much historical data on individual plays as possible. Each play contains features such as:

  1. Down
  2. Yards to go
  3. Clock
  4. Field position
  5. Score
  6. Home/away
  7. Weather
  8. Opponent
  9. Defensive formation
  10. Defensive personnel (maybe even down to the individuals)
  11. Defensive coordinator/team
  12. Offensive formation
  13. Offensive personnel
  14. Offensive play

Let's say you could gather the above for every single play for your team in the past X seasons. You could in theory learn a function that takes the above input features and outputs yards gained. There is a question of how much data you have or need (especially if you want to build a different model per opponent). But let's suppose you get it done.

Then, at runtime (aka play calling time), some assistant coach can enter all the observable features. They would have to watch the defense line up to get the formation and personnel. This is a bit tricky since the defense usually waits for the offense. Anyway, let's assume it's doable. Then it's a simple matter of applying the model to every single play in the playbook and seeing which one returns the most predicted yards gained.
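The runtime step can be sketched in a few lines of Python. Everything here is hypothetical: `Situation` carries only a handful of the features listed above, the playbook has three plays, and `predict_yards` is a toy stand-in with invented numbers where the learned model would go.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Situation:
    """A small, illustrative subset of the observable features."""
    down: int
    yards_to_go: int
    field_position: int        # yards from own goal line
    defensive_formation: str

PLAYBOOK = ["inside_run", "screen_pass", "deep_pass"]

def predict_yards(situation, play):
    """Toy stand-in for the learned model; all numbers are invented."""
    base = {"inside_run": 3.5, "screen_pass": 5.0, "deep_pass": 7.0}[play]
    if situation.defensive_formation == "blitz" and play == "screen_pass":
        base += 3.0   # pretend screens do well against a blitz
    if situation.defensive_formation == "prevent" and play == "deep_pass":
        base -= 5.0   # pretend deep passes struggle against prevent
    return base

def call_play(situation, playbook, model=predict_yards):
    """Score every play in the playbook and return the best one."""
    return max(playbook, key=lambda play: model(situation, play))

third_and_long = Situation(down=3, yards_to_go=8, field_position=40,
                           defensive_formation="blitz")
print(call_play(third_and_long, PLAYBOOK))
```

Swapping in a real model only means replacing `predict_yards`; the selection step is just an argmax over the playbook.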

This offense would be purely data-driven, no emotions. It would probably go for it on 4th down more often than a human being would. But somehow I feel like it could revolutionize the game, even more than sabermetrics did for baseball.

On Being Data Driven

"Data driven" is a buzzword many people/teams/companies throw around these days. Like many other buzzwords, it's often misused and misunderstood.

To many, "data driven" equals AB testing. They set up a live AB test, look at metrics (e.g., click-through rate or revenue), and make a launch decision based on which treatment has the bigger number.

While valid, this is only a small part of what being data driven means.

What it really means is a set of practices that run through the entire product cycle. Most of them seem obvious, but it's funny how the obvious is often forgotten.

  • Rely on data in product planning. Often, one is faced with many possible ways to iterate on a product and has to prioritize amongst them. Instead of relying purely on instinct/experience, add data to the mix. Use analysis to figure out which areas have the biggest gaps, which areas have the most to gain, etc.
  • Invest in metrics infrastructure. Many companies have analysts whose full-time job is to run data warehouse queries and make pretty charts, often the same ones over and over. Though the charts can be useful or even critical, the process does not move the org forward. Instead, hire engineers to build the infrastructure to automate those charts. Use the analysts to discover new charts that should be automated.
  • Focus on data quality. Pick any internet company and its data warehouse is full of logs. Raw, dirty, noisy logs. Every team that wants to use data has to clean those logs from scratch, often in different ways. As a result, different people will come up with different numbers for the same question. Instead, have a single source of cleaned logs that can be universally accepted. If different teams have different definitions of "clean", use the lowest common denominator.
  • Look beyond the top-line metrics. With AB testing, some people just look at the top-line metrics such as revenue or clicks and make a decision based on that. But when asked why the metrics are the way they are, the answer is often just a hypothesis. Validate that hypothesis with more data. If that data is not available, make it available and understand the cause behind the top-line metric.

Facebook vs. Flickr vs. 500px

About a year and a half ago, I decided to ditch Flickr as my primary method of picture sharing online. Since more and more of my friends were using Facebook exclusively for everything, I started to post all my pictures to Facebook.

I could tag people, places, things, etc. My photos were getting comments from friends rather than strangers. And although the Facebook picture viewing experience sucked back then, I decided the pros outweighed the cons.

Fast forward to now. I'm still on Facebook but I'm feeling the pains. Album, set, and tag management in Facebook is nonexistent. There are many bugs, especially when photos are put into albums. Though the actual picture viewing experience has improved, everything else is still pretty old.

In addition, I'm starting to miss the occasional comment from a stranger on my photos. I needed another way to share my photos.

So, I looked around and 500px seemed like the "in" thing. It looks like a more beautiful Flickr with fewer random photos.

I posted a few random photos on there, and voila, there were comments almost instantly! But as I looked further, it was basically a circle jerk within 500px to promote each other. None of the comments were actually useful.

So now I'm lost again. Should I just live with Facebook? Should I try out 500px a little more? Should I go back to Flickr? Should I give Google+ a try?

Settling Down

Signed the lease on our more permanent place yesterday. Found a great townhouse in Sunnyvale, 3 bedroom, 2 bath, a couple of patio areas, awesome living room with floor-to-ceiling glass walls. Going to be pretty sweet. Moving there in roughly 2 weeks, can't wait.

I've been using the Magic Trackpad lately. Before, I thought it was kind of a lame device. Basically yet another attempt by Apple to make some cash. But after using it for a few days, I get it now. It lets me do all the multi-finger gestures I do on a laptop on a desktop. For example, I use the 2-finger scroll and 4-finger Exposé a lot. The 3-finger drag is also pretty useful.


Just finished my second day of work at Groupon today. Lots of learning happening; very exciting. Feels a little weird not being inside the MS world. Bash/Vim commands are slowly coming back to me. I find myself pressing key combinations with some kind of muscle memory and they actually do things that I want! Sucks that I lost my .vimrc and .bash_profile from those days. It's going to take a while to get all that configured right.

For the past week, I've been driving a rented Nissan Versa while waiting for my actual car to be transported here. I've grown attached to the thing. It's weak but cute. Feels like a toy. I like throwing it around corners at speeds that almost feel like I'm going to roll over. Parking is also way easier with a tiny car.

New Old World

After spending almost 3 years in Seattle, I'm now back in California, the Bay Area to be exact. The wife and I flew down yesterday and are now living in Mountain View. I quit Microsoft and will be joining Groupon next week. The wife is transferring to the Microsoft office down here.

Lots of changes but it also feels like we never left. The wife's previous job was in the Bay Area and I spent a few summers here as well. Everything around has that familiar feel to it. Reduces the stress of moving I suppose.

Along with the move back, I'm also going to reintroduce a couple of old hobbies into my life. First is music. I haven't seriously listened to music, old or new, in at least a year. I almost exclusively listen to podcasts these days. It's good and bad I suppose. I keep up with current happenings with podcasts but a lot of them are redundant (anyone else feel like 90% of the TWiT network is just reverberation of the different shows?). I've been missing out on a lot of new music (Mogwai, Red Sparowes, the whole "chill-wave" scene, etc).

Second is photography, specifically film photography. I haven't touched a film camera in ages. I basically just use my Panasonic Micro 4/3 these days. Kinda sad to have all that equipment sitting around gathering dust. In the next couple of weeks, I'm definitely going to break out one of the medium formats, shoot a couple of rolls, and develop them at home. Let's just hope I still remember how to load film, how to meter, ... :-)

One last thing. I've decided to stop subscribing to cable TV. With the HD, DVR, HBO, etc. packages, it was adding up to something like $100/month just for TV, even though I just watch a handful of shows. Ridiculous. Instead, I'm going to start subscribing to Hulu Plus on the PS3. I already have Netflix on that. Add ESPN3 on the Xbox, that's basically 90% of what I watch these days.

Finally Back

I forgot the password to my stupid blogger form and couldn't recover it for the life of me. Got them to reset it for me and now I can finally write to this! Some random musings...

  • Red Dead Redemption is a pretty awesome game
  • Panasonic GF1 is a pretty decent camera but I'm having a hard time seeing it replacing the 5D mk II
  • Kobe sucks. Fisher sucks. Lakers suck.
  • Still using my iPad every day, not a fad
  • Tired of the iPhone though, looking to jump ship
  • Listening to podcasts at 2X speed at the gym, a very good idea