Solving XC Madness: Developing New Rating System

Oct 14, 2015

The Problem:

In cross country we always run into the same issue. The only accurate way to compare teams/athletes is in head-to-head matchups. It's fun to play around with times and see what you can do with that data, but ultimately, when comparing athletes, xc times don't carry much weight.
I believe fixing this also requires a culture change in Texas. The TX running community puts a lot more importance on xc times than many other areas of the country do.

Limited Options:

Speed Ratings: Fantastic system developed by Bill Meylan of tullyrunners.com in NY. Incredibly precise but virtually impossible to accurately replicate.
Butler Ratings: Developed by Jimmy Butler and used in partnership with FloTrack to accurately rate and predict collegiate athletes 2010-2011. Again, virtually impossible to replicate.

Next Step:

Before I got into running, my athletic background was in xc skiing. After mulling over possible directions to go, I kept coming back to the FIS World Cup points system and Jimmy Butler's xc ratings. Both use different methods to rate performances based on head-to-head matchups.

The Solution:

I spent a lot of time trying to come up with a viable and time-effective solution. So far, I've failed miserably on the time-effective part, but in some early tests the system has proven to be relatively accurate. I'll try to explain it without nerding out too much and diving into the gritty/boring details.

Lower rating = better
Time relative to course distance is irrelevant = Avoids dealing with inconsistencies in course distance reporting.
Ratings are given by race, not meet = Every race at a meet rated separately
Winner of every race is assigned a 0 base score. Using the formulas, increasing values are then assigned to the rest of the field based off of how many seconds back they finished from 1st.
Penalty: derived for each race based on the quality of the competition
Base value + penalty = athlete race rating
Athlete rankings are then based on each athlete's season average rating

That's a very very very simplistic explanation, but gives the general overview.
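
To make the overview above a little more concrete, here is a minimal Python sketch of the flow: base score from seconds behind the winner, plus a per-race penalty, averaged over the season. The actual formulas aren't given in this post, so everything specific here is an assumption for illustration only: the linear `points_per_second` conversion, the penalty values being passed in by hand rather than derived from the quality of the field, and the function names themselves.

```python
from statistics import mean

def race_ratings(finish_times, penalty, points_per_second=1.0):
    """Rate one race. The winner gets a base score of 0; everyone else gets a
    base score that grows with seconds behind the winner.
    ASSUMPTION: a simple linear conversion, not the formulas used for the rankings."""
    winning_time = min(finish_times.values())
    return {
        athlete: (seconds - winning_time) * points_per_second + penalty
        for athlete, seconds in finish_times.items()
    }

def season_averages(races):
    """Average each athlete's race ratings across the season (lower = better)."""
    per_athlete = {}
    for finish_times, penalty in races:
        for athlete, rating in race_ratings(finish_times, penalty).items():
            per_athlete.setdefault(athlete, []).append(rating)
    return {athlete: mean(ratings) for athlete, ratings in per_athlete.items()}

# Two made-up races: (finish times in seconds, hypothetical race penalty).
# A smaller penalty stands in for a stronger field.
races = [
    ({"A": 900.0, "B": 912.0, "C": 930.0}, 10.0),
    ({"B": 905.0, "C": 915.0}, 25.0),
]
averages = season_averages(races)
ranking = sorted(averages, key=averages.get)  # best (lowest) rating first
```

Note that B wins the second race outright but still picks up a 25.0 rating from it, because that race carries the bigger penalty. That is the point of the penalty: winning a weak race shouldn't rate the same as winning a loaded one.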

Now What?

I built spreadsheets that have around 6k female athletes and 8k male athletes from the 2015 season, then compiled 1000s of performances from 2015 xc, plus a few races from 2014, and began crunching numbers. After weeks of wanting to throw my computer out the window, things finally began to operate relatively smoothly.

Test #1:

Nike South Invitational:

By taking the athlete ratings, I scored the Nike South boys elite race based off of the teams in the race. Projected vs actual scores and finish place can be seen in the image to the right.
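
For anyone curious what "scoring a race off the ratings" can look like, here is a rough Python sketch. It assumes standard xc team scoring (top five score, up to seven per team count toward places, incomplete teams are dropped, low score wins) and simply orders the field by rating. The function name, parameters, and sample data are all made up for illustration; this is not necessarily the exact process used for the projections.

```python
from collections import defaultdict

def project_team_scores(entries, scorers=5, max_counted=7):
    """entries: list of (athlete, team, rating) tuples, lower rating = better.
    Returns {team: projected score} under standard xc scoring (an assumption)."""
    by_team = defaultdict(list)
    for athlete, team, rating in entries:
        by_team[team].append((rating, athlete))

    # Teams without a full scoring squad don't get a team score.
    complete = [t for t, runners in by_team.items() if len(runners) >= scorers]

    # Each complete team counts at most its best `max_counted` runners.
    counted = []
    for team in complete:
        for rating, athlete in sorted(by_team[team])[:max_counted]:
            counted.append((rating, athlete, team))

    # Projected finish order is just the rating order.
    counted.sort()
    places = defaultdict(list)
    for place, (_, _, team) in enumerate(counted, start=1):
        places[team].append(place)

    # Team score = sum of the places of its scoring runners.
    return {team: sum(p[:scorers]) for team, p in places.items()}

# Tiny made-up field, scored 2-deep with 3 counted so the example stays short.
entries = [
    ("a1", "North", 12.0), ("a2", "North", 22.0), ("a3", "North", 55.0),
    ("b1", "South", 20.0), ("b2", "South", 25.0),
    ("c1", "East", 5.0),  # East has no full squad, so it is left out
]
scores = project_team_scores(entries, scorers=2, max_counted=3)
```

In this toy field North projects to 4 points (places 1 and 3) and South to 6 (places 2 and 4). East's lone runner has the best rating in the field but is removed from team scoring, just like an individual entry would be.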

Test #2:

Austin Westlake Invitational:

Using the same process, I scored the Girls Varsity race at the Austin Westlake invite. Projected vs actual scores and finish place can be seen in the image to the right.

Takeaways:

Pretty dang accurate. Anything can happen in xc, so race projections are always a shot in the dark. They assume every team is racing its full varsity squad, among many other factors.
It's brutally time-consuming, so not very practical in that respect.
Still testing and tweaking. Data so far is still fairly limited to the 90 or so meets I've been able to add. Focusing on varsity races and a few of the larger JV races when I can find the time.
Need to develop a program to handle the data, as it is already overloading my Excel spreadsheets.

Check it Out:

The latest Texas Saucony Flo 50 Rankings are based off these ratings.

Boys	Girls
Top-25 Teams	Top-25 Teams
Top-25 Individuals	Top-25 Individuals