There's been a bunch of chat following Winelands and 99er about "low betas", seeding calcs, how to move up, etc, etc.
I've done some reading on the PPA website about how seeding works, and I thought some worked examples, with explanations, might be helpful, so here goes.
If you want to argue with me, fine, but please do it with maths, not with "I feel like 'x'". I'm basing this purely off the logic as laid out by PPA, which I assume they follow. I'm also 99% sure my maths is right, but correct it if you see a glaring error.
I'll work through the PPA page point by point, noting where they no longer use certain points.
1. Ride weighting
This seems to have fallen out of use. From everything I've seen since starting riding in 2020, your best index result, after penalties, contributes 100% to your seeding.
2. Establish a base event
Unclear exactly how this contributes, but as they state this is the last CTCT, we'll just go with it (becomes more obvious in the Tadej example below).
3. Calculate adjusted winning times and beta for a race
This is the crux of the matter, and where most of the discussion lies. There's a false perception that both of these steps are subjective, and the main point of this post is to show that they are not. I also strongly believe that PPA do themselves a massive disservice by labelling beta as "difficulty", since this suggests some subjectivity, which very clearly isn't there. It's all too easy for anecdotal "I felt this race was harder than that race" to enter into the discussion, when it has no basis in the actual calcs, as we will show below.
The goal of this step is to say "on average, a rider coming into an event with an index of 'x' should get an index of 'x' for the event" (read that a few times, it's important).
Obviously, hundreds/thousands of riders do an event, and it's not mathematically possible to fit a line such that everyone's index remains the same. (Also, then your index would never change, which is obviously undesirable). That's where the linear regression comes in. Sounds complicated, really isn't, you just let Excel do it for you.
An important point to note is that, in a race with a beta of 1 and where the winner of the race had an index of 0 before the race, your index will represent the percentage over the winning time. (Eg: winner takes 200 minutes, you take 220 minutes, you did 10% more, your index is 10. If you take 300 minutes, you did 50% more, your index is 50).
Let's work through some examples to see how it plays out.
For all examples, we're going to use the fictitious "Tour de Bikehub" as our event. Assume only 6 riders, since that makes the maths easier to follow. The indexes shown in the tables represent seeding indexes prior to the event. Times are given in minutes to make things easier to math.
The base case
In this case, the winner had an index of 0, and did 150 minutes.
Everyone else, conveniently, did exactly their index worse (so the person with an index of 10 did 10% worse, or 15 minutes worse = 165mins).
We can plug those times and indexes into Excel and draw a chart, with index along the x-axis and time on the y-axis.
Excel then allows us to draw a linear trendline through those points, and to plot the equation of "y = mx + c" (high school maths reminder: m = the gradient/slope of the line, c = the y-intercept, which in this case is the "predicted winning time of a 0-indexed rider")
We can pull those values out as well using the "LINEST" formula in Excel.
For our purposes, "c" represents the "adjusted winning time" we're familiar with.
We need to do a little maths on "m" to get to "beta", but it's not hard: beta = m/c * 100 (which in this case equates to 1).
So we now, for this event, have a beta of 1 and an adjusted winning time of 150mins.
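If you'd rather not do this in Excel, here's a rough Python sketch of the same trendline fit. The full set of rider indexes (0 to 50 in steps of 10) and the name fit_event are my assumptions for illustration; the post only spells out the 0-index and 10-index riders.

```python
def fit_event(indexes, times):
    """Ordinary least-squares line through (index, time) points.

    Returns (m, c, beta): the slope, the y-intercept (the adjusted
    winning time), and beta = m / c * 100 as defined above.
    """
    n = len(indexes)
    mean_x = sum(indexes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in indexes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(indexes, times))
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c, m / c * 100


# Assumed base-case field: everyone finishes exactly "their index" percent
# behind the 150 minute winner.
indexes = [0, 10, 20, 30, 40, 50]
times = [150, 165, 180, 195, 210, 225]

m, c, beta = fit_event(indexes, times)
print(f"m={m:.2f}, adjusted winning time={c:.1f} mins, beta={beta:.2f}")
```

With those assumed numbers the fit should come back with m = 1.5, an adjusted winning time of 150 and a beta of 1, the same as the Excel trendline.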
We can then proceed to step 4, and calculate new indexes for all riders:
New index = (Time / WinningTime - 1) / Beta * 100
If we do that for each rider, we get:
Nothing changed. As expected, because this was the base case, where the best possible seeded rider won and everyone else performed to expectation.
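The same step as a small Python sketch, using the adjusted winning time (150) and beta (1) from the base case. The rider labels, and the times for everyone past the 10-index rider, are assumed for illustration.

```python
def new_index(time, adjusted_winning_time, beta):
    """New index = (Time / WinningTime - 1) / Beta * 100."""
    return (time / adjusted_winning_time - 1) / beta * 100


adjusted_winning_time = 150  # "c" from the base-case trendline
beta = 1.0                   # base-case beta

# Assumed base-case times in minutes (rider labels are illustrative).
times = {"A": 150, "B": 165, "C": 180, "D": 195, "E": 210, "F": 225}

new_indexes = {
    rider: new_index(t, adjusted_winning_time, beta) for rider, t in times.items()
}
for rider, idx in new_indexes.items():
    print(f"{rider}: {idx:.1f}")
```

Each rider should land back on the index they came in with (0, 10, 20 and so on), give or take floating-point rounding.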
But hopefully now we understand the maths, so let's go to example 2.
Example 2: Missing elites
Let's keep the same cohort as the last example, and the same times, but this time let's say the rider seeded 0 doesn't rock up.
So the rider seeded 10 wins, and does it in the same time they did in example 1:
Doing the same Excel gymnastics as last time, we get this chart:
And these params:
Which all looks quite familiar? Again, this is expected: the winning time is adjusted down to what it would have been if a 0 index rider had shown up. The beta, however, remains unchanged at 1, since the performances relative to that time are in line with the index expectations.
So we, again, don't have any index changes:
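A quick sketch to check this without Excel: drop the 0-index rider from the (assumed) base-case field and refit. The indexes and times for riders beyond the 10-index one are my assumption, and fit_event is just an illustrative helper name.

```python
def fit_event(indexes, times):
    """Least-squares line; returns (slope m, intercept c, beta = m / c * 100)."""
    n = len(indexes)
    mean_x = sum(indexes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in indexes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(indexes, times))
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c, m / c * 100


# Same assumed field as the base case, minus the 0-index rider.
indexes = [10, 20, 30, 40, 50]
times = [165, 180, 195, 210, 225]

m, c, beta = fit_event(indexes, times)
actual_winning_time = min(times)
print(f"actual winning time: {actual_winning_time} mins")
print(f"adjusted winning time: {c:.1f} mins, beta: {beta:.2f}")
```

The actual winning time is 165, but the intercept still extrapolates back to 150, which is exactly the "adjusted down" behaviour described above.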
Example 3: The slow day
Let's assume the race was just really slow, and the times look like this:
Chart:
As you can see, some dots are above the line, some are below. In general, those above the line won't improve their index, those below the line will.
We end up with a huge beta of 1.87 (since in general riders were well over the "x% more than the winning time" index heuristic) and riders C and E improved their indexes by finishing "below the trendline"
Example 4: The monster in Group D
Let's now assume the monster rider A decides to drop back to D, and pulls all his mates to a faster finishing time.
They improve the D time from 195mins to 175mins (they catch C, who started 5 mins before them), everyone else remains the same (so B wins again).
Our chart and params now look like this (note how far some points are from the trendline now):
As you can see, this leads to a slightly adjusted winning time, but a very low beta (shades of 2024 Tour de PPA here).
This helps B, C, and D, but penalises E & F.
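Here's a sketch of this scenario in Python. Big caveat: the exact inputs are my reconstruction, not the original sheet. I'm assuming A (index 0) finishes with the D group in 175, and that E and F have indexes of 40 and 50 with times of 210 and 225. Those assumptions do reproduce the pattern described (low beta, B/C/D better off, E/F worse off), but the exact figures may differ from the original.

```python
def fit_event(indexes, times):
    """Least-squares line; returns (slope m, intercept c, beta = m / c * 100)."""
    n = len(indexes)
    mean_x = sum(indexes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in indexes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(indexes, times))
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c, m / c * 100


def new_index(time, adjusted_winning_time, beta):
    """New index = (Time / WinningTime - 1) / Beta * 100."""
    return (time / adjusted_winning_time - 1) / beta * 100


# rider: (index before the event, time in minutes). ASSUMED values:
# A rides with D and both finish in 175; E and F are indexed 40 and 50.
riders = {
    "A": (0, 175),
    "B": (10, 165),
    "C": (20, 180),
    "D": (30, 175),
    "E": (40, 210),
    "F": (50, 225),
}

m, c, beta = fit_event(
    [idx for idx, _ in riders.values()],
    [t for _, t in riders.values()],
)
print(f"adjusted winning time: {c:.1f} mins, beta: {beta:.2f}")

updated = {rider: new_index(t, c, beta) for rider, (_, t) in riders.items()}
for rider, (old, _) in riders.items():
    print(f"{rider}: {old} -> {updated[rider]:.1f}")
```

Under these assumptions the beta comes out at roughly 0.67 and the adjusted winning time at roughly 161 minutes, a little under B's actual 165.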
Fun example: Tadej comes to town
It's also possible for the winning time to be adjusted up. Imagine Tadej (who doesn't have a Racetec chip, or a PPA seeding index) decides to come do the Tour de Bikehub. He obviously smashes everyone, including the 0 index rider.
In this case, we actually just don't include Tadej's time (since he doesn't have an index and thus can't be included in the calcs).
Chart and params:
Winning time is adjusted up from 135mins to 150mins (ie, what the actual 0 index rider did).
Tadej now has an index of -10, unlikely but possible (until the next CTCT he wins, at which point he becomes the "baseline" or "0 index rider").
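A sketch of how that exclusion could look in code: riders without an index are left out of the fit, then scored against the resulting line like everyone else. Using None for "no index", the assumed base-case field, and the helper names are all my choices for illustration.

```python
def fit_event(indexes, times):
    """Least-squares line; returns (slope m, intercept c, beta = m / c * 100)."""
    n = len(indexes)
    mean_x = sum(indexes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in indexes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(indexes, times))
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c, m / c * 100


def new_index(time, adjusted_winning_time, beta):
    """New index = (Time / WinningTime - 1) / Beta * 100."""
    return (time / adjusted_winning_time - 1) / beta * 100


# (name, index before the event or None, time in minutes).
# Field other than Tadej is the assumed base case.
results = [
    ("Tadej", None, 135),
    ("A", 0, 150),
    ("B", 10, 165),
    ("C", 20, 180),
    ("D", 30, 195),
    ("E", 40, 210),
    ("F", 50, 225),
]

# Only riders who already have an index go into the regression.
seeded = [(idx, t) for _, idx, t in results if idx is not None]
m, c, beta = fit_event([idx for idx, _ in seeded], [t for _, t in seeded])

tadej_index = new_index(135, c, beta)
print(f"adjusted winning time: {c:.1f} mins, beta: {beta:.2f}")
print(f"Tadej's new index: {tadej_index:.1f}")
```

The actual winning time is 135, but the line is fitted without Tadej, so the adjusted winning time stays at 150 and his index comes out at about -10.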
Conclusion
That was long, but I hope it's helpful to understand just how little subjectivity there is in calculating the seeding numbers. Obviously with thousands of riders doing all sorts of performances the linear regression becomes less easy to interrogate, but it should "average out".
If I've missed anything, or anything is unclear, please let me know so I can update this post.
If you want to play around with the calcs, make a copy of this Google sheet and go nuts