
Posted

But the "easiness" of the race is exactly what the beta value measures. The alpha (in this case the adjusted winner's time) measures the quality of the field.

The problem with your suggestion of just taking the race difficulty into account is that in a very small race, the winner will get a seeding index of 0 (or 1, whatever the starting point is) regardless of who he is competing against or how fast he actually is.

In very simplistic terms, what it says is that if you are the best-seeded person at the race, you are expected to win, so winning will not change your seeding much. To me that makes sense.

In principle I agree, but the Paarl race that started this thread is not exactly a small race: it sold out with more than 1,000 finishers. Yet the 12-minute time adjustment makes the race useless as a seeding exercise for most riders at the sharp end of the field (not just the Milkys among us who are hoping for an index of around 0). If there aren't going to be many fun rides this year with stronger fields, what is the point of such a massive adjustment?

I share Eddie_V's frustration: not everyone has the time and money to do every race until they stumble upon the one where the seeding computer has a glitch in their favour.
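To make the alpha/beta discussion concrete, here is a minimal sketch. It assumes (this is not confirmed as Racetec's published formula) that a finish time t maps to an index via t = T_adj × (1 + beta × index / 100), which matches the thread's description of index 100 as a rider taking twice as long as the best rider on an average course (beta = 1). The 180-minute winning time and the chasing rider are hypothetical; the 12-minute adjustment is the Paarl figure.

```python
def seeding_index(finish_min: float, adjusted_winner_min: float, beta: float) -> float:
    """Race index under the ASSUMED model t = T_adj * (1 + beta * index / 100)."""
    return 100.0 * (finish_min / adjusted_winner_min - 1.0) / beta


winner = 180.0   # hypothetical winning time in minutes
chaser = 210.0   # hypothetical rider 30 minutes behind the winner
beta = 1.0       # hypothetical "average course" difficulty

# No adjustment: the actual winner is the index-0 benchmark.
print(seeding_index(winner, winner, beta))
print(round(seeding_index(chaser, winner, beta), 2))

# 12-minute adjustment: the benchmark becomes a phantom rider 12 minutes faster.
adjusted = winner - 12.0
print(round(seeding_index(winner, adjusted, beta), 2))
print(round(seeding_index(chaser, adjusted, beta), 2))
```

Under these assumptions the adjustment lifts everyone's index: by about 7 points for the winner and about 8 for the rider 30 minutes back.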

Posted

So what Racetec is saying is: if you have an index of 10 and you win the race, regardless of whether you are faster, you should still have an index of 10 after this particular race. So let's adjust the winner's time so the line will fit.

Not strictly true. That would only apply if you were the top-seeded person at the race. They basically use seeding at the start of the race to quantify the "quality" of the field and adjust the winner's time accordingly, so that if the best-seeded person had a seeding of 10 before the race and they win it, they will still have a seeding of approximately 10 after the race.

To me the approach makes sense intuitively. What they don't publish, which I would find interesting as a statistician, are things like parameter significance and the standard errors of the estimates, so we could see how well the model actually fits reality.
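The adjustment described above can be sketched by solving an assumed relationship t = T_adj × (1 + beta × index / 100) for T_adj, so that the top seed's winning time maps back to their pre-race index. This is a simplified illustration of the idea in the post, not Racetec's method (their website describes a regression over the field); the function names and numbers are hypothetical.

```python
def seeding_index(finish_min: float, adjusted_winner_min: float, beta: float) -> float:
    """Race index under the ASSUMED model t = T_adj * (1 + beta * index / 100)."""
    return 100.0 * (finish_min / adjusted_winner_min - 1.0) / beta


def adjusted_winner_time(winning_min: float, top_seed_index: float, beta: float) -> float:
    """Phantom index-0 time chosen so that the winner keeps the top seed's index.

    Solves winning_min = T_adj * (1 + beta * top_seed_index / 100) for T_adj.
    """
    return winning_min / (1.0 + beta * top_seed_index / 100.0)


# Hypothetical race: the top seed (index 10) wins in 165 minutes on an average course.
t_adj = adjusted_winner_time(165.0, 10.0, 1.0)
print(round(t_adj, 2))                              # phantom winner's time
print(round(seeding_index(165.0, t_adj, 1.0), 2))   # winner's race index
```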

Posted

 


Indeed... I wonder how they account for bunch finishes etc., which formula is used for the correlation/regression, which distribution they model it on, and how they handle interval estimation.

[Attached image: step-type graph]

Posted (edited)


From what I've read it seems like a simple linear regression of seeding index vs race time.

 

Handily, with linear regression you don't need to assume any particular distribution for inference if you have a lot of data points (>30-50 or so should get you into Central Limit Theorem territory), and the coefficient estimates are the same whether you fit the line with Ordinary Least Squares (OLS) or with Maximum Likelihood Estimation (MLE) under normally distributed errors.

I'd be very surprised if they didn't use Pearson correlation (assuming you look at the correlation separately rather than using r² as a "goodness-of-fit" measure).

I'm not sure why you would need any intervals when you have exact seeding and time results, and they have already defined a seeding index of 0 as the best cyclist and 100 as a cyclist who would take twice as long as the best cyclist on the average course.

 

So you don't really have to make that many assumptions.
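Here is a sketch of that simple linear regression in plain Python: an OLS fit of finish time against pre-race seeding index, with the classical standard errors and Pearson r asked about earlier in the thread. Reading the intercept as the adjusted winner's time and computing beta = 100 × slope / intercept assumes the relationship t = T_adj × (1 + beta × index / 100); that mapping, the function name and the sample numbers are assumptions, not published Racetec details.

```python
import math


def ols_fit(index, time):
    """OLS fit of time = intercept + slope * index, with classical standard errors."""
    n = len(index)
    if n < 3:
        raise ValueError("need at least 3 riders to estimate standard errors")
    mx, my = sum(index) / n, sum(time) / n
    sxx = sum((x - mx) ** 2 for x in index)
    syy = sum((y - my) ** 2 for y in time)
    sxy = sum((x - mx) * (y - my) for x, y in zip(index, time))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(index, time))
    sigma2 = sse / (n - 2)                  # residual variance, n - 2 degrees of freedom
    r = sxy / math.sqrt(sxx * syy)          # Pearson correlation
    return {
        "intercept": intercept,             # read as the adjusted winner's time
        "slope": slope,                     # minutes per index point
        "se_intercept": math.sqrt(sigma2 * (1.0 / n + mx ** 2 / sxx)),
        "se_slope": math.sqrt(sigma2 / sxx),
        "r": r,
        "r2": r * r,                        # equals R^2 for a one-predictor fit
        "beta": 100.0 * slope / intercept,  # only meaningful under the assumed model
    }


# Hypothetical pre-race indexes and finish times (minutes).
fit = ols_fit([5, 10, 20, 30, 40, 50], [158, 166, 181, 195, 212, 224])
for key, value in fit.items():
    print(f"{key}: {value:.3f}")
```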

Edited by Jehosefat
Posted


I would rather implement intervals if I were going to do block moves; otherwise the error statistic would look horrible? And then recalculate each individual seeding once the winner's phantom time is established?

Posted


Well, it doesn't plot position vs time; it plots seeding vs time, so I doubt you would see as big a step-type graph as the one in your earlier post. Without having seen the data I can't say for sure, but in my experience the whole bunch almost never stays together for the whole race; it nearly always breaks up unless it's a really flat race with no wind (I've been seeded everywhere from AL to GL in my short cycling career). Even if the steps do exist, they would produce the correct sort of behaviour: if you finish in a bunch where most people are seeded better than you, your seeding will improve, and if you finish in a bunch where most people are seeded worse than you, your seeding will deteriorate. Agreed that the error function would look a bit off, though.

That's exactly what it does. It uses current seeding and the race finish times to calculate both the relative difficulty of the race (beta) and the quality of the field (adjusted winner's time), and then it uses those two parameters to calculate a new seeding for everyone who participated.
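The update described in that reply can be sketched as follows, again assuming the t = T_adj × (1 + beta × index / 100) relationship, so a rider's new index is just their finish time read back through the fitted line. Rider names and times are hypothetical; riders C and D finish in the same bunch, which shows the bunch behaviour described above.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def rescore_race(results):
    """results: list of (rider, pre_race_index, finish_minutes).

    Returns (adjusted winner's time, beta, {rider: new race index}) under the
    ASSUMED model t = T_adj * (1 + beta * index / 100).
    """
    t_adj, slope = fit_line([idx for _, idx, _ in results], [t for _, _, t in results])
    beta = 100.0 * slope / t_adj
    new = {rider: 100.0 * (t / t_adj - 1.0) / beta for rider, _, t in results}
    return t_adj, beta, new


race = [("A", 10, 165.0), ("B", 20, 180.0), ("C", 25, 187.5),
        ("D", 35, 187.5), ("E", 40, 210.0)]
t_adj, beta, new = rescore_race(race)
print(f"adjusted winner's time {t_adj:.1f} min, beta {beta:.3f}")
for rider, pre, _ in race:
    print(f"{rider}: {pre} -> {new[rider]:.1f}")
```

In this made-up race C (pre-race 25) and D (pre-race 35) both come out at about 27: the better-seeded rider in the bunch gets worse and the worse-seeded one improves.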

Posted (edited)

Hi guys,

Any statisticians who want to play with the data to see how the winner's time is calculated?

I've given up for the moment; it doesn't make any sense in my small head.

Here is some scatter data from playing about with 99er seeding index vs Argus seeding:

[Attached image: scatter plot of 99er seeding index vs Argus seeding]

Edited by Karman de Lange
Posted

It would be nice if someone from PPA could answer that question; I am sure there must be at least one person from PPA on this forum.

Posted

You take a sample of the field and compare that to the results of the Cycle Tour. The theory goes: if Sam Gaze et al. raced the 99er, what time would they have done? There is no formula; it's a bit of estimation, as you need to take into account the beta of the field as well.

Hence why seeding for road is easier than MTB.

Posted (edited)


According to the website, there are no humans involved, which means there should be a formula...

 

"In statistical terms, a linear regression is performed for the event relative to the indexes of the people in the event who also rode one of the base events. This determines how much the winner’s time should be adjusted and what the difficulty factor “beta” should be. There is no subjectivity in this process – it is an automated calculation without human intervention"

 

Edited by Karman de Lange
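Read literally, the quoted description suggests something like the sketch below: fit the line only on riders who have an index from a base event, then score every finisher from it. Which index is used, how base events are chosen and the t = T_adj × (1 + beta × index / 100) mapping are all assumptions here; the data is hypothetical, with None marking a rider without a base-event index.

```python
from typing import Dict, List, Optional, Tuple

# (rider, index from a base event or None, finish time in minutes)
Result = Tuple[str, Optional[float], float]


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def score_event(results: List[Result]) -> Tuple[float, float, Dict[str, float]]:
    """Fit only on riders with a base-event index, then index every finisher.

    Uses the ASSUMED model t = T_adj * (1 + beta * index / 100).
    """
    anchored = [(idx, t) for _, idx, t in results if idx is not None]
    if len(anchored) < 2:
        raise ValueError("need at least two riders with a base-event index")
    t_adj, slope = fit_line([i for i, _ in anchored], [t for _, t in anchored])
    beta = 100.0 * slope / t_adj
    new = {name: 100.0 * (t / t_adj - 1.0) / beta for name, _, t in results}
    return t_adj, beta, new


# Hypothetical event: B and E have no base-event index, so they do not shape the line.
event = [("A", 10.0, 165.0), ("B", None, 172.0), ("C", 30.0, 195.0),
         ("D", 40.0, 210.0), ("E", None, 230.0)]
t_adj, beta, new = score_event(event)
print(f"T_adj {t_adj:.1f} min, beta {beta:.2f}")
for name, value in new.items():
    print(f"{name}: {value:.1f}")
```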
Posted


What data do you have? I think you would need seeding index at the start of a race and the race time for each participant for a particular race. Comparing seeding across races doesn't really measure the same thing.

Posted (edited)

I've pitched my theory for winning times and betas before. By now most riders come into a race with an existing index. My theory is that every rider's result is plotted on a graph with existing index on the X-axis and finish time on the Y-axis. A linear regression is done on this data to get the best-fit linear relationship. The Y-intercept is essentially the winning time, the time a theoretical rider with index 0 would ride, and the slope of the line is effectively the beta, showing how much your time should increase as your index does. If your performance plots below the line (faster than predicted for your index), your index improves; if it plots above the line, your index stays the same. I was fairly happy with this model until the 2020 season arrived. I've found it particularly difficult this season to get the indices I got in previous years, despite numbers showing this to be my strongest season yet, form-wise. I've seen this feeling confirmed in my close circles; winners of big races are also getting indexes that I got in previous years for less-than-stellar performances. Then came the refactor of the PPA groups, and I pulled out the foil hat.
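The model in this post can be sketched like this: fit the line, then compare each rider with the time the line predicts for their existing index. Since finish time is on the Y-axis, a rider below the line was faster than predicted; the sketch gives such a rider the index their time corresponds to and lets everyone else keep their existing index. That keep-the-better-index rule and the sample data are assumptions for illustration, not confirmed PPA rules.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def update_indexes(results):
    """results: list of (rider, existing_index, finish_minutes).

    Returns {rider: (position relative to the line, index after the race)}.
    """
    intercept, slope = fit_line([i for _, i, _ in results], [t for _, _, t in results])
    updated = {}
    for rider, existing, t in results:
        predicted = intercept + slope * existing   # time the line expects for this index
        if t < predicted:
            # Below the line (faster than expected): take the index this time corresponds to.
            updated[rider] = ("below line", (t - intercept) / slope)
        else:
            # On or above the line: keep the existing index (ASSUMED keep-the-better rule).
            updated[rider] = ("on/above line", existing)
    return updated


sample = [("A", 10, 163.0), ("B", 20, 182.0), ("C", 30, 193.0), ("D", 40, 212.0)]
for rider, (where, index) in update_indexes(sample).items():
    print(f"{rider}: {where}, index {index:.1f}")
```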

 

The biggest flaw with my model, and potentially the Racetec model, is the penalty. Consider your average cyclist. They're rolling into a new season, they've dusted off their road bike after staying fit on the mountain bike and the indoor trainer, and they're now sporting a shiny penalty of +6, while their fitness and form are probably only worthy of a penalty of +1. This average cyclist enters their first race and puts up a performance worthy of an index with a +1 penalty. That finish time goes into the data pool with its +6 penalty. The statistical importance of this "average" rider is that it shifts the regression to the right, because on average riders are carrying a higher pre-existing index than they should be. The effect is that the theoretical 0-index time shifts down the Y-axis, and the theoretical winning time drifts away from the actual winner's time. Riders' indices are then penalised on top of this artificially penalised index, and the cycle continues. Theoretically this should build and build, and riders will continue to struggle to get the indexes they had before.
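The penalty effect can be checked with a small deterministic sketch. Assume (hypothetically) that every rider races exactly to their true form, finish times follow t = T_adj × (1 + beta × index / 100), and every rider carries 5 extra penalty points (+6 carried vs +1 deserved). Regressing the same finish times on the inflated indexes moves the fitted line to the right, so its intercept, the theoretical index-0 time, drops; reading the times back through that line returns the inflated indexes, so in this toy setup the bias is reproduced rather than corrected.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


T_ADJ, BETA, EXTRA_PENALTY = 150.0, 1.0, 5.0   # hypothetical values
true_index = [10.0, 20.0, 30.0, 40.0, 50.0]
times = [T_ADJ * (1.0 + BETA * i / 100.0) for i in true_index]   # riders at true form

honest_intercept, honest_slope = fit_line(true_index, times)
carried = [i + EXTRA_PENALTY for i in true_index]              # index brought to the race
biased_intercept, biased_slope = fit_line(carried, times)
rescored = [(t - biased_intercept) / biased_slope for t in times]

print(f"index-0 time with true indexes:      {honest_intercept:.1f} min")
print(f"index-0 time with penalised indexes: {biased_intercept:.1f} min")
print("indexes handed back:", [round(i, 1) for i in rescored])
```

Here the index-0 time drops from 150 to 142.5 minutes even though the actual performances are unchanged.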

 

Perhaps I'm off base here; it's quite possible Racetec has a solution built into its algorithm, or theirs just isn't flawed like my guesstimated one. Perhaps I'm on point and the group shift was in fact a temporary band-aid to get around three okes lining up in the A group at PPA races. Thoughts? Is it a general feeling that decent seedings are harder to come by?

Edited by Jay56
Posted


The seconds are also adjusted, so there must be a formula.

Posted


Well, seeding indexes for "base events" are easy to calculate, as the beta and adjusted times are static.

So I have the Argus seeding times for each rider.

Then I have the finish time, per race, for each rider who did the Argus with a seeding index below 100.

Then, of course, you can calculate all the other seeding indexes for each race.

I've tried all variations of the linear regression, but in every case the Y-intercept never equals the winner's time and the slope never matches beta (check the graphs posted above).

But I last did stats about 20 years ago, so I'm a bit rusty.
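One possible reason for the mismatch, assuming the t = T_adj × (1 + beta × index / 100) relationship: the regression intercept is the time of a phantom index-0 rider, not the actual winner's time (unless the winner's index is 0), and the slope is in minutes per index point, so beta would come out as 100 × slope / intercept rather than as the slope itself. The sketch uses hypothetical static base-event parameters and made-up times placed exactly on such a line.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical static base-event (Argus) parameters and Argus times in minutes.
ARGUS_T_ADJ, ARGUS_BETA = 150.0, 1.0
argus_times = [162.0, 172.5, 187.5, 210.0]
argus_index = [100.0 * (t / ARGUS_T_ADJ - 1.0) / ARGUS_BETA for t in argus_times]

# Hypothetical 99er times for the same riders: phantom time 240 min, beta 1.2.
times_99er = [240.0 * (1.0 + 1.2 * i / 100.0) for i in argus_index]

intercept, slope = fit_line(argus_index, times_99er)
print(f"winner's actual time {min(times_99er):.2f} vs intercept {intercept:.2f}")
print(f"slope {slope:.3f} vs beta {100.0 * slope / intercept:.3f}")
```

Even with perfectly linear made-up data, the intercept sits well below the fastest actual time and the slope differs from beta, so neither mismatch on its own means the regression was done wrong.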
