Seeding: Who determines BETA?


daniemare

Posted

Shebeen quoted from the PPA website:

"The assumption is made that the same riders should have the same index for both events, so the winner’s time of the fun ride is now adjusted and the “beta” is calculated to achieve this."

 

In my non-statistical, but-I-can-feel-my-own-legs opinion, that assumption is only accurate for the sharp end of the field, and Pure Savage's comment proves this.

 

The same middle-of-the-field rider is far more likely to slog it out alone over the last 30km in the Tour de PPA than in the CTCT. Thus I will finish closer to the winner in the CTCT than I ever would in the Tour de PPA.

- 99ner > As a middle-of-the-field guy, try going up Vissershok and Odendaal Road and finishing within the same ratio to the winner as you would after the easy coast through Llandudno and Camps Bay.

- Tour de PPA > Try finishing largely on your own, because of the small field, into an ever-present headwind up Adderley for 20km, and ending in the same ratio to the winner as in the CTCT, where you can draft behind one faster group after another towards the end.

- Not to mention the extra kick you get from the general race atmosphere and support on the CTCT.

 

So regardless of the algorithm or logic the esteemed and knowledgeable Hubbers are trying to explain to me, FOR ME the CTCT is FAR easier than any other race that ventures into the loneliness of the windy outer Durbanville vlaktes, yet the BETA says otherwise.

 

I know I am not an E. I see the times of those Es in the 99ner, Tour de PPA etc.

 

Bottom line: I am not even close, and I will probably fall back a bit so that I still stretch myself but also enjoy the ride.

 

And PS - how was this CTCT, at 1.07, more difficult than the base race (which is the CTCT itself), given the weather conditions on the day? Is it being suggested that the weather can be even more perfect? Wow!

Posted

The way I see this, there might have been a sufficiently large number of riders who normally ride much better times but, due to circumstances, took it a bit easier, resulting in a relatively "slower" time than the computer would have predicted.

 

The other "theory" is the impact of the MTB riders on the seeding. When doing an MTB race I generally get a higher index than I get for road races, so if there are enough other riders like me, this could affect the overall beta, because relative to our expected seeding we are now riding slower.

 

Thus races where "fewer" MTB-seeded riders take part would probably come out at a lower beta (One Tonner), while races with more MTB-seeded riders would come out higher (i.e. CTCT).

 

But please don't make them change the system, as it helps me a lot ;-)

Posted

My 2c, based on my hobbyist analysis of the system and my need to understand this magic number that defines my quality as a bike rider, perhaps with some gaps filled in with how I would design the algorithm if I were designing the system. I don't believe anything is thumb-sucked. I believe every result is plotted with the finishing time on the Y axis against the rider's index coming into the race on the X axis, for every person with a previous index between 0 and 100. Linear regression could then be used to derive a straight line that would, on average, predict a rider's time from their index coming into the race. The importance of this line is that it is defined by its Y intercept and its gradient. The Y intercept, i.e. the time of a perfect rider starting the ride with a seeding index of 0, is the theoretical winning time. The gradient is effectively the beta. Conversely, this relationship between index and time could be used to calculate the index a rider should have had coming into the race to get the time they achieved.
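For anyone who wants to play with this guess, here is a minimal Python sketch of the regression as described in the post above. It is only an illustration of that hobbyist model, not the PPA's actual calculation. The function names (`fit_line`, `implied_index`) and the rider data are made up for the example, and times are assumed to be in minutes.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (index, time) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all prior indexes are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def implied_index(time, intercept, slope):
    """Invert the fitted line: the index that would have predicted this time."""
    return (time - intercept) / slope


# Hypothetical riders: index coming into the race, and finishing time in minutes.
prior_index = [5.0, 10.0, 20.0, 30.0, 40.0]
finish_minutes = [190.0, 200.0, 220.0, 240.0, 260.0]

# In this model the intercept plays the role of the theoretical winning time
# and the slope plays the role of the beta.
intercept, slope = fit_line(prior_index, finish_minutes)
print("theoretical winning time:", intercept)
print("gradient (beta in this model):", slope)
print("implied index for a 230 minute ride:", implied_index(230.0, intercept, slope))
```

Note that in this sketch the gradient is in minutes per index point, so it is not on the same scale as a published beta such as 1.07; how the real system scales it is not something the thread establishes.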

 

As for the beta being reported as based on the conditions, I can only guess that poor conditions perhaps affect a high-index rider more than a low-index rider. I can see that a similar system would have had issues if a rider's index simply reflected their best possible ride. I can see someone trying to solve this with an averaging system, but since we rarely reach our best performance on race day, poor results would always be factored into the index. Taking only a rider's best index wouldn't account for decaying form, so the penalty system is an effective way of retiring old results. So for a rider actively engaging in the system, the index is going to tend towards an index of their best performance, and on average, across a mass of riders engaging in the system, everyone's index will tend towards an index of their best performance.

 

If I'm close, then every person with a previous index contributes to the winning time and the beta. Since it's a linear regression, no single person would have a large influence over how close the theoretical winner's and actual winner's times are, and no single rider would have much influence over the rate at which predicted finishing time increases with riding index. I haven't come up with a good idea for starting the system, when no rider has an index yet, but I'm sure that could be solved easily enough.

Posted

That's the easy bit. "Adjusted" winner's time = actual winner's time and beta = 1. Then you have a set of seedings for the next race. The seedings might be a bit volatile for the first few races as things stabilise, but I don't see why you would need to make it any more complicated than that.

Posted

It's easy for the first race, as you say. Then for the second race only a very small percentage of the field has a previous index, so you may not get a viable representation of the field as a whole. You could start getting betas < 0 from a small enough subset of data. So you could give everyone without an index the index they would have got from the actual winner, beta = 1 model. That's not going to give them a completely accurate index of quality, but it may be enough to kick the system off until it stabilises. Then how do you phase out the riders with no previous index, and is this even necessary? It was around that stage of consideration that I decided meh, good enough. With all seeding and index data being blocked behind ID numbers, there's not a lot of data to test theories.

 

Edit: That said, at the point that the seeding was kicked off there may have been enough historical data to go straight into a stable system. 

Posted

I can tell you, riding on my ace into that headwind from Suikerbossie was tougher than I wanted.

 

 


Posted

What he said… That headwind was fairly unusual.

Posted

Some fair points.

 

I'd use the actual winner's time and a beta of 1 until the standard error of your estimates drops below a certain threshold, and then move over to the statistical measures of adjusted winner's time and beta.

 

There are probably some fancy inferential statistics you could use to speed up this convergence/stability, but you are never really starting from a zero base because of the massive amount of data available on finishing times from various races. Theoretically you could do all of the beta = 1 calcs to get a preliminary seeding for each person (based on races over the last few years) before the first race where you actually use your system.
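As a rough illustration of the switch-over idea in this post, here is a Python sketch that fits the same kind of line and only trusts it once the slope is well determined. Everything specific in it is an assumption for the example: the names `fit_with_slope_error` and `choose_model`, the use of the slope's relative standard error as "the standard error of your estimates", and the 10% threshold. None of it comes from the PPA.

```python
import math


def fit_with_slope_error(xs, ys):
    """Least squares fit of ys = intercept + slope * xs, plus the slope's standard error."""
    n = len(xs)
    if n < 3 or n != len(ys):
        raise ValueError("need at least three riders to estimate a standard error")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all prior indexes are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    slope_se = math.sqrt(residual_ss / (n - 2) / sxx)
    return intercept, slope, slope_se


def choose_model(xs, ys, max_relative_se=0.10):
    """Return "regression" if the fitted slope looks stable, else "bootstrap".

    "bootstrap" stands for the fallback of actual winner's time and beta = 1.
    The 0.10 threshold is an arbitrary illustrative value.
    """
    try:
        _, slope, slope_se = fit_with_slope_error(xs, ys)
    except ValueError:
        return "bootstrap"
    if slope <= 0 or slope_se / slope > max_relative_se:
        return "bootstrap"
    return "regression"


# Hypothetical early race: only three riders with a previous index, noisy times.
early = choose_model([10.0, 20.0, 30.0], [200.0, 260.0, 210.0])

# Hypothetical later race: more riders with a previous index, consistent times.
later = choose_model([5.0, 10.0, 20.0, 30.0, 40.0],
                     [191.0, 199.0, 221.0, 239.0, 260.0])

print(early, later)
```

The `slope <= 0` check also covers the negative-beta case raised earlier in the thread, since a tiny subset of indexed riders can easily produce a line that slopes the wrong way.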
