
The effect of simulated censored data on estimates of heritability of longevity in the Thoroughbred racing industry
Eleanor M. Burns, Richard M. Enns and Dorian J. Garrick
Department of Animal Sciences, Colorado State University, Fort Collins, CO 80523-1171, USA
Corresponding author: R.M. Enns
E-mail: mark.enns@colostate.edu
Genet. Mol. Res. 5 (1): 7-15 (2006)
Received December 13, 2005
Accepted January 24, 2006
Published February 16, 2006

ABSTRACT. We examined the impact of censored data on estimates of heritability of longevity. Longevity, defined as the length of productive racing life of an individual, is influenced by many factors. A simulated data set, modelled on the Irish Thoroughbred industry, was used to estimate heritabilities of longevity. Several scenarios representing various levels of censoring of performance data were created. The heritability of longevity was estimated for each scenario and compared to the estimated heritability of 0.120 for the complete data set. Estimates of heritability were biased downwards (0.107, 0.106 and 0.082) when data from the poorest-performing 10, 20 and 25% of animals, respectively, were censored. We conclude that complete reporting is necessary to reduce bias in the estimation of heritability of longevity.

Key words: Thoroughbred, Longevity, Censored data

INTRODUCTION

Knowledge of factors that influence longevity may allow equine professionals to manage their businesses so that there is less wastage. The development of a genetic evaluation for longevity is made difficult by non-reporting or censoring of performance records. Our objective was to investigate the impact of selective non-reporting on estimates of heritability for longevity.

Longevity is the length of productive life of an individual; it is a trait of considerable economic importance in horses due to the great amount of invested time and money (Wallin et al., 2001). Numerous studies have examined factors that affect career longevity. These factors include conformation, training, environmental conditions, type of competition, age, and sex.

Injury is the major cause of culling in Thoroughbred racehorses (Rossdale et al., 1985; Robinson and Gordon, 1988) and poor conformation may predispose a horse to injury (Stashak, 2002). Bourke (1995), citing a 1994 American Association of Equine Practitioners report, noted that musculoskeletal injuries account for three times the wastage of all other medical problems. More than half of all two-, three- and four-year-old racehorses become lame and 20% of all racehorses eventually suffer a career-ending injury.

Training has been found to have a significant impact on the risk of injury and thus on career longevity. Mason and Bourke (1973) found that trainers had a significant effect on the soundness of two-year-old horses. Ainslie (1988) states that a Thoroughbred racehorse reaches peak performance in the middle of its fourth year of age and maintains this peak through its seventh year. However, many careers are often prematurely shortened due to the stress and strain of early training on immature skeletons and due to insufficient recovery time following injuries.

Temporary environmental factors have been found to have an effect on the risk of injury and on career longevity. Several studies have examined the relationship between track condition and risk of injury. Tracks with some moisture seem to reduce the risk of injury and falls, while dry tracks and very wet tracks both seem to increase the risk of injury (Rooney, 1983; Bailey et al., 1997; Williams et al., 2001). Röhe et al. (2001) found that track condition resulted in a difference of 2.1 s per kilometer between dry and muddy tracks. The increased speed associated with very dry tracks may contribute to the increased risk of injury.

The effect of type of competition on career longevity has been evaluated. Bailey et al. (1998) reported that the most important risk factor was the type of race. McKee (1995) found that the fatality rates in national hunt flat, hurdle and steeplechase races were 0.47, 0.49 and 0.70%, respectively. Horses in hurdle races were about four times as likely to suffer musculoskeletal breakdown, while horses in steeplechases had eight times more chance of suffering an injury, compared to horses racing on the flat (Bailey et al., 1998). The presence of barriers in hurdle and steeplechase races likely explains this higher risk.

Age at start of career has a significant impact on career longevity. High levels of loss occur in the first or second racing seasons (Mason and Bourke, 1973; Mohammed et al., 1991; Bourke, 1995); while lack of ability contributes to this loss, a significant proportion of horses may be retired due to injury or disease associated with training and racing. Several studies have found that the risk of injury increases with age (Robinson and Gordon, 1988; Mohammed et al., 1991; Bailey et al., 1997), but starting to compete at an older age has a negative impact on career duration (Bourke, 1995; Ricard and Fournet-Hanocq, 1997).

Bailey et al. (1999) found that female racehorses were less likely to race, and Bourke (1995) found that males raced on average one racing season longer than females. This may be because females tend to win less money (Minkema, 1975) and because males are faster than females (Leroy et al., 1989; Röhe et al., 2001).

MATERIAL AND METHODS

The SheepSim Program (Sherlock RG, unpublished data) was parameterized to simulate records for a population of racing Thoroughbreds representative of the Irish Racing industry. This program generates phenotypes, genotypes, breeding values, and quantitative trait loci for defined traits in sheep populations; it was modified for the appropriate equine traits and industry structure in our study, as outlined below.

The simulation was parameterized to reflect the current structure of the Irish Thoroughbred racing industry. A base population of 50,000 animals (Horse Racing Ireland, 2003) consisted of 5,469 horses in training and 16,823 horses in breeding, with the remaining assumed to be foals, yearlings and those not actively training. The breeding population was determined to consist of 356 stallions and 16,467 mares (Horse Racing Ireland, 2003). A foaling rate of 62% was estimated by averaging the reported number of live foals per mare bred for the years 2001 and 2002. The number of mares bred to each stallion varied from one mare to over a hundred. The average mating ratio of 45 mares to each stallion was estimated using the ratio of stallions to broodmares in Ireland for the years 2001 and 2002. Inbreeding depression was not simulated in this population. The mating age varies greatly in Thoroughbreds, but the average age of a stallion standing at stud was estimated to be 12 years, and the average age of a broodmare was estimated to be nine years of age. Age at culling was defined as 18 and 15 years of age for stallions and mares, respectively. Each year, stallions were indiscriminately mated to mares, with random numbers of mares mated to particular stallions, resulting in a range of one to one hundred matings.

Phenotypic values for racing time were generated for all horses in the base population and in subsequent generations. Five race times (RT1 to RT5) for flat races were simulated, with each considered a different trait. The RT1 represented a training time that would determine if a horse was culled before its first race. The RT2 to RT5 observations each represented the average racing time for four different annual periods of racing. The assumed first and second moments used as input values are summarized in Tables 1 to 4. Genetic and residual variances were from values estimated by Oki et al. (1995).





Culling was implemented by sequentially overwriting particular racing times. All race times were deleted for a random 14% of births, corresponding to the number of foals unregistered with the Jockey Club each year (Jeffcott et al., 1982). The lowest 26.2% RT1 yearlings had subsequent race times deleted to represent individuals that are not trained and those that are trained but never go on to race. The lowest 40% of horses for RT2 had subsequent race times deleted to reflect the number of unsound horses culled at the end of their first year of racing. Among those remaining with an RT3 observation, 10% with low performance were culled, and among the remainder another 10% were culled based on low performance for RT4. This represents animals culled due to lack of ability and likely errs on the conservative side. Additional involuntary culling due to death or injury was not simulated. No individual was allowed a race time observation in the same year as it was represented as a parent. Breeding animals were selected based on age and phenotypic observations for prior racing time. Stallions ranged in age from 4 to 18 years, with an average age of 12 years. Broodmares ranged from 4 to 15 years of age, with an average age of 9 years. Annual selection intensities of 1.40 and 0.64 corresponding to proportions selected of 0.2 and 0.6 were used for stallions and mares, respectively.
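The selection intensities quoted above are consistent with the standard relationship for truncation selection on a normally distributed trait, i = φ(z)/p, where p is the proportion selected, z is the truncation point and φ is the standard normal density. The following is a minimal Python sketch of that calculation, assuming normality; the function name is illustrative and is not part of the simulation program.

```python
from statistics import NormalDist


def selection_intensity(proportion_selected: float) -> float:
    """Mean of the selected group, in standard deviation units, under
    truncation selection on a standard normal trait (an assumption)."""
    std_normal = NormalDist()
    # Truncation point above which the top `proportion_selected` lies.
    z = std_normal.inv_cdf(1.0 - proportion_selected)
    # Intensity is the density at the truncation point over the proportion kept.
    return std_normal.pdf(z) / proportion_selected


for label, p in (("stallions", 0.2), ("mares", 0.6)):
    print(f"{label}: p = {p}, i = {selection_intensity(p):.2f}")
```

With proportions selected of 0.2 and 0.6, this gives intensities that round to 1.40 and 0.64.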

Animals were assigned longevity scores based on the number of race time observations. Animals without RT2 or later race times had a longevity observation of one (N = 20,514). Animals last observed at RT2 and RT3 had longevities of two (N = 18,867) and three (N = 25,903), respectively. Animals with RT4 or RT5 had a longevity of four (N = 40,191).
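The scoring rule above can be sketched as a small Python function. This is an illustration only: it assumes each animal's record is held as five slots (RT1 to RT5) with deleted race times stored as None, a representation that is hypothetical and not taken from the simulation program.

```python
from typing import Optional, Sequence


def longevity_score(race_times: Sequence[Optional[float]]) -> int:
    """Map five race-time slots (RT1..RT5, None if deleted) to a score of 1 to 4."""
    if len(race_times) != 5:
        raise ValueError("expected five race-time slots, RT1 to RT5")
    # 1-based position of the last race time that survived culling (0 if none).
    last_observed = max(
        (i for i, rt in enumerate(race_times, start=1) if rt is not None),
        default=0,
    )
    if last_observed < 2:
        return 1  # no RT2 or later race time
    if last_observed >= 4:
        return 4  # RT4 or RT5 observed
    return last_observed  # last observed at RT2 gives 2, at RT3 gives 3


print(longevity_score([98.1, 97.5, 97.9, None, None]))  # last observed at RT3
```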

The Animal Breeders Tool Kit (ABTK) (Golden et al., 1992) and utilities of the Linux operating system were used to prepare the data by sorting and joining pedigree files and performance records. Incidence matrices that included birth year as a contemporary group, as well as sex and animal, were created.

The genetic, phenotypic and residual variances, heritabilities, and means used to describe the Irish Thoroughbred population were from studies by Oki et al. (1994, 1995). It was assumed that all races were one-mile flat races, although this is not an accurate representation of racing in Ireland. Culling was based only on poor phenotypic values, and it did not account for horses that would be progressively involuntarily culled due to injury or death or retired early for breeding. Horses were assumed to be failures, i.e., they received a score of one for longevity, if they failed to train for racing; this does not account for Thoroughbreds that are productively used in other areas, such as dressage and hunting.

The effect of selective reporting on heritability estimates was evaluated by comparing heritability estimated from the complete data set to estimates from six scenarios representing subsets of the data. Corresponding pedigree information was not deleted when an individual was excluded in a particular subset, so that the individuals with unknown performance records could still be used as parents. This approach was used to represent a common occurrence in the equine industry, as poor-performing individuals with favorable pedigrees are often retained as breeding animals.

Based on phenotypic values for longevity, the 10% (13,184) of animals with the shortest productive lives had their longevity phenotypes deleted (i.e., treated as missing) in the scenario Bottom10. This would be typical in the Thoroughbred industry; records are generally maintained on successful animals but observations are seldom recorded on unsuccessful animals. Successful animals would be considered those that complete training and go on to race before the age of four. In Bottom20, the bottom 20% (26,368) of animals were deleted based on phenotypic observations for longevity. This scenario represents animals that failed during training or had their racing unreported; there is a lack of recorded observations for unsuccessful individuals. Based on phenotypic observations, the bottom 25% (32,960) of animals were deleted in Bottom25.

In Random10, a random 10% of records were deleted, simulating animals being unreported at all levels of racing performance. This method of random deletion was also used to exclude 20 and 25% of animals in scenarios Random20 and Random25.
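The two kinds of censoring described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: longevity scores are held in a dictionary keyed by a hypothetical animal identifier, ties between equal scores are broken at random (the text does not say how ties were handled), and a deleted phenotype is stored as None so that the animal itself remains available for the pedigree.

```python
import random
from typing import Dict, Optional


def censor_bottom(scores: Dict[str, int], fraction: float,
                  rng: random.Random) -> Dict[str, Optional[int]]:
    """Set the lowest `fraction` of longevity scores to missing (None)."""
    n_delete = round(len(scores) * fraction)
    ids = list(scores)
    # Shuffle first so ties between equal scores are broken at random;
    # the stable sort then keeps the shuffled order within each score.
    rng.shuffle(ids)
    ids.sort(key=lambda animal: scores[animal])
    deleted = set(ids[:n_delete])
    return {a: (None if a in deleted else s) for a, s in scores.items()}


def censor_random(scores: Dict[str, int], fraction: float,
                  rng: random.Random) -> Dict[str, Optional[int]]:
    """Set a random `fraction` of longevity scores to missing (None)."""
    n_delete = round(len(scores) * fraction)
    deleted = set(rng.sample(list(scores), n_delete))
    return {a: (None if a in deleted else s) for a, s in scores.items()}


# Toy data: 40 hypothetical animals with scores 1 to 4.
rng = random.Random(2006)
toy_scores = {f"horse{i}": 1 + i % 4 for i in range(40)}
bottom25 = censor_bottom(toy_scores, 0.25, rng)
random10 = censor_random(toy_scores, 0.10, rng)
print(sum(v is None for v in bottom25.values()),
      sum(v is None for v in random10.values()))
```

Every animal stays in the returned dictionary, mirroring the retention of pedigree information for animals whose phenotypes were removed.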

The underlying determinants of the observed longevity score were described using the following model equation:

$$y_{ijk} = \mu + cg_i + s_j + a_k + e_{ijk}$$

where $y_{ijk}$ is the performance of animal $k$ of sex $j$ in contemporary group $i$, $\mu$ is the mean, $cg_i$ is the fixed birth year contemporary group effect, $s_j$ is the sex effect, $a_k$ is the additive genetic effect, and $e_{ijk}$ is the residual.

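Solutions to the fixed and random effects in the model were obtained by setting up and solving the relevant mixed model equations as if longevity scores were continuous variables. In matrix form:

$$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + A^{-1}\lambda \end{bmatrix} \begin{bmatrix} b \\ u \end{bmatrix} = \begin{bmatrix} X'y \\ Z'y \end{bmatrix}$$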


where $y$ is the vector of longevity observations, $b$ is the vector of birth year and sex fixed effects, $u$ is the vector of additive genetic effects, $X$ is the incidence matrix for fixed effects, $Z$ is the incidence matrix for random effects, $A^{-1}$ is the inverse of the numerator relationship matrix for the animals in $u$, and $\lambda$ is the ratio of residual to additive genetic variance, where

$$\lambda = \frac{\sigma^2_w}{\sigma^2_A}$$


Method R (Reverter et al., 1994) was used to estimate heritability using the software package DS6 (Golden et al., 1992). This implementation of Method R involved selecting 100 random subsets, each comprising 50% of the performance records. The mixed model equations were set up and solved using the Gauss-Seidel method to obtain estimates of the breeding values from the complete data and from each subset. Up to 500 Gauss-Seidel iterations were used to solve the mixed model equations, stopping earlier if a convergence criterion of 0.0001 was reached. For each subset of data, the regression of the estimated breeding values from the complete data on the estimated breeding values from the subset of data was calculated to give a value for R, the regression coefficient in Method R. The value of this coefficient was then used to successively increase or decrease the estimate of lambda until a lambda was found that gave a regression value of one for this particular subset of data. Given 100 subsets, there were 100 estimates of lambda. The median value of lambda among these estimates was used to represent the heritability of longevity using the formulas:



$$h^2 = \frac{\sigma^2_A}{\sigma^2_A + \sigma^2_w} = \frac{1}{1 + \lambda}$$

where $w$ is residual and $A$ is additive.
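The Gauss-Seidel step described above can be sketched in plain Python. This is a generic illustration and not the DS6 implementation: the function name and the toy system are hypothetical, and the convergence criterion is assumed here to be the largest absolute change in any solution between successive iterations, which the text does not specify.

```python
from typing import List


def gauss_seidel(lhs: List[List[float]], rhs: List[float],
                 max_iter: int = 500, tol: float = 1e-4) -> List[float]:
    """Solve lhs * x = rhs iteratively, updating each unknown in turn."""
    n = len(rhs)
    x = [0.0] * n
    for _ in range(max_iter):
        largest_change = 0.0
        for i in range(n):
            # Use the most recent values of all other unknowns.
            off_diag = sum(lhs[i][j] * x[j] for j in range(n) if j != i)
            new_value = (rhs[i] - off_diag) / lhs[i][i]
            largest_change = max(largest_change, abs(new_value - x[i]))
            x[i] = new_value
        # Assumed stopping rule: largest change below the criterion.
        if largest_change < tol:
            break
    return x


# Toy symmetric system standing in for a (much larger) set of
# mixed model equations; exact solution is (1/11, 7/11).
solution = gauss_seidel([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print([round(v, 3) for v in solution])
```

In practice the coefficient matrix of the mixed model equations is large and sparse, so an implementation would store only its non-zero elements rather than a dense list of lists.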

A standard error (se) of the estimate of heritability of longevity was estimated using:

$$se = \frac{s}{\sqrt{n}}$$


where s is the sample standard deviation of the 100 estimates of the heritability of longevity and n = 100 is the number of estimates.
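The summary step can be sketched in Python as follows. The sketch assumes that lambda is the ratio of residual to additive genetic variance, so that heritability is 1/(1 + lambda), and that the standard error is the sample standard deviation of the subset heritabilities divided by the square root of their number. The function name and the example values of lambda are hypothetical and are not results from the study.

```python
from math import sqrt
from statistics import median, stdev
from typing import List, Tuple


def summarize_method_r(lambdas: List[float]) -> Tuple[float, float]:
    """Return (heritability at the median lambda, standard error)."""
    # One heritability per subset, assuming h2 = 1 / (1 + lambda).
    h2_values = [1.0 / (1.0 + lam) for lam in lambdas]
    h2 = 1.0 / (1.0 + median(lambdas))
    # Standard error: sample standard deviation over sqrt(n).
    se = stdev(h2_values) / sqrt(len(h2_values))
    return h2, se


# Hypothetical lambda estimates from five subsets (illustration only).
h2, se = summarize_method_r([7.2, 7.3, 7.4, 7.5, 7.6])
print(f"h2 = {h2:.3f}, se = {se:.4f}")
```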

Birth year and sex were fitted as fixed effects along with animals as a random effect.

RESULTS

The heritability of longevity, using the complete uncensored data set, was estimated to be 0.120 ± 0.0005. Bottom10, Bottom20 and Bottom25 represented non-reporting of the poorest performing individuals at different levels. Sampling data in this non-random fashion causes selection bias as lower performing animals in the population are underrepresented. In each of these three scenarios, the estimate of heritability was biased downwards, i.e., heritability was underestimated. The estimate of heritability decreased to 0.107 ± 0.0005 in Bottom10, 0.106 ± 0.0007 in Bottom20 and 0.082 ± 0.0006 in Bottom25.

The scenarios Random10, Random20 and Random25 represented different levels of non-reporting of data of animals across performance levels. Each animal had an equal probability of being included in the sub-samples, and thus sample bias should be minimized. Heritability estimates changed only slightly from the complete-data estimate in each scenario: 0.117 ± 0.0006 in Random10, 0.116 ± 0.0006 in Random20 and 0.115 ± 0.0008 in Random25.
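The size of the change in each scenario can be expressed relative to the complete-data estimate of 0.120. The short Python sketch below does this arithmetic using the point estimates reported above; the function and variable names are illustrative.

```python
COMPLETE_H2 = 0.120

# Point estimates of heritability reported for each scenario.
estimates = {
    "Bottom10": 0.107, "Bottom20": 0.106, "Bottom25": 0.082,
    "Random10": 0.117, "Random20": 0.116, "Random25": 0.115,
}


def relative_change_percent(estimate: float,
                            reference: float = COMPLETE_H2) -> float:
    """Percentage change of an estimate relative to the reference value."""
    return 100.0 * (estimate - reference) / reference


for scenario, h2 in estimates.items():
    print(f"{scenario}: {relative_change_percent(h2):+.1f}%")
```

The selective scenarios give reductions of roughly 11, 12 and 32%, compared with roughly 3 to 4% for the random scenarios.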

DISCUSSION

Longevity is of enormous economic importance in the Thoroughbred racehorse due to the magnitude of money and time invested in these horses. In comparison to heritability estimated using the complete uncensored data set, heritability estimates were biased downwards when less successful animals were excluded. Non-reporting of poor-performing individuals in field data has been found to be common in all livestock species (Mallinckrodt, 1993; Bourdon, 2000), and this cannot be accounted for in Best Linear Unbiased Prediction (BLUP) procedures. Selective reporting creates a subset of data that does not accurately represent the population and thus biases the estimated values (Mallinckrodt et al., 1995). In contrast, heritability estimates remained relatively unchanged when data were excluded at random across all levels of performance, as random exclusion did not bias the estimate of heritability.

One of the purposes of genetic evaluation is to partition superiority or inferiority of performance into genetic and environmental effects in an effort to facilitate the selection of the best parents for producing the next generation. Accurate and uniform reporting of data allows for the more reliable separation of the environmental effects from genetics than is the case when data are selectively reported.

The industry’s current method of data recording makes accurate estimation of heritability difficult; while data are readily available on successful individuals, data are often missing on unsuccessful animals. Complete reporting of data, even for unsuccessful animals, will greatly enhance the accuracy and usefulness of genetic evaluation in the equine industry.

It is clear that there is a need for complete reporting; non-reporting of the worst 25% of animals resulted in a 32% decrease in the estimate of heritability of longevity (from 0.120 to 0.082). Genetic progress can be significantly increased with the use of BLUP animal models; however, censored reporting will decrease the accuracy of these predictions, as BLUP is unable to account for bias due to non-reported records. Accuracy of the estimation of heritability and breeding values can be improved with data that are free of reporting bias.

The investigation of data misreporting or falsification was beyond the scope of our study but also warrants investigation. We did not examine effects of non-reporting on individual breeding values. Individual breeding value estimates would be expected to change due to the effects of non-reporting on contemporary groups. Future research is recommended in these areas.

CONCLUSIONS

The benefits of genetic evaluation are numerous; many European breed associations have found an increased rate of genetic improvement since the implementation of genetic evaluation. Education is the key to the large-scale acceptance of genetic evaluation in the equine industry. Breeders and owners need to be taught that breeding values are a useful tool when selecting breeding animals. Equine professionals also need to be made aware of the importance of recording information on all animals. Performance testing and the implementation of a Universal Equine Life Number have greatly increased the accuracy in recording information on Warmbloods and Sport Horses in Europe. The requirement that all Thoroughbreds must be registered with the Jockey Club in order to race or to be used as breeding animals has been a very successful method for ensuring that animals are recorded. However, linking performance records with such pedigree information is difficult as performance records are disparate. A common database that links pedigree information and performance records for all individuals would greatly improve the accuracy of prediction.

REFERENCES

Ainslie T (1988). Ainslie’s complete guide to Thoroughbred racing. Fireside, London, UK.

Bailey CJ, Reid SW, Hodgson DR, Bourke JM et al. (1997). A retrospective case-control study of musculoskeletal racing injuries in Australian Thoroughbreds. 8th Symposium International Society Veterinary Epidemiology and Economics, pp. 3.06.1-3.06.2.

Bailey CJ, Reid SW, Hodgson DR, Bourke JM et al. (1998). Flat, hurdle and steeple racing: risk factors for musculoskeletal injury. Equine Vet. J. 30: 498-503.

Bailey CJ, Reid SW, Hodgson DR and Rose RJ (1999). Factors associated with time until first race and career duration for Thoroughbred racehorses. AJVR 60: 1196-1200.

Bourdon RM (2000). Understanding animal breeding. Prentice Hall, Upper Saddle River, NJ, USA.

Bourke JM (1995). Wastage in Thoroughbreds. Animal Seminar, Equine Branch, New Zealand Veterinary Association. Foundation for Continuing Education, No. 167, pp. 17-119.

Golden BL, Snelling WM and Mallinckrodt CH (1992). Animal breeder’s tool kit user’s guide and reference manual. Agricultural Experiment Station Technical Bulletin LTB92-2. Colorado State University, Fort Collins, CO, USA.

Horse Racing Ireland (2003). Strategic Plan 2003-2007. Dublin, Ireland.

Jeffcott LB, Rossdale PD, Freestone J, Frank CJ et al. (1982). An assessment of wastage in Thoroughbred racing from conception to 4 years of age. Equine Vet. J. 14: 185-198.

Leroy PL, Kafidi N and Bassleer E (1989). Estimation of breeding values of Belgian trotters using an animal model. European Association for Animal Production Publication No. 42., London, England, pp. 3-17.

Mallinckrodt CH (1993). The effect of animal model approximations and data problems of the reliability of genetic evaluations. Ph.D dissertation, Colorado State University, Fort Collins, CO, USA.

Mallinckrodt CH, Golden BL and Bourdon RM (1995). The effect of selective reporting on estimates of weaning weight parameters in beef cattle. J. Anim. Sci. 73: 1264-1270.

Mason TA and Bourke JM (1973). Closure of the distal radial epiphysis and its relationship to unsoundness in two year old Thoroughbreds. Aust. Vet. J. 49: 221-228.

McKee SL (1995). An update on racing fatalities in the UK. Equine Vet. Educ. 7: 202-204.

Minkema D (1975). Studies on the genetics of trotting performance in Dutch trotters. Ann. Génét. Sél. Anim. 7: 99-121.

Mohammed HO, Hill T and Lowe J (1991). Risk factors associated with injuries in Thoroughbred horses. Equine Vet. J. 23: 445-448.

Oki H, Willham RL and Sasaki Y (1994). Genetics of racing performance in the Japanese Thoroughbred horse: I. Description of the data. J. Anim. Breed. Genet. 111: 121-127.

Oki H, Sasaki Y and Willham RL (1995). Genetic parameter estimates for racing time by restricted maximum likelihood in the Thoroughbred horse of Japan. J. Anim. Breed. Genet. 112: 146-150.

Ott RL and Longnecker M (2001). An introduction to statistical methods and data analysis. 5th edn. Duxbury, Pacific Grove, CA, USA, p. 175.

Reverter A, Golden BL, Bourdon RM and Brinks JS (1994). Method R variance components procedure: application on the simple breeding value model. J. Anim. Sci. 72: 2247-2253.

Ricard A and Fournet-Hanocq F (1997). Analysis of factors affecting length of competitive life of jumping horses. Genet. Sel. Evol. 29: 251-267.

Robinson RA and Gordon B (1988). American Association of Equine Practitioners track breakdown studies - horse results. 7th International Conference of Racing Analysts and Veterinarians, Lexington, KY, USA, pp. 385-394.

Röhe R, Savas T, Brka M, Willms F et al. (2001). Multiple-trait genetic analyses of racing performance of German trotters with disentanglement of genetic and driver effects. Arch. Tierz. 44: 579-587.

Rooney JR (1983). Track condition in relationship to fatigue and lameness in Thoroughbred racehorses. Equine Vet. 134-135.

Rossdale PD, Hopes R, Wingfield Digby NJ and Offord K (1985). Epidemiological study of wastage among racehorses 1982 and 1983. Vet. Rec. 116: 66-69.

Stashak TS (2002). Lameness. In: Adams’ lameness in horses. 5th edn. Chapter 8. Lippincott Williams and Wilkins, Philadelphia, PA, USA.

Wallin L, Strandberg E and Philipsson J (2001). Phenotypic relationship between test results of Swedish warmblood horses as 4-year-olds and longevity. Livest. Prod. Sci. 6: 97-105.

Williams RB, Harkins LS, Hammond CJ and Wood JL (2001). Racehorse injuries, clinical problems and fatalities recorded on British racecourses from flat racing and National hunt racing during 1996, 1997 and 1998. Equine Vet. J. 33: 478-486.

   Copyright © 2006 by FUNPEC