IS IT POSSIBLE TO ESTIMATE MATCH RESULT IN VOLLEYBALL: A NEW PREDICTION MODEL

This study investigates the power of variables in a logistic regression model (the efficacy model, EM) to explain the match results in the Turkish Men’s Volleyball League (TMVL) and the Turkish Women’s Volleyball League (TWVL) in terms of the players’ positions. The dependent variable was the match result, and the explanatory power of the variables libero player efficiency (LPE), setter efficiency (SE), middle blocker efficiency (MBE), outside hitter efficiency (OHE) and universal player efficiency (UPE) was investigated separately for both genders. The EM accurately classified 83.45% of the games won and lost in the TWVL. The sensitivity (proportion of won games classified as won) and specificity (proportion of lost games classified as lost) were 85.03% and 81.88%, respectively. In the TMVL analysis, the classification accuracy, sensitivity and specificity were 78.23%, 78.77% and 77.70%, respectively. Moreover, for both genders, the match results were chiefly explained by the SE, MBE, OHE and UPE. The LPE variable could not predict the results in the TWVL.


Introduction
Almost all Olympic sports hold separate tournaments for men and women. The main reason men do not compete against women is the gender factor, which is considered to have an important influence on sport performance (Beim, Winter, 2003, p. 35). Men and women differ largely in their physical body composition, muscle mass, hormonal secretions and oxygen consumption (Rickenlund et al., 2003; Korhonen, Mero, Suominen, 2003). Kinanthropometric properties differ significantly by gender and somatotype, affecting the selected sports, positions and performances of men and women (Gualdi-Russo, Graziani, 1993). However, the only gender-based rule of volleyball is the height of the net. The similarities and differences between male and female volleyball players have not been investigated from a modelling perspective. In volleyball, as in other branches of sports, numerous statistical data have been collected by match analysis programs. To interpret this large body of data and make inferences, we need statistical tools. In particular, statistical models can reveal the relationships among various variables and extract information

Methods
The efficacy model (EM) employed in this study was developed in Akarçeşme's (2010) PhD thesis. The EM has successfully predicted the match results in women's volleyball. Of the models formed in that thesis, the Efficiency Model predicted the match result correctly in 87.65% of cases and the Technical Elements Model in 84.66% of cases.
The EM outputs a binary result. Therefore, we employed a logistic link function and analysed the data by logistic regression. Before obtaining the EM, we trialled many variables that would likely affect the match result. The match outcome did not depend on any single one of the following: excellent serve reception by the libero; the team's ratio of excellent serve reception (excluding the libero); points from blocks, serves and other actions by the setter (for example, setter tip or setter kill block); serve reception faults; points from attacks and other actions by the outside hitter; points from blocks, block errors and other actions by the middle blocker; or the total number of points from attacks, attack errors and other actions by the universal player. Therefore, we calculated, for each player position, the net difference between the team's successes and failures, defined new variables from these differences and examined whether these variables significantly explained the match results. Because these derived variables define the efficacies in terms of player positions, they are referred to as efficacy variables.
Vol. 19, No. 3/2017 Is it Possible to Estimate Match Result in Volleyball: A new Prediction Model
Using the models yielded by the analyses, we predicted the match results of the TWVL and TMVL by estimating the model coefficients and conducting a diagnostic study. The initial model prediction was based on data from the women's league. The variance inflation factor (VIF), eigenvalues and condition index values were computed to check for multicollinearity between the independent variables. Moreover, the goodness of fit of the model was determined from the Bayesian and Akaike information criteria (BIC and AIC, respectively). The predictive power of the model was assessed by constructing a classification table. Finally, the effect of potentially anomalous data on the goodness of fit was analysed using standardised Pearson residuals and deviance residuals. The influence of likely anomalous observations on the fitted function was checked by calculating leverage values. The model based on these new efficacy variables obtained very satisfactory goodness of fit values, confirming the statistical significance of the model. The power of the individual efficacy variables in explaining the match result was significant at the 0.05 level. Of these, the variable libero efficacy (LE) made the largest contribution to the odds of a match win (the ratio of the probability of a match win to that of a match loss). The other variables also contributed, but to a lesser extent. However, the coefficient estimate was less precise for LE than for the other parameters, because its standard error was the highest.
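The two information criteria mentioned above can be sketched in a few lines of Python. This is only an illustration of the standard definitions, AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L; the log-likelihood, parameter count and sample size below are hypothetical values, not results from this study.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical example: an intercept plus five efficacy variables (k = 6),
# fitted to 300 observations with a log-likelihood of -120.0.
example_aic = aic(-120.0, 6)
example_bic = bic(-120.0, 6, 300)
```

Both criteria penalise the number of parameters, so they are mainly useful for comparing alternative models fitted to the same observations.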
The study population comprised the match data of the 12 teams competing in the Men's and Women's Premier Volleyball Leagues in Turkey during the 2010-2011 season. The sample sizes of the men's and women's leagues were 296 observations (games) in 148 matches and 300 observations in 150 matches, respectively. As these samples covered the whole population, the models built from the obtained data were assumed to be valid. The statistical data were used with the consent of the Turkish Volleyball Federation. Moreover, in the leagues of both genders, the teams were entitled to field three foreign players in each game.
In this study, the results of the volleyball matches were related to the variables that determine them, separately for each gender.

Model variables
Result: This variable defines the result of a match. The winning team is represented by 1 and the losing team by 0.
Libero player efficiency (LPE): This variable evaluates the errors made by a libero player during serve reception throughout the match. The libero is specialised in defensive skills and cannot attack the ball, so this value starts from 0 and becomes more negative with increasing error. The LPE is generally negative because libero players lack the range of technical actions that could offset an error made during serve reception.
Setter efficiency (SE): This variable represents the difference between the points scored and points conceded by a setter during all games played by that setter throughout a match.

Cengiz Akarçeşme
Middle blocker efficiency (MBE): The MBE represents the difference between the points scored and points conceded by the middle players during all game events throughout a match. The two middle players are evaluated together, and their combined value is called the MBE.
Outside hitter efficiency (OHE): The OHE represents the difference between the points scored and points conceded by both number 4 hitters during all games throughout a match.
Universal player efficiency (UPE): This variable represents the difference between the points scored and points conceded by a universal player, also referred to as an opposite or power spiker, during all games throughout a match.
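The variable definitions above can be illustrated with a short Python sketch. It assumes that each efficiency value is simply points scored minus points conceded for the position, and that each libero serve reception error lowers the LPE by one; the class name, function names and tallies are hypothetical and are not taken from the study data.

```python
from dataclasses import dataclass


@dataclass
class PositionTally:
    """Points scored and conceded by the player(s) of one position in one game."""
    scored: int
    conceded: int


def efficiency(tally: PositionTally) -> int:
    """Net efficiency: points scored minus points conceded."""
    return tally.scored - tally.conceded


def libero_efficiency(reception_errors: int) -> int:
    """LPE starts at 0 and decreases by one per serve reception error (assumed weighting)."""
    return -reception_errors


# Hypothetical tallies for one team in one game.
game = {
    "SE": PositionTally(scored=3, conceded=1),
    "MBE": PositionTally(scored=14, conceded=6),   # both middle blockers combined
    "OHE": PositionTally(scored=22, conceded=11),  # both outside hitters combined
    "UPE": PositionTally(scored=15, conceded=7),
}
features = {name: efficiency(tally) for name, tally in game.items()}
features["LPE"] = libero_efficiency(reception_errors=2)
```

Each game of each team would yield one such row of five values, paired with the Result variable (1 for a win, 0 for a loss).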
In the first part of this study, we constructed the EM, a logistic regression function. For this purpose, we examined the multicollinearity, conducted goodness of fit tests and analysed the influential observations and regression diagnostics (Weisberg, 2005, pp. 211-216). Potential problems with the model were also examined. The efficient observation analysis, which identifies potentially anomalous observations that significantly affect the regression model, was conducted first. Certain observations in the dataset may deviate from the general trend and show characteristics that are very distinct from those of other observations. Such anomalous observations can potentially, but do not necessarily, worsen the goodness of fit of the estimated regression model. Efficient observations (more commonly called influential observations) are anomalous observations that strongly affect the estimated parameters (Hosmer, Lemeshow, 2000, pp. 48-56).
One way to handle efficient observations is to exclude them from the analysis. As the predictive model is an average model, such observations pull the fitted curve towards themselves, distorting the overall trend. Therefore, we identified and excluded the efficient observations from the model prior to the prediction.
Efficient observations may be analysed by a variety of methods. Any single one of these analyses may give an inaccurate assessment, so several analyses were performed.
The dependent variable Result in the EM, which takes values of 1 or 0, is a categorical variable with a binomial distribution. Therefore, the model was based on a logistic function. To explain the dependent variable Result through the independent variables LPE, SE, MBE, OHE and UPE, we formulated the EM as follows.

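\[
\Pi(X) = \frac{e^{\beta_0 + \beta_1 \mathrm{LPE} + \beta_2 \mathrm{SE} + \beta_3 \mathrm{MBE} + \beta_4 \mathrm{OHE} + \beta_5 \mathrm{UPE}}}{1 + e^{\beta_0 + \beta_1 \mathrm{LPE} + \beta_2 \mathrm{SE} + \beta_3 \mathrm{MBE} + \beta_4 \mathrm{OHE} + \beta_5 \mathrm{UPE}}} \qquad (1)
\]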
The logistic regression model is a probability model, and Π(X) refers to the probability of winning a match. Such a nonlinear function cannot be estimated by the least squares (LS) method used in linear regression models.
To allow a linear interpretation of the parameters, we perform a logit transformation of Π(X). The transformed function is referred to as a logit function.
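Written out with the five efficacy variables defined above, the logit transformation would presumably take the following form. This is a reconstruction from the surrounding definitions rather than the original typesetting; the symbol g(X) and the coefficient labels β0 to β5 are notation assumed here.

```latex
g(X) = \ln\!\left(\frac{\Pi(X)}{1 - \Pi(X)}\right)
     = \beta_0 + \beta_1\,\mathrm{LPE} + \beta_2\,\mathrm{SE}
       + \beta_3\,\mathrm{MBE} + \beta_4\,\mathrm{OHE} + \beta_5\,\mathrm{UPE}
\qquad (2)
```

The ratio Π(X) / (1 - Π(X)) inside the logarithm is the odds of winning, so g(X) is the log odds and is linear in the coefficients.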
Formula 2 defines the odds of winning the match. The β parameters define the change in the log odds for each unit change in the respective efficacy variable.
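The relation between the win probability, the odds of winning (the ratio of the win probability to the loss probability) and the log odds can be checked numerically with a minimal Python sketch. The coefficients and efficacy values below are hypothetical and chosen only for illustration; they are not the estimates reported in the tables.

```python
import math


def linear_predictor(beta0, betas, x):
    """beta0 + sum of beta_i * x_i, i.e. the log odds of winning."""
    return beta0 + sum(b * v for b, v in zip(betas, x))


def win_probability(beta0, betas, x):
    """Logistic function applied to the linear predictor."""
    z = linear_predictor(beta0, betas, x)
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds of winning: P(win) / P(loss)."""
    return p / (1.0 - p)


# Hypothetical coefficients in the order LPE, SE, MBE, OHE, UPE.
beta0 = 0.0
betas = [0.25, 0.12, 0.08, 0.15, 0.14]
x = [-2, 2, 8, 11, 8]            # hypothetical efficacy values for one game

p = win_probability(beta0, betas, x)
log_odds = math.log(odds(p))     # recovers the linear predictor

# Raising SE by one point multiplies the odds by exp(beta_SE).
x_plus = [-2, 3, 8, 11, 8]
odds_ratio_se = odds(win_probability(beta0, betas, x_plus)) / odds(p)
```

The last lines show why a coefficient is usually reported as an odds ratio: a one-unit change in a variable multiplies the odds by the exponential of its coefficient, whatever the values of the other variables.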

Results
Table 1 lists the outputs of the predictive logit function. According to the prediction model, all variables except setter efficiency (SE) explain the Result variable at the α = 0.05 significance level; that is, their values significantly affect the match result. The SE variable is significant at the α = 0.06 level.
Table 2 summarises the anomalous value analysis of the prediction model. For this purpose, we calculated the Pregibon residuals, the standardised Pearson residuals and the deviance statistics (Pregibon, 1981). Observations 145 and 286 satisfied the efficient observation criterion for all three statistics, so they were flagged to avoid overestimation. Table 3 analyses the anomalous observations 145 and 286 in the Turkish Women's Volleyball League. The Pgbon, stress and dv columns list the Pregibon residuals, Pearson residuals and deviance residuals, respectively. The Pregibon residuals of both observations exceed 0.2 and their Pearson and deviance residuals exceed 2, confirming that observations 145 and 286 are efficient observations.
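The residual checks described above can be sketched in Python for a binary outcome. The sketch uses the plain (unstandardised) Pearson residual and the deviance residual with the cutoff of 2 mentioned in the text; the standardised Pearson residual and the Pregibon statistic additionally require leverage values and are not computed here. The outcomes and fitted probabilities are hypothetical.

```python
import math


def pearson_residual(y: int, p: float) -> float:
    """(observed - fitted) / sqrt(fitted variance) for a 0/1 outcome."""
    return (y - p) / math.sqrt(p * (1.0 - p))


def deviance_residual(y: int, p: float) -> float:
    """Signed square root of the observation's contribution to the deviance."""
    log_lik = math.log(p) if y == 1 else math.log(1.0 - p)
    sign = 1.0 if y == 1 else -1.0
    return sign * math.sqrt(-2.0 * log_lik)


def flag_anomalous(outcomes, probabilities, cutoff=2.0):
    """Indices whose Pearson and deviance residuals both exceed the cutoff in size."""
    flagged = []
    for i, (y, p) in enumerate(zip(outcomes, probabilities)):
        if (abs(pearson_residual(y, p)) > cutoff
                and abs(deviance_residual(y, p)) > cutoff):
            flagged.append(i)
    return flagged


# Hypothetical games: the last two are a win predicted as an almost certain
# loss and a loss predicted as an almost certain win.
outcomes = [1, 0, 1, 0]
probabilities = [0.90, 0.20, 0.05, 0.97]
flagged = flag_anomalous(outcomes, probabilities)
```

Observations flagged in this way are only candidates; whether they actually distort the estimates is judged by refitting the model without them.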
After removing these observations and re-running the logit regression, we obtained the results in Table 4. As a final diagnostic analysis of the accuracy of the model, we determined whether any of the independent variables (LE, PE, OOE, PCE and number 4 efficiency (N4E), corresponding to LPE, SE, MBE, UPE and OHE, respectively) were multicollinear. Any linear correlation between these variables would inflate the variance of the coefficient estimates, causing errors in the model parameter estimates. Multicollinearity can be identified by several approaches. The correlation matrix provides an initial indicator of such correlations. As shown in Table 5, the VIFs are very close to 1, the tolerances are equal to or slightly less than 1, the R² values are small, all eigenvalues except the first are small and the condition indexes are below 10. All these results confirm that there is no multicollinearity between the independent variables, and the prediction model can be considered successful.
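The VIF and tolerance checks can be sketched in pure Python. The sketch relies on the identity that the VIF of variable j equals the j-th diagonal element of the inverse of the correlation matrix of the independent variables, and that the tolerance is 1 / VIF. The data are randomly generated and hypothetical; the function names are chosen here for illustration.

```python
import math
import random


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centred = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centred]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centred[i], centred[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical data: five independent variables, 200 observations each.
random.seed(0)
columns = [[random.gauss(0.0, 1.0) for _ in range(200)] for _ in range(5)]
vifs = variance_inflation_factors(columns)
tolerances = [1.0 / v for v in vifs]
```

For unrelated variables such as these, the VIFs stay close to 1; a variable that is nearly a linear combination of the others would instead produce a very large VIF and a tolerance near 0.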
The estimated efficiency model successfully passed all diagnostic tests. Therefore, we can reliably state that Result is a function of LPE, SE, MBE, UPE and OHE. Next, the extent to which the efficacy model can accurately predict the match result was examined in a classification table. The classification table for the TWVL EM is given as Table 6. According to these findings, the logistic regression model correctly classified 83.45% of the games won and lost in the TWVL. The sensitivity (proportion of won games classified as won) and specificity (proportion of lost games classified as lost) of the model were 85.03% and 81.88%, respectively.
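The three quantities read from a classification table can be computed as in the following Python sketch. The counts are hypothetical and serve only to show the arithmetic; they are not the counts of Table 6.

```python
def classification_summary(true_wins, false_losses, true_losses, false_wins):
    """Accuracy, sensitivity and specificity from the four cells of a 2x2 table.

    true_wins:    won games classified as won
    false_losses: won games classified as lost
    true_losses:  lost games classified as lost
    false_wins:   lost games classified as won
    """
    total = true_wins + false_losses + true_losses + false_wins
    return {
        "accuracy": (true_wins + true_losses) / total,
        "sensitivity": true_wins / (true_wins + false_losses),
        "specificity": true_losses / (true_losses + false_wins),
    }


# Hypothetical table with 10 won and 10 lost games.
summary = classification_summary(true_wins=8, false_losses=2,
                                 true_losses=7, false_wins=3)
```

A game is usually classified as won when its fitted probability is at least 0.5; that cutoff is an assumption of this sketch and is not stated in the text.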
The same analyses were then performed for the TMVL. The results of the logistic regression model for the TMVL are presented in Table 7.
To identify the anomalous observations, we calculated the standardised Pearson residuals and plotted them against the observations. Some of the game results stray far from the main group of observations. For these observations, the model predicted a lost match as won with a very high probability, or vice versa. In Table 7, all variables of the prediction model are significant at the 0.05 level. Holding the other variables constant, a one-point increase in an efficiency value multiplies the odds of winning rather than losing the match by 1.290 for LPE, 1.125 for SE, 1.082 for MBE, 1.156 for OHE and 1.152 for UPE.
As shown in the anomalous observation analysis (Table 8), the teams in observations 773, 895, 711, 722 and 933 lost the matches they were expected to win; conversely, the teams in observations 802, 770, 917, 935 and 901 won the matches they were expected to lose. Of these anomalous observations, 901 and 935 met the criteria for influential observations in three different checks, as seen in Table 9. Compared with the logit regression results in Table 7, which are based on all 296 observations, the fit of the model improved when observations 901 and 935 were excluded. The LR chi² value rose from 136.49 to 145.79, and the pseudo R² rose from 0.3326 to 0.3577. Excluding these two observations also strengthened the significance of LPE in explaining the Result variable, from P>|z| = 0.009 to P>|z| = 0.002. At the same time, all variables remained significant in the efficiency model, and the odds ratios increased for all variables except LPE.
Table 11 confirms that there is no multicollinearity between the independent variables of the model: the variance inflation factors (VIFs) are close to 1, the tolerance values are below but close to 1, the R² values are small, only the first eigenvalue (Eigenval) is large and the condition indexes (Con Index) are smaller than 10. The model correctly classified 78.23% of the wins and losses in the TMVL. The sensitivity and specificity of the model in the TMVL were 78.77% and 77.70%, respectively.
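The odds ratios reported above for the TMVL model can be turned into concrete effects with a short Python sketch. The odds ratio values are those quoted in the text; the three-point change and the starting probability of 0.5 are hypothetical inputs chosen for illustration.

```python
import math

# Odds ratios quoted in the text for the TMVL model.
odds_ratios = {"LPE": 1.290, "SE": 1.125, "MBE": 1.082, "OHE": 1.156, "UPE": 1.152}


def coefficient(odds_ratio: float) -> float:
    """Logit coefficient corresponding to an odds ratio: beta = ln(OR)."""
    return math.log(odds_ratio)


def odds_multiplier(odds_ratio: float, points: float) -> float:
    """Factor by which the odds change for a change of `points` units."""
    return odds_ratio ** points


def updated_probability(p: float, multiplier: float) -> float:
    """Win probability after the odds p / (1 - p) are scaled by `multiplier`."""
    new_odds = p / (1.0 - p) * multiplier
    return new_odds / (1.0 + new_odds)


# Hypothetical scenario: outside hitter efficiency improves by three points,
# all other variables held constant, starting from an even game (p = 0.5).
ohe_three_points = odds_multiplier(odds_ratios["OHE"], 3)
p_after = updated_probability(0.5, ohe_three_points)
```

Because the effects are multiplicative on the odds, the change in win probability depends on the starting probability; the same three-point gain matters less for a team that is already a strong favourite.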

Discussion
Several studies in the literature have sought to predict the match result in volleyball, but very few have evaluated it in a model that includes many parameters.
Nikos, Elissavet (2011) investigated the setter's performance and attack tempo as determinants of attack efficacy in Olympic-level male volleyball teams and found that the success rate of offenses was higher with quick-tempo attackers.
In a similar manner, Marcelino, Sampaio, Mesquita (2012) investigated how attack and serve performance relate to match period and opponent team level, using 600 rallies selectively sampled from a total of 5,117 rallies at the Men's World Cup 2007, and determined that serve and attack performance changed in relation to both opponent team level and match period. Silva, Lacerda, Joao (2013) analysed 24 matches of the men's Senior Volleyball World Championship (Italy, 2010) to understand what happens when the setter is in the attack zone (zones 4, 3 and 2). They determined that the setter's presence in the attack zone had a negative effect on the team's outcome, because the team could not effectively perform the offensive skills needed to win, such as the offensive side out.
In the Men's Volleyball World Championship, the outcomes were best explained by attack errors, jump serve points, quick ball errors and jump serves (Asterios et al., 2009). Analysing the Greek premier league male volleyball teams during 2005-2006, Drikos et al. (2009) found that team performance was most affected by the serve efficacy ratio and attack efficacy ratio. In female volleyball, Monteiro et al. (2009) linked the set outcome to attack efficacy. Marelic et al. (2004) concluded that all their tested technical elements (serve, serve reception, spike in the phase of attack, block and spike in the phase of counter attack) effectively determine the game outcomes. Pena, Rodriguez-Guerra, Busca, Serra (2013) studied high-level men's volleyball in the Spanish Superliga during the 2010-2011 season to determine which skills and factors best predicted winning and losing. They found that the points obtained in the break point phase, the number of reception errors and the number of attacks blocked by the opponent were significant predictors of winning or losing the matches.
Gonzalez-Silva, Domingues, Fernandez-Echeverria, Rabaz, Arroyo (2016) sampled 5,842 game actions carried out by the 16 male and 18 female teams that participated in the Under-16 Spanish Championship to evaluate setting efficacy in young male and female volleyball players. The results showed that the best predictors of setting efficacy, in both the female and male categories, were reception efficacy, setting technique and tempo of the set.
Rentero, Joao, Moreno (2015) analysed the libero's participation and influence in the attack and defence phases in men's elite volleyball, using a sample of 1,101 pass and defence game actions of the four highest-placed teams in the 2008 Beijing Olympic Games. The results revealed significant associations during the defensive stage of the game between the defending player and the defensive phase, with the libero's defence predominating in zone 5; between the defending player and defence efficiency, which is improved by the libero; and between the defending player and the counterattack, as attacks increased in zone 6 when the libero was defending.
Another study by Silva, Lacerda, Joao (2014) showed that service points, reception errors and blocking errors were the variables that discriminated the final outcome of the match (win or loss). Moreover, successful service points were the variable most strongly associated with match success (win). In this sense, increasing the effectiveness of the serve should be a top priority in coaching elite volleyball teams.
As the literature examples above show, the match result and the factors that determine winning or losing have consistently attracted researchers' attention. These studies examined the effects of one or more volleyball techniques or of player positions. The present study differs from them in that it covers all the techniques and player positions in volleyball.
To test the success of the efficacy model in predicting the match result, we compared the predicted and actual results. The model correctly predicted the outcomes of 83.45% of the games in the TWVL and 78.23% of the games in the TMVL. In other words, the probability of correct prediction by the model is 83.45% in the women's league and 78.23% in the men's league.
The results of this study were derived from the match data of teams in the Turkish Elite Volleyball League. However, to determine whether these results characterise the league, the same league should be reanalysed for both genders using data of the 2014-2015 season. Repeating the study for the elite men's volleyball league would determine whether the results are generalisable and would highlight the gender differences. These analyses have been planned as extensions to this study and are currently being investigated.

Conclusion
Statistical reports of sports matches contain many variables. Such numerous and varied data on the teams and individual players are difficult to organise into a performance evaluation by match analysis programs. Therefore, by identifying the variables that significantly determine the match result, coaches can develop the contents of both development and tactical training programs.

Central European Journal of Sport Sciences and Medicine
In this context, Akarçeşme (2010) determined that the match results were well explained by an efficacy model based on the players' positions and the technical skills that determine success or failure. The validity and reliability of the model were confirmed in an analysis of the next two years' worth of league data. The model can also explain gender differences in volleyball matches.

Recommendations
The data in the present study were sampled from the Turkish men's and women's volleyball leagues in the 2010-2011 season. In that season, teams in both leagues were entitled to field three foreign players, and the teams were found to exercise this right. The study could be repeated for seasons in which teams were entitled to field two foreign players. Currently, we are analysing the data of the 2011-2012 and 2012-2013 seasons. These additional analyses will further validate the model and will reveal the tendencies of the leagues employing various numbers of foreign players, in terms of the EM variables.

Table 1. Results of the logistic regression prediction model for the Turkish Women's Volleyball League

Table 2. Analysis of anomalous observations for the Turkish Women's Volleyball League
As shown in Table 2, the teams in observations 196, 173, 53, 272 and 157 lost the matches they were expected to win; conversely, the teams in observations 125, 293, 286, 58 and 145 won the matches they were expected to lose.

Table 3. Analysis of efficient observations 145 and 286 in the Turkish Women's Volleyball League

Table 4. Results of the logit regression model in the Turkish Women's Volleyball League after removing the efficient observations (observations 145 and 286)

Table 5. Multicollinearity diagnostic tests in the Turkish Women's Volleyball League (with observations 145 and 286 removed)

Table 6. Classification table for the efficacy model in the Turkish Women's Volleyball League

Table 7. Logistic regression prediction model for the Turkish Men's Volleyball League

Table 8. Anomalous observation analysis in the Turkish Men's Volleyball League

Table 9. Analysis of efficient observations in the Turkish Men's Volleyball League
As shown in Table 9, observations 901 and 935 are the most efficient observations that could affect the prediction model. Table 10 presents the results of the logit regression model after removal of the efficient observations 901 and 935, and Table 11 summarises the results of the multicollinearity analysis. The classification table for the TMVL EM is given as Table 12.

Table 10. Results of logit regression model in the Turkish Men's Volleyball League after removing the efficient observations (observations

Table 11. Multicollinearity diagnostic tests in the Turkish Men's Volleyball League (after removing observations 901 and 935)

Table 12. Classification table for the efficacy model in the Turkish Men's Volleyball League