Analysis of Data in Network and Natural Language Formats
The work herein describes a predictive model for cricket matches, a method of evaluating cricket players, and a method of inferring properties of a network from a link-traced sample. In Chapter 2, the characteristics of each player are estimated from frequency counts of the outcomes that occur when that player is batting or bowling. These characteristics are weighted against the relative propensity of each outcome in each of 200 game situations, and prior information is incorporated using a Metropolis-Hastings algorithm. The characteristics of players in selected team rosters are then fed into a Monte Carlo Markov chain system to simulate game outcomes. The resulting winning probabilities for each team are shown to perform similarly to competitive betting lines during the 2014 Cricket World Cup. In Chapter 3, the simulator is used to estimate the effect of each player in terms of expected number of runs. The effect of a player is reported as expected runs scored or allowed per innings above an average player in the same batting or bowling position. Chapter 4 proposes a method based on approximate Bayesian computation (ABC) to make inferences about hidden parameters of a network graph. Network inference using ABC is a very new field. This is the first work, to the author’s knowledge, to perform ABC-based inference using only a sample of a network rather than the entire network. Summary statistics are taken from the sample of the network of interest; networks and samples are then simulated using hidden parameters drawn from a prior distribution; and a posterior distribution of the parameters is found by a kernel density estimate conditioned on the observed summary statistics. Chapter 5 describes an application of the method proposed in Chapter 4 to real data. A network of precedence citations between legal documents, centered around cases overseen by the Supreme Court of Canada, is observed. The features of certain cases that lead to their frequent citation are inferred, and their effects are estimated by ABC.
Future work and extensions are briefly discussed in Chapter 6.
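As a rough illustration of the ABC procedure summarized for Chapter 4, the following minimal sketch may help. Everything specific in it is an assumption made for illustration and is not taken from the thesis: the Erdos-Renyi network model with a single hidden edge probability, the snowball form of link tracing, the uniform prior, the mean within-sample degree as the only summary statistic, and all function names. A Gaussian kernel weight on the distance between simulated and observed statistics stands in for the kernel density estimate, and only a posterior mean is returned.

```python
import math
import random


def simulate_network(n, p, rng):
    """Toy network model (assumed): Erdos-Renyi graph stored as adjacency sets."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def link_traced_sample(adj, n_seeds, waves, rng):
    """Snowball sample: pick random seeds, then follow links for `waves` steps."""
    sampled = set(rng.sample(range(len(adj)), n_seeds))
    frontier = set(sampled)
    for _ in range(waves):
        frontier = {v for u in frontier for v in adj[u]} - sampled
        sampled |= frontier
    return sampled


def summary_statistic(adj, sampled):
    """Mean degree of sampled nodes, counting only links inside the sample."""
    if not sampled:
        return 0.0
    return sum(len(adj[u] & sampled) for u in sampled) / len(sampled)


def abc_posterior_mean(observed_stat, n, n_draws, bandwidth, rng):
    """Kernel-weighted ABC estimate of the hidden edge probability."""
    draws, weights = [], []
    for _ in range(n_draws):
        p = rng.uniform(0.0, 0.2)            # draw hidden parameter from the prior
        adj = simulate_network(n, p, rng)    # simulate a full network
        sample = link_traced_sample(adj, 3, 1, rng)  # sample it like the real data
        s = summary_statistic(adj, sample)
        # Gaussian kernel: simulations whose statistic is near the observed
        # value receive more weight in the posterior approximation.
        w = math.exp(-0.5 * ((s - observed_stat) / bandwidth) ** 2)
        draws.append(p)
        weights.append(w)
    total = sum(weights)
    return sum(p * w for p, w in zip(draws, weights)) / total
```

In use, the observed statistic would be computed once from the real link-traced sample and passed to `abc_posterior_mean`; only the sample, never the full network of interest, enters the inference.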
Keywords: Simon Fraser University; approximate Bayesian computation; kernel density estimation; network inference; cricket; Metropolis-Hastings; moneyball; player evaluation; simulated annealing; text analysis