Monday, March 08, 2004
if the simulator is the brain, the database is the guts. the algorithm, then, is the musculoskeletal system, and the theory is, um, the pulmonary system. yeah, the pulmonary system.

we grabbed sean lahman's baseball archive and loaded it into sql server. this has been nice, as we now have wal con pow's for every player in history. we would like to have minor-league data, and the holy grail is every pitch of every game. we have ripped 2003 pitches from espn.com, but still need to parse them.

the next big project is the creation of the right-now profiles. the right-now profiles will be a statement of every player's current abilities. eventually, a report from the right-now profiles will be the input for the simulator as it crunches games.

what do we need to construct rnp's? theory. that's the next post. reports from the historical profiles will inform the statistical work that needs to be done. once that work is done . . . the work is never done, but once it is good enough . . . we'll have formulae through which we can put historical data. throughput.

rnp's will contain all the data the simulator needs to answer the questions asked by the algorithm. these data will include: pitch frequencies by count, chance it's hittable (by pitch), chance he swings, chance of contact, mean and standard deviation of angular position (fair or foul?), the other angle (up/down) and the distance, speed to first, chance of extra bases, base-stealing properties, and baserunning tendencies.

these data will be kept for pitchers and batters. except the baserunning ones, which will be for fielders and batters.
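
to make the shape of an rnp concrete, here is a rough python sketch of one record. every name in it (RightNowProfile, Spread, the field names, check_frequencies) is made up for illustration, and it assumes a mean and standard deviation are kept for both angles and the distance, which the list above doesn't quite pin down. base-stealing and baserunning are left as loose dicts because nothing about their contents is decided yet.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# (balls, strikes), e.g. (3, 2) for a full count
Count = Tuple[int, int]


@dataclass
class Spread:
    """a mean and standard deviation for one batted-ball quantity."""
    mean: float = 0.0
    sd: float = 0.0


@dataclass
class RightNowProfile:
    """hypothetical layout for one player's right-now profile."""
    player_id: str
    # count -> {pitch type -> frequency}; each inner dict should sum to 1
    pitch_freq_by_count: Dict[Count, Dict[str, float]] = field(default_factory=dict)
    # pitch type -> chance the pitch is hittable
    p_hittable: Dict[str, float] = field(default_factory=dict)
    p_swing: float = 0.0
    p_contact: float = 0.0
    # assumption: mean/sd kept for all three batted-ball quantities
    horizontal_angle: Spread = field(default_factory=Spread)  # fair or foul?
    vertical_angle: Spread = field(default_factory=Spread)    # up/down
    distance: Spread = field(default_factory=Spread)
    speed_to_first: float = 0.0
    p_extra_bases: float = 0.0
    # contents undecided, so these stay as free-form placeholders
    base_stealing: Dict[str, float] = field(default_factory=dict)
    baserunning: Dict[str, float] = field(default_factory=dict)


def check_frequencies(profile: RightNowProfile, tol: float = 1e-9) -> bool:
    """true if the pitch mix at every count sums to 1 (within tol)."""
    return all(
        abs(sum(mix.values()) - 1.0) <= tol
        for mix in profile.pitch_freq_by_count.values()
    )
```

a sanity check like check_frequencies is cheap to run whenever a profile is built, since a pitch mix that doesn't sum to 1 would quietly skew the simulator.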

all data will be park-adjusted. we need to do park tables. that'll be theory.
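
the park tables aren't designed yet, so this is only a placeholder for the idea, not our method. it uses the common simple park factor (how often something happens per game at the park, divided by how often it happens per game in those same teams' road games) and divides a raw rate by that factor. the function names and the numbers in the example are hypothetical.

```python
def park_factor(rate_at_park: float, rate_on_road: float) -> float:
    """simple ratio park factor: above 1 means the park inflates the stat."""
    if rate_on_road <= 0:
        raise ValueError("road rate must be positive")
    return rate_at_park / rate_on_road


def park_adjust(raw_rate: float, factor: float) -> float:
    """scale a rate observed in a park back toward a neutral park."""
    if factor <= 0:
        raise ValueError("park factor must be positive")
    return raw_rate / factor


# made-up numbers: 5.0 runs per game at the park, 4.0 on the road
factor = park_factor(5.0, 4.0)
neutral = park_adjust(0.25, factor)
```

a real table would need one factor per park per stat (and probably per season), and a player who only plays half his games at home would need the factor blended accordingly. that's theory.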

all your base are belong to us.
