Thursday, March 04, 2004
the algorithm
the baseball pod project

the algorithm asks a series of questions. the answer to each question is based on certain aspects of the right-now profile and informs both what questions will be asked next and what their answers will be. as it now stands:

1. what type of pitch does the pitcher throw?

the count, the pitcher's preferences, and the batter's strengths and weaknesses are the main factors in the answer to this question. the manager and catcher come into play, as does which pitches have been working, both in the game overall and in previous matchups with this batter. the park also has an effect.

the answer to this question will inform all subsequent questions. batters have different abilities with respect to each type of pitch. eg: batter a swings at high fastballs 75% of the time, parking 3%.
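a rough sketch of how question 1 might be answered: start from the pitcher's preferences, scale them by the count and by the batter's weaknesses, then draw a pitch at random from the result. every number and name here (PITCHER_PREFS, COUNT_ADJUST, pitch_weights, choose_pitch) is made up for illustration and is not the project's actual data or code. manager, catcher, park, and recent results would be more multipliers of the same kind.

```python
import random

# hypothetical base preferences for one pitcher (made-up numbers)
PITCHER_PREFS = {"fastball": 0.55, "curveball": 0.25, "changeup": 0.20}

# hypothetical multipliers keyed by (balls, strikes); only two counts shown
COUNT_ADJUST = {
    (3, 0): {"fastball": 1.6, "curveball": 0.5, "changeup": 0.7},
    (0, 2): {"fastball": 0.8, "curveball": 1.5, "changeup": 1.2},
}


def pitch_weights(prefs, count, batter_weakness=None):
    """combine pitcher preference, count, and batter weakness into probabilities."""
    adjust = COUNT_ADJUST.get(count, {})
    weights = {}
    for pitch, base in prefs.items():
        w = base * adjust.get(pitch, 1.0)
        if batter_weakness and pitch in batter_weakness:
            # a value above 1.0 means the batter struggles with this pitch
            w *= batter_weakness[pitch]
        weights[pitch] = w
    total = sum(weights.values())
    return {pitch: w / total for pitch, w in weights.items()}


def choose_pitch(prefs, count, batter_weakness=None, rng=random):
    """draw one pitch type according to the combined probabilities."""
    probs = pitch_weights(prefs, count, batter_weakness)
    pitches = list(probs)
    return rng.choices(pitches, weights=[probs[p] for p in pitches], k=1)[0]


if __name__ == "__main__":
    print(pitch_weights(PITCHER_PREFS, (3, 0)))
    print(choose_pitch(PITCHER_PREFS, (0, 2), {"curveball": 1.3}))
```

multiplying and renormalizing is just one simple way to mix the factors; it keeps each factor independent, which may or may not be realistic.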

2. is the pitch anywhere near the strike zone?

for some batters, the answer will almost always be yes. nobody is afraid of the power stroke of neifi perez. for others, the answer will frequently be no.

if the answer is no, the possibilities are (1) ball and (2) hit-by-pitch. if the answer is yes, go to question 3.

3. does the hitter swing?

pitch recognition and tendencies with each pitch are factors here. a general picture of questions 2 and 3 can be seen in each player's wal stat.

if the answer is no, the possibilities are (1) ball and (2) strike. if the answer is yes, go to question 4.

4. does the hitter make contact?

here we use a pitch-by-pitch version of the con stat. different players will have different contact abilities with different pitches.

if the answer is no, the result is a strike. if the answer is yes, the possibilities are (4.1) foul ball and (5) ball in play.

4.1 foul ball

possibilities are (1) strike, (2) out (the foul is caught), and (3) nothing (the batter already has two strikes).
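questions 2 through 4.1 can be sketched as a chain of yes/no draws. this is only an illustration: the function name simulate_pitch, the probability keys, and the idea that each question is a single independent coin flip are all assumptions, and the real answers would depend on the pitch type and the profile. the rule that a foul counts for nothing only with two strikes is my reading of the "nothing" case.

```python
import random


def simulate_pitch(p, strikes, rng=random):
    """walk questions 2 through 4.1 for one pitch.

    p is a dict of hypothetical probabilities (all keys are made up).
    returns one of: 'ball', 'hit_by_pitch', 'strike', 'foul_out',
    'foul_nothing', 'in_play'.
    """
    # question 2: is the pitch anywhere near the strike zone?
    if rng.random() >= p["near_zone"]:
        return "hit_by_pitch" if rng.random() < p["hbp"] else "ball"

    # question 3: does the hitter swing?
    if rng.random() >= p["swing"]:
        return "strike" if rng.random() < p["taken_strike"] else "ball"

    # question 4: does the hitter make contact?
    if rng.random() >= p["contact"]:
        return "strike"

    # contact was made: foul ball (4.1) or ball in play (question 5)
    if rng.random() >= p["foul"]:
        return "in_play"

    # 4.1 foul ball: caught for an out, a strike, or nothing with two strikes
    if rng.random() < p["foul_caught"]:
        return "foul_out"
    return "strike" if strikes < 2 else "foul_nothing"


if __name__ == "__main__":
    # made-up numbers for one batter against one pitch type
    profile = {"near_zone": 0.85, "hbp": 0.02, "swing": 0.55,
               "taken_strike": 0.60, "contact": 0.80, "foul": 0.45,
               "foul_caught": 0.05}
    print([simulate_pitch(profile, strikes=1) for _ in range(10)])
```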

5. what kind of contact is it?

this is our current focus of research. there are three main categories: deep drive, line drive, and weak contact.

deep drives result in home runs, doubles, triples, and outs. factors are power, fly ball tendencies, and outfielders.

line drives result in doubles, triples, singles, and outs. main factors are power and fly ball tendencies. fielders have less effect, varying chiefly in the ability to prevent extra bases. most double plays come from this category.

power and fly ball tendencies, incidentally, are influenced by hitters and pitchers. power is more of a hitter thing, but pitchers do have an effect. park has an effect on everything and is assumed to be a factor always.

weak contact includes bloops and ground balls. the infield and the speed of the runner are main factors. handedness affects speed to first. pop flies are included here as well.
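since question 5 is still being researched, this is only a guess at its shape: first draw a contact category, then draw a result from that category's outcome table. the probabilities are invented, the names (OUTCOMES, pick, ball_in_play) are hypothetical, and the weak contact row is an assumption because its outcomes aren't listed above. power, fly ball tendencies, fielders, speed, and park would all adjust these tables in a real version.

```python
import random

# outcome tables per contact category; all numbers are made up.
# the deep drive and line drive outcomes follow the categories described
# above; the weak contact outcomes are a guess.
OUTCOMES = {
    "deep_drive": {"home_run": 0.30, "double": 0.15, "triple": 0.03, "out": 0.52},
    "line_drive": {"single": 0.50, "double": 0.15, "triple": 0.02, "out": 0.33},
    "weak_contact": {"single": 0.20, "out": 0.80},
}


def pick(table, rng=random):
    """draw one key from a {name: weight} table."""
    keys = list(table)
    return rng.choices(keys, weights=[table[k] for k in keys], k=1)[0]


def ball_in_play(contact_mix, rng=random):
    """answer question 5: return (contact category, result)."""
    kind = pick(contact_mix, rng)
    return kind, pick(OUTCOMES[kind], rng)


if __name__ == "__main__":
    # hypothetical contact mix for a power hitter
    slugger = {"deep_drive": 0.30, "line_drive": 0.25, "weak_contact": 0.45}
    print([ball_in_play(slugger) for _ in range(5)])
```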

that's it in a nutshell. i'm gonna eat breakfast.
