Julien's Starcraft Blog
Some moves are slightly good. Some moves are slightly bad. I tell you about them.
Monday, June 30, 2003
sammy sosa .164 .675 .312
does he suck or is it just a rough start? let's look at recent performance:
1998 .103 .734 .462
1999 .115 .726 .478
2000 .133 .722 .436
2001 .175 .735 .557
2002 .160 .741 .408
that 2001 is positively mcgwirean. it's his peak---few have had better---and he's on the decline. he should finish with between 600 and 700 home runs. hall of fame is beyond question. one of the top 5 ever at right field.
Thursday, June 26, 2003
fight shabby journalism! (luckily a blog isn't journalism)
phil rogers is a featured baseball writer at the chicago tribune. i've been suffering through his columns for years, so i take it as my duty to publicly skewer him.
i ran out of time to skewer him today. look for future skewerings.
good morning bloggers! today we examine the national league pitching leaders:
kevin brown 2.22
jason schmidt 2.23
hideo nomo 2.41
woody williams 2.55
mark prior 2.64
jae seo 2.66
kaz ishii 2.78
miguel batista 2.83
kerry wood 2.94
carlos zambrano 2.95
three cubs make the list. i didn't do that on purpose. i'm not even a cubs fan. i just live in chicago. i swear!
don't talk to me about wins.
ok if we thought era was important, we'd feel like we had a pretty good handle on things: tight race at the top, bunching in the high 2's.
but we would be deluding ourselves (hypothetical delusion . . . don't ya hate it?). era is an ok stat for measuring pitchers. the problem is it takes like 500 innings before it starts to tell you things. since old hoss radbourne doesn't pitch any more, we need other metrics to effectively evaluate a single season. it's funny that that thought doesn't occur to cy young voters. there's a lot that's funny about baseball.
new list (wal con pow):
kevin brown .068 .765 .083
jason schmidt .075 .705 .140
hideo nomo .102 .789 .143
woody williams .067 .819 .142
mark prior .075 .720 .161
jae seo .054 .868 .142
kaz ishii .164 .731 .139
miguel batista .086 .824 .111
kerry wood .122 .650 .209
carlos zambrano .124 .776 .104
where did kevin brown get that pow? some skill, some dodger stadium, and a lot of luck. batista and zambrano have also been lucky in that regard. wood has been unlucky. see the plethora of .14's? pitchers do not vary the way batters do.
jae seo has been hit lucky. you gotta love his control, but with that kind of contact he will not keep his era under 3. the three dodgers pitchers have also been hit lucky, and it's not a coincidence, it's defense. woody williams is another guy that benefits from exceptional defense.
so who's the best? schmidt, probably, followed by wood. wood's con is outstanding, and his pow will come down. brown and prior are next. seo, batista, and zambrano will slip. nomo, williams, and ishii will fall some, but their defenses will keep them in it.
who did we leave out?
javier vazquez .063 .708 .261
randy wolf .102 .751 .211
adam eaton .096 .765 .211
matt clement .107 .772 .207
kevin millwood .075 .776 .197
odalis perez .063 .801 .218
matt morris .060 .802 .190
ben sheets .051 .826 .243
carl pavano .042 .830 .204
jason jennings .114 .790 .215
these guys are all better than seo, batista, and zambrano. what do they have in common? bad luck in pow. vazquez is the best of this group. his .261 comes from that bandbox in san juan they call a stadium. shout out to a colorado pitcher for making the list.
soon: al starters, relievers.
know your source
i have seen different sources say different things about juan gonzalez: some say he'll reject the trade; some say he hasn't decided.
also i have heard both that kyle farnsworth has been suspended 5 games and will appeal, and that he's been suspended 3 games and will not appeal.
maybe he'll serve the first 3 and then appeal the last 2.
Sunday, June 22, 2003
craig biggio .099 .813 .201
craig biggio has reinvented himself as a center fielder. he plays the position, and holds it down. the reason his skills have not declined into oblivion is that he retains his speed: 16 sb, 2 cs in 2002.
he's overpaid, but he's good enough to start.
Saturday, June 21, 2003
paxton from chicago:
ok about your numbers:
wal = (bb + hbp) / (ab + bb + hbp)
con = (ab - k) / (ab)
pow = (tb - h) / (ab - k)
maybe you could discuss why you chose these exactly, because my impulse would be to use these formulae:
wal = (bb + hbp) / (ab + bb + hbp) // unchanged
con = (ab - k) / (ab + bb + hbp) // changed denominator
now con is more directly comparable to wal, also k-rate is 1 - wal - con. i think that's cool. also, philosophically, i think it's better because when you are thinking about the outcome of a particular at bat you can just use the con number directly as the chance of the ball being put in play, rather than being forced to first multiply it by the chance of not walking. does that make sense?
pow = tb / (ab - k) // changed numerator
i think this number is just more elegant. and look at what you can do with it:
pow * con + wal = offense in bases per plate appearance
doesn't that seem like a great single number to compare players with? and it is so easily derived from the detailed stats pow, wal, and con. how good is player x? look at this number. why is he that good? look at wal con pow.
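a quick sketch to check paxton's two identities (k-rate = 1 - wal - con, and pow * con + wal = bases per plate appearance). the function name and the stat line are made up for illustration; plate appearances are counted as ab + bb + hbp, same as in the formulas above.

```python
def paxton_line(ab, tb, bb, hbp, k):
    """wal, con, pow using paxton's proposed definitions."""
    pa = ab + bb + hbp               # plate appearances, as the formulas count them
    wal = (bb + hbp) / pa            # unchanged
    con = (ab - k) / pa              # changed denominator
    pow_ = tb / (ab - k)             # changed numerator
    return wal, con, pow_


# hypothetical season, made up for illustration
ab, tb, bb, hbp, k = 500, 250, 55, 5, 100
pa = ab + bb + hbp
wal, con, pow_ = paxton_line(ab, tb, bb, hbp, k)

k_rate = 1 - wal - con               # should match k / pa
bases_per_pa = pow_ * con + wal      # should match (tb + bb + hbp) / pa

print(round(k_rate, 3), round(k / pa, 3))
print(round(bases_per_pa, 3), round((tb + bb + hbp) / pa, 3))
```

both pairs come out equal because (ab - k) cancels in pow * con, leaving tb / pa.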
the reason for the stats i'm using is they represent the three fundamental skills of hitting: first, whether or not the hitter walks. (2) if he doesn't walk, then does he put the ball in play? (3) if he puts the ball in play, how hard does he hit it?
those are some cool features you describe, but the system as you construct it is just another "how good is this guy?" system. there are many, such as equivalent average, runs created per 27 outs, and ops. all have special features, but all basically say the same thing.
my system, i contend, takes into account the nature of the game, thus is a constructive way of thinking that increases understanding.
the reason hits are not included in pow is we want to be able to get meaning from small sample sizes, and hits are the most variable thing out there. it takes 1000 at-bats for hits to normalize. plus, the more extreme ratio between how hits are counted (hr/3b/2b 3/2/1) makes differences show up faster. ie it takes fewer plate appearances to know that corey patterson hits harder than troy o'leary than it would otherwise.
the thing to bear in mind with pow is that home runs in some parks are doubles in others, so look to make sure that the player's hr and 2b are in line with career averages. actually, a refinement of pow would substitute for hr and 2b the following:
(hr + 2b)(career hr percentage) and
(hr + 2b)(career 2b percentage)
hr % = (hr) / (hr + 2b)
2b % = (2b) / (hr + 2b)
or instead of career you could use last three years.
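here's a rough sketch of that refinement. it relies on tb - h = 2b + 2(3b) + 3(hr), which is the 3/2/1 counting mentioned above. the function names and all the numbers are hypothetical, and leaving triples alone is my reading of the refinement, not something spelled out in it.

```python
def extra_bases(doubles, triples, homers):
    """tb - h: a double adds 1, a triple 2, a home run 3."""
    return doubles + 2 * triples + 3 * homers


def raw_pow(ab, k, doubles, triples, homers):
    return extra_bases(doubles, triples, homers) / (ab - k)


def adjusted_pow(ab, k, doubles, triples, homers, career_doubles, career_homers):
    """pow with this season's hr + 2b re-split at the career hr/2b ratio."""
    long_hits = homers + doubles
    career_hr_pct = career_homers / (career_homers + career_doubles)
    adj_hr = long_hits * career_hr_pct          # (hr + 2b)(career hr percentage)
    adj_2b = long_hits * (1 - career_hr_pct)    # (hr + 2b)(career 2b percentage)
    return extra_bases(adj_2b, triples, adj_hr) / (ab - k)


# hypothetical hitter: 20 hr and 10 2b this year, but an even hr/2b split for his career
print(round(raw_pow(300, 60, 10, 2, 20), 3))
print(round(adjusted_pow(300, 60, 10, 2, 20, career_doubles=150, career_homers=150), 3))
```

swap in last-three-years totals for the career arguments if you prefer that window.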
josh from chicago has an interesting question. he thinks shawn estes sucks, and wants to know what his walconpow is, to see if dusty baker (or jim hendry . . . we'll pin this on someone) is a fool.
i don't mean to pick on dusty baker. i would do this to every manager if i saw all their games.
well, josh, you could cut out the middle man and run the numbers yourself, but then i wouldn't have a column to write. too short for a column. blurb.
ok here it is:
shawn estes .114 .871 .120
yup, he sucks. cubs = dumb.
it has been pointed out to me that baseballhq.com also uses walk percentage and contact percentage, along with a power rating and a few other things. i checked out their stuff. it's pretty good. my approach differs from theirs in that i see wal con pow (with a side of egg roll) as a foundation to base all hitting analysis on. i believe that research will show a systematic way to derive other metrics. for example, it will be possible to predict batting average based on contact and power. the real batting average will converge to the predicted batting average over time. voros mccracken basically did this with era, in his defense-independent pitching stats.
league average .099 .810 .192
data from last year combined with this year's hbp rate applied to 162 games. best i could do.
carl everett .117 .821 .352
that power will come down. this year will be not much better than the last two.
it will be a little better. but no more 1999--2000. that was his peak.
and he can't play center field anymore. the rangers need to find someone who can play center field. and hit better than doug glanville.
dg .040 .840 .083
vladimir guerrero .182 .864 .224
weird. vlad walks now. power is down. he's turned into a walcon.
i really gotta quit it with the walcon shit. prediction: vlad's power improves.
javy lopez .059 .823 .449
there was a link on our fantasy website to an article about guys that sucked last year but are good this year. i was going to run the numbers on them and see whether it would last, but the article is gone. i remember that javy lopez was on it, so we'll do him.
that kind of contact combined with that kind of power is super super super. i bet vlad's numbers are similar. we could call these guys "conpows". get it? i suppose then there would also be walcons, walpows, and that rarest of beasts, the walconpow. otherwise known as barry bonds.
ok enough. javy has never hit with that kind of power (few have), and he won't continue. expect 30 hr on the season. patience this year is bad as usual. what about contact?
2000 .075 .834 .237
2001 .080 .813 .194
2002 .089 .818 .169
contact is the same. so he's been slightly hit lucky. (he was unlucky last year.) with the moderate power gains he seems to have made, look for a year similar to 1998 (.284/.328/.540), ie badass.
Friday, June 20, 2003
jim thome .173 .706 .337
this goes in the category "good".
he's 32, though, thus on the decline (the decline rule works only for mortals. do not apply it to barry bonds.)
jim'll never be as good as he was last year:
2002: .209 .710 .525
those are the kind of numbers you get when you hit 52 hr and walk 127 times.
let's run his whole career:
1991 .058 .837 .134
1992 .093 .709 .133
1993 .176 .766 .271
1994 .125 .738 .346
1995 .184 .750 .324
1996 .203 .721 .418
1997 .199 .706 .414
1998 .174 .680 .428
1999 .210 .654 .402
2000 .180 .693 .378
2001 .179 .648 .513
2002 .209 .710 .525
2003 .173 .706 .337
total .182 .703 .396
as do most players, thome improved his patience and power until he reached his peak, which started in 1996 at the age of 25. notice the downward trend in contact. this is normal. it will eventually force him to retire. the power spike of 2001--2002 seems to be a slight anomaly, which does not bode well for the phillies and all the dollars they dumped.
it's been a great run, though. the center of the peak was 1999--2000, when he was 28--29. this is also normal.
compare with numbers we're used to (avg/obp/slg):
remarkable consistency. he's got a shot at the hall.
it's harder to tell with these numbers that the peak started in 1996. in '93 he developed patience and power. in '94 he slipped a bit in patience, but improved his power. in '95 he consolidated these gains. he became fully formed in '96.
why do they suck?
yesterday, i posted a list of cubs that suck. i'll repeat the list:
lenny harris .092 .888 .071
tom goodwin .035 .831 .072
eric karros .097 .901 .229
mark grudzielanek .064 .851 .094
ramon martinez .105 .857 .147
troy o'leary .067 .839 .138
these guys have a lot in common. they all make contact. they all swing the bat (a lot. dusty baker apparently loves this type of player.) so why do they suck?
2 reasons: patience, and power. none of these bitches can walk to save their life, and contact isn't going to help if you can't hit the ball out of the infield. result: low obp, low slg.
let's run some more numbers (avg/obp/slg):
lenny harris .169/.245/.225
tom goodwin .217/.244/.277
eric karros .282/.352/.489
mark grudzielanek .282/.329/.363
ramon martinez .269/.338/.395
troy o'leary .214/.260/.330
you can see who has gotten lucky by looking at batting average. karros has a little bit of power, but not enough to play first base. his numbers are too high because he's got his doubles and home runs backwards. he's platooning, but he should be cut. the cubs are already wasting $8 million on him; they don't need to waste a roster spot too.
grudzielanek is the other former dodger. by the end of the season, he will be in the tank. but the rightful starter, bobby hill, won't take over till next year, because dusty baker is a presbyophile.
the fact that these players are on major league rosters is the result of an overemphasis on strikeouts. gm's know strikeouts are important, but they don't know how to place them in the proper context. to succeed at the plate, you have to make contact and hit the ball hard.
i live in chicago, so i see a lot of cubs games.
bellhorn may have had his best year last year, but with the trade to colorado he's got a shot at a pension.
good baseball resources:
daily: lee sinins' atm reports---all the important news with statistics you can use.
scouting guide: baseball prospectus---their website sucks because it's mostly premium and they think they're the masters of science. clay davenport's player cards are good, though. and the prospectus triple play is a daily column that's better than most. premium things i would read if i had them: will carroll's injury reports, chris kahrl's transaction analysis. chris is the master of references that no one gets. as an added bonus, his sentences are labyrinthine.
book: the bill james baseball abstract 2000
column: rob neyer
fantasy: join my fantasy leagues (free)! a fantasy league that mirrors true player value is a big help toward understanding the game. drop me a line if you wanna join.
whoa mark bellhorn got traded! colorado got him and a minor leaguer (travis anderson) for nothing (jose hernandez). actually hernandez can provide value at shortstop. contrary to popular belief, he fields the position. but they're gonna use him at third.
jose hernandez: .095 .630 .198
he's better than ramon martinez and *guffaw* lenny harris. that could count as an improvement, since they weren't gonna use bellhorn anyway.
bellhorn is gonna go nuts in colorado. he will win a starting spot.
i gotta find out who travis anderson is.
how long is dusty baker gonna drag lenny harris's carcass around?
Thursday, June 19, 2003
mark bellhorn .178 .669 .161
that pow will come up after he hits a couple more home runs.
choi: .192 .644 .391 (cf eric karros)
bobby hill: .097 .813 .153 (aaa)
these kids are young and can play. they will be part of the next cubs dynasty. assuming the cubs don't fuck up. but that makes an ass out of you and an ass out of me. or something.
mark bellhorn, though, has old player skills and is on the decline. 2002 was his best year.
carlos lee just hit into a 523 double play. the white sox have a problem with double plays. they have old player skills---patience and power, as opposed to young player skills---contact and speed. players with old player skills decline faster, and lose their defense.
pat burrell led off the ninth with a double off smoltz. smoltz is tired.
burrell, btw, is good: .137 .678 .325
he got removed for a pinch-runner. that's not a forward-thinking move.
now they're bunting. the braves might win this.
the phillies, of course, really wasted money on david bell.
i've never seen that before.
polanco stole second, and third baseman vinny castilla covered on the throw. he was standing there because they were in the thome shift.
my friend ch wants me to point out that jim thome had his best season last year, and the phillies wasted a lot of dollars on him.
ch also wants me to talk about how larry bowa is a terrible manager. i invited him to post on the blog but he declined. larry bowa just got thrown out. his major arguments were "that's bullshit" and "that's fucking bullshit."
let's look at some pitchers:
smoltz: .043 .659 .090
gagne: .067 .468 .085
but those are the two best.
smoltz just blew the save. don sutton said he'd converted 67 of 69, and that quick polanco poked one to right to plate 2.
the 13 is often a tough play, because the runner can get in the way. it can happen on any bunt or weak grounder to the right side. sometimes the first baseman gets involved, making it tougher. the second baseman has to cover first, and it's a bang-bang play (14 or 34). then there's that magic corridor between the mound and first. if a ball's fast enough to go through, but not fast enough for the second baseman, there's nothing they can do.
the bases are loaded, 1 out, in the ninth, and lenny harris is at the plate. that should be mark bellhorn. harris struck out. next batter: troy o'leary. that should be mark bellhorn. or choi. the cubs have a lot of bad hitters that get playing time:
lenny harris .092 .888 .071
tom goodwin .035 .831 .072
eric karros .097 .901 .229
mark grudzielanek .064 .851 .094
ramon martinez .105 .857 .147
(data through june 19)
grudzielanek and martinez are usable as backup infielders. the rest are zeroes. i suppose you could keep goodwin as a fifth outfielder, but they need somebody on the bench who can hit. and i'm not talking about choi, hill, and bellhorn. they should be starting.
oh yeah troy o'leary sucks too: .067 .847 .138
(does not include june 19: 1 pa 1 k)
dusty baker can't get enough crappy veterans.
kyle farnsworth .124 .646 .123
that's some contact avoidance! 16 w 40 k 2 hr 32.0 ip.
what a great game! wilson thought he could fight farnsworth and got planted. both benches emptied. dusty baker and ray knight were yelling at each other. it was chaos.
today kyle farnsworth became a man.
who are the best arms in the league? furcal may be the best at short.
did anyone see the throw home today from sammy sosa? i thought his arm was gone.
jose reyes looks good in the field.
choi, hill, and bellhorn all sat on the bench today while paul wilson---paul wilson!---held the cubs to 1 run in seven innings.
most commentators are intolerable. steve stone and chip caray have got to be the worst. except anything jeff brantley is involved in.
prior only went six innings today. that's encouraging.
kyle farnsworth is a motherfucker.
i heard vin scully for the first time 2 nights ago. i now know how baseball broadcasting should be done.
corey patterson .029 .771 .311
that kind of contact and power at his age (23)? he becomes a superstar when he learns to take a pitch.
he can also run, field, and throw.
he won't keep it up, but he'll be alright.
mark prior .075 .723 .152
that's the same thing as pointing out that he has 32 w 111 k 7 hr in 102.1 innings.
w = bb + hbp.
stats include today's game (usually they don't).
notes from a cubs game:
i think pitchers are beginning to close the gap that hitters opened up on them in 1993. the hitter explosion was based on growth juice (tm); the pitcher rebirth will be due to fewer career-ending injuries to young arms. people are finally, slowly, learning how to take care of their pitchers.
my revenue sharing/salary cap plan: take each team's ticket and tv revenues (maybe others, whatever), and cut them in half. 50% goes to the league, 50% to the team. the league's half then gets divided by 30, one share for each team. that amount is the salary cap. teams use their shares to pay salaries. they cannot go over, and whatever they don't use they lose. obviously almost all of it will be used. teams use the other revenue they generate to pay for the farm system, park improvements, etc. this is a good system for the players because they are guaranteed half the revenue. the biggest problem is to make sure the teams get good value in their tv contracts. it can be ensured by good accountants.
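the arithmetic of the cap plan, as a minimal sketch. the function name and the three-team league with round-number revenues are invented for illustration (the real thing would have 30 teams), and "revenue" here means whatever ticket and tv money gets counted.

```python
def salary_cap_plan(revenues):
    """split each team's revenue 50/50 and share the league half equally.

    revenues: dict of team name -> ticket + tv revenue.
    returns (cap, kept), where cap is each team's salary share and
    kept maps each team to the half it holds on to for everything else.
    """
    league_pool = sum(0.5 * r for r in revenues.values())   # the league's half
    cap = league_pool / len(revenues)                        # one equal share per team
    kept = {team: 0.5 * r for team, r in revenues.items()}   # the team's half
    return cap, kept


# hypothetical league, revenue in millions
cap, kept = salary_cap_plan({"big market": 300, "mid market": 200, "small market": 100})
print(cap)
print(kept)
```

total salary money is always half of total revenue, which is the guarantee to the players.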
the worst thing about watching cubs games is chip caray.
mark prior is gonna win the nl cy young.
what are the formulas? (formulae?)
wal = (bb + hbp) / (ab + bb + hbp)
con = (ab - k) / (ab)
pow = (tb - h) / (ab - k)
we're including hit by pitch in our walks category because they're basically the same thing.
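if you want to run the numbers yourself, here's a minimal sketch of the three hitter formulas. the function names and the stat line are made up for illustration; it's not any real player.

```python
def wal_con_pow(ab, h, tb, bb, hbp, k):
    """return (wal, con, pow) for a hitter from ordinary counting stats."""
    wal = (bb + hbp) / (ab + bb + hbp)   # walks plus hit by pitch, per plate appearance
    con = (ab - k) / ab                  # share of at-bats that don't end in a strikeout
    pow_ = (tb - h) / (ab - k)           # extra bases per at-bat with contact
    return wal, con, pow_


def fmt(x):
    """three decimals, no leading zero, the way the lines are written here."""
    return f"{x:.3f}".lstrip("0")


# hypothetical season, made up for illustration
line = wal_con_pow(ab=500, h=140, tb=250, bb=55, hbp=5, k=100)
print(" ".join(fmt(x) for x in line))
```

note that ab + bb + hbp stands in for plate appearances, so sacrifices are ignored.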
let's look at a recent mvp controversy:
sosa, 1998: .103 .734 .462 (wal con pow)
mcgwire, 1998: .248 .695 .653 (wal con pow)
as we expect, sosa made more contact while mcgwire walked more and hit harder. compare with stats we know:
sosa, avg/obp/slg: .308/.377/.647
mcgwire, avg/obp/slg: .299/.470/.752
the averages are similar, which suggests that the differences in con and pow balance each other out in that regard.
pitching is slightly different, because they don't give you the same stats:
wal = (bb + hbp) / (bf)
con = (bf - bb - hbp - k) / (bf - bb - hbp)
pow = (tb - h) / (bf - bb - hbp - k)
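the pitcher version, sketched the same way. batters faced minus walks and hbp plays the role of at-bats. the function name and the numbers are hypothetical, and tb here means total bases allowed, which you may have to add up yourself from the hits allowed.

```python
def pitcher_wal_con_pow(bf, h, tb, bb, hbp, k):
    """return (wal, con, pow) allowed by a pitcher. lower is better."""
    not_walked = bf - bb - hbp           # stands in for at-bats against
    contact = not_walked - k             # batters who did not strike out
    wal = (bb + hbp) / bf                # control
    con = contact / not_walked           # contact allowed
    pow_ = (tb - h) / contact            # extra bases allowed per contact
    return wal, con, pow_


# hypothetical half season, made up for illustration
wal, con, pow_ = pitcher_wal_con_pow(bf=400, h=80, tb=120, bb=28, hbp=2, k=100)
print(f"{wal:.3f} {con:.3f} {pow_:.3f}")
```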
next time: pitcher examples, league averages.
here we go!
we are online. i sit around watching baseball, and i accumulate opinions. these opinions---til now---have been foisted upon my friends, well past their threshold of tolerance. no more! now i can absentmindedly scatter them (opinions, not friends) all over this blog. if anyone is stupid enough to read this (looks like somebody is!), they will find many (opinions, definitely not friends).
i've got this idea that i'm going to push. the idea is that the important stats to consider in evaluating a player's talent are these:
walk percentage (wal), contact percentage (con), power percentage (pow).
not normal stats, you observe. quite right! that's my unique angle that i present to the world . . . my niche. i love this blogging shit.
there will be plenty of time to discuss why i do things (lucky you). ok let's start now. what do hitters like to do? get on base and move runners over (this is the starting point for this blog). the former is adequately measured by obp, the latter by slg. the thing is, both include a substantial ball-in-play component. balls in play are highly random. therefore, it takes a long time before the statistics have meaning (at least a whole season).
the idea here is to find meaning in smaller sample sizes. in contrast to balls in play, walks, strikeouts, and home runs quickly converge to a level representative of players' abilities. thus our three stats, based on walks, strikeouts, and power.
basically we took the batting average out of obp and slg. obp without batting average gives you walk percentage. slg minus average is isolated power, a similar idea to our power percentage.
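to make "a similar idea" concrete: isolated power is (tb - h) / ab and pow is (tb - h) / (ab - k), so from the formulas iso works out to con * pow. a small check using the 1998 sosa and mcgwire lines quoted elsewhere on this blog; the function and variable names are mine, and the two sides only agree up to the rounding in the published numbers.

```python
def iso_from_con_pow(con, pow_):
    """isolated power implied by contact and power: (ab - k)/ab * (tb - h)/(ab - k)."""
    return con * pow_


# (con, pow, avg, slg) as quoted in the 1998 mvp comparison on this blog
LINES = {
    "sosa 1998": (0.734, 0.462, 0.308, 0.647),
    "mcgwire 1998": (0.695, 0.653, 0.299, 0.752),
}

for name, (con, pow_, avg, slg) in LINES.items():
    implied = iso_from_con_pow(con, pow_)
    actual = slg - avg                   # isolated power the usual way
    print(name, round(implied, 3), round(actual, 3))
```

so pow is isolated power with the strikeouts taken out of the denominator.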
wait a minute isn't batting average important? yes, but we can estimate it based on contact and power. this is where contact percentage comes in. you see, there are two aspects to hitting for average: making contact, and hitting the ball hard. these things are measured by con and pow. thus, if you know these numbers, you can predict what the player's average should be, given a significant sample size.
here's the cool thing: you can use the same numbers for pitchers. this time you want the numbers to be low. walk percentage measures control, contact percentage measures strikeouts, and power percentage measures the ability to keep the ball in the park.
baserunning and fielding are important aspects that are not taken into account by our method. they will be added to the discussion.