Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site fisher.UUCP
Path: utzoo!decvax!bellcore!petrus!sabre!zeta!epsilon!gamma!ulysses!allegra!princeton!astrovax!fisher!david
From: david@fisher.UUCP (David Rubin)
Newsgroups: net.sport.baseball
Subject: Re: Lineup dependency
Message-ID: <768@fisher.UUCP>
Date: Thu, 26-Sep-85 14:23:14 EDT
Article-I.D.: fisher.768
Posted: Thu Sep 26 14:23:14 1985
Date-Received: Sat, 28-Sep-85 12:42:05 EDT
References: <453@philabs.UUCP>
Distribution: na
Organization: Princeton University.Mathematics
Lines: 517

[">>"," " = me,">" = (shudder) him :-)]

Don't panic, folks! Only about half of it is new material!

>You disregard my explanation of the events, and have not proven, in any
>sense, that on-base average and slugging average are independent of
>factors such as who is batting in front or behind you.

I can demonstrate that OBA and SA together will predict very well the run production of a team. All I have demanded is the same standard of evidence be applied to lineup effects: that you demonstrate that the consideration of such an effect improves our ability to project or predict run production, and that you provide some rationale for it. You've done the latter without doing the former.

>................ You embrace those stats without showing
>that any strong correlation exists between them and scoring runs (or
>more precisely, that a stronger correlation exists than for, say, the
>stat R + RBI - HR.)

I can demonstrate such a correlation for OBA and SA; for your benefit, I will post the figures (this weekend, probably). As for R+RBI-HR, please note that it adds nothing to our understanding of run production to predict runs using runs!!!
We have already noted that R's and RBI's are heavily and DIRECTLY dependent on one's teammates' actions: if the question we are considering is how an individual player contributes, we must free him from the burden/benefit of his teammates and figure out how much each of the events he could contribute ALONE (outs, walks, hits, etc.) would contribute to producing runs on some "typical" team. You, too, recognize this principle, as it is the rationale for your argument for "lineup effects". The problem is not one of goals, but of methods.

"Lineup effects" are, I suggest, illusions caused by using the wrong statistics to evaluate offensive performance. It is because you are tied to measuring INDIVIDUAL performance with tools meant to evaluate TEAM performance that you must deal with something as arcane and diffuse as lineup effects; when one considers statistics that are not directly influenced by one's teammates, one finds that there is no discernible lineup effect. In other words, if you took a player who remains with one team over the course of his career's prime, you would likely find the player's RBI and R totals fluctuating with the team's fortunes (strongly correlated), but his SA and OBA fluctuating "randomly" (weakly correlated). If you were to focus on the RBI's, you would persuade yourself there was such a thing as "lineup effect", because RBI's measure what the guys in front of you did as well as what you did. If you looked at SA, you would remain agnostic concerning lineup effects, because SA appears to fluctuate with little regard to the quality of the team.

>.......................................It is not just my responsibility to
>prove that lineup dependencies exist. It is also yours to prove that
>they don't!

You can never prove something doesn't exist (how does one proceed in a disproof of existence?).
It is considered sensible in most circles to keep one's explanation of events as simple as possible: we need only consider new factors if they somehow improve our understanding of events. It is therefore the burden of one who wishes to include an "effect" to show its inclusion improves our knowledge or understanding, for if we can do as well without it, we have no reason to use it.

>> I shall limit myself, therefore, to the general comment (call it
>> Rubin's Law of Empirics, if you will) that HAVING A PLAUSIBLE
>> EXPLANATION FOR AN ANTICIPATED EFFECT IS NOT EVIDENCE THAT THAT EFFECT
>> HAS ACTUALLY OCCURRED.

>Perhaps you should take a course or two in prob&stat and learn the
>actual laws, instead of making up your own.

Are you serious? Taking shots at my statistical sophistication is inappropriate (as well as incorrect), and serves only as an excuse for ignoring the truth of the statement that I referred to, tongue-in-cheek, as Rubin's Law of Empirics.

>> All of Paul's explanations mean little,
>> therefore, until he establishes that what his explanations explain has
>> indeed happened! Only in the case of Mattingly does he attempt to
>> actually demonstrate that a lineup effect exists, and I will therefore
>> concentrate on it. Elsewhere, he merely shows lineup effects are
>> consistent with his selected observations without either showing other
>> explanations are inconsistent or that the observations would be
>> inexplicable without lineup effects.

>This is exactly the point I have been making for weeks. "it may be
>misleading unless we know that the circumstances of the two categories
>are otherwise similar..." Two players for different teams do not
>satisfy this criterion, and thus their stats are not directly
>comparable. For example, many, including myself, like Guerrero for
>the MVP, but I don't favor him because he leads the NL in slugging
>and on-base average. Those stats are irrelevant, since you can't compare
>them to, say, Dale Murphy's stats.
>Why not? It's simple. 18 times
>a year, Dale Murphy has to face the great Dodger pitching staff, which
>is clearly the best in the league, while Guerrero faces the Braves' staff,
>which is one of the worst. That's over 11% of the season. This is in
>addition to other differences, such as the number of day/night games,
>the different stadiums they play in, the number of double-headers they
>play in, the number of day games after night games, etc. They don't even
>play the exact same other teams, either! After all, if a team played
>most of its games against Philadelphia earlier in the year, they faced a
>much easier opponent than a team whose schedule calls for them to
>face Phila now. The reverse is true for the Cubs. Playing them
>before all their starters were injured is different than playing
>them afterwards.

Strange, but when I did try to adjust for these effects in the Carter-Pena discussion, you protested vociferously! I am all for adjusting for effects whose existence is demonstrable, and thus had called earlier for the inclusion of Palmer's "Park Factor", which considered, explicitly or implicitly, park dimensions, day/night balance at the home field, and the quality of the hitter's own pitching staff. If it could be shown that some complex scheme to correct for the changing quality of the opposition is necessary (most teams remain about as talented in August as they were in May, and the ones that don't may not have a substantial effect), I would certainly entertain that correction. At the time I first brought up the matter of such adjustments, you held your hands up to your ears and screamed that you didn't want to hear about such stuff; as those factors did not strongly affect the relative offensive merits of Carter and Pena, I didn't press the issue then. Naturally, I'm stunned by your reversal; stunned, but not surprised.
Incidentally, it is likely that Murphy derives more benefit from Fulton County Stadium than Guerrero derives from not having to face his own staff; adjusted statistics will likely favor Guerrero even more than the unadjusted ones do!

>So, unless you can correct for ALL these factors, and others, to
>ensure that your circumstances are similar, all the analyses that
>you have posted are "worthless (possibly even worse: (they) may be
>misleading".

I will adjust for all the factors that can be demonstrated; "adjusting" for a factor that has not been demonstrated (and therefore cannot be quantified) is a theological exercise. Rather than asking ourselves how much a factor affects our statistics, we wind up asking ourselves how much we BELIEVE a factor affects our statistics.

>The only attempt you have made to correct your stats is to include
>a ratio which takes into account the differences between stadiums,
>and how hard they are for hitters.

You did not read, then, how the "Park Factor" was derived. Tsk, tsk. It measured how difficult it was to produce runs in a particular park, and therefore implicitly considered dimensions, elevation, day/night games, etc, etc, and corrected for the prowess (or lack thereof) of the home staff.

>................................But even this attempt showed your
>statistical inexperience. Saying, for example, that park A is 10% percent
>harder to hit in than park B because the overall averages (of say, slugging)
>are 10% lower, is a valuable and meaningful stat when applied to the whole
>group of hitters - it provides information on the park to its owners.
>But it is TOTALLY MEANINGLESS to apply this stat to individual batters
>in this park. One must also know the shape of the distribution. It could
>be that almost nobody hits 10% worse in that park - that many hit much worse
>or better, and it averages out to 10%.
>For example, if a country's families
>have 2.3 children on the average, it doesn't mean that anyone has 2.3
>children, or even that most families have 2 or 3 children. Bivariate
>distributions are not uncommon, and in these, almost no one is around
>the mean.

You are correct, but need not worry. It is necessary to check that the detriment/advantage supplied by a home park affects the players equally (or that deviations from equality are random, rather than systematic). You will be pleased, therefore, to hear that such deviations are binomially/normally distributed, and that where individual players fall on these distributions appears random, and that the distribution of ALL players is tighter once these effects are taken out.

>Furthermore, the reason that I use only Mattingly is that these stats
>are rarely available. It's much easier to compute personal averages
>such as batting average, slugging average, runs, RBI, etc. than to
>compute how much a batter tends to improve the stats of those batting
>ahead of him or behind him, etc. We almost never see these stats. We
>don't often enough see stats such as batting average with runners in
>scoring position, etc. You criticize me for the deficiencies of baseball
>statisticians everywhere. It's not my fault, so don't criticize me
>for it.

I know it's not your fault. I'd love to see such breakdowns myself; supposedly, that's what Bill James's "Project Scoresheet" is in the process of doing. If this enlarged data base should provide someone with the means to prove a "lineup effect", I will change my tune, naturally. However, it is difficult (and unwise) to believe a dramatic effect could "hide" in currently available data; if there is some "lineup effect", it is likely to be far smaller (and possibly even of a far different nature) than you believe.

>> Moreover, even if Paul COULD assure us that this was so, he does not
>> have nearly enough data.
>> Examine, in particular, the data for batting
>> second: it is based on 35 games, i.e. about 100-150 at bats. Most
>> fans will not put much store in a player's average after 35 games
>> (early May), and for good reason: the player has not yet accumulated
>> enough at bats for us to form any reasonable opinion as to his likely
>> seasonal productivity. We are talking about guessing whether a player
>> is hitting .300 or .400 based on that many at bats: it would not be at
>> all unusual for the difference (10 to 15 hits) to be due to a "hot" or
>> "cold" streak (what Statisticians conveniently label "random", but we
>> may understand as being that which is beyond our knowledge). We would
>> need to have many more at bats (perhaps in a couple of more seasons we
>> will) before we could say that the difference is due to the position
>> in the lineup rather than a propitious hot streak. To put it another
>> way, if a lifetime .300 hitter were to have a .400 average on May 5th,
>> would you tentatively conclude (until further info was available) that
>> the man would bat .400 for the season? Of course not. You would
>> correctly conclude that he is more likely to hit .300 from June
>> through September than .400. He may just have had a good April...

>..............................................Mattingly's hot
>stats for the second position were not compiled in one streak...
>....................................So, it is NOT the result of a hot streak.

You misunderstand what I mean by a "hot" streak. I mean any random fluctuation caused by the smallness of the sample. Perhaps I should have clarified as follows: pick, at random, ANY 120 of Mattingly's AB's. Calculate his BA. For all his AB's, his BA is about .320 (last time I looked). You will find, if you do this, say, 100 times, that five to ten times you will get a .400+ BA for Mattingly.
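The resampling experiment just described can also be checked, at least roughly, without drawing any samples. The sketch below is an editorial illustration and not the original calculation: it assumes each at bat is an independent trial with a fixed .320 chance of a hit (a plain binomial model, which ignores that real samples would be drawn without replacement from a finite set of at bats), and the function name prob_at_least is made up for the example. It adds up the exact binomial probability of 48 or more hits in 120 at bats, i.e. an average of .400 or better. The result is sensitive to these assumptions, so treat it as a check on the order of magnitude only.

```python
from math import comb


def prob_at_least(hits: int, at_bats: int, true_avg: float) -> float:
    """P(X >= hits) for X ~ Binomial(at_bats, true_avg)."""
    return sum(
        comb(at_bats, k) * true_avg**k * (1 - true_avg) ** (at_bats - k)
        for k in range(hits, at_bats + 1)
    )


# Assumed inputs, taken from the figures in the article:
# a .320 hitter observed over 120 randomly chosen at bats.
AT_BATS = 120
TRUE_AVG = 0.320
HITS_FOR_400 = 48  # 48/120 is exactly .400

p_400 = prob_at_least(HITS_FOR_400, AT_BATS, TRUE_AVG)
print(f"P(.400 or better in {AT_BATS} AB) = {p_400:.3f}")
```

Raising AT_BATS while keeping the .400 threshold (hits = 0.4 * at bats) shows how quickly the chance of a fluke .400 shrinks as the sample grows, which is the point at issue.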
In other words, there's a greater than 5% chance that any particular .320 hitter will hit over .400 for 120 RANDOM at bats; if we check out, say, 20 major leaguers who are batting between .300 and .340, and pick 120 at bats for each of them at random, we would only be surprised if NONE of them hit over .400 during that span. Since it is likely this is what TSN did, intentionally or unintentionally, the statistic carried nothing to contradict my inclination that the .400 average meant nothing.

>As for right-handed vs. left-handed opposition, I checked the games from
>August 5 on. There were both right-handed and left-handed opponents.
>He is playing full-time in that spot, so he faces all types of pitching.
>Martin moved him to 2 on August 5 because of his excellent production
>in that spot before.

The question, though, is whether Mattingly faced lefty/righty pitching in the same proportion as a #2 hitter as he did as a #3 hitter. Moreover, I don't think he is REGULARLY batting #2, as every time I watch the Yankees (about half a dozen times in the past month), he's batting third. Too bad, too: if he batted second the rest of the way, we might have had enough data to say something about Mattingly in '85.

>> Even if it were established for Mattingly, it would hold only for
>> Don Mattingly with the current Yankees: to apply it to, say, Tony
>> Pena, it would have to be demonstrated for a wide variety of players on
>> a wide variety of teams. Still, it would be quite a surprise to me if
>> anyone could get even that far.

>I see! Whenever I come up with evidence, it counts only for that
>case, but you have never detailed an instance of a player changing,
>say, his lineup position and keeping the same OBA and slugging pct.,

I could come up with a wide variety of such instances. Shall I spend an hour with my Baseball Encyclopedia?
To establish a general principle, I won't require that you prove it in every instance, but I don't feel I'd be unreasonable to remain dubious even if it were proved for one.

>EVERYBODY knows that lineups are interdependent!

"Everybody" "knows" this, because "everybody" evaluates players on the basis of RBI's and R's. Yes, lineups are interdependent in scoring runs. No, lineups do not substantially affect INDIVIDUAL performance. That Fred Xyzz is more likely to bat in a run with a runner on first is what "everybody" DOES know; that Fred Xyzz is more likely to hit a double with a runner on first is something that is not known by "everybody"; certainly, it is not yet known by me.

>.......................................................Try watching a game
>sometime (instead of just reading numbers).

I watch, on TV or in person, about 100 games a season.

>.................................................You'll see that when a runner
>is on base, it affects (among other things):
>   1) the way the pitcher throws. Using the stretch instead of a full
>      windup definitely hurts most pitchers' performances. Otherwise,
>      there would be no need for anyone to ever windup.
>   2) the pitch selection;
>   3) the defensive alignment.

No doubt, but none of these things is done often enough to substantially affect a player's OBA or SA. Let's say, for example, that a player gets 500 AB's. On a really lousy team, there's a runner on when he bats, say (these are only guesses; if you have the real numbers, go ahead and substitute, as I doubt that I am SO far off as to invalidate my argument) 25% of the time, while with a really good team, it might be 50% of the time. The lucky player gets an extra 125 AB's with runners on. Consider #3; this lucky player, if he's a right-handed pull or straight-away hitter, gets the second baseman in a position where the second baseman is less likely to make the play.
Let's say he is a contact-hitter who NEVER strikes out, and he hits lots of ground balls, with few down the line. Then he might hit a ground ball toward the second baseman about 20% of the time, and the second baseman may now convert only two thirds of them, rather than three quarters of them, into outs. So 125*.2*(.75-.67) comes to an extra two or so singles over the course of the season. If the batter in question strikes out some, or hits a lot of fly balls, then the difference is even less. Of course, with 125 extra shots at an RBI, THAT total will rise substantially.

I could argue similarly on the other points. My point is not that these things are fiction, only that it is unlikely that they SUBSTANTIALLY affect a player's SA or OBA. The numbers I use are unimportant; what is important is the plausibility of my argument that we ought to be careful not to confuse existence with significance.

>By the way, if you still doubt the existence of lineup dependency (which
>you undoubtedly still do) then answer the following question:
>  If there were no lineup interaction, then all managers would bat their
>  best hitter first, then their second-best, etc. to give them the
>  most opportunities to hit. Thus, according to your criteria (OBA and
>  slugging pct), the way to optimize the team's OBA and slugging pct
>  is to bat the best in these categories first, the next-best second,
>  etc. We would see Carter batting leadoff for the Mets, and Coleman
>  would not be the leadoff hitter for St. Louis, McGee would be, followed
>  by Clark. Coleman would be somewhere around 6 or 7. Come to think of it,
>  since Cedeno has been playing for the Cards and the way he has been
>  hitting, he would be batting leadoff. Also, Guerrero would be
>  hitting leadoff for LA (absurd!).

There is lineup interaction on a team's run production; I only deny its significance in judging individual performance.
Using my criteria, a manager would be disposed to bat his top OBA men near the top of the order and his top SA men in the middle. I would not have Carter bat leadoff (you are a silly one, aren't you?), but I would drop Wilson from the top of the order. I would also switch Coleman and McGee around, but as long as we were going with two table setters, both would be secure near the top of the order. You are confused: I do NOT say that lineup doesn't affect a team's performance, only that it has precious little effect on an individual's performance.

>  As ANY real baseball fan knows, managers carefully
>  pick the order to help run production, e.g. alternating left-handed
>  and right-handed batters, and putting speedsters in front of hitters
>  who hit well with men in scoring position. WHY WOULD THEY BOTHER TO
>  DO THIS IF THERE WERE NO LINEUP INTERACTION??? Why not bat Mattingly
>  leadoff, to get him more atbats? Maybe the fact that he would be
>  batting behind a much weaker hitter just MIGHT have a teeny-weeny
>  little bit to do with it?!

Nyahh. The reason that we don't bat Mattingly lead-off is not that we fear his production will drop, but because we fear his production will be wasted. There is a difference.

>  Thus, we see that some excellent managers, such as Whitey Herzog,
>  deliberately put a player like Coleman, who has a lower OBA and
>  slugging average than McGee, in the spot where he will get the most
>  at-bats, thus effectively reducing the overall OBA and slugging pct of
>  his team. Do you really think he is deliberately reducing the run-scoring
>  ability of his team? Or do you just think that all these baseball
>  professionals are sadly misguided?

I think Herzog is making a mistake. Not a big one, but probably one that will cost him a few runs over the course of the season. Herzog is not sadly misguided, just slightly in error. Herzog makes mistakes, Benjamin makes mistakes, even Rubin makes mistakes!
That we HOPE that Herzog makes them less frequently is no guarantee of his infallibility. I vaguely recall Herzog being fired from a couple of jobs. Perhaps he did make mistakes...or do you believe that the professionals running the Rangers and the Royals did?? Some of these professionals must have erred if a firing was necessary.....Of course, you will argue that Herzog knows so much, I cannot question him. Thus I ask you: if there are thirty professional managers who, in a given situation, would do ten different things, does that make most of them "sadly misguided"? Of course not; men of good faith can disagree without calling one another idiots. I reserve the phrase "sadly misguided" for those who will not even examine alternatives. Maybe Benjamin would call me sadly misguided for batting McGee ahead of Coleman, but I doubt that Herzog would do so. As for team OBA/SA vs. individual OBA/SA, see below.

>  The only other alternative is
>  that TEAM RUN-SCORING ABILITY IS NOT DIRECTLY CORRELATED WITH
>  TEAM OBA OR SLUGGING, i.e., these stats aren't all you crack them
>  up to be. There must be other factors, e.g., speed.

There are other factors. They just don't provide many runs that are not already accounted for by OBA and SA. Coleman has stolen 100+ bases, and has been caught, say, 30 times. By the best estimate available, Coleman's base stealing has given the Cardinals an extra .3*(100-2*30) = 12 runs. (The formula was empirically derived; it is how many runs an average team would gain if a player had 100 SB, 30 CS rather than just waiting on first for the next player to put the ball in play. How many runs he has meant to the Cards this year may be somewhat higher (or lower), but we who do not have score sheets for all Card games cannot otherwise make a better guess.
Of course, anyone who knows how many CS Coleman has can improve matters by substituting for my guess.)

>  To rephrase this point, so that you will have less chance of
>  misinterpreting it, if Guerrero's slugging avg and OBA are what
>  are most important to the Dodgers, then he should bat leadoff,
>  so as to maximize the team's slugging avg and OBA. He doesn't,
>  and the very idea seems preposterous. Either Lasorda doesn't
>  understand the game as you do, or your emphasis on OBA and
>  slugging is wrong. Which is it?

Lasorda and I both agree that much of Guerrero's SA will be wasted if there are no men on base. As Lasorda and I have found that it's far easier to scrape someone up who has a decent OBA than it is to get someone who has a good SA, we both place a greater premium on Guerrero's power. Certainly, it is NOT true that maximizing a team's OBA and/or SA is the SAME as maximizing the team's run production, and I have never said that it was. I have suggested it's pretty darn close, though. The relationship between team OBA, SA, and run production is close, but not exact. It would cost the Dodgers some runs to bat Guerrero lead-off, but not because Guerrero wouldn't be a good lead-off man. You've merely shown that OBA, SA, and runs are not identical: another straw man bites the dust!

>The lineup can even affect the selection of relief pitchers. And haven't
>you ever heard a manager say that what he really needs is a left-handed
>power-hitter (or more speed in the lineup, etc.)? Why are these things
>important to managers if the players in lineups don't interact?

Again, you misunderstand what I am saying. The new left-handed power hitter may see big changes in his RBI totals, and his new team may see a surge in runs scored, but the new player is unlikely to see any substantial change in his OBA and SA, once those two are properly adjusted.
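For anyone who wants to substitute a better caught-stealing figure into the stolen-base estimate given earlier for Coleman, the arithmetic is easy to parameterize. This is only a sketch of the formula as quoted in this article, .3*(SB - 2*CS); the function name and the default argument are illustrative, and the 30 CS is the same guess used earlier, not a real total.

```python
def stolen_base_runs(sb: int, cs: int, run_value: float = 0.3) -> float:
    """Estimated runs added by base stealing: run_value * (SB - 2*CS)."""
    return run_value * (sb - 2 * cs)


# Figures used in the article: 100 steals and a guessed 30 times caught.
coleman_runs = stolen_base_runs(sb=100, cs=30)
print(f"Estimated runs from stealing: {coleman_runs:.1f}")

# The estimate is zero when SB == 2*CS, i.e. at a success rate of 2/3.
break_even_runs = stolen_base_runs(sb=60, cs=30)
```

Under this formula a runner breaks even when he is safe on two of every three attempts; below that rate, running costs runs.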
>> I suppose Paul believes Carter has a special dispensation: in
>> moving from the Expos to the Mets, he gains by being surrounded by
>> Keith, Darryl, and George, while those three do NOT gain from Gary's
>> presence. The fact is, the production of all four has remained about
>> the same over the past two years, an argument AGAINST lineup effects.

>Or an argument that Carter is about as productive as Hubie Brooks is.

Correct. It says a lot about lineup effects if they indicate that Carter is about the same hitter as Brooks is. It says just how off the wall they are... Of course, I should have expected this. Brooks is about as productive a player as Pena, and so Paul must assert that Brooks is about on par with Carter. That is, of course, why the Mets were obliged to throw Youmans, Fitzgerald, and Winningham into a deal involving players of equal value. Well, Paul, if you're right, the Mets and Expos managements must be mistaken about Carter's value vis a vis Brooks. So you, too, find yourself in contradiction with baseball "authority". Let us all savor this moment: it is as if the Pope were found guilty of heresy!

>Strawberry is having a better year, and all the other three are down,
>except for Carter's HR rate:
>(these 1985 stats are as of 9/12; the 1985f stats are approximations to
>what they would have at the end of the season if their current averages
>continue)
>               BA    HR   RBI
>Strawberry:
>     1984     .251   26    97
>     1985     .282   23    66
>     1985f    .282   27    77   (.282 38 113)
>Hernandez:
>     1984     .311   15    94
>     1985     .291   10    79
>     1985f    .291   12    92
>Foster:
>     1984     .269   24    86
>     1985     .254   17    66
>     1985f    .254   20    77
>Carter:
>     1984     .294   27   106
>     1985     .281   26    77
>     1985f    .281   30    90

Read my lips: I HAVE NEVER NEVER NEVER NEVER DENIED THAT LINEUPS AFFECT RBI'S!!!!! To show an increase in RBI's shows the TEAM has had a better (or worse) year, not that the player has had a better or worse year.
Looking at the specifics, you'll find that Hernandez, Carter, and Foster all show something of a drop-off from the last two years and Strawberry shows a definite improvement (this is apparent when looking at SA and OBA; BA and HR give us a glimpse of it). You would argue that Carter's introduction strengthened/weakened the Mets' lineup, but this would lead to a general rise/fall in INDIVIDUAL production. There is no such general rise/fall; as a GROUP, one would be hard pressed to say the four were doing better or worse than last year. What we do see is

(1) Carter and Hernandez are having "typical" seasons. Their slight drop is due to the fact they both had outstanding seasons the previous year.

(2) Foster fell off a bit. This is expected from 36 year olds.

(3) Strawberry has improved. He was expected to, with or without Carter, and will likely further improve next year.

Fact is, these are the kind of outputs we would have expected from all four had Carter remained in Montreal.......

David Rubin
{allegra|astrovax|princeton}!fisher!david

P.S. Remember, Paul, that I deny lineup effects only with regard to SA and OBA, and that our argument is over INDIVIDUAL, not team, production. Repeat this to yourself five times before you write a rebuttal.

P.P.S. Paul, you also dropped a lot of smiley faces, e.g. when you declared me to be statistically naive and understanding baseball as well as a Martian. Fortunately, I KNOW you didn't mean to insult, but shouldn't you be more careful for the sake of others who are not as intimately familiar with your tolerant nature?

P.P.P.S. There is something called Linear Weights that does even better with run production than OBA and SA; it includes things you object to having left out, such as SB's. It is a SLIGHT improvement, while being a GREAT increase in complexity. The increased complexity, in my view, is too great to be justified by this slight improvement. You may well think otherwise.