Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/5/84; site philabs.UUCP
Path: utzoo!decvax!linus!philabs!dpb
From: dpb@philabs.UUCP (Paul Benjamin)
Newsgroups: net.sport.baseball
Subject: Lineup dependency
Message-ID: <453@philabs.UUCP>
Date: Mon, 23-Sep-85 18:09:38 EDT
Article-I.D.: philabs.453
Posted: Mon Sep 23 18:09:38 1985
Date-Received: Tue, 24-Sep-85 17:17:07 EDT
Distribution: na
Organization: Philips Labs, Briarcliff Manor, NY
Lines: 280

Alright, folks, here's another exceedingly long posting for anyone who cares to keep track of this argument over what baseball statistics can and cannot mean. It consists of a point-by-point rebuttal of a posting by David Rubin.

> First, let me say right off that while I disagree with what most of
> Paul wrote, if I countered all his points,
> (a) this article would be another monster, and
> (b) general principles would be lost among specifics.
>
> Much of Paul's arguments are anecdotal in nature: he brings up a case
> which he believes supports his position, and concludes that, since his
> explanation is CONSISTENT with his own observations, it must be TRUE.
> As an example, he credits McGee's year to Coleman; he is satisfied
> that since his explanation makes sense,
>
> (1) he may disregard alternate explanations of the event, and
> (2) he need not further investigate.

I wish that, for once, you would read what I wrote. The points I presented were not of my own making. They are the opinions of, among others, Billy Martin, and the author of the article. Have you read the article?

Also note that everything you have said above can be said about you! You disregard my explanation of the events, and have not proven, in any sense, that on-base average and slugging average are independent of factors such as who is batting in front or behind you. Your evidence is completely anecdotal.
You embrace those stats without showing that any strong correlation exists between them and scoring runs (or more precisely, that a stronger correlation exists than for, say, the stat R + RBI - HR.) It is not just my responsibility to prove that lineup dependencies exist. It is also yours to prove that they don't!

> I shall limit myself, therefore, to the general comment (call it
> Rubin's Law of Empirics, if you will) that HAVING A PLAUSIBLE
> EXPLANATION FOR AN ANTICIPATED EFFECT IS NOT EVIDENCE THAT THAT EFFECT
> HAS ACTUALLY OCCURRED.

Perhaps you should take a course or two in prob&stat and learn the actual laws, instead of making up your own.

> All of Paul's explanations mean little,
> therefore, until he establishes that what his explanations explain has
> indeed happened! Only in the case of Mattingly does he attempt to
> actually demonstrate that a lineup effect exists, and I will therefore
> concentrate on it. Elsewhere, he merely shows lineup effects are
> consistent with his selected observations without either showing other
> explanations are inconsistent or that the observations would be
> inexplicable without lineup effects.
> In other words, a simple breakdown such as this is
> worthless (possibly even worse: it may be misleading) unless we also
> know that the circumstances of the two categories (batting 2nd vs.
> batting 3rd or 4th) are otherwise similar; otherwise, it may be some
> other factor (such as lefty-righty, home-away, grass-turf, day-night,
> etc.), strongly correlated with the categories, that is driving the
> discrepancy (Statisticians refer to this confusion of one cause with
> another as "confounding").

Marvelous! Perfect! I'm SO glad you said this. It's much easier to shoot down someone's argument when he provides the ammunition himself. This is exactly the point I have been making for weeks. "it may be misleading unless we know that the circumstances of the two categories are otherwise similar..."
Two players for different teams do not satisfy this criterion, and thus their stats are not directly comparable. For example, many, including myself, like Guerrero for the MVP, but I don't favor him because he leads the NL in slugging and on-base average. Those stats are irrelevant, since you can't compare them to, say, Dale Murphy's stats. Why not? It's simple. 18 times a year, Dale Murphy has to face the great Dodger pitching staff, which is clearly the best in the league, while Guerrero faces the Braves' staff, which is one of the worst. That's over 11% of the season. This is in addition to other differences, such as the number of day/night games, the different stadiums they play in, the number of double-headers they play in, the number of day games after night games, etc. They don't even play the exact same other teams, either! After all, if a team played most of its games against Philadelphia earlier in the year, they faced a much easier opponent than a team whose schedule calls for them to face Phila now. The reverse is true for the Cubs. Playing them before all their starters were injured is different than playing them afterwards.

So, unless you can correct for ALL these factors, and others, to ensure that your circumstances are similar, all the analyses that you have posted are "worthless (possibly even worse: (they) may be misleading".

The only attempt you have made to correct your stats is to include a ratio which takes into account the differences between stadiums, and how hard they are for hitters. But even this attempt showed your statistical inexperience. Saying, for example, that park A is 10% harder to hit in than park B because the overall averages (of, say, slugging) are 10% lower, is a valuable and meaningful stat when applied to the whole group of hitters - it provides information on the park to its owners. But it is TOTALLY MEANINGLESS to apply this stat to individual batters in this park.
One must also know the shape of the distribution. It could be that almost nobody hits 10% worse in that park - that many hit much worse or better, and it averages out to 10%. For example, if a country's families have 2.3 children on the average, it doesn't mean that anyone has 2.3 children, or even that most families have 2 or 3 children. Bimodal distributions are not uncommon, and in these, almost no one is around the mean.

Furthermore, the reason that I use only Mattingly is that these stats are rarely available. It's much easier to compute personal averages such as batting average, slugging average, runs, RBI, etc. than to compute how much a batter tends to improve the stats of those batting ahead of him or behind him, etc. We almost never see these stats. We don't often enough see stats such as batting average with runners in scoring position, etc. You criticize me for the deficiencies of baseball statisticians everywhere. It's not my fault, so don't criticize me for it.

> Moreover, even if Paul COULD assure us that this was so, he does not
> have nearly enough data. Examine, in particular, the data for batting
> second: it is based on 35 games, i.e. about 100-150 at bats. Most
> fans will not put much store in a player's average after 35 games
> (early May), and for good reason: the player has not yet accumulated
> enough at bats for us to form any reasonable opinion as to his likely
> seasonal productivity. We are talking about guessing whether a player
> is hitting .300 or .400 based on that many at bats: it would not be at
> all unusual for the difference (10 to 15 hits) to be due to a "hot" or
> "cold" streak (what Statisticians conveniently label "random", but we
> may understand as being that which is beyond our knowledge). We would
> need to have many more at bats (perhaps in a couple of more seasons we
> will) before we could say that the difference is due to the position
> in the lineup rather than a propitious hot streak.
> To put it another way, if a lifetime .300 hitter were to have a .400
> average on May 5th, would you tentatively conclude (until further info
> was available) that the man would bat .400 for the season? Of course
> not. You would correctly conclude that he is more likely to hit .300
> from June through September than .400. He may just have had a good
> April...

Again I wish you would actually read the article before you respond to it! Of course, I know you already know everything :-)

Mattingly's hot stats for the second position were not compiled in one streak. He started the season batting 3-4, then was moved to 2 in May for 17 games. He was then moved back to 3-4, but occasionally in June and July batted 2. The article does not give stats for those instances alone, but states that it "worked like a charm". He was still usually batting 3-4, but was moved to 2 on August 5, when Martin became aware of the stats for his earlier production in the 2 spot. So, it is NOT the result of a hot streak.

As for right-handed vs. left-handed opposition, I checked the games from August 5 on. There were both right-handed and left-handed opponents. He is playing full-time in that spot, so he faces all types of pitching. Martin moved him to 2 on August 5 because of his excellent production in that spot before.

> Even if it were established for Mattingly, it would hold only for
> Don Mattingly with the current Yankees: to apply it to, say, Tony
> Pena, it would have to be demonstrated for a wide variety of players on
> a wide variety of teams. Still, it would be quite a surprise to me if
> anyone could get even that far.

I see! Whenever I come up with evidence, it counts only for that case. Yet you have never detailed an instance of a player changing, say, his lineup position and keeping the same OBA and slugging pct., and I am still supposed to swallow your arguments!

It's interesting. When I respond to your postings, I feel like I'm trying to explain baseball to a Martian.
You know so little about the game! EVERYBODY knows that lineups are interdependent! Try watching a game sometime (instead of just reading numbers). You'll see that when a runner is on base, it affects (among other things):

1) the way the pitcher throws. Using the stretch instead of a full windup definitely hurts most pitchers' performances. Otherwise, there would be no need for anyone to ever wind up.
2) the pitch selection;
3) the defensive alignment.

Thus, if the batter ahead of, say, Mattingly gets on base more often, is a threat to steal, and gets in scoring position more often, he can (and does) affect whether Mattingly gets a hit or not.

Perhaps we should just forget this whole argument. You will continue to emphasize the individual aspects of the game, and I will continue to emphasize the team aspects. After all, if we both enjoy the game, that's the purpose of baseball anyway.

By the way, if you still doubt the existence of lineup dependency (which you undoubtedly still do) then answer the following question: If there were no lineup interaction, then all managers would bat their best hitter first, then their second-best, etc. to give them the most opportunities to hit. Thus, according to your criteria (OBA and slugging pct), the way to optimize the team's OBA and slugging pct is to bat the best in these categories first, the next-best second, etc. We would see Carter batting leadoff for the Mets, and Coleman would not be the leadoff hitter for St. Louis; McGee would be, followed by Clark. Coleman would be somewhere around 6 or 7. Come to think of it, since Cedeno has been playing for the Cards, and the way he has been hitting, he would be batting leadoff. Also, Guerrero would be hitting leadoff for LA (absurd!). As ANY real baseball fan knows, managers carefully pick the order to help run production, e.g. alternating left-handed and right-handed batters, and putting speedsters in front of hitters who hit well with men in scoring position.
WHY WOULD THEY BOTHER TO DO THIS IF THERE WERE NO LINEUP INTERACTION??? Why not bat Mattingly leadoff, to get him more at-bats? Maybe the fact that he would be batting behind a much weaker hitter just MIGHT have a teeny-weeny little bit to do with it?!

Thus, we see that some excellent managers, such as Whitey Herzog, deliberately put a player like Coleman, who has a lower OBA and slugging average than McGee, in the spot where he will get the most at-bats, thus effectively reducing the overall OBA and slugging pct of his team. Do you really think he is deliberately reducing the run-scoring ability of his team? Or do you just think that all these baseball professionals are sadly misguided? The only other alternative is that TEAM RUN-SCORING ABILITY IS NOT DIRECTLY CORRELATED WITH TEAM OBA OR SLUGGING, i.e., these stats aren't all you crack them up to be. There must be other factors, e.g., speed.

To rephrase this point, so that you will have less chance of misinterpreting it: if Guerrero's slugging avg and OBA are what are most important to the Dodgers, then he should bat leadoff, so as to maximize the team's slugging avg and OBA. He doesn't, and the very idea seems preposterous. Either Lasorda doesn't understand the game as you do, or your emphasis on OBA and slugging is wrong. Which is it?

The lineup can even affect the selection of relief pitchers. And haven't you ever heard a manager say that what he really needs is a left-handed power-hitter (or more speed in the lineup, etc.)? Why are these things important to managers if the players in lineups don't interact?

> And even if we WERE to consider them, why does Paul believe that Carter
> has his stats inflated by Hernandez, Strawberry, and Foster when NONE
> of those three show any substantial increase in production over last
> year?

WRONG. Strawberry is having a much better year than last year. Note that Strawberry bats directly behind Carter, just as Mattingly bats directly behind Henderson. See below.
> I suppose Paul believes Carter has a special dispensation: in
> moving from the Expos to the Mets, he gains by being surrounded by
> Keith, Darryl, and George, while those three do NOT gain from Gary's
> presence. The fact is, the production of all four has remained about
> the same over the past two years, an argument AGAINST lineup effects.

Or an argument that Carter is about as productive as Hubie Brooks is. Hubie Brooks is a very productive hitter, and is having a fine year batting cleanup for Montreal. And I never said Carter didn't help the others. Don't put words in my mouth and then criticize me for saying them.

But you are completely wrong about the production of all four Met players. Strawberry is having a better year, and the other three are all down, except for Carter's HR rate:

(these 1985 stats are as of 9/12; the 1985f stats are approximations to what they would have at the end of the season if their current averages continue)

                      BA    HR   RBI
Strawberry:  1984    .251   26    97
             1985    .282   23    66
             1985f   .282   27    77   (.282 38 113)

    (don't forget he missed 7 weeks injured; prorating his stats over
    that time gives him about 34 HRs and 95+ RBI already)

Hernandez:   1984    .311   15    94
             1985    .291   10    79
             1985f   .291   12    92

Foster:      1984    .269   24    86
             1985    .254   17    66
             1985f   .254   20    77

Carter:      1984    .294   27   106
             1985    .281   26    77
             1985f   .281   30    90
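
For anyone who wants to check the 1985f rows, here is a rough Python sketch of the prorating: each counting stat (HR, RBI) is scaled by season games over games played, while BA is a rate and carries over unchanged. The post does not say how many games the Mets had played as of 9/12, so GAMES_PLAYED = 139 is purely an assumption, picked because it happens to reproduce the unparenthesized 1985f figures. The function name `project` is likewise just illustrative.

```python
# Full-season projection: scale counting stats by (season games / games played).
SEASON_GAMES = 162   # length of the regular-season schedule
GAMES_PLAYED = 139   # ASSUMPTION: not stated in the post; chosen to match its 1985f rows


def project(count, games_played=GAMES_PLAYED, season_games=SEASON_GAMES):
    """Prorate a counting stat (HR, RBI) to a full season at the current pace."""
    return round(count * season_games / games_played)


# (HR, RBI) as of 9/12, copied from the 1985 rows of the table
stats_1985 = {
    "Strawberry": (23, 66),
    "Hernandez": (10, 79),
    "Foster": (17, 66),
    "Carter": (26, 77),
}

# Projected (HR, RBI) for each player at the current pace
projected = {
    name: (project(hr), project(rbi))
    for name, (hr, rbi) in stats_1985.items()
}

for name, (hr, rbi) in projected.items():
    print(f"{name:<11} {hr:>3} HR {rbi:>4} RBI")
```

This covers only the straight pace projection. Strawberry's parenthesized line (.282 38 113) also adjusts for the 7 weeks he missed, which would need his own games-played count rather than the team's.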