Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/5/84; site philabs.UUCP
Path: utzoo!decvax!linus!philabs!dpb
From: dpb@philabs.UUCP (Paul Benjamin)
Newsgroups: net.sport.baseball
Subject: Lineup dependency
Message-ID: <453@philabs.UUCP>
Date: Mon, 23-Sep-85 18:09:38 EDT
Article-I.D.: philabs.453
Posted: Mon Sep 23 18:09:38 1985
Date-Received: Tue, 24-Sep-85 17:17:07 EDT
Distribution: na
Organization: Philips Labs, Briarcliff Manor, NY
Lines: 280

Alright, folks, here's another exceedingly long posting for anyone
who cares to keep track of this argument over what baseball
statistics can and cannot mean. It consists of a point-by-point
rebuttal of a posting by David Rubin.

> First, let me say right off that while I disagree with what most of
> Paul wrote, if I countered all his points,
>       (a) this article would be another monster, and
>       (b) general principles would be lost among specifics.
> 
> Much of Paul's arguments are anecdotal in nature: he brings up a case
> which he believes supports his position, and concludes that, since his
> explanation is CONSISTENT with his own observations, it must be TRUE.
> As an example, he credits McGee's year to Coleman; he is satisfied
> that since his explanation makes sense,
> 
>       (1) he may disregard alternate explanations of the event, and
>       (2) he need not further investigate.

I wish that, for once, you would read what I wrote. The points I presented
were not of my own making. They are the opinions of, among others,
Billy Martin, and the author of the article. Have you read the article?

Also note that everything you have said above can be said about you!
You disregard my explanation of the events, and have not proven, in any
sense, that on-base average and slugging average are independent of
factors such as who is batting in front or behind you. Your evidence
is completely anecdotal. You embrace those stats without showing
that any strong correlation exists between them and scoring runs (or
more precisely, that a stronger correlation exists than for, say, the
stat R + RBI - HR.) It is not just my responsibility to
prove that lineup dependencies exist. It is also yours to prove that
they don't!

> I shall limit myself, therefore, to the general comment (call it
> Rubin's Law of Empirics, if you will) that HAVING A PLAUSIBLE
> EXPLANATION FOR AN ANTICIPATED EFFECT IS NOT EVIDENCE THAT THAT EFFECT
> HAS ACTUALLY OCCURRED.  

Perhaps you should take a course or two in prob&stat and learn the
actual laws, instead of making up your own.

> All of Paul's explanations mean little,
> therefore, until he establishes that what his explanations explain has
> indeed happened!  Only in the case of Mattingly does he attempt to
> actually demonstrate that a lineup effect exists, and I will therefore
> concentrate on it.  Elsewhere, he merely shows lineup effects are
> consistent with his selected observations without either showing other
> explanations are inconsistent or that the observations would be
> inexplicable without lineup effects. 
> In other words, a simple breakdown such as this is
> worthless (possibly even worse: it may be misleading) unless we also
> know that the circumstances of the two categories (batting 2nd vs.
> batting 3rd or 4th) are otherwise similar; otherwise, it may be some
> other factor (such as lefty-righty, home-away, grass-turf, day-night,
> etc.), strongly correlated with the categories, that is driving the
> discrepancy (Statisticians refer to this confusion of one cause with
> another as "confounding").

Marvelous! Perfect! I'm SO glad you said this. It's much easier to
shoot down someone's argument when he provides the ammunition
himself.

This is exactly the point I have been making for weeks. "it may be
misleading unless we know that the circumstances of the two categories 
are otherwise similar..." Two players for different teams do not
satisfy this criterion, and thus their stats are not directly
comparable. For example, many, including myself, like Guerrero for
the MVP, but I don't favor him because he leads the NL in slugging
and on-base average. Those stats are irrelevant, since you can't compare
them to, say, Dale Murphy's stats. Why not? It's simple. 18 times
a year, Dale Murphy has to face the great Dodger pitching staff, which
is clearly the best in the league, while Guerrero faces the Braves' staff,
which is one of the worst. That's over 11% of the season. This is in
addition to other differences, such as the number of day/night games,
the different stadiums they play in, the number of double-headers they
play in, the number of day games after night games, etc. They don't even
play the exact same other teams, either! After all, if a team played
most of its games against Philadelphia earlier in the year, they faced a
much easier opponent than a team whose schedule calls for them to
face Phila now. The reverse is true for the Cubs. Playing them
before all their starters were injured is different than playing
them afterwards.

So, unless you can correct for ALL these factors, and others, to
ensure that your circumstances are similar, all the analyses that
you have posted are "worthless (possibly even worse: (they) may be
misleading)".

The only attempt you have made to correct your stats is to include
a ratio which takes into account the differences between stadiums,
and how hard they are for hitters. But even this attempt showed your
statistical inexperience. Saying, for example, that park A is 10%
harder to hit in than park B because the overall averages (of say, slugging)
are 10% lower, is a valuable and meaningful stat when applied to the whole
group of hitters - it provides information on the park to its owners.
But it is TOTALLY MEANINGLESS to apply this stat to individual batters
in this park. One must also know the shape of the distribution. It could
be that almost nobody hits 10% worse in that park - that many hit much worse
or better, and it averages out to 10%. For example, if a country's families
have 2.3 children on the average, it doesn't mean that anyone has 2.3
children, or even that most families have 2 or 3 children. Bimodal
distributions are not uncommon, and in these, almost no one is around
the mean.
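
To put numbers on that, here is a minimal sketch. The ten hitters and
their park effects are invented for illustration, not real data: half
of them lose 30% of their slugging in park A and half gain 10%, so the
group average is a 10% drop even though no individual hitter is
anywhere near it.

```python
from statistics import mean

# Hypothetical park effects: each value is one hitter's percentage change
# in slugging when playing in park A instead of park B. Invented numbers.
park_effects = [-30.0] * 5 + [10.0] * 5

average_effect = mean(park_effects)

# Count the hitters whose own effect is within 5 points of the average.
near_average = [e for e in park_effects if abs(e - average_effect) <= 5.0]

print(f"average effect: {average_effect:.1f}%")
print(f"hitters within 5 points of the average: {len(near_average)}")
```

With these made-up figures the park-wide number is accurate for the
group and wrong for every single hitter in it.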

Furthermore, the reason that I use only Mattingly is that these stats
are rarely available. It's much easier to compute personal averages
such as batting average, slugging average, runs, RBI, etc. than to
compute how much a batter tends to improve the stats of those batting
ahead of him or behind him, etc. We almost never see these stats. We
don't often enough see stats such as batting average with runners in
scoring position, etc. You criticize me for the deficiencies of baseball
statisticians everywhere. It's not my fault, so don't criticize me
for it.

> Moreover, even if Paul COULD assure us that this was so, he does not
> have nearly enough data.  Examine, in particular, the data for batting
> second: it is based on 35 games, i.e. about 100-150 at bats.  Most
> fans will not put much store in a player's average after 35 games
> (early May), and for good reason: the player has not yet accumulated
> enough at bats for us to form any reasonable opinion as to his likely
> seasonal productivity.  We are talking about guessing whether a player
> is hitting .300 or .400 based on that many at bats: it would not be at
> all unusual for the difference (10 to 15 hits) to be due to a "hot" or
> "cold" streak (what Statisticians conveniently label "random", but we
> may understand as being that which is beyond our knowledge).  We would
> need to have many more at bats (perhaps in a couple of more seasons we
> will) before we could say that the difference is due to the position
> in the lineup rather than a propitious hot streak.  To put it another
> way, if a lifetime .300 hitter were to have a .400 average on May 5th,
> would you tentatively conclude (until further info was available) that
> the man would bat .400 for the season?  Of course not.  You would
> correctly conclude that he is more likely to hit .300 from June
> through September than .400.  He may just have had a good April...

Again I wish you would actually read the article before you respond to
it! Of course, I know you already know everything :-) Mattingly's hot
stats for the second position were not compiled in one streak. He started
the season batting 3-4, then was moved to 2 in May for 17 games. He was
then moved back to 3-4, but occasionally in June and July batted 2. The
article does not give stats for those instances alone, but states that
it "worked like a charm". He was still usually batting 3-4, but was
moved to 2 on August 5, when Martin became aware of the stats for his
earlier production in the 2 spot. So, it is NOT the result of a hot streak. 
As for right-handed vs. left-handed opposition, I checked the games from
August 5 on. There were both right-handed and left-handed opponents.
He is playing full-time in that spot, so he faces all types of pitching.
Martin moved him to 2 on August 5 because of his excellent production
in that spot before.

> Even if it were established for Mattingly, it would hold only for
> Don Mattingly with the current Yankees: to apply it to, say, Tony
> Pena, it would have to be demonstrated for a wide variety of players on
> a wide variety of teams.  Still, it would be quite a surprise to me if
> anyone could get even that far.

I see! Whenever I come up with evidence, it counts only for that
case. Yet you have never detailed an instance of a player changing,
say, his lineup position and keeping the same OBA and slugging pct.,
and I am still supposed to swallow your arguments!

It's interesting. When I respond to your postings, I feel like I'm trying
to explain baseball to a Martian. You know so little about the game!
EVERYBODY knows that lineups are interdependent! Try watching a game
sometime (instead of just reading numbers). You'll see that when a runner
is on base, it affects (among other things):

    1) the way the pitcher throws. Using the stretch instead of a full
       windup definitely hurts most pitchers' performances. Otherwise, 
       there would be no need for anyone to ever wind up.

    2) the pitch selection;

    3) the defensive alignment.

Thus, if the batter ahead of, say Mattingly, gets on base more often,
is a threat to steal, and gets in scoring position more often, he
can (and does) affect whether Mattingly gets a hit or not.

Perhaps we should just forget this whole argument. You will continue
to emphasize the individual aspects of the game, and I will continue
to emphasize the team aspects. After all, if we both enjoy the game,
that's the purpose of baseball anyway.

By the way, if you still doubt the existence of lineup dependency (which
you undoubtedly still do) then answer the following question:

    If there were no lineup interaction, then all managers would bat their
    best hitter first, then their second-best, etc. to give them the
    most opportunities to hit. Thus, according to your criteria (OBA and
    slugging pct), the way to optimize the team's OBA and slugging pct
    is to bat the best in these categories first, the next-best second,
    etc. We would see Carter batting leadoff for the Mets, and Coleman
    would not be the leadoff hitter for St. Louis, McGee would be, followed
    by Clark. Coleman would be somewhere around 6 or 7. Come to think of it,
    since Cedeno has been playing for the Cards and the way he has been
    hitting, he would be batting leadoff. Also, Guerrero would be
    hitting leadoff for LA (absurd!).

    As ANY real baseball fan knows, managers carefully
    pick the order to help run production, e.g. alternating left-handed
    and right-handed batters, and putting speedsters in front of hitters
    who hit well with men in scoring position. WHY WOULD THEY BOTHER TO 
    DO THIS IF THERE WERE NO LINEUP INTERACTION??? Why not bat Mattingly
    leadoff, to get him more at-bats? Maybe the fact that he would be
    batting behind a much weaker hitter just MIGHT have a teeny-weeny
    little bit to do with it?!

    Thus, we see that some excellent managers, such as Whitey Herzog,
    deliberately put a player like Coleman, who has a lower OBA and
    slugging average than McGee, in the spot where he will get the most 
    at-bats, thus effectively reducing the overall OBA and slugging pct of 
    his team. Do you really think he is deliberately reducing the run-scoring
    ability of his team? Or do you just think that all these baseball
    professionals are sadly misguided? The only other alternative is
    that TEAM RUN-SCORING ABILITY IS NOT DIRECTLY CORRELATED WITH
    TEAM OBA OR SLUGGING, i.e., these stats aren't all you crack them
    up to be. There must be other factors, e.g., speed.

    To rephrase this point, so that you will have less chance of
    misinterpreting it, if Guerrero's slugging avg and OBA are what
    are most important to the Dodgers, then he should bat leadoff,
    so as to maximize the team's slugging avg and OBA. He doesn't,
    and the very idea seems preposterous. Either Lasorda doesn't
    understand the game as you do, or your emphasis on OBA and
    slugging is wrong. Which is it?
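
For what it's worth, the "most opportunities" part of the question is
easy to put numbers on. What follows is only a sketch under stated
assumptions: the per-game team totals are made up (cycling from 34 to
42 plate appearances over a 162-game season), and the function name is
my own invention. It counts how often each lineup slot comes to the
plate when the order simply cycles.

```python
def plate_appearances_by_slot(team_pa_per_game):
    """Return a list of 9 totals: plate appearances for lineup slots 1..9.

    team_pa_per_game is a list with the team's total plate appearances in
    each game. Batters come up in order, so slot k bats on every 9th turn
    starting from turn k.
    """
    totals = [0] * 9
    for team_pa in team_pa_per_game:
        for slot in range(9):
            # Turns slot+1, slot+10, slot+19, ... that fit within team_pa.
            if team_pa > slot:
                totals[slot] += (team_pa - slot - 1) // 9 + 1
    return totals


# Hypothetical season: 162 games cycling through 34..42 team plate appearances.
season = [34 + (g % 9) for g in range(162)]
totals = plate_appearances_by_slot(season)

for slot, pa in enumerate(totals, start=1):
    print(f"slot {slot}: {pa} plate appearances")
```

Under these assumptions each step down the order costs a hitter 18
plate appearances over the season, so the leadoff man gets 144 more
than the ninth hitter. That is the price a manager pays for not
batting his best hitter first.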

The lineup can even affect the selection of relief pitchers. And haven't
you ever heard a manager say that what he really needs is a left-handed
power-hitter (or more speed in the lineup, etc.)? Why are these things
important to managers if the players in lineups don't interact?

> And even if we WERE to consider them, why does Paul believe that Carter
> has his stats inflated by Hernandez, Strawberry, and Foster when NONE
> of those three show any substantial increase in production over last
> year?  

WRONG. Strawberry is having a much better year than last year. Note that
Strawberry bats directly behind Carter, just as Mattingly bats directly
behind Henderson. See below.

> I suppose Paul believes Carter has a special dispensation: in
> moving from the Expos to the Mets, he gains by being surrounded by
> Keith, Darryl, and George, while those three do NOT gain from Gary's
> presence.  The fact is, the production of all four has remained about
> the same over the past two years, an argument AGAINST lineup effects.

Or an argument that Carter is about as productive as Hubie Brooks is.
Hubie Brooks is a very productive hitter, and is having a fine year
batting cleanup for Montreal. And I never said Carter didn't help the 
others. Don't put words in my mouth and then criticize me for saying them. 
But you are completely wrong about the production of all four Met players.

Strawberry is having a better year, and the other three are all down,
except for Carter's HR rate:
(these 1985 stats are as of 9/12; the 1985f stats are approximations to
what they would have at the end of the season if their current averages
continue)

              BA     HR    RBI  
Strawberry:
     1984    .251    26    97
     1985    .282    23    66
(don't forget he missed 7 weeks injured; prorating his stats over that
time gives him about 34 HRs and 95+ RBI already)
     1985f   .282    27    77  (.282    38    113)

Hernandez:
     1984    .311    15    94
     1985    .291    10    79
     1985f   .291    12    92

Foster:
     1984    .269    24    86
     1985    .254    17    66
     1985f   .254    20    77

Carter:
     1984    .294    27   106
     1985    .281    26    77
     1985f   .281    30    90
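
For anyone who wants to check the 1985f lines, here is a rough sketch
of the projection. The scale factor is an assumption: the table is
consistent with about 6/7 of the season having been played by 9/12, so
the counting stats are multiplied by 7/6 and rounded. Batting average
is a rate and is carried over unchanged. Strawberry's injury-prorated
figures in parentheses are not reproduced here.

```python
# Stats as of 9/12, copied from the table above: (HR, RBI).
stats_1985 = {
    "Strawberry": (23, 66),
    "Hernandez": (10, 79),
    "Foster": (17, 66),
    "Carter": (26, 77),
}

# Assumption: about 6/7 of the season had been played by 9/12, so counting
# stats are scaled by 7/6. This factor is inferred from the table, not stated.
SCALE_NUM, SCALE_DEN = 7, 6


def project(count):
    """Scale a counting stat to a full season and round to a whole number."""
    return round(count * SCALE_NUM / SCALE_DEN)


projected = {name: (project(hr), project(rbi))
             for name, (hr, rbi) in stats_1985.items()}

for name, (hr, rbi) in projected.items():
    print(f"{name}: {hr} HR, {rbi} RBI")
```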