Xref: utzoo comp.lang.fortran:803 comp.software-eng:643
Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!killer!ames!pasteur!ucbvax!decwrl!labrea!sri-unix!garth!smryan
From: smryan@garth.UUCP (Steven Ryan)
Newsgroups: comp.lang.fortran,comp.software-eng
Subject: Re: Fortran follies
Summary: Vectoriser follies.
Message-ID: <809@garth.UUCP>
Date: 25 Jun 88 21:05:44 GMT
References: <5377@cup.portal.com> <2852@mmintl.UUCP> <1005@cresswell.quintus.UUCP> <701@garth.UUCP> <2157@sugar.UUCP> <1555@kalliope.rice.edu>
Reply-To: smryan@garth.UUCP (Steven Ryan)
Organization: INTERGRAPH (APD) -- Palo Alto, CA
Lines: 56

>I'm not sure about that. Vectorizers will only rarely need the largest
>dimension since it does not appear in the addressing arithmetic.

It is critical for dependency analysis.

Given a loop like
           for i from m to n
             a[xi]:=f a[yi]
dependency analysis determines if xi=yj for m<=i [...]

>                                                  Furthermore, unless the bound
>is hardwired as a constant, it won't be very useful anyway.

The vectoriser handles constant bounds as a special case.  It uses symbolic
expressions for loop bounds, array dimensions, and subscript expressions.
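To make the dependency question concrete, here is a minimal sketch in Python. It is not FTN200's method. It assumes affine subscripts xi = a*i + b and yj = c*j + d; the coefficient names and both functions are hypothetical. The bounds-free GCD test only asks whether a*i - c*j = d - b has any integer solution at all. The second function answers the exact question over the loop bounds m..n by brute force, where a real vectoriser would reason symbolically. The point is that the bounds can rule out a dependence that the GCD test alone cannot.

```python
from math import gcd


def gcd_test(a: int, b: int, c: int, d: int) -> bool:
    """Bounds-free GCD test for a*i + b == c*j + d.

    False means no integer solution exists, so there is no dependence.
    True only means "a dependence may exist".
    """
    g = gcd(a, c)
    if g == 0:                      # both coefficients zero: constant subscripts
        return b == d
    return (d - b) % g == 0


def has_dependence(a: int, b: int, c: int, d: int, m: int, n: int) -> bool:
    """Exact test over the bounds for the hypothetical loop
         for i from m to n: A[a*i + b] := f(A[c*i + d])
    True if some write xi = a*i + b hits some read yj = c*j + d with i != j.
    (i == j is a read and write in the same iteration, not loop-carried.)
    """
    reads = {}
    for j in range(m, n + 1):
        reads.setdefault(c * j + d, []).append(j)
    for i in range(m, n + 1):
        if any(j != i for j in reads.get(a * i + b, [])):
            return True
    return False
```

For a[i] := f a[i+100], the GCD test reports a possible dependence. With i running from 1 to 50, however, the writes touch elements 1..50 and the reads touch 101..150, so they never overlap. With i running to 200, they do.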

>                                                            If you
>see reduced vectorization it may be due to an assumption that the
>dimension is short and hence vectorization would be unprofitable.

The Cyber 205's breakeven vector length is between 20 and 50 elements. To get
long enough vectors, the compiler has always concentrated on vectorising the
whole loop nest rather than just the innermost loop. (According to the
literature, Cray, Kuck, and the Good Folks at Rice only worry about the
innermost loop.) So.....
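A simple cost model shows why a breakeven length exists and why one long vector beats many short ones: every vector instruction pays a fixed startup cost before streaming elements. The cycle counts below are hypothetical, not real Cyber 205 timings. They were picked only so that the breakeven lands inside the 20 to 50 range quoted above. The function names are also illustrative.

```python
def breakeven_length(startup: int, vec_per_elem: int, scal_per_elem: int) -> int:
    """Smallest n with startup + n*vec_per_elem < n*scal_per_elem (integer cycles)."""
    if scal_per_elem <= vec_per_elem:
        raise ValueError("vector mode never pays off in this model")
    # n must exceed startup / (scal - vec); integer division avoids float rounding
    return startup // (scal_per_elem - vec_per_elem) + 1


def nest_cycles(m: int, n: int, startup: int, vec: int, scal: int):
    """Cycles for an m-by-n nest: scalar, innermost-only vector, whole-nest vector."""
    scalar = m * n * scal
    innermost_only = m * (startup + n * vec)   # one vector op of length n per i
    whole_nest = startup + m * n * vec         # one vector op of length m*n
    return scalar, innermost_only, whole_nest
```

With a made-up startup of 40 cycles, 1 cycle per element in vector mode and 2 in scalar mode, the breakeven is 41 elements. For m=100 and n=10, vectorising only the inner loop (5000 cycles) is slower than scalar code (2000), while the collapsed nest takes 1040. This assumes m*n fits in one hardware vector.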

If you have a loop nest like,
      for i to m
        scalar := ....
        a[i] := ....
        for j to n
            b[i,j] := ....
        c[i] := scalar + ....

If everything is otherwise vectorisable, the j loop can be vectorised even if
n>hardware vector length, by surrounding it with a scalar strip-mining loop.
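Strip-mining can be sketched under a few assumptions. `vl` stands in for the hardware vector length, indices are 1-based as in the pseudocode, and `f` is a hypothetical stand-in for the right-hand side of `b[i,j] := ....`. The scalar loop walks over strips of at most `vl` elements, and each strip would become one vector instruction.

```python
from typing import Callable, Iterator, List, Tuple


def strip_mine(n: int, vl: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, length) strips covering j = 1..n, each at most vl long."""
    start = 1
    while start <= n:
        length = min(vl, n - start + 1)
        yield start, length
        start += length


def fill_row(row: List[int], n: int, vl: int, f: Callable[[int], int]) -> None:
    """Scalar loop over strips; each slice assignment models one vector instruction."""
    for start, length in strip_mine(n, vl):
        # row[j] := f(j) for j in start..start+length-1 (row itself is 0-based)
        row[start - 1:start - 1 + length] = [f(j) for j in range(start, start + length)]
```

For n=10 and vl=4 the strips are (1,4), (5,4), (9,2). Only the last strip is shorter than the vector length.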

If m*n<=hardware vector length, the entire nest can be vectorised. But if
m*n>hardware vector length, the i-loop as written cannot be vectorised. If the
loops are split, it becomes possible, but the split must correctly handle the
promoted scalar (one element per i), which is defined above the split and used
below it.
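The following sketch shows why the scalar has to be promoted when the loop is split. The right-hand sides filled in for the `....` (3*i, i*i, i+j) are hypothetical. `distributed_nest` turns `scalar` into an array with one element per i. `naive_split` keeps a single scalar, so every c[i] sees the value from the last iteration. The Python loops only model the data flow. In vectorised form, each of the three split loops would become vector operations, with the middle one running over all m*n elements.

```python
def _arrays(m: int, n: int):
    """1-based arrays a[1..m], b[1..m][1..n], c[1..m] (index 0 unused)."""
    return [0] * (m + 1), [[0] * (n + 1) for _ in range(m + 1)], [0] * (m + 1)


def original_nest(m: int, n: int):
    a, b, c = _arrays(m, n)
    for i in range(1, m + 1):
        scalar = 3 * i                 # hypothetical "scalar := ...."
        a[i] = i * i                   # hypothetical "a[i] := ...."
        for j in range(1, n + 1):
            b[i][j] = i + j            # hypothetical "b[i,j] := ...."
        c[i] = scalar + a[i]           # "c[i] := scalar + ...."
    return a, b, c


def distributed_nest(m: int, n: int):
    a, b, c = _arrays(m, n)
    scalar_v = [0] * (m + 1)           # promoted scalar: one element per i
    for i in range(1, m + 1):          # loop 1: vectorisable over i
        scalar_v[i] = 3 * i
        a[i] = i * i
    for i in range(1, m + 1):          # loop 2: m*n elements, strip-mined if needed
        for j in range(1, n + 1):
            b[i][j] = i + j
    for i in range(1, m + 1):          # loop 3: reads the promoted scalar
        c[i] = scalar_v[i] + a[i]
    return a, b, c


def naive_split(m: int, n: int):
    a, b, c = _arrays(m, n)
    scalar = 0
    for i in range(1, m + 1):
        scalar = 3 * i
        a[i] = i * i
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            b[i][j] = i + j
    for i in range(1, m + 1):
        c[i] = scalar + a[i]           # WRONG: every i sees scalar from i == m
    return a, b, c
```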

Finally, to the point: if m and n are expressions, it is difficult or impossible
to compare m*n to the hardware limit. In this case, FTN200 again hunts for
constant bounds of the array. If it can find an upper bound for m*n less than
65535, it will vectorise the entire loop nest. If the bound is greater than
65535, or no constant upper bound is known, it can only vectorise the innermost
loop.
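The decision in the last paragraph might look roughly like this. It is a guess at the logic, not FTN200's code. `vectorise_plan`, `upper_bound` and their parameters are hypothetical. Using a declared array dimension as an upper bound on a trip count assumes that the loop stays within the array's declared bounds.

```python
from typing import Optional

MAX_VL = 65535  # the limit quoted in the post


def upper_bound(trip_const: Optional[int], array_dim: Optional[int]) -> Optional[int]:
    """Constant upper bound on a trip count: the count itself if it is constant,
    else the constant declared dimension the loop indexes (assumes in-bounds access)."""
    return trip_const if trip_const is not None else array_dim


def vectorise_plan(m_const: Optional[int] = None, n_const: Optional[int] = None,
                   dim_i: Optional[int] = None, dim_j: Optional[int] = None,
                   max_vl: int = MAX_VL) -> str:
    """Choose between vectorising the whole i/j nest and only the innermost loop."""
    m_ub = upper_bound(m_const, dim_i)
    n_ub = upper_bound(n_const, dim_j)
    # The post does not say how exactly m*n == 65535 is treated; <= is an assumption.
    if m_ub is not None and n_ub is not None and m_ub * n_ub <= max_vl:
        return "whole nest"
    return "innermost only"
```

If b were declared with constant extents 100 by 200, the bound 20000 fits and the whole nest is vectorised. With extents of 1000 by 1000, or with no constant bound at all, only the innermost loop is vectorised.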