Xref: utzoo comp.lang.fortran:803 comp.software-eng:643
Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!killer!ames!pasteur!ucbvax!decwrl!labrea!sri-unix!garth!smryan
From: smryan@garth.UUCP (Steven Ryan)
Newsgroups: comp.lang.fortran,comp.software-eng
Subject: Re: Fortran follies
Summary: Vectoriser follies.
Message-ID: <809@garth.UUCP>
Date: 25 Jun 88 21:05:44 GMT
References: <5377@cup.portal.com> <2852@mmintl.UUCP> <1005@cresswell.quintus.UUCP> <701@garth.UUCP> <2157@sugar.UUCP> <1555@kalliope.rice.edu>
Reply-To: smryan@garth.UUCP (Steven Ryan)
Organization: INTERGRAPH (APD) -- Palo Alto, CA
Lines: 56

>I'm not sure about that. Vectorizers will only rarely need the largest
>dimension since it does not appear in the addressing arithmetic.

It is critical for dependency analysis. Given a loop like

    for i from m to n
        a[xi] := f a[yi]

dependency analysis determines if xi=yj for m<=i [...]

>Furthermore, unless the bound
>is hardwired as a constant, it won't be very useful anyway.

The vectoriser handles constant bounds as a special case. It uses symbolic
expressions for loop bounds, array dimensions, and subscript expressions.

> If you
>see reduced vectorization it may be due to an assumption that the
>dimension is short and hence vectorization would be unprofitable.

The Cyber 205's breakeven vector length is from 20 to 50 elements. To get
large enough vectors the compiler has always concentrated on vectorising a
loop nest rather than the innermost loop. (Cray, Kuck, the Good Folks at Rice
only worry about the innermost loop according to the literature.)

So.....

If you have a loop nest like,

    for i to m
        scalar := ....
        a[i] := ....
        for j to n
            b[i,j] := ....
        c[i] := scalar + ....

If everything is otherwise vectorisable, the j loop can be vectorised even if
n>hardware vector length by surrounding it with a scalar stripmining loop. If
m*n<=hardware vector length, the entire nest can be vectorised. But if
m*n>hardware vector length, the i-loop as written cannot be vectorised.
If the loops are split, vectorising the i-loop becomes possible, but such a
split must correctly handle the promoted scalar, which is defined above the
split and used below.

Finally to the point: if m and n are expressions, it is difficult or
impossible to compare m*n to the hardware limit. In this case, FTN200 again
hunts for constant bounds of the array. If it can find an upper bound for m*n
less than 65535, it will vectorise the entire loop nest. If the bound is
greater than 65535 or a constant upper bound is not known, it can only
vectorise the innermost loop.
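
To make the last rule concrete, here is a rough sketch in Python of the decision as described above. This is not FTN200's code. The names choose_strategy and constant_upper_bound are hypothetical, and so are the use of None to mean "symbolic, no constant known" and the assumption that a loop never runs past the declared extent of the array it subscripts. The only things taken from the post are the 65535 figure and the fallback to the innermost loop. The post does not say what happens when the bound is exactly 65535, so the sketch treats that case conservatively.

```python
# Figure quoted in the post for the whole-nest limit.
HARDWARE_VECTOR_LIMIT = 65535


def constant_upper_bound(trip_count, declared_extent):
    """Return a constant upper bound for a loop trip count, or None.

    trip_count:      int if the loop bound is a constant, None if symbolic.
    declared_extent: int if the matching array dimension is declared with a
                     constant extent, None otherwise.
    Assumption: a conforming program never subscripts past the declared
    extent, so the extent bounds a symbolic trip count from above.
    """
    if trip_count is not None:
        return trip_count
    return declared_extent  # may still be None


def choose_strategy(m, n, extent_i=None, extent_j=None):
    """Return "nest" to vectorise the whole i/j nest, else "innermost"."""
    bound_m = constant_upper_bound(m, extent_i)
    bound_n = constant_upper_bound(n, extent_j)
    if bound_m is None or bound_n is None:
        # No constant upper bound for m*n is known.
        return "innermost"
    if bound_m * bound_n < HARDWARE_VECTOR_LIMIT:
        return "nest"
    # Bound too large (or exactly at the limit, which the post leaves open).
    return "innermost"
```

For example, choose_strategy(None, None, 200, 300) stands for symbolic m and n over an array declared 200 by 300: the product 60000 is under the limit, so the whole nest qualifies. With no declared extents at all, choose_strategy(None, None) falls back to the innermost loop.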