Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/17/84; site hao.UUCP
Path: utzoo!watmath!clyde!bonnie!akgua!sdcsvax!sdcrdcf!trwrba!cepu!hao!woods
From: woods@hao.UUCP (Greg "Bucket" Woods)
Newsgroups: net.unix
Subject: Re: Sort(1) on E-format numerics
Message-ID: <1182@hao.UUCP>
Date: Fri, 28-Sep-84 18:27:11 EDT
Article-I.D.: hao.1182
Posted: Fri Sep 28 18:27:11 1984
Date-Received: Mon, 1-Oct-84 04:29:08 EDT
References: <1181@hao.UUCP>
Distribution: net
Organization: High Altitude Obs./NCAR, Boulder CO
Lines: 48

>
> We have a need to numerically sort files which contain columns of numbers in
> E-format, i.e. something of the form [+-]#.####e[+-]##, where "#" means
> a digit and [+-] means an optional sign. Unfortunately, the -n option to
> sort(1) does not recognize exponents and stops numerical conversion of the
> sort field when it sees the "e". This results in incorrect sorting in some
> cases, like it will put 1.0e-07 before 2.0e-09.

In reply to my own question, after a bit of trial and error I discovered
a method that seems to work. It does depend on the fact that every line is
identical in format, which is true in all cases we have.
Here is an example:

1.27000E-07  8.91000E+04  6.00495E+09  9.82000E+05  1.66451E+05  4.99966E+09
1.43000E-07  5.00000E+04  1.04275E+10  9.76000E+05  2.38238E+06  8.68145E+09
8.09000E-07  2.30000E+04  2.35302E+10  8.87000E+05  4.11476E+08  2.02331E+10
1.67000E-07  3.20000E+04  1.57815E+08  9.71000E+05  3.63586E+07  1.31336E+10
1.97000E-07  2.55000E+04  1.93346E+10  9.68000E+05  1.92010E+08  1.61099E+10
2.30000E-07  2.45000E+04  2.00822E+10  9.64000E+05  2.55430E+08  1.68091E+10
1.81000E-07  2.80000E+04  1.78057E+10  9.70000E+05  9.50806E+07  1.48126E+10
1.58000E-07  3.70000E+04  1.38137E+10  9.73000E+05  1.38215E+07  1.14989E+10
4.70000E-07  2.40000E+04  2.14417E+10  9.33000E+05  2.84392E+08  1.80507E+10
6.56000E-07  2.35000E+04  2.25669E+10  9.08000E+05  3.37865E+08  1.91669E+10
3.37000E-07  2.42000E+04  2.07391E+10  9.49000E+05  2.70261E+08  1.74114E+10

We want to sort on the third column. The command "sort +2.9 -n +2" run on
this file, which says "sort on third field and skip 9 characters, sort this
numerically, then subsort on the third field" does what we want. It took a lot
of trial and error to figure this one out! The only problem with it is that
it won't work if some of the exponents are negative (in all of our cases, the
exponents are all the same sign). I tried using "sort +2.8" instead, but
apparently the stupid numeric sort algorithm knows about minus signs but not
plus signs (AAARGH!) and so sort +2.8 failed totally. I'm going to see about
fixing that so a plus sign as a leading character in a numeric field will be
ignored instead of aborting the field.

Thanks to all those who responded. Some people gave me kludges using "sed"
and/or "awk". I didn't actually try any of these, but from the looks of it,
"awk" is aptly named! :-) One person even sent me mods to sort.c to make
numeric sorts work on E-format. If anyone is interested in any of those,
drop me a line and I'll be glad to mail you everything I got.
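For comparison with the "sort +2.9 -n +2" trick, here is a minimal sketch in
Python of the same third-column sort done by parsing each field as a full
floating-point number, so mixed exponent signs and leading plus signs are not
a problem. It is not one of the sed, awk or sort.c approaches mentioned above.
The names sort_by_column and mantissa_only are made up for illustration, and
mantissa_only only models the "stops at the e" behavior described in the
quoted question; it is not how sort(1) is actually implemented.

```python
def mantissa_only(field):
    """Model of the reported -n behavior: conversion stops at the 'e'.

    Assumption: this is only an illustration of the symptom described
    in the post, not the real sort(1) algorithm.
    """
    return float(field.upper().split("E")[0])


def sort_by_column(lines, column=2, key=float):
    """Return lines ordered by one whitespace-separated column.

    column is zero-based, so 2 selects the third field (like "+2").
    key converts the field text to a sortable number; the default,
    float, understands E-format with signed exponents and a leading
    plus sign.
    """
    return sorted(lines, key=lambda line: key(line.split()[column]))


# The two values from the original question, placed in the third column.
question = ["a b 1.0e-07", "a b 2.0e-09"]
print(sort_by_column(question, key=mantissa_only))  # mantissa order
print(sort_by_column(question))                     # true numeric order

# The first four rows of the example file, sorted on the third column.
rows = [
    "1.27000E-07 8.91000E+04 6.00495E+09 9.82000E+05 1.66451E+05 4.99966E+09",
    "1.43000E-07 5.00000E+04 1.04275E+10 9.76000E+05 2.38238E+06 8.68145E+09",
    "8.09000E-07 2.30000E+04 2.35302E+10 8.87000E+05 4.11476E+08 2.02331E+10",
    "1.67000E-07 3.20000E+04 1.57815E+08 9.71000E+05 3.63586E+07 1.31336E+10",
]
for row in sort_by_column(rows):
    print(row)
```

Because the key is the parsed value rather than a character offset, this
does not depend on every line having an identical layout, only on the fields
being separated by whitespace.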
--Greg
--
{ucbvax!hplabs | allegra!nbires | decvax!stcvax | harpo!seismo | ihnp4!stcvax}
       		        !hao!woods

"Every silver lining has a touch of grey..."