Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rutgers!ames!ucbcad!ucbvax!jade!eris!mwm
From: mwm@eris.BERKELEY.EDU (Mike (My watch has windows) Meyer)
Newsgroups: comp.unix.questions
Subject: Re: awk or sed question
Message-ID: <4249@jade.BERKELEY.EDU>
Date: Sat, 4-Jul-87 06:30:36 EDT
Article-I.D.: jade.4249
Posted: Sat Jul 4 06:30:36 1987
Date-Received: Sun, 5-Jul-87 08:37:51 EDT
References: <4780@columbia.UUCP> <3892@burdvax.PRC.Unisys.COM>
Sender: usenet@jade.BERKELEY.EDU
Reply-To: mwm@eris.BERKELEY.EDU (Mike (My watch has windows) Meyer)
Distribution: world
Organization: Missionaria Phonibalonica
Lines: 167

agw@broadway.columbia.edu (Art Werschulz) says:
[A request for a sed or awk tool to break 80-character lines at
whitespace.]

Some problems just aren't amenable to tackling with sed/awk. I think
this is one of them. It may be doable with sed, but I'm not sure how.
Any awk script to do this will be almost as complicated as a C program
to do the same thing. For example:

In article someone writes:
N; n -= LEN) {
<        while (substr($0,LEN+i-1,1) != " ") {
<            LEN -= 1
<        }
<        if (i==1) {
<            printf "%s\\\n", substr($0, i, LEN)
<        } else {
<            printf "> %s\\\n", substr($0, i, LEN)
<        }
<        i += LEN;
<    }
<    printf "> %s\\\n", substr($0,i)
<    }
<} '

This is what I mean. First, converting tabs directly to 8 spaces has
*got* to be wrong. Secondly, this fails on files with lines longer
than awk's internal buffer for records (minor, and usually
acceptable).

The loose problem spec doesn't help much, of course. But that just
means the problem is a "real-life" problem, and not a classroom
exercise.

The C code to solve the problem has some differences (no tags on
folded lines, and the whitespace where the fold is doesn't get
printed). It's also a pure filter, but allows for user-specified fold
columns, instead of wiring it to 80. The main loop of the C code is 26
lines, not counting comments. The awk script is 19 lines.
The C code would shrink to 22 lines by using printfs instead of
fputs/putchar, and formatting if/else the same way the awk script is.

Since (as far as I'm concerned) sed and awk are for quickly building
programs that would be difficult in C, the small difference between
the two programs - which hopefully indicates a small difference in
construction time - shows that this is a problem for which awk isn't
really suited.

On the other hand, a simple test case (the first n integers on a
single line, separated by a single space) shows the C version can
handle n = 10000 in about the same sys and user times (as reported by
/bin/time on a Sun 3/50 running SunOS 3.3) as the sed/awk version for
n = 100. The sed/awk version drops core for n >= 1000, and the C
version takes less than 1/10th of a second of sys and user time for
n <= 1000, so I didn't do direct comparisons.

The shell script to emulate the awk/sed script user interfaces, and
the more complex script to combine the two, are left as an exercise
for the reader.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * MAXFOLD is the largest fold column we're willing to accept. All others
 * are rejected.
 */
#define MAXFOLD 160

int
main(int argc, char **argv)
{
    register int foldc = 80 ;
    char buffer[MAXFOLD + 2] ;
    register char *fold_point, *leftovers ;

    /* Argument processing */
    if (argc > 2) {
        fprintf(stderr, "usage: %s [n]\n", argv[0]) ;
        exit(1) ;
    }
    if (argc == 2)
        foldc = atoi(argv[1]) ;
    if (foldc <= 0 || foldc > MAXFOLD) {
        fprintf(stderr, "%s: only fold columns between 1 and %d supported\n",
            argv[0], MAXFOLD) ;
        exit(1) ;
    }

    /*
     * The plan is to treat each line + leftovers from last read as
     * a new line. leftovers points just past the end of the leftover
     * text. Initially set to the beginning of the buffer, it's set up
     * correctly each time through the loop.
     *
     * We need to get one more character than the maximum fold, as
     * the first character past the fold column might be whitespace,
     * and that's a legit fold point.
     * Since fgets reads at most n-1 characters (n is the second
     * argument), we need to ask for foldc+2 characters, minus however
     * much leftovers there are from last loop.
     */
    leftovers = buffer ;
    while (fgets(leftovers, foldc+2-(leftovers-buffer), stdin) != NULL) {
        /*
         * If we got a complete line, print it.
         */
        if (buffer[strlen(buffer) - 1] == '\n') {
            fputs(buffer, stdout) ;
            leftovers = buffer ;
        }
        /*
         * Got a long line. Find the fold point, print up to the fold,
         * then shuffle the remaining characters forward and try again.
         */
        else {
            fold_point = buffer + foldc ;
            while (*fold_point != ' ' && *fold_point != '\t'
                && fold_point > buffer)
                fold_point -= 1 ;

            /* Test for lines with no whitespace */
            if (fold_point == buffer) {
                fputs(buffer, stdout) ;
                putchar('\n') ;
                leftovers = buffer ;
            }
            else {
                /* Dump up to fold point */
                *fold_point = '\0' ;
                fputs(buffer, stdout) ;
                putchar('\n') ;

                /* Now, deal with the leftovers */
                fold_point += 1 ;
                /* Source and destination overlap, so memmove, not strcpy */
                memmove(buffer, fold_point, strlen(fold_point) + 1) ;
                leftovers = &buffer[strlen(buffer)] ;
            }
        }
    }

    exit(0) ;
}
--
I'm gonna lasso you with my rubberband lazer,        Mike Meyer
Pull you closer to me, and look right to the moon.   mwm@berkeley.edu
Ride side by side when worlds collide,               ucbvax!mwm
And slip into the Martian tide.                      mwm@ucbjade.BITNET
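
The timing test described in the post feeds the filter a single line
holding the first n integers separated by single spaces. The post does
not show how that input was produced, so what follows is only a
minimal sketch of one way to build it. The helper name format_integers
and its buffer-based interface are assumptions made for illustration,
and it relies on snprintf, which postdates the post.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the original post): write the
 * integers 1..n into buf, separated by single spaces and ended with
 * a newline. Expects n >= 1. Returns the number of characters
 * written (not counting the terminating NUL), or 0 if buf is too
 * small to hold the whole line.
 */
size_t
format_integers(char *buf, size_t size, int n)
{
    size_t used = 0 ;
    int i ;

    for (i = 1; i <= n; i++) {
        /* Last number is followed by a newline, all others by a space */
        int len = snprintf(buf + used, size - used, "%d%c",
                           i, i == n ? '\n' : ' ') ;

        /* snprintf reports the length it wanted; reject truncation */
        if (len < 0 || (size_t)len >= size - used)
            return 0 ;
        used += (size_t)len ;
    }
    return used ;
}
```

A buffer of 50000 bytes is enough for n = 10000; writing the result
with fputs(buf, stdout) and piping it into the fold filter would
reproduce the kind of input the timings were taken on.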