Path: utzoo!attcan!uunet!mcvax!ukc!stl!stc!praxis!hausdorff!tkr
From: tkr@praxis.co.uk (Tim Rylance)
Newsgroups: comp.mail.elm
Subject: Re: elm just ate my mailbox .......
Keywords: elm, munch, trashed-mailbox, I can't believe it ate the whole thing
Message-ID: <2287@newton.praxis.co.uk>
Date: 10 May 88 23:50:08 GMT
References: <372@m3.mfci.UUCP>
Sender: news@praxis.co.uk
Reply-To: tkr@praxis.co.uk (Tim Rylance)
Organization: Praxis Systems plc, Bath, UK
Lines: 868

In article <372@m3.mfci.UUCP> bronson@mfci.UUCP () writes:
> Just 2 minutes ago, when I used the resyncronize command, I watched as elm printed out 'seek error ??? while reading mailbox' (or something like this) and then suddenly 60+ messages in my mailbox were gone! (I'd gotten some of these messages earlier, but without losing mail).
>
> Any idea why I suddenly starting getting seek errors today ? (too much junk in my mailbox) ? Until this happened I've liked to maintain a large set of files in /usr/spool/mail, and keep them as a reminder.

We had a major outbreak of this last year. I also like to keep reminders in /usr/spool/mail, and when Elm ate over 200 of them I became strongly motivated to solve the problem.

It happens when /tmp is full. This occurs frequently with the pathetic, nearly-full-from-the-start 7Mb root filesystem in SunOS 3.X (fixed in Sys4-3.2 and 4.0).

When Elm starts up it copies /usr/spool/mail/foo to /tmp/mbox.foo, building a table of headers and message offsets as it goes. If /tmp happens to be full you may notice the subliminal "write failed: file system full" message flash by. If you don't, you will not realise anything is amiss, because Elm extracts the messages you read from /usr/spool/mail/foo, not from the truncated copy. But when you quit, resynchronize or change mailbox, Elm copies /tmp/mbox.foo back to /usr/spool/mail/foo, skipping deleted messages, at which point it discovers that /tmp/mbox.foo is not as large as it should be (hence the "seek failed...") and collapses in a heap, having destroyed your mailbox.
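The root cause is a copy loop that treats every write as infallible. A minimal sketch of the defensive pattern, in modern ISO C rather than Elm's own code (the helper name safe_copy and the ".tmp" suffix are hypothetical): copy into a temporary file beside the destination, check every fwrite() as well as the final fflush() and fclose(), and replace the original only if all of them succeeded. The flush and close results matter as much as fwrite()'s, because stdio buffers output and a full disk may only be reported when the buffer is written out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not Elm code: copy src to dst through a temporary
 * file in the same directory as dst, checking every write. dst is only
 * replaced once the whole copy has been written and closed without error.
 * Returns 0 on success, -1 on any failure (dst is then left untouched). */
int safe_copy(const char *src, const char *dst)
{
    char tmp[1024];
    char buf[8192];
    size_t n;
    FILE *in, *out;

    /* Simplification: a fixed suffix; concurrent callers would need a
     * unique name (e.g. one including the process ID). */
    if (snprintf(tmp, sizeof tmp, "%s.tmp", dst) >= (int)sizeof tmp)
        return -1;

    if ((in = fopen(src, "rb")) == NULL)
        return -1;
    if ((out = fopen(tmp, "wb")) == NULL) {
        fclose(in);
        return -1;
    }

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)    /* short write, e.g. disk full */
            goto fail;
    }
    if (ferror(in))                         /* read error, not end of file */
        goto fail;

    /* stdio buffers output, so a full disk may only be reported here. */
    if (fflush(out) == EOF)
        goto fail;
    if (fclose(out) == EOF) {
        out = NULL;                         /* stream is closed even on error */
        goto fail;
    }
    fclose(in);

    /* Only a complete copy is allowed to replace the original. */
    if (rename(tmp, dst) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;

fail:
    if (out != NULL)
        fclose(out);
    fclose(in);
    remove(tmp);                            /* discard the partial copy */
    return -1;
}
```

On a -1 return the original file is exactly as it was, which is the property Elm lacked. The patch below performs the final replacement with link() followed by unlink() instead of rename(); on POSIX systems, rename() within a single filesystem replaces the target atomically.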
In fact Elm *never* checks for errors after writing. I went through it adding checks and trying to do something reasonably intelligent when a write fails. I now give up immediately if /tmp is full on startup, and I copy /tmp/mbox.foo back to a temporary file in /usr/spool/mail and then rename that to /usr/spool/mail/foo, so that mail is not abandoned in /tmp if /usr/spool is full. I also removed the use of temporary file names constructed from getpid()+1 (such a name can coincide with the getpid()-based name of another Elm process) and replaced the mailbox locking code with that from GNU Emacs. The old locking code appears to contain a race: it tests for the lock file with access() and only afterwards creates it with creat(), so two processes can both find no lock and both proceed. The Emacs code creates a uniquely named file and link()s it to the lock name, which fails if the lock already exists, and it removes any lock more than a minute old.

My diffs follow; your line numbers will differ. Also note that if /usr/spool/mail is not world-writable a little more work is needed, since the temporary maildrop and the lock files are now created there...

diff -rc elm-1.5b/hdrs/defs.h elm-1.5c/hdrs/defs.h
*** elm-1.5b/hdrs/defs.h Tue May 5 15:48:17 1987
--- elm-1.5c/hdrs/defs.h Wed Jul 29 14:51:52 1987
***************
*** 6,12
  #include "sysdefs.h" /* system/configurable defines */
! #define VERSION "1.5b" /* Version number! WHAT_STRING should agree */
  #define WHAT_STRING "@(#) Version 1.5b, April 1987"
--- 6,12 -----
  #include "sysdefs.h" /* system/configurable defines */
! #define VERSION "1.5c" /* Version number! WHAT_STRING should agree */
  #define WHAT_STRING "@(#) Version 1.5c, 29th July 1987"
***************
*** 8,14
  #define VERSION "1.5b" /* Version number! WHAT_STRING should agree */
! #define WHAT_STRING "@(#) Version 1.5b, April 1987"
  #define KLICK 10
--- 8,14 -----
  #define VERSION "1.5c" /* Version number! WHAT_STRING should agree */
! #define WHAT_STRING "@(#) Version 1.5c, 29th July 1987"
  #define KLICK 10
diff -rc elm-1.5b/hdrs/sysdefs.h elm-1.5c/hdrs/sysdefs.h
*** elm-1.5b/hdrs/sysdefs.h Tue Jun 30 16:45:55 1987
--- elm-1.5c/hdrs/sysdefs.h Tue Jul 28 19:09:36 1987
***************
*** 190,195
  #define OLDEBUG "ELM:debug.last"
  #define temp_file "/tmp/snd."
  #define temp_form_file "/tmp/form."
  #define temp_mbox "/tmp/mbox."
  #define temp_print "/tmp/print."
--- 190,196 -----
  #define OLDEBUG "ELM:debug.last"
  #define temp_file "/tmp/snd."
+ #define temp_hdr "/tmp/hdr."
  #define temp_form_file "/tmp/form."
  #define temp_mbox "/tmp/mbox."
  #define temp_print "/tmp/print."
diff -rc elm-1.5b/src/file.c elm-1.5c/src/file.c
*** elm-1.5b/src/file.c Tue May 5 11:46:33 1987
--- elm-1.5c/src/file.c Wed Jul 22 10:18:08 1987
***************
*** 162,168
  save_current = current;
  current = number+1;
! copy_message("", fd, FALSE, FALSE);
  current = save_current;
  if (resolve_mode)
--- 162,172 -----
  save_current = current;
  current = number+1;
! if (copy_message("", fd, FALSE, FALSE) != 0) {
! error2("Error writing %s - message %d not saved", filename, number);
! return; /* we haven't marked the message DELETED yet,
! so there's no cause to panic */
! }
  current = save_current;
  if (resolve_mode)
diff -rc elm-1.5b/src/file_utils.c elm-1.5c/src/file_utils.c
*** elm-1.5b/src/file_utils.c Tue May 5 11:48:16 1987
--- elm-1.5c/src/file_utils.c Wed Jul 29 14:23:10 1987
***************
*** 184,190
  }
  while (fgets(buffer, VERY_LONG_STRING, from_file) != NULL)
! fputs(buffer, to_file);
  fclose(from_file);
  fclose(to_file);
--- 184,196 -----
  }
  while (fgets(buffer, VERY_LONG_STRING, from_file) != NULL)
! if (fprintf(to_file, "%s", buffer) == EOF) {
! dprint(1, (debugfile, "Error %d writing %s (copy)\n",
! errno, to));
! error1("error writing %s", to);
! force_final_newline(to_file);
! return(1);
! }
  if (fflush(to_file) == EOF) {
  dprint(1, (debugfile, "Error %d fflushing %s (copy)\n",
***************
*** 186,191
  while (fgets(buffer, VERY_LONG_STRING, from_file) != NULL)
  fputs(buffer, to_file);
  fclose(from_file);
  fclose(to_file);
--- 192,205 -----
  return(1);
  }
+ if (fflush(to_file) == EOF) {
+ dprint(1, (debugfile, "Error %d fflushing %s (copy)\n",
+ errno, to));
+ error1("error writing %s", to);
+ force_final_newline(to_file);
+ return(1);
+ }
+
  fclose(from_file);
  fclose(to_file);
***************
*** 261,264
  fprintf(fd, "%d\n", header_table[current-1].index_number);
  fclose(fd);
  }
--- 275,292 -----
  fprintf(fd, "%d\n", header_table[current-1].index_number);
  fclose(fd);
+ }
+
+ force_final_newline(f)
+ FILE *f;
+ {
+ /** Try to replace the last byte of the file with a \n.
+ Called when a write has failed, presumably because
+ a file system is full, to prevent the next message
+ written to the file "vanishing" because the "From "
+ is not at the beginning of the line. No error
+ checking - if it doesn't work at least we tried **/
+
+ fseek(f,-1,2); /* 1 byte before EOF */
+ putc('\n',f);
  }
diff -rc elm-1.5b/src/fileio.c elm-1.5c/src/fileio.c
*** elm-1.5b/src/fileio.c Tue May 5 11:52:27 1987
--- elm-1.5c/src/fileio.c Wed Jul 29 14:24:50 1987
***************
*** 19,24
  char *error_name();
  copy_message(prefix, dest_file, remove_header, remote)
  char *prefix;
  FILE *dest_file;
--- 19,25 -----
  char *error_name();
+ int
  copy_message(prefix, dest_file, remove_header, remote)
  char *prefix;
  FILE *dest_file;
***************
*** 30,35
  then it will start copying into the file...
  If remote is true then it will append "remote from "
  at the end of the very first line of the file (for remailing)
  **/
  char buffer[LONG_SLEN];
--- 31,37 -----
  then it will start copying into the file...
  If remote is true then it will append "remote from "
  at the end of the very first line of the file (for remailing)
+ Returns 0 if successful, non-zero otherwise
  **/
  char buffer[LONG_SLEN];
***************
*** 43,49
  header_table[current-1].offset, "copy_message"));
  error1("ELM [seek] failed trying to read %d bytes into file",
  header_table[current-1].offset);
! return;
  }
  /* how many lines in message? */
--- 45,51 -----
  header_table[current-1].offset, "copy_message"));
  error1("ELM [seek] failed trying to read %d bytes into file",
  header_table[current-1].offset);
! return 1;
  }
  /* how many lines in message? */
***************
*** 71,77
  ok = 0; /* STOP NOW! */
  }
  else
! fprintf(dest_file, "%s%s", prefix, buffer);
  }
  if (strlen(buffer) + strlen(prefix) > 1)
  fprintf(dest_file, "\n"); /* blank line to keep mailx happy *sigh* */
--- 73,84 -----
  ok = 0; /* STOP NOW! */
  }
  else
! if (fprintf(dest_file, "%s%s", prefix, buffer) == EOF) {
! dprint(1, (debugfile, "Error %d writing (copy_message)\n",
! errno));
! force_final_newline(dest_file);
! return 1;
! }
  }
  if (strlen(buffer) + strlen(prefix) > 1) {
***************
*** 73,80
  else
  fprintf(dest_file, "%s%s", prefix, buffer);
  }
! if (strlen(buffer) + strlen(prefix) > 1)
! fprintf(dest_file, "\n"); /* blank line to keep mailx happy *sigh* */
  }
  /******** the following routines are for a nice clean way to preserve
--- 80,104 -----
  return 1;
  }
  }
!
! if (strlen(buffer) + strlen(prefix) > 1) {
! /* need blank line to keep mailx happy *sigh* */
! if (fprintf(dest_file, "\n") == EOF) {
! dprint(1, (debugfile, "Error %d writing \\n (copy_message)\n",
! errno));
! force_final_newline(dest_file);
! return 1;
! }
! }
!
! if (fflush(dest_file) == EOF) {
! dprint(1, (debugfile, "Error %d fflushing (copy_message)\n",
! errno));
! force_final_newline(dest_file);
! return 1;
! }
!
! return 0;
  }
  /******** the following routines are for a nice clean way to preserve
diff -rc elm-1.5b/src/leavembox.c elm-1.5c/src/leavembox.c
*** elm-1.5b/src/leavembox.c Tue Jun 30 16:31:55 1987
--- elm-1.5c/src/leavembox.c Tue Jul 28 15:11:43 1987
***************
*** 170,176
  if (! mbox_specified) {
  if (pending) { /* keep some messages pending! */
! sprintf(outfile,"%s%d", temp_mbox, getpid());
  unlink(outfile);
  }
  else if (mailbox_defined) /* save to specified mailbox */
--- 163,171 -----
  if (! mbox_specified) {
  if (pending) { /* keep some messages pending! */
! /* put temp file into same filesystem as user's maildrop
! to avoid leaving mail in /tmp if we have space problems */
! sprintf(outfile,"%s%s.%d", mailhome, username, getpid());
  unlink(outfile);
  }
  else if (mailbox_defined) /* save to specified mailbox */
***************
*** 220,226
  else {
  dprint(2, (debugfile, "#%d, ", current));
  }
! copy_message("", temp, FALSE, FALSE);
  }
  fclose(temp);
  dprint(2, (debugfile, "\n\n"));
--- 215,227 -----
  else {
  dprint(2, (debugfile, "#%d, ", current));
  }
! if (copy_message("", temp, FALSE, FALSE) != 0) {
! /* probably a file system full somewhere
! we haven't deleted anything important yet,
! so we can quit normally deleting temp files */
! error1("error writing %s - leaving mail unchanged", outfile);
! leave();
! }
  }
  fclose(temp);
  dprint(2, (debugfile, "\n\n"));
***************
*** 264,270
  infile));
  dprint(1, (debugfile, "** %s - %s **\n", error_name(errno),
  error_description(errno)));
! error("something godawful is happening to me!!!");
  emergency_exit();
  }
  else {
--- 265,271 -----
  infile));
  dprint(1, (debugfile, "** %s - %s **\n", error_name(errno),
  error_description(errno)));
! error1("leaving mail in %s", outfile);
  emergency_exit();
  }
  else {
***************
*** 284,290
  error_description(errno));
  emergency_exit();
  }
! unlink(outfile);
  }
  else if (keep_empty_files) {
  sleep(1);
--- 285,291 -----
  error_description(errno));
  emergency_exit();
  }
! unlink(outfile); /* link succeeded - complete rename */
  }
  else if (keep_empty_files) {
  sleep(1);
***************
*** 333,339
  return(to_delete);
  }
! char lock_name[SLEN];
  lock(direction)
  int direction;
--- 334,341 -----
  return(to_delete);
  }
! char lock_name[SLEN],
! temp_name[SLEN];
  lock(direction)
  int direction;
***************
*** 340,351
  {
  /** Create lock file to ensure that we don't get any mail
  while altering the mailbox contents!
- If it already exists sit and spin until
- either the lock file is removed...indicating new mail
- or
- we have iterated MAX_ATTEMPTS times, in which case we
- either fail or remove it and make our own (determined
- by if REMOVE_AT_LAST is defined in header file
  If direction == INCOMING then DON'T remove the lock file
  on the way out! (It'd mess up whatever created it!).
--- 342,347 -----
  {
  /** Create lock file to ensure that we don't get any mail
  while altering the mailbox contents!
  This code was lifted from GNU Emacs (etc/movemail.c)
  **/
***************
*** 347,354
  either fail or remove it and make our own (determined
  by if REMOVE_AT_LAST is defined in header file
! If direction == INCOMING then DON'T remove the lock file
! on the way out! (It'd mess up whatever created it!).
  **/
  register int iteration = 0, access_val, lock_fd;
--- 343,349 -----
  /** Create lock file to ensure that we don't get any mail
  while altering the mailbox contents!
! This code was lifted from GNU Emacs (etc/movemail.c)
  **/
  struct stat st;
***************
*** 351,357
  on the way out! (It'd mess up whatever created it!).
  **/
! register int iteration = 0, access_val, lock_fd;
  sprintf(lock_name,"%s%s.lock", mailhome, username);
--- 346,354 -----
  This code was lifted from GNU Emacs (etc/movemail.c)
  **/
! struct stat st;
! long now;
! int desc, tem;
  sprintf(temp_name,"%s%s:%d", mailhome, username, getpid());
  sprintf(lock_name,"%s%s.lock", mailhome, username);
***************
*** 353,358
  register int iteration = 0, access_val, lock_fd;
  sprintf(lock_name,"%s%s.lock", mailhome, username);
  access_val = access(lock_name, ACCESS_EXISTS);
--- 350,356 -----
  long now;
  int desc, tem;
+ sprintf(temp_name,"%s%s:%d", mailhome, username, getpid());
  sprintf(lock_name,"%s%s.lock", mailhome, username);
  unlink (temp_name);
***************
*** 355,361
  sprintf(lock_name,"%s%s.lock", mailhome, username);
! access_val = access(lock_name, ACCESS_EXISTS);
  while (access_val != -1 && iteration++ < MAX_ATTEMPTS) {
  dprint(2, (debugfile,
--- 353,359 -----
  sprintf(temp_name,"%s%s:%d", mailhome, username, getpid());
  sprintf(lock_name,"%s%s.lock", mailhome, username);
! unlink (temp_name);
  while (1) {
  /* Create the lock file, but not under the lock file name. */
***************
*** 357,401
  access_val = access(lock_name, ACCESS_EXISTS);
! while (access_val != -1 && iteration++ < MAX_ATTEMPTS) {
! dprint(2, (debugfile,
! "File '%s' currently exists! Waiting...(lock)\n",
! lock_name));
! if (direction == INCOMING)
! PutLine0(LINES, 0, "Mail being received!\twaiting...");
! else
! error1("Attempt %d: Mail being received...waiting",
! iteration);
! sleep(5);
! access_val = access(lock_name, ACCESS_EXISTS);
! }
!
! if (access_val != -1) {
!
! #ifdef REMOVE_AT_LAST
!
! /** time to waste the lock file! Must be there in error! **/
!
! dprint(2, (debugfile,
! "Warning: I'm giving up waiting - removing lock file(lock)\n"));
! if (direction == INCOMING)
! PutLine0(LINES, 0,"\nTimed out - removing current lock file...");
! else
! error("Throwing away the current lock file!");
!
! if (unlink(lock_name) != 0) {
! dprint(1, (debugfile,
! "Error %s (%s)\n\ttrying to unlink file %s (%s)\n",
! error_name(errno), error_description(errno), lock_name));
! PutLine1(LINES, 0,
! "\n\rI couldn't remove the current lock file %s\n\r",
! lock_name);
! PutLine2(LINES, 0, "** %s - %s **\n\r", error_name(errno),
! error_description(errno));
! if (direction == INCOMING)
! leave();
! else
! emergency_exit();
  }
  /* everything is okay, so lets act as if nothing had happened... */
--- 355,367 -----
  unlink (temp_name);
! while (1) {
! /* Create the lock file, but not under the lock file name. */
! /* Give up if cannot do that. */
! desc = open (temp_name, O_WRONLY | O_CREAT, 0666);
! if (desc < 0) {
! error1("Can't create temporary lock file %s", temp_name);
! leave();
  }
  close (desc);
***************
*** 397,402
  else
  emergency_exit();
  }
  /* everything is okay, so lets act as if nothing had happened... */
--- 363,369 -----
  error1("Can't create temporary lock file %s", temp_name);
  leave();
  }
+ close (desc);
  tem = link (temp_name, lock_name);
  unlink (temp_name);
***************
*** 398,417
  emergency_exit();
  }
! /* everything is okay, so lets act as if nothing had happened... */
!
! #else
!
! /* okay...we die and leave, not updating the mailfile mbox or
! any of those! */
! if (direction == INCOMING) {
! PutLine1(LINES, 0, "\nGiving up after %d iterations...", iteration);
! PutLine0(LINES, 0,
! "Please try to read your mail again in a few minutes.\n");
! dprint(2, (debugfile,
! "Warning: bailing out after %d iterations...(lock)\n",
! iteration));
! leave_locked(0);
  }
  else {
  dprint(2, (debugfile,
--- 365,381 -----
  }
  close (desc);
! tem = link (temp_name, lock_name);
! unlink (temp_name);
! if (tem >= 0)
! break;
! sleep (1);
!
! /* If lock file is a minute old, unlock it. */
! if (stat (lock_name, &st) >= 0) {
! now = time (0);
! if (st.st_ctime < now - 60)
! unlink (lock_name);
  }
  }
  }
***************
*** 413,426
  iteration));
  leave_locked(0);
  }
- else {
- dprint(2, (debugfile,
- "Warning: after %d iterations, timed out! (lock)\n",
- iteration));
- leave(error("Timed out on lock file reads. Leaving program"));
- }
-
- #endif
  }
  /* if we get here we can create the lock file, so lets do it! */
--- 377,382 -----
  if (st.st_ctime < now - 60)
  unlink (lock_name);
  }
  }
  }
***************
*** 422,453
  #endif
  }
-
- /* if we get here we can create the lock file, so lets do it! */
-
- if ((lock_fd = creat(lock_name, 0)) == -1) {
- dprint(1, (debugfile,
- "Can't create lock file: creat(%s) raises error %s (lock)\n",
- lock_name, error_name(errno)));
- if (errno == EACCES)
- leave(error1(
- "Can't create lock file! I need write permission in %s!\n\r",
- mailhome));
- else {
- dprint(1, (debugfile,
- "Error encountered attempting to create lock %s\n",
- lock_name));
- dprint(1, (debugfile, "** %s - %s **\n", error_name(errno),
- error_description(errno)));
- PutLine1(LINES, 0,
- "\n\rError encountered while attempting to create lock file %s;\n\r",
- lock_name);
- PutLine2(LINES, 0, "** %s - %s **\n\r", error_name(errno),
- error_description(errno));
- leave();
- }
- }
- close(lock_fd); /* close it. We don't want to KEEP the thing! */
  }
  unlock()
--- 378,383 -----
  unlink (lock_name);
  }
  }
  }
  unlock()
diff -rc elm-1.5b/src/mailmsg2.c elm-1.5c/src/mailmsg2.c
*** elm-1.5b/src/mailmsg2.c Tue May 5 11:54:10 1987
--- elm-1.5c/src/mailmsg2.c Wed Jul 22 13:19:18 1987
***************
*** 211,217
  /** write all header information into real_reply **/
! sprintf(filename2,"%s%d",temp_file, getpid()+1);
  /** try to write headers to new temp file **/
--- 211,217 -----
  /** write all header information into real_reply **/
! sprintf(filename2,"%s%d",temp_hdr, getpid());
  /** try to write headers to new temp file **/
diff -rc elm-1.5b/src/newmbox.c elm-1.5c/src/newmbox.c
*** elm-1.5b/src/newmbox.c Wed Jun 24 18:50:38 1987
--- elm-1.5c/src/newmbox.c Wed Jul 29 11:12:13 1987
***************
*** 341,347
  }
  }
! if (copyit) fputs(buffer, temp);
  line_bytes = (long) strlen(buffer);
  line++;
  if (first_word(buffer,"From ")) {
--- 341,352 -----
  }
  }
! if (copyit)
! if(fprintf(temp, "%s", buffer) == EOF) {
! error1("error writing %s - leaving mail unchanged", temp_filename);
! leave();
! }
!
  line_bytes = (long) strlen(buffer);
  line++;
  if (first_word(buffer,"From ")) {
***************
*** 451,456
  }
  bytes += (long) line_bytes;
  }
  header_table[count > 0? count-1:count].lines = line + 1;
--- 467,478 -----
  }
  bytes += (long) line_bytes;
  }
+
+ if (copyit)
+ if(fflush(temp) == EOF) {
+ error1("error writing %s - leaving mail unchanged", temp_filename);
+ leave();
+ }
  header_table[count > 0? count-1:count].lines = line + 1;
diff -rc elm-1.5b/src/utils.c elm-1.5c/src/utils.c
*** elm-1.5b/src/utils.c Tue May 5 11:49:17 1987
--- elm-1.5c/src/utils.c Wed Jul 22 13:20:30 1987
***************
*** 64,70
  dprint(1, (debugfile,
  " The composition file : %s%d\n", temp_file, getpid()));
  dprint(1, (debugfile,
! " The header comp file : %s%d\n", temp_file, getpid()+1));
  dprint(1, (debugfile,
  " The readmsg data file: %s/%s\n", home, readmsg_file));
--- 64,70 -----
  dprint(1, (debugfile,
  " The composition file : %s%d\n", temp_file, getpid()));
  dprint(1, (debugfile,
! " The header comp file : %s%d\n", temp_hdr, getpid()));
  dprint(1, (debugfile,
  " The readmsg data file: %s/%s\n", home, readmsg_file));
***************
*** 98,104
  sprintf(buffer,"%s%d",temp_file, getpid()); /* editor buffer */
  (void) unlink(buffer);
! sprintf(buffer,"%s%d",temp_file, getpid()+1); /* editor buffer */
  (void) unlink(buffer);
  sprintf(buffer,"%s%s",temp_mbox, username); /* temp mailbox */
--- 98,104 -----
  sprintf(buffer,"%s%d",temp_file, getpid()); /* editor buffer */
  (void) unlink(buffer);
! sprintf(buffer,"%s%d",temp_hdr, getpid()); /* editor buffer */
  (void) unlink(buffer);
  sprintf(buffer,"%s%s",temp_mbox, username); /* temp mailbox */
***************
*** 110,115
  sprintf(buffer,"%s%s.lock",mailhome, username); /* lock file */
  (void) unlink(buffer);
  if (! mail_only) {
  MoveCursor(LINES,0);
  Writechar('\n');
--- 110,118 -----
  sprintf(buffer,"%s%s.lock",mailhome, username); /* lock file */
  (void) unlink(buffer);
+ sprintf(buffer,"%s%s.%d",mailhome, username, getpid()); /* temp maildrop */
+ (void) unlink(buffer);
+
  if (! mail_only) {
  MoveCursor(LINES,0);
  Writechar('\n');
***************
*** 135,141
  sprintf(buffer,"%s%d",temp_file, getpid()); /* editor buffer */
  (void) unlink(buffer);
! sprintf(buffer,"%s%d",temp_file, getpid()+1); /* editor buffer */
  (void) unlink(buffer);
  if (! mail_only) {
--- 138,144 -----
  sprintf(buffer,"%s%d",temp_file, getpid()); /* editor buffer */
  (void) unlink(buffer);
! sprintf(buffer,"%s%d",temp_hdr, getpid()); /* editor buffer */
  (void) unlink(buffer);
  if (! mail_only) {
***************
*** 165,171
  sprintf(buffer,"%s%d",temp_file, getpid()); /* editor buffer */
  (void) unlink(buffer);
! sprintf(buffer,"%s%d",temp_file, getpid()+1); /* editor buffer */
  (void) unlink(buffer);
  sprintf(buffer,"%s%s",temp_mbox, username); /* temp mailbox */
--- 168,174 -----
  sprintf(buffer,"%s%d",temp_file, getpid()); /* editor buffer */
  (void) unlink(buffer);
! sprintf(buffer,"%s%d",temp_hdr, getpid()); /* editor buffer */
  (void) unlink(buffer);
  sprintf(buffer,"%s%s",temp_mbox, username); /* temp mailbox */
-- 
Tim Rylance
Praxis Systems plc, 20 Manvers St, BATH BA1 1PX, UK
...!uunet!mcvax!ukc!praxis!tkr