Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site noao.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!princeton!astrovax!noao!grandi
From: grandi@noao.UUCP (Steve Grandi)
Newsgroups: net.unix-wizards
Subject: Revectoring bad blocks on RA81 disks
Message-ID: <421@carina.noao.UUCP>
Date: Fri, 28-Jun-85 17:51:33 EDT
Article-I.D.: carina.421
Posted: Fri Jun 28 17:51:33 1985
Date-Received: Sun, 30-Jun-85 00:22:36 EDT
Organization: Natl. Optical Astronomy Observatories, Tucson, AZ USA
Lines: 66

Rumor hath it that a program is available through DEC field service to revector bad blocks on UDA disk drives (RA81s in particular). According to the rumor, the program is a "standalone" program called /rabads, written by the Ultrix folks, that can be booted instead of vmunix, and non-Ultrix sites running 4.2BSD can obtain it through their field service reps.

Has any non-Ultrix site obtained this program? Is there a part number or any other identifying information that our Friendly Field Service Man can use to pry it out of DEC's bureaucracy?

We are already running the Riacs UDA driver on our 750s, which will try to revector blocks that generate hard errors; our problem is blocks that generate lots and lots of soft errors. Soft errors tend to turn into hard errors, these are rather important blocks (see below), and revectoring a block that has a hard error often yields data that is not guaranteed to be correct (a "forced error" in MSCP-speak). So I would dearly love to revector these marginal blocks now and avoid the massive pain that a trashed file system can bring. (Once burned, twice shy; 5 times burned in the last year, 10**6 times shy!!) Also, the system REALLY slows down when the disk driver is printing error messages on the console.
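The triage wanted here, revectoring blocks that keep producing soft errors before they go hard, can be sketched as a tally over the error log. This is a minimal illustration in C, not the Riacs driver's logic or DEC's: the `err_event` record, `find_marginal`, `MAX_TRACKED`, and the threshold are all hypothetical, and the sketch assumes the LBN and the soft/hard status have already been extracted from each MSCP error datagram.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical record of one MSCP error datagram, reduced to the
 * logical block number (LBN) it reports and whether it was hard.
 */
struct err_event {
    long lbn;
    int  hard;              /* 0 = soft error, 1 = hard error */
};

/* Per-LBN soft-error tally built while scanning the log. */
struct lbn_count {
    long lbn;
    int  soft;
};

#define MAX_TRACKED 64      /* hypothetical limit on distinct LBNs tracked */

/*
 * Count soft errors per LBN and copy each LBN whose count reaches
 * `threshold` into `out`, in order of first appearance.  Returns the
 * number of candidates written (at most `out_len`).  Hard errors are
 * skipped: the driver already tries to revector those blocks.
 */
int find_marginal(const struct err_event *ev, size_t nev, int threshold,
                  long *out, int out_len)
{
    struct lbn_count tally[MAX_TRACKED];
    int ntally = 0, nout = 0;

    assert(threshold > 0);
    for (size_t i = 0; i < nev; i++) {
        if (ev[i].hard)
            continue;
        int j;
        for (j = 0; j < ntally; j++)        /* existing entry for this LBN? */
            if (tally[j].lbn == ev[i].lbn)
                break;
        if (j == ntally) {                  /* first soft error on this LBN */
            if (ntally == MAX_TRACKED)
                continue;                   /* table full: ignore new LBNs */
            tally[ntally].lbn = ev[i].lbn;
            tally[ntally].soft = 0;
            ntally++;
        }
        tally[j].soft++;
    }
    for (int j = 0; j < ntally && nout < out_len; j++)
        if (tally[j].soft >= threshold)
            out[nout++] = tally[j].lbn;
    return nout;
}
```

A count threshold only picks candidates; actually replacing each block still needs the MSCP revectoring procedure the post is asking about.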
Obviously, we could probably hack the Riacs driver into a utility to revector disk blocks, but another rumor hath it that the procedure used in the driver is not REALLY correct (since DEC is incredibly reluctant to reveal the details of the very complicated song and dance needed to accomplish this feat, I'm not surprised). It would also be nice to have a tool that our Friendly Field Service Rep believed in, as opposed to the incredulous looks I get when I explain the history of our disk driver.

Two details of our problems might interest students of MSCP soft-error datagrams or of the 4.2BSD file system.

The "drive detected error" we are getting is code 1A39 (the contents of word 27 of the SDI error variant of the MSCP packet), which indicates a "servo fine position error", generated when "a write command is attempted while the positioner is off track (not detented)". The servo boards and the R/W boards in the drives showing these errors have all been replaced, so the HDA itself is evidently marginal at these locations.

The disk blocks showing errors are also interesting. For several file systems on several disks on several 750s, relative block numbers 576 and 577 repeatedly show up with fine-positioning errors (these cases make up about 75% of our total collection of these errors). A morning's study of the output of dumpfs(8) and of fs.h shows that, for our 8K/2K and 8K/1K file systems, blocks 576-577 contain the csum structure, which summarizes all the cylinder groups (number of directories, free blocks, free inodes, and free frags). Since our disks are figuratively digging holes in the oxide at these blocks, this structure is evidently written a lot, presumably every time a file is created (and extended?).

Is this structure a single point of failure? If block 576 is destroyed, is the file system totally trashed, or just incapable of creating new files?
(In other words, can I dump(8) the file system?) Can fsck completely regenerate the data in the csum structure? (I know fsck can correct things; one often sees "SUMMARY INFORMATION ... BAD" messages on a post-crash reboot.)

All in all, I think I might have been better off with Eagles....

-- 
Steve Grandi, National Optical Astronomy Observatories, Tucson, AZ, 602-325-9228
{arizona,decvax,hao,ihnp4,seismo}!noao!grandi
noao!grandi@lbl-csam.ARPA
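The csum arithmetic discussed above can be made concrete with a small C sketch. It rests on assumptions that this post does not state. `csum_sector` assumes the 4.2BSD mkfs layout, in which the csum area starts at the first data fragment of cylinder group 0, after the boot block, superblock, cylinder-group block and that group's inodes. It also treats the "relative block numbers" as 512-byte sectors from the start of the partition. `rebuild_csum` assumes each cylinder-group header carries its own copy of the four counts (`cg_cs`), which would make the csum area redundant with data stored elsewhere. Both function names are hypothetical.

```c
#include <assert.h>

#define DEV_BSIZE   512     /* sector size, in bytes */
#define BBSIZE      8192    /* boot block area at the start of the partition */
#define SBSIZE      8192    /* space reserved for the superblock */
#define DINODE_SIZE 128     /* assumed size of an on-disk inode */

#define HOWMANY(x, y) (((x) + (y) - 1) / (y))
#define ROUNDUP(x, y) (HOWMANY(x, y) * (y))

/* Summary counts kept for each cylinder group. */
struct csum {
    long cs_ndir;           /* number of directories */
    long cs_nbfree;         /* number of free blocks */
    long cs_nifree;         /* number of free inodes */
    long cs_nffree;         /* number of free frags */
};

/*
 * Sector (DEV_BSIZE units) where the csum area starts, assuming it
 * sits at the first data fragment of cylinder group 0.  bsize and
 * fsize are in bytes; ipg is inodes per cylinder group.
 */
long csum_sector(long bsize, long fsize, long ipg)
{
    assert(bsize % fsize == 0 && fsize % DEV_BSIZE == 0);
    long frag   = bsize / fsize;                        /* frags per block */
    long sblkno = ROUNDUP(HOWMANY(BBSIZE + SBSIZE, fsize), frag);
    long cblkno = sblkno + ROUNDUP(HOWMANY(SBSIZE, fsize), frag);
    long iblkno = cblkno + frag;                        /* one block for the cg */
    long inopf  = (bsize / DINODE_SIZE) / frag;         /* inodes per frag */
    long dblkno = iblkno + ipg / inopf;                 /* first data frag */
    return dblkno * (fsize / DEV_BSIZE);                /* frags -> sectors */
}

/*
 * Rebuild the summary array and the file-system totals from the
 * per-cylinder-group copies (cg_cs) of the same four counts.
 */
void rebuild_csum(const struct csum *cg_cs, int ncg,
                  struct csum *fs_cs, struct csum *total)
{
    total->cs_ndir = total->cs_nbfree = 0;
    total->cs_nifree = total->cs_nffree = 0;
    for (int c = 0; c < ncg; c++) {
        fs_cs[c] = cg_cs[c];
        total->cs_ndir   += cg_cs[c].cs_ndir;
        total->cs_nbfree += cg_cs[c].cs_nbfree;
        total->cs_nifree += cg_cs[c].cs_nifree;
        total->cs_nffree += cg_cs[c].cs_nffree;
    }
}
```

With a hypothetical inodes-per-group value of 2048, both the 8K/1K and the 8K/2K layouts place the csum area at sector 576. Substitute the real ipg from dumpfs(8) before reading anything into that. A real checker would recount these values from the cylinder-group allocation maps rather than trust `cg_cs` blindly.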