Path: utzoo!utgpu!attcan!uunet!husc6!rutgers!att!cuuxb!mmengel
From: mmengel@cuuxb.ATT.COM (Marc W. Mengel)
Newsgroups: comp.lang.misc
Subject: Re: Text or data files?
Summary: Text wins!
Keywords: portability, debugging, file formats
Message-ID: <2085@cuuxb.ATT.COM>
Date: 27 Sep 88 14:48:39 GMT
References: <3958@enea.se>
Reply-To: mmengel@cuuxb.UUCP (Marc W. Mengel)
Followup-To: comp.lang.misc
Organization: AT&T-DSD, System Engineering for Resellers, Lisle IL
Lines: 100

In article <3958@enea.se> sommar@enea.se (Erland Sommarskog) writes:
>
>Miles O'Neal (meo@stiatl.UUCP) writes:
>>If you even have your data files as text files, debugging
>>becomes much easier. For instance, would you rather debug
>>98764389437034gh307ytfhr398f39
>>or
>>12/22/88 01:30 10790 100 100 382 -1
>>?
>
>There are several advantages of using fixed record files for data,
>if the data I have fits to that format. Let's say we have: (This is
>Pascal.)
>   Data_record = RECORD
>      Date           : PACKED ARRAY(.1..8.) OF char;
>      Time           : PACKED ARRAY(.1..8.) OF char;
>      Incident       : Incident_type; (* Enumerated *)
>      No_of_warnings : integer;
>      Alarmed        : boolean;
>      Username       : PACKED ARRAY(.1..12.) OF char;
>   END;
>
>The simplest way to read and write this is through a FILE OF Data_record,
>if no other program is to read it.

Two major problems with this idea. The first is that, most of the
time, other programs will need to read the data sooner or later. The
second is that when files are written in a binary format like this,
the same program cannot read the data when run on a machine with a
different byte ordering. So after you have built up a list of 2000
incidents and have to move to a new machine, you lose big time: you
have a data file with packed records in it, and you (the programmer)
have *no idea* how the data is actually formatted.

>                      If we store the data in a text file, we
>have to parse every line we get. (And what a trouble if Username contains
>an LF character.)
>And since the text file is vulnerable to changes from
>e.g. a text editor, we cannot be sure that the file follows the supposed
>format throughout.

It's true, you have to parse some of the data file (the numbers), but
even Pascal gives you a means of writing and reading integers of a
fixed width. Since Pascal has a problem with newlines, you can either
write them with an escape sequence like '\n', or just check Pascal's
eoln() function before reading a character.

>As for debugging this only applies if you have to look at the file as such.

Clearly, one would never need to look at the data file a program uses
to determine if a value is getting trashed before being written to the
file, or after being read back in. (HEAVY sarcasm here)

>>Another thing this buys you is that, in my experience, it's easier
>>to change file formats if you use text files. It requires a little
>>planning, but in general is a lot less work than doing the same
>>thing with any other type of data.
>
>Uh? If you have a text file and change the format you have to rewrite
>the parsing and the writing-to file parts. With a fixed format you
>change the declaration, and that's all. (Well, you may have to write
>a simple program to convert old files, but you have to do that for
>text files too.)

What's so tough about fixed format text parsing? Every language known
to mankind can read and write integers, etc. from a file. The format
is easily extensible, you *can* add records with a text editor, you
can debug your code much more easily, you can write programs in other
languages or on other machines that can read your data files, you can
use Unix utilities like grep and sed and awk to make useful reports...
The list goes on and on.

Binary data saves you a whole 20 minutes to an hour when writing the
program, makes your data files unportable between machines, makes you
have to use the same language you started with on the same machine as
long as you want to use the same data files...
This list also goes on and on.

If you're writing real, production programs that are useful to people
and generate data files, the people who use them may want to use them
on a different machine 5 years from now; they will read in the source
and data file from a tape onto their new machine, and their data will
be garbage, because the 1's will be 256's, the 8's will be 2048's,
etc. due to byte order on the new machine. If they move to this new
machine because the old one is dead and they can't get parts any more,
and now they can't read their old accounting records, they will find
worse names to call you than you can imagine.

>--
>Erland Sommarskog       ! "Hon ligger med min bäste vän,
>ENEA Data, Stockholm    !  jag vågar inte sova längre", Orup
>sommar@enea.UUCP        ! ("She's making love with best friend,
>                        !  I dare not to sleep anymore")
--
Marc Mengel
mmengel@cuuxb.att.com
attmail!mmengel
{lll-crg|mtune|att}!cuuxb!mmengel
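
To make the byte-order complaint concrete, here is a minimal sketch in
Python. It assumes the value is stored as a 16-bit integer, which the
thread never specifies, and the variable names are made up for
illustration. The sample record is the text line quoted near the top
of the thread; its field meanings were never stated, so the numbers
are left unnamed.

```python
import struct

# Binary: a 16-bit value of 1 written in little-endian byte order...
raw = struct.pack("<h", 1)
# ...then read back as big-endian, as a naive port to another machine would.
misread = struct.unpack(">h", raw)[0]

# Text: the same value written as fixed-width decimal digits reads back
# the same way on any machine.
reread = int("%6d" % 1)

# The sample text record from the thread splits into fields with no
# knowledge of the writer's hardware. The field meanings are not given
# in the thread, so the numbers are kept as an anonymous list.
record = "12/22/88 01:30 10790 100 100 382 -1"
date, time, *rest = record.split()
numbers = [int(field) for field in rest]

print(misread)   # 256
print(reread)    # 1
print(numbers)   # [10790, 100, 100, 382, -1]
```

The binary read only comes out right if the reader knows the byte
order the writer used; the text read needs no such knowledge.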