Path: utzoo!attcan!uunet!bloom-beacon!mintaka!oliveb!amdahl!dgcad!gary
From: gary@dgcad.SV.DG.COM (Gary Bridgewater)
Newsgroups: comp.os.aos
Subject: Re: How to find *real* file sizes in AOS/VS...?
Message-ID: <1135@svx.SV.DG.COM>
Date: 24 Sep 89 10:28:27 GMT
References: <1702@murdu.oz>
Reply-To: gary@svx.SV.DG.COM ()
Organization: Data General SDD, Sunnyvale, CA
Lines: 96

In article <1702@murdu.oz> rab@murdu.oz (Richard Alan Brown) writes:
>I know:
>
>Each file has a length given in bytes. Given the element size for that file
>(default 4 on our system), the real file size is just the length in bytes
>taken up to the next multiple of the element size. (e.g. An element size of 4
>means 4*512 = 2k bytes, so such files are allocated in 2k chunks).

Yes.

>But what about index blocks? OK, so I also count the number of index levels
>in a file, and allow a block for each level (Is this correct? Are index blocks
>true blocks, or blocks within a 2k chunk? In other words, does the system lose
>4 blocks on the first index 'block' and use this for the next three?)

A 0 level file has no index blocks. It is a direct file and its size is one
element.
A 1 level file has one disk block (512 bytes) which contains 128 four-byte
logical disk addresses pointing to data elements.
A 2 level file has one disk block pointing to up to 128 'level 1' index
blocks.
A 3 level file has one disk block pointing to up to 128 'level 2' index
blocks.

>BUT! I have noticed DG's sneaky compression of files (executables) with large
>blocks of nulls in them (am I right?), so that a file can seem large (in bytes),
>while actually taking up much less disk space. (Is this only for PRV files?

We prefer "clever". In the above index block scheme you can have element
pointers that are zero. AOS/VS takes this to mean that the entire element is
empty, and it provides the 0 bytes if you try to read these blocks. This is a
great disk space savings for executables and data bases.
It will work on ANY kind of file, but only if you A) write at least a whole
index block's worth of 0s at once or B) use some form of file positioning
command to skip data.

Note that it is important when moving such files over the net to use the
MOVE/FTA/COMPRESS command rather than just MOVE (RMA form) or MOVE/FTA (no
compression). Without both FTA and COMPRESS the transfer takes place a byte
at a time and the system won't notice the null blocks. A way to fix files
which have been incorrectly grown this way is to DUMP and LOAD them, since
DUMP will squeeze out the 0s and LOAD will do positional block writes.

This 'compression' makes exact space computation tricky. It also makes
reading such files interesting - study the ?BLKIO system call, for instance.
Its main feature is the ability to skip these empty spaces - that is why
DUMP_II can dump such files MUCH faster than DUMP, which reads a byte at a
time. It is also why the system can seem to "go into its navel" when READing
such a file - no disk activity, and the expansion is done at system priority.

>Why doesn't the file system tell users the 'correct' size?)

What is the real size? If you read it a byte at a time you will get an EOF
after the Nth byte, so the file is N bytes long. If you ?BLKIO the file an
element at a time you can discover how many blocks it is taking up, and from
that you can infer the element structure if you map the empty elements. But
do you want the CLI to do that every time you say F/LEN? You could submit an
STR to have another switch added to the VSII CLI to perform this activity.
You could also submit an STR asking that the system maintain a count of the
number of elements allocated to a file, which would make the system bigger
and slower to provide a rarely needed piece of information.

If you want to know how much space on the disk the file takes, then create a
CPD, move the file there, do a SPACE, delete the file, do another SPACE to
get the size of the CPD itself, and subtract that from the first size.
Crude but exact.

>Now for the *really* tricky part. Create an empty CPD. put a file in it
>(length 0). Start adding data. Who knows how much space the file takes up!?
>Does the SPACE command include the space taken up by directory entries? How
>does one calculate that (Note that when one deletes the file, the CPD is not
>'empty'. This presumably is the directory entry...?).

The size of the directory is the second number above. Directory space is a
function of the number of files, any UDAs and the length of the filenames.
Directories are also files so they have index blocks too!
Yes, the space a directory takes is included in the size of the directory.

>So if there are any Data General employees or hackers out there, maybe you
>could enlighten me?

You could also order a Filesystem Internals manual which goes into all the
gory details of this.

Another poster mentions using DUMP to discover a file's true size. Won't
work - DUMP compresses nulls wherever it finds them irrespective of block
boundaries.

And some Unix versions also do this sort of 'hollow' file optimization. We
may very well have inherited it from MULTICS which is the inspiration for
both Unix and AOS(/VS).

The above is my interpretation of How It All Works and should not be
interpreted as an Official Version. See the manual and the Release notices.
Buy the sources and KNOW enlightenment. Your mileage may vary.
--
Gary Bridgewater, Data General Corp., Sunnyvale Ca.
gary@sv4.ceo.sv.dg.com or
{amdahl,aeras,amdcad,mas1,matra3}!dgcad.SV.DG.COM!gary
No good deed goes unpunished.
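
The element and index-block arithmetic described in the post can be sketched
roughly as below. This is only an illustration in Python, not anything from
AOS/VS itself: the names estimate_blocks, BLOCK_BYTES and POINTERS_PER_INDEX
are made up for the example. It assumes every element is actually allocated
(no zero pointers, so no 'hollow' regions), that each index block costs
exactly one 512-byte disk block as the post states, and that a zero-length
file occupies no data elements, which the post does not say either way.

```python
BLOCK_BYTES = 512          # disk block size, as given in the post
POINTERS_PER_INDEX = 128   # four-byte addresses held by one index block


def estimate_blocks(length_bytes, element_blocks=4):
    """Estimate disk blocks for a file with no empty (zero-pointer) elements.

    Returns a tuple (data_blocks, index_blocks, index_levels).
    Hypothetical helper: an upper bound for hollow files, not an exact count.
    """
    if length_bytes < 0 or element_blocks < 1:
        raise ValueError("length must be >= 0 and element size >= 1")

    element_bytes = element_blocks * BLOCK_BYTES
    # Round the byte length up to a whole number of elements.
    elements = -(-length_bytes // element_bytes)

    index_blocks = 0
    levels = 0
    entries = elements  # items the next index level has to point at
    while entries > 1:
        # Each index block holds up to 128 pointers to the level below.
        entries = -(-entries // POINTERS_PER_INDEX)
        index_blocks += entries
        levels += 1

    return elements * element_blocks, index_blocks, levels


if __name__ == "__main__":
    for size in (1, 2049, 128 * 2048, 128 * 2048 + 1):
        print(size, estimate_blocks(size))
```

For a file that does contain empty elements the result is only an upper
bound, so measuring with SPACE in a scratch CPD as described in the post
remains the way to get an exact figure.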