Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!ames!pasteur!ucbvax!ucsd!ucsdhub!jack!crash!pnet01!haitex
From: haitex@pnet01.cts.com (Wade Bickel)
Newsgroups: comp.sys.amiga.tech
Subject: New IFF format details (long).
Message-ID: <3450@crash.cts.com>
Date: 19 Sep 88 12:56:37 GMT
Sender: news@crash.cts.com
Organization: People-Net [pnet01], El Cajon CA
Lines: 369


cmcmanis%pepper@Sun.COM (Chuck McManis) writes:
>First what was your misunderstanding? IFF can be parsed fairly readily by
>most "types" of parsers primarily because it's grammar is self consistent.
>
>->        So the question I wish to pose is;  Would you (the Amiga community)
>->reject a re-design of the current IFF standard?
>
>Yes if you decided to redesign it simply because you blew it while reading
>the documents. Please, don't be offended but there are already "other"... 
>
>Please, let us know what the "flaw" is first and *then* ask us if we want

        Ok, here goes...

        --------

        The problem with the current IFF is that it is not generic.

        To be more specific, a FORM specifier is not a chunk per se.
Under EA's definition, an ILBM is defined as:


                +-----------------------------------+
                | 'FORM'        size                |
                +-----------------------------------+
                | 'ILBM'                            |
                +-----------------------------------+
                | +-------------------------------+ |
                | | 'BMHD'      size              | |
                | |             data              | |
                | +-------------------------------+ |
                | | 'CMAP'      size              | |
                | |             data              | |
                | +-------------------------------+ |
                | pad bytes (if needed)             |
                +-----------------------------------+
                | 'BODY'        size                |
                |               data                |
                +-----------------------------------+


        The difficulty is that the 'ILBM' specifier is a special case: it has
no size specifier.  This wreaks havoc on a generic parser.  It also results in
a nesting depth limitation (ie: BMHD cannot contain chunks).
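
	To make the special case concrete, here is a minimal sketch in C of
reading the start of an IFF '85 FORM.  The names read_form85 and FormInfo are
hypothetical (they come from no existing IFF library); the byte offsets follow
the diagram above, with sizes stored most significant byte first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* IFF stores 32-bit sizes big-endian (Motorola byte order). */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical result record for this sketch. */
typedef struct {
    char     form_type[5];  /* e.g. "ILBM": a bare ID, no size follows it */
    char     first_id[5];   /* e.g. "BMHD" */
    uint32_t first_size;    /* size of that first ordinary chunk */
} FormInfo;

/* Returns 1 on success, 0 if buf is too short or does not start a FORM. */
static int read_form85(const unsigned char *buf, size_t len, FormInfo *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 20 || memcmp(buf, "FORM", 4) != 0)
        return 0;
    /* bytes 0..7  : 'FORM' + size, an ordinary ID/size pair             */
    /* bytes 8..11 : form type, the special case: an ID with no size     */
    memcpy(out->form_type, buf + 8, 4);
    out->form_type[4] = '\0';
    /* bytes 12..19: ID/size pair of the first chunk inside the FORM     */
    memcpy(out->first_id, buf + 12, 4);
    out->first_id[4] = '\0';
    out->first_size = be32(buf + 16);
    return 1;
}
```

	A loop that reads "ID, size, data" over and over cannot get past byte 8
without knowing that a FORM is followed by four bytes that are neither a size
nor data.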
	
	Another problem is that no bad chunk management is done.  If any chunk
is bad, the whole file is bad.  Why not make a reasonable effort to retain the
valid chunks?  If the CMAP is messed up do we really need to throw away the
BODY?  Recovering the CMAP would, in many cases, take but minutes using a
tool such as Doug's Color Commander (Seven Seas' Software), whereas an artist
might lose hours of careful manipulation of the BODY.  By allocating a bit
in each chunk header this can be easily accommodated.

	Another problem is that there are no dirty chunk provisions.  I feel
that dirty chunk tracking would be a valuable option.  Dirty chunks would
occur when, after finding some recognized chunks, unrecognized chunks are
encountered.  IFF '85 discards these chunks.  I propose that, as a user option,
unrecognized chunks be retained when a program modifies a partially understood
IFF '88 file.  This can be easily achieved by allocating two bits in each
chunk header.  When unrecognized chunks are written they're marked as dirty,
and any chunks which have been modified are also noted.  This would allow
programs with new, or proprietary chunks, to be made more compatible with
existing programs (certain paint programs come to mind...).

     { BTW:  I got the idea for the need for dirty chunk handling from
	     Carolyn Scheppner, so don't tell me I'm off the wall on this
	     one, I just happen to agree with her and offer this as one
	     solution.  I'm very open to any better solutions. }
	      

	In IFF '88 a LONGWORD (ie: 32 bits) would be included at the top of
all chunks to maintain the "status" of the chunk.

        Consider the following proposed IFF '88 format:

                +-----------------------------------+
                | 'FORM'         size,status        |
                | +-------------------------------+ |
                | | 'ILBM'       size,status      | |
                | | +---------------------------+ | |
                | | | 'BMHD'     size,status    | | |
                | | |            data           | | |
                | | +---------------------------+ | |
                | | | 'CMAP'     size,status    | | |
                | | |            data           | | |
                | | +---------------------------+ | |
                | | | 'BODY'     size,status    | | |
                | | |            data           | | |
                | | +---------------------------+ | |
                | +-------------------------------+ |
                +-----------------------------------+

                (pad bytes not shown, but considered added at the end
                    of any odd byte length chunk, checksum assumed included
		    at the end of each chunk as well).

        This format allows a generic parser to recognize 'FORM' and 'ILBM' as
just another chunk type.  More importantly, it allows a much simpler parser
design that is also much more versatile.   It is entirely possible to place
chunks within ANY chunk type.  Thus data structures such as B-Trees are
easily and efficiently supported.  Example:

            +-----------------------------------------------------+
            | 'FORM'                  size,status                 |
            | +-------------------------------------------------+ |
            | | '23BT'                size,status               | |
            | | +---------------------------------------------+ | |
            | | | 'NODE'              size,status             | | |
            | | | +-----------------------------------------+ | | |
            | | | | 'NDAT'            size,status           | | | |
            | | | |                   data                  | | | |
            | | | +-----------------------------------------+ | | |
            | | | | 'NODE'            size,status           | | | |
            | | | | +-------------------------------------+ | | | |
            | | | | | 'NDAT'         size,status          | | | | |
            | | | | |                data                 | | | | |
            | | | | +-------------------------------------+ | | | |
            | | | | | 'NODE'         size,status          | | | | |
            | | | | | +---------------------------------+ | | | | |
            | | | | | | 'NDAT'       size,status        | | | | | |
            | | | | | |              data               | | | | | |
            | | | | | +---------------------------------+ | | | | |
            | | | | | |  NODEs, etc. etc. etc...        | | | | | |
            | | | | | |                                 | | | | | |
            | | | | | |^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^| | | | | |
            | | | | +-------------------------------------+ | | | |
            | | | +-----------------------------------------+ | | |
            | | | | 'NODE'             size,status          | | | |
            | | | | +-------------------------------------+ | | | |
            | | | | |  {NDAT and 3 NODES...etc., etc.     | | | | |
            | | | | +-------------------------------------+ | | | |
            | | | +-----------------------------------------+ | | |
            | | | | 'NODE'             size,status          | | | |
            | | | | +-------------------------------------+ | | | |
            | | | | |  {NDAT and 3 NODEs...etc., etc.     | | | | |
            | | | | +-------------------------------------+ | | | |
            | | | +-----------------------------------------+ | | |
            | | +---------------------------------------------+ | |
            | +-------------------------------------------------+ |
            +-----------------------------------------------------+

	Among other things, this format would support quicker searches of
the file for a specific node, since nodes can be searched in a true tree-like
fashion.  However, this is not the point of the change.
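
	A sketch in C of why a uniform header helps searching: every chunk at a
level can be stepped over by its size alone, whatever its ID.  Several things
here are assumptions, not part of the proposal: the header is taken to be
exactly 12 bytes (ID, size, status, in that order), the size is taken to count
only the data bytes, and the per-chunk checksum mentioned above is left out.
The name find_sibling is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HDR88 12u  /* assumed: 4-byte ID + 4-byte size + 4-byte status */

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Scans the sibling chunks stored in body[0..len) for the first one whose
   ID matches.  Returns the offset of its header, or -1 if it is absent or
   a chunk runs past the end of the region. */
static long find_sibling(const unsigned char *body, size_t len, const char *id)
{
    size_t pos = 0;
    assert(body != NULL && id != NULL);
    while (len - pos >= HDR88) {
        size_t   room = len - pos - HDR88;     /* bytes left after header */
        uint32_t size = be32(body + pos + 4);
        size_t   span;
        if (size > room)
            return -1;
        span = (size_t)size + (size & 1u);     /* pad odd sizes to even   */
        if (span > room)
            return -1;
        if (memcmp(body + pos, id, 4) == 0)
            return (long)pos;
        pos += HDR88 + span;                   /* step over, ID unseen    */
    }
    return -1;
}
```

	To descend a level, the caller would run the same routine again over
the data region of the chunk it just found; no ID needs special treatment.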

	What I really want to do is create a purely data-driven mechanism, as
opposed to the code-driven one in the current IFF.  Rather than having to
write code to handle each type of occurrence, a structure would be initialized
at run time, and this would be passed to the Reader or Writer parser to be
handled.  In this way it would never be necessary to update the Library(s).

        The following is a document specifying how the system is to work.


 =============================================================================



		     Conceptual Design Specification
                    ---------------------------------

	Like its predecessor, IFF '88 is a recursive descent
parser design.  The primary difference between the old design and
the new one is that while IFF '85 was code driven, IFF '88 is data driven.
Whereas IFF '85 readers/writers require re-compilation of the
source to accommodate format updates, IFF '88 will not.  IFF '88 also
incorporates a more natural recursive descent format.

	Basically, IFF '88 will consist of a number of libraries.  In the 
simplest scheme there would be two libraries, one containing two parsers
(read and write) and the other containing support routines.  In a more
complex scheme 5 libraries would be created, one for each parser, one for
each set of related support routines, and the fifth for routines shared by
both the reader and writer libraries.

	To use IFF '88 the developer will initialize a control structure
(a list of nodes) which will be used to read/write the files.  Effectively,
your program will write a program, which will be used to write or read the
desired file.  Initialization of the data structures will be simplified with
routines provided in the support libraries.  Defining a control structure
will be achieved through calls much like those used to initialize Intuition
menu structures, which most of us are quite familiar with.

	The IFF '88 parser design is generic and performs no error checking
on the validity of the control structure it is passed.  It will be the
responsibility of the developer to ensure that a valid control structure
is passed to the parser.


			The Writer Mechanism
	              ------------------------

	In order to write a file an implementation first creates and
properly initializes a writer-structure, then calls the writer function
which parses the structure and writes the file.


		    ENTRIES in the Write Structure
		  ----------------------------------

	The basic element of the writer/reader structure will henceforth be
called an "entry".  An entry to the writer structure is simply the following
record:
	   StdProcPtr		= POINTER TO PROCEDURE(ADDRESS);

	   WrtAlgParamsPtr	= POINTER TO WrtAlgParams;
           WrtAlgParams		= RECORD
				   DataAddr	: ADDRESS;
				   ByteCount	: LONGCARD;
				  END;

	   WENTRY		= RECORD
				   ckID		: ARRAY[0..4] OF CHAR;
				   ckStatus     : LONGWORD;
				   PreCall,
				   WrtAlg,
				   PostCall	: StdProcPtr;
				   PreData,
				   WrtData,
				   PostData	: ADDRESS;
				   WLev		: WLevelPtr; 
						   {defined later in this doc.}
				  END;

	The fields have the following definitions:

	   ckID	   :  4 byte ID as defined in IFF '85.

	   ckStatus:  32 bits to be used for flags and such.  I envision
			three flags to be used for "bad", "dirty", and
			"modified" chunk identification.
			
           WrtAlg  :  The algorithm used to write the chunk contents as
	   		referenced by the "WrtData" field.  In the simplest
			case the WrtAlg will point at a standard WriteBytes
			routine.  This routine is passed one parameter on the
			stack.  In this way differences in compiler parameter
			passing conventions can be more easily resolved.
			
	   PreCall :  Normally NIL.  Used for special cases to execute a
			pre-write function, and is passed the value held
			in "PreData" as its parameter.

	   PostCall:  As for PreCall, but called after the call to WrtAlg.
	   
           WrtData :  Passed to the function pointed to by WrtAlg.  There is
	                no restriction on what this field is to be used for.
			However, as a general convention it will be used to
			hold the address of an initialized WrtAlgParams record.
			
	   PreData :  As WrtData, but used in conjunction with PreCall.
	   
	   PostData:  As WrtData, but used in conjunction with PostCall.

	   WLev :  A pointer to a lower WLEVEL structure.  If this pointer
                        is NIL then this entry contains data and the
			other fields of this entry are processed.  If it is
			not NIL the other fields in this entry are ignored,
			and the WLEVEL structure pointed at is parsed.  A
			variant record could also be used, but this is easier
			and thus less prone to cause undesired results.
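
	For readers more at home in C than Modula-2, the entry record and the
call order described above might be rendered roughly as follows.  This is a
sketch, not the proposed library: run_entry, CountBytes and bytes_written are
hypothetical names, and CountBytes merely stands in for a real WriteBytes
routine by tallying how many bytes it was asked to write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*StdProcPtr)(void *);   /* PROCEDURE(ADDRESS) */

typedef struct {
    void    *DataAddr;
    uint32_t ByteCount;
} WrtAlgParams;

struct WLevel;                        /* the level record, declared later */

typedef struct {
    char           ckID[5];           /* mirrors ARRAY[0..4] OF CHAR */
    uint32_t       ckStatus;
    StdProcPtr     PreCall, WrtAlg, PostCall;
    void          *PreData, *WrtData, *PostData;
    struct WLevel *WLev;              /* non-NULL: entry is a container */
} WENTRY;

/* Stand-in for WriteBytes: only counts the bytes it is handed. */
static uint32_t bytes_written;
static void CountBytes(void *p)
{
    const WrtAlgParams *w = p;
    bytes_written += w->ByteCount;
}

/* Runs the hooks of a data-bearing entry in the order PreCall, WrtAlg,
   PostCall.  Returns 0 without calling anything when WLev is set, since
   the other fields are then ignored. */
static int run_entry(const WENTRY *e)
{
    assert(e != NULL);
    if (e->WLev != NULL)
        return 0;
    if (e->PreCall)  e->PreCall(e->PreData);
    if (e->WrtAlg)   e->WrtAlg(e->WrtData);
    if (e->PostCall) e->PostCall(e->PostData);
    return 1;
}
```

	A BMHD entry would then be set up with WrtAlg = CountBytes and WrtData
pointing at a WrtAlgParams holding the header's address and length, leaving
the Pre/Post fields NULL.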



			LEVELs in the Write Structure
		      ---------------------------------
		      
	Levels in the write structure represent nesting control
of the file writing mechanism.

	  WLevelPtr	=    POINTER TO WLevel;
	  WLevel	=	RECORD
                      	         Entry	:  WENTRY;
                                 Next	:  WLevelPtr;
                    	        END;

	Using levels in the write structure is quite simple.  A level is
composed of any number of WLevel nodes, linked together in a list, and
defines how the parser should organize chunks.  The following example
should provide an efficient explanation of the operational mechanism.


            Parsing an Example Initialized Write Structure
           ------------------------------------------------
	
	The parser is very simple.  The easiest way to describe its function
is through example so...

	First we need something to parse so consider the following initialized
structure for writing a simple ILBM.  The parser is passed a WLevelPtr which
we will call root.  Uninitialized fields are not shown.  Record types are shown
in {} as in "{WLevel}" and are abstract (not part of the actual data). The
contents of a record type are indented one space.  Sorry for the lack of
graphics in this doc.

 root 
   \
    \
   {WLevel}
    {WEntry}
      ckID = "FORM";
      WLev --------> {WLevel}
     Next = NIL;      {WEntry}
                        ckID = "ILBM";
			WLev -----> {WLevel}
		       Next = NIL;   {WEntry}
		                       ckID = "BMHD";
				       WrtAlg = ADR(WriteBytes());
				       WrtData ---> {WrtAlgParams}
				      Next	      ADR(BitMapHdr);
					|	      TSIZE(BitMapHdr);
				        |
					V
				    {WLevel}
				     {WEntry}
				       ckID = "CMAP";
				       WrtAlg = ADR(WriteBytes());
				       WrtData ---> {WrtAlgParams}
				      Next            ADR(ColorTable);
					|	      nColors;
				        |
					V
				    {WLevel}
				     {WEntry}
				       ckID = "BODY";
				       WrtAlg = ADR(BodyWrtAlg());
				       WrtData ---> {WrtAlgParams}
				      Next	      ADR(BitMap);
                                        |
				        |
				        V
				       NIL
	
	Effectively each node in the level structure is a node in a simple
binary tree. One of the descendant pointers is contained in the WLEVEL
structure and is used to establish lists of entries at the same level.  The
other descendant pointer, WLev, is contained in the WENTRY structure.  It is
used to establish lower levels or specify that the chunk contains data (by
being NIL).
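
	The two-pointer walk can be sketched in C as below.  To keep the
example short the entry and level records are flattened into one struct
carrying only the three fields the walk needs, and instead of writing a file
the routine builds a one-line outline of the nesting.  The names outline and
append are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flattened stand-in for WLevel + WENTRY (an assumption of this sketch). */
typedef struct WLevel WLevel;
struct WLevel {
    char    ckID[5];
    WLevel *WLev;   /* lower level, or NULL when the chunk holds data */
    WLevel *Next;   /* next entry at the same level                   */
};

/* Appends s to the NUL-terminated string in out without overflowing cap. */
static void append(char *out, size_t cap, const char *s)
{
    size_t used = strlen(out);
    if (used + 1 < cap)
        strncat(out, s, cap - used - 1);
}

/* Walks one level: Next links siblings, WLev descends into a container. */
static void outline(const WLevel *lev, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    for (; lev != NULL; lev = lev->Next) {
        append(out, cap, lev->ckID);
        if (lev->WLev != NULL) {
            append(out, cap, "(");
            outline(lev->WLev, out, cap);
            append(out, cap, ")");
        }
        if (lev->Next != NULL)
            append(out, cap, " ");
    }
}
```

	Fed the ILBM structure drawn above, the walk visits FORM, descends to
ILBM, descends again, and then follows Next through BMHD, CMAP and BODY.  A
real writer would emit a chunk header where this sketch appends an ID, and
call the entry's WrtAlg where WLev is NULL.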

	The reader is a bit more complicated, but follows the same general
principles.  The structure is more complex, allowing groupings of chunks.
Level pointers can be connected to higher levels creating a recursive reader.

	What all this buys us is versatility.  Because it is possible to link
user routines into the writer or reader structures, it is not necessary to
update the library to incorporate a new low-level algorithm, such as a
compression algorithm.  Also, LISTs and CATs are unnecessary; simple
extension through Levels is sufficient to write any file.  It would probably
be desirable to replace the "FORM" keyword with something new, such as
"NIFF" or "IF88".

        Sorry this is not well organized, but I already spent more of the
day on this than I have.  There is undoubtedly room for improvement,
suggestions?

	If there is any interest I'll go into more detail.  Right now I have
to get back to X-Specs 3D stuff.

							Thanks,




UUCP: {cbosgd, hplabs!hp-sdd, sdcsvax, nosc}!crash!pnet01!haitex
ARPA: crash!pnet01!haitex@nosc.mil
INET: haitex@pnet01.CTS.COM
Opinions expressed are mine, and not necessarily those of my employer.