Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!iuvax!cica!tut.cis.ohio-state.edu!purdue!haven!adm!smoke!gwyn
From: gwyn@smoke.BRL.MIL (Doug Gwyn)
Newsgroups: comp.std.c
Subject: Re: C source character set
Message-ID: <11210@smoke.BRL.MIL>
Date: 2 Oct 89 21:53:29 GMT
References: <1302@gmdzi.UUCP>
Reply-To: gwyn@brl.arpa (Doug Gwyn)
Organization: Ballistic Research Lab (BRL), APG, MD.
Lines: 32

In article <1302@gmdzi.UUCP> wittig@gmdzi.UUCP (Georg Wittig) writes:
> /* in the following lines let @ be the character '\0' */
> int x;
> x = 1 + /* foo @ bar */
>     2 /* */
> ;

The character you're representing by "@" is not in the standard C source
character set, so such a program is not strictly conforming.  Some
implementations may be able to deal with that source code but others
will not.  If an implementation does deal with it, it is up to that
implementation how to interpret this non-standard extension.

>[2] Furthermore, there are (non-UNIX) operating systems that encode the end of
>    a source line by the number of bytes of that line ...

There is a misunderstanding here.  The specifications for C source
character set do not constrain how C source code files are represented
in a particular implementation, nor how text editors present C source
code visually, nor myriad other similar issues.  C source code characters
must be seen as distinct units by the conforming C translator; what
mapping is done from physical source character encoding before that
point lies beyond the scope of the C standard.  Presumably it will be
similar to that done for "text" files in the hosted C library text-stream
support, but it need not be.

>[3] Line continuation by `\': Does it only apply to #define contexts and string
>    constant contexts, or is it a general rule?

It's a general rule.  The first translation phase is physical-to-C source
code character mapping, then trigraph replacement, then \ newline
splicing.  Preprocessing occurs after that.
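
To illustrate the last point, here is a minimal sketch of \ newline
splicing as a general rule.  The helper splice_lines() is hypothetical
(it is not from the standard or from any library); it only mimics what
the splicing step does to a buffer of source text that has already been
mapped to C source characters, and it ignores trigraphs entirely.  The
second function, spliced_sum(), is likewise an invented name; it shows
the translator itself accepting a splice in the middle of an ordinary
expression, outside any #define or string constant.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src to dst, deleting every backslash that is
   immediately followed by a newline, so that the two physical lines
   become one logical line.  dst must have room for strlen(src) + 1
   bytes.  Returns the length of the spliced text. */
size_t splice_lines(const char *src, char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n') {
            src += 2;           /* drop the pair; the lines are joined */
        } else {
            dst[n++] = *src++;  /* every other character is kept as is */
        }
    }
    dst[n] = '\0';
    return n;
}

/* The same rule applied by the translator: the backslash below ends a
   physical line inside a plain expression, and the code still compiles
   because splicing happens before preprocessing and tokenization. */
int spliced_sum(void)
{
    return 1 + \
           2;
}
```

Given the input "x = 1 + \" followed by a newline and "2;", the helper
produces the single logical line "x = 1 + 2;".  A real translator would
perform this step only after its own implementation-defined mapping from
the physical file representation, which this sketch does not attempt to
model.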