Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!lll-crg!ames!ucbcad!ucbvax!decvax!decwrl!sun!wmb
From: wmb@sun.uucp (Mitch Bradley)
Newsgroups: comp.lang.forth
Subject: Re: Unix system calls from Forth
Message-ID: <10419@sun.uucp>
Date: Sat, 13-Dec-86 21:22:02 EST
Article-I.D.: sun.10419
Posted: Sat Dec 13 21:22:02 1986
Date-Received: Tue, 16-Dec-86 01:05:11 EST
References: <12234@watnot.UUCP>
Distribution: net
Organization: Sun Microsystems, Inc.
Lines: 325
Summary: Rochester Conference working group on Unix and Forth
At the 1985 Rochester Forth Conference, several of us Unix and Forth
users had a working group and hammered out a set of Forth-to-Unix
interface conventions. Following is a copy of the resulting working group
report.
In summary:
1) Forth word names for Unix system calls should start with underscore (_)
2) The leftmost C argument should appear on the top of the Forth stack
3) We define defining words SYSCALL: and SUBROUTINE: for constructing
interfaces to system calls and library routines, respectively.
There are also defining words to access C data storage areas and
to allow C routines to call Forth words.
4) Argument type conversion is done automatically by the defining words,
under control of a parameter type specification list. Knowing only the
number of arguments and the number of return values is not enough in
the general case.
5) Error reporting is handled with a word ERRNO which returns a value,
not an address. ERRNO returns 0 if no error occurred, or the Unix
error number otherwise.
6) The report covers other areas such as case sensitivity, control characters
in source code files, file naming conventions, etc.
Mitch Bradley (the rest of this message is the report)
Forth and Unix Working Group
Mitch Bradley
ABSTRACT
The Forth and Unix working group included a number of
people who are currently using Forth with the UNIX
operating system, and a few interested observers. The
group agreed on a set of guidelines for the interface
between Forth and UNIX, based on the experience of the
participants. Adoption of these guidelines should
increase the ability of Forth users under UNIX to share
code.
System Call and C Language Interface
It is frequently desirable to use UNIX system calls from within
Forth. Also, since UNIX has an extensive set of library routines that
are written in or callable from the C language, Forth can benefit from
being able to execute C subroutines. The following wordset defines an
interface between Forth, C, and the operating system. The scheme is
quite general; it should serve equally well to integrate Forth into
another operating system (other than UNIX), or another language
environment (other than C).
SYSCALL: ( -- ) ( Input Stream: system-call-name )
A defining word used in the form: SYSCALL: <name> <parameter specification>
Defines <name> so that when <name> is later executed, the UNIX
system call of the same name will be invoked. This should only be
used to define Forth interfaces to system calls (as opposed to C
language subroutines). <name> should be the same as the name of a
UNIX system call, but with an underscore (_) as the first character.
For example, the read() system call, which reads from a
file, would be interfaced to Forth with:
SYSCALL: _read <parameter specification>
<parameter specification> will be described later.
SUBROUTINE: ( ext-name -- ) ( Input Stream: <name> <parameter specification> )
Defines <name> so that when <name> is later executed, the external
subroutine ext-name is invoked. ext-name is passed to SUBROUTINE:
as the address of a packed string. ext-name is usually the external
name of a C language subroutine. <parameter specification> is
described later.
_________________________
UNIX is a trademark of Bell Laboratories.
ENTRY: ( ext-name -- ) ( Input Stream: <name> <parameter specification> )
Builds an entry point so that the already-existing Forth word
<name> may be called from outside of Forth. ext-name is the
external name by which the Forth word will be known to the outside
world.
This is useful, for example, when Forth calls a C routine which performs
output, but the programmer wants the C output to go through the
Forth I/O system. This example might result in the following:
" _putchar" ENTRY: EMIT <parameter specification>
<parameter specification> is described later.
Note that ENTRY: is not a defining word, in that it does not cause a
new name to be created in the Forth dictionary.
DATA: ( ext-name -- ) ( Input Stream: <name> )
Defines <name> so that when <name> is later invoked, the address
of the data storage area associated with the external symbol
ext-name is left on the stack. For example, if a C subroutine defines
an external array:
int primes[] = { 2, 3, 5, 7, 11, 13, 17, 19 };
that array could be accessed from Forth by declaring:
" _primes" DATA: primes
(The external name of C objects in the UNIX world is the name with
an underscore prepended, hence _primes).
Since the details of how arguments are passed to and from subroutines
are usually different between Forth and the rest of UNIX, it is
necessary to provide a means for moving arguments between the Forth
stack(s) and wherever the UNIX and C language routines expect the
arguments to be. Rather than requiring the Forth programmer to deal
with this, the interface wordset provides a way to describe the
arguments in such a way that appropriate conversions may be made
automatically. A suggested implementation is to compile an appropriate
bit of assembly code for each SYSCALL:, SUBROUTINE:, or ENTRY:, which
would perform the argument conversions/movements. The argument
specification is done with a <parameter specification>. A
<parameter specification> specifies the type and order of the input and
output arguments. The <parameter specification> is a list of the types
of the input arguments, followed by "--", followed by the type of the
output argument, followed by "END".
The possible types are from this table:
void_ty      null type
addr_ty      address (a pointer to something)
int_ty       "standard" or "normal" integer (1 stack cell)
float_ty     floating point
dfloat_ty    double precision float
string_ty    string
char_ty      1 byte
uchar_ty     1 byte unsigned
short_ty     2 bytes signed
ushort_ty    2 bytes unsigned
long_ty      4 bytes signed
ulong_ty     4 bytes unsigned
The order of the input arguments is opposite from that of the C
specification; i.e. the rightmost C argument is mentioned first in the
Forth <parameter specification>. This is due to the fact that most C
compilers actually process arguments from right to left, so this scheme
is likely to cause fewer potential problems.
Example
The UNIX system call to create a new file is called creat. Its C
language description is:
int creat(name,mode)
char *name;
int mode;
This means that it takes 2 arguments: a string (char *) which is the
name of the file to create, and an integer "mode" which controls which
users have various access permissions on the new file. The return
value is an integer which is a UNIX file descriptor useful for
subsequently accessing the file, or -1 if an error occurred. The Forth
interface to creat is specified as follows:
( mode name -- fd )
SYSCALL: _creat int_ty string_ty -- int_ty END
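The argument ordering can be modelled outside Forth. The following Python sketch is only an illustration and is not part of the report: the names parse_spec and call_with_spec, the list used as a stack, and the restriction to int_ty and string_ty are all assumptions made for the example. It pops arguments as described by a parameter specification and hands them to a callable in C order.

```python
# Hypothetical model: a Python list stands in for the Forth data stack,
# with the top of the stack at the end of the list.
PYTHON_TYPES = {"int_ty": int, "string_ty": str}  # only two types modelled

def parse_spec(spec):
    """Split 'int_ty string_ty -- int_ty END' into (input types, output types)."""
    tokens = spec.split()
    if not tokens or tokens[-1] != "END" or "--" not in tokens:
        raise ValueError("specification must contain '--' and finish with END")
    separator = tokens.index("--")
    return tokens[:separator], tokens[separator + 1:-1]

def call_with_spec(stack, spec, c_function):
    """Pop the inputs named by spec, call c_function, push any result."""
    inputs, outputs = parse_spec(spec)
    c_args = []
    # The last type before '--' describes the top of the stack, which is
    # the leftmost C argument, so popping produces C order directly.
    for type_name in reversed(inputs):
        value = stack.pop()
        if not isinstance(value, PYTHON_TYPES[type_name]):
            raise TypeError(f"expected {type_name}, got {value!r}")
        c_args.append(value)
    result = c_function(*c_args)
    if outputs and outputs != ["void_ty"]:
        stack.append(result)
```

With the stack holding mode and then name (name on top), the specification from the creat example causes the callable to receive (name, mode), matching the C declaration.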
Errors
In UNIX there is a global variable errno which generally contains an
extra error status code if the last system call failed for some rea-
son. The Forth interface to this is the Forth word:
ERRNO ( -- error-code )
After each UNIX system call, the value left on the stack by ERRNO
will be 0 if the system call succeeded, or the contents of the
UNIX global variable errno if the call failed. Any data storage
required by ERRNO should be in the USER area, so that different
Forth tasks may independently perform system calls without con-
flict.
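The intended behaviour of ERRNO can be sketched in Python. This is a model under assumptions, not an implementation from the report: the names syscall and last_errno are hypothetical, and thread-local storage is used as a stand-in for the Forth USER area.

```python
import errno
import os
import threading

# Stand-in for the USER area: each thread (task) sees its own copy.
_task = threading.local()

def syscall(function, *args):
    """Call function; record 0 on success or the Unix error number on failure."""
    try:
        result = function(*args)
    except OSError as error:
        _task.errno = error.errno
        return -1          # conventional Unix failure value
    _task.errno = 0        # success always resets the status to 0
    return result

def last_errno():
    """Model of ERRNO ( -- error-code ): returns a value, not an address."""
    return getattr(_task, "errno", 0)
```

For example, after syscall(os.open, "/nonexistent-dir/x", os.O_RDONLY) the value of last_errno() is errno.ENOENT, and after any successful call it is 0 again.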
Case Sensitivity?
The group had mixed feelings about this issue. The following
(incomplete) set of guidelines was agreed upon:
1 The Forth system should be able to accept either upper case or
lower case input.
2 At the user's option, upper case and lower case input should be
treated as either distinct or indistinct.
3 Programmers are strongly encouraged to avoid the use of names that
differ only in the case of the letters used; e.g., don't name one
variable "blockno" and another different variable "BLOCKNO".
Input Delimiters
The Forth phrase BL WORD should treat all control characters, as well
as the ascii blank character, as delimiters, both when skipping ini-
tial delimiters, and when scanning for the delimiter which terminates
a word. This greatly simplifies the interpretation of ordinary text
files, which may contain tabs, linefeeds, carriage returns, and
formfeeds as separator characters in addition to ordinary blanks.
This may be implemented efficiently by testing for "( char ) BL <="
instead of "( char ) BL =" when skipping or scanning for delimiters.
WORD with any character other than BL as the delimiter should treat
only that character as the delimiter. In this case, leading delim-
iters should NOT be skipped. Not skipping leading delimiters prevents
a common Forth bug whereby a zero-length string is not processed
correctly. For example, some systems will not do the obvious thing
when confronted with ( ) or ." " The author has NEVER seen a case
where WORD with a non-blank delimiter should have skipped leading del-
imiters.
The actual delimiter encountered which terminates the scanning of WORD
should be stored in the USER variable:
DELIMITER ( -- addr )
addr is the address of a USER variable which contains the actual
delimiter encountered when executing the previous invocation of
WORD . If the delimiter encountered was the end of the input
stream, the value contained in the USER variable is -1.
This makes it easy to check for a number of end conditions.
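The delimiter rules above can be summarised in a short Python model. It is a sketch only: the function name word, the returned tuple, and the use of a Python string as the input stream are assumptions made for illustration.

```python
BL = " "            # the ASCII blank
END_OF_INPUT = -1   # value reported when the input stream runs out

def word(text, pos, delim=BL):
    """Scan text from pos; return (token, next_pos, delimiter_found)."""
    def is_delimiter(ch):
        # With BL, every character at or below blank counts (the "BL <=" test);
        # with any other delimiter, only that exact character counts.
        return ch <= BL if delim == BL else ch == delim

    if delim == BL:
        # Leading delimiters are skipped only for BL WORD.
        while pos < len(text) and is_delimiter(text[pos]):
            pos += 1
    start = pos
    while pos < len(text) and not is_delimiter(text[pos]):
        pos += 1
    token = text[start:pos]
    if pos == len(text):
        return token, pos, END_OF_INPUT
    # Step past the terminating delimiter and report its character code.
    return token, pos + 1, ord(text[pos])
```

Because leading delimiters are not skipped for a non-blank delimiter, scanning ") rest" with ")" as the delimiter yields an empty token rather than swallowing the parenthesis, which is the zero-length string case mentioned above.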
Environment
A Forth program can access the UNIX shell Environment Variables with:
GETENV ( str1 -- [ str ] flag )
str1 is the address of a counted string which is the name of the
desired environment variable. flag is true if that environment
variable is set, and str is the address of a counted string which
contains the value of that environment variable. flag is false if
that environment variable is not set, and str is not present.
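The variable stack effect of GETENV can be modelled as follows. This Python sketch is illustrative only: counted, uncounted and getenv_word are hypothetical names, counted strings are represented as bytes with a leading length byte, and Python booleans stand in for Forth flags.

```python
import os

def counted(text):
    """Build a counted string: one length byte followed by the characters."""
    data = text.encode()
    if len(data) > 255:
        raise ValueError("a counted string holds at most 255 bytes")
    return bytes([len(data)]) + data

def uncounted(counted_string):
    """Recover the text from a counted string."""
    length = counted_string[0]
    return counted_string[1:1 + length].decode()

def getenv_word(stack):
    """Model of GETENV ( str1 -- [ str ] flag ) on a Python list stack."""
    name = uncounted(stack.pop())
    value = os.environ.get(name)
    if value is None:
        stack.append(False)           # not set: only the flag is left
    else:
        stack.append(counted(value))  # set: the value, then a true flag
        stack.append(True)
```

The caller tests the flag first and only then knows whether a string address lies beneath it.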
The user may set the environment variable FPATH in his shell environ-
ment. If set, Forth may use the value of this variable as a list of
directory names in which to search for files. Example (csh syntax):
setenv FPATH .:/usr/wmb/lib/forth/:/usr/local/lib/forth
If this environment variable is not set, Forth may use a system-
dependent default list of directories in which to search for files.
The default list contains the current directory as its first com-
ponent, but the rest of the list is system-dependent.
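A file search along FPATH might look like the following Python sketch. The function name find_forth_file and the default list consisting of only the current directory are assumptions; the report requires only that the default list start with the current directory.

```python
import os

# Assumed default: the report leaves everything after "." system-dependent.
DEFAULT_FPATH = "."

def find_forth_file(filename, environ=os.environ):
    """Return the first existing path for filename along FPATH, or None."""
    path_list = environ.get("FPATH") or DEFAULT_FPATH
    for directory in path_list.split(":"):
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Directories are tried in the order listed, so with the setenv example above a file in the current directory takes precedence over one in /usr/local/lib/forth.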
Filename Extensions
Ordinary UNIX text files containing Forth source code (not in block
format) should have names ending with the extension ".fth". (".f"
would be nice but Fortran got it first!). Files containing Forth
blocks should have names which end with ".blk".
Subprocesses
The following words provide the capability of executing UNIX sub-
processes from within Forth:
SH ( -- ) ( Rest of Line: string arguments to process )
A subshell is spawned to execute the UNIX command line which is
the remainder of the Forth input line. If the user's SHELL
environment variable is set, its value controls which shell to
use (Bourne shell or C-shell or Korn shell). Otherwise the Bourne
shell (/bin/sh) is used. As a possible optimization, the imple-
mentation of SH is allowed to directly execute the command line in
a subprocess rather than spawn a subshell, if it can determine
that no special shell metacharacter expansions (like wildcards,
for instance) are required.
SH[ ( -- ) ( Input Stream: characters up to next ] )
Similar to SH , but only those characters between the brackets [ ]
are included in the command line.
-SH ( command-string -- )
Similar to SH, but the command line is taken as the address of a
packed string from the stack.
CHILD-STATUS ( -- status )
status is the return status returned by the most-recently executed
subprocess. The implementation should keep any data associated
with CHILD-STATUS in the USER area so that different tasks may
execute subprocesses without conflict.
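The shell selection and status bookkeeping described for SH and CHILD-STATUS can be sketched in Python. This is a model under assumptions: the names sh and child_status are hypothetical, thread-local storage stands in for the USER area, and the optional direct-execution optimization is not modelled.

```python
import os
import subprocess
import threading

# Stand-in for the USER area, so tasks do not share one status cell.
_task = threading.local()

def sh(command_line, environ=os.environ):
    """Run command_line in a subshell and remember its return status."""
    # Use the user's SHELL if it is set, otherwise the Bourne shell.
    shell = environ.get("SHELL") or "/bin/sh"
    completed = subprocess.run([shell, "-c", command_line])
    _task.child_status = completed.returncode

def child_status():
    """Model of CHILD-STATUS ( -- status ) for the calling task."""
    return getattr(_task, "child_status", 0)
```

Always going through the shell keeps built-in commands and wildcard expansion working; an implementation that executes the command directly would first have to rule those out.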
Open Issues
Many issues remain to be addressed, to wit: Terminal independent
display control - TERMCAP vs Termio vs something else? Object file
formats and Forth words for controlling the dynamic loading of them
(as opposed to the specification of the interface points, which is
covered here). Signal handling. Multitasking and the interface with
(blocking) UNIX I/O system calls.
Participants
Mitch Bradley
Bill Sebok
Peter Blake
Tom Almy
Dave Hooley
Harry Arnold