Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!think!ames!ucbcad!ucbvax!CS.ROCHESTER.EDU!nl-kr-request
From: nl-kr-request@CS.ROCHESTER.EDU (NL-KR Moderator Brad Miller)
Newsgroups: comp.ai.nlang-know-rep
Subject: NL-KR Digest Volume 3 No. 57
Message-ID: <8712050130.AA18628@castor.cs.rochester.edu>
Date: Fri, 4-Dec-87 20:09:00 EST
Article-I.D.: castor.8712050130.AA18628
Posted: Fri Dec 4 20:09:00 1987
Date-Received: Thu, 10-Dec-87 05:59:20 EST
Sender: usenet@ucbvax.BERKELEY.EDU
Reply-To: nl-kr@cs.rochester.edu
Organization: University of Rochester, Department of Computer Science
Lines: 565
Approved: nl-kr@cs.rochester.edu

NL-KR Digest             (12/04/87 20:08:14)            Volume 3 Number 57

Today's Topics:
        Knowledge-based bibliographies
        Wanted: a module for natural language interface (in LISP)
        Text Encoding Standard for the Humanities - Vassar Workshop report
        DCG
        Re: measures of "Englishness"
        Re: Lip Movement and Mental Lexicons?
        Re: Language Learning
        Re: Language Learning (a Turing test)
        Re: Language Learning (anecdotes)
        Re: Language Learning (anecdotes)

----------------------------------------------------------------------

Date: Wed, 2 Dec 87 09:01 EST
From: Roland Zito-Wolf
Subject: Knowledge-based bibliographies

I am looking for references on knowledge bases and KB-based tools for
organizing a bibliographic database on AI. I want to be able to retrieve
references by various indices. Specific issues I'd like to know about:

  - friendly data entry
  - searching through alternate paths (say, finding articles related to
    a given article in some way: by author, topic, system name, etc.)
  - the ability to "evolve" the structure of the KB over time
  - what is a reasonable conceptual structure for reference databases
    in general?

I'll post a digest of responses to the list.

Roland J. Zito-Wolf
Palladian Software
4 Cambridge Center
Cambridge, Mass 02142
617-661-7171
RJZ%JASPER@LIVE-OAK.LCS.MIT.EDU

------------------------------

Date: Wed, 2 Dec 87 12:54 EST
From: David Naumann
Subject: Wanted: a module for natural language interface (in LISP)

Wanted: a module for a natural language interface (in LISP)

We are developing a tool for research on systems analyst behavior. The
tool requires a natural language front end. We would like to know if
anybody has, or knows of, a natural language interface module (in LISP)
that would take a question in English, validate it, and produce a parse
tree. We prefer public domain software, but are also willing to pay for
it if necessary. Please note that we have a limited budget.

Thanks for your help.

J. David Naumann
Macedonio Alanis
University of Minnesota
Management Sciences Department
Management Information Systems Area
ARPA    nauman@umn-cs
BITNET  naumann@umnacvx
        alanis@umnacvx

------------------------------

Date: Wed, 2 Dec 87 22:50 EST
From: Robert Amsler
Subject: Text Encoding Standard for the Humanities - Vassar Workshop report

[The following is a summary, prepared by Michael Sperberg-McQueen for the
HUMANIST mailing list, of the first workshop on the preparation of an
encoding standard for text in the humanities, held at Vassar College
last month. As an attendee and steering committee member, I would be
willing to answer further questions about this effort for the IRLIST or
NL-KR communities. The effort to develop a standard for encoding texts
in the humanities is just starting, and anyone interested in this noble
and ambitious goal should not hesitate in the slightest to become part
of it. What is at stake is nothing less than the creation, use and
preservation of our global electronic cultural heritage. - R.
Amsler (amsler@flash.bellcore.com)]

Contributor: "Michael Sperberg-McQueen"

A followup on the current status of the ACH effort to formulate
guidelines for text encoding practices.

******************************************************************
* NOTE: The following encoding conventions have been used to     *
* represent French accents throughout this message:              *
*                                                                *
*     /   acute accent                                           *
*     `   grave accent                                           *
*                                                                *
* The accent codes are typed AFTER the letter, and are used      *
* with both upper and lower case letters.                        *
******************************************************************

On November 12 and 13, 1987, 31 representatives of professional
societies, universities, and text archives met to consider the
possibility of developing a set of guidelines for the encoding of texts
for literary, linguistic, and historical research. The meeting was
called by the Association for Computers and the Humanities and funded
by the National Endowment for the Humanities. The list of participants
is appended to this document.

The participants heartily endorsed the idea of developing encoding
guidelines. In order to guide such development, they agreed on the
following principles:

              The Preparation of Text Encoding Guidelines
               Poughkeepsie, New York, 13 November 1987

1. The guidelines are intended to provide a standard format for data
   interchange in humanities research.

2. The guidelines are also intended to suggest principles for the
   encoding of texts in the same format.

3. The guidelines should
   a. define a recommended syntax for the format,
   b. define a metalanguage for the description of text-encoding
      schemes,
   c. describe the new format and representative existing schemes both
      in that metalanguage and in prose.

4. The guidelines should propose sets of coding conventions suited for
   various applications.

5. The guidelines should include a minimal set of conventions for
   encoding new texts in the format.

6. The guidelines are to be drafted by committees on:
   a. text documentation
   b. text representation
   c. text interpretation and analysis
   d. metalanguage definition and description of existing and proposed
      schemes,
   co-ordinated by a steering committee of representatives of the
   principal sponsoring organizations.

7. Compatibility with existing standards will be maintained as far as
   possible.

8. A number of large text archives have agreed in principle to support
   the guidelines in their function as an interchange format. We
   encourage funding agencies to support development of tools to
   facilitate this interchange.

9. Conversion of existing machine-readable texts to the new format
   involves the translation of their conventions into the syntax of the
   new format. No requirements will be made for the addition of
   information not already coded in the texts.

(The principles were also issued in French, translated by P. A. Fortier.)

******************

The further organization and drafting of the guidelines will be
supervised by a steering committee selected by the three sponsoring
organizations: ACH (the Association for Computers and the Humanities),
ACL (the Association for Computational Linguistics), and ALLC (the
Association for Literary and Linguistic Computing).
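The ASCII accent convention given in the NOTE at the head of this report (an acute accent coded as "/" and a grave as "`", typed after the letter) is regular enough to decode mechanically. The following is a minimal Python sketch, not part of the ACH proposal; the name decode_accents is hypothetical, and it assumes, as a simplification, that only the French accented letters e-acute and a/e/u-grave (and their capitals) occur, so that ordinary slashes such as "M/C 135" are left alone.

```python
import re
import unicodedata

# Accent codes from the NOTE: "/" = acute, "`" = grave, typed AFTER the letter.
COMBINING = {"/": "\u0301", "`": "\u0300"}  # combining acute, combining grave

# Simplifying assumption (not from the source): decode only the letters that
# carry these accents in French (acute: e; grave: a, e, u), so that slashes
# after other letters, as in "M/C 135", are left untouched.
ACCENT_CODE = re.compile(r"[eE]/|[aAeEuU]`")


def decode_accents(text: str) -> str:
    """Replace each letter + accent-code pair with one precomposed letter."""

    def compose(match: re.Match) -> str:
        letter, code = match.group(0)  # the match is exactly two characters
        # NFC folds base letter + combining mark into one code point, e.g. U+00E9.
        return unicodedata.normalize("NFC", letter + COMBINING[code])

    return ACCENT_CODE.sub(compose, text)


if __name__ == "__main__":
    print(decode_accents("Centre de traitement e/lectronique de textes"))
    print(decode_accents("Universite/ catholique de Louvain a` Louvain-la-neuve"))
```

One known weakness of this simplification: an ordinary slash after an "e" (as in "he/she") would also be decoded, so real data would need a stricter rule.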
Drafts of the guidelines will be submitted for comment to an editorial
committee with representatives of all participating organizations. In
addition to the sponsors, these are thus far the Modern Language
Association, the Association for Computing Machinery Special Interest
Group for Information Retrieval, and the Association of American
Publishers. The following groups have indicated interest informally but
have not yet formally pledged participation, in most cases pending a
formal vote: the Linguistic Society of America, the Association for
Documentary Editing, and the American Philological Association. The
American Anthropological Association, plus several organizations within
Europe, are now being asked to consider participation.

The interchange format defined by the guidelines is expected to be
compatible with the Standard Generalized Markup Language (SGML) defined
by ISO 8879, if SGML proves adequate to the needs of research. The needs
of specialized research interests will be addressed wherever interested
groups or individuals can be found to do the necessary work and achieve
the necessary consensus.

Formation of specific working groups will be announced later; in the
meantime, those interested in working on specific problems are invited
to contact either

    Dr. C. M. Sperberg-McQueen
    Computer Center, University of Illinois at Chicago (M/C 135)
    P.O. Box 6998, Chicago IL 60680
    (on Bitnet: U18189 at UICVM)

or

    Prof. Nancy Ide
    Dept. of Computer Science, Vassar College
    Poughkeepsie NY 12601
    (on Bitnet: IDE at VASSAR)

- N.I., C.M.S-McQ

------------------------------------------------------------------------------

List of Participants

NOTE: Association names are given following the names of their
representatives at this meeting.

Helen Aguera, National Endowment for the Humanities
Robert A. Amsler, Bell Communications Research
David T. Barnard, Department of Computing and Information Science,
    Queen's University, Ontario
Lou Burnard, Oxford Text Archive
Roy Byrd, IBM Research
Nicoletta Calzolari, Istituto di linguistica computazionale, Pisa
David Chestnutt (Assoc. for Documentary Editing, American Historical
    Assoc.), Department of History, University of South Carolina
Yaacov Choueka (Academy of the Hebrew Language), Department of
    Mathematics and Computer Science, Bar-Ilan University
Jacques Dendien, Institut National de la Langue Francaise
Paul A. Fortier, Department of Romance Languages, University of Manitoba
Thomas Hickey, OCLC Online Computer Library Center
Susan Hockey (Association for Literary and Linguistic Computing),
    Oxford University Computing Service
Nancy M. Ide (Association for Computers and the Humanities),
    Department of Computer Science, Vassar College
Stig Johansson, International Computer Archive of Modern English,
    University of Oslo
Randall Jones (Modern Language Association), Humanities Research
    Computing Center, Brigham Young University
Robert Kraft, Center for the Computer Analysis of Texts,
    University of Pennsylvania
Ian Lancashire, Center for Computing in the Humanities,
    University of Toronto
D. Terence Langendoen (Linguistic Society of America), Graduate Center,
    City University of New York
Charles (Jack) Meyers, National Endowment for the Humanities
Junichi Nakamura, Department of Electrical Engineering, Kyoto University
Wilhelm Ott, Universitaet Tuebingen
Eugenio Picchi, Istituto di linguistica computazionale, Pisa
Carol Risher (Association of American Publishers), Association of
    American Publishers, Inc.
Jane Rosenberg, National Endowment for the Humanities
Jean Schumacher, Centre de traitement e/lectronique de textes,
    Universite/ catholique de Louvain a` Louvain-la-neuve
J. Penny Small (American Philological Association), U.S. Center for the
    Lexicon Iconographicum Mythologiae Classicae, Rutgers University
C.M. Sperberg-McQueen, Computer Center, University of Illinois at Chicago
Paul Tombeur, Centre de traitement e/lectronique de textes,
    Universite/ catholique de Louvain a` Louvain-la-neuve, Belgium
Frank Tompa, New Oxford English Dictionary Project, University of Waterloo
Donald E. Walker (Association for Computational Linguistics),
    Bell Communications Research
Antonio Zampolli, Istituto di linguistica computazionale, Pisa, Italy

------------------------------

Date: Thu, 3 Dec 87 20:27 EST
From: ganguly@ATHENA.MIT.EDU
Subject: DCG

Hi! Does anyone have a Definite Clause Grammar (DCG) parser written in
Edinburgh PROLOG that I could use as a user interface?

Thanks in advance,
Jaideep Ganguly

------------------------------

Date: Fri, 20 Nov 87 11:46 EST
From: Bruce Nevin
Subject: Re: measures of "Englishness"

Re statistical measures of `Englishness':

A number of studies of admissible and inadmissible phoneme sequences in
English vocabulary were made in the '50s. One application was compiling
a list of unused but possible English words for new trade names. There
may be something about this in Gleason's old textbook.

Harris's _Methods in Structural Linguistics_ (1952) has some examples
illustrating the general method of generating tables of next-successor
phonemes or next-successor morphemes in words. In his 1968 book
_Mathematical Structures of Language_, Harris summarizes the results of a
computer test of a hypothesis made earlier in his `From phoneme to
morpheme' paper (sorry, I don't have the reference; _Language_ in the
early '50s, I think). The report of the results appears in full in one
of the TDAP papers from U. Penn.

The general observation is that the number of next successors drops as
you proceed along the phoneme sequence making up a morpheme, and rises
again when you reach a morpheme boundary, reflecting the relative
arbitrariness of how the next morpheme may begin.
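The counting procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Harris's program: it counts letters rather than phonemes in a toy lexicon, treats the end of a word as one more possible successor (marked "#"), and posits a boundary after any symbol whose count rises above the previous symbol's and is not exceeded by the next one, a simple stand-in for Harris's criterion. The function names are hypothetical. The counts in the final example are Nevin's estimates from Harris's graph for `Dogs were indisputably quicker', which appear in the example that follows.

```python
def successor_counts(word, lexicon):
    """For each prefix word[:i], count the distinct symbols that follow
    that prefix anywhere in the lexicon; '#' stands for 'word ends here'."""
    counts = []
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        followers = {w[i] if i < len(w) else "#"
                     for w in lexicon if w.startswith(prefix)}
        counts.append(len(followers))
    return counts


def peak_boundaries(counts):
    """Indices after which a boundary is posited: the count rises over
    the previous symbol and is not exceeded by the next one."""
    cuts = []
    for i in range(1, len(counts)):
        nxt = counts[i + 1] if i + 1 < len(counts) else -1  # end of input
        if counts[i] > counts[i - 1] and counts[i] >= nxt:
            cuts.append(i)
    return cuts


def mark(symbols, cuts):
    """Render the symbols with '.' after each posited boundary."""
    return "".join(s + ("." if i in cuts else "") for i, s in enumerate(symbols))


if __name__ == "__main__":
    # Hypothetical toy lexicon; letters stand in for phonemes.
    print(successor_counts("dogs", ["dogs", "dog", "dot", "cat"]))

    # Nevin's estimated counts for "Dogs were indisputably quicker" (^ = schwa).
    phonemes = "d o g z w ^ r i n d i s p y u w t ^ b l i y k w i k ^ r".split()
    counts = [12, 7, 29, 29, 7, 3, 28, 13, 28, 10, 14, 21, 9, 2,
              2, 2, 28, 2, 4, 2, 2, 28, 12, 8, 10, 28, 3, 29]
    print(mark(phonemes, peak_boundaries(counts)))
```

On these counts the rule recovers every boundary marked in the example except the one after z: g and z tie at 29, so the strict-rise test cannot separate them.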
For example, for the sentence `Dogs were indisputably quicker', the
number of next successors for each phoneme is as follows (numbers under
phonemes):

      d  o  g  .  z  .  w  ^  r  .
     12  7 29    29     7  3 28

      i  n  .  d  i  s  .  p  y  u  w  t  .  ^  b  .  l  i  y  .
     13 28    10 14 21     9  2  2  2 28     2  4     2  2 28

      k  w  i  k  .  ^  r  .
     12  8 10 28     3 29

The dots marking the morpheme boundaries suggested by the test were not
input to the test; they are included only to clarify the results. The
boundary between the last two syllables of `indisputably' is the least
strongly indicated. Running the test in reverse order (next
predecessors, as it were) helps confirm or eliminate marginal cases. And
all results are subject to regularization by the standard distributional
methods of linguistics.

I have altered the display on p. 25 of Harris's book (1) by using ^ for
schwa and (2) by estimating the numbers from his graph. I may not have
got the numbers exactly right, but they are certainly good enough to
make the point.

Bruce Nevin
bn@cch.bbn.com
(Disclaimer: if you infer anything from this about the opinions of my
employer, its clients, etc., it's not by my intent, and you're on your
own.)

------------------------------

Date: Wed, 25 Nov 87 17:20 EST
From: Steve Cassidy
Subject: Re: Lip Movement and Mental Lexicons?

    Date: Sun, 15 Nov 87 10:41 EST
    From: Murray Watt
    Subject: Re: Lip Movement and Mental Lexicons?

    What does phonemic representation have to do with LEXICAL MEANING?
    (Phonemic meaning is all the rage in current linguistic research,
    but I think this is a different type of meaning.)
    ...
    I have never SEEN any arguments that the phonemic representation
    resides in the same location as lexical entries, and I have never
    heard of a letter-based lexicon in the mind. Are you sure you're not
    confusing dictionaries and the human mind? 8-)

    Murray Watt

The current `best' theory of human word processing (going from printed
word to `lexical item') is based on making analogies with stored
representations of words, and those stored representations are
letter-mediated.
That is, there are letters in there somewhere, but they may be grouped or
organised in ways that are not yet clear.

The current best theory of reading development suggests that it is
heavily tied to spelling development, that the same `lexical entry' is
used for both, and that there is transfer between the two skills.
Competent spelling requires a letter-by-letter representation of the
word; sound-to-letter rules don't work well enough. Similarly, it would
seem that a phonemic representation is needed to pronounce (some) words.
So the mental lexicon should contain references to an orthographic
representation (probably close to letter strings) and a phonemic
representation.

I don't know what lexical meaning is. Do you?

The judgement as to 'best' theories above is my own.

Steve Cassidy
ACSnet: steve@vuwcomp.nz             | Victoria University, Private Bag,
-------------------------------------| Wellington, New Zealand
UUCP: ...seismo!uunet!vuwcomp!steve  |
"If God had meant us to be perfect, He would have made us that way"
                                         - Winston Niles Roomford III

------------------------------

Date: Wed, 25 Nov 87 08:45 EST
From: Richard Wexelblat
Subject: Re: Language Learning

Readers of this group might be interested in looking up the Ph.D.
dissertation of Kathy Hirsh-Pasek (Univ. of Penna., 1980 +/- 2?), who
did an extensive study of language learning in hearing children of deaf
parents. As I recall, she concluded that there was no statistically
significant difference, but I don't really remember the parameters of
the study. Perhaps someone with access to _Dissertation Abstracts_ will
look up the specific reference.

--
--Dick Wexelblat
{uunet|ihnp4|decvax}!philabs!rlw
rlw@philabs.philips.com

------------------------------

Date: Tue, 1 Dec 87 11:56 EST
From: Rick Wojcik
Subject: Re: Language Learning (a Turing test)

In article <2363@tut.cis.ohio-state.edu> paul@tut.cis.ohio-state.edu
(Paul W. Placeway) writes:
>
>Actually, the "story" I was thinking of is similar, but with a big
>difference: I am told that Dr. Lehiste (whose native language is
>Estonian), when traveling in Germany, regularly fools native speakers
>into thinking that she is German, but from some other region. From
>what I have been told, this effect holds even for extended
>conversations.
>

I am familiar with your examples, since I took my undergraduate and
graduate degrees in linguistics at OSU. Having studied Estonian with
Dr. Lehiste, a world-renowned acoustician and phonetician, I can well
believe that she fools native German speakers. She has pointed out some
very subtle differences between Estonian and German, for example the
fact that word-initial vowels in German are always preceded by a glottal
stop, but those in Estonian never are. I may be wrong, but I don't think
that she has native-like control over this aspect of German. When did
she learn German, anyway? That's an essential point here. (Don't forget
that the Estonia of her childhood had close ties to Germany.) It is also
worth noting that, despite her many years of residence in America and
her linguistic sophistication, she retains a noticeable foreign accent
in English. Her control of English is about as good as it gets in adult
language learners.

>The similarity of dialect does not always hold either. Elizabeth
>Zwicky does not speak the same regional dialect of SAE that I do, even
>though the two of us spent the majority of our lives growing up within
>10 miles of each other, on the same side of the same city. Our

You miss the point. I never said anything about the social and ethnic
factors that shape dialects. The Columbus neighborhood that you and she
grew up in contains a mixture of Northern and Midland dialects.
Elizabeth's dialect (Northern) and yours (Midland?) are both
recognizably American.
===========
Rick Wojcik   rwojcik@boeing.com

------------------------------

Date: Tue, 1 Dec 87 14:23 EST
From: goldfain@osiris.cso.uiuc.edu
Subject: Re: Language Learning (anecdotes)

I think the "crystallization hypothesis" in language acquisition is a
hypothesis that, by its very nature, will snag people into a debate. A
review of the overall nature of this hypothesis and debate is
instructive about something that should be avoided whenever possible in
science.

  1) We have an observable phenomenon at a very high level of
     complexity: it concerns fine distinctions in natural language
     behavior.

  2) The observations of the phenomenon are not well pinned down.
     Researchers mention something about "mastery" of the language, then
     sometimes back off and make claims only about phonetic categorical
     perception, then shift back to discussing scores on grammar tests
     among people who have been in a culture for 10-20 years, having
     immigrated at different times in their lives, etc.

  ************************************************************************
  * I am not saying the phenomenon isn't real! There are observable and  *
  * interesting phenomena here. I am just qualifying that.               *
  ************************************************************************

  3) The phenomenon *suggests* that *possibly* there is a physiological
     basis for such trends and differences as are observed. To make a
     really concrete claim, it *suggests* that perhaps some maturation
     process in the normal human brain occurs at about the mid-teens.

  4) There are lots of other mechanisms consistent with the observed
     phenomena: a wide range of psychological "lower-level" factors have
     been listed in the current debate in this note file.
  5) If we really step back and look at this objectively, we can see
     that the "experiments" ("studies" is actually a better word)
     performed thus far and currently underway will never help
     distinguish whether this phenomenon has a physiological basis, a
     merely psychological basis, or a combination of both (don't forget
     that possibility!).

  6) There is a large body of anecdotal rumor floating around that is
     only going to keep the issue cloudy. It may keep us from the wrong
     conclusion, but it is not going to settle us on whatever the
     correct answer is.

I think settling the matter will have to wait for tighter
experimentation (if it is ever judged that this issue is worth the
experiments it would take to settle it). It will require a great deal of
progress in neurophysiology, or some volunteers for some outrageous
psychology experiments. (Find me 100 open-minded adults who will set
aside all other interests for at least 5 years of their lives ...)

In other words, I think the moral here is that you cannot expect to
settle an issue that is several layers of abstraction below the level of
your observational apparatus. (In this case it might be more than
"several".) In a sense I'm saying: "Go back to the lab and let's look
for other things we can get a better grip on; this issue will have to
wait until another day."

Mark Goldfain                    arpa: goldfain@osiris.cso.uiuc.edu
US Mail: Mark Goldfain
(A lowly student at)-->          Department of Computer Science
                                 University of Illinois at U-C
                                 1304 West Springfield Avenue
                                 Urbana, Illinois 61801

------------------------------

[Editor's Note: There is still a backlog of items on language learning,
which will be posted next week.]

End of NL-KR Digest
*******************