Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!ames!pasteur!eris!korn
From: korn@eris.berkeley.edu (Peter "Arrgh" Korn)
Newsgroups: comp.sys.mac.hypercard
Subject: Re: Large Scale DataBase
Message-ID: <5845@pasteur.Berkeley.EDU>
Date: 21 Sep 88 05:19:06 GMT
References: <52305GFX@PSUVM> <69189@sun.uucp>
Sender: news@pasteur.Berkeley.EDU
Organization: What, me organized???
Lines: 79

In <69189@sun.uucp>, landman@sun.UUCP (Howard A. Landman) said:  
>In article <52305GFX@PSUVM> GFX@PSUVM.BITNET writes:
>>We use a rather large database (500,000 + records) holding  a few
>>variables ( 25 +/-) related to industrial establishments.  We manage
>>it with SAS on an IBM mainframe, but are curious as to whether such
>>a dataBase could be installed in an HyperCard environment to provide
>>interactive queries (a rarity, but nonetheless...)
>
>>I would be delighted to hear from anyone with similar experience, or
>>from anyone able to formulate educated guesses...
>
>A really rough guess, based on my stack with 1,600 cards which takes
>nearly .5 MB, would be: (500,000 / 1,600) * .5 MB, or about 156 MB.

To add another data point:  I have a stack containing all of the registered
UUCP sites in the U.S. as of about two months ago.  The stack has 2,916
cards in it (including about 40 help cards), and is 1.36 Meg in size,
averaging out to 466 bytes/card.  The stack imports data from the uucp
sites database, and took several hours on a MacII to do the import (I
don't know exactly how long b/c I was out of the house at the time and
forgot to put timing information into the code that did the import).
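
For anyone who wants to check the size arithmetic, here is a small Python
sketch.  It only restates the figures quoted in this thread; the function
names are made up for illustration, and "Meg" is assumed to mean 10^6 bytes
(which is what gives 466 bytes/card).  The last projection assumes the
500,000-record database would average the same bytes/card as my UUCP stack,
which is far from guaranteed.

```python
def bytes_per_card(stack_bytes, cards):
    """Average on-disk size of one card (hypothetical helper)."""
    return stack_bytes / cards


def projected_stack_mb(records, per_card_bytes):
    """Projected stack size in MB (10^6 bytes), one card per record."""
    return records * per_card_bytes / 1_000_000


# My UUCP stack: 1.36 Meg, 2,916 cards.
uucp_per_card = bytes_per_card(1.36e6, 2916)

# Howard's stack: roughly 0.5 MB, 1,600 cards.
howard_per_card = bytes_per_card(0.5e6, 1600)

print(round(uucp_per_card))                                    # 466
print(projected_stack_mb(500_000, howard_per_card))            # 156.25
print(round(projected_stack_mb(500_000, uucp_per_card)))       # 233
```

The spread between the two projections (about 156 MB vs. about 233 MB) is
mostly a reminder that bytes/card depends heavily on what is on each card.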

Once imported, I compacted the stack twice to get optimal searching performance,
and then I 'locked' the stack to make it read-only.  Opening the stack takes
roughly 5 seconds.  Once open and HyperCard has cached whatever information
it caches on opening, search time is minimal.  To find "starnine", the first
occurrence of which is on the 889th card, takes under 1 second.  To find
"starnine" in field "site name" (i.e., the card belonging to starnine), which
is the 1,408th card, also takes under a second.

For interactive query I find performance is *very* good.  Again, this is
on a twice-compacted locked stack of 1.3Meg and ~3,000 cards.  Also, I'm
running on a Quantum 80 Meg drive on a MacII w/1,000K devoted to HyperCard
1.2.1 under MultiFinder 6.0.  When I tried this same stack out on a MacSE
w/an Apple 20 Meg drive, search time was on the order of 5 seconds (using
HyperCard 1.2, not 1.2.1).

Again, quite reasonable for *interactive* query.  I would NOT want to use
it for generating reports however.

>... [other questions, the answers already given I agree with completely]

>>What would be the expected search time?
>
>The search speed of HyperCard gets slower as the size of the stack
>increases, and if we assume linear degradation your searches might
>take 2.5 hours since mine take up to half a minute.  (Additional searches
>on the same key are much faster - I think HyperCard either builds indices
>or at least scans ahead to the next one while you're not looking.)

Howard, judging from my search times, I'd guess that twice-compacted
locked stacks have FAR superior search times.  Also, if you know which
field your data is in, the search is faster still.  Extrapolating linearly
from my search times and assuming equivalent hardware etc. and 1/2 a
second for the search times (which is a good rough guess), the search time
for a ~150 Meg twice-compacted locked stack would be roughly 60 seconds.
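
The two extrapolations in this thread can be sketched in a few lines of
Python.  This is only the linear-scaling guess written out, not a
measurement: the function name is invented for illustration, and the
assumption that search time grows in proportion to stack size (or card
count) is exactly the part nobody here has verified.

```python
def linear_search_seconds(known_seconds, known_size, target_size):
    """Scale a measured search time linearly with stack size.

    known_size and target_size must use the same unit (MB or cards).
    Linear scaling is an assumption, not a documented HyperCard property.
    """
    return known_seconds * target_size / known_size


# My numbers: ~0.5 s on a 1.36 Meg stack, scaled to ~150 Meg.
mine_seconds = linear_search_seconds(0.5, 1.36, 150)

# Howard's numbers: up to 30 s on 1,600 cards, scaled to 500,000 cards.
howard_hours = linear_search_seconds(30, 1600, 500_000) / 3600

print(f"compacted, locked stack: about {mine_seconds:.0f} seconds")
print(f"Howard's stack:          about {howard_hours:.1f} hours")
```

The first figure comes out at roughly 55 seconds, which is where the
"roughly 60 seconds" above comes from; the second is a little over 2.5 hours.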

Anyone at Apple have any data on this sort of thing?

>
>>Any other approach you think is superior?
>
>You should try to determine numbers for a fast database system (FoxBase?),
>so you have a point of comparison.

I agree entirely here.  60 seconds is just that:  60 seconds.  It isn't
fast or slow until you compare it with something.  However, for a number
of interactive things (like the UUCP stack), the HyperCard interface is
superior enough to anything FoxBase could give me that even if FoxBase
were twice or three times as fast it wouldn't even be a consideration (in
fact, for this application HyperCard is the ONLY consideration in my
opinion...).


Peter
--
Peter "Arrgh" Korn
korn@ucbvax.Berkeley.EDU
{decvax,hplabs,sdcsvax,ulysses,usenix}!ucbvax!korn