Path: utzoo!attcan!uunet!tut.cis.ohio-state.edu!mailrus!ncar!ico!vail!rcd
From: rcd@ico.ISC.COM (Dick Dunn)
Newsgroups: comp.software-eng
Subject: Re: C source lines in file
Summary: lines don't count...and Two Rules
Message-ID: <16018@vail.ICO.ISC.COM>
Date: 17 Aug 89 18:10:24 GMT
References: <35120@ccicpg.UUCP>
Organization: Interactive Systems Corp, Boulder, CO
Lines: 107

swonk@ccicpg.UUCP (Glen Swonk) writes:
> Does anyone have a program or a method of determining
> the number of C source lines in a source file?
> My assumption is that comments don't count as source
> lines unless the comment is on a line with code.

If you're on a UNIX system or have comparable tools, a simple awk script
can do this much.  However, you don't learn much from it.  In particular,
given the question:

> Are there any other tools to measure the complexity
> of a source file?

it's clear you're off on the wrong foot.  A count of source lines is NOT a
useful measure of program size or complexity.  Incidentally, be careful
about the difference between size and complexity!
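
For what it's worth, the count asked for is easy enough to get.  Here is a
rough sketch in C rather than awk.  It is not a real lexer: it assumes
/* */ comments only, no trigraphs or backslash-newline splicing, and it
treats preprocessor lines as code.  The names (line_counts, count_lines)
are made up for illustration.

```c
#include <ctype.h>

/* Per-file line tallies; the names are illustrative, not from any real tool. */
struct line_counts {
    long blank;         /* nothing but whitespace */
    long comment_only;  /* comment text, no code */
    long code_only;     /* code, no comment text */
    long mixed;         /* both code and comment text on the line */
};

/*
 * Classify each line of the NUL-terminated buffer src.
 * Assumptions: only slash-star comments, literals never span lines,
 * and a whitespace-only line inside a comment is counted as blank.
 */
struct line_counts count_lines(const char *src)
{
    struct line_counts n = {0, 0, 0, 0};
    int in_comment = 0;              /* currently inside a comment */
    int quote = 0;                   /* '"' or '\'' inside a literal, else 0 */
    int has_code = 0, has_comment = 0;
    const char *line_start = src;
    const char *p;

    for (p = src; ; p++) {
        if (*p == '\n' || *p == '\0') {
            if (*p == '\0' && p == line_start)
                break;               /* no partial last line to count */
            if (has_code && has_comment) n.mixed++;
            else if (has_code)           n.code_only++;
            else if (has_comment)        n.comment_only++;
            else                         n.blank++;
            has_code = has_comment = 0;
            quote = 0;               /* assume literals end with the line */
            if (*p == '\0')
                break;
            line_start = p + 1;
        } else if (in_comment) {
            if (p[0] == '*' && p[1] == '/') {
                in_comment = 0;
                has_comment = 1;
                p++;                 /* skip the closing slash */
            } else if (!isspace((unsigned char)*p)) {
                has_comment = 1;
            }
        } else if (quote) {
            has_code = 1;
            if (*p == '\\' && p[1] != '\0' && p[1] != '\n')
                p++;                 /* skip the escaped character */
            else if (*p == quote)
                quote = 0;
        } else if (p[0] == '/' && p[1] == '*') {
            in_comment = 1;
            has_comment = 1;
            p++;                     /* skip the opening star */
        } else if (*p == '"' || *p == '\'') {
            quote = *p;
            has_code = 1;
        } else if (!isspace((unsigned char)*p)) {
            has_code = 1;
        }
    }
    return n;
}
```

To use it, read the file into a buffer and call count_lines(buf); the
"source lines" in the sense of the original question would be code_only
plus mixed.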

As noted by flint@gistdev.UUCP (Flint Pellett):

> Comment lines don't count?  What are you going to use the count for when you
> get it? ... If you're counting for purposes of
> measuring productivity, then comment lines certainly do count, otherwise
> you're going to be encouraging people to not document their code.

Pellett is correct about the effect of not counting comment lines.
However, if you go off counting lines as a measure of work, you'll see a
useful comment like:

/*	lexcom - scan (a piece of) a comment
 *	Return either T_COM if end of comment found or T_NULL if end of
 *	line found first.
 *	Also handles instate and comment counting.
 */

turn into a baroque display like:

/************************************************************************/
/*                                                                      */
/*    FUNCTION NAME:    lexcom                                          */
/*                                                                      */
/*    RESULT TYPE:      int                                             */
/*                                                                      */
/*    ARGUMENTS:        (none)                                          */
/*                                                                      */
/*    PURPOSE:          blather babble...                               */
/*                                                                      */
[etc., ad nauseam...no sense wasting netbandwidth on it...]
/*                                                                      */
/************************************************************************/

The same thing will happen if you attach some reward or figure of merit
to source-line count, identifier length, etc...you'll see:

	for (p = s; *p; p++) {
		[stuff]
	}

turn into:

	for (string_search_pointer = target_string;
		*string_search_pointer != STRING_TERMINATOR;
		string_search_pointer++)
	{
		[stuff]
	}

When I've tried to measure C source-file size and complexity, I've used a
program which does a simple analysis of the source but gives several
measures, including the following:
	blank lines
	lines containing only comment text
	lines containing only code
	lines containing comment and code
	average comment length or histogram of lengths
	average number of tokens per line and per nonblank line
	average identifier length or histogram of lengths
	average nesting level (requires tedious explanation)
	count of occurrences of each keyword
	count of occurrences of literal constants, by type
The result, of course, does NOT reduce program size or complexity to a
single number.  The token count is far more useful than a line count if you
want to know "how much code" you've got, but it's still woefully
inadequate.
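
To give the flavor of one of these measures, here is a sketch of the
identifier-length tally in C.  It is not the program I described; the
names (ident_stats, measure_identifiers) are hypothetical, keywords are
lumped in with identifiers, and the scanning is deliberately crude
(/* */ comments only, no preprocessor handling).

```c
#include <ctype.h>

/* Running totals for identifier length; names are illustrative only. */
struct ident_stats {
    long count;      /* identifiers (and keywords) seen */
    long total_len;  /* sum of their lengths in characters */
};

/*
 * Scan the NUL-terminated buffer src, skipping comments, string and
 * character literals, and numbers, and tally everything that looks
 * like an identifier.  Keywords are NOT separated out in this sketch.
 */
struct ident_stats measure_identifiers(const char *src)
{
    struct ident_stats st = {0, 0};
    const char *p = src;

    while (*p != '\0') {
        unsigned char c = (unsigned char)*p;

        if (p[0] == '/' && p[1] == '*') {
            /* comment: skip to the closing delimiter or end of buffer */
            p += 2;
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p != '\0')
                p += 2;
        } else if (c == '"' || c == '\'') {
            /* literal: skip to the matching quote, honoring backslashes */
            char q = *p++;
            while (*p != '\0' && *p != q && *p != '\n') {
                if (*p == '\\' && p[1] != '\0')
                    p++;
                p++;
            }
            if (*p == q)
                p++;
        } else if (isdigit(c)) {
            /* number: swallow suffixes and hex digits so 0x1F is not an identifier */
            while (isalnum((unsigned char)*p) || *p == '.' || *p == '_')
                p++;
        } else if (isalpha(c) || c == '_') {
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            st.count++;
            st.total_len += (long)(p - start);
        } else {
            p++;     /* punctuation or whitespace */
        }
    }
    return st;
}
```

Run on the two loop headers shown earlier, this gives 5 identifiers
totalling 7 characters for the short version and 6 identifiers totalling
96 characters for the inflated one.  Same loop, wildly different numbers,
which is exactly why such a figure shouldn't be rewarded.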

I offer two rules about measuring program size/complexity:

1.  Any variant of "source line count" is useless as a measure of the
program.
	I've heard countless times the rationalization that "Well, it may
	not be good, but it's the best we can do."  This is WRONG!  It's
	worse than no measure at all.  It implies that you have information
	you don't really have.  If it's used as a measure of productivity,
	it's particularly bad, because there are obvious ways to pervert
	any obvious measures--and all of them make for worse programs.

2.  Programs are supposed to be good, not big.
	A program should be measured against what it is supposed to do.
	Sheer size is often unrelated to apparent complexity, and both may
	be unrelated to actual complexity (in terms of programming effort).

Various people I know have all come up with the same joke about
"negative productivity".  You start the day with, say, a thousand lines of
crappy code and end the day with 300 lines of clean code--thereby having
produced -700 lines of code for the day.
-- 
Dick Dunn     rcd@ico.isc.com    uucp: {ncar,nbires}!ico!rcd     (303)449-2870
   ...Are you making this up as you go along?