Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site boulder.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxn!ihnp4!qantel!dual!lll-crg!seismo!hao!nbires!boulder!richard
From: richard@boulder.UUCP (Richard Byrd)
Newsgroups: net.math
Subject: Re: A new coding problem
Message-ID: <404@boulder.UUCP>
Date: Fri, 13-Sep-85 12:54:01 EDT
Article-I.D.: boulder.404
Posted: Fri Sep 13 12:54:01 1985
Date-Received: Sun, 15-Sep-85 11:58:41 EDT
References: <6218@duke.UUCP>
Reply-To: richard@boulder.UUCP (Richard Byrd)
Organization: University of Colorado, Boulder
Lines: 43



>The problem :
>Let S be the set of all binary words of length n.
>Let E be a set containg exactly L of the words in S.
>Define 
>         C(E)  =   sigma    min    diff(x,e)
>                  x in S   e in E
>
>                 where diff(x,y) = m  if x and y are two binary words that 
>                 are different in exactly m bits. Of course m <= n.
>
>to be the cost of the set E.
>Now given the two integers n and L, the problem is to construct a set E
>such that C(E) is minimum.

This seems to be a special case of the p-median problem or 
min-sum multicenter problem which comes up in operations research applications.
In the general case "S" is an arbitrary set (or perhaps a graph) 
with a distance function. You may find some information in the O.R.
literature under "Location Problems" or "Facility Location"

Garey and Johnson's book "Computers and Intractability" mentions it on p.220.
They say that in the general case the problem is NP-complete for general 
L (in your notation), but if L is fixed the problem is polynomial solvable
(unfortunately in the size of S, I think).
Of course you have a special distance function for which the
problem may be easier.

An approach which may help sometimes is the following.
You can set up the problem as an integer program.

min sum dij*xij
subj. to  sumj xij =1 for all i
               xij <= xjj for all i,j
          sumj xjj <= L
               xij >= 0 for all i,j    
where xjj=1 means word j is in E
      xij=1 word j is a closest word in E to word i
      dij is distance from i to j

I have heard that it very often happens that the corresponding
continuous LP has an integer solution.  Of course the LP can be very large.