1   
  2   
  3   
  4   
  5   
  6   
  7   
  8   
  9   
 10   
 11   
 12   
 13   
 14   
 15   
 16   
 17   
 18   
 19   
 20   
 21   
 22   
 23   
 24   
 25   
 26   
 27   
 28   
 29   
 30   
 31   
 32   
 33   
 34   
 35   
 36   
 37   
 38   
 39  """ 
 40  Provides the implementation for various knapsack algorithms. 
 41   
 42  Knapsack algorithms are "fit" algorithms, used to take a set of "things" and 
 43  decide on the optimal way to fit them into some container.  The focus of this 
 44  code is to fit files onto a disc, although the interface (in terms of item, 
 45  item size and capacity size, with no units) is generic enough that it can 
 46  be applied to items other than files. 
 47   
 48  All of the algorithms implemented below assume that "optimal" means "use up as 
 49  much of the disc's capacity as possible", but each produces slightly different 
 50  results.  For instance, the best fit and first fit algorithms tend to include 
 51  fewer files than the worst fit and alternate fit algorithms, even if they use 
 52  the disc space more efficiently. 
 53   
 54  Usually, for a given set of circumstances, it will be obvious to a human which 
 55  algorithm is the right one to use, based on trade-offs between number of files 
 56  included and ideal space utilization.  It's a little more difficult to do this 
 57  programmatically.  For Cedar Backup's purposes (i.e. trying to fit a small 
 58  number of collect-directory tarfiles onto a disc), worst-fit is probably the 
 59  best choice if the goal is to include as many of the collect directories as 
 60  possible. 
 61   
 62  @sort: firstFit, bestFit, worstFit, alternateFit 
 63   
 64  @author: Kenneth J. Pronovici <pronovic@ieee.org> 
 65  """ 
 66   
 67   
 68   
 69   
 70   
 71   
 72   
 73   
 74   
 76   
 77     """ 
 78     Implements the first-fit knapsack algorithm. 
 79   
 80     The first-fit algorithm proceeds through an unsorted list of items until 
 81     running out of items or meeting capacity exactly.  If capacity is exceeded, 
 82     the item that caused capacity to be exceeded is thrown away and the next one 
 83     is tried.  This algorithm generally performs more poorly than the other 
 84     algorithms both in terms of capacity utilization and item utilization, but 
 85     can be as much as an order of magnitude faster on large lists of items 
 86     because it doesn't require any sorting. 
 87   
 88     The "size" values in the items and capacity arguments must be comparable, 
 89     but they are unitless from the perspective of this function.  Zero-sized 
 90     items and capacity are considered degenerate cases.  If capacity is zero, 
 91     no items fit, period, even if the items list contains zero-sized items. 
 92   
 93     The dictionary is indexed by its key, and then includes its key.  This 
 94     seems kind of strange on first glance.  It works this way to facilitate  
 95     easy sorting of the list on key if needed. 
 96   
 97     The function assumes that the list of items may be used destructively, if 
 98     needed.  This avoids the overhead of having the function make a copy of the 
 99     list, if this is not required.  Callers should pass C{items.copy()} if they 
100     do not want their version of the list modified. 
101   
102     The function returns a list of chosen items and the unitless amount of 
103     capacity used by the items. 
104   
105     @param items: Items to operate on 
106     @type items: dictionary, keyed on item, of C{(item, size)} tuples, item as string and size as integer 
107   
108     @param capacity: Capacity of container to fit to 
109     @type capacity: integer 
110   
111     @returns: Tuple C{(items, used)} as described above 
112     """ 
113   
114      
115     included = { } 
116   
117      
118     used = 0 
119     remaining = capacity 
120     for key in items.keys(): 
121        if remaining == 0: 
122           break 
123        if remaining - items[key][1] >= 0: 
124           included[key] = None 
125           used += items[key][1] 
126           remaining -= items[key][1] 
127   
128      
129     return (included.keys(), used) 
 130   
131   
132   
133   
134   
135   
137   
138     """ 
139     Implements the best-fit knapsack algorithm. 
140   
141     The best-fit algorithm proceeds through a sorted list of items (sorted from 
142     largest to smallest) until running out of items or meeting capacity exactly. 
143     If capacity is exceeded, the item that caused capacity to be exceeded is 
144     thrown away and the next one is tried.  The algorithm effectively includes 
145     the minimum number of items possible in its search for optimal capacity 
146     utilization.  For large lists of mixed-size items, it's not ususual to see 
147     the algorithm achieve 100% capacity utilization by including fewer than 1% 
148     of the items.  Probably because it often has to look at fewer of the items 
149     before completing, it tends to be a little faster than the worst-fit or 
150     alternate-fit algorithms. 
151   
152     The "size" values in the items and capacity arguments must be comparable, 
153     but they are unitless from the perspective of this function.  Zero-sized 
154     items and capacity are considered degenerate cases.  If capacity is zero, 
155     no items fit, period, even if the items list contains zero-sized items. 
156   
157     The dictionary is indexed by its key, and then includes its key.  This 
158     seems kind of strange on first glance.  It works this way to facilitate  
159     easy sorting of the list on key if needed. 
160   
161     The function assumes that the list of items may be used destructively, if 
162     needed.  This avoids the overhead of having the function make a copy of the 
163     list, if this is not required.  Callers should pass C{items.copy()} if they 
164     do not want their version of the list modified. 
165   
166     The function returns a list of chosen items and the unitless amount of 
167     capacity used by the items. 
168   
169     @param items: Items to operate on 
170     @type items: dictionary, keyed on item, of C{(item, size)} tuples, item as string and size as integer 
171   
172     @param capacity: Capacity of container to fit to 
173     @type capacity: integer 
174   
175     @returns: Tuple C{(items, used)} as described above 
176     """ 
177   
178      
179     included = { } 
180   
181      
182     itemlist = items.items() 
183     itemlist.sort(lambda x, y: cmp(y[1][1], x[1][1]))   
184     keys = [] 
185     for item in itemlist: 
186        keys.append(item[0]) 
187   
188      
189     used = 0 
190     remaining = capacity 
191     for key in keys: 
192        if remaining == 0: 
193           break 
194        if remaining - items[key][1] >= 0: 
195           included[key] = None 
196           used += items[key][1] 
197           remaining -= items[key][1] 
198   
199      
200     return (included.keys(), used) 
 201   
202   
203   
204   
205   
206   
208   
209     """ 
210     Implements the worst-fit knapsack algorithm. 
211   
212     The worst-fit algorithm proceeds through an a sorted list of items (sorted 
213     from smallest to largest) until running out of items or meeting capacity 
214     exactly.  If capacity is exceeded, the item that caused capacity to be 
215     exceeded is thrown away and the next one is tried.  The algorithm 
216     effectively includes the maximum number of items possible in its search for 
217     optimal capacity utilization.  It tends to be somewhat slower than either 
218     the best-fit or alternate-fit algorithm, probably because on average it has 
219     to look at more items before completing. 
220   
221     The "size" values in the items and capacity arguments must be comparable, 
222     but they are unitless from the perspective of this function.  Zero-sized 
223     items and capacity are considered degenerate cases.  If capacity is zero, 
224     no items fit, period, even if the items list contains zero-sized items. 
225   
226     The dictionary is indexed by its key, and then includes its key.  This 
227     seems kind of strange on first glance.  It works this way to facilitate  
228     easy sorting of the list on key if needed. 
229   
230     The function assumes that the list of items may be used destructively, if 
231     needed.  This avoids the overhead of having the function make a copy of the 
232     list, if this is not required.  Callers should pass C{items.copy()} if they 
233     do not want their version of the list modified. 
234   
235     The function returns a list of chosen items and the unitless amount of 
236     capacity used by the items. 
237   
238     @param items: Items to operate on 
239     @type items: dictionary, keyed on item, of C{(item, size)} tuples, item as string and size as integer 
240   
241     @param capacity: Capacity of container to fit to 
242     @type capacity: integer 
243   
244     @returns: Tuple C{(items, used)} as described above 
245     """ 
246   
247      
248     included = { } 
249   
250      
251     itemlist = items.items() 
252     itemlist.sort(lambda x, y: cmp(x[1][1], y[1][1]))     
253     keys = [] 
254     for item in itemlist: 
255        keys.append(item[0]) 
256   
257      
258     used = 0  
259     remaining = capacity 
260     for key in keys: 
261        if remaining == 0: 
262           break 
263        if remaining - items[key][1] >= 0: 
264           included[key] = None 
265           used += items[key][1] 
266           remaining -= items[key][1] 
267   
268      
269     return (included.keys(), used) 
 270   
271   
272   
273   
274   
275   
277   
278     """ 
279     Implements the alternate-fit knapsack algorithm. 
280   
281     This algorithm (which I'm calling "alternate-fit" as in "alternate from one 
282     to the other") tries to balance small and large items to achieve better 
283     end-of-disk performance.  Instead of just working one direction through a 
284     list, it alternately works from the start and end of a sorted list (sorted 
285     from smallest to largest), throwing away any item which causes capacity to 
286     be exceeded.  The algorithm tends to be slower than the best-fit and 
287     first-fit algorithms, and slightly faster than the worst-fit algorithm, 
288     probably because of the number of items it considers on average before 
289     completing.  It often achieves slightly better capacity utilization than the 
290     worst-fit algorithm, while including slighly fewer items. 
291   
292     The "size" values in the items and capacity arguments must be comparable, 
293     but they are unitless from the perspective of this function.  Zero-sized 
294     items and capacity are considered degenerate cases.  If capacity is zero, 
295     no items fit, period, even if the items list contains zero-sized items. 
296   
297     The dictionary is indexed by its key, and then includes its key.  This 
298     seems kind of strange on first glance.  It works this way to facilitate  
299     easy sorting of the list on key if needed. 
300   
301     The function assumes that the list of items may be used destructively, if 
302     needed.  This avoids the overhead of having the function make a copy of the 
303     list, if this is not required.  Callers should pass C{items.copy()} if they 
304     do not want their version of the list modified. 
305   
306     The function returns a list of chosen items and the unitless amount of 
307     capacity used by the items. 
308   
309     @param items: Items to operate on 
310     @type items: dictionary, keyed on item, of C{(item, size)} tuples, item as string and size as integer 
311   
312     @param capacity: Capacity of container to fit to 
313     @type capacity: integer 
314   
315     @returns: Tuple C{(items, used)} as described above 
316     """ 
317   
318      
319     included = { } 
320   
321      
322     itemlist = items.items() 
323     itemlist.sort(lambda x, y: cmp(x[1][1], y[1][1]))     
324     keys = [] 
325     for item in itemlist: 
326        keys.append(item[0]) 
327   
328      
329     used = 0  
330     remaining = capacity 
331   
332     front = keys[0:len(keys)/2] 
333     back = keys[len(keys)/2:len(keys)] 
334     back.reverse() 
335      
336     i = 0 
337     j = 0 
338   
339     while remaining > 0 and (i < len(front) or j < len(back)): 
340        if i < len(front): 
341           if remaining - items[front[i]][1] >= 0: 
342              included[front[i]] = None 
343              used += items[front[i]][1] 
344              remaining -= items[front[i]][1] 
345           i += 1 
346        if j < len(back): 
347           if remaining - items[back[j]][1] >= 0: 
348              included[back[j]] = None 
349              used += items[back[j]][1] 
350              remaining -= items[back[j]][1] 
351           j += 1 
352   
353      
354     return (included.keys(), used) 
 355