7

In this problem I'm trying to simply take a list of items and a range and to find combinations that would allow all items to be used.

Here's an example: imagine you have 4 items (apple, pear, peach, and orange) and want a minimum of 20% of the basket to contain each and a maximum of 60%. For example, you could have 25%, 25%, 25%, 25% of each item or 30%, 30%, 20%, 20% and so on, but 0%, 0%, 50%, 50% does not work because the specified min% is 20%.

The program works fine, but it uses items less than the entire list (instead of 4 items in every solution, some solutions contain 2 or 3 items, which is not what I want). If I send a list of 4 items, I want the combinations using all 4 items togethers and nothing less. I do not want this because I plan to use large lists and I want the size to be items used to only be all and nothing less than the min%. Here's an example using the above information(4 items, 20-60% range):

good:
 apples = 22
 pears  = 24
 peach  = 25
 orange = 29
 total: 100%

bad:
 apples = 0
 pears  = 0
 peach  = 40
 orange = 60
 total: 100%
 // Although total is correct, the example fails because
 // the minimum of 20% per item was not obeyed.

I'm really confused as to why this is happening, but if I had to bet, I'd think it's the way my recursion is taking the number of items in the list and subtracting one before sending it back. Its in the method recursion_part:

private static void recursion_part(int k, int sum, int[] coeff) {
    //k is number of items in the list(in this example its 4(0-3), sum is the remaining total percent to break down, coeff is the template to store values
    //this recursively takes the sum and tries to find lower values of it until it equals zero using the bounds given
    for (int c = low_bound[k]; c <= high_bound[k]; c++) {  
        coeff[k] = c;
        int[] newcoeff = Arrays.copyOf(coeff, coeff.length);
        if (c - sum == 0) {
        results.add(newcoeff);
        printresults(newcoeff);
        break;
    } else if (k > 0) {
        recursion_part(k - 1, sum - c, newcoeff);
    }
}
}

I want to work on larger lists and I think it'll be a problem if it calculates a lot of results that I don't care for. How can I redesign this to only process all items in the list and stay within the range limits?

I thought of putting a method that checks how many zeros there are in the list and then breaks if it goes below the size of the list, but the fact that I'm getting blank results means its processing items less than my list and I'm thinking its better to design the program so it doesn't waste resources.

Here's the entire code (it works as described but is giving zero results as mentioned):

import java.util.ArrayList;
import java.util.Arrays;


public class recursion_percent_returner {
    static final int target_percent = 100;
    static final String[] names = new String[] {"apples", "pears", "peach", "orange" };
    static int[] low_bound = new int[names.length];
    static int[] high_bound = new int[names.length];
    static ArrayList results =  new ArrayList(); //queue to store results
    static int[] default_coeff = new int[names.length];
    
    public static void main(String[] args) {
        System.out.println("starting..");
        System.out.println("list size " + names.length);
        Arrays.fill(low_bound, 20); //fills the min list with default value
        Arrays.fill(high_bound, 60); //fills the max list with default value
        recursion_part(names.length-1,target_percent,default_coeff);
        System.out.println("total size of results are " + results.size());
    }

    private static void recursion_part(int k, int sum, int[] coeff) {
        //k is number of items in the list(in this example its 4(0-3), sum is the remaining total percent to break down, coeff is the template to store values
        //this recursively takes the sum and tries to find lower values of it until it equals zero using the bounds given
        for (int c = low_bound[k]; c <= high_bound[k]; c++) {  
            coeff[k] = c;
            int[] newcoeff = Arrays.copyOf(coeff, coeff.length);
            if (c - sum == 0) {
                results.add(newcoeff);
                printresults(newcoeff);
                break;
            } else if (k > 0) {
                recursion_part(k - 1, sum - c, newcoeff);
            }
        }
    }

    private static void printresults(int[] newcoeff) {      
        for (int x = 0; x<newcoeff.length; x++) {
            System.out.println(names[x] + " = " + newcoeff[x]);
        }
        System.out.println("*********");

    }
}

I am open to better ways to achieve the outcome I'm looking for.

p.s. this is not homework and I'm not a student; I just have a tendency to find weird sounding proxy problems.

Edit

I included the entire code but here's the output as well. It's a snippet of the 2653 solutions, which is generating more than what I need. If you look at it briefly, you'll see that most are correct, but as you get lower you'll see not all values are being used; I only want the solutions that are using all values; there should be no 0 value-entries.

halfer
  • 19,824
  • 17
  • 99
  • 186
Lostsoul
  • 25,013
  • 48
  • 144
  • 239
  • 4
    That's an awful lot of code for a SO question, and a not-very-clear problem description. Please can you work on a [minimal test-case](http://sscce.org), and find a way to explain the problem better? – Oliver Charlesworth May 25 '12 at 19:11
  • 1
    If you need each item to be at least 20%, why not start with each item allocated 20%, then algorithmically allocate the remaining 20%? – Marcin May 25 '12 at 19:13
  • 1
    @OliCharlesworth I'm not sure, Its really the one method thats a problem which I highlighted but I wanted to provide the full code(although heavily edited to keep it small..) so people can run it if they want. I can try to cut pieces but then it wouldn't run my example as I'm describing it. – Lostsoul May 25 '12 at 19:13
  • @Marcin if I start the coeff variable with the min value, it still does the same thing. In the for loop(in the recursive method), I do have the range of the loop between the min and max, but it still has some items with 0% values. – Lostsoul May 25 '12 at 19:15
  • Also, are percentage points discrete? If so, your problem reduces to finding all ways to allocate 20 discrete identical items to 4 buckets, a simple combinatoric problem (it's the same as enumerating all base-4 20-digit numbers). – Marcin May 25 '12 at 19:15
  • @Marcin sorry I'm not good at math(or really programming I suppose :-) not sure what you mean. The example above is exactly what I'm working on, its all positive integers that all operate within the range specified. – Lostsoul May 25 '12 at 19:17
  • @Lostsoul So, yes, they are discrete, and it follows that all you need to do is enumerate all of the combinations for the remaining 20%, then add 20 to the final totals. – Marcin May 25 '12 at 19:19
  • @Marcin sorry i'm confused..my understanding of algorithms is not very strong. Can you explain how I can modify my recursion to reduce the number of results that do not contain the min value I'm looking for? – Lostsoul May 25 '12 at 19:31
  • @Lostsoul This question deals with counting in arbitrary bases: http://stackoverflow.com/questions/1944508/arbitrary-digit-counter You need all 20-digit numbers (including leading zeros). Start again if necessary. – Marcin May 25 '12 at 19:37
  • @Marcin I have a feeling you were right the first time, before correcting yourself (20-digit numbers, base-4). There are 20 independent choices between 4 buckets. One digit -> one independent choice, digit range -> how many buckets to choose from each time. – Marko Topolnik May 25 '12 at 19:40
  • @Lostsoul Sorry to keep changing my mind, what you need is all 20-digit base-4 numbers, then to count the number of occurrences of numeral (each numeral represents a fruit). – Marcin May 25 '12 at 19:40
  • @MarkoTopolnik Snap! Great minds think at the same time! – Marcin May 25 '12 at 19:41
  • I don't think this is exactly what OP will find useful. 4^20 is not what you call a tractable number, weighing in with over 1 trillion. – Marko Topolnik May 25 '12 at 19:48
  • This looks an awful lot like : http://en.wikipedia.org/wiki/Knapsack_problem – Woot4Moo May 25 '12 at 19:48
  • I agree with Marcins first remark: There are 4 fruits, each with a mminimum or or 20%. So the maximum is not 60% but 20%+100-(4*20%)=40%. The rekurive method could take the rest (20) after setting each value to the minimum, and add every possible value (0 to 20) to the next element (apples, ...), and the divide the rest beyond the remaining elements (pears, peach, orange). If there is only one remaining element (orange) add all to orange. – user unknown May 25 '12 at 19:48
  • @Woot4Moo Knapsack is NP, this isn't, difference being in the non-regularity of size with knapsack. – Marko Topolnik May 25 '12 at 19:49
  • @MarkoTopolnik apologies, a constrained knapsack – Woot4Moo May 25 '12 at 19:49
  • 1
    Lostsoul, what say ye, you aren't planning on enumerating the whole trillion possibilities? How do you additionally constrain your solution? – Marko Topolnik May 25 '12 at 19:52
  • @MarkoTopolnik i constrain it via the number of items and the range. In this case the range is large but I plan to use tighter rangers such as 3-5%.Using the above code, it gives you 2653 results but 882(30%+) of them are not needed which I believe limits the results. This current approach is generating too much data that is below the min %. – Lostsoul May 25 '12 at 20:04
  • 1
    OK, but do you intend to fix according to the scheme that @Marcin has proposed? That's a very straightforward technique. You might even have it done automatically by some arbitrary-base number printing... thingy. – Marko Topolnik May 25 '12 at 20:08
  • There aren't trillions of possibilities. You have 20% at each, minimum. This makes already 80%, so 20% remains. Distribute 20% over 4 elements: a=20% => 0,0,0 for b,c,d. a=19% => 0,0,1; 010, 100 for the remaining elements. a=18% => distribute 2 over 3 and so on. 20^4 is already an upper bound, the real value is far below. – user unknown May 25 '12 at 20:10
  • @MarkoTopolnik I'm very new to programming and wasn't strong in Math, so I'm just googling almost everything you and Marcin wrote and trying to understand how to integrate it into my code(I'm very slow but persistent so I'll work on it as long as it takes). – Lostsoul May 25 '12 at 20:10
  • 1
    @userunknown To sum it up, you'll make 20 independent choices between four elements. That's 4^20 variations. – Marko Topolnik May 25 '12 at 20:14
  • Also, this is relevant: http://stackoverflow.com/a/2381031/21640 – Marcin May 25 '12 at 20:19
  • @MarkoTopolnik where I'm confused is, I thought I do start with allocating 20% to each item..since my for loop is bounded by 20 to 60. I think the problem is if the program finds the result with 2 items instead of 4 it just goes with it but it really shouldn't even attempt to process anything less than the size of 4 items.. – Lostsoul May 25 '12 at 20:19
  • @Marko: Not independent. You only have 20 remaining points to distribute. If you give the first element 20 points, there is no point left for b,c,d. If you give a,b,c 0 points each, you have to place the remaining points to d. – user unknown May 25 '12 at 20:31
  • 1
    @userunknown I was wrong. In Marcin's and my way of thinking the flaw is that many variations are equivalent because you only count the number of occurrences of each digit in the generated number. So it's not variations, but combinations (with repeating) and the number is not in the trillions :) In fact it's 1771. – Marko Topolnik May 25 '12 at 20:32
  • 1
    @userunknown RE your last comment. I'm not modelling it like that. Imagine each of the 20 remaining points being a ball and imagine pear, apple, peach, orange being four buckets. Now you throw each point into any bucket. That's a 20-combination from a set of 4, with repetition. – Marko Topolnik May 25 '12 at 20:36
  • 1
    @MarkoTopolnik Yup, `1771 = choose(20+4-1,20)`, generally `choose(percent+buckets-1,percent)`. – Daniel Fischer May 25 '12 at 20:37
  • 1
    OP, that means I withdraw my suggestion on how to generate the results. – Marko Topolnik May 25 '12 at 20:40
  • 1
    Wow..thank you everyone for help in discussing the problem. You all are really awesome thanks! @Marcin thanks for posting the link to that Q, I tried my best to read the source code and my understanding is once the results reach less than the number of items he breaks. I can def do that, won't that prevent certain results from coming in(this was my original idea but thought it was better instead of catching the problem to prevent it)? Am I not understanding the approach mentioned? – Lostsoul May 25 '12 at 20:43
  • It seems I was mistaken in my combinatorics. See @userunknown's answer. – Marcin May 25 '12 at 20:48
  • Thank you so much everyone..there is over 20 comments here but the discussion really helped me understand why some approaches didn't work and why some did..I'm playing with the code in the answer and I understand what you guys were saying now. I think I can easily intergrate the logic into my program. Thank you so much @marcin and everyone else. You all are friggin awesome(and mad geniuses)! – Lostsoul May 25 '12 at 21:36
  • @MarkoTopolnik thank you very much as well...you have translate alot of the ideas to something I understood. Thanks..Have a great weekend everyone! – Lostsoul May 25 '12 at 21:37
  • @DanielFischer Hey Daniel. Your formula was correct but I don't understand how to modify it. I can replace the numbers but don't you need to take the max_range into account? you seem to only take the min. – Lostsoul May 25 '12 at 22:07
  • @Lostsoul Right, that's without upper limit. With four buckets and a minimum of 20%, the maximum of 60% can never be reached, so I ignored it. If you have a lower maximum that can be reached, that reduces the number of solutions/combinations, the formula is more complicated, and I don't know it off the top of my head. You can look it up in many statistics books, if you care. For the algorithm, you don't need the formula, however. – Daniel Fischer May 25 '12 at 22:15
  • 1
    got it. I found the nCr one you used, which I recreated and got your results but because it doesn't consider the top bound its a bit off for my experiments. I plan to use this on a larger list(say 40 items with a smaller range, 1-4 or something) and your formula was helpful in understanding how many results I'd be having. thanks and I'll dig a bit on the net for the formula..Thanks for showing us the algo earlier – Lostsoul May 25 '12 at 22:17

1 Answers1

6
import java.util.*;

public class Distributor {

    private ArrayList<int[]> result =  new ArrayList <int[]> ();
    
    public Distributor (final String [] names, int [] low, int [] high) 
    {
        final int rest = 10;
        int minimum = 0;
        for (int l : low)
            minimum += l; 
        int [] sizes = new int [names.length];
        distribute (0, low, high, rest - minimum, sizes);
        System.out.println ("total size of results are " + result.size ());
        for (int [] ia : result)
            show (ia, low); 
    }
    
    public static void main (String [] args) {
        final String [] names = new String [] {"a", "b", "c"};
        int [] low = new int [] {2, 2, 1};
        int [] high = new int [] {3, 4, 6};
        new Distributor (names, low, high);
    }
    
    /*
        distribute the rest of values over the elements in sizes, beginning with index i.           
    */
    void distribute (int i, int [] low, int [] high, final int rest, int [] sizes) {
        // System.out.println (i + " " + rest + " " + sizes);
        if (i == sizes.length - 1) {
            if (rest < high [i]) {
                sizes[i] = rest; 
                result.add (Arrays.copyOf (sizes, sizes.length));
            }
        }
        else 
            for (int c = 0; 
                c <= java.lang.Math.min (high [i] - low [i], rest); 
                ++c) {  
                sizes [i] = c;
                    distribute (i + 1, low, high, rest - c, sizes);                 
            }
    }

    private static void show (int [] arr, int [] low) {      
        for (int x = 0; x < arr.length; x++) {
            System.out.print (" " + (arr [x] + low[x]));
        }
        System.out.println ();
    }
}

Long variable names are better, if they are clearer than short ones. But they aren't more valuable per se.

More so: Stick to namingConventions, which are CamelCase in Java, not_funky_underlines.

Okay - why should low and high be static? I didn' work up im this direction for 'names' and 'result'. Remains as an exercise to remove the static modifier.

Okay - my impression is, that the sum needs always to be 100, not less, nor above 100. So if 4x20% is the minimum, a maximum of 60% isn't possible. But I understand that the values are variable, and in another settings, you might have 4x10% minimum, and then, a maximum of 60 % would make sense. I didn't test the code for that purpose.

But I substract the minimum (4*20%) from the rest to distribute, so you have to add these values for the final result.

The recursive distribute function starts with the termination condition: The last element is reached. Now the rest - how much it might be - has to be put for this last element.

Else, you take all possible values for this element, and distribute the rest.

update:

I changed the code to take care of the upper limit, and simplified the example on the one side, since it now uses just 3 elements and a maximum of 10 instead of 100. The output shows the real values (min + variable part) and the algorithm only adds solutions which don't violate the upper bound constraint.

Sample output:

 2 2 6
 2 3 5
 2 4 4
 3 2 5
 3 3 4
 3 4 3
total size of results are 6
Community
  • 1
  • 1
user unknown
  • 35,537
  • 11
  • 75
  • 121
  • Thanks for this(and your tips..) your right, 60% is not possible I think the highest can be 20%,20%,20%,40%(using 4 items). If I understand correctly in your example 0 isn't 0 but its the min value(in this case 20?) – Lostsoul May 25 '12 at 20:57
  • Thanks so much..you really clarified alot for me. I had another question, in the comments you had a formula to figure out how many possiblities there were per restrictions. `1771 = choose(20+4-1,20), generally choose(percent+buckets-1,percent)` In it you had the lower range, but how can I do this based on the upper range..for example,40 items with a range of 2-5% or something like that..Its interesting to know how big some processing will be before I actually do it. – Lostsoul May 25 '12 at 21:39
  • 1
    Also..thank you so much...your code is very very clear and I've broken it a few times to understand what your doing and I think I understand most of it. Thanks so much, if you could see me you'd see a lightbulb go on after hacking around your code for a while. Thanks a million! – Lostsoul May 25 '12 at 21:40
  • Ahh your right.sorry..I think i'm so tried with trying to figure this out my brain is off...I thought you wrote the comment I was referring to. Thanks for your great example, its very helpful..Thanks so much! – Lostsoul May 25 '12 at 22:04
  • @Lostsoul: You're welcome. Maybe we can cleanup some comments? It's very long here and there. Note, that I made another, minor update. The code printed the new sizes array, but it was changed over and over again in place, so we ended with 1771 references to the same Array. Now I always add a copyOf, which allows to remove the println from the generating method. – user unknown May 25 '12 at 22:22
  • Thanks this code keeps getting better and beter. I'll just ask this because you wrote it: Is it possible to make it memory efficient? I don't mean use shorts instead of ints but anything to make the memory footprint more constant while keeping the speed? When running it on a 30 items, it runs out of memory. My way around it(in my real program) is to use rabbitmq that writes to disk once its memory hits a threshold or stxxl(for c/c++ to store variables on disk) but do you see another way? Its prob. not possible but just thought I'd ask since you know exactly what I'm doing. – Lostsoul May 25 '12 at 22:33
  • wait..sorry, I retract that dumb question..I just studied your code and memory consumption is constant..it seems the result variable is taking all the memory as more results get into it..sorry about that..I comfirm it stays constant with 33 items with a range of 1-3%, this is a work of art..thanks! – Lostsoul May 25 '12 at 22:36
  • 1
    It still can be improved. The names aren't really used - would be a candidate for the show-method as headline. Then: The low-value is only added in the show-method. That isn't very useful and should be done to the values themselves. About memory: This is a kind of partitioning problem, and these problems **can** grow aggressively. I guess sum=100, 100 elements with min/max 0-100 each would be a serious problem. But you could write to a file, instead of adding to the result collection. – user unknown May 26 '12 at 03:01
  • Your right, my goal is to run this at 30-40 items with a min/max of 1-5 and sum of 100, but after 3 1/2 hours I stopped it. A idea I'm looking at is in my first example, 4 items, min/max=20-60, sum=100, the first result is: 0 0 0 20 and the last result is: 20 0 0 0. I'm working right now to study where it starts to mirror the previous results so maybe instead of reprocessing the entire list I can just reverse existing values..hopefully that'll make it double as fast. – Lostsoul May 26 '12 at 03:22
  • Do all elements have the same bounds? Else, it won't be symetrical. – user unknown May 26 '12 at 03:41
  • Yes, they all have the same bounds..the individual bounds you created is awesome but didn't really fit my situation..all items obey the same bounds – Lostsoul May 26 '12 at 03:45
  • Your sample code contained arrays to describe the bounds: `int[] low_bound = new int[names.length];`, so I thought it was important. – user unknown May 26 '12 at 03:50
  • I am not very good at programming so I often think as simply as possible and do everything in small steps :-) It was interesting so I tried to keep that structure but I don't need it right now(maybe one day?) – Lostsoul May 26 '12 at 03:56
  • A performance improvement could be made, if we use the bounds information more early, I think. Given 4 elements, size 1..5 each. This would allow sums from 4 to 20. Now if we have a sum of 13 only. Can the first element be 1? Yes, because 1+2+(2*5)=13. But if the first is 1, the second can't be 1 too, because we then have a rest of 11, which is more than 2*5, the max. I'm not sure whether the current algo terminates early enough. Controlling the rest with a multiplication is more easy than iterating over the high-array of the remaining indexes. – user unknown May 26 '12 at 09:58
  • Awesome! Thanks so much user_unknown..I didn't think about that but that should cut amount of processing significantly and I think its easy to implement. Do I basically precalculate the range then based on the sum and position of `i` modify the range? Thanks so much, I'm going to spend today learning how to implement your advice and study how the results reverse overtime. – Lostsoul May 26 '12 at 18:01