DZone Snippets is a public source code repository. Easily build up your personal collection of code snippets, categorize them with tags / keywords, and share them with the world

Snippets has posted 5883 posts at DZone. View Full User Profile

Weighted Random Array Selection

09.25.2007
| 5918 views |
  • submit to reddit
        Say you have an array of ['apples', 'oranges', 'bananas', 'kiwis']

Selecting a random item is simple enough
  arr = ['apples', 'oranges', 'bananas', 'kiwis']
  arr[rand(arr.size)]

But say you want to have apples come up 50% of the time, oranges 25%, bananas 15% and kiwis 10% ... 


  class Array
    
    #sum (and mean) found on http://snippets.dzone.com/posts/show/2161
    def sum
      inject( nil ) { |sum,x| sum ? sum+x : x }
    end
    
    def probability(spread = 2)
      z = 1.0
      collect {|x| z = z / spread}
    end
    def weighted_random_index(probability_array = probability(2) )
      size.times do |x|
        return x if rand < probability_array[0..x].sum
      end
      return 0
    end
    
    def get_weighted_random_item(probability_array = probability(2))
      self[weighted_random_index(probability_array)]
    end
    
    def get_weighted_random_indexes(num_items, p = probability(2))
      res = []
      escape = 1000
      while res.size < num_items and escape > 0
        escape -= 1
        tmp = weighted_random_index(p) 
        res << tmp unless res.include?(tmp)
      end
      return res.sort
    end
  
  end


arr = ['apples', 'oranges', 'bananas', 'kiwis']
10.times do |i|
  puts arr.get_weighted_random_item([0.5, 0.25, 0.15, 0.10])
end

oranges
oranges
apples
apples
bananas
apples
apples
bananas
oranges
apples


I'm sure there's some math term for this but I'm calling it 'spread' until I learn otherwise.  I put in a default probability array of p = p / 2 which gives you (for an array of 4)

arr.probability
=> [0.5, 0.25, 0.125, 0.0625]

On larger arrays the probabilities will get (for all practicle purposes) zero quickly

arr = [1,2,3,4,5,6,7,8,9]
=> [1, 2, 3, 4, 5, 6, 7, 8, 9]
arr.probability
=> [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125]

I put in a simple spread concept to spread out the probabilities a bit more... the higher the spread, the more even the probability distribution (this varies a lot with array size - but here's a simple example):

   arr = ['apples', 'oranges', 'bananas', 'guava']
   iterations = 10000
   4.times do |t|
     p = arr.probability(t+2)
     res = Array.new(arr.size).fill(0)
     iterations.times do |x|
       res[arr.weighted_random_index(p)] += 1
     end
     res2 = res.collect {|x| (x/iterations.to_f) * 100}
     puts "For probability spread #{t+2}"
     puts "Results #{res2.join(", ")} = #{res2.sum}"
     puts ""
   end

   For probability spread 2
   Results 49.71, 37.55, 11.35, 1.39 = 100.0

   For probability spread 3
   Results 43.44, 29.99, 17.79, 8.78 = 100.0

   For probability spread 4
   Results 48.3, 23.56, 16.73, 11.41 = 100.0

   For probability spread 5
   Results 54.22, 19.31, 14.65, 11.82 = 100.0

Here is a longer example:

    def weighted_random_index_example
      arr = ['apples', 'oranges', 'bananas', 'guava']
      puts "Sample array = [#{arr.join(",")}]"
      p = [0.5,0.25, 0.15, 0.10]
      puts "Probability that each will show up [#{p.join(', ')}]"
      puts "1000 runs..."
      res = Array.new(arr.size).fill(0)
      1000.times do |t|
        res[arr.weighted_random_index(p)] += 1
      end
      res2 = res.collect {|x| (x/1000.0) * 100}
      puts "Results:"
      4.times do |t|
        puts "    #{arr[t]}: #{res2[t]}%"
      end
      
      puts ""
      puts "You can also use more spread out probability arrays"
      p = arr.probability(3)
      puts "Probability that each will show up with a spread of 3 [#{p.join(', ')}]"
      puts "1000 runs..."
      res = Array.new(arr.size).fill(0)
      1000.times do |t|
        res[arr.weighted_random_index(p)] += 1
      end
      res2 = res.collect {|x| (x/1000.0) * 100}
      puts "Results:"
      4.times do |t|
        puts "    #{arr[t]}: #{res2[t]}%"
      end
      
      puts ""
      puts "You can also just get selected indexes"
      puts "arr.get_weighted_random_indexes(3,p) = [#{arr.get_weighted_random_indexes(3,p).join(', ')}]"
      
      puts "The probability spread will depend on the number of items in your array - for an array of 4 it looks like this:"
      8.times do |t|
        puts "  probability(#{t+2}):  [#{arr.probability(t+2).collect {|x| sprintf('%0.5f',x)}.join(', ')}]"
      end
      return nil
    end
    weighted_random_index_example

Outputs...
Sample array = [apples,oranges,bananas,guava]
Probability that each will show up [0.5, 0.25, 0.15, 0.1]
1000 runs...
Results:
    apples: 49.1%
    oranges: 38.3%
    bananas: 11.3%
    guava: 1.3%

You can also use more spread out probability arrays
Probability that each will show up with a spread of 3 [0.333333333333333, 0.111111111111111, 0.037037037037037, 0.0123456790123457]
1000 runs...
Results:
    apples: 43.6%
    oranges: 29.2%
    bananas: 18.0%
    guava: 9.2%

You can also just get selected indexes
arr.get_weighted_random_indexes(3,p) = [0, 1, 3]
The probability spread will depend on the number of items in your array - for an array of 4 it looks like this:
  probability(2):  [0.50000, 0.25000, 0.12500, 0.06250]
    

Comments

Snippets Manager replied on Thu, 2009/11/26 - 5:32am

Thanks so much for this! This is exactly what I was looking for. Invest In China China Investment Property Investment China Real Estate Investment Real Estate Chinese Business

Snippets Manager replied on Sat, 2009/09/19 - 7:43pm

Sorry - some code got cutoff in my comment above. The full test code is: def badpick(values, probs) values.each_index { |index| sum = probs[0..index].inject { |x,y| x+y } return values[index] if rand < sum } end def goodpick(values, probs) randValue = rand() values.each_index { |index| sum = probs[0..index].inject { |x,y| x+y } return values[index] if randValue < sum } end def printTotal(total, values) values.each { |v| percent = total.count(v).to_f / 10000.0 puts "#{v}: #{percent}" } end values = ["a", "b", "c", "d"] probs = [0.1, 0.2, 0.3, 0.4] total = [] 10000.times { total << badpick(values, probs) } puts "Bad:" printTotal(total, values) total = [] 10000.times { total << goodpick(values, probs) } puts "Good:" printTotal(total, values)

Snippets Manager replied on Sat, 2009/09/19 - 7:43pm

There is one glaring error - you need to set the value of a random variable outside of the "size.times" codeblock in weighted_random_index rather than calling rand for each index. I wrote a quick simulation to show the difference between the two methods. def goodpick(values, probs) randValue = rand() values.each_index { |index| sum = probs[0..index].inject { |x,y| x+y } return values[index] if randValue < sum } end def printTotal(total, values) values.each { |v| percent = total.count(v).to_f / 10000.0 puts "#{v}: #{percent}" } end values = ["a", "b", "c", "d"] probs = [0.1, 0.2, 0.3, 0.4] total = [] 10000.times { total << badpick(values, probs) } puts "Bad:" printTotal(total, values) total = [] 10000.times { total << goodpick(values, probs) } puts "Good:" printTotal(total, values) You'll notice that d is is never 0.4 as it should be. The corrected version of the weighted_random_index method should be: def weighted_random_index(probability_array = probability(2) ) randValue = rand size.times do |x| return x if randValue < probability_array[0..x].sum end return 0 end

Snippets Manager replied on Wed, 2007/09/26 - 10:32am

This functionality is similar to something I wanted a while back. The main difference was that I did not want to pass in the weighting, but wanted the weighting to be a property of the objects in the array. Here is the code: class Array def random_weighted(method) total = self.map(&method).sum running_total = 0 index = rand(total) + 1 #puts "INDEX: #{index}" # we assume it is already sorted in descending order, although I suppose order does not matter self.each do |item| running_total += item.send(method) return item if index <= running_total end # it is possible to arrive here if all the elements had weight of zero. Handle this: return self[rand(self.size)] end end So, for a contrived example, I can now do something like: >> [1,2,4,8].random_weighted(:to_i) => 8 >> [1,2,4,8].random_weighted(:to_i) => 8 >> [1,2,4,8].random_weighted(:to_i) => 8 >> [1,2,4,8].random_weighted(:to_i) => 8 >> [1,2,4,8].random_weighted(:to_i) => 4 >> [1,2,4,8].random_weighted(:to_i) => 4 >> [1,2,4,8].random_weighted(:to_i) => 8 >> [1,2,4,8].random_weighted(:to_i) => 4 >> [1,2,4,8].random_weighted(:to_i) => 4 >> [1,2,4,8].random_weighted(:to_i) => 1 Or, I could do: >> %w(a one a_really_long_word ).random_weighted(:size) These are of course contrived examples. I am using it to do a random weighted selection of things that get clicked on the most on a web site.