AntecK7 All American 7755 Posts user info edit post |
I have a database of items and sizes. I wan't to group them into groups of X objects with a total size of <=Y but as close to Y as possible
Here are some example values (there are actually 2k of them) 109 114 54 25 85 86 71 66 62 52 146 77 117 36 70 48
I want Put them into groups of 60 items with a group total value of 6000.
My first though is grab the largest and then the smallest, and keep doing that to the center median value. but ideally i would actually have them kinda averaged out.
There is probably some smart/packing algorithm or something that I could use here, but I'm not that savvy anymore..
Any thoughts 2/17/2015 7:30:28 PM |
sarijoul All American 14208 Posts user info edit post |
So your givens are the number of items in a group and the group total value (and the large set of numbers to pull from)? 2/17/2015 7:50:08 PM |
smoothcrim Universal Magnetic! 18968 Posts user info edit post |
if you can assume the set of values isn't ordered, as your set is listed, then the most (compute) efficient way to do that would be to grab the next element from the list and add it to the group until the next value would take you over. a couple nested loops should do it. if you wanted to add packing efficiency as well, just have it try list.next(), list.next().next(), and list.next().next().next(), so you bound any run-away searches. should be pretty close to O(n) 2/17/2015 8:29:13 PM |
neolithic All American 706 Posts user info edit post |
It probably depends on how much you care about getting close to an optimal solution. A solution like the one outlined in ^ would probably be fine if you want a quick solution.
I think you would probably do a little better in terms of average distance from your upper bound if you first sorted your item size list. Create a priority queue of size 6000/60 = 100 and start adding the biggest items first to this priority queue. The PQ should be sorted on the basis of total item size for each element in the queue. Add the next biggest item from the list to the top of the PQ (i.e. the element with the smallest total size) and repeat. If an element hits a size of 6000 exactly, remove it from the PQ.
If you posted this to http://stackoverflow.com, I bet you would get some answers that would be nearly optimal within an hour. 2/17/2015 8:52:49 PM |
moron All American 34183 Posts user info edit post |
Visualize your numbers as a distribution. You have 33 buckets to pluck from.
To keep things even, you have to drain the buckets by pulling from pairs in the nth bucket and n+33/2 bucket .
You'll have to calculate a distribution for your values though to use this (which shouldn't be hard really). 2/17/2015 9:19:10 PM |
AntecK7 All American 7755 Posts user info edit post |
does need to be optimal for allocation however, compute/caluation optimization isn't really important.
I don't want to really brute force it... Also my code has to be in powershell.
Right now this is my current plan
Since I know that the average Value of the group=100
Create a group, get the first Item from the list add that to the group
find a value that has not been assigned to a group such that (Group+New Value) is as close to 100 as possible without putting the sum(group)>6000
Add that value to the group, continue till the group has 60 items in it or the sum of the group>6000
then create a new group, and grab the first free value.
Again, probably horrific from a compute standpoint, but I think it will get me near where i want to be.
on a simplified set of where say we wanted 5 items per group with a sum of 500, using the data i posted earlier Group 1 would be something like 109,86,114,77,117
Group 2 would be something like 54,146,70,117, et cetra 2/17/2015 9:20:27 PM |
LimpyNuts All American 16859 Posts user info edit post |
This looks like a variation of the "k-partition problem" which is well-studied. I did a quick googling and found "The Easiest Hard Problem: Number Partitioning" by Stephan Mertens which presents some methods for partitioning a set into 2 subsets. You might try looking in math or computer science journals. 2/18/2015 12:48:36 PM |
AntecK7 All American 7755 Posts user info edit post |
^ thanks for the tip.
That makes sense,
Looking at solutions for the "Balanced Partition Problem"
which i can recursively call on the set breaking them into 2 sets of about equal size and count, then keep subdividing those sets.
[Edited on February 18, 2015 at 3:11 PM. Reason : dd] 2/18/2015 3:04:30 PM |
Novicane All American 15416 Posts user info edit post |
sounds like a data structures class
i've seen this before but my mind isn't functioning right now. 2/18/2015 4:33:25 PM |
AntecK7 All American 7755 Posts user info edit post |
so i think i got the follwoing powershell code working for the most part
$myArray = 1,6,2,4,5,9,1 $ArraySize=$myarray.count
$Sum=($myArray|Measure-Object -Sum).sum
$sol=new-object bool[] ([math]::Round($sum/2+1,[System.MidpointRounding]::AwayFromZero)) $sol[0]=$true
#for ([int]$i=0;$i -lt $ArraySize;$i++){ foreach($i in $myArray){ for ([int]$j=$sum/2;$j -ge $i;$j--){ ##-Host IValue:$i #Write-Host JValue:$j if ($sol[$j-$i]){ $sol[$j]=$true } #End If
}#End J Loop
}#End I Loop
[int]$HalfSumCloser=([math]::Round($sum/2,[System.MidpointRounding]::AwayFromZero))
For(;($sol[$HalfSumCloser]) -ne $true{ $HalfSumCloser-- }
(($sum - $halfsumcloser) - $halfsumcloser)
but that tells me if i can do it, not how to actually do it. 2/18/2015 5:21:03 PM |
AntecK7 All American 7755 Posts user info edit post |
trying to implement this
http://people.cs.clemson.edu/~bcdean/dp_practice/dp_4.swf
Yea.... i haven't done this type of programming problem in 12 years... if ever. 2/18/2015 6:17:37 PM |
AntecK7 All American 7755 Posts user info edit post |
$myarray=@() $myarray+=120 for($i=0;$i -lt 10;$i++){ $myarray+=Get-Random -Minimum 80 -Maximum 150
} $best=($myarray|Measure-Object -Sum).Sum $target=$best/2 write-host Starting work arraysum $best $bests=@()
$p=@{} $p[-1]=@{0=@()} for ([int]$i=0;$i -lt $myArray.Length;$i++){ $x=$myArray[$i] $p[$i]=@{} foreach ($j in $p[$i-1].Keys){ $p[$i][$j]=$p[$i-1][$j] $p[$i][$j+$x]=$p[$i-1][$j]+@($x) #$p if([math]::Abs($j+$x-$target) -lt [math]::Abs($best-$target) -and ($p[$i][$j+$x]).count -eq [int]($myArray.count/2)){ $best=$j+$x #write-host Best $best #write-host besti $i $j $x $bests=$p[$i][$j+$x]
} } }
Write-Host Solution error: ($best-$target) write-host Original Array ($myarray|Sort-Object) write-Host Group1 (($bests|Sort-Object) -join ', ') Write-Host group1 sum ($best|measure-object -Sum).Sum
[System.Collections.ArrayList]$group2list=$myArray foreach($value in $bests){ $group2list.Remove($value) }
Write-Host write-Host Group2 (($group2list|Sort-Object) -join ', ') Write-Host group2 sum ($group2list|measure-object -Sum).Sum
[Edited on February 18, 2015 at 11:59 PM. Reason : BUGS!] 2/18/2015 11:51:28 PM |
AntecK7 All American 7755 Posts user info edit post |
one note on this,
on a set of 1000 buckets this thing DRINKS memory
Basically re-wrote this python code in ps
https://bj0z.wordpress.com/2011/03/07/the-balanced-partition-problem/
[Edited on February 19, 2015 at 12:22 AM. Reason : phython] 2/19/2015 12:20:43 AM |
AntecK7 All American 7755 Posts user info edit post |
okay so screw those bastards
here is how i'm doing it, seems 1000 times faster, although it might not be 100% accurate
$myarray=@() for($i=0;$i -lt 1000;$i++){ $myarray+=Get-Random -Minimum 80 -Maximum 150
}
[System.Collections.ArrayList]$Set1=@() [System.Collections.ArrayList]$Set2=@()
for([int]$i=0;$i -lt $myArray.Count/2;$i++){ $set1.Add($myArray[$i]) } for([int]$i=$myArray.Count/2;$i -lt $myArray.Count;$i++){ $set2.Add($myArray[$i]) }
$s1sum=($set1|Measure-Object -Sum).Sum $s2sum=($set2|Measure-Object -Sum).Sum
$setdifference=$s1sum-$s2sum
for($i=0;$i -lt $set1.Count;$i++){ if($setdifference -eq 0){ break} $setdifference
$s1item=$set1[$i]
for($J=0;$j -lt $set2.Count;$j++){ $s2item=$set2[$j]
if([math]::Abs((($s1sum-$s1item+$s2item)-($s2sum-$s2item+$s1item))) -lt [math]::abs($setdifference)){ #write-host Change $set1.Remove($s1item) $set1.Add($s2item) $set2.remove($s2item) $set2.Add($s1item) $s1sum=($set1|Measure-Object -Sum).Sum $s2sum=($set2|Measure-Object -Sum).Sum $setdifference=$s1sum-$s2sum $i=-1 break
}}
}
write-host TotalSum ($myarray|Measure-Object -sum).sum write-host StartArray (($myarray|Sort-Object) -join ', ')
write-host Set1Sum ($set1|Measure-Object -sum).sum write-host Set1 (($set1|Sort-Object) -join ', ')
write-host Set2Sum ($set2|Measure-Object -sum).sum write-host Set2 (($set2|Sort-Object) -join ', ') 2/19/2015 1:47:43 AM |
409Sea New Recruit 19 Posts user info edit post |
You said you want groups of 60 close to 6000, this implies the mean of all your values is more or less 100. Is this for certain? With a mean reasonably far away from 100 I think you'd end up with a lot of values that you couldn't get to add up to 6000 in groups of 60. 2/20/2015 6:09:21 PM |
AntecK7 All American 7755 Posts user info edit post |
^just an example, right now values are between 80 and 300, with a mode around 120.
Actually trying to load balance file servers to datacenter DFS servers 2/21/2015 12:38:29 AM |