User not logged in - login - register
Home Calendar Books School Tool Photo Gallery Message Boards Users Statistics Advertise Site Info
go to bottom | |
 Message Boards » » Algorithm Question Page [1]  
AntecK7
All American
7755 Posts
user info
edit post

I have a database of items and sizes. I wan't to group them into groups of X objects with a total size of <=Y but as close to Y as possible

Here are some example values (there are actually 2k of them)
109
114
54
25
85
86
71
66
62
52
146
77
117
36
70
48

I want Put them into groups of 60 items with a group total value of 6000.

My first though is grab the largest and then the smallest, and keep doing that to the center median value. but ideally i would actually have them kinda averaged out.

There is probably some smart/packing algorithm or something that I could use here, but I'm not that savvy anymore..

Any thoughts

2/17/2015 7:30:28 PM

sarijoul
All American
14208 Posts
user info
edit post

So your givens are the number of items in a group and the group total value (and the large set of numbers to pull from)?

2/17/2015 7:50:08 PM

smoothcrim
Universal Magnetic!
18968 Posts
user info
edit post

if you can assume the set of values isn't ordered, as your set is listed, then the most (compute) efficient way to do that would be to grab the next element from the list and add it to the group until the next value would take you over. a couple nested loops should do it. if you wanted to add packing efficiency as well, just have it try list.next(), list.next().next(), and list.next().next().next(), so you bound any run-away searches. should be pretty close to O(n)

2/17/2015 8:29:13 PM

neolithic
All American
706 Posts
user info
edit post

It probably depends on how much you care about getting close to an optimal solution. A solution like the one outlined in ^ would probably be fine if you want a quick solution.

I think you would probably do a little better in terms of average distance from your upper bound if you first sorted your item size list. Create a priority queue of size 6000/60 = 100 and start adding the biggest items first to this priority queue. The PQ should be sorted on the basis of total item size for each element in the queue. Add the next biggest item from the list to the top of the PQ (i.e. the element with the smallest total size) and repeat. If an element hits a size of 6000 exactly, remove it from the PQ.

If you posted this to http://stackoverflow.com, I bet you would get some answers that would be nearly optimal within an hour.

2/17/2015 8:52:49 PM

moron
All American
34183 Posts
user info
edit post

Visualize your numbers as a distribution. You have 33 buckets to pluck from.

To keep things even, you have to drain the buckets by pulling from pairs in the nth bucket and n+33/2 bucket .

You'll have to calculate a distribution for your values though to use this (which shouldn't be hard really).

2/17/2015 9:19:10 PM

AntecK7
All American
7755 Posts
user info
edit post

does need to be optimal for allocation however, compute/caluation optimization isn't really important.

I don't want to really brute force it... Also my code has to be in powershell.

Right now this is my current plan

Since I know that the average Value of the group=100

Create a group, get the first Item from the list add that to the group

find a value that has not been assigned to a group such that (Group+New Value) is as close to 100 as possible without putting the sum(group)>6000

Add that value to the group, continue till the group has 60 items in it or the sum of the group>6000

then create a new group, and grab the first free value.

Again, probably horrific from a compute standpoint, but I think it will get me near where i want to be.

on a simplified set of where say we wanted 5 items per group with a sum of 500, using the data i posted earlier
Group 1 would be something like
109,86,114,77,117

Group 2 would be something like
54,146,70,117, et cetra

2/17/2015 9:20:27 PM

LimpyNuts
All American
16859 Posts
user info
edit post

This looks like a variation of the "k-partition problem" which is well-studied. I did a quick googling and found "The Easiest Hard Problem: Number Partitioning" by Stephan Mertens which presents some methods for partitioning a set into 2 subsets. You might try looking in math or computer science journals.

2/18/2015 12:48:36 PM

AntecK7
All American
7755 Posts
user info
edit post

^ thanks for the tip.

That makes sense,

Looking at solutions for the "Balanced Partition Problem"

which i can recursively call on the set breaking them into 2 sets of about equal size and count, then keep subdividing those sets.

[Edited on February 18, 2015 at 3:11 PM. Reason : dd]

2/18/2015 3:04:30 PM

Novicane
All American
15416 Posts
user info
edit post

sounds like a data structures class

i've seen this before but my mind isn't functioning right now.

2/18/2015 4:33:25 PM

AntecK7
All American
7755 Posts
user info
edit post

so i think i got the follwoing powershell code working for the most part


$myArray = 1,6,2,4,5,9,1
$ArraySize=$myarray.count

$Sum=($myArray|Measure-Object -Sum).sum

$sol=new-object bool[] ([math]::Round($sum/2+1,[System.MidpointRounding]::AwayFromZero))
$sol[0]=$true

#for ([int]$i=0;$i -lt $ArraySize;$i++){
foreach($i in $myArray){
for ([int]$j=$sum/2;$j -ge $i;$j--){
##-Host IValue:$i
#Write-Host JValue:$j
if ($sol[$j-$i]){
$sol[$j]=$true
} #End If

}#End J Loop

}#End I Loop

[int]$HalfSumCloser=([math]::Round($sum/2,[System.MidpointRounding]::AwayFromZero))

For(;($sol[$HalfSumCloser]) -ne $true{
$HalfSumCloser--
}

(($sum - $halfsumcloser) - $halfsumcloser)


but that tells me if i can do it, not how to actually do it.

2/18/2015 5:21:03 PM

AntecK7
All American
7755 Posts
user info
edit post

trying to implement this

http://people.cs.clemson.edu/~bcdean/dp_practice/dp_4.swf

Yea.... i haven't done this type of programming problem in 12 years... if ever.

2/18/2015 6:17:37 PM

AntecK7
All American
7755 Posts
user info
edit post

$myarray=@()
$myarray+=120
for($i=0;$i -lt 10;$i++){
$myarray+=Get-Random -Minimum 80 -Maximum 150

}
$best=($myarray|Measure-Object -Sum).Sum
$target=$best/2
write-host Starting work arraysum $best
$bests=@()

$p=@{}
$p[-1]=@{0=@()}
for ([int]$i=0;$i -lt $myArray.Length;$i++){
$x=$myArray[$i]
$p[$i]=@{}
foreach ($j in $p[$i-1].Keys){
$p[$i][$j]=$p[$i-1][$j]
$p[$i][$j+$x]=$p[$i-1][$j]+@($x)
#$p
if([math]::Abs($j+$x-$target) -lt [math]::Abs($best-$target) -and ($p[$i][$j+$x]).count -eq [int]($myArray.count/2)){
$best=$j+$x
#write-host Best $best
#write-host besti $i $j $x
$bests=$p[$i][$j+$x]

}
}
}

Write-Host Solution error: ($best-$target)
write-host Original Array ($myarray|Sort-Object)
write-Host Group1 (($bests|Sort-Object) -join ', ')
Write-Host group1 sum ($best|measure-object -Sum).Sum

[System.Collections.ArrayList]$group2list=$myArray
foreach($value in $bests){
$group2list.Remove($value)
}

Write-Host
write-Host Group2 (($group2list|Sort-Object) -join ', ')
Write-Host group2 sum ($group2list|measure-object -Sum).Sum






[Edited on February 18, 2015 at 11:59 PM. Reason : BUGS!]

2/18/2015 11:51:28 PM

AntecK7
All American
7755 Posts
user info
edit post

one note on this,

on a set of 1000 buckets this thing DRINKS memory

Basically re-wrote this python code in ps

https://bj0z.wordpress.com/2011/03/07/the-balanced-partition-problem/

[Edited on February 19, 2015 at 12:22 AM. Reason : phython]

2/19/2015 12:20:43 AM

AntecK7
All American
7755 Posts
user info
edit post

okay so screw those bastards

here is how i'm doing it, seems 1000 times faster, although it might not be 100% accurate

$myarray=@()
for($i=0;$i -lt 1000;$i++){
$myarray+=Get-Random -Minimum 80 -Maximum 150

}


[System.Collections.ArrayList]$Set1=@()
[System.Collections.ArrayList]$Set2=@()

for([int]$i=0;$i -lt $myArray.Count/2;$i++){
$set1.Add($myArray[$i])
}
for([int]$i=$myArray.Count/2;$i -lt $myArray.Count;$i++){
$set2.Add($myArray[$i])
}

$s1sum=($set1|Measure-Object -Sum).Sum
$s2sum=($set2|Measure-Object -Sum).Sum

$setdifference=$s1sum-$s2sum

for($i=0;$i -lt $set1.Count;$i++){
if($setdifference -eq 0){
break}
$setdifference

$s1item=$set1[$i]

for($J=0;$j -lt $set2.Count;$j++){
$s2item=$set2[$j]

if([math]::Abs((($s1sum-$s1item+$s2item)-($s2sum-$s2item+$s1item))) -lt [math]::abs($setdifference)){
#write-host Change
$set1.Remove($s1item)
$set1.Add($s2item)
$set2.remove($s2item)
$set2.Add($s1item)
$s1sum=($set1|Measure-Object -Sum).Sum
$s2sum=($set2|Measure-Object -Sum).Sum
$setdifference=$s1sum-$s2sum
$i=-1
break

}}


}

write-host TotalSum ($myarray|Measure-Object -sum).sum
write-host StartArray (($myarray|Sort-Object) -join ', ')

write-host Set1Sum ($set1|Measure-Object -sum).sum
write-host Set1 (($set1|Sort-Object) -join ', ')


write-host Set2Sum ($set2|Measure-Object -sum).sum
write-host Set2 (($set2|Sort-Object) -join ', ')

2/19/2015 1:47:43 AM

409Sea
New Recruit
19 Posts
user info
edit post

You said you want groups of 60 close to 6000, this implies the mean of all your values is more or less 100. Is this for certain? With a mean reasonably far away from 100 I think you'd end up with a lot of values that you couldn't get to add up to 6000 in groups of 60.

2/20/2015 6:09:21 PM

AntecK7
All American
7755 Posts
user info
edit post

^just an example, right now values are between 80 and 300, with a mode around 120.


Actually trying to load balance file servers to datacenter DFS servers

2/21/2015 12:38:29 AM

 Message Boards » Tech Talk » Algorithm Question Page [1]  
go to top | |
Admin Options : move topic | lock topic

© 2024 by The Wolf Web - All Rights Reserved.
The material located at this site is not endorsed, sponsored or provided by or on behalf of North Carolina State University.
Powered by CrazyWeb v2.39 - our disclaimer.