Host Node Randomization Ideas

While implementing the hostname constraint, I’d like to also implement host randomization options in the python xir model.

This is useful from an experimental point of view, and from an operational point of view it would distribute wear and tear more evenly across the nodes.

I’m thinking of something like passing a list of metrics to sort by.
Here are some example metrics (a rough sketch of what they could look like follows the list):

  • Cost
  • Hostname
  • PercentAllocated
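
As that sketch, here is one way the metrics could be expressed; none of this is existing xir API, and the attribute names (cost, hostname, percent_allocated) are assumptions made up for illustration. Each metric is just a comparator over two candidate hosts:

# Hypothetical sketch only; the node attribute names are assumptions, not real xir fields.
LESSTHAN, EQUAL, GREATERTHAN = -1, 0, 1

def _cmp(a, b):
  # generic three-way comparison helper
  return LESSTHAN if a < b else GREATERTHAN if a > b else EQUAL

def LowestCost(node_a, node_b):
  return _cmp(node_a.cost, node_b.cost)

def LowestHostname(node_a, node_b):
  return _cmp(node_a.hostname, node_b.hostname)

def LowestPercentAllocated(node_a, node_b):
  return _cmp(node_a.percent_allocated, node_b.percent_allocated)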

It’s also possible to break the cost down further; currently it is the aggregate of the CPU, disk, and network capacity in the node’s specs (i.e. it does not consider what is already allocated on the node).
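
For illustration, assuming per-resource cost fields on the node spec (field names invented here), the aggregate might be computed as:

def node_cost(node):
  # sum of the node's raw CPU, disk, and network spec costs;
  # deliberately ignores whatever is already allocated on the node
  return node.spec.cpu_cost + node.spec.disk_cost + node.spec.net_cost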

So, if you wanted the current algorithm (as the unit tests would want), you’d pass something along the lines of:
net = Network("test", hostsorting=[LowestCost, LowestHostname])

where hosts are sorted first by lowest cost and, when their costs are equal, by hostname, leading to a not very random selection.

Later on, the realization algorithm iterates through this sorted list of hosts and tries provisioning each guest onto each host until it succeeds (meaning the host has enough raw resources to realize the guest on).
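
Roughly, that loop could be sketched as follows (has_capacity_for and provision are hypothetical helpers for illustration, not the actual realization code):

def place_guests(guests, hosts):
  # hosts is already sorted according to the hostsorting metrics
  placement = {}
  for guest in guests:
    for host in hosts:
      # succeeds only if the host still has enough raw resources
      # to realize this guest on it
      if host.has_capacity_for(guest):
        host.provision(guest)
        placement[guest.name] = host.hostname
        break
    else:
      raise RuntimeError(f"no host can fit guest {guest.name}")
  return placement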

If we allow for selections other than that, and you wanted randomization among just the lowest-cost nodes, you’d pass:
hostsorting=[LowestCost]
This should probably be the default when the list is None (which is different from the empty list); see the constructor sketch after these examples.

If you wanted to place them anywhere at all, you’d pass:
hostsorting=[]
(Whether we allow this is a separate consideration)

If you wanted to “try” to place as much of it on as few nodes as possible, you could pass:
hostsorting=[LowestCost, LowestPercentAllocated]
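
Tying those examples together, a minimal sketch of how the constructor might treat None vs. the empty list (parameter and class layout as proposed above, not the real implementation):

class Network:
  # minimal sketch, not the real constructor
  def __init__(self, name, hostsorting=None):
    # None means "use the default ordering"; an explicit empty list
    # would mean "no metrics at all", i.e. purely random placement.
    self.name = name
    self.hostsorting = [LowestCost] if hostsorting is None else hostsorting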

The backend algorithm is pretty simple:

# Each node will have a random unique number assigned to it as node.index
# metrics is the hostsorting list supplied by the user

def compare(node_a, node_b):
  for metric in metrics:
    # metric(node_a, node_b) can return:
    #   - EQUAL
    #   - LESSTHAN
    #   - GREATERTHAN
    result = metric(node_a, node_b)
    if result != EQUAL:
      return result

  # tie-break on the random index so otherwise-equal hosts come out in a random order
  return LESSTHAN if node_a.index < node_b.index else GREATERTHAN
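
As a usage note, if LESSTHAN/EQUAL/GREATERTHAN are the integers -1/0/1 as in the earlier sketch, this three-way compare plugs straight into Python’s sorting via functools.cmp_to_key:

from functools import cmp_to_key

# 'hosts' is the list of candidate nodes; 'metrics' is the hostsorting list
sorted_hosts = sorted(hosts, key=cmp_to_key(compare))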

Is this overly complicated on the user end? As in, does this expose too much of the internals? Should we just have the user pass a single enum value instead, where each option maps onto one of these lists?
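
If we went the single-enum route, it could just map onto these lists internally; something like the following (all names invented for illustration):

from enum import Enum, auto

class HostSelection(Enum):
  CHEAPEST_DETERMINISTIC = auto()  # current behaviour
  CHEAPEST_RANDOM = auto()
  PACK_TIGHTLY = auto()
  ANYWHERE = auto()

# internal mapping from the user-facing enum onto the metric lists
SELECTION_METRICS = {
  HostSelection.CHEAPEST_DETERMINISTIC: [LowestCost, LowestHostname],
  HostSelection.CHEAPEST_RANDOM: [LowestCost],
  HostSelection.PACK_TIGHTLY: [LowestCost, LowestPercentAllocated],
  HostSelection.ANYWHERE: [],
}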

Should LowestCost be implicitly the first element of each list? In other words, should that always be the dominant metric to sort by?

We will probably need a new uint64list constraint object to do this?

With the hostname constraint, you’re able to reproduce any specific realization, at least in terms of node selection, for debugging purposes.

Although, it might be easier to pass something like a guestname → hostname dictionary to the net initialization, and to have the CLI be capable of producing such a dictionary.
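
For example, such a dictionary might look like this (the guestmap parameter name and the guest/host names are made up for illustration):

# hypothetical: pin guests to the hosts a previous realization chose;
# the CLI could emit this mapping from an existing realization
pins = {
  "guest0": "host03",
  "guest1": "host07",
}
net = Network("test", guestmap=pins)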

tl;dr:
i think percent allocated is probably the only thing necessary, given some additional realization request capabilities. if you want to wear level, counters of some kind are more useful than percent allocated.


i don’t run experiments, really, so take what i say with a big fat grain of salt, but to me it seems like we are relying on users to be knowledgeable about the technology’s internals when they really just want their constraints to offer them (potentially) reproducibility, for whatever reason.

If a user cares about reproducibility, is there a way they can reference a previous realization instead of adding constraints to hostnames? that is, could they say “please try to re-realize a new realization the same as this old one”?
also, would a user even be aware of hostnames on a realization unless they’re specifically looking for them?

re: wear leveling, is it possible to introduce counters for baremetal and virtual materializations on a node so you can cost by number of materializations rather than by “location”? the only real “wear” happening on these machines, from what i understand, is going to be on the disks, so if you really want to wear level for disk you might need something like the prometheus stat for bytes written, if it exists. if you are talking about distributing bandwidth utilization, percent allocated may be the way to go, but in terms of “wear” you’re going to run into the same issue: wear means a counter rather than an “in use” flag.
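
If such counters existed, they could slot into the same metric scheme as above; an entirely hypothetical sketch using the _cmp helper from earlier (the materialization_count field does not exist today):

def FewestMaterializations(node_a, node_b):
  # assumes some per-node materialization counter were maintained
  return _cmp(node_a.materialization_count, node_b.materialization_count)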

re: affinity/anti-affinity/spacing for vms – this might be difficult without a clear picture of actual network costs. students, for example, don’t usually care about nanoseconds of difference in link latency, nor do they really care about the underlying network structure as long as “node a <-> node b” is a functional link within the speed constraint they have (100mbps/1gbps), at least from my experience on deterlab. i think this is probably not necessary to consider if it does not affect the constraints given by the experimenter; however, it would make sense to me that an “advanced level” experimenter might want to set up some rules for this if they see artifacts from the underlying infrastructure showing up in their experiment results. i think we may need a clearer picture of how the underlying infra affects experiments to really optimize merge allocations, but in the meantime, perhaps your solution is the most elegant for advanced users.