Differential Materialization

ry · April 19, 2020, 2:57pm

Introduction

The MergeTB model of an experiment is an immutable stack of experiment versions. When a user wants to update some aspect of their experiment topology, they do so by updating the experiment source code and pushing a new version. During this push, model checkers and reticulators are run, yielding a new underlying XIR experiment definition.

At this point the experimenter can create a new realization from this new version, materialize it, and run a side-by-side comparison of the old version with the new version. This can be quite useful. However, there is an alternative use case that we don’t currently support terribly well.

What if the reason for the change is to correct something with the experiment that is just straight up wrong? In this case we don’t want a second materialization for comparative analysis; we just want to modify the experiment in place. Furthermore, we’d like the modification of the running materialization to be exactly the delta between the broken version and our new, possibly not-broken version. This minimal change delta is critically important for experiments that have complex systems already in place and would like to keep those systems in place, altering only the nodes or topology in a specific way that lets the user pick up right where they left off.

Benefit

The core benefit here is that we introduce a workflow that allows experiments to be designed in a rapid, incremental fashion that directly leverages the rigor of having an immutable experiment history. Complex experiments are never correct on the first shot. Forcing the user into full rematerialization for every topology update is in direct violation of the speed core value. At the same time, introducing ad-hoc realization or materialization updates that are not rooted in the model source violates experiment integrity, as the experiment becomes a living element without a record of structure or procedure. This approach allows users to have their reproducible cake and devour it with speed.

Proposal

Introduce a new MergeAPI method

mergetb evolve <exp> <rlz> <to>

Here exp is the name of the experiment, rlz is the name of a realization, and to is the name (or hash) of a source revision that the user has pushed and would like the realization, and subsequently the materialization, to evolve to.

When this API call is made, the following things happen:

1. A new realization is attempted; if it fails, stop here.
2. The current realization is replaced by the new realization.
3. If there is an active materialization, a materialization evolution plan is computed.
4. The materialization evolution plan is carried out.
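
The control flow of these steps can be sketched roughly as follows. This is only an illustration, not MergeTB code: the `Experiment` class and the `realize`, `plan`, and `execute` callables are hypothetical stand-ins for whatever the portal actually uses, and the sketch leaves out the accept/reject step described under Requirements.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    """Hypothetical stand-in for the state that `evolve` touches."""
    realization: dict           # current realization (opaque in this sketch)
    materialized: bool = False  # True if there is an active materialization


def evolve(exp, realize, plan, execute, revision):
    """Evolve `exp` to `revision`; return True if the evolution went ahead.

    realize(revision) -> new realization, or None if realization fails
    plan(old, new)    -> list of tasks making up the evolution plan
    execute(tasks)    -> carries the plan out
    All three callables are assumptions made for this sketch.
    """
    # Step 1: attempt a new realization; on failure stop, changing nothing.
    new_rlz = realize(revision)
    if new_rlz is None:
        return False

    # Step 2: replace the current realization with the new one.
    old_rlz = exp.realization
    exp.realization = new_rlz

    # Steps 3 and 4: only an active materialization needs an evolution plan.
    if exp.materialized:
        execute(plan(old_rlz, new_rlz))
    return True
```

The point of the ordering is that a failed realization leaves both the existing realization and the running materialization untouched.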

Requirements

Evolution plans must represent the minimum delta needed to achieve the target specification.
- Nodes not involved in the delta must not be disturbed
- Links not involved in the delta must not be disturbed
- Changes to node configuration should only result in a foundry update. Rebooting or re-imaging a node is not a change to the experiment and can be carried out using alternative commands. The exception is a change that requires these services, such as changing the image of a node, which clearly requires a re-image/reboot.
Evolutions should behave like partial materializations
- mergetb status should show the task status of the evolution.
- mergetb wait should wait until the evolution is done.
- mergetb demat|mat|free should be gated on evolution just like materializations
XDC connections should not be disturbed
Any resources dropped from the realization due to the evolution should be freed and returned to the pool.
Any resources added to the realization due to the evolution should be allocated to the realization.
Evolution should have the same semantics as realization in terms of acceptance. When the user creates an evolution, they should be able to inspect the modified realization that has been computed and decide if that is what they want and choose to accept or reject the evolution. The freeing/allocation of resources in this regard behaves in the same way as normal realizations, except the resources in question are strictly in the delta between the originating realization and the new realization.
When presenting an evolution for acceptance, the API should provide the list of tasks that will be carried out, so the user knows exactly what is going to happen and there are no surprises.
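
To make the minimum-delta requirement concrete, here is a small sketch of how a task list could be derived from two realizations. Everything in it is an assumption for illustration: a realization is modeled as a plain dict from node name to a config dict, and the task names (`free`, `allocate`, `reimage`, `foundry-update`) are made up rather than taken from the MergeTB API. Links are left out, but would presumably follow the same pattern.

```python
def evolution_plan(old, new):
    """Return the list of (action, node) tasks that turns `old` into `new`.

    `old` and `new` map node name -> config dict (a hypothetical model).
    Nodes whose config is identical in both produce no task at all, so
    they are never disturbed.
    """
    tasks = []

    # Resources dropped from the realization are freed back to the pool.
    for name in sorted(old.keys() - new.keys()):
        tasks.append(("free", name))

    # Resources added to the realization are allocated.
    for name in sorted(new.keys() - old.keys()):
        tasks.append(("allocate", name))

    # Nodes present in both: act only if the config actually changed.
    for name in sorted(old.keys() & new.keys()):
        if old[name] == new[name]:
            continue  # not part of the delta
        if old[name].get("image") != new[name].get("image"):
            # An image change is the one case that needs a re-image/reboot.
            tasks.append(("reimage", name))
        else:
            # Any other config change is just a foundry update.
            tasks.append(("foundry-update", name))

    return tasks


# Example: node "a" gains a mount, "b" is unchanged, "c" is dropped, "d" is new.
old = {
    "a": {"image": "img1", "mounts": []},
    "b": {"image": "img1", "mounts": []},
    "c": {"image": "img1", "mounts": []},
}
new = {
    "a": {"image": "img1", "mounts": ["/data"]},
    "b": {"image": "img1", "mounts": []},
    "d": {"image": "img1", "mounts": []},
}
plan = evolution_plan(old, new)
# plan == [("free", "c"), ("allocate", "d"), ("foundry-update", "a")]
```

A list like `plan` is also the sort of thing that could be shown to the user when the evolution is presented for acceptance, since node "b" does not appear in it at all.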

ry · August 18, 2020, 9:17pm

Use Cases

Things that should not, but currently do, require a full re-realize/re-mat:

Mounting storage when you forgot to add the mount to the model.
Changing link parameters, or adding tags for dynamic link parameters.