One of the problems we originally encountered with XDCs was that a user could easily saturate a portal server’s resources. This created a situation where XDCs colocated with a resource hog could not perform work due to lack of resources.
The solution was to introduce resource limits. However, the limits have come with their own issues:
- A container is not aware of its resource limits from the inside, so software can easily be OOM-killed with no real way to self-police.
- Running Ansible playbooks is one of the primary things users do on XDCs, and a playbook configuring an inventory of dozens or hundreds of nodes can easily consume significant amounts of memory through forking. Forking is often required to get reasonable performance from a playbook.
This thread is to discuss potential solutions.
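As one possible starting point, here is a rough sketch of how a tool inside an XDC might discover its memory limit and pick an Ansible fork count to match. It assumes the container runtime exposes the cgroup filesystem at the usual paths, which may not hold for XDCs. The helper names, the 150 MiB per-fork estimate, and the 512 MiB reserve are all hypothetical placeholders, not measured values.

```python
from pathlib import Path
from typing import Optional

# Usual cgroup mount points. Whether these are visible inside an XDC is an assumption.
CGROUP_V2_LIMIT = Path("/sys/fs/cgroup/memory.max")
CGROUP_V1_LIMIT = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")

# cgroup v1 reports "no limit" as a very large number; treat such values as unlimited.
UNLIMITED_THRESHOLD = 1 << 60


def parse_limit(raw: str) -> Optional[int]:
    """Return the memory limit in bytes, or None if the cgroup is unlimited."""
    value = raw.strip()
    if value == "max":  # cgroup v2 spelling of "no limit"
        return None
    limit = int(value)
    return None if limit >= UNLIMITED_THRESHOLD else limit


def read_memory_limit() -> Optional[int]:
    """Try cgroup v2, then v1. Return None if no limit can be determined."""
    for path in (CGROUP_V2_LIMIT, CGROUP_V1_LIMIT):
        try:
            return parse_limit(path.read_text())
        except (OSError, ValueError):
            continue
    return None


def suggest_forks(
    limit_bytes: Optional[int],
    per_fork_bytes: int = 150 * 1024**2,  # hypothetical per-fork memory estimate
    reserve_bytes: int = 512 * 1024**2,   # hypothetical headroom for everything else
    max_forks: int = 50,
) -> int:
    """Pick a fork count that should fit under the limit, always at least 1."""
    if limit_bytes is None:
        return max_forks
    usable = max(limit_bytes - reserve_bytes, 0)
    return max(1, min(max_forks, usable // per_fork_bytes))


if __name__ == "__main__":
    limit = read_memory_limit()
    print(f"memory limit: {limit}, suggested forks: {suggest_forks(limit)}")
```

The result could be passed to `ansible-playbook --forks N` or set through the `ANSIBLE_FORKS` environment variable. This only helps if the limit is readable from inside the container, and the per-fork estimate would need to be measured against real playbooks before anyone relied on it.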