Refactoring current GitLab practices and tooling in light of new Gold subscription

isi-pchandler · May 7, 2020, 5:28pm

Background

Now that @ry has acquired a “Gold” license for MergeTB’s hosted GitLab, we need to revisit our approach to leveraging GitLab for MergeTB.

While we did get the Gold license for free because we are open source and “educational”, it does have membership quotas:

Only 20 non-guest members are allowed with our current license agreement
Guest members are not metered

Note that in this context the following terms apply:

| Term | Meaning |
|---|---|
| `member` | A logged-in user who has been added as a team or project member for a team or project within the “MergeTB” GitLab team. |
| `guest member` | A `member` who has been added to a MergeTB team or project in the GitLab `guest` role. |
| `non-guest member` | A `member` who has been added to a MergeTB team or project in a role with higher permissions than `guest`, e.g. `developer`. |
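To make the quota arithmetic concrete, here is a minimal Python sketch that counts metered seats from a list of membership records. It assumes records shaped like those returned by GitLab's members API (a `username` plus a numeric `access_level`, where 10 is Guest and anything higher is a non-guest role). It also assumes that a user counts once against the quota no matter how many teams or projects they belong to. The function names are hypothetical.

```python
from typing import Any, Iterable, Mapping, Set

GUEST = 10   # GitLab's numeric access level for the Guest role
QUOTA = 20   # non-guest seats allowed by our Gold license

def non_guest_members(memberships: Iterable[Mapping[str, Any]]) -> Set[str]:
    """Distinct usernames holding any role above Guest in any team or project.

    Assumption: a user counts once against the quota, even if they are a
    member of several teams/projects or hold different roles in each.
    """
    return {
        str(m["username"])
        for m in memberships
        if int(m["access_level"]) > GUEST
    }

def seats_remaining(memberships: Iterable[Mapping[str, Any]]) -> int:
    """How many more non-guest members the quota allows (negative if over)."""
    return QUOTA - len(non_guest_members(memberships))
```

Under this sketch, a user who is `guest` in one project and `developer` in another counts as one non-guest member. In practice the records could come from `GET /groups/:id/members/all`, which includes inherited members; pagination is ignored here.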

Further background reading:

GitLab roles versus permissions

Agenda

Things to figure out.

This is a living document, so feel free to add to this wishlist.

Git workflow choices

Do we stick with the current workflow, which is essentially a “Feature branch workflow”?

Or do we move to a “Forking workflow (aka fork&pull)”?

Aside from questions of how to handle our own “source” material, we should also consider the concerns raised in the “Keeping up with upstream projects” topic.

GitLab pipeline choices

Multi-project pipelining

One of the reasons the Gold license was requested was to get better support for multi-project CI/CD pipelines.

Are there any existing projects which could benefit from this?

Merge trains

Are there any existing projects which could benefit from this?

Project management

Group or team level issues

Currently, issues are only maintained at the individual project level.

There is a need for cross-project issues for concerns that cross individual project scopes.

Are there any GitLab features in Gold that would help with this? And how do we want to use them?

isi-pchandler · May 7, 2020, 5:29pm

Kicking this off based on a conversation in the team’s Mattermost.

isi-pchandler · May 7, 2020, 5:41pm

Here are my initial two cents on the git workflow question.

A lot of it hinges on mitigating the risk of secrets leakage and on securing/isolating GitLab runners, and on how much risk we are willing to accept.

Fork and pull friction points

Problem statement

This elaborates on what @ry and @lincoln have told me in the past, and on my own exposure to this workflow on earlier projects.

Pipelines in forked projects (e.g. for merge requests) do not run in the parent project’s “space”
- Full background here
- The gist is that if the parent project’s GitLab CI config needs secrets or other goodies to conduct build or delivery/deployment steps, then those have to be copied into each forked project’s settings, either manually or via Terraform (or similar) recipes
- And those secrets are under the control of the forked project owner, not the main team
GitLab runner permissions
- Aside from secrets, in MergeTB’s scenario with custom ‘privileged’ runners, the material in forked projects would be run in privileged mode on the custom runners whenever a forked project created a merge request.
- This would mean that guest-level permissions would be enough to run arbitrary code on our GitLab runners.

So “guests” in this world have a lot of access, and yet it isn’t trivial for them to configure their forked projects to participate properly in MergeTB CI/CD.

Possible solutions / mitigations

Some ways to mitigate these friction points might be to:

For external services (outside of GitLab) that GitLab jobs need to log into
- Use service accounts with distinct privileges, some read-only and some read-write
- Only run the highest-risk jobs in response to tag or merge request events that happen solely within the parent projects, not the forked projects.
For custom GitLab runners
- Use two different GitLab runner populations on farms that are isolated from each other
- One farm runs only jobs triggered by events in the parent project “space”; the other is a sort of DMZ for jobs triggered by events in forked project “spaces”
To reduce the configuration burden on forked projects, try the following:
- At a minimum, clearly document the manual steps necessary to configure forked projects
- Then see if tooling can be arranged to automate as much of that as possible
  - See if we can provide Python scripts (or similar) that hit the GitLab REST APIs on behalf of the forked project owner and properly configure their projects.
  - Or require that forked projects grant access to a MergeTB-owned service account, which can then be used to configure their projects for them via Terraform.
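As a rough illustration of the “Python scripts that hit the GitLab REST APIs” idea, the sketch below builds (but by default does not send) the requests that would copy a set of CI/CD variables into a fork via `POST /api/v4/projects/:id/variables`, authenticating with a `PRIVATE-TOKEN` header. The base URL, project path, variable names, and the choice to mark every variable protected are assumptions for illustration, not an agreed MergeTB convention.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

def project_api_url(base_url: str, project_path: str) -> str:
    """Project endpoint; GitLab accepts a URL-encoded "namespace/project"
    path wherever it expects a numeric project id."""
    encoded = urllib.parse.quote(project_path, safe="")
    return f"{base_url.rstrip('/')}/api/v4/projects/{encoded}"

def build_variable_requests(base_url: str, project_path: str, token: str,
                            variables: Dict[str, str],
                            masked: bool = True) -> List[urllib.request.Request]:
    """One POST /projects/:id/variables request per variable, in key order.

    Assumption: every variable is marked protected, so it is only exposed
    to pipelines on the fork's protected branches and tags.
    """
    url = project_api_url(base_url, project_path) + "/variables"
    requests = []
    for key, value in sorted(variables.items()):
        body = urllib.parse.urlencode({
            "key": key,
            "value": value,
            "protected": "true",
            "masked": "true" if masked else "false",
        }).encode()
        requests.append(urllib.request.Request(
            url, data=body, method="POST", headers={"PRIVATE-TOKEN": token}))
    return requests

def send(requests: List[urllib.request.Request], dry_run: bool = True) -> None:
    """Print what would be sent; only hit the API when dry_run is False."""
    for req in requests:
        if dry_run:
            print(req.get_method(), req.full_url)
            continue
        with urllib.request.urlopen(req) as resp:
            created = json.load(resp)
            print("created", created["key"])
```

GitLab rejects a create request for a key that already exists, so a real tool would need to fall back to `PUT /projects/:id/variables/:key` for updates. Masked values also have to meet GitLab's masking requirements (for example, a minimum length of 8 characters).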

isi-pchandler · May 8, 2020, 2:00am

FYI

I am prototyping some of the fork+pull mitigation ideas, along with tooling to support them.

Where is the “beef”?

Currently the prototypes reside in a GitLab repo: isi-pchandler/mergetb-devops-gl-config-automation. Logged-in MergeTB team members should be able to view it.

I am running them against a Community Edition GitLab instance running on my local Minikube, deployed via the official GitLab Helm charts.

Goals

The two main goals for the prototyping are:

Prototyping tools used to curate common config settings for the groups and projects within the MergeTB GitLab team
Prototyping tools that owners of forked projects could use to configure their forks to align with common MergeTB parent project config expectations

Status

Currently I’ve got a crude Terraform recipe going that enforces local culture defaults for:

groups
memberships in groups for enumerated members
project settings for
- push/merge norms of behavior
- protected branches and tags (as well as default permission levels for them)

Current takeaways

For curating “parent” team entities:

I’m only a Terraform novice, but its recipes seem more capable than those in the Ansible GitLab modules.
The downside is that running “terraform destroy” seems to nuke anything managed by Terraform in my current recipes. I haven’t figured out the right way to mark certain things as “make sure this exists, but never destroy it”. Probably my own failure to understand Terraform well enough …

For curating “forked project” entities:

While Terraform seems to have a good set of capabilities for managing a forked project, I worry about the learning curve for forked project owners. It is one thing for the team’s devops-ish folks to maintain Terraform recipes centrally, and quite another to offer Terraform recipes as a “self service” tool for forked project owners.

Next Steps

Implement strawman pipelines that explore:

Gating secrets propagation from parent projects to forked projects
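One way the gating could look in a strawman pipeline: a small Python check, run as the first step of any job that needs secrets or a privileged runner. It uses GitLab's predefined CI variables (`CI_PROJECT_PATH`, `CI_COMMIT_TAG`, `CI_PIPELINE_SOURCE`, `CI_MERGE_REQUEST_SOURCE_PROJECT_PATH`) to decide whether the triggering event happened entirely within the parent project, matching the “tag or merge request events solely within the parent projects” mitigation. The parent path `mergetb/example` and the function names are hypothetical. This would only be defense in depth: the real isolation still has to come from keeping secrets and privileged runners out of forks’ reach.

```python
import sys
from typing import Mapping

def is_trusted_pipeline(env: Mapping[str, str], parent_project: str) -> bool:
    """True only when the triggering event happened entirely inside the
    parent project: a tag pipeline, or a merge request whose source branch
    also lives in the parent project."""
    if env.get("CI_PROJECT_PATH") != parent_project:
        return False  # pipeline is running in a fork's "space"
    if env.get("CI_COMMIT_TAG"):
        return True   # tag created in the parent project
    if env.get("CI_PIPELINE_SOURCE") == "merge_request_event":
        # MR opened from a branch of the parent project, not from a fork
        return env.get("CI_MERGE_REQUEST_SOURCE_PROJECT_PATH") == parent_project
    return False      # branch pushes, schedules, etc. are not trusted here

def require_trusted(env: Mapping[str, str], parent_project: str) -> None:
    """Abort the job with a non-zero exit before any secret is touched."""
    if not is_trusted_pipeline(env, parent_project):
        sys.exit(f"refusing to run: untrusted pipeline for {parent_project}")
```

A privileged job could call `require_trusted(os.environ, "mergetb/example")` from a hypothetical `gate.py` helper as its first script step, so an untrusted event fails fast instead of reaching the deployment steps.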

isi-pchandler · May 8, 2020, 5:49pm

Did some further reading on Terraform. I can confirm that its world view is that if you run ‘terraform destroy’, you really intend to destroy the managed resources.

So hypothetically, if the GitLab projects are enumerated as ‘data’ rather than ‘resources’ in the Terraform world view, then Terraform recipes could apply tag/branch protection policies and similar settings to them, with no risk of destroying the projects themselves. But many policy attributes can only be set on a “gitlab project resource”, so one couldn’t manage those attributes without making the underlying GitLab projects subject to destruction by Terraform if an operator makes a mistake.

I’m going to look at using Terraform only to stand up test fixtures, and then try out the GitLab API for applying policies in a less risky manner, using those test fixtures as guinea pigs.
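A sketch of what the less risky, API-based approach might look like: compare a project’s existing protected branches (as returned by `GET /projects/:id/protected_branches`) against a desired policy and plan only additive changes, so nothing can be destroyed by mistake. The policy contents and function name below are hypothetical; the numeric access levels (0 = no one, 30 = developer, 40 = maintainer) are GitLab’s own.

```python
from typing import Any, Dict, List, Tuple

# GitLab's numeric access levels for protected branches.
NO_ONE = 0
DEVELOPER = 30
MAINTAINER = 40

# Hypothetical "local culture" policy: branch name (wildcards allowed)
# -> (push_access_level, merge_access_level).
POLICY: Dict[str, Tuple[int, int]] = {
    "master": (MAINTAINER, DEVELOPER),
    "release-*": (NO_ONE, MAINTAINER),
}

def plan_protections(existing: List[Dict[str, Any]],
                     policy: Dict[str, Tuple[int, int]]
                     ) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Compare existing protected branches with the policy.

    Returns (to_create, drifted):
      to_create -- bodies for POST /projects/:id/protected_branches
      drifted   -- branches already protected with different levels,
                   left for a human to reconcile
    Nothing is ever scheduled for deletion.
    """
    current = {}
    for pb in existing:
        push = frozenset(a["access_level"] for a in pb.get("push_access_levels", []))
        merge = frozenset(a["access_level"] for a in pb.get("merge_access_levels", []))
        current[pb["name"]] = (push, merge)

    to_create, drifted = [], []
    for name, (push, merge) in sorted(policy.items()):
        if name not in current:
            to_create.append({"name": name,
                              "push_access_level": push,
                              "merge_access_level": merge})
        elif current[name] != (frozenset([push]), frozenset([merge])):
            drifted.append(name)
    return to_create, drifted
```

Keeping drift as a report rather than an automatic fix is deliberate in this sketch: correcting levels on an already protected branch would mean touching a live protection, which is exactly the kind of operator-error risk the Terraform experiment surfaced.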

I’m also moving this prototyping into my CI/CD PoC project in GitLab.