CI/CD on GAE

Google App Engine (GAE) is a "serverless" service within Google Cloud Platform which runs applications within containers on Google's hosted infrastructure.

For smaller scale applications, GAE is an easy-to-use and self-contained platform which allows developers to quickly make their code available at a public URL.

However, after running enterprise CI/CD on GAE for some time, we have encountered the following issues.

Long Build / Deploy Time

Issue:
If you are doing CI/CD, chances are you have already built your application and run your tests against it. When it comes time to deploy, you want to push the same image that was built in the first step.

However, by default GAE rebuilds your application from source, most likely on slower systems than the ones you are using, which adds unnecessary delays to your deployments.

Solution:
Push your image to the Google Container Registry (GCR) associated with the account / project that you will be deploying to.

When deploying with gcloud app deploy, use the --image-url= flag to provide the full GCR URL for the image.

GAE will then use this pre-built image, greatly decreasing deployment times.

Note: If you use different projects to delineate between environments, you will need to push the image to each project / environment before deploying.

Of course, GAE will never match the deployment times of self-managed infrastructure, but this helps immensely.
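
As a rough illustration of the push-then-deploy flow described in this section, here is a minimal shell sketch. The function name deploy_prebuilt, the gcr.io/<project>/<name>:<tag> naming scheme, and the app.yaml location are assumptions for the example, not something prescribed by GAE. It also assumes docker is already authenticated against the registry (for example via gcloud auth configure-docker) and that the image was built and tagged locally as <name>:<tag>.

```shell
# Hypothetical helper: tag a locally built image for a project's registry,
# push it, and deploy that exact image instead of rebuilding from source.
# Arguments: <project-id> <image-name> <tag>
deploy_prebuilt() {
  local project="$1" name="$2" tag="$3"
  # Assumed naming scheme; adjust the registry host to match your setup.
  local image="gcr.io/${project}/${name}:${tag}"

  docker tag "${name}:${tag}" "${image}" || return 1
  docker push "${image}" || return 1

  # --image-url only applies to the flexible environment. app.yaml is
  # assumed to be in the current directory, and the tag is reused as the
  # version ID (so it must be a valid GAE version ID).
  gcloud app deploy app.yaml \
    --project="${project}" \
    --version="${tag}" \
    --image-url="${image}" \
    --quiet
}
```

With separate projects per environment, the same function could be called once per project, for example deploy_prebuilt my-staging-project my-app "$(git rev-parse --short HEAD)".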

502 / 503 After Deploy

Issue:
Immediately after gcloud app deploy ... succeeds, the application can return 502 / 503 errors for 1-4 minutes.

Solution:
Instead of immediately promoting the deployment, run it with the --no-promote flag.

Then wait about a minute before switching the traffic with gcloud app services set-traffic.
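
The deploy, wait, and switch sequence above could be scripted roughly as follows. This is a sketch under assumptions: the function name deploy_then_promote, the 60-second default, and the app.yaml location are illustrative, and the fixed sleep simply mirrors the "wait about a minute" advice rather than checking that the new version is actually healthy.

```shell
# Hypothetical helper: deploy a version without routing traffic to it,
# pause, then move all traffic to the new version.
# Arguments: <service> <version> [wait-seconds, default 60]
deploy_then_promote() {
  local service="$1" version="$2" wait_seconds="${3:-60}"

  # Deploy, but leave traffic on the currently serving version.
  gcloud app deploy app.yaml \
    --version="${version}" \
    --no-promote \
    --quiet || return 1

  # Fixed pause; polling the new version's own URL until it responds
  # would be a more robust alternative.
  sleep "${wait_seconds}"

  # Send 100% of the service's traffic to the new version.
  gcloud app services set-traffic "${service}" \
    --splits="${version}=1" \
    --quiet
}
```

A call such as deploy_then_promote default "$(git rev-parse --short HEAD)" 90 would wait 90 seconds before switching traffic.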

GAE-flavor Runtimes

It's rare, but there were a few times where the GAE-flavor of an upstream runtime didn't exactly match what we were seeing locally, either on dev workstations or in our CI/CD build systems.

The most effective solution was to first switch to the custom runtime which granted a bit more control in the ops-realm.

But even in this case, the GAE-flavor base image sometimes caused issues when building in GAE's system.

However, when using pre-built images and deploying from GCR with the --image-url= flag as detailed above, this issue is completely mitigated.

Old Versions

By default, old versions of your service will remain running.

You can purge these manually, but we have found it better to stop and remove old versions after each deploy, keeping only the last N versions for quick rollback.
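
One way to keep only the last N versions is sketched below. The function name prune_old_versions is hypothetical, and the sketch assumes that the version currently serving traffic is among the newest N, since it does not check traffic allocation before deleting.

```shell
# Hypothetical helper: delete all but the newest <keep> versions of a service.
# Arguments: <service> <keep>
prune_old_versions() {
  local service="$1" keep="$2" stale

  # List version IDs newest first, then skip the first <keep> lines so
  # that only the older versions remain.
  stale="$(gcloud app versions list \
    --service="${service}" \
    --sort-by="~version.createTime" \
    --format="value(version.id)" | tail -n "+$((keep + 1))")" || return 1

  # Nothing older than the newest <keep> versions: nothing to delete.
  [ -z "${stale}" ] && return 0

  # Word splitting is intended here: one version ID per argument.
  # shellcheck disable=SC2086
  gcloud app versions delete ${stale} --service="${service}" --quiet
}
```

Running prune_old_versions default 3 as the last step of a deploy job would leave the three most recently created versions of the default service in place.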

Log Availability After Version Deletion

Once a version is deleted, the logs are deleted with it.

Ideally, your application would be integrated with a logging platform such as Splunk or New Relic, but if not, you will want to capture these logs for auditing purposes before deleting a version.

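Assuming your version IDs are unique (for example, git commit IDs), you can run gcloud logging read 'resource.labels.version_id="[version]"' to capture the logs for a version. Note that gcloud logging read only looks back one day by default, so pass --freshness (for example --freshness=30d) to reach older entries.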
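
Capturing the logs and deleting the version can be combined into one step so that the deletion only happens if the export succeeded. The sketch below is illustrative: the function name archive_logs_then_delete, the JSON output format, and the 30-day default look-back are assumptions, and the filter additionally narrows by service using the module_id label of the gae_app resource type.

```shell
# Hypothetical helper: save a version's log entries to a file, then delete
# the version only if the export succeeded.
# Arguments: <service> <version> <output-file> [freshness, default 30d]
archive_logs_then_delete() {
  local service="$1" version="$2" outfile="$3" freshness="${4:-30d}"
  local filter

  filter="resource.type=\"gae_app\""
  filter="${filter} AND resource.labels.module_id=\"${service}\""
  filter="${filter} AND resource.labels.version_id=\"${version}\""

  # --freshness widens the default one-day look-back window.
  gcloud logging read "${filter}" \
    --freshness="${freshness}" \
    --format=json > "${outfile}" || return 1

  gcloud app versions delete "${version}" --service="${service}" --quiet
}
```

For example, archive_logs_then_delete default abc123 logs-abc123.json writes the entries to logs-abc123.json before removing version abc123.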

Conclusion

GAE is still a relatively new service, and in talking with Google's engineers, it does seem like they plan on expanding the offerings and hammering out the issues mentioned above.

last updated 2024-10-03