Service Mesh SSO

A vital feature of a service mesh is secure communication between services.

For services directly inside the mesh, communication is secured with mTLS using the root certs installed in the mesh.

For end users outside the mesh, OIDC or JWT authentication must be implemented to securely grant access to resources.

In a large network, SSO will be required to centralize security policies and access controls, as well as to improve the user experience across services.

Many SSO providers require some level of code change within the application. However, authentication and authorization can increasingly be done at the mesh level, enabling legacy systems to be secured by modern protocols without any code change.

Traditional Network Trust

The traditional security model of implicit trust within the network simply does not suffice in the modern era of cloud computing, global clients, and edge computing.

While VPNs and proxy servers are often used to enable users physically outside of the network to access internal resources, these lead to choke points and single points of failure for network traffic.

These issues have only been made more apparent in the current era of work-from-home where most of the network's internal clients suddenly became external clients overnight, all fighting for bandwidth in the VPN and proxy servers that were originally designed for a much smaller workload.

This model made sense when some systems and applications simply could not be exposed because they did not implement modern (if any) security protocols, or when communication between services was in plain text and therefore open to trivial MITM sniffing.

Additionally, when all systems in a network resided in the same physical data center, there was not as much focus on securing traffic between and around systems when the only link between them was a physical ethernet cable.

However, with the move to the cloud, where your compute can be physically anywhere in the world and moves frequently, or when your clients themselves are globally dispersed and not always internal trusted users, relying on an implicit trust network begins to fall apart.

A trustless network, where communication between services requires mutual certificate validation, is of course the goal. However, this is often difficult to achieve while maintaining backwards compatibility with legacy systems built on top of an implicit trust model.

Service Mesh Ingress

A service mesh design optimizes network traffic within the mesh, so that high-traffic services do not need to NAT out to communicate between VPCs if they reside in the same global network.

Additionally, within the mesh, service-to-service communication is secured through the use of mTLS and authorization policies.

However, to enable end users to securely access resources within the mesh, an additional security layer must be implemented.

Within a desktop configuration policy you can install root certs on client devices. However, in an increasingly bring-your-own-device environment, or in situations where your users are on mobile devices, root access to your clients' machines is not guaranteed.

Furthermore, the use of an IP ACL proxy not only bottlenecks all traffic through this proxy, it also relies on the idea that your users' IPs are well-known, static, and never going to be used by an unauthorized user within their network.

By implementing short-lived sessions secured through OIDC or JWT, end users will be able to securely access resources based on their explicitly granted scopes, rather than implicitly granting all users access to all resources in the network.

This enables services to be exposed securely at the edge, with only users supplying a valid signed JWT claim being able to access secured resources.
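As a rough illustration of what "short-lived" and "explicitly granted scopes" mean in practice, the sketch below checks an already signature-verified claims payload for expiry and for a required scope. The function name, the scope strings, and the use of a space-separated "scope" claim are assumptions made for this example; your auth provider may structure its claims differently.

```python
import time
from typing import Optional


def is_request_allowed(claims: dict, required_scope: str,
                       now: Optional[float] = None) -> bool:
    """Return True only if the claims are unexpired and carry the scope.

    Assumes the token signature was already verified elsewhere
    (in this article's setup, by the mesh).
    """
    now = time.time() if now is None else now

    # "exp" is the standard JWT expiry claim, in seconds since the epoch.
    # A missing "exp" is treated as expired.
    if claims.get("exp", 0) <= now:
        return False

    # Assumption: scopes arrive as one space-separated string.
    granted = set(str(claims.get("scope", "")).split())
    return required_scope in granted


if __name__ == "__main__":
    # Hypothetical claims valid for five more minutes.
    example = {"sub": "alice", "exp": time.time() + 300,
               "scope": "billing:read reports:read"}
    print(is_request_allowed(example, "billing:read"))
    print(is_request_allowed(example, "admin:write"))
```

A short expiry limits how long a leaked token is useful, and the scope check is what replaces the "everyone inside the network can reach everything" behavior of an implicit trust model.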

As each service is fronted by its own Envoy proxy, the noisy neighbor problem common to shared proxy servers is avoided, and services and their corresponding ingresses can be scaled independently.

For high-traffic systems, this enables them to be moved out from behind central proxy routes and to securely terminate traffic at the edge.

SSO JWT Issuer

To implement SSO across a service mesh, you must first have a central auth provider which can issue client JWTs and publish a set of JWKs that the mesh can access to validate client-provided tokens and claims.

This auth provider should provide users with a claim payload which can grant them access to all services the user should access within the mesh.
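To make the relationship between a JWT and the published JWKs more concrete, here is a minimal sketch that decodes a token's header and payload and looks up the matching key by its "kid". It deliberately does not verify the signature (the Python standard library has no RSA support, and in this setup the mesh does that work). The issuer URL, key IDs, and the "groups" claim are placeholders, not values from any real provider.

```python
import base64
import json
from typing import Optional, Tuple


def b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def peek_jwt(token: str) -> Tuple[dict, dict]:
    """Decode header and payload WITHOUT verifying the signature."""
    header_b64, payload_b64, _signature = token.split(".")
    return (json.loads(b64url_decode(header_b64)),
            json.loads(b64url_decode(payload_b64)))


def select_jwk(jwks: dict, kid: str) -> Optional[dict]:
    """Find the published key whose "kid" matches the token header."""
    for key in jwks.get("keys", []):
        if key.get("kid") == kid:
            return key
    return None


if __name__ == "__main__":
    # Hypothetical token with a placeholder signature segment.
    header = {"alg": "RS256", "kid": "key-1", "typ": "JWT"}
    payload = {"iss": "https://sso.example.com", "sub": "alice",
               "groups": ["billing"]}
    token = ".".join([b64url_encode(json.dumps(header).encode()),
                      b64url_encode(json.dumps(payload).encode()),
                      "signature-placeholder"])
    jwks = {"keys": [{"kid": "key-0", "kty": "RSA"},
                     {"kid": "key-1", "kty": "RSA"}]}
    decoded_header, decoded_payload = peek_jwt(token)
    print(select_jwk(jwks, decoded_header["kid"]), decoded_payload)
```

The point of the sketch is only the lookup flow: the token names the key that signed it, and the validator fetches that key from the provider's published set rather than sharing a secret with it.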

Depending on the auth provider, the JWT will either be returned in a JSON payload or, with a deeply integrated provider running on the domain, be set as an Authorization cookie on the domain so the user can seamlessly navigate to other SSO resources.

Istio JWT Configuration

Every user-facing service in the mesh should be configured with a RequestAuthentication resource defining the JWT issuer and jwksUri, and an AuthorizationPolicy defining the subjects and list-type claims granting access to the service.

Alternatively, a single policy can be created and all services labeled with a common label (example: sso: enabled). However, this will prevent more granular control over service access.
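The two resources described above could look roughly like the following, sketched here as Python dictionaries serialized to JSON (kubectl accepts JSON manifests as well as YAML). The resource names, namespace, issuer URL, JWKS URL, and the "groups" claim are all placeholders chosen for illustration, and the security.istio.io/v1beta1 API version is an assumption that should be checked against your Istio release.

```python
import json


def request_authentication(name, namespace, labels, issuer, jwks_uri):
    """Build a RequestAuthentication manifest for workloads matching labels."""
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "RequestAuthentication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "selector": {"matchLabels": labels},
            "jwtRules": [{"issuer": issuer, "jwksUri": jwks_uri}],
        },
    }


def authorization_policy(name, namespace, labels, issuer, groups):
    """Build an ALLOW policy requiring a token from issuer with a listed group."""
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "AuthorizationPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "selector": {"matchLabels": labels},
            "action": "ALLOW",
            "rules": [{
                # requestPrincipals takes the form "<issuer>/<subject>".
                "from": [{"source": {"requestPrincipals": [issuer + "/*"]}}],
                # Assumption: the provider puts group names in a "groups" claim.
                "when": [{"key": "request.auth.claims[groups]",
                          "values": groups}],
            }],
        },
    }


if __name__ == "__main__":
    labels = {"sso": "enabled"}  # the shared-label variant from the article
    issuer = "https://sso.example.com"  # hypothetical issuer
    manifests = [
        request_authentication("sso-jwt", "default", labels, issuer,
                               issuer + "/.well-known/jwks.json"),
        authorization_policy("sso-allow", "default", labels, issuer,
                             ["billing"]),
    ]
    for manifest in manifests:
        print(json.dumps(manifest, indent=2))
```

For per-service control, the same builders can be called once per service with that service's own labels and group list instead of the shared sso: enabled label.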

The service will now return RBAC: access denied unless an Authorization: Bearer [jwt] header is sent in the request.
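For scripted clients, attaching the header is straightforward. The sketch below only builds the request object; the service URL and token value are placeholders, and nothing is sent over the network.

```python
import urllib.request


def build_authenticated_request(url: str, jwt: str) -> urllib.request.Request:
    """Build a GET request carrying the JWT as a Bearer token."""
    return urllib.request.Request(
        url, headers={"Authorization": "Bearer " + jwt})


if __name__ == "__main__":
    # Hypothetical service URL and token.
    req = build_authenticated_request(
        "https://billing.example.com/invoices", "aaa.bbb.ccc")
    print(req.full_url, req.get_header("Authorization"))
```

Passing the request to urllib.request.urlopen would actually send it; a rejected request would then surface as a urllib.error.HTTPError.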

Great! We now have SSO for all services in our mesh.

Except for one thing.

Providing the SSO JWT as a header works great with curl, but in a real-world SSO environment, your end users will be accessing services within their browser, where they cannot easily modify headers on the fly.

Additionally, headers need to be set on every request, whereas your SSO JWT was most likely set as a cookie on the domain.

While cookies are technically sent as a header, at the time of writing Istio does not check cookies for the JWT, and only checks the Authorization header.

There is an Issue on GitHub in which Lua Envoy filters were tested. However, it is preferred to keep the mesh as "clean" of external code as possible.

To the client...

Client Side Extension

Our SSO provider returns the JWT as an Authorization cookie.

As the SSO is hosted on our domain, this cookie will follow the user throughout the network, authenticating every request across services with the same JWT.

By creating a tiny Chrome extension that copies the cookie into a header on requests to services in the domain, we can send the cookie to our services as the expected header without modifying the mesh itself. We can also inline the SSO login page as a simple iframe in the extension popup, making SSO login even more convenient.

Once this extension is installed, users can click the extension icon to log in to the company-wide SSO, which sets a cookie on the domain. For all domains defined in the extension, the cookie is then also sent as an HTTP header on every request, enabling services within the mesh to validate the JWT.
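The header rewrite the extension performs can be sketched as a pure function. This is a Python illustration of the logic only, not the extension itself: a real Chrome extension would be written in JavaScript and hook into the browser's request pipeline (for example via chrome.webRequest.onBeforeSendHeaders). The domain list and helper names below are hypothetical.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlsplit

SSO_DOMAINS = ("example.com",)  # hypothetical list of SSO-protected domains
COOKIE_NAME = "Authorization"   # cookie name used by the SSO provider


def host_in_sso_domains(url, domains=SSO_DOMAINS):
    """True if the URL's host is one of the domains or a subdomain of one."""
    host = urlsplit(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in domains)


def add_bearer_header(url, headers):
    """Return a copy of headers with the SSO cookie copied to Authorization.

    Header names are assumed to use their canonical capitalization.
    """
    out = dict(headers)
    # Leave other sites alone, and never overwrite an existing header.
    if not host_in_sso_domains(url) or "Authorization" in out:
        return out
    cookie = SimpleCookie()
    cookie.load(out.get("Cookie", ""))
    morsel = cookie.get(COOKIE_NAME)
    if morsel is not None and morsel.value:
        out["Authorization"] = "Bearer " + morsel.value
    return out


if __name__ == "__main__":
    sent = add_bearer_header(
        "https://billing.example.com/invoices",
        {"Cookie": "theme=dark; Authorization=aaa.bbb.ccc"})
    print(sent.get("Authorization"))
```

Restricting the rewrite to an explicit domain list matters: without it, the JWT would be attached to requests for unrelated sites.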

This solution is more stable than the Lua solutions investigated, as those are dependent on the changing Envoy API and Istio patching support, whereas this is a simple client-side HTTP header modification that can be implemented without touching the mesh itself.

Trustless Security

Securing services through signed and validated JWT claims, rather than bottlenecking requests through a central proxy or VPN, enables services to scale much more efficiently and effectively in the cloud without compromising security.

On the contrary, the move away from an implicit trust VPN model to an explicit trust security-at-edge model increases network security, as users will only be granted access to the resources they have been explicitly granted access to with short-lived JWTs.

With more and more users accessing resources from home or on mobile devices, there is no longer a need to bottleneck all traffic through VPNs into the network or whitelist specific IPs, as each user's access to resources is controlled by the SSO provider and validated through the token it issues.

Access lists and policies can be moved out of the disparate combination of firewalls, proxy servers, and app-internal auth systems and centralized in the SSO provider for better review, auditing, and management.

While this is a significant shift, legacy systems do not need to be left behind. These same policies and toolsets will work exactly the same with legacy systems which are fronted with Istio, as all auth is done at the mesh layer, and the legacy application can continue to receive the expected traffic profile, while increased security can be implemented at the edge ingress.

While service mesh certainly grew out of the cloud/container-native world, it can greatly benefit companies in their cloud journey by enabling all systems - new and legacy alike - to be cleanly and securely connected to the cloud environment.

last updated 2020-06-21T13:21:33+0000