service mesh sso
A vital feature of a service mesh is secure communication between services.
For services inside the mesh, communication is secured with mTLS using the root certs installed in the mesh.
For end-users residing outside of the mesh, either OIDC or JWT must be implemented to securely grant access to resources.
In a large network, SSO will be required to centralize security policies and access controls as well as improve the user experience across services.
Many SSO providers require some level of code change within the application. However, authentication and authorization can increasingly be done at the mesh level, enabling legacy systems to be secured by modern protocols without any code change.
Traditional Network Trust
The traditional security model of implicit trust within the network simply does not suffice in the modern era of cloud computing, global clients, and edge computing.
While VPNs and proxy servers are often used to enable users physically outside of the network to access internal resources, these lead to choke points and single points of failure for network traffic.
These issues have only been made more apparent in the current era of work-from-home where most of the network's internal clients suddenly became external clients overnight, all fighting for bandwidth in the VPN and proxy servers that were originally designed for a much smaller workload.
This model made sense when some systems / applications simply could not be exposed as they did not implement modern (if any) security protocols, or in situations where communication between services was in plain text and therefore subject to easy MITM sniffing.
Additionally, when all systems in a network resided in the same physical data center, there was not as much focus on securing traffic between and around systems when the only link between them was a physical ethernet cable.
However, with the move to the cloud, where your compute can be physically anywhere in the world and moves frequently, or when your clients themselves are globally dispersed and not always internal trusted users, relying on an implicit trust network begins to fall apart.
The goal is of course a trustless network in which communication between services requires mutual certificate validation. However, this is often difficult to achieve while maintaining backwards compatibility with legacy systems built on top of an implicit trust model.
Service Mesh Ingress
A service mesh enables optimization of network traffic within the mesh, so that high-traffic services do not need to NAT out to communicate between VPCs if they reside in the same global network.
Additionally, within the mesh, service-to-service communication is secured through the use of mTLS and authorization policies.
However, to enable end-users to securely access resources within the mesh, an additional security layer must be implemented.
Within a desktop configuration policy you can install root certs on client devices. However, in an increasingly bring-your-own-device environment, or in situations where your users are on mobile devices, root access to your clients' machines is not always guaranteed.
Furthermore, the use of an IP ACL proxy not only bottlenecks all traffic through this proxy, it also relies on the idea that your users' IPs are well-known, static, and never going to be used by an unauthorized user within their network.
By implementing short-lived sessions secured through OIDC or JWT, end-users will be able to securely access resources based on their explicitly granted scopes, rather than implicitly granting all users access to all resources in the network.
This enables services to be exposed securely at the edge, with only users supplying a valid signed JWT claim being able to access secured resources.
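To make the scope check concrete, here is a minimal sketch of what validating a short-lived, scoped token involves. It is an illustration only: it uses Python's standard library and a shared-secret HS256 signature so that it runs without dependencies, whereas a production SSO provider would more likely sign with an asymmetric key that the mesh verifies against published keys. The function names, the space-separated `scope` claim, and the secret are assumptions made for the example.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def b64url_encode(raw: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """Build an HS256-signed JWT (stand-in for the auth provider)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url_encode(signature)}"


def authorize(token: str, secret: bytes, required_scope: str,
              now: Optional[float] = None) -> bool:
    """Allow the request only if the signature is valid, the token has not
    expired, and the required scope was explicitly granted."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        expected = hmac.new(
            secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
        ).digest()
        if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
            return False
        header = json.loads(b64url_decode(header_b64))
        claims = json.loads(b64url_decode(payload_b64))
    except ValueError:
        # Covers malformed structure, bad base64, bad JSON, and non-ASCII input.
        return False
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return False
    if not isinstance(claims, dict):
        return False
    current = time.time() if now is None else now
    exp = claims.get("exp", 0)
    if not isinstance(exp, (int, float)) or exp <= current:
        return False  # a missing or past expiry is treated as expired
    return required_scope in str(claims.get("scope", "")).split()
```

A token issued with `{"scope": "orders:read", "exp": ...}` would pass `authorize(token, secret, "orders:read")` until it expires and fail for any other scope, which is the explicit-grant behavior described above.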
As each service is fronted by its own Envoy proxy, the noisy neighbor problem common to shared proxy servers is avoided, and services and their corresponding ingresses can be scaled independently.
For high-traffic systems, this enables them to be moved out from behind central proxy routes and to securely terminate traffic at the edge.
SSO JWT Issuer
To implement SSO across a service mesh, you must first have a central auth provider which can issue client JWTs and publish a JSON Web Key Set (JWKS) that the mesh can access to validate client-provided tokens and claims.
This auth provider should provide users with a claim payload which can grant them access to all services the user should access within the mesh.
Depending on the auth provider, this will either be returned in a JSON payload or, with a deeply integrated provider running on the domain, be set as an Authorization cookie on the domain to enable the user to seamlessly navigate to other SSO resources.
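As a rough illustration of how a verifier uses the published keys, the sketch below reads the `kid` (key ID) from a token's header and looks up the matching entry in a JWKS document. The function names are hypothetical, and fetching the JWKS over HTTP as well as the actual signature verification are deliberately left out.

```python
import base64
import json
from typing import Optional


def token_key_id(token: str) -> Optional[str]:
    """Read the `kid` from the (still unverified) JWT header."""
    header_b64 = token.split(".")[0]
    padded = header_b64 + "=" * (-len(header_b64) % 4)
    header = json.loads(base64.urlsafe_b64decode(padded))
    return header.get("kid") if isinstance(header, dict) else None


def select_jwk(jwks: dict, kid: Optional[str]) -> Optional[dict]:
    """Return the signing key whose `kid` matches, or None if there is none."""
    if kid is None:
        return None
    for key in jwks.get("keys", []):
        # Keys without a "use" field are treated as signing keys here.
        if key.get("kid") == kid and key.get("use", "sig") == "sig":
            return key
    return None
```

Matching on `kid` is what lets the provider rotate keys: it can publish the new key alongside the old one, and tokens signed with either continue to validate until the old key is removed.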
Istio JWT Configuration
Every user-facing service in the mesh should be configured with a RequestAuthentication resource defining the JWT issuer and jwksUri, and an AuthorizationPolicy defining the subjects and list-type claims granting access to the service.
Alternatively, a single policy can be created and all services labeled with a common label (example: "sso: enabled"), although this will prevent more granular control over service access.
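For reference, the two resources could look roughly like the following. This is a sketch that builds them as Python dictionaries and renders them as JSON, which kubectl accepts as well as YAML. The resource names, the `groups` claim, and the issuer and JWKS URLs passed in are assumptions; the `sso: enabled` selector mirrors the shared-label approach.

```python
import json
from typing import List


def sso_policies(namespace: str, issuer: str, jwks_uri: str,
                 allowed_groups: List[str]) -> List[dict]:
    """Build a RequestAuthentication and an AuthorizationPolicy for every
    workload labeled sso=enabled in the namespace."""
    selector = {"matchLabels": {"sso": "enabled"}}
    request_authn = {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "RequestAuthentication",
        "metadata": {"name": "sso-jwt", "namespace": namespace},
        "spec": {
            "selector": selector,
            # Tokens from this issuer are validated against its published keys.
            "jwtRules": [{"issuer": issuer, "jwksUri": jwks_uri}],
        },
    }
    authz = {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "AuthorizationPolicy",
        "metadata": {"name": "sso-require-jwt", "namespace": namespace},
        "spec": {
            "selector": selector,
            "action": "ALLOW",
            "rules": [{
                # Any subject from the issuer ("<issuer>/<subject>") ...
                "from": [{"source": {"requestPrincipals": [f"{issuer}/*"]}}],
                # ... whose list-type "groups" claim contains an allowed value.
                "when": [{
                    "key": "request.auth.claims[groups]",
                    "values": allowed_groups,
                }],
            }],
        },
    }
    return [request_authn, authz]


def render_manifest(items: List[dict]) -> str:
    """Wrap the resources in a v1 List so they can be applied in one step."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2)
```

Writing the output of `render_manifest(sso_policies(...))` to a file and applying it with `kubectl apply -f` would be one way to roll this out. For per-service control, the selector would instead match each service's own labels.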
The service will now return "RBAC: access denied" unless an "Authorization: Bearer [jwt]" header is sent in the request.
Great! We now have SSO for all services in our mesh.
Except for one thing.
Providing the SSO JWT as a header works great with curl, but in a real-world SSO environment, your end users will be accessing services within their browser, where they cannot easily modify headers on the fly.
Additionally, headers need to be set on every request, whereas your SSO JWT was most likely set as a cookie on the domain.
While cookies are technically sent as a header, at the time of writing Istio does not check cookies for the JWT and only checks headers.
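The gap is easy to see in miniature. The sketch below shows the transformation a proxy-side filter would need to perform: copy the token out of the Cookie header and present it as a Bearer token. It is written in Python purely for illustration and is not Istio's or Envoy's API; the cookie name `sso_token` and the function name are hypothetical.

```python
from http.cookies import CookieError, SimpleCookie


def promote_cookie_to_bearer(headers: dict, cookie_name: str = "sso_token") -> dict:
    """Return a copy of the request headers with an Authorization header
    derived from the SSO cookie, unless one was already supplied."""
    updated = dict(headers)
    if any(name.lower() == "authorization" for name in headers):
        return updated  # an explicit header wins over the cookie
    raw_cookie = next(
        (value for name, value in headers.items() if name.lower() == "cookie"), ""
    )
    jar = SimpleCookie()
    try:
        jar.load(raw_cookie)
    except CookieError:
        return updated  # unparseable cookies: leave the request untouched
    if cookie_name in jar and jar[cookie_name].value:
        updated["Authorization"] = f"Bearer {jar[cookie_name].value}"
    return updated
```

With this in place, a browser request carrying `Cookie: sso_token=<jwt>` reaches the JWT check looking the same as a curl request that set the header by hand.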
Istio OAuth2 SSO
To address this issue, I wrote a small OAuth2 API and corresponding Envoy filter and Lua script to complete the OAuth2 handshake, set the SSO domain cookie on the client's browser, and redirect the client back to the original service in the mesh.
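The first and last steps of such a handshake can be sketched as follows. This is not the API or filter mentioned above, only a generic illustration of an OAuth2 authorization-code redirect that remembers where the user was headed. The endpoint URLs, client ID, and function names are hypothetical, and the code-for-token exchange and cookie setting in the middle are omitted.

```python
import secrets
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode, urlsplit


def build_login_redirect(authorize_endpoint: str, client_id: str,
                         callback_url: str,
                         original_url: str) -> Tuple[str, Dict[str, str]]:
    """Build the IdP redirect for an unauthenticated request and record the
    originally requested URL under a random `state` value."""
    state = secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": callback_url,
        "scope": "openid",
        "state": state,
    })
    # In a real deployment the state mapping would live server-side or in a
    # signed cookie; a plain dict keeps the sketch self-contained.
    return f"{authorize_endpoint}?{query}", {state: original_url}


def resolve_return_url(state: str, pending: Dict[str, str],
                       sso_domain: str) -> Optional[str]:
    """After the callback, look up (once) where to send the user back to,
    refusing destinations outside the SSO domain."""
    original_url = pending.pop(state, None)
    if original_url is None:
        return None  # unknown or replayed state
    host = urlsplit(original_url).hostname or ""
    if host == sso_domain or host.endswith("." + sso_domain):
        return original_url
    return None
```

The `state` parameter does double duty here: it ties the callback to a request the proxy actually started, and it carries the user back to the service they originally asked for.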
Legacy applications which cannot be updated to support OAuth2 or more modern standards can be labeled in the mesh and will automatically have the OAuth2 handshake implemented, and authentication / authorization can be federated to the IDP with absolutely zero code change.
For a mesh containing hundreds of microservices, each with their own authentication requirements, federating all authentication and authorization to the IDP and using the mesh to implement the OIDC lifecycle gives users a seamless experience as they move from application to application.
For applications which do implement OAuth2, their authentication is additive on top of the base SSO provider, and the transition from one tenant to another is seamless.
Trustless Security
Securing services through signed and validated JWT claims, rather than bottlenecking requests through a central proxy or VPN, enables them to scale much more efficiently and effectively in the cloud without compromising security.
On the contrary, the move away from an implicit trust VPN model to an explicit trust security-at-edge model increases network security, as short-lived JWTs give users access only to the resources they have been explicitly granted.
With more and more users accessing resources from home or on mobile devices, there is no longer a need to bottleneck all traffic through VPNs into the network or whitelist specific IPs, as the user's access to resources is secured by the SSO provider and validated by the token granted to them by the auth provider.
Access lists and policies can be moved out of the disparate combination of firewalls, proxy servers, and app-internal auth systems and centralized in the SSO provider for better review, auditing, and management.
While this is a significant shift, legacy systems do not need to be left behind. These same policies and toolsets will work exactly the same with legacy systems which are fronted with Istio, as all auth is done at the mesh layer, and the legacy application can continue to receive the expected traffic profile, while increased security can be implemented at the edge ingress.
While service mesh certainly grew out of the cloud/container-native world, it can greatly benefit companies in their cloud journey by enabling all systems - new and legacy alike - to be cleanly and securely connected to the cloud environment.
last updated 2024-03-18