Security
Zero-Trust Networking in Kubernetes with Istio Service Mesh
Zero-trust architecture in Kubernetes, from implementation and policy to observability.
The traditional “castle and moat” model no longer maps neatly to modern platform environments. Once applications are split into many services, run across clusters, and depend on multiple APIs and third-party systems, trusting everything inside the perimeter becomes a poor assumption.
That is why zero-trust networking matters. The goal is simple: never rely on location alone as proof of trust. Every connection should be authenticated, authorised, and encrypted.
Istio remains one of the strongest ways to implement that model in Kubernetes because it lets platform teams apply transport security, identity, and policy at the infrastructure layer rather than relying on every application team to build it all themselves.
Understanding Zero-Trust Principles
Zero-trust networking is built around a few key ideas:
- identity-based security
- least-privilege access
- microsegmentation
- continuous verification
- encryption everywhere
That shifts security away from broad network trust and towards explicit service-to-service controls.
Istio Architecture Overview
Istio works through two main layers:
- the control plane, which distributes certificates, policies, and configuration
- the data plane, which handles the traffic path and enforces those rules
As of current Istio releases, teams can choose between two main data-plane patterns:
- sidecar mode, where each workload gets its own Envoy proxy
- ambient mesh, where shared infrastructure proxies provide mesh behaviour without per-pod sidecars
Both can support zero-trust networking. The choice is mostly about operational trade-offs.
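The mode is chosen at install time. As a sketch (flags per current `istioctl`; verify against your target release), the two patterns map to different installation profiles:

```shell
# Sidecar mode: the default profile, with per-pod Envoy injection
istioctl install --set profile=default -y

# Ambient mode: installs ztunnel and the ambient-capable control plane
istioctl install --set profile=ambient -y
```

A phased migration between the two is possible, since both are driven by the same control plane.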
Control Plane Components
apiVersion: apps/v1
kind: Deployment
metadata:
  name: istiod
  namespace: istio-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app: istiod
  template:
    metadata:
      labels:
        app: istiod
    spec:
      containers:
      - name: discovery
        image: istio/pilot:1.27.1
        env:
        - name: PILOT_ENABLE_AMBIENT
          value: 'true'
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
The control plane handles certificate issuance, service discovery, and policy distribution across the mesh.
Sidecar Mode
In sidecar mode, workloads get an injected proxy that intercepts inbound and outbound traffic:
apiVersion: v1
kind: Pod
metadata:
  name: example-app
  annotations:
    sidecar.istio.io/inject: 'true'
spec:
  containers:
  - name: app
    image: example-app:latest
    ports:
    - containerPort: 8080
This remains a strong option when you need fine-grained per-workload control or already operate comfortably with sidecars.
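Rather than annotating individual pods, injection is more commonly enabled for a whole namespace — a sketch, with the namespace name as an assumption:

```yaml
# Every pod created in this namespace gets a sidecar injected automatically
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    istio-injection: enabled
```

Note that already-running pods must be restarted to pick up the sidecar; the label only affects pod creation.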
Ambient Mesh
Ambient mesh is attractive when you want the security properties of a service mesh with less per-pod overhead:
apiVersion: v1
kind: Namespace
metadata:
  name: fintech-core
  labels:
    istio.io/dataplane-mode: ambient
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-processor
  namespace: fintech-core
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-processor
  template:
    metadata:
      labels:
        app: payment-processor
    spec:
      containers:
      - name: payment-app
        image: payment-processor:2.1.0
        ports:
        - containerPort: 8443
Ambient mesh is particularly useful when platform teams want to centralise networking and security operations without forcing every workload to carry an extra sidecar.
Implementing Mutual TLS
If zero-trust networking is the destination, mTLS is the mechanism that gets you there. It ensures both sides of a connection prove their identity and that traffic is encrypted in transit.
A cluster-wide strict policy looks simple:
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
But the operational advice matters more than the manifest itself: do not start with strict mode everywhere in production unless you are already confident everything is in the mesh and speaking mTLS correctly.
Start with permissive mode, observe traffic, then tighten selectively:
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: permissive-mtls
  namespace: production
spec:
  mtls:
    mode: PERMISSIVE
For sensitive services, a tighter per-service policy is often the better path:
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: payment-service-mtls
  namespace: fintech-core
spec:
  selector:
    matchLabels:
      app: payment-processor
  mtls:
    mode: STRICT
  portLevelMtls:
    9090:
      mode: DISABLE
    15021:
      mode: DISABLE
That keeps monitoring and health checks workable whilst still enforcing strict security on application traffic.
Authorization Policies
mTLS answers “who is talking?”, but not “what are they allowed to do?”. That is where authorisation policies come in.
A simple service-to-service policy might look like this:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: payment-processor-access
  namespace: fintech-core
spec:
  selector:
    matchLabels:
      app: payment-processor
  rules:
  - from:
    - source:
        principals:
        - 'cluster.local/ns/fintech-core/sa/order-service'
        - 'cluster.local/ns/fintech-core/sa/checkout-service'
    to:
    - operation:
        methods: ['POST']
        paths: ['/api/v2/payments/process', '/api/v2/payments/authorize']
You can also add richer conditions such as request headers, paths, or time-based checks for more sensitive operations.
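As an illustrative sketch of such conditions (the header name, path, and values here are assumptions, not part of any standard), a rule can combine source identity with request attributes:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: refund-conditions
  namespace: fintech-core
spec:
  selector:
    matchLabels:
      app: payment-processor
  rules:
  - from:
    - source:
        principals: ['cluster.local/ns/fintech-core/sa/order-service']
    to:
    - operation:
        methods: ['POST']
        paths: ['/api/v2/payments/refund']
    # Additionally require a request header set by an internal tooling layer
    when:
    - key: request.headers[x-request-channel]
      values: ['internal-ops']
```

The `when` clause is AND-ed with the `from` and `to` clauses, so all three must match before the request is allowed.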
Deny policies are equally important. They help enforce a baseline against obviously hostile traffic or prohibited patterns:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: security-baseline-deny
  namespace: production
spec:
  action: DENY
  rules:
  - when:
    - key: request.headers[user-agent]
      values:
      # Istio string matching supports exact, prefix ('foo*') and
      # suffix ('*foo') patterns only — not mid-string wildcards
      - 'sqlmap*'
      - 'Nmap*'
      - 'Nikto*'
Network Segmentation Strategies
Zero trust is also about reducing the blast radius of mistakes and compromises. Namespace boundaries and workload identity give you good primitives for that.
For example, you can keep namespaces isolated:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: namespace-isolation
  namespace: production
spec:
  rules:
  - from:
    - source:
        namespaces: ['production']
And you can tighten application-layer access within a namespace:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: microservice-segmentation
  namespace: ecommerce
spec:
  selector:
    matchLabels:
      tier: backend
  rules:
  - from:
    - source:
        principals:
        - 'cluster.local/ns/ecommerce/sa/frontend'
        - 'cluster.local/ns/ecommerce/sa/api-gateway'
    to:
    - operation:
        methods: ['GET', 'POST']
This kind of identity-based segmentation tends to survive cluster changes much better than IP-based assumptions.
Observability and Monitoring
Zero trust without observability turns into guesswork quickly. You need to know which traffic is using mTLS, where denials are happening, and whether policies are blocking the right things.
Distributed tracing and telemetry help make that visible:
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-tracing
  namespace: istio-system
data:
  mesh: |
    defaultConfig:
      tracing:
        zipkin:
          address: zipkin.istio-system:9411
        sampling: 1.0
---
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: security-metrics
  namespace: istio-system
spec:
  metrics:
  - providers:
    - name: prometheus
    overrides:
    - match:
        metric: ALL_METRICS
      tagOverrides:
        source_workload:
          value: '%{source_workload}'
        destination_service:
          value: '%{destination_service_name}'
        mtls_status:
          value: '%{connection_security_policy}'
A practical dashboard normally focuses on:
- mTLS coverage
- authorisation denials
- certificate expiry
- request volume by workload pair
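Assuming the stock Istio Prometheus metrics are enabled, the first two items can be approximated with queries along these lines (label names per the standard Istio telemetry; treat as a sketch):

```promql
# Share of mesh requests protected by mTLS
sum(rate(istio_requests_total{connection_security_policy="mutual_tls"}[5m]))
  / sum(rate(istio_requests_total[5m]))

# Authorisation denials by destination workload (Envoy RBAC denies surface as 403s)
sum(rate(istio_requests_total{response_code="403"}[5m])) by (destination_workload)
```

Certificate expiry is typically tracked from the proxies' own certificate metrics or an external scanner, since it fails silently until it does not.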
Advanced Security Patterns
As environments mature, it is common to extend Istio with external identity and policy decisions.
JWT Validation
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: jwt-auth
  namespace: api
spec:
  selector:
    matchLabels:
      app: api-gateway
  jwtRules:
  - issuer: 'https://auth.company.com'
    jwksUri: 'https://auth.company.com/.well-known/jwks.json'
    audiences:
    - 'api.company.com'
External Authorisation
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: control-plane
spec:
  meshConfig:
    extensionProviders:
    - name: external-authz
      envoyExtAuthzHttp:
        service: external-authz.auth.svc.cluster.local
        port: 8080
        timeout: 0.5s
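Registering the provider does nothing on its own; traffic is routed through it with a `CUSTOM` authorisation policy. A sketch, with the selector and path as assumptions:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-external-authz
  namespace: api
spec:
  selector:
    matchLabels:
      app: api-gateway
  action: CUSTOM
  provider:
    # Must match the extensionProvider name registered in meshConfig
    name: external-authz
  rules:
  - to:
    - operation:
        paths: ['/admin/*']
```

Only requests matching the rules are sent to the external authoriser; everything else falls through to the remaining ALLOW/DENY policies.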
Rate Limiting
For internet-facing services or high-risk APIs, local or global rate limiting is often part of the broader zero-trust control set:
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: rate-limit-filter
  namespace: production
spec:
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.local_ratelimit
        typed_config:
          '@type': type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
          stat_prefix: http_local_rate_limiter
          # 100 requests per minute, per proxy; production configs usually
          # also set filter_enabled/filter_enforced fractions explicitly
          token_bucket:
            max_tokens: 100
            tokens_per_fill: 100
            fill_interval: 60s
Operational Best Practices
The teams that make zero-trust rollouts work tend to follow the same principles:
- start with a contained pilot
- enable observability before tightening policy
- use permissive or low-risk modes first
- automate certificate handling and policy rollout
- document failure modes and incident response clearly
For ambient mesh specifically, the rollout can be gentler because you do not need to restart pods just to gain the initial L4 security layer.
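For example, enrolling an existing namespace in ambient mode is a single label change, and removing the label is equally non-disruptive (the namespace name here is an assumption):

```shell
# Add the namespace's running workloads to the ambient mesh — no restarts needed
kubectl label namespace legacy-apps istio.io/dataplane-mode=ambient

# Remove the namespace from the mesh again
kubectl label namespace legacy-apps istio.io/dataplane-mode-
```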
Troubleshooting Common Issues
When zero-trust policies fail, they usually fail closed. That is correct from a security perspective, but painful operationally if you are not ready for it.
Useful first checks include:
# Check mTLS and policy status for a workload
# (the older `istioctl authn tls-check` command has been removed)
istioctl x describe pod <pod-name> -n <namespace>
# Verify certificates
istioctl proxy-config secret <pod-name>.<namespace>
# Check proxy configuration
istioctl proxy-config cluster <pod-name>.<namespace> --fqdn <service-fqdn>
# Analyse namespace configuration for policy problems
istioctl analyze -n <namespace>
In practice, the most common issues are not exotic authorisation bugs. They are usually:
- workloads not actually participating in the mesh
- health checks or metrics endpoints blocked accidentally
- certificate or clock problems
- mismatched service accounts or namespace assumptions
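For certificate and clock problems specifically, it helps to decode the certificate the proxy is actually serving and check its validity window — a sketch, with the `jq` path matching current `istioctl` JSON output (verify against your release):

```shell
# Extract the workload certificate from the Envoy secret config,
# decode it, and print its subject plus not-before/not-after dates
istioctl proxy-config secret <pod-name>.<namespace> -o json \
  | jq -r '.dynamicActiveSecrets[0].secret.tlsCertificate.certificateChain.inlineBytes' \
  | base64 --decode \
  | openssl x509 -noout -subject -dates
```

An expired or not-yet-valid window, or a subject with an unexpected SPIFFE identity, points straight at the certificate, clock, or service-account issues listed above.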
Conclusion
Zero-trust networking in Kubernetes is not a single feature you switch on. It is a set of transport, identity, policy, and observability decisions that reinforce each other.
Istio gives platform teams a practical route to implementing that model without forcing every application team to re-implement mTLS, fine-grained authorisation, tracing, and traffic controls themselves. Whether you adopt sidecars, ambient mesh, or a phased mix of both, the important thing is to roll it out deliberately, observe it closely, and keep the policies grounded in how your services actually operate.