Security
Zero-Trust Networking in Kubernetes with Istio Service Mesh
Zero-trust architecture in Kubernetes, from implementation and policy to observability.
The traditional “castle and moat” model no longer maps neatly to modern platform environments. Once applications are split into many services, run across clusters, and depend on multiple APIs and third-party systems, trusting everything inside the perimeter becomes a poor assumption.
That is why zero-trust networking matters. The goal is simple: never rely on location alone as proof of trust. Every connection should be authenticated, authorised, and encrypted.
Istio remains one of the strongest ways to implement that model in Kubernetes because it lets platform teams apply transport security, identity, and policy at the infrastructure layer rather than relying on every application team to build it all themselves.
Understanding Zero-Trust Principles
Zero-trust networking is built around a few key ideas:
- identity-based security
- least-privilege access
- microsegmentation
- continuous verification
- encryption everywhere
That shifts security away from broad network trust and towards explicit service-to-service controls.
Istio Architecture Overview
Istio works through two main layers:
- the control plane, which distributes certificates, policies, and configuration
- the data plane, which handles the traffic path and enforces those rules
As of current Istio releases, teams can choose between two main data-plane patterns:
- sidecar mode, where each workload gets its own Envoy proxy
- ambient mesh, where shared infrastructure proxies provide mesh behaviour without per-pod sidecars
Both can support zero-trust networking. The choice is mostly about operational trade-offs.
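The mode is chosen at install time. As a sketch (flags per current `istioctl`; verify against your target release), the two patterns map to different installation profiles:

```shell
# Sidecar mode: the default profile, with per-pod Envoy injection
istioctl install --set profile=default -y

# Ambient mode: installs ztunnel and the ambient-capable control plane
istioctl install --set profile=ambient -y
```

A phased migration between the two is possible, since both are driven by the same control plane.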
Control Plane Components
apiVersion: apps/v1
kind: Deployment
metadata:
  name: istiod
  namespace: istio-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app: istiod
  template:
    metadata:
      labels:
        app: istiod
    spec:
      containers:
      - name: discovery
        image: istio/pilot:1.27.1
        env:
        - name: PILOT_ENABLE_AMBIENT
          value: 'true'
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
The control plane handles certificate issuance, service discovery, and policy distribution across the mesh.
Sidecar Mode
In sidecar mode, workloads get an injected proxy that intercepts inbound and outbound traffic:
apiVersion: v1
kind: Pod
metadata:
  name: example-app
  annotations:
    sidecar.istio.io/inject: 'true'
spec:
  containers:
  - name: app
    image: example-app:latest
    ports:
    - containerPort: 8080
This remains a strong option when you need fine-grained per-workload control or already operate comfortably with sidecars.
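Rather than annotating individual pods, injection is more commonly enabled for a whole namespace — a sketch, with the namespace name as an assumption:

```yaml
# Every pod created in this namespace gets a sidecar injected automatically
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    istio-injection: enabled
```

Note that already-running pods must be restarted to pick up the sidecar; the label only affects pod creation.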
Ambient Mesh
Ambient mesh is attractive when you want the security properties of a service mesh with less per-pod overhead:
apiVersion: v1
kind: Namespace
metadata:
  name: fintech-core
  labels:
    istio.io/dataplane-mode: ambient
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-processor
  namespace: fintech-core
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-processor
  template:
    metadata:
      labels:
        app: payment-processor
    spec:
      containers:
      - name: payment-app
        image: payment-processor:2.1.0
        ports:
        - containerPort: 8443
Ambient mesh is particularly useful when platform teams want to centralise networking and security operations without forcing every workload to carry an extra sidecar.
Implementing Mutual TLS
If zero-trust networking is the destination, mTLS is the mechanism that gets you there. It ensures both sides of a connection prove their identity and that traffic is encrypted in transit.
A cluster-wide strict policy looks simple:
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
But the operational advice matters more than the manifest itself: do not start with strict mode everywhere in production unless you are already confident everything is in the mesh and speaking mTLS correctly.
Start with permissive mode, observe traffic, then tighten selectively:
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: permissive-mtls
  namespace: production
spec:
  mtls:
    mode: PERMISSIVE
For sensitive services, a tighter per-service policy is often the better path:
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: payment-service-mtls
  namespace: fintech-core
spec:
  selector:
    matchLabels:
      app: payment-processor
  mtls:
    mode: STRICT
  portLevelMtls:
    9090:
      mode: DISABLE
    15021:
      mode: DISABLE
That keeps monitoring and health checks workable whilst still enforcing strict security on application traffic.
Authorization Policies
mTLS answers “who is talking?”, but not “what are they allowed to do?”. That is where authorisation policies come in.
A simple service-to-service policy might look like this:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: payment-processor-access
  namespace: fintech-core
spec:
  selector:
    matchLabels:
      app: payment-processor
  rules:
  - from:
    - source:
        principals:
        - 'cluster.local/ns/fintech-core/sa/order-service'
        - 'cluster.local/ns/fintech-core/sa/checkout-service'
    to:
    - operation:
        methods: ['POST']
        paths: ['/api/v2/payments/process', '/api/v2/payments/authorize']
You can also add richer conditions such as request headers, paths, or time-based checks for more sensitive operations.
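As an illustrative sketch of such conditions (the header name, path, and values here are assumptions, not part of any standard), a rule can combine source identity with request attributes:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: refund-conditions
  namespace: fintech-core
spec:
  selector:
    matchLabels:
      app: payment-processor
  rules:
  - from:
    - source:
        principals: ['cluster.local/ns/fintech-core/sa/order-service']
    to:
    - operation:
        methods: ['POST']
        paths: ['/api/v2/payments/refund']
    # Additionally require a request header set by an internal tooling layer
    when:
    - key: request.headers[x-request-channel]
      values: ['internal-ops']
```

The `when` clause is AND-ed with the `from` and `to` clauses, so all three must match before the request is allowed.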
Deny policies are equally important. They help enforce a baseline against obviously hostile traffic or prohibited patterns:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: security-baseline-deny
  namespace: production
spec:
  action: DENY
  rules:
  - when:
    - key: request.headers[user-agent]
      values:
      # Istio string matching supports exact, prefix ('foo*') and
      # suffix ('*foo') patterns only — not mid-string wildcards
      - 'sqlmap*'
      - 'Nmap*'
      - 'Nikto*'
Network Segmentation Strategies
Zero trust is also about reducing the blast radius of mistakes and compromises. Namespace boundaries and workload identity give you good primitives for that.
For example, you can keep namespaces isolated:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: namespace-isolation
  namespace: production
spec:
  rules:
  - from:
    - source:
        namespaces: ['production']
And you can tighten application-layer access within a namespace:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: microservice-segmentation
  namespace: ecommerce
spec:
  selector:
    matchLabels:
      tier: backend
  rules:
  - from:
    - source:
        principals:
        - 'cluster.local/ns/ecommerce/sa/frontend'
        - 'cluster.local/ns/ecommerce/sa/api-gateway'
    to:
    - operation:
        methods: ['GET', 'POST']
This kind of identity-based segmentation tends to survive cluster changes much better than IP-based assumptions.
Observability and Monitoring
Zero trust without observability turns into guesswork quickly. You need to know which traffic is using mTLS, where denials are happening, and whether policies are blocking the right things.
Distributed tracing and telemetry help make that visible:
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-tracing
  namespace: istio-system
data:
  mesh: |
    defaultConfig:
      tracing:
        zipkin:
          address: zipkin.istio-system:9411
        sampling: 1.0
---
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: security-metrics
  namespace: istio-system
spec:
  metrics:
  - providers:
    - name: prometheus
    overrides:
    - match:
        metric: ALL_METRICS
      tagOverrides:
        source_workload:
          value: '%{source_workload}'
        destination_service:
          value: '%{destination_service_name}'
        mtls_status:
          value: '%{connection_security_policy}'
A practical dashboard normally focuses on:
- mTLS coverage
- authorisation denials
- certificate expiry
- request volume by workload pair
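Assuming the stock Istio Prometheus metrics are enabled, the first two items can be approximated with queries along these lines (label names per the standard Istio telemetry; treat as a sketch):

```promql
# Share of mesh requests protected by mTLS
sum(rate(istio_requests_total{connection_security_policy="mutual_tls"}[5m]))
  / sum(rate(istio_requests_total[5m]))

# Authorisation denials by destination workload (Envoy RBAC denies surface as 403s)
sum(rate(istio_requests_total{response_code="403"}[5m])) by (destination_workload)
```

Certificate expiry is typically tracked from the proxies' own certificate metrics or an external scanner, since it fails silently until it does not.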
Advanced Security Patterns
As environments mature, it is common to extend Istio with external identity and policy decisions.
JWT Validation
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: jwt-auth
  namespace: api
spec:
  selector:
    matchLabels:
      app: api-gateway
  jwtRules:
  - issuer: 'https://auth.company.com'
    jwksUri: 'https://auth.company.com/.well-known/jwks.json'
    audiences:
    - 'api.company.com'
External Authorisation
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: control-plane
spec:
  meshConfig:
    extensionProviders:
    - name: external-authz
      envoyExtAuthzHttp:
        service: external-authz.auth.svc.cluster.local
        port: 8080
        timeout: 0.5s
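Registering the provider does nothing on its own; traffic is routed through it with a `CUSTOM` authorisation policy. A sketch, with the selector and path as assumptions:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-external-authz
  namespace: api
spec:
  selector:
    matchLabels:
      app: api-gateway
  action: CUSTOM
  provider:
    # Must match the extensionProvider name registered in meshConfig
    name: external-authz
  rules:
  - to:
    - operation:
        paths: ['/admin/*']
```

Only requests matching the rules are sent to the external authoriser; everything else falls through to the remaining ALLOW/DENY policies.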
Rate Limiting
For internet-facing services or high-risk APIs, local or global rate limiting is often part of the broader zero-trust control set:
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: rate-limit-filter
  namespace: production
spec:
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.local_ratelimit
        typed_config:
          '@type': type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
          stat_prefix: http_local_rate_limiter
          # 100 requests per minute, per proxy; production configs usually
          # also set filter_enabled/filter_enforced fractions explicitly
          token_bucket:
            max_tokens: 100
            tokens_per_fill: 100
            fill_interval: 60s
Operational Best Practices
The teams that make zero-trust rollouts work tend to follow the same principles:
- start with a contained pilot
- enable observability before tightening policy
- use permissive or low-risk modes first
- automate certificate handling and policy rollout
- document failure modes and incident response clearly
For ambient mesh specifically, the rollout can be gentler because you do not need to restart pods just to gain the initial L4 security layer.
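For example, enrolling an existing namespace in ambient mode is a single label change, and removing the label is equally non-disruptive (the namespace name here is an assumption):

```shell
# Add the namespace's running workloads to the ambient mesh — no restarts needed
kubectl label namespace legacy-apps istio.io/dataplane-mode=ambient

# Remove the namespace from the mesh again
kubectl label namespace legacy-apps istio.io/dataplane-mode-
```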
Troubleshooting Common Issues
When zero-trust policies fail, they usually fail closed. That is correct from a security perspective, but painful operationally if you are not ready for it.
Useful first checks include:
# Check mTLS and policy status for a workload
# (the older `istioctl authn tls-check` command has been removed)
istioctl x describe pod <pod-name> -n <namespace>
# Verify certificates
istioctl proxy-config secret <pod-name>.<namespace>
# Check proxy configuration
istioctl proxy-config cluster <pod-name>.<namespace> --fqdn <service-fqdn>
# Analyse namespace configuration for policy problems
istioctl analyze -n <namespace>
In practice, the most common issues are not exotic authorisation bugs. They are usually:
- workloads not actually participating in the mesh
- health checks or metrics endpoints blocked accidentally
- certificate or clock problems
- mismatched service accounts or namespace assumptions
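For certificate and clock problems specifically, it helps to decode the certificate the proxy is actually serving and check its validity window — a sketch, with the `jq` path matching current `istioctl` JSON output (verify against your release):

```shell
# Extract the workload certificate from the Envoy secret config,
# decode it, and print its subject plus not-before/not-after dates
istioctl proxy-config secret <pod-name>.<namespace> -o json \
  | jq -r '.dynamicActiveSecrets[0].secret.tlsCertificate.certificateChain.inlineBytes' \
  | base64 --decode \
  | openssl x509 -noout -subject -dates
```

An expired or not-yet-valid window, or a subject with an unexpected SPIFFE identity, points straight at the certificate, clock, or service-account issues listed above.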
Conclusion
Zero-trust networking in Kubernetes is not a single feature you switch on. It is a set of transport, identity, policy, and observability decisions that reinforce each other.
Istio gives platform teams a practical route to implementing that model without forcing every application team to re-implement mTLS, fine-grained authorisation, tracing, and traffic controls themselves. Whether you adopt sidecars, ambient mesh, or a phased mix of both, the important thing is to roll it out deliberately, observe it closely, and keep the policies grounded in how your services actually operate.