NIS2 for Platform Engineers: What Changes in Practice

· Eurtifact Platform Team

Context

The NIS2 Directive (Directive (EU) 2022/2555) entered into force on 16 January 2023, and member states had to transpose it into national law by 17 October 2024. Once the national transposition law is in force, compliance is mandatory for entities classified as “essential” or “important” under it.

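Unlike voluntary frameworks (ISO 27001, SOC 2), NIS2 creates legal obligations backed by administrative fines. For essential entities, member states must set a maximum fine of at least €10 million or 2% of total worldwide annual turnover, whichever is higher (at least €7 million or 1.4% for important entities).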

For platform engineers, NIS2 is not a documentation exercise. It requires operational changes to incident response, access control, logging, and supply chain management.

Reality Check

Common Belief

“NIS2 is for CISOs and legal teams. Platform engineering continues as usual.”

Why That’s Incomplete

NIS2 Article 21 requires “cybersecurity risk-management measures” that include specific technical controls:

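  • Incident handling (Article 21(2)(b))
  • Business continuity, such as backup management and disaster recovery, and crisis management (Article 21(2)(c))
  • Supply chain security (Article 21(2)(d))
  • Security in network and information systems acquisition, development, and maintenance, including vulnerability handling and disclosure (Article 21(2)(e))
  • Policies and procedures for assessing the effectiveness of cybersecurity risk-management measures (Article 21(2)(f))
  • Access control policies (Article 21(2)(i)) and multi-factor or continuous authentication where appropriate (Article 21(2)(j))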

These are not policy documents. They are operational capabilities. Platform teams implement them.

Engineering Implications

In practice, these obligations translate into the following technical requirements for platform operations:

1. Incident Detection and Reporting

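Requirement: Article 23 requires significant incidents to be reported to the CSIRT or competent authority in stages:

  • 24 hours: Early warning (counted from becoming aware of the significant incident)
  • 72 hours: Incident notification with an initial assessment (counted from becoming aware)
  • One month: Final report (counted from submission of the incident notification)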
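
The staged deadlines are simple arithmetic once the awareness time is recorded, and computing them automatically removes one source of error during an incident. A minimal sketch: `nis2_deadlines` and `add_one_month` are hypothetical helper names, and "one month" is interpreted as the same day of the following calendar month (clamped to the month's last day), an assumption worth confirming with your national authority.

```python
import calendar
from datetime import datetime, timedelta, timezone
from typing import Optional


def add_one_month(dt: datetime) -> datetime:
    """Same day in the next calendar month, clamped to that month's last day."""
    year = dt.year + dt.month // 12
    month = dt.month % 12 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def nis2_deadlines(aware_at: datetime,
                   notified_at: Optional[datetime] = None) -> dict:
    """Article 23 reporting deadlines for one significant incident.

    aware_at: when the entity became aware of the significant incident.
    notified_at: when the incident notification was submitted. If it has
    not been submitted yet, the 72-hour deadline is used, which gives the
    latest possible final-report date.
    """
    early_warning = aware_at + timedelta(hours=24)
    incident_notification = aware_at + timedelta(hours=72)
    final_report = add_one_month(notified_at or incident_notification)
    return {
        "early_warning": early_warning,
        "incident_notification": incident_notification,
        "final_report": final_report,
    }


# Example: awareness on a Friday afternoon (UTC)
aware = datetime(2025, 3, 7, 16, 0, tzinfo=timezone.utc)
for name, due in nis2_deadlines(aware).items():
    print(f"{name:<22} {due.isoformat()}")
```

Recording `notified_at` as soon as the notification is sent tightens the final-report date to the real one.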

What this means:

  • Automated alerting for security events (failed authentication, privilege escalation, data exfiltration attempts)
  • Incident classification playbooks (what qualifies as “significant”?)
  • Integration between monitoring systems (Prometheus, Grafana, Loki) and incident response workflows (PagerDuty, Opsgenie)
  • Evidence preservation: logs and system state captured at time of detection

Common failure: Relying on manual incident discovery. Incidents surface late, and once someone has noticed, escalation through informal channels can easily consume the 24-hour early warning window.

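Example (Prometheus alerting rule):

groups:
  - name: nis2_critical
    rules:
      - alert: UnauthorizedPrivilegeEscalation
        # k8s_rbac_denied_total is not a built-in Kubernetes metric; this assumes
        # an exporter that derives it from API server audit logs, with verb,
        # resource, and user labels
        expr: |
          increase(k8s_rbac_denied_total{verb="create", resource="clusterrolebindings"}[5m]) > 0
        labels:
          severity: critical
          nis2_reportable: "true"  # lets Alertmanager route this to incident response
        annotations:
          summary: "Unauthorized attempt to escalate privileges"
          description: "User {{ $labels.user }} was denied creating a ClusterRoleBinding"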
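
One way to connect a rule like this to an incident workflow is an Alertmanager webhook receiver. The sketch below assumes Alertmanager's standard webhook JSON payload (an `alerts` list whose entries carry `status`, `labels`, and `startsAt`) and uses the `nis2_reportable` label from the rule above as a routing convention. It treats the alert's start time as the earliest plausible awareness time, a deliberately conservative assumption; whether a given alert amounts to awareness of a significant incident remains a judgment call. The function names are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_alert_time(ts: str) -> datetime:
    """Parse an Alertmanager RFC 3339 timestamp.

    Assumes UTC timestamps ending in "Z" and drops sub-second precision,
    which Alertmanager may emit with up to nanosecond digits.
    """
    base = ts.rstrip("Z").split(".")[0]
    return datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def reportable_alerts(payload: dict) -> list:
    """Firing alerts labelled nis2_reportable="true", each with an
    early-warning deadline computed from the alert's start time."""
    results = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        if alert.get("status") != "firing" or labels.get("nis2_reportable") != "true":
            continue
        started = parse_alert_time(alert["startsAt"])
        results.append({
            "alertname": labels.get("alertname"),
            "started": started,
            "early_warning_due": started + timedelta(hours=24),
        })
    return results


# Example payload, trimmed to the fields used above
body = json.loads("""{
  "status": "firing",
  "alerts": [
    {"status": "firing",
     "labels": {"alertname": "UnauthorizedPrivilegeEscalation",
                "severity": "critical", "nis2_reportable": "true"},
     "startsAt": "2025-03-07T16:02:11.123456789Z"},
    {"status": "resolved",
     "labels": {"alertname": "DiskPressure", "severity": "warning"},
     "startsAt": "2025-03-07T09:00:00Z"}
  ]
}""")
for item in reportable_alerts(body):
    print(item["alertname"], "early warning due", item["early_warning_due"].isoformat())
```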

2. Access Control and Multi-Factor Authentication

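Requirement: Article 21(2)(i) requires “access control policies,” and Article 21(2)(j) requires “the use of multi-factor authentication or continuous authentication solutions” where appropriate.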

What this means:

  • Multi-factor authentication (MFA) for all privileged access (Kubernetes admin, database root, SSH to production)
  • Role-Based Access Control (RBAC) with principle of least privilege
  • Time-bounded access grants (temporary elevation instead of permanent admin rights)
  • Audit logging of all privileged operations
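
Time-bounded grants combine an approval, an expiry, and an audit record of every decision. The sketch below is a hypothetical in-memory model to show the logic, not the API of any access broker; the class names, the one-hour default, and the eight-hour ceiling are illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    user: str
    role: str
    approver: str
    granted_at: datetime
    ttl: timedelta

    def active(self, now: datetime) -> bool:
        return self.granted_at <= now < self.granted_at + self.ttl


@dataclass
class ElevationBroker:
    max_ttl: timedelta = timedelta(hours=8)  # illustrative ceiling
    audit_log: list = field(default_factory=list)
    grants: list = field(default_factory=list)

    def grant(self, user: str, role: str, approver: str, now: datetime,
              ttl: timedelta = timedelta(hours=1)) -> Grant:
        # Four-eyes rule: nobody approves their own elevation
        if approver == user:
            self.audit_log.append(("denied", user, role, "self-approval", now))
            raise PermissionError("self-approval is not allowed")
        if ttl > self.max_ttl:
            self.audit_log.append(("denied", user, role, "ttl too long", now))
            raise ValueError(f"ttl {ttl} exceeds maximum {self.max_ttl}")
        g = Grant(user, role, approver, now, ttl)
        self.grants.append(g)
        self.audit_log.append(("granted", user, role, approver, now))
        return g

    def is_allowed(self, user: str, role: str, now: datetime) -> bool:
        # Expired grants simply stop matching; no standing admin rights remain
        return any(g.user == user and g.role == role and g.active(now)
                   for g in self.grants)


broker = ElevationBroker()
t0 = datetime(2025, 3, 7, 9, 0, tzinfo=timezone.utc)
broker.grant("alice", "prod-admin", approver="bob", now=t0)
print(broker.is_allowed("alice", "prod-admin", t0 + timedelta(minutes=30)))  # True
print(broker.is_allowed("alice", "prod-admin", t0 + timedelta(hours=2)))     # False
```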

Common failure: MFA enabled for VPN access but not for kubectl commands, SSH sessions, or database consoles.

Example (Kubernetes RBAC):

# WRONG: Cluster admin for developers
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: dev-team-admin
subjects:
  - kind: Group
    name: developers
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin  # Too broad: full control of every namespace
  apiGroup: rbac.authorization.k8s.io
---
# CORRECT: Namespace-scoped; time limits enforced by an external access system
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-team-staging
  namespace: staging
subjects:
  - kind: Group
    name: developers
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer-role  # Role defined in the staging namespace (not shown)
  apiGroup: rbac.authorization.k8s.io

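Time-bounded elevation is handled outside Kubernetes RBAC, through an access broker with an approval workflow (for example HashiCorp Boundary or Teleport) that grants short-lived, approved access.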
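
Overly broad bindings like the WRONG example above can be caught by auditing the cluster's existing bindings. A minimal sketch, assuming the input is the JSON produced by `kubectl get clusterrolebindings -o json`; `risky_bindings` is a hypothetical name, and the allowlist of groups permitted to hold `cluster-admin` is a local convention you would define yourself.

```python
import json

# Hypothetical allowlist: groups that may hold cluster-admin (e.g. break-glass)
ALLOWED_ADMIN_GROUPS = {"system:masters"}


def risky_bindings(crb_list: dict) -> list:
    """(binding, subject) pairs granting cluster-admin to any user,
    group, or service account outside the allowlist."""
    findings = []
    for item in crb_list.get("items", []):
        if item.get("roleRef", {}).get("name") != "cluster-admin":
            continue
        # subjects may be absent or null on some bindings
        for subject in item.get("subjects") or []:
            if subject.get("kind") == "Group" and subject.get("name") in ALLOWED_ADMIN_GROUPS:
                continue
            findings.append((item["metadata"]["name"],
                             f'{subject.get("kind")}/{subject.get("name")}'))
    return findings


# Typical input: kubectl get clusterrolebindings -o json > crbs.json
sample = json.loads("""{"items": [
  {"metadata": {"name": "cluster-admin"},
   "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
   "subjects": [{"kind": "Group", "name": "system:masters"}]},
  {"metadata": {"name": "dev-team-admin"},
   "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
   "subjects": [{"kind": "Group", "name": "developers"}]}
]}""")
for name, subject in risky_bindings(sample):
    print(f"{name}: cluster-admin granted to {subject}")
```

Running a check like this on a schedule turns "least privilege" from a statement into something that is measured.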

3. Logging and Retention

Requirement: Article 21(2)(f) requires policies and procedures to assess the effectiveness of cybersecurity risk-management measures, which in practice depends on audit trails.

What this means:

  • Centralized logging of authentication events, API calls, configuration changes
  • Immutable log storage (append-only, tamper-evident)
  • Retention period sufficient for incident investigation (NIS2 does not fix a period; treat 1 year as a minimum, and 3-5 years is common for regulated entities)
  • Log aggregation across infrastructure (Kubernetes audit logs, database query logs, network flow logs)
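
Tamper evidence is normally provided by the storage layer (for example S3 Object Lock), but the underlying idea is easy to show: chain each record to the hash of the previous one, so a later edit, reordering, or deletion from the middle breaks verification. A minimal standard-library sketch with hypothetical function names; it illustrates the principle only. An attacker who can rewrite the whole log can also recompute the chain, and truncating the tail goes unnoticed unless the latest hash is copied somewhere the log writer cannot modify.

```python
import hashlib
import json

GENESIS = "0" * 64


def append(chain: list, event: dict) -> None:
    """Append an event whose hash covers the event and the previous hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    payload = json.dumps(event, sort_keys=True)
    digest = hashlib.sha256((prev + payload).encode()).hexdigest()
    chain.append({"event": event, "prev": prev, "hash": digest})


def verify(chain: list) -> bool:
    """Recompute every link; a modified, reordered, or removed
    (non-final) record makes verification fail."""
    prev = GENESIS
    for record in chain:
        payload = json.dumps(record["event"], sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode()).hexdigest()
        if record["prev"] != prev or record["hash"] != expected:
            return False
        prev = record["hash"]
    return True


log = []
append(log, {"user": "alice", "verb": "create", "resource": "clusterrolebindings"})
append(log, {"user": "bob", "verb": "delete", "resource": "roles"})
print(verify(log))                    # True
log[0]["event"]["user"] = "mallory"   # tamper with the first record
print(verify(log))                    # False
```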

Common failure: Logs stored locally on ephemeral pods or EC2 instances. When an incident occurs, logs are already deleted.

Example (Kubernetes Audit Policy):

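apiVersion: audit.k8s.io/v1
kind: Policy
# Skip the RequestReceived stage; later stages carry the response code
omitStages: ["RequestReceived"]
rules:
  # Log service account token requests (TokenRequest API)
  - level: Metadata
    verbs: ["create"]
    resources:
      - group: ""
        resources: ["serviceaccounts/token"]

  # Log all RBAC changes with request and response bodies
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]

  # Catch-all (first match wins, so keep it last): everything else at
  # Metadata level, which also records rejected requests (401/403)
  - level: Metadata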

Audit logs are then shipped off the node to tamper-evident storage (for example S3 with Object Lock) and to a query layer with retention policies (for example Loki).
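
Once audit events are centralized, investigation becomes a query. The sketch below assumes the API server's JSON-lines audit log format (audit.k8s.io/v1 Event objects with `stage`, `verb`, `objectRef`, `user`, and `responseStatus` fields) and picks out RBAC writes and rejected requests. `interesting_events` is a hypothetical name; in production the same filter would usually be written in the log backend's own query language (for example LogQL in Loki) rather than as a script.

```python
import json

RBAC_RESOURCES = {"roles", "rolebindings", "clusterroles", "clusterrolebindings"}
MUTATING_VERBS = {"create", "update", "patch", "delete"}


def interesting_events(lines):
    """Yield (reason, username, verb, resource, code) for RBAC writes and
    for requests rejected with 401 or 403, from JSON-lines audit output."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("stage") == "RequestReceived":
            continue  # no response code yet at this stage
        ref = event.get("objectRef") or {}
        code = (event.get("responseStatus") or {}).get("code")
        user = (event.get("user") or {}).get("username", "<unknown>")
        if (ref.get("apiGroup") == "rbac.authorization.k8s.io"
                and ref.get("resource") in RBAC_RESOURCES
                and event.get("verb") in MUTATING_VERBS):
            yield ("rbac-write", user, event["verb"], ref["resource"], code)
        elif code in (401, 403):
            yield ("rejected", user, event.get("verb"), ref.get("resource"), code)


sample = [
    '{"stage":"ResponseComplete","verb":"create","user":{"username":"dev1"},'
    '"objectRef":{"apiGroup":"rbac.authorization.k8s.io","resource":"clusterrolebindings"},'
    '"responseStatus":{"code":403}}',
    '{"stage":"ResponseComplete","verb":"get","user":{"username":"ci"},'
    '"objectRef":{"resource":"pods","namespace":"staging"},"responseStatus":{"code":200}}',
]
for hit in interesting_events(sample):
    print(hit)
```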

4. Supply Chain Security

Requirement: Article 21(2)(d) requires “supply chain security, including security-related aspects concerning the relationships between each entity and its direct suppliers or service providers.”

What this means:

  • Software Bill of Materials (SBOM) for all deployed artifacts
  • Vulnerability scanning of container images, dependencies, and infrastructure-as-code
  • Signed container images with provenance attestations (Sigstore, Notary)
  • Approved vendor list with security assessment for third-party integrations

Common failure: Pulling public container images from Docker Hub without signature verification or vulnerability scanning.
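
Alongside signature verification, a cheap first gate is to reject image references that come from unapproved registries or are not pinned to a digest, since a mutable tag can be repointed after the image was scanned. A minimal sketch: the registry allowlist and function names are hypothetical, and the parsing follows the common Docker reference convention (a first path component containing `.` or `:`, or equal to `localhost`, is a registry host; otherwise Docker Hub is implied).

```python
import re

# Hypothetical allowlist of registries whose images have passed review
APPROVED_REGISTRIES = {"registry.example.eu", "ghcr.io"}
DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def registry_of(image: str) -> str:
    """Registry host of an image reference; Docker Hub when none is given."""
    first, sep, _ = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def image_problems(image: str) -> list:
    """Reasons to reject an image reference; an empty list means it passes."""
    problems = []
    registry = registry_of(image)
    if registry not in APPROVED_REGISTRIES:
        problems.append(f"registry {registry} not approved")
    if not DIGEST.search(image):
        problems.append("not pinned to a sha256 digest")
    return problems


for ref in ["nginx:latest",
            "registry.example.eu/payments/api@sha256:" + "a" * 64]:
    print(ref[:40], image_problems(ref) or "ok")
```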

Example (OPA Policy for Image Signing):

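package kubernetes.admission

import rego.v1

# Reject Pods that reference container images not known to be signed
deny contains msg if {
  input.request.kind.kind == "Pod"
  container := input.request.object.spec.containers[_]
  not is_signed(container.image)
  msg := sprintf("Container image %v is not signed", [container.image])
}

# Rego cannot check Cosign signatures by itself. This assumes a separate
# (hypothetical) verifier runs `cosign verify` and loads verified image
# references into OPA as data, e.g. data.verified_images["<image>"] = true.
# Unknown images fail closed. Admission controllers such as Sigstore
# policy-controller or Kyverno verifyImages rules verify signatures directly.
is_signed(image) if {
  data.verified_images[image]
}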

5. Business Continuity and Backup

Requirement: Article 21(2)(c) requires “business continuity, such as backup management and disaster recovery, and crisis management.”

What this means:

  • Automated backups with tested restore procedures
  • Recovery Time Objective (RTO) and Recovery Point Objective (RPO) defined and monitored
  • Geographic redundancy for critical data (multi-region replication)
  • Regular disaster recovery drills (not just backup tests)

Common failure: Backups configured but never tested. When an incident occurs, backups are corrupted or incomplete.

Example (Velero Backup Schedule):

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-cluster-backup
  namespace: velero  # the namespace Velero is installed in (default: velero)
spec:
  schedule: "0 2 * * *"  # Daily at 02:00 (Velero server time)
  template:
    includedNamespaces:
      - production
      - staging
    storageLocation: eu-backup-s3
    volumeSnapshotLocations:
      - eu-snapshot-location
    ttl: 720h  # 30 days retention

Quarterly restore drills then verify that the backups actually meet the RTO/RPO targets.
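
A restore drill produces timestamps from which the achieved RTO and RPO can be computed and compared with targets. The sketch below assumes the drill records when the simulated outage began, when the newest restored backup was taken, and when the restored service passed its checks; the class and function names are hypothetical and the targets are placeholders, not recommendations. With a daily schedule like the one above, the achieved RPO can approach 24 hours, so a tighter RPO target needs more frequent backups or replication.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DrillResult:
    outage_start: datetime      # when the simulated disruption began
    backup_taken_at: datetime   # timestamp of the newest backup restored
    service_restored: datetime  # when the restored service passed checks

    @property
    def achieved_rpo(self) -> timedelta:
        # Data written after the backup was taken is lost
        return self.outage_start - self.backup_taken_at

    @property
    def achieved_rto(self) -> timedelta:
        return self.service_restored - self.outage_start


def evaluate(drill: DrillResult, rpo_target: timedelta, rto_target: timedelta) -> dict:
    return {
        "rpo": drill.achieved_rpo,
        "rto": drill.achieved_rto,
        "rpo_met": drill.achieved_rpo <= rpo_target,
        "rto_met": drill.achieved_rto <= rto_target,
    }


utc = timezone.utc
drill = DrillResult(
    outage_start=datetime(2025, 3, 12, 14, 0, tzinfo=utc),
    backup_taken_at=datetime(2025, 3, 12, 2, 0, tzinfo=utc),  # 02:00 scheduled run
    service_restored=datetime(2025, 3, 12, 17, 30, tzinfo=utc),
)
print(evaluate(drill, rpo_target=timedelta(hours=24), rto_target=timedelta(hours=4)))
```

Storing each drill's result makes the trend visible across quarters, not just the latest pass or fail.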

Failure Modes

Pattern 1: Incident Detected, Reporting Delayed

An engineer notices suspicious activity on Friday afternoon. They investigate over the weekend and report to management on Monday. By then, the 24-hour early warning deadline has passed.

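Why this fails: Article 23 deadlines run from when the entity becomes aware of a significant incident, not from formal acknowledgment by management. Noticing suspicious activity on Friday can be enough to start the clock.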

Solution: Route alerts directly into incident response workflows, so detection immediately triggers escalation and the reporting decision instead of waiting for someone to notice.

Pattern 2: Logs Stored but Not Analyzable

All logs are forwarded to S3, satisfying the “we have logs” checkbox. When an incident occurs, no one can query them because there’s no indexing or search infrastructure.

Why this fails: Logs must be queryable for incident investigation. Append-only storage without query capability does not satisfy “assessing effectiveness.”

Solution: Centralized log aggregation with search (Loki, OpenSearch, Splunk). Logs are data, not write-only archives.

Pattern 3: Backup Success Metrics Without Restore Testing

Backup jobs run nightly and report “success.” But the backups have never been restored to a live environment. During a ransomware incident, restored data is missing critical tables.

Why this fails: Backup success means “data was written to storage.” It does not mean “data can be restored and used.”

Solution: Quarterly restore drills to isolated environments. Measure RTO/RPO in practice, not theory.
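
"The backup job succeeded" and "the restore is usable" can be checked separately. A minimal sketch of a post-restore check, using `sqlite3` from the Python standard library as a stand-in for the real database: it compares table names and row counts between the source and the restored copy. The function names are hypothetical, and which tables count as critical, and whether counts should match exactly (they will differ if the source kept taking writes), are assumptions to adapt.

```python
import sqlite3


def inventory(conn: sqlite3.Connection) -> dict:
    """Map each user table to its row count."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    # Table names come from the catalog, not user input; quote them anyway
    return {t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0] for t in tables}


def restore_gaps(source: sqlite3.Connection, restored: sqlite3.Connection,
                 critical: set) -> list:
    """Critical tables that are missing or short of rows after a restore."""
    src, dst = inventory(source), inventory(restored)
    gaps = []
    for table in sorted(critical):
        if table not in dst:
            gaps.append(f"{table}: missing from restore")
        elif dst[table] < src.get(table, 0):
            gaps.append(f"{table}: {dst[table]} rows restored, {src[table]} in source")
    return gaps


# Demo: the "restored" copy silently lost the invoices table
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE invoices (id INTEGER PRIMARY KEY);
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO invoices VALUES (10);
""")
restored = sqlite3.connect(":memory:")
restored.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    INSERT INTO customers VALUES (1), (2);
""")
print(restore_gaps(source, restored, critical={"customers", "invoices"}))
# ['invoices: missing from restore']
```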

What “Good” Looks Like

A NIS2-compliant platform has these properties:

  1. Automated Incident Detection: Security events trigger alerts within minutes. Incident response playbooks execute automatically (isolate affected pods, preserve logs, notify on-call).

  2. Enforced Access Controls: MFA required for all production access. RBAC policies enforce least privilege. Temporary elevation logged and time-limited.

  3. Queryable Audit Logs: Centralized logging with retention aligned to incident investigation needs (3-5 years). Logs are indexed and searchable within seconds.

  4. Verified Supply Chain: All deployed artifacts have SBOMs, signatures, and vulnerability scan results. Unsigned or vulnerable images are rejected at admission time.

  5. Tested Disaster Recovery: Backup restore procedures executed quarterly. RTO/RPO measured from actual restores into isolated environments, not assumed from backup job status.

These are operational capabilities, not policy documents.

Limits & Trade-offs

This approach does not:

  • Guarantee zero incidents: NIS2 requires risk-management measures and timely detection, response, and reporting, not the absence of incidents. Incidents will still occur.
  • Eliminate manual processes: Some incident response steps require human judgment (false positive triage, root cause analysis). Automation reduces latency, not responsibility.
  • Resolve regulatory ambiguity: What qualifies as a “significant” incident? Article 23(3) gives general criteria, but thresholds and interpretation vary by member state and sector. Compliance requires local legal guidance.

NIS2 creates a baseline. It is not a comprehensive security program.

Key Takeaways

  • NIS2 Article 23 requires an early warning within 24 hours of becoming aware of a significant incident. Meeting that deadline is a detection and alerting problem, not just a policy timeline.
  • Access control (Article 21(2)(i) and (j)) means MFA, RBAC, and audit logging for all privileged operations, including Kubernetes, databases, and SSH.
  • Logging must be centralized, queryable, and retained long enough for incident investigation (3-5 years recommended for regulated entities).
  • Supply chain security (Article 21(2)(d)) requires SBOMs, signed artifacts, and vulnerability scanning for all deployed software.
  • Disaster recovery (Article 21(2)(c)) means tested restore procedures with measured RTO/RPO, not just backup job success metrics.

This article reflects the Eurtifact platform team’s understanding of NIS2 technical requirements as of February 2026. Member state implementations vary. For compliance questions specific to your jurisdiction or entity classification, consult qualified legal and compliance advisors.