NIS2 for Platform Engineers: What Changes in Practice
Eurtifact Platform Team
Context
The NIS2 Directive (Directive (EU) 2022/2555) came into force in January 2023, with member state transposition required by October 2024. Full compliance is now mandatory for entities designated as “essential” or “important” under national law.
Unlike voluntary frameworks (ISO 27001, SOC 2), NIS2 creates legal obligations, with penalties for essential entities of up to €10 million or 2% of global annual turnover, whichever is higher.
For platform engineers, NIS2 is not a documentation exercise. It requires operational changes to incident response, access control, logging, and supply chain management.
Reality Check
Common Belief
“NIS2 is for CISOs and legal teams. Platform engineering continues as usual.”
Why That’s Incomplete
NIS2 Article 21 requires “cybersecurity risk-management measures” that include specific technical controls:
- Incident handling (Article 21(2)(b))
- Business continuity and disaster recovery (Article 21(2)(c))
- Supply chain security (Article 21(2)(d))
- Security in network and information systems acquisition, development, and maintenance (Article 21(2)(e))
- Policies and procedures for assessing the effectiveness of cybersecurity measures (Article 21(2)(f))
These are not policy documents. They are operational capabilities. Platform teams implement them.
Engineering Implications
NIS2 imposes these technical requirements on platform operations:
1. Incident Detection and Reporting
Requirement: Article 23 mandates incident notification within:
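- 24 hours: Early warning, counted from the moment the entity becomes aware of a significant incident
- 72 hours: Incident notification with an initial assessment, counted from the same moment
- 1 month: Final report, counted from submission of the incident notification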
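The deadlines above are simple date arithmetic, which makes them easy to compute automatically when an incident record is opened. The following is a minimal sketch, not a legal calculator: the function names are hypothetical, and it assumes that "one month" means one calendar month (clamped to the end of shorter months) from submission of the incident notification.

```python
import calendar
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ReportingDeadlines:
    early_warning: datetime
    incident_notification: datetime


def deadlines_from_awareness(aware_at: datetime) -> ReportingDeadlines:
    """Compute the 24-hour and 72-hour deadlines from the awareness timestamp."""
    if aware_at.tzinfo is None:
        # Naive timestamps are ambiguous across time zones, so refuse them.
        raise ValueError("aware_at must be timezone-aware")
    return ReportingDeadlines(
        early_warning=aware_at + timedelta(hours=24),
        incident_notification=aware_at + timedelta(hours=72),
    )


def final_report_deadline(notification_submitted_at: datetime) -> datetime:
    """Assumption: one calendar month after the incident notification was submitted."""
    ts = notification_submitted_at
    year = ts.year + (1 if ts.month == 12 else 0)
    month = 1 if ts.month == 12 else ts.month + 1
    last_day = calendar.monthrange(year, month)[1]
    # Clamp the day so that, for example, 31 January maps to the last day of February.
    return ts.replace(year=year, month=month, day=min(ts.day, last_day))


if __name__ == "__main__":
    aware = datetime(2026, 1, 30, 16, 0, tzinfo=timezone.utc)
    print(deadlines_from_awareness(aware))
```

Storing the awareness timestamp in UTC at the moment the alert fires avoids later arguments about when the clock started.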
What this means:
- Automated alerting for security events (failed authentication, privilege escalation, data exfiltration attempts)
- Incident classification playbooks (what qualifies as “significant”?)
- Integration between monitoring systems (Prometheus, Grafana, Loki) and incident response workflows (PagerDuty, Opsgenie)
- Evidence preservation: logs and system state captured at time of detection
Common failure: Relying on manual incident discovery. By the time someone notices unusual behavior, the 24-hour early warning window has passed.
Example (Prometheus alerting rule, routed through Alertmanager):
groups:
  - name: nis2_critical
    rules:
      - alert: UnauthorizedPrivilegeEscalation
        # k8s_rbac_denied_total is assumed to be a custom counter derived from
        # audit logs; substitute the metric your own pipeline exports.
        expr: |
          increase(k8s_rbac_denied_total{verb="create", resource="clusterrolebindings"}[5m]) > 0
        labels:
          severity: critical
          nis2_reportable: "true"
        annotations:
          summary: "Unauthorized attempt to escalate privileges"
          description: "Pod {{ $labels.pod }} attempted to create cluster role binding"
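A label such as nis2_reportable is only useful if something downstream acts on it. The sketch below shows one way a webhook receiver might filter an Alertmanager webhook payload for firing alerts that carry the label and stamp the time of receipt as evidence of awareness. The function name and the output fields are hypothetical; the payload keys used (alerts, status, labels, startsAt) follow the Alertmanager webhook format.

```python
from datetime import datetime, timezone
from typing import Any


def reportable_alerts(payload: dict[str, Any]) -> list[dict[str, str]]:
    """Return firing alerts labelled nis2_reportable="true", stamped with receipt time."""
    received_at = datetime.now(timezone.utc).isoformat()
    selected = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        if alert.get("status") != "firing":
            continue  # resolved alerts do not start a new reporting clock
        if labels.get("nis2_reportable") != "true":
            continue
        selected.append({
            "alertname": labels.get("alertname", "unknown"),
            "severity": labels.get("severity", "unknown"),
            "started_at": alert.get("startsAt", ""),
            "received_at": received_at,
        })
    return selected


if __name__ == "__main__":
    sample = {
        "status": "firing",
        "alerts": [
            {
                "status": "firing",
                "labels": {
                    "alertname": "UnauthorizedPrivilegeEscalation",
                    "severity": "critical",
                    "nis2_reportable": "true",
                },
                "startsAt": "2026-02-06T15:04:05Z",
            }
        ],
    }
    for item in reportable_alerts(sample):
        print(item)
```

The returned records can be handed to whatever incident tooling is in place; the point is that classification and timestamping happen without waiting for a human to read a dashboard.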
2. Access Control and Multi-Factor Authentication
Requirement: Article 21(2)(i) requires “access control policies,” and Article 21(2)(j) calls for multi-factor authentication where appropriate.
What this means:
- Multi-factor authentication (MFA) for all privileged access (Kubernetes admin, database root, SSH to production)
- Role-Based Access Control (RBAC) with principle of least privilege
- Time-bounded access grants (temporary elevation instead of permanent admin rights)
- Audit logging of all privileged operations
Common failure: MFA enabled for VPN access but not for kubectl commands, SSH sessions, or database consoles.
Example (Kubernetes RBAC):
# WRONG: Cluster admin for developers
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: dev-team-admin
subjects:
  - kind: Group
    name: developers
    apiGroup: rbac.authorization.k8s.io
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin # Too broad
---
# CORRECT: Namespace-scoped, time-limited via external system
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-team-staging
  namespace: staging
subjects:
  - kind: Group
    name: developers
    apiGroup: rbac.authorization.k8s.io
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: developer-role # Limited to staging namespace
Time-bounded elevation is managed through an external approval workflow (for example HashiCorp Boundary or Teleport).
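Kubernetes RBAC has no built-in expiry, so the time limit has to live in the system that creates and removes bindings. The following sketch illustrates the bookkeeping such a system needs; it is not how Boundary or Teleport work internally, and the AccessGrant type and expired_grants helper are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AccessGrant:
    user: str
    role: str
    granted_at: datetime
    duration: timedelta

    @property
    def expires_at(self) -> datetime:
        return self.granted_at + self.duration

    def is_active(self, now: datetime) -> bool:
        # Half-open interval: the grant is no longer valid at the exact expiry instant.
        return self.granted_at <= now < self.expires_at


def expired_grants(grants: list[AccessGrant], now: datetime) -> list[AccessGrant]:
    """Grants whose bindings should be revoked by the next reconciliation run."""
    return [g for g in grants if now >= g.expires_at]


if __name__ == "__main__":
    start = datetime(2026, 2, 6, 9, 0, tzinfo=timezone.utc)
    grant = AccessGrant("alice", "prod-debug", start, timedelta(hours=2))
    print(grant.is_active(start + timedelta(hours=1)))
    print(expired_grants([grant], start + timedelta(hours=3)))
```

A periodic job that deletes the RoleBinding for every expired grant, and logs that deletion, gives both the time bound and the audit trail.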
3. Logging and Retention
Requirement: Article 21(2)(f) requires policies and procedures for “assessing the effectiveness” of measures, which in practice implies audit trails.
What this means:
- Centralized logging of authentication events, API calls, configuration changes
- Immutable log storage (append-only, tamper-evident)
- Retention period sufficient for incident investigation (NIS2 itself does not fix a number; 1 year is a sensible minimum, and 3-5 years is common for regulated entities)
- Log aggregation across infrastructure (Kubernetes audit logs, database query logs, network flow logs)
Common failure: Logs stored locally on ephemeral pods or EC2 instances. When an incident occurs, logs are already deleted.
Example (Kubernetes Audit Policy):
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Log service account token requests (metadata only, so token contents are never recorded)
  - level: Metadata
    verbs: ["create"]
    resources:
      - group: ""
        resources: ["serviceaccounts/token"]
    # No "namespaces" field: an empty list matches every namespace
  # Log all RBAC changes, including request and response bodies
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
Logs forwarded to immutable storage (S3 with object lock, Loki with retention policies).
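"Tamper-evident" can be made concrete with a hash chain: each entry's hash covers the previous entry's hash, so changing any record invalidates every later link. The sketch below is an illustration of that idea using only the standard library; it is an assumption about how one might add tamper evidence on top of storage, not a description of how S3 Object Lock or Loki work.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def chain_entry(prev_hash: str, record: dict) -> dict:
    """Build a log entry whose hash covers both the record and the previous hash."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()
    return {"prev": prev_hash, "record": record, "hash": digest}


def append(log: list[dict], record: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append(chain_entry(prev, record))


def verify(log: list[dict]) -> bool:
    """Recompute every link; any edited, removed, or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        expected = chain_entry(prev, entry["record"])["hash"]
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    audit_log: list[dict] = []
    append(audit_log, {"user": "alice", "verb": "create", "resource": "rolebindings"})
    append(audit_log, {"user": "bob", "verb": "delete", "resource": "clusterroles"})
    print(verify(audit_log))
    audit_log[0]["record"]["user"] = "mallory"
    print(verify(audit_log))
```

Publishing the latest hash to a separate system periodically makes truncation of the tail detectable as well.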
4. Supply Chain Security
Requirement: Article 21(2)(d) requires “supply chain security, including security-related aspects concerning the relationships between each entity and its direct suppliers or service providers.”
What this means:
- Software Bill of Materials (SBOM) for all deployed artifacts
- Vulnerability scanning of container images, dependencies, and infrastructure-as-code
- Signed container images with provenance attestations (Sigstore, Notary)
- Approved vendor list with security assessment for third-party integrations
Common failure: Pulling public container images from Docker Hub without signature verification or vulnerability scanning.
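Example (OPA Policy for Image Signing):
package kubernetes.admission

# Reject Pods whose containers reference an image that has not been verified.
deny[msg] {
    input.request.kind.kind == "Pod"
    container := input.request.object.spec.containers[_]
    not is_signed(container.image)
    msg := sprintf("Container image %v is not signed", [container.image])
}

# Placeholder check: Rego cannot verify a Cosign signature by itself.
# Assumes an external verifier publishes verified image references under
# data.verified_images (hypothetical name) in the form {"<image>": true}.
# The real implementation depends on your signature verification infrastructure.
is_signed(image) {
    data.verified_images[image] == true
}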
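Signature status is only one of the signals named above; SBOM presence and scan results can feed the same admit-or-reject decision. The following sketch combines them in one function. The ImageEvidence type, its fields, and the severity names are assumptions for illustration, and in practice the evidence would come from your scanner and signing tooling.

```python
from dataclasses import dataclass

# Assumed severity scale; align it with whatever your scanner reports.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass(frozen=True)
class ImageEvidence:
    image: str
    signature_verified: bool
    has_sbom: bool
    vulnerability_severities: tuple[str, ...] = ()


def admission_violations(evidence: ImageEvidence, block_at: str = "critical") -> list[str]:
    """Return the reasons to reject an image; an empty list means admit."""
    violations = []
    if not evidence.signature_verified:
        violations.append("signature not verified")
    if not evidence.has_sbom:
        violations.append("no SBOM attached")
    threshold = SEVERITY_RANK[block_at]
    # Unknown severity strings rank as 0 and never block; tighten this if needed.
    blocking = [
        s for s in evidence.vulnerability_severities
        if SEVERITY_RANK.get(s, 0) >= threshold
    ]
    if blocking:
        violations.append(f"{len(blocking)} vulnerabilities at or above {block_at}")
    return violations


if __name__ == "__main__":
    candidate = ImageEvidence(
        image="registry.example.com/app:1.0",
        signature_verified=True,
        has_sbom=False,
        vulnerability_severities=("medium", "critical"),
    )
    print(admission_violations(candidate))
```

Returning every reason rather than the first one makes rejection messages actionable for the team that owns the image.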
5. Business Continuity and Backup
Requirement: Article 21(2)(c) requires “business continuity, such as backup management and disaster recovery, and crisis management.”
What this means:
- Automated backups with tested restore procedures
- Recovery Time Objective (RTO) and Recovery Point Objective (RPO) defined and monitored
- Geographic redundancy for critical data (multi-region replication)
- Regular disaster recovery drills (not just backup tests)
Common failure: Backups configured but never tested. When an incident occurs, backups are corrupted or incomplete.
Example (Velero Backup Schedule):
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-cluster-backup
spec:
  schedule: "0 2 * * *" # Daily at 2 AM
  template:
    includedNamespaces:
      - production
      - staging
    storageLocation: eu-backup-s3
    volumeSnapshotLocations:
      - eu-snapshot-location
    ttl: 720h # 30 days retention
Quarterly restore drills to verify RTO/RPO compliance.
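A drill only verifies RTO and RPO if the timestamps are recorded and compared against targets. The sketch below shows the arithmetic, assuming the drill records three moments: the last successful backup, the simulated failure, and the point at which the service was usable again. The RestoreDrill type and the target values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RestoreDrill:
    last_backup_at: datetime       # completion time of the backup that was restored
    failure_at: datetime           # simulated moment of data loss
    service_restored_at: datetime  # service verified usable again


def achieved_rpo(drill: RestoreDrill) -> timedelta:
    """Data written between the last backup and the failure is lost."""
    return drill.failure_at - drill.last_backup_at


def achieved_rto(drill: RestoreDrill) -> timedelta:
    """Elapsed time from failure until the service is usable."""
    return drill.service_restored_at - drill.failure_at


def meets_targets(drill: RestoreDrill, rpo_target: timedelta, rto_target: timedelta) -> bool:
    return achieved_rpo(drill) <= rpo_target and achieved_rto(drill) <= rto_target


if __name__ == "__main__":
    drill = RestoreDrill(
        last_backup_at=datetime(2026, 2, 6, 2, 0, tzinfo=timezone.utc),
        failure_at=datetime(2026, 2, 6, 14, 0, tzinfo=timezone.utc),
        service_restored_at=datetime(2026, 2, 6, 17, 30, tzinfo=timezone.utc),
    )
    print(achieved_rpo(drill), achieved_rto(drill))
    print(meets_targets(drill, timedelta(hours=24), timedelta(hours=4)))
```

With a daily schedule the worst-case RPO approaches 24 hours, so a tighter RPO target means a more frequent schedule, not a faster restore.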
Failure Modes
Pattern 1: Incident Detected, Reporting Delayed
An engineer notices suspicious activity on Friday afternoon. They investigate over the weekend and report to management on Monday. By then, the 24-hour early warning deadline has passed.
Why this fails: The Article 23 clock starts when the entity becomes aware of a significant incident, not when management formally acknowledges it. Detection triggers the clock.
Solution: Automated alerting directly to incident response workflows. Detection = notification.
Pattern 2: Logs Stored but Not Analyzable
All logs are forwarded to S3, satisfying the “we have logs” checkbox. When an incident occurs, no one can query them because there’s no indexing or search infrastructure.
Why this fails: Logs must be queryable for incident investigation. Append-only storage without query capability does not satisfy “assessing effectiveness.”
Solution: Centralized log aggregation with search (Loki, OpenSearch, Splunk). Logs are data, not write-only archives.
Pattern 3: Backup Success Metrics Without Restore Testing
Backup jobs run nightly and report “success.” But the backups have never been restored to a live environment. During a ransomware incident, restored data is missing critical tables.
Why this fails: Backup success means “data was written to storage.” It does not mean “data can be restored and used.”
Solution: Quarterly restore drills to isolated environments. Measure RTO/RPO in practice, not theory.
What “Good” Looks Like
A NIS2-compliant platform has these properties:
- Automated Incident Detection: Security events trigger alerts within minutes. Incident response playbooks execute automatically (isolate affected pods, preserve logs, notify on-call).
- Enforced Access Controls: MFA required for all production access. RBAC policies enforce least privilege. Temporary elevation logged and time-limited.
- Queryable Audit Logs: Centralized logging with retention aligned to incident investigation needs (3-5 years). Logs are indexed and searchable within seconds.
- Verified Supply Chain: All deployed artifacts have SBOMs, signatures, and vulnerability scan results. Unsigned or vulnerable images are rejected at admission time.
- Tested Disaster Recovery: Backup restore procedures executed quarterly. RTO/RPO measured against real incidents, not synthetic tests.
These are operational capabilities, not policy documents.
Limits & Trade-offs
This approach does not:
- Guarantee zero incidents: NIS2 requires timely detection and response, not prevention. Incidents will still occur.
- Eliminate manual processes: Some incident response steps require human judgment (false positive triage, root cause analysis). Automation reduces latency, not responsibility.
- Resolve regulatory ambiguity: What qualifies as a “significant” incident? Member states interpret this differently. Compliance requires local legal guidance.
NIS2 creates a baseline. It is not a comprehensive security program.
Key Takeaways
- NIS2 Article 23 requires an early warning within 24 hours of becoming aware of a significant incident, followed by a notification within 72 hours. This is a technical requirement, not a policy timeline.
- Access control (Article 21(2)(i) and (j)) means MFA, RBAC, and audit logging for all privileged operations, including Kubernetes, databases, and SSH.
- Logging must be centralized, queryable, and retained long enough for incident investigation (3-5 years recommended for regulated entities).
- Supply chain security (Article 21(2)(d)) requires SBOMs, signed artifacts, and vulnerability scanning for all deployed software.
- Disaster recovery (Article 21(2)(c)) means tested restore procedures with measured RTO/RPO, not just backup job success metrics.
This article reflects the Eurtifact platform team’s understanding of NIS2 technical requirements as of February 2026. Member state implementations vary. For compliance questions specific to your jurisdiction or entity classification, consult qualified legal and compliance advisors.