Data discovery — how to find personal data across your systems
You can't protect what you can't see. Here's how modern data discovery finds and classifies personal data across databases, clouds, mailboxes and endpoints.
Fortifyze Team
Trufe · 18 June 2026
Every privacy programme runs into the same wall early on: you can't protect, minimise or delete personal data you can't see. Data discovery is how you tear that wall down — by building a live map of where personal data lives and how sensitive it is.
Why discovery is hard
Personal data doesn't stay in one tidy database. It spreads:
- across relational and NoSQL databases,
- into cloud object storage (S3, Azure Blob, GCS),
- through mailboxes and shared inboxes,
- onto cloud drives (OneDrive, SharePoint, Google Drive, Dropbox, Box),
- and onto employee laptops and servers as files and exports.
Manual mapping goes stale the moment it's finished. The goal is continuous, automated discovery.
What good discovery looks like
A modern discovery capability should:
- Connect broadly. Read-only connectors for databases, cloud storage, mailboxes and cloud drives, plus on-device agents for endpoints.
- Classify intelligently. Detect not just field names but the content itself (emails, phone numbers, identifiers, financial and health data) and label each finding with a category, sensitivity and confidence. AI-assisted classification can reduce the false positives that regex-only tools tend to produce.
- Minimise exposure. Scan with masking at the source. The best discovery surfaces what category of data exists and where without copying the raw values into yet another system.
- Feed governance. Findings should flow into your records of processing, retention decisions and rights-fulfilment — not sit in a one-off report.
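To make the "classify" and "minimise exposure" points concrete, here is a minimal sketch in Python. It is not how any particular product works: the patterns, the `Finding` fields, the `min_ratio` threshold and the sensitivity labels are all assumptions chosen for illustration. It inspects column content rather than the field name, and it keeps only a masked sample in the finding.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only (assumption): real detectors need validation,
# context and many more categories than these two.
PATTERNS = {
    "email": (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "medium"),
    "phone": (re.compile(r"\+?\d[\d\s-]{8,13}\d"), "medium"),
}


@dataclass(frozen=True)
class Finding:
    source: str        # e.g. a table or bucket name
    field: str         # column or key that was sampled
    category: str      # what kind of personal data was detected
    sensitivity: str   # hypothetical label attached to the category
    confidence: float  # share of non-empty values that matched
    sample: str        # masked example, never the raw value


def mask(value):
    """Keep the first and last character and mask everything in between."""
    if len(value) <= 2:
        return "*" * len(value)
    return value[0] + "*" * (len(value) - 2) + value[-1]


def classify_column(source, field, values, min_ratio=0.5):
    """Classify a column by its content, ignoring what the field is called."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    findings = []
    if not non_empty:
        return findings
    for category, (pattern, sensitivity) in PATTERNS.items():
        hits = [v for v in non_empty if pattern.fullmatch(v)]
        ratio = len(hits) / len(non_empty)
        if ratio >= min_ratio:
            findings.append(
                Finding(source, field, category, sensitivity,
                        round(ratio, 2), mask(hits[0]))
            )
    return findings
```

Calling `classify_column("crm.customers", "notes_2", ["a@example.com", "b@example.org", "n/a"])` returns a single `email` finding even though the field name says nothing about email, and the stored sample is masked. A regex-only pass like this is exactly the kind of baseline that produces false positives, which is why the confidence score matters.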
Discovery and the DPDPA
Under India's Digital Personal Data Protection Act, 2023 (DPDPA), discovery underpins almost every obligation:
- Purpose limitation & minimisation — you can only justify data you know you hold.
- Data principal rights — fulfilling access, correction and erasure requests requires knowing where a person's data is.
- Security — you can protect the most sensitive data first only if you know where it is.
- Breach response — assessing impact means knowing what data a breached system held.
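The rights and breach-response points above both come down to querying an inventory of findings in two directions. The sketch below assumes a hypothetical in-memory `FindingsIndex`; the class, its method names and the example source names are illustrative and not taken from any product.

```python
from collections import defaultdict


class FindingsIndex:
    """Hypothetical inventory of which source holds which data category."""

    def __init__(self):
        self._by_source = defaultdict(set)
        self._by_category = defaultdict(set)

    def add(self, source, category):
        # Record the finding in both directions so either lookup is cheap.
        self._by_source[source].add(category)
        self._by_category[category].add(source)

    def sources_holding(self, category):
        """Rights requests: where do we need to look for this kind of data?"""
        return sorted(self._by_category.get(category, set()))

    def categories_in(self, source):
        """Breach response: what kinds of data did the affected system hold?"""
        return sorted(self._by_source.get(source, set()))


index = FindingsIndex()
index.add("crm.customers", "email")
index.add("crm.customers", "phone")
index.add("hr.payroll", "financial")
index.add("support-mailbox", "email")
```

With this data, `index.sources_holding("email")` lists the CRM table and the support mailbox, and `index.categories_in("crm.customers")` lists the categories exposed if that table were breached. A real inventory would also need per-person identifiers to locate one individual's records, which this sketch leaves out.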
Start with the riskiest surfaces
You don't have to scan everything on day one. Prioritise:
- systems with known sensitive data (HR, finance, customer records),
- shared drives and mailboxes where data accumulates informally,
- endpoints that hold local exports.
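One simple way to turn the priorities above into a scan order is a weighted score per source. This is a sketch under stated assumptions: the `Source` fields mirror the three bullets, and the weights are arbitrary placeholders that you would tune to your own risk assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    known_sensitive: bool        # HR, finance, customer records
    informal_accumulation: bool  # shared drives and mailboxes
    holds_local_exports: bool    # endpoints with exported files


# Placeholder weights (assumption): adjust to your own risk model.
WEIGHTS = {
    "known_sensitive": 3,
    "informal_accumulation": 2,
    "holds_local_exports": 2,
}


def risk_score(source):
    """Sum the weights of every risk factor that applies to the source."""
    return sum(w for attr, w in WEIGHTS.items() if getattr(source, attr))


def scan_order(sources):
    """Highest score first; ties are broken by name for a stable order."""
    return sorted(sources, key=lambda s: (-risk_score(s), s.name))


sources = [
    Source("marketing-site", False, False, False),
    Source("shared-drive", False, True, False),
    Source("hr-db", True, False, False),
    Source("laptop-fleet", False, False, True),
]
ordered = [s.name for s in scan_order(sources)]
```

Here `ordered` puts `hr-db` first and `marketing-site` last, with the shared drive and the laptop fleet in between.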
How Fortifyze does discovery
Fortifyze discovers personal data across databases, cloud storage, mailboxes, cloud drives and endpoints, classifies it with AI, and enforces per-source exposure levels so raw data needn't leave the source. Findings flow straight into the rest of your DPDPA programme.