What is de-identification?
De-identification is the process of removing personal details from health-related data so that the information can no longer be linked back to any specific person. Think of it as stripping away the “who” while keeping the “what,” so that data can still be useful without ever putting your privacy (or your client’s privacy) at risk.
SimplePractice may de-identify data to help build and improve tools for our customer base. For example, de-identified transcripts will be used to improve Note Taker and related AI features: making draft notes clearer and more accurate, reducing irrelevant or repetitive language, and better reflecting how clinicians document care. When we do this, we follow a strict process under HIPAA to make sure no one can trace the data back to you or your clients.
The HIPAA Safe Harbor method
SimplePractice de-identifies data using a specific method in HIPAA’s Privacy Rule called "Safe Harbor." Under Safe Harbor, all 18 of the following types of identifiers must be removed before data is considered de-identified. The requirement covers identifiers of the individual and also of their relatives, employers, and household members:
Names: First name, last name, or initials
Geographic information smaller than a state: Street address, city, county, and ZIP code (the first three digits of a ZIP code may be kept only when the area they cover has more than 20,000 residents)
Dates related to a person (except year): Birth date, appointment dates, discharge dates, and date of death, as well as all ages over 89, which may only be reported as a single "90 or older" group
Phone numbers
Fax numbers
Email addresses
Social Security numbers
Medical record numbers: Any number a practice or facility uses to identify a patient's file
Health plan beneficiary numbers: Insurance member IDs or subscriber numbers
Account numbers: Including billing or financial account numbers
Certificate or license numbers: Such as a driver's license or professional license number
Vehicle identifiers: Including license plate numbers and serial numbers
Device identifiers and serial numbers: Such as serial numbers on medical devices
Web URLs: Website addresses linked to a person
IP addresses: The digital address assigned to a person's computer or phone
Biometric identifiers: Including fingerprints and voiceprints
Full-face photographs and comparable images
Any other unique identifying number, characteristic, or code
Only after every one of these identifiers has been removed can data qualify as de-identified under Safe Harbor. Even a single remaining identifier means the data is still protected health information and subject to full HIPAA protections.
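As a rough illustration of the ZIP code, date, and age rules in the list above, the sketch below shows how those three fields could be generalized under Safe Harbor. It illustrates the rule itself, not SimplePractice's actual code: the function names are hypothetical, and the entries in LOW_POPULATION_ZIP3 are placeholders that a real system would have to build from current Census data.

```python
from datetime import date

# Hypothetical, illustrative entries: three-digit ZIP prefixes whose combined
# population is 20,000 or fewer. A real list must come from current Census data.
LOW_POPULATION_ZIP3 = {"036", "059"}


def generalize_zip(zip_code: str) -> str:
    """Keep only the first three digits, or 000 for low-population prefixes."""
    prefix = zip_code[:3]
    return "000" if prefix in LOW_POPULATION_ZIP3 else prefix


def generalize_date(value: date) -> str:
    """Keep the year and drop the month and day."""
    return str(value.year)


def generalize_age(age: int) -> str:
    """Group all ages over 89 into a single '90+' category."""
    return "90+" if age > 89 else str(age)
```

For instance, generalize_age(93) returns "90+", while generalize_age(42) returns "42" unchanged.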
Here's an example of what de-identification looks like in practice:
Original transcript: Jane Doe came in on March 12, 2026 reporting persistent anxiety that began after she moved to her new apartment at 123 Fake Street, Anytown, ZZ 00000. She mentioned that her psychiatrist, Dr. Smith, prescribed Prozac last year but wants to revisit the dosage. I asked her to follow up by phone at (555) 555-0100 or by email at not.a.real.email@example.test so we can coordinate care.
De-identified transcript: [REDACTED_PERSON_1] came in on [REDACTED_DATE_TIME_1] reporting persistent anxiety that began after she moved to her new apartment at [REDACTED_LOCATION_1]. She mentioned that her psychiatrist, Dr. [REDACTED_PERSON_2], prescribed Prozac [REDACTED_DATE_TIME_2] but wants to revisit the dosage. I asked her to follow up by phone at [REDACTED_PHONE_NUMBER_1] or by email at [REDACTED_EMAIL_1] so we can coordinate care.
With names, locations, dates, and contact details removed and replaced by generic placeholders, the redacted version retains the clinical meaning of the note while shedding anything that could tie it to a specific individual.
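The placeholder style in the example above can be sketched in a few lines of code. This is a minimal illustration that assumes simple pattern-matching only; the patterns, the PATTERNS table, and the redact function are hypothetical and are not SimplePractice's actual tooling. Names and street addresses are deliberately left out, because they cannot be found reliably with patterns alone.

```python
import re

# Illustrative patterns only. Real tools cover far more formats and also use
# linguistic cues to find names and addresses, which these patterns cannot.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE_NUMBER": re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}|\b\d{3}-\d{3}-\d{4}\b"),
    "DATE_TIME": re.compile(
        r"\b(?:January|February|March|April|May|June|July|August|"
        r"September|October|November|December)\s+\d{1,2},\s+\d{4}\b"
    ),
}


def redact(text: str) -> str:
    """Replace each match with a numbered placeholder such as [REDACTED_EMAIL_1]."""
    for label, pattern in PATTERNS.items():
        seen = {}  # maps each distinct matched value to its placeholder number

        def substitute(match, label=label, seen=seen):
            number = seen.setdefault(match.group(0), len(seen) + 1)
            return f"[REDACTED_{label}_{number}]"

        text = pattern.sub(substitute, text)
    return text
```

The number in each placeholder reflects only the order in which a value first appears in the text, so the placeholder carries no information about the value it replaced.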
De-identification mechanics with transcripts
The de-identification process begins after 7 days, or once you lock and sign the note, whichever comes first. At that point, the encrypted transcript is sent to our secure Snowflake environment, where it is run through three different ML de-identification tools that recognize and remove identifiers using a combination of pattern-matching, linguistic cues, and business-specific rules.
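One way to picture several tools working together is to treat each one as a detector that flags spans of text, and then redact anything flagged by at least one of them. The sketch below assumes that design purely for illustration: the two detectors are simplified stand-ins, not the ML tools SimplePractice uses, and all names in it are hypothetical.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets
Detector = Callable[[str], List[Span]]


def phone_detector(text: str) -> List[Span]:
    """Pattern-matching stand-in: finds numbers shaped like (555) 555-0100."""
    return [m.span() for m in re.finditer(r"\(\d{3}\)\s*\d{3}-\d{4}", text)]


def title_detector(text: str) -> List[Span]:
    """Crude linguistic-cue stand-in: a capitalized word after 'Dr.'."""
    return [m.span(1) for m in re.finditer(r"\bDr\.\s+([A-Z][a-z]+)", text)]


def merge_spans(spans: List[Span]) -> List[Span]:
    """Union overlapping or touching spans so each character is redacted once."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def redact_with_all(text: str, detectors: List[Detector]) -> str:
    """Redact anything flagged by at least one detector."""
    spans = merge_spans([s for detect in detectors for s in detect(text)])
    # Replace from the end so earlier offsets stay valid.
    for start, end in reversed(spans):
        text = text[:start] + "[REDACTED]" + text[end:]
    return text
```

Taking the union of all detections favors over-redaction: an identifier missed by one detector is still removed if any other detector catches it.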
The de-identified transcripts live in a separate container in our production Snowflake environment, where they are intermingled with other de-identified transcripts and used to improve AI features.
De-coupling
In addition to taking steps to de-identify transcripts under the Safe Harbor standard, SimplePractice also de-couples the transcripts to remove any relationship between the clinician and client in the data.
De-coupling from the clinician and client happens in a few ways. The identifiers that are removed (such as names, locations, and dates) are replaced with generic placeholders that are not derived from the original values, so they cannot be reversed or traced back to a specific person. The de-identified output is also stored separately from any account, client, or appointment identifiers, meaning there is no link connecting the de-identified content back to its source. Access to this environment is tightly restricted and monitored via SimplePractice internal policies and processes.
While de-identification focuses on removing the listed identifiers, de-coupling is centered on removing any links between the parties.
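To make the idea of de-coupling concrete, here is a minimal sketch that assumes a de-identified transcript is copied into a new record carrying none of the source identifiers. The record types and field names are hypothetical and chosen only for illustration; they do not describe SimplePractice's actual data model.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceTranscript:
    # Hypothetical field names for illustration only.
    account_id: str
    client_id: str
    appointment_id: str
    redacted_text: str


@dataclass(frozen=True)
class DecoupledTranscript:
    record_id: str
    redacted_text: str


def decouple(source: SourceTranscript) -> DecoupledTranscript:
    """Copy only the redacted text and attach a fresh random identifier."""
    # uuid4() is random, so the new ID is not computed from any source field,
    # and this function keeps no mapping back to the source record.
    return DecoupledTranscript(
        record_id=str(uuid.uuid4()),
        redacted_text=source.redacted_text,
    )
```

Because the new record has no account, client, or appointment field and its identifier is random, nothing in it points back to where the transcript came from.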
Our commitment to you
De-identified data allows us to do things like improve our platform for our customers, understand how practitioners use our tools, and identify trends, such as those shared in our Annual Report for practitioners, all without compromising your or your clients’ identifiable data.
Specific to transcripts, SimplePractice understands that free-text clinical narratives are especially sensitive, because contextual information such as life events or specific disclosures might identify a specific individual. That is why we go above and beyond what is required by HIPAA and take additional steps to protect your and your clients’ privacy, including: (1) de-coupling, which severs all links between the transcript and any specific client, clinician, practice, or session date; (2) giving minimal access to the de-identified transcripts with strict security controls and audit logs; (3) ensuring that de-identified transcripts remain within the SimplePractice technical ecosystem (i.e., not shared outside of our control); and (4) screening for and removing things like unique stories.
Your trust is the foundation of everything we build. SimplePractice is committed to HIPAA compliance and to protecting the privacy of every practitioner and client on our platform. If you have questions about how we handle your data, we encourage you to reach out to us at any time.