Reference · Documentation

How DataBridge cleans your list.

Detection logic, Export Mapper, plan limits, and the public API surface — top to bottom.

How it works

Upload your CSV

Drop any CSV that contains an email column. DataBridge auto-detects the email column by its header name. All other columns are carried through untouched.

Analysis runs

Every email is checked across seven detection categories in a single pass: format validation, MX record lookup (so typo'd and parked domains don't slip through), duplicate detection, role-based patterns, no-reply patterns, disposable domain lists, and risky TLD lists.

Download the clean file

The clean file is a filtered version of your original CSV — same columns, same order, only the rows that passed every check. A full breakdown shows exactly how many rows were removed and why.

Detection categories

Invalid · High

Emails that fail basic format validation: a missing @, a broken domain, illegal characters, etc.

Example: john@, @gmail.com, user@@domain.com

Result: Always removed from the clean file. These will hard-bounce every time.
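
A format-only check of this kind can be sketched in Python. The rules below (exactly one @, a non-empty local part, a dotted domain, a conservative character set) are illustrative assumptions rather than DataBridge's actual validator, and `is_valid_format` is a hypothetical name.

```python
import re

# Conservative character sets; an assumption for illustration only.
_LOCAL = re.compile(r"^[A-Za-z0-9.!#$%&'*+/=?^_`{|}~-]+$")
_LABEL = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?$")


def is_valid_format(email: str) -> bool:
    """Return True if the address passes a basic structural check."""
    email = email.strip()
    if email.count("@") != 1:            # missing @ or a doubled @@
        return False
    local, domain = email.split("@")
    if not local or not _LOCAL.match(local):
        return False
    labels = domain.split(".")
    if len(labels) < 2:                  # the domain needs at least one dot
        return False
    # Every dot-separated label must be non-empty and well-formed.
    return all(_LABEL.match(label) for label in labels)
```

All three examples above (john@, @gmail.com, user@@domain.com) fail this check, while an ordinary address such as alice@company.com passes.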

Unreachable domain · High

Syntactically valid emails whose domain has no MX record, meaning the domain advertises no mail server. Catches typo'd domains, parked domains, expired brands, and dev-only domains that slip past a format-only check.

Example: user@gmial.co (typo), hello@parked-domain-xyz.com, test@example.test

Result: Removed. We do a live DNS lookup on every unique domain before processing, so the clean file never contains addresses that are guaranteed to bounce.
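
The "one lookup per unique domain" idea can be sketched as follows. This is a minimal illustration, not DataBridge's code: `find_unreachable` is a hypothetical name, and the DNS query is injected as a `has_mx` callable so the example runs without network access. A real callable would issue an MX query through a DNS library.

```python
from typing import Callable, Iterable


def find_unreachable(emails: Iterable[str], has_mx: Callable[[str], bool]) -> set[str]:
    """Return the addresses whose domain has no MX record.

    Each unique domain is looked up only once and the answer is cached.
    """
    cache: dict[str, bool] = {}
    unreachable: set[str] = set()
    for email in emails:
        domain = email.rsplit("@", 1)[-1].lower()
        if domain not in cache:
            cache[domain] = has_mx(domain)
        if not cache[domain]:
            unreachable.add(email)
    return unreachable


# Stubbed lookup for demonstration; these domains are sample data.
known_mx = {"gmail.com", "company.com"}
bad = find_unreachable(
    ["user@gmial.co", "alice@company.com", "bob@gmail.com", "x@gmial.co"],
    lambda domain: domain in known_mx,
)
```

Caching by domain matters for large lists, where thousands of rows often share a handful of domains.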

Duplicates · High

The same email address appearing more than once. Only the first occurrence is kept.

Example: alice@company.com appearing 3× → 2 removed

Result: Removed from the clean file. Sending to duplicates inflates metrics and irritates contacts.
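
Keeping only the first occurrence can be sketched like this. The function name `drop_duplicates` and the `email` key are hypothetical, and comparing addresses after trimming and lowercasing is an assumption; the page only states that case-insensitive matching applies to list comparison.

```python
def drop_duplicates(rows: list[dict], email_key: str = "email") -> tuple[list[dict], int]:
    """Keep the first row for each address; return (kept rows, number removed)."""
    seen: set[str] = set()
    kept: list[dict] = []
    for row in rows:
        # Assumption: duplicates are compared ignoring case and surrounding spaces.
        key = row[email_key].strip().lower()
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept, len(rows) - len(kept)
```

With alice@company.com appearing three times, the first row is kept and the removed count is 2, matching the example above.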

Role-based · Medium

Generic organizational addresses not tied to a real person: info@, support@, billing@, admin@, etc.

Example: info@company.com, support@startup.io, hello@brand.com

Result: Removed. Role addresses tend to have low engagement and high unsubscribe rates, and they can trigger spam filters.

No-reply · High

Addresses configured to reject incoming mail: noreply@, do-not-reply@, bounce@, etc.

Example: noreply@service.com, donotreply@platform.io

Result: Removed. Sending to these guarantees a bounce or immediate discard.
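
Both the role-based and no-reply checks look at the part before the @. A minimal sketch, assuming small sample lists taken from the examples above (the real pattern lists are not published here) and a hypothetical `classify_local_part` helper:

```python
import re

# Sample patterns from the examples on this page; the real lists are assumed to be longer.
ROLE_LOCALS = {"info", "support", "billing", "admin", "hello"}
NO_REPLY_LOCALS = {"noreply", "donotreply", "bounce"}


def classify_local_part(email: str) -> str:
    """Return "no_reply", "role", or "ok" based on the local part alone."""
    local = email.split("@", 1)[0].lower()
    # Drop separators so "do-not-reply" and "do_not_reply" match "donotreply".
    normalized = re.sub(r"[-_.]", "", local)
    if normalized in NO_REPLY_LOCALS:
        return "no_reply"
    if normalized in ROLE_LOCALS:
        return "role"
    return "ok"
```

The `no_reply` and `role` labels mirror the verdict values listed under Audit trail; `ok` is a placeholder for "no match".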

Disposable · High

Temporary addresses from known throwaway providers: Mailinator, Guerrilla Mail, temp-mail, etc.

Example: xyz123@mailinator.com, user@guerrillamail.com

Result: Removed. These are abandoned almost immediately after sign-up.

Risky TLD · Medium

Emails on top-level domains with very high abuse rates: .xyz, .tk, .ml, .top, .icu, etc.

Example: user@site.xyz, contact@promo.tk

Result: Removed. These TLDs are disproportionately used for spam with low legitimate use.
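
The disposable and risky-TLD checks are both lookups against the domain. The sketch below uses only the sample domains and TLDs named above; the production lists are assumed to be much larger, and `classify_domain` is a hypothetical name.

```python
# Sample entries from this page only.
DISPOSABLE_DOMAINS = {"mailinator.com", "guerrillamail.com"}
RISKY_TLDS = {"xyz", "tk", "ml", "top", "icu"}


def classify_domain(email: str) -> str:
    """Return "disposable", "risky_tld", or "ok" based on the domain alone."""
    domain = email.rsplit("@", 1)[-1].lower()
    if domain in DISPOSABLE_DOMAINS:
        return "disposable"
    tld = domain.rsplit(".", 1)[-1]      # the text after the last dot
    if tld in RISKY_TLDS:
        return "risky_tld"
    return "ok"
```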

Auto-repair

Paid

Preview is free. Applying the fix requires any paid plan.

When you upload a CSV with structural problems — an unclosed quote, a UTF-8 BOM from Excel, headers with stray whitespace, ragged rows — DataBridge marks the dataset as Invalid CSV instead of silently processing garbage. The dataset detail page surfaces an Auto-repair section that previews exactly what we’d clean and lets you apply the fix without re-uploading.

What we fix automatically

Unclosed quotes

An orphan " merges rows into one giant cell. We find the stray quote, remove it, and re-validate before declaring the file safe.

UTF-8 BOM

Excel and Google Sheets can prepend an invisible byte-order mark (the three bytes EF BB BF) that breaks email-column detection. We strip it.

Ragged rows

Rows with fewer or extra columns vs. the header. Short rows get padded with empty strings; long rows get truncated to the header width.

Whitespace on headers

Trims leading/trailing spaces so " email " matches the email-column detector.

Empty headers

Renamed to column_N so the column is preserved instead of silently dropped.

Duplicate headers

A second "email" column is renamed to email_2 so neither is lost.
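
The header and row fixes listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Auto-repair implementation: `repair_headers` and `repair_row` are hypothetical names, and N in column_N is assumed to be the 1-based column position.

```python
def repair_headers(headers: list[str]) -> list[str]:
    """Strip a BOM, trim whitespace, name empty headers, and number duplicates."""
    fixed: list[str] = []
    seen: dict[str, int] = {}
    for position, name in enumerate(headers, start=1):
        name = name.lstrip("\ufeff").strip()     # a decoded UTF-8 BOM is U+FEFF
        if not name:
            name = f"column_{position}"          # assumption: N is the 1-based position
        count = seen.get(name, 0) + 1
        seen[name] = count
        fixed.append(name if count == 1 else f"{name}_{count}")
    return fixed


def repair_row(row: list[str], width: int) -> list[str]:
    """Pad short rows with empty strings and truncate long rows to the header width."""
    return (row + [""] * (width - len(row)))[:width]
```

For example, the headers `["\ufeff email ", "", "email"]` become `["email", "column_2", "email_2"]`. The unclosed-quote fix is not covered here because the heuristic behind it is not described on this page.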

How it works

Upload as usual

Drop your CSV on the home page upload area or the dashboard. If it has structural problems, the worker rejects it and the dataset gets the Invalid CSV status.

Open the failed dataset

From /dashboard/datasets, click the row tagged Invalid CSV. The detail page shows the failure reason and the Auto-repair section right below it.

Review what we'll fix

The Auto-repair section lists every fix we'd apply, points to the exact line and character of each issue, and shows a sample of the first 8 rows of the cleaned file.

Apply

Click Apply fixes & process. We re-upload the cleaned content as a new dataset (your original stays in your history) and queue it for processing.

Watch the live progress

The global upload status card tracks the worker's progress in real time. When it finishes you're navigated to the new dataset's detail page with the report.

What we won’t guess

If the file has no email column at all, or the orphan-quote heuristic can’t recover the file without producing inconsistent rows, Auto-repair will say so explicitly and ask you to fix the file by hand. We never “fix” in a way that risks corrupting your data.

Export Mapper

Paid

Available on any paid plan.

The Export Mapper lets you rename column headers before downloading your clean file. Instead of raw column names from the original CSV, you define exactly what each header should be called in the output — so you can import directly into your email platform without touching the file manually.

Select a dataset

Choose any processed dataset with status ready from the list at /dashboard/mapper.

Review detected columns

The mapper loads all column names from your CSV. These are the source columns.

Define the output mapping

Toggle each source column on or off, and set the output name — the header that will appear in the exported CSV.

Apply a preset (optional)

Select a preset to auto-fill output names for a specific platform. You can still modify any field after applying.

Preview and download

A live preview shows your first rows with new headers. Click Download to generate and save the remapped CSV.

Example mapping

Source column	Output name	Included
`email_address`	Email Address	included
`fname`	First Name	included
`surname`	Last Name	included
`signup_date`	(none)	excluded
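
The example mapping amounts to selecting and renaming columns. A minimal sketch with Python's csv module, where `remap_csv` and `MAPPING` are hypothetical names and any column left out of the mapping is treated as excluded:

```python
import csv
import io

# Source column -> output header. signup_date is omitted, so it is excluded.
MAPPING = {
    "email_address": "Email Address",
    "fname": "First Name",
    "surname": "Last Name",
}


def remap_csv(source_text: str, mapping: dict[str, str]) -> str:
    """Return CSV text containing only the mapped columns under their new headers."""
    reader = csv.DictReader(io.StringIO(source_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(list(mapping.values()))
    for row in reader:
        writer.writerow([row[source] for source in mapping])
    return out.getvalue()
```

Column order in the output follows the order of the mapping, so reordering columns is a matter of reordering the dictionary.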

Export presets

Mailchimp: Audience import
Beehiiv: Subscriber import
HubSpot: CRM contact import
Klaviyo: List import
ConvertKit: Subscriber import
ActiveCampaign: Contact import
Custom: Define all names manually

Applying a preset does not lock the mapping — you can modify any field name afterwards.

List comparison

Compare Lists lets you upload two CSV files and instantly find which email addresses appear in both, which are exclusive to each list, and download every segment separately. Processing happens entirely in your browser — your data is never sent to any server.

Free plan: 3 comparisons / month. Pro and Agency: unlimited + column mapper on export.

Upload two CSV files

Drop or browse for File 1 and File 2. DataBridge auto-detects the email column in each file using the header name or data pattern.

Click Compare

The comparison runs instantly in your browser. No upload, no processing queue — results appear in under a second for most files.

Review the three segments

In both files (matched emails), Only in File 1 (exclusive to the first list), Only in File 2 (exclusive to the second list).

Download what you need

Each segment can be downloaded independently. For matched emails, choose whether to export File 1's columns or File 2's columns. Pro and Agency users can customize and rename columns before downloading.

Result segments

In both files

Emails present in File 1 AND File 2. Can be downloaded with File 1 columns or File 2 columns.

Only in File 1

Emails that exist in File 1 but have no match in File 2.

Only in File 2

Emails that exist in File 2 but have no match in File 1.

Matching is case-insensitive and trims whitespace. Alice@Example.com and alice@example.com are treated as the same address.
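
The three segments are plain set operations over normalized addresses. The product runs this in the browser; the Python below is only a sketch of the logic, with hypothetical names `normalize` and `compare_lists`.

```python
def normalize(email: str) -> str:
    """Trim whitespace and lowercase, matching the rule stated above."""
    return email.strip().lower()


def compare_lists(file1: list[str], file2: list[str]) -> dict[str, set[str]]:
    """Split two email lists into the three result segments."""
    a = {normalize(e) for e in file1}
    b = {normalize(e) for e in file2}
    return {
        "both": a & b,          # In both files
        "only_file1": a - b,    # Only in File 1
        "only_file2": b - a,    # Only in File 2
    }
```

Set membership tests are constant time on average, which is consistent with results appearing almost immediately for typical list sizes.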

Public API

Paid

A token-authenticated, versioned API for cleaning email lists from your own infrastructure, without a browser session. Built on the same engine as the dashboard, including live MX checks, typo suggestions, and ESP-aware export mapping. Try every endpoint right from the interactive reference.

POST  /api/v1/clean          — analyze a CSV, get verdicts per row
POST  /api/v1/clean/email    — JSON in/out, 1–100 emails per request
POST  /api/v1/clean/filter   — drop categories you pick, get a clean CSV
GET   /api/v1/datasets/{id}  — summary + status of a single run
GET   /api/v1/datasets       — paginated list of your runs

Authenticate with an Authorization: Bearer db_live_… token generated from the dashboard. Webhooks (job.completed, job.failed, dataset.ready) are already live for paid plans; configure them in Settings → API & webhooks.
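
A call to the JSON endpoint might be prepared as in the sketch below. The host comes from the curl example later on this page; the request body shape (`{"emails": [...]}`) is an assumption, since the schema is not shown here, and `batches` and `build_request` are hypothetical helpers. Check the interactive reference for the actual payload.

```python
import json
import urllib.request

API_URL = "https://databridge.so/api/v1/clean/email"
MAX_BATCH = 100  # the endpoint accepts 1 to 100 emails per request


def batches(emails: list[str], size: int = MAX_BATCH) -> list[list[str]]:
    """Split a list into consecutive groups of at most `size` emails."""
    return [emails[i:i + size] for i in range(0, len(emails), size)]


def build_request(emails: list[str], token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request."""
    # Assumption: the body is a JSON object with an "emails" array.
    body = json.dumps({"emails": emails}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending is then one `urllib.request.urlopen(build_request(chunk, token))` call per batch, with the token read from an environment variable rather than hard-coded.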

Webhooks

Paid

Available on any paid plan.

Wire job results into your pipeline without polling. Configure the endpoint URL and signing secret in Settings → API & webhooks.

Event	When it fires
`job.completed`	A cleaning run finishes successfully (status flips to done).
`job.failed`	A run terminated with an error — payload includes the error code.
`dataset.ready`	The cleaned CSV + per-bucket stats are persisted and downloadable.

Every payload is signed with HMAC-SHA256 using the secret you generate in Settings. Verify the X-DataBridge-Signature header against the raw request body before trusting the event. Retries follow exponential backoff (1m / 5m / 30m / 2h) with a 24-hour cap.
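
Verifying the signature can be sketched with Python's standard library. The encoding of the X-DataBridge-Signature header is not specified on this page, so the hex digest used below is an assumption, and `verify_signature` is a hypothetical name.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, header_value: str, secret: str) -> bool:
    """Return True if the header matches the HMAC-SHA256 of the raw request body."""
    # Assumption: the header carries the hex-encoded digest.
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    received = header_value.strip().lower()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))
```

Compute the HMAC over the bytes exactly as received. Parsing the JSON and re-serializing it first can change whitespace or key order and break the comparison.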

Audit trail

Every cleaned CSV ships with two extra columns appended to the original input: verdict and reason. The same trace travels with every row regardless of the surface (dashboard download, API response, or webhook payload), so compliance, growth, and engineering all read the same columns to answer the same question: why was this address dropped?

email,name,verdict,reason,suggested
founder@stripe.com,Patrick,deliverable,All checks passed,
info@acme.io,Mailbox,role,Role-based mailbox,
test@mailinator.com,Demo,disposable,Disposable provider,
sara@gmial.com,Sara,typo,Did you mean sara@gmail.com?,sara@gmail.com

The suggested column only appears for rows where verdict === "typo" — it carries the corrected address so you can write it back to your CRM directly. See Auto-repair for the typo guard rules.

Possible verdict values: deliverable, syntax_invalid, no_mx, duplicate, role, no_reply, disposable, risky_tld, typo.
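
Consuming the audit columns downstream can look like the sketch below, which tallies verdicts and collects typo corrections from the sample rows above. `summarize` is a hypothetical helper, not part of any DataBridge SDK.

```python
import csv
import io
from collections import Counter

SAMPLE = """email,name,verdict,reason,suggested
founder@stripe.com,Patrick,deliverable,All checks passed,
info@acme.io,Mailbox,role,Role-based mailbox,
test@mailinator.com,Demo,disposable,Disposable provider,
sara@gmial.com,Sara,typo,Did you mean sara@gmail.com?,sara@gmail.com
"""


def summarize(csv_text: str) -> tuple[Counter, dict[str, str]]:
    """Count rows per verdict and map typo'd addresses to their suggestion."""
    counts: Counter = Counter()
    corrections: dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row["verdict"]] += 1
        suggested = row.get("suggested") or ""   # the column may be absent or empty
        if row["verdict"] == "typo" and suggested:
            corrections[row["email"]] = suggested
    return counts, corrections


counts, corrections = summarize(SAMPLE)
```

Here `corrections` maps sara@gmial.com to sara@gmail.com, which is the pair you would write back to a CRM.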

Files larger than 5 MB

The sync POST /api/v1/clean endpoint caps the request body at 5 MB so it can fit inside a serverless function’s memory + timeout window. Three options if your file is bigger:

Split your CSV and POST in batches

The fastest path that doesn’t require any new endpoint. ~10 lines of shell + a loop in your language of choice.

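# Split into chunks of 50,000 rows (preserve header on each).
# Assumes no quoted field spans multiple lines.
HEADER=$(head -n 1 list.csv)
tail -n +2 list.csv | split -l 50000 - chunk_
for f in chunk_*; do
  (printf '%s\n' "$HEADER"; cat "$f") > "with_header_$f.csv"
done

# Then POST each chunk and merge results.
# Assumes each CSV response starts with a header row; keep only the first one.
: > cleaned.csv
for f in with_header_*.csv; do
  curl -fsS -X POST "https://databridge.so/api/v1/clean?format=csv" \
    -H "Authorization: Bearer $DB_API_KEY" \
    -H "Content-Type: text/csv" \
    --data-binary @"$f" > response.csv || exit 1
  if [ -s cleaned.csv ]; then
    tail -n +2 response.csv >> cleaned.csv
  else
    cat response.csv >> cleaned.csv
  fi
done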

Pro tip: parallelise the POST loop with xargs -P 4 or your language’s concurrency primitive. Just respect the API rate limits; see Public API for the per-key budget.
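
The shell loop splits by row count, which does not guarantee that a chunk stays under 5 MB and can cut a quoted field that contains a line break. A Python sketch that splits by byte size instead is shown below; `chunk_csv` is a hypothetical helper, and a single row larger than the budget would still produce an oversized chunk.

```python
import csv
import io

MAX_BYTES = 5 * 1024 * 1024  # the sync endpoint's 5 MB body cap


def chunk_csv(csv_text: str, max_bytes: int = MAX_BYTES) -> list[str]:
    """Split CSV text into chunks that repeat the header and fit within max_bytes."""
    # Parsing with the csv module keeps quoted multi-line fields in one row.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    header = next(reader)

    def encode(row: list[str]) -> str:
        buf = io.StringIO()
        csv.writer(buf, lineterminator="\n").writerow(row)
        return buf.getvalue()

    header_line = encode(header)
    header_size = len(header_line.encode("utf-8"))
    chunks: list[str] = []
    rows: list[str] = []
    size = header_size
    for row in reader:
        line = encode(row)
        line_size = len(line.encode("utf-8"))
        if rows and size + line_size > max_bytes:
            chunks.append(header_line + "".join(rows))   # flush the full chunk
            rows, size = [], header_size
        rows.append(line)
        size += line_size
    if rows:
        chunks.append(header_line + "".join(rows))
    return chunks
```

Each returned string can be posted as one request body, and the responses merged while keeping only the first header row.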

Upload from the dashboard

Best for one-off jobs and ad-hoc cleanups. The dashboard worker runs outside serverless, so there is no 5 MB cap and no execution-time ceiling. Drop a file into /dashboard, up to your plan’s file-size limit (see Plans & limits), and you get the same cleaned CSV + per-bucket stats as the API would return.

Custom async API for your volume

If your pipeline needs to push hundreds of MB or millions of rows daily, we ship a custom POST /api/v1/jobs endpoint scoped to your account: signed upload URL → async processing → webhook on completion → download via export route.

Email support@databridge.so with your expected volume + frequency and we’ll have a beta endpoint against your account inside 2–3 business days. Standard pricing for Pro/Agency; custom rates above 1M rows/month.

Plans & limits

Plan	Price	Rows / month	File size	Downloads	Export Mapper	Compare
Free	$0	1,000 / mo	10 MB	1 / day	Not included	3 / month
Pro	$29 / mo	100,000 / mo	200 MB	Unlimited	Included	Unlimited
Agency	$79 / mo	1,000,000 / mo	1 GB	Unlimited	Included	Unlimited

Need more rows? Purchase extra packs (25K for $9 · 100K for $29) from your billing page. Purchased rows stack on top of your monthly limit and never expire.

FAQ

What exactly is in the clean file?

Every row from your original CSV that passed all checks — not invalid, not on an unreachable domain, not a duplicate, not role-based, not no-reply, not disposable, and not on a risky TLD. The original column structure is preserved exactly.

Do you actually verify if a mailbox exists?

We verify the domain, not the individual mailbox. For every unique domain in your file we do a live DNS MX lookup before processing. If the domain has no MX record, it advertises no mail server, so every address at that domain goes to the Unreachable domain bucket. This catches typo'd and parked domains that regex-only checkers miss. Full SMTP mailbox probing is on the roadmap for a higher tier.

Are the original rows changed in any way?

No. DataBridge only filters rows — it never modifies email addresses or any other column. What goes in comes out, just with the bad rows removed.

What if my CSV has multiple columns?

DataBridge auto-detects the email column by looking for a header that contains the word "email". All other columns are preserved as-is in both the analysis and the clean file.

How are row limits counted?

Based on total rows processed per billing period, not clean rows. A file with 1,000 rows counts as 1,000 regardless of how many issues are found.

What is the Export Mapper?

A Pro/Agency feature that lets you rename column headers before downloading. Instead of exporting raw column names, you define exactly what each header should be called — so you can import directly into Mailchimp, HubSpot, Klaviyo, etc. without touching the file manually.

What happens if my CSV is broken?

We don't process broken files silently. If your upload has an unclosed quote, an Excel BOM, ragged rows, or other structural issues, the worker marks the dataset as Invalid CSV instead of producing wrong stats. The dataset detail page then shows an Auto-repair section that previews exactly what we'd clean, character by character, and lets you apply the fix without re-uploading. Pro/Agency plans can apply; free users see the preview.

Can I re-download a processed file later?

Yes. Every processed dataset is saved to your dashboard. Free plan allows 1 download per day; Pro and Agency plans have unlimited downloads.

Is my data stored permanently?

Processed files are stored securely and linked to your account. Raw uploaded files are deleted from temporary storage immediately after processing.

Still have questions?

Our support team usually responds within a few hours.

