Reference · Documentation
Detection logic, Export Mapper, plan limits, and the public API surface — top to bottom.
Upload your CSV
Drop any CSV that contains an email column. DataBridge auto-detects the email column by its header name. All other columns are carried through untouched.
Analysis runs
Every email is checked across seven detection categories in a single pass: format validation, MX record lookup (so typo'd and parked domains don't slip through), duplicate detection, role-based patterns, no-reply patterns, disposable domain lists, and risky TLD lists.
Download the clean file
The clean file is a filtered version of your original CSV — same columns, same order, only the rows that passed every check. A full breakdown shows exactly how many rows were removed and why.
Format validation
Emails that fail basic format validation: missing @, broken domain, illegal characters, etc.
Examples: john@, @gmail.com, user@@domain.com
Unreachable domain (MX record lookup)
Syntactically valid emails whose domain has no MX record, meaning nothing on the internet can accept mail for them. Catches typo'd domains, parked domains, expired brands, and dev-only domains that slip past a format-only check.
Examples: user@gmial.co (typo), hello@parked-domain-xyz.com, test@example.test
Duplicates
The same email address appearing more than once. Only the first occurrence is kept.
Example: alice@company.com appearing 3× → 2 removed
Role-based addresses
Generic organizational addresses not tied to a real person: info@, support@, billing@, admin@, etc.
Examples: info@company.com, support@startup.io, hello@brand.com
No-reply addresses
Addresses configured to reject incoming mail: noreply@, do-not-reply@, bounce@, etc.
Examples: noreply@service.com, donotreply@platform.io
Disposable domains
Temporary addresses from known throwaway providers: Mailinator, Guerrilla Mail, temp-mail, etc.
Examples: xyz123@mailinator.com, user@guerrillamail.com
Risky TLDs
Emails on top-level domains with very high abuse rates: .xyz, .tk, .ml, .top, .icu, etc.
Examples: user@site.xyz, contact@promo.tk
Preview is free. Applying the fix requires any paid plan.
When you upload a CSV with structural problems — an unclosed quote, a UTF-8 BOM from Excel, headers with stray whitespace, ragged rows — DataBridge marks the dataset as Invalid CSV instead of silently processing garbage. The dataset detail page surfaces an Auto-repair section that previews exactly what we’d clean and lets you apply the fix without re-uploading.
What we fix automatically
Unclosed quotes
An unmatched double quote (") makes the parser merge the following rows into one giant cell. We find the stray quote, remove it, and re-validate before declaring the file safe.
UTF-8 BOM
Excel and Google Sheets prepend an invisible byte that breaks email-column detection. We strip it.
Ragged rows
Rows with fewer or more columns than the header. Short rows are padded with empty strings; long rows are truncated to the header width.
Whitespace on headers
Trims leading/trailing spaces so " email " matches the email-column detector.
Empty headers
Renamed to column_N so the column is preserved instead of silently dropped.
Duplicate headers
A second "email" column is renamed to email_2 so neither is lost.
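A rough command-line sketch of five of these rules (BOM, header whitespace, empty headers, duplicate headers, ragged rows) for simple files. It assumes no field contains quotes, commas, or newlines, so it does not attempt the unclosed-quote repair, and numbering empty headers by their 1-based position is a guess. repair_csv is a hypothetical helper, not part of DataBridge.

```bash
# repair_csv FILE: print a repaired copy of a simple CSV to stdout.
# Naive sketch: assumes no field contains quotes, commas, or newlines.
repair_csv() {
  local bom
  bom=$(printf '\357\273\277')    # UTF-8 byte order mark (EF BB BF)
  # 1) Strip a BOM from the start of the first line.
  sed "1s/^$bom//" "$1" | awk -F',' -v OFS=',' '
    NR == 1 {
      n = NF                                  # header width
      out = ""
      for (i = 1; i <= NF; i++) {
        h = $i
        gsub(/^[ \t]+|[ \t]+$/, "", h)        # 2) trim header whitespace
        if (h == "") h = "column_" i          # 3) name empty headers
        if (seen[h]++) h = h "_" seen[h]      # 4) email, email -> email, email_2
        out = (i == 1) ? h : out OFS h
      }
      print out
      next
    }
    {
      # 5) Pad short rows with empty fields; truncate long rows to n fields.
      out = ""
      for (i = 1; i <= n; i++) out = (i == 1) ? $i : out OFS (i <= NF ? $i : "")
      print out
    }'
}
```

For example, a header of `" email ,,email"` (with a BOM) comes out as `email,column_2,email_2`.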
How it works
Upload as usual
Drop your CSV on the homepage upload area or the dashboard. If it has structural problems, the worker rejects it and the dataset gets the Invalid CSV status.
Open the failed dataset
From /dashboard/datasets, click the row tagged Invalid CSV. The detail page shows the failure reason and the Auto-repair section right below it.
Review what we'll fix
The Auto-repair section lists every fix we'd apply, points to the exact line and character of each issue, and shows a sample of the first 8 rows of the cleaned file.
Apply
Click Apply fixes & process. We re-upload the cleaned content as a new dataset (your original stays in your history) and queue it for processing.
Watch the live progress
The global upload status card tracks the worker's progress in real time. When it finishes you're navigated to the new dataset's detail page with the report.
What we won’t guess
If the file has no email column at all, or the orphan-quote heuristic can’t recover the file without producing inconsistent rows, Auto-repair will say so explicitly and ask you to fix the file by hand. We never “fix” in a way that risks corrupting your data.
Available on any paid plan.
The Export Mapper lets you rename column headers before downloading your clean file. Instead of raw column names from the original CSV, you define exactly what each header should be called in the output — so you can import directly into your email platform without touching the file manually.
Select a dataset
Choose any processed dataset with status ready from the list at /dashboard/mapper.
Review detected columns
The mapper loads all column names from your CSV. These are the source columns.
Define the output mapping
Toggle each source column on or off, and set the output name — the header that will appear in the exported CSV.
Apply a preset (optional)
Select a preset to auto-fill output names for a specific platform. You can still modify any field after applying.
Preview and download
A live preview shows your first rows with new headers. Click Download to generate and save the remapped CSV.
| Source column | Output name | Included |
|---|---|---|
| email_address | Email Address | Yes |
| fname | First Name | Yes |
| surname | Last Name | Yes |
| signup_date | (not exported) | No |
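The mapping in this table amounts to selecting and renaming columns. A naive sketch, assuming unquoted comma-separated fields and exact header matches; remap_csv and its SOURCE=OUTPUT argument format are hypothetical, and columns are emitted in argument order.

```bash
# remap_csv FILE SOURCE=OUTPUT...
# Keep only the listed source columns (in argument order) and rename their
# headers. Naive comma split: fields must not contain quoted commas.
remap_csv() {
  local file=$1 maps
  shift
  maps=$(printf '%s\n' "$@")
  # Pass the pairs through the environment: ENVIRON values are not
  # escape-processed the way awk -v values are.
  DB_MAPS="$maps" awk -F',' -v OFS=',' '
    BEGIN {
      m = split(ENVIRON["DB_MAPS"], pairs, "\n")
      for (k = 1; k <= m; k++) {
        eq = index(pairs[k], "=")
        src[k] = substr(pairs[k], 1, eq - 1)
        dst[k] = substr(pairs[k], eq + 1)
      }
    }
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i   # header name -> position
      out = dst[1]
      for (k = 2; k <= m; k++) out = out OFS dst[k]
      print out
      next
    }
    {
      out = ""
      for (k = 1; k <= m; k++) {
        v = (src[k] in col) ? $(col[src[k]]) : ""   # missing source -> empty
        out = (k == 1) ? v : out OFS v
      }
      print out
    }' "$file"
}
```

Usage for the table above: `remap_csv list.csv "email_address=Email Address" "fname=First Name" "surname=Last Name"`. Leaving signup_date out of the arguments is what excludes it.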
| Preset | Intended use |
|---|---|
| Mailchimp | Audience import |
| Beehiiv | Subscriber import |
| HubSpot | CRM contact import |
| Klaviyo | List import |
| ConvertKit | Subscriber import |
| ActiveCampaign | Contact import |
| Custom | Define all names manually |
Applying a preset does not lock the mapping — you can modify any field name afterwards.
Compare Lists lets you upload two CSV files and instantly find which email addresses appear in both, which are exclusive to each list, and download every segment separately. Processing happens entirely in your browser — your data is never sent to any server.
Free plan: 3 comparisons / month. Pro and Agency: unlimited + column mapper on export.
Upload two CSV files
Drop or browse for File 1 and File 2. DataBridge auto-detects the email column in each file using the header name or data pattern.
Click Compare
The comparison runs instantly in your browser. No upload, no processing queue — results appear in under a second for most files.
Review the three segments
In both files (matched emails), Only in File 1 (exclusive to the first list), Only in File 2 (exclusive to the second list).
Download what you need
Each segment can be downloaded independently. For matched emails, choose whether to export File 1's columns or File 2's columns. Pro users can customize and rename columns before downloading.
In both files
Emails present in File 1 AND File 2. Can be downloaded with File 1 columns or File 2 columns.
Only in File 1
Emails that exist in File 1 but have no match in File 2.
Only in File 2
Emails that exist in File 2 but have no match in File 1.
Matching is case-insensitive and trims whitespace. Alice@Example.com and alice@example.com are treated as the same address.
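Compare Lists runs in your browser, so the following is not its implementation. It is a rough command-line analogue of the same matching rule (lower-case, trim, then intersection and differences), assuming simple comma-separated files without quoted fields. emails_of, compare_lists, and the output file names are hypothetical, and the sketch outputs addresses only, not full rows.

```bash
# emails_of FILE COL: print column COL of every data row, trimmed and
# lower-cased, then sorted and de-duplicated (naive comma split).
emails_of() {
  awk -F',' -v c="$2" 'NR > 1 {
    e = tolower($c)
    gsub(/^[ \t]+|[ \t]+$/, "", e)
    if (e != "") print e
  }' "$1" | LC_ALL=C sort -u
}

# compare_lists FILE1 FILE2 COL1 COL2: write the three segments to the
# current directory. comm needs both inputs sorted with the same collation.
compare_lists() {
  local a b
  a=$(mktemp); b=$(mktemp)
  emails_of "$1" "$3" > "$a"
  emails_of "$2" "$4" > "$b"
  LC_ALL=C comm -12 "$a" "$b" > in_both.txt         # in both files
  LC_ALL=C comm -23 "$a" "$b" > only_in_file1.txt   # only in File 1
  LC_ALL=C comm -13 "$a" "$b" > only_in_file2.txt   # only in File 2
  rm -f "$a" "$b"
}
```

With this rule, `Alice@Example.com` in File 1 and `alice@example.com` in File 2 both normalize to the same line and land in in_both.txt.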
A token-authenticated, versioned API for cleaning email lists from your own infrastructure — without a browser session. Built on the same engine as the dashboard, including live MX, typo auto-fix, and ESP-aware export mapping. Try every endpoint right from the interactive reference.
Auth via Authorization: Bearer db_live_… tokens, generated from the dashboard. Webhooks (job.completed, job.failed, dataset.ready) are already live for paid plans — see the Settings → Webhooks tab.
Available on any paid plan.
Wire job results into your pipeline without polling. Configure the endpoint URL and signing secret in Settings → API & webhooks.
| Event | When it fires |
|---|---|
| job.completed | A cleaning run finishes successfully (status flips to done). |
| job.failed | A run terminated with an error; the payload includes the error code. |
| dataset.ready | The cleaned CSV + per-bucket stats are persisted and downloadable. |
Every payload is signed with HMAC-SHA256 using the secret you generate in Settings. Verify the X-DataBridge-Signature header against the raw request body before trusting the event. Retries follow exponential backoff (1m / 5m / 30m / 2h) with a 24-hour cap.
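A minimal sketch of the verification step using the openssl CLI. It assumes the X-DataBridge-Signature header carries the bare hex HMAC-SHA256 digest of the raw body (no prefix such as sha256=); confirm the exact header format in Settings or the interactive reference. verify_signature is a hypothetical helper.

```bash
# verify_signature BODY_FILE SECRET RECEIVED_SIG
# Recompute HMAC-SHA256 over the raw request body (saved byte-for-byte to
# BODY_FILE) and compare it with the X-DataBridge-Signature header value.
# Assumption: the header value is a bare hex digest.
verify_signature() {
  local expected received
  # openssl prints e.g. "(stdin)= <hex>" or "SHA2-256(stdin)= <hex>";
  # the digest is always the last field.
  expected=$(openssl dgst -sha256 -hmac "$2" < "$1" | awk '{print $NF}')
  received=$(printf '%s' "$3" | tr 'A-F' 'a-f')
  # Plain string comparison; in a production handler prefer your language's
  # constant-time comparison to avoid timing leaks.
  [ -n "$expected" ] && [ "$expected" = "$received" ]
}
```

Save the body exactly as received before parsing it: re-serializing the JSON can change bytes and make a valid signature fail.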
Every cleaned CSV ships with two extra columns appended to the original input: verdict and reason. The same trace travels with every row regardless of the surface (dashboard download, API response, or webhook payload), so compliance, growth, and engineering all read the same column to answer the same question: why was this address dropped?
```csv
email,name,verdict,reason,suggested
founder@stripe.com,Patrick,deliverable,All checks passed,
info@acme.io,Mailbox,role,Role-based mailbox,
test@mailinator.com,Demo,disposable,Disposable provider,
sara@gmial.com,Sara,typo,Did you mean sara@gmail.com?,sara@gmail.com
```
The suggested column is only filled in for rows where verdict === "typo": it carries the corrected address so you can write it back to your CRM directly. See Auto-repair for the typo guard rules.
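To write corrections back to a CRM, the typo rows can be pulled out of a cleaned CSV by header name. A sketch assuming the email column is literally named email (as in the sample above) and that no field contains a quoted comma; typo_fixes is a hypothetical helper.

```bash
# typo_fixes FILE: print "original,suggested" pairs for rows whose verdict
# is "typo", locating columns by header name (naive comma split).
typo_fixes() {
  awk -F',' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    $(col["verdict"]) == "typo" { print $(col["email"]) "," $(col["suggested"]) }
  ' "$1"
}
```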
Possible verdict values: deliverable, syntax_invalid, no_mx, duplicate, role, no_reply, disposable, risky_tld, typo.
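To make these values concrete, here is a toy classifier for a single address. It is not DataBridge's engine: the pattern lists are tiny samples taken from the examples on this page, the syntax check is deliberately loose, the check order is a guess, and no_mx, duplicate, and typo are omitted because they need DNS lookups or the rest of the file. classify_email is a hypothetical name.

```bash
# classify_email ADDRESS: print one verdict for a single address.
# Toy sketch: sample lists only; no_mx, duplicate and typo are not checked.
classify_email() {
  local e local_part domain tld
  e=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  # Loose syntax check: one @, non-empty parts, dotted domain.
  if ! printf '%s\n' "$e" | grep -Eq '^[^@[:space:]]+@[^@[:space:]]+\.[a-z]{2,}$'; then
    echo syntax_invalid; return
  fi
  local_part=${e%@*}
  domain=${e#*@}
  tld=${domain##*.}
  case $local_part in
    noreply|no-reply|donotreply|do-not-reply|bounce) echo no_reply; return ;;
    info|support|billing|admin|hello) echo role; return ;;
  esac
  case $domain in
    mailinator.com|guerrillamail.com) echo disposable; return ;;
  esac
  case $tld in
    xyz|tk|ml|top|icu) echo risky_tld; return ;;
  esac
  echo deliverable
}
```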
The sync POST /api/v1/clean endpoint caps the request body at 5 MB so it can fit inside a serverless function’s memory + timeout window. Three options if your file is bigger:
Option 1: split the file and call the sync endpoint once per chunk. This is the fastest path that doesn’t require any new endpoint: a short shell script, or a loop in your language of choice.
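```bash
# Split into chunks of 50,000 data rows, repeating the header in each chunk.
# Line-based splitting assumes no quoted field contains a newline.
HEADER=$(head -n 1 list.csv)
tail -n +2 list.csv | split -l 50000 - chunk_
for f in chunk_*; do
  (echo "$HEADER"; cat "$f") > "with_header_$f.csv"
done

# Then POST each chunk and merge the results into one file,
# keeping the header row from the first response only.
rm -f cleaned.csv
first=1
for f in with_header_*.csv; do
  curl -sS --fail -X POST "https://databridge.so/api/v1/clean?format=csv" \
    -H "Authorization: Bearer $DB_API_KEY" \
    -H "Content-Type: text/csv" \
    --data-binary @"$f" > response.csv || { echo "Request failed: $f" >&2; break; }
  if [ "$first" -eq 1 ]; then
    cat response.csv >> cleaned.csv
    first=0
  else
    tail -n +2 response.csv >> cleaned.csv
  fi
done
rm -f response.csv
```
Pro tip: parallelise the inner loop with xargs -P 4 or your language’s concurrency primitive. Just respect the API rate limits; see Public API for the per-key budget.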
Option 2: use the dashboard. Best for one-off jobs and ad-hoc cleanups. The dashboard worker runs outside serverless, so there is no 5 MB cap and no execution-time ceiling. Drop a file of any size into /dashboard and you get the same cleaned CSV + per-bucket stats as the API would return.
Option 3: request a dedicated jobs endpoint. If your pipeline needs to push hundreds of MB or millions of rows daily, we ship a custom POST /api/v1/jobs endpoint scoped to your account: signed upload URL → async processing → webhook on completion → download via export route.
Email support@databridge.so with your expected volume + frequency and we’ll have a beta endpoint against your account inside 2–3 business days. Standard pricing for Pro/Agency; custom rates above 1M rows/month.
| Plan | Price | Rows / month | Downloads | Compare Lists |
|---|---|---|---|---|
| Free | $0 | 1,000 | 1 / day | 3 / month |
| Pro | $29 / mo | 100,000 | Unlimited | Unlimited |
| Agency | $79 / mo | 1,000,000 | Unlimited | Unlimited |
Need more rows? Purchase extra packs (25K for $9 · 100K for $29) from your billing page. Purchased rows stack on top of your monthly limit and never expire.
What exactly is in the clean file?
Every row from your original CSV that passed all checks — not invalid, not on an unreachable domain, not a duplicate, not role-based, not no-reply, not disposable, and not on a risky TLD. The original column structure is preserved exactly.
Do you actually verify if a mailbox exists?
We verify the domain, not the individual mailbox. For every unique domain in your file we do a live DNS MX lookup before processing — if the domain has no MX record, no server on the internet can receive mail for it, so every address at that domain goes to the Unreachable domain bucket. This catches typo'd and parked domains that regex-only checkers miss. Full SMTP mailbox probing is on the roadmap for a higher tier.
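The per-domain lookup described above can be approximated from the command line. A sketch assuming a simple comma-separated file and the dig utility being installed; unique_domains and has_mx are hypothetical helpers. Collecting unique domains first keeps the number of DNS queries proportional to domains rather than rows.

```bash
# unique_domains FILE COL: lower-cased, de-duplicated domains from column
# COL, skipping the header row and values without a single non-empty @part.
unique_domains() {
  awk -F',' -v c="$2" 'NR > 1 {
    n = split(tolower($c), parts, "@")
    if (n == 2 && parts[2] != "") print parts[2]
  }' "$1" | LC_ALL=C sort -u
}

# has_mx DOMAIN: succeed if DNS returns at least one MX record.
has_mx() {
  [ -n "$(dig +short MX "$1")" ]
}

# Example: flag domains with no MX record.
# for d in $(unique_domains list.csv 1); do has_mx "$d" || echo "no MX: $d"; done
```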
Are the original rows changed in any way?
No. DataBridge only filters rows — it never modifies email addresses or any other column. What goes in comes out, just with the bad rows removed.
What if my CSV has multiple columns?
DataBridge auto-detects the email column by looking for a header that contains the word "email". All other columns are preserved as-is in both the analysis and the clean file.
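The header rule above fits in a few lines. A sketch assuming a plain comma-separated header row with no quoted names; email_column is a hypothetical helper, and it covers only the header-name rule described here.

```bash
# email_column FILE: print the 1-based index of the first header containing
# "email" (case-insensitive); return non-zero if no header matches.
email_column() {
  head -n 1 "$1" | awk -F',' '
    {
      for (i = 1; i <= NF; i++)
        if (tolower($i) ~ /email/) { print i; found = 1; exit }
    }
    END { exit !found }'
}
```

For a header of `name,Email Address,phone` it prints 2.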
How are row limits counted?
Based on total rows processed per billing period, not clean rows. A file with 1,000 rows counts as 1,000 regardless of how many issues are found.
What is the Export Mapper?
A Pro/Agency feature that lets you rename column headers before downloading. Instead of exporting raw column names, you define exactly what each header should be called — so you can import directly into Mailchimp, HubSpot, Klaviyo, etc. without touching the file manually.
What happens if my CSV is broken?
We don't process broken files silently. If your upload has an unclosed quote, an Excel BOM, ragged rows, or other structural issues, the worker marks the dataset as Invalid CSV instead of producing wrong stats. The dataset detail page then shows an Auto-repair section that previews exactly what we'd clean, down to the character, and lets you apply the fix without re-uploading. Pro/Agency plans can apply; free users see the preview.
Can I re-download a processed file later?
Yes. Every processed dataset is saved to your dashboard. Free plan allows 1 download per day; Pro and Agency plans have unlimited downloads.
Is my data stored permanently?
Processed files are stored securely and linked to your account. Raw uploaded files are deleted from temporary storage immediately after processing.