• Data Project: CRM Dupe-Remover

  • Problem

    Managing CRM data at scale often results in duplicate leads, contacts, and accounts scattered across teams. This leads to inaccurate reporting, wasted sales effort, and poor customer experiences. Without a reliable way to detect and merge duplicates, organizations face mounting data quality issues that erode trust in the CRM system.

  • Solution

    The CRM De-Dupe & Merge Dashboard acts as a data quality firewall for your CRM. Records are scanned automatically with fuzzy matching on names, emails, and domains, and likely matches are clustered with graph-based algorithms. Survivorship rules then select the most complete and accurate record in each cluster and merge the remaining duplicates into it. The result is a clean, exportable dataset produced in minutes, without manual intervention.
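
    To make the matching and clustering steps concrete, here is a minimal sketch, not the dashboard's actual code. It uses only the Python standard library so it runs anywhere: `difflib.SequenceMatcher` stands in for RapidFuzz, and a small union-find stands in for NetworkX connected components. The sample records, field names, the matching rule, and the threshold of 85 are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records; field names are assumptions for illustration.
RECORDS = [
    {"id": 1, "name": "Jane Doe", "email": "jane.doe@acme.com"},
    {"id": 2, "name": "Jane  Doe", "email": "jane.doe@acme.com"},
    {"id": 3, "name": "Jayne Doe", "email": "jdoe@acme.com"},
    {"id": 4, "name": "Bob Smith", "email": "bob@globex.com"},
]


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score on lowercased, whitespace-normalized text."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return 100.0 * SequenceMatcher(None, norm_a, norm_b).ratio()


def domain(email: str) -> str:
    """Return the lowercased part of an email address after the last '@'."""
    return email.split("@")[-1].lower()


def is_match(r1: dict, r2: dict, name_threshold: float = 85.0) -> bool:
    """Assumed rule: identical emails match outright; otherwise the names
    must be similar and the email domains must agree."""
    if r1["email"].lower() == r2["email"].lower():
        return True
    same_domain = domain(r1["email"]) == domain(r2["email"])
    return same_domain and similarity(r1["name"], r2["name"]) >= name_threshold


def cluster(records: list[dict]) -> list[list[int]]:
    """Group record ids into connected components of the match graph."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x: int) -> int:
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each matching pair is an edge; union the two endpoints.
    for r1, r2 in combinations(records, 2):
        if is_match(r1, r2):
            parent[find(r1["id"])] = find(r2["id"])

    groups: dict[int, list[int]] = {}
    for r in records:
        groups.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(sorted(g) for g in groups.values())


clusters = cluster(RECORDS)
print(clusters)
```

    With these sample records, ids 1 to 3 end up in one cluster and id 4 stays on its own. Comparing every pair is quadratic in the number of records, so a real pipeline would likely block candidates first (for example by email domain) before scoring.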

  • Impact

    By implementing this dashboard, organizations gain real-time visibility into duplicate clusters, with every merge decision traceable and auditable. The result:

    30%+ reduction in duplicate records in test data

    Reliable, actionable CRM data for sales and marketing teams

    Faster campaigns and reporting with confidence in accuracy

    Significant time savings, freeing staff to focus on revenue, not cleanup
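
    The traceability claim can be illustrated with a small survivorship sketch. This is an assumed rule set, not the dashboard's own: the record with the most non-empty fields survives, ties go to the lowest id, gaps in the survivor are filled from the other records, and every decision is appended to an audit trail. The cluster contents, field names, and audit entry shapes are hypothetical. The `reduction_pct` helper shows how a reduction figure would be computed from row counts.

```python
# Hypothetical duplicate cluster; field names and values are illustrative only.
CLUSTER = [
    {"id": 1, "name": "Jane Doe", "email": "jane.doe@acme.com", "phone": None, "title": None},
    {"id": 2, "name": "Jane Doe", "email": "jane.doe@acme.com", "phone": "555-0100", "title": None},
    {"id": 3, "name": "Jayne Doe", "email": "jdoe@acme.com", "phone": None, "title": "CTO"},
]

EMPTY = (None, "")


def completeness(record: dict) -> int:
    """Count the non-empty fields in a record."""
    return sum(1 for value in record.values() if value not in EMPTY)


def merge_cluster(cluster: list[dict]) -> tuple[dict, list[dict]]:
    """Merge one cluster and return (merged_record, audit_trail)."""
    # Assumed survivorship rule: most complete record wins, lowest id breaks ties.
    survivor = max(cluster, key=lambda r: (completeness(r), -r["id"]))
    merged = dict(survivor)
    audit = [{"action": "survivor", "record_id": survivor["id"]}]

    for record in cluster:
        if record["id"] == survivor["id"]:
            continue
        # Fill only the fields the survivor is missing; never overwrite.
        for field, value in record.items():
            if merged.get(field) in EMPTY and value not in EMPTY:
                merged[field] = value
                audit.append({"action": "fill", "field": field, "from_id": record["id"]})
        audit.append(
            {"action": "merged_into", "record_id": record["id"], "survivor_id": survivor["id"]}
        )
    return merged, audit


def reduction_pct(rows_before: int, rows_after: int) -> float:
    """Percentage of rows removed by de-duplication."""
    return 100.0 * (rows_before - rows_after) / rows_before


merged, audit = merge_cluster(CLUSTER)
for entry in audit:
    print(entry)
```

    Here record 2 survives and picks up its missing title from record 3, and the audit trail records both the fill and the two merges. As a worked figure, `reduction_pct(100, 70)` corresponds to a 100-row set in which all 30 duplicates are merged away.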

  • Tech Stack

    Python + Pandas → data processing

    RapidFuzz → fuzzy string matching

    NetworkX → graph clustering

    Streamlit → interactive UI

    Hugging Face Spaces → cloud deployment

  • Dive in!

    Try the demo: Upload your Leads, Contacts, and Accounts, or download pre-loaded sample files to test instantly.
    See how messy CRM data transforms into clean, reliable, and campaign-ready records.

🧹 CRM De-Dupe & Merge Demo

📦 Download Dummy Data

Each set has 100 rows: 70 unique records plus 30 deliberate duplicates and typo variants.

Set A

Set B (extra noise)