#Natural Language Processing#A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
#Computer Science#A powerful and modular toolkit for record linkage and duplicate detection in Python
#Natural Language Processing#Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
🆔 Command line tool for deduplicating CSV files
🆔 Examples for using the dedupe library
#Awesome#A list of free data matching and record linkage software.
Super Fast String Matching in Python
🔎 Finds fuzzy matches between CSV files
#Computer Science#PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
#Computer Science#Link Discovery Framework for Metric Spaces.
Spark RDD with Lucene's query and entity linkage capabilities
#Natural Language Processing#A convenient way to link, deduplicate, aggregate and cluster data(frames) in Python using deep learning
Resources for tackling record linkage / deduplication / data matching problems
Record Linkage ToolKit (Find and link entities)
Link Wikidata items to large catalogs
Python package for deduplication/entity resolution using active learning
#Awesome#List of entity resolution software and resources.
Python implementation of anonymous linkage using cryptographic linkage keys
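Many of the tools listed above share one basic step: scoring pairs of records for similarity and flagging pairs above a threshold as likely duplicates. The following is a minimal sketch of that idea using only Python's standard library (`difflib.SequenceMatcher`). The function names (`normalize`, `similarity`, `find_duplicates`) and the 0.85 threshold are illustrative assumptions. This is not the API or method of any project in this list.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(s: str) -> str:
    # Lowercase and collapse runs of whitespace so trivial formatting
    # differences do not lower the score.
    return " ".join(s.lower().split())


def similarity(a: str, b: str) -> float:
    # SequenceMatcher.ratio() returns a value in [0, 1];
    # 1.0 means the normalized strings are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicates(records: list[str], threshold: float = 0.85) -> list[tuple[int, int, float]]:
    # Naive all-pairs comparison (O(n^2)); the threshold is a hypothetical default.
    pairs = []
    for i, j in combinations(range(len(records)), 2):
        score = similarity(records[i], records[j])
        if score >= threshold:
            pairs.append((i, j, round(score, 3)))
    return pairs


if __name__ == "__main__":
    names = ["Acme Corp.", "ACME  corp", "Globex Inc", "Initech"]
    # Only the two "Acme" variants should clear the threshold.
    print(find_duplicates(names))
```

Comparing every pair grows quadratically with the number of records. This is one reason record linkage tools typically rely on blocking, indexing, or approximate nearest-neighbor search rather than exhaustive comparison.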