Entity Resolution is the task of disambiguating manifestations of real world entities in various records or mentions by linking and grouping. For example, there could be different ways of addressing the same person in text, different addresses for businesses, or photos of a particular object. In this extra credit assignment, you will link two product catalogs.
## Getting Started
First, pull the most recent changes from the cmsc13600-public repository:
```
$ git pull
```
Then, copy the `ec` folder to your submission repository. Change directories to enter your submission repository. Your code will go into `analzy.py`. You can the files to the repository using `git add`:
```
$ git add analyze.py
$ git commit -m'initialized homework'
```
You will also need to fetch the datasets used in this homework assignment:
Download each of the files and put it into your `ec` folder.
Before we can get started, let us understand the main APIs in this project. We have provided a file named `core.py` for you. This file loads and processes the data that you've just downloaded. For example, you can load the Amazon catalog with the `amazon_catalog()` function. This returns an iterator over data tuples in the Amazon catalog. The fields are id, title, description, mfg (manufacturer), and price if any:
A matching is a pairing between id's in the Google catalog and the Amazon catalog that refer to the same product. The ground truth is listed in the file `Amzon_GoogleProducts_perfectMapping.csv`. Your job is to construct a list of pairs (or iterator of pairs) of `(amazon.id, google.id)`. These matchings can be evaluated for accuracy using the `eval_matching` function: