Completed

Vendor matches search 1

I’ve got a data analysis project.

In an excel (.xlsx) file, I have 2 sets of data, each set in a separate tab of the file:

Data set 1. “DB Units” contains 41,786 unique people names, that’s the client base of a Business chain, and there is a field (column) that identifies to which unit of the chain each client belongs.

Data set 2. “DB Vendor” contains 2,022 people names, they are visitors from a vendor who may have visited one or more units of the chain.

This Project consists in identifying which of the vendor visitors are already within the client base of the chain, and to which unit those found belong.

Be aware of the following:

- Both data sets come from different sources and the name of the same person may not be written identically. That is the challenge of this project; we have already identified the exact matches, but odds are that there are more matching visitors that have not yet been identified since one or more characters may differ between sets.

- The names are in Spanish and some contain the Spanish character “ñ”. We have taken out accents (á, é, í, ó, ú) from both sets.

- We have taken out special characters and double or triple spaces, leaving a single space between words on both sets.

- Complete names are usually formed by first names (first, middle and probably a third name) and 2 last names. Examples:

- Maria Antonieta Jimenez Martinez (Maria and Antonieta are first and middle names, Jimenez Martinez are both last names)

- Juan Roberto Lucas Vazquez Ramos (Juan Roberto Lucas are all names, while Vazquez Ramos are both last names)

We need someone who can go further than a vlookup search and can identify variations of the same name. We need you to identify and tag different types of matches:

- Type 1: Exact matches

- Type 2: High certainty matches: in which one or two letters may be missing from a typo but the rest of the name is the same

- Type 3: Medium certainty matches:

- One of two names may be omitted or abbreviated

E.g. Maria Rosa Jimenez Juarez vs. Ma. Rosa Jimenez Juarez, or Juan E. Lopez Vazquez vs. Juan Enrique Lopez Vazquez, or Juan Lopez Vazquez

- One last name may be missing. E.g. Juan Jose Rodriguez vs. Juan Jose Rodriguez Villanueva

- Any combination of the above that could lead to a match but there is not 100% certainty.

The end product is the list on the “DB Vendor” tab, having filled two fields for each match:

1. Match type

2. The corresponding “Consecutive unit” value for the Name Vendor that matched. (See excel file).

Skills: Data Analytics, Analytics

See more: mysql search tag query, matches product search criteria oscommerce, perl search replace html tag, imei vendor search, search html tag perl, tag search form joomla, seo centro tools search engines meta tag analyzer, perl search file line matches, https www freelancer com job search sell facebook accounts 1 see more at http www eurobdnews com it 16278 sthash dfikzrn7 dpuf, http://www.llrzeemap.com/pub?group=2267331&search=1&x=-97.385812&y=37.731971&z=14, search engines see your .com and com index html or com index php as different pages, magento 1.9 google tag manager, great designer search 1, how to search and replace a tag value in xml file using java, calculus of variations solved examples, type 1 and type 2 variations, google search 1.0 2, leaving a voicemail message examples, cs cart multi vendor v4 3.1 shopping cart edition white label

About the Employer:
( 0 reviews ) Coyoacan, Mexico

Project ID: #20103432

Awarded to:

hmehedy154

At first, take my sincere respect. I have gone through your project description. after that, I am able to understand that I have enough skills to do the project. I am a freelancer. I have lots of experience in data typ More

$30 USD in 7 days
(2 Reviews)
1.0
GuilhermeCiconi

Hi, I took a look at your dataset and I belive that with simple strings manipulation in R Language I can do it really easly. I'm a statistc's student and have large experience in R language for analysis. If you allow m More

$15 USD in 7 days
(0 Reviews)
0.0
$25 USD in 1 day
(1 Review)
0.8