mysql - Rails: A way to check for duplicate items in DB? Affiliate data feeds


I have a problem with affiliate data feeds.

I'm trying to import product data from Amazon and other partner e-shops, but want to avoid duplicates.

For example, Amazon uses the product title: iPhone 5 16 GB black

and another store uses the product title: iPhone 5 16 GB.

They should be listed as one product. Now imagine that I have 10 stores selling the iPhone 5.

Of course, the products have many more parameters, but I still need an algorithm to prevent duplicates, something like a matching algorithm based on the similarity of product parameters.

Does anyone have experience with this and can tell me what kind of algorithm is advisable for this scenario?

A detailed list of parameters

Thanks a lot!

This could be done by EAN number, but what if this number is not provided?

Before developing an algorithm, you need to define the business rules. If you have a situation where all the attributes match except for the title, then you can try a substring match (one title is part of another) or a fuzzy match on the title.
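As a rough illustration of the partial-title idea, here is a minimal sketch. It compares tokens rather than raw substrings so that "16GB" and "16 GB" are treated alike. The method names `title_tokens` and `partial_title_match?` are hypothetical, and the tokenizing rule is an assumption about how feed titles look.

```ruby
# Normalize a title: lowercase it and split it into runs of letters
# or digits, so "16GB" and "16 GB" produce the same tokens.
def title_tokens(title)
  title.downcase.scan(/[a-z]+|\d+/)
end

# True when every token of the shorter title also appears in the
# longer one (token order and repetition are ignored in this sketch).
def partial_title_match?(a, b)
  short, long = [title_tokens(a), title_tokens(b)].sort_by(&:size)
  return false if short.empty?
  (short - long).empty?
end

partial_title_match?("iPhone 5 16 GB black", "iPhone 5 16GB") # => true
partial_title_match?("iPhone 5 16 GB", "iPhone 4 16 GB")      # => false
```

Because token order is ignored, this is looser than a strict substring test, so it is probably best used only after the other attributes have already matched.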

We are using the fuzzy-string-match gem to find duplicate companies.
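To show what a fuzzy score looks like without any dependency, the sketch below uses a plain Levenshtein ratio as a stand-in. It is not the gem's API or its algorithm. The method names and the 0.8 threshold are assumptions to be tuned on real feed data.

```ruby
# Classic dynamic-programming Levenshtein edit distance.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      # deletion, insertion, substitution
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Similarity in 0.0..1.0 after lowercasing and removing whitespace;
# 1.0 means the normalized titles are identical.
def title_similarity(a, b)
  x = a.downcase.gsub(/\s+/, "")
  y = b.downcase.gsub(/\s+/, "")
  longest = [x.length, y.length].max
  return 1.0 if longest.zero?
  1.0 - levenshtein(x, y).to_f / longest
end

# Hypothetical threshold; 0.8 is an assumption, not a recommendation.
def probable_duplicate?(a, b, threshold: 0.8)
  title_similarity(a, b) >= threshold
end

title_similarity("iPhone 5 16GB", "IPhone 5 16 GB") # => 1.0
```

A pure string score also rates "iPhone 5 16GB" and "iPhone 4 16GB" as very close, so on its own it cannot separate product versions.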

Assuming that the discrepancy is in the title only, you can put more intelligence into the algorithm by analyzing the title parts. In your example, the title parts could be model, version, capacity and color. For this example:

  required_data = [model, version, capacity]
  optional_data = [color]

and define such attributes for each product category. Combined with a fuzzy match, you should be able to tolerate spelling errors and match the following:

  iPhone 5 16GB Black
  iPhone 5 16GB
  IPhone 5 16 GB White
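One way the required/optional split could look in code is sketched below. The regular expression, the color list, and the method names `parse_title` and `same_product?` are all assumptions made for illustration. The sketch only handles differences in case and spacing; real spelling errors would still need a fuzzy comparison of the individual parts.

```ruby
# Hypothetical attribute rules for one product category (phones).
REQUIRED_DATA = %i[model version capacity].freeze
OPTIONAL_DATA = %i[color].freeze          # parsed, but never compared
KNOWN_COLORS  = %w[black white].freeze    # assumed list for illustration

# Split a phone title into parts. The pattern assumes titles shaped
# like "<model> <version> <capacity> GB [color]".
def parse_title(title)
  m = title.strip.match(/\A([a-z]+)\s*(\d+)\s+(\d+)\s*gb(?:\s+(\w+))?\z/i)
  return nil unless m
  color = m[4]&.downcase
  {
    model: m[1].downcase,
    version: m[2],
    capacity: m[3].to_i,
    color: KNOWN_COLORS.include?(color) ? color : nil
  }
end

# Two titles describe the same product when all required parts agree;
# optional parts such as color are ignored.
def same_product?(a, b)
  pa = parse_title(a)
  pb = parse_title(b)
  return false if pa.nil? || pb.nil?
  REQUIRED_DATA.all? { |key| pa[key] == pb[key] }
end

same_product?("iPhone 5 16GB Black", "IPhone 5 16 GB White") # => true
same_product?("iPhone 5 16GB", "iPhone 4 16GB")              # => false
```

Titles that do not fit the assumed pattern return `nil` from `parse_title` and are treated as non-matching, which keeps unparseable feeds from being merged by accident.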
