The Jaccard index, also known as the Jaccard similarity coefficient, is used for comparing the similarity and diversity of sample sets.
The Jaccard coefficient measures similarity between sample sets and is defined as the size of the intersection divided by the size of the union of the sample sets:

    J(A, B) = |A ∩ B| / |A ∪ B|
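As a rough illustration of this definition, here is a minimal Python sketch that divides the intersection size by the union size. The names `jaccard` and `tokens` are made up for this example, the whitespace tokenizer is a simplifying assumption, and returning 1.0 for two empty sets is one common convention rather than part of the definition above.

```python
def jaccard(a, b):
    """Jaccard index of two iterables, each treated as a set."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty sets are considered identical.
        return 1.0
    return len(set_a & set_b) / len(union)


def tokens(text):
    # Hypothetical tokenizer: lowercase, then split on whitespace.
    return text.lower().split()


doc1 = "the cat sat on the mat"
doc2 = "the cat lay on the rug"

# 3 shared words (the, cat, on) out of 7 distinct words overall.
score = jaccard(tokens(doc1), tokens(doc2))

# Repeating doc1 adds no new distinct words, so the score is unchanged.
repeated_score = jaccard(tokens(doc1 + " " + doc1), tokens(doc2))
```

Because both inputs are converted to sets, `score` and `repeated_score` are equal here: repeating a sentence adds no new distinct words, so the index cannot see the repetition.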
- DISADVANTAGE:
- This technique has one disadvantage that leads many people to prefer the vector model. The method compares documents only by the distinct words they contain. If a document has redundant (repeated) sentences, the repeated words are counted only once, so the similarity value does not change the way one might expect.
- Regards,
- Rajasekhar.