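Different bags can be plotted as points (vectors) on a graph, with one axis per item/word, and the similarity of two bags can be measured by the Euclidean distance, the dot product, or the cosine of the angle θ between them. Note that the dot product and the cosine are similarity scores rather than distances: larger values mean the bags are more alike. If we normalize the counts, the total number of items/words in a bag no longer matters, so two bags with the same proportions are treated as the same.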
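As a rough illustration, here is a minimal sketch of the three measures in plain Python. The helper names (`to_vector`, `dot`, `euclidean`, `cosine`, `normalize`) and the example sentence are assumptions made up for this sketch, not part of any particular library. The second bag is the first one repeated twice, so both have the same proportions.

```python
import math
from collections import Counter

def to_vector(bag, vocab):
    # Turn a bag (item -> count) into a list of counts, ordered by vocab.
    return [bag.get(term, 0) for term in vocab]

def dot(u, v):
    # Dot product: sum of element-wise products.
    return sum(a * b for a, b in zip(u, v))

def euclidean(u, v):
    # Straight-line distance between the two count vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine(u, v):
    # Cosine of the angle between the vectors; ignores their lengths.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def normalize(u):
    # Scale a vector to unit length so only the proportions remain.
    norm = math.sqrt(dot(u, u))
    return [a / norm for a in u]

# Hypothetical example bags: bag_b is bag_a repeated twice.
bag_a = Counter("the cat sat on the mat".split())
bag_b = Counter(("the cat sat on the mat " * 2).split())

vocab = sorted(set(bag_a) | set(bag_b))
a = to_vector(bag_a, vocab)
b = to_vector(bag_b, vocab)

print("euclidean (raw):       ", euclidean(a, b))
print("dot product (raw):     ", dot(a, b))
print("cosine:                ", cosine(a, b))
print("euclidean (normalized):", euclidean(normalize(a), normalize(b)))
```

On the raw counts the Euclidean distance is clearly non-zero, even though the two bags have identical proportions. The cosine is 1 (up to floating-point rounding), and after normalization the Euclidean distance drops to about 0, which is the effect described above.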
Ivan Zhou