Figure 3. Comparison of the performance of the proposed recommendation system and the category-recency based recommendation system.

According to Herlocker et al. [9], evaluating recommender systems and their algorithms is inherently difficult because different algorithms perform differently on different data sets. An algorithm may also be entirely inappropriate in a domain where there are many more items than users, or vice versa. Similar differences exist for ratings density, ratings scale, and other properties of data sets. Our goal was to compare the performance of the proposed recommendation system against the previously existing system, which suggests items from the same category in order of recency. Additionally, we wanted to expose users to a good variety of items rather than showing a few items in random fashion. For this purpose we used split testing (also called A/B testing): half of the time recommendations are served by the previous system, and the rest of the time they are served by the proposed system.
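The split described above can be illustrated with a minimal sketch. It assumes that each request is assigned independently and with equal probability to one of the two systems. The text does not state the actual assignment mechanism, and the names `ARMS`, `assign_arm`, and `simulate_assignments` are hypothetical, introduced only for illustration.

```python
import random
from collections import Counter

# Hypothetical labels for the two systems under comparison.
ARMS = ("previous", "proposed")


def assign_arm(rng: random.Random) -> str:
    """Route one request to either system with equal probability (assumed 50/50)."""
    return ARMS[0] if rng.random() < 0.5 else ARMS[1]


def simulate_assignments(n_requests: int, seed: int = 0) -> Counter:
    """Count how many of n_requests are routed to each system."""
    rng = random.Random(seed)
    return Counter(assign_arm(rng) for _ in range(n_requests))


if __name__ == "__main__":
    # Over many requests, each system should serve roughly half of the traffic.
    print(simulate_assignments(10_000))
```

Assigning per request matches the "half the time" wording. A deployment that needs each user to see one system consistently could instead assign by a hash of the user identifier, which is a design alternative and not something the text reports.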