What tactics are being used to test AI-driven features? How can we measure the accuracy of the recommendations, predictions, or scores that an AI feature produces? How do we create test data that mimics real-world business scenarios for running these test cases?