Case Study: Analyzing Credit Card Fraud Data with ML
Updated: Jun 28, 2020
When building a Machine Learning or AI application, we want the most accurate predictions possible. Model optimization can give us better results, but it takes time. For those of you who are unfamiliar with this topic, there are many tools and techniques available for optimizing models. One of them is tuning the model's hyperparameters. Hyperparameters are settings you choose before training, rather than values the model learns from the data, such as the number of features to consider or the number of samples to use. Most of them have defaults, so passing them is optional, but tuning them well can noticeably improve your score.
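To make the idea concrete, here is a minimal sketch of hyperparameters being set on a scikit-learn model. The specific values are arbitrary and chosen only for illustration; they are not tuned and are not taken from the code later in this post.

```python
from sklearn.ensemble import RandomForestClassifier

# Hyperparameters are fixed when the model is constructed, before fit() is called.
# The values below are illustrative assumptions, not recommendations.
model = RandomForestClassifier(
    n_estimators=200,     # number of trees in the forest
    max_depth=8,          # maximum depth of each tree
    max_features="sqrt",  # number of features considered at each split
    random_state=42,      # makes results reproducible
)

# get_params() returns every hyperparameter, including the ones left at their defaults.
params = model.get_params()
print(params["n_estimators"], params["max_depth"], params["max_features"])
```

Anything you do not pass keeps its default, which is why an untuned model still runs but may not score as well as it could.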
I will show you how to use two of the most popular tools for this: GridSearchCV and cross-validation. Both are very flexible and work with almost any scikit-learn compatible model. I will start with a simple one - RandomForestClassifier.
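As a rough sketch of how the two fit together, the snippet below runs GridSearchCV with stratified cross-validation over a small RandomForestClassifier grid. It uses synthetic data from make_classification as a stand-in, not the credit card dataset, and the grid values, the 3-fold split, and the F1 scoring choice are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in data (NOT the credit card dataset): roughly 5% positive class.
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.95, 0.05], random_state=0
)

# Candidate hyperparameter values; every combination is tried (2 x 2 = 4 here).
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

# Stratified folds keep the class ratio similar in every train/validation split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    scoring="f1",  # assumption: F1 instead of accuracy, since the classes are imbalanced
    cv=cv,
)
search.fit(X, y)

print(search.best_params_)           # best combination found in the grid
print(round(search.best_score_, 3))  # its mean cross-validated F1 score
```

Each of the 4 combinations is trained and scored once per fold, so the cost grows quickly as the grid gets larger. That is the time trade-off mentioned at the start.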
The data source I will use is an open dataset of credit card transactions, each labeled as fraud or not fraud. The dataset is heavily imbalanced, with far fewer fraud cases than legitimate ones, so I will also cover some techniques for dealing with that. My code is shown below.