An Experience in Kaggling
Updated: Oct 19
Often in a work environment you are given the task of updating or improving a model that someone else built, commonly known as "tweaking" the model. To demonstrate this, I took a Kaggle competition kernel/model and started tweaking it. The original kernel is here:
The dataset contains some numerical data and some text data (house attributes such as square footage, and whether the house has a pool), with a numerical output value to predict (the sale price). I am only using machine learning techniques in this example; no neural nets are included. The kernel goes through a typical process to find a good solution:
1) cleanse the data
2) select and build the models
3) combine the models into an ensemble
4) save the predictions for submission
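To make the four steps concrete, here is a hedged sketch of that kind of pipeline using scikit-learn on synthetic stand-in data (the column names, models, and averaging scheme are illustrative assumptions, not the actual kernel's code):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the house-price data: numeric + text columns.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "sqft": rng.uniform(500, 4000, n),
    "year_built": rng.integers(1900, 2020, n),
    "has_pool": rng.choice(["Y", "N"], n),
    "neighborhood": rng.choice(["A", "B", "C"], n),
})
price = (50 * df["sqft"] + 200 * (df["year_built"] - 1900)
         + 10000 * (df["has_pool"] == "Y") + rng.normal(0, 5000, n))

# 1) cleanse: impute missing numerics, one-hot encode the text columns
prep = ColumnTransformer([
    ("num", Pipeline([("imp", SimpleImputer(strategy="median")),
                      ("sc", StandardScaler())]), ["sqft", "year_built"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["has_pool", "neighborhood"]),
])

# 2) select and build the models (two different families)
models = {
    "ridge": Pipeline([("prep", prep), ("m", Ridge(alpha=1.0))]),
    "gbm": Pipeline([("prep", prep),
                     ("m", GradientBoostingRegressor(random_state=0))]),
}

X_tr, X_te, y_tr, y_te = train_test_split(df, price, random_state=0)
preds = {}
for name, pipe in models.items():
    pipe.fit(X_tr, y_tr)
    preds[name] = pipe.predict(X_te)

# 3) combine the models into a simple averaging ensemble
ensemble = np.mean(list(preds.values()), axis=0)

# 4) save the predictions in Kaggle submission format
sub = pd.DataFrame({"Id": X_te.index, "SalePrice": ensemble})
sub.to_csv("submission.csv", index=False)
```

Real kernels usually weight the ensemble members or stack them with a meta-model rather than taking a plain average, but the four-step shape is the same.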
The competition is still ongoing, so I can only report intermediate results. Out of 5700 valid entries, the original score ranked #328*; after my changes and adjustments it is now #139* - top 2%. I am still working on this from time to time; the most recent kernel can be found here:
Along the way I learned a bit about ensemble methods and how to apply them to a typical dataset. I also may have found a rare example where PCA did not help, and actually decreased the score. Since this is so rare, I should look into it further to determine whether my PCA methods need a little "tweaking" of their own.
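The post doesn't show the PCA experiment itself, but a simple way to check whether PCA helps or hurts is to cross-validate the same model with and without a PCA step. Here is a hedged sketch on synthetic data (the dataset, model, and component count are illustrative assumptions, not the actual kernel's setup):

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression problem with more features than informative signal.
X, y = make_regression(n_samples=300, n_features=40, n_informative=10,
                       noise=10.0, random_state=0)

plain = make_pipeline(StandardScaler(), Ridge())
with_pca = make_pipeline(StandardScaler(), PCA(n_components=10), Ridge())

# Mean cross-validated R^2 for each pipeline; if the PCA score is lower,
# the projection discarded variance the model needed.
score_plain = cross_val_score(plain, X, y, cv=5).mean()
score_pca = cross_val_score(with_pca, X, y, cv=5).mean()
```

Putting PCA inside the pipeline matters: it is refit on each training fold, so the comparison is not contaminated by the held-out data.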
* The top 72 scores (0 wrong) in this competition are not real; those users just uploaded the answers without modelling.