Some lessons from Kaggle’s competition

Robin Dong 发表于 2018年11月02日 13:23 | Hits: 476
Tag: machine learning | Kaggle | Object Detection

About two months ago, I joined the competition of‘RSNA Pneumonia Detection’in Kaggle. It’s ended yesterday, but I still have many experiences and lessons to be rethinking.

1. Augmentation is extremely crucial. After using tf.image.sample_distorted_bounding_box() in my program, the mAP(mean Average Precision) of evaluating dataset thrived to a perfect number. Then I realised that I should have used radical augmentation method in the first place. Otherwise, for machine learning job such as image detection and image classification, the number of samples is only about tens of thousands which is quite small for extracting dense features. Thus we need use more powerful augmentation strategy or tools (albumentationmay be a good choice).

2. SGD is good for generalisation. Previously I used Adam to acquire outstanding training accuracy. But soon after, I found it is useless since evaluating accuracy is poor. For the samples are too few, I can’t use my evaluating dataset (10% cut from original data) to correctly evaluate the score on competition leaderboard. Without choice, I have to use only SGD to train my model in last stage.

3. Use more visual monitor tools. At first my model have high training accuracy and low evaluating accuracy, but after I added too many regularisation methods (such as dropout, weight decay) both training accuracy and evaluating accuracy thrinked to a too low value. The key for regularistaion of DNN is “Keep the fitting capability of training, and then try to rise evaluating accuracy”. So if I could monitor both training and evaluating accuracy at realtime, I would not trapped in dilemma.

4. Thinking more, experimenting less. Spend more time to understand and check the source code, the mechanism of model, instead of only adjusting hyper-parameters in vain.

There are still some questions I can’t answer at present:

1. Why even Resnet-50 can’t raise my training mAP up to 0.5 ?

2. Why perfect mAP value in my evaluating dataset can’t gain good score in this competition’s leaderboard ?

Will go on to discover them.

原文链接: http://donghao.org/2018/11/02/some-lessons-from-kaggles-competition/

0     0