kwan's note

Kaggle - Petals to the Metal (95% accuracy with EfficientNet)



kwan's note 2021. 3. 11. 19:21

Last time I classified the Petals to the Metal dataset with Inception v3, the third version of GoogLeNet.

This time, to improve the results, I added data augmentation and replaced Inception v3 with EfficientNet, one of the strongest recent models.


Petals to the Metal - Flower Classification on TPU

Getting Started with TPUs on Kaggle!


Previous post: Kaggle - Petals to the Metal (pre-trained model), a flower classification competition with 104 classes whose name is a play on "pedal to the metal".


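Inception v3 is an efficient model too, but it is built on the basic ideas of GoogLeNet and, unlike the Inception-ResNet variants introduced alongside Inception v4, it does not borrow ideas from ResNet.

With ResNet's residual connections, activations and gradients can bypass parts of the network that get in the way of training (unnecessary nodes, layers that encourage overfitting, and so on), which makes training considerably more efficient.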
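The skip behaviour described above can be illustrated with a tiny, framework-free sketch. The names `residual_block` and `useless_layer` are made up for this illustration, and plain Python lists stand in for tensors; a real ResNet block uses convolutions and batch normalisation instead.

```python
def residual_block(x, transform):
    """Return x + F(x) element-wise, where F is `transform`."""
    fx = transform(x)
    return [xi + fi for xi, fi in zip(x, fx)]


def useless_layer(x):
    """A hypothetical layer that has learned to output zeros."""
    return [0.0 for _ in x]


features = [1.0, -2.0, 3.5]
# The zero branch contributes nothing, so the input passes through unchanged.
print(residual_block(features, useless_layer))
```

Because the identity path is always present, a branch that is not useful can shrink toward zero without blocking the signal that flows through the block.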


Here, however, I decided to use EfficientNet.

EfficientNet looks for the best combination of three kinds of scaling: width scaling (more filters), depth scaling (more layers), and resolution scaling (a higher input image resolution).

With this model I placed in the top 14%.
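The compound scaling idea described above can be sketched in a few lines. The base values 1.2, 1.1 and 1.15 are the ones reported in the EfficientNet paper; the function names are hypothetical, and as far as I know the released B1 to B7 models use rounded coefficients, so this illustrates the rule rather than reproducing B7 exactly.

```python
# Scaling bases from the EfficientNet paper: depth, width, resolution.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi):
    """Multipliers for depth, width and resolution at compound coefficient phi."""
    return {
        "depth": ALPHA ** phi,
        "width": BETA ** phi,
        "resolution": GAMMA ** phi,
    }


def flops_multiplier(phi):
    """Approximate FLOPs growth: depth scales linearly, width and resolution quadratically."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi


for phi in (0, 1, 2, 3):
    print(phi, compound_scale(phi), round(flops_multiplier(phi), 2))
```

The bases are chosen so that ALPHA * BETA**2 * GAMMA**2 is close to 2, which means each step of phi roughly doubles the compute.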



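import tensorflow as tf

# IMAGE_SIZE, HEIGHT, WIDTH, CHANNELS, BATCH_SIZE, AUTO and strategy are assumed to be
# defined earlier in the notebook; their definitions are not part of this post.
def decode_image(image_data):
    image = tf.image.decode_jpeg(image_data, channels=3)
    image = tf.cast(image, tf.float32) / 255.0  # convert image to floats in [0, 1] range
    image = tf.reshape(image, [*IMAGE_SIZE, 3]) # explicit size needed for TPU
    return image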

def read_labeled_tfrecord(example):
    LABELED_TFREC_FORMAT = {
        "image": tf.io.FixedLenFeature([], tf.string), # tf.string means bytestring
        "class": tf.io.FixedLenFeature([], tf.int64),  # shape [] means single element
    }
    example = tf.io.parse_single_example(example, LABELED_TFREC_FORMAT)
    image = decode_image(example['image'])
    label = tf.cast(example['class'], tf.int32)

    return image, label # one (image, label) pair per record

def read_unlabeled_tfrecord(example):
    UNLABELED_TFREC_FORMAT = {
        "image": tf.io.FixedLenFeature([], tf.string), # tf.string means bytestring
        "id": tf.io.FixedLenFeature([], tf.string),  # shape [] means single element
        # class is missing, this competition's challenge is to predict flower classes for the test dataset
    }
    example = tf.io.parse_single_example(example, UNLABELED_TFREC_FORMAT)
    image = decode_image(example['image'])
    idnum = example['id']
    return image, idnum # one (image, id) pair per record

def load_dataset(filenames, labeled=True, ordered=False):
    # Read from TFRecords. For optimal performance, reading from multiple files at once and
    # disregarding data order. Order does not matter since we will be shuffling the data anyway.

    ignore_order = tf.data.Options()
    if not ordered:
        ignore_order.experimental_deterministic = False # disable order, increase speed

    dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=AUTO) # automatically interleaves reads from multiple files
    dataset = dataset.with_options(ignore_order) # uses data as soon as it streams in, rather than in its original order
    dataset = dataset.map(read_labeled_tfrecord if labeled else read_unlabeled_tfrecord)
    # returns a dataset of (image, label) pairs if labeled=True or (image, id) pairs if labeled=False
    return dataset

# GCS_DS_PATH is an assumed name for the dataset's GCS path; the original variable
# name was lost in extraction.
def get_validation_dataset():
    dataset = load_dataset(tf.io.gfile.glob(GCS_DS_PATH + '/tfrecords-jpeg-512x512/val/*.tfrec'), labeled=True, ordered=False)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.cache()
    return dataset

def get_test_dataset(ordered=False):
    dataset = load_dataset(tf.io.gfile.glob(GCS_DS_PATH + '/tfrecords-jpeg-512x512/test/*.tfrec'), labeled=False, ordered=ordered)
    dataset = dataset.batch(BATCH_SIZE)
    return dataset

def data_augment(image, label):
    crop_size = tf.random.uniform([], int(HEIGHT*.8), HEIGHT, dtype=tf.int32)
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)
    image = tf.image.random_saturation(image, lower=0.7, upper=1.5)
    image = tf.image.random_contrast(image, lower=0.9, upper=1.5)
    image = tf.image.random_brightness(image, max_delta=.2)
#    image = tf.image.adjust_gamma(image, gamma=.6)

    image = tf.image.random_crop(image, size=[crop_size, crop_size, CHANNELS])
    image = tf.image.resize(image, size=[HEIGHT, WIDTH])

    return image, label

def get_training_dataset():
    # GCS_DS_PATH is an assumed name for the dataset's GCS path (lost in extraction).
    dataset = load_dataset(tf.io.gfile.glob(GCS_DS_PATH + '/tfrecords-jpeg-512x512/train/*.tfrec'), labeled=True)
    dataset = dataset.map(data_augment, num_parallel_calls=AUTO)
    dataset = dataset.repeat() # the training dataset must repeat for several epochs
    dataset = dataset.shuffle(100000)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(AUTO) # prefetch next batch while training (autotune prefetch buffer size)
    return dataset

training_dataset = get_training_dataset()
validation_dataset = get_validation_dataset()

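This builds the datasets.

For augmentation I applied random flips and adjusted saturation, contrast, and brightness, followed by a random crop that is resized back to the original size.

Last time I set the shuffle buffer smaller than the dataset, so the data was not shuffled properly; this time I set it to a sufficiently large number.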
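To see why the buffer size matters, here is a pure-Python imitation of buffer-based shuffling. It is only a sketch of the general idea behind `Dataset.shuffle`, not TensorFlow's implementation, and `buffered_shuffle` is a name invented for this example.

```python
import random


def buffered_shuffle(items, buffer_size, seed=0):
    """Shuffle `items` through a fixed-size buffer, in the style of Dataset.shuffle."""
    rng = random.Random(seed)
    buffer, out = [], []
    for item in items:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            # Buffer is full: emit one randomly chosen element.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            out.append(buffer.pop())
    rng.shuffle(buffer)  # input exhausted: drain what is left in random order
    out.extend(buffer)
    return out


data = list(range(1000))
small = buffered_shuffle(data, buffer_size=10)
large = buffered_shuffle(data, buffer_size=len(data))

# With a buffer of 10, the k-th output can only come from the first k + 10 inputs,
# so late examples never reach the front of an epoch.
print("first 5 outputs, small buffer:", small[:5])
print("first 5 outputs, large buffer:", large[:5])
```

When the records are stored grouped or in a fixed order, a small buffer therefore leaves most of that order intact, which is the problem described above.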



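# `efficient` is assumed to be the efficientnet.tfkeras module and `layers` to be
# tf.keras.layers; the import lines were lost in extraction.
with strategy.scope():
    pt_model = efficient.EfficientNetB7(
        input_shape=(512, 512, 3),
        include_top=False,  # assumed; matches the (None, 16, 16, 2560) output in the summary below
    )
    # Previous attempt, kept for reference:
    # pt_model = tf.keras.applications.InceptionV3(
    #     include_top=False, weights='imagenet', input_tensor=None,
    #     input_shape=[*IMAGE_SIZE, 3])
    # The line that froze pt_model for this first stage, if any, was lost in extraction.
    model = tf.keras.Sequential([
        pt_model,
        layers.GlobalAveragePooling2D(),  # appears in the model summary below
        layers.Dense(104, activation='softmax'),
    ])
    model.compile(
        optimizer='adam',  # Adam was used; its settings were lost in extraction
        loss = 'sparse_categorical_crossentropy',
        metrics=['sparse_categorical_accuracy'],  # metric name taken from the training log
    )

# early_stopping is assumed to be a tf.keras.callbacks.EarlyStopping instance defined earlier.
historical = model.fit(training_dataset, epochs=30, steps_per_epoch=99,  # values taken from the log
          validation_data=validation_dataset, callbacks=[early_stopping])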

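I trained with the pre-trained model as shown above, but toward the end the model did not settle and the validation metrics oscillated somewhat.

The oscillation was not small, so decaying the learning rate would probably have been better. For now, though, I was using Adam as the optimizer and the goal of this stage was only to train the last layer, so I left it as it was.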

Downloading data from
258441216/258434480 [==============================] - 3s 0us/step
Model: "sequential"
Layer (type)                 Output Shape              Param #   
efficientnet-b7 (Functional) (None, 16, 16, 2560)      64097680  
global_average_pooling2d (Gl (None, 2560)              0         
dense (Dense)                (None, 104)               266344    
Total params: 64,364,024
Trainable params: 64,053,304
Non-trainable params: 310,720
Epoch 1/30
99/99 [==============================] - 758s 1s/step - loss: 2.5024 - sparse_categorical_accuracy: 0.4525 - val_loss: 1.4078 - val_sparse_categorical_accuracy: 0.7047
Epoch 2/30
99/99 [==============================] - 93s 939ms/step - loss: 0.6331 - sparse_categorical_accuracy: 0.8323 - val_loss: 0.6531 - val_sparse_categorical_accuracy: 0.8370
Epoch 3/30
99/99 [==============================] - 93s 938ms/step - loss: 0.4097 - sparse_categorical_accuracy: 0.8907 - val_loss: 0.5397 - val_sparse_categorical_accuracy: 0.8739
Epoch 4/30
99/99 [==============================] - 92s 926ms/step - loss: 0.3325 - sparse_categorical_accuracy: 0.9049 - val_loss: 0.4794 - val_sparse_categorical_accuracy: 0.8933
Epoch 5/30
99/99 [==============================] - 92s 925ms/step - loss: 0.2311 - sparse_categorical_accuracy: 0.9389 - val_loss: 0.4324 - val_sparse_categorical_accuracy: 0.9033
Epoch 6/30
99/99 [==============================] - 92s 929ms/step - loss: 0.2045 - sparse_categorical_accuracy: 0.9410 - val_loss: 0.4142 - val_sparse_categorical_accuracy: 0.9022
Epoch 7/30
99/99 [==============================] - 92s 929ms/step - loss: 0.1844 - sparse_categorical_accuracy: 0.9462 - val_loss: 0.4292 - val_sparse_categorical_accuracy: 0.9027
Epoch 8/30
99/99 [==============================] - 92s 925ms/step - loss: 0.1397 - sparse_categorical_accuracy: 0.9585 - val_loss: 0.4287 - val_sparse_categorical_accuracy: 0.9127
Epoch 9/30
99/99 [==============================] - 92s 927ms/step - loss: 0.1346 - sparse_categorical_accuracy: 0.9624 - val_loss: 0.4026 - val_sparse_categorical_accuracy: 0.9141
Epoch 10/30
99/99 [==============================] - 92s 929ms/step - loss: 0.1052 - sparse_categorical_accuracy: 0.9682 - val_loss: 0.4498 - val_sparse_categorical_accuracy: 0.9076
Epoch 11/30
99/99 [==============================] - 92s 927ms/step - loss: 0.1031 - sparse_categorical_accuracy: 0.9708 - val_loss: 0.3650 - val_sparse_categorical_accuracy: 0.9251
Epoch 12/30
99/99 [==============================] - 92s 929ms/step - loss: 0.0902 - sparse_categorical_accuracy: 0.9737 - val_loss: 0.4129 - val_sparse_categorical_accuracy: 0.9200
Epoch 13/30
99/99 [==============================] - 91s 924ms/step - loss: 0.0956 - sparse_categorical_accuracy: 0.9705 - val_loss: 0.4477 - val_sparse_categorical_accuracy: 0.9197
Epoch 14/30
99/99 [==============================] - 92s 927ms/step - loss: 0.0849 - sparse_categorical_accuracy: 0.9755 - val_loss: 0.6063 - val_sparse_categorical_accuracy: 0.8947
Epoch 15/30
99/99 [==============================] - 92s 930ms/step - loss: 0.0891 - sparse_categorical_accuracy: 0.9715 - val_loss: 0.4078 - val_sparse_categorical_accuracy: 0.9235
Epoch 16/30
99/99 [==============================] - 92s 928ms/step - loss: 0.0758 - sparse_categorical_accuracy: 0.9761 - val_loss: 0.5323 - val_sparse_categorical_accuracy: 0.9138
Epoch 00016: early stopping
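As a rough sketch of the learning rate decay mentioned before this log, exponential decay multiplies the rate by a fixed factor every `decay_steps` optimizer steps. The starting rate 1e-3 and the factor 0.8 are illustrative values, not settings from this notebook; `tf.keras.optimizers.schedules.ExponentialDecay` applies the same formula when `staircase` is off.

```python
def exponential_decay(initial_lr, decay_rate, decay_steps, step):
    """Learning rate after `step` optimizer steps, decayed smoothly."""
    return initial_lr * decay_rate ** (step / decay_steps)


STEPS_PER_EPOCH = 99  # taken from the "99/99" progress bars in the log
for epoch in (0, 5, 10, 15):
    lr = exponential_decay(1e-3, 0.8, STEPS_PER_EPOCH, epoch * STEPS_PER_EPOCH)
    print(f"epoch {epoch:2d}: lr = {lr:.6f}")
```

A schedule like this keeps the early epochs fast while shrinking the update size in the later epochs, where the validation loss was jumping around.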

Next, I unfroze the model to make it trainable and continued training. Because of the oscillation above, I set the learning rate somewhat lower so that training would not diverge.

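with strategy.scope():
    pt_model.trainable = True  # unfreeze the pre-trained base (assumed; the original line was lost)
    model.compile(
        # A reduced learning rate was used here; the optimizer line and its value were lost
        # in extraction. FINE_TUNE_LR is a placeholder name, and Adam is assumed.
        optimizer=tf.keras.optimizers.Adam(learning_rate=FINE_TUNE_LR),
        loss = 'sparse_categorical_crossentropy',
        metrics=['sparse_categorical_accuracy'],  # metric name taken from the training log
    )

historical = model.fit(training_dataset, epochs=30, steps_per_epoch=99,  # values taken from the log
          validation_data=validation_dataset, callbacks=[early_stopping])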
Epoch 1/30
99/99 [==============================] - 686s 1s/step - loss: 0.0600 - sparse_categorical_accuracy: 0.9845 - val_loss: 0.3759 - val_sparse_categorical_accuracy: 0.9340
Epoch 2/30
99/99 [==============================] - 91s 922ms/step - loss: 0.0397 - sparse_categorical_accuracy: 0.9871 - val_loss: 0.3406 - val_sparse_categorical_accuracy: 0.9383
Epoch 3/30
99/99 [==============================] - 91s 922ms/step - loss: 0.0365 - sparse_categorical_accuracy: 0.9892 - val_loss: 0.3212 - val_sparse_categorical_accuracy: 0.9407
Epoch 4/30
99/99 [==============================] - 90s 913ms/step - loss: 0.0272 - sparse_categorical_accuracy: 0.9927 - val_loss: 0.3171 - val_sparse_categorical_accuracy: 0.9418
Epoch 5/30
99/99 [==============================] - 90s 913ms/step - loss: 0.0301 - sparse_categorical_accuracy: 0.9910 - val_loss: 0.3100 - val_sparse_categorical_accuracy: 0.9426
Epoch 6/30
99/99 [==============================] - 90s 911ms/step - loss: 0.0266 - sparse_categorical_accuracy: 0.9907 - val_loss: 0.3052 - val_sparse_categorical_accuracy: 0.9432
Epoch 7/30
99/99 [==============================] - 90s 914ms/step - loss: 0.0240 - sparse_categorical_accuracy: 0.9942 - val_loss: 0.3037 - val_sparse_categorical_accuracy: 0.9432
Epoch 8/30
99/99 [==============================] - 90s 907ms/step - loss: 0.0180 - sparse_categorical_accuracy: 0.9953 - val_loss: 0.3002 - val_sparse_categorical_accuracy: 0.9453
Epoch 9/30
99/99 [==============================] - 90s 909ms/step - loss: 0.0201 - sparse_categorical_accuracy: 0.9935 - val_loss: 0.2993 - val_sparse_categorical_accuracy: 0.9459
Epoch 10/30
99/99 [==============================] - 90s 907ms/step - loss: 0.0185 - sparse_categorical_accuracy: 0.9953 - val_loss: 0.2989 - val_sparse_categorical_accuracy: 0.9445
Epoch 11/30
99/99 [==============================] - 90s 911ms/step - loss: 0.0183 - sparse_categorical_accuracy: 0.9953 - val_loss: 0.2965 - val_sparse_categorical_accuracy: 0.9453
Epoch 12/30
99/99 [==============================] - 90s 908ms/step - loss: 0.0156 - sparse_categorical_accuracy: 0.9961 - val_loss: 0.2945 - val_sparse_categorical_accuracy: 0.9459
Epoch 13/30
99/99 [==============================] - 90s 913ms/step - loss: 0.0204 - sparse_categorical_accuracy: 0.9943 - val_loss: 0.2928 - val_sparse_categorical_accuracy: 0.9453
Epoch 14/30
99/99 [==============================] - 90s 909ms/step - loss: 0.0178 - sparse_categorical_accuracy: 0.9955 - val_loss: 0.2925 - val_sparse_categorical_accuracy: 0.9467
Epoch 15/30
99/99 [==============================] - 90s 910ms/step - loss: 0.0170 - sparse_categorical_accuracy: 0.9951 - val_loss: 0.2923 - val_sparse_categorical_accuracy: 0.9467
Epoch 16/30
99/99 [==============================] - 90s 910ms/step - loss: 0.0164 - sparse_categorical_accuracy: 0.9964 - val_loss: 0.2903 - val_sparse_categorical_accuracy: 0.9461
Epoch 17/30
99/99 [==============================] - 90s 907ms/step - loss: 0.0146 - sparse_categorical_accuracy: 0.9963 - val_loss: 0.2904 - val_sparse_categorical_accuracy: 0.9477
Epoch 18/30
99/99 [==============================] - 90s 907ms/step - loss: 0.0128 - sparse_categorical_accuracy: 0.9965 - val_loss: 0.2915 - val_sparse_categorical_accuracy: 0.9459
Epoch 00018: early stopping

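This time I wondered whether training had instead stopped because the learning rate was too low.

However, the early stopping min_delta of 0.003 was not especially small, and sparse_categorical_accuracy had already reached 0.9965, so I judged the result acceptable and did not train any further.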
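The stopping points in both logs can be replayed with a small sketch of the early stopping rule. Only min_delta = 0.003 comes from the text; monitoring val_loss (the Keras default) and a patience of 5 are assumptions inferred from the logs, and `early_stopping_epoch` is a name invented for this example.

```python
def early_stopping_epoch(val_losses, min_delta, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss + min_delta < best:
            # Improvement larger than min_delta: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# val_loss values copied from the two training logs in this post.
stage1 = [1.4078, 0.6531, 0.5397, 0.4794, 0.4324, 0.4142, 0.4292, 0.4287,
          0.4026, 0.4498, 0.3650, 0.4129, 0.4477, 0.6063, 0.4078, 0.5323]
stage2 = [0.3759, 0.3406, 0.3212, 0.3171, 0.3100, 0.3052, 0.3037, 0.3002, 0.2993,
          0.2989, 0.2965, 0.2945, 0.2928, 0.2925, 0.2923, 0.2903, 0.2904, 0.2915]

print(early_stopping_epoch(stage1, min_delta=0.003, patience=5))  # 16
print(early_stopping_epoch(stage2, min_delta=0.003, patience=5))  # 18
```

Under these assumptions the second stage stops while val_loss is still creeping down, only by less than min_delta per epoch, which fits the suspicion that the learning rate was on the low side.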


I submitted this model and scored an accuracy of 0.94358, placing in the top 19%.


Kaggle notebook: efficient pre trained (using data from Petals to the Metal - Flower Classification on TPU)

In this notebook I applied the parameter tuning I had wanted to try above (a lower learning rate for the first training stage and a higher one for the second) and reached an accuracy of 0.95165, placing in the top 14%.
