Segmenting CityScapes using Pix2Pix GAN.

Viraj Kadam
3 min read · Aug 8, 2022


The objective is to semantically segment the image on the left into the mask on the right.

The task is to segment images from the Cityscapes dataset into their corresponding masks using a Pix2Pix GAN. Traditionally, we could use other segmentation approaches such as U-Net or similar segmentation models, but those require selecting an appropriate loss function and a large training set to achieve good results.

About Pix2Pix GAN

Pix2Pix uses a conditional GAN to learn a mapping from an input image to a target domain, trained on paired image-target examples. This makes it useful for a wide variety of image-to-image translation tasks. Some details from the paper:

A) Architecture: the generator uses a U-Net-style architecture to generate images from input images, while the discriminator is a PatchGAN that classifies patches of the generated image as real or fake.

B) Loss function: Generator loss = binary cross-entropy + (lambda × mean absolute error), where the mean absolute error is the pixel-wise L1 distance between the generated image and the target image. The paper uses a lambda value of 100. Discriminator loss: binary cross-entropy, classifying patches of images as real or fake via the PatchGAN.

C) Optimization: a batch size of 1 during training, with the Adam optimizer (learning rate = 0.0002, beta_1 = 0.5, beta_2 = 0.999). One update step on the discriminator is followed by one update step on the generator.

D) Inference: batch normalization is applied using the statistics of the test batch, which at batch size 1 is equivalent to instance normalization.
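The two losses in B) can be sketched in TensorFlow roughly as follows (a minimal sketch in the style of the official TF Pix2Pix tutorial; the function and argument names here are illustrative, not taken from the notebook):

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
LAMBDA = 100  # weight on the L1 term, as in the paper


def generator_loss(disc_generated_output, gen_output, target):
    # Adversarial term: the generator wants the discriminator's patch
    # predictions on its output to be classified as real (all ones).
    gan_loss = bce(tf.ones_like(disc_generated_output), disc_generated_output)
    # L1 term: pixel-wise mean absolute error against the ground-truth target.
    l1_loss = tf.reduce_mean(tf.abs(target - gen_output))
    return gan_loss + LAMBDA * l1_loss


def discriminator_loss(disc_real_output, disc_generated_output):
    # Real image-target pairs should be classified as real (ones),
    # generated pairs as fake (zeros), patch by patch.
    real_loss = bce(tf.ones_like(disc_real_output), disc_real_output)
    fake_loss = bce(tf.zeros_like(disc_generated_output), disc_generated_output)
    return real_loss + fake_loss
```

Note that `disc_*_output` is the PatchGAN's grid of per-patch logits, so the cross-entropy is averaged over patches rather than computed on a single real/fake score.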

Some Applications of Pix2Pix GAN (from the paper)

• Semantic labels↔photo, trained on the Cityscapes dataset.
• Architectural labels to photo, trained on CMP Facades
• Map to aerial photo, trained on data scraped from Google Maps.
• BW to color photos.
• Edges to photo.
• Sketch to photo: tests edges→photo models on human-drawn sketches.
• Day to night images.
• Thermal to color photos.
• Photo with missing pixels to inpainted photo, trained on Paris StreetView.

Training progress

We trained the model for 50,000 steps, where each step is a training update on a single image-mask pair, using the Adam optimizer with a learning rate of 0.0002, decayed by a factor of 0.75 after every 10,000 steps.
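The learning-rate schedule described above can be expressed with a built-in Keras schedule (a sketch, assuming the step-wise decay is implemented via `ExponentialDecay` with `staircase=True`; the actual notebook may do this differently):

```python
import tensorflow as tf

# Start at 2e-4 and multiply by 0.75 after every 10,000 steps.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=2e-4,
    decay_steps=10_000,
    decay_rate=0.75,
    staircase=True,  # discrete drops rather than continuous decay
)

# Adam with the beta values from the Pix2Pix paper.
optimizer = tf.keras.optimizers.Adam(lr_schedule, beta_1=0.5, beta_2=0.999)
```

With `staircase=True` the rate stays at 2e-4 for the first 10,000 steps, then drops to 1.5e-4, then 1.125e-4, and so on.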

Segmenting a test-set image using the trained model.

Following is the training-progress GIF, where each frame shows the generated segmentation mask after a further 5,000 steps of training.

Training progress, where each frame is the output after 5,000 steps of training.

Results

Following are the results of evaluating the trained generator on the unseen test set after 50,000 training steps. A few samples from the test set are used to assess the performance of the trained model.

References and Resources:

* Training Notebook : https://www.kaggle.com/code/virajkadam/segmenting-cityscapes-with-pix2pix-gans/notebook
* Pix2Pix Paper : https://arxiv.org/abs/1611.07004
* Tensorflow Pix2Pix Tutorial : https://www.tensorflow.org/tutorials/generative/pix2pix
