Day-Night Image Translations using Cycle-GAN.

Viraj Kadam
4 min read · Aug 13, 2022


The task is to use CycleGAN to convert high-resolution daytime cityscape images to night images and vice versa. The dataset consists of high-resolution day and night images that are unpaired. CycleGAN lets us train on such a dataset without any explicit input-target image pairs by using a cycle consistency loss, which ensures that translated images retain their semantic and structural content. CycleGAN thus builds on top of the Pix2Pix GAN, with an additional cycle consistency loss in place of paired supervision.

About Cycle GAN

Paper link: https://www.cs.cmu.edu/~junyanz/projects/CycleGAN/CycleGAN.pdf

The problem that CycleGAN addresses can be classified as image-to-image translation. While other methods tackle this problem with image-target pairs, obtaining such pairs is difficult and expensive. CycleGAN is applied to tasks such as style transfer, object transfiguration, and attribute transfer, and claims to outperform baseline approaches.

CycleGAN translates an image from a source domain X to a target domain Y in the absence of paired examples. The goal is to learn two mapping functions G and F over the two domains, such that G: X -> Y and F: Y -> X (the inverse mapping of G), together with a cycle consistency loss that enforces F(G(x)) ≈ x and G(F(y)) ≈ y. For the task of converting pictures to paintings, we can think of X as the set of pictures and Y as the set of paintings. The functions G and F are the generators.

We also have two adversarial discriminators, Dx and Dy. Dx discriminates between real images from set X (pictures) and images generated by the inverse mapping F(y) (denoted x_hat). Similarly, Dy discriminates between real images from set Y (paintings) and images generated by G(x) (denoted y_hat).

We guide the training of the GAN using an objective with two components:
1) Adversarial loss: for matching the distribution of generated images to the target domain, in both directions.
2) Cycle consistency loss: for enforcing cycle consistency, so that F(G(x)) ≈ x and G(F(y)) ≈ y.

Losses

1) Adversarial loss : 

Loss for generator G: L_GAN(G, D_Y, X, Y) = E_{y~p_data(y)}[log D_Y(y)] + E_{x~p_data(x)}[log(1 − D_Y(G(x)))]

Loss for generator F: L_GAN(F, D_X, Y, X) = E_{x~p_data(x)}[log D_X(x)] + E_{y~p_data(y)}[log(1 − D_X(F(y)))]

Here G tries to generate images that resemble domain Y, while D_Y tries to discriminate between generated samples G(x) and real images y.
Similarly, F tries to generate images that resemble domain X, while D_X tries to discriminate between generated samples F(y) and real images x.
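As a toy numeric sketch of this objective (plain Python with hypothetical discriminator probability scores, not a real model), a confident discriminator achieves a higher objective value than one that cannot tell real from fake:

```python
import math

def adversarial_loss(d_real, d_fake):
    """L_GAN = E[log D(real)] + E[log(1 - D(fake))]; D maximizes this, G minimizes it."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical D_Y scores on real night images y and on generated images G(x):
confident = adversarial_loss(d_real=[0.9, 0.8], d_fake=[0.2, 0.1])
confused = adversarial_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

A discriminator stuck at 0.5 everywhere yields 2·log(0.5) ≈ −1.386, the equilibrium value of this minimax game.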

2) Cycle consistency loss: With only an adversarial loss, the network could learn to map the input images to any random permutation of images in the output domain, since any such mapping still matches the target distribution. The adversarial loss alone therefore does not guarantee a consistent, meaningful mapping, so we add a cycle consistency loss to enforce one.

Forward Cycle Consistency : x → G(x) → F(G(x)) ≈ x ; where x is the input image domain.

Backward Cycle Consistency : y → F(y) → G(F(y)) ≈ y ; where y is the target image domain.

Cycle loss: L_cyc(G, F) = E_{x~p_data(x)}[‖F(G(x)) − x‖_1] + E_{y~p_data(y)}[‖G(F(y)) − y‖_1]
Hence the full objective is: L(G, F, D_X, D_Y) = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + λ · L_cyc(G, F)
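To make the composition concrete, here is a minimal plain-Python sketch of the cycle loss and the full objective (the short vectors stand in for images; all values are illustrative):

```python
def l1_distance(a, b):
    """Mean absolute difference, standing in for the per-pixel L1 norm."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

def cycle_loss(x, x_rec, y, y_rec):
    """L_cyc(G, F) = E[||F(G(x)) - x||_1] + E[||G(F(y)) - y||_1]."""
    return l1_distance(x_rec, x) + l1_distance(y_rec, y)

def full_objective(l_gan_g, l_gan_f, l_cyc, lam=10.0):
    """L = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + lambda * L_cyc(G, F)."""
    return l_gan_g + l_gan_f + lam * l_cyc

# Perfect reconstructions give zero cycle loss; imperfect ones are penalized.
zero = cycle_loss(x=[0.1, 0.2], x_rec=[0.1, 0.2], y=[0.5], y_rec=[0.5])
penalty = cycle_loss(x=[0.0, 0.0], x_rec=[1.0, 1.0], y=[0.0], y_rec=[2.0])
```

Because λ weights the cycle term, even a small reconstruction error contributes heavily to the total loss, which is what pushes G and F to stay mutually consistent.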

Implementation

The implementation follows the CycleGAN paper.
1) Architecture: the generator architecture from Johnson et al., with instance normalization. The discriminator is a PatchGAN, which classifies whether each 70×70 patch in the image is real or fake.
2) Training details:
a) Adversarial loss: the log-likelihood adversarial loss is replaced with a least-squares loss, which is more stable during training.
b) Total loss: cycle consistency loss weighting parameter λ = 10.
c) Optimization: Adam optimizer with a learning rate of 0.0002 for the first 100 epochs, linearly decayed to 0 over the next 100 epochs (200 epochs in total).
d) Batch size: 1.
e) Training updates: D_X and D_Y are updated using a history of generated images, stored in a buffer of 50 previously generated images. This reduces model oscillation (from Shrivastava et al.).
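The 70×70 figure for the PatchGAN is the receptive field of its output units, which can be checked with a short calculation. The kernel/stride values below follow the standard C64-C128-C256-C512 PatchGAN configuration; this is a sketch of the arithmetic, not the training code:

```python
def receptive_field(layers):
    """Receptive field of stacked convolutions, given (kernel, stride) pairs."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (k-1) input strides
        jump *= stride              # effective stride of this layer's output grid
    return rf

# 4x4 convs: three with stride 2, then two with stride 1 (incl. the 1-channel output conv)
patchgan_layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(patchgan_layers))  # -> 70
```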
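The learning-rate schedule from (c) can be sketched as a small helper (a plain-Python illustration of the schedule, not the paper's code):

```python
def lr_at_epoch(epoch, base_lr=2e-4, constant_epochs=100, decay_epochs=100):
    """Constant LR for the first 100 epochs, then linear decay to 0 over the next 100."""
    if epoch < constant_epochs:
        return base_lr
    remaining = 1.0 - (epoch - constant_epochs) / decay_epochs
    return base_lr * max(0.0, remaining)
```

For example, the rate is still 0.0002 at epoch 99, halves to 0.0001 by epoch 150, and reaches 0 at epoch 200.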
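The 50-image history buffer from (e) can be sketched as follows. The 50/50 replacement rule here follows the common public CycleGAN implementation; treat it as an assumption, not the exact code used for this project:

```python
import random

class ImagePool:
    """Buffer of previously generated images (Shrivastava et al.); size 50 as in the paper."""
    def __init__(self, pool_size=50):
        self.pool_size = pool_size
        self.images = []

    def query(self, image):
        # Until the pool is full, store and return the new image as-is.
        if len(self.images) < self.pool_size:
            self.images.append(image)
            return image
        # Afterwards, with probability 0.5 return a stored image and replace it
        # with the new one; otherwise return the new image unchanged.
        if random.random() < 0.5:
            idx = random.randrange(self.pool_size)
            old, self.images[idx] = self.images[idx], image
            return old
        return image
```

The discriminators are then trained on `pool.query(fake)` instead of the freshly generated image, so updates see a mix of current and older generator outputs.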

Training Progress

Day -> Night Generator Training Progress at every epoch.
Night -> Day Generator Training Progress at every epoch.

Results

Some results from the Day image -> Night image Generator

Some results from Night images -> Day images Generator
