Implementing a CycleGAN for Image Translation

1. Import Libraries

import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_addons as tfa  # provides the InstanceNormalization layer
from tensorflow.keras import layers, models
import numpy as np
import matplotlib.pyplot as plt

We start by importing TensorFlow and Keras to build and train the models, TensorFlow Datasets (tfds) to load the data, TensorFlow Addons (tfa) for its InstanceNormalization layer, NumPy for array handling, and Matplotlib to visualize the generated images.

2. Load and Preprocess the Data

dataset, metadata = tfds.load('cycle_gan/horse2zebra', with_info=True, as_supervised=True)
train_horses, train_zebras = dataset['trainA'], dataset['trainB']

def preprocess_image(image, label):
    # Resize to 286x286, then randomly crop back to 256x256 (random jitter).
    image = tf.image.resize(image, [286, 286])
    image = tf.image.random_crop(image, size=[256, 256, 3])
    image = tf.image.random_flip_left_right(image)
    # Rescale pixel values from [0, 255] to [-1, 1] to match the tanh output range.
    image = (image / 127.5) - 1
    return image

train_horses = train_horses.map(preprocess_image).shuffle(1000).batch(1)
train_zebras = train_zebras.map(preprocess_image).shuffle(1000).batch(1)

We load the horse2zebra dataset from TensorFlow Datasets. Each image is resized to 286x286, randomly cropped back to 256x256 (random jitter), randomly flipped horizontally for augmentation, and normalized to the [-1, 1] range, and the datasets are batched with a batch size of 1.
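As a quick sanity check (a minimal sketch, not part of the original tutorial), you can pull one preprocessed batch and display it; the pixel values must be mapped back from [-1, 1] to [0, 1] before Matplotlib can render them:

sample_horse = next(iter(train_horses))  # shape: (1, 256, 256, 3)
plt.imshow(sample_horse[0] * 0.5 + 0.5)  # rescale [-1, 1] -> [0, 1]
plt.title('Preprocessed horse sample')
plt.axis('off')
plt.show()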

3. Define the Generators

def build_generator():
    inputs = layers.Input(shape=[256, 256, 3])

    # Downsampling: 256x256 -> 128x128 -> 64x64
    down1 = layers.Conv2D(64, 4, strides=2, padding='same')(inputs)
    down1 = layers.LeakyReLU()(down1)

    down2 = layers.Conv2D(128, 4, strides=2, padding='same')(down1)
    down2 = tfa.layers.InstanceNormalization()(down2)
    down2 = layers.LeakyReLU()(down2)

    # Upsampling: 64x64 -> 128x128 -> 256x256
    up1 = layers.Conv2DTranspose(128, 4, strides=2, padding='same')(down2)
    up1 = tfa.layers.InstanceNormalization()(up1)
    up1 = layers.ReLU()(up1)

    up2 = layers.Conv2DTranspose(64, 4, strides=2, padding='same')(up1)
    up2 = tfa.layers.InstanceNormalization()(up2)
    up2 = layers.ReLU()(up2)

    # tanh constrains outputs to [-1, 1], matching the preprocessing range.
    outputs = layers.Conv2D(3, 7, padding='same', activation='tanh')(up2)
    return models.Model(inputs, outputs)

The generator is a simplified encoder-decoder: strided convolutions downsample the input, and transposed convolutions upsample it back to the original resolution. Instance normalization and ReLU activations help keep training stable, and a final tanh activation constrains outputs to the [-1, 1] range, matching the preprocessing. (The generator in the original CycleGAN paper additionally stacks residual blocks between the downsampling and upsampling stages.)
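To confirm the architecture behaves as described, a quick shape check (a sketch added here, not from the original text) passes a dummy batch through a fresh generator; the output should match the input resolution:

test_gen = build_generator()
dummy = tf.zeros([1, 256, 256, 3])
print(test_gen(dummy).shape)  # expected: (1, 256, 256, 3)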

4. Define the Discriminators

def build_discriminator():
    inputs = layers.Input(shape=[256, 256, 3])
    down1 = layers.Conv2D(64, 4, strides=2, padding='same')(inputs)
    down1 = layers.LeakyReLU()(down1)

    down2 = layers.Conv2D(128, 4, strides=2, padding='same')(down1)
    down2 = tfa.layers.InstanceNormalization()(down2)
    down2 = layers.LeakyReLU()(down2)

    # Output a grid of per-patch scores rather than a single scalar.
    outputs = layers.Conv2D(1, 4, padding='same')(down2)
    return models.Model(inputs, outputs)

The discriminator is a convolutional network that distinguishes real images from generated ones. It uses LeakyReLU activations and instance normalization to help stabilize training.
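Because nothing pools or flattens the final convolution's output, the discriminator returns a grid of per-patch scores in the style of a PatchGAN rather than one scalar. A quick check (again a sketch, not from the original text) shows the 64x64 score map produced by the two stride-2 downsampling stages:

test_disc = build_discriminator()
dummy = tf.zeros([1, 256, 256, 3])
print(test_disc(dummy).shape)  # expected: (1, 64, 64, 1)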

5. Compile the Models

generator_g = build_generator()
generator_f = build_generator()
discriminator_x = build_discriminator()
discriminator_y = build_discriminator()

generator_g.compile(optimizer=tf.keras.optimizers.Adam(2e-4, beta_1=0.5), loss='mse')
generator_f.compile(optimizer=tf.keras.optimizers.Adam(2e-4, beta_1=0.5), loss='mse')
discriminator_x.compile(optimizer=tf.keras.optimizers.Adam(2e-4, beta_1=0.5), loss='mse')
discriminator_y.compile(optimizer=tf.keras.optimizers.Adam(2e-4, beta_1=0.5), loss='mse')

We instantiate the two generators (G: X to Y and F: Y to X) and the two discriminators, then compile each with an Adam optimizer (learning rate 2e-4, beta_1 = 0.5, as in the CycleGAN paper). Compiling here only serves to attach the optimizers; the 'mse' loss argument is never used, because the custom training loop below computes the least-squares GAN losses by hand.

6. Training the CycleGAN

LAMBDA = 10  # weight on the cycle-consistency loss, as in the CycleGAN paper

def train_cyclegan(epochs):
    for epoch in range(epochs):
        for image_x, image_y in tf.data.Dataset.zip((train_horses, train_zebras)):
            # A persistent tape lets us compute several gradients from one forward pass.
            with tf.GradientTape(persistent=True) as tape:
                # Translations: X -> Y and Y -> X.
                fake_y = generator_g(image_x, training=True)
                fake_x = generator_f(image_y, training=True)

                # Cycle reconstructions: X -> Y -> X and Y -> X -> Y.
                cycle_x = generator_f(fake_y, training=True)
                cycle_y = generator_g(fake_x, training=True)

                disc_real_x = discriminator_x(image_x, training=True)
                disc_real_y = discriminator_y(image_y, training=True)
                disc_fake_x = discriminator_x(fake_x, training=True)
                disc_fake_y = discriminator_y(fake_y, training=True)

                # Least-squares adversarial losses for the generators.
                gen_g_loss = tf.reduce_mean(tf.square(disc_fake_y - 1.0))
                gen_f_loss = tf.reduce_mean(tf.square(disc_fake_x - 1.0))

                # L1 cycle-consistency loss, shared by both generators.
                cycle_loss = LAMBDA * (tf.reduce_mean(tf.abs(image_x - cycle_x)) + tf.reduce_mean(tf.abs(image_y - cycle_y)))
                total_gen_g_loss = gen_g_loss + cycle_loss
                total_gen_f_loss = gen_f_loss + cycle_loss

                # Least-squares losses for the discriminators.
                disc_x_loss = (tf.reduce_mean(tf.square(disc_real_x - 1.0)) + tf.reduce_mean(tf.square(disc_fake_x))) / 2.0
                disc_y_loss = (tf.reduce_mean(tf.square(disc_real_y - 1.0)) + tf.reduce_mean(tf.square(disc_fake_y))) / 2.0

            gradients_g = tape.gradient(total_gen_g_loss, generator_g.trainable_variables)
            gradients_f = tape.gradient(total_gen_f_loss, generator_f.trainable_variables)
            gradients_disc_x = tape.gradient(disc_x_loss, discriminator_x.trainable_variables)
            gradients_disc_y = tape.gradient(disc_y_loss, discriminator_y.trainable_variables)

            generator_g.optimizer.apply_gradients(zip(gradients_g, generator_g.trainable_variables))
            generator_f.optimizer.apply_gradients(zip(gradients_f, generator_f.trainable_variables))
            discriminator_x.optimizer.apply_gradients(zip(gradients_disc_x, discriminator_x.trainable_variables))
            discriminator_y.optimizer.apply_gradients(zip(gradients_disc_y, discriminator_y.trainable_variables))

            del tape  # free the persistent tape's resources

        print(f'Epoch: {epoch}, Generator G Loss: {total_gen_g_loss.numpy()}, Generator F Loss: {total_gen_f_loss.numpy()}')

train_cyclegan(epochs=100)

The train_cyclegan function drives the training loop. For each pair of batches it records the forward passes under a persistent gradient tape, then computes gradients and applies one update to each of the four networks. The cycle-consistency loss (weighted by LAMBDA, set to 10 following the paper) keeps translations faithful to the input content, while the adversarial losses push generated images toward the target domain: the generators learn to fool the discriminators, and the discriminators learn to tell real images from generated ones.
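After training, you can translate new images by calling generator_g directly. Here is a minimal inference sketch (assumed usage, not part of the original text) that translates one horse image and displays the input and output side by side:

sample = next(iter(train_horses))
translated = generator_g(sample, training=False)

plt.figure(figsize=(8, 4))
plt.subplot(1, 2, 1)
plt.title('Input (horse)')
plt.imshow(sample[0] * 0.5 + 0.5)
plt.axis('off')
plt.subplot(1, 2, 2)
plt.title('Translated (zebra)')
plt.imshow(translated[0] * 0.5 + 0.5)
plt.axis('off')
plt.show()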

By following these steps you can set up a CycleGAN for unpaired image-to-image translation. Because the method requires no paired training examples, it enables flexible image manipulation across many computer vision and image processing tasks.
