How We Trained a Neural Network to Generate Shadows in a Photo: Part 2

In this article, we prepare for training and look at loss functions and metrics.

By Artyom Nazarenko · Feb. 19, 21 · Tutorial

In this series, Artem Nazarenko, Computer Vision Engineer at Everypixel, shows how you can implement the architecture of a neural network. The first part covered the working principles of GANs and methods of collecting datasets for training. This part is about preparing for GAN training.

Loss Functions and Metrics

Attention. At this point, we deviate from the reference article. Generating attention maps (masks) can be treated as a classic image segmentation problem, so we use a segmentation loss function: Dice Loss. It handles unbalanced data well.

We take IoU (Intersection over Union) as a metric.

Learn more about Dice Loss and IoU.
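
Dice Loss and IoU can be illustrated with a minimal NumPy sketch. It is not the implementation used for training (the article relies on library code for that); the function names `dice_loss` and `iou_score`, the smoothing term `eps`, and the 0.5 threshold are assumptions made for the example.

```python
import numpy as np


def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss for masks with values in [0, 1] (illustrative sketch)."""
    pred = np.asarray(pred, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=np.float64).ravel()
    intersection = np.sum(pred * target)
    dice = (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)
    return 1.0 - dice


def iou_score(pred, target, threshold=0.5, eps=1e-7):
    """Intersection over Union of two masks after binarization."""
    pred_bin = np.asarray(pred).ravel() > threshold
    target_bin = np.asarray(target).ravel() > threshold
    intersection = np.logical_and(pred_bin, target_bin).sum()
    union = np.logical_or(pred_bin, target_bin).sum()
    return (intersection + eps) / (union + eps)


# Ground truth covers 4 pixels, the prediction covers 8, and 4 of them overlap.
target = np.zeros((4, 4))
target[:2, :2] = 1.0
pred = np.zeros((4, 4))
pred[:2, :] = 1.0

print(dice_loss(pred, target))  # about 0.333 (Dice = 2 * 4 / (8 + 4))
print(iou_score(pred, target))  # about 0.5   (IoU = 4 / 8)
```

Both quantities are ratios of overlap to mask size, so the large number of background pixels does not dominate them. This is why they suit masks where the object occupies a small part of the image.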

Shadow Generation. For the generation block, we use a loss function similar to the one given in the original article. It is a weighted sum of three loss functions: L2, Lper and Ladv.

L2 estimates the distance from the ground truth image to the generated ones (before and after the refinement block, denoted as R).

Lper (perceptual loss) is a loss function that calculates the distance between feature maps of the VGG16 network when images are run through it. The distance is computed as the standard MSE between the feature maps of the ground truth image with a shadow and those of the generated images, before and after the refinement block, respectively.

Ladv is a standard adversarial loss that takes into account the competitive nature of the generator and the discriminator. D(·) is the probability of belonging to the "real image" class. During training, the generator tries to minimize Ladv, while the discriminator, on the contrary, tries to maximize it.
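
The verbal description of the three terms can be written down as the following sketch. The notation is an assumption made for illustration and may differ from the reference article: I_y is the ground truth image with a shadow, \bar{I}_y is the generated image before the refinement block, R(\bar{I}_y) is the image after refinement, V(·) denotes VGG16 feature maps, and β1, β2, β3 are the weights, whose values are not given here. Any extra inputs that the discriminator may be conditioned on are omitted.

```latex
\begin{aligned}
L_{gen} &= \beta_1 L_2 + \beta_2 L_{per} + \beta_3 L_{adv} \\
L_2     &= \lVert I_y - \bar{I}_y \rVert_2^2 + \lVert I_y - R(\bar{I}_y) \rVert_2^2 \\
L_{per} &= \mathrm{MSE}\big(V(I_y),\, V(\bar{I}_y)\big) + \mathrm{MSE}\big(V(I_y),\, V(R(\bar{I}_y))\big) \\
L_{adv} &= \log D(I_y) + \log\big(1 - D(R(\bar{I}_y))\big)
\end{aligned}
```

With this form of L_adv, the discriminator increases the value by scoring real images close to 1 and generated images close to 0, while the generator decreases it by pushing D(R(\bar{I}_y)) toward 1.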

Preparation

Installing the required modules. To implement an ARShadowGAN-like network, we will use PyTorch, a Python deep learning library.

Libraries in use. We start the work by installing the required modules:

  • segmentation-models-pytorch to import the U-Net architecture,
  • albumentations for augmentations,
  • piq to import the required loss function,
  • matplotlib for rendering images inside Jupyter notebooks,
  • numpy to work with arrays,
  • opencv-python to work with images,
  • tensorboard to visualize training graphs,
  • torch for neural networks and deep learning,
  • torchvision to import models for deep learning,
  • tqdm for progress bar visualization.
Shell

pip install segmentation-models-pytorch==0.1.0
pip install albumentations==0.5.1
pip install piq==0.5.1
pip install matplotlib==3.2.1
pip install numpy==1.18.4
# Quote ">=" specifiers so the shell does not treat ">" as output redirection
pip install "opencv-python>=3.4.5.20"
pip install tensorboard==2.2.1
pip install "torch>=1.5.0"
pip install "torchvision>=0.6.0"
pip install "tqdm>=4.41.1"
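
After installation, it can be useful to confirm which versions actually ended up in the environment. The following is a small sketch using only the standard library (Python 3.8 or newer); the helper name `installed_versions` is an assumption made for this example.

```python
from importlib import metadata

# Distribution names as they appear in the pip commands above.
REQUIRED = [
    'segmentation-models-pytorch', 'albumentations', 'piq', 'matplotlib',
    'numpy', 'opencv-python', 'tensorboard', 'torch', 'torchvision', 'tqdm',
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f'{name}: {version or "not installed"}')
```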



Dataset 

Dataset: structure, download, unpacking. For training and testing purposes, we will use a ready-made dataset. The data is already split into train and test samples. We download and unpack it.

Shell

unzip shadow_ar_dataset.zip



The folder structure in the dataset is as follows. Each of the samples contains five folders with the following images:

  • noshadow (shadow-free images),
  • shadow (images with shadows),
  • mask (masks of inserted objects),
  • robject (neighboring objects or occluders),
  • rshadow (shadows from neighboring objects).

You can prepare your dataset with a similar file structure.
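
If you build your own dataset, a quick consistency check can catch missing files before training. The sketch below assumes that every folder holds files with the same names as those in noshadow, which is how the ARDataset class below pairs them. The helper `find_missing_files` is hypothetical, and the demo runs on a throwaway directory with empty placeholder files.

```python
import os
import os.path as osp
import tempfile

EXPECTED_FOLDERS = ('noshadow', 'shadow', 'mask', 'robject', 'rshadow')


def find_missing_files(split_path, folders=EXPECTED_FOLDERS):
    """Return {folder: [file names present in noshadow but absent here]}."""
    reference = set(os.listdir(osp.join(split_path, 'noshadow')))
    missing = {}
    for folder in folders:
        folder_path = osp.join(split_path, folder)
        present = set(os.listdir(folder_path)) if osp.isdir(folder_path) else set()
        absent = sorted(reference - present)
        if absent:
            missing[folder] = absent
    return missing


# Demo: build a tiny fake split, then delete one file to trigger a report.
with tempfile.TemporaryDirectory() as root:
    for folder in EXPECTED_FOLDERS:
        os.makedirs(osp.join(root, folder))
        for name in ('0001.png', '0002.png'):
            open(osp.join(root, folder, name), 'w').close()
    complete = find_missing_files(root)
    os.remove(osp.join(root, 'rshadow', '0002.png'))
    incomplete = find_missing_files(root)

print(complete)    # {}
print(incomplete)  # {'rshadow': ['0002.png']}
```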

We prepare the ARDataset class, which processes the images and returns the i-th set of data on request. First, we import the required libraries.

Python

import os
import os.path as osp
import cv2
import random
import numpy as np
import albumentations as albu

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torch.autograd import Variable
from piq import ContentLoss
import segmentation_models_pytorch as smp



Then, we define the class. Its main method is __getitem__(). It returns the i-th image and the corresponding mask on request, each in two versions: one normalized for the attention block and one normalized for the shadow generation block.

Python

class ARDataset(Dataset):
    def __init__(self, dataset_path, augmentation=None,
                 augmentation_images=None, preprocessing=None,
                 is_train=True):
        """ Initializing dataset parameters
        dataset_path - path to the train or test folder
        augmentation - augmentations applied to both images and masks
        augmentation_images - augmentations applied only to images
        preprocessing - image preprocessing
        is_train - flag (True - training mode, False - prediction mode)
        """
        noshadow_path = osp.join(dataset_path, 'noshadow')
        mask_path = osp.join(dataset_path, 'mask')

        # Collect paths to files
        self.noshadow_paths = []
        self.mask_paths = []
        self.rshadow_paths = []
        self.robject_paths = []
        self.shadow_paths = []

        if is_train:
            rshadow_path = osp.join(dataset_path, 'rshadow')
            robject_path = osp.join(dataset_path, 'robject')
            shadow_path = osp.join(dataset_path, 'shadow')

        files_names_list = sorted(os.listdir(noshadow_path))

        for file_name in files_names_list:
            self.noshadow_paths.append(osp.join(noshadow_path, file_name))
            self.mask_paths.append(osp.join(mask_path, file_name))

            if is_train:
                self.rshadow_paths.append(osp.join(rshadow_path, file_name))
                self.robject_paths.append(osp.join(robject_path, file_name))
                self.shadow_paths.append(osp.join(shadow_path, file_name))

        self.augmentation = augmentation
        self.augmentation_images = augmentation_images
        self.preprocessing = preprocessing
        self.is_train = is_train

    def __getitem__(self, i):
        """ Getting the i-th set from the dataset.
        i - index

        It returns:
        image - image with normalization for the attention block
        mask - mask with normalization for the attention block
        image1 - image with normalization for the shadow generation block
        mask1 - mask with normalization for the shadow generation block
        """
        # Original image (OpenCV reads BGR, so convert to RGB)
        image = cv2.imread(self.noshadow_paths[i])
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        # Mask of the inserted object (flag 0 reads it as grayscale)
        mask = cv2.imread(self.mask_paths[i], 0)

        if self.is_train:
            # Mask of neighboring objects
            robject_mask = cv2.imread(self.robject_paths[i], 0)

            # Mask of shadows from neighboring objects
            rshadow_mask = cv2.imread(self.rshadow_paths[i], 0)

            # Resulting image
            res_image = cv2.imread(self.shadow_paths[i])
            res_image = cv2.cvtColor(res_image, cv2.COLOR_BGR2RGB)

            # Apply augmentation to images separately
            if self.augmentation_images:
                sample = self.augmentation_images(
                    image=image,
                    image1=res_image
                )
                image = sample['image']
                res_image = sample['image1']

            # Collect masks into one variable to apply augmentations
            mask = np.stack([robject_mask, rshadow_mask, mask], axis=-1)
            mask = mask.astype('float')

            # Do the same for images
            image = np.concatenate([image, res_image], axis=2)
            image = image.astype('float')

        # Apply augmentation
        if self.augmentation:
            sample = self.augmentation(image=image, mask=mask)
            image, mask = sample['image'], sample['mask']

        # Normalization of masks: binarize to 0 or 255
        mask[mask >= 128] = 255
        mask[mask < 128] = 0
        # Normalization for the shadow generation block: values in [-1, 1]
        # (np.float64 replaces the np.float alias, removed in NumPy 1.24)
        image1 = image.astype(np.float64) / 127.5 - 1.0
        mask1 = mask.astype(np.float64) / 127.5 - 1.0
        # Normalization for the attention block: values in [0, 1]
        image = image.astype(np.float64) / 255.0
        mask = mask.astype(np.float64) / 255.0

        # Preprocessing
        if self.preprocessing:
            sample = self.preprocessing(image=image, mask=mask)
            image, mask = sample['image'], sample['mask']

            sample = self.preprocessing(image=image1, mask=mask1)
            image1, mask1 = sample['image'], sample['mask']

        return image, mask, image1, mask1

    def __len__(self):
        """ It returns the length of the dataset"""
        return len(self.noshadow_paths)
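
To make the two normalizations in __getitem__() concrete, here is a small standalone NumPy sketch that applies the same arithmetic to a few hand-picked pixel values. The function name `normalize_pair` is an assumption for illustration and is not part of the class.

```python
import numpy as np


def normalize_pair(image, mask):
    """Reproduce the arithmetic of ARDataset.__getitem__ on raw 0..255 arrays."""
    image = np.asarray(image, dtype=np.float64)
    # Binarize the mask first: values >= 128 become 255, the rest become 0.
    mask = np.where(np.asarray(mask) >= 128, 255.0, 0.0)
    image_att, mask_att = image / 255.0, mask / 255.0              # [0, 1]
    image_gen, mask_gen = image / 127.5 - 1.0, mask / 127.5 - 1.0  # [-1, 1]
    return image_att, mask_att, image_gen, mask_gen


image = np.array([0.0, 127.5, 255.0])
mask = np.array([10, 128, 200])
image_att, mask_att, image_gen, mask_gen = normalize_pair(image, mask)

print(image_att)  # [0.  0.5 1. ]
print(mask_att)   # [0. 1. 1.]
print(image_gen)  # [-1.  0.  1.]
print(mask_gen)   # [-1.  1.  1.]
```

The attention block therefore sees inputs in [0, 1], while the shadow generation block sees the same data rescaled to [-1, 1].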



Declare augmentations and functions for data processing. We take augmentations from the albumentations repository.

Python

def get_training_augmentation():
    """ Augmentation for all images, training samples. """
    train_transform = [
        albu.Resize(256, 256),
        albu.HorizontalFlip(p=0.5),
        albu.Rotate(p=0.3, limit=(-10, 10), interpolation=3, border_mode=2),
    ]
    return albu.Compose(train_transform)


def get_validation_augmentation():
    """ Augmentation for all images, validation/testing samples """
    test_transform = [
        albu.Resize(256, 256),
    ]
    return albu.Compose(test_transform)


def get_image_augmentation():
    """ Augmentation for images only (not for masks). """
    image_transform = [
        # One noise or blur transform per call
        albu.OneOf([
            albu.Blur(p=0.2, blur_limit=(3, 5)),
            albu.GaussNoise(p=0.2, var_limit=(10.0, 50.0)),
            albu.ISONoise(p=0.2, intensity=(0.1, 0.5),
                          color_shift=(0.01, 0.05)),
            albu.ImageCompression(p=0.2, quality_lower=90, quality_upper=100,
                                  compression_type=0),
            albu.MultiplicativeNoise(p=0.2, multiplier=(0.9, 1.1),
                                     per_channel=True,
                                     elementwise=True),
        ], p=1),
        # One color transform per call
        albu.OneOf([
            albu.HueSaturationValue(p=0.2, hue_shift_limit=(-10, 10),
                                    sat_shift_limit=(-10, 10),
                                    val_shift_limit=(-10, 10)),
            albu.RandomBrightness(p=0.3, limit=(-0.1, 0.1)),
            albu.RandomGamma(p=0.3, gamma_limit=(80, 100), eps=1e-07),
            albu.ToGray(p=0.1),
            albu.ToSepia(p=0.1),
        ], p=1)
    ]
    # additional_targets applies the same transform to the extra images
    return albu.Compose(image_transform, additional_targets={
        'image1': 'image',
        'image2': 'image'
    })


def get_preprocessing():
    """ Preprocessing """
    _transform = [
        albu.Lambda(image=to_tensor, mask=to_tensor),
    ]
    return albu.Compose(_transform)


def to_tensor(x, **kwargs):
    """ Converts the image to the format: [channels, height, width] """
    return x.transpose(2, 0, 1).astype('float32')
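
As a quick sanity check of the array layout, the sketch below runs to_tensor on dummy arrays shaped like the training-mode data: two RGB images concatenated along the channel axis and three masks stacked along the last axis. The dummy sizes are arbitrary. Since OpenCV and NumPy store images as [height, width, channels], the transpose yields [channels, height, width].

```python
import numpy as np


def to_tensor(x, **kwargs):
    """Move the channel axis first: [H, W, C] -> [C, H, W]."""
    return x.transpose(2, 0, 1).astype('float32')


height, width = 4, 5  # arbitrary, deliberately different to expose axis order

# Training mode: noshadow and shadow images are concatenated into 6 channels.
noshadow = np.zeros((height, width, 3))
shadow = np.ones((height, width, 3))
image = np.concatenate([noshadow, shadow], axis=2)

# Three single-channel masks (robject, rshadow, mask) are stacked into 3 channels.
single_mask = np.zeros((height, width))
mask = np.stack([single_mask, single_mask, single_mask], axis=-1)

image_chw = to_tensor(image)
mask_chw = to_tensor(mask)

print(image.shape, '->', image_chw.shape)  # (4, 5, 6) -> (6, 4, 5)
print(mask.shape, '->', mask_chw.shape)    # (4, 5, 3) -> (3, 4, 5)
```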



In the next and final part, we start training.
