Intelligent Retail Checkout With an Android App

Ever wanted an easier way to check out at the store? Read on for a look at an intelligent self-checkout system embedded in a mobile app.

In many retail stores and supermarkets, it is common to see self-checkout systems that register purchases or let the final consumer check prices. Many of these systems rely on barcodes, RFID tags, or QR codes. This article proposes an intelligent self-checkout system embedded in a mobile app that not only detects multiple products without any labels, but also displays a list with the price of each item and the total cost of the purchase.

Architecture Description

The next figure provides a high-level functional overview.

[Figure: high-level architecture of the self-checkout system]

The workflow is:

  1. The user takes a picture of his/her shopping cart.
  2. A neural network (YOLO v3) detects the objects in the picture (bottles, boxes, etc.).
  3. The image of each detected object is cropped out of the photo.
  4. Transfer learning (SqueezeNet features feeding a ConvNet classifier) classifies the product in each sub-image and builds a list of product IDs.
  5. Finally, the cost of the purchase is computed from a product price list, and the detected products and their prices are displayed in the Android mobile app (see the sketch after this section).

A Google Cloud instance is used as a server to receive the uploaded images, run all of the artificial neural network models, and send the product price list back to the Android mobile app.
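To make step 5 concrete, here is a minimal sketch of how the purchase ticket could be built from the list of detected product IDs. The price list, product IDs, and the build_ticket helper are hypothetical illustrations, not the actual server code.

# Hypothetical price list; on the real server this would come from the product catalog.
PRICES = {"coca-cola": 1.50, "catsup": 2.10, "mayonesa": 2.80, "activia": 0.99}

def build_ticket(detected_ids):
    # Pair each detected product with its price and total the purchase.
    items = [(p, PRICES[p]) for p in detected_ids if p in PRICES]
    total = sum(price for _, price in items)
    return {"items": items, "total": round(total, 2)}

print(build_ticket(["coca-cola", "coca-cola", "catsup"]))
# {'items': [('coca-cola', 1.5), ('coca-cola', 1.5), ('catsup', 2.1)], 'total': 5.1}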

Object Detection With YOLO v3

You Only Look Once (YOLO) is an artificial neural network used for object detection. It is trained on the ImageNet 1000-class classification dataset for 160 epochs.

During training, YOLO detects the objects in the images of our dataset and generates a new dataset containing only the cropped object images. The resulting dataset is used to train the classification system. The process is shown in the next piece of code.

import os
import cv2
from darkflow.net.build import TFNet

# Load YOLO through the darkflow wrapper.
options = {"model": "cfg/yolo.cfg", "load": "bin/yolo.weights", "threshold": 0.3}
tfnet = TFNet(options)

def get_object(imgcv):
    # Run YOLO on the image and save a crop of every detected bottle.
    result = tfnet.return_predict(imgcv)
    for i in result:
        if i["label"] == "bottle":
            classId = i["label"]
            score = i["confidence"]

            xmin = int(i["topleft"]["x"])
            ymin = int(i["topleft"]["y"])
            xmax = int(i["bottomright"]["x"])
            ymax = int(i["bottomright"]["y"])

            crop = imgcv[ymin:ymax, xmin:xmax, :]
            # Save the cropped detection for the classification dataset.
            cv2.imwrite(classId + str(score) + "_.png", crop)

# Crop the objects out of every image in the raw dataset.
for k in os.listdir("./Dataset"):
    if k.lower().endswith((".png", ".jpg")):
        img = cv2.imread("./Dataset/" + k)
        get_object(img)

Transfer Learning for Classification of Objects

Transfer learning is a deep learning technique that lets you use pre-trained ConvNet models either as an initialization or as a fixed feature extractor for the task of interest. These models are trained on a very large dataset (e.g. ImageNet, which contains 1.2 million images in 1000 categories).

We use SqueezeNet as a fixed feature extractor. SqueezeNet is a pre-trained ConvNet model used to detect the predominant object in an image. The next image shows the network architecture.

[Figure: SqueezeNet architecture]

Layers "conv1" to "fire9" were taken. The following code shows this process.

import numpy as np
from keras_squeezenet import SqueezeNet
from keras.applications.imagenet_utils import preprocess_input
from keras.preprocessing import image
from keras.models import Model
from keras.layers import MaxPooling2D

# Cut SqueezeNet at the "drop9" layer and pool its feature maps.
model = SqueezeNet()
outputs = model.get_layer("drop9").output
outputs = MaxPooling2D()(outputs)
intermediate_layer_model = Model(inputs=model.input, outputs=outputs)

A convolutional neural network (ConvNet) is added on top to perform the classification. The ConvNet architecture is shown in the following code.

from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout, Flatten, Conv2D
from keras.optimizers import Adadelta

# Classifier on top of the 6x6x512 SqueezeNet feature maps.
model2 = Sequential([
    Conv2D(200, 3, strides=(1, 1), padding='same', input_shape=(6, 6, 512)),
    Activation('relu'),
    Flatten(),
    Dropout(0.2),
    Dense(500),
    Activation('tanh'),
    Dropout(0.2),
    Dense(50),
    Activation('tanh'),
    Dropout(0.2),
    Dense(5),  # one output per product class
    Activation('tanh'),
])

model2.compile(loss='mean_squared_error', optimizer=Adadelta(), metrics=['accuracy'])

For training, the dataset generated with YOLO is loaded, and the SqueezeNet feature extractor is evaluated to obtain the corresponding feature maps for each image.

import os
import numpy as np
from keras.utils import to_categorical

# PathClass points to a folder with one subfolder per product class
# (adjust the path to your own layout).
PathClass = "./DatasetClasses"

XD = []  # SqueezeNet feature maps
YD = []  # integer class labels
cta = 0
for k in os.listdir(PathClass):
    for i in os.listdir(PathClass + "/" + k):
        img = image.load_img(PathClass + "/" + k + "/" + i, target_size=(227, 227))
        x = image.img_to_array(img)
        x = np.expand_dims(x, axis=0)
        x = preprocess_input(x)
        preds = intermediate_layer_model.predict(x)  # SqueezeNet features
        XD.append(preds)
        YD.append(cta)
    cta += 1
    print(k)

XD = [i[0] for i in XD]
YD = to_categorical(YD)
# Map the one-hot labels from {0, 1} to {-1, 1} to match the tanh outputs.
YD = [((i + 1) * 2) - 3 for i in YD]

The SqueezeNet outputs and the labels for the five classes (Mayonesa (Mayonnaise), Coca-Cola, Catsup, Activia, and None), collected in XD and YD respectively, are then used to train the ConvNet.
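As a minimal sketch of that training step, assuming XD and YD were filled as above (the epoch count, batch size, and validation split are illustrative assumptions, not the authors' settings):

X_train = np.array(XD)  # SqueezeNet features, shape (n_samples, 6, 6, 512)
Y_train = np.array(YD)  # tanh targets in {-1, 1}, shape (n_samples, 5)

# Illustrative hyperparameters; tune for the real dataset.
model2.fit(X_train, Y_train, epochs=50, batch_size=32, validation_split=0.2)

The following image shows the resulting confusion matrix.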

[Figure: confusion matrix for the product classifier]

Backend on Python

The backend code is split into two Python scripts (Detection.py and SmartRetail.py). Detection.py runs the predictions from each model (YOLO, SqueezeNet, and ConvNet), and SmartRetail.py is the service that builds the purchase ticket.
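As a rough sketch of how SmartRetail.py could expose the pipeline, assuming a Flask service (the /checkout endpoint, the detect_products helper, and the price list are hypothetical illustrations, not the authors' actual code):

from flask import Flask, request, jsonify
import Detection  # hypothetical wrapper around the YOLO/SqueezeNet/ConvNet pipeline

app = Flask(__name__)
PRICES = {"coca-cola": 1.50, "catsup": 2.10, "mayonesa": 2.80, "activia": 0.99}

@app.route("/checkout", methods=["POST"])
def checkout():
    # The mobile app uploads the shopping-cart photo as multipart form data.
    upload = request.files["image"]
    path = "/tmp/upload.png"
    upload.save(path)
    products = Detection.detect_products(path)  # hypothetical: returns product IDs
    items = [{"product": p, "price": PRICES.get(p, 0.0)} for p in products]
    return jsonify({"items": items, "total": sum(i["price"] for i in items)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)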

Mobile App

The mobile app is developed in Cordova. The Cordova camera plugin is used to take a picture with the smartphone, and the Cordova file transfer plugin sends the image to the server. jQuery Mobile is used to display the price list returned by the server. The next image shows the mobile app views.

[Figure: Android app views]
