Disclaimer: Read the section “Obtaining an image dataset” before you start the project. If you are unfamiliar with neural networks, start with this project instead!

Introduction

This project presents the task of predicting a person’s BMI through facial recognition. In this tutorial, you will learn:

  • how to process images as data inputs
  • how to develop a neural network using TensorFlow and Keras
  • how image data augmentation works and why it is useful in machine learning
  • how to save and load the trained model into the Pixetto

If you do not have prior experience with TensorFlow or Keras, do not worry! I will explain the coding syntax as I go.

Project Overview

This project requires Pixetto to detect a person’s face and to output a numerical BMI value. We could approach this project as a classification problem where the model sorts images into “BMI categories”, but the interesting part is predicting the BMI value itself. This makes our task a regression problem, meaning we have to write our own Python program. Nice!
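To make the difference between the two framings concrete, here is a minimal sketch of what the “BMI categories” approach would look like. The cut-offs are the commonly cited WHO adult thresholds (18.5, 25 and 30), and the function name bmi_category is a made-up helper for illustration, not part of the project code.

```python
def bmi_category(bmi: float) -> str:
    """Map a numeric BMI onto a coarse category (illustrative helper)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


# Two quite different BMI values collapse into the same class label.
# A regression model gets to keep that difference.
for value in (25.1, 29.9):
    print(value, "->", bmi_category(value))
```

A classifier trained on such labels could never tell these two people apart, which is why predicting the number directly is the more interesting task.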

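Because our inputs are images, we will also have to use a specific type of neural network called a Convolutional Neural Network (CNN). CNNs are especially useful for image tasks because their convolutional layers reuse the same small set of weights across the whole image. This lets them pick up local patterns such as edges and textures with far fewer parameters than a fully connected multilayer perceptron would need.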
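As a rough illustration of why convolutional layers suit images, the sketch below compares the number of trainable parameters in a small fully connected layer and a small convolutional layer on a 128 x 128 RGB image (the input size used later in this tutorial). The layer sizes are picked only for the comparison.

```python
# Input size used later in this tutorial: 128 x 128 RGB images
height, width, channels = 128, 128, 3

# Fully connected layer with 8 units applied to the flattened image:
# one weight per input value per unit, plus one bias per unit
dense_units = 8
dense_params = height * width * channels * dense_units + dense_units

# Convolutional layer with 8 filters of size 3x3:
# each filter has 3 * 3 * channels weights plus one bias,
# and the same filter is reused at every position in the image
filters, kernel = 8, 3
conv_params = kernel * kernel * channels * filters + filters

print("dense parameters:", dense_params)  # 393224
print("conv parameters: ", conv_params)   # 224
```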

If you need a review on CNNs, here is a video I found very useful, alongside most of the other content this channel uploads regarding machine learning.

Obtaining an image dataset

Predicting BMI from facial images is a fairly novel task at the intersection of health and artificial intelligence, so there exist only a few public datasets. I obtained mine from this paper:

A. Dantcheva, P. Bilinski, F. Bremond, “Show me your face and I will tell you your height, weight and body mass index,” Proc. of 24th IAPR International Conference on Pattern Recognition (ICPR), (Beijing, China), August 2018.

Please refer to this website for information regarding the dataset and how to obtain it. Once you obtain the dataset, we can start writing the Python program!

(You don’t have to read the paper because it is quite complicated, but feel free to digest the information if you want)

Setting up the Python program

The following steps require Python and the pip package installer, so follow various installation guides online if you don’t have them installed!

Python is an extremely versatile language and offers a ton of packages. For this specific project, we need to install these packages:

pip3 install pandas
pip3 install numpy
pip3 install scikit-learn
pip3 install tensorflow
pip3 install keras

At the start of our Python file, write in these lines of code:

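import pandas as pd
import numpy as np
from sklearn import model_selection
from tensorflow import keras
# importing everything through tensorflow.keras keeps all Keras objects on the same Keras version
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array
from tensorflow.keras.models import Sequential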

We are now done setting up our project and can move on to processing our image data.

Reading and processing our dataset

Apart from the images in the data folder, we also need the numerical data from annotation.csv, specifically the image file names and their respective BMI values, in order to properly train our neural network model. To read this data, pandas offers a simple method that lets us specify exactly which columns we want.

Once we read in the dataset, we have to go through the dataset and insert each image and its respective BMI into an expected input and output list and turn those lists into numpy arrays. The details of what these arrays actually contain are not so important, but know that they contain a bunch of RGB values ranging from 0 to 255. The code is written below:

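# reading data from the .csv file
# usecols is a parameter that allows us to specify which columns of data we want
# read_csv keeps the file's own column order, so we reselect the columns to make sure
# 'image' comes first and 'BMI' second
dataframe = pd.read_csv('.../BMI Classifier/vip/annotation.csv',
                        usecols=['image', 'BMI'])[['image', 'BMI']].values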
X = []  # the features, or inputs
y = []  # the labels, or outputs

# extracting the inputs and the outputs, aka our images and their corresponding BMI values
for row in dataframe:
    image_name = row[0] + '.jpg'
    image_bmi = row[1]
    # resizing our original images to 128x128, and turning them into numpy arrays
    image = load_img('.../BMI Classifier/vip/data/' + image_name, target_size=(128, 128))
    input_arr = img_to_array(image)
    input_arr = np.array(input_arr)
    X.append(input_arr)
    y.append(image_bmi)

# turning the input and output lists into numpy arrays
X = np.array(X)
y = np.array(y)
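Before moving on, it can help to check that the arrays have the shape the network will expect. The sketch below uses made-up stand-in data (zero-filled arrays and invented BMI values) rather than the real dataset. The names X_demo and y_demo are hypothetical and exist only for this illustration.

```python
import numpy as np

# Stand-ins for three loaded images; img_to_array gives arrays of this shape
# for a 128x128 RGB image
fake_images = [np.zeros((128, 128, 3), dtype="float32") for _ in range(3)]
fake_bmis = [21.5, 27.0, 33.2]  # made-up labels, for illustration only

X_demo = np.array(fake_images)
y_demo = np.array(fake_bmis)

# One row per image, then height, width and the three colour channels
print(X_demo.shape)  # (3, 128, 128, 3)
print(y_demo.shape)  # (3,)

# The two arrays must stay aligned: one BMI value per image
assert X_demo.shape[0] == y_demo.shape[0]
```

Running the same two print statements on the real X and y is a cheap way to catch a missing image or a mismatched label before training starts.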

Developing and training our neural network model

Great! At this point, we should have successfully extracted the data we need to feed into our neural network. All that is left to do is to develop the model and train it. Before we actually start coding anything, there are several key ideas that should be addressed.

A neural network only works if you train it properly, and you can get unintended results if not. A good rule of thumb is to split the dataset into a training dataset and a testing dataset. Because the BMI values of the testing images are known but were never shown to the model during training, comparing the model’s predictions against them tells you whether it actually learned something general. It is also very important to normalize/scale your data: pixel values range from 0 to 255, and networks generally train faster and more stably when their inputs are scaled to a small range such as 0 to 1.

# obtaining the training and testing dataset
# test_size=0.2 means 20% of the X and y dataset becomes X_test and y_test, respectively
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2)

# normalizing the image array inputs
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
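As a quick check of what the division by 255 does, here is a small sketch on a made-up 2x2 “image” (the values are chosen only to cover the full pixel range):

```python
import numpy as np

# A tiny made-up "image" with pixel values covering the full 0-255 range
pixels = np.array([[0, 64], [128, 255]], dtype="uint8")

# Cast to float first so the division keeps fractional values, then scale
scaled = pixels.astype("float32") / 255

print(scaled.min(), scaled.max())  # 0.0 1.0
```

Note that any image you later give to the trained model has to be scaled the same way, otherwise the model sees inputs in a range it was never trained on.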

Lastly, we write out the model, give it our training datasets, and test our trained model. The paper I referred to above used mean squared error as a way to determine the efficacy of their neural network model, so the same is used here. Don’t worry if you don’t understand the syntax of the CNN setup; feel free to refer to the article I linked to in the disclaimer if you need a refresher!
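If mean squared error is new to you, it is simply the average of the squared differences between the true values and the predictions. Here is a minimal sketch in plain Python with made-up BMI values. Keras computes the same quantity for you when you pass loss='mean_squared_error'.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between labels and predictions."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


# Made-up BMI labels and predictions, for illustration only
true_bmi = [22.0, 30.0, 18.0]
predicted_bmi = [24.0, 27.0, 18.0]

mse = mean_squared_error(true_bmi, predicted_bmi)
print("MSE: ", mse)         # (4 + 9 + 0) / 3
print("RMSE:", mse ** 0.5)  # back in BMI units, easier to interpret
```

Taking the square root gives an error in BMI units, which is usually easier to reason about than the squared value.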

You might notice the ImageDataGenerator variable here, and its call to the method flow(…). The ImageDataGenerator is used here because it is one of the most convenient ways to artificially increase the number of image inputs we give to the model. By increasing the number of images, the model gets more data to train with. Think of it as getting more math problems to practice on, which leads to a better understanding of the topic.

How the number of images increases is simple. An image that looks slightly rotated or zoomed in is an entirely new data input for a network. The arguments in the ImageDataGenerator parentheses (rotation_range, width_shift_range, etc.) tell our ImageDataGenerator to do precisely that. It takes our original images, produces many slightly altered copies, and gives us “new” data to feed to our network. It is important to note, however, that rather than providing these altered images alongside our original images, the ImageDataGenerator replaces our original images with randomly altered versions each time a batch is drawn.
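To get a feel for what an “altered copy” is, here is a simplified sketch in plain NumPy. It is not how ImageDataGenerator works internally (for example, the real generator fills shifted-in pixels differently by default and picks its transformations at random). It only shows that a flip or a shift yields an array the network has not seen before.

```python
import numpy as np

# A tiny 2x3 single-channel "image" so the effect is easy to see
image = np.array([[1, 2, 3],
                  [4, 5, 6]])

# Horizontal flip: mirror the columns (the kind of change horizontal_flip=True allows)
flipped = np.fliplr(image)

# Shift one pixel to the right, filling the vacated column with zeros
# (a simplified stand-in for width_shift_range)
shifted = np.zeros_like(image)
shifted[:, 1:] = image[:, :-1]

print(flipped)
print(shifted)
```

The label stays the same for every altered copy: a mirrored face still belongs to a person with the same BMI.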

# The setup for a convolutional neural network (CNN)
model = Sequential()
model.add(keras.Input(shape=(128, 128, 3)))
model.add(keras.layers.Conv2D(filters=8, kernel_size=3, padding='same', activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=2))
model.add(keras.layers.Conv2D(filters=8, kernel_size=5, padding='same', activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=2))
model.add(keras.layers.Conv2D(filters=16, kernel_size=5, padding='same', activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=2))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(64, activation='relu'))
model.add(keras.layers.Dense(1, activation='linear'))  # linear output for regression; a ReLU here can get stuck predicting 0

# Training the CNN with our augmented training dataset
data_gen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True
)
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(data_gen.flow(X_train, y_train, batch_size=16), epochs=8)

# Saving the model to allow for faster testing in the future
model.save('wherever you want to save the model .pb file')

# Testing the CNN with our testing dataset
results = model.evaluate(X_test, y_test, verbose=0)
print('Mean squared error = ', results)

# Testing the CNN with your own picture
img = load_img('any image you want to use', target_size=(128, 128))
img_array = np.array(img_to_array(img))
img_array = img_array.reshape(1, 128, 128, 3)
img_array = img_array.astype('float32') / 255

prediction = model.predict(img_array)
print('BMI prediction is:', prediction)
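To see how the layers in the model above fit together, here is a rough trace of the feature map size and the number of trainable parameters, worked out by hand in plain Python. It assumes the layer settings shown above (padding='same' and pool_size=2), and you can compare the totals against what model.summary() reports.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters


size, channels = 128, 3  # input: 128 x 128 RGB
total = 0

# (filters, kernel_size) of the three Conv2D layers in the model above
for filters, kernel in [(8, 3), (8, 5), (16, 5)]:
    total += conv_params(kernel, channels, filters)
    channels = filters  # padding='same' keeps the spatial size
    size //= 2          # MaxPooling2D(pool_size=2) halves it

flat = size * size * channels  # values entering the first Dense layer
total += flat * 64 + 64        # Dense(64)
total += 64 * 1 + 1            # Dense(1)

print("feature map before Flatten:", (size, size, channels))  # (16, 16, 16)
print("flattened length:", flat)                               # 4096
print("trainable parameters:", total)                          # 267321
```

Almost all of the parameters sit in the first Dense layer, which is typical for a small CNN that flattens its feature maps.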

If you want to run this model on the Pixetto, you will need the model that the code above saved to your computer (a TensorFlow SavedModel, which includes a .pb file). The Pixetto Studio software takes in a .tflite file, and you can read more about how to do so here. Here is the code that converts the saved model into a .tflite file:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model('wherever you saved the .pb file')
tflite_model = converter.convert()

with tf.io.gfile.GFile('model.tflite', 'wb') as f:
    f.write(tflite_model)

Conclusion

If the mean squared error is high, don’t worry; it is expected for multiple reasons.

  1. Our dataset, even after image data augmentation, would still be considered too small. There is only so much automatic image data augmentation we can do for performance reasons (i.e., your computer might slow down if we tell it to produce hundreds of thousands of additional images).
  2. The images provided by the researchers are not racially balanced; the subjects are predominantly white, which will likely skew the BMI prediction if one were to give the model a picture of an Asian or African American person.
  3. Our CNN model might be too simple to perform the task of BMI prediction solely through face recognition. The model the researchers used was a 50-layer Residual Network, a far deeper CNN architecture than the one built here.

If anyone wants to improve this project even further, these issues would be the first ones to tackle.

Author | Phillip Wei is currently a sophomore at Northeastern University’s Khoury College of Computer Sciences. He is pursuing a B.S. degree in Computer Science and Biology and is interested in exploring more interdisciplinary opportunities regarding Computer Science and Biology. Phillip also enjoys learning about new technologies in PC building and performing/listening to all sorts of music.