Computer Vision and Machine Learning in Android

Welcome back! In this tutorial we will be looking into the wonders of Android app development, specifically computer vision and machine learning on Android. We will work in Java with Google's CameraX and ML Kit APIs, which let us implement a camera preview and load a custom .tflite model in our app. The app will work on most Android devices, but as a bonus we will run it on VIA's newest embedded board, the VAB-950.

The full source code is on my GitHub, so feel free to check it out if you want!

Note: The CameraX API is relatively new, so it is susceptible to change. If necessary, please refer to the documentation for the most up-to-date information.

Short Introduction to Android, CameraX API, and ML Kit API

If you have not experienced Android app development before, this section will be useful. If you have, feel free to skip ahead to the Setup section.

In Android app development, the coding environment is split into two areas: the Activities and the Layouts. The Activities define the main logic of the app, that is, how its different components work together to make the app function as expected. The Layouts, as the name suggests, are composed of “Views” that make up the app’s final design. In a fresh project, Android Studio automatically creates a MainActivity.java and an activity_main.xml file, the two main files that we will be working with. More complicated apps require more than one Activity and XML file, but we won’t have to worry about that here.

The paragraph above is just a VERY tiny description of what Android app development is, and even I don’t know everything there is to it. If you want to learn more before continuing this project, feel free to check out this tutorial.

The CameraX API is the newest camera API from Google. It builds off of its predecessor, Camera2, and enables a much simpler camera app development process. What fundamentally makes CameraX more user-friendly is its use cases, mainly the preview, image analysis, and image capture; syntactically they keep our code clean and organized, something very helpful for beginner Android developers like you and me! CameraX also offers other functionalities such as HDR and night mode, but we won’t be needing those. Lastly, CameraX should work on the majority of Android devices, but keep in mind that it does not support USB cameras yet.
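
To give a feel for what these use cases look like in code, here is a rough sketch of the three builders from androidx.camera.core; we will only need the preview and image analysis ones in this app, and none of this is our final code:

// Rough sketch of CameraX's three main use cases (not our final code)
Preview preview = new Preview.Builder()
        .build();

ImageAnalysis imageAnalysis = new ImageAnalysis.Builder()
        .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
        .build();

ImageCapture imageCapture = new ImageCapture.Builder()
        .setCaptureMode(ImageCapture.CAPTURE_MODE_MINIMIZE_LATENCY)
        .build();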

The ML Kit API allows developers to deploy machine learning apps efficiently in conjunction with CameraX. Because there are so many possibilities for machine learning projects, we will be following this straightforward tutorial on loading a custom object classification .tflite model into our app. In a way, our app can also serve as a template for any machine learning ideas you might have!

Setup

Before we begin, we must set up our Android environment by downloading the Java Development Kit (JDK) and Android Studio. These two videos (for Windows and Mac) are excellent tutorials for setting up Android Studio on your respective OS. Note that you may run into errors while setting this up (more than I can cover here), so you might have to search for solutions as you go. When you create a new project, make sure to set the minimum SDK to API 21; it is the lowest SDK level supported by the CameraX API.
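
For reference, the minimum SDK ends up in the defaultConfig block of your module-level build.gradle; everything below except minSdkVersion 21 is a placeholder that will differ for your project:

android {
    defaultConfig {
        applicationId "com.example.cameraxmlkit"  // placeholder package name
        minSdkVersion 21     // lowest SDK level supported by CameraX
        targetSdkVersion 30  // placeholder; use whatever your project targets
    }
}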

Hopefully that did not cause much trouble! Now, there are two more tasks left before we start coding the app: changing our build.gradle and activity_main.xml files. In your module-level build.gradle file,

add these lines of code under the android and dependencies section:

android {

    ...

    aaptOptions {
        noCompress "tflite"
    }

    ...
}

dependencies {

    ...

    // CameraX API
    def camerax_version = '1.1.0-alpha02'
    implementation "androidx.camera:camera-camera2:${camerax_version}"
    implementation "androidx.camera:camera-view:1.0.0-alpha21"
    implementation "androidx.camera:camera-lifecycle:${camerax_version}"

    // Object detection & tracking feature with custom model from ML Kit API
    implementation 'com.google.mlkit:object-detection-custom:16.3.1'
}

Great! We will now add code to our activity_main.xml file (the new Views go inside the pre-existing ConstraintLayout section). When you open the XML file, you will see three options near the top right: Code, Split, and Design.

Click on Code and paste this XML code in:

<androidx.constraintlayout.widget.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">

    <androidx.camera.view.PreviewView
        android:id="@+id/previewView"
        android:layout_width="match_parent"
        android:layout_height="584dp"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent">

    </androidx.camera.view.PreviewView>

    <TextView
        android:id="@+id/resultText"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginTop="32dp"
        android:textSize="24sp"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/previewView"
        app:layout_constraintHorizontal_bias="0.498" />

    <TextView
        android:id="@+id/confidence"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:textSize="24sp"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/resultText"
        app:layout_constraintHorizontal_bias="0.498"
        app:layout_constraintVertical_bias="0.659" />


</androidx.constraintlayout.widget.ConstraintLayout>

A PreviewView is what CameraX provides to let our app display a camera “preview”. A ConstraintLayout (the default layout in your XML file’s Component Tree) allows the developer to “constrain” certain Views to the same place regardless of which Android device is being used. TextViews are simply Views that display text in an app; in our app, the two TextViews will display the results of our machine learning inference.

As for the VAB-950, all you need is a micro-USB to USB-A cable and a CSI camera.

It’s Coding Time

At this point, you might be wondering: “How do the XML and MainActivity files connect?” There is a fundamental method in Android called findViewById. As its name suggests, it lets us grab a View defined in our XML file and use it as an object in our Java code. In our case, the start of our code should look like this:

public class MainActivity extends AppCompatActivity {

    private PreviewView previewView;
    private final int REQUEST_CODE_PERMISSIONS = 101;
    private final String[] REQUIRED_PERMISSIONS = new String[]{"android.permission.CAMERA"};
    private final static String TAG = "Anything unique";
    private Executor executor;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        setRequestedOrientation(ActivityInfo.SCREEN_ORIENTATION_PORTRAIT);
        getSupportActionBar().hide();

        previewView = findViewById(R.id.previewView);
        executor = ContextCompat.getMainExecutor(this);
    }
If you go into the XML file and click on previewView, you’ll see on the top right that its id is “previewView”. This is what we use to get our PreviewView by way of R.id.previewView. I won’t go into much detail about the Executor; just know that it executes tasks that run on threads (if you don’t know what this means, don’t worry about it).

If you have used an Android device before, recall being asked to grant apps certain permissions to do certain tasks. We need to do the same thing here: ask the user for camera permission and make the app run only if that permission is granted. First, open up the AndroidManifest.xml file and add these two lines before the application section:
<uses-feature android:name="android.hardware.camera.any" />
<uses-permission android:name="android.permission.CAMERA" />

then add these lines into MainActivity.java:

public class MainActivity extends AppCompatActivity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {

        ...

        if (allPermissionsGranted()) {
            startCamera();
        } else {
            ActivityCompat.requestPermissions(this,
                    REQUIRED_PERMISSIONS,
                    REQUEST_CODE_PERMISSIONS);
        }
    }


    @Override
    public void onRequestPermissionsResult(int requestCode, @NonNull String[] permissions,
                                           @NonNull int[] grantResults) {
        super.onRequestPermissionsResult(requestCode, permissions, grantResults);
        if (allPermissionsGranted()) {
            startCamera();
        } else {
            Toast.makeText(this, "Permissions not granted by the user.",
                    Toast.LENGTH_SHORT).show();
            finish();
        }
    }


   /**
    * Checks if all the permissions in the required permission array are already granted.
    *
    * @return Return true if all the permissions defined are already granted
    */
    private boolean allPermissionsGranted() {
        for (String permission : REQUIRED_PERMISSIONS) {
            if (ContextCompat.checkSelfPermission(getApplicationContext(), permission) !=
                    PackageManager.PERMISSION_GRANTED) {
                return false;
            }
        }
        return true;
    }

We also want to load our custom .tflite model by adding the following code to our onCreate method:

public class MainActivity extends AppCompatActivity {

    // New field
    private ObjectDetector objectDetector;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        
        ....

        // Loads a LocalModel from a custom .tflite file
        LocalModel localModel = new LocalModel.Builder()
                .setAssetFilePath("whatever_your_tflite_file_is_named.tflite")
                .build();

        CustomObjectDetectorOptions customObjectDetectorOptions =
                new CustomObjectDetectorOptions.Builder(localModel)
                        .setDetectorMode(CustomObjectDetectorOptions.STREAM_MODE)
                        .enableClassification()
                        .setClassificationConfidenceThreshold(0.5f)
                        .setMaxPerObjectLabelCount(3)
                        .build();
        objectDetector = ObjectDetection.getClient(customObjectDetectorOptions);
    }
}

The setAssetFilePath method looks inside your module’s assets folder for a .tflite model matching the provided String. Therefore, you need to add an assets directory by right clicking the app folder –> New –> Directory. From there, just search for src/main/assets and drop your .tflite file into the generated folder.
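
If you want to be sure the model actually made it into the build, an optional sanity check is to try opening it through the AssetManager at runtime. This is just a sketch (it would go inside onCreate, needs java.io.IOException imported, and uses the same placeholder file name as above):

// Optional sanity check: confirm the .tflite file was bundled into src/main/assets
try {
    getAssets().open("whatever_your_tflite_file_is_named.tflite").close();
    Log.d(TAG, "Custom model found in assets");
} catch (IOException e) {
    Log.e(TAG, "Custom model missing from assets", e);
}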

This section is pretty self-explanatory, so we will head on towards writing the startCamera method. Note that starting now there will be a lot more syntax that may seem confusing, but it is more important to know generally how the code works than what specifically Java is doing line by line; most of the intricate details are performed under the hood.

The code for the startCamera method is this:
    /**
     * Starts the camera application.
     */
    public void startCamera() {
        ListenableFuture<ProcessCameraProvider> cameraProviderFuture =
                ProcessCameraProvider.getInstance(this);
        cameraProviderFuture.addListener(() -> {
            ProcessCameraProvider cameraProvider;
            // Camera provider is now guaranteed to be available
            try {
                cameraProvider = cameraProviderFuture.get();
                bindPreviewAndAnalyzer(cameraProvider);
            } catch (ExecutionException | InterruptedException e) {
                e.printStackTrace();
            }
        }, executor);
    }

The main part of this code is the ProcessCameraProvider, an object that allows us to bind CameraX use cases to a lifecycle. In our case, the lifecycle would be our app, and opening and closing our app would tell the program to turn on or off those use cases. This lifecycle binding is achieved in the bindPreviewAndAnalyzer helper method, which is where we will finally see how the use cases are coded.

First, the preview use case is pretty simple to implement:

/**
 * Creates camera preview and image analyzer to bind to the app's lifecycle.
 *
 * @param cameraProvider a @NonNull camera provider object
 */
private void bindPreviewAndAnalyzer(@NonNull ProcessCameraProvider cameraProvider) {
    // Set up the view finder use case to display camera preview
    Preview preview = new Preview.Builder()
            .setTargetResolution(new Size(1280, 720))
            .build();
    // Connect the preview use case to the previewView
    preview.setSurfaceProvider(previewView.getSurfaceProvider());
    // Choose the camera by requiring a lens facing
    CameraSelector cameraSelector = new CameraSelector.Builder()
            .requireLensFacing(CameraSelector.LENS_FACING_BACK)
            .build();

    .... 

}

Remember our PreviewView layer in our activity_main.xml? Here, we connect the Preview object to our previewView, telling the app to use our PreviewView as the surface for the camera preview. The CameraSelector object indicates which camera lens to use, since most Android devices have both a front and a rear camera. The VAB-950’s firmware is programmed to expose a back camera, so we set our CameraSelector to require a rear lens via CameraSelector.LENS_FACING_BACK.
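
If you are not sure whether your target device actually exposes a back-facing camera, an optional check like the sketch below can fall back to the front lens (hasCamera is a method on ProcessCameraProvider; this snippet is not part of the final app):

// Optional: fall back to the front camera if no back camera is available
CameraSelector cameraSelector = new CameraSelector.Builder()
        .requireLensFacing(CameraSelector.LENS_FACING_BACK)
        .build();
try {
    if (!cameraProvider.hasCamera(cameraSelector)) {
        cameraSelector = CameraSelector.DEFAULT_FRONT_CAMERA;
    }
} catch (CameraInfoUnavailableException e) {
    Log.e(TAG, "Could not query available cameras", e);
}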

Note: If you see that your Preview seems stretched, try using the setTargetAspectRatio method instead of setTargetResolution. This seems to be a common issue with CameraX on certain devices, and you’ll probably run into this if you are using the VAB-950 (It did not happen to me when I ran the final app on my Samsung phone). It is likely the issue still exists even after using setTargetAspectRatio, but there’s nothing we can do to fix it. Google would have to update the API to address this issue. Here is the Preview.Builder documentation.
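
If you do want to try that workaround, the Preview builder variant using an aspect ratio instead of a fixed resolution looks roughly like this (AspectRatio comes from androidx.camera.core):

// Alternative: request an aspect ratio instead of a fixed resolution
Preview preview = new Preview.Builder()
        .setTargetAspectRatio(AspectRatio.RATIO_16_9)
        .build();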

Next, the image analysis use case is where the core machine learning inference happens: it takes image frames from the camera feed and processes them through our custom .tflite model:

/**
 * Creates camera preview and image analyzer to bind to the app's lifecycle.
 *
 * @param cameraProvider a @NonNull camera provider object
 */
private void bindPreviewAndAnalyzer(@NonNull ProcessCameraProvider cameraProvider) {

    ....

    // Creates an ImageAnalysis for analyzing the camera preview feed
    ImageAnalysis imageAnalysis = new ImageAnalysis.Builder()
            .setTargetResolution(new Size(1280, 720))
            .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
            .build();

    imageAnalysis.setAnalyzer(executor,
            new ImageAnalysis.Analyzer() {
                @Override
                public void analyze(@NonNull ImageProxy imageProxy) {
                    @SuppressLint("UnsafeExperimentalUsageError") Image mediaImage =
                            imageProxy.getImage();
                    if (mediaImage != null) {
                        processImage(mediaImage, imageProxy)
                                .addOnCompleteListener(new OnCompleteListener<List<DetectedObject>>() {
                                    @Override
                                    public void onComplete(@NonNull Task<List<DetectedObject>> task) {
                                        imageProxy.close();
                                    }
                                });
                    }
                }
            });
}

/**
 * Throws an InputImage into the ML Kit ObjectDetector for processing
 *
 * @param mediaImage the Image image converted from the ImageProxy image
 * @param imageProxy the ImageProxy image from the camera preview
 */
private Task<List<DetectedObject>> processImage(Image mediaImage, ImageProxy imageProxy) {
    InputImage image =
            InputImage.fromMediaImage(mediaImage,
                    imageProxy.getImageInfo().getRotationDegrees());
    return objectDetector.process(image)
            .addOnFailureListener(new OnFailureListener() {
                @Override
                public void onFailure(@NonNull Exception e) {
                    String error = "Failed to process. Error: " + e.getMessage();
                    Log.e(TAG, error);
                }
            })
            .addOnSuccessListener(new OnSuccessListener<List<DetectedObject>>() {
                @Override
                public void onSuccess(List<DetectedObject> results) {
                    String text = "";
                    float confidence = 0;
                    for (DetectedObject detectedObject : results) {
                        for (DetectedObject.Label label : detectedObject.getLabels()) {
                            text = label.getText();
                            confidence = label.getConfidence();
                        }
                    }
                    TextView textView = findViewById(R.id.resultText);
                    TextView confText = findViewById(R.id.confidence);
                    if (!text.equals("")) {
                        textView.setText(text);
                        confText.setText(String.format("Confidence = %f", confidence));
                    } else {
                        textView.setText("Detecting");
                        confText.setText("?");
                    }
                }
            });
}

The image analysis use case is handled by the ImageAnalysis object and its Analyzer, whose analyze method receives an ImageProxy object. This ImageProxy is the image input we need to feed into our model, which is done through our processImage helper method. The ObjectDetector we instantiated earlier processes the image input to produce a list of DetectedObjects, which should include the object our model thinks is being shown to the camera. To display this result to the user, we simply grab the two TextViews we created earlier and set them to the label and confidence of the DetectedObject.
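
Note that the inner loop above only keeps the last label it sees. If you would rather display every label that passes the confidence threshold, a small variation of the onSuccess body (just a sketch, not what the final app uses) could build up a string instead:

// Sketch: collect every label above the threshold instead of only the last one
StringBuilder allLabels = new StringBuilder();
for (DetectedObject detectedObject : results) {
    for (DetectedObject.Label label : detectedObject.getLabels()) {
        allLabels.append(label.getText())
                .append(String.format(" (%.2f)", label.getConfidence()))
                .append("\n");
    }
}
TextView textView = findViewById(R.id.resultText);
textView.setText(allLabels.length() > 0 ? allLabels.toString() : "Detecting");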

Finally, we bind the preview and image analysis use cases to the app’s lifecycle. This step is crucial: without it, the app wouldn’t run the preview or image analysis at all! Previously, with the Camera2 API, a developer had to manually override methods such as onResume and onStop to tell the app how to control these use cases. With CameraX, binding the use cases to a lifecycle means that opening and closing the app itself controls them, without the developer having to override those methods by hand.

/**
 * Creates camera preview and image analyzer to bind to the app's lifecycle.
 *
 * @param cameraProvider a @NonNull camera provider object
 */
private void bindPreviewAndAnalyzer(@NonNull ProcessCameraProvider cameraProvider) {

    ....

    // Unbind all previous use cases before binding new ones
    cameraProvider.unbindAll();

    // Attach use cases to our lifecycle owner, the app itself
    cameraProvider.bindToLifecycle(this,
            cameraSelector,
            preview,
            imageAnalysis);
}

Phew! That was a lot, but this essentially finishes our application!
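
One last piece of optional housekeeping: the ML Kit documentation recommends closing detectors you no longer need, so a minimal sketch of releasing the ObjectDetector when the Activity is destroyed could look like this:

@Override
protected void onDestroy() {
    super.onDestroy();
    // Release the ML Kit detector's resources when the Activity goes away
    if (objectDetector != null) {
        objectDetector.close();
    }
}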

Conclusion

This project focuses on object classification, but ML Kit offers many more machine learning features that our simple camera app can build on. With the VAB-950, you can even look towards developing an embedded/edge AI project if you choose to! As an amateur Android developer, I struggled when I first started working on this app, but with the right amount of dedication and Stack Overflow, working through this project will give you a solid introduction to Android app development and machine learning on Android. I certainly learned a lot through this project, and I hope you will, too!

Author | Phillip Wei is currently a sophomore at Northeastern University’s Khoury College of Computer Sciences. He is pursuing a B.S. in Computer Science and Biology and is interested in exploring more interdisciplinary opportunities involving Computer Science and Biology. Phillip also enjoys learning about new technologies in PC building and performing/listening to all sorts of music.