Create an image annotation app with Tensorflow, Flask and ReactJS

Hello everyone. It’s been a while. In the past month, I have been spending some time developing an application called AnnoMachine and I think it’s time to tell you guys about it.

Overview

Okay, first things first. What is AnnoMachine all about and what is it capable of? Well, an image is worth a thousand words, and a video is worth even more than that 😉 Let me show you a clip real quick:

As you can see, AnnoMachine is an application that can help us create image annotations with the least effort. It fulfills that mission by:

  • Running a deep learning model under the hood to get a rough estimation of the annotation
  • Providing a user-friendly interface in which users can easily visualize and interactively update the annotation

So, I hope that at this point, you guys already have an idea about AnnoMachine and what it does. For the rest of this post, I will talk about why I created this app, how I designed it and what stack I used to implement it.

This post is not supposed to be a tutorial but a gentle introduction to AnnoMachine, since visiting every little detail would make it ridiculously long (or even turn it into a book!). Instead, I made it open-source at this repo and you can contact me directly with any questions about the implementation.

Okay, let’s dive in.

Why AnnoMachine?

So, why did I create AnnoMachine in the first place?

As a guy who has been working in the field for three years, I came to realize that the majority of AI engineers/practitioners (myself included) tend to overestimate the role of deep learning within the production pipeline.

Let me explain what I mean. Suppose that we want to ship a chatbot app which may require a bunch of tasks to be done. Below is how we likely think the process should be:

That looks intriguing, right? Some new paper got published and we can’t help trying to implement it ourselves, or waiting for other folks to release the code so that we can play with it.

While that’s good for encouraging researchers to come up with more innovative ideas and for attracting more people into the field, it may trick us into thinking that deep learning modeling & training skills are so crucial that nothing else really matters.

Well, I really don’t want to disappoint you but below is what it actually looks like in reality:

Yeah, reality is often disappointing. There are a lot of reasons for that, but I think the most important one is: in order to be widely used and bring home some revenue (not even profit), products should be delivered as some kind of service or application. No one wants to use something that they have to launch from a terminal! Furthermore, in order to ship a fancy-looking app fast, a lot of effort must be poured into “something else” other than deep-learning models.

So, I decided to create AnnoMachine to show you a real-life example of how AI can be applied and, hopefully, to draw more attention to AI in production.

Application structure and development tools

Okay, finally it’s time to talk about technology. Here I’m gonna talk about how the app was constructed and what dev stack I used to implement it.

Pages

From the user’s perspective, I designed AnnoMachine to have two main pages for two main use cases:

  • Home page

This is where users can view all the images that have been uploaded. Each image comes with basic information like what objects it may contain, and when and by whom it was uploaded. Logged-in users can upload a local image or type in the URL of an image they found on the Internet. That’s quite common, right?

  • Annotation Playground page

This is where all the fun begins and in fact, this was the thing that popped into my mind when I first played with object detection models back in the day (say, Faster R-CNN/YOLO/SSD). Here, users can toggle any bounding box on and off, and add or remove boxes. Most importantly, they can interactively modify a box by simply dragging the mouse around, and eventually download the updated annotation file!
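To make those operations a bit more concrete, here is a minimal sketch of what an annotation could look like as data. Keep in mind this is only an illustration: the real Playground lives in the React frontend, and the `Box` and `Annotation` classes, their field names and the JSON layout below are all hypothetical, not the format AnnoMachine actually uses.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Box:
    """One bounding box in pixel coordinates (hypothetical schema)."""
    label: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    visible: bool = True  # whether the box is currently shown in the UI

    def move(self, dx: float, dy: float) -> None:
        """Shift the whole box, as when the user drags it with the mouse."""
        self.xmin += dx
        self.xmax += dx
        self.ymin += dy
        self.ymax += dy


@dataclass
class Annotation:
    """All boxes that belong to one image."""
    image_name: str
    boxes: List[Box] = field(default_factory=list)

    def add(self, box: Box) -> None:
        self.boxes.append(box)

    def remove(self, index: int) -> None:
        del self.boxes[index]

    def toggle(self, index: int) -> None:
        """Flip the visibility flag of one box."""
        self.boxes[index].visible = not self.boxes[index].visible

    def to_json(self) -> str:
        """Serialize the annotation so it can be offered as a download."""
        return json.dumps(asdict(self), indent=2)


# Start from a rough estimate (made-up numbers), then correct it by hand.
annotation = Annotation("street.jpg")
annotation.add(Box("car", 10, 20, 110, 80))
annotation.add(Box("dog", 200, 150, 260, 210))
annotation.boxes[0].move(5, -5)  # drag the car box a little
annotation.toggle(1)             # hide the dog box
annotation.remove(1)             # then delete it altogether
exported = annotation.to_json()
```

The point of the sketch is that every interaction on the page boils down to a tiny update on a list of boxes, and the downloaded file is just that list serialized.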

Of course, the app also contains two more pages for the user to register and log in, which, again, is quite common for a web application.

Architecture & development stack

Let’s talk about high-level design. AnnoMachine was created using a microservice architecture. Each part of the app is considered a separate component and can be contained within a Docker container that talks to the others via HTTP calls, WebSockets, etc. For this project, I divided the app into four services:

  • Backend

The backend actually consists of two smaller parts: the object detection model and a RESTful API server on top of it.

For the detection model, I decided to go with the paper called Single Shot MultiBox Detector, or SSD for short. SSD’s network is fairly lightweight and I used TensorFlow 2.0 to implement it. And because the detection model is implemented in Python, it made more sense to choose a Python framework to create the RESTful API. I simply chose Flask. Django is way overkill for this project.

  • Frontend

This is the part that I love the most. I would prefer a decent-looking app over a well-coded app with no user-friendly interface (which is the reason I created AnnoMachine anyway).

Since I’m a React guy, using React for this project was the first thing that came to my mind. In fact, I thought of trying out Vue, but learning Vue and possibly fixing tons of bugs along the way would have made this project last longer than I expected, so Vue is for another time.

  • Database

I didn’t have any specific constraints when choosing which database to go with, as long as it’s not SQLite. I used MySQL at my previous job, so I decided to give PostgreSQL a shot this time. With the help of SQLAlchemy, choosing between MySQL and PostgreSQL is not something that really matters.

  • Webserver

Finally, I needed a web server. There is a lot to say about what a web server is and why we need one, so please give this page a good read later on. Among the options out there, I decided to go with Nginx (pronounced “engine X”). I also kept the config as simple as possible, so that I could focus entirely on implementing the backend and frontend.
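Going back to the backend service for a second: the idea of a RESTful API server sitting on top of the detection model can be sketched roughly like this. It is a minimal illustration, not the code in my repo. The `/api/detect` route, the `image` form field, the response layout and the `detect_objects` placeholder (which stands in for the SSD model and just returns a fixed box) are all assumptions made up for this example.

```python
import io

from flask import Flask, jsonify, request

app = Flask(__name__)


def detect_objects(image_bytes: bytes) -> list:
    """Hypothetical stand-in for the SSD model.

    A real implementation would decode the image, run the TensorFlow
    model and return its predictions. This stub returns one fixed box.
    """
    return [{"label": "car", "score": 0.9, "box": [10, 20, 110, 80]}]


@app.route("/api/detect", methods=["POST"])  # hypothetical route name
def detect():
    """Accept an uploaded image and return the detected boxes as JSON."""
    uploaded = request.files.get("image")
    if uploaded is None:
        return jsonify({"error": "no image provided"}), 400
    boxes = detect_objects(uploaded.read())
    return jsonify({"boxes": boxes})


# Try the endpoint without starting a server, using Flask's test client.
client = app.test_client()
response = client.post(
    "/api/detect",
    data={"image": (io.BytesIO(b"fake image bytes"), "test.jpg")},
    content_type="multipart/form-data",
)
missing = client.post("/api/detect")  # no file attached
```

The frontend only ever sees the JSON, so the model behind `detect_objects` can be swapped out without touching the rest of the app.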

Here is a graph illustrating the app’s architecture and what communications are made between services:

Deployment

I used Docker, docker-compose, and docker-machine to deploy both on my local machine and on an AWS EC2 instance. The app can run flawlessly on a single t2.micro instance 😉

The steps to get everything up & running can be found on my repo. Play with the code and if you spot anything wrong, please let me know. I appreciate your help.

Final words

And that’s it. In today’s post, I have introduced AnnoMachine: what it’s all about and why/how I created it. Feel free to play with the code and modify it in any way you want. Don’t forget to share your results with me. Once again, thank you all for your time! I’ll see you in the next post.

Trung Tran is a Deep Learning Engineer working in the car industry. His main daily job is to build deep learning models for autonomous driving projects, ranging from 2D/3D object detection to road scene segmentation. After office hours, he works on his personal projects, which focus on Natural Language Processing and Reinforcement Learning. He loves to write technical blog posts, which help spread his knowledge/experience to those who are struggling. Less pain, more gain.
