harpia

WikiAves is the largest bird cataloging website in the world, with 3 million bird photos and audio recordings. Correcting misclassified records takes a considerable amount of human effort.

To help users select the correct species for the photos and audio recordings they upload, I created and deployed image and audio classification models using TensorFlow.

birding

Birding is a hobby where people go outdoors to observe or photograph birds. During the 2020 pandemic the hobby gained a lot of popularity: with lockdowns in place, people had fewer places to go and more often headed outdoors to uncrowded spaces.

Back then, I was living with my brother, who had already been birding for a long time. He suggested we go for walks in the forest and practice it. One day when we were home, he showed me the WikiAves website.

wikiaves

WikiAves is a simple website with a forum and a lot of beautiful user-uploaded bird photos, categorized into 1000+ species.

Since I was very interested in deep learning and wanted to learn more about it, I decided to go through every species on the website and download 20 photos of each, to try to train a model that classifies birds.

After downloading the photos, I preprocessed the images and used a pre-trained computer vision model as the base for a simple classifier of Brazilian bird species.

The model worked fairly well, which got my brother so stoked that he contacted the owner of WikiAves to share it. The owner was equally excited about it, and didn't mind that I had scraped thousands of photos from his portal.

He hired me to train a better model and set up a task force of some of the most experienced birdwatchers in Brazil to clean the training data.

This task took months. They created an online panel so the work could be done collaboratively, and even subdivided some species by sex and age, creating the cleanest Brazilian bird dataset in the world.

deep learning

With this dataset in hand, I set out to train a bigger and better model for his website. It was a long process of experimentation, but in the end I converged on a pipeline that, during training:

Augmented the data with CutMix, MixUp, and rotation
Fine-tuned a pre-trained EfficientNetB3 model
Used a learning rate scheduler together with EarlyStopping and ReduceLROnPlateau
Used Sigmoid Focal Cross Entropy as the loss function
Used Adam as the optimizer
Trained on Google TPUs
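
To make two of these steps concrete, here is a minimal pure-Python sketch of MixUp and of a sigmoid focal loss. It is not the actual training code, which ran in TensorFlow. The function names, the default values alpha=0.25 and gamma=2.0, and the choice to sum the loss over classes are assumptions made for illustration.

```python
import math


def mixup(x_a, y_a, x_b, y_b, lam):
    """Blend two examples and their one-hot labels with weight lam in [0, 1]."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def sigmoid_focal_loss(y_true, probs, alpha=0.25, gamma=2.0, eps=1e-7):
    """Sigmoid focal loss for one example, summed over classes.

    y_true: per-class targets in [0, 1] (soft labels from MixUp are fine).
    probs:  per-class sigmoid outputs in (0, 1).
    alpha and gamma defaults are illustrative assumptions.
    """
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        cross_entropy = -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
        p_t = y * p + (1.0 - y) * (1.0 - p)              # probability given to the target
        alpha_t = y * alpha + (1.0 - y) * (1.0 - alpha)  # class-balance weight
        total += alpha_t * (1.0 - p_t) ** gamma * cross_entropy
    return total
```

The (1 - p_t) ** gamma factor shrinks the loss of examples the model already classifies confidently, so training concentrates on the hard ones, which is helpful when many species look alike. With gamma=0 and alpha=0.5 the function reduces to half of the ordinary binary cross-entropy.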

The result was a very accurate model, with around 93% accuracy across all species combinations.

Prototype of the mobile app created as an interface for the model.

location

Besides the image input, WikiAves also has access to location data, which can immediately rule out many suggestions that are visually similar but do not occur in that area.

The data was a list of coordinates where each species had been seen, so I tried to picture what the probability distribution should look like and looked for an algorithm that matched my intuition.

There are many fancier algorithms from biology that take terrain and biome into account, but since I wanted something simple, I chose to train and tune a One-Class SVM for each species.

Using each trained model and a grid of coordinates covering Brazil, I generated a precomputed table of the probability that each species could be seen at a given coordinate.
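
The precomputation step can be sketched roughly as follows. To stay dependency-free, this sketch swaps the One-Class SVM for a simpler stand-in: the mean RBF kernel similarity to a species' sightings, which is a One-Class SVM decision function with uniform weights and no offset. The function names, the gamma value, and the normalization by the maximum score are all assumptions, and the resulting values are relative scores rather than calibrated probabilities.

```python
import math


def rbf_score(point, sightings, gamma=0.5):
    """Mean RBF similarity between a point and a species' sightings.

    Stand-in for a trained One-Class SVM's decision function (assumption).
    """
    lat, lon = point
    total = sum(
        math.exp(-gamma * ((lat - s_lat) ** 2 + (lon - s_lon) ** 2))
        for s_lat, s_lon in sightings
    )
    return total / len(sightings)


def build_grid(lat_range, lon_range, step):
    """Regular grid of (lat, lon) tuples, bounds inclusive."""
    n_lat = int(round((lat_range[1] - lat_range[0]) / step)) + 1
    n_lon = int(round((lon_range[1] - lon_range[0]) / step)) + 1
    return [
        (lat_range[0] + i * step, lon_range[0] + j * step)
        for i in range(n_lat)
        for j in range(n_lon)
    ]


def precompute_presence(sightings_by_species, grid, gamma=0.5):
    """Build {species: {(lat, lon): score in [0, 1]}} once, ahead of time."""
    table = {}
    for species, sightings in sightings_by_species.items():
        scores = [rbf_score(point, sightings, gamma) for point in grid]
        top = max(scores) or 1.0  # guard against an all-zero column
        table[species] = {point: s / top for point, s in zip(grid, scores)}
    return table
```

At prediction time, the location score then becomes a dictionary lookup on the grid cell nearest to the photo's coordinates, so no per-species model has to run inside the request.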

final model

I then hand-crafted an equation to weight the two models, again based on my intuition of how it should behave: the visual result should always carry more weight than location. This ensemble of the deep learning and SVM models brought the prediction accuracy to 95+%.

Since it's a relatively small model, I optimized the deep learning model with TFLite and deployed it on AWS Lambda, where it runs without GPU acceleration and responds in under 2 s, an acceptable latency given the low cost and the use case.

Nowadays any user can use the model, and the number of misidentified birds has dropped by over 50%.
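
The exact equation is not reproduced here, but one possible shape for such a weighting is a weighted geometric mean in which the visual score gets the larger exponent. Everything in this sketch is an assumption chosen for illustration: the function name, the 0.8 visual weight, and the floor that keeps location from ever fully vetoing the image model.

```python
def combine(visual, location, visual_weight=0.8, floor=0.05):
    """Blend per-species visual and location scores into one ranking.

    visual, location: {species: score in [0, 1]}.
    visual_weight must be above 0.5 so the image model always dominates.
    The weight and the floor are illustrative, not tuned values.
    """
    if not 0.5 < visual_weight <= 1.0:
        raise ValueError("visual_weight must be in (0.5, 1.0]")
    location_weight = 1.0 - visual_weight

    scores = {}
    for species, p_vis in visual.items():
        # Species missing from the location table fall back to the floor.
        p_loc = max(location.get(species, 0.0), floor)
        scores[species] = (p_vis ** visual_weight) * (p_loc ** location_weight)

    total = sum(scores.values())
    if total == 0.0:
        return scores  # the image model gave every species a zero score
    return {species: s / total for species, s in scores.items()}
```

With these values, location can flip a close call between two similar-looking species, but it cannot promote a species the image model considers very unlikely.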

audio classification

After this, the next natural step was classifying birds by their sound. The funny thing about this task is that the model itself is pretty much the same as the one used for image classification.

The trick is that sound waves can be converted to spectrograms, which turns the problem of classifying audio into a problem of classifying an image. This approach often produces better results than working with the raw waveform.

Although I trained some audio classification models, this part was never fully finished. The lockdown was already over, and unfortunately I had to turn to other tasks and my professional career. Besides loving the technical part, I had a lot of fun getting to know and working with members of the birding community.
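
As a rough illustration of that conversion, the sketch below computes a magnitude spectrogram with a short-time Fourier transform written in plain Python. The function name, frame size, and hop length are assumptions, and the naive DFT loop is only meant to show the idea; in a real pipeline a library routine such as tf.signal.stft would do this work, usually followed by a mel or log scaling.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Magnitude spectrogram: one list of frequency-bin magnitudes per frame.

    Rows are time frames and columns are frequency bins, so the result can
    be treated as a single-channel image.
    """
    # Hann window to reduce leakage at the frame edges.
    window = [
        0.5 - 0.5 * math.cos(2.0 * math.pi * n / (frame_size - 1))
        for n in range(frame_size)
    ]
    n_bins = frame_size // 2 + 1  # a real signal only needs the non-negative bins

    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        for k in range(n_bins):
            # Naive DFT for bin k (O(n^2), fine for a demonstration).
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames
```

A pure tone shows up as a bright horizontal line at its frequency bin, and a bird call as a more complex shape, which is exactly the kind of pattern an image classifier can pick up.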