Decoding Habitat Photos


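In this lesson, we will use Detectron2, an open-source object detection library, with a Mask R-CNN model that has already been trained on the COCO dataset (200,000+ labeled images of everyday scenes). We will use it to detect and classify objects, such as container types, in citizen science photos.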

Before running any code, go to Edit -> Notebook settings -> Hardware accelerator and select T4 GPU. Running the model on a GPU is much faster than running it on the CPU.

Install Detectron2 in Google Colab. This installs it only in the temporary Colab environment, not on your computer, and it is removed when the Colab session ends. Installation may take a few minutes.

%%capture

!python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

import torch, detectron2

# Common libraries
import os
import pandas as pd
pd.set_option("display.max_columns", None)
from skimage import io
import geopandas as gpd
from datetime import datetime
from google.colab.patches import cv2_imshow

# Dataset preparation and loading
from detectron2.data.datasets import register_coco_instances
from detectron2.data import DatasetCatalog, MetadataCatalog

# Visualization
from detectron2.utils.visualizer import Visualizer
from detectron2.utils.visualizer import ColorMode

# Configuration
from detectron2 import model_zoo
from detectron2.config import get_cfg

# Training and evaluation
from detectron2.engine import DefaultTrainer
from detectron2.engine import DefaultPredictor

First, we need the photos. They are stored as URLs in the mosquito dataset downloaded in Chapter 1. Observers can submit several kinds of photos, stored in the AbdomenCloseupPhotoUrls, LarvaFullBodyPhotoUrls, and WaterSourcePhotoUrls columns. For this exercise, we will focus on WaterSourcePhotoUrls.
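If an observation includes more than one water-source photo, the WaterSourcePhotoUrls field may hold several links in a single string. The following is a minimal sketch, assuming the links are separated by semicolons (check a few rows of your own download to confirm the delimiter). The helper name `split_photo_urls` is hypothetical.

```python
def split_photo_urls(field, sep=";"):
    """Split a photo-URL field into a list of clean URLs.

    Assumes multiple URLs are joined by `sep` (an assumption about the
    GLOBE export). Missing values (None, or NaN from pandas) give [].
    """
    # pandas stores missing strings as float NaN, so treat any non-string as empty
    if not isinstance(field, str):
        return []
    # Trim whitespace around each piece and drop empty fragments
    return [part.strip() for part in field.split(sep) if part.strip()]


# Made-up field value that combines two photo links into one string
urls = split_photo_urls(
    "https://data.globe.gov/system/photos/2024/12/31/4294551/original.jpg; "
    "https://data.globe.gov/system/photos/2024/12/30/4293975/original.jpg"
)
print(urls)
```

Applied to the whole column, for example `mosquito['WaterSourcePhotoUrls'].apply(split_photo_urls).explode()`, this would give one row per photo link.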

mosquito = gpd.read_file('https://github.com/geo-di-lab/emerge-lessons/raw/refs/heads/main/docs/data/globe_mosquito.zip')

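# Define the model once; rebuilding the predictor on every call would reload the weights each time
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # discard predictions below 50% confidence
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.DEVICE = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU if no GPU is attached
predictor = DefaultPredictor(cfg)

def detect_image(url):
  # skimage loads RGB, but DefaultPredictor expects BGR by default (cfg.INPUT.FORMAT),
  # so drop any alpha channel and reverse the channel order
  image = io.imread(url)[:, :, :3][:, :, ::-1]

  # Estimate what objects are in the image
  outputs = predictor(image)

  # Visualizer expects RGB, so flip back; cv2_imshow expects BGR, so flip the drawing too
  visualizer = Visualizer(image[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
  out = visualizer.draw_instance_predictions(outputs["instances"].to("cpu"))
  cv2_imshow(out.get_image()[:, :, ::-1])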
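Besides drawing the image, the predictor's output can be read as data. In Detectron2, `outputs["instances"]` carries fields such as `pred_classes` (indices into the COCO category list), `scores`, and `pred_boxes`. The sketch below uses a hypothetical helper, `summarize_predictions`, that works on plain Python lists so it can be tried without a GPU. The commented lines show one way to pull those lists from a real prediction.

```python
def summarize_predictions(class_ids, scores, class_names, min_score=0.0):
    """Pair each predicted class index with its name and score.

    Returns (name, score) tuples sorted from most to least confident,
    keeping only predictions with score >= min_score.
    """
    rows = [
        (class_names[cid], float(score))
        for cid, score in zip(class_ids, scores)
        if score >= min_score
    ]
    # Highest-confidence detections first
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Given a Detectron2 prediction `outputs = predictor(image)`, the inputs could come from:
#   inst = outputs["instances"].to("cpu")
#   class_ids = inst.pred_classes.tolist()
#   scores = inst.scores.tolist()
#   class_names = MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes

# Made-up values for illustration only
demo_names = ["person", "bowl", "sink"]
print(summarize_predictions([2, 1], [0.61, 0.87], demo_names))
```

A list like this is easier to store in a DataFrame alongside each observation than an annotated picture.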

# Get the water-source photo links from the first 10 GLOBE observations that have one
entry = mosquito.dropna(subset=['WaterSourcePhotoUrls']).iloc[0:10]

print(entry['WaterSourcePhotoUrls'].values)

detect_image('https://data.globe.gov/system/photos/2024/12/31/4294551/original.jpg')

[Output image: detection results]

detect_image('https://data.globe.gov/system/photos/2024/12/30/4293975/original.jpg')

[Output image: detection results]

The model appears to have identified the objects in these two images correctly. In the second image, however, the same object was labeled as both a bowl and a sink, which suggests the model was uncertain about what the object was.
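Detectron2's Mask R-CNN applies non-maximum suppression (NMS) separately for each class, so one physical object can survive with two different labels, as in the bowl/sink case. One possible post-processing step is class-agnostic NMS. This is sketched here as an assumption, not part of the lesson's method: when two boxes overlap heavily (high intersection over union, IoU), keep only the higher-scoring one. Boxes are `(x1, y1, x2, y2)`, the same layout as `pred_boxes`. The function names and the 0.7 cutoff are illustrative choices.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width/height of the overlap are clamped at zero when boxes do not touch
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_nms(boxes, scores, iou_thresh=0.7):
    """Greedy NMS that ignores class labels; returns kept indices, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not heavily overlap a box already kept
        if all(iou(boxes[i], boxes[k]) < iou_thresh for k in keep):
            keep.append(i)
    return keep


# Made-up detections: two near-identical boxes labeled differently, plus one separate box
boxes = [(10, 10, 110, 110), (12, 8, 108, 112), (200, 200, 260, 260)]
scores = [0.62, 0.81, 0.90]
labels = ["sink", "bowl", "cup"]
keep = class_agnostic_nms(boxes, scores)
print([labels[i] for i in keep])
```

With a real prediction, the boxes could come from `inst.pred_boxes.tensor.tolist()`. `torchvision.ops.nms`, which takes no class argument, is a tested alternative to this pure-Python loop.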

detect_image('https://data.globe.gov/system/photos/2024/12/31/4295325/original.jpg')

[Output image: detection results]

detect_image('https://data.globe.gov/system/photos/2024/12/31/4294112/original.jpg')

[Output image: detection results]

The classifications were worse for these two images: in both, the model falsely identified the object as a toilet.

detect_image('https://data.globe.gov/system/photos/2024/12/29/4292139/original.jpg')

[Output image: no detections]

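For this image of trash and grass, the model made no detections at all. Most likely no candidate object reached the 0.5 confidence threshold set by SCORE_THRESH_TEST. The 80 COCO object categories the model was trained on also include no general "trash" or "grass" class.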
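Because the configuration sets SCORE_THRESH_TEST to 0.5, any candidate below that confidence is discarded before anything is drawn. To see whether the model found weak candidates in a photo like this one, you could rebuild the predictor with a lower value (for example `cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.05`, followed by a new `DefaultPredictor(cfg)`) and compare how many detections survive at different cutoffs. The following minimal sketch uses a hypothetical helper and made-up scores.

```python
def count_by_threshold(scores, thresholds):
    """Count how many detections meet or exceed each confidence threshold."""
    return {t: sum(1 for s in scores if s >= t) for t in thresholds}


# Made-up scores, as if from a predictor rebuilt with SCORE_THRESH_TEST = 0.05
demo_scores = [0.42, 0.18, 0.11, 0.07]
print(count_by_threshold(demo_scores, [0.1, 0.3, 0.5]))
```

Low-scoring detections are often wrong, so a lower threshold trades more false positives for fewer missed objects.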

We’ve explored using an open-source computer vision model to identify objects in photos submitted to GLOBE Observer of water sources where mosquitoes may be found. The main benefit of such models is the time saved compared with manually reviewing and classifying every image. However, as several of these examples show, the classifications are not always accurate. It is important to weigh this trade-off and to consider ways to improve the results, such as training the model on our own pre-labeled data. Several other open-source models can also detect and classify objects, and some may work better than Detectron2 for these water source images.
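Training on our own pre-labeled data would start with annotations in a format Detectron2 can load. `register_coco_instances` (imported at the top of this lesson) reads COCO-style JSON. The sketch below builds a minimal box-only COCO dictionary. The helper name, the container labels, and the input layout are hypothetical. A Mask R-CNN model would also need segmentation masks; a box-only config such as Faster R-CNN avoids that.

```python
import json


def build_coco_dict(images, category_names):
    """Build a minimal COCO-style detection dictionary.

    `images` is a list of dicts with keys file_name, width, height and
    boxes, where boxes is a list of (label, x, y, w, h) in pixels.
    """
    categories = [{"id": i + 1, "name": name} for i, name in enumerate(category_names)]
    name_to_id = {c["name"]: c["id"] for c in categories}
    coco = {"images": [], "annotations": [], "categories": categories}
    ann_id = 1
    for img_id, img in enumerate(images, start=1):
        coco["images"].append({
            "id": img_id,
            "file_name": img["file_name"],
            "width": img["width"],
            "height": img["height"],
        })
        for label, x, y, w, h in img["boxes"]:
            coco["annotations"].append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": name_to_id[label],
                "bbox": [x, y, w, h],  # COCO boxes are [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco


# Hypothetical labels and a made-up photo for water-source containers
demo = build_coco_dict(
    [{"file_name": "photo_001.jpg", "width": 640, "height": 480,
      "boxes": [("bucket", 100, 120, 200, 150), ("tire", 350, 200, 180, 120)]}],
    ["bucket", "tire"],
)
json_text = json.dumps(demo, indent=2)
print(json_text)
```

Saved to a file, this could be registered with `register_coco_instances("habitat_train", {}, "habitat_train.json", "images/")`, used as `cfg.DATASETS.TRAIN = ("habitat_train",)` with `cfg.MODEL.ROI_HEADS.NUM_CLASSES` set to the number of categories, and trained with `DefaultTrainer`.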
