Wrapping up DeepForest Agent with Spatial Analysis


I am an AI/ML enthusiast with a strong passion for bridging technology and social impact. I love solving complex problems whether it's solving confusion on any AI/ML concept or building LLM systems for real-world applications.

After months of development, the DeepForest Agent with spatial analysis capabilities is now complete. This multi-agent system transforms raw detections from ecological images into comprehensive spatial narratives, making complex computer vision results accessible through natural language. Let me walk you through the key features and technical implementation that make this system unique.

The Core Challenge: From Pixels to Ecological Understanding

Traditional object detection gives you bounding boxes and confidence scores. But ecologists need to understand relationships: how wildlife interacts with vegetation, where density patterns emerge, and what spatial arrangements reveal about ecosystem health. That's where our spatial analysis system comes in.

Features

  • Object Detection Summaries
    Provides counts, labels (e.g., bird, tree, livestock), and classification details (alive/dead tree).

  • Confidence-Based Insights
    Separates detections into low, medium, and high-confidence groups, so users can judge how much to trust each result.

  • Spatial Distribution Analysis
    Divides the image into a 3×3 grid (northwest, north, etc.) and reports localized density and coverage.

  • Spatial Relationship Queries
    Uses R-tree indexing to find intersecting objects, nearest neighbors, and overlapping detections.

  • Coverage Estimation
    Calculates what percentage of the image is occupied by each object type.

  • Memory of Past Results
    Keeps track of previous analyses, so users can compare new results with previous ones.
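As a rough illustration of the coverage estimation feature, here is a minimal sketch. The function name `coverage_by_label` and the dictionary keys are assumptions for this example, not the project's actual API. It also sums box areas directly, so overlapping boxes of the same label are counted more than once; the real implementation may handle overlap differently.

```python
from collections import defaultdict
from typing import Dict, List


def coverage_by_label(detections: List[dict], image_width: int,
                      image_height: int) -> Dict[str, float]:
    """Return the percentage of image area covered by each label.

    Assumption: box areas are simply summed, so overlapping boxes
    of the same label are double counted.
    """
    image_area = image_width * image_height
    if image_area <= 0:
        raise ValueError("image dimensions must be positive")

    area_by_label: Dict[str, float] = defaultdict(float)
    for det in detections:
        # Guard against inverted boxes by flooring width/height at zero.
        width = max(0.0, det["xmax"] - det["xmin"])
        height = max(0.0, det["ymax"] - det["ymin"])
        area_by_label[det["label"]] += width * height

    return {label: 100.0 * area / image_area
            for label, area in area_by_label.items()}


detections = [
    {"label": "tree", "xmin": 0, "ymin": 0, "xmax": 100, "ymax": 100},
    {"label": "tree", "xmin": 200, "ymin": 200, "xmax": 300, "ymax": 300},
    {"label": "bird", "xmin": 10, "ymin": 10, "xmax": 60, "ymax": 60},
]
coverage = coverage_by_label(detections, image_width=1000, image_height=1000)
```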

Tile Image for Analysis

  • Purpose: Breaks large images into smaller, manageable tiles for efficient analysis with vision-language models (e.g., Qwen-VL).

  • Why: Large images (~9000×6000 px) can exceed memory limits; tiling avoids out-of-memory errors and lets the model analyze each region at a manageable size.

  • Primary Method: Uses the slidingwindow library to generate overlapping windows, reading from raster files or PIL images.

  • Fallback: If slidingwindow fails, uses DeepForest preprocessing (normalizes image → computes windows → converts tiles to PIL).
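To make the tiling idea concrete, here is a minimal pure-Python sketch of how overlapping windows can be computed. It does not call slidingwindow or DeepForest; the function name `compute_windows`, the pixel-based `overlap_px` parameter, and the tile size used below are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def compute_windows(width: int, height: int, tile_size: int,
                    overlap_px: int) -> List[Tuple[int, int, int, int]]:
    """Return (x, y, w, h) windows that cover the image with overlap."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    if not 0 <= overlap_px < tile_size:
        raise ValueError("overlap_px must be in [0, tile_size)")
    stride = tile_size - overlap_px

    def starts(length: int) -> List[int]:
        # An image smaller than one tile gets a single window.
        if length <= tile_size:
            return [0]
        positions = list(range(0, length - tile_size + 1, stride))
        # Add a final window flush with the edge so nothing is left uncovered.
        if positions[-1] != length - tile_size:
            positions.append(length - tile_size)
        return positions

    return [
        (x, y, min(tile_size, width - x), min(tile_size, height - y))
        for y in starts(height)
        for x in starts(width)
    ]


# Hypothetical settings for a ~9000x6000 px image.
windows = compute_windows(9000, 6000, tile_size=1500, overlap_px=150)
small = compute_windows(1000, 800, tile_size=1500, overlap_px=150)
```

Each window can then be cropped from the source image and passed to the vision-language model on its own, which keeps peak memory bounded by the tile size rather than the full image.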

Detection Narrative Generator & Spatial Analysis with R-tree

1. Purpose and Features

The DetectionNarrativeGenerator works alongside the DetectionSpatialAnalyzer to convert raw DeepForest detection outputs into natural language narratives for ecological images. Its main goals:

  • Provide comprehensive object summaries, including base and classification labels (e.g., tree → alive/dead).

  • Perform confidence analysis, separating high, medium, and low-confidence detections.

  • Analyze spatial distribution across a 3×3 grid layout of the image.

  • Extract spatial relationships using R-tree indexing (nearest neighbors, intersecting objects).

  • Compute object coverage, showing the percentage of image area occupied by each object type.
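The 3×3 grid distribution listed above can be sketched as follows. This is an illustrative example under stated assumptions: detections are assigned to a cell by their bounding-box centroid, and the helper names `grid_cell` and `count_by_region` are hypothetical. The region names follow the compass naming used in this post.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Keys are (column, row) indices, with (0, 0) at the top-left of the image.
GRID_NAMES = {
    (0, 0): "Top-Left (Northwest)", (1, 0): "Top-Center (North)",
    (2, 0): "Top-Right (Northeast)",
    (0, 1): "Middle-Left (West)", (1, 1): "Center",
    (2, 1): "Middle-Right (East)",
    (0, 2): "Bottom-Left (Southwest)", (1, 2): "Bottom-Center (South)",
    (2, 2): "Bottom-Right (Southeast)",
}


def grid_cell(cx: float, cy: float, image_width: int,
              image_height: int) -> Tuple[int, int]:
    """Map a point to its (column, row) cell, clamping edge points to 0..2."""
    col = min(2, max(0, int(cx // (image_width / 3))))
    row = min(2, max(0, int(cy // (image_height / 3))))
    return col, row


def count_by_region(detections: List[dict], image_width: int,
                    image_height: int) -> Dict[str, Counter]:
    """Count labels per named region, assigning each box by its centroid."""
    counts: Dict[str, Counter] = {name: Counter() for name in GRID_NAMES.values()}
    for det in detections:
        cx = (det["xmin"] + det["xmax"]) / 2
        cy = (det["ymin"] + det["ymax"]) / 2
        region = GRID_NAMES[grid_cell(cx, cy, image_width, image_height)]
        counts[region][det["label"]] += 1
    return counts


regions = count_by_region(
    [
        {"label": "bird", "xmin": 10, "ymin": 10, "xmax": 50, "ymax": 50},
        {"label": "tree", "xmin": 400, "ymin": 250, "xmax": 500, "ymax": 350},
    ],
    image_width=900,
    image_height=600,
)
```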

2. Core Components

A. DetectionSpatialAnalyzer

File: rtree_spatial_utils.py

  • Uses an R-tree index to efficiently store and query detections with bounding boxes (xmin, ymin, xmax, ymax).

  • Clamps bounding boxes to image dimensions and skips zero-area or invalid detections.

  • Adds extra metadata: centroid_x, centroid_y, area, detection_id.
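The clamping and metadata steps above might look roughly like this. It is a sketch, not the code from rtree_spatial_utils.py: the function name `enrich_detection` is hypothetical, and only the field names (centroid_x, centroid_y, area, detection_id) are taken from the description.

```python
from typing import Optional


def enrich_detection(det: dict, detection_id: int, image_width: int,
                     image_height: int) -> Optional[dict]:
    """Clamp a box to the image and add derived metadata.

    Returns None for boxes that have zero area after clamping.
    """
    xmin = min(max(det["xmin"], 0), image_width)
    ymin = min(max(det["ymin"], 0), image_height)
    xmax = min(max(det["xmax"], 0), image_width)
    ymax = min(max(det["ymax"], 0), image_height)

    # Skip inverted, zero-area, or fully out-of-bounds boxes.
    if xmax <= xmin or ymax <= ymin:
        return None

    return {
        **det,
        "xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax,
        "detection_id": detection_id,
        "centroid_x": (xmin + xmax) / 2,
        "centroid_y": (ymin + ymax) / 2,
        "area": (xmax - xmin) * (ymax - ymin),
    }


inside = enrich_detection(
    {"label": "bird", "score": 0.9, "xmin": -10, "ymin": 5, "xmax": 50, "ymax": 45},
    detection_id=0, image_width=100, image_height=100,
)
outside = enrich_detection(
    {"label": "tree", "score": 0.8, "xmin": 120, "ymin": 10, "xmax": 150, "ymax": 40},
    detection_id=1, image_width=100, image_height=100,
)
```

The enriched box is what would then be inserted into the spatial index, keyed by detection_id.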

Key Methods:

  1. add_detections – Adds detection dictionaries to R-tree and stores enriched metadata.

  2. get_grid_analysis – We divide images into nine regions (northwest, north, northeast, etc.) to provide localized insights:

     def get_grid_analysis(self) -> Dict[str, Dict[str, Any]]:
         # Each grid cell spans one third of the image in each direction.
         grid_width = self.image_width / 3
         grid_height = self.image_height / 3

         # Keys are (column, row) indices, with (0, 0) at the top-left.
         grid_names = {
             (0, 0): "Top-Left (Northwest)", (1, 0): "Top-Center (North)",
             (2, 0): "Top-Right (Northeast)",
             (0, 1): "Middle-Left (West)", (1, 1): "Center",
             (2, 1): "Middle-Right (East)",
             (0, 2): "Bottom-Left (Southwest)", (1, 2): "Bottom-Center (South)",
             (2, 2): "Bottom-Right (Southeast)"
         }
         # ... (remainder of the method not shown)

    Instead of saying "birds are scattered throughout," the system reports "3 birds in the northwestern region, 2 in the center, concentrated near water sources." This specificity transforms vague observations into actionable data.

  3. _analyze_confidence_categories – Categorizes detections into low, medium, and high confidence and computes per-category statistics (average area, min, max, labels). Not all detections are equally reliable, so the analysis separates results into three categories to help users judge how much weight to give each one:

     # Ranges are half-open: a score of exactly 0.7 is "High", exactly 0.3 is "Medium".
     confidence_groups = {
         "High (0.7-1.0)": [],
         "Medium (0.3-0.7)": [],
         "Low (0.0-0.3)": []
     }

     for detection in detections_list:
         # A detection with no score is treated as 0.0 and lands in "Low".
         score = detection.get('score', 0.0)
         if score >= 0.7:
             confidence_groups["High (0.7-1.0)"].append(detection)
         elif score >= 0.3:
             confidence_groups["Medium (0.3-0.7)"].append(detection)
         else:
             confidence_groups["Low (0.0-0.3)"].append(detection)
    
  4. analyze_spatial_relationships_with_indexing – Uses the R-tree to find intersections and nearest neighbors among detections whose confidence meets or exceeds a threshold.

  5. generate_spatial_narrative – Converts intersection and nearest neighbor info into a natural language narrative.

  6. get_detection_statistics – Returns comprehensive statistics: total count, average confidence, size stats, label distribution, confidence distribution.
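To show what the intersection and nearest-neighbor queries compute, here is a brute-force stand-in written with the standard library only. It is deliberately not an R-tree: it compares every pair of boxes, which is what the index avoids on large inputs. The function names, the centroid-based distance, the strict (non-touching) overlap test, and the default threshold of 0.3 are assumptions for this sketch.

```python
import math
from typing import Dict, List, Optional, Tuple


def boxes_intersect(a: dict, b: dict) -> bool:
    """True if the two boxes overlap with positive area (touching edges do not count)."""
    return (a["xmin"] < b["xmax"] and b["xmin"] < a["xmax"]
            and a["ymin"] < b["ymax"] and b["ymin"] < a["ymax"])


def centroid(det: dict) -> Tuple[float, float]:
    return ((det["xmin"] + det["xmax"]) / 2, (det["ymin"] + det["ymax"]) / 2)


def spatial_relationships(
    detections: List[dict], confidence_threshold: float = 0.3
) -> Tuple[List[Tuple[int, int]], Dict[int, Optional[int]]]:
    """Return intersecting pairs and each detection's nearest neighbor.

    Indices refer to positions in the filtered list of detections
    whose score is at least confidence_threshold.
    """
    kept = [d for d in detections if d.get("score", 0.0) >= confidence_threshold]
    intersections: List[Tuple[int, int]] = []
    nearest: Dict[int, Optional[int]] = {}

    for i, a in enumerate(kept):
        best_j, best_dist = None, math.inf
        for j, b in enumerate(kept):
            if i == j:
                continue
            # Record each intersecting pair once, as (smaller, larger) index.
            if i < j and boxes_intersect(a, b):
                intersections.append((i, j))
            dist = math.dist(centroid(a), centroid(b))
            if dist < best_dist:
                best_j, best_dist = j, dist
        nearest[i] = best_j

    return intersections, nearest


sample = [
    {"label": "tree", "score": 0.9, "xmin": 0, "ymin": 0, "xmax": 100, "ymax": 100},
    {"label": "bird", "score": 0.8, "xmin": 50, "ymin": 50, "xmax": 80, "ymax": 80},
    {"label": "bird", "score": 0.6, "xmin": 300, "ymin": 300, "xmax": 320, "ymax": 320},
    {"label": "livestock", "score": 0.1, "xmin": 60, "ymin": 60, "xmax": 90, "ymax": 90},
]
pairs, neighbors = spatial_relationships(sample, confidence_threshold=0.3)
```

An R-tree answers the same two questions without comparing every pair, which matters once an image contains hundreds or thousands of detections.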

B. DetectionNarrativeGenerator

File: detection_narrative_generator.py

  • Wraps the spatial analyzer to produce human-readable narratives.

  • Handles classification labels, e.g., distinguishing alive from dead trees.

  • Incorporates multiple narrative dimensions: overall summary, confidence, spatial distribution, relationships, coverage.

  • This generates narratives like:

    "Overall Detection Summary
    In the whole image, 23 objects were detected with an average confidence of 0.743.
    Object breakdown: 15 trees (8 alive trees, 7 dead trees), 5 birds, 3 livestock.

    Spatial Distribution Analysis
    Northwestern region: 8 objects detected - 5 trees (3 alive, 2 dead), 2 birds
    Central region: 7 objects detected - 4 trees (all alive), 3 livestock

    Spatial Relationships Analysis
    I am 87% confident that, in the northwestern region, 2 birds are intersected around an alive tree at location (245, 123)."
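As a hedged illustration of how the overall summary portion of such a narrative could be assembled from detection dictionaries, consider the sketch below. The function names and the simple pluralization rule are assumptions; the actual generator in detection_narrative_generator.py also handles classification labels and the other narrative sections, which are omitted here.

```python
from collections import Counter
from typing import List

# Labels whose plural form equals the singular (illustrative, not exhaustive).
INVARIANT_PLURALS = {"livestock"}


def pluralize(label: str, count: int) -> str:
    """Very small pluralization helper for the labels used in this post."""
    if count == 1 or label in INVARIANT_PLURALS:
        return label
    return label + "s"


def overall_summary(detections: List[dict]) -> str:
    """Build the 'Overall Detection Summary' sentences from raw detections."""
    if not detections:
        return "No objects were detected."

    average = sum(d["score"] for d in detections) / len(detections)
    counts = Counter(d["label"] for d in detections)
    breakdown = ", ".join(
        f"{n} {pluralize(label, n)}" for label, n in counts.most_common()
    )
    return (
        f"In the whole image, {len(detections)} objects were detected "
        f"with an average confidence of {average:.3f}. "
        f"Object breakdown: {breakdown}."
    )


summary = overall_summary([
    {"label": "tree", "score": 0.9},
    {"label": "tree", "score": 0.7},
    {"label": "bird", "score": 0.8},
])
```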

Deployment and Future Vision

The system is now available as a Hugging Face Space (pending PR approval): https://huggingface.co/spaces/weecology/deepforest-agent

Latest Commit: https://github.com/weecology/deepforest-agent/pull/3/commits/641e1396bb47f04f098ab6c28f4693b5d7cf47d7

The roadmap includes two major enhancements:

1. Zoom-Based Analysis: Users will click and drag to select image regions for focused analysis.

2. Spatial Queries as Tools: Integration of spatial analysis directly into the agent workflow.

This system bridges the gap between computer vision capabilities and ecological understanding. Instead of technical jargon about bounding boxes and confidence scores, researchers get narratives like:

"The northwestern wetland area shows high bird activity with 5 detections near alive vegetation. Spatial analysis reveals 3 birds clustered around a central tree, suggesting this location serves as a roosting site. Dead trees in the southeastern region show no associated wildlife, indicating possible habitat degradation."

The DeepForest Agent with spatial analysis transforms ecological image analysis from a technical exercise into an intuitive conversation. By combining multi-agent AI architecture with sophisticated spatial indexing, we've created a system that thinks about ecology the way ecologists do: spatially, relationally, and contextually.

Google Summer of Code Blogs

Part 10 of 10

This blog series is a tracker of my progress as a GSoC 2025 contributor under NumFOCUS. I’ll be noting down the gist of my implementations, useful resources, bugs I’ve faced and how I solved them, along with my early plans, ideas, and reflections.

Start from the beginning

Before GSoC

Introduction I am Samia Haque Tisha, a recent graduate in Software Engineering from Daffodil International University. I am passionate about Computer Vision, AI Agents, and open-source contributions. I am currently working on integrating Large Langua...