Wrapping up DeepForest Agent with Spatial Analysis

After months of development, the DeepForest Agent with spatial analysis capabilities is now complete. This multi-agent system turns raw object detections from ecological images into spatial narratives, making complex computer vision results accessible through natural language. Let me walk you through the key features and the technical implementation behind them.

The Core Challenge: From Pixels to Ecological Understanding

Traditional object detection gives you bounding boxes and confidence scores. But ecologists need to understand relationships: how wildlife interacts with vegetation, where density patterns emerge, and what spatial arrangements reveal about ecosystem health. That's where our spatial analysis system comes in.

Features

Object Detection Summaries
Provides counts, labels (e.g., bird, tree, livestock), and classification details (alive/dead tree).
Confidence-Based Insights
Separates detections into low-, medium-, and high-confidence groups, helping users judge how much to trust each result.
Spatial Distribution Analysis
Divides the image into a 3×3 grid (northwest, north, etc.) and reports localized density and coverage.
Spatial Relationship Queries
Uses R-tree indexing to find intersecting objects, nearest neighbors, and overlapping detections.
Coverage Estimation
Calculates what percentage of the image is occupied by each object type.
Memory of Past Results
Keeps track of previous analyses, so users can compare new results with previous ones.
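Coverage estimation is area arithmetic: a label's coverage is the summed area of its boxes divided by the image area, times 100. The sketch below is a minimal illustration. It assumes detections are dicts with label, xmin, ymin, xmax and ymax keys, which are the bounding-box fields used by the spatial analyzer described later. The function name coverage_by_label is hypothetical. Summing areas double-counts overlapping boxes of the same label, and the real implementation may handle overlaps differently.

```python
from collections import defaultdict
from typing import Dict, List


def coverage_by_label(detections: List[dict], image_width: int, image_height: int) -> Dict[str, float]:
    """Percentage of the image area covered by each label's bounding boxes.

    Assumption: box areas are summed per label, so overlapping boxes of the
    same label are double-counted; each result is capped at 100%.
    """
    image_area = image_width * image_height
    area_by_label: Dict[str, float] = defaultdict(float)
    for det in detections:
        # Guard against inverted boxes producing negative widths/heights
        width = max(0.0, det["xmax"] - det["xmin"])
        height = max(0.0, det["ymax"] - det["ymin"])
        area_by_label[det["label"]] += width * height
    return {
        label: min(100.0, 100.0 * area / image_area)
        for label, area in area_by_label.items()
    }


detections = [
    {"label": "tree", "xmin": 0, "ymin": 0, "xmax": 50, "ymax": 40},
    {"label": "tree", "xmin": 100, "ymin": 100, "xmax": 150, "ymax": 140},
    {"label": "bird", "xmin": 10, "ymin": 10, "xmax": 20, "ymax": 20},
]
```

Take a 200×200 image with two 50×40 tree boxes and one 10×10 bird box. coverage_by_label(detections, 200, 200) reports 10.0% tree coverage and 0.25% bird coverage.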

Tile Image for Analysis

Purpose: Breaks large images into smaller, manageable tiles for efficient analysis with vision-language models (e.g., Qwen-VL).
Why: Large images (~9000×6000 px) can exceed memory limits; tiling avoids out-of-memory errors and helps preserve analysis accuracy.
Primary Method: Uses the slidingwindow library to generate overlapping windows, reading from raster files or PIL images.
Fallback: If slidingwindow fails, uses DeepForest preprocessing (normalizes image → computes windows → converts tiles to PIL).
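The core of tiling is computing overlapping window coordinates. The following is a minimal pure-Python sketch of that step. It is not the slidingwindow library's API or DeepForest's preprocessing. The function names, and the window size and overlap values in the usage note, are hypothetical. The sketch steps across each axis by size × (1 − overlap). It then shifts the last window back so that it ends flush with the image edge.

```python
from typing import List, Tuple


def window_starts(length: int, size: int, step: int) -> List[int]:
    """Start offsets along one axis; the last window is shifted to end at the edge."""
    if length <= size:
        return [0]  # the whole axis fits in one window
    starts = list(range(0, length - size + 1, step))
    if starts[-1] + size < length:
        starts.append(length - size)  # cover the remaining strip
    return starts


def sliding_windows(width: int, height: int, size: int, overlap: float) -> List[Tuple[int, int, int, int]]:
    """Return (x, y, w, h) windows of at most `size` px with fractional `overlap`."""
    step = max(1, int(round(size * (1 - overlap))))
    windows = []
    for y in window_starts(height, size, step):
        for x in window_starts(width, size, step):
            windows.append((x, y, min(size, width - x), min(size, height - y)))
    return windows
```

Consider a 9000×6000 image with 2000 px windows and 10% overlap. This yields 5 columns × 4 rows = 20 windows. Each (x, y, w, h) tuple maps to a PIL crop box as image.crop((x, y, x + w, y + h)).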

Detection Narrative Generator & Spatial Analysis with R-tree

1. Purpose and Features

The DetectionNarrativeGenerator works alongside the DetectionSpatialAnalyzer to convert raw DeepForest detection outputs into natural language narratives for ecological images. Its main goals are to:

Provide comprehensive object summaries, including base and classification labels (e.g., tree → alive/dead).
Perform confidence analysis, separating high, medium, and low-confidence detections.
Analyze spatial distribution across a 3×3 grid layout of the image.
Extract spatial relationships using R-tree indexing (nearest neighbors, intersecting objects).
Compute object coverage, showing the percentage of image area occupied by each object type.

2. Core Components

File: rtree_spatial_utils.py

A. DetectionSpatialAnalyzer

Uses an R-tree index to efficiently store and query detections with bounding boxes (xmin, ymin, xmax, ymax).
Clamps bounding boxes to image dimensions and skips zero-area or invalid detections.
Adds extra metadata: centroid_x, centroid_y, area, detection_id.
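The clamping and enrichment rules above can be sketched for a single detection as follows. This is an illustration, not the code in rtree_spatial_utils.py. The function name enrich_detection is hypothetical, and assigning detection_id from a caller-supplied index is an assumption. The clamped (xmin, ymin, xmax, ymax) tuple is the kind of bounding box that would then go into an R-tree, for example with the rtree package's Index.insert(id, coordinates).

```python
from typing import Optional


def enrich_detection(det: dict, detection_id: int, image_width: int, image_height: int) -> Optional[dict]:
    """Clamp a box to the image and add centroid/area metadata; None if the box is invalid."""
    # Clamp each coordinate into [0, image_width] or [0, image_height]
    xmin = min(max(float(det["xmin"]), 0.0), image_width)
    ymin = min(max(float(det["ymin"]), 0.0), image_height)
    xmax = min(max(float(det["xmax"]), 0.0), image_width)
    ymax = min(max(float(det["ymax"]), 0.0), image_height)
    if xmax <= xmin or ymax <= ymin:
        return None  # zero-area or inverted box after clamping: skip it
    return {
        **det,
        "xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax,
        "centroid_x": (xmin + xmax) / 2,
        "centroid_y": (ymin + ymax) / 2,
        "area": (xmax - xmin) * (ymax - ymin),
        "detection_id": detection_id,  # hypothetical: caller-supplied index
    }
```

A box partly outside the image is trimmed to the visible part. A box entirely outside the image collapses to zero area and is dropped.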

Key Methods:

add_detections – Adds detection dictionaries to R-tree and stores enriched metadata.

get_grid_analysis – Divides the image into nine regions (northwest, north, northeast, etc.) to provide localized insights:

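 from typing import Any, Dict

 def get_grid_analysis(self) -> Dict[str, Dict[str, Any]]:
     # Each of the nine cells spans one third of the image width and height
     grid_width = self.image_width / 3
     grid_height = self.image_height / 3

     # Map (column, row) cell indices to human-readable region names
     grid_names = {
         (0, 0): "Top-Left (Northwest)", (1, 0): "Top-Center (North)",
         (2, 0): "Top-Right (Northeast)",
         (0, 1): "Middle-Left (West)", (1, 1): "Center",
         (2, 1): "Middle-Right (East)",
         (0, 2): "Bottom-Left (Southwest)", (1, 2): "Bottom-Center (South)",
         (2, 2): "Bottom-Right (Southeast)"
     }
     # ... (assigning detections to cells and building the result is omitted in this excerpt)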

Instead of saying "birds are scattered throughout," the system reports "3 birds in the northwestern region, 2 in the center, concentrated near water sources." This specificity transforms vague observations into actionable data.
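Building on the grid names above, one minimal way to produce per-region counts is to assign each detection to the cell that contains its centroid. Using the centroid is an assumption here, as opposed to splitting boxes that straddle cell borders. The function name grid_counts and the dict-based detection format are also assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List

GRID_NAMES = {
    (0, 0): "Top-Left (Northwest)", (1, 0): "Top-Center (North)", (2, 0): "Top-Right (Northeast)",
    (0, 1): "Middle-Left (West)", (1, 1): "Center", (2, 1): "Middle-Right (East)",
    (0, 2): "Bottom-Left (Southwest)", (1, 2): "Bottom-Center (South)", (2, 2): "Bottom-Right (Southeast)",
}


def grid_counts(detections: List[dict], image_width: float, image_height: float) -> Dict[str, Counter]:
    """Count labels per 3x3 cell, assigning each detection by its centroid."""
    cell_w, cell_h = image_width / 3, image_height / 3
    counts: Dict[str, Counter] = defaultdict(Counter)
    for det in detections:
        cx = (det["xmin"] + det["xmax"]) / 2
        cy = (det["ymin"] + det["ymax"]) / 2
        # Clamp to 0..2 so centroids on the right/bottom edge land in the last cell
        col = min(max(int(cx // cell_w), 0), 2)
        row = min(max(int(cy // cell_h), 0), 2)
        counts[GRID_NAMES[(col, row)]][det["label"]] += 1
    return dict(counts)
```

Only regions that contain at least one detection appear in the result. This makes it easy to phrase output such as "3 birds in the northwestern region".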

_analyze_confidence_categories – Categorizes detections into low-, medium-, and high-confidence groups and computes per-group statistics (average area, minimum, maximum, labels). Not all detections are equally reliable, so the confidence analysis separates results into three categories to help users gauge how much to trust each one:

 # detections_list: detection dicts, each with a "score" in [0, 1]
 confidence_groups = {
     "High (0.7-1.0)": [],
     "Medium (0.3-0.7)": [],
     "Low (0.0-0.3)": []
 }

 # Lower bounds are inclusive: a score of 0.7 is High, 0.3 is Medium
 for detection in detections_list:
     score = detection.get('score', 0.0)  # a missing score falls into Low
     if score >= 0.7:
         confidence_groups["High (0.7-1.0)"].append(detection)
     elif score >= 0.3:
         confidence_groups["Medium (0.3-0.7)"].append(detection)
     else:
         confidence_groups["Low (0.0-0.3)"].append(detection)
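The statistics step can be sketched by reusing the same bucketing and then summarizing box areas within each group. The sketch reads "average area, minimum, maximum" as the average, minimum and maximum box area; that reading is an assumption. The function name summarize_confidence_groups is also hypothetical.

```python
from statistics import mean
from typing import Dict, List


def summarize_confidence_groups(detections: List[dict]) -> Dict[str, dict]:
    """Bucket detections by score, then report count, area stats and labels per bucket."""
    groups: Dict[str, List[dict]] = {"High (0.7-1.0)": [], "Medium (0.3-0.7)": [], "Low (0.0-0.3)": []}
    for det in detections:
        score = det.get("score", 0.0)
        if score >= 0.7:
            groups["High (0.7-1.0)"].append(det)
        elif score >= 0.3:
            groups["Medium (0.3-0.7)"].append(det)
        else:
            groups["Low (0.0-0.3)"].append(det)

    summary = {}
    for name, members in groups.items():
        areas = [(d["xmax"] - d["xmin"]) * (d["ymax"] - d["ymin"]) for d in members]
        summary[name] = {
            "count": len(members),
            "avg_area": mean(areas) if areas else 0.0,  # empty groups report 0
            "min_area": min(areas, default=0.0),
            "max_area": max(areas, default=0.0),
            "labels": sorted({d["label"] for d in members}),
        }
    return summary
```

Empty groups are still present in the summary with zero values. A narrative can then state explicitly that, for example, no low-confidence detections were found.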

analyze_spatial_relationships_with_indexing – Uses the R-tree to find intersecting boxes and nearest neighbors among detections whose confidence is at or above a threshold.
generate_spatial_narrative – Converts intersection and nearest neighbor info into a natural language narrative.
get_detection_statistics – Returns comprehensive statistics: total count, average confidence, size stats, label distribution, confidence distribution.
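To make the relationship queries concrete, here is a brute-force version of the two questions analyze_spatial_relationships_with_indexing answers. The first is which boxes intersect. The second is which detection is each detection's nearest neighbor. The sketch compares every pair (O(n²)) purely for readability. An R-tree index answers the same queries without scanning all pairs, for example through the rtree package's Index.intersection and Index.nearest methods. The function names, the default threshold of 0.5, the detection_id keys, and measuring nearness between centroids are all assumptions.

```python
import math
from typing import Dict, List, Tuple


def boxes_intersect(a: dict, b: dict) -> bool:
    """True if two axis-aligned boxes overlap with positive area."""
    return a["xmin"] < b["xmax"] and b["xmin"] < a["xmax"] and a["ymin"] < b["ymax"] and b["ymin"] < a["ymax"]


def centroid(d: dict) -> Tuple[float, float]:
    return ((d["xmin"] + d["xmax"]) / 2, (d["ymin"] + d["ymax"]) / 2)


def spatial_relationships(
    detections: List[dict], threshold: float = 0.5
) -> Tuple[List[Tuple[int, int]], Dict[int, int]]:
    """Intersecting id pairs and each detection's nearest neighbor, for score >= threshold."""
    kept = [d for d in detections if d.get("score", 0.0) >= threshold]

    # Check each unordered pair once for overlap
    intersections = []
    for i, a in enumerate(kept):
        for b in kept[i + 1:]:
            if boxes_intersect(a, b):
                intersections.append((a["detection_id"], b["detection_id"]))

    # Nearest neighbor by centroid distance, excluding the detection itself
    nearest: Dict[int, int] = {}
    for a in kept:
        candidates = [(math.dist(centroid(a), centroid(b)), b["detection_id"]) for b in kept if b is not a]
        if candidates:
            nearest[a["detection_id"]] = min(candidates)[1]
    return intersections, nearest
```

Low-confidence detections are filtered out before any pair is compared. A dubious box therefore never shows up as a relationship in the narrative.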

B. DetectionNarrativeGenerator

File: detection_narrative_generator.py

Wraps the spatial analyzer to produce human-readable narratives.
Handles classification labels, e.g., distinguishing alive from dead trees.
Incorporates multiple narrative dimensions: overall summary, confidence, spatial distribution, relationships, coverage.
This generates narratives like:

"Overall Detection Summary
In the whole image, 23 objects were detected with an average confidence of 0.743.
Object breakdown: 15 trees (8 alive trees, 7 dead trees), 5 birds, 3 livestock.

Spatial Distribution Analysis
Northwestern region: 8 objects detected - 5 trees (3 alive, 2 dead), 2 birds
Central region: 7 objects detected - 4 trees (all alive), 3 livestock

Spatial Relationships Analysis
I am 87% confident that, in the northwestern region, 2 birds are intersected around an alive tree at location (245, 123)."
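The "Object breakdown" line in the sample above can be reproduced from a list of detections with a small counting routine. This sketch assumes each detection carries a base label (e.g. tree) and an optional classification (e.g. alive or dead). Those field names are hypothetical and not taken from detection_narrative_generator.py. So are the function name object_breakdown and the UNCOUNTABLE set that keeps "livestock" unpluralized.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Hypothetical: labels whose plural form is the same as the singular
UNCOUNTABLE = {"livestock"}


def plural(word: str, n: int) -> str:
    """Naive English pluralization: add 's' unless n == 1 or the word is uncountable."""
    return word if n == 1 or word in UNCOUNTABLE else word + "s"


def object_breakdown(detections: List[dict]) -> str:
    """Summarize counts per base label, with classification sub-counts in parentheses."""
    base_counts = Counter(d["label"] for d in detections)
    sub_counts: Dict[str, Counter] = defaultdict(Counter)
    for d in detections:
        if d.get("classification"):
            sub_counts[d["label"]][d["classification"]] += 1

    parts = []
    # most_common() orders labels by count, most frequent first
    for label, n in base_counts.most_common():
        text = f"{n} {plural(label, n)}"
        subs = sub_counts.get(label)
        if subs:
            # Sorted alphabetically, so "alive" comes before "dead"
            detail = ", ".join(f"{k} {c} {plural(label, k)}" for c, k in sorted(subs.items()))
            text += f" ({detail})"
        parts.append(text)
    return "Object breakdown: " + ", ".join(parts) + "."
```

Feed it 8 alive trees, 7 dead trees, 5 birds and 3 livestock. It returns the same breakdown sentence as the sample narrative.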

Deployment and Future Vision

The system is now available as a Hugging Face Space (pending PR approval): https://huggingface.co/spaces/weecology/deepforest-agent

Latest Commit: https://github.com/weecology/deepforest-agent/pull/3/commits/641e1396bb47f04f098ab6c28f4693b5d7cf47d7

The roadmap includes two major enhancements:

1. Zoom-Based Analysis: users will click and drag to select image regions for focused analysis.

2. Spatial Queries as Tools: spatial analysis will be integrated directly into the agent workflow.

This system bridges the gap between computer vision capabilities and ecological understanding. Instead of technical jargon about bounding boxes and confidence scores, researchers get narratives like:

"The northwestern wetland area shows high bird activity with 5 detections near alive vegetation. Spatial analysis reveals 3 birds clustered around a central tree, suggesting this location serves as a roosting site. Dead trees in the southeastern region show no associated wildlife, indicating possible habitat degradation."

The DeepForest Agent with spatial analysis transforms ecological image analysis from a technical exercise into an intuitive conversation. By combining multi-agent AI architecture with sophisticated spatial indexing, we've created a system that thinks about ecology the way ecologists do - spatially, relationally, and contextually.
