Week 4-5: Building Cache, Parameter, File Management, and Gradio Interface for Agent Integration

Last week, I implemented the first API call, which may invoke the DeepForest object detection tool depending on the user query. The tool call still has to be executed to produce a detection result, which is then passed to a follow-up API call that blends the DeepForest detections with visual analysis into the final answer. Because the conversation is multi-turn, the same tool call with the same parameters may be requested again on the same image. Re-running DeepForest detection in that case is wasteful, so cache management was needed.
DeepForest Cache Management: src/deepforest_agent/cache/detection_cache.py
The CacheManager class was implemented to handle temporary storage and reuse of prediction results.
Permalink: detection_cache.py blob at commit 187129a
This class maintains a dictionary, cached_predictions, with the following entries: image_data; the detection output in JSON format (predictions_json_str); the annotated_image_array used for visualization; a summary_text; the set of models_detected; a flag recording whether last_alive_dead_trees_requested was triggered; the current_image_hash, used to tell whether a new image was supplied; and the detection parameters in both structured and raw form (the detection_parameters and detection_parameters_dict entries).
class CacheManager:
    def __init__(self):
        self.cached_predictions = {
            "image_data": None,
            "predictions_json_str": None,
            "annotated_image_array": None,
            "summary_text": None,
            "models_detected": set(),
            "last_alive_dead_trees_requested": False,
            "current_image_hash": None,
            "detection_parameters": {},
            "detection_parameters_dict": {}
        }
The should_run_detection() Logic
We need a mechanism to decide whether detection must be re-run because something important has changed. should_run_detection() triggers a new DeepForest run in three scenarios:
Image Change: If the MD5 hash of the current image differs from the cached hash.
Model Change: If the user requests detection from models that haven’t been applied before.
Parameter Change: If detection parameters differ from the cached values on the same image.
We compare against the cache because it stores only the previous image hash, the models already run, and their parameters. If any of these differs from the current request, even slightly, the detection tool must run again.
Here’s the code snippet:
def should_run_detection(self, image_hash: str, params: DetectionParameters) -> Tuple[bool, str]:
    # Check for image changes
    if image_hash != self.cached_predictions["current_image_hash"]:
        return True, "New image detected (hash changed)"

    # Check for new models
    requested_models = set(params.model_names)
    already_detected = self.cached_predictions["models_detected"]
    new_models = requested_models - already_detected
    if new_models:
        return True, f"New models requested: {list(new_models)}"

    # Check for parameter changes per model
    cached_params_dict = self.cached_predictions.get("detection_parameters_dict", {})
    requested_params_dict = params.to_dict()
    for model in requested_models:
        if model not in cached_params_dict:
            return True, f"No cached parameters for model: {model}"
        cached_params = cached_params_dict[model]
        for param_name, requested_value in requested_params_dict.items():
            cached_value = cached_params.get(param_name)
            if cached_value != requested_value:
                return True, f"Parameter '{param_name}' changed for model '{model}': {cached_value} → {requested_value}"

    return False, f"All models {list(requested_models)} already detected with identical parameters"
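To see the three checks in action over several turns, here is a condensed, self-contained sketch. It is not the project's CacheManager: the Params and MiniCache classes and the patch_size field are hypothetical stand-ins, assumed only so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Params:
    """Hypothetical stand-in for DetectionParameters (patch_size is an assumed field)."""
    model_names: List[str] = field(default_factory=lambda: ["tree"])
    patch_size: int = 400

    def to_dict(self) -> Dict[str, object]:
        return {"model_names": list(self.model_names), "patch_size": self.patch_size}


class MiniCache:
    """Condensed version of the image / model / parameter checks, for illustration only."""

    def __init__(self) -> None:
        self.image_hash: Optional[str] = None
        self.models_detected: Set[str] = set()
        self.params_by_model: Dict[str, Dict[str, object]] = {}

    def should_run(self, image_hash: str, params: Params) -> Tuple[bool, str]:
        if image_hash != self.image_hash:
            return True, "image changed"
        if set(params.model_names) - self.models_detected:
            return True, "new model requested"
        requested = params.to_dict()
        for model in params.model_names:
            if self.params_by_model.get(model) != requested:
                return True, "parameters changed"
        return False, "cache hit"

    def update(self, image_hash: str, params: Params) -> None:
        self.image_hash = image_hash
        self.models_detected |= set(params.model_names)
        for model in params.model_names:
            self.params_by_model[model] = params.to_dict()


cache = MiniCache()
first = Params(model_names=["tree"])

decisions = [cache.should_run("hash-a", first)[1]]       # nothing cached yet
cache.update("hash-a", first)                            # pretend detection ran
decisions.append(cache.should_run("hash-a", first)[1])   # identical request
decisions.append(cache.should_run("hash-a", Params(["tree"], patch_size=800))[1])
decisions.append(cache.should_run("hash-a", Params(["bird"]))[1])
decisions.append(cache.should_run("hash-b", first)[1])   # different image
print(decisions)
```

Only the second request is served from the cache; every other request differs in image, model, or parameters and would trigger a fresh detection.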
Cache Update
Once a detection has actually run, its results have to be stored. The update_cache function refreshes all prediction outputs, metadata, and parameter settings in the cache after each detection run.
def update_cache(self, image_hash: str, params: DetectionParameters,
                 summary_text: str, annotated_image_array: np.ndarray,
                 json_output: str) -> None:
    self.cached_predictions.update({
        "summary_text": summary_text,
        "annotated_image_array": annotated_image_array,
        "predictions_json_str": json_output,
        "current_image_hash": image_hash,
        "models_detected": self.cached_predictions["models_detected"].union(set(params.model_names)),
    })
    params_dict = params.to_dict()
    for model in params.model_names:
        self.cached_predictions["detection_parameters_dict"][model] = params_dict.copy()
At first, I assigned params.model_names directly to the models_detected key. That kept only the most recent model names, so detection ran again even when a result for a previously requested model already existed. The union operation keeps all previously detected models while adding new ones.
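The difference between the two approaches can be shown with plain sets. This is a small illustration, not project code; the two-turn request sequence is made up for the example.

```python
# Two earlier turns: the user first asked for trees, then for birds.
earlier_turns = [["tree"], ["bird"]]

detected_overwrite = set()
detected_union = set()
for model_names in earlier_turns:
    detected_overwrite = set(model_names)                  # original approach: keep only the latest
    detected_union = detected_union.union(model_names)     # fixed approach: accumulate

# Third turn: the user asks about trees again.
requested = {"tree"}
missing_with_overwrite = requested - detected_overwrite
missing_with_union = requested - detected_union

print(missing_with_overwrite)  # {'tree'}: detection would run again
print(missing_with_union)      # set(): cached result can be reused
```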
Besides that, there are two other methods, get_detection_summary and clear_cache_for_new_image. The get_detection_summary method reports which models have cached results available, so users know whether the system is reusing past detections or running fresh ones. The clear_cache_for_new_image method resets the cache whenever a new image is detected.
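The post does not show these two methods, so the following is only a rough sketch of what they might look like, written as standalone functions over a dictionary with the same keys as cached_predictions. The empty_cache helper, the function bodies, and the summary wording are all assumptions.

```python
from typing import Any, Dict


def empty_cache() -> Dict[str, Any]:
    """Fresh dictionary with the same keys as cached_predictions (hypothetical helper)."""
    return {
        "image_data": None,
        "predictions_json_str": None,
        "annotated_image_array": None,
        "summary_text": None,
        "models_detected": set(),
        "last_alive_dead_trees_requested": False,
        "current_image_hash": None,
        "detection_parameters": {},
        "detection_parameters_dict": {},
    }


def get_detection_summary(cache: Dict[str, Any]) -> str:
    """Describe which models already have cached results (assumed wording)."""
    models = sorted(cache["models_detected"])
    if not models:
        return "No cached detections yet."
    return "Cached detections available for: " + ", ".join(models)


def clear_cache_for_new_image(cache: Dict[str, Any]) -> None:
    """Reset every entry in place so results from the old image cannot be reused."""
    cache.clear()
    cache.update(empty_cache())


cache = empty_cache()
cache["models_detected"].update({"tree", "bird"})
cache["current_image_hash"] = "hash-a"

before = get_detection_summary(cache)
clear_cache_for_new_image(cache)
after = get_detection_summary(cache)
print(before)
print(after)
```

Resetting in place, rather than rebinding to a new dictionary, keeps any other reference to the same cache object valid.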
DeepForest Parameter Management: src/deepforest_agent/utils/parameters_manager.py
The DetectionParameters class centralizes all detection-related parameters in one place. Defaults are defined exactly once, are accessible from both the class and its instances, and are reused everywhere. The class also provides utility methods that connect the tool call to its execution:
get_default_model_names(): Returns default model names (["bird", "tree", "livestock"]).
from_arguments(): Factory method that builds a parameter object from user input while applying defaults.
to_dict(): Converts parameters into a dictionary for cache management and comparison.
to_deepforest_args(): Converts parameters into the format required by DeepForest’s predict_objects method.
Permalink: parameters_manager.py blob at commit 187129a
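As a rough picture of how such a class can be organized, here is a minimal dataclass sketch using the four method names listed above. Only the method names and the default model list come from the post; the class name DetectionParametersSketch, the fields patch_size, patch_overlap, and iou_threshold, and their values are placeholder assumptions, and the real implementation may differ.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Any, Dict, List, Optional

DEFAULT_MODEL_NAMES = ["bird", "tree", "livestock"]  # default list quoted in the post


@dataclass
class DetectionParametersSketch:
    # Every field except model_names is an assumed example parameter.
    model_names: List[str] = field(default_factory=lambda: list(DEFAULT_MODEL_NAMES))
    patch_size: int = 400
    patch_overlap: float = 0.05
    iou_threshold: float = 0.15

    @staticmethod
    def get_default_model_names() -> List[str]:
        return list(DEFAULT_MODEL_NAMES)

    @classmethod
    def from_arguments(
        cls, arguments: Optional[Dict[str, Any]] = None
    ) -> "DetectionParametersSketch":
        """Keep only known, non-None arguments; everything else falls back to defaults."""
        known = {f.name for f in fields(cls)}
        supplied = {
            key: value
            for key, value in (arguments or {}).items()
            if key in known and value is not None
        }
        return cls(**supplied)

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)

    def to_deepforest_args(self) -> Dict[str, Any]:
        """Drop the agent-level model_names entry, leaving prediction keyword arguments."""
        args = self.to_dict()
        args.pop("model_names")
        return args


params = DetectionParametersSketch.from_arguments(
    {"model_names": ["tree"], "patch_size": None, "color": "red"}
)
print(params.to_dict())
print(params.to_deepforest_args())
```

Filtering unknown and None values in from_arguments means a partially filled tool call still yields a complete parameter object, which makes the dictionary comparison in the cache predictable.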
File Management: src/deepforest_agent/utils/file_manager.py
Before running detection, we need to:
Verify that the provided image path is valid and accessible.
Generate an MD5 hash for the image to check if it has been processed before.
Extract the file extension to guide parameter choices (e.g., .tif files require predict_tile while .jpg can use predict_image).
These are implemented in the FileManager class.
import hashlib
import os
from typing import Optional, Tuple


class FileManager:
    @staticmethod
    def validate_and_extract_info(image_path: str) -> Tuple[Optional[str], str]:
        # Reject missing or empty paths before touching the file system further
        if not image_path or not os.path.exists(image_path):
            return None, '.unknown'
        try:
            with open(image_path, 'rb') as f:
                image_data = f.read()
            # Hash the raw bytes so the cache can recognize a repeated image
            image_hash = hashlib.md5(image_data).hexdigest()
            file_extension = os.path.splitext(image_path)[1].lower()
            print(f"Detected file extension: {file_extension} for {os.path.basename(image_path)}")
            return image_hash, file_extension
        except (OSError, IOError) as e:
            print(f"Error accessing file {image_path}: {e}")
            return None, '.unknown'
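One consequence of hashing file contents is that the cache key does not depend on the file name. The sketch below illustrates this with two temporary files, and also shows one possible way to map an extension to a prediction method. The helper names md5_of_file and choose_predict_method are hypothetical, and treating .tiff like .tif is an assumption; the post only mentions .tif and .jpg.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str) -> str:
    """Hash the raw file bytes, as the FileManager snippet does."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def choose_predict_method(file_extension: str) -> str:
    """Hypothetical helper: pick a method name from the extension."""
    # Assumption: .tiff is handled the same way as .tif.
    if file_extension in {".tif", ".tiff"}:
        return "predict_tile"
    return "predict_image"


with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "plot_a.JPG")
    second = os.path.join(tmp, "plot_b.tif")
    for path in (first, second):
        with open(path, "wb") as f:
            f.write(b"same bytes")  # identical content under two names

    same_hash = md5_of_file(first) == md5_of_file(second)
    methods = [
        choose_predict_method(os.path.splitext(path)[1].lower())
        for path in (first, second)
    ]

print(same_hash)  # True: the hash depends only on the bytes
print(methods)    # ['predict_image', 'predict_tile']
```

Lowercasing the extension before the lookup matters here: without it, "plot_a.JPG" would not match a lowercase table of extensions.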
Gradio Interface Implementation: src/deepforest_agent/main.py
Permalink: main.py blob at commit 187129a
I aimed for a lightweight, basic user interface to start with, containing:
An image input box: The user can upload an image.
An image output box: An annotated image will be displayed here.
A chatbot box: Shows the conversation, with user queries and assistant replies.
A text box: The user can input their query here.
with gr.Blocks() as app:
    with gr.Row():
        image_box = gr.Image(height=500, type="filepath")
    with gr.Row():
        chatbot = gr.Chatbot(height=750, type="messages")
        image_output = gr.Image(height=500, label="Image Output")
    with gr.Row():
        text_box = gr.Textbox(
            placeholder="Enter text and press enter, or upload an image",
            container=False,
        )
    last_annotated_image = gr.State(None)
Separating image_box from image_output prevents confusion when users see detection results overlaid on their original images. gr.State tracks last_annotated_image, preserving the latest image for continued reference in the conversation.
Message Processing
In the query_message function, user_prompt is appended to the conversation history and the updated state is returned, so users see their messages appear immediately. Empty prompts leave the history unchanged.
def query_message(history, user_prompt, image_path):
    if not user_prompt.strip():
        return "", history, None
    user_message = {"role": "user", "content": user_prompt}
    updated_history = history + [user_message]
    return "", updated_history, None
Agent Response Processing
The bot_response function calls agent.model_response, which uses the GeminiAgent backend to generate both a textual response and an annotated image. The conversation history is then updated with the assistant’s reply, and current_annotated_image is replaced if the model returns a new image.
def bot_response(history, user_prompt, image_path, current_annotated_image):
    if not history:
        return history, current_annotated_image, current_annotated_image
    try:
        response_text, annotated_image = agent.model_response(history, user_prompt, image_path)
        if response_text:
            assistant_message = {"role": "assistant", "content": response_text}
            history.append(assistant_message)
        if annotated_image is not None:
            current_annotated_image = annotated_image
        return history, current_annotated_image, current_annotated_image
    except Exception as e:
        error_message = {"role": "assistant", "content": f"An error occurred: {str(e)}"}
        history.append(error_message)
        return history, current_annotated_image, current_annotated_image
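The two handlers can be exercised without launching Gradio. The sketch below repeats them in condensed form and replaces the real backend with a StubAgent, a hypothetical stand-in that simply echoes the last user message and returns no image.

```python
class StubAgent:
    """Hypothetical stand-in for the GeminiAgent backend."""

    def model_response(self, history, user_prompt, image_path):
        # Echo the latest user message and return no annotated image.
        return f"Echo: {history[-1]['content']}", None


agent = StubAgent()


def query_message(history, user_prompt, image_path):
    if not user_prompt.strip():
        return "", history, None
    return "", history + [{"role": "user", "content": user_prompt}], None


def bot_response(history, user_prompt, image_path, current_annotated_image):
    if not history:
        return history, current_annotated_image, current_annotated_image
    try:
        response_text, annotated_image = agent.model_response(history, user_prompt, image_path)
        if response_text:
            history.append({"role": "assistant", "content": response_text})
        if annotated_image is not None:
            current_annotated_image = annotated_image
    except Exception as e:
        history.append({"role": "assistant", "content": f"An error occurred: {e}"})
    return history, current_annotated_image, current_annotated_image


# Step 1: the user's message is added and the text box value becomes "".
text_box_value, history, _ = query_message([], "How many trees are there?", "plot.jpg")

# Step 2: query_message returned "" for the text box, so that is what this sketch passes along.
history, shown_image, state_image = bot_response(
    history, text_box_value, "plot.jpg", "previous_annotation.png"
)

print([message["role"] for message in history])
print(shown_image)
```

Because the stub returns no new image, the previously stored annotation stays on screen, which is the behavior the gr.State value is there to provide.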
Event Chain Configuration
I implemented a gr.Button("Submit") alongside the text_box.submit() method. Both are connected to the same backend functions (query_message followed by bot_response). This ensures consistent processing regardless of how the user submits input, whether by pressing the button or hitting Enter.
btn = gr.Button("Submit")
btn.click(
    query_message,
    inputs=[chatbot, text_box, image_box],
    outputs=[text_box, chatbot, image_output]
).then(
    bot_response,
    inputs=[chatbot, text_box, image_box, last_annotated_image],
    outputs=[chatbot, image_output, last_annotated_image]
)
text_box.submit(
    query_message,
    inputs=[chatbot, text_box, image_box],
    outputs=[text_box, chatbot, image_output]
).then(
    bot_response,
    inputs=[chatbot, text_box, image_box, last_annotated_image],
    outputs=[chatbot, image_output, last_annotated_image]
)
I used Gradio’s event chaining mechanism (.click(...).then(...) and .submit(...).then(...)) to sequence two functions. The first function, query_message, handles preparing and updating the conversation state by taking in the chatbot, user text, and image inputs, while also returning updated states. Then, bot_response executes to process the model’s output and provide the final annotated image and chatbot updates.
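Since both triggers register an identical chain, the wiring could also be written once and applied to each trigger. The helper below is one possible way to do that, not what main.py does. wire_chain is a hypothetical name, and FakeTrigger exists only so the sketch runs without Gradio; it mimics the shape of a listener call followed by .then(...).

```python
from typing import Any, Callable, Iterable, List, Tuple

Step = Tuple[Callable[..., Any], list, list]  # (function, inputs, outputs)


def wire_chain(triggers: Iterable[Callable[..., Any]], steps: List[Step]) -> None:
    """Register the same ordered steps on every trigger.

    Each trigger is a registration method such as btn.click or text_box.submit;
    the object it returns is expected to offer .then(fn, inputs=..., outputs=...).
    """
    (first_fn, first_inputs, first_outputs), *rest = steps
    for trigger in triggers:
        event = trigger(first_fn, inputs=first_inputs, outputs=first_outputs)
        for fn, inputs, outputs in rest:
            event = event.then(fn, inputs=inputs, outputs=outputs)


class FakeTrigger:
    """Tiny recorder used only to exercise wire_chain without Gradio."""

    def __init__(self, name: str, log: list) -> None:
        self.name = name
        self.log = log

    def register(self, fn, inputs=None, outputs=None):
        self.log.append((self.name, fn.__name__))
        return self

    then = register  # chaining records the next step under the same trigger name


def query_message():
    """Placeholder for the first step."""


def bot_response():
    """Placeholder for the second step."""


log: list = []
wire_chain(
    [FakeTrigger("click", log).register, FakeTrigger("submit", log).register],
    [(query_message, [], []), (bot_response, [], [])],
)
print(log)
```

Inside the Blocks context, the equivalent call would pass [btn.click, text_box.submit] as the triggers together with the same input and output component lists used above, so a later change to the chain only has to be made in one place.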
The interface looks like this:

Conclusion and Next Steps
So far, we have built a caching layer (CacheManager) so that DeepForest detection is not re-run on the same image and parameters. The detection arguments now live in DetectionParameters, so defaults are defined in one place and convert cleanly to DeepForest arguments. Finally, in the Gradio UI, the event chain query_message → bot_response is hooked to both Enter and the Submit button, so input handling is consistent.
Next week, I will try to finish the Gemini agent end-to-end, using the foundations laid here. The plan is to implement the strategy of the previous week, where Gemini first decides whether DeepForest should run, then executes detection exactly once per unique (image_hash, model_names, params) using the cache, and finally calls a follow-up analysis step that blends the raw detection JSON with visual reasoning to produce a grounded answer.




