Octo-planner

A 3.8B Model for AI Agent Action Planning with 98%+ Accuracy


TL;DR

  • Octo-planner is a 3.8B-parameter language model that can run locally on edge devices, addressing concerns about data privacy, latency, and availability.
  • Octo-planner achieves 98%+ in-domain accuracy when breaking down user queries into actionable steps for on-device AI agents.
  • Using multi-LoRA training, Octo-planner combines knowledge from different task areas, enabling it to handle complex and diverse queries across various domains (e.g., system actions and e-commerce actions simultaneously).
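The multi-LoRA combination described above can be pictured as adding several low-rank weight deltas, one per task domain, onto a single frozen base weight. The NumPy sketch below is a toy illustration of that arithmetic (W' = W + Σᵢ wᵢ BᵢAᵢ), not Nexa's actual training code; the domain names, dimensions, and merge weights are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # toy hidden size and LoRA rank

# Frozen base weight shared by all domains.
W_base = rng.normal(size=(d, d))

# One low-rank (A, B) adapter pair per domain, e.g. trained
# separately on system actions and on e-commerce actions.
adapters = {
    "system":    (rng.normal(size=(r, d)), rng.normal(size=(d, r))),
    "ecommerce": (rng.normal(size=(r, d)), rng.normal(size=(d, r))),
}

def merged_weight(W, adapters, weights):
    """Combine several LoRA deltas into one matrix: W' = W + sum_i w_i * B_i A_i."""
    delta = sum(weights[name] * (B @ A) for name, (A, B) in adapters.items())
    return W + delta

# Equal-weight merge of the two domain adapters.
W_multi = merged_weight(W_base, adapters, {"system": 0.5, "ecommerce": 0.5})
print(W_multi.shape)  # (8, 8)
```

The merged matrix behaves as one set of weights at inference time, which is why a single on-device model can serve queries that mix domains.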

Evaluation


We created a test dataset of 1,000 data points using GPT-4, consisting of diverse user queries and their corresponding action plans. We then used GPT-4 as an oracle to judge the correctness of each generated plan, aiming to build a local planner that rivals closed-source cloud-based models in performance.
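The oracle-based evaluation loop can be sketched as follows. This is a hedged illustration only: `oracle_judge` is a stand-in for the actual GPT-4 judging call, and all names here are hypothetical.

```python
def oracle_judge(query: str, plan: list[str]) -> bool:
    # Placeholder: a real implementation would prompt GPT-4 to answer
    # whether `plan` correctly decomposes `query` into valid steps.
    return len(plan) > 0

def plan_accuracy(test_set, planner) -> float:
    """Fraction of test queries whose generated plan the oracle accepts."""
    correct = sum(oracle_judge(query, planner(query)) for query, _ in test_set)
    return correct / len(test_set)

# Toy usage: 4 queries, a trivial stand-in planner.
dataset = [("Turn on Bluetooth and dim the screen", None)] * 4
toy_planner = lambda query: ["open_settings"]  # stand-in for the model
print(plan_accuracy(dataset, toy_planner))  # 1.0
```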

We tested full fine-tuning on several base models. Microsoft Phi-3 Mini achieved 98.1% accuracy, Google Gemma 2B reached 85.6%, and Google Gemma 7B attained 99.7%. We chose Phi-3 Mini for Octo-planner: although Gemma 7B scores slightly higher, Phi-3 Mini strikes the best balance between model size and accuracy for on-device deployment.

Introduction

AI agents require effective planning processes to determine the best course of action and execute planned tasks. At Nexa AI, we've made significant strides in this direction with our Octopus model series. We launched Octopus V2 for fast and accurate action taking, and Octopus V3 took a step forward to support multimodal and multilingual capabilities. Now, we're addressing the crucial planning aspect of AI agents with Octo-planner.

Motivation

Prior to Octo-planner, AI agent planning typically relied on large language models like GPT-4 or Gemini-Pro. These models, while powerful, faced several limitations for on-device use:

  • LLMs require significant processing power, making them impractical for edge devices.
  • Cloud-based models necessitate sending user data off-device, raising data privacy issues.
  • Cloud-dependent planners introduce delays, hampering real-time applications.
  • Many existing planners cannot operate without an internet connection.
  • Using cloud-based LLMs for planning is often expensive, limiting widespread adoption.

These limitations created the need for an efficient on-device planning solution that maintains high accuracy while operating within these constraints.

Planner and Action Agents Framework


Octo-planner separates planning and action execution into two distinct components:

  • Planner agent: decomposes the user query into a sequence of sub-steps.
  • Action agent: executes the planned sub-steps sequentially.

This separation allows for specialized optimization, improving modularity, adaptability, and scalability for different domains and task complexities. It enhances interpretability by making the decision-making process more transparent. Furthermore, the planner internalizes function descriptions during training, eliminating the need for lengthy context in each prompt. As a result, the planner-and-action agents framework significantly reduces computational demands and improves efficiency on resource-constrained devices.
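The two-agent loop described above can be sketched as below. This is a minimal illustration under stated assumptions: `run_planner` and `run_action_model` are hypothetical stand-ins for the planner and action models, with a hard-coded plan in place of real inference.

```python
def run_planner(query: str) -> list[str]:
    # A real planner LLM returns an ordered list of sub-steps for the
    # query; hard-coded here for illustration.
    return ["Connect to the nearest Wi-Fi network", "Open the browser"]

def run_action_model(step: str) -> str:
    # The action agent maps each sub-step to a concrete function call.
    return f"<call for: {step}>"

def handle(query: str) -> list[str]:
    """Planner decomposes the query; action agent executes each step in order."""
    return [run_action_model(step) for step in run_planner(query)]

results = handle("Get me online and open a browser")
print(len(results))  # 2
```

Because the two stages are decoupled, either model can be swapped or fine-tuned independently, which is the modularity benefit noted above.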

Training Dataset


To train Octo-planner, we developed a specialized dataset that pairs user queries with corresponding action plans. These plans are broken down into sequences of 1-5 steps, representing a range of task complexities. We leveraged GPT-4 to generate a diverse array of queries that align with our available functions, ensuring broad coverage of potential user requests.

Quality control was a key focus in our dataset creation process. We implemented a rigorous validation system, also using GPT-4, to assess and filter the generated data. This ensured that only high-quality, accurate query-response pairs were included in the final dataset. This approach teaches Octo-planner about function capabilities during training, allowing it to operate efficiently on devices without needing lengthy descriptions for each query.
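The generate-then-validate pipeline above can be sketched like this. It is a hedged outline, not Nexa's pipeline: `generate_pairs` and `validate_pair` are hypothetical stand-ins for the two GPT-4 stages (query/plan generation, then quality filtering).

```python
def generate_pairs(functions: list[str], n: int) -> list[tuple[str, list[str]]]:
    # A real pipeline prompts GPT-4 with the available function list and
    # asks for diverse (query, plan) pairs of 1-5 steps each.
    return [(f"query {i}", [f"step using {functions[i % len(functions)]}"])
            for i in range(n)]

def validate_pair(query: str, plan: list[str]) -> bool:
    # A real validator asks GPT-4 to re-check each pair; here we only
    # keep plans within the 1-5 step range.
    return 1 <= len(plan) <= 5

def build_dataset(functions: list[str], n: int) -> list[tuple[str, list[str]]]:
    """Generate candidate pairs, then keep only those that pass validation."""
    return [(q, p) for q, p in generate_pairs(functions, n)
            if validate_pair(q, p)]

data = build_dataset(["take_screenshot", "connect_wifi"], 6)
print(len(data))  # 6
```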

What's Next