Inference

The atomic LLM step in a workflow — messages in, response out.

[Screenshot: Inference block configured with model + tools.]

The Inference block is a single-shot LLM call. Messages go in, a response comes out. It supports optional tools and structured output, but has no memory, no skills, and no reasoning config — for that, use a Crew Agent and the Agents block.

Inference vs. Agents vs. Crew Agent

These three terms are distinct, and the docs use them consistently as defined in this table.

| Term | What it is | Where |
| --- | --- | --- |
| Inference block | A workflow block that calls a model with a prompt + optional tools. The atomic LLM step. No memory, no multi-turn loop. | This page |
| Agents block | A workflow block that runs a saved Crew Agent inside a Sandbox. The block is named agents (plural). | blocks/agents |
| Crew Agent | A reusable agent definition (model + skills + env + system prompt) stored in Crew. | crew/agents |
| Legacy agent block | Deprecated. Removed from the block palette. Old workflows keep running because the handler is still wired in the executor, but new workflows should use the Inference block (this page) or the Agents block. | |

Configuration

Messages

A JSON array of { role, content } objects. Roles are system, user, or assistant. The block accepts the array directly or a JSON string and validates each entry — invalid entries are dropped.

[
  { "role": "system", "content": "You are a research analyst." },
  { "role": "user", "content": "Summarize the attached report." }
]
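
As a rough sketch of the validation described above (function and type names here are illustrative, not the actual implementation): accept either an array or a JSON string, keep entries with a valid role and string content, and drop the rest.

```typescript
type Role = "system" | "user" | "assistant";
interface Message { role: Role; content: string; }

// Hypothetical sketch: normalize the Messages input. Invalid entries
// (unknown role, missing/non-string content) are dropped, as the docs state.
function normalizeMessages(input: unknown): Message[] {
  const raw = typeof input === "string" ? JSON.parse(input) : input;
  if (!Array.isArray(raw)) return [];
  const roles = new Set(["system", "user", "assistant"]);
  return raw.filter(
    (m): m is Message =>
      m !== null &&
      typeof m === "object" &&
      roles.has((m as Message).role) &&
      typeof (m as Message).content === "string"
  );
}
```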

Model

The Model dropdown loads from GET /api/workspaces/:workspaceId/model-endpoints — the model endpoints registered in Crew › Models. The block stores the endpoint ID, not a model name. At execution time the handler resolves the endpoint to its provider, model name, API key, and base URL.

You can also pass a known model name directly (for example gpt-4o or claude-sonnet-4-6) — if it matches a built-in provider, the block uses workspace-default credentials.

The default model when none is set is claude-sonnet-4-6.
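The resolution order above can be sketched as follows. This is a simplified illustration, not the handler's actual code; the endpoint registry shape and the built-in provider map are assumptions.

```typescript
interface ResolvedModel { provider: string; model: string; }

const DEFAULT_MODEL = "claude-sonnet-4-6";

// Assumed mapping of known model names to built-in providers.
const BUILT_IN: Record<string, string> = {
  "gpt-4o": "openai",
  "claude-sonnet-4-6": "anthropic",
};

// Hypothetical sketch: endpoint IDs resolve via the workspace registry,
// known model names fall back to workspace-default credentials, and an
// unset value uses the default model.
function resolveModel(
  value: string | undefined,
  endpoints: Map<string, ResolvedModel>
): ResolvedModel {
  if (value && endpoints.has(value)) return endpoints.get(value)!;
  if (value && BUILT_IN[value]) return { provider: BUILT_IN[value], model: value };
  return { provider: BUILT_IN[DEFAULT_MODEL], model: DEFAULT_MODEL };
}
```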

Tools

Tools are optional. Three kinds are supported:

  • Block tools — any other block exposed as a callable tool (for example the API block, or a custom-built block).
  • Custom tools — a user-defined function with a JSON schema.
  • MCP tools — tools served by a workspace MCP server.

Each tool has a usageControl setting: auto (the model decides), force (always called), or none (excluded from the tool list).

MCP tools are filtered before each call: only tools whose server is currently connected in this workspace are sent to the model. Disconnected servers are skipped silently. Custom tools and MCP tools require workspace permissions; they are validated before the call.
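
Combining the two rules above (the `none` usage control and the connected-server filter) might look like this. Field names are assumptions for illustration.

```typescript
interface ToolDef {
  name: string;
  kind: "block" | "custom" | "mcp";
  usageControl: "auto" | "force" | "none";
  serverId?: string; // only meaningful for MCP tools
}

// Hypothetical sketch: drop tools with usageControl "none", and drop MCP
// tools whose server is not currently connected in this workspace.
function selectTools(tools: ToolDef[], connectedServers: Set<string>): ToolDef[] {
  return tools.filter((t) => {
    if (t.usageControl === "none") return false;
    if (t.kind === "mcp") return connectedServers.has(t.serverId ?? "");
    return true;
  });
}
```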

Temperature and max tokens (advanced)

Temperature is a slider from 0 to 1, default 0.3. Max Output Tokens is a free-form integer field. Both fields live in advanced mode and are forwarded to the provider as-is.

Response format

A JSON Schema object. When set, the block asks the provider for structured output and parses the response as JSON before emitting it.

{
  "name": "extracted_invoice",
  "schema": {
    "type": "object",
    "properties": {
      "vendor": { "type": "string" },
      "total":  { "type": "number" }
    },
    "required": ["vendor", "total"]
  },
  "strict": true
}

If the model returns content that does not parse as JSON, the block falls back to the standard string output and attaches _responseFormatWarning to the result so the failure is visible in logs.
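
A minimal sketch of that fallback behavior (the function name is hypothetical; `_responseFormatWarning` is the field the docs describe):

```typescript
// Hypothetical sketch: try to parse the model output as JSON and spread
// its fields onto the result; otherwise fall back to the raw string and
// attach _responseFormatWarning so the failure is visible in logs.
function parseStructuredOutput(raw: string): Record<string, unknown> {
  try {
    const parsed = JSON.parse(raw);
    if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
      return { content: raw, ...parsed };
    }
  } catch {
    // not valid JSON; fall through to the string fallback
  }
  return {
    content: raw,
    _responseFormatWarning: "Model response did not parse as JSON; returning raw string.",
  };
}
```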

Outputs

| Output | Type | Description |
| --- | --- | --- |
| content | string | Generated response. When responseFormat is set, the parsed JSON fields are spread onto the output alongside this. |
| model | string | Resolved model name (after endpoint resolution). |
| tokens | json | { input, output, total }. |
| toolCalls | json | { list: [{ name, arguments, result, startTime, endTime, duration }], count }. Tool-call names have any internal prefix stripped. |
| cost | json | Provider cost breakdown for the call. |
| providerTiming | json | Latency breakdown reported by the provider. |

Streaming

The block streams when both conditions hold:

  1. The workflow run was started with streaming enabled (ctx.stream).
  2. This block is in the run's selectedOutputs — typically because a downstream block consumes its output, or it was explicitly selected in the run dialog.

Otherwise the call is non-streaming and returns the full response at once. The shape of the streaming output matches the standard output: partial deltas land on content, then tokens, toolCalls, and cost populate at the end of the stream.
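
The two conditions reduce to a simple conjunction. A sketch, with a context shape assumed for illustration:

```typescript
// Hypothetical sketch: stream only when the run was started with
// streaming enabled AND this block's output is selected for the run.
interface RunContext {
  stream: boolean;
  selectedOutputs: Set<string>;
}

function shouldStream(ctx: RunContext, blockId: string): boolean {
  return ctx.stream && ctx.selectedOutputs.has(blockId);
}
```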

Tool calls — single-turn only

Unlike the Agents block, the Inference block does not run a multi-turn tool-calling loop. When the model emits tool calls, they are returned in the toolCalls output and the run ends. To act on those tool calls, route the output back through workflow logic — for example, into a Function block or a Router — or use the Agents block, which runs the loop end-to-end inside a Sandbox.
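
Routing the toolCalls output through your own workflow logic might look like this sketch, e.g. inside a Function block. The handler map is entirely hypothetical; only the { list, count } shape comes from the outputs table above.

```typescript
interface ToolCall { name: string; arguments: Record<string, unknown>; }
interface ToolCallsOutput { list: ToolCall[]; count: number; }

// Hypothetical sketch: dispatch each returned tool call to a handler you
// define, since the Inference block itself will not execute a second turn.
function routeToolCalls(
  output: ToolCallsOutput,
  handlers: Record<string, (args: Record<string, unknown>) => unknown>
): unknown[] {
  return output.list.map((call) => {
    const handler = handlers[call.name];
    if (!handler) throw new Error(`No handler for tool ${call.name}`);
    return handler(call.arguments);
  });
}
```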

Differences from the legacy Agent block

The Inference block replaces the legacy agent block. Compared to it, the Inference block:

  • has no memory service
  • has no Skills selector (skills live on Crew Agents now)
  • has no reasoning effort, verbosity, or thinking-level controls
  • has no deep-research / previousInteractionId chaining
  • does not resolve provider credentials for Vertex, Azure, or Bedrock
  • does not run a multi-turn tool-calling loop

If you need any of those, build a Crew Agent and call it from the Agents block.

Source

  • apps/actana/blocks/blocks/inference.ts — block definition
  • apps/actana/executor/handlers/inference/inference-handler.ts — runtime handler