AI Integration Nodes

Generate captions, titles, and descriptions for your artwork using your preferred AI model. Isekai integrates with four LLM providers: Claude (Anthropic), OpenAI GPT, Google Gemini, and local Ollama.

| Provider | Best For | Requires | Cost |
| --- | --- | --- | --- |
| Claude | Long-form descriptions, nuanced captions | API key | Pay-per-use |
| OpenAI | General purpose, widely available | API key | Pay-per-use |
| Gemini | Google ecosystem, multimodal | API key | Free tier + paid |
| Ollama | Privacy, offline use, no API costs | Local install | Free |

Claude

Generate high-quality captions using Anthropic’s Claude models.

Location: Isekai/LLMs

Inputs:

  • text_input (STRING): Prompt or context to generate from
  • api_key (STRING, optional): Claude API key (uses ANTHROPIC_API_KEY env var if empty)
  • model (COMBO): claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022, claude-3-opus-20240229
  • max_tokens (INT, 1-4096): Maximum response length (default: 100)
  • system_prompt (STRING, optional): Custom instructions for Claude

Outputs:

  • generated_text (STRING): Claude’s response

Recommended Models:

  • claude-3-5-sonnet-20241022: Best quality, most capable
  • claude-3-5-haiku-20241022: Fast and cost-effective
  • claude-3-opus-20240229: Highest intelligence (expensive)

Environment Variable:

```sh
export ANTHROPIC_API_KEY="sk-ant-api03-..."
```

Example:

text_input: "A warrior woman in golden armor standing on a mountain peak"
system_prompt: "Generate a short, catchy title (5-10 words max)"
Output: "Golden Warrior Atop Mountain Peak"
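To make the node's inputs concrete, here is a minimal sketch of the request shape the node presumably builds for Anthropic's Messages API. The function name and exact field handling are illustrative assumptions, not the node's actual code; note that the system prompt is a top-level field in this API, not a message role.

```python
import os

def build_claude_request(text_input, model="claude-3-5-sonnet-20241022",
                         max_tokens=100, system_prompt="", api_key=""):
    """Sketch of a Messages API request (POST /v1/messages).

    Falls back to the ANTHROPIC_API_KEY env var when api_key is empty,
    mirroring the node's documented behavior.
    """
    key = api_key or os.environ.get("ANTHROPIC_API_KEY", "")
    headers = {
        "x-api-key": key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": text_input}],
    }
    if system_prompt:
        body["system"] = system_prompt  # top-level field, not a message
    return headers, body
```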

OpenAI

Generate captions using GPT-3.5 or GPT-4 models.

Location: Isekai/LLMs

Inputs:

  • text_input (STRING): Prompt or context to generate from
  • api_key (STRING, optional): OpenAI API key (uses OPENAI_API_KEY env var if empty)
  • model (COMBO): gpt-4-turbo, gpt-4, gpt-3.5-turbo
  • max_tokens (INT, 1-4096): Maximum response length (default: 100)
  • system_prompt (STRING, optional): Custom instructions for GPT

Outputs:

  • generated_text (STRING): GPT’s response

Recommended Models:

  • gpt-4-turbo: Best balance of quality and speed
  • gpt-3.5-turbo: Fast and cost-effective
  • gpt-4: Highest quality (slower, expensive)

Environment Variable:

```sh
export OPENAI_API_KEY="sk-..."
```

Example:

text_input: "portrait of a cyberpunk hacker in neon-lit alley"
system_prompt: "Write a dramatic one-sentence description"
Output: "A lone hacker emerges from shadows, neon reflections dancing across chrome implants."
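The OpenAI node maps onto the Chat Completions API slightly differently: here the system prompt travels as a `system` message rather than a top-level field. A sketch under the same assumptions (illustrative helper, not the node's actual code):

```python
import os

def build_openai_request(text_input, model="gpt-4-turbo",
                         max_tokens=100, system_prompt="", api_key=""):
    """Sketch of a Chat Completions request (POST /v1/chat/completions)."""
    key = api_key or os.environ.get("OPENAI_API_KEY", "")
    headers = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
    messages = []
    if system_prompt:
        # System prompt is a message with role "system" in this API
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": text_input})
    body = {"model": model, "max_tokens": max_tokens, "messages": messages}
    return headers, body
```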

Gemini

Generate captions using Google’s Gemini models.

Location: Isekai/LLMs

Inputs:

  • text_input (STRING): Prompt or context to generate from
  • api_key (STRING, optional): Gemini API key (uses GEMINI_API_KEY env var if empty)
  • model (COMBO): gemini-1.5-pro, gemini-1.5-flash, gemini-1.0-pro
  • max_tokens (INT, 1-8192): Maximum response length (default: 100)
  • system_prompt (STRING, optional): Custom instructions for Gemini

Outputs:

  • generated_text (STRING): Gemini’s response

Recommended Models:

  • gemini-1.5-pro: Most capable, multimodal
  • gemini-1.5-flash: Fast and efficient
  • gemini-1.0-pro: Stable, proven model

Environment Variable:

```sh
export GEMINI_API_KEY="AIza..."
```

Example:

text_input: "fantasy dragon breathing fire over medieval castle"
system_prompt: "Create a short title suitable for an art gallery"
Output: "Dragon's Fury: Castle Siege"
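Gemini's REST shape differs again: the prompt goes into a `contents` array of parts, and the system prompt becomes a `systemInstruction` block. A sketch of the v1beta `generateContent` body the node likely builds (field names from the public REST API; the helper itself is an assumption):

```python
def build_gemini_request(text_input, model="gemini-1.5-flash",
                         max_tokens=100, system_prompt=""):
    """Sketch of a generateContent request body. The endpoint would be:
    https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent?key=...
    """
    body = {
        "contents": [{"role": "user", "parts": [{"text": text_input}]}],
        "generationConfig": {"maxOutputTokens": max_tokens},
    }
    if system_prompt:
        # System prompt is a separate instruction block, not a content turn
        body["systemInstruction"] = {"parts": [{"text": system_prompt}]}
    return body
```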

Ollama

Generate captions using local open-source models (fully offline).

Location: Isekai/LLMs

Requirements: Ollama running locally (ollama.com)

Inputs:

  • text_input (STRING): Prompt or context to generate from
  • ollama_url (STRING): Ollama server URL (default: "http://localhost:11434")
  • model (COMBO): Dynamically populated from your Ollama installation

Outputs:

  • generated_text (STRING): Model’s response

Special Outputs:

  • "Untitled": Empty input
  • "Connection Failed": Cannot reach Ollama
  • "Error: 404": Model not found
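The special outputs above suggest the node maps failure modes onto fixed strings rather than raising errors. A sketch of that behavior against Ollama's `/api/generate` endpoint (the node's real internals may differ; the function is illustrative):

```python
import json
import urllib.error
import urllib.request

def ollama_generate(text_input, model, ollama_url="http://localhost:11434"):
    """Call /api/generate and map failures to the documented special outputs."""
    if not text_input.strip():
        return "Untitled"  # empty input
    req = urllib.request.Request(
        f"{ollama_url}/api/generate",
        data=json.dumps({"model": model, "prompt": text_input,
                         "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.loads(resp.read())["response"]
    except urllib.error.HTTPError as e:
        return f"Error: {e.code}"  # e.g. "Error: 404" for a missing model
    except (urllib.error.URLError, OSError):
        return "Connection Failed"  # Ollama not reachable
```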

Popular Models:

  • llama3: Meta’s flagship model (best quality)
  • mistral: Fast and capable
  • gemma: Google’s open model
  • phi: Microsoft’s efficient model

Example:

text_input: "A highly detailed digital painting of a fierce warrior"
model: llama3
Output: "Fierce Warrior Portrait"

Setting up Ollama:

  1. Install Ollama:

     ```sh
     curl -fsSL https://ollama.com/install.sh | sh
     ```

  2. Pull a model:

     ```sh
     ollama pull llama3
     ```

  3. Verify it’s running:

     ```sh
     curl http://localhost:11434/api/tags
     ```
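The `/api/tags` response from the last step lists installed models, which is presumably what the node reads to populate its model dropdown. A sketch of that parsing step (helper name assumed):

```python
import json

def installed_models(tags_json):
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]
```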

System Prompt Templates

System prompts control how the AI generates text. Here are templates for common use cases:

  • Short title: Generate a short, catchy title (5-10 words max) for this artwork. Be creative and evocative.
  • Detailed description: Write a vivid, detailed description of this artwork in 1-2 sentences. Focus on mood, composition, and key visual elements.
  • SEO title: Create an SEO-friendly title that describes the artwork clearly while being engaging. Include key visual elements.
  • Poetic caption: Write a poetic, atmospheric caption that captures the essence and mood of this artwork. Be artistic and evocative.
  • Social media: Write an engaging social media caption for this artwork. Be concise, use emojis if appropriate, and make it shareable.
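If you reuse these templates across workflows, keeping them in one place avoids retyping. A hypothetical constant (the node itself just takes a free-form system_prompt string; the keys and dict are not part of Isekai):

```python
# Hypothetical lookup table mirroring the templates above
SYSTEM_PROMPTS = {
    "short_title": "Generate a short, catchy title (5-10 words max) for this artwork. Be creative and evocative.",
    "description": "Write a vivid, detailed description of this artwork in 1-2 sentences. Focus on mood, composition, and key visual elements.",
    "seo_title": "Create an SEO-friendly title that describes the artwork clearly while being engaging. Include key visual elements.",
    "poetic": "Write a poetic, atmospheric caption that captures the essence and mood of this artwork. Be artistic and evocative.",
    "social": "Write an engaging social media caption for this artwork. Be concise, use emojis if appropriate, and make it shareable.",
}
```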

API Key Setup

Set API keys via environment variables for security (recommended over hardcoding in workflows).

Add to ~/.bashrc or ~/.zshrc:

```sh
export ANTHROPIC_API_KEY="sk-ant-api03-..."
export OPENAI_API_KEY="sk-..."
export GEMINI_API_KEY="AIza..."
```

Then restart ComfyUI.
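The lookup order described above (node input first, environment variable as fallback) amounts to a one-liner; a sketch with an assumed helper name:

```python
import os

def resolve_key(node_value, env_var):
    """Prefer the key typed into the node; fall back to the env var."""
    return node_value or os.environ.get(env_var, "")
```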


Example Workflows

1. Auto-titled generation:

```
Dynamic String  →  CLIP Text Encode  →  Sampler  →  VAE Decode
(random prompts)                                        ↓
                                    Claude/OpenAI/Gemini (generate title)
                                                        ↓
                                    Isekai Upload (auto-populated title)
```

2. Character pipeline with a local model:

```
Round Robin  →  Tag Selector  →  Concatenate  →  CLIP Text Encode  →  Sampler
(characters)    (char tags)     (full prompt)                           ↓
                                                                   VAE Decode
                     ┌────────────── (pass prompt) ────────────────────┘
                     ↓
         Ollama (generate title)  →  Isekai Upload (with AI title)
```

3. Comparing providers:

```
Image  →  Color Adjust  →  Vignette  →  [Split to 3 paths]
                   ┌───────────┼───────────┐
                   ↓           ↓           ↓
                Claude      OpenAI      Gemini
                (title)     (title)     (title)
                   ↓           ↓           ↓
            [Compare outputs and choose best]
```

Pricing Comparison

| Provider | Model | Cost per 1M tokens (input) | Cost per 1M tokens (output) |
| --- | --- | --- | --- |
| Claude | Sonnet 3.5 | $3.00 | $15.00 |
| Claude | Haiku 3.5 | $1.00 | $5.00 |
| OpenAI | GPT-4 Turbo | $10.00 | $30.00 |
| OpenAI | GPT-3.5 Turbo | $0.50 | $1.50 |
| Gemini | Pro 1.5 | $1.25 | $5.00 |
| Gemini | Flash 1.5 | $0.075 | $0.30 |
| Ollama | All models | Free | Free |
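To turn the table into real numbers: titles are tiny, so even the priciest models cost fractions of a cent per caption. A small worked example using the table's per-1M-token prices (the helper and the token estimates are illustrative assumptions):

```python
def caption_cost(n_captions, in_tokens, out_tokens, in_price, out_price):
    """Total USD cost, given per-1M-token prices from the table above."""
    return n_captions * (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# Assume ~60 input tokens (prompt + system prompt) and ~15 output tokens
# per title. For 1,000 titles:
#   Claude Haiku 3.5:  1000 * (60*1.00 + 15*5.00) / 1e6  = $0.135
#   GPT-3.5 Turbo:     1000 * (60*0.50 + 15*1.50) / 1e6  = $0.0525
```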

Best Practices

  1. Be specific in prompts: Include key visual elements from your artwork
  2. Use system prompts: Control output length and style
  3. Test different models: Each has different strengths
  4. Keep max_tokens low: For titles, 50-100 tokens is plenty
  5. Use Ollama for experimentation: Free and unlimited testing

Troubleshooting

Connection failed:

Ollama only: Ensure Ollama is running:

```sh
ollama serve
```

Cloud models: Verify your API key is correct and has credits:

```sh
# Test Claude
curl https://api.anthropic.com/v1/messages -H "x-api-key: $ANTHROPIC_API_KEY"
# Test OpenAI
curl https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY"
# Test Gemini
curl "https://generativelanguage.googleapis.com/v1/models?key=$GEMINI_API_KEY"
```

Model not found (Error: 404):

Ollama: Pull the model first:

```sh
ollama pull llama3
```

Cloud models: Check model name spelling (case-sensitive)

Output truncated or too short:

  • Increase the max_tokens parameter
  • Simplify your prompt
  • Try a different model