Baoyu Image Gen
jimliu/baoyu-skills
The baoyu-image-gen skill provides an API-based image generation service supporting multiple providers such as OpenAI, Google, OpenRouter, DashScope, Jimeng, Seedream, and Replicate. It allows users to generate images from prompts with customizable options including size, aspect ratio, quality, and reference images, making it ideal for developers and artists seeking automated or bulk image creation. The skill features detailed configuration, environment variable support for API keys, and batch processing capabilities for efficient multi-image generation.
Image Generation (AI SDK)
Official API-based image generation. Supports OpenAI, Google, OpenRouter, DashScope (阿里通义万象), Jimeng (即梦), Seedream (豆包) and Replicate providers.
Script Directory
Agent Execution:
- `{baseDir}` = this SKILL.md file's directory
- Script path = `{baseDir}/scripts/main.ts`
- Resolve `${BUN_X}` runtime: if `bun` installed → `bun`; if `npx` available → `npx -y bun`; else suggest installing bun
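The runtime rule above can be sketched as a small pure function. This is only an illustration: `resolveBunX` and the injected `hasBinary` probe are hypothetical names, not part of the skill's script.

```typescript
// Sketch of the ${BUN_X} resolution rule. `hasBinary` is a hypothetical,
// injected probe (for example backed by `command -v`), so the rule itself
// stays a pure function.
function resolveBunX(hasBinary: (name: string) => boolean): string | null {
  if (hasBinary("bun")) return "bun"; // bun is installed: use it directly
  if (hasBinary("npx")) return "npx -y bun"; // otherwise run bun through npx
  return null; // neither found: the agent should suggest installing bun
}
```

A `null` result is the signal to stop and ask the user to install bun rather than to guess at another runtime.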
Step 0: Load Preferences ⛔ BLOCKING
CRITICAL: This step MUST complete BEFORE any image generation. Do NOT skip or defer.
Check EXTEND.md existence (priority: project → user):
# macOS, Linux, WSL, Git Bash
test -f .baoyu-skills/baoyu-image-gen/EXTEND.md && echo "project"
test -f "${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-image-gen/EXTEND.md" && echo "xdg"
test -f "$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md" && echo "user"
# PowerShell (Windows)
if (Test-Path .baoyu-skills/baoyu-image-gen/EXTEND.md) { "project" }
$xdg = if ($env:XDG_CONFIG_HOME) { $env:XDG_CONFIG_HOME } else { "$HOME/.config" }
if (Test-Path "$xdg/baoyu-skills/baoyu-image-gen/EXTEND.md") { "xdg" }
if (Test-Path "$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md") { "user" }
| Result | Action |
|---|---|
| Found | Load, parse, apply settings. If `default_model.[provider]` is null → ask model only (Flow 2) |
| Not found | ⛔ Run first-time setup (references/config/first-time-setup.md) → Save EXTEND.md → Then continue |

CRITICAL: If not found, complete the full setup (provider + model + quality + save location) using AskUserQuestion BEFORE generating any images. Generation is BLOCKED until EXTEND.md is created.
| Path | Location |
|---|---|
| `.baoyu-skills/baoyu-image-gen/EXTEND.md` | Project directory |
| `$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md` | User home |

EXTEND.md Supports: Default provider | Default quality | Default aspect ratio | Default image size | Default models | Batch worker cap | Provider-specific batch limits
Schema: references/config/preferences-schema.md
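As a rough sketch, the lookup performed by the shell checks in this step can be written as a function that returns the first existing EXTEND.md. The names `findExtendMd`, `PrefsEnv`, and the injected `exists` probe are hypothetical, and placing the XDG location between project and user simply follows the order of the shell commands, which is an assumption about priority.

```typescript
type PrefsSource = "project" | "xdg" | "user";

interface PrefsLocation {
  source: PrefsSource;
  path: string;
}

// Hypothetical inputs: the caller supplies cwd, home, and XDG_CONFIG_HOME.
interface PrefsEnv {
  cwd: string;
  home: string;
  xdgConfigHome?: string;
}

const EXTEND_REL = "baoyu-skills/baoyu-image-gen/EXTEND.md";

// Candidate paths, in the same order as the shell checks (POSIX separators).
function extendMdCandidates(env: PrefsEnv): PrefsLocation[] {
  // Like ${XDG_CONFIG_HOME:-$HOME/.config}: unset or empty falls back.
  const xdgBase = env.xdgConfigHome || `${env.home}/.config`;
  return [
    { source: "project", path: `${env.cwd}/.${EXTEND_REL}` },
    { source: "xdg", path: `${xdgBase}/${EXTEND_REL}` },
    { source: "user", path: `${env.home}/.${EXTEND_REL}` },
  ];
}

// Returns the first existing EXTEND.md, or null when first-time setup must run.
function findExtendMd(
  env: PrefsEnv,
  exists: (path: string) => boolean,
): PrefsLocation | null {
  for (const candidate of extendMdCandidates(env)) {
    if (exists(candidate.path)) return candidate;
  }
  return null;
}
```

A `null` result corresponds to the "Not found" row: generation stays blocked until setup has written EXTEND.md.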
Usage
# Basic
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image cat.png
# With aspect ratio
${BUN_X} {baseDir}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9
# High quality
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --quality 2k
# From prompt files
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png
# With reference images (Google, OpenAI, OpenRouter, or Replicate)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png
# With reference images (explicit provider/model)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --provider google --model gemini-3-pro-image-preview --ref source.png
# OpenRouter (recommended default model)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openrouter
# OpenRouter with reference images
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --provider openrouter --model google/gemini-3.1-flash-image-preview --ref source.png
# Specific provider
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openai
# DashScope (阿里通义万象)
${BUN_X} {baseDir}/scripts/main.ts --prompt "一只可爱的猫" --image out.png --provider dashscope
# DashScope Qwen-Image 2.0 Pro (recommended for custom sizes and text rendering)
${BUN_X} {baseDir}/scripts/main.ts --prompt "为咖啡品牌设计一张 21:9 横幅海报,包含清晰中文标题" --image out.png --provider dashscope --model qwen-image-2.0-pro --size 2048x872
# DashScope legacy Qwen fixed-size model
${BUN_X} {baseDir}/scripts/main.ts --prompt "一张电影感海报" --image out.png --provider dashscope --model qwen-image-max --size 1664x928
# Replicate (google/nano-banana-pro)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate
# Replicate with specific model
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
# Batch mode with saved prompt files
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json
# Batch mode with explicit worker count
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4 --json
Batch File Format
{
  "jobs": 4,
  "tasks": [
    {
      "id": "hero",
      "promptFiles": ["prompts/hero.md"],
      "image": "out/hero.png",
      "provider": "replicate",
      "model": "google/nano-banana-pro",
      "ar": "16:9",
      "quality": "2k"
    },
    {
      "id": "diagram",
      "promptFiles": ["prompts/diagram.md"],
      "image": "out/diagram.png",
      "ref": ["references/original.png"]
    }
  ]
}
Paths in promptFiles, image, and ref are resolved relative to the batch file's directory. jobs is optional (overridden by CLI --jobs). Top-level array format (without jobs wrapper) is also accepted.
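The rules in the paragraph above (two accepted shapes, CLI `--jobs` overriding the file's `jobs`, and paths resolved against the batch file's directory) could look roughly like this. `normalizeBatch`, `resolveFrom`, and the choice of which task fields are optional are assumptions made for illustration; the path join is POSIX-only.

```typescript
// Field names follow the batch example; which ones are optional is an assumption.
interface BatchTask {
  id?: string;
  promptFiles: string[];
  image: string;
  provider?: string;
  model?: string;
  ar?: string;
  quality?: string;
  ref?: string[];
}

type BatchFile = BatchTask[] | { jobs?: number; tasks: BatchTask[] };

interface NormalizedBatch {
  jobs: number | undefined;
  tasks: BatchTask[];
}

// POSIX-style join; absolute paths are kept unchanged (an assumption).
function resolveFrom(batchDir: string, p: string): string {
  return p.charAt(0) === "/" ? p : `${batchDir}/${p}`;
}

function normalizeBatch(raw: BatchFile, batchDir: string, cliJobs?: number): NormalizedBatch {
  // Accept both the { jobs, tasks } wrapper and a bare top-level array.
  const tasks = Array.isArray(raw) ? raw : raw.tasks;
  const fileJobs = Array.isArray(raw) ? undefined : raw.jobs;
  return {
    // CLI --jobs overrides the value stored in the file.
    jobs: cliJobs !== undefined ? cliJobs : fileJobs,
    tasks: tasks.map((task) => {
      const resolved: BatchTask = {
        ...task,
        promptFiles: task.promptFiles.map((p) => resolveFrom(batchDir, p)),
        image: resolveFrom(batchDir, task.image),
      };
      if (task.ref) resolved.ref = task.ref.map((p) => resolveFrom(batchDir, p));
      return resolved;
    }),
  };
}
```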
Options
| Option | Description |
|---|---|
| `--prompt <text>`, `-p` | Prompt text |
| `--promptfiles <files...>` | Read prompt from files (concatenated) |
| `--image <path>` | Output image path (required in single-image mode) |
| `--batchfile <path>` | JSON batch file for multi-image generation |
| `--jobs <count>` | Worker count for batch mode (default: auto, max from config, built-in default 10) |
| `--provider google\|openai\|openrouter\|dashscope\|jimeng\|seedream\|replicate` | Force provider (default: auto-detect) |
| `--model <id>`, `-m` | Model ID (Google: `gemini-3-pro-image-preview`; OpenAI: `gpt-image-1.5`; OpenRouter: `google/gemini-3.1-flash-image-preview`; DashScope: `qwen-image-2.0-pro`) |
| `--ar <ratio>` | Aspect ratio (e.g., `16:9`, `1:1`, `4:3`) |
| `--size <WxH>` | Size (e.g., `1024x1024`) |
| `--quality normal\|2k` | Quality preset (default: `2k`) |
| `--imageSize 1K\|2K\|4K` | Image size for Google/OpenRouter (default: from quality) |
| `--ref <files...>` | Reference images. Supported by Google multimodal, OpenAI GPT Image edits, OpenRouter multimodal models, and Replicate. Not supported by Jimeng or Seedream |
| `--n <count>` | Number of images |
| `--json` | JSON output |
Environment Variables
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key |
| `OPENROUTER_API_KEY` | OpenRouter API key |
| `GOOGLE_API_KEY` | Google API key |
| `DASHSCOPE_API_KEY` | DashScope API key (阿里云) |
| `REPLICATE_API_TOKEN` | Replicate API token |
| `JIMENG_ACCESS_KEY_ID` | Jimeng (即梦) Volcengine access key |
| `JIMENG_SECRET_ACCESS_KEY` | Jimeng (即梦) Volcengine secret key |
| `ARK_API_KEY` | Seedream (豆包) Volcengine ARK API key |
| `OPENAI_IMAGE_MODEL` | OpenAI model override |
| `OPENROUTER_IMAGE_MODEL` | OpenRouter model override (default: `google/gemini-3.1-flash-image-preview`) |
| `GOOGLE_IMAGE_MODEL` | Google model override |
| `DASHSCOPE_IMAGE_MODEL` | DashScope model override (default: `qwen-image-2.0-pro`) |
| `REPLICATE_IMAGE_MODEL` | Replicate model override (default: `google/nano-banana-pro`) |
| `JIMENG_IMAGE_MODEL` | Jimeng model override (default: `jimeng_t2i_v40`) |
| `SEEDREAM_IMAGE_MODEL` | Seedream model override (default: `doubao-seedream-5-0-260128`) |
| `OPENAI_BASE_URL` | Custom OpenAI endpoint |
| `OPENROUTER_BASE_URL` | Custom OpenRouter endpoint (default: https://openrouter.ai/api/v1) |
| `OPENROUTER_HTTP_REFERER` | Optional app/site URL for OpenRouter attribution |
| `OPENROUTER_TITLE` | Optional app name for OpenRouter attribution |
| `GOOGLE_BASE_URL` | Custom Google endpoint |
| `DASHSCOPE_BASE_URL` | Custom DashScope endpoint |
| `REPLICATE_BASE_URL` | Custom Replicate endpoint |
| `JIMENG_BASE_URL` | Custom Jimeng endpoint (default: https://visual.volcengineapi.com) |
| `JIMENG_REGION` | Jimeng region (default: `cn-north-1`) |
| `SEEDREAM_BASE_URL` | Custom Seedream endpoint (default: https://ark.cn-beijing.volces.com/api/v3) |
| `BAOYU_IMAGE_GEN_MAX_WORKERS` | Override batch worker cap |
| `BAOYU_IMAGE_GEN_<PROVIDER>_CONCURRENCY` | Override provider concurrency, e.g. `BAOYU_IMAGE_GEN_REPLICATE_CONCURRENCY` |
| `BAOYU_IMAGE_GEN_<PROVIDER>_START_INTERVAL_MS` | Override provider start gap, e.g. `BAOYU_IMAGE_GEN_REPLICATE_START_INTERVAL_MS` |

Load Priority: CLI args > EXTEND.md > env vars > <cwd>/.baoyu-skills/.env > ~/.baoyu-skills/.env
Model Resolution
Model priority (highest → lowest), applies to all providers:
- CLI flag: `--model <id>`
- EXTEND.md: `default_model.[provider]`
- Env var: `<PROVIDER>_IMAGE_MODEL` (e.g., `GOOGLE_IMAGE_MODEL`)
- Built-in default
EXTEND.md overrides env vars. If both EXTEND.md `default_model.google: "gemini-3-pro-image-preview"` and env var `GOOGLE_IMAGE_MODEL=gemini-3.1-flash-image-preview` exist, EXTEND.md wins.
Agent MUST display model info before each generation:
- Show: `Using [provider] / [model]`
- Show switch hint: `Switch model: --model <id> | EXTEND.md default_model.[provider] | env <PROVIDER>_IMAGE_MODEL`
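The priority chain in this section can be sketched as follows. `resolveModel`, `modelBanner`, and `BUILT_IN_MODELS` are hypothetical names; the default IDs are taken from the option and environment tables, and treating the Google and OpenAI IDs listed under `--model` as built-in defaults is an assumption. The sketch lets a null EXTEND.md entry fall through, whereas the skill asks the user for a model in that case.

```typescript
type Provider =
  | "google" | "openai" | "openrouter" | "dashscope"
  | "jimeng" | "seedream" | "replicate";
type ModelSource = "cli" | "extend" | "env" | "builtin";

// Defaults as listed in this document (Google/OpenAI entries are assumed defaults).
const BUILT_IN_MODELS: Record<Provider, string> = {
  google: "gemini-3-pro-image-preview",
  openai: "gpt-image-1.5",
  openrouter: "google/gemini-3.1-flash-image-preview",
  dashscope: "qwen-image-2.0-pro",
  jimeng: "jimeng_t2i_v40",
  seedream: "doubao-seedream-5-0-260128",
  replicate: "google/nano-banana-pro",
};

interface ModelInputs {
  cliModel?: string; // value of --model
  extendDefaults?: Partial<Record<Provider, string | null>>; // EXTEND.md default_model
  env?: Record<string, string | undefined>; // process environment
}

// Highest to lowest: CLI flag, EXTEND.md, env var, built-in default.
function resolveModel(
  provider: Provider,
  inputs: ModelInputs,
): { model: string; source: ModelSource } {
  const fromExtend = inputs.extendDefaults ? inputs.extendDefaults[provider] : undefined;
  const fromEnv = inputs.env
    ? inputs.env[`${provider.toUpperCase()}_IMAGE_MODEL`]
    : undefined;
  if (inputs.cliModel) return { model: inputs.cliModel, source: "cli" };
  if (fromExtend) return { model: fromExtend, source: "extend" };
  if (fromEnv) return { model: fromEnv, source: "env" };
  return { model: BUILT_IN_MODELS[provider], source: "builtin" };
}

// The two lines the agent shows before each generation.
function modelBanner(provider: Provider, model: string): string[] {
  return [
    `Using ${provider} / ${model}`,
    "Switch model: --model <id> | EXTEND.md default_model.[provider] | env <PROVIDER>_IMAGE_MODEL",
  ];
}
```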
DashScope Models
Use --model qwen-image-2.0-pro or set default_model.dashscope / DASHSCOPE_IMAGE_MODEL when the user wants official Qwen-Image behavior.
Official DashScope model families:
- `qwen-image-2.0-pro`, `qwen-image-2.0-pro-2026-03-03`, `qwen-image-2.0`, `qwen-image-2.0-2026-03-03`
  - Free-form `size` in `width*height` format
  - Total pixels must stay between `512*512` and `2048*2048`
  - Default size is approximately `1024*1024`
  - Best choice for custom ratios such as `21:9` and text-heavy Chinese/English layouts
- `qwen-image-max`, `qwen-image-max-2025-12-30`, `qwen-image-plus`, `qwen-image-plus-2026-01-09`, `qwen-image`
  - Fixed sizes only: `1664*928`, `1472*1104`, `1328*1328`, `1104*1472`, `928*1664`
  - Default size is `1664*928`
  - `qwen-image` currently has the same capability as `qwen-image-plus`
- Legacy DashScope models such as `z-image-turbo`, `z-image-ultra`, `wanx-v1`
  - Keep using them only when the user explicitly asks for legacy behavior or compatibility
When translating CLI args into DashScope behavior:
- `--size` wins over `--ar`
- For `qwen-image-2.0*`, prefer explicit `--size`; otherwise infer from `--ar` and use the official recommended resolutions below
- For `qwen-image-max/plus/image`, only use the five official fixed sizes; if the requested ratio is not covered, switch to `qwen-image-2.0-pro`
- `--quality` is a baoyu-image-gen compatibility preset, not a native DashScope API field. Mapping `normal` / `2k` onto the `qwen-image-2.0*` table below is an implementation inference, not an official API guarantee
Recommended `qwen-image-2.0*` sizes for common aspect ratios:

| Ratio | `normal` | `2k` |
|---|---|---|
| `1:1` | `1024*1024` | `1536*1536` |
| `2:3` | `768*1152` | `1024*1536` |
| `3:2` | `1152*768` | `1536*1024` |
| `3:4` | `960*1280` | `1080*1440` |
| `4:3` | `1280*960` | `1440*1080` |
| `9:16` | `720*1280` | `1080*1920` |
| `16:9` | `1280*720` | `1920*1080` |
| `21:9` | `1344*576` | `2048*872` |

DashScope official APIs also expose `negative_prompt`, `prompt_extend`, and `watermark`, but `baoyu-image-gen` does not expose them as dedicated CLI flags today.
Official references:
- Qwen-Image API
- Text-to-image guide
- Qwen-Image Edit API
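For the `qwen-image-2.0*` family, the translation rules in this section might be sketched like this. `resolveQwen2Size` and `SizeRequest` are hypothetical names; the size values come from the recommended-size table, the `2k` fallback follows the documented default quality, and rejecting out-of-budget sizes with an error (rather than clamping) is an assumption.

```typescript
type DashScopeQuality = "normal" | "2k";

// Recommended qwen-image-2.0* sizes, copied from the table in this section.
const QWEN2_RECOMMENDED: Record<string, Record<DashScopeQuality, string>> = {
  "1:1": { normal: "1024*1024", "2k": "1536*1536" },
  "2:3": { normal: "768*1152", "2k": "1024*1536" },
  "3:2": { normal: "1152*768", "2k": "1536*1024" },
  "3:4": { normal: "960*1280", "2k": "1080*1440" },
  "4:3": { normal: "1280*960", "2k": "1440*1080" },
  "9:16": { normal: "720*1280", "2k": "1080*1920" },
  "16:9": { normal: "1280*720", "2k": "1920*1080" },
  "21:9": { normal: "1344*576", "2k": "2048*872" },
};

const QWEN2_MIN_PIXELS = 512 * 512;
const QWEN2_MAX_PIXELS = 2048 * 2048;

interface SizeRequest {
  size?: string; // --size, e.g. "2048x872"
  ar?: string; // --ar, e.g. "21:9"
  quality?: DashScopeQuality; // --quality, defaults to 2k
}

// Returns a DashScope "W*H" string, or undefined to let the API use its default.
function resolveQwen2Size(req: SizeRequest): string | undefined {
  if (req.size !== undefined) {
    // --size wins over --ar.
    const match = /^(\d+)[x*](\d+)$/.exec(req.size);
    if (!match) throw new Error(`Invalid size "${req.size}", expected WxH`);
    const width = Number(match[1]);
    const height = Number(match[2]);
    const pixels = width * height;
    if (pixels < QWEN2_MIN_PIXELS || pixels > QWEN2_MAX_PIXELS) {
      throw new Error(`Size ${width}*${height} is outside the 512*512 to 2048*2048 pixel budget`);
    }
    return `${width}*${height}`;
  }
  if (req.ar !== undefined) {
    const row = QWEN2_RECOMMENDED[req.ar];
    return row ? row[req.quality !== undefined ? req.quality : "2k"] : undefined;
  }
  return undefined;
}
```

Under these assumptions, `--ar 21:9` with the default quality and an explicit `--size 2048x872` both resolve to `2048*872`.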
OpenRouter Models
Use full OpenRouter model IDs, e.g.:
- `google/gemini-3.1-flash-image-preview` (recommended, supports image output and reference-image workflows)
- `google/gemini-2.5-flash-image-preview`
- `black-forest-labs/flux.2-pro`
- Other OpenRouter image-capable model IDs
Notes:
- OpenRouter image generation uses `/chat/completions`, not the OpenAI `/images` endpoints
- If `--ref` is used, choose a multimodal model that supports image input and image output
- `--imageSize` maps to OpenRouter `imageGenerationOptions.size`; `--size <WxH>` is converted to the nearest OpenRouter size and inferred aspect ratio when possible
Replicate Models
Supported model formats:
- `owner/name` (recommended for official models), e.g. `google/nano-banana-pro`
- `owner/name:version` (community models by version), e.g. `stability-ai/sdxl:<version>`
Examples:
# Use Replicate default model
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate
# Override model explicitly
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
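The two accepted model formats can be told apart with a small parser. This is a sketch only: `parseReplicateModel` is a hypothetical helper, and the exact characters the real script allows in each part are not documented here, so the pattern below is an assumption.

```typescript
interface ReplicateModelRef {
  owner: string;
  name: string;
  version: string | undefined; // set only for owner/name:version
}

// Accepts "owner/name" or "owner/name:version"; anything else is rejected.
function parseReplicateModel(id: string): ReplicateModelRef {
  const match = /^([^\/:\s]+)\/([^\/:\s]+)(?::([^\s:]+))?$/.exec(id);
  if (!match) {
    throw new Error(`Expected owner/name or owner/name:version, got "${id}"`);
  }
  const [, owner = "", name = "", version] = match;
  return { owner, name, version };
}
```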
Provider Selection
- `--ref` provided + no `--provider` → auto-select Google first, then OpenAI, then OpenRouter, then Replicate (Jimeng and Seedream do not support reference images)
- `--provider` specified → use it (if `--ref`, must be `google`, `openai`, `openrouter`, or `replicate`)
- Only one API key available → use that provider
- Multiple available → default to Google
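Read as a decision procedure, the selection rules above could be sketched as below. `pickProvider` is a hypothetical name, `available` stands for the providers whose API keys were found, and falling back to the first available provider when several keys exist but Google is not among them is an assumption, since that case is not specified.

```typescript
type ProviderId =
  | "google" | "openai" | "openrouter" | "dashscope"
  | "jimeng" | "seedream" | "replicate";

// Providers that accept --ref, in auto-selection order.
const REF_CAPABLE: ProviderId[] = ["google", "openai", "openrouter", "replicate"];

interface SelectionInput {
  provider?: ProviderId; // value of --provider, if any
  hasRef: boolean; // true when --ref was passed
  available: ProviderId[]; // providers with an API key configured
}

function pickProvider(input: SelectionInput): ProviderId {
  const { provider, hasRef, available } = input;
  if (provider) {
    if (hasRef && REF_CAPABLE.indexOf(provider) === -1) {
      throw new Error(`${provider} does not support --ref`);
    }
    return provider;
  }
  const first = available[0];
  if (first === undefined) throw new Error("No API key found for any provider");
  if (hasRef) {
    for (const candidate of REF_CAPABLE) {
      if (available.indexOf(candidate) !== -1) return candidate;
    }
    throw new Error("--ref needs google, openai, openrouter, or replicate");
  }
  if (available.length === 1) return first;
  // Multiple keys: default to Google; the non-Google fallback is an assumption.
  return available.indexOf("google") !== -1 ? "google" : first;
}
```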
Quality Presets
| Preset | Google imageSize | OpenAI Size | OpenRouter size | Replicate resolution | Use Case |
|---|---|---|---|---|---|
| `normal` | 1K | 1024px | 1K | 1K | Quick previews |
| `2k` (default) | 2K | 2048px | 2K | 2K | Covers, illustrations, infographics |

Google/OpenRouter imageSize: Can be overridden with `--imageSize 1K|2K|4K`
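The preset table and the `--imageSize` override can be captured in a lookup like the one below. `QUALITY_PRESETS` and `effectiveImageSize` are hypothetical names used for illustration; the values are copied from the table.

```typescript
type QualityPreset = "normal" | "2k";
type ImageSizeFlag = "1K" | "2K" | "4K";

interface PresetRow {
  googleImageSize: ImageSizeFlag;
  openaiSize: string;
  openrouterSize: ImageSizeFlag;
  replicateResolution: string;
}

// One row per preset, mirroring the table.
const QUALITY_PRESETS: Record<QualityPreset, PresetRow> = {
  normal: { googleImageSize: "1K", openaiSize: "1024px", openrouterSize: "1K", replicateResolution: "1K" },
  "2k": { googleImageSize: "2K", openaiSize: "2048px", openrouterSize: "2K", replicateResolution: "2K" },
};

// Google/OpenRouter image size: explicit --imageSize wins, else the preset decides.
function effectiveImageSize(quality?: QualityPreset, imageSize?: ImageSizeFlag): ImageSizeFlag {
  if (imageSize !== undefined) return imageSize;
  return QUALITY_PRESETS[quality !== undefined ? quality : "2k"].googleImageSize;
}
```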
Aspect Ratios
Supported: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1
- Google multimodal: uses `imageConfig.aspectRatio`
- OpenAI: maps to closest supported size
- OpenRouter: sends `imageGenerationOptions.aspect_ratio`; if only `--size <WxH>` is given, aspect ratio is inferred automatically
- Replicate: passes `aspect_ratio` to model; when `--ref` is provided without `--ar`, defaults to `match_input_image`
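One plausible way to infer an aspect ratio from `--size <WxH>` is to pick the closest entry from the supported list. The sketch below does that on a log scale so wide and tall ratios are treated symmetrically. `nearestSupportedRatio` is a hypothetical helper, and the real script may infer the ratio differently.

```typescript
// Supported ratios as listed in this section.
const SUPPORTED_RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4", "2.35:1"];

// "16:9" -> 1.777..., "2.35:1" -> 2.35
function ratioValue(ratio: string): number {
  const parts = ratio.split(":");
  return Number(parts[0]) / Number(parts[1]);
}

// Returns the supported ratio closest to WxH, or null for unparsable input.
function nearestSupportedRatio(size: string): string | null {
  const match = /^(\d+)[x*](\d+)$/.exec(size);
  if (!match) return null;
  const target = Number(match[1]) / Number(match[2]);
  if (!isFinite(target) || target <= 0) return null;
  let best: string | null = null;
  let bestDistance = Infinity;
  for (const candidate of SUPPORTED_RATIOS) {
    // Compare on a log scale so 2:1 and 1:2 are equally far from 1:1.
    const distance = Math.abs(Math.log(target / ratioValue(candidate)));
    if (distance < bestDistance) {
      best = candidate;
      bestDistance = distance;
    }
  }
  return best;
}
```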
Generation Mode
Default: Sequential generation.
Batch Parallel Generation: When --batchfile contains 2 or more pending tasks, the script automatically enables parallel generation.
| Mode | When to Use |
|---|---|
| Sequential (default) | Normal usage, single images, small batches |
| Parallel batch | Batch mode with 2+ tasks |

Execution choice:

| Situation | Preferred approach | Why |
|---|---|---|
| One image, or 1-2 simple images | Sequential | Lower coordination overhead and easier debugging |
| Multiple images already have saved prompt files | Batch (`--batchfile`) | Reuses finalized prompts, applies shared throttling/retries, and gives predictable throughput |
| Each image still needs separate reasoning, prompt writing, or style exploration | Subagents | The work is still exploratory, so each image may need independent analysis before generation |
| Output comes from `baoyu-article-illustrator` with `outline.md` + `prompts/` | Batch (`build-batch.ts` -> `--batchfile`) | That workflow already produces prompt files, so direct batch execution is the intended path |

Rule of thumb:
- Prefer batch over subagents once prompt files are already saved and the task is "generate all of these"
- Use subagents only when generation is coupled with per-image thinking, rewriting, or divergent creative exploration
Parallel behavior:
- Default worker count is automatic, capped by config, built-in default 10
- Provider-specific throttling is applied only in batch mode, and the built-in defaults are tuned for faster throughput while still avoiding obvious RPM bursts
- You can override worker count with `--jobs <count>`
- Each image retries automatically up to 3 attempts
- Final output includes success count, failure count, and per-image failure reasons
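A minimal sketch of two of the behaviors listed above: choosing a worker count and retrying up to 3 attempts. `planWorkers` and `withRetries` are hypothetical names. Interpreting "automatic" as one worker per pending task, and applying the cap to an explicit `--jobs` value as well, are assumptions. The retry helper is synchronous to keep the example short, whereas real generation calls would be asynchronous.

```typescript
interface WorkerInputs {
  pendingTasks: number;
  cliJobs?: number; // --jobs
  fileJobs?: number; // "jobs" in the batch file
  maxWorkers?: number; // configured cap; built-in default is 10
}

function planWorkers(input: WorkerInputs): number {
  // Fewer than 2 pending tasks: stay sequential.
  if (input.pendingTasks < 2) return 1;
  const cap = input.maxWorkers !== undefined ? input.maxWorkers : 10;
  const requested =
    input.cliJobs !== undefined
      ? input.cliJobs
      : input.fileJobs !== undefined
        ? input.fileJobs
        : input.pendingTasks; // "auto" assumed to mean one worker per task
  return Math.max(1, Math.min(requested, cap, input.pendingTasks));
}

// Calls `attempt` until it succeeds or maxAttempts is reached, then rethrows.
function withRetries<T>(attempt: (n: number) => T, maxAttempts = 3): T {
  let lastError: unknown;
  for (let n = 1; n <= maxAttempts; n++) {
    try {
      return attempt(n);
    } catch (err) {
      lastError = err; // remember the failure reason for the final report
    }
  }
  throw lastError;
}
```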
Error Handling
- Missing API key → error with setup instructions
- Generation failure → auto-retry up to 3 attempts per image
- Invalid aspect ratio → warning, proceed with default
- Reference images with unsupported provider/model → error with fix hint
Extension Support
Custom configurations via EXTEND.md. See Preferences section for paths and supported options.
GitHub Owner
Owner: jimliu
GitHub Links
- Twitter: https://twitter.com/dotey
Files
first-time-setup.md
SKILL.md
name: baoyu-image-gen
description: AI image generation with OpenAI, Google, OpenRouter, DashScope, Jimeng, Seedream and Replicate APIs. Supports text-to-image, reference images, aspect ratios, and batch generation from saved prompt files. Sequential by default; use batch parallel generation when the user already has multiple prompts or wants stable multi-image throughput. Use when user asks to generate, create, or draw images.
version: 1.56.2
metadata:
  openclaw:
    homepage: https://github.com/JimLiu/baoyu-skills#baoyu-image-gen
    requires:
      anyBins:
        - bun
        - npx