Solution: Image → vision model → text → standard pipeline. All modalities produce NODE_SCHEMA output.
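A minimal sketch of the flow above, under stated assumptions: `Node` is a hypothetical stand-in for NODE_SCHEMA (its real fields are not listed here), and `standard_pipeline`, `image_to_node`, and `fake_vision_model` are illustrative names, not part of the spec. The point is only that the image path converts to text first and then reuses the same text pipeline, so both modalities return the same output type.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical stand-in for NODE_SCHEMA; the real field set is an assumption.
@dataclass
class Node:
    text: str
    modality: str


def standard_pipeline(text: str, modality: str = "text") -> Node:
    """Placeholder for the existing text pipeline: collapses whitespace only."""
    return Node(text=" ".join(text.split()), modality=modality)


def image_to_node(image: bytes, vision_model: Callable[[bytes], str]) -> Node:
    """Image -> vision model -> text -> standard pipeline."""
    description = vision_model(image)  # the vision model returns plain text
    return standard_pipeline(description, modality="image")


def fake_vision_model(image: bytes) -> str:
    """Stub so the sketch runs without a real model (assumption)."""
    return f"image of   {len(image)} bytes"


if __name__ == "__main__":
    print(standard_pipeline("plain   text input"))
    print(image_to_node(b"abc", fake_vision_model))
```

Passing the vision model in as a callable keeps the sketch independent of any particular model or library; a real implementation would substitute its own model call and the actual NODE_SCHEMA definition.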
v3 2026-04-05 Q Mapped to whitepaper sections
v2 2026-04-05 Q Imported SPEC-013 from model_specifications_v2.html
v1 2026-04-05 Q Created spec: SPEC-013: Multi-Modal Input Pipeline