ABOUT THE WORKFLOW
Read an Image and Answer
Upload an image and type a question or instruction. The model reads what is in the image and answers in plain language. You can also add an audio clip for the model to transcribe or describe alongside the image.
Model
Gemma 4 E4B by Google DeepMind is an open-weights multimodal model, built from Gemini 3 research, that takes text, image, and audio input and writes a text response. It is strong at description, analysis, transcription, and question answering, and has a configurable thinking mode.
Description
ai
concept
opensource
google deepmind
floyo
comfyui
multimodal model
model
i2t
ask about image
image reading
image
gemma 4 e4b
gemma
workflow
Details
Downloads: 8
Platform: CivitAI
Platform Status: Available
Created: 6/25/2026
Updated: 6/26/2026
Deleted: -
