Hacker News (Chinese edition)

hCaptcha Challenger harnesses the spatial chain-of-thought (SCoT) reasoning capabilities of multimodal large language models (MLLMs) to construct an agentic workflow framework. This architecture empowers autonomous agents to perform zero-shot adaptation on diverse spatial-visual tasks through dynamic problem-solving workflows, eliminating the requirement for task-specific fine-tuning or additional training parameters.

Solving hCaptcha challenges with multimodal large language models