Solving hCaptcha challenges with multimodal large language models
hCaptcha Challenger harnesses the spatial chain-of-thought (SCoT) reasoning capabilities of multimodal large language models (MLLMs) to construct an agentic workflow framework. This architecture empowers autonomous agents to perform zero-shot adaptation on diverse spatial-visual tasks through dynamic problem-solving workflows, eliminating the requirement for task-specific fine-tuning or additional training parameters.
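The sketch below is a rough, hypothetical illustration of the kind of workflow the paragraph above describes. It is not the project's actual code. The task-type names (`area_select`, `drag_drop`), the prompt wording, the compact `{"points": ...}` answer format, and the `ask_mllm` callable (a stand-in for a real MLLM client) are all assumptions. It shows two ideas: the prompt is chosen at runtime per task type, so adapting to a new task means writing a new prompt rather than fine-tuning, and the final structured answer is parsed out of a reply that begins with free-form step-by-step spatial reasoning.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical task types and prompt templates (assumptions, not the
# project's real taxonomy). Each prompt asks the model to reason step by
# step about positions, then finish with a machine-readable answer.
PROMPTS: dict[str, str] = {
    "area_select": (
        "Reason step by step about where each target object is, then end "
        'with JSON of the form {"points": [[x, y], ...]}.'
    ),
    "drag_drop": (
        "Reason step by step about the piece and its destination, then end "
        'with JSON of the form {"points": [[x_from, y_from], [x_to, y_to]]}.'
    ),
}


@dataclass
class Action:
    task_type: str
    points: list[tuple[int, int]]


def extract_points(reply: str) -> list[tuple[int, int]]:
    """Parse the final JSON answer; everything before it is the reasoning trace.

    Assumes the model emits the answer in the compact form {"points": ...}.
    """
    start = reply.rfind('{"points"')
    if start == -1:
        raise ValueError("no structured answer found in model reply")
    answer, _ = json.JSONDecoder().raw_decode(reply, start)
    return [(int(x), int(y)) for x, y in answer["points"]]


def solve(
    task_type: str,
    question: str,
    image: bytes,
    ask_mllm: Callable[[str, bytes], str],  # hypothetical MLLM client
) -> Action:
    # Dynamic step: select the prompt at runtime instead of relying on a
    # task-specific fine-tuned model.
    template = PROMPTS.get(task_type)
    if template is None:
        raise ValueError(f"unsupported task type: {task_type}")
    reply = ask_mllm(f"{question}\n{template}", image)
    return Action(task_type, extract_points(reply))


# Usage with a fake model that returns a reasoning trace plus an answer.
def fake_mllm(prompt: str, image: bytes) -> str:
    return 'The cat sits in the top-left cell.\n{"points": [[120, 95]]}'


print(solve("area_select", "Click on the cat", b"", fake_mllm).points)  # [(120, 95)]
```

In this sketch, supporting a new challenge type only requires adding an entry to `PROMPTS`, which mirrors the "no task-specific fine-tuning" claim; the source does not say whether the real project routes tasks this way.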