Ask HN: MCP vs. browser-based agents

1 point by giaco_hendel, 1 day ago
Hi guys, I want to ask for your opinion on something.

We've built vykee.co as an onboarding tool. The idea is to make SaaS interfaces easier to understand by hiding advanced features from new users.

One of the main features is a tagging system that lets the SaaS company tag elements and group multiple elements into one feature. So basically we have a UI layer on top of the frontend, complete with unique identifiers and annotations for every element and feature.

I think this UI layer could be pretty useful to LLMs: if we put all that UI info in a simple llms.md file (like a robots.txt, but for LLMs), they could understand the interface much better than by parsing the HTML or relying on screenshots.

Now, this would only be helpful for browser-based agents.

We've had a discussion where some argued it would make much more sense to bet on MCP instead of browser-based agents, since it's the standard and more widely adopted. The idea would be to set up an MCP server and connect it to the tagged frontend elements.

Do you think it still makes sense to bet on browser-based agents?

Thanks!
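
To make the llms.md idea a bit more concrete, here is a minimal sketch of how a tagged UI layer might be rendered into such a file. Everything in it is an assumption for illustration: the `TaggedElement` and `Feature` shapes, the `render_llms_md` helper, and the output layout are hypothetical and not vykee's actual schema or any established llms.md format.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedElement:
    """One tagged UI element (hypothetical shape, not an actual vykee schema)."""
    element_id: str   # unique identifier assigned by the tagging system
    selector: str     # CSS selector an agent could use to locate the element
    annotation: str   # human-written note on what the element does


@dataclass
class Feature:
    """A group of tagged elements that together make up one feature."""
    name: str
    description: str
    elements: list[TaggedElement] = field(default_factory=list)


def render_llms_md(app_name: str, features: list[Feature]) -> str:
    """Render the tagged UI layer as a Markdown document for LLM agents."""
    lines = [f"# {app_name} UI map", ""]
    for feature in features:
        lines.append(f"## {feature.name}")
        lines.append(feature.description)
        lines.append("")
        for el in feature.elements:
            lines.append(f"- `{el.element_id}` (`{el.selector}`): {el.annotation}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Example data, invented purely for illustration.
features = [
    Feature(
        name="Report export",
        description="Lets the user download the current report.",
        elements=[
            TaggedElement("export-btn", "#export", "Exports the current report as CSV"),
            TaggedElement("export-format", "#format", "Chooses the export file format"),
        ],
    ),
]

llms_md = render_llms_md("Example App", features)
print(llms_md)
```

The same tagged data could just as well back an MCP server, exposing each feature as a tool instead of a section of a Markdown file, so the tagging layer itself does not force a choice between the two approaches.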