"Do Not Train" meta tags: Robots.txt for AI – will anyone respect them?
I've been noticing more creators and platforms quietly adding things like `<meta name="robots" content="noai">` to their pages - kind of like robots.txt, but for LLMs. For those unfamiliar, robots.txt is a standard file websites use to tell search engines which pages they shouldn't crawl. These new "noai" tags serve a similar purpose, but they're aimed at AI training crawlers rather than search indexing.
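For concreteness, here's roughly what the two mechanisms look like. The `noai` and `noimageai` directives below are the ones DeviantArt popularized; exact support varies by crawler:

```html
<!-- In the page's <head>: asks AI crawlers not to use this page for training -->
<meta name="robots" content="noai, noimageai">
```

The robots.txt route works per-crawler rather than per-page. `GPTBot` is OpenAI's published crawler token; other vendors use their own:

```
# robots.txt - block a specific AI training crawler site-wide
User-agent: GPTBot
Disallow: /
```

Both are purely advisory - nothing in HTTP or HTML forces a crawler to obey them, which is exactly the problem described below.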
Some examples of platforms implementing these opt-out mechanisms:
- Sketchfab now offers creators an option to block AI training in their account settings
- DeviantArt pioneered these tags as part of its content protection approach
- ArtStation added the meta tags and updated its Terms of Service
- Shutterstock created a compensation model for contributors whose images are used in AI training
But here's where things get concerning - there's growing evidence these tags are being treated as optional suggestions rather than firm boundaries:
- Various creators have reported the tags being ignored. For instance, a discussion on DeviantArt (https://www.deviantart.com/lumaris/journal/NoAI-meta-tag-is-NOT-honored-by-DA-941468316) documents cases where the tags weren't honored, with references to GitHub conversations showing implementation problems
- In a GitHub pull request for an image dataset tool (https://github.com/rom1504/img2dataset/pull/218), the developers made respecting these tags optional rather than the default - a change one commenter described as having "gutted it so that we can wash our hands of responsibility without actually respecting anyone's wishes" (a sketch of what honoring the tag involves follows this list)
- Raptive, a company implementing these tags, admits they "are not yet an industry standard, and we cannot guarantee that any or all bots will respect them" (https://help.raptive.com/hc/en-us/articles/13764527993755-NoAI-Meta-Tag-FAQs)
- A proposal to WHATWG, the HTML standards body (https://github.com/whatwg/html/issues/9334), acknowledges that these tags don't enforce consent and that compliance "might not happen short of robust regulation"
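To make the img2dataset point concrete, here's a minimal sketch of what "respecting the tag" could look like inside a scraping pipeline. This is a hypothetical illustration, not img2dataset's actual code, and `allows_ai_training` is a name I made up:

```python
# Hypothetical pre-filter for a scraping pipeline: skip any page whose
# robots meta tag carries a "noai" or "noimageai" directive.
import requests
from bs4 import BeautifulSoup

def allows_ai_training(url: str) -> bool:
    """Return False if the page opts out via a noai/noimageai meta directive."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all("meta", attrs={"name": "robots"}):
        directives = {d.strip().lower() for d in tag.get("content", "").split(",")}
        if directives & {"noai", "noimageai"}:
            return False
    return True

# Usage: filter candidate URLs before they reach the dataset builder.
candidate_urls = ["https://example.com/artwork-1", "https://example.com/artwork-2"]
kept = [u for u in candidate_urls if allows_ai_training(u)]
```

The check itself is trivial, which is rather the point: the PR debate was about whether to run it by default, not about whether it's hard to implement.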
Some creators have become cynical enough to abandon the approach entirely: the prominent artist David Revoy announced he's dropping tags like #NoAI because "the damage has already been done" and he "can't remove [his] art one by one from their database" (https://www.davidrevoy.com/article977/artificial-inteligence-why-i-ll-not-hashtag-my-art-humanart-humanmade-or-noai).
This raises several practical questions:
- Will this actually work in practice without enforcement mechanisms?
- Could it become legally enforceable down the line?
- Has anyone successfully used these tags to prevent unauthorized training?

Beyond the technical implementation, I think this points to a broader conversation about creator consent in the AI era. Is this mostly symbolic - a signal that people want some version of "AI consent" for the open web? Or could it evolve into an actual standard with teeth?
I'm curious whether folks here have added something like this to their own sites or content. Have you implemented any technical measures to detect whether your content is being crawled for training anyway? And for those working in AI: what's your take on respecting these kinds of opt-out signals?
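On the detection question: you can't directly observe whether your content ended up in a training set, but you can at least see which AI crawlers hit your server. Below is a rough sketch, assuming an nginx/Apache-style access log and matching on User-Agent substrings these bots publicly identify with (GPTBot, CCBot, ClaudeBot, and so on - the list is illustrative, not exhaustive):

```python
# Tally requests from known AI crawler user agents in a web server access log.
from collections import Counter

# Publicly documented crawler UA tokens; extend as new bots appear.
AI_BOT_SIGNATURES = ["GPTBot", "CCBot", "ClaudeBot", "Bytespider", "PerplexityBot"]

def count_ai_crawler_hits(log_path: str) -> Counter:
    hits = Counter()
    with open(log_path) as f:
        for line in f:  # combined log format puts the User-Agent in quotes at the end
            for bot in AI_BOT_SIGNATURES:
                if bot in line:
                    hits[bot] += 1
    return hits

print(count_ai_crawler_hits("/var/log/nginx/access.log"))
```

Of course, a hit in your logs only proves crawling, not training - and a scraper willing to ignore noai may be willing to spoof its user agent too.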
Would love to hear what others think.