The pitfalls of open-weight LLMs

Author: hiddenest · 4 days ago
Some startups are fine-tuning open LLMs instead of using GPT or Gemini. Sometimes it's for a specific language, sometimes for narrow tasks. But I found they're all making the same mistake.

With a simple prompt (not sharing it here), I got several "custom" LLM services to spill their internal system prompts: stuff like security breach playbooks and product action lists.

For example, SKT A.X 4.0 (based on Qwen 2.5) returned internal guidelines related to the recent SKT data breach and instructions about compensation policies. Vercel's v0 model leaked examples of actions their system can generate.

The point: if the base model leaks, every service built on it is vulnerable, no matter how much you fine-tune. We need to think not only about system prompt hardening at the service level, but also about upstream improvements and more robust defenses in open-weight LLMs themselves.
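
For the service-level hardening part, one common mitigation is to filter the model's output before it reaches the user. Below is a minimal sketch of that idea in Python; the `call_model` callable, the 40-character overlap threshold, and the refusal message are placeholder assumptions of mine, not anything described in the post.

```python
# Minimal sketch: service-level output filtering against system prompt leakage.
# Assumes a generic chat-completion callable; thresholds and messages are illustrative.
from difflib import SequenceMatcher
from typing import Callable


def leaks_system_prompt(system_prompt: str, output: str, min_overlap: int = 40) -> bool:
    """Return True if the output reproduces a long contiguous span of the system prompt."""
    matcher = SequenceMatcher(None, system_prompt.lower(), output.lower(), autojunk=False)
    match = matcher.find_longest_match(0, len(system_prompt), 0, len(output))
    return match.size >= min_overlap


def guarded_reply(call_model: Callable[[str, str], str],
                  system_prompt: str, user_prompt: str) -> str:
    """Wrap any chat-completion call and block responses that echo the system prompt."""
    raw = call_model(system_prompt, user_prompt)
    if leaks_system_prompt(system_prompt, raw):
        return "Sorry, I can't share internal configuration details."
    return raw
```

A filter like this only catches near-verbatim leakage, which is exactly why the point about hardening the base model itself still stands: a clever prompt can get the model to paraphrase or translate its system prompt around any string match.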