《光之书》——在人工智能内容主导之前,保护人类经验
我正在构建一个在2025年听起来可能有些反常的项目——一个专门以文本为主的网站,旨在存档真实的人类经历。
问题是:我们即将迎来一个转折点,大多数在线内容将由人工智能生成。在真实人类智慧被合成内容淹没之前,我们有一个狭窄的窗口来保存这些智慧。
方法如下:
- 首先:人工智能管道从互联网提取实际智慧,例如从一段25分钟的YouTube视频中提炼出7分钟的可操作内容。
- 纯文本界面,零视觉干扰或参与操控。
- 跨语言的多语言档案。
- 非营利模式,专注于保存而非盈利。
技术挑战:如何在大规模中分辨信号与噪声?目前的做法是使用大型语言模型(LLMs)来识别和提取经验知识,同时过滤掉宣传内容、猜测和无关信息。
保存的内容示例:
- 一位孟买居民如何挽救婚姻(具体步骤,而非理论)。
- 一位加拿大农民如何在干旱中生存(实际使用的技术)。
- 一位沙特阿拉伯居民如何修理汽车并节省4000美元(具体过程)。
链接到模型设计: [Google Drive Mockups](https://drive.google.com/drive/folders/10OLzJzQ76hy5f9Xj964mgrGBtCGc20-X?usp=sharing)
这个界面在今天的标准下看起来几乎是古老的——这是故意为之。可以想象成互联网档案馆与维基百科的结合,但针对的是生活经历而非事实。
我很好奇HN对这个设想的看法。我们真的面临内容真实性的灭绝,还是我在过度思考这个问题?
查看原文
I'm building something that might sound backward in 2025 – a deliberately text-only website that archives real human experiences.
The problem: We're about to hit an inflection point where most online content will be AI-generated. There's a narrow window to preserve authentic human wisdom before it's buried under synthetic content forever.<p>The approach:<p>For starters : AI pipeline extracts actual wisdom from the internet, for example a Youtube video (25 min → 7 min of actionable content)
Pure text interface, zero visual distractions or engagement manipulation
Multilingual archive across languages
Non-profit model focused on preservation, not monetization<p>Technical challenge: How do you separate signal from noise at scale? Current approach uses LLMs to identify and extract experiential knowledge while filtering out promotional content, speculation, and fluff.
Examples of what gets preserved:<p>How someone in Mumbai saved their marriage (specific steps, not theory)
How a farmer in Canada survived drought (actual techniques used)
How someone in Saudi Arabia fixed their car and saved $4000 (exact process)<p>Link to mockups: https://drive.google.com/drive/folders/10OLzJzQ76hy5f9Xj964mgrGBtCGc20-X?usp=sharing<p>The interface looks almost archaic by today's standards – that's intentional. Think Internet Archive meets Wikipedia, but for lived experiences instead of facts.<p>Curious what HN thinks about the premise. Are we actually facing content authenticity extinction, or am I overthinking this?