HackerNews Chinese Edition



Where are the ethics, and where is the call for a blanket restraint on publishing?

All the available AI models, and probably future ones too, are trained on publicly available data sets, including but not limited to programming blogs, Q&A sites, discussion forums, personal blogs, open source code, and free books. On top of that, almost all of the organisations behind the models have taken material from shady sources, including pirated content, not to mention the artworks (photos, paintings, music) from various digital galleries, digital shops, and streaming platforms. I don't have to list every source.

While all of this is good for training, the result is being aimed at the very programmers, creators, and writers who thrive on their creations and their way of working.

Looked at from the perspective of anyone whose work depends on creative people, generative-AI-based creation has taken over the process and has killed, or is killing, the livelihoods of millions who depend on it. Now, most of the argument is not about replacement, but about why one shouldn't use AI to support the process. Well, that is a deviation from the point of this post, and I don't disagree with it.

When people published their work as open source, not even in their dreams would they have thought that their private information and intellectual property would be swindled by such corrupt, for-profit organisations to train something that will eventually replace them. Why does no one want to speak out or take legal action? I'm aware that there are a lot of class action lawsuits and uproar against many companies, but the damage is already done.

So, I'm proposing the following:

- Since publicly available material was used to train these AI models, give the resulting models away for free. (It was wrong in the first place to use the material, with or without consent. If someone invested, that's their issue. I don't want to talk about CAPEX / OPEX etc.)
- Or pay for every open piece of information that was used in training. (This is not really an option because of the copyright, copyleft, and other mindless licenses that govern OSS and other free-to-use media!)
- Or stop doing it. That will not happen...

Any other thoughts / rants?

While I have expressed some of my thoughts, I want the community's thoughts so that we can act collectively. Please spend a few minutes to voice whether you are in support or against, but be concise so that everything can be collated, presented, and actioned (probably!) in the near future.

Thanks again.

Call for a blanket ban on open source contribution or publishing