Launch HN: Reducto Studio (YC W24) – Build accurate document pipelines, fast
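Hi HN! We're Adit and Raunak, co-founders of Reducto (YC W24, <a href="https://reducto.ai">https://reducto.ai</a>). Reducto turns unstructured documents (e.g., PDFs, scans, spreadsheets) into structured data. That data can then be used for retrieval, passed into LLMs, or used elsewhere downstream.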
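We started Reducto when we realized that so many of today's AI applications depend on good-quality data. Everyone knows that good inputs lead to better outputs, but 80% of the world's data is still trapped inside things like messy PDFs and spreadsheets. Raunak and I launched a very early MVP for parsing and extracting data from unstructured documents. We were lucky to get a lot of interest from technical teams once they saw a level of accuracy they hadn't seen before.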
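We started by releasing an API for engineers to build with, but over time we realized that an accurate API was only part of the puzzle. Our customers wanted to set up multi-step pipelines easily, evaluate and iterate on performance for their own use cases, and work with non-engineering teammates who were also involved in the real-world document processing flow.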
That's why we're launching Reducto Studio, a web platform built on top of our APIs where users can build and iterate on end-to-end document pipelines.
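With Studio, you can:
- Drop in an entire file set and get per-field and per-document accuracy scores against your eval data.
- Auto-generate and continuously optimize extraction schemas to reach production-grade quality fast.
- Save every run, iterate on parse/extract configs, and compare results side by side.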
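The post doesn't say how Studio computes its scores. To illustrate what "per-field and per-document accuracy against eval data" can mean, here is a minimal Python sketch. Several parts are hypothetical assumptions and not Reducto's implementation or API: the `score` and `normalize` helpers, the dict-of-strings record shape, and the rule that a field counts as correct only on an exact match after case and whitespace normalization.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical record shape: field name -> extracted string value.
Record = dict[str, str]


def normalize(value: str | None) -> str | None:
    """Collapse whitespace and lowercase (an assumed matching rule, not Reducto's)."""
    if value is None:
        return None
    return " ".join(value.split()).lower()


def score(expected: dict[str, Record], predicted: dict[str, Record]):
    """Return (per_field, per_document) accuracy against labeled eval data.

    A field is correct only if the prediction exists and matches the labeled
    value after normalization; missing documents or fields count as wrong.
    """
    field_hits: dict[str, int] = defaultdict(int)
    field_totals: dict[str, int] = defaultdict(int)
    per_document: dict[str, float] = {}

    for doc_id, truth in expected.items():
        pred = predicted.get(doc_id, {})
        hits = 0
        for field, true_value in truth.items():
            correct = normalize(pred.get(field)) == normalize(true_value)
            field_totals[field] += 1
            field_hits[field] += int(correct)
            hits += int(correct)
        # Fraction of labeled fields this document got right.
        per_document[doc_id] = hits / len(truth) if truth else 1.0

    # Fraction of documents that got each field right.
    per_field = {f: field_hits[f] / field_totals[f] for f in field_totals}
    return per_field, per_document


if __name__ == "__main__":
    # Made-up eval data, for illustration only.
    expected = {
        "invoice_1": {"total": "120.00", "vendor": "Acme Corp"},
        "invoice_2": {"total": "75.50", "vendor": "Globex"},
    }
    predicted = {
        "invoice_1": {"total": "120.00", "vendor": "acme  corp"},
        "invoice_2": {"total": "57.50"},
    }
    per_field, per_document = score(expected, predicted)
    print(per_field)     # {'total': 0.5, 'vendor': 0.5}
    print(per_document)  # {'invoice_1': 1.0, 'invoice_2': 0.0}
```

With both views, you can tell whether one field is weak across the whole file set or whether a few hard documents are dragging the average down. Exact match is the simplest possible rule. Real evals often need per-field comparators, for example numeric tolerance for amounts or date parsing.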
You can see some examples here (<a href="https://studio.reducto.ai">https://studio.reducto.ai</a>) or watch this walkthrough: <a href="https://www.loom.com/share/b243551741c642c6a594c00353fcecb3" rel="nofollow">https://www.loom.com/share/b243551741c642c6a594c00353fcecb3</a>.
If you'd like to upload your own documents, you can log in and do so; we don't make you book a demo or put a payment down to try it.
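Thanks for reading and checking it out! This is only the first step for Studio, so we'd love feedback on anything: UX rough edges (we know they're there!), features that would make evaluations better for you, hard documents you've had trouble with, or anything else about wrangling unstructured data.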