Charlottesville, VA, Aug 30, 2023 (500NewsWire) — This week Hum, the leading provider of AI & data intelligence solutions for publishers, announced the open source release of their new large language model (LLM) Lodestone.
Using Google's BERT architecture as a foundation and leveraging several improvements released since BERT's debut, Hum developed a novel model for processing longer text sequences. Lodestone is the highest-performing model of its size and sequence length on the MTEB Leaderboard. This makes it particularly compelling for real-time applications on long text, where larger models may be prohibitively expensive or slow.
"We are excited to contribute to the open source community and advance natural language processing research, while helping other organizations unlock value from content data," said Niall Little, Hum's CTO. "Lodestone can contextualize an entire research paper, surfacing content insights that previous models couldn't comprehend looking at just one or two paragraphs at a time."
Lodestone was trained on a large publicly available dataset, including over 1 million scholarly research articles and publications. The model can process text sequences of up to 4,096 tokens, allowing it to capture topical context and nuance better than other commonly used LLMs.
Key features include:
- Long sequence embedding
- Improved semantic understanding
- Sentence vectorization for information retrieval, clustering, and sentence similarity tasks
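To illustrate the information retrieval and sentence similarity use cases listed above, here is a minimal sketch of ranking documents by cosine similarity between embedding vectors. It does not call Lodestone itself: the three-dimensional vectors, the document ids, and the helper names `cosine_similarity` and `rank_by_similarity` are illustrative assumptions standing in for the much larger vectors an embedding model would produce.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; real model output would have many more dimensions.
doc_vecs = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.1, 0.8, 0.3],
    "paper_c": [0.7, 0.3, 0.1],
}
query_vec = [1.0, 0.2, 0.0]

ranking = rank_by_similarity(query_vec, doc_vecs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In practice, the toy vectors would be replaced by embeddings computed from full documents, and the same ranking step would apply unchanged.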
Starting today, developers and enterprises can fine-tune and deploy their own models using Lodestone on Hugging Face, putting long-sequence AI applications within reach of more projects and businesses.
"This release furthers our commitment to using AI to solve challenges for publishers, societies, and other content-driven organizations," said Little. "We're continuing to fine-tune the model for the needs of media and publishing industry clients, but look forward to seeing how the community can build on Lodestone to advance applications for content intelligence and responsible AI."
Hum is a leading AI and data platform designed for publishers, societies, and media. Hum offers powerful content and audience intelligence, enabling content-driven organizations to derive strategic insights and deliver personalized experiences. Learn more at hum.works.
View source version on newsdirect.com: https://newsdirect.com/news/hum-open-sources-cutting-edge-llm-for-long-text-sequences-440016105