News

🤗Welcome to the Hugging Face Newsletter! 🤗

Every few weeks, we'll be updating you on the latest happenings at Hugging Face. Make sure to subscribe and share with all NLP lovers to get the latest updates on releases, readings, research, and more!

Have an idea for the newsletter? Email newsletter@huggingface.co

Hugging Face  


🤗 Datasets and Metrics Library Heading Toward First Non-Beta Release

Led by ML Intern Quentin Lhoest (@qlhoest) and CSO Thomas Wolf (@Thom_Wolf), the 🤗 team has been hard at work on our newest library focusing on datasets and metrics.

Features:

  • One-line access to 150+ datasets and metrics, and it is very easy to add new datasets/metrics to the hub
  • Loading a 17GB+ dataset like English Wikipedia takes only 9MB of RAM, and you can iterate over the data at 2-3 Gbit/s
  • Blazing-fast and reproducible data processing
  • Deep integration with NumPy/pandas/PyTorch/TensorFlow
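
Curious how a multi-gigabyte dataset can be read with only a few megabytes of RAM? One common technique for this is memory mapping, where the operating system pages data in from disk on demand instead of loading the whole file. The sketch below is only an illustration of that idea, using Python's standard mmap module on a toy text file. It is not the library's API, and the helper names (write_records, iter_records, count_records) are hypothetical.

```python
import mmap


def write_records(path, n_records):
    """Write n_records newline-terminated toy records to path (hypothetical helper)."""
    with open(path, "wb") as f:
        for i in range(n_records):
            f.write(f"record {i}\n".encode("utf-8"))


def iter_records(path):
    """Yield records one at a time from a memory-mapped file.

    The OS pages data in on demand, so the whole file is never
    copied into Python memory at once. Assumes a non-empty file.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # mm.readline() returns b"" once the end of the map is reached
            for line in iter(mm.readline, b""):
                yield line.rstrip(b"\n").decode("utf-8")


def count_records(path):
    """Count records by streaming over the mapped file."""
    return sum(1 for _ in iter_records(path))
```

Calling write_records on a temporary path and then looping over iter_records streams the records back one by one, and only the record currently being decoded is held as a Python object.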

New this summer:

  • Brand new documentation
  • More tutorials to showcase simplicity of use

Short roadmap for the 1.0.0 release:

  • Deep integration and focus on knowledge-based models such as RAG/REALM/ORQA/MARGE/knn-LM using indexed datasets
  • Additional speed improvements (multiprocessing, instant shuffling)
  • Support for multi-modal datasets
  • The final, community-voted name for the library will be "🤗 datasets" (a change from the original name, "nlp"), starting with the non-beta 1.0.0 release

Community




🔥Top Contributors 🔥

In every newsletter, we'll be highlighting some top contributors to the Hugging Face Transformers library! This issue's top contributors:

  • Suraj Patil - Added the MBart model and diverse functionalities for Seq2Seq models.
  • Guillaume - Added the "ConversationalPipeline" and optimized "Bad token ids" for generation.
  • Stas Bekman - Fixed many inconsistencies in the documentation and cleaned up some tests.
  • Manuel Romero - Added multiple T5 models to the model hub, notably models for Question Generation.
  • Pradhy729 - Added "Feed forward chunking functionality" and added support for "IterableDataset" in Trainer.
https://github.com/huggingface/transformers  

Want Some 🤗 Stickers?

Make sure you're subscribed to this newsletter and send an email with your shipping address and how you're using Transformers to clara@huggingface.co with the subject line: 🤗Stickers

Tutorials