Hugging Face has announced its acquisition of XetHub, a collaborative platform based in Seattle, founded by former Apple researchers. This platform supports machine learning teams in managing and working with large datasets and models effectively.
While the financial details of the deal have not been disclosed, CEO Clem Delangue shared in a Forbes interview that this is Hugging Face's largest acquisition to date. The company plans to integrate XetHub’s technology into its own platform to enhance storage capabilities, so that the Hub can support even larger models and datasets.
Julien Chaumond, CTO of Hugging Face, stated that integrating XetHub’s technology will support the growth of the company's datasets and models over the next five years. This will involve adopting a new version of Git Large File Storage (LFS) as the storage backend for the Hugging Face Hub.
XetHub was founded in 2021 by Yucheng Low, Ajit Banerjee, and Rajat Arya, all of whom previously worked on Apple’s internal ML infrastructure. Its platform offers a Git-like version control system for repositories that can grow to terabytes, enabling teams to track changes, collaborate, and ensure reproducibility in their machine learning projects.
In its three years of operation, XetHub attracted a diverse range of customers, including major firms like Tableau and Gather AI. The platform addresses large-scale storage needs with features such as content-defined chunking, deduplication, instant repository mounting, and file streaming.
Following this acquisition, the XetHub platform will be retired, and its features will be incorporated into the Hugging Face Hub. This integration will improve the platform’s capabilities for model and dataset sharing by providing a more robust storage and versioning backend.
Currently, the Hugging Face Hub uses Git LFS, which supports files up to 5GB and repositories up to 10GB. XetHub’s technology will raise these limits to support individual files larger than 1TB and repositories exceeding 100TB.
Additionally, XetHub’s storage and transfer features will make uploads more efficient. For instance, when a dataset changes, users will be able to upload only the chunks that were modified rather than re-uploading entire files, which saves both time and bandwidth.
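To make the idea of uploading only changed chunks concrete, here is a minimal Python sketch of content-defined chunking combined with hash-based deduplication. It is not XetHub's or Hugging Face's actual implementation: the gear-style rolling hash, the `GEAR` table, the function names `cdc_chunks` and `bytes_to_upload`, and all size parameters are assumptions chosen purely for illustration.

```python
import hashlib
import random

# Hypothetical table of random 32-bit values for a gear-style rolling hash.
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data: bytes, avg_bits: int = 10,
               min_size: int = 128, max_size: int = 8192) -> list[bytes]:
    """Split data at positions chosen by its content, not by fixed offsets."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # After 32 shifts a byte no longer affects h, so the hash only
        # "sees" the most recent 32 bytes of content.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        # Cut when the top avg_bits bits are zero (about once every
        # 2**avg_bits bytes), or when the chunk reaches max_size.
        at_boundary = size >= min_size and (h >> (32 - avg_bits)) == 0
        if at_boundary or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def bytes_to_upload(old: bytes, new: bytes) -> int:
    """Count bytes in chunks of `new` that are not already stored for `old`."""
    stored = {hashlib.sha256(c).digest() for c in cdc_chunks(old)}
    missing = {}
    for chunk in cdc_chunks(new):
        digest = hashlib.sha256(chunk).digest()
        if digest not in stored:
            missing[digest] = len(chunk)
    return sum(missing.values())


# Demo: insert a few bytes into the middle of a 200 KB random "file".
_data_rng = random.Random(1)
original = bytes(_data_rng.getrandbits(8) for _ in range(200_000))
edited = original[:100_000] + b"a small edit" + original[100_000:]
print(f"{bytes_to_upload(original, edited)} of {len(edited)} bytes need uploading")
```

Because chunk boundaries depend on the bytes themselves, an insertion shifts only the chunks near the edit, and the boundaries further along realign with the old ones. With fixed-size blocks, by contrast, every block after the insertion point would change and need to be uploaded again.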
As the industry moves toward models with trillions of parameters, Hugging Face expects this new technology to enhance scalability for both community and enterprise users. The company plans to work closely with the XetHub team to develop solutions that will improve collaboration and asset tracking within the Hugging Face Hub.
Currently, the Hugging Face Hub manages 1.3 million models, 450,000 datasets, and 680,000 spaces, totaling up to 12PB of data in LFS. The impact of this acquisition on the platform’s capacity and functionality will become clear as the new storage backend is implemented. The timeline for the integration and additional features is yet to be announced.