Thomson Reuters Labs: Training large language models using Amazon SageMaker HyperPod
/en-us/posts/innovation/thomson-reuters-labs-training-large-language-models-using-amazon-sagemaker-hyperpod/
Tue, 17 Sep 2024

2023 proved to be an inflection point for AI, prompting Thomson Reuters to consider how our high-value, curated data could improve general language models on customer-specific tasks. Training and fine-tuning a large language model (LLM) is compute-intensive and requires specialized hardware.
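To give a feel for why such training is compute-intensive, here is a rough back-of-envelope sketch using the common C ≈ 6·N·D FLOPs rule of thumb (N parameters, D training tokens). The token count, utilization factor, and per-GPU throughput below are illustrative assumptions, not figures from the post.

```python
def training_days(n_params, n_tokens, n_gpus, peak_flops=312e12, mfu=0.4):
    """Estimate wall-clock training days via the C ~= 6*N*D approximation.

    peak_flops defaults to roughly an A100's BF16 peak (~312 TFLOP/s);
    mfu is an assumed model-FLOPs utilization (real runs vary widely).
    """
    total_flops = 6 * n_params * n_tokens      # forward + backward compute
    sustained = n_gpus * peak_flops * mfu      # effective cluster throughput
    return total_flops / sustained / 86_400    # seconds -> days

# Hypothetical example: a 70B-parameter model on 128 A100 GPUs,
# trained on 100B tokens (the token count is an assumption).
print(round(training_days(70e9, 100e9, 128), 1))
```

Even under these generous assumptions the job runs for weeks, which is in the same ballpark as the month-plus training times described below.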

We quickly discovered that it was extremely difficult to acquire these resources on demand and at scale in our cloud environments. Further, looking to other third parties presented its own set of risks and challenges.

We turned to Amazon Web Services (AWS), which has long been a trusted partner in secure and scalable solutions, to get early access to Amazon SageMaker HyperPod. With our computing platform acquired, we were ready to roll up our sleeves and do the hard work of exploring how to optimally train and fine-tune models for our domain. In our first phase of experimentation, we peaked at 16 compute instances (128 A100 GPUs), with the longest job taking 36 days to complete training of a 70-billion-parameter model.
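As a sketch of what provisioning such a cluster can look like, the snippet below builds a request for the SageMaker `create_cluster` API (the HyperPod provisioning call in boto3). The cluster name, role ARN, S3 path, and lifecycle script are placeholders; the instance type and count mirror the 16-node, 128-A100-GPU setup described above (each ml.p4d.24xlarge carries 8 A100s).

```python
# Hypothetical HyperPod cluster spec: 16 x ml.p4d.24xlarge (8 A100s each,
# 128 GPUs total). All names, ARNs, and S3 paths below are placeholders.
cluster_spec = {
    "ClusterName": "llm-training-cluster",
    "InstanceGroups": [
        {
            "InstanceGroupName": "worker-group",
            "InstanceType": "ml.p4d.24xlarge",
            "InstanceCount": 16,
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://my-bucket/lifecycle-scripts/",
                "OnCreate": "on_create.sh",
            },
            "ExecutionRole": "arn:aws:iam::123456789012:role/HyperPodRole",
        }
    ],
}

if __name__ == "__main__":
    # Submitting the request requires AWS credentials and GPU quota,
    # so boto3 is only imported and invoked when run directly.
    import boto3

    sagemaker = boto3.client("sagemaker")
    response = sagemaker.create_cluster(**cluster_spec)
    print(response["ClusterArn"])
```

In practice the lifecycle scripts bootstrap the cluster's scheduler (for example Slurm) so that multi-node training jobs can be launched across all 128 GPUs.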

Initial results from our custom models look promising, and our research continues, supported by the release of Amazon SageMaker HyperPod. Our post explores the journey that Thomson Reuters took to enable cutting-edge research in training domain-adapted LLMs using Amazon SageMaker HyperPod.

This is a guest post from John Duprey, distinguished engineer, Thomson Reuters.
