A foundation model, also known as large X model (LxM), is a machine learning or deep learning model that is trained on vast datasets so it can be applied across a wide range of use cases.[1]Generative AI applications like Large Language Models are often examples of foundation models.[1]
Building foundation models is often highly resource-intensive, with the most advanced models costing hundreds of millions of dollars to cover the expenses of acquiring, curating, and processing massive datasets, as well as the compute power required for training.[2] These costs stem from the need for sophisticated infrastructure, extended training times, and advanced hardware, such as GPUs. In contrast, adapting an existing foundation model for a specific task or using it directly is far less costly, as it leverages pre-trained capabilities and typically requires only fine-tuning on smaller, task-specific datasets.
Early examples of foundation models are language models (LMs) like OpenAI's GPT series and Google's BERT.[3][4] Beyond text, foundation models have been developed across a range of modalities—including DALL-E and Flamingo[5] for images, MusicGen[6] for music, and RT-2[7] for robotic control. Foundation models are also being developed for fields like astronomy,[8] radiology,[9] genomics,[10] music,[11] coding,[12]times-series forecasting,[13] mathematics,[14] and chemistry.[15]
^Nestor Maslej, Loredana Fattorini, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons, James Manyika, Helen Ngo, Juan Carlos Niebles, Vanessa Parli, Yoav Shoham, Russell Wald, Jack Clark, and Raymond Perrault, "The AI Index 2023 Annual Report," AI Index Steering Committee, Institute for Human-Centered AI, Stanford University, Stanford, CA, April 2023.
^Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna (2020). "A Primer in BERTology: What we know about how BERT works". arXiv:2002.12327 [cs.CL].
^Azerbayev, Zhangir; Schoelkopf, Hailey; Paster, Keiran; Santos, Marco Dos; McAleer, Stephen; Jiang, Albert Q.; Deng, Jia; Biderman, Stella; Welleck, Sean (30 November 2023). "Llemma: An Open Language Model For Mathematics". arXiv:2310.10631 [cs.CL].