Encore Episode: Generative AI and Large Language Models
In this week’s episode, Lois Houston and Nikita Abraham, along with Senior Instructor Himanshu Raj, take you through the extraordinary capabilities of Generative AI, a subset of deep learning that doesn’t make predictions but rather creates its own content. They also explore the workings of Large Language Models.

Oracle MyLearn: https://mylearn.oracle.com/ou/learning-path/become-an-oci-ai-foundations-associate-2023/127177
Oracle University Learning Community: https://education.oracle.com/ou-community
LinkedIn: https://www.linkedin.com/showcase/oracle-university/
X (formerly Twitter): https://twitter.com/Oracle_Edu

Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.

---------------------------------------------------------

Episode Transcript:

00:00 The world of artificial intelligence is vast and ever-changing. And with all the buzz around it lately, we figured it was the perfect time to revisit our AI Made Easy series. Join us over the next few weeks as we chat about all things AI, helping you to discover its endless possibilities. Ready to dive in? Let’s go!

00:33 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!

00:46 Lois: Hello and welcome to the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Principal Technical Editor.

Nikita: Hi everyone! In our last episode, we went over the basics of deep learning. Today, we’ll look at generative AI and large language models, and discuss how they work. To help us with that, we have Himanshu Raj, Senior Instructor on AI/ML. So, let’s jump right in. Hi Himanshu, what is generative AI?

01:21 Himanshu: Generative AI refers to a type of AI that can create new content.
It is a subset of deep learning, where the models are trained not to make predictions but rather to generate output on their own. Think of generative AI as an artist who looks at a lot of paintings and learns the patterns and styles present in them. Once it has learned these patterns, it can generate new paintings that resemble what it learned.

01:48 Lois: Let's take an example to understand this better. Suppose we want to train a generative AI model to draw a dog. How would we achieve this?

Himanshu: You would start by giving it a lot of pictures of dogs to learn from. The AI does not know anything about what a dog looks like. But by looking at these pictures, it starts to figure out common patterns and features, like dogs often have pointy ears, narrow faces, whiskers, etc. You can then ask it to draw a new picture of a dog. The AI will use the patterns it learned to generate a picture that hopefully looks like a dog. But remember, the AI is not copying any of the pictures it has seen before but creating a new image based on the patterns it has learned. This is the basic idea behind generative AI. In practice, the process involves a lot of complex maths and computation, and there are different techniques and architectures that can be used, such as variational autoencoders (VAEs) and Generative Adversarial Networks (GANs).

02:48 Nikita: Himanshu, where is generative AI used in the real world?

Himanshu: Generative AI models have a wide variety of applications across numerous domains. For image generation, generative models like GANs are used to generate realistic images. They can be used for tasks like creating artwork, synthesizing images of human faces, or transforming sketches into photorealistic images. For text generation, large language models like GPT-3, which are generative in nature, can create human-like text.
This has applications in content creation, like writing articles and generating ideas, and in conversational AI, like chatbots and customer service agents. They are also used in programming for code generation and debugging, and much more. Generative AI models can also be used for music generation. They create new pieces of music after being trained on a specific style or collection of tunes. A famous example is OpenAI's MuseNet.

03:42 Lois: You mentioned large language models in the context of text-based generative AI. So, let’s talk a little more about it. Himanshu, what exactly are large language models?

Himanshu: LLMs are a type of artificial intelligence model built to understand, generate, and process human language at a massive scale. They were primarily designed for sequence-to-sequence tasks such as machine translation, where an input sequence is transformed into an output sequence. LLMs can be used to translate text from one language to another. For example, an LLM could be used to translate English text into French. To do this job, the LLM is trained on a massive data set of text and code, which allows it to learn the patterns and relationships that exist between different languages. The LLM translates “How are you?” from English to French as “Comment allez-vous?” It can also answer questions like, what is the capital of France? And it would answer that the capital of France is Paris. And it can write an essay on a given topic. For example, ask it to write an essay on the French Revolution, and it will come up with a response that has a title and an introduction.

04:53 Lois: And how do LLMs actually work?

Himanshu: So, LLMs are typically based on deep learning architectures such as transformers. They are also trained on vast amounts of text data to learn language patterns and relationships, again, with a massive number of parameters, usually on the order of millions or even billions.
LLMs also have the ability to comprehend and understand natural language text at a semantic level. They can grasp context, infer meaning, and identify relationships between words and phrases.

05:26 Nikita: What are the most important factors for a large language model?

Himanshu: Model size and parameters are crucial aspects of large language models and other deep learning models. They significantly impact the model’s capabilities, performance, and resource requirements. So, what is model size? The model size refers to the amount of memory required to store the model's parameters and other data structures. Larger model sizes generally lead to better performance, as they can capture more complex patterns and representations from the data. The parameters are the numerical values of the model that change as it learns to minimize the model's error on the given task. In the context of LLMs, parameters refer to the weights and biases of the model's transformer layers. Parameters are usually measured in terms of millions or billions. For example, GPT-3, one of the largest LLMs to date, has 175 billion parameters, making it extremely powerful in language understanding and generation. Tokens represent the individual units into which a piece of text is divided during processing by the model. In natural language, tokens are usually words, subwords, or characters. Some models have a maximum token limit that they can process, and longer text may require truncation or splitting. Again, balancing model size, parameters, and token handling is crucial when working with LLMs.

06:49 Nikita: But what’s so great about LLMs?

Himanshu: Large language models can understand and interpret human language more accurately and contextually. They can comprehend complex sentence structures, nuances, and word meanings, enabling them to provide more accurate and relevant responses to user queries. These models can generate human-like text that is coherent and contextually appropriate.
This capability is valuable for content creation, automated writing, and generating personalized responses in applications like chatbots and virtual assistants. They can perform a variety of tasks. Large language models are very versatile and adaptable to various industries. They can be customized to excel in applications such as language translation, sentiment analysis, code generation, and much more. LLMs can handle multiple languages, making them valuable for cross-lingual tasks like translation, sentiment analysis, and understanding diverse global content. Large language models can, again, be fine-tuned for a specific task using a minimal amount of domain data. The efficiency of LLMs usually grows with more data and parameters.

07:55 Lois: You mentioned the “sequence to sequence tasks” earlier. Can you explain the concept in simple terms for us?

Himanshu: Understanding language is difficult for computers and AI systems. The reason is that words often have meanings based on context. Consider a sentence such as “Jane threw the frisbee, and her dog fetched it.” In this sentence, there are a few things that relate to each other. Jane is doing the throwing. The dog is doing the fetching. And “it” refers to the frisbee. Suppose we are looking at the word “it” in the sentence. As humans, we easily understand that “it” refers to the frisbee. But for a machine, it can be tricky. The goal in sequence problems is to find patterns, dependencies, or relationships within the data and make predictions, perform classification, or generate new sequences based on that understanding.

08:48 Lois: And where are sequence models mostly used?

Himanshu: Some common examples of sequence models include natural language processing, which we call NLP. Tasks such as machine translation, text generation, sentiment analysis, and language modeling involve dealing with sequences of words or characters. Speech recognition.
Converting audio signals into text involves working with sequences of phonemes or subword units to recognize spoken words. Music generation. Generating new music involves modeling musical sequences, notes, and rhythms to create original compositions. Gesture recognition. Sequences of motion or hand gestures are used to interpret human movements for applications such as sign language recognition or gesture-based interfaces. Time series analysis. In fields such as finance, economics, weather forecasting, and signal processing, time series data is used to predict future values, detect anomalies, and understand patterns in temporal data.

09:56 The Oracle University Learning Community is an excellent place to collaborate and learn with Oracle experts and fellow learners. Grow your skills, inspire innovation, and celebrate your successes. All your activities, from liking a post to answering questions and sharing with others, will help you earn a valuable reputation, badges, and ranks to be recognized in the community. Visit mylearn.oracle.com to get started.

10:23 Nikita: Welcome back! Himanshu, what would be the best way to solve those sequence problems you mentioned? Let’s use the same sentence, “Jane threw the frisbee, and her dog fetched it” as an example.

Himanshu: The solution is transformers. It's like the model has a bird's eye view of the entire sentence and can see how all the words relate to each other. This allows it to understand the sentence as a whole instead of just a series of individual words. Transformers, with their self-attention mechanism, can look at all the words in the sentence at the same time and understand how they relate to each other. For example, a transformer can simultaneously understand the connections between Jane and dog even though they are far apart in the sentence.

11:13 Nikita: But how?

Himanshu: The answer is attention, which adds context to the text.
Attention would notice that “dog” comes after “frisbee,” “fetched” comes after “dog,” and “it” comes after “fetched.” A transformer does not look at “it” in isolation. Instead, it also pays attention to all the other words in the sentence at the same time. By considering all these connections, the model can figure out that “it” likely refers to the frisbee. The most famous current models that are emerging in natural language processing tasks consist of dozens of transformers or some of their variants, for example, GPT or BERT.

11:53 Lois: I was looking at the AI Foundations course on MyLearn and came across the terms “prompt engineering” and “fine tuning.” Can you shed some light on them?

Himanshu: A prompt is the input or initial text provided to the model to elicit a specific response or behavior. So, this is something that you write or ask a language model. Now, what is prompt engineering? Prompt engineering is the process of designing and formulating specific instructions or queries to interact with a large language model effectively. In the context of large language models, such as GPT-3 or BERT, prompts are the input text or questions given to the model to generate responses or perform specific tasks. The goal of prompt engineering is to ensure that the language model understands the user's intent correctly and provides accurate and relevant responses.

12:47 Nikita: That sounds easy enough, but fine tuning seems a bit more complex. Can you explain it with an example?

Himanshu: Imagine you have a versatile recipe robot named chef bot. Suppose that chef bot is designed to create delicious recipes for any dish you desire. If you ask it for a pizza, chef bot recognizes the prompt as a request for a pizza recipe, and it knows exactly what to do.
However, if you want chef bot to be an expert in a particular type of cuisine, such as Italian dishes, you fine-tune chef bot for Italian cuisine by immersing it in a culinary crash course filled with Italian cookbooks, traditional Italian recipes, and even Italian cooking shows. During this process, chef bot becomes more specialized in creating authentic Italian recipes, and this process is called fine tuning. LLMs are general-purpose models that are pre-trained on large data sets but are often fine-tuned to address specific use cases. When you combine prompt engineering and fine tuning, you get a culinary wizard in chef bot, a recipe robot that is not only great at understanding specific dish requests but also capable of following them and even mastering the art of cooking in a particular culinary style.

14:08 Lois: Great! Now that we’ve spoken about all the major components, can you walk us through the life cycle of a large language model?

Himanshu: The life cycle of a Large Language Model, LLM, involves several stages, from its initial pre-training to its deployment and ongoing refinement. The first phase of this life cycle is pre-training. The LLM is initially pre-trained on a large corpus of text data from the internet. During pre-training, the model learns grammar, facts, reasoning abilities, and general language understanding. The model predicts the next word in a sentence given the previous words, which helps it capture relationships between words and the structure of language. The second phase is fine tuning initialization. After pre-training, the model's weights are initialized, and it's ready for task-specific fine tuning. Fine tuning can involve supervised learning on labeled data for specific tasks, such as sentiment analysis, translation, or text generation. The model is fine-tuned on specific tasks using a smaller domain-specific data set.
The weights from pre-training are updated based on the new data, making the model task aware and specialized. The next phase of the LLM life cycle is prompt engineering. In this phase, you craft effective prompts to guide the model's behavior in generating specific responses. Different prompt formulations, instructions, or context can be used to shape the output.

15:34 Nikita: Ok… we’re with you so far. What’s next?

Himanshu: The next phase is evaluation and iteration. Models are evaluated using various metrics to assess their performance on specific tasks. Iterative refinement involves adjusting model parameters, prompts, and fine tuning strategies to improve results. As a part of this step, you also do few-shot and one-shot inference. If needed, you further fine tune the model with a small number of examples (few-shot) or a single example (one-shot) for new tasks or scenarios. Also, you do bias mitigation and consider the ethical concerns. These biases and ethical concerns may arise in the model's output. You need to implement measures to ensure fairness, inclusivity, and responsible use.

16:28 Himanshu: The next phase in the LLM life cycle is deployment. Once the model has been fine-tuned and evaluated, it is deployed for real-world applications. Deployed models can perform tasks such as text generation, translation, summarization, and much more. You also perform monitoring and maintenance in this phase. So you continuously monitor the model's performance and output to ensure it aligns with desired outcomes. You also periodically update and retrain the model to incorporate new data and to adapt to evolving language patterns. This overall life cycle can also include a feedback loop, where you gather feedback from users and incorporate it into the model’s improvement process. You use this feedback to further refine prompts, fine tuning, and overall model behavior.
RLHF, which is Reinforcement Learning from Human Feedback, is a very good example of this feedback loop. You also research and innovate as a part of this life cycle, where you continue to research and develop new techniques to enhance the model's capabilities and address different challenges associated with it.

17:40 Nikita: As we’re talking about the LLM life cycle, I see that fine tuning is not only about making an LLM task specific. So, what are some other reasons you would fine tune an LLM?

Himanshu: The first one is task-specific adaptation. Pre-trained language models are trained on extensive and diverse data sets and have good general language understanding. They excel in language generation and comprehension tasks, though the broad understanding of language may not lead to optimal performance on a specific task. These models are not task specific. So the solution is fine tuning. The fine tuning process customizes the pre-trained models for a specific task by further training on task-specific data to adapt the model's knowledge. The second reason is domain-specific vocabulary. Pre-trained models might lack knowledge of specific words and phrases essential for certain tasks in fields such as legal, medical, finance, and technical domains. This can limit their performance when applied to domain-specific data. Fine tuning enables the model to adapt and learn domain-specific words and phrases. These words could be, again, from different domains.

18:56 Himanshu: The third reason to fine tune is efficiency and resource utilization. Fine tuning is computationally efficient compared to training from scratch. Fine tuning reuses the knowledge from pre-trained models, saving time and resources. Fine tuning requires fewer iterations to achieve task-specific competence. Shorter training cycles expedite the model development process. It conserves computational resources, such as GPU memory and processing power. Fine tuning also enables quicker model deployment.
It has a faster time to production for real-world applications. Fine tuning is, again, scalable, enabling adaptation to various tasks with the same base model, which further reduces resource demands and leads to cost savings for research and development. The fourth reason to fine tune is ethical concerns. Pre-trained models learn from diverse data and can potentially inherit different biases. Fine tuning might not completely eliminate biases, but careful curation of task-specific data helps avoid biased or harmful vocabulary. The responsible use of domain-specific terms promotes ethical AI applications.

20:14 Lois: Thank you so much, Himanshu, for spending time with us. We had such a great time learning from you. If you want to learn more about the topics discussed today, head over to mylearn.oracle.com and get started on our free AI Foundations course.

Nikita: Yeah, we even have a detailed walkthrough of the architecture of transformers that you might want to check out. Join us next week for a discussion on the OCI AI Portfolio. Until then, this is Nikita Abraham…

Lois: And Lois Houston signing off!

20:44 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
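
Editor's note: The discussion at 05:26 describes model size as the memory needed to store a model's parameters, and mentions that text longer than a model's token limit may need truncation or splitting. The sketch below is a rough illustration of both ideas, not something taken from the episode. It assumes each parameter is stored in 2 bytes (16-bit precision), and it treats whitespace-separated words as tokens, whereas real LLMs typically use subword tokenizers. The function names are hypothetical and chosen only for this example.

```python
def estimate_model_memory_gb(num_parameters: int, bytes_per_parameter: int = 2) -> float:
    """Approximate memory for the parameters alone, in gigabytes (10**9 bytes).

    Assumption: every parameter takes bytes_per_parameter bytes
    (2 for 16-bit storage, 4 for 32-bit). Other data structures are ignored.
    """
    return num_parameters * bytes_per_parameter / 1e9


def simple_tokenize(text: str) -> list[str]:
    """Split text on whitespace. A simplification of real subword tokenization."""
    return text.split()


def truncate_tokens(tokens: list[str], max_tokens: int) -> list[str]:
    """Keep only the first max_tokens tokens and drop the rest."""
    return tokens[:max_tokens]


def split_into_chunks(tokens: list[str], max_tokens: int) -> list[list[str]]:
    """Split tokens into consecutive chunks of at most max_tokens tokens each."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


if __name__ == "__main__":
    # 175 billion parameters is the GPT-3 figure quoted in the episode.
    print(estimate_model_memory_gb(175_000_000_000))     # 16-bit storage
    print(estimate_model_memory_gb(175_000_000_000, 4))  # 32-bit storage

    tokens = simple_tokenize("Jane threw the frisbee, and her dog fetched it")
    print(truncate_tokens(tokens, 4))    # truncation: later words are lost
    print(split_into_chunks(tokens, 4))  # splitting: all words are kept
```

Under these assumptions, storing 175 billion parameters takes roughly 350 GB at 16-bit precision, which is one way to see why model size drives resource requirements. Truncation discards everything past the limit, while splitting keeps all of the text at the cost of processing several pieces.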