Sebastian Raschka's Blog
February 4, 2025
Understanding Reasoning LLMs
In this article, I will describe the four main approaches to building reasoning models, or how we can enhance LLMs with reasoning capabilities. I hope this provides valuable insights and helps you navigate the rapidly evolving literature and hype surrounding this topic.
January 22, 2025
Noteworthy LLM Research Papers of 2024
This article covers 12 influential AI research papers of 2024, ranging from mixture-of-experts models to new LLM scaling laws for precision.
January 16, 2025
Implementing A Byte Pair Encoding (BPE) Tokenizer From Scratch
This is a standalone notebook that implements the popular byte pair encoding (BPE) tokenization algorithm from scratch for educational purposes. BPE is used in models such as GPT-2 through GPT-4 and Llama 3.
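For readers new to BPE, here is a rough, minimal sketch of the core training loop (my own illustration, not the notebook's code): training repeatedly counts adjacent symbol pairs and merges the most frequent one.

```python
from collections import Counter

def bpe_train(words, num_merges):
    """Toy BPE trainer: repeatedly merge the most frequent adjacent symbol pair.
    `words` is a list of symbol tuples, e.g. [("l", "o", "w"), ("l", "o", "w", "e", "r")]."""
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for w in words:
            for pair in zip(w, w[1:]):
                pair_counts[pair] += 1
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)   # most frequent adjacent pair
        merges.append(best)
        new_words = []
        for w in words:
            out, i = [], 0
            while i < len(w):
                if i < len(w) - 1 and (w[i], w[i + 1]) == best:
                    out.append(w[i] + w[i + 1])        # merge the pair into one symbol
                    i += 2
                else:
                    out.append(w[i])
                    i += 1
            new_words.append(tuple(out))
        words = new_words
    return merges

print(bpe_train([("l", "o", "w"), ("l", "o", "w", "e", "r")], 3))
# [('l', 'o'), ('lo', 'w'), ('low', 'e')]
```

A production tokenizer additionally handles byte-level pre-tokenization, special tokens, and efficient merge lookup, which is what the notebook walks through in detail.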
December 28, 2024
LLM Research Papers: The 2024 List
I want to share my running bookmark list of many fascinating (mostly LLM-related) papers I stumbled upon in 2024. It's just a list, but maybe it will come in handy for those who are interested in finding some gems to read for the holidays.
November 2, 2024
Understanding Multimodal LLMs
There has been a lot of new research on the multimodal LLM front, including the latest Llama 3.2 vision models, which employ diverse architectural strategies to integrate various data types like text and images. For instance, the decoder-only method uses a single stack of decoder blocks to process all modalities sequentially. Cross-attention methods (used in Llama 3.2, for example) instead involve separate encoders for different modalities, with a cross-attention layer that allows these encoders to interact. This article explains how these different types of multimodal LLMs function. Additionally, I will review and summarize roughly a dozen other recent multimodal papers and models published in recent weeks to compare their approaches.
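To make the cross-attention idea more concrete, here is a minimal, hypothetical PyTorch sketch in which text tokens (queries) attend to image-encoder outputs (keys and values). The dimensions are made up for illustration; this is not Llama 3.2's actual implementation.

```python
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    """Toy cross-attention block: text hidden states attend to image features."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text_hidden, image_features):
        # text_hidden: (batch, text_len, d_model); image_features: (batch, img_len, d_model)
        attended, _ = self.attn(query=text_hidden, key=image_features, value=image_features)
        return self.norm(text_hidden + attended)   # residual connection + norm

x_text = torch.randn(2, 16, 512)    # dummy text hidden states
x_img = torch.randn(2, 64, 512)     # dummy image-encoder outputs
out = CrossAttentionBlock()(x_text, x_img)   # shape: (2, 16, 512)
```

In the decoder-only alternative, the image features would instead be projected into token embeddings and simply concatenated with the text tokens before the ordinary decoder blocks.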
September 20, 2024
Building A GPT-Style LLM Classifier From Scratch
This article shows you how to transform pretrained large language models (LLMs) into strong text classifiers. But why focus on classification? First, finetuning a pretrained model for classification offers a gentle yet effective introduction to model finetuning. Second, many real-world and business challenges revolve around text classification: spam detection, sentiment analysis, customer feedback categorization, topic labeling, and more.
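As a rough illustration of the idea (using a hypothetical stand-in model rather than the article's actual code), turning a GPT-style LLM into a classifier mostly amounts to swapping the vocabulary-projection head for a small classification head and using the last token's output as the sequence representation:

```python
import torch
import torch.nn as nn

class TinyGPTStandIn(nn.Module):
    """Hypothetical stand-in for a pretrained decoder-only LLM, for illustration only."""
    def __init__(self, vocab_size=100, d_model=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model)
        self.block = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.out_head = nn.Linear(d_model, vocab_size)   # original LM head

    def forward(self, ids):
        return self.out_head(self.block(self.emb(ids)))

model = TinyGPTStandIn()
model.out_head = nn.Linear(64, 2)        # swap LM head for a 2-class head (e.g. spam / not spam)

ids = torch.randint(0, 100, (4, 12))     # dummy batch of token ids
logits = model(ids)[:, -1, :]            # classify using the last token's output, shape (4, 2)
```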
August 31, 2024
Building LLMs from the Ground Up: A 3-hour Coding Workshop
This tutorial is aimed at coders interested in understanding the building blocks of large language models (LLMs), how LLMs work, and how to code them from the ground up in PyTorch. We will kick off this tutorial with an introduction to LLMs, recent milestones, and their use cases. Then, we will code a small GPT-like LLM ourselves, including its data input pipeline, core architecture components, and pretraining code. After understanding how everything fits together and how to pretrain an LLM, we will learn how to load pretrained weights and finetune LLMs using open-source libraries.
August 16, 2024
New LLM Pre-training and Post-training Paradigms
There are hundreds of LLM papers each month proposing new techniques and approaches. However, one of the best ways to see what actually works well in practice is to look at the pre-training and post-training pipelines of the most recent state-of-the-art models. Luckily, four major new LLMs have been released in recent months, accompanied by relatively detailed technical reports. In this article, I focus on the pre-training and post-training pipelines of the following models: Alibaba's Qwen 2, Apple Intelligence Foundation Language Models, Google's Gemma 2, and Meta AI's Llama 3.1.
July 19, 2024
Instruction Pretraining LLMs
This article covers a new, cost-effective method for generating data for instruction finetuning LLMs; instruction finetuning from scratch; pretraining LLMs with instruction data; and an overview of what's new in Gemma 2.
June 1, 2024
Developing an LLM: Building, Training, Finetuning
This one-hour talk provides an overview of the LLM development process and focuses on its three essential stages: coding the architecture, implementing pretraining, and finetuning the LLM. Lastly, we also discuss the main ways LLMs are evaluated, along with the caveats of each method.