Practice 1
Foundations & Basic PyTorch Operations
- Basic Tensor Creation:
Question: Write a PyTorch code snippet to create a 3×3 tensor of random numbers and multiply it by 2.
Why: To ensure you understand tensor creation, basic arithmetic, and using built‑in functions.
- Tensor from Python Data Structures:
Question: How do you create a tensor from a list of lists and compute its element‑wise square?
Why: It practices converting Python lists to tensors and applying element‑wise operations.
- Tensor Indexing & Slicing:
Question: Write code to slice a given 4×4 tensor to extract the middle 2×2 sub-tensor.
Why: It reinforces your ability to index and manipulate tensor subsets.
- Broadcasting in PyTorch:
Question: Demonstrate broadcasting by adding a 1×3 tensor to a 4×3 tensor.
Why: To understand how PyTorch automatically expands dimensions during operations.
- Matrix Multiplication:
Question: Write a snippet that computes the matrix product of two tensors using torch.matmul.
Why: To become comfortable with linear algebra operations that are foundational in deep learning.
- Understanding Data Types:
Question: How do you change a tensor’s data type (e.g., from float32 to int64) and why might this be important?
Why: It’s critical to know how data types affect model computations and performance.
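The questions in this section can be sketched together in one short script. This is a minimal illustration rather than a model answer; the variable names and the concrete values are assumptions chosen so the results are easy to check by hand.

```python
import torch

torch.manual_seed(0)  # fixed seed so the random values are reproducible

# Basic creation: a 3x3 tensor of uniform random numbers, multiplied by 2.
rand = torch.rand(3, 3)
doubled = rand * 2

# From nested Python lists, followed by an element-wise square.
from_lists = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
squared = from_lists ** 2

# Slicing: rows 1..2 and columns 1..2 of a 4x4 tensor form the middle 2x2 block.
grid = torch.arange(16).reshape(4, 4)
middle = grid[1:3, 1:3]

# Broadcasting: the (1, 3) row is virtually repeated along dim 0 to match (4, 3).
row = torch.tensor([[10.0, 20.0, 30.0]])
table = torch.zeros(4, 3)
broadcast_sum = table + row

# Matrix product: (2, 3) @ (3, 4) gives a (2, 4) tensor.
product = torch.matmul(torch.ones(2, 3), torch.ones(3, 4))

# Dtype conversion: float32 -> int64 truncates toward zero.
as_int = torch.tensor([1.7, -1.7]).to(torch.int64)

print(middle)         # the middle block holds 5, 6, 9, 10
print(as_int.dtype)   # torch.int64
```

The dtype matters in practice because some APIs require a specific one. For example, nn.CrossEntropyLoss expects class-index targets as int64, while model weights and inputs are usually floating point.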
Autograd & Custom Gradients
- Gradient Computation Basics:
Question: Write a code snippet to compute the gradient of a simple scalar function (e.g., f(x)=x²) using torch.autograd.
Why: To understand how PyTorch tracks operations and computes gradients automatically.
- Disabling Gradients:
Question: How would you disable gradient calculations during inference using PyTorch? Write an example.
Why: This practice is essential for saving memory and speeding up inference.
- Custom Autograd Function:
Question: Create a custom PyTorch autograd Function for a simple operation (for example, a custom square function) that implements both forward and backward passes.
Why: To deepen your understanding of the autograd system and custom gradient computation.
- In-place vs. Out-of-place Operations:
Question: Explain the difference between in-place and out-of-place tensor operations and demonstrate with a code example.
Why: To learn how in-place operations can affect gradient tracking and memory usage.
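One possible sketch of the four autograd questions above follows. The class name Square and the sample values are hypothetical, picked so that the expected gradients (2x) can be verified by hand.

```python
import torch

# Gradient basics: f(x) = x^2, so df/dx = 2x, which is 6 at x = 3.
x = torch.tensor(3.0, requires_grad=True)
f = x ** 2
f.backward()
grad_at_3 = x.grad.item()

# Disabling gradients: results computed under no_grad are not tracked.
with torch.no_grad():
    untracked = x * 2


class Square(torch.autograd.Function):
    """Hypothetical custom square op with a hand-written backward pass."""

    @staticmethod
    def forward(ctx, inp):
        ctx.save_for_backward(inp)  # keep the input for the backward pass
        return inp * inp

    @staticmethod
    def backward(ctx, grad_output):
        (inp,) = ctx.saved_tensors
        return grad_output * 2 * inp  # chain rule: d(inp^2)/d(inp) = 2 * inp


y = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
Square.apply(y).sum().backward()

# In-place vs. out-of-place on a tensor that does not require grad.
a = torch.ones(3)
ptr_before = a.data_ptr()
b = a.add(1)   # out-of-place: returns a new tensor, a is unchanged
a.add_(1)      # in-place: modifies a's existing storage

print(grad_at_3)                     # 6.0
print(y.grad)                        # gradient of sum(y^2) is 2 * y
print(a.data_ptr() == ptr_before)    # True: same storage after add_
```

The in-place example deliberately uses a tensor without requires_grad. Calling an in-place operation directly on a leaf tensor that requires grad raises a RuntimeError, which is one of the ways in-place operations interact with gradient tracking.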
Building Neural Network Modules
- Simple Linear Model:
Question: Implement a linear regression model using nn.Module in PyTorch.
Why: To practice building a custom model and understand module structure.
- Feedforward Neural Network:
Question: Code a two-layer MLP for a simple classification task and explain each component.
Why: It builds your skills in structuring multi-layer networks.
- Activation Functions:
Question: Write a custom PyTorch module that applies ReLU or GELU activation and explain why non-linearities are essential.
Why: To appreciate the role of activation functions in deep networks.
- Building a Convolutional Layer:
Question: Create a custom convolutional layer using PyTorch’s functional API.
Why: To understand low-level implementation details of convolution, a building block for many models.
- Dropout Implementation:
Question: Write code to add dropout in a neural network module and explain its role during training vs. inference.
Why: Dropout is crucial for regularization, and you should know how to integrate it properly.
- Batch Normalization:
Question: Implement a simple network that uses BatchNorm, and explain how it improves training stability.
Why: Batch normalization is a widely used technique to accelerate convergence.
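As a rough sketch of how several of these pieces fit into one module, the example below combines a two-layer MLP with BatchNorm, ReLU, and dropout. The class name TinyMLP and all layer sizes are assumptions for illustration only.

```python
import torch
from torch import nn


class TinyMLP(nn.Module):
    """Hypothetical two-layer classifier: Linear -> BatchNorm -> ReLU -> Dropout -> Linear."""

    def __init__(self, in_features=4, hidden=8, num_classes=3, p_drop=0.5):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden)
        self.bn = nn.BatchNorm1d(hidden)   # normalizes each hidden feature over the batch
        self.act = nn.ReLU()               # non-linearity between the two linear layers
        self.drop = nn.Dropout(p_drop)     # active only in training mode
        self.fc2 = nn.Linear(hidden, num_classes)

    def forward(self, x):
        x = self.act(self.bn(self.fc1(x)))
        x = self.drop(x)
        return self.fc2(x)                 # raw logits, one per class


torch.manual_seed(0)
model = TinyMLP()
batch = torch.randn(16, 4)

model.train()                 # dropout samples a mask, BatchNorm uses batch statistics
train_logits = model(batch)

model.eval()                  # dropout is the identity, BatchNorm uses running statistics
with torch.no_grad():
    eval_a = model(batch)
    eval_b = model(batch)

print(train_logits.shape)             # torch.Size([16, 3])
print(torch.equal(eval_a, eval_b))    # True: eval-mode forward passes are deterministic
```

The train/eval switch is the part that is easiest to forget: both nn.Dropout and nn.BatchNorm1d change behavior depending on model.training, so inference without model.eval() gives noisy, batch-dependent outputs.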
Training Loop & Model Optimization
- Custom Training Loop:
Question: Write a complete training loop (forward pass, loss computation, backpropagation, and optimizer step) for a simple neural network.
Why: To master the end-to-end process of training models in PyTorch.
- Using GPU Acceleration:
Question: How do you modify your training loop to leverage GPU acceleration in PyTorch? Provide a code example.
Why: To ensure you can transfer models and data to GPU for faster computation.
- Learning Rate Scheduling:
Question: Write code to implement a learning rate scheduler using torch.optim.lr_scheduler and explain its benefit.
Why: Adjusting the learning rate over the course of training is important for effective training.
- Gradient Clipping:
Question: How do you apply gradient clipping in a training loop? Write a code snippet.
Why: It prevents exploding gradients, which is especially useful in training deep or recurrent networks.
- Saving and Loading Models:
Question: Write functions to save a model’s state and then load it back.
Why: To understand model serialization and checkpointing for production deployment.
- Custom Loss Function:
Question: Create and implement a custom loss function in PyTorch (e.g., a weighted mean squared error).
Why: Building custom loss functions deepens your understanding of how optimization objectives affect training.
- Early Stopping Mechanism:
Question: Write code to implement early stopping in a training loop.
Why: Early stopping is a practical strategy to prevent overfitting and save resources.
- Model Evaluation:
Question: Code a validation loop that computes accuracy and loss on a validation set, and discuss its integration in training.
Why: To ensure you can evaluate your models effectively during training.
- Data Augmentation & Custom DataLoader:
Question: Write a custom PyTorch Dataset class and DataLoader that applies data augmentation on-the-fly.
Why: Data loading and augmentation are key to building robust models.
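The loop below is one minimal way to combine several items from this section: device placement, the forward/backward/step cycle, gradient clipping, a StepLR scheduler, a validation pass, and a checkpoint round trip. The synthetic data (y = 3x + 1 plus noise), the hyperparameters, and the in-memory buffer used in place of a file path are all assumptions made so the script is self-contained.

```python
import io

import torch
from torch import nn

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Synthetic regression data (illustrative assumption): y = 3x + 1 + small noise.
X = torch.randn(320, 1)
y = 3 * X + 1 + 0.05 * torch.randn(320, 1)
X_train, y_train = X[:256].to(device), y[:256].to(device)
X_val, y_val = X[256:].to(device), y[256:].to(device)

model = nn.Linear(1, 1).to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Halve the learning rate every 50 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)

train_losses, val_losses = [], []
for epoch in range(100):
    model.train()
    optimizer.zero_grad()                       # clear gradients from the previous step
    loss = criterion(model(X_train), y_train)   # forward pass and loss
    loss.backward()                             # backpropagation
    # Rescale gradients so their global L2 norm is at most 1.0.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()                            # parameter update
    scheduler.step()                            # advance the learning rate schedule
    train_losses.append(loss.item())

    model.eval()
    with torch.no_grad():                       # no gradient tracking during validation
        val_losses.append(criterion(model(X_val), y_val).item())

# Checkpoint round trip through an in-memory buffer (a file path works the same way).
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)
restored = nn.Linear(1, 1).to(device)
restored.load_state_dict(torch.load(buffer, map_location=device))

print(f"first/last train loss: {train_losses[0]:.4f} / {train_losses[-1]:.4f}")
```

Saving the state_dict rather than the whole module is the commonly recommended pattern, since it decouples the checkpoint from the exact class definition. Early stopping could be layered on top by tracking the best value in val_losses and breaking out of the loop after a chosen number of epochs without improvement.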
Deep Dive into Transformers & LLMs
- Implementing Positional Encoding:
Question: Write a PyTorch module to compute sinusoidal positional encodings for sequence data.
Why: Positional encoding is vital for Transformers to capture sequence order.
- Scaled Dot-Product Attention:
Question: Code the scaled dot-product attention mechanism and explain its components (queries, keys, values, scaling).
Why: Attention mechanisms are at the heart of Transformers.
- Multi-head Attention Module:
Question: Implement a multi-head attention module in PyTorch, including splitting and concatenating heads.
Why: Multi-head attention enhances the model’s ability to capture diverse features.
- Transformer Encoder Block:
Question: Build a single Transformer encoder block that includes multi-head attention, layer normalization, and a feedforward network.
Why: It’s a core building block for LLMs and advanced NLP models.
- Masked Self-Attention:
Question: Code a masked self-attention layer for auto-regressive generation, explaining how and why the mask is applied.
Why: To understand how Transformers handle sequential data during generation.
- Feedforward Network in Transformer:
Question: Implement the position-wise feedforward network used in Transformer layers.
Why: It complements the attention mechanism and is essential for learning non-linear transformations.
- Layer Normalization from Scratch:
Question: Write your own PyTorch implementation of layer normalization and compare it to nn.LayerNorm.
Why: Understanding normalization techniques helps optimize model training.
- Custom Transformer Decoder Layer:
Question: Build a Transformer decoder layer, incorporating masked attention and encoder-decoder attention mechanisms.
Why: Decoders are critical for sequence-to-sequence tasks like translation.
- Implementing a Simple Transformer Model:
Question: Assemble a full Transformer model for a language modeling task, including embeddings, positional encoding, encoder/decoder stacks, and a linear output layer.
Why: To integrate all components into a functioning model from scratch.
- Handling Variable-length Sequences:
Question: Write code to implement padding and create attention masks for variable-length sequences in a Transformer model.
Why: Proper masking is key to processing batches with varying sequence lengths.
- Custom Activation Functions:
Question: Implement the GELU activation function in PyTorch without using built-in functions.
Why: To deepen your understanding of non-linear activations and their numerical implementations.
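A few of the building blocks from this section can be sketched as plain functions. The function names are hypothetical, the positional encoding assumes an even d_model, the mask convention (True means "may attend") is a choice made here, and the GELU sketch reads "without built-in functions" as "without nn.GELU or F.gelu", which is an assumption.

```python
import math

import torch


def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) table of sin/cos encodings; d_model assumed even."""
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even feature indices
    pe[:, 1::2] = torch.cos(position * div_term)   # odd feature indices
    return pe


def causal_mask(seq_len):
    """Lower-triangular boolean mask: position i may attend to positions <= i."""
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))


def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (..., seq_len, d_k). mask: boolean, True where attention is allowed."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # scale to keep softmax well-behaved
    if mask is not None:
        scores = scores.masked_fill(~mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)             # each row sums to 1
    return weights @ v, weights


def gelu_exact(x):
    """GELU via the Gaussian CDF: x * Phi(x), written with torch.erf."""
    return 0.5 * x * (1.0 + torch.erf(x / math.sqrt(2.0)))


torch.manual_seed(0)
q = k = v = torch.randn(2, 5, 8)                  # (batch, seq_len, d_k)
out, weights = scaled_dot_product_attention(q, k, v, causal_mask(5))
pe = sinusoidal_positional_encoding(10, 8)

print(out.shape, weights.shape)
print(weights[0])   # upper triangle is zero: no attention to future positions
```

With the causal mask, the first position can only attend to itself, so its output equals its own value vector. That makes a convenient sanity check when writing the masked self-attention layer.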
Advanced Topics & Production-level Coding
- Gradient Accumulation:
Question: Write a training loop that uses gradient accumulation to simulate larger batch sizes on limited GPU memory.
Why: It helps when hardware constraints limit batch size.
- Mixed-Precision Training:
Question: Modify your training loop to use PyTorch’s AMP for mixed-precision training.
Why: Mixed-precision can significantly speed up training while conserving memory.
- Custom Collate Function:
Question: Write a custom collate function for a DataLoader that dynamically pads sequences for a batch.
Why: To efficiently handle variable-length data during batching.
- Integrating TensorBoard:
Question: Write a snippet that logs loss and accuracy during training using TensorBoard.
Why: Monitoring training metrics is critical for debugging and optimizing performance.
- Implementing Beam Search:
Question: Code a basic beam search algorithm for sequence generation using a Transformer model.
Why: Beam search is a common technique to improve generation quality in LLMs.
- Caching in Auto-Regressive Generation:
Question: Implement a caching mechanism to store and reuse key/value pairs during Transformer inference.
Why: This optimization is crucial for efficient text generation.
- Visualization of Attention Weights:
Question: Write a function that extracts and visualizes attention weights from a Transformer layer.
Why: Visualization helps you understand and debug model behavior.
- Model Parallelism Basics:
Question: Explain and demonstrate how to split a model’s layers across multiple GPUs using PyTorch’s distributed features.
Why: To handle large models that exceed the memory of a single GPU.
- Implementing a Custom Optimizer/Modifier:
Question: Write code that customizes an existing optimizer (e.g., by modifying learning rates per layer) or implements a simple custom optimizer.
Why: To explore and understand optimization strategies beyond standard implementations.
- Integrating PyTorch Lightning:
Question: Refactor one of your training scripts using PyTorch Lightning and discuss the benefits.
Why: Lightning helps to simplify and structure training code for scalability and reproducibility.
- Custom Callbacks in Lightning:
Question: Create a custom PyTorch Lightning callback for early stopping or dynamic learning rate adjustments.
Why: Custom callbacks further automate training management and improve model performance.
- Distributed Data Parallel (DDP):
Question: Write a sample script that trains a model using PyTorch’s Distributed Data Parallel (DDP) on multiple GPUs.
Why: Understanding DDP is crucial for scaling training on multi-GPU systems.
- Profiling GPU Memory Usage:
Question: Develop a small utility that profiles and logs GPU memory usage during model training/inference.
Why: Profiling helps you optimize memory consumption and debug performance issues.
- End-to-End LLM Implementation:
Question: Implement a Transformer-based language model from scratch—including training, evaluation, and inference scripts—that can be fine-tuned on a sample dataset.
Why: This capstone challenge integrates all components (tensor operations, autograd, custom modules, advanced optimization, and distributed training) to simulate a production-level LLM development process.
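Two of the smaller items in this section, gradient accumulation and a padding collate function, can be sketched without any GPU. The helper names collate_pad and accumulate_gradients are hypothetical, padding with 0 is an assumed convention, and the accumulation helper assumes the batch divides evenly into micro-batches.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pad_sequence


def collate_pad(batch):
    """batch: list of 1-D LongTensors of differing lengths (assumed pad id: 0)."""
    lengths = torch.tensor([len(seq) for seq in batch])
    padded = pad_sequence(batch, batch_first=True, padding_value=0)
    # True marks real tokens, False marks padding.
    mask = torch.arange(padded.size(1)).unsqueeze(0) < lengths.unsqueeze(1)
    return padded, mask


def accumulate_gradients(model, criterion, X, y, micro_batch_size):
    """Accumulate scaled micro-batch gradients in .grad; the caller then steps the optimizer."""
    model.zero_grad()
    num_micro = X.size(0) // micro_batch_size   # assumes even divisibility
    for i in range(num_micro):
        sl = slice(i * micro_batch_size, (i + 1) * micro_batch_size)
        # Dividing by num_micro makes the summed gradients match the full-batch mean loss.
        loss = criterion(model(X[sl]), y[sl]) / num_micro
        loss.backward()                          # gradients add up across calls


torch.manual_seed(0)
model = nn.Linear(4, 1)
criterion = nn.MSELoss()
X, y = torch.randn(8, 4), torch.randn(8, 1)

accumulate_gradients(model, criterion, X, y, micro_batch_size=2)
accumulated = model.weight.grad.clone()

model.zero_grad()
criterion(model(X), y).backward()               # reference: one full-batch backward pass
full = model.weight.grad.clone()

padded, mask = collate_pad([torch.tensor([1, 2, 3]), torch.tensor([4])])

print(torch.allclose(accumulated, full, atol=1e-6))
print(padded)
print(mask)
```

The collate function would typically be passed to a DataLoader through its collate_fn argument, so each batch is padded only to the length of its own longest sequence. The accumulation equivalence shown here holds for losses averaged over the batch and for layers without batch-dependent statistics; with BatchNorm the micro-batch and full-batch results differ.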
Happy coding and good luck with your preparation! Kummesey!