Information
| Field | Value |
|---|---|
| Unit | INSTITUTE OF NATURAL AND APPLIED SCIENCES |
| Program | COMPUTER ENGINEERING (MASTER) (WITH THESIS) (ENGLISH) |
| Code | CENG524 |
| Name | Advanced Paradigms in NLP |
| Academic Year | 2026-2027 |
| Term | Spring |
| Duration (T+A) | 3-0 (T-A) (17 Weeks) |
| ECTS | 6 ECTS |
| National Credit | 3 National Credit |
| Teaching Language | English |
| Level | Unspecified |
| Type | Normal |
| Mode of study | Face-to-Face Education |
| Catalog Information Coordinator | Prof. Dr. UMUT ORHAN |
| Course Instructor | The current term course schedule has not been prepared yet. |
Course Goal / Objective
The primary objective of this course is to explore state-of-the-art (SotA) architectures and paradigms in Natural Language Processing that extend beyond standard Transformer models. The course aims to develop students' ability to critically analyze recent top-tier research papers, understand fundamental architectural shifts in the LLM ecosystem, and apply these advanced concepts to formulate and solve complex research problems in generative AI.
Course Content
This research-oriented course focuses on the latest advancements and structural shifts in modern NLP. Key topics include overcoming the quadratic bottlenecks of standard Transformers through alternative architectures like State Space Models (e.g., Mamba); dynamic compute allocation strategies such as Mixture of Experts (MoE) and Mixture of Depths (MoD); and the shift towards inference-time compute and "System 2" logical reasoning. Additionally, the course covers the mathematical foundations of Parameter-Efficient Fine-Tuning (PEFT), the evolution of Large Action Models (LAMs) for autonomous GUI/OS interactions, and natively multimodal (Omni) architectures. The curriculum is heavily driven by literature review, paper discussions, and advanced research projects.
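As one concrete instance of the PEFT mathematics mentioned above, LoRA (Hu et al., 2021, listed under Resources) freezes a pretrained weight matrix W and trains only a low-rank update ΔW = BA. The NumPy sketch below is illustrative only; the matrix sizes and rank are assumptions, not course specifics:

```python
import numpy as np

# LoRA sketch: the frozen weight W receives a trainable low-rank update
# B @ A, so the adapted forward pass is y = x @ (W + B @ A).
# Shapes (d, k) and rank r are illustrative assumptions.
d, k, r = 512, 512, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d, k))          # frozen pretrained weights
B = np.zeros((d, r))                     # zero init: no change at start
A = rng.standard_normal((r, k)) * 0.01   # small random init

x = rng.standard_normal((1, d))
y = x @ (W + B @ A)                      # equals x @ W while B is zero

# Trainable parameter count drops from d*k to r*(d + k):
full, lora = d * k, r * (d + k)
print(full, lora, lora / full)           # 262144 8192 0.03125
```

Only A and B would be updated during fine-tuning; here the adapted output matches the frozen model exactly because B starts at zero, which is the standard LoRA initialization.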
Course Precondition
None
Resources
- Scaling Laws for Neural Language Models (Kaplan et al., 2020)
- Training Compute-Optimal Large Language Models [Chinchilla] (Hoffmann et al., 2022)
- Attention Is All You Need (Vaswani et al., 2017)
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (Dao et al., 2022)
- LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2021)
- QLoRA: Efficient Finetuning of Quantized LLMs (Dettmers et al., 2023)
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023)
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Lewis et al., 2020)
- Lost in the Middle: How Language Models Use Long Contexts (Liu et al., 2023)
- From Local to Global: A Graph RAG Approach to Query-Focused Summarization (Edge et al., 2024)
- A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions (Ji et al., 2023)
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models (Manakul et al., 2023)
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Gu & Dao, 2023)
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022)
- Let's Verify Step by Step (Lightman et al., 2023)
- Mixtral of Experts (Jiang et al., 2024)
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models (Raposo et al., 2024)
- OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments (Xie et al., 2024)
Notes
- AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling (Zhan et al., 2024)
Course Learning Outcomes
| Order | Course Learning Outcomes |
|---|---|
| LO01 | Knows the basic architectures and working principles of large language models |
| LO02 | Knows the theoretical background of vector spaces, text embedding operations, and semantic search |
| LO03 | Knows the evaluation metrics of models and optimization processes such as hallucination detection |
| LO04 | Accomplishes developing applications based on Retrieval-Augmented Generation (RAG) architecture using vector databases |
| LO05 | Accomplishes executing parameter-efficient fine-tuning (PEFT/LoRA) processes on open-source models using local hardware |
| LO06 | Accomplishes designing autonomous AI agents that solve multi-step tasks using external APIs and tools |
Relation with Program Learning Outcome
| Order | Type | Program Learning Outcomes | Level |
|---|---|---|---|
| PLO01 | Knowledge - Theoretical, Factual | Building on the competencies gained at the undergraduate level, has an advanced level of knowledge and understanding that provides the basis for original studies in the field of Computer Engineering. | 4 |
| PLO02 | Knowledge - Theoretical, Factual | Reaches scientific knowledge in the field of engineering in breadth and depth, and evaluates, interprets and applies that knowledge. | 4 |
| PLO03 | Competencies - Learning Competence | Is aware of the new and developing practices of the profession, and examines and learns them when necessary. | 3 |
| PLO04 | Competencies - Learning Competence | Constructs engineering problems, develops methods to solve them and applies innovative methods in solutions. | |
| PLO05 | Competencies - Learning Competence | Designs and applies analytical, modeling and experimental based research, and analyzes and interprets complex situations encountered in this process. | 4 |
| PLO06 | Competencies - Learning Competence | Develops new and/or original ideas and methods, and devises innovative solutions in system, part or process design. | |
| PLO07 | Skills - Cognitive, Practical | Has learning skills. | |
| PLO08 | Skills - Cognitive, Practical | Is aware of new and emerging applications of Computer Engineering, and examines and learns them when necessary. | |
| PLO09 | Skills - Cognitive, Practical | Communicates the processes and results of their studies in written or oral form in national and international environments, within or outside the field of Computer Engineering. | |
| PLO10 | Skills - Cognitive, Practical | Has comprehensive knowledge about current techniques and methods and their limitations in Computer Engineering. | 3 |
| PLO11 | Skills - Cognitive, Practical | Uses information and communication technologies at an advanced level, interactively with the computer software required by Computer Engineering. | |
| PLO12 | Knowledge - Theoretical, Factual | Observes social, scientific and ethical values in all professional activities. | |
Week Plan
| Week | Topic | Preparation | Methods |
|---|---|---|---|
| 1 | Information Theory & Scaling Laws | Paper reading | Teaching Methods: Lecture |
| 2 | Transformer Mechanics & Bottlenecks | Attention paper | Teaching Methods: Lecture, Discussion |
| 3 | Parameter-Efficient Fine-Tuning I (PEFT & LoRA) | LoRA paper | Teaching Methods: Lecture, Discussion |
| 4 | Parameter-Efficient Fine-Tuning II & Alignment | QLoRA paper | Teaching Methods: Lecture, Discussion |
| 5 | Advanced Retrieval Architectures I | RAG (Lewis et al., 2020) paper | Teaching Methods: Lecture, Discussion |
| 6 | Graph RAG & Structured Retrieval | From Local to Global paper | Teaching Methods: Lecture, Discussion |
| 7 | Hallucination & Model Evaluation | Hallucination in LLMs papers | Teaching Methods: Lecture, Discussion |
| 8 | Project Tasks | Task 1. Mamba vs. Attention | Assessment Methods: Project / Design |
| 9 | Beyond Attention: State Space Models (SSMs) | Mamba paper | Teaching Methods: Lecture, Discussion |
| 10 | Inference-Time Compute & System 2 Reasoning | Chain-of-Thought paper | Teaching Methods: Lecture, Discussion |
| 11 | Dynamic Compute & Routing | Mixtral of Experts paper | Teaching Methods: Lecture, Discussion |
| 12 | Large Action Models (LAMs) | OSWorld paper | Teaching Methods: Lecture, Discussion |
| 13 | Omni Architectures (Multimodality) | AnyGPT paper | Teaching Methods: Lecture, Discussion |
| 14 | Project presentations | Task 2. MoD, MoE and o1 | Assessment Methods: Project / Design |
| 15 | Project presentations-2 | Task 3. Any-to-Any (Omni), Text-to-Action, GUI reading Large Action Model (LAM) | Assessment Methods: Project / Design |
| 16 | Term Exams | Exam | Assessment Methods: Written Exam |
| 17 | Term Exams | Exam | Assessment Methods: Written Exam |
Student Workload - ECTS
| Works | Number | Time (Hour) | Workload (Hour) |
|---|---|---|---|
| Course Related Works | |||
| Class Time (Exam weeks are excluded) | 14 | 3 | 42 |
| Out of Class Study (Preliminary Work, Practice) | 14 | 5 | 70 |
| Assessment Related Works |||
| Homeworks, Projects, Others | 3 | 5 | 15 |
| Mid-term Exams (Written, Oral, etc.) | 0 | 0 | 0 |
| Final Exam | 1 | 25 | 25 |
| Total Workload (Hour) | 152 | ||
| Total Workload / 25 (h) | 6.08 ||
| ECTS | 6 ECTS | ||
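The workload totals above follow directly from the table: each row contributes number × hours, the sum is divided by 25 hours per credit, and the result is rounded to the ECTS value. A quick sketch checking that arithmetic (the dictionary keys are illustrative labels, not official names):

```python
# Each entry is (number of occurrences, hours per occurrence),
# taken from the Student Workload table above.
works = {
    "class_time": (14, 3),
    "out_of_class_study": (14, 5),
    "homeworks_projects": (3, 5),
    "midterm_exams": (0, 0),
    "final_exam": (1, 25),
}

total = sum(n * h for n, h in works.values())
print(total)               # 152
print(total / 25)          # 6.08
print(round(total / 25))   # 6 ECTS
```

The rounded value matches the 6 ECTS stated in both the Information and Student Workload tables.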