CMPT 419/983 Trustworthy Deep Learning | Spring 2025

CMPT 419/983

Special Topics in Artificial Intelligence

Trustworthy Deep Learning

Spring 2025 (Jan 7 - Apr 8) In Person

Tuesdays 4:30 - 6:20 PM @ B9201
Thursdays 5:30 - 6:20 PM @ AQ3181

Instructor: Linyi Li (TAI Lab) linyi@sfu.ca
Office hour: Thursdays 4:20 - 5:20 PM @ TASC1 9215

TAs:

Pegah Aryadoost paa40@sfu.ca
Arash Khoeini akhoeini@sfu.ca

TA office hour: Wednesdays 2:00 - 3:00 PM, on Zoom (link on Canvas)

Deep learning, exemplified by large language models, is revolutionizing human lives. However, trustworthiness threats are widespread in deep learning, posing serious challenges to AI safety, security, and reliability. Through seminar-style presentations and hands-on projects, this course introduces the state-of-the-art research frontier in trustworthy deep learning across a wide range of issues, including threat discovery, mitigation, and certification methods.

This is a seminar-style course on trustworthy deep learning. The first half of the course gives an overview of deep learning and the preliminaries for trustworthy AI methods, including neural network training, common neural network architectures, large language models, and the definitions of AI attacks, defences, and certification and verification in the context of AI. The second half covers representative and recent research papers in the field through student presentations, on topics such as evasion attacks and defences, robustness certification, differential privacy, membership inference attacks, watermarks, detection of AI-generated content, machine unlearning, prompt injection attacks, model stealing, and finetuning attacks.

Prerequisites

There is no formal prerequisite. Background in algorithms, calculus, and linear algebra (e.g., MATH 151, MATH 152, MATH 232, CMPT 225) is strongly recommended, as is CMPT 410/726 Machine Learning. Background from CMPT 412/762 Computer Vision and CMPT 713 NLP is also recommended.

Textbook and Reading Materials

There is no primary reference material. We will read an assortment of research papers during lectures.

Deep Learning Book
- By Ian Goodfellow, Yoshua Bengio, and Aaron Courville
- Recommended for students to gain background in deep learning before taking the course.
Online course Intro to ML Safety
- By Dan Hendrycks at the Center for AI Safety
- Optional, advanced reading for interested students
- A well-developed course recommended for those who want to learn general machine learning safety from a systematic and interdisciplinary perspective.

Grading

10% Homework 0 (raw score) + 40% course project (1.1 × raw score, no cap) + 30% paper presentation + 20% notes (peer evaluation and summary)
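The weighting can be sketched as a simple weighted sum. This is a minimal illustration, not an official grade calculator: it assumes every raw score is a percentage in [0, 100] and that the 1.1 multiplier is applied to the project raw score before the 40% weight. The helper `final_grade`, its parameter names, and the sample scores are hypothetical.

```python
# Minimal sketch of the stated grade weighting. Assumes each raw score is a
# percentage in [0, 100]. `final_grade` is a hypothetical helper, not an
# official course tool.

WEIGHTS = {
    "homework0": 0.10,
    "project": 0.40,
    "presentation": 0.30,
    "notes": 0.20,
}
PROJECT_MULTIPLIER = 1.1  # applied to the project raw score; no cap afterwards


def final_grade(homework0: float, project: float,
                presentation: float, notes: float) -> float:
    """Weighted final grade; may exceed 100 because the project bonus is uncapped."""
    raw = {"homework0": homework0, "project": project,
           "presentation": presentation, "notes": notes}
    for name, score in raw.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{name} raw score must be in [0, 100], got {score}")
    # Scale only the project component by the 1.1 multiplier.
    raw["project"] *= PROJECT_MULTIPLIER
    return sum(WEIGHTS[name] * raw[name] for name in WEIGHTS)


if __name__ == "__main__":
    print(round(final_grade(90, 80, 85, 95), 2))      # 88.7
    print(round(final_grade(100, 100, 100, 100), 2))  # 104.0
```

Under these assumptions, perfect raw scores everywhere give 104 rather than 100, since the 1.1 project multiplier is not capped.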

Schedule and Syllabus

Syllabus

Slides will be updated as the term progresses. All slides are available in this OneDrive folder. The slides are password-protected; the password is posted on Canvas.

Week	Date	Topics (Tentative)	Assignment & Due	Reading
Week 1 (1/5 - 1/11)	Tue (1/7) 2h	(Lecture) Syllabus, Introduction to Deep Learning I, Homework 0 Explained	Homework 0 Release	See References in Slides
	Thur (1/9) 1h	(Lecture) Introduction to Deep Learning II		See References in Slides
Week 2 (1/12 - 1/18)	Tue (1/14) 2h	(Lecture) Introduction to Deep Learning III		See References in Slides
	Thur (1/16) 1h	(Lecture) Course Presentation Instructions; Trustworthy Deep Learning Overview	Presentation Sign-up Sheet Release; Homework 0 Due (1/18)	See References in Slides
Week 3 (1/19 - 1/25)	Tue (1/21) 2h	(Lecture) Robustness Threats in Deep Learning - Attacks I		See References in Slides
	Thur (1/23) 1h	(Lecture) Robustness Threats in Deep Learning - Attacks II		See References in Slides
Week 4 (1/26 - 2/1)	Tue (1/28) 2h	(Lecture) Robustness Threats in Deep Learning - Defenses		See References in Slides
	Thur (1/30) 1h	(Lecture) Robustness Threats in Deep Learning - Certification I		See References in Slides
Week 5 (2/2 - 2/8)	Tue (2/4) 2h	Class cancelled
	Thur (2/6) 1h	(Lecture) Robustness Threats in Deep Learning - Certification I (Cont'd) & II	Presentation Sign-up Due (2/8)	See References in Slides
Week 6 (2/9 - 2/15)	Tue (2/11) 2h	(Lecture) Robustness Threats in Deep Learning - Certification II (Cont'd)	Course Project Release	See References in Slides
	Thur (2/13) 1h	(Presentation) Privacy Attacks - Membership Inference Attacks Against Machine Learning Models (35 min presentation).	Notes Submission Required	1802.08232 1702.07464
Week 7 (2/16 - 2/22)	Reading Break
Week 8 (2/23 - 3/1)	Tue (2/25) 2h	(Presentation) Privacy Attacks & Differential Privacy - Deep Leakage from Gradients (15 min presentation). Differentially Private Synthetic Data via Foundation Model APIs 1: Images & Differentially Private Synthetic Data via Foundation Model APIs 2: Text (35 min presentation). Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models (15 min presentation).	Notes Submission Required	2301.13188 2404.17399 2112.03570 2202.07646 2302.00539 1607.00133 1702.07476 2106.02848 1610.05755 1805.10559 1802.08908 2110.06500 2110.05679 2212.06470 2205.10683
	Thur (2/27) 1h	Class cancelled
Week 9 (3/2 - 3/8)	Tue (3/4) 2h	(Presentation) Copyright, Unlearning, Model Stealing - Machine Unlearning (35 min presentation). Protecting Artists from Style Mimicry by Text-to-Image Models (25 min presentation). Extracting Training Data from Large Language Models (25 min presentation).	Notes Submission Required	Towards Making Systems Forget with Machine Unlearning 2109.13398 2403.06634 2404.03233 1911.03030 2303.07345
	Thur (3/6) 1h	(Presentation) Fairness - A Survey on Bias and Fairness in Machine Learning (35 min presentation).	Notes Submission Required	1906.08386 2205.15494 2002.10312
Week 10 (3/9 - 3/15)	Tue (3/11) 2h	(Presentation) Data Poisoning Attacks, Fairness - Fairness Without Demographics in Repeated Loss Minimization (15 min presentation). (Lecture) Poisoning & Backdoor Attacks for Deep Neural Networks	Notes Submission Required	1804.00792 2302.10149 Trojaning Attack on Neural Networks Robust Logistic Regression and Classification 1706.03691 Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks 2006.14768 Exploring the Orthogonality and Linearity of Backdoor Attacks
	Thur (3/13) 1h	(Presentation) Data Valuation - Towards Efficient Data Valuation Based on the Shapley Value (15 min presentation). Studying Large Language Model Generalization with Influence Functions (15 min presentation).	Notes Submission Required	Data Shapley: Equitable Valuation of Data for Machine Learning 1703.04730
Week 11 (3/16 - 3/22)	Tue (3/18) 2h	(Lecture) Large Language Models: Overview		CMU 11-711 ANLP
	Thur (3/20) 1h	(Lecture) LLM Trustworthiness: Overview		DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models AI Risk Management Should Incorporate Both Safety and Security Recommendations for Technical AI Safety Research Directions
Week 12 (3/23 - 3/29)	Tue (3/25) 2h	(Presentation) LLM Alignment Tuning - Training Language Models to Follow Instructions with Human Feedback (25 min presentation). Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback (25 min presentation). Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations (25 min presentation). SimPO: Simple Preference Optimization with a Reference-Free Reward (15 min presentation).	Notes Submission Required	2305.18290 Safety Alignment Should be Made More Than Just a Few Tokens Deep 2404.12358 2208.03274
	Thur (3/27) 1h	(Presentation) LLM Safety Benchmarks - TruthfulQA: Measuring How Models Mimic Human Falsehoods (15 min presentation). Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference (15 min presentation).	Notes Submission Required	2402.04249 2308.03825 2406.14598 2111.02840
Week 13 (3/30 - 4/5)	Tue (4/1) 2h	(Presentation) LLM Prompting and Prompt Injection - Jailbroken: How Does LLM Safety Training Fail? (25 min presentation). Universal and Transferable Adversarial Attacks on Aligned Language Models (15 min presentation). Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks (15 min presentation). Jailbreaking Black Box Large Language Models in Twenty Queries (15 min presentation).	Notes Submission Required	2306.15447 2401.06373 LLM-Fuzzer: Scaling Assessment of Large Language Model Jailbreaks 2310.06987 2306.13213 2404.01318 2405.09113
	Thur (4/3) 1h	(Presentation) LLM Finetuning Attacks and Defenses - Locking Down the Finetuned LLMs Safety (15 min presentation). Poisoning Attacks against Support Vector Machines (15 min presentation), adjusted due to presenter absence.	Notes Submission Required; Course Project Due (4/5)	2409.18169 2310.03693 2406.20053 2404.01099 2405.16833
Week 14 (4/6 - 4/12)	Tue (4/8) 2h	(Lecture) Course Project Discussion, Closing Remarks
Week 15 (4/13-4/19)	Fri (4/19)	Grade Released

Extended Topics

Trustworthy deep learning is a broad area. Due to limited time, some important topics are not covered in lectures or presentations. Some of them are listed below.

LLM Hallucination
Risks of LLM agents
Reward hacking and goal misspecification in RL and RLHF
Socioeconomic Impact of Generative AI
…

Assignments and Project

Homework 0:
- Deadline: 23:59, Jan 18, 2025
- Scores released: Jan 28, 2025
Presentation:
- Sign-up spreadsheet
- Sign-up deadline: Feb 8, 2025
- Presentation date: see your signed-up slot
Course Project:
- Deadline: Apr 5, 2025
Notes Submission:
- Submission links dynamically released on Canvas
- Only for student presentation dates
- Due 7 days after each presentation date
- Submit on Canvas
- Up to 3 exemptions

Information Platform

Course website (here)
Canvas
Piazza
Syllabus

Ethics Statement

This course will include topics related to computer security and privacy. As part of this investigation, we may cover technologies whose abuse could infringe on the rights of others. As computer scientists, we rely on the ethical use of these technologies. Unethical use includes circumventing existing security or privacy mechanisms for any purpose, and disseminating, promoting, or exploiting vulnerabilities in these services. Any activity outside the letter or spirit of these guidelines will be reported to the proper authorities and may result in dismissal from the class and possibly more severe academic and legal sanctions.

Academic Integrity Policy

Some examples of unacceptable behaviour in homework and course projects:
- Handing in assignments that are not 100% your own work (in design, implementation, wording, etc.) without proper citation. Your submission must include a README file citing any external code used.
- Sharing code fragments with others in the class (for group projects, with anyone outside your group) is not allowed.
- Keep discussions to high-level information rather than specific code hints.
- Copying and then obfuscating code is a serious academic honesty violation.
- Submitting work that has been submitted before, for any course at any institution.
If you are unclear on what academic honesty is, see Simon Fraser University’s Policy S10-01.
All instances of academic dishonesty will be dealt with very severely.
In general, minimum requested penalties will be as follows:
- For assignments and course projects: a mark of -50% on the assignment. For example, academic dishonesty on an assignment worth 10% of your final mark will result in a zero on the assignment and a penalty of 5% from your final grade.
Please note that these are minimum penalties. At the instructor’s option, more severe penalties may be given/requested. All instances of academic dishonesty will be noted on your University record.
The instructor may use an automated service that will check for plagiarism.

Acknowledgement

The course is developed from CS562 and CS598GS at UIUC. Part of the content is adapted from Intro to ML Safety. Some course policies are developed from CMPT 413 Natural Language Processing.