FourthBrain Workshop
Building with MultiModal Models
A 3-week live workshop for engineers on developing applications with the next generation of large foundation models that integrate text, image, audio, and beyond.
- Date: Coming Soon
- Time: 4-7pm PT on Tuesdays and Thursdays
- Cost: $2,000
Key Outcomes
- Master the fundamentals of multimodal training objectives, such as CLIP's contrastive language-image pre-training
- Apply multimodal foundation models (MMFs) to tasks like image classification, text-based image retrieval, and conversational assistance
- Explore emerging research directions in large multimodal models, including generating multimodal output
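One of the outcomes above, text-based image retrieval, reduces to ranking image embeddings by cosine similarity to a text query embedding. The sketch below uses made-up toy vectors purely for illustration; in the workshop these embeddings would come from a real model such as CLIP.

```python
import numpy as np

# Hypothetical embeddings standing in for real CLIP encoder outputs
text_query = np.array([1.0, 0.0, 0.0, 0.1])
image_embs = np.array([
    [0.0, 1.0, 0.0, 0.0],   # image 0: unrelated to the query
    [0.9, 0.1, 0.0, 0.1],   # image 1: close to the query
    [0.0, 0.0, 1.0, 0.0],   # image 2: unrelated to the query
])

def retrieve_top_k(query, images, k=1):
    """Rank images by cosine similarity to a text query embedding."""
    q = query / np.linalg.norm(query)
    ims = images / np.linalg.norm(images, axis=1, keepdims=True)
    sims = ims @ q                  # cosine similarity per image
    return np.argsort(-sims)[:k]   # indices of the most similar images

print(retrieve_top_k(text_query, image_embs))  # → [1]
```

Because CLIP places text and images in a shared embedding space, the same similarity ranking works whether the query is a caption or another image.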
Workshop Schedule
This workshop includes two live sessions a week on Tuesdays and Thursdays, plus additional practice on your own time.
- Exploring multimodal datasets and training objectives
- Understanding CLIP - contrastive language-image pre-training
- Introduction to Flamingo and alternative architectures
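The contrastive language-image pre-training objective covered in the schedule above can be sketched in a few lines of NumPy: a minimal version of CLIP's symmetric cross-entropy (InfoNCE) loss over an image-text similarity matrix. The encoders are omitted; the random vectors below are placeholders for real image and text embeddings.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss in the style of CLIP.

    image_emb, text_emb: (N, D) arrays where row i of each is a matched pair.
    """
    # L2-normalize so dot products are cosine similarities
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # (N, N) similarity matrix, scaled by temperature
    logits = image_emb @ text_emb.T / temperature

    def cross_entropy(logits):
        # The matched pair sits on the diagonal of the logits matrix
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average of image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

rng = np.random.default_rng(0)
images = rng.normal(size=(4, 8))
texts = images + 0.01 * rng.normal(size=(4, 8))  # nearly matched pairs
print(clip_contrastive_loss(images, texts))      # small loss for matched pairs
```

Training pushes matched image-text pairs together on the diagonal while pushing mismatched pairs apart, which is what makes zero-shot classification and retrieval possible downstream.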
This workshop is for you if:
- You are an engineer, data scientist, or developer who wants to incorporate multimodal capabilities into your projects
- You have a solid understanding of machine learning and deep learning concepts
- You are fluent in Python and have some experience with NLP and computer vision
How to Prepare
Create a Google Colab Account
We recommend working in Google Colab for fine-tuning, so you should have a paid Colab account for access to GPU runtimes.
Familiarize yourself with Hugging Face
This Introduction to Hugging Face course will teach you about NLP using libraries from the Hugging Face ecosystem.
Many employers offer reimbursement for programs like ours. Check out our tips for getting reimbursed.
Register as a group here.