Mehmet Saygin Seyfioglu

CV | Google Scholar | GitHub | Twitter | LinkedIn

I am a PhD candidate at the Graphics and Imaging Laboratory at the University of Washington, Seattle, where I work with Prof. Linda Shapiro and Prof. Ranjay Krishna. My research sits at the intersection of vision and language, spanning visual instruction tuning, self-supervised learning, and diffusion models. My work has been supported by the Fulbright Fellowship (2019-2021), the Garvey Institute (2021-2022), and Microsoft (2023-2024).

Currently, I am a Research Scientist Intern at Google, working on diffusion models. Previously, I interned at Amazon during the summers of 2020 through 2023, working on projects ranging from self-supervised learning to diffusion models. Before my PhD, I spent two years as a machine learning researcher at a tech company in Turkey, working on various vision-and-language projects.

I will be entering the job market around September 2024. Please feel free to reach out if you believe that I am a good fit!

msaygin[at]cs[dot]washington[dot]edu

Research

My research interests revolve around self-supervised learning at the intersection of vision and language, visual instruction tuning, and generative diffusion models.

Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All.
Mehmet Saygin Seyfioglu, K. Bouyarmane, S. Kumar, A. Tavanaei and I. Tutar
Preprint, January 2024.

Summary: In e-commerce, zero-shot image-conditioned inpainting models used for virtual try-on struggle to preserve the fine-grained details of products. Frameworks like DreamPaint offer better quality, but they rely on few-shot fine-tuning for each individual product, which is too costly at scale. In this work, we introduce Diffuse to Choose, which lets any e-commerce product be virtually placed in any user setting in a zero-shot manner while preserving the product's fine-grained details. It works by stitching the reference product directly into the source image to approximate its pixel-level appearance. A secondary UNet encoder then processes this collage and produces pixel-level signals, which are modulated into the main UNet decoder through affine transformations using a FiLM layer, ensuring high fidelity in detail preservation. In comparative evaluations, Diffuse to Choose outperforms both few-shot personalization models like DreamPaint and zero-shot inpainting models like Paint by Example, on both public virtual try-on datasets and an in-house virtual try-all dataset.
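
To make the modulation step concrete, here is a minimal PyTorch sketch of FiLM-style conditioning, where hint features predict per-channel scale and shift parameters applied to decoder features. All module and tensor names are hypothetical; this illustrates the generic FiLM mechanism, not the Diffuse to Choose implementation.

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: scale and shift main-branch
    features with affine parameters predicted from a hint signal."""
    def __init__(self, hint_channels: int, feat_channels: int):
        super().__init__()
        # A 1x1 conv predicts both gamma (scale) and beta (shift).
        self.to_gamma_beta = nn.Conv2d(hint_channels, 2 * feat_channels, kernel_size=1)

    def forward(self, feats: torch.Tensor, hint: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.to_gamma_beta(hint).chunk(2, dim=1)
        # (1 + gamma) keeps the layer close to identity early in training.
        return (1 + gamma) * feats + beta

# Toy usage: modulate decoder features with same-resolution hint features.
film = FiLM(hint_channels=64, feat_channels=128)
decoder_feats = torch.randn(1, 128, 32, 32)
hint_feats = torch.randn(1, 64, 32, 32)
print(film(decoder_feats, hint_feats).shape)  # torch.Size([1, 128, 32, 32])
```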

Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos.
Mehmet Saygin Seyfioglu*, W. O. Ikezogwo*, F. Ghezloo*, R. Krishna and L. Shapiro
CVPR 2024.

Summary: We introduce Quilt-Instruct, a dataset of over 107,000 histopathology-specific instructional question/answer pairs compiled from educational histopathology videos on YouTube, using narrators' cursor movements to spatially localize captions (a toy sketch of this localization idea follows below). We distill key facts and diagnoses from the broader video content and use them in GPT-4 prompts for contextually anchored, extrapolative reasoning, which helps minimize hallucinations when generating the instruction-tuning data. Quilt-Instruct led to Quilt-LLaVA, a multimodal chatbot that excels in diagnostic reasoning and spatial awareness by reasoning beyond single image patches. Quilt-LLaVA improves over the state of the art by more than 10% on relative GPT-4 score, and by 4% and 9% on open- and closed-set VQA tasks, respectively.
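
As a toy illustration of the localization idea, the sketch below pairs mouse-cursor samples with one ASR caption's time window and reduces them to a bounding box. The data format, field names, and padding are all hypothetical; the actual Quilt-Instruct pipeline is considerably more involved.

```python
from dataclasses import dataclass

@dataclass
class CursorPoint:
    t: float  # seconds into the video
    x: float  # normalized [0, 1] image coordinates
    y: float

def caption_bbox(points, t_start, t_end, pad=0.02):
    """Bounding box (x0, y0, x1, y1) around the cursor positions that
    fall inside one ASR caption's time window, with a small margin."""
    pts = [p for p in points if t_start <= p.t <= t_end]
    if not pts:
        return None  # narrator did not point at anything during this caption
    xs, ys = [p.x for p in pts], [p.y for p in pts]
    return (max(min(xs) - pad, 0.0), max(min(ys) - pad, 0.0),
            min(max(xs) + pad, 1.0), min(max(ys) + pad, 1.0))

# Toy usage: three cursor samples during a caption spoken from t=3s to t=5s.
trace = [CursorPoint(3.1, 0.40, 0.55), CursorPoint(4.0, 0.45, 0.60), CursorPoint(4.8, 0.42, 0.58)]
print(caption_bbox(trace, 3.0, 5.0))
```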

Quilt-1M: One Million Image-Text Pairs for Histopathology.
Mehmet Saygin Seyfioglu*, W. O. Ikezogwo*, F. Ghezloo*, D. Geva, F. S. Mohammed, P. K. Anand, R. Krishna and L. Shapiro
NeurIPS 2023 (Oral).

Summary: We introduce Quilt-1M, the largest vision-language dataset to date for representation learning in histopathology, drawing on a variety of resources including YouTube. The dataset, assembled with a blend of handcrafted models and tools such as large language models and automatic speech recognition, extends existing datasets by integrating additional data from sources like Twitter and research papers. A pre-trained CLIP model fine-tuned on Quilt-1M significantly outperforms state-of-the-art models at classifying histopathology images across 13 benchmarks spanning 8 sub-pathologies.
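
For context, fine-tuning CLIP on image-text pairs optimizes a symmetric contrastive objective over each batch. The sketch below shows that loss in PyTorch with random stand-in embeddings; the temperature, batch size, and embedding dimension are assumptions, not the paper's training configuration.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature  # cosine similarities
    targets = torch.arange(len(img_emb))          # i-th image matches i-th text
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy usage with random 512-d embeddings for a batch of 8 pairs.
loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```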

DreamPaint: Few-Shot Inpainting of E-Commerce Items for Virtual Try-On without 3D Modeling.
Mehmet Saygin Seyfioglu, K. Bouyarmane, S. Kumar, A. Tavanaei and I. B. Tutar
Preprint, May 2023.

Summary: DreamPaint is a framework for inpainting e-commerce products onto user-provided context images, using only 2D images from product catalogs and the user's context. It relies on few-shot fine-tuning of pre-trained diffusion models, allowing products to be accurately placed in images while preserving the characteristics of both the product and the context. DreamPaint outperforms state-of-the-art inpainting modules in both subjective human studies and quantitative metrics, showcasing its potential for virtual try-on in e-commerce.
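
As a rough illustration of the interface such a system builds on, the snippet below runs a stock latent-diffusion inpainting pipeline from the diffusers library. The checkpoint name, file paths, and "sks"-style identifier token are placeholders for a DreamBooth-style few-shot fine-tuned model; this is not the DreamPaint code.

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

# Placeholder checkpoint: a stock inpainting model, or one fine-tuned
# few-shot on catalog images of a single product (DreamBooth-style).
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")  # assumes a CUDA GPU is available

context = Image.open("living_room.png").convert("RGB")    # user-provided scene
mask = Image.open("sofa_region_mask.png").convert("RGB")  # where to place the product

result = pipe(
    prompt="a photo of sks armchair",  # rare-token identifier bound during fine-tuning
    image=context,
    mask_image=mask,
).images[0]
result.save("try_on.png")
```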

Multi-modal Masked Autoencoders Learn Compositional Histopathological Representations.
Mehmet Saygin Seyfioglu*, W.O. Ikezogwo*, and L. Shapiro
Extended abstract: Machine Learning for Health (ML4H), Dec 2022.

Summary: We propose the use of Multi-modal Masked Autoencoders (MMAE) for self-supervised learning (SSL) on whole slide images (WSIs) in histopathology. By exploiting the specific compositionality of Hematoxylin & Eosin-stained WSIs, MMAE outperforms other state-of-the-art SSL techniques and supervised baselines on an eight-class tissue phenotyping task.
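
A small, self-contained example of the stain-compositionality idea, using scikit-image's color deconvolution to split a stained slide image into separate stain channels; in an MMAE-style setup each channel could be patchified, masked, and reconstructed as its own modality. This is an illustration under those assumptions, not the paper's pipeline.

```python
from skimage import data
from skimage.color import rgb2hed

# Color-deconvolve a stained image into Hematoxylin, Eosin, and DAB
# channels (Ruifrok & Johnston stain separation).
rgb = data.immunohistochemistry()  # sample stained image shipped with skimage
hed = rgb2hed(rgb)                 # shape (H, W, 3)
hematoxylin, eosin = hed[..., 0], hed[..., 1]

# In an MMAE-style setup, each stain channel would become a separate
# masked input modality; here we just report basic statistics.
print(hematoxylin.shape, eosin.shape)
print(f"H range: [{hematoxylin.min():.3f}, {hematoxylin.max():.3f}]")
```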

Brain-Aware Replacements for Supervised Contrastive Learning in Detection of Alzheimer's Disease.
Mehmet Saygin Seyfioglu, Z. Liu, P. Kamath, S. Gangolli, S. Wang, T. Grabowski, and L. Shapiro.
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2022.

Summary: We introduce a novel framework for Alzheimer's disease detection from brain MRIs. We use a data augmentation method called Brain-Aware Replacements (BAR) to generate synthetic samples, then train with a soft-label-capable supervised contrastive loss that learns the relative similarity of representations. After fine-tuning, a model pre-trained with this framework outperforms other training approaches on the Alzheimer's disease detection task.
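
To sketch what "soft-label-capable" could mean in a contrastive loss, the toy PyTorch function below weights pairwise attraction by the agreement of soft labels (such as mixing ratios produced by brain-aware replacements) instead of hard positive/negative sets. It is an assumption-laden illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def soft_supcon_loss(emb, soft_labels, temperature=0.1):
    """Illustrative soft-label contrastive loss: pairs are pulled together
    in proportion to how much their soft labels agree."""
    emb = F.normalize(emb, dim=-1)
    logits = emb @ emb.t() / temperature
    self_mask = torch.eye(len(emb), dtype=torch.bool)
    logits = logits.masked_fill(self_mask, -1e9)  # exclude self-pairs
    # Target distribution from label agreement (e.g., synthetic mixing ratios).
    target = (soft_labels @ soft_labels.t()).masked_fill(self_mask, 0.0)
    target = target / target.sum(dim=1, keepdim=True)
    return -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

# Toy usage: four scans with soft [AD, healthy] labels from synthetic mixing.
emb = torch.randn(4, 128)
labels = torch.tensor([[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.0, 1.0]])
print(soft_supcon_loss(emb, labels).item())
```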

MLIM: Vision-and-Language Model Pre-training with Masked Language and Image Modeling.
Mehmet Saygin Seyfioglu*, T. Arici*, T. Neiman, Y. Xu, S. Tran, T. Cilimbi, B. Zeng, and I. B. Tutar.
Preprint, September 2021.

Summary: We introduce Masked Language and Image Modeling (MLIM), which enhances vision-and-language pre-training (VLP) by combining a Masked Language Modeling (MLM) loss with an image reconstruction (Recon) loss. We also propose Modality-Aware Masking (MAM), which boosts cross-modality interaction and lets text and image reconstruction quality be gauged separately. Coupling the MLM + Recon tasks with MAM yields a simplified VLP methodology with improved performance on downstream tasks on a proprietary e-commerce multimodal dataset.
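
The combined objective can be pictured as the sum of a masked-token cross-entropy and a masked-patch reconstruction term. The sketch below shows that shape with stand-in tensors; all shapes, the loss weighting, and the masking scheme are assumptions, with MAM only hinted at in the comments.

```python
import torch
import torch.nn.functional as F

def mlim_loss(text_logits, text_targets, text_mask,
              img_pred, img_target, img_mask, recon_weight=1.0):
    """Illustrative combined objective: MLM cross-entropy on masked text
    tokens plus reconstruction error on masked image patches. A
    modality-aware masking scheme would choose text_mask/img_mask so
    that one modality must be recovered with help from the other."""
    mlm = F.cross_entropy(text_logits[text_mask], text_targets[text_mask])
    recon = F.mse_loss(img_pred[img_mask], img_target[img_mask])
    return mlm + recon_weight * recon

# Toy shapes: 2 sequences of 16 tokens (vocab 100), 2 images of 49 patches (dim 768).
text_logits = torch.randn(2, 16, 100)
text_targets = torch.randint(0, 100, (2, 16))
text_mask = torch.rand(2, 16) < 0.15   # mask ~15% of tokens
img_pred = torch.randn(2, 49, 768)
img_target = torch.randn(2, 49, 768)
img_mask = torch.rand(2, 49) < 0.5     # mask ~50% of patches
print(mlim_loss(text_logits, text_targets, text_mask,
                img_pred, img_target, img_mask).item())
```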

Leveraging Unlabeled Data for Glioma Molecular Subtype and Survival Prediction.
N. Nuechterlein, B. Li, Mehmet Saygin Seyfioglu, S. Mehta, P. J. Cimino, and L. Shapiro.
ICPR 2020.

Summary: We address radio-genomic challenges in glioma subtype and survival prediction by using multi-task learning (MTL) to leverage unlabeled MR data alongside genomic data. Our MTL model significantly surpasses comparable models trained only on labeled MR data for glioma subtype prediction. We also show that our model's embeddings improve survival prediction beyond what MR or somatic copy number alteration data achieve alone, demonstrating the potential of MTL for harnessing multimodal data in glioma characterization.
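
A minimal sketch of the multi-task idea: a shared encoder feeds both a supervised subtype head and a reconstruction head, so unlabeled scans can still update the encoder through the reconstruction branch. The 2D architecture and sizes below are hypothetical simplifications, not the paper's model.

```python
import torch
import torch.nn as nn

class GliomaMTL(nn.Module):
    """Illustrative multi-task setup: a shared MR encoder with a
    subtype-classification head (labeled data) and a reconstruction
    head (labeled and unlabeled data)."""
    def __init__(self, in_ch=1, feat=64, n_subtypes=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat, n_subtypes))
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat, feat, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(feat, in_ch, 4, stride=2, padding=1))

    def forward(self, x):
        z = self.encoder(x)
        return self.classifier(z), self.decoder(z)

# A labeled batch contributes both losses; an unlabeled one only reconstruction.
model = GliomaMTL()
logits, recon = model(torch.randn(4, 1, 64, 64))
print(logits.shape, recon.shape)  # (4, 3), (4, 1, 64, 64)
```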

I also have publications from earlier work; please see my Google Scholar page for those.

Teaching
{TA} CSE 576: Computer Vision, Spring 2023.

{TA} CSE 473: Introduction to Artificial Intelligence, Winter/Fall 2023.


Website credits: Jon Barron