publications | Dongjun Lee

2025

C³: Capturing Consensus with Contrastive Learning in Group Recommendation

Soyoung Kim, Dongjun Lee, and Jaekwang Kim

Under review, 2025
Hierarchical Contrastive Learning with Multiple Augmentations for Sequential Recommendation

Dongjun Lee, Donggeun Ko, and Jaekwang Kim

In Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing, Catania International Airport, Catania, Italy, 2025

Sequential recommendation aims to predict users’ next actions by analyzing their historical behavior. Lately, contrastive learning has become prominent in this domain, especially when user interactions with items are sparse. Although data augmentation methods have flourished in fields like computer vision, their potential in sequential recommendation remains under-explored. Thus, we present Hierarchical Contrastive Learning with Multiple Augmentations for Sequential Recommendation (HCLRec), a novel framework that harnesses multiple augmentation techniques to create diverse views of user sequences. This framework systematically employs existing augmentation techniques, creating a hierarchy to generate varied views. First, we augment the input sequences into various views using multiple augmentations. Through the continuous composition of these augmentation methods, we formulate both low-level and high-level view pairs. Second, an effective sequence-based encoder is used to embed input sequences, complemented by supplementary blocks that capture users’ nonlinear behaviors, which are further varied by augmentations. Input sequences are routed to subsequent layers based on the number of augmentations applied, helping the model discern intricate sequential patterns intensified by these augmentations. Finally, contrastive losses are calculated between view pairs of the same level within each layer. This allows the encoder to learn from the contrastive losses between augmented views of the same level, and it reduces the information gap between the low-level and high-level views that multiple augmentations introduce. In evaluations, HCLRec outperforms state-of-the-art methods by up to 7.22% and demonstrates its effectiveness in handling sparse data.
Debiasing Classifiers by Amplifying Bias with Latent Diffusion and Large Language Models

Donggeun Ko, Dongjun Lee, Namjun Park, and 2 more authors

In Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing, Catania International Airport, Catania, Italy, 2025

Neural networks struggle with image classification when they learn biases and misleading correlations, which harms their generalization and performance. Previous methods require attribute labels (e.g., background, color) or utilize Generative Adversarial Networks (GANs) to mitigate biases. We introduce DiffuBias, a novel pipeline for text-to-image generation that generates bias-conflict samples without any training. By utilizing pretrained diffusion and image captioning models, DiffuBias generates bias-conflict samples using the top-K losses from a biased classifier (f_B) to debias the classifier. This method not only debiases effectively but also boosts the classifier's generalization capabilities. Our comprehensive experimental evaluations demonstrate that DiffuBias achieves state-of-the-art performance on benchmark datasets.

2024

DiffInject: Revisiting Debias via Synthetic Data Generation using Diffusion-based Style Injection

Donggeun Ko, Sangwoo Jo, Dongjun Lee, and 2 more authors

In IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR) 2024 Workshop in SynData4CV, 2024

Dataset bias is a significant challenge in machine learning, where specific attributes, such as the texture or color of the images, are unintentionally learned, resulting in degraded performance. To address this, previous efforts have focused on debiasing models either by developing novel debiasing algorithms or by generating synthetic data to mitigate the prevalent dataset biases. However, generative approaches to date have largely relied on using bias-specific samples from the dataset, which are typically too scarce. In this work, we propose DiffInject, a straightforward yet powerful method to augment synthetic bias-conflict samples using a pretrained diffusion model. This approach significantly advances the use of diffusion models for debiasing purposes by manipulating the latent space. Our framework does not require any explicit knowledge of the bias types or labelling, making it a fully unsupervised setting for debiasing. Our methodology demonstrates substantial results in effectively reducing dataset bias.
Retrieval-Based Disease Prediction for Myocardial Injury after Noncardiac Surgery: Leveraging Language Models as Diagnostic Tools

Namjun Park, Donggeun Ko, Dongjun Lee, and 2 more authors

In AAAI 2024 Spring Symposium on Clinical Foundation Models, 2024

Elevating CTR Prediction: Field Interaction, Global Context Integration, and High-Order Representations

Sojeong Kim, Dongjun Lee, and Jaekwang Kim

In Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, Avila, Spain, 2024

Recommendation systems have become increasingly prevalent in online applications. For CTR prediction, attention-based models are commonly used to learn interactions between attribute features efficiently. However, self-attention has limitations: it does not consider relationships between fields, and it reflects information only partially when specific feature combinations have strong relationships. To address this, the research introduces interaction weights to capture field relationships and incorporates a Multi-layer Perceptron (MLP) and Squeeze-and-Excitation Networks (SENET) to include global information. Additionally, an extra module is added to address the challenge of creating explicit high-order representations. Experimental results show that the proposed model outperforms all state-of-the-art baseline models in CTR prediction across three public datasets.

2023

How Important is Periodic Model Update in Recommender System?

Hyunsung Lee, Sungwook Yoo, Dongjun Lee, and 1 more author

In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, Taiwan, 2023

In real-world recommender model deployments, the models are typically retrained and deployed repeatedly. It is a rule of thumb to periodically retrain recommender models to capture up-to-date user behavior and item trends. However, the harm caused by delayed model updates has not yet been investigated extensively. In this perspective paper, we formulate the delayed model update problem and quantitatively demonstrate that delayed model updates harm model performance by increasing the number of cold users and cold items and by decreasing overall model performance. These effects vary across domains with different characteristics. Based on these findings, we further argue that although delayed model updates have negative effects on online recommender model deployment, the problem has not received enough attention from research communities. We argue that our verification of the relationship between the model update cycle and model performance calls for further research, such as faster model training and more efficient data pipelines, to keep the model up to date with the latest user behaviors and item trends.
AmpliBias: Mitigating Dataset Bias through Bias Amplification in Few-Shot Learning for Generative Models

Donggeun Ko, Dongjun Lee, Namjun Park, and 3 more authors

In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, United Kingdom, 2023

Deep learning models exhibit a dependency on peripheral attributes of input data, such as shapes and colors, which biases the models towards these attributes and subsequently degrades performance. In this paper, we alleviate this problem by presenting AmpliBias, a novel framework that tackles dataset bias by leveraging generative models to amplify bias and facilitate the learning of debiased representations of the classifier. Our method involves three major steps. We initially train a biased classifier, denoted as f_b, on a biased dataset and extract the top-K bias-conflict samples. Next, we train a generator solely on a bias-conflict dataset composed of these top-K samples, aiming to learn the distribution of bias-conflict samples. Finally, we re-train the classifier on the newly constructed debiased dataset, which combines the original and amplified data. This allows the biased classifier to competently learn debiased representations. Extensive experiments validate that our proposed method effectively debiases the biased classifier.
Self-Interactive Attention Networks via Factorization Machines for Click-Through Rate Prediction

Dongjun Lee, Hyunsung Lee, and Jaekwang Kim

In Proceedings of the 24th International Symposium on Advanced Intelligent Systems, 2023

Click-through rate prediction is the task of estimating the likelihood that target users will click on a website's recommendations (e.g., advertisements or product lists). As data complexity and volume increase, recommending suitable items for individuals has become economically critical for online applications; however, it remains challenging because the inputs for the prediction model are typically sparse and have high-dimensional categorical features. Owing to the outstanding performance of deep-learning models over classical methods, several deep-learning approaches for learning low- and high-order interactions from the aforementioned inputs have been proposed. However, feed-forward networks are inefficient at capturing common feature interactions and have limited ability to efficiently model functions with high-order interactions. Furthermore, as the model layers become more complex, reflecting low-order interactions in the output becomes more demanding. In this paper, we propose Self-Interactive Attention Networks, which explicitly capture high-order interactions and reflect low-order interactions in the final prediction. We continuously feed low-order interactions captured via Factorization Machines to the attention layer to carry low-order interactions into the final prediction. We demonstrate that our model outperforms state-of-the-art methods through experiments on two real-world datasets and verify the effectiveness of the proposed model.
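
For readers unfamiliar with the low-order interactions mentioned in the abstract above, the following is a minimal sketch of the standard second-order Factorization Machine term, not the paper's model. It compares the naive sum over feature pairs with the well-known linear-time identity. The function names and the toy inputs `x` and `v` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def fm_pairwise_naive(x, v):
    """Sum of <v_i, v_j> * x_i * x_j over all feature pairs i < j."""
    total = 0.0
    for i, j in combinations(range(len(x)), 2):
        dot = sum(a * b for a, b in zip(v[i], v[j]))
        total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x, v):
    """Same quantity in O(n * k) via the square-of-sum identity."""
    n, k = len(x), len(v[0])
    total = 0.0
    for f in range(k):
        weighted = [v[i][f] * x[i] for i in range(n)]
        sum_sq = sum(weighted) ** 2            # (sum_i v_if x_i)^2
        sq_sum = sum(w * w for w in weighted)  # sum_i (v_if x_i)^2
        total += sum_sq - sq_sum
    return 0.5 * total


# Hypothetical toy input: 3 features, each with a 2-dimensional latent vector.
x = [1.0, 0.0, 2.0]
v = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

naive = fm_pairwise_naive(x, v)
fast = fm_pairwise_fast(x, v)
print(naive, fast)
```

Both functions compute the same pairwise interaction score; the second avoids enumerating pairs, which matters for the sparse, high-dimensional inputs the abstract describes.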