
Overview of BAM: The Perceptual Boundary Alignment & Manipulation Framework
Overview
Human decision-making in cognitive tasks and daily life exhibits considerable variability, shaped by factors such as task difficulty, individual preferences, and personal experiences. Understanding this variability across individuals is essential for uncovering the perceptual and decision-making mechanisms that humans rely on when faced with uncertainty and ambiguity. We propose a systematic Boundary Alignment Manipulation (BAM) framework for studying human perceptual variability through image generation. BAM combines perceptual boundary sampling in artificial neural networks (ANNs) with human behavioral experiments to systematically investigate this phenomenon. Our perceptual boundary sampling algorithm generates stimuli along ANN perceptual boundaries that intrinsically induce significant perceptual variability. The efficacy of these stimuli is empirically validated through large-scale behavioral experiments involving 246 participants across 116,715 trials, culminating in the variMNIST dataset of 19,943 systematically annotated images. Through personalized model alignment and adversarial generation, we establish a reliable method for simultaneously predicting and manipulating the divergent perceptual decisions of pairs of participants. This work bridges the gap between computational models and human individual-difference research, providing new tools for personalized perception analysis.

Perceptual Boundary Sampling Algorithm

Flowchart of our Perceptual Boundary Sampling Algorithm. Samples are drawn along an ANN’s decision boundary (high-uncertainty regions), then passed to human observers to measure perceptual ambiguity.
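To make the sampling idea concrete, here is a minimal PyTorch sketch that pushes an input image toward a classifier's decision boundary between two classes, keeping both targets likely while equalizing their logits. The model `net`, the specific loss terms, and all hyperparameters are illustrative assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def sample_boundary_image(net, x, o1, o2, steps=200, lr=0.05):
    """Nudge image `x` toward net's decision boundary between classes o1 and o2."""
    net.eval()
    x = x.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        logits = net(x)
        log_p = F.log_softmax(logits, dim=-1)
        # Keep both target classes likely...
        loss = -(log_p[:, o1] + log_p[:, o2]).mean() / 2
        # ...while pulling their logits together, i.e., maximizing uncertainty.
        loss = loss + (logits[:, o1] - logits[:, o2]).abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        x.data.clamp_(0.0, 1.0)  # keep pixels in a valid range
    return x.detach()
```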
Collecting Human Perceptual Variability

Guidance Outcomes. (a) Definition of the three guidance outcomes. (b) Guidance outcomes for handwritten digits and natural images.
To comprehensively evaluate the guiding effectiveness of the generation method, we define three types of guidance outcomes: success, bias, and failure. For guidance targets \(o_1\) and \(o_2\), let \(p_1\) and \(p_2\) denote the probabilities of participants choosing \(o_1\) and \(o_2\), respectively.
A result is considered success if \(p_1 + p_2 \geq 80\%\) and \(\min(p_1, p_2) \geq 10\%\), indicating the generated stimuli guide participants to make a balanced choice between the two targets. A result is labeled as bias if \(p_1 + p_2 \geq 80\%\) but \(\min(p_1, p_2) < 10\%\), indicating a strong bias toward one target. A result is classified as failure if \(p_1 + p_2 < 80\%\), meaning the stimuli fail to guide participants effectively. These definitions allow us to evaluate and compare the performance of different guidance strategies and classifiers.
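Translating these definitions directly, here is a minimal Python sketch of the classification rule (the function name and probability encoding are our own):

```python
def guidance_outcome(p1: float, p2: float) -> str:
    """Classify a stimulus from the probabilities of choosing targets o1 and o2."""
    if p1 + p2 < 0.80:
        return "failure"   # stimuli do not steer choices toward the targets
    if min(p1, p2) >= 0.10:
        return "success"   # balanced choice between the two targets
    return "bias"          # strong preference for one target

assert guidance_outcome(0.45, 0.40) == "success"
assert guidance_outcome(0.85, 0.05) == "bias"
assert guidance_outcome(0.50, 0.20) == "failure"
```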
Predicting Human Perceptual Variability

Human Alignment Results.
(a) Accuracy of BaseNet, GroupNet, and IndivNet on MNIST, variMNIST, and variMNIST-i. All models performed similarly on MNIST. On variMNIST, GroupNet and IndivNet improved accuracy by \( \sim 20\%\) over BaseNet, with IndivNet outperforming GroupNet by \( \sim 5\%\) on variMNIST-i. Accuracy improved for 241 participants and decreased for 5 after individual fine-tuning.
(b) Fine-tuning results for the five classifiers used in this work.
(c) For VGG, Spearman rank correlation between model and human entropy increased from \( \rho = 0.08\) to \( \rho = 0.74\) after group fine-tuning.
(d) Performance of BaseNet, GroupNet, and IndivNet across varying entropy levels. For the example images, the selected participant's choices are 8, 6, 9, and 6, in order of increasing entropy. Gray backgrounds indicate model–subject disagreement. GroupNet and IndivNet improved over BaseNet at all entropy levels, while IndivNet's gains over GroupNet were concentrated on high-entropy images.
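For panel (c), the entropy comparison can be computed roughly as below; the per-image data layout (model softmax outputs and human choice histograms) is an assumption, and scipy's spearmanr supplies \(\rho\).

```python
import numpy as np
from scipy.stats import spearmanr

def entropy(p, eps=1e-12):
    """Shannon entropy of a (possibly unnormalized) distribution."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    return -(p * np.log(p + eps)).sum()

def model_human_entropy_corr(model_probs, human_counts):
    """Spearman correlation between model and human response entropy per image."""
    h_model = [entropy(p) for p in model_probs]    # softmax outputs, one per image
    h_human = [entropy(c) for c in human_counts]   # choice histograms, one per image
    rho, pval = spearmanr(h_model, h_human)
    return rho, pval
```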
To align the base models pretrained on MNIST with human behavior at both the group and individual levels, we adopted a two-stage fine-tuning approach: the group model (GroupNet) was fine-tuned from the base model (BaseNet), and the individual model (IndivNet) was fine-tuned from the group model. For the individual-level datasets (variMNIST-i), which are subsets of variMNIST corresponding to specific individuals, the validation sets were designed to avoid overlap with the group validation set. More details can be found in the paper.
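A minimal sketch of this two-stage procedure, assuming PyTorch DataLoaders `group_loader` (variMNIST, labeled with pooled human choices) and `indiv_loader` (one participant's variMNIST-i subset); the hyperparameters are placeholders, not the paper's settings.

```python
import copy
import torch
import torch.nn.functional as F

def finetune(model, loader, epochs=5, lr=1e-4):
    """Fine-tune a copy of `model` on human choices, leaving the parent intact."""
    model = copy.deepcopy(model)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in loader:  # y: human (group or individual) choices
            loss = F.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

# Stage 1: BaseNet (MNIST-pretrained) -> GroupNet on pooled human choices.
# group_net = finetune(base_net, group_loader)
# Stage 2: GroupNet -> IndivNet on a single participant's data.
# indiv_net = finetune(group_net, indiv_loader)
```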
Manipulating Human Perceptual Variability
Building on variMNIST and alignment experiments, we designed a paradigm to test whether individually fine-tuned models can amplify perceptual differences and guide decision-making. This experiment evaluates the ability of targeted stimuli to reveal individual variability and achieve precise manipulation of perceptual outcomes, highlighting the potential of personalized modeling in understanding human perception.
First Round
For the first round of experiments, we selected around 500 balanced samples from the variMNIST dataset as stimuli. After collecting behavioral data from pairs of participants, we fine-tuned their individual models. Controversial stimuli were then generated using the updated models, aiming to elicit distinct choices from the two participants, with each choosing their respective guidance target.
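A minimal sketch of controversial-stimulus generation between two individually fine-tuned models follows; the joint objective is an illustrative assumption, chosen so that each model is driven toward a different target.

```python
import torch
import torch.nn.functional as F

def generate_controversial(net1, net2, x, o1, o2, steps=200, lr=0.05):
    """Optimize `x` so participant 1's model favors o1
    while participant 2's model favors o2."""
    net1.eval()
    net2.eval()
    x = x.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        lp1 = F.log_softmax(net1(x), dim=-1)
        lp2 = F.log_softmax(net2(x), dim=-1)
        loss = -(lp1[:, o1] + lp2[:, o2]).mean()  # drive the two models apart
        opt.zero_grad()
        loss.backward()
        opt.step()
        x.data.clamp_(0.0, 1.0)
    return x.detach()
```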
Second Round
In the second round of experiments, these controversial stimuli were presented to participants in pairs, with each pair completing trials designed to test whether the fine-tuned models could effectively guide their decisions in opposite directions. The goal was to evaluate whether the generated stimuli amplified perceptual differences and aligned participants’ responses with their respective guidance targets. Experiment details can be found in the paper.

Manipulation Analysis.
(a) Manipulation experiment procedure.
(b) The middle two bars show the guidance outcomes for variMNIST and the individually customized dataset, with the customized dataset achieving a higher success rate. Compared to variMNIST, stimuli generated by IndivNets also improve the directionality of perceptual guidance.
(c) The left panel shows the guidance success rates for the first-round stimuli and the second-round stimuli generated by the fine-tuned models, with an improvement of \(\sim 3\%\) (\(p < 0.001\)). The right panel shows the targeted ratios (i.e., the proportion of participant choices aligned with the guidance direction) for these two groups of stimuli, with an increase of \(\sim 12\%\) (\(p < 0.001\)).
To analyze the effects of individual manipulation, we employed two key metrics. The first metric, referred to as the guidance outcome, was adapted from the Predicting Human Perceptual Variability section. It categorizes outcomes for two participants, \(s_1\) and \(s_2\), with respective guidance targets \(o_1\) and \(o_2\), and choices \(c_1\) and \(c_2\). A result is labeled as success if both participants' choices fall within their respective guidance targets and are distinct, i.e., \(c_1, c_2 \in \{o_1, o_2\}\) and \(c_1 \neq c_2\). If both choices are biased toward the same target, i.e., \(c_1 = c_2 = o_1\) or \(c_1 = c_2 = o_2\), the result is categorized as bias. Finally, if at least one choice falls outside the targets, i.e., \(c_1 \notin \{o_1, o_2\}\) or \(c_2 \notin \{o_1, o_2\}\), the outcome is labeled as failure.
The second metric, called the targeted ratio, quantifies the directionality of successful guidance. Within successful trials, participant choices are classified as either positive, where \(c_1 = o_1\) and \(c_2 = o_2\), meaning both choices align with their respective targets, or negative, where \(c_1 = o_2\) and \(c_2 = o_1\), indicating swapped choices. The targeted ratio is defined as the proportion of positive trials among all success trials, providing a measure of the effectiveness of directional guidance.
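Both metrics translate directly into code; a small sketch, assuming trials are stored as (c1, c2, o1, o2) tuples (the data layout is our own):

```python
def manipulation_metrics(trials):
    """Compute guidance-outcome counts and the targeted ratio for paired trials."""
    success = bias = failure = positive = 0
    for c1, c2, o1, o2 in trials:
        if c1 in (o1, o2) and c2 in (o1, o2):
            if c1 != c2:
                success += 1
                if c1 == o1 and c2 == o2:
                    positive += 1   # both choices align with their own targets
            else:
                bias += 1           # both choices pulled to the same target
        else:
            failure += 1            # at least one choice off-target
    targeted_ratio = positive / success if success else float("nan")
    return {"success": success, "bias": bias,
            "failure": failure, "targeted_ratio": targeted_ratio}
```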
Code Repository
View the full source code on GitHub: https://github.com/ncclab-sustech/HumanPerceptualVariability
View the full paper: https://arxiv.org/abs/2505.03641