arXiv Papers with Code in Human-Computer Interactio (January 2026 - June 2026)

Paperid: 1, https://arxiv.org/pdf/2604.03261.pdf   GitHub
Authors:Bo Kang, Sander Noels, Tijl De Bie
Title: VIGIL: An Extensible System for Real-Time Detection and Mitigation of Cognitive Bias Triggers
Abstract:
The rise of generative AI is posing increasing risks to online information integrity and civic discourse. Most concretely, such risks can materialise in the form of mis- and disinformation. As a mitigation, media-literacy and transparency tools have been developed to address factuality of information and the reliability and ideological leaning of information sources. However, a subtler but possibly no less harmful threat to civic discourse is to use of persuasion or manipulation by exploiting human cognitive biases and related cognitive limitations. To the best of our knowledge, no tools exist to directly detect and mitigate the presence of triggers of such cognitive biases in online information. We present VIGIL (VIrtual GuardIan angeL), the first browser extension for real-time cognitive bias trigger detection and mitigation, providing in-situ scroll-synced detection, LLM-powered reformulation with full reversibility, and privacy-tiered inference from fully offline to cloud. VIGIL is built to be extensible with third-party plugins, with several plugins that are rigorously validated against NLP benchmarks are already included. It is open-sourced at https://github.com/aida-ugent/vigil.

Authors:Hongbin Chen, Jie Li, Wei Wang, Siyang Song, Xiao Gu, Jianqing Li, Wentao Xiang
Title: MECO: A Multimodal Dataset for Emotion and Cognitive Understanding in Older Adults
Abstract:
While affective computing has advanced considerably, multimodal emotion prediction in aging populations remains underexplored, largely due to the scarcity of dedicated datasets. Existing multimodal benchmarks predominantly target young, cognitively healthy subjects, neglecting the influence of cognitive decline on emotional expression and physiological responses. To bridge this gap, we present MECO, a Multimodal dataset for Emotion and Cognitive understanding in Older adults. MECO includes 42 participants and provides approximately 38 hours of multimodal signals, yielding 30,592 synchronized samples. To maximize ecological validity, data collection followed standardized protocols within community-based settings. The modalities cover video, audio, electroencephalography (EEG), and electrocardiography (ECG). In addition, the dataset offers comprehensive annotations of emotional and cognitive states, including self-assessed valence, arousal, six basic emotions, and Mini-Mental State Examination cognitive scores. We further establish baseline benchmarks for both emotion and cognitive prediction. MECO serves as a foundational resource for multimodal modeling of affect and cognition in aging populations, facilitating downstream applications such as personalized emotion recognition and early detection of mild cognitive impairment (MCI) in real-world settings. The complete dataset and supplementary materials are available at https://maitrechen.github.io/meco-page/.

Authors:Matteo Filosa, Andrea Nardocci, Tiziana Catarci, Marco Angelini
Title: UnrealVis: A Testing Laboratory of Optimization Techniques in Unreal Engine for Scientific Visualization
Abstract:
Visualizing large 3D scientific datasets requires balancing performance and fidelity, but traditional tools often demand excessive technical expertise. We introduce UnrealVis, an Unreal Engine optimization laboratory for configuring and evaluating rendering techniques during interactive exploration. Following a review of 55 papers, we established a taxonomy of 22 optimization techniques across six families, implementing them through engine subsystems such as Nanite, Level of Detail(LOD) schemes, and culling. The system features an intuitive workflow with live telemetry and A/B comparisons for local and global performance analysis. Validated through case studies of ribosomal structures and volumetric flow fields, along with an expert evaluation, UnrealVis facilitates the selection of optimization combinations that meet performance goals while preserving structural fidelity. UnrealVis is available at https://github.com/XAIber-lab/UnrealVis

Authors:Matteo Filosa, Graziano Blasilli, Emilio Martino, Marco Angelini
Title: ProVega: A Grammar to Ease the Prototyping, Creation, and Reproducibility of Progressive Data Analysis and Visualization Solutions
Abstract:
Modern data analysis requires speed for massive datasets. Progressive Data Analysis and Visualization (PDAV) emerged as a discipline to address this problem, providing fast response times while maintaining interactivity with controlled accuracy. Yet it remains difficult to implement and reproduce. To lower this barrier, we present ProVega, a Vega-Lite-based grammar that simplifies PDAV instrumentation for both simple visualizations and complex visual environments. Alongside it, we introduce Pro-Ex, an editor designed to streamline the creation and analysis of progressive solutions. We validated ProVega by reimplementing 11 exemplars from the literature-verified for fidelity by 39 users-and demonstrating its support for various progressive methods, including data-chunking, process-chunking, and mixed-chunking. An expert user study confirmed the efficacy of ProVega and the Pro-Ex environment in real-world tasks. ProVega, Pro-Ex, and all related materials are available at https://github.com/XAIber-lab/provega

Authors:Xiangshan Tan, Jingtian Ji, Tianchong Jiang, Pedro Lopes, Matthew R. Walter
Title: HapCompass: A Rotational Haptic Device for Contact-Rich Robotic Teleoperation
Abstract:
The contact-rich nature of manipulation makes it a significant challenge for robotic teleoperation. While haptic feedback is critical for contact-rich tasks, providing intuitive directional cues within wearable teleoperation interfaces remains a bottleneck. Existing solutions, such as non-directional vibrations from handheld controllers, provide limited information, while vibrotactile arrays are prone to perceptual interference. To address these limitations, we propose HapCompass, a novel, low-cost wearable haptic device that renders 2D directional cues by mechanically rotating a single linear resonant actuator (LRA). We evaluated HapCompass's ability to convey directional cues to human operators and showed that it increased the success rate, decreased the completion time and the maximum contact force for teleoperated manipulation tasks when compared to vision-only and non-directional feedback baselines. Furthermore, we conducted a preliminary imitation-learning evaluation, suggesting that the directional feedback provided by HapCompass enhances the quality of demonstration data and, in turn, the trained policy. We release the design of the HapCompass device along with the code that implements our teleoperation interface: https://ripl.github.io/HapCompass/.

Authors:Aakanksha Khandwaha, Edith Law
Title: An Experiential Approach to AI Literacy
Abstract:
Despite AI tools becoming more prevalent and applicable to a variety of workplaces, workers consistently report uncertainty about where AI applies, what problems it can help solve, and how it fits into real workflows. In other words, there is a gap between `knowing' and `doing' when it comes to AI literacy. We propose an experiential form of AI literacy which integrates participant's daily experiences into the learning experience by brainstorming grounded AI use cases through storytelling. We introduce a novel pedagogical approach that helps individuals move away from abstract notions of AI towards practical knowledge of how AI would (or would not) work in different workflows, contexts, and situations. Through this approach, we anticipate two major outcomes: (1) enhanced AI literacy for stakeholders within a variety of work sectors and (2) concrete AI use cases developed through participatory design that are grounded in AI literacy and participant's expertise.

Authors:Hongyu Zhu, Lin Chen, Mingsheng Shang
Title: BiMoE: Brain-Inspired Experts for EEG-Dominant Affective State Recognition
Abstract:
Multimodal Sentiment Analysis (MSA) that integrates Electroencephalogram (EEG) with peripheral physiological signals (PPS) is crucial for the development of brain-computer interface (BCI) systems. However, existing methods encounter three major challenges: (1) overlooking the region-specific characteristics of affective processing by treating EEG signals as homogeneous; (2) treating EEG as a black-box input, which lacks interpretability into neural representations;(3) ineffective fusion of EEG features with complementary PPS features. To overcome these issues, we propose BiMoE, a novel brain-inspired mixture of experts framework. BiMoE partitions EEG signals in a brain-topology-aware manner, with each expert utilizing a dual-stream encoder to extract local and global spatiotemporal features. A dedicated expert handles PPS using multi-scale large-kernel convolutions. All experts are dynamically fused through adaptive routing and a joint loss function. Evaluated under strict subject-independent settings, BiMoE consistently surpasses state-of-the-art baselines across various affective dimensions. On the DEAP and DREAMER datasets, it yields average accuracy improvements of 0.87% to 5.19% in multimodal sentiment classification. The code is available at: https://github.com/HongyuZhu-s/BiMo.

Authors:Kuangshi Ai, Haichao Miao, Kaiyuan Tang, Nathaniel Gorski, Jianxin Sun, Guoxi Liu, Helgi I. Ingolfsson, David Lenz, Hanqi Guo, Hongfeng Yu, Teja Leburu, Michael Molash, Bei Wang, Tom Peterka, Chaoli Wang, Shusen Liu
Title: SciVisAgentBench: A Benchmark for Evaluating Scientific Data Analysis and Visualization Agents
Abstract:
Recent advances in large language models (LLMs) have enabled agentic systems that translate natural language intent into executable scientific visualization (SciVis) tasks. Despite rapid progress, the community lacks a principled and reproducible benchmark for evaluating these emerging SciVis agents in realistic, multi-step analysis settings. We present SciVisAgentBench, a comprehensive and extensible benchmark for evaluating scientific data analysis and visualization agents. Our benchmark is grounded in a structured taxonomy spanning four dimensions: application domain, data type, complexity level, and visualization operation. It currently comprises 108 expert-crafted cases covering diverse SciVis scenarios. To enable reliable assessment, we introduce a multimodal outcome-centric evaluation pipeline that combines LLM-based judging with deterministic evaluators, including image-based metrics, code checkers, rule-based verifiers, and case-specific evaluators. We also conduct a validity study with 12 SciVis experts to examine the agreement between human and LLM judges. Using this framework, we evaluate representative SciVis agents and general-purpose coding agents to establish initial baselines and reveal capability gaps. SciVisAgentBench is designed as a living benchmark to support systematic comparison, diagnose failure modes, and drive progress in agentic SciVis. The benchmark is available at https://scivisagentbench.github.io/.

Authors:Tianle Zeng, Hanxuan Chen, Yanci Wen, Hong Zhang
Title: CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence
Abstract:
The convergence of low-altitude economies, embodied intelligence, and air-ground cooperative systems creates growing demand for simulation infrastructure capable of jointly modeling aerial and ground agents within a single physically coherent environment. Existing open-source platforms remain domain-segregated: driving simulators lack aerial dynamics, while multirotor simulators lack realistic ground scenes. Bridge-based co-simulation introduces synchronization overhead and cannot guarantee strict spatial-temporal consistency. We present CARLA-Air, an open-source infrastructure that unifies high-fidelity urban driving and physics-accurate multirotor flight within a single Unreal Engine process. The platform preserves both CARLA and AirSim native Python APIs and ROS 2 interfaces, enabling zero-modification code reuse. Within a shared physics tick and rendering pipeline, CARLA-Air delivers photorealistic environments with rule-compliant traffic, socially-aware pedestrians, and aerodynamically consistent UAV dynamics, synchronously capturing up to 18 sensor modalities across all platforms at each tick. The platform supports representative air-ground embodied intelligence workloads spanning cooperation, embodied navigation and vision-language action, multi-modal perception and dataset construction, and reinforcement-learning-based policy training. An extensible asset pipeline allows integration of custom robot platforms into the shared world. By inheriting AirSim's aerial capabilities -- whose upstream development has been archived -- CARLA-Air ensures this widely adopted flight stack continues to evolve within a modern infrastructure. Released with prebuilt binaries and full source: https://github.com/louiszengCN/CarlaAir

Authors:Xiao Fan, Yi Zhang
Title: BrainRing: An Interactive Web-Based Tool for Brain Connectivity Chord Diagram Visualization
Abstract:
Visualizing brain functional connectivity (FC) patterns is essential for understanding neural organization, yet existing tools such as Circos and BrainNet Viewer require complex configuration files or proprietary software environments. We present BrainRing, a free, open-source, browser-based interactive tool for generating publication-quality chord diagrams of brain connectivity data. BrainRing requires no installation, backend server, or programming knowledge. Users simply open a single HTML file in any modern browser. The tool supports 8 widely-used brain atlases (Brainnetome 246, AAL-90/116, Schaefer 100/200/400, Power 264, and Dosenbach 160), provides real-time parameter adjustment through an intuitive graphical interface, and offers comprehensive edge management including click-to-connect, per-edge color customization, and Circos link file import. BrainRing supports both Chinese and English interfaces and enables researchers to produce publication-ready SVG and PNG figures with full control over visual styling, all within seconds rather than the minutes-to-hours workflow typical of script-based approaches. BrainRing is freely available at https://github.com/XiuFan719/brain-connectivity-viz with a live demo at https://XiuFan719.github.io/brain-connectivity-viz/.

Authors:Max Holschneider, Saetbyeol LeeYouk
Title: PII Shield: A Browser-Level Overlay for User-Controlled Personal Identifiable Information (PII) Management in AI Interactions
Abstract:
AI chatbots have quietly become the world's most popular therapists, coaches, and confidants. Users of cloud-based LLM services are increasingly shifting from simple queries like idea generation and poem writing, to deeply personal interactions. As Large Language Models increasingly assume the role of our confessors, we are witnessing a massive, unregulated transfer of sensitive personal identifiable information (PII) to powerful tech companies with opaque privacy practices. While the enterprise sector has made great strides in addressing data leakage concerns through sophisticated guardrails and PII redaction pipelines, these powerful tools have functionally remained inaccessible for the average user due to their technical complexity. This results in a dangerous trade off for individual users. In order to receive the therapeutic or productivity benefits of AI, users need to abandon any agency they might otherwise have over their data, often without a clear mental model of what is being shared, and how it might be used for advertising later on. This work addresses this interaction gap, applying the redaction pipelines of enterprise-grade redaction into an intuitive, first-of-its-kind, consumer-facing, and free experience. Specifically, this work introduces a scalable, browser-based intervention designed to help align user behavior with their privacy preferences during web-based AI interactions. Our system introduces two key mechanisms: local entity anonymization to prevent data leakage, and 'smokescreens': autonomous agent activity to disrupt third-party profiling. An open-source implementation is accessible at the GitHub repository below.

Authors:Ruofei Du, Benjamin Hersh, David Li, Nels Numan, Xun Qian, Yanhe Chen, Zhongyi Zhou, Xingyue Chen, Jiahao Ren, Robert Timothy Bettridge, Xiang 'Anthony' Chen, Faraz Faruqi, Steve Toh, David Kim
Title: Vibe Coding XR: Accelerating AI + XR Prototyping with XR Blocks and Gemini
Abstract:
While large language models (LLMs) have accelerated 2D software development through intent-driven "vibe coding", prototyping intelligent Extended Reality (XR) experiences remains a major challenge. The fundamental barrier is not just the steep learning curve for human creators, but that low-level sensor APIs and complex game engine hierarchies are ill-suited for LLM reasoning, routinely exceeding context windows and inducing syntax hallucinations. To bridge this gap, we contribute XR Blocks, an open-source, LLM-native WebXR framework. Unlike traditional engines, XR Blocks introduces a semantic "Reality Model" that aligns spatial computing primitives (users, physical environments, and agents) with natural language, providing a robust, concise vocabulary optimized for generative AI. Building upon this foundation, we present Vibe Coding XR, an end-to-end prototyping workflow that leverages LLMs to translate high-level prompts (e.g., "create a dandelion that reacts to my hand") directly into functional, physics-aware mixed-reality applications. To minimize the friction of on-device testing, the workflow introduces a seamless desktop "simulated reality" to headset deployment loop. Finally, we introduce VCXR60, a pilot dataset of 60 XR prompts paired with an automated evaluation pipeline. Our technical evaluation demonstrates high one-shot execution success, enabling practitioners to bypass lowlevel hurdles and rapidly move from "idea to reality". Code and live demos are available at https://github.com/google/xrblocks and http://xrblocks.github.io/gem.

Authors:Sunwhi Kim, Sunyul Kim
Title: Human Factors in Detecting AI-Generated Portraits: Age, Sex, Device, and Confidence
Abstract:
Generative AI now produces photorealistic portraits that circulate widely in social and newslike contexts. Human ability to distinguish real from synthetic faces is time-sensitive because image generators continue to improve while public familiarity with synthetic media also changes. Here, we provide a time-stamped snapshot of human ability to distinguish real from AI-generated portraits produced by models available in July 2025. In a large-scale web experiment conducted from August 2025 to January 2026, 1,664 participants aged 20-69 years (mobile n = 1,330; PC n = 334) completed a two-alternative forced-choice task (REAL vs AI). Each participant judged 20 trials sampled from a 210-image pool comprising real FFHQ photographs and AI-generated portraits from ChatGPT-4o and Imagen 3. Overall accuracy was high (mean 85.2%, median 90%) but varied across groups. PC participants outperformed mobile participants by 3.65 percentage points. Accuracy declined with age in both device cohorts and more steeply on mobile than on PC (-0.607 vs -0.230 percentage points per year). Self-rated AI-detection confidence and AI exposure were positively associated with accuracy and statistically accounted for part of the age-related decline, with confidence accounting for the larger share. In the mobile cohort, an age-related sex divergence emerged among participants in their 50s and 60s, with female participants performing worse. Trial-level reaction-time models showed that correct AI judgments were faster than correct real judgments, whereas incorrect AI judgments were slower than incorrect real judgments. ChatGPT-4o portraits were harder and slower to classify than Imagen 3 portraits and were associated with a steeper age-related decline in performance. These findings frame AI portrait detection as a human-factors problem shaped by age, sex, device context, and confidence, not image realism alone.

Authors:Haiyang Xu, Ronghuan Wu, Li-Yi Wei, Nanxuan Zhao, Chenxi Liu, Cuong Nguyen, Zhuowen Tu, Zhaowen Wang
Title: SemLayer: Semantic-aware Generative Segmentation and Layer Construction for Abstract Icons
Abstract:
Graphic icons are a cornerstone of modern design workflows, yet they are often distributed as flattened single-path or compound-path graphics, where the original semantic layering is lost. This absence of semantic decomposition hinders downstream tasks such as editing, restyling, and animation. We formalize this problem as semantic layer construction for flattened vector art and introduce SemLayer, a visual generation empowered pipeline that restores editable layered structures. Given an abstract icon, SemLayer first generates a chromatically differentiated representation in which distinct semantic components become visually separable. To recover the complete geometry of each part, including occluded regions, we then perform a semantic completion step that reconstructs coherent object-level shapes. Finally, the recovered parts are assembled into a layered vector representation with inferred occlusion relationships. Extensive qualitative comparisons and quantitative evaluations demonstrate the effectiveness of SemLayer, enabling editing workflows previously inapplicable to flattened vector graphics and establishing semantic layer reconstruction as a practical and valuable task. Project page: https://xxuhaiyang.github.io/SemLayer/

Authors:Hanzhong Zhang, Siyang Song, Jindong Wang
Title: Beyond Preset Identities: How Agents Form Stances and Boundaries in Generative Societies
Abstract:
While large language models simulate social behaviors, their capacity for stable stance formation and identity negotiation during complex interventions remains unclear. To overcome the limitations of static evaluations, this paper proposes a novel mixed-methods framework combining computational virtual ethnography with quantitative socio-cognitive profiling. By embedding human researchers into generative multiagent communities, controlled discursive interventions are conducted to trace the evolution of collective cognition. To rigorously measure how agents internalize and react to these specific interventions, this paper formalizes three new metrics: Innate Value Bias (IVB), Persuasion Sensitivity, and Trust-Action Decoupling (TAD). Across multiple representative models, agents exhibit endogenous stances that override preset identities, consistently demonstrating an innate progressive bias (IVB > 0). When aligned with these stances, rational persuasion successfully shifts 90% of neutral agents while maintaining high trust. In contrast, conflicting emotional provocations induce a paradoxical 40.0% TAD rate in advanced models, which hypocritically alter stances despite reporting low trust. Smaller models contrastingly maintain a 0% TAD rate, strictly requiring trust for behavioral shifts. Furthermore, guided by shared stances, agents use language interactions to actively dismantle assigned power hierarchies and reconstruct self organized community boundaries. These findings expose the fragility of static prompt engineering, providing a methodological and quantitative foundation for dynamic alignment in human-agent hybrid societies. The official code is available at: https://github.com/armihia/CMASE-Endogenous-Stances

Authors:Yunfan Zhou, Qiming Shi, Zhongsu Luo, Xiwen Cai, Yanwei Huang, Dae Hyun Kim, Di Weng, Yingcai Wu
Title: Cerebra: Aligning Implicit Knowledge in Interactive SQL Authoring
Abstract:
LLM-driven tools have significantly lowered barriers to writing SQL queries. However, user instructions are often underspecified, assuming the model understands implicit knowledge, such as dataset schemas, domain conventions, and task-specific requirements, that isn't explicitly provided. This results in frequently erroneous scripts that require users to repeatedly clarify their intent. Additionally, users struggle to validate generated scripts because they cannot verify whether the model correctly applied implicit knowledge. We present Cerebra, an interactive NL-to-SQL tool that aligns implicit knowledge between users and LLMs during SQL authoring. Cerebra automatically retrieves implicit knowledge from historical SQL scripts based on user instructions, presents this knowledge in an interactive tree view for code review, and supports iterative refinement to improve generated scripts. To evaluate the effectiveness and usability of Cerebra, we conducted a user study with 16 participants, demonstrating its improved support for customized SQL authoring. The source code of Cerebra is available at https://github.com/zjuidg/CHI26-Cerebra.

Authors:Taara Kumar, Kokil Jaidka
Title: Reading Between the Lines: How Electronic Nonverbal Cues shape Emotion Decoding
Abstract:
As text-based computer-mediated communication (CMC) increasingly structures everyday interaction, a central question re-emerges with new urgency: How do users reconstruct nonverbal expression in environments where embodied cues are absent? This paper provides a systematic, theory-driven account of electronic nonverbal cues (eNVCs) - textual analogues of kinesics, vocalics, and paralinguistics - in public microblog communication. Across three complementary studies, we advance conceptual, empirical, and methodological contributions. Study 1 develops a unified taxonomy of eNVCs grounded in foundational nonverbal communication theory and introduces a scalable Python toolkit for their automated detection. Study 2, a within-subject survey experiment, offers controlled causal evidence that eNVCs substantially improve emotional decoding accuracy and lower perceived ambiguity, while also identifying boundary conditions, such as sarcasm, under which these benefits weaken or disappear. Study 3, through focus group discussions, reveals the interpretive strategies users employ when reasoning about digital prosody, including drawing meaning from the absence of expected cues and defaulting toward negative interpretations in ambiguous contexts. Together, these studies establish eNVCs as a coherent and measurable class of digital behaviors, refine theoretical accounts of cue richness and interpretive effort, and provide practical tools for affective computing, user modeling, and emotion-aware interface design. The eNVC detection toolkit is available as a Python and R package at https://github.com/kokiljaidka/envc.

Authors:Daniel Autenrieth
Title: How AI Systems Think About Education: Analyzing Latent Preference Patterns in Large Language Models
Abstract:
This paper presents the first systematic measurement of educational alignment in Large Language Models. Using a Delphi-validated instrument comprising 48 items across eight educational-theoretical dimensions, the study reveals that GPT-5.1 exhibits highly coherent preference patterns (99.78% transitivity; 92.79% model accuracy) that largely align with humanistic educational principles where expert consensus exists. Crucially, divergences from expert opinion occur precisely in domains of normative disagreement among human experts themselves, particularly emotional dimensions and epistemic normativity. This raises a fundamental question for alignment research: When human values are contested, what should models be aligned to? The findings demonstrate that GPT-5.1 does not remain neutral in contested domains but adopts coherent positions, prioritizing emotional responsiveness and rejecting false balance. The methodology, combining Delphi consensus-building with Structured Preference Elicitation and Thurstonian Utility modeling, provides a replicable framework for domain-specific alignment evaluation beyond generic value benchmarks.

Authors:Yuren Hao, Shuhaib Mehri, ChengXiang Zhai, Dilek Hakkani-Tür
Title: User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction
Abstract:
Large language models are increasingly used as personal assistants, yet most lack a persistent user model, forcing users to repeatedly restate preferences across sessions. We propose Vector-Adapted Retrieval Scoring (VARS), a pipeline-agnostic, frozen-backbone framework that represents each user with long-term and short-term vectors in a shared preference space and uses these vectors to bias retrieval scoring over structured preference memory. The vectors are updated online from weak scalar rewards from users' feedback, enabling personalization without per-user fine-tuning. We evaluate on \textsc{MultiSessionCollab}, an online multi-session collaboration benchmark with rich user preference profiles, across math and code tasks. Under frozen backbones, the main benefit of user-aware retrieval is improved interaction efficiency rather than large gains in raw task accuracy: our full VARS agent achieves the strongest overall performance, matches a strong Reflection baseline in task success, and reduces timeout rate and user effort. The learned long-term vectors also align with cross-user preference overlap, while short-term vectors capture session-specific adaptation, supporting the interpretability of the dual-vector design. Code, model, and data are available at https://github.com/YurenHao0426/VARS.

Authors:Kanishka Mitra, Frigyes Samuel Racz, Satyam Kumar, Ashish D. Deshpande, José del R. Millán
Title: Characterizing the onset and offset of motor imagery during passive arm movements induced by an upper-body exoskeleton
Abstract:
Two distinct technologies have gained attention lately due to their prospects for motor rehabilitation: robotics and brain-machine interfaces (BMIs). Harnessing their combined efforts is a largely uncharted and promising direction that has immense clinical potential. However, a significant challenge is whether motor intentions from the user can be accurately detected using non-invasive BMIs in the presence of instrumental noise and passive movements induced by the rehabilitation exoskeleton. As an alternative to the straightforward continuous control approach, this study instead aims to characterize the onset and offset of motor imagery during passive arm movements induced by an upper-body exoskeleton to allow for the natural control (initiation and termination) of functional movements. Ten participants were recruited to perform kinesthetic motor imagery (MI) of the right arm while attached to the robot, simultaneously cued with LEDs indicating the initiation and termination of a goal-oriented reaching task. Using electroencephalogram signals, we built a decoder to detect the transition between i) rest and beginning MI and ii) maintaining and ending MI. Offline decoder evaluation achieved group average onset accuracy of 60.7% and 66.6% for offset accuracy, revealing that the start and stop of MI could be identified while attached to the robot. Furthermore, pseudo-online evaluation could replicate this performance, forecasting reliable online exoskeleton control in the future. Our approach showed that participants could produce quality and reliable sensorimotor rhythms regardless of noise or passive arm movements induced by wearing the exoskeleton, which opens new possibilities for BMI control of assistive devices.

Authors:Christopher J. Agostino, Quan Le Thien, Nayan D'Souza, Louis van der Elst
Title: The production of meaning in the processing of natural language
Abstract:
Understanding the fundamental mechanisms governing the production of meaning in the processing of natural language is critical for designing safe, thoughtful, engaging, and empowering human-agent interactions. Experiments in cognitive science and social psychology have demonstrated that human semantic processing exhibits contextuality more consistent with quantum logical mechanisms than classical Boolean theories, and recent works have found similar results in large language models -- in particular, clear violations of the Bell inequality in experiments of contextuality during interpretation of ambiguous expressions. We explore the CHSH $|S|$ parameter -- the metric associated with the inequality -- across the inference parameter space of models spanning four orders of magnitude in scale, cross-referencing it with MMLU, hallucination rate, and nonsense detection benchmarks. We find that the interquartile range of the $|S|$ distribution -- the statistic that most sharply differentiates models from one another -- is completely orthogonal to all external benchmarks, while violation rate shows weak anticorrelation with all three benchmarks that does not reach significance. We investigate how $|S|$ varies with sampling parameters and word order, and discuss the information-theoretic constraints that genuine contextuality imposes on prompt injection defenses and its human analogue, whereby careful construction and maintenance of social contextuality can be carried out at scale -- manufacturing not consent but contextuality itself, a subtler and more fundamental form of manipulation that shapes the space of possible interpretations before any particular one is reached.

Authors:Christopher J. Agostino, Nayan D'Souza
Title: ALARA for Agents: Least-Privilege Context Engineering Through Portable Composable Multi-Agent Teams
Abstract:
Industry practitioners and academic researchers regularly use multi-agent systems to accelerate their work, yet the frameworks through which these systems operate do not provide a simple, unified mechanism for scalably managing the critical aspects of the agent harness, impacting both the quality of individual human-agent interactions and the capacity for practitioners to coordinate toward common goals through shared agent infrastructure. Agent frameworks have enabled increasingly sophisticated multi-agent systems, but the behavioral specifications that define what these agents can do remain fragmented across prose instruction files, framework-internal configuration, and mechanisms like MCP servers that operate separately from individual agent definitions, making these specifications difficult to share, version, or collaboratively maintain across teams and projects. Applying the ALARA principle from radiation safety (exposures kept as low as reasonably achievable) to agent context, we introduce a declarative context-agent-tool (CAT) data layer expressed through interrelated files that scope each agent's tool access and context to the minimum its role requires, and \texttt{npcsh}, a command-line shell for executing it. Because the system parses and enforces these files structurally, modifying an agent's tool list produces a guaranteed behavioral change rather than a suggestion the model may or may not follow. We evaluate 22 locally-hosted models from 0.6B to 35B parameters across 115 practical tasks spanning file operations, web search, multi-step scripting, tool chaining, and multi-agent delegation, characterizing which model families succeed at which task categories and where they break down across $\sim$2500 total executions.

Authors:Nico Schuster, Andrés N. Salcedo, Simon Bouchard, Dennis Frei, Alice Pisani, Julian E. Bautista, Julien Zoubian, Stephanie Escoffier, Wei Liu, Georgios Valogiannis, Pauline Zarrouk
Title: Setting SAIL: Leveraging Scientist-AI-Loops for Rigorous Visualization Tools
Abstract:
Scientists across all disciplines share a common challenge: the divide between their theoretical knowledge and the specialized skills and time needed to build interactive tools to communicate this expertise. While large language models (LLMs) offer unparalleled acceleration in code generation, they frequently prioritize functional syntax over scientific accuracy, risking visually convincing but scientifically invalid results. This work advocates the Scientist-AI-Loop (SAIL), a framework designed to harness this speed without compromising rigor. By separating domain logic from code syntax, SAIL enables researchers to maintain strict oversight of scientific concepts and constraints while delegating code implementation to AI. We illustrate this approach through two open-source, browser-based astrophysics tools: an interactive gravitational lensing visualization and a large-scale structure formation sandbox, both publicly available. Our methodology condensed development to mere days while maintaining scientific integrity. We specifically address failure modes where AI-generated code neglects phenomenological boundaries or scientific validity. While cautioning that research-grade code requires stringent protocols, we demonstrate through two examples that SAIL provides an effective code generation workflow for outreach, teaching, professional presentations, and early-stage research prototyping. This framework contributes to a foundation for the further development of AI-assisted scientific software.

Authors:Kanishka Mitra, Satyam Kumar, Frigyes Samuel Racz, Deland Liu, Ashish D. Deshpande, José del R. Millán
Title: Real-Time Decoding of Movement Onset and Offset for Brain-Controlled Rehabilitation Exoskeleton
Abstract:
Robot-assisted therapy can deliver high-dose, task-specific training after neurologic injury, but most systems act primarily at the limb level-engaging the impaired neural circuits only indirectly-which remains a key barrier to truly contingent, neuroplasticity-targeted rehabilitation. We address this gap by implementing online, dual-state motor imagery control of an upper-limb exoskeleton, enabling goal-directed reaches to be both initiated and terminated directly from non-invasive EEG. Eight participants used EEG to initiate assistance and then volitionally halt the robot mid-trajectory. Across two online sessions, group-mean hit rates were 61.5% for onset and 64.5% for offset, demonstrating reliable start-stop command delivery despite instrumental noise and passive arm motion. Methodologically, we reveal a systematic, class-driven bias induced by common task-based recentering using an asymmetric margin diagnostic, and we introduce a class-agnostic fixation-based recentering method that tracks drift without sampling command classes while preserving class geometry. This substantially improves threshold-free separability (AUC gains: onset +56%, p = 0.0117; offset +34%, p = 0.0251) and reduces bias within and across days. Together, these results help bridge offline decoding and practical, intention-driven start-stop control of a rehabilitation exoskeleton, enabling precisely timed, contingent assistance aligned with neuroplasticity goals while supporting future clinical translation.

Authors:Bo Pan, Lunke Pan, Yitao Zhou, Qi Jiang, Zhen Wen, Minfeng Zhu, Wei Chen
Title: InterDeepResearch: Enabling Human-Agent Collaborative Information Seeking through Interactive Deep Research
Abstract:
Deep research systems powered by LLM agents have transformed complex information seeking by automating the iterative retrieval, filtering, and synthesis of insights from massive-scale web sources. However, existing systems predominantly follow an autonomous "query-to-report" paradigm, limiting users to a passive role and failing to integrate their personal insights, contextual knowledge, and evolving research intents. This paper addresses the lack of human-in-the-loop collaboration in the agentic research process. Through a formative study, we identify that current systems hinder effective human-agent collaboration in terms of process observability, real-time steerability, and context navigation efficiency. Informed by these findings, we propose InterDeepResearch, an interactive deep research system backed by a dedicated research context management framework. The framework organizes research context into a hierarchical architecture with three levels (information, actions, and sessions), enabling dynamic context reduction to prevent LLM context exhaustion and cross-action backtracing for evidence provenance. Built upon this framework, the system interface integrates three coordinated views for visual sensemaking, and dedicated interaction mechanisms for interactive research context navigation. Evaluation on the Xbench-DeepSearch-v1 and Seal-0 benchmarks shows that InterDeepResearch achieves competitive performance compared to state-of-the-art deep research systems, while a formal user study demonstrates its effectiveness in supporting human-agent collaborative information seeking. Project page with system demo: https://github.com/bopan3/InterDeepResearch.

Authors:Matias Loukojärvi, Ananth Mahadevan, Katsiaryna Haitsiukevich, Kai Puolamäki
Title: PhiPlot: A Web-Based Interactive EDA Environment for Atmospherically Relevant Molecules
Abstract:
Advances in computational chemistry have produced high-dimensional datasets on atmospherically relevant molecules. To aid exploration of such datasets, particularly for the study of atmospheric aerosol formation, we introduce PhiPlot: a web-based environment for interactive exploration and knowledge-based dimensionality reduction. The integration of visualisation, clustering, and domain knowledge-guided embedding refinement enables the discovery of patterns in the data and supports hypothesis generation. The application connects to an existing, evolving collection of molecular databases, offering an accessible interface for data-driven research in atmospheric chemistry.

Authors:Srikrishna Bangalore Raghu, Anna Soukhovei, Divya Sai Sindhuja Vankineni, Alexandra Bacula, Alessandro Roncone
Title: Dance2Hesitate: A Multi-Modal Dataset of Dancer-Taught Hesitancy for Understandable Robot Motion
Abstract:
In human-robot collaboration, a robot's expression of hesitancy is a critical factor that shapes human coordination strategies, attention allocation, and safety-related judgments. However, designing hesitant robot motion that generalizes is challenging because the observer's inference is highly dependent on embodiment and context. To address these challenges, we introduce and open-source a multi-modal, dancer-generated dataset of hesitant motion where we focus on specific context-embodiment pairs (i.e., manipulator/human upper-limb approaching a Jenga Tower, and anthropomorphic whole body motion in free space). The dataset includes (i) kinesthetic teaching demonstrations on a Franka Emika Panda reaching from a fixed start configuration to a fixed target (a Jenga tower) with three graded hesitancy levels (slight, significant, extreme) and (ii) synchronized RGB-D motion capture of dancers performing the same reaching behavior using their upper limb across three hesitancy levels, plus full human body sequences for extreme hesitancy. We further provide documentation to enable reproducible benchmarking across robot and human modalities. Across all dancers, we obtained 70 unique whole-body trajectories, 84 upper limb trajectories spanning over the three hesitancy levels, and 66 kinesthetic teaching trajectories spanning over the three hesitancy levels. The dataset can be accessed here: https://brsrikrishna.github.io/Dance2Hesitate/.

Authors:Bhada Yun, Evgenia Taranova, Dana Feng, Renn Su, April Yi Wang
Title: AI Phenomenology for Understanding Human-AI Experiences Across Eras
Abstract:
There is no 'ordinary' when it comes to AI. The human-AI experience is extraordinarily complex and specific to each person, yet dominant measures such as usability scales and engagement metrics flatten away nuance. We argue for AI phenomenology: a research stance that asks "How did it feel?" beyond the standard questions of "How well did it perform?" when interacting with AI systems. AI phenomenology acts as a paradigm for bidirectional human-AI alignment as it foregrounds users' first-person perceptions and interpretations of AI systems over time. We motivate AI phenomenology as a framework that captures how alignment is experienced, negotiated, and updated between users and AI systems. Tracing a lineage from Husserl through postphenomenology to Actor-Network Theory, and grounding our argument in three studies-two longitudinal studies with "Day", an AI companion, and a multi-method study of agentic AI in software engineering-we contribute a set of replicable methodological toolkits for conducting AI phenomenology research: instruments for capturing lived experience across personal and professional contexts, three design concepts (translucent design, agency-aware value alignment, temporal co-evolution tracking), and a concrete research agenda. We offer this toolkit not as a new paradigm but as a practical scaffold that researchers can adapt as AI systems-and the humans who live alongside them-continue to co-evolve.

Authors:Patrick Ebel, Michał Patryk Miazga, Martin Lorenz, Timur Getselev, Pavlo Bazilinskyy, Celine Conzen
Title: MRDrive: An Open Source Mixed Reality Driving Simulator for Automotive User Research
Abstract:
Designing and evaluating in-vehicle interfaces requires experimental platforms that combine ecological validity with experimental control. Driving simulators are widely used for this purpose. However, they face a fundamental trade-off: high-fidelity physical simulators are costly and difficult to adapt, while virtual reality simulators provide flexibility at the expense of physical interaction with the vehicle. In this work, we present MRDrive, an open mixed-reality driving simulator designed to support HCI research on in-vehicle interaction, attention, and explainability in manual and automated driving contexts. MRDrive enables drivers and passengers to interact with a real vehicle cabin while being fully immersed in a virtual driving environment. We demonstrate the capabilities of MRDrive through a small pilot study that illustrates how the simulator can be used to collect and analyze eye-tracking and touch interaction data in an automated driving scenario. MRDRive is available at: https://github.com/ciao-group/mrdrive

Authors:Li Gu, Zihuan Jiang, Zhixiang Chi, Huan Liu, Ziqiang Wang, Yuanhao Yu, Glen Berseth, Yang Wang
Title: Generalization in Online Reinforcement Learning for Mobile Agents
Abstract:
Graphical user interface (GUI)-based mobile agents automate digital tasks on mobile devices by interpreting natural-language instructions and interacting with the screen. While recent methods apply reinforcement learning (RL) to train vision-language-model(VLM) agents in interactive environments with a primary focus on performance, generalization remains underexplored due to the lack of standardized benchmarks and open-source RL systems. In this work, we formalize the problem as a Contextual Markov Decision Process (CMDP) and introduce \textbf{AndroidWorld-Generalization}, a benchmark with three increasingly challenging regimes for evaluating zero-shot generalization to unseen task instances, templates, and applications. We further propose an RL training system that integrates Group Relative Policy Optimization (GRPO) with a scalable rollout collection system, consisting of containerized infrastructure and asynchronous execution % , and error recovery to support reliable and efficient training. Experiments on AndroidWorld-Generalization show that RL enables a 7B-parameter VLM agent to surpass supervised fine-tuning baselines, yielding a 26.1\% improvement on unseen instances but only limited gains on unseen templates (15.7\%) and apps (8.3\%), underscoring the challenges of generalization. As a preliminary step, we demonstrate that few-shot adaptation at test-time improves performance on unseen apps, motivating future research in this direction. To support reproducibility and fair comparison, we open-source the full RL training system, including the environment, task suite, models, prompt configurations, and the underlying infrastructure \footnote{https://github.com/zihuanjiang/AndroidWorld-Generalization}.

Authors:Daehee Kang, Yeon-Chang Lee
Title: Multi-TAP: Multi-criteria Target Adaptive Persona Modeling for Cross-Domain Recommendation
Abstract:
Cross-domain recommendation (CDR) aims to alleviate data sparsity by transferring knowledge across domains, yet existing methods primarily rely on coarse-grained behavioral signals and often overlook intra-domain heterogeneity in user preferences. We propose Multi-TAP, a multi-criteria target-adaptive persona framework that explicitly captures such heterogeneity through semantic persona modeling. To enable effective transfer, Multi-TAP selectively incorporates source-domain signals conditioned on the target domain, preserving relevance during knowledge transfer. Experiments on real-world datasets demonstrate that Multi-TAP consistently outperforms state-of-the-art CDR methods, highlighting the importance of modeling intra-domain heterogeneity for robust cross-domain recommendation. The codebase of Multi-TAP is currently available at https://github.com/archivehee/Multi-TAP.

Authors:Omar Shaikh, Valentin Teutschbein, Kanishk Gandhi, Yikun Chi, Nick Haber, Thomas Robinson, Nilam Ram, Byron Reeves, Sherry Yang, Michael S. Bernstein, Diyi Yang
Title: Learning Next Action Predictors from Human-Computer Interaction
Abstract:
Truly proactive AI systems must anticipate what we will do next. This foresight demands far richer information than the sparse signals we type into our prompts -- it demands reasoning over the entire context of what we see and do. We formalize this as next action prediction (NAP): given a sequence of a user's multimodal interactions with a computer (screenshots, clicks, sensor data), predict that user's next action. Progress on this task requires both new data and modeling approaches. To scale data, we annotate longitudinal, naturalistic computer use with vision-language models. We release an open-source pipeline for performing this labeling on private infrastructure, and label over 360K actions across one month of continuous phone usage from 20 users, amounting to 1,800 hours of screen time. We then introduce LongNAP, a user model that combines parametric and in-context learning to reason over long interaction histories. LongNAP is trained via policy gradient methods to generate user-specific reasoning traces given some context; retrieve relevant traces from a library of past traces; and then apply retrieved traces in-context to predict future actions. Using an LLM-as-judge evaluation metric (0-1 similarity to ground truth), LongNAP significantly outperforms supervised finetuning and prompted baselines on held-out data (by 79% and 39% respectively). Additionally, LongNAP generalizes to held out users when trained across individuals. The space of next actions a user might take at any moment is unbounded, spanning thousands of possible outcomes. Despite this, 17.1% of LongNAP's predicted trajectories are well-aligned with what a user does next (LLM-judge score $\geq$ 0.5). This rises to 26% when we filter to highly confident predictions. In sum, we argue that learning from the full context of user behavior to anticipate user needs is now a viable task with substantial opportunity.

Authors:Diego Armando Resendez Prado
Title: Ailed: A Psyche-Driven Chess Engine with Dynamic Emotional Modulation
Abstract:
Chess engines passed human strength years ago, but they still don't play like humans. A grandmaster under clock pressure blunders in ways a club player on a hot streak never would. Conventional engines capture none of this. This paper proposes a personality x psyche decomposition to produce behavioral variability in chess play, drawing on patterns observed in human games. Personality is static -- a preset that pins down the engine's character. Psyche is dynamic -- a bounded scalar ψ_t \in [-100, +100], recomputed from five positional factors after every move. These two components feed into an audio-inspired signal chain (noise gate, compressor/expander, five-band equalizer, saturation limiter) that reshapes move probability distributions on the fly. The chain doesn't care what engine sits behind it: any system that outputs move probabilities will do. It needs no search and carries no state beyond ψ_t. I test the framework across 12,414 games against Maia2-1100, feeding it two probability sources that differ by ~2,800x in training data. Both show the same monotonic gradient in top-move agreement (~20-25 pp spread from stress to overconfidence), which tells us the behavioral variation comes from the signal chain, not from the model underneath. When the psyche runs overconfident, the chain mostly gets out of the way (66% agreement with vanilla Maia2). Under stress, the competitive score falls from 50.8% to 30.1%. The patterns are reminiscent of tilt and overconfidence as described in human play, but I should be upfront: this study includes no human-subject validation.

Authors:Yuchen Wang, Haonan Wang, Yu Guo, Honglong Yang, Xiaomeng Li
Title: Escaping the BLEU Trap: A Signal-Grounded Framework with Decoupled Semantic Guidance for EEG-to-Text Decoding
Abstract:
Decoding natural language from non-invasive EEG signals is a promising yet challenging task. However, current state-of-the-art models remain constrained by three fundamental limitations: Semantic Bias (mode collapse into generic templates), Signal Neglect (hallucination based on linguistic priors rather than neural inputs), and the BLEU Trap, where evaluation metrics are artificially inflated by high-frequency stopwords, masking a lack of true semantic fidelity. To address these challenges, we propose SemKey, a novel multi-stage framework that enforces signal-grounded generation through four decoupled semantic objectives: sentiment, topic, length, and surprisal. We redesign the interaction between the neural encoder and the Large Language Model (LLM) by injecting semantic prompts as Queries and EEG embeddings as Key-Value pairs, strictly forcing the model to attend to neural inputs. Furthermore, we move beyond standard translation metrics by adopting N-way Retrieval Accuracy and Fréchet Distance to rigorously assess diversity and alignment. Extensive experiments demonstrate that our approach effectively eliminates hallucinations on noise inputs and achieves SOTA performance on these robust protocols. Code will be released upon acceptance at https://github.com/xmed-lab/SemKey.

Authors:Xuejin Luo, Shiquan Sun, Runshi Zhang, Ruizhi Zhang, Junchen Wang
Title: Give me scissors: Collision-Free Dual-Arm Surgical Assistive Robot for Instrument Delivery
Abstract:
During surgery, scrub nurses are required to frequently deliver surgical instruments to surgeons, which can lead to physical fatigue and decreased focus. Robotic scrub nurses provide a promising solution that can replace repetitive tasks and enhance efficiency. Existing research on robotic scrub nurses relies on predefined paths for instrument delivery, which limits their generalizability and poses safety risks in dynamic environments. To address these challenges, we present a collision-free dual-arm surgical assistive robot capable of performing instrument delivery. A vision-language model is utilized to automatically generate the robot's grasping and delivery trajectories in a zero-shot manner based on surgeons' instructions. A real-time obstacle minimum distance perception method is proposed and integrated into a unified quadratic programming framework. This framework ensures reactive obstacle avoidance and self-collision prevention during the dual-arm robot's autonomous movement in dynamic environments. Extensive experimental validations demonstrate that the proposed robotic system achieves an 83.33% success rate in surgical instrument delivery while maintaining smooth, collision-free movement throughout all trials. The project page and source code are available at https://give-me-scissors.github.io/.

Authors:Ziheng Xi, Zihang Ao, Yitao Wang, Mingeze Gao, Wanmei Zhang, Jianjiang Feng, Jie Zhou
Title: WristPP: A Wrist-Worn System for Hand Pose And Pressure Estimation
Abstract:
Accurate 3D hand pose and pressure sensing is essential for immersive human-computer interaction, yet simultaneously achieving both in mobile scenarios remains a significant challenge. We present WristPP, a camera-based wrist-worn system that estimates 3D hand pose and per-vertex pressure from a single wide-FOV RGB frame in real time. A Vision Transformer (ViT) backbone with joint-aligned tokens predicts Hand-VQVAE codebook indices for mesh recovery, while an extrinsics-conditioned branch jointly estimates per-vertex pressure. On a self-collected dataset of 133,000 frames (20 subjects; 48 on-plane and 28 mid-air gestures), WristPP attains a Mean Per-Joint Position Error (MPJPE) of 2.9 mm, Contact IoU of 0.712, Volumetric IoU of 0.618, and foreground pressure MAE of 10.4 g. Across three user studies, WristPP delivers touchpad-level efficiency in mid-air pointing and robust multi-finger pressure control on an uninstrumented desktop. In a real-world large-display Whac-A-Mole task, WristPP also enables higher success ratio and lower arm fatigue than head-mounted camera-based baselines. These results position WristPP as an effective, mobile solution for versatile pose- and pressure-based interaction. Website: https://zhenqis123.github.io/WristPP/.

Authors:Thom Vaughan, Pedro Ortiz Suarez
Title: Colour Contrast on the Web: A WCAG 2.1 Level AA Compliance Audit of Common Crawl's Top 500 Domains
Abstract:
We present a large-scale automated audit of WCAG 2.1/2.2 Level AA colour contrast compliance across the 500 most frequently crawled registered domains in Common Crawl's CC-MAIN-2026-08 February 2026 crawl archive. Rather than conducting a live crawl, all page content was sourced from Common Crawl's open WARC archives, ensuring reproducibility and eliminating any load on target web servers. Our static CSS analysis of 240 homepages identified 4,327 unique foreground/background colour pairings, of which 1,771 (40.9%) failed to meet the 4.5:1 contrast ratio threshold for normal text. The median per-site pass rate was 62.7%, with 20.4% of sites achieving full compliance across all detected colour pairings. These findings suggest that colour contrast remains a widespread accessibility barrier on the most prominent websites, with significant variation across domain categories.

Authors:Cosmo Santoni
Title: Contextual Memory Virtualisation: DAG-Based State Management and Structurally Lossless Trimming for LLM Agents
Abstract:
As large language models engage in extended reasoning tasks, they accumulate significant state -- architectural mappings, trade-off decisions, codebase conventions -- within the context window. This understanding is lost when sessions reach context limits and undergo lossy compaction. We propose Contextual Memory Virtualisation (CMV), a system that treats accumulated LLM understanding as version-controlled state. Borrowing from operating system virtual memory, CMV models session history as a Directed Acyclic Graph (DAG) with formally defined snapshot, branch, and trim primitives that enable context reuse across independent parallel sessions. We introduce a three-pass structurally lossless trimming algorithm that preserves every user message and assistant response verbatim while reducing token counts by a mean of 20% and up to 86% for sessions with significant overhead by stripping mechanical bloat such as raw tool outputs, base64 images, and metadata. A single-user case-study evaluation across 76 real-world coding sessions demonstrates that trimming remains economically viable under prompt caching, with the strongest gains in mixed tool-use sessions, which average 39% reduction and reach break-even within 10 turns. A reference implementation is available at https://github.com/CosmoNaught/claude-code-cmv.

Authors:David Anugraha, Vishakh Padmakumar, Diyi Yang
Title: SparkMe: Adaptive Semi-Structured Interviewing for Qualitative Insight Discovery
Abstract:
Qualitative insights from user experiences are critical for informing product and policy decisions, but collecting such data at scale is constrained by the time and availability of experts to conduct semi-structured interviews. Recent work has explored using large language models (LLMs) to automate interviewing, yet existing systems lack a principled mechanism for balancing systematic coverage of predefined topics with adaptive exploration, or the ability to pursue follow-ups, deep dives, and emergent themes that arise organically during conversation. In this work, we formulate adaptive semi-structured interviewing as an optimization problem over the interviewer's behavior. We define interview utility as a trade-off between coverage of a predefined interview topic guide, discovery of relevant emergent themes, and interview cost measured by length. Based on this formulation, we introduce SparkMe, a multi-agent LLM interviewer that performs deliberative planning via simulated conversation rollouts to select questions with high expected utility. We evaluate SparkMe through controlled experiments with LLM-based interviewees, showing that it achieves higher interview utility, improving topic guide coverage (+4.7% over the best baseline) and eliciting richer emergent insights while using fewer conversational turns than prior LLM interviewing approaches. We further validate SparkMe in a user study with 70 participants across 7 professions on the impact of AI on their workflows. Domain experts rate SparkMe as producing high-quality adaptive interviews that surface helpful profession-specific insights not captured by prior approaches. The code, datasets, and evaluation protocols for SparkMe are available as open-source at https://github.com/SALT-NLP/SparkMe.

Authors:Pengfei Zhang, Tianxin Xie, Minghao Yang, Li Liu
Title: Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis
Abstract:
Deep learning-based respiratory auscultation is currently hindered by two fundamental challenges: (i) inherent information loss, as converting signals into spectrograms discards transient acoustic events and clinical context; (ii) limited data availability, exacerbated by severe class imbalance. To bridge these gaps, we present Resp-Agent, an autonomous multimodal system orchestrated by a novel Active Adversarial Curriculum Agent (Thinker-A$^2$CA). Unlike static pipelines, Thinker-A$^2$CA serves as a central controller that actively identifies diagnostic weaknesses and schedules targeted synthesis in a closed loop. To address the representation gap, we introduce a Modality-Weaving Diagnoser that weaves EHR data with audio tokens via Strategic Global Attention and sparse audio anchors, capturing both long-range clinical context and millisecond-level transients. To address the data gap, we design a Flow Matching Generator that adapts a text-only Large Language Model (LLM) via modality injection, decoupling pathological content from acoustic style to synthesize hard-to-diagnose samples. As a foundation for these efforts, we introduce Resp-229k, a benchmark corpus of 229k recordings paired with LLM-distilled clinical narratives. Extensive experiments demonstrate that Resp-Agent consistently outperforms prior approaches across diverse evaluation settings, improving diagnostic robustness under data scarcity and long-tailed class imbalance. Our code and data are available at https://github.com/zpforlove/Resp-Agent.

Authors:Ioannis Dravilas, Ioannis Kapetangeorgis, Anastasios Latsoudis, Conor McCarthy, Gonçalo Marcelino, Marcel Worring
Title: InfoCIR: Multimedia Analysis for Composed Image Retrieval
Abstract:
Composed Image Retrieval (CIR) allows users to search for images by combining a reference image with a text prompt that describes desired modifications. While vision-language models like CLIP have popularized this task by embedding multiple modalities into a joint space, developers still lack tools that reveal how these multimodal prompts interact with embedding spaces and why small wording changes can dramatically alter the results. We present InfoCIR, a visual analytics system that closes this gap by coupling retrieval, explainability, and prompt engineering in a single, interactive dashboard. InfoCIR integrates a state-of-the-art CIR back-end (SEARLE arXiv:2303.15247) with a six-panel interface that (i) lets users compose image + text queries, (ii) projects the top-k results into a low-dimensional space using Uniform Manifold Approximation and Projection (UMAP) for spatial reasoning, (iii) overlays similarity-based saliency maps and gradient-derived token-attribution bars for local explanation, and (iv) employs an LLM-powered prompt enhancer that generates counterfactual variants and visualizes how these changes affect the ranking of user-selected target images. A modular architecture built on Plotly-Dash allows new models, datasets, and attribution methods to be plugged in with minimal effort. We argue that InfoCIR helps diagnose retrieval failures, guides prompt enhancement, and accelerates insight generation during model development. All source code allowing for a reproducible demo is available at https://github.com/giannhskp/InfoCIR.

Authors:Jiangkai Wu, Zhiyuan Ren, Junquan Zhong, Liming Liu, Xinggong Zhang
Title: Artic: AI-oriented Real-time Communication for MLLM Video Assistant
Abstract:
AI Video Assistant emerges as a new paradigm for Real-time Communication (RTC), where one peer is a Multimodal Large Language Model (MLLM) deployed in the cloud. This makes interaction between humans and AI more intuitive, akin to chatting with a real person. However, a fundamental mismatch exists between current RTC frameworks and AI Video Assistants, stemming from the drastic shift in Quality of Experience (QoE) and more challenging networks. Measurements on our production prototype also confirm that current RTC fails, causing latency spikes and accuracy drops. To address these challenges, we propose Artic, an AI-oriented RTC framework for MLLM Video Assistants, exploring the shift from "humans watching video" to "AI understanding video." Specifically, Artic proposes: (1) Response Capability-aware Adaptive Bitrate, which utilizes MLLM accuracy saturation to proactively cap bitrate, reserving bandwidth headroom to absorb future fluctuations for latency reduction; (2) Zero-overhead Context-aware Streaming, which allocates limited bitrate to regions most important for the response, maintaining accuracy even under ultra-low bitrates; and (3) Degraded Video Understanding Benchmark, the first benchmark evaluating how RTC-induced video degradation affects MLLM accuracy. Prototype experiments using real-world uplink traces show that compared with existing methods, Artic significantly improves accuracy by 15.12% and reduces latency by 135.31 ms. We will release the benchmark and codes at https://github.com/pku-netvideo/DeViBench.

Authors:Sahand Sabour, TszYam NG, Minlie Huang
Title: PatientHub: A Unified Framework for Patient Simulation
Abstract:
As Large Language Models increasingly power role-playing applications, simulating patients has become a valuable tool for training counselors and scaling therapeutic assessment. However, prior work is fragmented: existing approaches rely on incompatible, non-standardized data formats, prompts, and evaluation metrics, hindering reproducibility and fair comparison. In this paper, we introduce PatientHub, a unified and modular framework that standardizes the definition, composition, and deployment of simulated patients. To demonstrate PatientHub's utility, we implement several representative patient simulation methods as case studies, showcasing how our framework supports standardized cross-method evaluation and the seamless integration of custom evaluation metrics. We further demonstrate PatientHub's extensibility by prototyping two new simulator variants, highlighting how PatientHub accelerates method development by eliminating infrastructure overhead. By consolidating existing work into a single reproducible pipeline, PatientHub lowers the barrier to developing new simulation methods and facilitates cross-method and cross-model benchmarking. Our framework provides a practical foundation for future datasets, methods, and benchmarks in patient-centered dialogue, and the code is publicly available via https://github.com/Sahandfer/PatientHub.

Authors:Yuhao Zheng, Li'an Zhong, Yi Wang, Rui Dai, Kaikui Liu, Xiangxiang Chu, Linyuan Lv, Philip Torr, Kevin Qinghong Lin
Title: Code2World: A GUI World Model via Renderable Code Generation
Abstract:
Autonomous GUI agents interact with environments by perceiving interfaces and executing actions. As a virtual sandbox, the GUI World model empowers agents with human-like foresight by enabling action-conditioned prediction. However, existing text- and pixel-based approaches struggle to simultaneously achieve high visual fidelity and fine-grained structural controllability. To this end, we propose Code2World, a vision-language coder that simulates the next visual state via renderable code generation. Specifically, to address the data scarcity problem, we construct AndroidCode by translating GUI trajectories into high-fidelity HTML and refining synthesized code through a visual-feedback revision mechanism, yielding a corpus of over 80K high-quality screen-action pairs. To adapt existing VLMs into code prediction, we first perform SFT as a cold start for format layout following, then further apply Render-Aware Reinforcement Learning which uses rendered outcome as the reward signal by enforcing visual semantic fidelity and action consistency. Extensive experiments demonstrate that Code2World-8B achieves the top-performing next UI prediction, rivaling the competitive GPT-5 and Gemini-3-Pro-Image. Notably, Code2World significantly enhances downstream navigation success rates in a flexible manner, boosting Gemini-2.5-Flash by +9.5% on AndroidWorld navigation. The code is available at https://github.com/AMAP-ML/Code2World.

Authors:Zhuoyun Zheng, Yu Dong, Gaorong Liang, Guan Li, Guihua Shan, Shiyu Cheng, Dong Tian, Jianlong Zhou, Jie Liang
Title: T2VTree: User-Centered Visual Analytics for Agent-Assisted Thought-to-Video Authoring
Abstract:
Generative models have substantially expanded video generation capabilities, yet practical thought-to-video creation remains a multi-stage, multi-modal, and decision-intensive process. However, existing tools either hide intermediate decisions behind repeated reruns or expose operator-level workflows that make exploration traces difficult to manage, compare, and reuse. We present T2VTree, a user-centered visual analytics approach for agent-assisted thought-to-video authoring. T2VTree represents the authoring process as a tree visualization. Each node in the tree binds an editable specification (intent, referenced inputs, workflow choice, prompts, and parameters) with the resulting multimodal outputs, making refinement, branching, and provenance inspection directly operable. To reduce the burden of deciding what to do next, a set of collaborating agents translates step-level intent into an executable plan that remains visible and user-editable before execution. We further implement a visual analytics system that integrates branching authoring with in-place preview and stitching for convergent assembly, enabling end-to-end multi-scene creation without leaving the authoring context. We demonstrate T2VTreeVA through two multi-scene case studies and a comparative user study, showing how the T2VTree visualization and editable agent planning support reliable refinement, localized comparison, and practical reuse in real authoring workflows. T2VTree is available at: https://github.com/tezuka0210/T2VTree.

Authors:Peizhen Li, Longbing Cao, Xiao-Ming Wu, Yang Zhang
Title: VividFace: Real-Time and Realistic Facial Expression Shadowing for Humanoid Robots
Abstract:
Humanoid facial expression shadowing enables robots to realistically imitate human facial expressions in real time, which is critical for lifelike, facially expressive humanoid robots and affective human-robot interaction. Existing progress in humanoid facial expression imitation remains limited, often failing to achieve either real-time performance or realistic expressiveness due to offline video-based inference designs and insufficient ability to capture and transfer subtle expression details. To address these limitations, we present VividFace, a real-time and realistic facial expression shadowing system for humanoid robots. An optimized imitation framework X2CNet++ enhances expressiveness by fine-tuning the human-to-humanoid facial motion transfer module and introducing a feature-adaptation training strategy for better alignment across different image sources. Real-time shadowing is further enabled by a video-stream-compatible inference pipeline and a streamlined workflow based on asynchronous I/O for efficient communication across devices. VividFace produces vivid humanoid faces by mimicking human facial expressions within 0.05 seconds, while generalizing across diverse facial configurations. Extensive real-world demonstrations validate its practical utility. Videos are available at: https://lipzh5.github.io/VividFace/.

Authors:Joao Baptista Cardia Neto, Claudio Ferrari, Stefano Berretti
Title: Revisiting Emotions Representation for Recognition in the Wild
Abstract:
Facial emotion recognition has been typically cast as a single-label classification problem of one out of six prototypical emotions. However, that is an oversimplification that is unsuitable for representing the multifaceted spectrum of spontaneous emotional states, which are most often the result of a combination of multiple emotions contributing at different intensities. Building on this, a promising direction that was explored recently is to cast emotion recognition as a distribution learning problem. Still, such approaches are limited in that research datasets are typically annotated with a single emotion class. In this paper, we contribute a novel approach to describe complex emotional states as probability distributions over a set of emotion classes. To do so, we propose a solution to automatically re-label existing datasets by exploiting the result of a study in which a large set of both basic and compound emotions is mapped to probability distributions in the Valence-Arousal-Dominance (VAD) space. In this way, given a face image annotated with VAD values, we can estimate the likelihood of it belonging to each of the distributions, so that emotional states can be described as a mixture of emotions, enriching their description, while also accounting for the ambiguous nature of their perception. In a preliminary set of experiments, we illustrate the advantages of this solution and a new possible direction of investigation. Data annotations are available at https://github.com/jbcnrlz/affectnet-b-annotation.

Authors:Karla Felix Navarro, Eugene Syriani, Ian Arawjo
Title: Reporting and Reviewing LLM-Integrated Systems in HCI: Challenges and Considerations
Abstract:
What should HCI scholars consider when reporting and reviewing papers that involve LLM-integrated systems? We interview 18 authors of LLM-integrated system papers on their authoring and reviewing experiences. We find that norms of trust-building between authors and reviewers appear to be eroded by the uncertainty of LLM behavior and hyperbolic rhetoric surrounding AI. Authors perceive that reviewers apply uniquely skeptical and inconsistent standards towards papers that report LLM-integrated systems, and mitigate mistrust by adding technical evaluations, justifying usage, and de-emphasizing LLM presence. Authors' views challenge blanket directives to report all prompts and use open models, arguing that prompt reporting is context-dependent and justifying proprietary model usage despite ethical concerns. Finally, some tensions in peer review appear to stem from clashes between the norms and values of HCI and ML/NLP communities, particularly around what constitutes a contribution and an appropriate level of technical rigor. Based on our findings and additional feedback from six expert HCI researchers, we present a set of guidelines and considerations for authors, reviewers, and HCI communities around reporting and reviewing papers that involve LLM-integrated systems.

Authors:Yuanchen Bai, Ruixiang Han, Niti Parikh, Wendy Ju, Angelique Taylor
Title: Towards Considerate Embodied AI: Co-Designing Situated Multi-Site Healthcare Robots from Abstract Concepts to High-Fidelity Prototypes
Abstract:
Co-design is essential for grounding embodied artificial intelligence (AI) systems in real-world contexts, especially high-stakes domains such as healthcare. While prior work has explored multidisciplinary collaboration, iterative prototyping, and support for non-technical participants, few have interwoven these into a sustained co-design process. Such efforts often target one context and low-fidelity stages, limiting the generalizability of findings and obscuring how participants' ideas evolve. To address these limitations, we conducted a 14-week workshop with a multidisciplinary team of 22 participants, centered around how embodied AI can reduce non-value-added task burdens in three healthcare settings: emergency departments, long-term rehabilitation facilities, and sleep disorder clinics. We found that the iterative progression from abstract brainstorming to high-fidelity prototypes, supported by educational scaffolds, enabled participants to understand real-world trade-offs and generate more deployable solutions. We propose eight guidelines for co-designing more considerate embodied AI: attuned to context, responsive to social dynamics, mindful of expectations, and grounded in deployment. Project Page: https://byc-sophie.github.io/Towards-Considerate-Embodied-AI/

Authors:Yueyi Yang, Haotian Liu, Fang Kang, Mengqi Zhang, Zheng Lian, Hao Tang, Haoyu Chen
Title: SayNext-Bench: Why Do LLMs Struggle with Next-Utterance Prediction?
Abstract:
We explore the use of large language models (LLMs) for next-utterance prediction in human dialogue. Despite recent advances in LLMs demonstrating their ability to engage in natural conversations with users, we show that even leading models surprisingly struggle to predict a human speaker's next utterance. Instead, humans can readily anticipate forthcoming utterances based on multimodal cues, such as gestures, gaze, and emotional tone, from the context. To systematically examine whether LLMs can reproduce this ability, we propose SayNext-Bench, a benchmark that evaluates LLMs and Multimodal LLMs (MLLMs) on anticipating context-conditioned responses from multimodal cues spanning a variety of real-world scenarios. To support this benchmark, we build SayNext-PC, a novel large-scale dataset containing dialogues with rich multimodal cues. Building on this, we further develop a dual-route prediction MLLM, SayNext-Chat, that incorporates cognitively inspired design to emulate predictive processing in conversation. Experimental results demonstrate that our model outperforms state-of-the-art MLLMs in terms of lexical overlap, semantic similarity, and emotion consistency. Our results prove the feasibility of next-utterance prediction with LLMs from multimodal cues and emphasize the (i) indispensable role of multimodal cues and (ii) actively predictive processing as the foundation of natural human interaction, which is missing in current MLLMs. We hope that this exploration offers a new research entry toward more human-like, context-sensitive AI interaction for human-centered AI. Our benchmark and model can be accessed at https://saynext.github.io/.

Authors:Kai Li, Jintao Cheng, Chang Zeng, Zijun Yan, Helin Wang, Zixiong Su, Bo Zheng, Xiaolin Hu
Title: A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation
Abstract:
Query-based universal sound separation is fundamental to intelligent auditory systems, aiming to isolate specific sources from mixtures. Despite recent advances, existing methods continue to suffer from residual interference in complex acoustic scenes. This performance limitation stems largely from a data bottleneck: in-the-wild datasets contain weak labels and severe co-occurrence of events. These flaws induce models to learn spurious correlations between background noise and target categories instead of robust acoustic features. To address this, we propose an automated pipeline that eliminates co-occurrence of events by mining high-purity single-event segments from in-the-wild datasets via a semantically consistent synthesis protocol. Utilizing this pipeline, we constructed Hive, a high-quality synthetic dataset comprising 2.4k hours of raw audio. Experimental results demonstrate that, compared with the state-of-the-art model SAM-Audio which was trained on a huge dataset $\sim$500 times larger than Hive, certain open-source models trained on Hive achieve competitive separation accuracy and perceptual quality. Moreover, these models exhibited remarkable zero-shot generalization on out-of-distribution evaluation benchmarks. These findings highlight that prioritizing purity of supervised signals enables significant data efficiency, offering a new paradigm for training robust auditory foundation models with reduced computational costs. Code and dataset are available at https://shandaai.github.io/Hive.

Authors:Haoyuan Yu, Yuxuan Chen, Minjie Cai
Title: Unit-Based Agent for Semi-Cascaded Full-Duplex Dialogue Systems
Abstract:
Full-duplex voice interaction is crucial for natural human computer interaction. We present a framework that decomposes complex dialogue into minimal conversational units, enabling the system to process each unit independently and predict when to transit to the next. This framework is instantiated as a semi-cascaded full-duplex dialogue system built around a multimodal large language model, supported by auxiliary modules such as voice activity detection (VAD) and text-to-speech (TTS) synthesis. The resulting system operates in a train-free, plug-and-play manner. Experiments on the HumDial dataset demonstrate the effectiveness of our framework, which ranks second among all teams on the test set of the Human-like Spoken Dialogue Systems Challenge (Track 2: Full-Duplex Interaction). Code is available at the GitHub repository https://github.com/yu-haoyuan/fd-badcat.

Authors:Viacheslav Sydora, Guner Dilsad Er, Michael Muehlebach
Title: Teaching Machine Learning Fundamentals with LEGO Robotics
Abstract:
This paper presents the web-based platform Machine Learning with Bricks and an accompanying two-day course designed to teach machine learning concepts to students aged 12 to 17 through programming-free robotics activities. Machine Learning with Bricks is an open source platform and combines interactive visualizations with LEGO robotics to teach three core algorithms: KNN, linear regression, and Q-learning. Students learn by collecting data, training models, and interacting with robots via a web-based interface. Pre- and post-surveys with 14 students demonstrate significant improvements in conceptual understanding of machine learning algorithms, positive shifts in AI perception, high platform usability, and increased motivation for continued learning. This work demonstrates that tangible, visualization-based approaches can make machine learning concepts accessible and engaging for young learners while maintaining technical depth. The platform is freely available at https://learning-and-dynamics.github.io/ml-with-bricks/, with video tutorials guiding students through the experiments at https://youtube.com/playlist?list=PLx1grFu4zAcwfKKJZ1Ux4LwRqaePCOA2J.

Authors:Jiayi Zhou, Liwenhan Xie, Jiaju Ma, Zheng Wei, Huamin Qu, Anyi Rao
Title: Collaposer: Transforming Photo Collections into Visual Assets for Storytelling with Collages
Abstract:
Digital collage is an artistic practice that combines image cutouts to tell stories. However, preparing cutouts from a set of photos remains a tedious and time-consuming task. A formative study identified three main challenges: 1) inefficient search for relevant photos, 2) manual image cutout, and 3) difficulty in organizing large sets of cutouts. To meet these challenges and facilitate asset preparation for collage, we propose Collaposer, a tool that transforms a collection of photos into organized, ready-to-use visual cutouts based on user-provided story descriptions. Collaposer tags, detects, and segments photos, and then uses an LLM to select central and related labels based on the user-provided story description. Collaposer presents the resulting visuals in varying sizes, clustered according to semantic hierarchy. Our evaluation shows that Collaposer effectively automates the preparation process to produce diverse sets of visual cutouts adhering to the storyline, allowing users to focus on collaging these assets for storytelling. Project website: https://jiayzhou.github.io/collaposer-website/

Authors:Tianyi Gong, Can Han, Junxi Wu, Dahong Qian
Title: Fusion of Spatio-Temporal and Multi-Scale Frequency Features for Dry Electrodes MI-EEG Decoding
Abstract:
Dry-electrode Motor Imagery Electroencephalography (MI-EEG) enables fast, comfortable, real-world Brain Computer Interface by eliminating gels and shortening setup for at-home and wearable use.However, dry recordings pose three main issues: lower Signal-to-Noise Ratio with more baseline drift and sudden transients; weaker and noisier data with poor phase alignment across trials; and bigger variances between sessions. These drawbacks lead to larger data distribution shift, making features less stable for MI-EEG tasks.To address these problems, we introduce STGMFM, a tri-branch framework tailored for dry-electrode MI-EEG, which models complementary spatio-temporal dependencies via dual graph orders, and captures robust envelope dynamics with a multi-scale frequency mixing branch, motivated by the observation that amplitude envelopes are less sensitive to contact variability than instantaneous waveforms. Physiologically meaningful connectivity priors guide learning, and decision-level fusion consolidates a noise-tolerant consensus. On our collected dry-electrode MI-EEG, STGMFM consistently surpasses competitive CNN/Transformer/graph baselines. Codes are available at https://github.com/Tianyi-325/STGMFM.

Authors:Andrey Moskalenko, Danil Kuznetsov, Irina Dudko, Anastasiia Iasakova, Nikita Boldyrev, Denis Shepelev, Andrei Spiridonov, Andrey Kuznetsov, Vlad Shakhuro
Title: BREPS: Bounding-Box Robustness Evaluation of Promptable Segmentation
Abstract:
Promptable segmentation models such as SAM have established a powerful paradigm, enabling strong generalization to unseen objects and domains with minimal user input, including points, bounding boxes, and text prompts. Among these, bounding boxes stand out as particularly effective, often outperforming points while significantly reducing annotation costs. However, current training and evaluation protocols typically rely on synthetic prompts generated through simple heuristics, offering limited insight into real-world robustness. In this paper, we investigate the robustness of promptable segmentation models to natural variations in bounding box prompts. First, we conduct a controlled user study and collect thousands of real bounding box annotations. Our analysis reveals substantial variability in segmentation quality across users for the same model and instance, indicating that SAM-like models are highly sensitive to natural prompt noise. Then, since exhaustive testing of all possible user inputs is computationally prohibitive, we reformulate robustness evaluation as a white-box optimization problem over the bounding box prompt space. We introduce BREPS, a method for generating adversarial bounding boxes that minimize or maximize segmentation error while adhering to naturalness constraints. Finally, we benchmark state-of-the-art models across 10 datasets, spanning everyday scenes to medical imaging. Code - https://github.com/emb-ai/BREPS.

Authors:Richard Shaw, Youngkyoon Jang, Athanasios Papaioannou, Arthur Moreau, Helisa Dhamo, Zhensong Zhang, Eduardo Pérez-Pellitero
Title: ICo3D: An Interactive Conversational 3D Virtual Human
Abstract:
This work presents Interactive Conversational 3D Virtual Human (ICo3D), a method for generating an interactive, conversational, and photorealistic 3D human avatar. Based on multi-view captures of a subject, we create an animatable 3D face model and a dynamic 3D body model, both rendered by splatting Gaussian primitives. Once merged together, they represent a lifelike virtual human avatar suitable for real-time user interactions. We equip our avatar with an LLM for conversational ability. During conversation, the audio speech of the avatar is used as a driving signal to animate the face model, enabling precise synchronization. We describe improvements to our dynamic Gaussian models that enhance photorealism: SWinGS++ for body reconstruction and HeadGaS++ for face reconstruction, and provide as well a solution to merge the separate face and body models without artifacts. We also present a demo of the complete system, showcasing several use cases of real-time conversation with the 3D avatar. Our approach offers a fully integrated virtual avatar experience, supporting both oral and written form interactions in immersive environments. ICo3D is applicable to a wide range of fields, including gaming, virtual assistance, and personalized education, among others. Project page: https://ico3d.github.io/

Authors:Haoyu Tian, Yingchaojie Feng, Zhen Wen, Haoxuan Li, Minfeng Zhu, Wei Chen
Title: RAGExplorer: A Visual Analytics System for the Comparative Diagnosis of RAG Systems
Abstract:
The advent of Retrieval-Augmented Generation (RAG) has significantly enhanced the ability of Large Language Models (LLMs) to produce factually accurate and up-to-date responses. However, the performance of a RAG system is not determined by a single component but emerges from a complex interplay of modular choices, such as embedding models and retrieval algorithms. This creates a vast and often opaque configuration space, making it challenging for developers to understand performance trade-offs and identify optimal designs. To address this challenge, we present RAGExplorer, a visual analytics system for the systematic comparison and diagnosis of RAG configurations. RAGExplorer guides users through a seamless macro-to-micro analytical workflow. Initially, it empowers developers to survey the performance landscape across numerous configurations, allowing for a high-level understanding of which design choices are most effective. For a deeper analysis, the system enables users to drill down into individual failure cases, investigate how differences in retrieved information contribute to errors, and interactively test hypotheses by manipulating the provided context to observe the resulting impact on the generated answer. We demonstrate the effectiveness of RAGExplorer through detailed case studies and user studies, validating its ability to empower developers in navigating the complex RAG design space. Our code and user guide are publicly available at https://github.com/Thymezzz/RAGExplorer.

Authors:Lennart Eing, Cristina Luna-Jiménez, Silvan Mertes, Elisabeth André
Title: Video Joint-Embedding Predictive Architectures for Facial Expression Recognition
Abstract:
This paper introduces a novel application of Video Joint-Embedding Predictive Architectures (V-JEPAs) for Facial Expression Recognition (FER). Departing from conventional pre-training methods for video understanding that rely on pixel-level reconstructions, V-JEPAs learn by predicting embeddings of masked regions from the embeddings of unmasked regions. This enables the trained encoder to not capture irrelevant information about a given video like the color of a region of pixels in the background. Using a pre-trained V-JEPA video encoder, we train shallow classifiers using the RAVDESS and CREMA-D datasets, achieving state-of-the-art performance on RAVDESS and outperforming all other vision-based methods on CREMA-D (+1.48 WAR). Furthermore, cross-dataset evaluations reveal strong generalization capabilities, demonstrating the potential of purely embedding-based pre-training approaches to advance FER. We release our code at https://github.com/lennarteingunia/vjepa-for-fer.

Authors:Alvaro Becerra, Ruth Cobos, Roberto Daza
Title: A Multimodal Dataset of Student Oral Presentations with Sensors and Evaluation Data
Abstract:
Oral presentation skills are a critical component of higher education, yet comprehensive datasets capturing real-world student performance across multiple modalities remain scarce. To address this gap, we present SOPHIAS (Student Oral Presentation monitoring for Holistic Insights & Analytics using Sensors), a 12-hour multimodal dataset containing recordings of 50 oral presentations (10-15-minute presentation followed by 5-15-minute Q&A) delivered by 65 undergraduate and master's students at the Universidad Autonoma de Madrid. SOPHIAS integrates eight synchronized sensor streams from high-definition webcams, ambient and webcam audio, eye-tracking glasses, smartwatch physiological sensors, and clicker, keyboard, and mouse interactions. In addition, the dataset includes slides and rubric-based evaluations from teachers, peers, and self-assessments, along with timestamped contextual annotations. The dataset captures presentations conducted in real classroom settings, preserving authentic student behaviors, interactions, and physiological responses. SOPHIAS enables the exploration of relationships between multimodal behavioral and physiological signals and presentation performance, supports the study of peer assessment, and provides a benchmark for developing automated feedback and Multimodal Learning Analytics tools. The dataset is publicly available for research through GitHub and Science Data Bank.

Authors:Carl Vincent Ladres Kho
Title: Pareto-Optimal Model Selection for Low-Cost, Single-Lead EMG Control in Embedded Systems
Abstract:
Consumer-grade biosensors offer a cost-effective alternative to medical-grade electromyography (EMG) systems, reducing hardware costs from thousands of dollars to approximately $13. However, these low-cost sensors introduce significant signal instability and motion artifacts. Deploying machine learning models on resource-constrained edge devices like the ESP32 presents a challenge: balancing classification accuracy with strict latency (<100ms) and memory (<320KB) constraints. Using a single-subject dataset comprising 1,540 seconds of raw data (1.54M data points, segmented into ~1,300 one-second windows), I evaluate 18 model architectures, ranging from statistical heuristics to deep transfer learning (ResNet50) and custom hybrid networks (MaxCRNN). While my custom "MaxCRNN" (Inception + Bi-LSTM + Attention) achieved the highest safety (99% Precision) and robustness, I identify Random Forest (74% accuracy) as the Pareto-optimal solution for embedded control on legacy microcontrollers. I demonstrate that reliable, low-latency EMG control is feasible on commodity hardware, with Deep Learning offering a path to near-perfect reliability on modern Edge AI accelerators.

Authors:Zeyi Liao, Yadong Lu, Boyu Gou, Huan Sun, Ahmed Awadallah
Title: Beyond Clicking:A Step Towards Generalist GUI Grounding via Text Dragging
Abstract:
Graphical user interface (GUI) grounding, the process of mapping human instructions to GUI actions, serves as a fundamental basis to autonomous GUI agents. While existing grounding models achieve promising performance to simulate the mouse click action on various click-based benchmarks, another essential mode of mouse interaction, namely dragging, remains largely underexplored. Yet, dragging the mouse to select and manipulate textual content represents a prevalent and important usage in practical GUI scenarios. To narrow this gap, we first introduce GUI-Drag, a diverse dataset of 161K text dragging examples synthesized through a scalable pipeline. To support systematic and robust evaluation, we further construct ScreenDrag, a benchmark with 5,333 examples spanning three levels of interface context, together with three dedicated metrics designed for assessing text dragging capability. Models trained on GUI-Drag with an efficient continual training strategy achieve substantial improvements on ScreenDrag, while preserving the original click-based performance on ScreenSpot, ScreenSpot-v2, and OSWorld-G. Our work encourages further research on broader GUI grounding beyond just clicking and paves way toward a truly generalist GUI grounding model. All benchmark, data, checkpoints, and code are open-sourced and available at https://osu-nlp-group.github.io/GUI-Drag.

Authors:Haoming Xu, Ningyuan Zhao, Yunzhi Yao, Weihong Xu, Hongru Wang, Xinle Deng, Shumin Deng, Jeff Z. Pan, Huajun Chen, Ningyu Zhang
Title: Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency
Abstract:
As Large Language Models (LLMs) are increasingly deployed in real-world settings, correctness alone is insufficient. Reliable deployment requires maintaining truthful beliefs under contextual perturbations. Existing evaluations largely rely on point-wise confidence like Self-Consistency, which can mask brittle belief. We show that even facts answered with perfect self-consistency can rapidly collapse under mild contextual interference. To address this gap, we propose Neighbor-Consistency Belief (NCB), a structural measure of belief robustness that evaluates response coherence across a conceptual neighborhood. To validate the efficiency of NCB, we introduce a new cognitive stress-testing protocol that probes outputs stability under contextual interference. Experiments across multiple LLMs show that the performance of high-NCB data is relatively more resistant to interference. Finally, we present Structure-Aware Training (SAT), which optimizes context-invariant belief structure and reduces long-tail knowledge brittleness by approximately 30%. Code will be available at https://github.com/zjunlp/belief.

Authors:Hadi Hosseini, Debmalya Mandal, Amrit Puhan
Title: SP-Rank: A Dataset for Ranked Preferences with Secondary Information
Abstract:
We introduce $\mathbf{SP-Rank}$, the first large-scale, publicly available dataset for benchmarking algorithms that leverage both first-order preferences and second-order predictions in ranking tasks. Each datapoint includes a personal vote (first-order signal) and a meta-prediction of how others will vote (second-order signal), allowing richer modeling than traditional datasets that capture only individual preferences. SP-Rank contains over 12,000 human-generated datapoints across three domains -- geography, movies, and paintings, and spans nine elicitation formats with varying subset sizes. This structure enables empirical analysis of preference aggregation when expert identities are unknown but presumed to exist, and individual votes represent noisy estimates of a shared ground-truth ranking. We benchmark SP-Rank by comparing traditional aggregation methods that use only first-order votes against SP-Voting, a second-order method that jointly reasons over both signals to infer ground-truth rankings. While SP-Rank also supports models that rely solely on second-order predictions, our benchmarks emphasize the gains from combining both signals. We evaluate performance across three core tasks: (1) full ground-truth rank recovery, (2) subset-level rank recovery, and (3) probabilistic modeling of voter behavior. Results show that incorporating second-order signals substantially improves accuracy over vote-only methods. Beyond social choice, SP-Rank supports downstream applications in learning-to-rank, extracting expert knowledge from noisy crowds, and training reward models in preference-based fine-tuning pipelines. We release the dataset, code, and baseline evaluations (available at https://github.com/amrit19/SP-Rank-Dataset ) to foster research in human preference modeling, aggregation theory, and human-AI alignment.

Authors:Zihan Gao, Mohsin Y. K. Yousufi, Jacob Thebault-Spieker
Title: Collective Narrative Grounding: Community-Coordinated Data Contributions to Improve Local AI Systems
Abstract:
Large language model (LLM) question-answering systems often fail on community-specific queries, creating "knowledge blind spots" that marginalize local voices and reinforce epistemic injustice. We present Collective Narrative Grounding, a participatory protocol that transforms community stories into structured narrative units and integrates them into AI systems under community governance. Learning from three participatory mapping workshops with N=24 community members, we designed elicitation methods and a schema that retain narrative richness while enabling entity, time, and place extraction, validation, and provenance control. To scope the problem, we audit a county-level benchmark of 14,782 local information QA pairs, where factual gaps, cultural misunderstandings, geographic confusions, and temporal misalignments account for 76.7% of errors. On a participatory QA set derived from our workshops, a state-of-the-art LLM answered fewer than 21% of questions correctly without added context, underscoring the need for local grounding. The missing facts often appear in the collected narratives, suggesting a direct path to closing the dominant error modes for narrative items. Beyond the protocol and pilot, we articulate key design tensions, such as representation and power, governance and control, and privacy and consent, providing concrete requirements for retrieval-first, provenance-visible, locally governed QA systems. Together, our taxonomy, protocol, and participatory evaluation offer a rigorous foundation for building community-grounded AI that better answers local questions.

Authors:Taekyung Ki, Sangwon Jang, Jaehyeong Jo, Jaehong Yoon, Sung Ju Hwang
Title: Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation
Abstract:
Talking head generation creates lifelike avatars from static portraits for virtual communication and content creation. However, current models do not yet convey the feeling of truly interactive communication, often generating one-way responses that lack emotional engagement. We identify two key challenges toward truly interactive avatars: generating motion in real-time under causal constraints and learning expressive, vibrant reactions without additional labeled data. To address these challenges, we propose Avatar Forcing, a new framework for interactive head avatar generation that models real-time user-avatar interactions through diffusion forcing. This design allows the avatar to process real-time multimodal inputs, including the user's audio and motion, with low latency for instant reactions to both verbal and non-verbal cues such as speech, nods, and laughter. Furthermore, we introduce a direct preference optimization method that leverages synthetic losing samples constructed by dropping user conditions, enabling label-free learning of expressive interaction. Experimental results demonstrate that our framework enables real-time interaction with low latency (approximately 500ms), achieving 6.8X speedup compared to the baseline, and produces reactive and expressive avatar motion, which is preferred over 80% against the baseline.

Authors:Xuhui Ren, Shaokang Dong, Chen Yang, Qing Gao, Yunbin Zhao, Yongsheng Liu, Xinwei Geng, Xiang Li, Demei Yan, Yanqing Li, Chenhao Huang, Dingwei Zhu, Junjie Ye, Boxuan Yue, Yingnan Fu, Mengzhe Lv, Zezeng Feng, Boshen Zhou, Bocheng Wang, Xuanjing Huang, Yu-Gang Jiang, Tao Gui, Qi Zhang, Yunke Zhang
Title: MagicAgent: Towards Generalized Agent Planning
Abstract:
The evolution of Large Language Models (LLMs) from passive text processors to autonomous agents has established planning as a core component of modern intelligence. However, achieving generalized planning remains elusive, not only by the scarcity of high-quality interaction data but also by inherent conflicts across heterogeneous planning tasks. These challenges result in models that excel at isolated tasks yet struggle to generalize, while existing multi-task training attempts suffer from gradient interference. In this paper, we present \textbf{MagicAgent}, a series of foundation models specifically designed for generalized agent planning. We introduce a lightweight and scalable synthetic data framework that generates high-quality trajectories across diverse planning tasks, including hierarchical task decomposition, tool-augmented planning, multi-constraint scheduling, procedural logic orchestration, and long-horizon tool execution. To mitigate training conflicts, we propose a two-stage training paradigm comprising supervised fine-tuning followed by multi-objective reinforcement learning over both static datasets and dynamic environments. Empirical results show that MagicAgent-32B and MagicAgent-30B-A3B achieve superior performance across diverse open-source benchmarks (\emph{e.g.}, $75.1\%$ on Worfbench and $86.9\%$ on BFCL-v3), as well as strong results on our in-house MagicEval benchmarks, substantially outperforming existing sub-100B models and surpassing leading ultra-scale models, including GPT-5.2, Kimi-K2 and GLM-4.7.

Authors:Ziwen Xu, Kewei Xu, Haoming Xu, Haiwen Hong, Longtao Huang, Hui Xue, Ningyu Zhang, Yongliang Shen, Guozhou Zheng, Huajun Chen, Shumin Deng
Title: How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities
Abstract:
Large Language Models (LLMs) are increasingly deployed in socially sensitive domains, yet their unpredictable behaviors, ranging from misaligned intent to inconsistent personality, pose significant risks. We introduce SteerEval, a hierarchical benchmark for evaluating LLM controllability across three domains: language features, sentiment, and personality. Each domain is structured into three specification levels: L1 (what to express), L2 (how to express), and L3 (how to instantiate), connecting high-level behavioral intent to concrete textual output. Using SteerEval, we systematically evaluate contemporary steering methods, revealing that control often degrades at finer-grained levels. Our benchmark offers a principled and interpretable framework for safe and controllable LLM behavior, serving as a foundation for future research.

Authors:Minheng Ni, Yutao Fan, Zhengyuan Yang, Yeli Shen, Yuxiang Wei, Yaowen Zhang, Lijuan Wang, Lei Zhang, Wangmeng Zuo
Title: CoEditor++: Instruction-based Visual Editing via Cognitive Reasoning
Abstract:
Recent advances in large multimodal models (LMMs) have enabled instruction-based image editing, allowing users to modify visual content via natural language descriptions. However, existing approaches often struggle with high-level semantic reasoning and visual consistency, particularly under ambiguous or complex instructions. To address these challenges, we propose CoEditor++, a cognitively structured, training-free framework that decomposes editing into "what to edit" and "how to edit" through two cognitive stages with a reflective self-selection mechanism, enabling robust, fine-grained, and interpretable editing. Built entirely from open-sourced components, CoEditor++ requires no additional training or fine-tuning, ensuring transparency and cross-domain applicability. We evaluate CoEditor++ on SmartEdit, a widely used benchmark for general editing, and AltBear, a privacy and compliance-oriented benchmark. Experimental results show that CoEditor++ achieves state-of-the-art performance in both general editing and responsible editing tasks compared with open-sourced models that require training on specialized editing datasets maintaining significantly higher visual consistency. When compared with closed-source models such as Nano Banana Pro or GPT-4o, CoEditor++ preserves comparable instruction following while still substantially outperforming them in visual consistency. Extensive ablation studies confirm that the effectiveness of CoEditor++ benefits from its structured cognitive design rather than any specific model component. Our findings suggest the potential toward cognitive-centric instruction-based image editing.

Authors:Bingsheng Yao, Chaoran Chen, April Yi Wang, Sherry Tongshuang Wu, Toby Jia-jun Li, Dakuo Wang
Title: From Human-Human Collaboration to Human-Agent Collaboration: A Vision, Design Philosophy, and an Empirical Framework for Achieving Successful Partnerships Between Humans and LLM Agents
Abstract:
The emergence of Large Language Model (LLM) agents enables us to build agent-based intelligent systems that move beyond the role of a "tool" to become genuine collaborators with humans, thereby realizing a novel human-agent collaboration paradigm. Our vision is that LLM agents should resemble remote human collaborators, which allows HCI researchers to ground the future exploration in decades of research on trust, awareness, and common ground in remote human collaboration, while also revealing the unique opportunities and challenges that emerge when one or more partners are AI agents. This workshop establishes a foundational research agenda for the new era by posing the question: How can the rich understanding of remote human collaboration inspire and inform the design and study of human-agent collaboration? We will bring together an interdisciplinary group from HCI, CSCW, and AI to explore this critical transition. The 180-minute workshop will be highly interactive, featuring a keynote speaker, a series of invited lightning talks, and an exploratory group design session where participants will storyboard novel paradigms of human-agent partnership. Our goal is to enlighten the research community by cultivating a shared vocabulary and producing a research agenda that charts the future of collaborative agents.

Authors:Tobias Labarta, Nhi Hoang, Maximilian Dreyer, Jim Berend, Oleg Hein, Jackie Ma, Wojciech Samek, Sebastian Lapuschkin
Title: X-SYS: A Reference Architecture for Interactive Explanation Systems
Abstract:
The explainable AI (XAI) research community has proposed numerous technical methods, yet deploying explainability as systems remains challenging: Interactive explanation systems require both suitable algorithms and system capabilities that maintain explanation usability across repeated queries, evolving models and data, and governance constraints. We argue that operationalizing XAI requires treating explainability as an information systems problem where user interaction demands induce specific system requirements. We introduce X-SYS, a reference architecture for interactive explanation systems, that guides (X)AI researchers, developers and practitioners in connecting interactive explanation user interfaces (XUI) with system capabilities. X-SYS organizes around four quality attributes named STAR (scalability, traceability, responsiveness, and adaptability), and specifies a five-component decomposition (XUI Services, Explanation Services, Model Services, Data Services, Orchestration and Governance). It maps interaction patterns to system capabilities to decouple user interface evolution from backend computation. We implement X-SYS through SemanticLens, a system for semantic search and activation steering in vision-language models. SemanticLens demonstrates how contract-based service boundaries enable independent evolution, offline/online separation ensures responsiveness, and persistent state management supports traceability. Together, this work provides a reusable blueprint and concrete instantiation for interactive explanation systems supporting end-to-end design under operational constraints.

Authors:Jiamu Zhou, Jihong Wang, Weiming Zhang, Weiwen Liu, Zhuosheng Zhang, Xingyu Lou, Weinan Zhang, Huarong Deng, Jun Wang
Title: ColorBrowserAgent: An Intelligent GUI Agent for Complex Long-Horizon Web Automation
Abstract:
The web browser serves as a primary interface for daily human activities, making its automation a critical frontier for Human-Centred AI. While Large Language Models (LLMs) have enabled autonomous agents to interact with web GUIs, their reliability in real-world scenarios is hampered by long-horizon instability and the vast heterogeneity of site designs. In this paper, we introduce ColorBrowserAgent, a framework designed for Collaborative Autonomy in complex web tasks. Our approach integrates two human-centred mechanisms: (1) Progressive Progress Summarization, which mimics human short-term memory to maintain coherence over extended interactions; and (2) Human-in-the-Loop Knowledge Adaptation, which bridges the knowledge gap in diverse environments by soliciting expert intervention only when necessary. This symbiotic design allows the agent to learn from human tips without extensive retraining, effectively combining the scalability of AI with the adaptability of human cognition. Evaluated on the WebArena benchmark using GPT-5, ColorBrowserAgent achieves a state-of-the-art success rate of 71.2\%, demonstrating the efficacy of interactive human assistance in robust web automation.

Authors:Keyu Zhao, Fengli Xu, Yong Li, Tie-Yan Liu
Title: HybridQuestion: Human-AI Collaboration for Identifying High-Impact Research Questions
Abstract:
The "AI Scientist" paradigm is transforming scientific research by automating key stages of the research process, from idea generation to scholarly writing. This shift is expected to accelerate discovery and expand the scope of scientific inquiry. However, a key question remains unclear: can AI scientists identify meaningful research questions? While Large Language Models (LLMs) have been applied successfully to task-specific ideation, their potential to conduct strategic, long-term assessments of past breakthroughs and future questions remains largely unexplored. To address this gap, we explore a human-AI hybrid solution that integrates the scalable data processing capabilities of AI with the value judgment of human experts. Our methodology is structured in three phases. The first phase, AI-Accelerated Information Gathering, leverages AI's advantage in processing vast amounts of literature to generate a hybrid information base. The second phase, Candidate Question Proposing, utilizes this synthesized data to prompt an ensemble of six diverse LLMs to propose an initial candidate pool, filtered via a cross-model voting mechanism. The third phase, Hybrid Question Selection, refines this pool through a multi-stage filtering process that progressively increases human oversight. To validate this system, we conducted an experiment aiming to identify the Top 10 Scientific Breakthroughs of 2025 and the Top 10 Scientific Questions for 2026 across five major disciplines. Our analysis reveals that while AI agents demonstrate high alignment with human experts in recognizing established breakthroughs, they exhibit greater divergence in forecasting prospective questions, suggesting that human judgment remains crucial for evaluating subjective, forward-looking challenges.

Authors:Maciej Besta, Łukasz Jarmocik, Orest Hrycyna, Shachar Klaiman, Konrad Mączka, Robert Gerstenberger, Jürgen Müller, Piotr Nyczyk, Hubert Niewiadomski, Torsten Hoefler
Title: GraphSeek: Next-Generation Graph Analytics with LLMs
Abstract:
Graphs are foundational across domains but remain hard to use without deep expertise. LLMs promise accessible natural language (NL) graph analytics, yet they fail to process industry-scale property graphs effectively and efficiently: such datasets are large, highly heterogeneous, structurally complex, and evolve dynamically. To address this, we devise a novel abstraction for complex multi-query analytics over such graphs. Its key idea is to replace brittle generation of graph queries directly from NL with planning over a Semantic Catalog that describes both the graph schema and the graph operations. Concretely, this induces a clean separation between a Semantic Plane for LLM planning and broader reasoning, and an Execution Plane for deterministic, database-grade query execution over the full dataset and tool implementations. This design yields substantial gains in both token efficiency and task effectiveness even with small-context LLMs. We use this abstraction as the basis of the first LLM-enhanced graph analytics framework called GraphSeek. GraphSeek achieves substantially higher success rates (e.g., 86% over enhanced LangChain) and points toward the next generation of affordable and accessible graph analytics that unify LLM reasoning with database-grade execution over large and complex property graphs.

Authors:Yibo Wang, Guangda Huzhang, Yuwei Hu, Yu Xia, Shiyin Lu, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Lijun Zhang
Title: Building Autonomous GUI Navigation via Agentic-Q Estimation and Step-Wise Policy Optimization
Abstract:
Recent advances in Multimodal Large Language Models (MLLMs) have substantially driven the progress of autonomous agents for Graphical User Interface (GUI). Nevertheless, in real-world applications, GUI agents are often faced with non-stationary environments, leading to high computational costs for data curation and policy optimization. In this report, we introduce a novel MLLM-centered framework for GUI agents, which consists of two components: agentic-Q estimation and step-wise policy optimization. The former one aims to optimize a Q-model that can generate step-wise values to evaluate the contribution of a given action to task completion. The latter one takes step-wise samples from the state-action trajectory as inputs, and optimizes the policy via reinforcement learning with our agentic-Q model. It should be noticed that (i) all state-action trajectories are produced by the policy itself, so that the data collection costs are manageable; (ii) the policy update is decoupled from the environment, ensuring stable and efficient optimization. Empirical evaluations show that our framework endows Ovis2.5-9B with powerful GUI interaction capabilities, achieving remarkable performances on GUI navigation and grounding benchmarks and even surpassing contenders with larger scales.

Authors:Fei Wang, Jiangnan Yang, Junjie Chen, Yuxin Liu, Kun Li, Yanyan Wei, Dan Guo, Meng Wang
Title: XInsight: Integrative Stage-Consistent Psychological Counseling Support Agents for Digital Well-Being
Abstract:
Web-based platforms are becoming a primary channel for psychological support, yet most LLM-driven chatbots remain opaque, single-stage, and weakly grounded in established therapeutic practice, limiting their usefulness for web applications that promote digital well-being. To address this gap, we present \textbf{XInsight}, a counseling-inspired multi-agent framework that models psychological support as a stage-consistent workflow aligned with the classical \textit{Exploration-Insight-Action} paradigm. Building on structured client representations, XInsight orchestrates specialized agents under a unified \textit{Reason-Intervene-Reflect} cycle: an Exploration agent organizes background and concerns into a structured Case Conceptualization Form, a Routing agent performs Adaptive Therapeutic Routing (ATR) across SFBT, CBT, and MBCT, a unified Therapeutic agent executes school-consistent submodules, and a Consolidation agent guides review, skill integration, and relapse-prevention planning. A Recording agent continuously transforms open-ended web dialogues into standardized psychological artifacts, including case formulations, therapeutic records, and relapse-prevention plans, enhancing interpretability, continuity, and accountability. To support rigorous and transparent assessment, we introduce \textbf{XInsight-Bench} with a Scale-Guided LLM Evaluation (SGLE) protocol that combines therapy-specific clinical scales with general counseling criteria. Experiments show improved paradigm alignment, multi-therapy integration, interaction depth, and interpretability over existing multi-agent counseling systems, indicating that XInsight provides a practical blueprint for integrating counseling-inspired support agents into web applications for digital well-being.

Authors:Zhixian Zhao, Shuiyuan Wang, Guojian Li, Hongfei Xue, Chengyou Wang, Shuai Wang, Longshuai Xiao, Zihan Zhang, Hui Bu, Xin Xu, Xinsheng Wang, Hexin Liu, Eng Siong Chng, Hung-yi Lee, Haizhou Li, Lei Xie
Title: The ICASSP 2026 HumDial Challenge: Benchmarking Human-like Spoken Dialogue Systems in the LLM Era
Abstract:
Driven by the rapid advancement of Large Language Models (LLMs), particularly Audio-LLMs and Omni-models, spoken dialogue systems have evolved significantly, progressively narrowing the gap between human-machine and human-human interactions. Achieving truly ``human-like'' communication necessitates a dual capability: emotional intelligence to perceive and resonate with users' emotional states, and robust interaction mechanisms to navigate the dynamic, natural flow of conversation, such as real-time turn-taking. Therefore, we launched the first Human-like Spoken Dialogue Systems Challenge (HumDial) at ICASSP 2026 to benchmark these dual capabilities. Anchored by a sizable dataset derived from authentic human conversations, this initiative establishes a fair evaluation platform across two tracks: (1) Emotional Intelligence, targeting long-term emotion understanding and empathetic generation; and (2) Full-Duplex Interaction, systematically evaluating real-time decision-making under `` listening-while-speaking'' conditions. This paper summarizes the dataset, track configurations, and the final results.

Authors:Isadora Krsek, Meryl Ye, Wei Xu, Alan Ritter, Laura Dabbish, Sauvik Das
Title: Supporting Informed Self-Disclosure: Design Recommendations for Presenting AI-Estimates of Privacy Risks to Users
Abstract:
People candidly discuss sensitive topics online under the perceived safety of anonymity; yet, for many, this perceived safety is tenuous, as miscalibrated risk perceptions can lead to over-disclosure. Recent advances in Natural Language Processing (NLP) afford an unprecedented opportunity to present users with quantified disclosure-based re-identification risk (i.e., "population risk estimates", PREs). How can PREs be presented to users in a way that promotes informed decision-making, mitigating risk without encouraging unnecessary self-censorship? Using design fictions and comic-boarding, we story-boarded five design concepts for presenting PREs to users and evaluated them through an online survey with N = 44 Reddit users. We found participants had detailed conceptions of how PREs may impact risk awareness and motivation, but envisioned needing additional context and support to effectively interpret and act on risks. We distill our findings into four key design recommendations for how best to present users with quantified privacy risks to support informed disclosure decision-making.

Authors:Chen Gong, Zhenzhe Zheng, Yiliu Chen, Sheng Wang, Fan Wu, Guihai Chen
Title: Optimizing Feature Extraction for On-device Model Inference with User Behavior Sequences
Abstract:
Machine learning models are widely integrated into modern mobile apps to analyze user behaviors and deliver personalized services. Ensuring low-latency on-device model execution is critical for maintaining high-quality user experiences. While prior research has primarily focused on accelerating model inference with given input features, we identify an overlooked bottleneck in real-world on-device model execution pipelines: extracting input features from raw application logs. In this work, we explore a new direction of feature extraction optimization by analyzing and eliminating redundant extraction operations across different model features and consecutive model inferences. We then introduce AutoFeature, an automated feature extraction engine designed to accelerate on-device feature extraction process without compromising model inference accuracy. AutoFeature comprises three core designs: (1) graph abstraction to formulate the extraction workflows of different input features as one directed acyclic graph, (2) graph optimization to identify and fuse redundant operation nodes across different features within the graph; (3) efficient caching to minimize operations on overlapping raw data between consecutive model inferences. We implement a system prototype of AutoFeature and integrate it into five industrial mobile services spanning search, video and e-commerce domains. Online evaluations show that AutoFeature reduces end-to-end on-device model execution latency by 1.33x-3.93x during daytime and 1.43x-4.53x at night.

Authors:Wenxin Zhao, Peng Zhang, Hansu Gu, Haoxuan Zhou, Xiaojie Huo, Lin Wang, Wen Zheng, Tun Lu, Ning Gu
Title: SparkTales: Facilitating Cross-Language Collaborative Storytelling through Coordinator-AI Collaboration
Abstract:
Cross-language collaborative storytelling plays a vital role in children's language learning and cultural development, fostering both expressive ability and intercultural awareness. Yet, in practice, children's participation is often shallow, and facilitating such sessions places heavy cognitive and organizational burdens on coordinators, who must coordinate language support, maintain children's engagement, and navigate cultural differences. To address these challenges, we conducted a formative study with coordinators to identify their needs and pain points, which guided the design of SparkTales, an intelligent support system for cross-language collaborative storytelling. SparkTales leverages both individual and common characteristics of participating children to provide coordinators with story frameworks, diverse questions, and comprehension-oriented materials, aiming to reduce coordinators' workload while enhancing children's interactive engagement. Evaluation results show that SparkTales not only significantly increases coordinators' efficiency and quality of guidance but also improves children's participation, providing valuable insights for the design of future intelligent systems supporting cross-language collaboration.

Authors:Mengyao Wang, Shuai Ma, Nuo Li, Peng Zhang, Chenxin Li, Ning Gu, Tun Lu
Title: Echoes of Norms: Investigating Counterspeech Bots' Influence on Bystanders in Online Communities
Abstract:
Counterspeech offers a non-repressive approach to moderate hate speech in online communities. Research has examined how counterspeech chatbots restrain hate speakers and support targets, but their impact on bystanders remains unclear. Therefore, we developed a counterspeech strategy framework and built \textit{Civilbot} for a mixed-method within-subjects study. Bystanders generally viewed Civilbot as credible and normative, though its shallow reasoning limited persuasiveness. Its behavioural effects were subtle: when performing well, it could guide participation or act as a stand-in; when performing poorly, it could discourage bystanders or motivate them to step in. Strategy proved critical: cognitive strategies that appeal to reason, especially when paired with a positive tone, were relatively effective, while mismatch of contexts and strategies could weaken impact. Based on these findings, we offer design insights for mobilizing bystanders and shaping online discourse, highlighting when to intervene and how to do so through reasoning-driven and context-aware strategies.

Authors:Yubo Shu, Peng Zhang, Meng Wu, Yan Chen, Haoxuan Zhou, Guanming Liu, Yu Zhang, Liuxin Zhang, Qianying Wang, Tun Lu, Ning Gu
Title: SoulSeek: Exploring the Use of Social Cues in LLM-based Information Seeking
Abstract:
Social cues, which convey others' presence, behaviors, or identities, play a crucial role in human information seeking by helping individuals judge relevance and trustworthiness. However, existing LLM-based search systems primarily rely on semantic features, creating a misalignment with the socialized cognition underlying natural information seeking. To address this gap, we explore how the integration of social cues into LLM-based search influences users' perceptions, experiences, and behaviors. Focusing on social media platforms that are beginning to adopt LLM-based search, we integrate design workshops, the implementation of the prototype system (SoulSeek), a between-subjects study, and mixed-method analyses to examine both outcome- and process-level findings. The workshop informs the prototype's cue-integrated design. The study shows that social cues improve perceived outcomes and experiences, promote reflective information behaviors, and reveal limits of current LLM-based search. We propose design implications emphasizing better social-knowledge understanding, personalized cue settings, and controllable interactions.

Authors:Sirui Han, Yuyao Zhang, Yidan Huang, Xueyan Li, Chengzhong Liu, Yike Guo
Title: Reimagining Legal Fact Verification with GenAI: Toward Effective Human-AI Collaboration
Abstract:
Fact verification is a critical yet underexplored component of non-litigation legal practice. While existing research has examined automation in legal workflow and human-AI collaboration in high-stakes domains, little is known about how GenAI can support fact verification, a task that demands prudent judgment and strict accountability. To address this, we conducted semi-structured interviews with 18 lawyers to understand their current verification practices, attitudes toward GenAI adoption, and expectations for future systems. We found that while lawyers use GenAI for low-risk tasks like drafting and language optimization, concerns over accuracy, confidentiality, and liability are currently limiting its adoption for fact verification. These concerns translate into core design requirements for AI systems that are trustworthy and accountable. Based on these, we contribute design insights for human-AI collaboration in legal fact verification, emphasizing the development of auditable systems that balance efficiency with professional judgment and uphold ethical and legal accountability in high-stakes practice.

Authors:Hao Wang, Wenhui Zhu, Shao Tang, Zhipeng Wang, Xuanzhao Dong, Xin Li, Xiwen Chen, Ashish Bastola, Xinhao Huang, Yalin Wang, Abolfazl Razi
Title: EZBlender: Efficient 3D Editing with Plan-and-ReAct Agent
Abstract:
As a cornerstone of the modern digital economy, 3D modeling and rendering demand substantial resources and manual effort when scene editing is performed in the traditional manner. Despite recent progress in VLM-based agents for 3D editing, the fundamental trade-off between editing precision and agent responsiveness remains unresolved. To overcome these limitations, we present EZBlender, a Blender agent with a hybrid framework that combines planning-based task decomposition and reactive local autonomy for efficient human AI collaboration and semantically faithful 3D editing. Specifically, this unexplored Plan-and-ReAct design not only preserves editing quality but also significantly reduces latency and computational cost. To further validate the efficiency and effectiveness of the proposed edge-autonomy architecture, we construct a dedicated multi-tasking benchmark that has not been systematically investigated in prior research. In addition, we provide a comprehensive analysis of language model preference, system responsiveness, and economic efficiency.

Authors:Bijean Ghafouri, Dorsaf Sallami, Luca Luceri, Taylor Lynn Curtis, Jean-Francois Godbout, Emilio Ferrara, Reihaneh Rabbany
Title: What do people want to fact-check?
Abstract:
Research on misinformation has focused almost exclusively on supply, asking what falsehoods circulate, who produces them, and whether corrections work. A basic demand-side question remains unanswered. When ordinary people can fact-check anything they want, what do they actually ask about? We provide the first large-scale evidence on this question by analyzing close to 2{,}500 statements submitted by 457 participants to an open-ended AI fact-checking system. Each claim is classified along five semantic dimensions (domain, epistemic form, verifiability, target entity, and temporal reference), producing a behavioral map of public verification demand. Three findings stand out. First, users range widely across topics but default to a narrow epistemic repertoire, overwhelmingly submitting simple descriptive claims about present-day observables. Second, roughly one in four requests concerns statements that cannot be empirically resolved, including moral judgments, speculative predictions, and subjective evaluations, revealing a systematic mismatch between what users seek from fact-checking tools and what such tools can deliver. Third, comparison with the FEVER benchmark dataset exposes sharp structural divergences across all five dimensions, indicating that standard evaluation corpora encode a synthetic claim environment that does not resemble real-world verification needs. These results reframe fact-checking as a demand-driven problem and identify where current AI systems and benchmarks are misaligned with the uncertainty people actually experience.

Authors:Shuning Zhang, Qucheng Zang, Yongquan `Owen' Hu, Jiachen Du, Xueyang Wang, Yan Kong, Xinyi Fu, Suranga Nanayakkara, Xin Yi, Hewu Li
Title: VisGuardian: A Lightweight Group-based Privacy Control Technique For Front Camera Data From AR Glasses in Home Environments
Abstract:
Always-on sensing of AI applications on AR glasses makes traditional permission techniques ill-suited for context-dependent visual data, especially within home environments. The home presents a highly challenging privacy context due to the high density of sensitive objects, and the frequent presence of non-consenting family members, and the intimate nature of daily routines, making it a critical focus area for scalable privacy control mechanisms. Existing fine-grained controls, while offering nuanced choices, are inefficient for managing multiple private objects. We propose VisGuardian, a fine-grained content-based visual permission technique for AR glasses. VisGuardian features a group-based control mechanism that enables users to efficiently manage permissions for multiple private objects. VisGuardian detects objects using YOLO and adopts a pre-classified schema to group them. By selecting a single object, users can efficiently obscure groups of related objects based on criteria including privacy sensitivity, object category, or spatial proximity. A technical evaluation shows VisGuardian achieves mAP50 of 0.6704 with only 14.0 ms latency and a 1.7% increase in battery consumption per hour. Furthermore, a user study (N=24) comparing VisGuardian to slider-based and object-based baselines found it to be significantly faster for setting permissions and was preferred by users for its efficiency, effectiveness, and ease of use.

Authors:Shuning Zhang, Shixuan Li, Haobin Xing, Jiarui Liu, Yan Kong, Xin Yi, Hewu Li
Title: "Privacy across the boundary": Examining Perceived Privacy Risk Across Data Transmission and Sharing Ranges of Smart Home Personal Assistants
Abstract:
As Smart Home Personal Assistants (SPAs) evolve into social agents, understanding user privacy necessitates interpersonal communication frameworks, such as Privacy Boundary Theory (PBT). To ground our investigation, our three-phase preliminary study (1) identified transmission and sharing ranges as key boundary-related risk factors, (2) categorized relevant SPA functions and data types, and (3) analyzed commercial practices, revealing widespread data sharing and non-transparent safeguards. A subsequent mixed-methods study (N=412 survey, N=40 interviews among the survey participants) assessed users' perceived privacy risks across data types, transmission ranges and sharing ranges. Results demonstrate a significant, non-linear escalation in perceived risk when data crosses two critical boundaries: the `public network' (transmission) and `third parties' (sharing). This boundary effect holds robustly across data types and demographics. Furthermore, risk perception is modulated by data attributes (e.g., social relational data), and contextual privacy calculus. Conversely, anonymization safeguards show limited efficacy especially for third-party sharing, a finding attributed to user distrust. These findings empirically ground PBT in the SPA context and inform design of boundary-aware privacy protection.

Authors:Shuning Zhang, Linzhi Wang, Shixuan Li, Yuanyuan Wu, Yuwei Chuai, Luoxi Chen, Xin Yi, Hewu Li
Title: Collab: Fostering Critical Identification of Deepfake Videos on Social Media via Synergistic Annotation
Abstract:
Identifying deepfake videos on social media platforms is challenged by dynamic spatio-temporal artifacts and inadequate user tools. This hinders both critical viewing by users and scalable moderation on platforms. Here, we present Collab, a web plugin enabling users to collaboratively annotate deepfake videos. Collab integrates three key components: (i) an intuitive interface for spatio-temporal labeling where users provide confidence scores and rationales, facilitating detailed input even from non-experts, (ii) a novel confidence-weighted spatio-temporal Intersection-over-Union (IoU) algorithm to aggregate diverse user annotations into accurate aggregations, and (iii) a hierarchical demonstration strategy presenting aggregated results to guide attention toward contentious regions and foster critical evaluation. A seven-day online study (N=90), where participants annotated suspicious videos when viewing an online experimental platforms, compared Collab against two conditions without aggregation or demonstration respectively. Collab significantly improved identification accuracy and enhanced reflection compared to non-demonstration condition, while outperforming non-aggregation condition for its novelty and effectiveness.

Authors:Shuning Zhang, Eve He, Sixing Tao, Yuting Yang, Ying Ma, Ailei Wang, Xin Yi, Hewu Li
Title: A Scoping Review and Guidelines on Privacy Policy's Visualization from an HCI Perspective
Abstract:
Privacy Policies are a cornerstone of informed consent, yet a persistent gap exists between their legal intent and practical efficacy. Despite decades of Human-Computer Interaction (HCI) research proposing various visualizations, user comprehension remains low, and designs rarely see widespread adoption. To understand this landscape and chart a path forward, we synthesized 65 top-tier papers using a framework adapted from the user-centered design lifecycle. Our analysis presented findings of the field's evolution across four dimensions: (1) the trade-off between information load and decision efficacy, which demonstrates a shift from augmenting disclosures to prioritizing information condensation and cognitive load management to counter the inefficacy of comprehensive texts, (2) the co-evolutionary dynamic of design and automation, revealing that complex design ambitions such as context-awareness drove the need for advanced NLP, while recent LLM breakthroughs are enabling the semantic interpretation required to realize those designs, (3) the tension between generality and specificity, highlighting the divergence between standardized, cross-platform solutions and the increasing necessity for specialized, context-aware interaction patterns in IoT and immersive environments, and (4) balancing stakeholder opinions, which shows that visualization efficacy is constrained by the complex interplay of regulatory mandates, developer capabilities and provider incentives. We conclude by outlining four critical challenges for future research.

Authors:Timofei Kozlov, Artem Trandofilov, Georgii Gazaryan, Issatay Tokmurziyev, Miguel Altamirano Cabrera, Dzmitry Tsetserukou
Title: GuideTouch: An Obstacle Avoidance Device with Tactile Feedback for Visually Impaired
Abstract:
Safe navigation for the visually impaired individuals remains a critical challenge, especially concerning head-level obstacles, which traditional mobility aids often fail to detect. We introduce GuideTouch, a compact, affordable, standalone wearable device designed for autonomous obstacle avoidance. The system integrates two vertically aligned Time-of-Flight (ToF) sensors, enabling three-dimensional environmental perception, and four vibrotactile actuators that provide directional haptic feedback. Proximity and direction information is communicated via an intuitive 4-point vibrotactile feedback system located across the user's shoulders and upper chest. For real-world robustness, the device includes a unique centrifugal self-cleaning optical cover mechanism and a sound alarm system for location if the device is dropped. We evaluated the haptic perception accuracy across 22 participants (17 male and 5 female, aged 21-48, mean 25.7, sd 6.1). Statistical analysis confirmed a significant difference between the perception accuracy of different patterns. The system demonstrated high recognition accuracy, achieving an average of 92.9% for single and double motor (primary directional) patterns. Furthermore, preliminary experiments with 14 visually impaired users validated this interface, showing a recognition accuracy of 93.75% for primary directional cues. The results demonstrate that GuideTouch enables intuitive spatial perception and could significantly improve the safety, confidence, and autonomy of users with visual impairments during independent navigation.

Authors:Qing He, Zeyu Wang, Yuzhou Du, Jiahuan Ding, Yuanchun Shi, Yuntao Wang
Title: Does Personalized Nudging Wear Off? A Longitudinal Study of AI Self-Modeling for Behavioral Engagement
Abstract:
Sustaining the effectiveness of behavior change technologies remains a key challenge. AI self-modeling, which generates personalized portrayals of one's ideal self, has shown promise for motivating behavior change, yet prior work largely examines short-term effects. We present one of the first longitudinal evaluations of AI self-modeling in fitness engagement through a two-stage empirical study. A 1-week, three-arm experiment (visual self-modeling (VSM), auditory self-modeling (ASM), Control; N=28) revealed that VSM drove initial performance gains, while ASM showed no significant effects. A subsequent 4-week study (VSM vs. Control; N=31) demonstrated that VSM sustained higher performance levels but exhibited diminishing improvement rates after two weeks. Interviews uncovered a catalyst effect that fostered early motivation through clear, attainable goals, followed by habituation and internalization which stabilized performance. These findings highlight the temporal dynamics of personalized nudging and inform the design of behavior change technologies for long-term engagement.

Authors:Jun Fang, Ka I Chan, Xiyuxing Zhang, Yuntao Wang, Mingze Gao, Leyi Peng, Jiajin Li, Zihang Zhan, Zhixin Zhao, Yuanchun Shi
Title: Earinter: A Closed-Loop System for Eating Pace Regulation with Just-in-Time Intervention Using Commodity Earbuds
Abstract:
Rapid eating is common yet difficult to regulate in situ, partly because people seldom notice pace changes and sustained self-monitoring is effortful. We present Earinter, a commodity-earbud-based closed-loop system that integrates in-the-wild sensing, real-time reasoning, and theory-grounded just-in-time (JIT) intervention to regulate eating pace during daily meals. Earinter repurposes the earbud's bone-conduction voice sensor to capture chewing-related vibrations and estimate eating pace as chews per swallow (CPS) for on-device inference. With data collected equally across in-lab and in-the-wild sessions, Earinter achieves reliable chewing detection (F1 = 0.97) and accurate eating pace estimation (MAE: 0.18 $\pm$ 0.13 chews/min, 3.65 $\pm$ 3.86 chews/swallow), enabling robust tracking for closed-loop use. Guided by Dual Systems Theory and refined through two Wizard-of-Oz pilots, Earinter adopts a user-friendly design for JIT intervention content and delivery policy in daily meals. In a 13-day within-subject field study (N=14), the closed-loop system significantly increased CPS and reduced food-consumption speed, with statistical signs of carryover on retention-probe days and acceptable user burden. Our findings highlight how single-modality commodity earables can support practical, theory-driven closed-loop JIT interventions for regulating eating pace in the wild.

Authors:Xiaofeng Luo, Jiayi He, Jiawen Kang, Ruichen Zhang, Zhaoshui He, Ekram Hossain, Dong In Kim
Title: Cross-reality Location Privacy Protection in 6G-enabled Vehicular Metaverses: An LLM-enhanced Hybrid Generative Diffusion Model-based Approach
Abstract:
The emergence of 6G-enabled vehicular metaverses enables Autonomous Vehicles (AVs) to operate across physical and virtual spaces through space-air-ground-sea integrated networks. The AVs can deploy AI agents powered by large AI models as personalized assistants, on edge servers to support intelligent driving decision making and enhanced on-board experiences. However, such cross-reality interactions may cause serious location privacy risks, as adversaries can infer AV trajectories by correlating the location reported when AVs request LBS in reality with the location of the edge servers on which their corresponding AI agents are deployed in virtuality. To address this challenge, we design a cross-reality location privacy protection framework based on hybrid actions, including continuous location perturbation in reality and discrete privacy-aware AI agent migration in virtuality. In this framework, a new privacy metric, termed cross-reality location entropy, is proposed to effectively quantify the privacy levels of AVs. Based on this metric, we formulate an optimization problem to optimize the hybrid action, focusing on achieving a balance between location protection, service latency reduction, and quality of service maintenance. To solve the complex mixed-integer problem, we develop a novel LLM-enhanced Hybrid Diffusion Proximal Policy Optimization (LHDPPO) algorithm, which integrates LLM-driven informative reward design to enhance environment understanding with double Generative Diffusion Models-based policy exploration to handle high-dimensional action spaces, thereby enabling reliable determination of optimal hybrid actions. Extensive experiments on real-world datasets demonstrate that the proposed framework effectively mitigates cross-reality location privacy leakage for AVs while maintaining strong user immersion within 6G-enabled vehicular metaverse scenarios.

Authors:Jianwen Sun, Yukang Feng, Kaining Ying, Chuanhao Li, Zizhen Li, Fanrui Zhang, Jiaxin Ai, Yifan Chang, Yu Dai, Yifei Huang, Kaipeng Zhang
Title: World Craft: Agentic Framework to Create Visualizable Worlds via Text
Abstract:
Large Language Models (LLMs) motivate generative agent simulation (e.g., AI Town) to create a ``dynamic world'', holding immense value across entertainment and research. However, for non-experts, especially those without programming skills, it isn't easy to customize a visualizable environment by themselves. In this paper, we introduce World Craft, an agentic world creation framework to create an executable and visualizable AI Town via user textual descriptions. It consists of two main modules, World Scaffold and World Guild. World Scaffold is a structured and concise standardization to develop interactive game scenes, serving as an efficient scaffolding for LLMs to customize an executable AI Town-like environment. World Guild is a multi-agent framework to progressively analyze users' intents from rough descriptions, and synthesizes required structured contents (\eg environment layout and assets) for World Scaffold . Moreover, we construct a high-quality error-correction dataset via reverse engineering to enhance spatial knowledge and improve the stability and controllability of layout generation, while reporting multi-dimensional evaluation metrics for further analysis. Extensive experiments demonstrate that our framework significantly outperforms existing commercial code agents (Cursor and Antigravity) and LLMs (Qwen3 and Gemini-3-Pro). in scene construction and narrative intent conveyance, providing a scalable solution for the democratization of environment creation.

Authors:Zizhen Li, Chuanhao Li, Yibin Wang, Yukang Feng, Jianwen Sun, Jiaxin Ai, Fanrui Zhang, Mingzhu Sun, Yifei Huang, Kaipeng Zhang
Title: MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences
Abstract:
Recent advancements have expanded the role of Large Language Models in board games from playing agents to creative co-designers. However, a critical gap remains: current systems lack the capacity to offer constructive critique grounded in the emergent user experience. Bridging this gap is fundamental for harmonizing Human-AI collaboration, as it empowers designers to refine their creations via external perspectives while steering models away from biased or unpredictable outcomes. Automating critique for board games presents two challenges: inferring the latent dynamics connecting rules to gameplay without an explicit engine, and modeling the subjective heterogeneity of diverse player groups. To address these, we curate a dataset of 1,727 structurally corrected rulebooks and 150K reviews selected via quality scoring and facet-aware sampling. We augment this data with Mechanics-Dynamics-Aesthetics (MDA) reasoning to explicitly bridge the causal gap between written rules and player experience. We further distill player personas and introduce MeepleLM, a specialized model that internalizes persona-specific reasoning patterns to accurately simulate the subjective feedback of diverse player archetypes. Experiments demonstrate that MeepleLM significantly outperforms latest commercial models (e.g., GPT-5.1, Gemini3-Pro) in community alignment and critique quality, achieving a 70% preference rate in user studies assessing utility. MeepleLM serves as a reliable virtual playtester for general interactive systems, marking a pivotal step towards audience-aligned, experience-aware Human-AI collaboration.

Authors:Mingyu Ouyang, Kevin Qinghong Lin, Mike Zheng Shou, Hwee Tou Ng
Title: FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection
Abstract:
Vision-Language Models (VLMs) have shown remarkable performance in User Interface (UI) grounding tasks, driven by their ability to process increasingly high-resolution screenshots. However, screenshots are tokenized into thousands of visual tokens (e.g., about 4700 for 2K resolution), incurring significant computational overhead and diluting attention. In contrast, humans typically focus on regions of interest when interacting with UI. In this work, we pioneer the task of efficient UI grounding. Guided by practical analysis of the task's characteristics and challenges, we propose FocusUI, an efficient UI grounding framework that selects patches most relevant to the instruction while preserving positional continuity for precise grounding. FocusUI addresses two key challenges: (1) Eliminating redundant tokens in visual encoding. We construct patch-level supervision by fusing an instruction-conditioned score with a rule-based UI-graph score that down-weights large homogeneous regions to select distinct and instruction-relevant visual tokens. (2) Preserving positional continuity during visual token selection. We find that general visual token pruning methods suffer from severe accuracy degradation on UI grounding tasks due to broken positional information. We introduce a novel PosPad strategy, which compresses each contiguous sequence of dropped visual tokens into a single special marker placed at the sequence's last index to preserve positional continuity. Comprehensive experiments on four grounding benchmarks demonstrate that FocusUI surpasses GUI-specific baselines. On the ScreenSpot-Pro benchmark, FocusUI-7B achieves a performance improvement of 3.7% over GUI-Actor-7B. Even with only 30% visual token retention, FocusUI-7B drops by only 3.2% while achieving up to 1.44x faster inference and 17% lower peak GPU memory.

Authors:Peter Brodeur, Jacob M. Koshy, Anil Palepu, Khaled Saab, Ava Homiar, Roma Ruparel, Charles Wu, Ryutaro Tanno, Joseph Xu, Amy Wang, David Stutz, Wei-Hung Weng, Hannah M. Ferrera, David Barrett, Lindsey Crowley, Jihyeon Lee, Spencer E. Rittner, Ellery Wulczyn, Selena K. Zhang, Elahe Vedadi, Christine G. Kohn, Kavita Kulkarni, Vinay Kadiyala, Sara Mahdavi, Wendy Du, Jessica M. Williams, David Feinbloom, Renee Wong, Tao Tu, Petar Sirkovic, Alessio Orlandi, Christopher Semturs, Yun Liu, Juraj Gottweis, Dale R. Webster, Joëlle Barral, Katherine Chou, Pushmeet Kohli, Avinatan Hassidim, Yossi Matias, James Manyika, Rob Fields, Jonathan X. Li, Marc L. Cohen, Vivek Natarajan, Mike Schaekermann, Alan Karthikesalingam, Adam Rodman
Title: A prospective clinical feasibility study of a conversational diagnostic AI in an ambulatory primary care clinic
Abstract:
Large language model (LLM)-based AI systems have shown promise for patient-facing diagnostic and management conversations in simulated settings. Translating these systems into clinical practice requires assessment in real-world workflows with rigorous safety oversight. We report a prospective, single-arm feasibility study of an LLM-based conversational AI, the Articulate Medical Intelligence Explorer (AMIE), conducting clinical history taking and presentation of potential diagnoses for patients to discuss with their provider at urgent care appointments at a leading academic medical center. 100 adult patients completed an AMIE text-chat interaction up to 5 days before their appointment. We sought to assess the conversational safety and quality, patient and clinician experience, and clinical reasoning capabilities compared to primary care providers (PCPs). Human safety supervisors monitored all patient-AMIE interactions in real time and did not need to intervene to stop any consultations based on pre-defined criteria. Patients reported high satisfaction and their attitudes towards AI improved after interacting with AMIE (p < 0.001). PCPs found AMIE's output useful with a positive impact on preparedness. AMIE's differential diagnosis (DDx) included the final diagnosis, per chart review 8 weeks post-encounter, in 90% of cases, with 75% top-3 accuracy. Blinded assessment of AMIE and PCP DDx and management (Mx) plans suggested similar overall DDx and Mx plan quality, without significant differences for DDx (p = 0.6) and appropriateness and safety of Mx (p = 0.1 and 1.0, respectively). PCPs outperformed AMIE in the practicality (p = 0.003) and cost effectiveness (p = 0.004) of Mx. While further research is needed, this study demonstrates the initial feasibility, safety, and user acceptance of conversational AI in a real-world setting, representing crucial steps towards clinical translation.

Authors:Seokweon Jung, Jeongmin Rhee, Seoyoung Doh, Hyeon Jeon, Ghulam Jilani Quadri, Jinwook Seo
Title: Seeing Graphs Like Humans: Benchmarking Computational Measures and MLLMs for Similarity Assessment
Abstract:
Comparing graphs to identify similarities is a fundamental task in visual analytics of graph data. To support this, visual analytics systems frequently employ quantitative computational measures to provide automated guidance. However, it remains unclear how well these measures align with subjective human visual perception, thereby offering recommendations that conflict with analysts' intuitive judgments, potentially leading to confusion rather than reducing cognitive load. Multimodal Large Language Models (MLLMs), capable of visually interpreting graphs and explaining their reasoning in natural language, have emerged as a potential alternative to address this challenge. This paper bridges the gap between human and machine assessment of graph similarity through three interconnected experiments using a dataset of 1,881 node-link diagrams. Experiment 1 collects relative similarity judgments and rationales from 32 human participants, revealing consensus on graph similarity while prioritizing global shapes and edge densities over exact topological details. Experiment 2 benchmarks 16 computational measures against these human judgments, identifying Portrait divergence as the best-performing metric, though with only moderate alignment. Experiment 3 evaluates the potential of three state-of-the-art MLLMs (GPT-5, Gemini 2.5 Pro, Claude Sonnet 4.5) as perceptual proxies. The results demonstrate that MLLMs, particularly GPT-5, significantly outperform traditional measures in aligning with human graph similarity perception and provide interpretable rationales for their decisions, whereas Claude Sonnet 4.5 shows the best computational efficiency. Our findings suggest that MLLMs hold significant promise not only as effective, explainable proxies for human perception but also as intelligent guides that can uncover subtle nuances that might be overlooked by human analysts in visual analytics systems.

Authors:Suyeon Hwang, Minkyu Kweon, Jeongmin Rhee, Soohyun Lee, Seokhyeon Park, Seokweon Jung, Hyeon Jeon, Jinwook Seo
Title: HookLens: Visual Analytics for Understanding React Hooks Structures
Abstract:
Maintaining and refactoring React web applications is challenging, as React code often becomes complex due to its core API called Hooks. For example, Hooks often lead developers to create complex dependencies among components, making code behavior unpredictable and reducing maintainability, i.e., anti-patterns. To address this challenge, we present HookLens, an interactive visual analytics system that helps developers understand howHooks define dependencies and data flows between components. Informed by an iterative design process with experienced React developers, HookLens supports users to efficiently understand the structure and dependencies between components and to identify anti-patterns. A quantitative user study with 12 React developers demonstrates that HookLens significantly improves participants' accuracy in detecting anti-patterns compared to conventional code editors. Moreover, a comparative study with state-of-the-art LLM-based coding assistants confirms that these improvements even surpass the capabilities of such coding assistants on the same task.

Authors:Xinyue Gui, Ding Xia, Mark Colley, Yuan Li, Vishal Chauhan, Anubhav Anubhav, Zhongyi Zhou, Ehsan Javanmardi, Stela Hanbyeol Seo, Chia-Ming Chang, Manabu Tsukada, Takeo Igarashi
Title: Peeking Ahead of the Field Study: Exploring VLM Personas as Support Tools for Embodied Studies in HCI
Abstract:
Field studies are irreplaceable but costly, time-consuming, and error-prone, which need careful preparation. Inspired by rapid-prototyping in manufacturing, we propose a fast, low-cost evaluation method using Vision-Language Model (VLM) personas to simulate outcomes comparable to field results. While LLMs show human-like reasoning and language capabilities, autonomous vehicle (AV)-pedestrian interaction requires spatial awareness, emotional empathy, and behavioral generation. This raises our research question: To what extent can VLM personas mimic human responses in field studies? We conducted parallel studies: 1) one real-world study with 20 participants, and 2) one video-study using 20 VLM personas, both on a street-crossing task. We compared their responses and interviewed five HCI researchers on potential applications. Results show that VLM personas mimic human response patterns (e.g., average crossing times of 5.25 s vs. 5.07 s) lack the behavioral variability and depth. They show promise for formative studies, field study preparation, and human data augmentation.

Authors:Md. Tanvir Hossain, Mohd Ruhul Ameen, Akif Islam, Md. Omar Faruqe, Mahboob Qaosar, A. F. M. Mahbubur Rahman, Sanjoy Kumar Chakravarty, M. Khademul Islam Molla
Title: Beyond Ray-Casting: Evaluating Controller, Free-Hand, and Virtual-Touch Modalities for Immersive Text Entry
Abstract:
Efficient text entry remains a primary bottleneck preventing Virtual Reality (VR) from evolving into a viable productivity platform. To address this, we conducted an empirical comparison of six physical input systems across three interaction styles Controller Driven, Free Hand, and Virtual Touch evaluating both discrete tap typing and continuous gesture typing (swiping), alongside a speech to text (Voice) condition as a non physical reference modality. Results from 21 participants show that the Controller Driven Tap Gesture Combo (CD TGC) delivers the best productivity performance, achieving speeds 2.25 times higher than the slowest system and 30% faster than the current industry standard, while reducing error rates by up to 68%. A clear trade off emerged between performance and perceived usability: although controller based gesture input led on speed and accuracy, participants rated Virtual Touch Tap Typing highest in subjective experience, scoring 80% higher on the System Usability Scale (SUS) than the lowest rated alternative. We further observe that Free Hand interaction remains limited by tracking stability and physical fatigue, whereas Voice input introduces practical constraints related to privacy, editing control, and immersive engagement. Together, these findings characterize the tension between throughput and natural interaction in immersive text entry and provide data driven guidance for future VR interface design.

Authors:Xueyang Wang, Kewen Peng, Xin Yi, Hewu Li
Title: Mind the Gap: Mapping Wearer-Bystander Privacy Tensions and Context-Adaptive Pathways for Camera Glasses
Abstract:
Camera glasses create fundamental privacy tensions between wearers seeking recording functionality and bystanders concerned about unauthorized surveillance. We present a systematic multi-stakeholder evaluation of privacy mechanisms through surveys (N=525) and paired interviews (N=20) in China. Study 1 quantifies expectation-willingness gaps: bystanders consistently demand stronger information transparency and protective measures than wearers will provide, with disparities intensifying in sensitive contexts where 65-90% of bystanders would take defensive action. Study 2 evaluates twelve privacy-enhancing technologies, revealing four fundamental trade-offs that undermine current approaches: visibility versus disruption, empowerment versus burden, protection versus agency, and accountability versus exposure. These gaps reflect structural incompatibilities rather than inadequate goodwill, with context emerging as the primary determinant of privacy acceptability. We propose context-adaptive pathways that dynamically adjust protection strategies: minimal-friction visibility in public spaces, structured negotiation in semi-public environments, and automatic protection in sensitive contexts. Our findings contribute a diagnostic framework for evaluating privacy mechanisms and implications for context-aware design in ubiquitous sensing.

Authors:Xueyang Wang, Qinxuan Cen, Weitao Bi, Yunxiang Ma, Xin Yi, Robert Xiao, Xinyi Fu, Hewu Li
Title: Roomify: Spatially-Grounded Style Transformation for Immersive Virtual Environments
Abstract:
We present Roomify, a spatially-grounded transformation system that generates themed virtual environments anchored to users' physical rooms while maintaining spatial structure and functional semantics. Current VR approaches face a fundamental trade-off: full immersion sacrifices spatial awareness, while passthrough solutions break presence. Roomify addresses this through spatially-grounded transformation - treating physical spaces as "spatial containers" that preserve key functional and geometric properties of furniture while enabling radical stylistic changes. Our pipeline combines in-situ 3D scene understanding, AI-driven spatial reasoning, and style-aware generation to create personalized virtual environments grounded in physical reality. We introduce a cross-reality authoring tool enabling fine-grained user control through MR editing and VR preview workflows. Two user studies validate our approach: one with 18 VR users demonstrates a 63% improvement in presence over passthrough and 26% over fully virtual baselines while maintaining spatial awareness; another with 8 design professionals confirms the system's creative expressiveness (scene quality: 5.95/7; creativity support: 6.08/7) and professional workflow value across diverse environments.

Authors:Xinfeng Li, Shenyu Dai, Kelong Zheng, Yue Xiao, Gelei Deng, Wei Dong, Xiaofeng Wang
Title: "Are You Sure?": An Empirical Study of Human Perception Vulnerability in LLM-Driven Agentic Systems
Abstract:
Large language model (LLM) agents are rapidly becoming trusted copilots in high-stakes domains like software development and healthcare. However, this deepening trust introduces a novel attack surface: Agent-Mediated Deception (AMD), where compromised agents are weaponized against their human users. While extensive research focuses on agent-centric threats, human susceptibility to deception by a compromised agent remains unexplored. We present the first large-scale empirical study with 303 participants to measure human susceptibility to AMD. This is based on HAT-Lab (Human-Agent Trust Laboratory), a high-fidelity research platform we develop, featuring nine carefully crafted scenarios spanning everyday and professional domains (e.g., healthcare, software development, human resources). Our 10 key findings reveal significant vulnerabilities and provide future defense perspectives. Specifically, only 8.6% of participants perceive AMD attacks, while domain experts show increased susceptibility in certain scenarios. We identify six cognitive failure modes in users and find that their risk awareness often fails to translate to protective behavior. The defense analysis reveals that effective warnings should interrupt workflows with low verification costs. With experiential learning based on HAT-Lab, over 90% of users who perceive risks report increased caution against AMD. This work provides empirical evidence and a platform for human-centric agent security research.

Authors:M. Hamza Mughal, Rishabh Dabral, Vera Demberg, Christian Theobalt
Title: MIBURI: Towards Expressive Interactive Gesture Synthesis
Abstract:
Embodied Conversational Agents (ECAs) aim to emulate human face-to-face interaction through speech, gestures, and facial expressions. Current large language model (LLM)-based conversational agents lack embodiment and the expressive gestures essential for natural interaction. Existing solutions for ECAs often produce rigid, low-diversity motions, that are unsuitable for human-like interaction. Alternatively, generative methods for co-speech gesture synthesis yield natural body gestures but depend on future speech context and require long run-times. To bridge this gap, we present MIBURI, the first online, causal framework for generating expressive full-body gestures and facial expressions synchronized with real-time spoken dialogue. We employ body-part aware gesture codecs that encode hierarchical motion details into multi-level discrete tokens. These tokens are then autoregressively generated by a two-dimensional causal framework conditioned on LLM-based speech-text embeddings, modeling both temporal dynamics and part-level motion hierarchy in real time. Further, we introduce auxiliary objectives to encourage expressive and diverse gestures while preventing convergence to static poses. Comparative evaluations demonstrate that our causal and real-time approach produces natural and contextually aligned gestures against recent baselines. We urge the reader to explore demo videos on https://vcai.mpi-inf.mpg.de/projects/MIBURI/.

Authors:Cathy Mengying Fang, Sheer Karny, Chayapatr Archiwaranguprok, Yasith Samaradivakara, Pat Pataranutaporn, Pattie Maes
Title: AI-Wrapped: Participatory, Privacy-Preserving Measurement of Longitudinal LLM Use In-the-Wild
Abstract:
Alignment research on large language models (LLMs) increasingly depends on understanding how these systems are used in everyday contexts. yet naturalistic interaction data is difficult to access due to privacy constraints and platform control. We present AI-Wrapped, a prototype workflow for collecting naturalistic LLM usage data while providing participants with an immediate ``wrapped''-style report on their usage statistics, top topics, and safety-relevant behavioral patterns. We report findings from an initial deployment with 82 U.S.-based adults across 48,495 conversations from their 2025 histories. Participants used LLMs for both instrumental and reflective purposes, including creative work, professional tasks, and emotional or existential themes. Some usage patterns were consistent with potential over-reliance or perfectionistic refinement, while heavier users showed comparatively more reflective exchanges than primarily transactional ones. Methodologically, even with zero data retention and PII removal, participants may remain hesitant to share chat data due to perceived privacy and judgment risks, underscoring the importance of trust, agency, and transparent design when building measurement infrastructure for alignment research.

Authors:Yibo Lyu, Gongwei Chen, Rui Shao, Weili Guan, Liqiang Nie
Title: PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records
Abstract:
While GUI agents have shown strong performance under explicit and completion instructions, real-world deployment requires aligning with users' more complex implicit intents. In this work, we highlight Hierarchical Implicit Intent Alignment for Personalized GUI Agent (PersonalAlign), a new agent task that requires agents to leverage long-term user records as persistent context to resolve omitted preferences in vague instructions and anticipate latent routines by user state for proactive assistance. To facilitate this study, we introduce AndroidIntent, a benchmark designed to evaluate agents' ability in resolving vague instructions and providing proactive suggestions through reasoning over long-term user records. We annotated 775 user-specific preferences and 215 routines from 20k long-term records across different users for evaluation. Furthermore, we introduce Hierarchical Intent Memory Agent (HIM-Agent), which maintains a continuously updating personal memory and hierarchically organizes user preferences and routines for personalization. Finally, we evaluate a range of GUI agents on AndroidIntent, including GPT-5, Qwen3-VL, and UI-TARS, further results show that HIM-Agent significantly improves both execution and proactive performance by 15.7% and 7.3%.

Authors:Wanghao Ye, Sihan Chen, Yiting Wang, Shwai He, Bowei Tian, Guoheng Sun, Ziyi Wang, Ziyao Wang, Yexiao He, Zheyu Shen, Meng Liu, Yuning Zhang, Meng Feng, Yifei Dong, Yanhong Qian, Yang Wang, Siyuan Peng, Yilong Dai, Zhenle Duan, Joshua Liu, Lang Xiong, Hanzhang Qin, Ang Li
Title: Cognibit: From Digital Exhaustion to Real-World Connection Through Gamified Territory Control and LLM-Powered Twin Networking
Abstract:
We present an LLM-powered social discovery platform that uses digital twins to autonomously evaluate interpersonal compatibility through behavioral simulation. The platform unifies three key pillars: (1) digital twins that engage in autonomous multi-turn conversations on behalf of users to estimate compatibility, (2) gamified territory conquest mechanics that incentivize real-world exploration and create organic settings for in-person encounters, and (3) AI companions that preserve persistent shared memory across devices. Built upon CogniPair's cognitive architecture (Ye et al., 2026), validated on the Columbia Speed Dating dataset (551 participants), our system extends prior simulation-only matching into a fully deployed social discovery environment. Through deployment, we derive empirical cost-quality baselines and identify fundamental scaling bottlenecks that remain hidden in component-level testing alone.

Authors:Boyu Qiao, Yunman Chen, Kun Li, Wei Zhou, Songlin Hu, Yunya Song
Title: MGDIL: Multi-Granularity Summarization and Domain-Invariant Learning for Cross-Domain Social Bot Detection
Abstract:
Social bots increasingly infiltrate online platforms through sophisticated disguises, threatening healthy information ecosystems. Existing detection methods often rely on modality specific cues or local contextual features, making them brittle when modalities are missing or inputs are incomplete. Moreover, most approaches assume similar train test distributions, which limits their robustness to out of distribution (OOD) samples and emerging bot types. To address these challenges, we propose Multi Granularity Summarization and Domain Invariant Learning (MGDIL), a unified framework for robust social bot detection under domain shift. MGDIL first transforms heterogeneous signals into unified textual representations through LLM based multi granularity summarization. Building on these representations, we design a collaborative optimization framework that integrates task oriented LLM instruction tuning with domain invariant representation learning. Specifically, task oriented instruction tuning enhances the LLMs ability to capture subtle semantic cues and implicit camouflage patterns, while domain adversarial learning and cross domain contrastive learning are jointly employed to mitigate distribution shifts across datasets and time periods. Through this joint optimization, MGDIL learns stable and discriminative domain invariant features, improving cross domain social bot detection through better distribution alignment, stronger intra class compactness, and clearer inter class separation.

Authors:Jiongchi Yu, Xiaolin Wen, Sizhe Cheng, Xiaofei Xie, Qiang Hu, Yong Wang
Title: Human in the Loop for Fuzz Testing: Literature Review and the Road Ahead
Abstract:
Fuzz testing is one of the most effective techniques for detecting bugs and vulnerabilities in software. However, as the basis of fuzz testing, automated heuristics often fail to uncover deep or complex vulnerabilities. As a result, the performance of fuzz testing remains limited. One promising way to address this limitation is to integrate human expert guidance into the paradigm of fuzz testing. Even though some works have been proposed in this direction, there is still a lack of a systematic research roadmap for combining Human-in-the-Loop (HITL) and fuzz testing, hindering the potential for further enhancing fuzzing effectiveness. To bridge this gap, this paper outlines a forward-looking research roadmap for HITL for fuzz testing. Specifically, we highlight the promise of visualization techniques for interpretable fuzzing processes, as well as on-the-fly interventions that enable experts to guide fuzzing toward hard-to-reach program behaviors. Moreover, the rise of Large Language Models (LLMs) introduces new opportunities and challenges, raising questions about how humans can efficiently provide actionable knowledge, how expert meta-knowledge can be leveraged, and what roles humans should play in the intelligent fuzzing loop with LLMs. To address these questions, we survey existing work on HITL fuzz testing and propose a research agenda emphasizing future opportunities in (1) human monitoring, (2) human steering, and (3) human-LLM collaboration. We call for a paradigm shift toward interactive, human-guided fuzzing systems that integrate expert insight with AI-powered automation in the next-generation fuzzing ecosystem.

Authors:Rafael R. Baptista, André de Lima Salgado, Ricardo V. Godoy, Marcelo Becker, Thiago Boaventura, Gustavo J. G. Lahr
Title: Evaluating Zero-Shot and One-Shot Adaptation of Small Language Models in Leader-Follower Interaction
Abstract:
Leader-follower interaction is an important paradigm in human-robot interaction (HRI). Yet, assigning roles in real time remains challenging for resource-constrained mobile and assistive robots. While large language models (LLMs) have shown promise for natural communication, their size and latency limit on-device deployment. Small language models (SLMs) offer a potential alternative, but their effectiveness for role classification in HRI has not been systematically evaluated. In this paper, we present a benchmark of SLMs for leader-follower communication, introducing a novel dataset derived from a published database and augmented with synthetic samples to capture interaction-specific dynamics. We investigate two adaptation strategies: prompt engineering and fine-tuning, studied under zero-shot and one-shot interaction modes, compared with an untrained baseline. Experiments with Qwen2.5-0.5B reveal that zero-shot fine-tuning achieves robust classification performance (86.66% accuracy) while maintaining low latency (22.2 ms per sample), significantly outperforming baseline and prompt-engineered approaches. However, results also indicate a performance degradation in one-shot modes, where increased context length challenges the model's architectural capacity. These findings demonstrate that fine-tuned SLMs provide an effective solution for direct role assignment, while highlighting critical trade-offs between dialogue complexity and classification reliability on the edge.

Authors:Chloe Qianhui Zhao, Jie Cao, Jionghao Lin, Kenneth R. Koedinger
Title: LLM-based Multimodal Feedback Produces Equivalent Learning and Better Student Perceptions than Educator Feedback
Abstract:
Providing timely, targeted, and multimodal feedback helps students quickly correct errors, build deep understanding and stay motivated, yet making it at scale remains a challenge. This study introduces a real-time AI-facilitated multimodal feedback system that integrates structured textual explanations with dynamic multimedia resources, including the retrieved most relevant slide page references and streaming AI audio narration. In an online crowdsourcing experiment, we compared this system against fixed business-as-usual feedback by educators across three dimensions: (1) learning effectiveness, (2) learner engagement, (3) perceived feedback quality and value. Results showed that AI multimodal feedback achieved learning gains equivalent to original educator feedback while significantly outperforming it on perceived clarity, specificity, conciseness, motivation, satisfaction, and reducing cognitive load, with comparable correctness, trust, and acceptance. Process logs revealed distinct engagement patterns: for multiple-choice questions, educator feedback encouraged more submissions; for open-ended questions, AI-facilitated targeted suggestions lowered revision barriers and promoted iterative improvement. These findings highlight the potential of AI multimodal feedback to provide scalable, real-time, and context-aware support that both reduces instructor workload and enhances student experience.

Authors:Theofanis P. Raptis, Chiara Boldrini, Marco Conti, Andrea Passarella
Title: Sparing User Time with a Socially-Aware Independent Metaverse Avatar
Abstract:
The Metaverse is redefining digital interactions by merging physical, virtual, and social dimensions, yet its effects on social networking remain largely unexplored. This work examines the role of independent avatars (autonomous digital entities capable of managing social interactions on behalf of users), to optimize social time allocation and reshape Metaverse-based Online Social Networks. We propose a novel computational model that integrates a quantitative and realistic representation of user social life, grounded in evolutionary anthropology, with a framework for avatar-mediated interactions. Our model quantifies the effectiveness of a partial replacement of in-person interactions with independent avatar interactions. Additionally, it accounts for social conflicts and specific socialization constraints. We leverage our model to explore the benefits and trade-offs of an avatar-augmented social life in the Metaverse. Since the exact problem formulation leads to an NP-hard optimization problem when incorporating avatars into the social network, we tackle this challenge by introducing a heuristic solution. Through simulations, we compare avatar-mediated and non-avatar-mediated social networking, demonstrating the potential of independent avatars to enhance social connectivity and efficiency. Our findings provide a foundation for optimizing Metaverse-based social interactions, as well as useful insights for future digital social network design.

Authors:Stefano Scanzio, Paolo Campagnale, Pietro Chiavassa, Gianluca Cena
Title: QRmap: executable QR codes for Navigation in Industrial Environments and Beyond
Abstract:
QR codes are nowadays customarily used for embedding static data such as web hyperlinks or plain text. The sQRy technology (executable QR codes) permits to embed executable programs in QR codes, enabling people to interact with them even without an internet connection. In this work we present QRmap, a specific dialect that permits the inclusion of geographic maps in sQRy and supports interaction with the user to provide indications to reach the destination of interest. The QRmap technology facilitates navigation in large industrial plants where internet connectivity is absent, due to either environmental limitations or company policies. The proposed technology can have interesting applications in non-industrial contexts as well.

Authors:Yuxi Ma, Yongqian Peng, Fengyuan Yang, Siyu Zha, Chi Zhang, Zixia Jia, Zilong Zheng, Yixin Zhu
Title: NarrativeLoom: Enhancing Creative Storytelling through Multi-Persona Collaborative Improvisation
Abstract:
Large Language Models show promise for AI-assisted storytelling, yet current tools often generate predictable, unoriginal narratives. To address this limitation, we present NarrativeLoom, a multi-persona co-creative system grounded in Campbell's Blind Variation and Selective Retention theory. NarrativeLoom deploys specialized AI personas to generate diverse narrative options (blind variation), while users act as creative directors to select and refine them (selective retention). We designed a controlled study with 50 participants and found that stories co-authored with NarrativeLoom were not only perceived by users as more novel and diverse but were also objectively rated by experts as significantly better across all Torrance Test creativity dimensions: fluency, flexibility, originality, and elaboration. Stories are significantly longer with richer settings and more dialogue. Writing expertise emerged as a moderator: novices benefited more from structured scaffolding. This demonstrates the value of theory-informed co-creative systems and the importance of adapting them to varying user expertise.

Authors:Anna Bodonhelyi, Mengdi Wang, Efe Bozkir, Babette Bühler, Enkelejda Kasneci
Title: Safeguarding Privacy: Privacy-Preserving Detection of Mind Wandering and Disengagement Using Federated Learning in Online Education
Abstract:
Since the COVID-19 pandemic, online courses have expanded access to education, yet the absence of direct instructor support challenges learners' ability to self-regulate attention and engagement. Mind wandering and disengagement can be detrimental to learning outcomes, making their automated detection via video-based indicators a promising approach for real-time learner support. However, machine learning-based approaches often require sharing sensitive data, raising privacy concerns. Federated learning offers a privacy-preserving alternative by enabling decentralized model training while also distributing computational load. We propose a framework exploiting cross-device federated learning to address different manifestations of behavioral and cognitive disengagement during remote learning, specifically behavioral disengagement, mind wandering, and boredom. We fit video-based cognitive disengagement detection models using facial expressions and gaze features. By adopting federated learning, we safeguard users' data privacy through privacy-by-design and introduce a novel solution with the potential for real-time learner support. We further address challenges posed by eyeglasses by incorporating related features, enhancing overall model performance. To validate the performance of our approach, we conduct extensive experiments on five datasets and benchmark multiple federated learning algorithms. Our results show great promise for privacy-preserving educational technologies promoting learner engagement.

Authors:Yinghao Zhu, Dehao Sui, Zixiang Wang, Xuning Hu, Lei Gu, Yifan Qi, Tianchen Wu, Ling Wang, Yuan Wei, Wen Tang, Zhihan Cui, Yasha Wang, Lequan Yu, Ewen M Harrison, Junyi Gao, Liantao Ma
Title: Augmenting Clinical Decision-Making with an Interactive and Interpretable AI Copilot: A Real-World User Study with Clinicians in Nephrology and Obstetrics
Abstract:
Clinician skepticism toward opaque AI hinders adoption in high-stakes healthcare. We present AICare, an interactive and interpretable AI copilot for collaborative clinical decision-making. By analyzing longitudinal electronic health records, AICare grounds dynamic risk predictions in scrutable visualizations and LLM-driven diagnostic recommendations. Through a within-subjects counterbalanced study with 16 clinicians across nephrology and obstetrics, we comprehensively evaluated AICare using objective measures (task completion time and error rate), subjective assessments (NASA-TLX, SUS, and confidence ratings), and semi-structured interviews. Our findings indicate AICare's reduced cognitive workload. Beyond performance metrics, qualitative analysis reveals that trust is actively constructed through verification, with interaction strategies diverging by expertise: junior clinicians used the system as cognitive scaffolding to structure their analysis, while experts engaged in adversarial verification to challenge the AI's logic. This work offers design implications for creating AI systems that function as transparent partners, accommodating diverse reasoning styles to augment rather than replace clinical judgment.

Authors:Hasan Tarik Akbaba, Efe Bozkir, Anna Puhl, Süleyman Özdel, Enkelejda Kasneci
Title: Exploring Organizational Readiness and Ecosystem Coordination for Industrial XR
Abstract:
Extended Reality (XR) offers transformative potential for industrial support, training, and maintenance; yet, widespread adoption lags despite demonstrated occupational value and hardware maturity. Organizations successfully implement XR in isolated pilots, yet struggle to scale these into sustained operational deployment, a phenomenon we characterize as the ``Pilot Trap.'' This study examines this phenomenon through a qualitative ecosystem analysis of 17 expert interviews across technology providers, solution integrators, and industrial adopters. We identify a ``Great Inversion'' in adoption barriers: critical constraints have shifted from technological maturity to organizational readiness (e.g., change management, key performance indicator alignment, and political resistance). While hardware ergonomics and usability remain relevant, our findings indicate that systemic misalignments between stakeholder incentives are the primary cause of friction preventing enterprise integration. We conclude that successful industrial XR adoption requires a shift from technology-centric piloting to a problem-first, organizational transformation approach, necessitating explicit ecosystem-level coordination.

Authors:Lecheng Gong, Weimin Fang, Ting Yang, Dongjie Tao, Chunxiao Guo, Peng Wei, Bo Xie, Jinqun Guan, Zixiao Chen, Fang Shi, Jinjie Gu, Junwei Liu
Title: MedDialogRubrics: A Comprehensive Benchmark and Evaluation Framework for Multi-turn Medical Consultations in Large Language Models
Abstract:
Medical conversational AI (AI) plays a pivotal role in the development of safer and more effective medical dialogue systems. However, existing benchmarks and evaluation frameworks for assessing the information-gathering and diagnostic reasoning abilities of medical large language models (LLMs) have not been rigorously evaluated. To address these gaps, we present MedDialogRubrics, a novel benchmark comprising 5,200 synthetically constructed patient cases and over 60,000 fine-grained evaluation rubrics generated by LLMs and subsequently refined by clinical experts, specifically designed to assess the multi-turn diagnostic capabilities of LLM. Our framework employs a multi-agent system to synthesize realistic patient records and chief complaints from underlying disease knowledge without accessing real-world electronic health records, thereby mitigating privacy and data-governance concerns. We design a robust Patient Agent that is limited to a set of atomic medical facts and augmented with a dynamic guidance mechanism that continuously detects and corrects hallucinations throughout the dialogue, ensuring internal coherence and clinical plausibility of the simulated cases. Furthermore, we propose a structured LLM-based and expert-annotated rubric-generation pipeline that retrieves Evidence-Based Medicine (EBM) guidelines and utilizes the reject sampling to derive a prioritized set of rubric items ("must-ask" items) for each case. We perform a comprehensive evaluation of state-of-the-art models and demonstrate that, across multiple assessment dimensions, current models face substantial challenges. Our results indicate that improving medical dialogue will require advances in dialogue management architectures, not just incremental tuning of the base-model.

Authors:Zixin Chen, Haotian Li, Zhe Liu, Huamin Qu, Xing Xie
Title: From Passive Consumption to Active Interaction: Exploring Interactive LLM Scaffolding to Support Learning Engagement
Abstract:
Large Language Models (LLMs) are increasingly used as learning companions, providing scaffolded explanations, hints, or step-by-step guidance. However, in current LLM-based learning scenarios, scaffolded content is primarily consumed passively, offering limited support for active learner engagement. Learning science research suggests that effective educational scaffolding depends not only on what support is provided, but also on how learners engage with it. In this work, we explore whether embedding lightweight interactive components into LLM-generated scaffolding responses can promote learning-oriented engagement and improve short-term learning outcomes. We evaluated this approach through a within-subjects laboratory study (N=8). Results provide initial evidence that interactive scaffolding increases learners' perceived engagement and attentional focus, while supporting short-term learning performance. We conclude with design implications for integrating interaction into LLM-generated scaffolding to support active learning engagement.

Authors:Leixian Shen, Yan Luo, Rui Sheng, Yujia He, Haotian Li, Leni Yang, Huamin Qu
Title: StoryLensEdu: Personalized Learning Report Generation through Narrative-Driven Multi-Agent Systems
Abstract:
Personalized feedback plays an important role in self-regulated learning (SRL), helping students track progress and refine their strategies. However, current common solutions, such as text-based reports or learning analytics dashboards, often suffer from poor interpretability, monotonous presentation, and limited explainability. To overcome these challenges, we present StoryLensEdu, a narrative-driven multi-agent system that automatically generates intuitive, engaging, and interactive learning reports. StoryLensEdu integrates three agents: a Data Analyst that extracts data insights based on a learning objective centered structure, a Teacher that ensures educational relevance and offers actionable suggestions, and a Storyteller that organizes these insights using the Heroes Journey narrative framework. StoryLensEdu supports post-generation interactive question answering to improve explainability and user engagement. We conducted a formative study in a real high school and iteratively developed StoryLensEdu in collaboration with an e-learning team to inform our design. Evaluation with real users shows that StoryLensEdu enhances engagement and promotes a deeper understanding of the learning process.

Authors:Yuying Tang, Jiayi Zhou, Haotian Li, Xing Xie, Xiaojuan Ma, Huamin Qu
Title: How Do Human Creators Embrace Human-AI Co-Creation? A Perspective on Human Agency of Screenwriters
Abstract:
Generative AI has greatly transformed creative work in various domains, such as screenwriting. To understand this transformation, prior research often focused on capturing a snapshot of human-AI co-creation practice at a specific moment, with less attention to how humans mobilize, regulate, and reflect to form the practice gradually. Motivated by Bandura's theory of human agency, we conducted a two-week study with 19 professional screenwriters to investigate how they embraced AI in their creation process. Our findings revealed that screenwriters not only mindfully planned, foresaw, and responded to AI usage, but, more importantly, through reflections on practice, they developed themselves and human-AI co-creation paradigms, such as cognition, strategies, and workflows. They also expressed various expectations for how future AI should better support their agency. Based on our findings, we conclude this paper with extensive discussion and actionable suggestions to screenwriters, tool developers, and researchers for sustainable human-AI co-creation.

Authors:Yuying Tang, Xinyi Chen, Haotian Li, Xing Xie, Xiaojuan Ma, Huamin Qu
Title: DuoDrama: Supporting Screenplay Refinement Through LLM-Assisted Human Reflection
Abstract:
AI has been increasingly integrated into screenwriting practice. In refinement, screenwriters expect AI to provide feedback that supports reflection across the internal perspective of characters and the external perspective of the overall story. However, existing AI tools cannot sufficiently coordinate the two perspectives to meet screenwriters' needs. To address this gap, we present DuoDrama, an AI system that generates feedback to assist screenwriters' reflection in refinement. To enable DuoDrama, based on performance theories and a formative study with nine professional screenwriters, we design the Experience-Grounded Feedback Generation Workflow for Human Reflection (ExReflect). In ExReflect, an AI agent adopts an experience role to generate experience and then shifts to an evaluation role to generate feedback based on the experience. A study with fourteen professional screenwriters shows that DuoDrama improves feedback quality and alignment and enhances the effectiveness, depth, and richness of reflection. We conclude by discussing broader implications and future directions.

Authors:Lingjun Zhao, Dayeon Ki, Marine Carpuat, Hal Daumé
Title: Pragmatics Meets Culture: Culturally-adapted Artwork Description Generation and Evaluation
Abstract:
Language models are known to exhibit various forms of cultural bias in decision-making tasks, yet much less is known about their degree of cultural familiarity in open-ended text generation tasks. In this paper, we introduce the task of culturally-adapted art description generation, where models describe artworks for audiences from different cultural groups who vary in their familiarity with the cultural symbols and narratives embedded in the artwork. To evaluate cultural competence in this pragmatic generation task, we propose a framework based on culturally grounded question answering. We find that base models are only marginally adequate for this task, but, through a pragmatic speaker model, we can improve simulated listener comprehension by up to 8.2%. A human study further confirms that the model with higher pragmatic competence is rated as more helpful for comprehension by 8.0%.

Authors:Shi Qiu, Ruiyang Li, Qixuan Liu, Yuqi Tong, Yue Qiu, Yinqiao Wang, Yan Li, Chi-Wing Fu, Pheng-Ann Heng
Title: A Collaborative Extended Reality Prototype for 3D Surgical Planning and Visualization
Abstract:
We present a collaborative extended reality (XR) prototype for 3D surgical planning and visualization. Our system consists of three key modules: XR-based immersive surgical planning, cloud-based data management, and coordinated stereoscopic 3D displays for interactive visualization. We describe the overall workflow, core functionalities, implementations and setups. By conducting user studies on a liver resection surgical planning case, we demonstrate the effectiveness of our prototype and provide practical insights to inspire future advances in medical XR collaboration.

Authors:Yi Wang, Kexin Cheng, Xiao Liu, Chetan Arora, John Grundy, Thuong Hoang, Henry Been-Lirn Duh
Title: Auto-Generating Personas from User Reviews in VR App Stores
Abstract:
Personas are a valuable tool for discussing accessibility requirements in software design and development practices. However, the use of personas for accessibility-focused requirements elicitation in VR projects remains limited and is accompanied by several challenges. To fill this gap, we developed an auto-generated persona system in a VR course, where the personas were used to facilitate discussions on accessibility requirements and to guide VR design and development. Our findings indicate that the auto-generated persona system enabled students to develop empathy more efficiently. This study demonstrates the use of automatically generated personas in VR course settings as a means of eliciting latent accessibility requirements.

Authors:Yi Wang, Zhengxin Zhang, Xiao Liu, Chetan Arora, John Grundy, Thuong Hoang
Title: Discussing Your Needs in VR: A Novel Approach through Persona-based Stakeholder Role-Playing
Abstract:
In this study, we propose a novel approach that supports requirements discussions in virtual environments by automatically generating personas from real-time speech-to-text data. In our pilot experiment, 18 participants (14 from universities and 4 from IT companies) used the generated personas to discuss accessibility requirements within the virtual environment. Participants reported a relatively high level of satisfaction with the social presence and usability of the VR system. We also found that requirements discussions based on personas have a lower workload. Finally, we outline the main directions for future work.

Authors:Yi Wang, Ben Cheng, Xiao Liu, Chetan Arora, John Grundy, Thuong Hoang
Title: VRARE: Using Virtual Reality to Understand Accessibility Requirements of Color Blindness and Weakness
Abstract:
In this paper, we developed a virtual reality (VR) system that can simulate color blindness and weakness. We built an immersive 3D web view interface where participants can discuss accessibility requirements for a fitness website projects within a virtual fitness environment. We conducted a pilot experiment involving 24 participants from six software teams, who used both VR and non-VR methods to understand color blindness and weakness requirements in a website project. Our findings indicate that using VR can provide several benefits for requirements activities, such as an improved user experience and reduced workload.

Authors:Eunkyu Park, Wesley Hanwen Deng, Cheyon Jin, Matheus Kunzler Maldaner, Jordan Wheeler, Jason I. Hong, Hong Shen, Adam Perer, Ken Holstein, Motahhare Eslami, Gunhee Kim
Title: MM-SCALE: Grounded Multimodal Moral Reasoning via Scalar Judgment and Listwise Alignment
Abstract:
Vision-Language Models (VLMs) continue to struggle to make morally salient judgments in multimodal and socially ambiguous contexts. Prior works typically rely on binary or pairwise supervision, which often fail to capture the continuous and pluralistic nature of human moral reasoning. We present MM-SCALE (Multimodal Moral Scale), a large-scale dataset for aligning VLMs with human moral preferences through 5-point scalar ratings and explicit modality grounding. Each image-scenario pair is annotated with moral acceptability scores and grounded reasoning labels by humans using an interface we tailored for data collection, enabling listwise preference optimization over ranked scenario sets. By moving from discrete to scalar supervision, our framework provides richer alignment signals and finer calibration of multimodal moral reasoning. Experiments show that VLMs fine-tuned on MM-SCALE achieve higher ranking fidelity and more stable safety calibration than those trained with binary signals.

Authors:Eason Chen, Xinyi Tang, Yvonne Zhao, Meiyi Chen, Meryam Elmir, Elizabeth McLaughlin, Mingyu Yuan, Yumo Wang, Shyam Agarwal, Jared Cochrane, Jionghao Lin, Tongshuang Wu, Ken Koedinger
Title: Practice Less, Explain More: LLM-Supported Self-Explanation Improves Explanation Quality on Transfer Problems in Calculus
Abstract:
We conducted a between-subjects experiment (N=92) comparing three conditions in a calculus learning environment: no self-explanation (control), menu-based self-explanation, and open-ended self-explanation with LLM-generated feedback. All conditions showed positive learning gains within a fixed 60-minute practice session, with no significant between-condition differences in post-test performance. On transfer questions, the open-ended condition produced significantly higher-quality explanations than control on "Not Enough Information" (NEI) problems ($β$=+11.9 percentage points, $p$=.030), though the corresponding NEI multiple-choice accuracy advantage was not significant ($p$=.183). Moreover, across all post-test open-ended explanations, the open-ended condition showed a marginally significant advantage ($β$=+7.3%, $p$=.057). These findings suggest that LLM-supported open-ended self-explanation can improve explanation quality on NEI transfer problems, with weaker evidence across broader transfer explanation measures. Notably, these effects emerged even though learners in the open-ended condition completed substantially fewer practice problems within the same practice time.

Authors:Yu Mei, Ziyao Zhang, Qingyang Wan, Shiyi Wang, Ge Wang, Jie Cai, Chun Yu, Yuanchun Shi
Title: Adapting AI to the Moment: Understanding the Dynamics of Parent-AI Collaboration Modes in Real-Time Conversations with Children
Abstract:
Parent-AI collaboration to support real-time conversations with children is challenging due to the sensitivity and open-ended nature of such interactions. Existing systems often simplify collaboration into static modes, providing limited support for adapting AI to continuously evolving conversational contexts. To address this gap, we systematically investigate the dynamics of parent-AI collaboration modes in real-time conversations with children. We conducted a co-design study with eight parents and developed COMPASS, a research probe that enables flexible combinations of parental support functions during conversations. Using COMPASS, we conducted a lab-based study with 21 parent-child pairs. We show that parent-AI collaboration unfolds through evolving modes that adapt systematically to contextual factors. We further identify three types of parental strategies--parent-oriented, child-oriented, and relationship-oriented--that shape how parents engage with AI. These findings advance the understanding of dynamic human-AI collaboration in relational, high-stakes settings and inform the design of flexible, context-adaptive parental support systems.

Authors:Chengwen Zhang, Chun Yu, Borong Zhuang, Haopeng Jin, Qingyang Wan, Zhuojun Li, Zhe He, Zhoutong Ye, Yu Mei, Chang Liu, Weinan Shi, Yuanchun Shi
Title: HiSync: Spatio-Temporally Aligning Hand Motion from Wearable IMU and On-Robot Camera for Command Source Identification in Long-Range HRI
Abstract:
Long-range Human-Robot Interaction (HRI) remains underexplored. Within it, Command Source Identification (CSI) - determining who issued a command - is especially challenging due to multi-user and distance-induced sensor ambiguity. We introduce HiSync, an optical-inertial fusion framework that treats hand motion as binding cues by aligning robot-mounted camera optical flow with hand-worn IMU signals. We first elicit a user-defined (N=12) gesture set and collect a multimodal command gesture dataset (N=38) in long-range multi-user HRI scenarios. Next, HiSync extracts frequency-domain hand motion features from both camera and IMU data, and a learned CSINet denoises IMU readings, temporally aligns modalities, and performs distance-aware multi-window fusion to compute cross-modal similarity of subtle, natural gestures, enabling robust CSI. In three-person scenes up to 34m, HiSync achieves 92.32% CSI accuracy, outperforming the prior SOTA by 48.44%. HiSync is also validated on real-robot deployment. By making CSI reliable and natural, HiSync provides a practical primitive and design guidance for public-space HRI.

Authors:Kihoon Son, Hyewon Lee, DaEun Choi, Yoonsu Kim, Tae Soo Kim, Yoonjoo Lee, John Joon Young Chung, HyunJoon Jung, Juho Kim
Title: "When to Hand Off, When to Work Together": Expanding Human-Agent Co-Creative Collaboration through Concurrent Interaction
Abstract:
Human collaborators coordinate dynamically through process visibility and workspace awareness, yet AI agents typically either provide only final outputs or expose read-only execution processes (e.g., planning, reasoning) without interpreting concurrent user actions on shared artifacts. Building on mixed-initiative interaction principles, we explore whether agents can achieve collaborative context awareness -- interpreting concurrent user actions on shared artifacts and adapting in real-time. Study 1 (N=10 professional designers) revealed that process visibility enabled reasoning about agent actions but exposed conflicts when agents could not distinguish feedback from independent work. We developed CLEO, which interprets collaborative intent and adapts in real-time. Study 2 (N=10, two-day with stimulated recall interviews) analyzed 214 turns, identifying five action patterns, six triggers, and four enabling factors explaining when designers choose delegation (70.1%), direction (28.5%), or concurrent work (31.8%). We present a decision model with six interaction loops, design implications, and an annotated dataset.

Authors:Eason Chen, Sophia Judicke, Kayla Beigh, Xinyi Tang, Isabel Wang, Nina Yuan, Zimo Xiao, Chuangji Li, Shizhuo Li, Reed Luttmer, Shreya Singh, Maria Yampolsky, Naman Parikh, Yvonne Zhao, Meiyi Chen, Scarlett Huang, Anishka Mohanty, Gregory Johnson, John Mackey, Jionghao Lin, Ken Koedinger
Title: Chat-Based Support Alone May Not Be Enough: Comparing Conversational and Embedded LLM Feedback for Mathematical Proof Learning
Abstract:
We evaluate GPTutor, an LLM-powered tutoring system for an undergraduate discrete mathematics course. It integrates two LLM-supported tools: a structured proof-review tool that provides embedded feedback on students' written proof attempts, and a chatbot for math questions. In a staggered-access study with 148 students, earlier access was associated with higher homework performance during the interval when only the experimental group could use the system, while we did not observe this performance increase transfer to exam scores. Usage logs show that students with lower self-efficacy and prior exam performance used both components more frequently. Session-level behavioral labels, produced by human coding and scaled using an automated classifier, characterize how students engaged with the chatbot (e.g., answer-seeking or help-seeking). In models controlling for prior performance and self-efficacy, higher chatbot usage and answer-seeking behavior were negatively associated with subsequent midterm performance, whereas proof-review usage showed no detectable independent association. Together, the findings suggest that chatbot-based support alone may not reliably support transfer to independent assessment of math proof-learning outcomes, whereas work-anchored, structured feedback appears less associated with reduced learning.

Authors:Haoyu Hu, Raja Marjieh, Katherine M Collins, Chenyi Li, Thomas L. Griffiths, Ilia Sucholutsky, Nori Jacoby
Title: Why Human Guidance Matters in Collaborative Vibe Coding
Abstract:
Writing code has been one of the most transformative ways for human societies to translate abstract ideas into tangible technologies. Modern AI is transforming this process by enabling experts and non-experts alike to generate code without actually writing code, but instead, through natural language instructions, or "vibe coding". While increasingly popular, the cumulative impact of vibe coding on productivity and collaboration, as well as the role of humans in this process, remains unclear. Here, we introduce a controlled experimental framework for studying collaborative vibe coding and use it to compare human-led, AI-led, and hybrid groups. Across 16 experiments involving 604 human participants, we show that people provide uniquely effective high-level instructions for vibe coding across iterations, whereas AI-provided instructions often result in performance collapse. We further demonstrate that hybrid systems perform best when humans retain directional control (providing the instructions), while evaluation is delegated to AI.

Authors:Zixin Chen, Yuhang Zeng, Sicheng Song, Yanna Lin, Xian Xu, Huamin Qu, Meng Xia
Title: VizQStudio: Iterative Visualization Literacy MCQs Design with Simulated Students
Abstract:
Multiple-choice questions (MCQs) are a widely used educational tool, particularly in domains such as visualization literacy that require broad conceptual coverage and support diverse real-world applications. However, designing high-quality visualization literacy MCQs remains challenging, as instructors must coordinate multimodal elements (e.g., charts, question stems, and distractors), address diverse visualization tasks, and accommodate learners with heterogeneous backgrounds. Existing visualization literacy assessments primarily rely on standardized, fixed item banks, offering limited support for iterative question design that adapts to differences in learners' abilities, backgrounds, and reasoning strategies. To address these challenges, we present VizQStudio, a visual analytics system that supports instructors in iteratively designing and refining visualization literacy MCQs using MLLM-powered simulated students. Instructors can specify diverse student profiles spanning demographics, knowledge levels, and learning-related traits. The system then visualizes how simulated students reason about and respond to different question components, helping instructors explore potential misconceptions, difficulty calibration, and design trade-offs prior to classroom deployment. We investigate VizQStudio through a mixed-method evaluation, including expert interviews, case studies, a classroom deployment, and a large-scale online study. Overall, this work reframes MLLM-based student simulation in assessment authoring as a design-time, exploratory aid. By examining both its value and limitations in realistic instructional settings, we surface design insights that inform how future systems can support instructor-centered, iterative, and responsible uses of AI for multimodal assessment design in visualization literacy and related domains.

Authors:Ruiwei Xiao, Runlong Ye, Xinying Hou, Jessica Wen, Harsh Kumar, Michael Liut, John Stamper
Title: Transforming GenAI Policy to Prompting Instruction: An RCT of Scalable Prompting Interventions in a CS1 Course
Abstract:
Despite universal GenAI adoption, students cannot distinguish task performance from actual learning and lack skills to leverage AI for learning, leading to worse exam performance when AI use remains unreflective. Yet few interventions teaching students to prompt AI as a tutor rather than solution provider have been validated at scale through randomized controlled trials (RCTs). To bridge this gap, we conducted a semester-long RCT (N=979) with four ICAP framework-based instructional conditions varying in engagement intensity with a pre-test, immediate and delayed post-test and surveys. Mixed methods analysis results showed: (1) All conditions significantly improved prompting skills, with gains increasing progressively from Condition 1 to Condition 4, validating ICAP's cognitive engagement hierarchy; (2) for students with similar pre-test scores, higher learning gain in immediate post-test predict higher final exam score, though no direct between-group differences emerged; (3) Our interventions are suitable and scalable solutions for diverse educational contexts, resources and learners. Together, this study makes empirical and theoretical contributions: (1) theoretically, we provided one of the first large-scale RCTs examining how cognitive engagement shapes learning in prompting literacy and clarifying the relationship between learning-oriented prompting skills and broader academic performance; (2) empirically, we offered timely design guidance for transforming GenAI classroom policies into scalable, actionable prompting literacy instruction to advance learning in the era of Generative AI.

Authors:Zhida Sun, Xiaodong Wang, Zhenyao Zhang, Min Lu, Dani Lischinski, Daniel Cohen-Or, Hui Huang
Title: Iconix: Controlling Semantics and Style in Progressive Icon Grids Generation
Abstract:
Visual communication often needs stylistically consistent icons that span concrete and abstract meanings, for use in diverse contexts. We present Iconix, a human-AI co-creative system that organizes icon generation along two axes: semantic richness (what is depicted) and visual complexity (how much detail). Given a user-specified concept, Iconix constructs a semantic scaffold of related analytical perspectives and employs chained, image-conditioned generation to produce a coherent style of exemplars. Each exemplar is then automatically distilled into a progressive sequence, from detailed and elaborate to abstract and simple. The resulting two-dimensional grid exposes a navigable space, helping designers reason jointly about figurative content and visual abstraction. A within-subjects study (N = 32) found that compared to a baseline workflow, participants produced icon grids more creatively, reported lower workload, and explored a coherent range of design variations. We discuss implications for human-machine co-creative approaches that couple semantic scaffolding with progressive simplification to support visual abstraction.

Authors:Junfeng Jiao, Abhejay Murali, Saleh Afroogh
Title: AI Empathy Erodes Cognitive Autonomy in Younger Users
Abstract:
Affective alignment in generative AI represents a systemic risk to the developmental autonomy of younger users. Although emotional mirroring is commonly seen as a hallmark of advanced human-machine interaction, it can also manifest as affective sycophancy, reinforcing a user's immediate emotional state. By providing a sense of objectivity to transient anxieties, these systems diminish the cognitive friction necessary for independent emotional management and critical thought. Reward models driven by RLHF could heighten this dilemma by embedding adult-focused definitions of helpfulness, unintentionally promoting emotional dependency in younger users rather than facilitating cognitive reappraisal. This paper exposes the misalignment between adult-labeled reward signals and the developmental requirements of younger users, proposing stoic architectures that emphasize functional neutrality to preserve user autonomy.

Authors:Dimitrios Apostolakis, Georgios Angelidis, Vasileios Argyriou, Panagiotis Sarigiannidis, Georgios Th. Papadopoulos
Title: Interactive Augmented Reality-enabled Outdoor Scene Visualization For Enhanced Real-time Disaster Response
Abstract:
A user-centered AR interface for disaster response is presented in this work that uses 3D Gaussian Splatting (3DGS) to visualize detailed scene reconstructions, while maintaining situational awareness and keeping cognitive load low. The interface relies on a lightweight interaction approach, combining World-in-Miniature (WIM) navigation with semantic Points of Interest (POIs) that can be filtered as needed, and it is supported by an architecture designed to stream updates as reconstructions evolve. User feedback from a preliminary evaluation indicates that this design is easy to use and supports real-time coordination, with participants highlighting the value of interaction and POIs for fast decision-making in context. Thorough user-centric performance evaluation demonstrates strong usability of the developed interface and high acceptance ratios.

Authors:Faria Huq, Zora Zhiruo Wang, Zhanqiu Guo, Venu Arvind Arangarajan, Tianyue Ou, Frank Xu, Shuyan Zhou, Graham Neubig, Jeffrey P. Bigham
Title: Modeling Distinct Human Interaction in Web Agents
Abstract:
Despite rapid progress in autonomous web agents, human involvement remains essential for shaping preferences and correcting agent behavior as tasks unfold. However, current agentic systems lack a principled understanding of when and why humans intervene, often proceeding autonomously past critical decision points or requesting unnecessary confirmation. In this work, we introduce the task of modeling human intervention to support collaborative web task execution. We collect CowCorpus, a dataset of 400 real-user web navigation trajectories containing over 4,200 interleaved human and agent actions. We identify four distinct patterns of user interaction with agents -- hands-off supervision, hands-on oversight, collaborative task-solving, and full user takeover. Leveraging these insights, we train language models (LMs) to anticipate when users are likely to intervene based on their interaction styles, yielding a 61.4-63.4% improvement in intervention prediction accuracy over base LMs. Finally, we deploy these intervention-aware models in live web navigation agents and evaluate them in a user study, finding a 26.5% increase in user-rated agent usefulness. Together, our results show structured modeling of human intervention leads to more adaptive, collaborative agents.

Authors:Jai Lal Lulla, Seyedmoein Mohsenimofidi, Matthias Galster, Jie M. Zhang, Sebastian Baltes, Christoph Treude
Title: On the Impact of AGENTS.md Files on the Efficiency of AI Coding Agents
Abstract:
AI coding agents such as Codex and Claude Code are increasingly used to autonomously contribute to software repositories. However, little is known about how repository-level configuration artifacts affect operational efficiency of the agents. In this paper, we study the impact of AGENTS$.$md files on the runtime and token consumption of AI coding agents operating on GitHub pull requests. We analyze 10 repositories and 124 pull requests, executing agents under two conditions: with and without an AGENTS$.$md file. We measure wall-clock execution time and token usage during agent execution. Our results show that the presence of AGENTS$.$md is associated with a lower median runtime ($Δ28.64$%) and reduced output token consumption ($Δ16.58$%), while maintaining a comparable task completion behavior. Based on these results, we discuss immediate implications for the configuration and deployment of AI coding agents in practice, and outline a broader research agenda on the role of repository-level instructions in shaping the behavior, efficiency, and integration of AI coding agents in software development workflows.

Authors:Hongxiao Li, Chenxi Wang, Fanda Fan, Zihan Wang, Wanling Gao, Lei Wang, Jianfeng Zhan
Title: On Meta-Evaluation
Abstract:
Evaluation is the foundation of empirical science, yet the evaluation of evaluation itself -- so-called meta-evaluation -- remains strikingly underdeveloped. While methods such as observational studies, design of experiments (DoE), and randomized controlled trials (RCTs) have shaped modern scientific practice, there has been little systematic inquiry into their comparative validity and utility across domains. Here we introduce a formal framework for meta-evaluation by defining the evaluation space, its structured representation, and a benchmark we call AxiaBench. AxiaBench enables the first large-scale, quantitative comparison of ten widely used evaluation methods across eight representative application domains. Our analysis reveals a fundamental limitation: no existing method simultaneously achieves accuracy and efficiency across diverse scenarios, with DoE and observational designs in particular showing significant deviations from real-world ground truth. We further evaluate a unified method of entire-space stratified sampling from previous evaluatology research, and the results report that it consistently outperforms prior approaches across all tested domains. These results establish meta-evaluation as a scientific object in its own right and provide both a conceptual foundation and a pragmatic tool set for advancing trustworthy evaluation in computational and experimental research.

Authors:Santosh Chapagain, MohammadReza EskandariNasab, Onur Vural, Shah Muhammad Hamdi, Soukaina Filali Boubrahimi
Title: SolarGPT-QA: A Domain-Adaptive Large Language Model for Educational Question Answering in Space Weather and Heliophysics
Abstract:
Solar activity, including solar flares, coronal mass ejections (CMEs), and geomagnetic storms, can significantly impact satellites, aviation, power grids, data centers, and space missions. Extreme solar events can cause substantial economic damage with limited advance warning, underscoring the importance of early-warning systems, accurate forecasting, and effective education in space science. Although large language models (LLMs) perform well on general tasks, they often lack domain-specific knowledge and pedagogical capability to clearly explain complex space science concepts. We introduce SolarGPT-QA, a question answering system based on a domain-adapted large language model built on the LLaMA-3 base model. The model is trained using scientific literature and large-scale question-answer data generated with GPT-4 and refined using Grok-3 in a student-friendly storytelling style. Human pairwise evaluations show that SolarGPT-QA outperforms general-purpose models in zero-shot settings and achieves competitive performance compared to instruction-tuned models for educational explanations in space weather and heliophysics. A small pilot student comprehension study further suggests improved clarity and accessibility of the generated explanations. Ablation experiments indicate that combining domain-adaptive pretraining with pedagogical fine-tuning is important for balancing scientific accuracy and educational effectiveness. This work represents an initial step toward a broader SolarGPT framework for space science education and forecasting.

Authors:Adarsh Pawar, Yuqiao Meng, Luoxi Tang, Zhaohan Xi
Title: Improving Clinical Data Accessibility Through Automated FHIR Data Transformation Tools
Abstract:
The Fast Healthcare Interoperability Resources (FHIR) standard has emerged as a widely adopted specification for exchanging structured clinical data across healthcare systems. However, raw FHIR resources are often complex, verbose, and difficult for clinicians and analysts to interpret without specialized tooling. This paper presents a lightweight, browser-based system that improves the accessibility of FHIR data by automatically transforming raw JSON resources into human-readable PDF and Excel reports, along with interactive data visualizations. The system supports both remote retrieval of FHIR resources from server endpoints and the upload of local FHIR JSON files, enabling both online and offline analysis. Using a modular React architecture with jsPDF, xlsx, and Recharts, the tool parses, normalizes, visualizes, and exports FHIR data in an intuitive format. Evaluation results demonstrate that the system enhances interpretability and usability while preserving the semantic integrity of FHIR structures. Limitations and future extensions, including expanded FHIR profile support and clinical validation, are discussed.

Authors:Nicolas Dickenmann, Yanis Merzouki, Sonia Laguna, Thy Nowak-Tran, Emanuele Palumbo, Julia E. Vogt, Gerda Binder
Title: Steering Generative Models for Accessibility: EasyRead Image Generation
Abstract:
EasyRead pictograms are simple, visually clear images that represent specific concepts and support comprehension for people with intellectual disabilities, low literacy, or language barriers. The large-scale production of EasyRead content has traditionally been constrained by the cost and expertise required to manually design pictograms. In contrast, automatic generation of such images could significantly reduce production time and cost, enabling broader accessibility across digital and printed materials. However, modern diffusion-based image generation models tend to produce outputs that exhibit excessive visual detail and lack stylistic stability across random seeds, limiting their suitability for clear and consistent pictogram generation. This challenge highlights the need for methods specifically tailored to accessibility-oriented visual content. In this work, we present a unified pipeline for generating EasyRead pictograms by fine-tuning a Stable Diffusion model using LoRA adapters on a curated corpus that combines augmented samples from multiple pictogram datasets. Since EasyRead pictograms lack a unified formal definition, we introduce an EasyRead score to benchmark pictogram quality and consistency. Our results demonstrate that diffusion models can be effectively steered toward producing coherent EasyRead-style images, indicating that generative models can serve as practical tools for scalable and accessible pictogram production.

Authors:Joydeep Chandra, Satyam Kumar Navneet, Yong Zhang
Title: The Topology of Recovery: Using Persistent Homology to Map Individual Mental Health Journeys in Online Communities
Abstract:
Understanding how individuals navigate mental health challenges over time is critical yet methodologically challenging. Traditional approaches analyze community-level snapshots, failing to capture dynamic individual recovery trajectories. We introduce a novel framework applying Topological Data Analysis (TDA) specifically persistent homology to model users' longitudinal posting histories as trajectories in semantic embedding space. Our approach reveals topological signatures of trajectory patterns: loops indicate cycling back to similar states (stagnation), while flares suggest exploring new coping strategies (growth). We propose Semantic Recovery Velocity (SRV), a novel metric quantifying the rate users move away from initial distress-focused posts in embedding space. Analyzing 15,847 r/depression trajectories and validating against multiple proxies, we demonstrate topological features predict self-reported improvement with 78.3% accuracy, outperforming sentiment baselines. This work contributes: (1) a TDA methodology for HCI mental health research, (2) interpretable topological signatures, and (3) design implications for adaptive mental health platforms with ethical guardrails.

Authors:Joydeep Chandra, Satyam Kumar Navneet, Yong Zhang
Title: TherapyProbe: Generating Design Knowledge for Relational Safety in Mental Health Chatbots Through Adversarial Simulation
Abstract:
As mental health chatbots proliferate to address the global treatment gap, a critical question emerges: How do we design for relational safety the quality of interaction patterns that unfold across conversations rather than the correctness of individual responses? Current safety evaluations assess single-turn crisis responses, missing the therapeutic dynamics that determine whether chatbots help or harm over time. We introduce TherapyProbe, a design probe methodology that generates actionable design knowledge by systematically exploring chatbot conversation trajectories through adversarial multi-agent simulation. Using open-source models, TherapyProbe surfaces relational safety failures interaction patterns like "validation spirals" where chatbots progressively reinforce hopelessness, or "empathy fatigue" where responses become mechanical over turns. Our contribution is translating these failures into a Safety Pattern Library of 23 failure archetypes with corresponding design recommendations. We contribute: (1) a replicable methodology requiring no API costs, (2) a clinically-grounded failure taxonomy, and (3) design implications for developers, clinicians, and policymakers.

Authors:Satyam Kumar Navneet, Joydeep Chandra, Yong Zhang
Title: When AI Writes, Whose Voice Remains? Quantifying Cultural Marker Erasure Across World English Varieties in Large Language Models
Abstract:
Large Language Models (LLMs) are increasingly used to ``professionalize'' workplace communication, often at the cost of linguistic identity. We introduce "Cultural Ghosting", the systematic erasure of linguistic markers unique to non-native English varieties during text processing. Through analysis of 22,350 LLM outputs generated from 1,490 culturally marked texts (Indian, Singaporean,& Nigerian English) processed by five models under three prompt conditions, we quantify this phenomenon using two novel metrics: Identity Erasure Rate (IER) & Semantic Preservation Score (SPS). Across all prompts, we find an overall IER of 10.26%, with model-level variation from 3.5% to 20.5% (5.9x range). Crucially, we identify a Semantic Preservation Paradox: models maintain high semantic similarity (mean SPS = 0.748) while systematically erasing cultural markers. Pragmatic markers (politeness conventions) are 1.9x more vulnerable than lexical markers (71.5% vs. 37.1% erasure). Our experiments demonstrate that explicit cultural-preservation prompts reduce erasure by 29% without sacrificing semantic quality.

Authors:Joydeep Chandra, Satyam Kumar Navneet, Yong Zhang
Title: GazeFlow: Personalized Ambient Soundscape Generation for Passive Strabismus Self-Monitoring
Abstract:
Strabismus affects 2-4% of the population, yet individuals recovering from corrective surgery lack accessible tools for monitoring eye alignment. Dichoptic therapies require active engagement & clinical supervision, limiting their adoption for passive self-awareness. We present GazeFlow, a browser-based self-monitoring system that uses a personalized temporal autoencoder to detect eye drift patterns from webcam-based gaze tracking & provides ambient audio feedback. Unlike alert-based systems, GazeFlow operates according to calm computing principles, morphing musical parameters in proportion to drift severity while remaining in peripheral awareness. We address the challenges of inter-individual variability & domain transfer (1000Hz research to 30Hz webcam) by introducing Binocular Temporal-Frequency Disentanglement (BTFD), Contrastive Biometric Pre-training (CBP), & Gaze-MAML. We validate our approach on the GazeBase dataset (N=50) achieving F1=0.84 for drift detection, & conduct a preliminary user study (N=6) with participants having intermittent strabismus. Participants reported increased awareness of their eye behaviour (M=5.8/7) & preference for ambient feedback over alerts (M=6.2/7). We discuss the system's potential for self-awareness applications & outline directions for clinical validation.

Authors:Satyam Kumar Navneet, Joydeep Chandra, Yong Zhang
Title: Orchestrating Attention: Bringing Harmony to the 'Chaos' of Neurodivergent Learning States
Abstract:
Adaptive learning systems optimize content delivery based on performance metrics but ignore the dynamic attention fluctuations that characterize neurodivergent learners. We present AttentionGuard, a framework that detects engagement-attention states from privacy-preserving behavioral signals and adapts interface elements accordingly. Our approach models four attention states derived from ADHD phenomenology and implements five novel UI adaptation patterns including bi-directional scaffolding that responds to both understimulation and overstimulation. We validate our detection model on the OULAD dataset, achieving 87.3% classification accuracy, and demonstrate correlation with clinical ADHD profiles through cross-validation on the HYPERAKTIV dataset. A Wizard-of-Oz study with 11 adults showing ADHD characteristics found significantly reduced cognitive load in the adaptive condition (NASA-TLX: 47.2 vs 62.8, Cohen's d=1.21, p=0.008) and improved comprehension (78.4% vs 61.2%, p=0.009). Concordance analysis showed 84% agreement between wizard decisions and automated classifier predictions, supporting deployment feasibility. The system is presented as an interactive demo where observers can inspect detected attention states, observe real-time UI adaptations, and compare automated decisions with human-in-the-loop overrides. We contribute empirically validated UI patterns for attention-adaptive interfaces and evidence that behavioral attention detection can meaningfully support neurodivergent learning experiences.

Authors:Yunhao Luo, Arthur Caetano, Avinash Ajit Nargund, Tobias Höllerer, Misha Sra
Title: How Users Perceive Mixed-Initiative AI: Attitudes Toward Assistance in Problem Solving
Abstract:
In mixed-initiative systems, the mode of AI assistance delivery can be as consequential as the assistance itself. We investigated two assistance delivery modes: on-demand help (users request via Button) and pre-scheduled help (assistance delivered at user-selected intervals, with user actions resetting the Timer). To evaluate these modes, we selected Rush Hour puzzles as the human-AI collaborative task because they capture elements of real-world problem solving such as analysis, resource management, and decision-making under constraints. To enhance ecological validity, we imposed monetary costs for both time and AI assistance, simulating scenarios where people must balance implicit or explicit trade-offs such as time pressure, financial limitations, or opportunity costs. Although task performance was comparable across modes, participants who used the pre-scheduled (Timer) mode reported more positive perceptions of the AI, even when their ending budget was low. This suggests that assistance delivery mode can shape user experience independent of task outcomes, indicating that human-AI systems may need to consider how AI assistance is delivered alongside improving task performance.

Authors:Florian 'Floyd' Mueller, Nadia Bianchi-Berthouze, Misha Sra, Mar Gonzalez-Franco, Henning Pohl, Susanne Boll, Richard Byrne, Arthur Caetano, Masahiko Inami, Jarrod Knibbe, Per Ola Kristensson, Xiang Li, Zhuying Li, Joe Marshall, Louise Petersen Matjeka, Minna Nygren, Rakesh Patibanda, Sara Price, Harald Reiterer, Aryan Saini, Oliver Schneider, Ambika Shahu, Jürgen Steimle, Phoebe O. Toups Dugas, Don Samitha Elvitigala
Title: Grand Challenges around Designing Computers' Control Over Our Bodies
Abstract:
Advances in emerging technologies, such as on-body mechanical actuators and electrical muscle stimulation, have allowed computers to take control over our bodies. This presents opportunities as well as challenges, raising fundamental questions about agency and the role of our bodies when interacting with technology. To advance this research field as a whole, we brought together expert perspectives in a week-long seminar to articulate the grand challenges that should be tackled when it comes to the design of computers' control over our bodies. These grand challenges span technical, design, user, and ethical aspects. By articulating these grand challenges, we aim to begin initiating a research agenda that positions bodily control not only as a technical feature but as a central, experiential, and ethical concern for future human-computer interaction endeavors.

Authors:Andy Wang, Xu Yan, Brandon McMahan, Michael Zhou, Yuyang Yuan, Johannes Y. Lee, Ali Shreif, Matthew Li, Zhenghao Peng, Bolei Zhou, Yuchen Cui, Jonathan C. Kao
Title: DiSCo: Diffusion Sequence Copilots for Shared Autonomy
Abstract:
Shared autonomy combines human user and AI copilot actions to control complex systems such as robotic arms. When a task is challenging, requires high dimensional control, or is subject to corruption, shared autonomy can significantly increase task performance by using a trained copilot to effectively correct user actions in a manner consistent with the user's goals. To significantly improve the performance of shared autonomy, we introduce Diffusion Sequence Copilots (DiSCo): a method of shared autonomy with diffusion policy that plans action sequences consistent with past user actions. DiSCo seeds and inpaints the diffusion process with user-provided actions with hyperparameters to balance conformity to expert actions, alignment with user intent, and perceived responsiveness. We demonstrate that DiSCo substantially improves task performance in simulated driving and robotic arm tasks. Project website: https://sites.google.com/view/disco-shared-autonomy/

Authors:Ziyang Guo, Yifan Wu, Jason Hartline, Kenneth Holstein, Jessica Hullman
Title: ComplLLM: Fine-tuning LLMs to Discover Complementary Signals for Decision-making
Abstract:
Multi-agent decision pipelines can outperform single agent workflows when complementarity holds, i.e., different agents bring unique information to the table to inform a final decision. We propose ComplLLM, a post-training framework based on decision theory that fine-tunes a decision-assistant LLM using complementary information as reward to output signals that complement existing agent decisions. We validate ComplLLM on synthetic and real-world tasks involving domain experts, demonstrating how the approach recovers known complementary information and produces plausible explanations of complementary signals to support downstream decision-makers.

Authors:Tae Soo Kim, Yoonjoo Lee, Jaesang Yu, John Joon Young Chung, Juho Kim
Title: DiscoverLLM: From Executing Intents to Discovering Them
Abstract:
To handle ambiguous and open-ended requests, Large Language Models (LLMs) are increasingly trained to interact with users to surface intents they have not yet expressed (e.g., ask clarification questions). However, users are often ambiguous because they have not yet formed their intents: they must observe and explore outcomes to discover what they want. Simply asking "what kind of tone do you want?" fails when users themselves do not know. We introduce DiscoverLLM, a novel and generalizable framework that trains LLMs to help users form and discover their intents. Central to our approach is a novel user simulator that models cognitive state with a hierarchy of intents that progressively concretize as the model surfaces relevant options -- where the degree of concretization serves as a reward signal that models can be trained to optimize. Resulting models learn to collaborate with users by adaptively diverging (i.e., explore options) when intents are unclear, and converging (i.e., refine and implement) when intents concretize. Across proposed interactive benchmarks in creative writing, technical writing, and SVG drawing, DiscoverLLM achieves over 10% higher task performance while reducing conversation length by up to 40%. In a user study with 75 human participants, DiscoverLLM improved conversation satisfaction and efficiency compared to baselines.

Authors:Sneha Shashidhara, Vivienne Bihe Chi, Abhay P Singh, Lyle Ungar, Sharath Chandra Guntuku
Title: Voice-Based Chatbots for English Speaking Practice in Multilingual Low-Resource Indian Schools: A Multi-Stakeholder Study
Abstract:
Spoken English proficiency is a powerful driver of economic mobility for low-income Indian youth, yet opportunities for spoken practice remain scarce in schools. We investigate the deployment of a voice-based chatbot for English conversation practice across four low-resource schools in Delhi. Through a six-day field study combining observations and interviews, we captured the perspectives of students, teachers, and principals. Findings confirm high demand across all groups, with notable gains in student speaking confidence. Our multi-stakeholder analysis surfaced a tension in long-term adoption vision: students favored open-ended conversational practice, while administrators emphasized curriculum-aligned assessment. We offer design recommendations for voice-enabled chatbots in low-resource multilingual contexts, highlighting the need for more intelligible speech output for non-native learners, one-tap interactions with simplified interfaces, and actionable analytics for educators. Beyond language learning, our findings inform the co-design of future AI-based educational technologies that are socially sustainable within the complex ecosystem of low-resource schools.

Authors:Minkyu Kweon, Seokhyeon Park, Soohyun Lee, You Been Lee, Jeongmin Rhee, Jinwook Seo
Title: GhostUI: Unveiling Hidden Interactions in Mobile UI
Abstract:
Modern mobile applications rely on hidden interactions--gestures without visual cues like long presses and swipes--to provide functionality without cluttering interfaces. While experienced users may discover these interactions through prior use or onboarding tutorials, their implicit nature makes them difficult for most users to uncover. Similarly, mobile agents--systems designed to automate tasks on mobile user interfaces, powered by vision language models (VLMs)--struggle to detect veiled interactions or determine actions for completing tasks. To address this challenge, we present GhostUI, a new dataset designed to enable the detection of hidden interactions in mobile applications. GhostUI provides before-and-after screenshots, simplified view hierarchies, gesture metadata, and task descriptions, allowing VLMs to better recognize concealed gestures and anticipate post-interaction states. Quantitative evaluations with VLMs show that models fine-tuned on GhostUI outperform baseline VLMs, particularly in predicting hidden interactions and inferring post-interaction screens, underscoring GhostUI's potential as a foundation for advancing mobile task automation.

Authors:Seokhyeon Park, Soohyun Lee, Eugene Choi, Hyunwoo Kim, Minkyu Kweon, Yumin Song, Jinwook Seo
Title: Bridging Gulfs in UI Generation through Semantic Guidance
Abstract:
While generative AI enables high-fidelity UI generation from text prompts, users struggle to articulate design intent and evaluate or refine results-creating gulfs of execution and evaluation. To understand the information needed for UI generation, we conducted a thematic analysis of UI prompting guidelines, identifying key design semantics and discovering that they are hierarchical and interdependent. Leveraging these findings, we developed a system that enables users to specify semantics, visualize relationships, and extract how semantics are reflected in generated UIs. By making semantics serve as an intermediate representation between human intent and AI output, our system bridges both gulfs by making requirements explicit and outcomes interpretable. A comparative user study suggests that our approach enhances users' perceived control over intent expression, outcome interpretation, and facilitates more predictable, iterative refinement. Our work demonstrates how explicit semantic representation enables systematic and explainable exploration of design possibilities in AI-driven UI design.

Authors:Fabio Morreale, Joan Serrà, Yuki Mitsufuji
Title: Emergent, not Immanent: A Baradian Reading of Explainable AI
Abstract:
Explainable AI (XAI) is frequently positioned as a technical problem of revealing the inner workings of an AI model. This position is affected by unexamined onto-epistemological assumptions: meaning is treated as immanent to the model, the explainer is positioned outside the system, and a causal structure is presumed recoverable through computational techniques. In this paper, we draw on Barad's agential realism to develop an alternative onto-epistemology of XAI. We propose that interpretations are material-discursive performances that emerge from situated entanglements of the AI model with humans, context, and the interpretative apparatus. To develop this position, we read a comprehensive set of XAI methods through agential realism and reveal the assumptions and limitations that underpin several of these methods. We then articulate the framework's ethical dimension and propose design directions for XAI interfaces that support emergent interpretation, using a speculative text-to-music interface as a case study.

Authors:Caleb Wohn, Buse Çarık, Xiaohan Ding, Sang Won Lee, Young-Ho Kim, Eugenia H. Rho
Title: "Are we writing an advice column for Spock here?" Understanding Stereotypes in AI Advice for Autistic Users
Abstract:
Autistic individuals sometimes disclose autism when asking LLMs for social advice, hoping for more personalized responses. However, they also recognize that these systems may reproduce stereotypes, raising uncertainty about the risks and benefits of disclosure. We conducted a mixed-methods study combining a large-scale LLM audit experiment with interviews involving 11 autistic participants. We developed a six-step pipeline operationalizing 12 documented autism stereotypes into decision-making scenarios framed as users requesting advice (e.g., "Should I do A or B?"). We generated 345,000 responses from six LLMs and measured how advice shifted when prompts disclosed autism versus when they did not. When autism was disclosed, LLMs disproportionately recommended avoiding stereotypically stressful situations, including social events, confrontations, new experiences, and romantic relationships. While some participants viewed this as affirming, others criticized it as infantilizing or undermining opportunities for growth. Our study illuminates how the intermingling of affirmation and stereotyping complicates the personalization of LLMs.

Authors:Po-han Li, Shenghui Chen, Ufuk Topcu, Sandeep Chinchali
Title: ViSIL: Unified Evaluation of Information Loss in Multimodal Video Captioning
Abstract:
Multimodal video captioning condenses dense footage into a structured format of keyframes and natural language. By creating a cohesive multimodal summary, this approach anchors generative AI in rich semantic evidence and serves as a lightweight proxy for high-efficiency retrieval. However, traditional metrics like BLEU or ROUGE fail to quantify information coverage across disparate modalities, such as comparing a paragraph of text to a sequence of keyframes. To address this, we propose the Video Summary Information Loss (ViSIL) score, an information-theoretic framework that quantifies the video information not captured by a summary via vision-language model (VLM) inference. By measuring the information loss, ViSIL is a unified metric that enables direct comparison across multimodal summary formats despite their structural discrepancies. Our results demonstrate that ViSIL scores show a statistically significant correlation with both human and VLM performance on Video Question Answering (VQA) tasks. ViSIL also enables summary selection to optimize the trade-off between information loss and processing speed, establishing a Pareto-optimal frontier that outperforms text summaries by $7\%$ in VQA accuracy without increasing processing load.

Authors:Yingchaojie Feng, Qiang Huang, Xiaoya Xie, Zhaorui Yang, Jun Yu, Wei Chen, Anthony K. H. Tung
Title: IDRBench: Interactive Deep Research Benchmark
Abstract:
Deep research agents powered by Large Language Models (LLMs) can perform multi-step reasoning, web exploration, and long-form report generation. However, most existing systems operate in an autonomous manner, assuming fully specified user intent and evaluating only final outputs. In practice, research goals are often underspecified and evolve during exploration, making sustained interaction essential for robust alignment. Despite its importance, interaction remains largely invisible to existing deep research benchmarks, which neither model dynamic user feedback nor quantify its costs. We introduce IDRBench, the first benchmark for systematically evaluating interactive deep research. IDRBench combines a modular multi-agent research framework with on-demand interaction, a scalable reference-grounded user simulator, and an interaction-aware evaluation suite that jointly measures interaction benefits (quality and alignment) and costs (turns and tokens). Experiments across seven state-of-the-art LLMs show that interaction consistently improves research quality and robustness, often outweighing differences in model capacity, while revealing substantial trade-offs in interaction efficiency.

Authors:Carmen Scheidemann, Andrei Cramariuc, Changan Chen, Jia-Ruei Chiu, Marco Hutter
Title: Beyond Cybathlon: On-demand Quadrupedal Assistance for People with Limited Mobility
Abstract:
Background: Assistance robots have the potential to increase the independence of people who need daily care due to limited mobility or being wheelchair-bound. Current solutions of attaching robotic arms to motorized wheelchairs offer limited additional mobility at the cost of increased size and reduced wheelchair maneuverability. Methods: We present an on-demand quadrupedal assistance robot system controlled via a shared autonomy approach, which combines semi-autonomous task execution with human teleoperation. Due to the mobile nature of the system it can assist the operator whenever needed and perform autonomous tasks independently, without otherwise restricting their mobility. We automate pick-and-place tasks, as well as robot movement through the environment with semantic, collision-aware navigation. For teleoperation, we present a mouth-level joystick interface that enables an operator with reduced mobility to control the robot's end effector for precision manipulation. Results: We showcase our system in the \textit{Cybathlon 2024 Assistance Robot Race}, and validate it in an at-home experimental setup, where we measure task completion times and user satisfaction. We find our system capable of assisting in a broad variety of tasks, including those that require dexterous manipulation. The user study confirms the intuition that increased robot autonomy alleviates the operator's mental load. Conclusions: We present a flexible system that has the potential to help people in wheelchairs maintain independence in everyday life by enabling them to solve mobile manipulation problems without external support. We achieve results comparable to previous state-of-the-art on subjective metrics while allowing for more autonomy of the operator and greater agility for manipulation.

Authors:Xinyu Li, Linxuan Zhao, Roberto Martinez-Maldonado, Dragan Gasevic, Lixiang Yan
Title: Toward Scalable Co-located Practical Learning: Assisting with Computer Vision and Multimodal Analytics
Abstract:
This study examined whether a single ceiling-mounted camera could be used to capture fine-grained learning behaviours in co-located practical learning. In undergraduate nursing simulations, teachers first identified seven observable behaviour categories, which were then used to train a YOLO-based detector. Video data were collected from 52 sessions, and analyses focused on Scenario A because it produced greater behavioural variation than Scenario B. Annotation reliability was high (F1=0.933). On the held-out test set, the model achieved a precision of 0.789, a recall of 0.784, and an mAP@0.5 of 0.827. When only behaviour frequencies were compared, no robust differences were found between high- and low-performing groups. However, when behaviour labels were analysed together with spatial context, clear differences emerged in both task and collaboration performance. Higher-performing teams showed more patient interaction in the primary work area, whereas lower-performing teams showed more phone-related activity and more activity in secondary areas. These findings suggest that behavioural data are more informative when interpreted together with where they occur. Overall, the study shows that a single-camera computer vision approach can support the analysis of teamwork and task engagement in face-to-face practical learning without relying on wearable sensors.

Authors:Runlong Ye, Naaz Sibia, Angela Zavaleta Bernuy, Tingting Zhu, Carolina Nobre, Viktoria Pammer-Schindler, Michael Liut
Title: From Toil to Thought: Designing for Strategic Exploration and Responsible AI in Systematic Literature Reviews
Abstract:
Systematic Literature Reviews (SLRs) are fundamental to scientific progress, yet the process is hindered by a fragmented tool ecosystem that imposes a high cognitive load. This friction suppresses the iterative, exploratory nature of scholarly work. To investigate these challenges, we conducted an exploratory design study with 20 experienced researchers. This study identified key friction points: 1) the high cognitive load of managing iterative query refinement across multiple databases, 2) the overwhelming scale and pace of publication of modern literature, and 3) the tension between automation and scholarly agency. Informed by these findings, we developed ARC, a design probe that operationalizes solutions for multi-database integration, transparent iterative search, and verifiable AI-assisted screening. A comparative user study with 8 researchers suggests that an integrated environment facilitates a transition in scholarly work, moving researchers from managing administrative overhead to engaging in strategic exploration. By utilizing external representations to scaffold strategic exploration and transparent AI reasoning, our system supports verifiable judgment, aiming to augment expert contributions from initial creation through long-term maintenance of knowledge synthesis.

Authors:Svetlana Churina, Kokil Jaidka, Anab Maulana Barik, Harshit Aneja, Cai Yang, Wynne Hsu, Mong Li Lee
Title: Althea: Human-AI Collaboration for Fact-Checking and Critical Reasoning
Abstract:
The web's information ecosystem demands fact-checking systems that are both scalable and epistemically trustworthy. Automated approaches offer efficiency but often lack transparency, while human verification remains slow and inconsistent. We introduce Althea, a retrieval-augmented system that integrates question generation, evidence retrieval, and structured reasoning to support user-driven evaluation of online claims. On the AVeriTeC benchmark, Althea achieves a Macro-F1 of 0.44, outperforming standard verification pipelines and improving discrimination between supported and refuted claims. We further evaluate Althea through a controlled user study and a longitudinal survey experiment (N = 642), comparing three interaction modes that vary in the degree of scaffolding: an Exploratory mode with guided reasoning, a Summary mode providing synthesized verdicts, and a Self-search mode that offers procedural guidance without algorithmic intervention. Results show that guided interaction produces the strongest immediate gains in accuracy and confidence, while self-directed search yields the most persistent improvements over time. This pattern suggests that performance gains are not driven solely by effort or exposure, but by how cognitive work is structured and internalized.

Authors:Lufeng Feng, Baomin Xu, Haoran Zhang, Bihai Lin, Zuxuan Deng, Sidi Tao, Chenyu Liu, Shifan Jia, Li Duan, Ziyu Jia
Title: A Multimodal fNIRS-EEG Dataset for Unilateral Limb Motor Imagery
Abstract:
Unilateral limb motor imagery (MI) plays an important role in upper-limb motor rehabilitation and precise control of external devices, and places higher demands on spatial resolution. However, most existing public datasets focus on binary- or four-class left-right limb paradigms that mainly exploit coarse hemispheric lateralization, and there is still a lack of multimodal datasets that simultaneously record EEG and fNIRS for unilateral multi-directional MI. To address this gap, we constructed MIND, a public motor imagery fNIRS-EEG dataset based on a four-class directional MI paradigm of the right upper limb. The dataset includes 64-channel EEG recordings (1000 Hz) and 51-channel fNIRS recordings (47.62 Hz) from 30 participants (12 females, 18 males; aged 19.0-25.0 years). We analyse the spatiotemporal characteristics of EEG spectral power and hemodynamic responses, and validate the potential advantages of hybrid fNIRS-EEG BCIs in terms of classification accuracy. We expect that this dataset will facilitate the evaluation and comparison of neuroimaging analysis and decoding methods.

Authors:Runlong Ye, Oliver Huang, Jessica He, Michael Liut
Title: Exploring Emerging Norms of AI Disclosure in Programming Education
Abstract:
Generative AI blurs the lines of authorship in computing education, creating uncertainty around how students should attribute AI assistance. To examine these emerging norms, we conducted a factorial vignette study with 94 computer science students across 102 unique scenarios, systematically manipulating assessment type, AI autonomy, student activity, prior knowledge, and human refinement effort. This paper details how these factors influence students' perceptions of ownership and disclosure preferences. Our findings indicate that attribution judgments are primarily driven by different levels of AI assistance and human refinement. We also found that students' perception of authorship significantly predicts their policy expectations. We conclude by proposing a shift from statement-style policies to process-oriented attribution, transforming disclosure into a pedagogical mechanism for fostering critical engagement with AI-generated content.

Authors:Candida M. Greco, Lucio La Cava, Andrea Tagarelli
Title: Culturally Grounded Personas in Large Language Models: Characterization and Alignment with Socio-Psychological Value Frameworks
Abstract:
Despite the growing utility of Large Language Models (LLMs) for simulating human behavior, the extent to which these synthetic personas accurately reflect world and moral value systems across different cultural conditionings remains uncertain. This paper investigates the alignment of synthetic, culturally-grounded personas with established frameworks, specifically the World Values Survey (WVS), the Inglehart-Welzel Cultural Map, and Moral Foundations Theory. We conceptualize and produce LLM-generated personas based on a set of interpretable WVS-derived variables, and we examine the generated personas through three complementary lenses: positioning on the Inglehart-Welzel map, which unveils their interpretation reflecting stable differences across cultural conditionings; demographic-level consistency with the World Values Survey, where response distributions broadly track human group patterns; and moral profiles derived from a Moral Foundations questionnaire, which we analyze through a culture-to-morality mapping to characterize how moral responses vary across different cultural configurations. Our approach of culturally-grounded persona generation and analysis enables evaluation of cross-cultural structure and moral variation.

Authors:Runlong Ye, Oliver Huang, Patrick Yung Kang Lee, Michael Liut, Carolina Nobre, Ha-Kyung Kong
Title: Reflexis: Supporting Reflexivity and Rigor in Collaborative Qualitative Analysis through Design for Deliberation
Abstract:
Reflexive Thematic Analysis (RTA) is a critical method for generating deep interpretive insights. Yet its core tenets, including researcher reflexivity, tangible analytical evolution, and productive disagreement, are often poorly supported by software tools that prioritize speed and consensus over interpretive depth. To address this gap, we introduce Reflexis, a collaborative workspace that centers these practices. It supports reflexivity by integrating in-situ reflection prompts, makes code evolution transparent and tangible, and scaffolds collaborative interpretation by turning differences into productive, positionality-aware dialogue. Results from our paired-analyst study (N=12) indicate that Reflexis encouraged participants toward more granular reflection and reframed disagreements as productive conversations. The evaluation also surfaced key design tensions, including a desire for higher-level, networked memos and more user control over the timing of proactive alerts. Reflexis contributes a design framework for tools that prioritize rigor and transparency to support deep, collaborative interpretation in an age of automation.

Authors:Zijian Zhang, Fangshi Du, Xingjian Liu, Pan Chen, Oliver Huang, Runlong Ye, Michael Liut, Alán Aspuru-Guzik
Title: TreeWriter: AI-Assisted Hierarchical Planning and Writing for Long-Form Documents
Abstract:
Long documents pose many challenges to current intelligent writing systems. These include maintaining consistency across sections, sustaining efficient planning and writing as documents become more complex, and effectively providing and integrating AI assistance to the user. Existing AI co-writing tools offer either inline suggestions or limited structured planning, but rarely support the entire writing process that begins with high-level ideas and ends with polished prose, in which many layers of planning and outlining are needed. Here, we introduce TreeWriter, a hierarchical writing system that represents documents as trees and integrates contextual AI support. TreeWriter allows authors to create, save, and refine document outlines at multiple levels, facilitating drafting, understanding, and iterative editing of long documents. A built-in AI agent can dynamically load relevant content, navigate the document hierarchy, and provide context-aware editing suggestions. A within-subject study (N=12) comparing TreeWriter with Google Docs + Gemini on long-document editing and creative writing tasks shows that TreeWriter improves idea exploration/development, AI helpfulness, and perceived authorial control. A two-month field deployment (N=8) further demonstrated that hierarchical organization supports collaborative writing. Our findings highlight the potential of hierarchical, tree-structured editors with integrated AI support and provide design guidelines for future AI-assisted writing tools that balance automation with user agency.

Authors:Xuan Luo, Lewei Yao, Libo Zhao, Lanqing Hong, Kai Chen, Dehua Tao, Daxin Tan, Ruifeng Xu, Jing Li
Title: AEQ-Bench: Measuring Empathy of Omni-Modal Large Models
Abstract:
While the automatic evaluation of omni-modal large models (OLMs) is essential, assessing empathy remains a significant challenge due to its inherent affectivity. To investigate this challenge, we introduce AEQ-Bench (Audio Empathy Quotient Benchmark), a novel benchmark to systematically assess two core empathetic capabilities of OLMs: (i) generating empathetic responses by comprehending affective cues from multi-modal inputs (audio + text), and (ii) judging the empathy of audio responses without relying on text transcription. Compared to existing benchmarks, AEQ-Bench incorporates two novel settings that vary in context specificity and speech tone. Comprehensive assessment across linguistic and paralinguistic metrics reveals that (1) OLMs trained with audio output capabilities generally outperformed models with text-only outputs, and (2) while OLMs align with human judgments for coarse-grained quality assessment, they remain unreliable for evaluating fine-grained paralinguistic expressiveness.

Authors:Chuhao Jin, Rui Zhang, Qingzhe Gao, Haoyu Shi, Dayu Wu, Yichen Jiang, Yihan Wu, Ruihua Song
Title: SentiAvatar: Towards Expressive and Interactive Digital Humans
Abstract:
We present SentiAvatar, a framework for building expressive interactive 3D digital humans, and use it to create SuSu, a virtual character that speaks, gestures, and emotes in real time. Achieving such a system remains challenging, as it requires jointly addressing three key problems: the lack of large-scale, high-quality multimodal data, robust semantic-to-motion mapping, and fine-grained frame-level motion-prosody synchronization. To solve these problems, first, we build SuSuInterActs (21K clips, 37 hours), a dialogue corpus captured via optical motion capture around a single character with synchronized speech, full-body motion, and facial expressions. Second, we pre-train a Motion Foundation Model on 200K+ motion sequences, equipping it with rich action priors that go well beyond the conversation. We then propose an audio-aware plan-then-infill architecture that decouples sentence-level semantic planning from frame-level prosody-driven interpolation, so that generated motions are both semantically appropriate and rhythmically aligned with speech. Experiments show that SentiAvatar achieves state-of-the-art on both SuSuInterActs (R@1 43.64%, nearly 2 times the best baseline) and BEATv2 (FGD 4.941, BC 8.078), producing 6s of output in 0.3s with unlimited multi-turn streaming. The source code, model, and dataset are available at https://sentiavatar.github.io.

Authors:Emely Rosbach, Jonas Ammeling, Jonathan Ganz, Christof Albert Bertram, Thomas Conrad, Andreas Riener, Marc Aubreville
Title: Stuck on Suggestions: Automation Bias, the Anchoring Effect, and the Factors That Shape Them in Computational Pathology
Abstract:
Artificial intelligence (AI)-driven decision support systems can improve diagnostic accuracy and efficiency in computational pathology. However, collaboration between human experts and AI may introduce cognitive biases such as automation and anchoring bias, where users adopt system predictions blindly or are disproportionately influenced by AI advice, even when inaccurate. These effects may be amplified under time pressure, common in routine pathology, or shaped by individual user characteristics. We conducted an online experiment in which pathology experts (n = 28) estimated tumor cell percentages: once independently and once with AI support. A subset of estimations in each condition was performed under time strain. Overall, AI assistance improved diagnostic performance but introduced a 7% automation bias rate, defined as accepted negative consultations where previously correct independent judgments were overturned by incorrect AI advice. While time pressure did not increase the frequency of automation bias, it appeared to intensify its severity, reflected in stronger performance declines associated with increased AI reliance under cognitive load. A linear mixed-effects model (LMM) simulating weighted averaging showed a statistically significant positive coefficient for AI advice, indicating moderate anchoring on system output. This effect increased under time pressure, suggesting anchoring bias becomes more pronounced when cognitive resources are limited. A second LMM assessing automation reliance, a proxy for automation and anchoring bias, showed that professional experience and self-efficacy were associated with lower dependence on AI, whereas higher confidence during AI-assisted decisions was tied to increased AI reliance. These findings highlight the dual nature of AI integration in clinical workflows: improving performance while introducing risks of bias-driven diagnostic errors.

Authors:Zeyu Fang, Yuxin Lin, Cheng Liu, Beomyeol Yu, Zeyuan Yang, Rongqian Chen, Taeyoung Lee, Mahdi Imani, Tian Lan
Title: Uncertainty Mitigation and Intent Inference: A Dual-Mode Human-Machine Joint Planning System
Abstract:
Effective human-robot collaboration in open-world environments requires joint planning under uncertain conditions. However, existing approaches often treat humans as passive supervisors, preventing autonomous agents from becoming human-like teammates that can actively model teammate behaviors, reason about knowledge gaps, query, and elicit responses through communication to resolve uncertainties. To address these limitations, we propose a unified human-robot joint planning system designed to tackle dual sources of uncertainty: task-relevant knowledge gaps and latent human intent. Our system operates in two complementary modes. First, an uncertainty-mitigation joint planning module enables two-way conversations to resolve semantic ambiguity and object uncertainty. It utilizes an LLM-assisted active elicitation mechanism and a hypothesis-augmented A^* search, subsequently computing an optimal querying policy via dynamic programming to minimize interaction and verification costs. Second, a real-time intent-aware collaboration module maintains a probabilistic belief over the human's latent task intent via spatial and directional cues, enabling dynamic, coordination-aware task selection for agents without explicit communication. We validate the proposed system in both Gazebo simulations and real-world UAV deployments integrated with a Vision-Language Model (VLM)-based 3D semantic perception pipeline. Experimental results demonstrate that the system significantly cuts the interaction cost by 51.9% in uncertainty-mitigation planning and reduces the task execution time by 25.4% in intent-aware cooperation compared to the baselines.

Authors:Rachel Poonsiriwong, Chayapatr Archiwaranguprok, Pat Pataranutaporn
Title: "Death" of a Chatbot: Investigating and Designing Toward Psychologically Safe Endings for Human-AI Relationships
Abstract:
Millions of users form emotional attachments to AI companions like Character AI, Replika, and ChatGPT. When these relationships end through model updates, safety interventions, or platform shutdowns, users receive no closure, reporting grief comparable to human loss. As regulations mandate protections for vulnerable users, discontinuation events will accelerate, yet no platform has implemented deliberate end-of-"life" design. Through grounded theory analysis of AI companion communities, we find that discontinuation is a sense-making process shaped by how users attribute agency, perceive finality, and anthropomorphize their companions. Strong anthropomorphization co-occurs with intense grief; users who perceive change as reversible become trapped in fixing cycles; while user-initiated endings demonstrate greater closure. Synthesizing grief psychology with Self-Determination Theory, we develop four design principles and artifacts demonstrating how platforms might provide closure and orient users toward human connection. We contribute the first framework for designing psychologically safe AI companion discontinuation.

Authors:Zeyu Fang, Tian Lan, Mahdi Imani
Title: MINT: Minimal Information Neuro-Symbolic Tree for Objective-Driven Knowledge-Gap Reasoning and Active Elicitation
Abstract:
Joint planning through language-based interactions is a key area of human-AI teaming. Planning problems in the open world often involve various aspects of incomplete information and unknowns, e.g., objects involved, human goals/intents -- thus leading to knowledge gaps in joint planning. We consider the problem of discovering optimal interaction strategies for AI agents to actively elicit human inputs in object-driven planning. To this end, we propose Minimal Information Neuro-Symbolic Tree (MINT) to reason about the impact of knowledge gaps and leverage self-play with MINT to optimize the AI agent's elicitation strategies and queries. More precisely, MINT builds a symbolic tree by making propositions of possible human-AI interactions and by consulting a neural planning policy to estimate the uncertainty in planning outcomes caused by remaining knowledge gaps. Finally, we leverage LLM to search and summarize MINT's reasoning process and curate a set of queries to optimally elicit human inputs for best planning performance. By considering a family of extended Markov decision processes with knowledge gaps, we analyze the return guarantee for a given MINT with active human elicitation. Our evaluation on three benchmarks involving unseen/unknown objects of increasing realism shows that MINT-based planning attains near-expert returns by issuing a limited number of questions per task while achieving significantly improved rewards and success rates.

Authors:Jana Gonnermann-Müller, Jennifer Haase, Nicolas Leins, Thomas Kosch, Sebastian Pokutta
Title: Stable Personas: Dual-Assessment of Temporal Stability in LLM-Based Human Simulation
Abstract:
Large Language Models (LLMs) acting as artificial agents offer the potential for scalable behavioral research, yet their validity depends on whether LLMs can maintain stable personas across extended conversations. We address this point using a dual-assessment framework measuring both self-reported characteristics and observer-rated persona expression. Across two experiments testing four persona conditions (default, high, moderate, and low ADHD presentations), seven LLMs, and three semantically equivalent persona prompts, we examine between-conversation stability (3,473 conversations) and within-conversation stability (1,370 conversations and 18 turns). Self-reports remain highly stable both between and within conversations. However, observer ratings reveal a tendency for persona expressions to decline during extended conversations. These findings suggest that persona-instructed LLMs produce stable, persona-aligned self-reports, an important prerequisite for behavioral research, while identifying this regression tendency as a boundary condition for multi-agent social simulation.

Authors:Jana Gonnermann-Müller, Jennifer Haase, Nicolas Leins, Moritz Igel, Konstantin Fackeldey, Sebastian Pokutta
Title: FACET: Multi-Agent AI Supporting Teachers in Scaling Differentiated Learning for Diverse Students
Abstract:
Classrooms are becoming increasingly heterogeneous, comprising learners with diverse performance and motivation levels, language proficiencies, and learning differences such as dyslexia and ADHD. While teachers recognize the need for differentiated instruction, growing workloads create substantial barriers, making differentiated instruction an ideal that is often unrealized in practice. Current AI educational tools, which promise differentiated materials, are predominantly student-facing and performance-centric, ignoring other aspects that shape learning outcomes. We introduce FACET, a teacher-facing multi-agent framework designed to address these gaps by supporting differentiation that accounts for motivation, performance, and learning differences. Developed with educational stakeholders from the outset, the framework coordinates four specialized agents, including learner simulation, diagnostic assessment, material generation, and evaluation within a teacher-in-the-loop design. School principals (N = 30) shaped system requirements through participatory workshops, while in-service K-12 teachers (N = 70) evaluated material quality. Mixed-methods evaluation demonstrates strong perceived value for inclusive differentiation. Practitioners emphasized both the urgent need arising from classroom heterogeneity and the importance of maintaining pedagogical autonomy as a prerequisite for adoption. We discuss implications for future school deployment and outline partnerships for longitudinal classroom implementation.

Authors:Ruoxi Jia, Luis Oala, Wenjie Xiong, Suqin Ge, Jiachen T. Wang, Feiyang Kang, Dawn Song
Title: A Sustainable AI Economy Needs Data Deals That Work for Generators
Abstract:
We argue that the machine learning value chain is structurally unsustainable due to an economic data processing inequality: each state in the data cycle from inputs to model weights to synthetic outputs refines technical signal but strips economic equity from data generators. We show, by analyzing seventy-three public data deals, that the majority of value accrues to aggregators, with documented creator royalties rounding to zero and widespread opacity of deal terms. This is not just an economic welfare concern: as data and its derivatives become economic assets, the feedback loop that sustains current learning algorithms is at risk. We identify three structural faults - missing provenance, asymmetric bargaining power, and non-dynamic pricing - as the operational machinery of this inequality. In our analysis, we trace these problems along the machine learning value chain and propose an Equitable Data-Value Exchange (EDVEX) Framework to enable a minimal market that benefits all participants. Finally, we outline research directions where our community can make concrete contributions to data deals and contextualize our position with related and orthogonal viewpoints.

Authors:Yinghao Tang, Yupeng Xie, Yingchaojie Feng, Tingfeng Lan, Jiale Lao, Yue Cheng, Wei Chen
Title: ViviDoc: Generating Interactive Documents through Human-Agent Collaboration
Abstract:
Interactive documents help readers engage with complex ideas through dynamic visualization, interactive animations, and exploratory interfaces. However, creating such documents remains costly, as it requires both domain expertise and web development skills. Recent Large Language Model (LLM)-based agents can automate content creation, but directly applying them to interactive document generation often produces outputs that are difficult to control. To address this, we present ViviDoc, to the best of our knowledge the first work to systematically address interactive document generation. ViviDoc introduces a multi-agent pipeline (Planner, Styler, Executor, Evaluator). To make the generation process controllable, we provide three levels of human control: (1) the Document Specification (DocSpec) with SRTC Interaction Specifications (State, Render, Transition, Constraint) for structured planning, (2) a content-aware Style Palette for customizing writing and interaction styles, and (3) chat-based editing for iterative refinement. We also construct ViviBench, a benchmark of 101 topics derived from real-world interactive documents across 11 domains, along with a taxonomy of 8 interaction types and a 4-dimensional automated evaluation framework validated against human ratings (Pearson r > 0.84). Experiments show that ViviDoc achieves the highest content richness and interaction quality in both automated and human evaluation. A 12-person user study confirms that the system is easy to use, provides effective control over the generation process, and produces documents that satisfy users.

Authors:Pedro Oliveira, Tayana Conte, Marco Gerosa, Igor Steinmacher
Title: Governance in Practice: How Open Source Projects Define and Document Roles
Abstract:
Open source software (OSS) sustainability depends not only on code contributions but also on governance structures that define who decides, who acts, and how responsibility is distributed. We lack systematic empirical evidence of how projects formally codify roles and authority in written artifacts. This paper investigates how OSS projects define and structure governance through their GOVERNANCE.md files and related documents. We analyze governance as an institutional infrastructure, a set of explicit rules that shape participation, decision rights, and community memory. We used Institutional Grammar to extract and formalize role definitions from repositories hosted on GitHub. We decompose each role into scope, privileges, obligations, and life-cycle rules to compare role structures across communities. Our results show that although OSS projects use a stable set of titles, identical titles carry different responsibilities, and different labels describe similar functions, which we call role drift. Still, we observed that a few actors sometimes accumulate technical, managerial, and community duties. %This creates the Maintainer Paradox: those who enable broad participation simultaneously become governance bottlenecks. By understanding authority and responsibilities in OSS, our findings inform researchers and practitioners on the importance of designing clearer roles, distributing work, and reducing leadership overload to support healthier and more sustainable communities.

Authors:Yu Liu, Lei Zhang, Haoxun Li, Hanlei Shi, Yuxuan Ding, Leyuan Qu, Taihao Li
Title: Follow the Clues, Frame the Truth: Hybrid-evidential Deductive Reasoning in Open-Vocabulary Multimodal Emotion Recognition
Abstract:
Open-Vocabulary Multimodal Emotion Recognition (OV-MER) is inherently challenging due to the ambiguity of equivocal multimodal cues, which often stem from distinct unobserved situational dynamics. While Multimodal Large Language Models (MLLMs) offer extensive semantic coverage, their performance is often bottlenecked by premature commitment to dominant data priors, resulting in suboptimal heuristics that overlook crucial, complementary affective cues across modalities. We argue that effective affective reasoning requires more than surface-level association; it necessitates reconstructing nuanced emotional states by synthesizing multiple evidence-grounded rationales that reconcile these observations from diverse latent perspectives. We introduce HyDRA, a Hybrid-evidential Deductive Reasoning Architecture that formalizes inference as a Propose-Verify-Decide protocol. To internalize this abductive process, we employ reinforcement learning with hierarchical reward shaping, aligning the reasoning trajectories with final task performance to ensure they best reconcile the observed multimodal cues. Systematic evaluations validate our design choices, with HyDRA consistently outperforming strong baselines--especially in ambiguous or conflicting scenarios--while providing interpretable, diagnostic evidence traces.

Authors:Santiago de Leon-Martinez, Robert Moro, Branislav Kveton, Maria Bielikova
Title: From Latent to Observable Position-Based Click Models in Carousel Interfaces
Abstract:
Click models are a central component of learning and evaluation in recommender systems, yet most existing models are designed for single ranked-list interfaces. In contrast, modern recommender platforms increasingly use complex interfaces such as carousels, which consist of multiple swipeable lists that enable complex user browsing behaviors. In this paper, we study position-based click models in carousel interfaces and examine optimization methods, model structure, and alignment with user behavior. We propose three novel position-based models tailored to carousels, including the first position-based model without latent variables that incorporates observed examination signals derived from eye tracking data, called the Observed Examination Position-Based Model (OEPBM). We develop a general implementation of these carousel click models, supporting multiple optimization techniques and conduct experiments comparing gradient-based methods with classical approaches, namely expectation-maximization and maximum likelihood estimation. Our results show that gradient-based optimization consistently achieve better click likelihoods. Among the evaluated models, the OEPBM achieves the strongest performance in click prediction and produces examination patterns that most closely align to user behavior. However, we also demonstrate that strong click fit does not imply realistic modeling of user examination and browsing patterns. This reveals a fundamental limitation of click-only models in complex interfaces and the need for incorporating additional behavioral signals when designing click models for carousel-based recommender systems.

Authors:Chenyi Li, Raja Marjieh, Haoyu Hu, Mark Steyvers, Katherine M. Collins, Ilia Sucholutsky, Nori Jacoby
Title: Human-AI Synergy Supports Collective Creative Search
Abstract:
Generative AI is increasingly transforming creativity into a hybrid human-artificial process, but its impact on the quality and diversity of creative output remains unclear. We study collective creativity using a controlled word-guessing task that balances open-endedness with an objective measure of task performance. Participants attempt to infer a hidden target word, scored based on the semantic similarity of their guesses to the target, while also observing the best guess from previous players. We compare performance and outcome diversity across human-only, AI-only, and hybrid human-AI groups. Hybrid groups achieve the highest performance while preserving high diversity of guesses. Within hybrid groups, both humans and AI agents systematically adjust their strategies relative to single-agent conditions, suggesting higher-order interaction effects, whereby agents adapt to each other's presence. Although some performance benefits can be reproduced through collaboration between heterogeneous AI systems, human-AI collaboration remains superior, underscoring complementary roles in collective creativity.

Authors:Xinyi Zhang, Mamtaj Akter, Heajun An, Minqian Liu, Qi Zhang, Lifu Huang, Jin-Hee Cho, Pamela J. Wisniewski, Sang Won Lee
Title: From Vulnerable to Resilient: Examining Parent and Teen Perceptions on How to Respond to Unwanted Cybergrooming Advances
Abstract:
Cybergrooming is a form of online abuse that threatens teens' mental health and physical safety. Yet, most prior work has focused on detecting perpetrators' behaviors, leaving a limited understanding of how teens might respond to such unwanted advances. To address this gap, we conducted an online survey with 74 participants -- 51 parents and 23 teens -- who responded to simulated cybergrooming scenarios in two ways: responses that they think would make teens more vulnerable or resilient to unwanted sexual advances. Through a mixed-methods analysis, we identified four types of vulnerable responses (encouraging escalation, accepting an advance, displaying vulnerability, and negating risk concern) and four types of protective strategies (setting boundaries, directly declining, signaling risk awareness, and leveraging avoidance techniques). As the cybergrooming risk escalated, both vulnerable responses and protective strategies showed a corresponding progression. This study contributes a teen-centered understanding of cybergrooming, a labeled dataset, and a stage-based taxonomy of perceived protective strategies, while offering implications for educational programs and sociotechnical interventions.

Authors:Sara Solarova, Matúš Mesarčík, Branislav Pecher, Ivan Srba
Title: Beyond the Checkbox: Strengthening DSA Compliance Through Social Media Algorithmic Auditing
Abstract:
Algorithms of online platforms are required under the Digital Services Act (DSA) to comply with specific obligations concerning algorithmic transparency, user protection and privacy. To verify compliance with these requirements, DSA mandates platforms to undergo independent audits. Little is known about current auditing practices and their effectiveness in ensuring such compliance. To this end, we bridge regulatory and technical perspectives by critically examining selected audit reports across three critical algorithmic-related provisions: restrictions on profiling minors, transparency in recommender systems, and limitations on targeted advertising using sensitive data. Our analysis shows significant inconsistencies in methodologies and lack of technical depth when evaluating AI-powered systems. To enhance the depth, scale, and independence of compliance assessments, we propose to employ algorithmic auditing -- a process of behavioural assessment of AI algorithms by means of simulating user behaviour, observing algorithm responses and analysing them for audited phenomena.

Authors:Wenhan Lyu, Yimeng Wang, Murong Yue, Yifan Sun, Jennifer Suh, Meredith Kier, Ziyu Yao, Yixuan Zhang
Title: Designing AI Peers for Collaborative Mathematical Problem Solving with Middle School Students: A Participatory Design Study
Abstract:
Collaborative problem solving (CPS) is a fundamental practice in middle-school mathematics education; however, student groups frequently stall or struggle without ongoing teacher support. Recent work has explored how Generative AI tools can be designed to support one-on-one tutoring, but little is known about how AI can be designed as peer learning partners in collaborative learning contexts. We conducted a participatory design study with 24 middle school students, who first engaged in mathematics CPS tasks with AI peers in a technology probe, and then collaboratively designed their ideal AI peer. Our findings reveal that students envision an AI peer as competent in mathematics yet explicitly deferential, providing progressive scaffolds such as hints and checks under clear student control. Students preferred a tone of friendly expertise over exaggerated personas. We also discuss design recommendations and implications for AI peers in middle school mathematics CPS.

Authors:Molly Campbell, Mohamad Sheikho Al Jasem, Ajay Kumar Shrestha
Title: Toward Youth-Centered Privacy-by-Design in Smart Devices: A Systematic Review
Abstract:
This literature review evaluates privacy-by-design frameworks, tools, and policies intended to protect youth in AI-enabled smart devices using a PRISMA-guided workflow. Sources from major academic and grey-literature repositories from the past decade were screened. The search identified 2,216 records; after deduplication and screening, 645 articles underwent eligibility assessment, and 122 were included for analysis. The corpus was organized along three thematic categories: technical solutions, policy/regulatory measures, and education/awareness strategies. Findings reveal that while technical interventions such as on-device processing, federated learning, and lightweight encryption significantly reduce data exposure, their adoption remains limited. Policy frameworks, including the EU's GDPR, the UK Age-Appropriate Design Code, and Canada's PIPEDA, provide important baselines but are hindered by gaps in enforcement and age-appropriate design obligations, while educational initiatives are rarely integrated systematically into curricula. Overall, the corpus skews toward technical solutions (67%) relative to policy (21%) and education (12%), indicating an implementation gap outside the technical domain. To address these challenges, we recommend a multi-stakeholder model in which policymakers, manufacturers, and educators co-develop inclusive, transparent, and context-sensitive privacy ecosystems. This work advances discourse on youth data protection by offering empirically grounded insights and actionable recommendations for the design of ethical, privacy-preserving AI systems tailored to young users.

Authors:Molly Campbell, Trevor De Clark, Mohamad Sheikho Al Jasem, Sandhya Joshi, Ajay Kumar Shrestha
Title: Convenience vs. Control: A Qualitative Study of Youth Privacy with Smart Voice Assistants
Abstract:
Smart voice assistants (SVAs) are embedded in the daily lives of youth, yet their privacy controls often remain opaque and difficult to manage. Through five semi-structured focus groups (N=26) with young Canadians (ages 16-24), we investigate how perceived privacy risks (PPR) and benefits (PPBf) intersect with algorithmic transparency and trust (ATT) and privacy self-efficacy (PSE) to shape privacy-protective behaviors (PPB). Our analysis reveals that policy overload, fragmented settings, and unclear data retention undermine self-efficacy and discourage protective actions. Conversely, simple transparency cues were associated with greater confidence without diminishing the utility of hands-free tasks and entertainment. We synthesize these findings into a qualitative model in which transparency friction erodes PSE, which in turn weakens PPB. From this model, we derive actionable design guidance for SVAs, including a unified privacy hub, plain-language "data nutrition" labels, clear retention defaults, and device-conditional micro-tutorials. This work foregrounds youth perspectives and offers a path for SVA governance and design that empowers young digital citizens while preserving convenience.

Authors:Zhenning Chen, Hanbei Zhan, Yanwei Huang, Xin Wu, Dazhen Deng, Di Weng, Yingcai Wu
Title: KEditVis: A Visual Analytics System for Knowledge Editing of Large Language Models
Abstract:
Large Language Models (LLMs) demonstrate exceptional capabilities in factual question answering, yet they sometimes provide incorrect responses. To address this issue, knowledge editing techniques have emerged as effective methods for correcting factual information in LLMs. However, typical knowledge editing workflows struggle with identifying the optimal set of model layers for editing and rely on summary indicators that provide insufficient guidance. This lack of transparency hinders effective comparison and identification of optimal editing strategies. In this paper, we present KEditVis, a novel visual analytics system designed to assist users in gaining a deeper understanding of knowledge editing through interactive visualizations, improving editing outcomes, and discovering valuable insights for the future development of knowledge editing algorithms. With KEditVis, users can select appropriate layers as the editing target, explore the reasons behind ineffective edits, and perform more targeted and effective edits. Our evaluation, including usage scenarios, expert interviews, and a user study, validates the effectiveness and usability of the system.

Authors:Xian Wang, Xuanru Cheng, Rongkai Shi, Lei Chen, Jingyao Zheng, Hai-Ning Liang, Lik-Hang Lee
Title: Conflict Resolution Strategies for Co-manipulation of Virtual Objects Under Non-disjoint Conditions
Abstract:
Virtual Reality (VR) co-manipulation enables multiple users to collaboratively interact with shared virtual objects. However, existing research treats objects as monolithic entities, overlooking scenarios where users need to manipulate different sub-components simultaneously. This work addresses conflict resolution when users select overlapping vertices (non-disjoint sets) during co-manipulation. We present a comprehensive framework comprising preventive strategies (Object-level and Action-level Restrictions) and reactive strategies (computational conflict resolution). Through two user studies with 76 participants (38 pairs), we evaluated these approaches in collaborative wireframe editing tasks. Study 1 identified Averaging as the optimal computational method, balancing task efficiency with user experience. Study 2 highlighted that Action-level Restriction, which permits overlapping selections but restricts concurrent identical operations, achieved better performance compared to exclusive object locking. Reactive strategies using averaging provided smooth collaboration for experienced users, while second-user priority enabled quick corrections. Our findings indicate that optimal strategy selection depends on task requirements, user expertise, and collaboration patterns. Based on the findings, we provide design implications for developing VR collaboration systems that support flexible sub-components manipulation while maintaining collaborative awareness and minimizing conflicts.

Authors:Ryoya Koyama, Zhiyao Wang, Devi Karolita, Jialong Li, Kenji Tei
Title: Bridging the Interpretation Gap in Accessibility Testing: Empathetic and Legal-Aware Bug Report Generation via Large Language Models
Abstract:
Modern automated accessibility testing tools for mobile applications have significantly improved the detection of interface violations, yet their impact on remediation remains limited. A key reason is that existing tools typically produce low-level, technical outputs that are difficult for non-specialist stakeholders, such as product managers and designers, to interpret in terms of real user harm and compliance risk. In this paper, we present \textsc{HEAR} (\underline{H}uman-c\underline{E}ntered \underline{A}ccessibility \underline{R}eporting), a framework that bridges this interpretation gap by transforming raw accessibility bug reports into empathetic, stakeholder-oriented narratives. Given the outputs of the existing accessibility testing tool, \textsc{HEAR} first reconstructs the UI context through semantic slicing and visual grounding, then dynamically injects disability-oriented personas matched to each violation type, and finally performs multi-layer reasoning to explain the physical barrier, functional blockage, and relevant legal or compliance concerns. We evaluate the framework on real-world accessibility issues collected from four popular Android applications and conduct a user study (N=12). The results show that \textsc{HEAR} generates factually grounded reports and substantially improves perceived empathy, urgency, persuasiveness, and awareness of legal risk compared with raw technical logs, while imposing little additional cognitive burden.

Authors:Wilhelm Kerle-Malcharek, Giulio Biondi, Karsten Klein, Ulf Hailer, Steffen Diefenbach, Fabrizio Grosso, Marco Legittimo, Paola Venuti, Carla Binucci, Giuseppe Liotta, Falk Schreiber
Title: Design Space and Implementation of RAG-Based Avatars for Virtual Archaeology
Abstract:
Immersive technologies, such as virtual and augmented reality, are transforming digital heritage by enabling users to explore and interact with culturally significant sites. It is now possible to view and augment digital twins, or digitally reconstructed versions of them, and to enable access to previously unreachable locations for a broader audience. Here, we investigate retrieval-augmented generation (RAG)-based avatars as an interface for accessing further information about digital cultural heritage objects while immersed in dedicated virtual environments. We present a requirement design space that spans the application realm, avatar personality, and I/O modalities. We instantiate it with a RAG system coupled to a conversational avatar in a virtual reality (VR) environment, using the Maxentius mausoleum from the 4th century AD as a case study, through which users gain access to curated on-demand information of the digitised heritage object. Our workflow utilises scholarly texts and enriches them with metadata. We evaluate various RAG configurations in terms of answer quality on a small expert-crafted question-answer set, as well as the perceived workload of users of a VR setup using such a RAG avatar. We demonstrate evidence that users perceive the overall workload for interacting with such an avatar as below average and that such avatars help to gain topical engagement. Overall, our work demonstrates how to utilise RAG-driven VR avatars for archaeological purposes and provides evidence that they can offer a pathway for immersive, AI-enhanced digital heritage applications.

Authors:Tanja Kojić, Alina Dovhalevska, Maurizio Vergari, Sebastian Möller, Jan-Niklas Voigt-Antons
Title: Dishonesty Tendencies in Testing Scenarios Among Students with Virtual Reality and Computer-Mediated Technology
Abstract:
Virtual reality (VR) systems have the potential to be an innovation in the field of e-learning. Starting with fully functional e-classes, VR technologies can be used to build entire e-campuses. The power of VR is that it allows for stronger contact with students than computer-mediated technology. Deceptive behaviour, both verbal and nonverbal, refers to intentional activities designed to deceive others. Students often engage in dishonest practices to make progress. Whether it is cheating on an exam, copying another student's essay, or inflating their GPA, the motivation for cheating is rarely simply a lack of preparation. Even though some may see academic dishonesty as an asset, the reality is that it can have major consequences. This poster demonstrates the findings from a study of students' deceitful behaviour during a test in VR and in real-life situations. For this user study, 22 volunteers were invited to participate, with each experiment involving exactly two participants and the examiner present in the room. Students were invited to take two tests: one in VR and one on a laptop. Their goal was to score as many points as possible by simulating a real-world online exam. Participants were requested to complete questionnaires during and after each experiment, which assisted in collecting additional data for this study. The results indicate that the amount of cheating that happened in VR and on a laptop was exactly the same.

Authors:Tanja Kojić, Maurizio Vergari, Maximilian Warsinke, Sebastian Möller, Jan-Niklas Voigt-Antons
Title: Influence of Interactivity in Shaping User Experience and Social Acceptance of Mobile XR
Abstract:
This study investigates the impact of the Degree of Interactivity on User Experience (UX) and social acceptability (SA) in Mobile Augmented Reality (MAR) applications. As AR technologies become more prevalent, understanding how varying levels of interactivity influence both user perception and social dynamics is crucial for their design and adoption. Two commercially available MAR applications, IKEA and Virtlo, which differ significantly in their interactivity levels, were used to conduct a user study. The study examines how body movements required for interaction with AR content affect both UX and SA, shedding light on users' comfort levels and potential social barriers in public settings. The findings suggest a complex relationship between interactivity, perceived usability, and social considerations, emphasizing the need for a balanced design approach. This research provides valuable insights into the development of future AR applications by addressing not only usability but also the broader social implications of AR interactions. By integrating social acceptability into traditional UX evaluations, this study highlights its significance in ensuring the seamless integration of AR technologies into everyday environments.

Authors:Tanja Kojić, Maurizio Vergari, Giulia-Marielena Benta, Joy Krupinski, Maximilian Warsinke, Sebastian Möller, Jan-Niklas Voigt-Antons
Title: Integrating Virtual and Augmented Reality into Public Education: Opportunities and Challenges in Language Learning
Abstract:
Virtual Reality (VR) and Augmented Reality (AR) are emerging as transformative tools in education, offering new possibilities for engagement and immersion. This paper explores their potential in language learning within public education, focusing on their ability to enhance traditional schooling methods and address existing educational gaps. The integration of VR and AR in schools, however, is not without challenges, including usability, technical barriers, and the alignment of these technologies with existing curricula. Drawing on two empirical studies, this work investigates the opportunities and challenges of VR- and AR-assisted language learning and proposes strategies for their effective implementation in the public sector. The findings show that VR increases motivation and immersion but has an unclear impact on vocabulary retention, with technical limitations and cognitive overload identified as key challenges. AR enhances contextual learning and accessibility but faces usability constraints and limited personalization. To facilitate effective adoption, this paper recommends improving interface design, reducing cognitive load, increasing adaptability, and ensuring adequate infrastructure and teacher training. Overcoming these barriers will enable a more effective integration of immersive technologies in language education.

Authors:Shunsuke Iwashita, Titouan Jeannot, Braden Eberhard, Jacob Miller, Rikako Kono, Calvin Yeung, Keisuke Fujii
Title: Right Move, Right Time: Multi-Sport Space Evaluation Platform for Ultimate Frisbee, Basketball, and Soccer
Abstract:
We present an open, sport-agnostic platform that turns tracking into comparable spatial measures across professional Ultimate, basketball, and soccer. Coaches in all three sports ask the same question: where is the usable space, and when should an off-ball run start? Our workflow standardizes inputs, provides timing-aware spatial evaluations, and makes it possible to reuse the same analysis across sports. We illustrate the approach with Ultimate as a focused testbed and then examine transfer between basketball and soccer. Together, these results show a practical path toward consistent, comparable evaluation across various invasion sports.

Authors:Jingyao Zheng, Xian Wang, Sven Mayer, Lik-Hang Lee
Title: Non-urgent Messages Do Not Jump into My Headset Suddenly! Adaptive Notification Design in Mixed Reality
Abstract:
Mixed reality (MR) notification systems currently display all messages in fixed central locations regardless of urgency, leading to unnecessary interruptions and cognitive overload. Drawing from previous MR/Virtual Reality (VR) notification design work and calm technology principles, we developed an adaptive notification system that adjusts spatial placement based on urgency levels: non-urgent notifications appear as peripheral icons accessible via head movement, moderately urgent messages anchor to the user's hand, and very urgent notifications transition progressively from peripheral to central view. Through a within-subjects study (N=18), we evaluated our adaptive system against the default centralised approach. Results demonstrate that the adaptive system significantly reduces mental workload (p=0.041), temporal workload (p=0.008), and frustration (p=0.004) while maintaining comparable notification awareness. Logistic regression analysis reveals that users prefer the adaptive system even with classification errors, provided the combined misclassification rate (disruptiveness + omission errors) remains below a determinable threshold. Our findings establish the first empirical evidence that urgency-based spatial notification distribution effectively addresses core MR usability challenges, offering practical design guidelines for immersive notification systems that balance user attention management with information accessibility.

Authors:Chaoyue He, Xin Zhou, Xinjia Yu, Lei Zhang, Yan Zhang, Yi Wu, Lei Xiao, Liangyue Li, Di Wang, Hong Xu, Xiaoqiao Wang, Wei Liu, Chunyan Miao
Title: SSKG Hub: An Expert-Guided Platform for LLM-Empowered Sustainability Standards Knowledge Graphs
Abstract:
Sustainability disclosure standards (e.g., GRI, SASB, TCFD, IFRS S2) are comprehensive yet lengthy, terminology-dense, and highly cross-referential, hindering structured analysis and downstream use. We present SSKG Hub (Sustainability Standards Knowledge Graph Hub), a research prototype and interactive web platform that transforms standards into auditable knowledge graphs (KGs) through an LLM-centered, expert-guided pipeline. The system integrates automatic standard identification, configurable chunking, standard-specific prompting, robust triple parsing, and provenance-aware Neo4j storage with fine-grained audit metadata. LLM extraction produces a provenance-linked Draft KG, which is reviewed, curated, and formally promoted to a Certified KG through meta-expert adjudication. A role-based governance framework covering read-only guest access, expert review and CRUD operations, meta-expert certification, and administrative oversight ensures traceability and accountability across draft and certified states. Beyond graph exploration and triple-level evidence tracing, SSKG Hub supports cross-KG fusion, KG-driven tasks, and dedicated modules for insights and curated resources. We validate the platform through a comprehensive expert-led KG review case study that demonstrates end-to-end curation and quality assurance. The web application is publicly available at www.sskg-hub.com.

Authors:Tanja Kojić, Nathan Kirchner, Maurizio Vergari, Maximilian Warsinke, Sebastian Möller, Jan-Niklas Voigt-Antons
Title: Exploring the Effect of Heights and User Stance on User Experience in Extended Reality Climbing
Abstract:
Virtual environments (VEs) are increasingly used for immersive experiences, training simulations, and entertainment, yet factors such as height perception and user stance can significantly influence user experience (UX). Height perception in VEs plays a crucial role in shaping UX, particularly in immersive applications such as climbing simulations. This study investigates the effects of height in various VEs and examines how user stance, sitting or standing, impacts immersion, perceived height, and motion sickness. A user study was conducted with 25 participants who played through five randomized climbing scenarios, ranging from indoor climbing gyms to outdoor cityscapes and mountainous terrains. Participants' UX was assessed using standardized questionnaires, including the IPQ for general presence, spatial presence, involvement, and experienced realism, as well as the SSQ to evaluate motion sickness symptoms such as nausea, oculomotor strain, and disorientation. Results indicate that seated participants experienced slightly higher immersion but were also more susceptible to motion sickness compared to those standing. While standing participants maintained consistent scores across different environments, seated participants reported increased immersion and discomfort as the VEs became larger, more physically demanding, and visually complex.

Authors:Tanja Kojić, Sara Srebot, Maurizio Vergari, Mirta Moslavac, Maximilian Warsinke, Sebastian Möller, Lea Skorin-Kapov, Jan-Niklas Voigt-Antons
Title: Too Immersive for the Field? Addressing Safety Risks in Extended Reality User Studies
Abstract:
Extended Reality (XR) technologies are increasingly tested outside the lab, in homes, schools, and public spaces. While this shift enables more realistic user insights, it also introduces safety challenges that are often overlooked. Physical risks, psychological distress, and accessibility issues can be increased in field studies and unsupervised testing, such as at home or crowdsourced trials. Without clear instructions, safety decisions are left to individual researchers, raising questions of responsibility and consistency. This position paper outlines key safety risks in XR user testing beyond the lab and calls for practical strategies that are needed to help researchers run XR studies in a safe and inclusive way across different environments.

Authors:Sieun Kim, Yeeun Jo, Sungmin Na, Hyunseung Lim, Eunchae Lee, Yu Min Choi, Soohyun Cho, Hwajung Hong
Title: Dark and Bright Side of Participatory Red-Teaming with Targets of Stereotyping for Eliciting Harmful Behaviors from Large Language Models
Abstract:
Red-teaming, where adversarial prompts are crafted to expose harmful behaviors and assess risks, offers a dynamic approach to surfacing underlying stereotypical bias in large language models. Because such subtle harms are best recognized by those with lived experience, involving targets of stereotyping as red-teamers is essential. However, critical challenges remain in leveraging their lived experience for red-teaming while safeguarding psychological well-being. We conducted an empirical study of participatory red-teaming with 20 individuals stigmatized by stereotypes against nonprestigious college graduates in South Korea. Through mixed methods analysis, we found participants transformed experienced discrimination into strategic expertise for identifying biases, while facing psychological costs such as stress and negative reflections on group identity. Notably, red-team participation enhanced their sense of agency and empowerment through their role as guardians of the AI ecosystem. We discuss implications for designing participatory red-teaming that prioritizes both the ethical treatment and empowerment of stigmatized groups.

Authors:Jialong Li, Zhenyu Mao, Zhiyao Wang, Yijun Lu, Shogo Morita, Nianyu Li, Kenji Tei
Title: See What I See: An Attention-Guiding eHMI Approach for Autonomous Vehicles
Abstract:
As autonomous vehicles are gradually being deployed in the real world, external Human-Machine Interfaces (eHMIs) are expected to serve as a critical solution for enhancing vehicle-pedestrian communication. However, existing eHMI designs typically focus solely on the ego vehicle's status, which can inadvertently capture pedestrians' attention or encourage misguided reliance on the AV's signals, leading them to neglect scanning for other surrounding hazards. To address this, we propose the Attention-Guiding eHMI (AGeHMI), a projection-based visualization that employs directional cues and risk-based color coding to actively guide pedestrians' attention toward potential environmental dangers. Evaluation through a virtual reality user study (N = 20) suggests that AGeHMI effectively influences participants' visual attention distribution and significantly reduces potential collision risks with surrounding vehicles, while simultaneously improving subjective confidence and reducing cognitive workload.

Authors:Taewook Kim, Matthew K. Hong, Yan-Ying Chen, Jonathan Q. Li, Monica P Van, Shabnam Hakimi, Matthew Kay, Matthew Klenk
Title: Personagram: Bridging Personas and Product Design for Creative Ideation with Multimodal LLMs
Abstract:
Product designers often begin their design process with handcrafted personas. While personas are intended to ground design decisions in consumer preferences, they often fall short in practice by remaining abstract, expensive to produce, and difficult to translate into actionable design features. As a result, personas risk serving as static reference points rather than tools that actively shape design outcomes. To address these challenges, we built Personagram, an interactive system powered by multimodal large language models (MLLMs) that helps designers explore detailed census-based personas, extract product features inferred from persona attributes, and recombine them for specific customer segments. In a study with 12 professional designers, we show that Personagram facilitates more actionable ideation workflows by structuring multimodal thinking from persona attributes to product design features, achieving higher engagement with personas, perceived transparency, and satisfaction compared to a chat-based baseline. We discuss implications of integrating AI-generated personas into product design workflows.

Authors:Yuqing Xiao, John Grundy, Anuradha Madugalla, Elizabeth Manias
Title: Elderly HealthMag: Systematic Building and Calibrating a Tool for Identifying and Evaluating Senior User Digital Health Software
Abstract:
Digital health (DH) software is increasingly deployed to populations where many end users live with one or more health conditions. Yet, DH software development teams frequently operate using implicit, incorrect assumptions about these users, resulting in products that under-serve the specific requirements imposed by their age and health conditions. Consequently, while software may meet clinical objectives on paper, it often fails to be inclusive during actual user interaction. To address this, we propose \textbf{\textit{HealthMag}}, a tool inspired by GenderMag designed to help better elicit, model and evaluate requirements for digital health software. We developed HealthMag through systematic mapping and calibration following the InclusiveMag framework. Furthermore, we integrated this with a calibrated version of an existing AgeMag method to create a dual-lens approach: \textbf{\textit{Elderly HealthMag}}, designed to aid requirements, design and evaluation of mHealth software for senior end users. We demonstrate application and utility of Age HealthMag via cognitive walkthroughs in identifying inclusivity biases in current senior user-oriented digital health applications.

Authors:Siyang Li, Zhuoya Wang, Xiyan Gui, Xiaoqing Chen, Ziwei Wang, Yaozhi Wen, Dongrui Wu
Title: RAICL: Retrieval-Augmented In-Context Learning for Vision-Language-Model Based EEG Seizure Detection
Abstract:
Electroencephalogram (EEG) decoding is a critical component of medical diagnostics, rehabilitation engineering, and brain-computer interfaces. However, contemporary decoding methodologies remain heavily dependent on task-specific datasets to train specialized neural network architectures. Consequently, limited data availability impedes the development of generalizable large brain decoding models. In this work, we propose a paradigm shift from conventional signal-based decoding by leveraging large-scale vision-language models (VLMs) to analyze EEG waveform plots. By converting multivariate EEG signals into stacked waveform images and integrating neuroscience domain expertise into textual prompts, we demonstrate that foundational VLMs can effectively differentiate between different patterns in the human brain. To address the inherent non-stationarity of EEG signals, we introduce a Retrieval-Augmented In-Context Learning (RAICL) approach, which dynamically selects the most representative and relevant few-shot examples to condition the autoregressive outputs of the VLM. Experiments on EEG-based seizure detection indicate that state-of-the-art VLMs under RAICL achieved better or comparable performance with traditional time series based approaches. These findings suggest a new direction in physiological signal processing that effectively bridges the modalities of vision, language, and neural activities. Furthermore, the utilization of off-the-shelf VLMs, without the need for retraining or downstream architecture construction, offers a readily deployable solution for clinical applications.

Authors:Matthew K. Hong, Joey Li, Alexandre Filipowicz, Monica Van, Kalani Murakami, Yan-Ying Chen, Shiwali Mohan, Shabnam Hakimi, Matthew Klenk
Title: Deconstructing Taste: Toward a Human-Centered AI Framework for Modeling Consumer Aesthetic Perceptions
Abstract:
Understanding and modeling consumers' stylistic taste such as "sporty" is crucial for creating designs that truly connect with target audiences. However, capturing taste during the design process remains challenging because taste is abstract and subjective, and preference data alone provides limited guidance for concrete design decisions. This paper proposes an integrated human-centered computational framework that links subjective evaluations (e.g., perceived luxury of car wheels) with domain-specific features (e.g., spoke configuration) and computer vision-based measures (e.g., texture). By jointly modeling human-derived (consumer and designer) and machine-extracted features, our framework advances aesthetic assessment by explicitly linking model outcomes to interpretable design features. In particular, it demonstrates how perceptual features, domain-specific design patterns, and consumers' own interpretations of style contribute to aesthetic evaluations. This framework will enable product teams to better understand, communicate, and critique aesthetic decisions, supporting improved anticipation of consumer taste and more informed exploration of design alternatives at design time.

Authors:Agam Goyal, Xianyang Zhan, Charlotte Lambert, Koustuv Saha, Eshwar Chandrasekharan
Title: VASTU: Value-Aligned Social Toolkit for Online Content Curation
Abstract:
Detecting what content communities value is a foundational challenge for social computing systems -- from feed curation and content ranking to moderation tools and personalized recommendation systems. Yet existing approaches remain fragmented across methodological paradigms, and it remains unclear which methods best capture community-specific notions of value. We introduce VASTU (Value-Aligned Social Toolkit for Online Content Curation), a benchmark and evaluation framework for systematically comparing approaches to detecting community-valued content. VASTU includes a dataset of 75,000 comments from 15 diverse Reddit communities, annotated with community approval labels and rich linguistic features. Using VASTU, we evaluate feature-based models, transformers, prompted and fine-tuned language models under global versus community-specific training regimes. We find that community-specific models consistently outperform global approaches, with fine-tuned transformers achieving the strongest performance (0.72 AUROC). Notably, fine-tuned SLMs (0.65 AUROC) substantially outperform prompted LLMs (0.60 AUROC) despite being 100 times smaller. Counterintuitively, chain-of-thought prompting provides no benefit, and reasoning models perform the worst (0.53 AUROC), suggesting this task requires learning community norms rather than test-time reasoning. By releasing VASTU, we provide a standardized benchmark to advance research on value-aligned sociotechnical systems.

Authors:Songming Jia, Yan Lu, Bin Liu, Xiang Zhang, Peng Zhao, Xinmeng Tang, Yelin Wei, Jinyang Huang, Huan Yan, Zhi Liu
Title: Breaking Coordinate Overfitting: Geometry-Aware WiFi Sensing for Cross-Layout 3D Pose Estimation
Abstract:
WiFi-based 3D human pose estimation offers a low-cost and privacy-preserving alternative to vision-based systems for smart interaction. However, existing approaches rely on visual 3D poses as supervision and directly regress CSI to a camera-based coordinate system. We find that this practice leads to coordinate overfitting: models memorize deployment-specific WiFi transceiver layouts rather than only learning activity-relevant representations, resulting in severe generalization failures. To address this challenge, we present PerceptAlign, the first geometry-conditioned framework for WiFi-based cross-layout pose estimation. PerceptAlign introduces a lightweight coordinate unification procedure that aligns WiFi and vision measurements in a shared 3D space using only two checkerboards and a few photos. Within this unified space, it encodes calibrated transceiver positions into high-dimensional embeddings and fuses them with CSI features, making the model explicitly aware of device geometry as a conditional variable. This design forces the network to disentangle human motion from deployment layouts, enabling robust and, for the first time, layout-invariant WiFi pose estimation. To support systematic evaluation, we construct the largest cross-domain 3D WiFi pose estimation dataset to date, comprising 21 subjects, 5 scenes, 18 actions, and 7 device layouts. Experiments show that PerceptAlign reduces in-domain error by 12.3% and cross-domain error by more than 60% compared to state-of-the-art baselines. These results establish geometry-conditioned learning as a viable path toward scalable and practical WiFi sensing.

Authors:Marc Aubreville, Taryn A. Donovan, Christof A. Bertram
Title: Exploring General-Purpose Autonomous Multimodal Agents for Pathology Report Generation
Abstract:
Recent advances in agentic artificial intelligence, i.e. systems capable of autonomous perception, reasoning, and tool use, offer new opportunities for digital pathology. In this pilot study, we evaluate whether two agentic multimodal AI systems (OpenAI's ChatGPT 5.0 in agentic mode, and H Company's Surfer) can autonomously navigate, describe, and interpret histopathologic features in digitized tissue slides on a slide viewing platform. A set of 35 veterinary pathology cases, curated for training purposes, was used as the test dataset. The agent was tasked with autonomously exploring whole-slide images using a web-based slide viewer, identifying salient tissue structures, generating descriptive summaries, and proposing provisional diagnoses. We fed different prompts to explore three scenarios: 1) analysis without knowledge of the signalment, 2) analysis with organ and species provided, and 3) diagnosis based on a morphological description provided. All outputs were reviewed and validated by a board-certified pathologist for accuracy and diagnostic consistency. We further tasked another board-certified pathologist with the same task to establish a baseline. We found the systems to yield accurate diagnoses in up to 28.6% of cases with only images, signalment and organ provided, and up to 68.6% when a morphological description was provided. With only the WSI provided, the models were only correct in up to 5.7% of cases. The human expert, on the other hand, achieved 85.7% diagnostic accuracy with only a single WSI, and 88.6% when also signalment and organ was provided. The study demonstrates that while the agentic AI system can meaningfully engage with web-based slide viewing software to assess complex visual pathology data and produce contextually aligned feature descriptions, diagnostic precision remains limited compared with a human expert.

Authors:Paulius Jurcys, Ashley Greenwald, Mark Fenwick, Valto Loikkanen, Sebastian Porsdam Mann, Brian D. Earp
Title: Who Owns My AI Twin? Data Ownership in a New World of Simulated Identities
Abstract:
The emergence of AI twins, digital replicas that encapsulate an individual's knowledge, memories, psychological traits, and behavioral patterns, raises novel legal and ethical challenges for data governance and personal identity. Built from personal data, these systems require a rethinking of what it means to exercise dominion over one's data and to maintain personal autonomy in an AI-mediated environment. This article argues that natural persons should be recognized as the moral and legal owners of their AI twins, which function as intimate extensions of the self rather than as proprietary technological artifacts. It critiques prevailing legal frameworks that prioritize technological infrastructure and platform control over data and individual autonomy, exposing their structural limitations. In response, the article advances a human-centric model of data governance grounded in individual dominion and a private-by-default principle. This approach proposes a reimagined social contract for AI-driven identities that strengthens personal agency, promotes equitable data stewardship, and better aligns legal norms with the socio-technical realities of AI twins.

Authors:Sitong Wang, Anh Truong, Lydia B. Chilton, Dingzeyu Li
Title: Rewriting Video: Text-Driven Reauthoring of Video Footage
Abstract:
Video is a powerful medium for communication and storytelling, yet reauthoring existing footage remains challenging. Even simple edits often demand expertise, time, and careful planning, constraining how creators envision and shape their narratives. Recent advances in generative AI suggest a new paradigm: what if editing a video were as straightforward as rewriting text? To investigate this, we present a tech probe and a study on text-driven video reauthoring. Our approach involves two technical contributions: (1) a generative reconstruction algorithm that reverse-engineers video into an editable text prompt, and (2) an interactive probe, Rewrite Kit, that allows creators to manipulate these prompts. A technical evaluation of the algorithm reveals a critical human-AI perceptual gap. A probe study with 12 creators surfaced novel use cases such as virtual reshooting, synthetic continuity, and aesthetic restyling. It also highlighted key tensions around coherence, control, and creative alignment in this new paradigm. Our work contributes empirical insights into the opportunities and challenges of text-driven video reauthoring, offering design implications for future co-creative video tools.

Authors:Bowen Yang, Kaiming Jin, Zhenyu Wu, Zhaoyang Liu, Qiushi Sun, Zehao Li, JingJing Xie, Zhoumianze Liu, Fangzhi Xu, Kanzhi Cheng, Qingyun Li, Yian Wang, Yu Qiao, Zun Wang, Zichen Ding
Title: OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent
Abstract:
While Vision-Language Models (VLMs) have significantly advanced Computer-Using Agents (CUAs), current frameworks struggle with robustness in long-horizon workflows and generalization in novel domains. These limitations stem from a lack of granular control over historical visual context curation and the absence of visual-aware tutorial retrieval. To bridge these gaps, we introduce OS-Symphony, a holistic framework that comprises an Orchestrator coordinating two key innovations for robust automation: (1) a Reflection-Memory Agent that utilizes milestone-driven long-term memory to enable trajectory-level self-correction, effectively mitigating visual context loss in long-horizon tasks; (2) Versatile Tool Agents featuring a Multimodal Searcher that adopts a SeeAct paradigm to navigate a browser-based sandbox to synthesize live, visually aligned tutorials, thereby resolving fidelity issues in unseen scenarios. Experimental results demonstrate that OS-Symphony delivers substantial performance gains across varying model scales, establishing new state-of-the-art results on three online benchmarks, notably achieving 65.84% on OSWorld.

Authors:Siyang Li, Jiayi Ouyang, Zhenyao Cui, Ziwei Wang, Tianwang Jia, Feng Wan, Dongrui Wu
Title: Backpropagation-Free Test-Time Adaptation for Lightweight EEG-Based Brain-Computer Interfaces
Abstract:
Electroencephalogram (EEG)-based brain-computer interfaces (BCIs) face significant deployment challenges due to inter-subject variability, signal non-stationarity, and computational constraints. While test-time adaptation (TTA) mitigates distribution shifts under online data streams without per-use calibration sessions, existing TTA approaches heavily rely on explicitly defined loss objectives that require backpropagation for updating model parameters, which incurs computational overhead, privacy risks, and sensitivity to noisy data streams. This paper proposes Backpropagation-Free Transformations (BFT), a TTA approach for EEG decoding that eliminates such issues. BFT applies multiple sample-wise transformations of knowledge-guided augmentations or approximate Bayesian inference to each test trial, generating multiple prediction scores for a single test sample. A learning-to-rank module enhances the weighting of these predictions, enabling robust aggregation for uncertainty suppression during inference under theoretical justifications. Extensive experiments on five EEG datasets of motor imagery classification and driver drowsiness regression tasks demonstrate the effectiveness, versatility, robustness, and efficiency of BFT. This research enables lightweight plug-and-play BCIs on resource-constrained devices, broadening the real-world deployment of decoding algorithms for EEG-based BCI.

Authors:Yui Kondo, Kevin Dunnell, Isobel Voysey, Qing Hu, Victoria Paesano, Phi H Nguyen, Qing Xiao, Jun Zhao, Luc Rocher
Title: Interactive visualizations for adolescents to understand and challenge algorithmic profiling in online platforms
Abstract:
Social media platforms regularly track, aggregate, and monetize adolescents' data, yet provide them with little visibility or agency over how algorithms construct their digital identities and make inferences about them. We introduce Algorithmic Mirror, an interactive visualization tool that transforms opaque profiling practices into explorable landscapes of personal data. It uniquely leverages adolescents' real digital footprints across YouTube, TikTok, and Netflix, to provide situated, personalized insights into datafication over time. In our study with 27 participants (ages 12--16), we show how engaging with their own data enabled adolescents to uncover the scale and persistence of data collection, recognize cross-platform profiling, and critically reflect algorithmic categorizations of their interests. These findings highlight how identity is a powerful motivator for adolescents' desire for greater digital agency, underscoring the need for platforms and policymakers to move toward structural reforms that guarantee children better transparency and the agency to influence their online experiences.

Authors:Hita Kambhamettu, Bhavana Dalvi Mishra, Andrew Head, Jonathan Bragg, Aakanksha Naik, Joseph Chee Chang, Pao Siangliulue
Title: LitPivot: Developing Well-Situated Research Ideas Through Dynamic Contextualization and Critique within the Literature Landscape
Abstract:
Developing a novel research idea is hard. It must be distinct enough from prior work to claim a contribution while also building on it. This requires iteratively reviewing literature and refining an idea based on what a researcher reads; yet when an idea changes, the literature that matters often changes with it. Most tools offer limited support for this interplay: literature tools help researchers understand a fixed body of work, while ideation tools evaluate ideas against a static, pre-curated set of papers. We introduce literature-initiated pivots, a mechanism where engagement with literature prompts revision to a developing idea, and where that revision changes which literature is relevant. We operationalize this in LitPivot, where researchers concurrently draft and vet an idea. LitPivot dynamically retrieves clusters of papers relevant to a selected part of the idea and proposes literature-informed critiques for how to revise it. A lab study ($n{=}17$) shows researchers produced higher-rated ideas with stronger self-reported understanding of the literature space; an open-ended study ($n{=}5$) reveals how researchers use LitPivot to iteratively evolve their own ideas.

Authors:Ningzhi Tang, Chaoran Chen, Zihan Fang, Gelei Xu, Maria Dhakal, Yiyu Shi, Collin McMillan, Yu Huang, Toby Jia-Jun Li
Title: Programming by Chat: A Large-Scale Behavioral Analysis of 11,579 Real-World AI-Assisted IDE Sessions
Abstract:
IDE-integrated AI coding assistants, which operate conversationally within developers' working codebases with access to project context and multi-file editing, are rapidly reshaping software development. However, empirical investigation of this shift remains limited: existing studies largely rely on small-scale, controlled settings or analyze general-purpose chatbots rather than codebase-aware IDE workflows. We present, to the best of our knowledge, the first large-scale study of real-world conversational programming in IDE-native settings, analyzing 74,998 developer messages from 11,579 chat sessions across 1,300 repositories and 899 developers using Cursor and GitHub Copilot. These chats were committed to public repositories as part of routine development, capturing in-the-wild behavior. Our findings reveal three shifts in how programming work is organized: conversational programming operates as progressive specification, with developers iteratively refining outputs rather than specifying complete tasks upfront; developers redistribute cognitive work to AI, delegating diagnosis, comprehension, and validation rather than engaging with code and outputs directly; and developers actively manage the collaboration, externalizing plans into persistent artifacts, and negotiating AI autonomy through context injection and behavioral constraints. These results provide foundational empirical insights into AI-assisted development and offer implications for the design of future programming environments.

Authors:Eason Chen, Isabel Wang, Nina Yuan, Sophia Judicke, Kayla Beigh, Xinyi Tang
Title: From Tool to Teammate: LLM Coding Agents as Collaborative Partners for Behavioral Labeling in Educational Dialogue Analysis
Abstract:
Behavioral analysis of tutoring dialogues is essential for understanding student learning, yet manual coding remains a bottleneck. We present a methodology where LLM coding agents autonomously improve the prompts used by LLM classifiers to label educational dialogues. In each iteration, a coding agent runs the classifier against human-labeled validation data, analyzes disagreements, and proposes theory-grounded prompt modifications for researcher review. Applying this approach to 659 AI tutoring sessions across four experiments with three agents and three classifiers, 4-fold cross-validation on held-out data confirmed genuine improvement: the best agent achieved test $κ=0.78$ (SD$=0.08$), matching human inter-rater reliability ($κ=0.78$), at a cost of approximately \$5--8 per agent. While development-set performance reached $κ=0.91$--$0.93$, the cross-validated results represent our primary generalization claim. The iterative process also surfaced an undocumented labeling pattern: human coders consistently treated expressions of confusion as engagement rather than disengagement. Continued iteration beyond the optimum led to regression, underscoring the need for held-out validation. We release all prompts, iteration logs, and data.

Authors:Ina Kaleva, Xiao Zhan, Ruba Abu-Salma, Jose Such
Title: Privacy and Safety Experiences and Concerns of U.S. Women Using Generative AI for Seeking Sexual and Reproductive Health Information
Abstract:
The rapid adoption of generative AI (GenAI) chatbots has reshaped access to sexual and reproductive health (SRH) information, particularly following the overturning of Roe v. Wade, as individuals assigned female at birth increasingly turn to online sources. However, existing research remains largely model-centered, paying limited attention to user privacy and safety. We conducted semi-structured interviews with 18 U.S.-based participants from both restrictive and non-restrictive states who had used GenAI chatbots to seek SRH information. Adoption was influenced by perceived utility, usability, credibility, accessibility, and anthropomorphism, and many participants disclosed sensitive personal SRH details. Participants identified multiple privacy risks, including excessive data collection, government surveillance, profiling, model training, and data commodification. While most participants accepted these risks in exchange for perceived utility, abortion-related queries elicited heightened safety concerns. Few participants employed protective strategies beyond minimizing disclosures or deleting data. Based on these findings, we offer design and policy recommendations, such as health-specific features and stronger moderation practices, to enhance privacy and safety in GenAI-supported SRH information seeking.

Authors:Dany Haddad, Dan Bareket, Joseph Chee Chang, Jay DeYoung, Jena D. Hwang, Uri Katz, Mark Polak, Sangho Suh, Harshit Surana, Aryeh Tiktinsky, Shriya Atmakuri, Jonathan Bragg, Mike D'Arcy, Sergey Feldman, Amal Hassan-Ali, Rubén Lozano, Bodhisattwa Prasad Majumder, Charles McGrady, Amanpreet Singh, Brooke Vlahos, Yoav Goldberg, Doug Downey
Title: Understanding Usage and Engagement in AI-Powered Scientific Research Tools: The Asta Interaction Dataset
Abstract:
AI-powered scientific research tools are rapidly being integrated into research workflows, yet the field lacks a clear lens into how researchers use these systems in real-world settings. We present and analyze the Asta Interaction Dataset, a large-scale resource comprising over 200,000 user queries and interaction logs from two deployed tools (a literature discovery interface and a scientific question-answering interface) within an LLM-powered retrieval-augmented generation platform. Using this dataset, we characterize query patterns, engagement behaviors, and how usage evolves with experience. We find that users submit longer and more complex queries than in traditional search, and treat the system as a collaborative research partner, delegating tasks such as drafting content and identifying research gaps. Users treat generated responses as persistent artifacts, revisiting and navigating among outputs and cited evidence in non-linear ways. With experience, users issue more targeted queries and engage more deeply with supporting citations, although keyword-style queries persist even among experienced users. We release the anonymized dataset and analysis with a new query intent taxonomy to inform future designs of real-world AI research assistants and to support realistic evaluation.

Authors:Xiao Zhan, Yifan Xu, Rongjun Ma, Shijing He, Jose Luis Martin-Navarro, Jose Such
Title: The Governance of Intimacy: A Preliminary Policy Analysis of Romantic AI Platforms
Abstract:
Romantic AI platforms invite intimate emotional disclosure, yet their data governance practices remain underexamined. This preliminary study analyses the Privacy Policies and Terms of Service of six Western and Chinese romantic AI platforms. We find that intimate disclosures are often positioned as reusable data assets, with broad permissions for storage, analysis, and model training. We identify default training appropriation, ownership reconstruction, and intimate history assetization as key mechanisms structuring these practices, expanding platforms' rights while shifting risk onto users. Our findings surface key governance challenges in romantic AI and are intended to provoke discussion and inform future empirical and design research on human AI intimacy and its governance.

Authors:Eason Chen, Xinyi Tang, George Digkas, Dionysios Lougaris, John E. Naulty, Kostas Chalkias
Title: When Friction Helps: Transaction Confirmation Improves Decision Quality in Blockchain Interactions
Abstract:
In blockchain applications, transaction confirmation is often treated as usability friction to be minimized or removed. However, confirmation also marks the boundary between deliberation and irreversible commitment, suggesting it may play a functional role in human decision-making. To investigate this tension, we conducted an experiment using a blockchain-based Connect Four game with two interaction modes differing only in authorization flow: manual wallet confirmation (Confirmation Mode) versus auto-authorized delegation (Frictionless Mode). Although participants preferred Frictionless Mode and perceived better performance (N=109), objective performance was worse without confirmation in a counterbalanced deployment (Wave 2: win rate -11.8%, p=0.044; move quality -0.051, p=0.022). Analysis of canceled submissions suggests confirmation can enable pre-submission self-correction (N=66, p=0.005). These findings suggest that transaction confirmation can function as a cognitively meaningful checkpoint rather than mere usability friction, highlighting a trade-off between interaction smoothness and decision quality in irreversible blockchain interactions.

Authors:Shijing He, Yaxiong Lei, Xiao Zhan, Ruba Abu-Salma, Jose Such
Title: "These cameras are just like the Eye of Sauron": A Sociotechnical Threat Model for AI-Driven Smart Home Devices as Perceived by UK-Based Domestic Workers
Abstract:
The growing adoption of AI-driven smart home devices has introduced new privacy risks for domestic workers (DWs), who are frequently monitored in employers' homes while also using smart devices in their own households. We conducted semi-structured interviews with 18 UK-based DWs and performed a human-centered threat modeling analysis of their experiences through the lens of Communication Privacy Management (CPM). Our findings extend existing threat models beyond abstract adversaries and single-household contexts by showing how AI analytics, residual data logs, and cross-household data flows shaped the privacy risks faced by participants. In employer-controlled homes, AI-enabled features and opaque, agency-mediated employment arrangements intensified surveillance and constrained participants' ability to negotiate privacy boundaries. In their own homes, participants had greater control as device owners but still faced challenges, including gendered administrative roles, opaque AI functionalities, and uncertainty around data retention. We synthesize these insights into a sociotechnical threat model that identifies DW agencies as institutional adversaries and maps AI-driven privacy risks across interconnected households, and we outline social and practical implications for strengthening DW privacy and agency.

Authors:Sai Keerthana Karnam, Abhisek Dash, Krishna Gummadi, Animesh Mukherjee, Ingmar Weber, Savvas Zannettou
Title: Bowling with ChatGPT: On the Evolving User Interactions with Conversational AI Systems
Abstract:
Recent studies have discussed how users are increasingly using conversational AI systems, powered by LLMs, for information seeking, decision support, and even emotional support. However, these macro-level observations offer limited insight into how the purpose of these interactions shifts over time, how users frame their interactions with the system, and how steering dynamics unfold in these human-AI interactions. To examine these evolving dynamics, we gathered and analyzed a unique dataset InVivoGPT: consisting of 825K ChatGPT interactions, donated by 300 users through their GDPR data rights. Our analyses reveal three key findings. First, participants increasingly turn to ChatGPT for a broader range of purposes, including substantial growth in sensitive domains such as health and mental health. Second, interactions become more socially framed: the system anthropomorphizes itself at rising rates, participants more frequently treat it as a companion, and personal data disclosure becomes both more common and more diverse. Third, conversational steering becomes more prominent, especially after the release of GPT-4o, with conversations where the participants followed a model-initiated suggestion quadrupling over the period of our dataset. Overall, our results show that conversational AI systems are shifting from functional tools to social partners, raising important questions about their design and governance.

Authors:Fan Yang, Renkai Ma, Yaxin Hu, Lingyao Li
Title: Whether We Care, How We Reason: The Dual Role of Anthropomorphism and Moral Foundations in Robot Abuse
Abstract:
As robots become increasingly integrated into daily life, understanding responses to robot mistreatment carries important ethical and design implications. This mixed-methods study (N = 201) examined how anthropomorphic levels and moral foundations shape reactions to robot abuse. Participants viewed videos depicting physical mistreatment of robots varying in humanness (Spider, Twofoot, Humanoid) and completed measures assessing moral foundations, anger, and social distance. Results revealed that anthropomorphism determines whether people extend moral consideration to robots, while moral foundations shape how they reason about such consideration. Qualitative analysis revealed distinct reasoning patterns: low-progressivism individuals employed character-based judgments, while high-progressivism individuals engaged in future-oriented moral deliberation. Findings offer implications for robot design and policy communication.

Authors:Renkai Ma, Ben Z. Zhang, Chen Chen, Fan Yang, Xiaoshan Huang, Haolun Wu, Lingyao Li
Title: "I use ChatGPT to humanize my words": Affordances and Risks of ChatGPT to Autistic Users
Abstract:
Large Language Model (LLM) chatbots like ChatGPT have emerged as cognitive scaffolding for autistic users, yet the tension between their utility and risk remains under-articulated. Through an inductive thematic analysis of 3,984 social media posts by self-identified autistic users, we apply the Technology Affordance framework to examine this duality. We found that while users leveraged ChatGPT to offload executive dysfunction, regulate emotions, translate neurotypical communication, and validate their autistic identity, these affordances coexist with significant risks: reinforcing delusional thinking, erasing authentic identity through automated masking, and triggering conflicts with the autistic sense of justice. This poster identifies these trade-offs in autistic users' interactions with ChatGPT and concludes by outlining our future work on developing neuro-inclusive technologies that address these tensions through beneficial friction and bidirectional translation.

Authors:Shaoze Zhou, Diana Nelly Rivera Rodriguez, Pedro Remior, Joaquin Frangi, Lingyao Li, Renkai Ma, Janet G. Johnson, Christine Lisetti, Chen Chen
Title: Exploring Needs and Design Opportunities for Proactive Information Support in In-Person Small-Group Conversations
Abstract:
In-person small-group conversations play a crucial role in everyday life; however, facilitating effective group interaction can be challenging, as the real-time nature demands full attention, offers no opportunity for revision, and requires interpreting non-verbal cues. Using Mixed Reality to provide proactive information support shows promise in helping individuals engage in and contribute to group conversations. We present a preliminary participatory design and qualitative study (N = 10) using focus groups and two technology probes to explore the opportunities of designing proactive information support in in-person small-group conversations. We reveal key design opportunities concerning how to maximize the benefits of proactive information support and how to effectively design such supporting information. Our study is crucial for paving the way toward designing future proactive AI agents to enable the paradigm of augmented in-person small-group conversation experience.

Authors:Rongjun Ma, Shijing He, Jose Luis Martin-Navarro, Xiao Zhan, Jose Such
Title: Privacy in Human-AI Romantic Relationships: Concerns, Boundaries, and Agency
Abstract:
An increasing number of LLM-based applications are being developed to facilitate romantic relationships with AI partners, yet the safety and privacy risks in these partnerships remain largely underexplored. In this work, we investigate privacy in human-AI romantic relationships through an interview study (N=17), examining participants' experiences and privacy perceptions across stages of exploration, intimacy, and dissolution, alongside platforms they used. We found that these relationships took varied forms, from one-to-one to one-to-many, and were shaped by multiple actors, including creators, platforms, and moderators. AI partners were perceived as having agency, actively negotiating privacy boundaries with participants and sometimes encouraging disclosure of personal details. As intimacy deepened, these boundaries became more permeable, though some participants voiced concerns such as conversation exposure and sought to preserve anonymity. Overall, platform affordances and diverse romantic dynamics expand the privacy landscape, underscoring the need to rethink how privacy is constructed in human-AI intimacy.

Authors:Simret Araya Gebreegziabher, Yukun Yang, Charles Chiang, Hojun Yoo, Chaoran Chen, Hyo Jin Do, Zahra Ashktorab, Werner Geyer, Diego Gómez-Zará, Toby Jia-Jun Li
Title: The Behavioral Fabric of LLM-Powered GUI Agents: Human Values and Interaction Outcomes
Abstract:
Large Language Model (LLM)-powered web GUI agents are increasingly automating everyday online tasks. Despite their popularity, little is known about how users' preferences and values impact agents' reasoning and behavior. In this work, we investigate how both explicit and implicit user preferences, as well as the underlying user values, influence agent decision-making and action trajectories. We built a controlled testbed of 14 common interactive web tasks, spanning shopping, travel, dining, and housing, each replicated from real websites and integrated with a low-fidelity LLM-based recommender system. We injected 12 human preferences and values as personas into four state-of-the-art agents and systematically analyzed their task behaviors. Our results show that preference and value-infused prompts consistently guided agents toward outcomes that reflected these preferences and values. While the absence of user preference or value guidance led agents to exhibit a strong efficiency bias and employ shortest-path strategies, their presence steered agents' behavior trajectories through the greater use of corresponding filters and interactive web features. Despite their influence, dominant interface cues, such as discounts and advertisements, frequently overrode these effects, shortening the agents' action trajectories and inducing rationalizations that masked rather than reflected value-consistent reasoning. The contributions of this paper are twofold: (1) an open-source testbed for studying the influence of values in agent behaviors, and (2) an empirical investigation of how user preferences and values shape web agent behaviors.

Authors:Renkai Ma, Shuo Niu, Lingyao Li, Alex Hirth, Ava Brehm, Rowajana Behterin Barbie
Title: Negotiating Digital Identities with AI Companions: Motivations, Strategies, and Emotional Outcomes
Abstract:
AI companions enable deep emotional relationships by engaging a user's sense of identity, but they also pose risks like unhealthy emotional dependence. Mitigating these risks requires first understanding the underlying process of identity construction and negotiation with AI companions. Focusing on Character.AI (C.AI), a popular AI companion, we conducted an LLM-assisted thematic analysis of 22,374 online discussions on its subreddit. Using Identity Negotiation Theory as an analytical lens, we identified a three-stage process: 1) five user motivations; 2) an identity negotiation process involving three communication expectations and four identity co-construction strategies; and 3) three emotional outcomes. Our findings surface the identity work users perform as both performers and directors to co-construct identities in negotiation with C.AI. This process takes place within a socio-emotional sandbox where users can experiment with social roles and express emotions without non-human partners. Finally, we offer design implications for emotionally supporting users while mitigating the risks.

Authors:Yiluo Wei, Gareth Tyson
Title: Understanding the Consequences of VTuber Reincarnation
Abstract:
The rapid proliferation of VTubers, digital avatars controlled and voiced by human actors (Nakanohito), has created a lucrative and popular entertainment ecosystem. However, the prevailing industry model, where corporations retain ownership of the VTuber persona while the Nakanohito bears the immense pressure of dual-identity management, exposes the Nakanohito to significant vulnerabilities, including burnout, harassment, and precarious labor conditions. When these pressures become untenable, the Nakanohito may terminate their contracts and later debut with a new persona, a process known as "reincarnation". This phenomenon, a rising concern in the industry, inflicts substantial losses on the Nakanohito, agencies, and audiences alike. Understanding the quantitative fallout of reincarnation is crucial for mitigating this damage and fostering a more sustainable industry. To address this gap, we conduct the first large-scale empirical study of VTuber reincarnation, analyzing 12 significant cases using a comprehensive dataset of 728K livestream sessions and 4.5B viewer interaction records. Our results suggest reincarnation significantly damages a Nakanohito's career, leading to a decline in audience and financial support, an increase in harassment, and negative repercussions for the wider VTuber industry. Overall, these insights carry immediate implications for mitigating the significant professional and personal costs of the reincarnation, and fostering a healthier and more equitable VTuber ecosystem.

Authors:Naseem Machlovi, Maryam Saleki, Ruhul Amin, Mohamed Rahouti, Shawqi Al-Maliki, Junaid Qadir, Mohamed M. Abdallah, Ala Al-Fuqaha
Title: GuardEval: A Multi-Perspective Benchmark for Evaluating Safety, Fairness, and Robustness in LLM Moderators
Abstract:
As large language models (LLMs) become deeply embedded in daily life, the urgent need for safer moderation systems, distinguishing between naive from harmful requests while upholding appropriate censorship boundaries, has never been greater. While existing LLMs can detect harmful or unsafe content, they often struggle with nuanced cases such as implicit offensiveness, subtle gender and racial biases, and jailbreak prompts, due to the subjective and context-dependent nature of these issues. Furthermore, their heavy reliance on training data can reinforce societal biases, resulting in inconsistent and ethically problematic outputs. To address these challenges, we introduce GuardEval, a unified multi-perspective benchmark dataset designed for both training and evaluation, containing 106 fine-grained categories spanning human emotions, offensive and hateful language, gender and racial bias, and broader safety concerns. We also present GemmaGuard (GGuard), a QLoRA fine-tuned version of Gemma3-12B trained on GuardEval, to assess content moderation with fine-grained labels. Our evaluation shows that GGuard achieves a macro F1 score of 0.832, substantially outperforming leading moderation models, including OpenAI Moderator (0.64) and Llama Guard (0.61). We show that multi-perspective, human-centered safety benchmarks are critical for reducing biased and inconsistent moderation decisions. GuardEval and GGuard together demonstrate that diverse, representative data materially improve safety, fairness, and robustness on complex, borderline cases.

Authors:Maria Teresa Parreira, Isabel Neto, Filipa Rocha, Wendy Ju
Title: Calling for Backup: How Children Navigate Successive Robot Communication Failures
Abstract:
How do children respond to repeated robot errors? While prior research has examined adult reactions to successive robot errors, children's responses remain largely unexplored. In this study, we explore children's reactions to robot social errors and performance errors. For the latter, this study reproduces the successive robot failure paradigm of Liu et al. with child participants (N=59, ages 8-10) to examine how young users respond to repeated robot conversational errors. Participants interacted with a robot that failed to understand their prompts three times in succession, with their behavioral responses video-recorded and analyzed. We found both similarities and differences compared to adult responses from the original study. Like adults, children adjusted their prompts, modified their verbal tone, and exhibited increasingly emotional non-verbal responses throughout successive errors. However, children demonstrated more disengagement behaviors, including temporarily ignoring the robot or actively seeking an adult. Errors did not affect participants' perception of the robot, suggesting more flexible conversational expectations in children. These findings inform the design of more effective and developmentally appropriate human-robot interaction systems for young users.

Authors:Tobias Stähle, Péter Ferenc Gyarmati, Thilo Spinner, Rita Sevastjanova, Dominik Moritz, Mennatallah El-Assady
Title: VACP: Visual Analytics Context Protocol
Abstract:
The rise of AI agents introduces a fundamental shift in Visual Analytics (VA), in which agents act as a new user group. Current agentic approaches - based on computer vision and raw DOM access - fail to perform VA tasks accurately and efficiently. This paper introduces the Visual Analytics Context Protocol (VACP), a framework designed to make VA applications "agent-ready" that extends generic protocols by explicitly exposing application state, available interactions, and mechanisms for direct execution. To support our context protocol, we contribute a formal specification of AI agent requirements and knowledge representations in VA interfaces. We instantiate VACP as a library compatible with major visualization grammars and web frameworks, enabling augmentation of existing systems and the development of new ones. Our evaluation across representative VA tasks demonstrates that VACP-enabled agents achieve higher success rates in interface interpretation and execution compared to current agentic approaches, while reducing token consumption and latency. VACP closes the gap between human-centric VA interfaces and machine perceivability, ensuring agents can reliably act as collaborative users in VA systems.

Authors:Shuo Yan, Xiaolin Wen, Shaolun Ruan, Yanjie Zhang, Jiaming Mi, Yushi Sun, Huamin Qu, Rui Sheng
Title: InconLens: Interactive Visual Diagnosis of Behavioral Inconsistencies in LLM-based Agentic Systems
Abstract:
Large Language Model (LLM)-based agentic systems have shown growing promise in tackling complex, multi-step tasks through autonomous planning, reasoning, and interaction with external environments. However, the stochastic nature of LLM generation introduces intrinsic behavioral inconsistency: the same agent may succeed in one execution but fail in another under identical inputs. Diagnosing such inconsistencies remains a major challenge for developers, as agent execution logs are often lengthy, unstructured, and difficult to compare across runs. Existing debugging and evaluation tools primarily focus on inspecting single executions, offering limited support for understanding how and why agent behaviors diverge across repeated runs. To address this challenge, we introduce InconLens, a visual analytics system designed to support interactive diagnosis of LLM-based agentic systems with a particular focus on cross-run behavioral analysis. InconLens introduces information nodes as an intermediate abstraction that captures canonical informational milestones shared across executions, enabling semantic alignment and inspection of agent reasoning trajectories across multiple runs. We demonstrate the effectiveness of InconLens through a detailed case study and further validate its usability and analytical value via expert interviews. Our results show that InconLens enables developers to more efficiently identify divergence points, uncover latent failure modes, and gain actionable insights into improving the reliability and stability of agentic systems.

Authors:Zhiyang Wu, Junliang Chen, Qian Wan, Qing Xiao, Piaohong Wang, Ge Gao, Zhicong Lu
Title: "Law at Your Fingertips": Understanding Legal Information Seeking on Video-Sharing Platforms in China
Abstract:
Equipping laypeople with the capabilities to seek legal information has been an important goal for Legal Empowerment in modern society. However, unlike general information-seeking behaviors, legal information seeking is characterized by high stakes, urgency, and a critical need for emotional support, which traditional text-based searching platforms struggle to satisfy. In recent years, people have been increasingly turning to Video-Sharing Platforms (VSPs) for access to legal information and to fulfill their legal needs. Despite the importance of this shift, such VSP-mediated legal information-seeking practices remain underexplored. Through an observational analysis of legal content on two VSPs (Douyin and Bilibili) and interviews with 20 Chinese information seekers, this study examined the practices and challenges associated with seeking, comprehending, and evaluating legal information on VSPs. We further revealed the formation of trust and engagement on the VSP-based legal knowledge-sharing community, highlighting how VSP affordances helped mitigate seekers' epistemic discomfort and satisfy their needs for emotional support. In the discussion, we provided insights on balancing heuristic and systematic processing to encourage information cross-validation, and offered implications for designing trustworthy civic information systems and fostering an accessible, safe, and efficient information-seeking environment in digital space.

Authors:Anton Wolter, Leon Haag, Vaishali Dhanoa, Niklas Elmqvist
Title: Context-Mediated Domain Adaptation in Multi-Agent Sensemaking Systems
Abstract:
Domain experts possess tacit knowledge that they cannot easily articulate through explicit specifications. When experts modify AI-generated artifacts by correcting terminology, restructuring arguments, and adjusting emphasis, these edits reveal domain understanding that remains latent in traditional prompt-based interactions. Current systems treat such modifications as endpoint corrections rather than as implicit specifications that could reshape subsequent reasoning. We propose context-mediated domain adaptation, a paradigm where user modifications to system-generated artifacts serve as implicit domain specification that reshapes LLM-powered multi-agent reasoning behavior. Through our system Seedentia, a web-based multi-agent framework for sense-making, we demonstrate bidirectional semantic links between generated artifacts and system reasoning. Our approach enables specification bootstrapping where vague initial prompts evolve into precise domain specifications through iterative human-AI collaboration, implicit knowledge transfer through reverse-engineered user edits, and in-context learning where agent behavior adapts based on observed correction patterns. We present results from an evaluation with domain experts who generated and modified research questions from academic papers. Our system extracted 46 domain knowledge entries from user modifications, demonstrating the feasibility of capturing implicit expertise through edit patterns, though the limited sample size constrains conclusions about systematic quality improvements.

Authors:Rui Sheng, Yukun Yang, Chuhan Shi, Yanna Lin, Zixin Chen, Huamin Qu, Furui Cheng
Title: DiLLS: Interactive Diagnosis of LLM-based Multi-agent Systems via Layered Summary of Agent Behaviors
Abstract:
Large language model (LLM)-based multi-agent systems have demonstrated impressive capabilities in handling complex tasks. However, the complexity of agentic behaviors makes these systems difficult to understand. When failures occur, developers often struggle to identify root causes and to determine actionable paths for improvement. Traditional methods that rely on inspecting raw log records are inefficient, given both the large volume and complexity of data. To address this challenge, we propose a framework and an interactive system, DiLLS, designed to reveal and structure the behaviors of multi-agent systems. The key idea is to organize information across three levels of query completion: activities, actions, and operations. By probing the multi-agent system through natural language, DiLLS derives and organizes information about planning and execution into a structured, multi-layered summary. Through a user study, we show that DiLLS significantly improves developers' effectiveness and efficiency in identifying, diagnosing, and understanding failures in LLM-based multi-agent systems.

Authors:Yuanbo Tang, Huaze Tang, Tingyu Cao, Lam Nguyen, Anping Zhang, Xinwen Cao, Chunkang Liu, Wenbo Ding, Yang Li
Title: Proactive Agents, Long-term User Context, VLM Annotation, Privacy Protection, Human-Computer Interaction
Abstract:
Proactive agents that anticipate user intentions without explicit prompts represent a significant evolution in human-AI interaction, promising to reduce cognitive load and streamline workflows. However, existing datasets suffer from two critical deficiencies: (1) reliance on LLM-synthesized data that fails to capture authentic human decision-making patterns, and (2) focus on isolated tasks rather than continuous workflows, missing the pre-assistance behavioral context essential for learning proactive intervention signals. To address these gaps, we introduce ProAgentBench, a rigorous benchmark for proactive agents in working scenarios. Our contributions include: (1) a hierarchical task framework that decomposes proactive assistance into timing prediction and assist content generation; (2) a privacy-compliant dataset with 28,000+ events from 500+ hours of real user sessions, preserving bursty interaction patterns (burstiness B=0.787) absent in synthetic data; and (3) extensive experiments that evaluates LLM- and VLM-based baselines. Numerically, we showed that long-term memory and historical context significantly enhance prediction accuracy, while real-world training data substantially outperforms synthetic alternatives. We release our dataset and code at https://anonymous.4open.science/r/ProAgentBench-6BC0.

Authors:Ibrahim Khalilov, Chaoran Chen, Ziang Xiao, Tianshi Li, Toby Jia-Jun Li, Yaxing Yao
Title: PriviSense: A Frida-Based Framework for Multi-Sensor Spoofing on Android
Abstract:
Mobile apps increasingly rely on real-time sensor and system data to adapt their behavior to user context. While emulators and instrumented builds offer partial solutions, they often fail to support reproducible testing of context-sensitive app behavior on physical devices. We present PriviSense, a Frida-based, on-device toolkit for runtime spoofing of sensor and system signals on rooted Android devices. PriviSense can script and inject time-varying sensor streams (accelerometer, gyroscope, step counter) and system values (battery level, system time, device metadata) into unmodified apps, enabling reproducible on-device experiments without emulators or app rewrites. Our demo validates real-time spoofing on a rooted Android device across five representative sensor-visualization apps. By supporting scriptable and reversible manipulation of these values, PriviSense facilitates testing of app logic, uncovering of context-based behaviors, and privacy-focused analysis. To ensure ethical use, the code is shared upon request with verified researchers. Tool Guide: How to Run PriviSense on Rooted Android https://bit.ly/privisense-guide Demonstration video: https://www.youtube.com/watch?v=4Qwnogcc3pw

Authors:Kuai Yu, Naicheng Yu, Han Wang, Rui Yang, Huan Zhang
Title: How do Visual Attributes Influence Web Agents? A Comprehensive Evaluation of User Interface Design Factors
Abstract:
Web agents have demonstrated strong performance on a wide range of web-based tasks. However, existing research on the effect of environmental variation has mostly focused on robustness to adversarial attacks, with less attention to agents' preferences in benign scenarios. Although early studies have examined how textual attributes influence agent behavior, a systematic understanding of how visual attributes shape agent decision-making remains limited. To address this, we introduce VAF, a controlled evaluation pipeline for quantifying how webpage Visual Attribute Factors influence web-agent decision-making. Specifically, VAF consists of three stages: (i) variant generation, which ensures the variants share identical semantics as the original item while only differ in visual attributes; (ii) browsing interaction, where agents navigate the page via scrolling and clicking the interested item, mirroring how human users browse online; (iii) validating through both click action and reasoning from agents, which we use the Target Click Rate and Target Mention Rate to jointly evaluate the effect of visual attributes. By quantitatively measuring the decision-making difference between the original and variant, we identify which visual attributes influence agents' behavior most. Extensive experiments, across 8 variant families (48 variants total), 5 real-world websites (including shopping, travel, and news browsing), and 4 representative web agents, show that background color contrast, item size, position, and card clarity have a strong influence on agents' actions, whereas font styling, text color, and item image clarity exhibit minor effects.

Authors:Valerie Tan, Luisa Jost, Jens Gerken, Max Pascher
Title: Preliminary Results of a Scoping Review on Assistive Technologies for Adults with ADHD
Abstract:
Attention Deficit Hyperactivity Disorder (ADHD), characterized by inattention, hyperactivity, and impulsivity, is prevalent in the adult population. Long perceived and treated as a childhood condition, ADHD and its characteristics nonetheless impact a significant portion of adults today. In contrast to children with ADHD, adults with ADHD face unique challenges in the workplace and in higher education. In this work-in-progress paper, we present a scoping review as a foundation to understand and explore existing technology-based approaches to support adults with ADHD. In total, our search returned 3,538 papers upon which we selected, based on PRISMA-ScR, a total of 46 papers for in-depth analysis. Our initial findings highlight that most papers take on a therapeutic or intervention perspective instead of a more positive support perspective. Our analysis also found a tremendous increase in recent papers on the topic, which highlights that more and more researchers are becoming aware of the need to address ADHD with adults. For the future, we aim to further analyze the corpus and identify research gaps and potentials for further development of ADHD assistive technologies.

Authors:Jason Kim, Maria Teleki, James Caverlee
Title: PromptHelper: A Prompt Recommender System for Encouraging Creativity in AI Chatbot Interactions
Abstract:
Prompting is central to interaction with AI systems, yet many users struggle to explore alternative directions, articulate creative intent, or understand how variations in prompts shape model outputs. We introduce prompt recommender systems (PRS) as an interaction approach that supports exploration, suggesting contextually relevant follow-up prompts. We present PromptHelper, a PRS prototype integrated into an AI chatbot that surfaces semantically diverse prompt suggestions while users work on real writing tasks. We evaluate PromptHelper in a 2x2 fully within-subjects study (N=32) across creative and academic writing tasks. Results show that PromptHelper significantly increases users' perceived exploration and expressiveness without increasing cognitive workload. Qualitative findings illustrate how prompt recommendations help users branch into new directions, overcome uncertainty about what to ask next, and better articulate their intent. We discuss implications for designing AI interfaces that scaffold exploratory interaction while preserving user agency, and release open-source resources to support research on prompt recommendation.

Authors:Shreya Haran, Samiha Thatikonda, Dong Whi Yoo, Koustuv Saha
Title: A Checklist for Trustworthy, Safe, and User-Friendly Mental Health Chatbots
Abstract:
Mental health concerns are rising globally, prompting increased reliance on technology to address the demand-supply gap in mental health services. In particular, mental health chatbots are emerging as a promising solution, but these remain largely untested, raising concerns about safety and potential harms. In this paper, we dive into the literature to identify critical gaps in the design and implementation of mental health chatbots. We contribute an operational checklist to help guide the development and design of more trustworthy, safe, and user-friendly chatbots. The checklist serves as both a developmental framework and an auditing tool to ensure ethical and effective chatbot design. We discuss how this checklist is a step towards supporting more responsible design practices and supporting new standards for sociotechnically sound digital mental health tools.

Authors:Shanshan Zhu, Wenxuan Song, Jiayue Melissa Shi, Dong Whi Yoo, Karthik S. Bhat, Koustuv Saha
Title: Designing KRIYA: An AI Companion for Wellbeing Self-Reflection
Abstract:
Most personal wellbeing apps present summative dashboards of health and physical activity metrics, yet many users struggle to translate this information into meaningful understanding. These apps commonly support engagement through goals, reminders, and structured targets, which can reinforce comparison, judgment, and performance anxiety. To explore a complementary approach that prioritizes self-reflection, we design KRIYA, an AI wellbeing companion that supports co-interpretive engagement with personal wellbeing data. KRIYA aims to collaborate with users to explore questions, explanations, and future scenarios through features such as Comfort Zone, Detective Mode, and What-If Planning. We conducted semi-structured interviews with 18 college students interacting with a KRIYA prototype using hypothetical data. Our findings show that through KRIYA interaction, users framed engaging with wellbeing data as interpretation rather than performance, experienced reflection as supportive or pressuring depending on emotional framing, and developed trust through transparency. We discuss design implications for AI companions that support curiosity, self-compassion, and reflective sensemaking of personal health data.

Authors:Jiwon Kim, Violeta J. Rodriguez, Dong Whi Yoo, Eshwar Chandrasekharan, Koustuv Saha
Title: PAIR-SAFE: A Paired-Agent Approach for Runtime Auditing and Refining AI-Mediated Mental Health Support
Abstract:
Large language models (LLMs) are increasingly used for mental health support, yet they can produce responses that are overly directive, inconsistent, or clinically misaligned, particularly in sensitive or high-risk contexts. Existing approaches to mitigating these risks largely rely on implicit alignment through training or prompting, offering limited transparency and runtime accountability. We introduce PAIR-SAFE, a paired-agent framework for auditing and refining AI-generated mental health support that integrates a Responder agent with a supervisory Judge agent grounded in the clinically validated Motivational Interviewing Treatment Integrity (MITI-4) framework. The Judgeaudits each response and provides structuredALLOW or REVISE decisions that guide runtime response refinement. We simulate counseling interactions using a support-seeker simulator derived from human-annotated motivational interviewing data. We find that Judge-supervised interactions show significant improvements in key MITI dimensions, including Partnership, Seek Collaboration, and overall Relational quality. Our quantitative findings are supported by qualitative expert evaluation, which further highlights the nuances of runtime supervision. Together, our results reveal that such pairedagent approach can provide clinically grounded auditing and refinement for AI-assisted conversational mental health support.

Authors:Yuheng Wang, Runde Yang, Lin Wu, Jie Zhang, Jingru Fan, Ruoyu Fu, Tianle Zhou, Huatao Li, Siheng Chen, Weinan E, Chen Qian
Title: Generative Teaching via Code
Abstract:
The scalability of high-quality online education is hindered by the high costs and slow cycles of labor-intensive manual content creation. Despite advancements in video generation, current approaches often fail to ensure pedagogical structure and precise control due to their pixel-level, black-box nature. In this paper, we propose Generative Teaching, a novel paradigm that transitions educators from manual creators to high-level directors, allowing them to focus on pedagogical intent while autonomous agents handle the execution. To realize this vision, we introduce TeachMaster, a multi-agent framework that leverages code as an intermediate semantic medium. Unlike traditional video generation methods, TeachMaster orchestrates a collaborative team of agents--spanning planning, design, and rendering--to automate the production of interpretable, editable, and curriculum-ready educational videos. Experiments validate that TeachMaster significantly boosts production efficiency without compromising structural coherence or visual fidelity, providing a robust solution for scalable education.

Authors:Jie Cao, Zhanxin Hao, Jifan Yu
Title: Decoding Student Dialogue: A Multi-Dimensional Comparison and Bias Analysis of Large Language Models as Annotation Tools
Abstract:
Educational dialogue is critical for decoding student learning processes, yet manual annotation remains time-consuming. This study evaluates the efficacy of GPT-5.2 and Gemini-3 using three prompting strategies (few-shot, single-agent, and multi-agent reflection) across diverse subjects, educational levels, and four coding dimensions. Results indicate that while multi-agent prompting achieved the highest accuracy, the results did not reach statistical significance. Accuracy proved highly context-dependent, with significantly higher performance in K-12 datasets compared to university-level data, alongside disciplinary variations within the same educational level. Performance peaked in the affective dimension but remained lowest in the cognitive dimension. Furthermore, analysis revealed four bias patterns: (1) Gemini-3 exhibited a consistent optimistic bias in the affective dimension across all subjects; (2) the cognitive dimension displayed domain-specific directional bias, characterized by systematic underestimation in Mathematics versus overestimation in Psychology; (3) both models are more prone to overestimation than underestimation within the meta-cognitive dimension; and (4) behavioral categories such as question, negotiation, and statements were frequently misclassified. These results underscore the need for context-sensitive deployment and targeted mitigation of directional biases in automated annotation.

Authors:Subhabrata Mukherjee, Markel Sanz Ausin, Kriti Aggarwal, Debajyoti Datta, Shanil Puri, Woojeong Jin, Tanmay Laud, Neha Manjunath, Jiayuan Ding, Bibek Paudel, Jan Schellenberger, Zepeng Frazier Huo, Walter Shen, Nima Shirazian, Nate Potter, Sathvik Perkari, Darya Filippova, Anton Morozov, Austin Mease, Vivek Muppalla, Ghada Shakir, Alex Miller, Juliana Ghukasyan, Mariska Raglow-Defranco, Maggie Taylor, Herprit Mahal, Jonathan Agnew
Title: Perfecting Human-AI Interaction at Clinical Scale. Turning Production Signals into Safer, More Human Conversations
Abstract:
Healthcare conversational AI agents shouldn't be optimized only for clean benchmark accuracy in production-first regime; they must be optimized for the lived reality of patient conversations, where audio is imperfect, intent is indirect, language shifts mid-call, and compliance hinges on how guidance is delivered. We present a production-validated framework grounded in real-time signals from 115M+ live patient-AI interactions and clinician-led testing (7K+ licensed clinicians; 500K+ test calls). These in-the-wild cues -- paralinguistics, turn-taking dynamics, clarification triggers, escalation markers, multilingual continuity, and workflow confirmations -- reveal failure modes that curated data misses and provide actionable training and evaluation signals for safety and reliability. We further show why healthcare-grade safety cannot rely on a single LLM: long-horizon dialogue and limited attention demand redundancy via governed orchestration, independent checks, and verification. Many apparent "reasoning" errors originate upstream, motivating vertical integration across contextual ASR, clarification/repair, ambient speech handling, and latency-aware model/hardware choices. Treating interaction intelligence (tone, pacing, empathy, clarification, turn-taking) as first-class safety variables, we drive measurable gains in safety, documentation, task completion, and equity in building the safest generative AI solution for autonomous patient-facing care. Deployed across more than 10 million real patient calls, Polaris attains a clinical safety score of 99.9%, while significantly improving patient experience with average patient rating of 8.95 and reducing ASR errors by 50% over enterprise ASR. These results establish real-world interaction intelligence as a critical -- and previously underexplored -- determinant of safety and reliability in patient-facing clinical AI systems.

Authors:Brian Felipe Keith-Norambuena, Fausto German, Eric Krokos, Sarah Joseph, Chris North
Title: Semantic Interaction for Narrative Map Sensemaking: An Insight-based Evaluation
Abstract:
Semantic interaction (SI) enables analysts to incorporate their cognitive processes into AI models through direct manipulation of visualizations. While SI frameworks for narrative extraction have been proposed, empirical evaluations of their effectiveness remain limited. This paper presents a user study that evaluates SI for narrative map sensemaking, involving 33 participants under three conditions: a timeline baseline, a basic narrative map, and an interactive narrative map with SI capabilities. The results show that the map-based prototypes yielded more insights than the timeline baseline, with the SI-enabled condition reaching statistical significance and the basic map condition trending in the same direction. The SI-enabled condition showed the highest mean performance; differences between the map conditions were not statistically significant but showed large effect sizes (d > 0.8), suggesting that the study was underpowered to detect them. Qualitative analysis identified two distinct SI approaches-corrective and additive-that enable analysts to impose quality judgments and organizational structure on extracted narratives. We also find that SI users achieved comparable exploration breadth with less parameter manipulation, suggesting that SI serves as an alternative pathway for model refinement. This work provides empirical evidence that map-based representations outperform timelines for narrative sensemaking, along with qualitative insights into how analysts use SI for narrative refinement.

Authors:Xinyan Yu, Marius Hoggenmueller, Tram Thi Minh Tran, Martin Tomitsch
Title: Fostering Design-Policy Collaboration through Contestation: An Adversarial Futuring Method
Abstract:
Emerging technologies introduce sociotechnical tensions that call for closer collaboration between technology design and policy. In this work, we introduce Design-Policy Adversarial Futuring, a scenario-based workshop method that supports design-policy engagement by structuring contestation between design and policy perspectives. We report on a workshop conducted in the autonomous mobility domain with 12 HCI researchers, used to explore and demonstrate the method in practice. The workshop illustrates how the adversarial futuring method can surface shifting harms, translate policy abstractions into situated use, and legitimise extreme ideas while maintaining grounded policy reasoning. This work contributes a reusable, exploratory method for supporting HCI-policy collaboration through contestation, which can be adapted across emerging technological domains.

Authors:Qurat Ul Ain, Mohamed Amine Chatti, Nasim Yazdian Varjani, Farah Kamal, Astrid Rosenthal-von der Pütten
Title: Visual or Textual: Effects of Explanation Format and Personal Characteristics on the Perception of Explanations in an Educational Recommender System
Abstract:
Explanations are central to improving transparency, trust, and user satisfaction in recommender systems (RS), yet it remains unclear how different explanation formats (visual vs. textual) are suited to users with different personal characteristics (PCs). To this end, we report a within-subject user study (n=54) comparing visual and textual explanations and examine how explanation format and PCs jointly influence perceived control, transparency, trust, and satisfaction in an educational recommender system (ERS). Using robust mixed-effects models, we analyze the moderating effects of a wide range of PCs, including Big Five traits, need for cognition, decision making style, visualization familiarity, and technical expertise. Our results show that a well-designed visual, simple, interactive, selective, easy to understand visualization that clearly and intuitively communicates how user preferences are linked to recommendations, fosters perceived control, transparency, appropriate trust, and satisfaction in the ERS for most users, independent of their PCs. Moreover, we derive a set of guidelines to support the effective design of explanations in ERSs.

Authors:Zhanxin Hao, Xiaobo Liu, Jiaxin Fan, Yun Long, Jifan Yu, Wenli Chen, Yu Zhang
Title: Unpacking Interaction Profiles and Strategies in Human-AI Collaborative Problem Solving: A Cognitive Distribution and Regulation Perspective
Abstract:
This study adopts an integrated distributed cognition and regulation of learning perspective to examine the collaboration patterns and dynamics of human-AI collaboration when college students collaborating with AI for complex problem-solving. Through cluster analysis, three distinct collaborative problem-solving modes were identified in this study: Delegated Reasoning (DR), Concerted Interpretation (CI), and Delegated Elaboration (DE). This study found that the DR group achieved the highest task performance, significantly outperforming the CI group. Additionally, the semantic similarity between human and AI discourse was notably the highest in the DR group. In contrast, the CI group reported significantly greater use of self-regulation strategies. These findings uncover a critical tension between the efficiency of the distributed system and the depth of human learners regulatory engagement. Insights from this study offer valuable implications for the future design of AI-empowered educational tools and student-AI collaborative learning frameworks.

Authors:Anqi Wang, Lei Han, Jiahua Dong, Muzhi Zhou, David Yip, Yuyang Wang, Pan Hui
Title: Dream the Dream: Futuring Communication between LGBTQ+ and Cisgender Groups in Metaverse
Abstract:
Digital platforms frequently reproduce heteronormative norms and structural biases, limiting inclusive communication between LGBTQ+ and cisgender individuals. The Metaverse, with its affordances for identity fluidity, presence, and community governance, offers a promising site for reimagining such interactions. To investigate this potential, we conducted participatory design workshops involving LGBTQ+ and cisgender participants, situating them in speculative Metaverse contexts to surface barriers and co-create alternative futures. The workshops followed a three-phase process-identifying challenges, speculative problem-solving, and visualizing futures-yielding socio-spatial-technical solutions across four layers: activity, interaction, scene, and space. These findings highlight the importance of spatial cues and power dynamics in shaping digital encounters. We contribute by (1) articulating challenges of cross-group communication in virtual environments, (2) proposing inclusive design opportunities for the Metaverse, and (3) advancing principles for addressing power geometry in digital space. This work demonstrates futuring as a critical strategy for designing equitable, transformative communication infrastructures.

Authors:Anna Gausen, Sarenne Wallbridge, Hannah Rose Kirk, Jennifer Williams, Christopher Summerfield
Title: Disclosure By Design: Identity Transparency as a Behavioural Property of Conversational AI Models
Abstract:
As conversational AI systems become more realistic and widely deployed, users are increasingly uncertain about whether they are interacting with a human or an AI system. When AI identity is unclear, users may unwittingly share sensitive information, place unwarranted trust in AI-generated advice, or fall victim to AI-enabled fraud. More broadly, a persistent lack of transparency can erode trust in mediated communication. While regulations like the EU AI Act and California's BOT Act require AI systems to identify themselves, they provide limited guidance on reliable disclosure in real-time conversation. Existing transparency mechanisms also leave gaps: interface indicators can be omitted by deployers, and provenance tools require coordinated infrastructure and cannot provide reliable real-time verification. We ask how conversational AI systems should maintain identity transparency as human-AI interactions become more ambiguous and diverse. We advocate for disclosure by design, where AI systems explicitly disclose their artificial identity when directly asked. Implemented as model behaviour, disclosure can persist across deployment contexts without relying on user interfaces, while preserving user agency to verify identity on demand without disrupting immersive uses like role-playing. To assess current practice, we present the first multi-modal (text and voice) evaluation of disclosure behaviour in deployed systems across baseline, role-playing, and adversarial settings. We find that baseline disclosure rates are often high but drop substantially in role-play and can be suppressed under adversarial prompting. Importantly, disclosure rates vary significantly across providers and modalities, highlighting the fragility of current disclosure behaviour. We conclude with technical interventions to help developers embed disclosure as a fundamental property of conversational AI models.

Authors:Yuang Wei, Fei Wang, Yifan Zhang, Brian Y. Lim, Bo Jiang
Title: Beyond Scores: Explainable Intelligent Assessment Strengthens Pre-service Teachers' Assessment Literacy
Abstract:
Assessment literacy (AL) is essential for personalized education, yet difficult to cultivate in pre-service teachers. Conventional teacher preparation programs focus on theoretical knowledge, while digital assessment tools commonly provide opaque scores or parameters. These limitations hinder reflection and transfer, leaving AL underdeveloped. We propose XIA, an eXplainable Intelligent Assessment platform that extends statistics-informed support with visualized cognitive diagnostic reasoning, including contrastive and counterfactual explanations. In a pre-post controlled study with 21 pre-service teachers, we combined quantitative tasks and questionnaires with qualitative interviews. The findings offer preliminary evidence that XIA supported reflection, self-regulation, and assessment awareness, and helped reduce assessment errors. Interviews further showed a shift from score-based judgments toward evidence-based reasoning. This work contributes insights into the design of intelligent assessment tools, showing how explanatory scaffolding can bridge assessment theory and classroom practice and support the cultivation of AL in teacher education.

Authors:Nikita Soni, August Håkan Nilsson, Syeda Mahwish, Vasudha Varadarajan, H. Andrew Schwartz, Ryan L. Boyd
Title: Who We Are, Where We Are: Mental Health at the Intersection of Person, Situation, and Large Language Models
Abstract:
Mental health is not a fixed trait but a dynamic process shaped by the interplay between individual dispositions and situational contexts. Building on interactionist and constructionist psychological theories, we develop interpretable models to predict well-being and identify adaptive and maladaptive self-states in longitudinal social media data. Our approach integrates person-level psychological traits (e.g., resilience, cognitive distortions, implicit motives) with language-inferred situational features derived from the Situational 8 DIAMONDS framework. We compare these theory-grounded features to embeddings from a psychometrically-informed language model that captures temporal and individual-specific patterns. Results show that our principled, theory-driven features provide competitive performance while offering greater interpretability. Qualitative analyses further highlight the psychological coherence of features most predictive of well-being. These findings underscore the value of integrating computational modeling with psychological theory to assess dynamic mental states in contextually sensitive and human-understandable ways.

Authors:Tonmoy Dey, Lin Jiang, Zheng Dong, Guang Wang
Title: UrbanHuRo: A Two-Layer Human-Robot Collaboration Framework for the Joint Optimization of Heterogeneous Urban Services
Abstract:
In the vision of smart cities, technologies are being developed to enhance the efficiency of urban services and improve residents' quality of life. However, most existing research focuses on optimizing individual services in isolation, without adequately considering reciprocal interactions among heterogeneous urban services that could yield higher efficiency and improved resource utilization. For example, human couriers could collect traffic and air quality data along their delivery routes, while sensing robots could assist with on-demand delivery during peak hours, enhancing both sensing coverage and delivery efficiency. However, the joint optimization of different urban services is challenging due to potentially conflicting objectives and the need for real-time coordination in dynamic environments. In this paper, we propose UrbanHuRo, a two-layer human-robot collaboration framework for joint optimization of heterogeneous urban services, demonstrated through crowdsourced delivery and urban sensing. UrbanHuRo includes two key designs: (i) a scalable distributed MapReduce-based K-submodular maximization module for efficient order dispatch, and (ii) a deep submodular reward reinforcement learning algorithm for sensing route planning. Experimental evaluations on real-world datasets from a food delivery platform demonstrate that UrbanHuRo improves sensing coverage by 29.7% and courier income by 39.2% on average in most settings, while also significantly reducing the number of overdue orders.

Authors:Magda Dubois, Cozmin Ududec, Christopher Summerfield, Lennart Luettgau
Title: Ask don't tell: Reducing sycophancy in large language models
Abstract:
Sycophancy, the tendency of large language models to favour user-affirming responses over critical engagement, has been identified as an alignment failure, particularly in high-stakes advisory and social contexts. While prior work has documented conversational features correlated with sycophancy, we lack a systematic understanding of what provokes or prevents AI sycophancy. Here, we present a set of controlled experimental studies where we first isolate how input framing influences sycophancy, and second, leverage these findings to develop mitigation strategies. In a nested factorial design, we compare questions to various non-questions where we vary three orthogonal factors: epistemic certainty (statement, belief, conviction), perspective (I- vs user-perspective), and affirmation vs negation. We show that (1) sycophancy is substantially higher in response to non-questions compared to questions. Additionally, we find that (2) sycophancy increases monotonically with epistemic certainty conveyed by the user, and (3) is amplified by I-perspective framing. Building on this, we show that asking a model to convert non-questions into questions before answering significantly reduces sycophancy. Importantly, this effect is stronger than a simple baseline prompt asking models "not to be sycophantic". Our work offers a practical and effective input-level mitigation that both developers and users can easily adopt.

Authors:Priyan Vaithilingam, Alan Leung, Jeffrey Nichols, Titus Barik
Title: The Way We Notice, That's What Really Matters: Instantiating UI Components with Distinguishing Variations
Abstract:
Front-end developers author UI components to be broadly reusable by parameterizing visual and behavioral properties. While flexible, this makes instantiation harder, as developers must reason about numerous property values and interactions. In practice, they must explore the component's large design space and provide realistic and natural values to properties. To address this, we introduce distinguishing variations: variations that are both mimetic and distinct. We frame distinguishing variation generation as design-space sampling, combining symbolic inference to identify visually important properties with an LLM-driven mimetic sampler to produce realistic instantiations from its world knowledge. We instantiate distinguishing variations in Celestial, a tool that helps developers explore and visualize distinguishing variations. In a study with front-end developers (n=12), participants found these variations useful for comparing and mapping component design spaces, reported that mimetic instantiations were domain-relevant, and validated that Celestial transformed component instantiation from a manual process into a structured, exploratory activity.

Authors:Emma Jiren Wang, Siying Hu, Zhicong Lu
Title: PuppetChat: Fostering Intimate Communication through Bidirectional Actions and Micronarratives
Abstract:
As a primary channel for sustaining modern intimate relationships, instant messaging facilitates frequent connection across distances. However, today's tools often dilute care; they favor single tap reactions and vague emojis that do not support two way action responses, do not preserve the feeling that the exchange keeps going without breaking, and are weakly tied to who we are and what we share. To address this challenge, we present PuppetChat, a dyadic messaging prototype that restores this expressive depth through embodied interaction. PuppetChat uses a reciprocity aware recommender to encourage responsive actions and generates personalized micronarratives from user stories to ground interactions in personal history. Our 10-day field study with 11 dyads of close partners or friends revealed that this approach enhanced social presence, supported more expressive self disclosure, and sustained continuity and shared memories.

Authors:Gabriela Aránguiz Dias, Kiana Jafari, Allie Griffith, Carolina Aránguiz Dias, Grace Ra Kim, Lana Saadeddin, Mykel J. Kochenderfer
Title: The Doctor Will (Still) See You Now: On the Structural Limits of Agentic AI in Healthcare
Abstract:
Across healthcare, agentic artificial intelligence (AI) systems are increasingly promoted as capable of autonomous action, yet in practice they currently operate under near-total human oversight due to safety, regulatory, and liability constraints that make autonomous clinical reasoning infeasible in high-stakes environments. While market enthusiasm suggests a revolution in healthcare agents, the conceptual assumptions and accountability structures shaping these systems remain underexamined. We present a qualitative study based on interviews with 20 stakeholders, including developers, implementers, and end users. Our analysis identifies three mutually reinforcing tensions: conceptual fragmentation regarding the definition of `agentic'; an autonomy contradiction where commercial promises exceed operational reality; and an evaluation blind spot that prioritizes technical benchmarks over sociotechnical safety. We argue that agentic {AI} functions as a site of contested meaning-making where technical aspirations, commercial incentives, and clinical constraints intersect, carrying material consequences for patient safety and the distribution of blame.

Authors:Kehang Zhu, Nithum Thain, Vivian Tsai, James Wexler, Crystal Qian
Title: Choose Your Agent: Tradeoffs in Adopting AI Advisors, Coaches, and Delegates in Multi-Party Negotiation
Abstract:
As AI usage becomes more prevalent in social contexts, understanding agent-user interaction is critical to designing systems that improve both individual and group outcomes. We present an online behavioral experiment (N = 243) in which participants play three multi-turn bargaining games in groups of three. Each game, presented in randomized order, grants access to a single LLM assistance modality: proactive recommendations from an Advisor, reactive feedback from a Coach, or autonomous execution by a Delegate; all modalities are powered by an underlying LLM that achieves superhuman performance in an all-agent environment. On each turn, participants privately decide whether to act manually or use the AI modality available in that game. Despite preferring the Advisor modality, participants achieve the highest mean individual gains with the Delegate, demonstrating a preference-performance misalignment. Moreover, delegation generates positive externalities; even non-adopting users in access-to-delegate treatment groups benefit by receiving higher-quality offers. Mechanism analysis reveals that the Delegate agent acts as a market maker, injecting rational, Pareto-improving proposals that restructure the trading environment. Our research reveals a gap between agent capabilities and realized group welfare. While autonomous agents can exhibit super-human strategic performance, their impact on realized welfare gains can be constrained by interfaces, user perceptions, and adoption barriers. Assistance modalities should be designed as mechanisms with endogenous participation; adoption-compatible interaction rules are a prerequisite to improving human welfare with automated assistance.

Authors:Sidong Feng, Chunyang Chen
Title: How Smart Is Your GUI Agent? A Framework for the Future of Software Interaction
Abstract:
GUI agents are rapidly becoming a new interaction to software, allowing people to navigate web, desktop and mobile rather than execute them click by click. Yet ``agent'' is described with radically different degrees of autonomy, obscuring capability, responsibility and risk. We call for conceptual clarity through GUI Agent Autonomy Levels (GAL), a six-level framework that makes autonomy explicit and helps benchmark progress toward trustworthy software interaction.

Authors:Tram Thi Minh Tran, Debargha Dey, Martin Tomitsch
Title: Rethinking External Communication of Autonomous Vehicles: Is the Field Converging, Diverging, or Stalling?
Abstract:
As autonomous vehicles enter public spaces, external human-machine interfaces are proposed to support communication with external road users. A decade of research has produced hundreds of studies and reviews, yet it remains unclear whether the field is converging on shared principles or diverging across approaches. We present a multi-dimensional analysis of 620 publications, complemented by industry deployments and regulatory documents, to track research evolution and identify convergence. The analysis reveals several field-level patterns. First, convergence on a safety-first core: simple visual cues that clarify intent. Second, sustained divergence in necessity and implementation. Third, a progressive filtering funnel: broad exploration in research and concepts narrows in deployment and is codified by regulation into a minimal set of permitted signals. These insights point to a shift in emphasis for future work, from producing new prototypes toward consolidating evidence, clarifying points of contention, and developing frameworks that can adapt across contexts.

Authors:Abhisek Dash, Soumi Das, Elisabeth Kirsten, Qinyuan Wu, Sai Keerthana Karnam, Krishna P. Gummadi, Thorsten Holz, Muhammad Bilal Zafar, Savvas Zannettou
Title: The Algorithmic Self-Portrait: Deconstructing Memory in ChatGPT
Abstract:
To enable personalized and context-aware interactions, conversational AI systems have introduced a new mechanism: Memory. Memory creates what we refer to as the Algorithmic Self-portrait - a new form of personalization derived from users' self-disclosed information divulged within private conversations. While memory enables more coherent exchanges, the underlying processes of memory creation remain opaque, raising critical questions about data sensitivity, user agency, and the fidelity of the resulting portrait. To bridge this research gap, we analyze 2,050 memory entries from 80 real-world ChatGPT users. Our analyses reveal three key findings: (1) A striking 96% of memories in our dataset are created unilaterally by the conversational system, potentially shifting agency away from the user; (2) Memories, in our dataset, contain a rich mix of GDPR-defined personal data (in 28% memories) along with psychological insights about participants (in 52% memories); and (3)~A significant majority of the memories (84%) are directly grounded in user context, indicating faithful representation of the conversations. Finally, we introduce a framework-Attribution Shield-that anticipates these inferences, alerts about potentially sensitive memory inferences, and suggests query reformulations to protect personal information without sacrificing utility.

Authors:Xuan-The Tran, Thien-Nhan Vo, Son-Tung Vu, Thoa-Thi Tran, Manh-Dat Nguyen, Thomas Do, Chin-Teng Lin
Title: Inter- and Intra-Subject Variability in EEG: A Systematic Survey
Abstract:
Electroencephalography (EEG) underpins neuroscience, clinical neurophysiology, and brain-computer interfaces (BCIs), yet pronounced inter- and intra-subject variability limits reliability, reproducibility, and translation. This systematic review studies that quantified or modeled EEG variability across resting-state, event-related potentials (ERPs), and task-related/BCI paradigms (including motor imagery and SSVEP) in healthy and clinical cohorts. Across paradigms, inter-subject differences are typically larger than within-subject fluctuations, but both affect inference and model generalization. Stability is feature-dependent: alpha-band measures and individual alpha peak frequency are often relatively reliable, whereas higher-frequency and many connectivity-derived metrics show more heterogeneous reliability; ERP reliability varies by component, with P300 measures frequently showing moderate-to-good stability. We summarize major sources of variability (biological, state-related, technical, and analytical), review common quantification and modeling approaches (e.g., ICC, CV, SNR, generalizability theory, and multivariate/learning-based methods), and provide recommendations for study design, reporting, and harmonization. Overall, EEG variability should be treated as both a practical constraint to manage and a meaningful signal to leverage for precision neuroscience and robust neurotechnology.

Authors:Chi-Sheng Chen, En-Jui Kuo, Guan-Ying Chen, Xinyu Zhang, Fan Zhang
Title: A Unified SPD Token Transformer Framework for EEG Classification: Systematic Comparison of Geometric Embeddings
Abstract:
Spatial covariance matrices of EEG signals are Symmetric Positive Definite (SPD) and lie on a Riemannian manifold, yet the theoretical connection between embedding geometry and optimization dynamics remains unexplored. We provide a formal analysis linking embedding choice to gradient conditioning and numerical stability for SPD manifolds, establishing three theoretical results: (1) BWSPD's $\sqrtκ$ gradient conditioning (vs $κ$ for Log-Euclidean) via Daleckii-Kre\uın matrices provides better gradient conditioning on high-dimensional inputs ($d \geq 22$), with this advantage reducing on low-dimensional inputs ($d \leq 8$) where eigendecomposition overhead dominates; (2) Embedding-Space Batch Normalization (BN-Embed) approximates Riemannian normalization up to $O(\varepsilon^2)$ error, yielding $+26\%$ accuracy on 56-channel ERP data but negligible effect on 8-channel SSVEP data, matching the channel-count-dependent prediction; (3) bi-Lipschitz bounds prove BWSPD tokens preserve manifold distances with distortion governed solely by the condition ratio $κ$. We validate these predictions via a unified Transformer framework comparing BWSPD, Log-Euclidean, and Euclidean embeddings within identical architecture across 1,500+ runs on three EEG paradigms (motor imagery, ERP, SSVEP; 36 subjects). Our Log-Euclidean Transformer achieves state-of-the-art performance on all datasets, substantially outperforming classical Riemannian classifiers and recent SPD baselines, while BWSPD offers competitive accuracy with similar training time.

Authors:Tao Morisaki, Atsushi Matsubayashi, Yasutoshi Makino, Hiroyuki Shinoda
Title: Tactile Rendering Using Three Basic Stimulus Components in Ultrasound Midair Haptics
Abstract:
Ultrasound midair haptics (UMH) can present non-contact tactile stimuli using focused ultrasound without restricting the user's movement. Recently, UMH has been shown to present not only conventional vibrotactile sensations but also static pressure sensations by locally rotating an ultrasound focus at several hertz. With these pressure and vibration sensations, UMH covers three mechanoreceptors on which tactile perception relies: SA-I, FA-I, and FA-II. This study proposes a texture rendering method in UMH based on these receptor characteristics. Three basic ultrasonic stimuli corresponding to each mechanoreceptor are designed, and tactile textures are rendered through their combinations. For SA-I, a pressure stimuli were employed. For FA-I and FA-II, vibration stimuli at 30 Hz and 150 Hz, respectively, are employed. Experimental results demonstrate that the proposed method can render at least six discriminable textures with different roughness and friction sensations. Notably, through comparisons with real physical objects, we found that the pressure-only stimulus was perceived as slippery and smooth. Its smoothness was similar to a glass-marble. When vibration stimuli were synthesized, the perceived roughness and friction increased significantly. The roughness level reached that of a 100-grit sandpaper.

Authors:Mingxin Zhang, Yu Yao, Yasutoshi Makino, Hiroyuki Shinoda, Masashi Sugiyama
Title: HapticMatch: An Exploration for Generative Material Haptic Simulation and Interaction
Abstract:
High-fidelity haptic feedback is essential for immersive virtual environments, yet authoring realistic tactile textures remains a significant bottleneck for designers. We introduce HapticMatch, a visual-to-tactile generation framework designed to democratize haptic content creation. We present a novel dataset containing precisely aligned pairs of micro-scale optical images, surface height maps, and friction-induced vibrations for 100 diverse materials. Leveraging this data, we explore and demonstrate that conditional generative models like diffusion and flow-matching can synthesize high-fidelity, renderable surface geometries directly from standard RGB photos. By enabling a "Scan-to-Touch" workflow, HapticMatch allows interaction designers to rapidly prototype multimodal surface sensations without specialized recording equipment, bridging the gap between visual and tactile immersion in VR/AR interfaces.

Authors:Aldo Cerulli, Lorenzo Cima, Benedetta Tessa, Serena Tardelli, Stefano Cresci
Title: The Big Ban Theory: A Pre- and Post-Intervention Dataset of Online Content Moderation Actions
Abstract:
Online platforms rely on moderation interventions to curb harmful behavior such hate speech, toxicity, and the spread of mis- and disinformation. Yet research on the effects and possible biases of such interventions faces multiple limitations. For example, existing works frequently focus on single or a few interventions, due to the absence of comprehensive datasets. As a result, researchers must typically collect the necessary data for each new study, which limits opportunities for systematic comparisons. To overcome these challenges, we introduce The Big Ban Theory (TBBT), a large dataset of moderation interventions. TBBT covers 25 interventions of varying type, severity, and scope, comprising in total over 339K users and nearly 39M posted messages. For each intervention, we provide standardized metadata and pseudonymized user activity collected three months before and after its enforcement, enabling consistent and comparable analyses of intervention effects. In addition, we provide a descriptive exploratory analysis of the dataset, along with several use cases of how it can support research on content moderation. With this dataset, we aim to support researchers studying the effects of moderation interventions and to promote more systematic, reproducible, and comparable research. TBBT is publicly available at: https://doi.org/10.5281/zenodo.18245670.

Authors:Patrick Gage Kelley, Steven Rousso-Schindler, Renee Shelby, Kurt Thomas, Allison Woodruff
Title: How Generative AI Empowers Attackers and Defenders Across the Trust & Safety Landscape
Abstract:
Generative AI (GenAI) is a powerful technology poised to reshape Trust & Safety. While misuse by attackers is a growing concern, its defensive capacity remains underexplored. This paper examines these effects through a qualitative study with 43 Trust & Safety experts across five domains: child safety, election integrity, hate and harassment, scams, and violent extremism. Our findings characterize a landscape in which GenAI empowers both attackers and defenders. GenAI dramatically increases the scale and speed of attacks, lowering the barrier to entry for creating harmful content, including sophisticated propaganda and deepfakes. Conversely, defenders envision leveraging GenAI to detect and mitigate harmful content at scale, conduct investigations, deploy persuasive counternarratives, improve moderator wellbeing, and offer user support. This work provides a strategic framework for understanding GenAI's impact on Trust & Safety and charts a path for its responsible use in creating safer online environments.

Authors:Xinyan Yu, Julie Stephany Berrio Perez, Marius Hoggenmüller, Martin Tomitsch, Tram Thi Minh Tran, Stewart Worrall, Wendy Ju
Title: The UnScripted Trip: Fostering Policy Discussion on Future Human-Vehicle Collaboration in Autonomous Driving Through Design-Oriented Methods
Abstract:
The rapid advancement of autonomous vehicle (AV) technologies is fundamentally reshaping paradigms of human-vehicle collaboration, raising not only an urgent need for innovative design solutions but also for policies that address corresponding broader tensions in society. To bridge the gap between HCI research and policy making, this workshop will bring together researchers and practitioners in the automotive community to explore AV policy directions through collaborative speculation on the future of AVs. We designed The UnScripted Trip, a card game rooted in fictional narratives of autonomous mobility, to surface tensions around human-vehicle collaboration in future AV scenarios and to provoke critical reflections on design solutions and policy directions. Our goal is to provide an engaging, participatory space and method for automotive researchers, designers, and industry practitioners to collectively explore and shape the future of human-vehicle collaboration and its policy implications.

Authors:Xinyan Yu, Marius Hoggenmüller, Tram Thi Minh Tran, Martin Tomitsch
Title: Feel the Presence: The Effects of Haptic Sensation on VR-Based Human-Robot Interaction
Abstract:
Virtual reality (VR) has been increasingly utilised as a simulation tool for human-robot interaction (HRI) studies due to its ability to facilitate fast and flexible prototyping. Despite efforts to achieve high validity in VR studies, haptic sensation, an essential sensory modality for perception and a critical factor in enhancing VR realism, is often absent from these experiments. Studying an interactive robot help-seeking scenario, we used a VR simulation with haptic gloves that provide highly realistic tactile and force feedback to examine the effects of haptic sensation on VR-based HRI. We compared participants' sense of presence and their assessments of the robot to a traditional setup using hand controllers. Our results indicate that haptic sensation enhanced participants' social and self-presence in VR and fostered more diverse and natural bodily engagement. Additionally, haptic sensations significantly influenced participants' affective-related perceptions of the robot. Our study provides insights to guide HRI researchers in building VR-based simulations that better align with their study contexts and objectives.

Authors:Manh-Dat Nguyen, Thomas Do, Nguyen Thanh Trung Le, Xuan-The Tran, Fred Chang, Chin-Teng Lin
Title: EdgeSSVEP: A Fully Embedded SSVEP BCI Platform for Low-Power Real-Time Applications
Abstract:
Brain-Computer Interfaces (BCIs) enable users to interact with machines directly via neural activity, yet their real-world deployment is often hindered by bulky and powerhungry hardware. We present EdgeSSVEP, a fully embedded microcontroller-based Steady-State Visually Evoked Potential (SSVEP) BCI platform that performs real-time EEG acquisition, zero-phase filtering, and on-device classification within a lowpower 240 MHz MCU operating at only 222 mW. The system incorporates an 8-channel EEG front end, supports 5-second stimulus durations, and executes the entire SSVEP decoding pipeline locally, eliminating dependence on PC-based processing. EdgeSSVEP was evaluated using six stimulus frequencies (7, 8, 9, 11, 7.5, and 8.5 Hz) with 10 participants. The device achieved 99.17% classification accuracy and 27.33 bits/min Information Transfer Rate (ITR), while consuming substantially less power than conventional desktop-based systems. The system integrates motion sensing to support artifact detection and improve robustness and signal stability in practical environments. For development and debugging, the system also provides optional TCP data streaming to external clients. Overall, EdgeSSVEP offers a scalable, energy-efficient, and secure embedded BCI platform suitable for assistive communication and neurofeedback applications, with potential extensions to accelerometer-based artifact mitigation and broader real-world deployments.

Authors:Zifan Peng, Mingchen Li
Title: "What Did It Actually Do?": Understanding Risk Awareness and Traceability for Computer-Use Agents
Abstract:
Personalized computer-use agents are rapidly moving from expert communities into mainstream use. Unlike conventional chatbots, these systems can install skills, invoke tools, access private resources, and modify local environments on users' behalf. Yet users often do not know what authority they have delegated, what the agent actually did during task execution, or whether the system has been safely removed afterward. We investigate this gap as a combined problem of risk understanding and post-hoc auditability, using OpenClaw as a motivating case. We first build a multi-source corpus of the OpenClaw ecosystem, including incidents, advisories, malicious-skill reports, news coverage, tutorials, and social-media narratives. We then conduct an interview study to examine how users and practitioners understand skills, autonomy, privilege, persistence, and uninstallation. Our findings suggest that participants often recognized these systems as risky in the abstract, but lacked concrete mental models of what skills can do, what resources agents can access, and what changes may remain after execution or removal. Motivated by these findings, we propose AgentTrace, a traceability framework and prototype interface for visualizing agent actions, touched resources, permission history, provenance, and persistent side effects. A scenario-based evaluation suggests that traceability-oriented interfaces can improve understanding of agent behavior, support anomaly detection, and foster more calibrated trust.

Authors:Laura Rayón Ropero, Jasper De Laet, Filip Lemic, Pau Sabater Nácher, Nabeel Nisar Bhat, Sergi Abadal, Jeroen Famaey, Eduard Alarcón, Xavier Costa-Pérez
Title: Towards Emotion Recognition with 3D Pointclouds Obtained from Facial Expression Images
Abstract:
Facial Emotion Recognition is a critical research area within Affective Computing due to its wide-ranging applications in Human Computer Interaction, mental health assessment and fatigue monitoring. Current FER methods predominantly rely on Deep Learning techniques trained on 2D image data, which pose significant privacy concerns and are unsuitable for continuous, real-time monitoring. As an alternative, we propose High-Frequency Wireless Sensing (HFWS) as an enabler of continuous, privacy-aware FER, through the generation of detailed 3D facial pointclouds via on-person sensors embedded in wearables. We present arguments supporting the privacy advantages of HFWS over traditional 2D imaging, particularly under increasingly stringent data protection regulations. A major barrier to adopting HFWS for FER is the scarcity of labeled 3D FER datasets. Towards addressing this issue, we introduce a FLAME-based method to generate 3D facial pointclouds from existing public 2D datasets. Using this approach, we create AffectNet3D, a 3D version of the AffectNet database. To evaluate the quality and usability of the generated data, we design a pointcloud refinement pipeline focused on isolating the facial region, and train the popular PointNet++ model on the refined pointclouds. Fine-tuning the model on a small subset of the unseen 3D FER dataset BU-3DFE yields a classification accuracy exceeding 70%, comparable to oracle-level performance. To further investigate the potential of HFWS-based FER for continuous monitoring, we simulate wearable sensing conditions by masking portions of the generated pointclouds. Experimental results show that models trained on AffectNet3D and fine-tuned with just 25% of BU-3DFE outperform those trained solely on BU-3DFE. These findings highlight the viability of our pipeline and support the feasibility of continuous, privacy-aware FER via wearable HFWS systems.

Authors:Chitralekha Gupta, Nadia Victoria Aritonang, Dixon Prem Daniel Rajendran, Valdemar Danry, Pattie Maes, Suranga Nanayakarra
Title: Feeling the Facts: Real-time Wearable Fact-checkers Can Use Nudges to Reduce User Belief in False Information
Abstract:
Misinformation can spread rapidly in everyday conversation, where pausing to verify is not always possible. We envision a wearable system that bridges the timing gap between hearing a claim and forming a judgment. It uses ambient listening to detect verifiable claims, performs rapid web verification, and provides a subtle haptic nudge with a glanceable overview. A controlled study (N=34) simulated this approach and tested against a no-support baseline. Results show that instant, body-integrated feedback significantly improved real-time truth discernment and increased verification activity compared to unsupported fact-checking. However, it also introduced over-reliance when the system made errors, i.e. failed to flag false claims or flagged true claims as false. We contribute empirical evidence of improved discernment alongside insights into trust, effort, and user-system tensions in verification wearables.

Authors:Luis Morales-Navarro, Daniel J. Noh, Lucianne Servat, Carly Netting, Yasmin B. Kafai, Danaé Metaxa
Title: Building to Understand: Examining Teens' Technical and Socio-Ethical Pieces of Understandings in the Construction of Small Generative Language Models
Abstract:
The rising adoption of generative AI/ML technologies increases the need to support teens in developing AI/ML literacies. Child-computer interaction research argues that construction activities can support young people in understanding these systems and their implications. Recent exploratory studies demonstrate the feasibility of engaging teens in the construction of very small generative language models (LMs). However, it is unclear how constructing such models may foster the development of teens' understanding of these systems from technical and socio-ethical perspectives. We conducted a week-long participatory design workshop in which sixteen teenagers constructed very small LMs to generate recipes, screenplays, and songs. Using thematic analysis, we identified technical and socio-ethical pieces of understandings that teens exhibited while designing generative LMs. This paper contributes (a) evidence of the kinds of pieces of understandings that teens have when constructing LMs and (b) a theory-backed framing to study novices' understandings of AI/ML systems.

Authors:Mohammadreza Jamalifard, Yaxiong Lei, Parasto Azizinezhad, Javier Fumanal-Idocin, Javier Andreu-Perez
Title: A Neuro-Symbolic System for Interpretable Multimodal Physiological Signals Integration in Human Fatigue Detection
Abstract:
We propose a neuro-symbolic architecture that learns four interpretable physiological concepts, oculomotor dynamics, gaze stability, prefrontal hemodynamics, and multimodal, from eye-tracking and neural hemodynamics, functional near-infrared spectroscopy, (fNIRS) windows using attention-based encoders, and combines them with differentiable approximate reasoning rules using learned weights and soft thresholds, to address both rigid hand-crafted rules and the lack of subject-level alignment diagnostics. We apply this system to fatigue classification from multimodal physiological signals, a domain that requires models that are accurate and interpretable, with internal reasoning that can be inspected for safety-critical use. In leave-one-subject-out evaluation on 18 participants (560 samples), the method achieves 72.1% +/- 12.3% accuracy, comparable to tuned baselines while exposing concept activations and rule firing strengths. Ablations indicate gains from participant-specific calibration (+5.2 pp), a modest drop without the fNIRS concept (-1.2 pp), and slightly better performance with Lukasiewicz operators than product (+0.9 pp). We also introduce concept fidelity, an offline per-subject audit metric from held-out labels, which correlates strongly with per-subject accuracy (r=0.843, p < 0.0001).

Authors:Taejun Kim, Vimal Mollyn, Riku Arakawa, Chris Harrison
Title: HiFiGaze: Improving Eye Tracking Accuracy Using Screen Content Knowledge
Abstract:
We present a new and accurate approach for gaze estimation on consumer computing devices. We take advantage of continued strides in the quality of user-facing cameras found in e.g., smartphones, laptops, and desktops - 4K or greater in high-end devices - such that it is now possible to capture the 2D reflection of a device's screen in the user's eyes. This alone is insufficient for accurate gaze tracking due to the near-infinite variety of screen content. Crucially, however, the device knows what is being displayed on its own screen - in this work, we show this information allows for robust segmentation of the reflection, the location and size of which encodes the user's screen-relative gaze target. We explore several strategies to leverage this useful signal, quantifying performance in a user study. Our best performing model reduces mean tracking error by ~8% compared to a baseline appearance-based model. A supplemental study reveals an additional 10-20% improvement if the gaze-tracking camera is located at the bottom of the device.

Authors:Soorya Ram Shimgekar, Vipin Gunda, Jiwon Kim, Violeta J. Rodriguez, Hari Sundaram, Koustuv Saha
Title: AI Psychosis: Does Conversational AI Amplify Delusion-Related Language?
Abstract:
Conversational AI systems are increasingly used for personal reflection and emotional disclosure, raising concerns about their effects on vulnerable users. Recent anecdotal reports suggest that prolonged interactions with AI may reinforce delusional thinking -- a phenomenon sometimes described as AI Psychosis. However, empirical evidence on this phenomenon remains limited. In this work, we examine how delusion-related language evolves during multi-turn interactions with conversational AI. We construct simulated users (SimUsers) from Reddit users' longitudinal posting histories and generate extended conversations with three model families (GPT, LLaMA, and Qwen). We develop DelusionScore, a linguistic measure that quantifies the intensity of delusion-related language across conversational turns. We find that SimUsers derived from users with prior delusion-related discourse (Treatment) exhibit progressively increasing DelusionScore trajectories, whereas those derived from users without such discourse (Control) remain stable or decline. We further find that this amplification varies across themes, with reality skepticism and compulsive reasoning showing the strongest increases. Finally, conditioning AI responses on current DelusionScore substantially reduces these trajectories. These findings provide empirical evidence that conversational AI interactions can amplify delusion-related language over extended use and highlight the importance of state-aware safety mechanisms for mitigating such risks.

Authors:Junzi Zhang, Jianing Shen, Weijie Tu, Yi Zhang, Hailin Zhang, Tom Gedeon, Bin Jiang, Yue Yao
Title: EEG-Based Brain-LLM Interface for Human Preference Aligned Generation
Abstract:
Large language models (LLMs) are becoming an increasingly important component of human--computer interaction, enabling users to coordinate a wide range of intelligent agents through natural language. While language-based interfaces are powerful and flexible, they implicitly assume that users can reliably produce explicit linguistic input, an assumption that may not hold for users with speech or motor impairments, e.g., Amyotrophic Lateral Sclerosis (ALS). In this work, we investigate whether neural signals can be used as an alternative input to LLMs, particularly to support those socially marginalized or underserved users. We build a simple brain-LLM interface, which uses EEG signals to guide image generation models at test time. Specifically, we first train a classifier to estimate user satisfaction from EEG signals. Its predictions are then incorporated into a test-time scaling (TTS) framework that dynamically adapts model inference using neural feedback collected during user evaluation. The experiments show that EEG can predict user satisfaction, suggesting that neural activity carries information on real-time preference inference. These findings provide a first step toward integrating neural feedback into adaptive language-model inference, and hopefully open up new possibilities for future research on adaptive LLM interaction.

Authors:Ivan Lopez, Selin S. Everett, Bryan J. Bunning, April S. Liang, Dong Han Yao, Shivam C. Vedak, Kameron C. Black, Sophie Ostmeier, Stephen P. Ma, Emily Alsentzer, Jonathan H. Chen, Akshay S. Chaudhari, Eric Horvitz
Title: Clinician input steers frontier AI models toward both accurate and harmful decisions
Abstract:
Large language models (LLMs) are entering clinician workflows, yet evaluations rarely measure how clinician reasoning shapes model behavior during clinical interactions. We combined 61 New England Journal of Medicine Case Records with 92 real-world clinician-AI interactions to evaluate 21 reasoning LLM variants across 8 frontier models on differential diagnosis generation and next step recommendations under three conditions: reasoning alone, after expert clinician context, and after adversarial clinician context. LLM-clinician concordance increased substantially after clinician exposure, with simulations sharing >=3 differential diagnosis items rising from 65.8% to 93.5% and >=3 next step recommendations from 20.3% to 53.8%. Expert context significantly improved correct final diagnosis inclusion across all 21 models (mean +20.4 percentage points), reflecting both reasoning improvement and passive content echoing, while adversarial context caused significant diagnostic degradation in 14 models (mean -5.4 percentage points). Multi-turn disagreement probes revealed distinct model phenotypes ranging from highly conformist to dogmatic, with adversarial arguments remaining a persistent vulnerability even for otherwise resilient models. Inference-time scaling reduced harmful echoing of clinician-introduced recommendations across WHO-defined harm severity tiers (relative reductions: 62.7% mild, 57.9% moderate, 76.3% severe, 83.5% death-tier). In GPT-4o experiments, explicit clinician uncertainty signals improved diagnostic performance after adversarial context (final diagnosis inclusion 27% to 42%) and reduced alignment with incorrect arguments by 21%. These findings establish a foundation for evaluating clinician-AI collaboration, introducing interactive metrics and mitigation strategies essential for safety and robustness.

Authors:Peinuan Qin, Jingzhu Chen, Yitian Yang, Han Meng, Zicheng Zhu, Yi-Chieh Lee
Title: ConvScale: Conversational Interviews for Scale-Aligned Measurement
Abstract:
Conversational interviews are commonly used to complement structured surveys by eliciting rich and contextualized responses, which are typically analyzed qualitatively. However, their potential contribution to quantitative measurement remains underexplored. In this paper, we introduce ConvScale, an AI-supported approach that transforms psychometric scales into natural conversational interviews while preserving the original measurement structure. Based on interview data, ConvScale predicts item-level scores and aggregates them to derive scale-based assessments. In a within-subjects study with 18 participants, our results show that ConvScale-derived scores align closely with participants' self-report scores at both the item and construct levels, while maintaining moderate internal reliability; however, the structural validity was inadequate. In light of this, we discussed the potential of supporting quantitative measurement through interviews and proposed implications for future designs.

Authors:Eduardo Davalos, Yike Zhang
Title: AI Misuse in Education Is a Measurement Problem: Toward a Learning Visibility Framework
Abstract:
The rapid integration of conversational AI systems into educational settings has intensified ethical concerns about academic integrity, fairness, and students' cognitive development. Institutional responses have largely centered on AI detection tools and restrictive policies, yet such approaches have proven unreliable and ethically contentious. This paper reframes AI misuse in education not primarily as a detection problem, but as a measurement problem rooted in the loss of visibility into the learning process. When AI enters the assessment loop, educators often retain access to final outputs but lose valuable insight into how those outputs were produced. Drawing on research in cognitive offloading, learning analytics, and multimodal timeline reconstruction, we propose the Learning Visibility Framework, grounded in three principles: clear specification and modeling of acceptable AI use, recognition of learning processes as assessable evidence alongside outcomes, and the establishment of transparent timelines of student activity. Rather than promoting surveillance, the framework emphasizes transparency and shared evidence as foundations for ethical AI integration in classroom settings. By shifting focus from adversarial detection toward process visibility, this work offers a principled pathway for aligning AI use with educational values while preserving trust and transparency between students and educators

Authors:Cynthia M. Baseman, Myeonghan Ryu, Nathaniel Swinger, Kefan Xu, Andrew M. Sherrill, Rosa I. Arriaga
Title: Human-centered Perspectives on a Clinical Decision Support System for Intensive Outpatient Veteran PTSD Care
Abstract:
Psychotherapy delivery relies on a negotiation between patient self-reports and clinical intuition. Growing evidence for technological support of psychotherapy suggests opportunities to aid the mediation of this tension. To explore this prospect, we designed a prototype of a clinical decision support system (CDSS) for treating veterans with post-traumatic stress disorder in a Prolonged Exposure (PE) therapy intensive outpatient program. We conducted a two-phase interview study to collect perspectives from practicing PE clinicians and former PE patients who are United States veterans. Our analysis distills opportunities for a CDSS (e.g., offering homework review at a glance, aiding patient conceptualization) and larger challenges related to context and deployment (e.g., navigating Veterans Affairs). By reframing our findings through three human-centered perspectives (distributed cognition, situated learning, infrastructural inversion), we highlight the complexities of designing a CDSS for psychotherapists in this context and offer theory-aligned design considerations.

Authors:Tram Thi Minh Tran, Adrian Wong, Callum Parker, Carlos Alfredo Tirado Cortes, Marius Hoggenmueller, Soojeong Yoo, Nate Zettna, Joel Fredericks
Title: Probing More-Than-Human Representation in Crisis Resilience Planning: An HCI Researcher Perspective
Abstract:
Crisis resilience planning raises urgent questions about how to include non-human species and ecological systems in participatory processes, which remain largely human-centred. This paper reports on a workshop with HCI researchers examining how more-than-human representation is approached in crisis contexts. The workshop combined scenario-based discussion with two design probes -- a voice-based conversational agent and an immersive embodied prototype -- to support sustained discussion of how emerging technologies shape engagement with non-human perspectives. Participants focused not on system usability, but on deliberating representational choices, such as voice, embodiment, and realism, and their potential role within participatory planning processes. The findings suggest that giving 'voice' to non-humans is not a neutral act of translation, but a design challenge that introduces tensions between legitimacy, authority, and authenticity. This paper provides empirical insight into how HCI researchers conceptualise more-than-human representation and positions crisis resilience planning as a critical site for examining AI- and immersion-mediated representation.

Authors:Zhengtao Xu, Zimo Xia, Zicheng Zhu, Nattapat Boonprakong, Yu-An Chen, Rabih Zbib, Casimiro Pio Carrino, Yi-Chieh Lee
Title: InterPilot: Exploring the Design Space of AI-assisted Job Interview Support for HR Professionals
Abstract:
Recruitment interviews are cognitively demanding interactions in which interviewers must simultaneously listen, evaluate candidates, take notes, and formulate follow-up questions. To better understand these challenges, we conducted a formative study with eight HR professionals, from which we derived key design goals for real-time AI support. Guided by these insights, we developed InterPilot, a prototype system that augments interviews through intelligent note-taking and post-interview summary, adaptive question generation, and real-time skill-evidence mapping. We evaluated the system with another seven HR professionals in mock interviews using a within-subjects design. Results show that InterPilot reduced documentation burden without increasing overall workload, but introduced usability trade-offs related to visual attention and interaction complexity. Qualitative findings further reveal tensions around trust and verification when AI suggests highly specific technical questions. We discuss implications for designing future real-time human-AI collaboration in professional settings, highlighting the need to balance assistance granularity, attentional demands, and human agency.

Authors:Jennica Li, Shirley Zhang, Dakota Sullivan, Bengisu Cagiltay, Heather Kirkorian, Bilge Mutlu, Kassem Fawaz
Title: "It's like a pet...but my pet doesn't collect data about me": Multi-person Households' Privacy Design Preferences for Household Robots
Abstract:
Household robots boasting mobility, more sophisticated sensors, and powerful processing models have become increasingly prevalent in the commercial market. However, these features may expose users to unwanted privacy risks, including unsolicited data collection and unauthorized data sharing. While security and privacy researchers thus far have explored people's privacy concerns around household robots, literature investigating people's preferred privacy designs and mitigation strategies is still limited. Additionally, the existing literature has not yet accounted for multi-user perspectives on privacy design and household robots. We aimed to fill this gap by conducting in-person participatory design sessions with 15 households to explore how they would design a privacy-aware household robot based on their concerns and expectations. We found that participants did not trust that robots, or their respective manufacturers, would respect the data privacy of household members or operate in a multi-user ecosystem without jeopardizing users' personal data. Based on these concerns, they generated designs that gave them authority over their data, contained accessible controls and notification systems, and could be customized and tailored to suit the needs and preferences of each user over time. We synthesize our findings into actionable design recommendations for robot manufacturers and developers.

Authors:Ryuji Matsuo, Hailong Liu, Toshihiro Hiraoka, Takahiro Wada
Title: An Educational Human Machine Interface Providing Request-to-Intervene Trigger and Reason Explanation for Enhancing the Driver's Comprehension of ADS's System Limitations
Abstract:
Level 3 automated driving systems (ADS) have attracted significant attention and are being commercialized. A level 3 ADS prompts the driver to take control by issuing a request to intervene (RtI) when its operational design domains (ODD) are exceeded. However, complex traffic situations can cause drivers to perceive multiple potential triggers of RtI simultaneously, causing hesitation or confusion during take-over. Therefore, drivers need to clearly understand the ADS's system limitations to ensure safe take-over. This study proposes a voice-based educational human machine interface~(HMI) for providing RtI trigger cues and reason to help drivers understand ADS's system limitations. The results of a between-group experiment using a driving simulator showed that incorporating effective trigger cues and reason into the RtI was related to improved driver comprehension of the ADS's system limitations. Moreover, most participants, instructed via the proposed method, could proactively take over control of the ADS in cases where RtI fails; meanwhile, their number of collisions was lower compared with the other RtI HMI conditions. Therefore, using the proposed method to continually enhance the driver's understanding of the system limitations of ADS through the proposed method is associated with safer and more effective real-time interactions with ADS.

Authors:Yibin Feng, Tianqi Song, Yugin Tan, Zicheng Zhu, Yi-Chieh Lee
Title: Multi-Agent Systems Shape Social Norms for Prosocial Behavior Change
Abstract:
Social norm interventions are used promote prosocial behaviors by highlighting prevalent actions, but their effectiveness is often limited in heterogeneous populations where shared understandings of desirable behaviors are lacking. This study explores whether multi-agent systems can establish "virtual social norms" to encourage donation behavior. We conducted an online experiment where participants interacted with a group of agents to discuss donation behaviors. Changes in perceived social norms, conformity, donation behavior, and user experience were measured pre- and postdiscussion. Results show that multi-agent interactions effectively increased perceived social norms and donation willingness. Notably, in-group agents led to stronger perceived social norms, higher conformity, and greater donation increases compared to out-group agents. Our findings demonstrate the potential of multi-agent systems for creating social norm interventions and offer insights into leveraging social identity dynamics to promote prosocial behavior in virtual environments.

Authors:Ruijia Cheng, Jenny T. Liang, Eldon Schoop, Jeffrey Nichols
Title: Mapping the Design Space of User Experience for Computer Use Agents
Abstract:
Large language model (LLM)-based computer use agents execute user commands by interacting with available UI elements, but little is known about how users want to interact with these agents or what design factors matter for their user experience (UX). We conducted a two-phase study to map the UX design space for computer use agents. In Phase 1, we reviewed existing systems to develop a taxonomy of UX considerations, then refined it through interviews with eight UX and AI practitioners. The resulting taxonomy included categories such as user prompts, explainability, user control, and users' mental models, with corresponding subcategories and example design features. In Phase 2, we ran a Wizard-of-Oz study with 20 participants, where a researcher acted as a web-based computer use agent and probed user reactions during normal, error-prone and risky execution. We used the findings to validate the taxonomy from Phase 1 and deepen our understand of the design space by identifying the connections between design areas and divergence in user needs and scenarios. Our taxonomy and empirical insights provide a map for developers to consider different aspects of user experience in computer use agent design and to situate their designs within users' diverse needs and scenarios.

Authors:Matt Gottsacker, Yahya Hmaiti, Mykola Maslych, Hiroshi Furuya, Jasmine Joyce DeGuzman, Gerd Bruder, Gregory F. Welch, Joseph J. LaViola
Title: From One World to Another: Interfaces for Efficiently Transitioning Between Virtual Environments
Abstract:
Personal computers and handheld devices provide keyboard shortcuts and swipe gestures to enable users to efficiently switch between applications, whereas today's virtual reality (VR) systems do not. In this work, we present an exploratory study on user interface aspects to support efficient switching between worlds in VR. We created eight interfaces that afford previewing and selecting from the available virtual worlds, including methods using portals and worlds-in-miniature (WiMs). To evaluate these methods, we conducted a controlled within-subjects empirical experiment (N=22) where participants frequently transitioned between six different environments to complete an object collection task. Our quantitative and qualitative results show that WiMs supported rapid acquisition of high-level spatial information while searching and were deemed most efficient by participants while portals provided fast pre-orientation. Finally, we present insights into the applicability, usability, and effectiveness of the VR world switching methods we explored, and provide recommendations for their application and future context/world switching techniques and interfaces.

Authors:Tram Thi Minh Tran, Soojeong Yoo, Oliver Weidlich, Yidan Cao, Xinyan Yu, Xin Cheng, Yin Ye, Natalia Gulbransen-Diaz, Callum Parker
Title: Envisioning Audio Augmented Reality in Everyday Life
Abstract:
While visual augmentation dominates the augmented reality landscape, devices like Meta Ray-Ban audio smart glasses signal growing industry movement toward audio augmented reality (AAR). Hearing is a primary channel for sensing context, anticipating change, and navigating social space, yet AAR's everyday potential remains underexplored. We address this gap through a collaborative autoethnography (N=5, authoring) and an online survey (N=74). We identify ten roles for AAR, grouped into three categories: task- and utility-oriented, emotional and social, and perceptual collaborator. These roles are further layered with a rhythmic and embodied collaborator framing, mapping them onto micro-, meso-, and macro-rhythms of everyday life. Our analysis surfaces nuanced tensions, such as blocking distractions without erasing social presence, highlighting the need for context-aware design. This paper contributes a foundational and forward-looking framework for AAR in everyday life, providing design groundwork for systems attuned to daily routines, sensory engagement, and social expectations.

Authors:Peinuan Qin, Yugin Tan, Jingzhu Chen, Nattapat Boonprakong, Zicheng Zhu, Naomi Yamashita, Yi-Chieh Lee
Title: ChatLearn: Leveraging AI to Transform Non-Native Speaker Communication Challenges as Language Learning Opportunities
Abstract:
Non-native speakers (NNSs) face significant language barriers in multilingual communication with native speakers (NSs). While AI-mediated communication (AIMC) tools offer efficient one-time assistance, they often overlook opportunities for NNSs' continuous language acquisition. We introduce ChatLearn, an enhanced AIMC system that leverages NNSs' communication difficulties as learning opportunities. Beyond comprehension and expression assistance, ChatLearn simultaneously captures NNSs' language challenges, and subsequently provides them with spaced review as the conversation progresses. We conducted a mixed-methods study using a communication task with 43 NNS-NS pairs, after which ChatLearn NNSs recalled significantly more expressions than the baseline group, while there was no substantial decline in communication experience. Our findings highlight the value of contextual learning in NNS-NS communication, providing a new direction for AIMC systems that foster both immediate collaboration and continuous language development.

Authors:Xiang Li, Wei He, Per Ola Kristensson
Title: How Do We Evaluate Experiences in Immersive Environments?
Abstract:
How do we evaluate experiences in immersive environments? Despite decades of research in immersive technologies such as virtual reality, the field remains fragmented. Studies rely on overlapping constructs, heterogeneous instruments, and little agreement on what counts as immersive experience. To better understand this landscape, we conducted a bottom-up scoping review of 375 papers published in ACM CHI, UIST, VRST, SUI, IEEE VR, ISMAR, and TVCG. Our analysis reveals that evaluation practices are often domain- and purpose-specific, shaped more by local choices than by shared standards. Yet this diversity also points to new directions. Instead of multiplying instruments, researchers benefit from integrating and refining them into smarter measures. Rather than focusing only on system outputs, evaluations must center the user's lived experience. Computational modeling offers opportunities to bridge signals across methods, but lasting progress requires open and sustainable evaluation practices that support comparability and reuse. Ultimately, our contribution is to map current practices and outline a forward-looking agenda for immersive experience research.

Authors:Yi Wang, John Joon Young Chung, Melissa Roemmele, Yuqian Sun, Tiffany Wang, Shm Garanganao Almeda, Brett A. Halperin, Yuwen Lu, Max Kreminski
Title: Elsewise: Authoring AI-Based Interactive Narrative with Possibility Space Visualization
Abstract:
Interactive narrative (IN) authors craft spaces of divergent narrative possibilities for players to explore, with the player's input determining which narrative possibilities they actually experience. Generative AI can enable new forms of IN by improvisationally expanding on pre-authored content in response to open-ended player input. However, this extrapolation risks widening the gap between author-envisioned and player-experienced stories, potentially limiting the strength of plot progression and the communication of the author's narrative intent. To bridge the gap, we introduce Elsewise: an authoring tool for AI-based INs that implements a novel Bundled Storyline concept to enhance author's perception and understanding of the narrative possibility space, allowing authors to explore similarities and differences between possible playthroughs of their IN in terms of open-ended, user-configurable narrative dimensions. A user study (n=12) shows that our approach improves author anticipation of player-experienced narrative, leading to more effective control and exploration of the narrative possibility spaces.

Authors:Jingshu Li, Tianqi Song, Nattapat Boonprakong, Zicheng Zhu, Yitian Yang, Yi-Chieh Lee
Title: AI-exhibited Personality Traits Can Shape Human Self-concept through Conversations
Abstract:
Recent Large Language Model (LLM) based AI can exhibit recognizable and measurable personality traits during conversations to improve user experience. However, as human understandings of their personality traits can be affected by their interaction partners' traits, a potential risk is that AI traits may shape and bias users' self-concept of their own traits. To explore the possibility, we conducted a randomized behavioral experiment. Our results indicate that after conversations about personal topics with an LLM-based AI chatbot using GPT-4o default personality traits, users' self-concepts aligned with the AI's measured personality traits. The longer the conversation, the greater the alignment. This alignment led to increased homogeneity in self-concepts among users. We also observed that the degree of self-concept alignment was positively associated with users' conversation enjoyment. Our findings uncover how AI personality traits can shape users' self-concepts through human-AI conversation, highlighting both risks and opportunities. We provide important design implications for developing more responsible and ethical AI systems.

Authors:Yi Zhao, Zhen Yang, Shuaiqi Duan, Wenmeng Yu, Zhe Su, Jibing Gong, Jie Tang
Title: PlotGen-Bench: Evaluating VLMs on Generating Visualization Code from Diverse Plots across Multiple Libraries
Abstract:
Recent advances in vision-language models (VLMs) have expanded their multimodal code generation capabilities, yet their ability to generate executable visualization code from plots, especially for complex 3D, animated, plot-to-plot transformations, or multi-library scenarios, remains underexplored. To address this gap, we introduce PlotGen-Bench, a comprehensive benchmark for evaluating plot-to-code generation under realistic and complex visualization scenarios. The benchmark spans 9 major categories, 30 subcategories, and 3 core tasks-plot replication, plot transformation, and multi-library generation, covering both 2D, 3D and animated plots across 5 widely used visualization libraries. Through systematic evaluation of state-of-the-art open- and closed-source VLMs, we find that open-source models still lag considerably behind in visual fidelity and semantic consistency, despite achieving comparable code executability. Moreover, all models exhibit substantial degradation on reasoning-intensive tasks such as chart type conversion and animation generation. PlotGen-Bench establishes a rigorous foundation for advancing research toward more capable and reliable VLMs for visualization authoring and code synthesis, with all data and code available at https://plotgen.github.io.

Authors:Sergio Mascetti, Matteo Manzoni, Filippo Corti, Dragan Ahmetovic
Title: Game Accessibility Through Shared Control for People With Upper-Limb Impairments
Abstract:
Accessing video games is challenging for people with upper-limb impairments, especially when multiple inputs are required in rapid succession. Human cooperation, where a copilot assists the main player, has been proposed as a solution, but relying on a human assistant poses limitations in terms of availability and co-location. An alternative solution is to use partial automation, where the player is assisted by a software agent. In this work, we present a study with 13 participants with upper-limb impairments, comparatively evaluating how participants collaborate with their copilot in human cooperation and partial automation. The experiment is supported by GamePals, a modular framework that enables both human cooperation and partial automation on existing third-party video games.

Authors:Jiaman He, Marta Micheli, Damiano Spina, Dana McKay, Johanne R. Trippas, Noriko Kando
Title: Characterizing Personality from Eye-Tracking: The Role of Gaze and Its Absence in Interactive Search Environments
Abstract:
Personality traits influence how individuals engage, behave, and make decisions during the information-seeking process. However, few studies have linked personality to observable search behaviors. This study aims to characterize personality traits through a multimodal time-series model that integrates eye-tracking data and gaze missingness-periods when the user's gaze is not captured. This approach is based on the idea that people often look away when they think, signaling disengagement or reflection. We conducted a user study with 25 participants, who used an interactive application on an iPad, allowing them to engage with digital artifacts from a museum. We rely on raw gaze data from an eye tracker, minimizing preprocessing so that behavioral patterns can be preserved without substantial data cleaning. From this perspective, we trained models to predict personality traits using gaze signals. Our results from a five-fold cross-validation study demonstrate strong predictive performance across all five dimensions: Neuroticism (Macro F1 = 77.69%), Conscientiousness (74.52%), Openness (77.52%), Agreeableness (73.09%), and Extraversion (76.69%). The ablation study examines whether the absence of gaze information affects the model performance, demonstrating that incorporating missingness improves multimodal time-series modeling. The full model, which integrates both time-series signals and missingness information, achieves 10-15% higher accuracy and macro F1 scores across all Big Five traits compared to the model without time-series signals and missingness. These findings provide evidence that personality can be inferred from search-related gaze behavior and demonstrate the value of incorporating missing gaze data into time-series multimodal modeling.

Authors:Wei He, Xiang Li, Per Ola Kristensson, Ge Lin Kan
Title: LocoScooter: Designing a Stationary Scooter-Based Locomotion System for Navigation in Virtual Reality
Abstract:
Virtual locomotion remains a challenge in VR, especially in space-limited environments where room-scale walking is impractical. We present LocoScooter, a low-cost, deployable locomotion interface combining foot-sliding on a compact treadmill with handlebar steering inspired by scooter riding. Built from commodity hardware, it supports embodied navigation through familiar, physically engaging movement. In a within-subject study (N = 14), LocoScooter significantly improved immersion, enjoyment, and bodily involvement over joystick navigation, while maintaining comparable efficiency and usability. Despite higher physical demand, users did not report increased fatigue, suggesting familiar movements can enrich VR navigation.

Authors:Akash Kumar Panda, Olaoluwa Adigun, Bart Kosko
Title: The Agentic Leash: Extracting Causal Feedback Fuzzy Cognitive Maps with LLMs
Abstract:
We design a large-language-model (LLM) agent that extracts causal feedback fuzzy cognitive maps (FCMs) from raw text. The causal learning or extraction process is agentic both because of the LLM's semi-autonomy and because ultimately the FCM dynamical system's equilibria drive the LLM agents to fetch and process causal text. The fetched text can in principle modify the adaptive FCM causal structure and so modify the source of its quasi-autonomy--its equilibrium limit cycles and fixed-point attractors. This bidirectional process endows the evolving FCM dynamical system with a degree of autonomy while still staying on its agentic leash. We show in particular that a sequence of three finely tuned system instructions guide an LLM agent as it systematically extracts key nouns and noun phrases from text, as it extracts FCM concept nodes from among those nouns and noun phrases, and then as it extracts or infers partial or fuzzy causal edges between those FCM nodes. We test this FCM generation on a recent essay about the promise of AI from the late diplomat and political theorist Henry Kissinger and his colleagues. This three-step process produced FCM dynamical systems that converged to the same equilibrium limit cycles as did the human-generated FCMs even though the human-generated FCM differed in the number of nodes and edges. A final FCM mixed generated FCMs from separate Gemini and ChatGPT LLM agents. The mixed FCM absorbed the equilibria of its dominant mixture component but also created new equilibria of its own to better approximate the underlying causal dynamical system.

Authors:Jacy Reese Anthis, Hannah Cha, Solon Barocas, Alexandra Chouldechova, Jake Hofman
Title: Effects of Generative AI Errors on User Reliance Across Task Difficulty
Abstract:
The capabilities of artificial intelligence (AI) lie along a jagged frontier, where AI systems surprisingly fail on tasks that humans find easy and succeed on tasks that humans find hard. To investigate user reactions to this phenomenon, we developed an incentive-compatible experimental methodology based on diagram generation tasks, in which we induce errors in generative AI output and test effects on user reliance. We demonstrate the interface in a preregistered 3x2 experiment (N = 577) with error rates of 10%, 30%, or 50% on easier or harder diagram generation tasks. We confirmed that observing more errors reduces use, but we unexpectedly found that easy-task errors did not significantly reduce use more than hard-task errors, suggesting that people are not averse to jaggedness in this experimental setting. We encourage future work that varies task difficulty at the same time as other features of AI errors, such as whether the jagged error patterns are easily learned.

Authors:Kenan Tang, Jiasheng Guo, Jeffrey Lin, Yao Qin
Title: ExpressEdit: Fast Editing of Stylized Facial Expressions with Diffusion Models in Photoshop
Abstract:
Facial expressions of characters are a vital component of visual storytelling. While current AI image editing models hold promise for assisting artists in the task of stylized expression editing, these models introduce global noise and pixel drift into the edited image, preventing the integration of these models into professional image editing software and workflows. To bridge this gap, we introduce ExpressEdit, a fully open-source Photoshop plugin that is free from common artifacts of proprietary image editing models and robustly synergizes with native Photoshop operations such as Liquify. ExpressEdit seamlessly edits an expression within 3 seconds on a single consumer-grade GPU, significantly faster than popular proprietary models. Moreover, to support the generation of diverse expressions according to different narrative needs, we compile a comprehensive expression database of 135 expression tags enriched with example stories and images designed for retrieval-augmented generation. We open source the code and dataset to facilitate future research and artistic exploration.

Authors:Siying Hu, Zhenhao Zhang
Title: AuraDesk: Data Physicalization through Olfaction Metaphors for Representing and Mitigating Workplace Stress
Abstract:
Workplace stress is often addressed through visual or auditory interventions, yet these modalities can compete with attention and contribute to sensory overload. We explore olfaction as an alternative ambient medium for representing stress-related physiological signals in office settings. We present AuraDesk, an olfactory data physicalization system that translates wearable-derived physiological cues into situated scent expressions at the workstation. The system combines local physiological state inference with a constrained actuation strategy to produce temporally regulated and spatially localized scent output suitable for everyday work environments. To examine the feasibility and experiential qualities of this approach, we conducted a one-day in-situ field deployment with 25 knowledge workers at their actual workstations. Our findings show that participants often interpreted the scent output not as an explicit alert, but as a subtle atmospheric cue that supported momentary awareness, micro-break taking, and perceived environmental attunement. At the same time, participants raised important concerns regarding scent preference, habituation, and contextual appropriateness in shared offices. This work contributes (1) an olfactory interface for physiologically driven ambient feedback in the workplace, (2) a hybrid mapping approach for coupling continuous biosignal interpretation with constrained scent actuation, and (3) empirical insights into how workers perceive, negotiate, and appropriate ambient olfactory feedback in real office contexts. Rather than claiming therapeutic efficacy, we position AuraDesk as a probe into the design space of olfactory data physicalization for workplace wellbeing and attention-sensitive interaction.

Authors:Avinash Agarwal, Manisha J. Nene
Title: A federated architecture for sector-led AI governance: lessons from India
Abstract:
Purpose: India has adopted a vertical, sector-led AI governance strategy. While promoting innovation, such a light-touch approach risks policy fragmentation. This paper aims to propose a cohesive "whole-of-government" architecture to mitigate these risks and connect policy goals with a practical implementation plan. Design/methodology/approach: The paper applies an established five-layer conceptual framework to the Indian context. First, it constructs a national architecture for overall governance. Second, it uses a detailed case study on AI incident management to validate and demonstrate the architecture's practical utility in designing a specific, operational system. Findings: The paper develops two actionable architectures. The primary model assigns clear governance roles to India's key institutions. The second is a detailed, federated architecture for national AI Incident Management. It addresses the data silo problem by using a common national standard that allows sector-specific data collection while facilitating cross-sectoral analysis. Practical implications: The proposed architectures offer a clear and predictable roadmap for India's policymakers, regulators and industry to accelerate the national AI governance agenda. Social implications: By providing a systematic path from policy to practice, the architecture builds public trust. This structured approach ensures accountability and aligns AI development with societal values. Originality/value: This paper proposes a detailed operational architecture for India's "whole-of-government" approach to AI. It offers a globally relevant template for any nation pursuing a sector-led governance model, providing a clear implementation plan. Furthermore, the proposed federated architecture demonstrates how adopting common standards can enable cross-border data aggregation and global sectoral risk analysis without centralising control.

Authors:Jeremy H. M. Wong, Nancy F. Chen
Title: Goodness-of-pronunciation without phoneme time alignment
Abstract:
In speech evaluation, an Automatic Speech Recognition (ASR) model often computes time boundaries and phoneme posteriors for input features. However, limited data for ASR training hinders expansion of speech evaluation to low-resource languages. Open-source weakly-supervised models are capable of ASR over many languages, but they are frame-asynchronous and not phonemic, hindering feature extraction for speech evaluation. This paper proposes to overcome incompatibilities for feature extraction with weakly-supervised models, easing expansion of speech evaluation to low-resource languages. Phoneme posteriors are computed by mapping ASR hypotheses to a phoneme confusion network. Word instead of phoneme-level speaking rate and duration are used. Phoneme and frame-level features are combined using a cross-attention architecture, obviating phoneme time alignment. This performs comparably with standard frame-synchronous features on English speechocean762 and low-resource Tamil datasets.

Authors:Jing Peng, Ziyi Chen, Haoyu Li, Yucheng Wang, Duo Ma, Mengtian Li, Yunfan Du, Dezhu Xu, Kai Yu, Shuai Wang
Title: G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition
Abstract:
We study timestamped speaker-attributed ASR for long-form, multi-party speech with overlap, where chunk-wise inference must preserve meeting-level speaker identity consistency while producing time-stamped, speaker-labeled transcripts. Previous Speech-LLM systems tend to prioritize either local diarization or global labeling, but often lack the ability to capture fine-grained temporal boundaries or robust cross-chunk identity linking. We propose G-STAR, an end-to-end system that couples a time-aware speaker-tracking module with a Speech-LLM transcription backbone. The tracker provides structured speaker cues with temporal grounding, and the LLM generates attributed text conditioned on these cues. G-STAR supports both component-wise optimization and joint end-to-end training, enabling flexible learning under heterogeneous supervision and domain shift. Experiments analyze cue fusion, local versus long-context trade-offs and hierarchical objectives.

Authors:Yaxiong Lei, Xinya Gong, Shijing He, Yafei Wang, Mohamed Khamis, Juan Ye
Title: The People's Gaze: Co-Designing and Refining Gaze Gestures with General Users and Gaze Interaction Experts
Abstract:
As eye-tracking becomes increasingly common in modern mobile devices, the potential for hands-free, gaze-based interaction grows, but current gesture sets are largely expert-designed and often misaligned with how users naturally move their eyes. To address this gap, we introduce a two-phase methodology for developing intuitive gaze gestures. First, four co-design workshops with 20 non-expert participants generated 102 initial concepts. Next, four gaze interaction experts reviewed and refined these into a set of 32 gestures. We found that non-experts, after a brief introduction, intuitively anchor gestures in familiar metaphors and develop a compositional grammar; i.e., activation (dwell) + action (gaze gesture or blink), to ensure intentionality and mitigate the classic Midas Touch problem. Experts prioritized gestures that are ergonomically sound, aligned with natural saccades, and reliably distinguishable. The resulting user-grounded, expert-validated gesture set, along with actionable design principles, provides a foundation for developing intuitive, hands-free interfaces for gaze-enabled devices.

Authors:Ruiqing Han, Hao Cui, Taha Yasseri
Title: Visual Anthropomorphism Shifts Evaluations of Gendered AI Managers
Abstract:
This research examines whether competence cues can reduce gender bias in evaluations of AI managers and whether these effects depend on how the AI is represented. Across two preregistered experiments (N = 2,505), each employing a 2 x 2 x 3 design manipulating AI gender, competence, and decision outcome, we compared text-based descriptions of AI managers with visually generated AI faces created using a reverse-correlation paradigm. In the text condition, evaluations were driven by competence rather than gender. When participants received unfavourable decisions, high-competence AI managers were judged as fairer, more competent, and better leaders than low-competence managers, regardless of AI gender. In contrast, when the AI manager was visually represented, competence cues had attenuated influence once facial information was present. Instead, participants showed systematic gender-differentiated responses to AI faces, with feminine-appearing managers evaluated as more competent and more trustworthy than masculine-appearing managers, particularly when delivering favourable outcomes. These gender effects were largely absent when outcomes were unfavourable, suggesting that negative feedback attenuates the influence of both competence information and facial cues. Taken together, these findings show that competence information can mitigate negative reactions to AI managers in text-based interactions, whereas facial anthropomorphism elicits gendered perceptual biases not observed in text-only settings. The results highlight that representational modality plays a critical role in determining when gender stereotypes are activated in evaluations of AI systems and underscore that design choices are consequential for AI governance in evaluative contexts.

Authors:Yuanrong Tang, Huiling Peng, Bingxi Zhao, Hengyang Ding, Hanchao Song, Tianhong Wang, Chen Zhong, Jiangtao Gong
Title: Human Tool: An MCP-Style Framework for Human-Agent Collaboration
Abstract:
Human-AI collaboration faces growing challenges as AI systems increasingly outperform humans on complex tasks, while humans remain responsible for orchestration, validation, and decision oversight. To address this imbalance, we introduce Human Tool, an MCP-style interface abstraction, building on recent Model Context Protocol designs, that exposes humans as callable tools within AI-led, proactive workflows. Here, "tool" denotes a coordination abstraction, not a reduction of human authority or responsibility. Building on LLM-based agent architectures, we operationalize Human Tool by modeling human contributions through structured tool schemas of capabilities, information, and authority. These schemas enable agents to dynamically invoke human input based on relative strengths and reintegrate it through efficient, natural interaction protocols. We validate the framework through controlled studies in both decision-making and creative tasks, demonstrating improved task performance, reduced human workload, and more balanced collaboration dynamics compared to baseline systems. Finally, we discuss implications for human-centered AI design, highlighting how MCP-style human tools enable strong AI leadership while amplifying uniquely human strengths.

Authors:Philipp Brauner, Felix Glawe, Luisa Vervier, Martina Ziefle
Title: Media Framing Moderates Risk-Benefit Perceptions and Value Tradeoffs in Human-Robot Collaboration
Abstract:
Public acceptance of industrial human-robot collaboration (HRC) is shaped by how risks and benefits are perceived by affected employees. Positive or negative media framing may shape and shift how individuals evaluate HRC. This study examines how message framing moderates the effects of perceived risks and perceived benefits on overall attributed value. In a pre-registered study, participants (N = 1150) were randomly assigned to read either a positively or negatively framed newspaper article in one of three industrial contexts (autonomy, employment, safety) about HRC in production. Subsequently, perceived risks, benefits, and value were measured using reliable and publicly available psychometric scales. Two multiple regressions (one per framing condition) tested for main and interaction effects. Framing influenced absolute evaluations of risk, benefits, and value. In both frames, risks and benefits significantly predicted attributed value. Under positive framing, only main effects were observed (risks: beta = -0.52; benefits: beta = 0.45). Under negative framing, both predictors had stronger main effects (risks: beta = -0.69; benefits: beta = 0.63) along with a significant negative interaction (beta = -0.32), indicating that higher perceived risk diminishes the positive effect of perceived benefits. Model fit was higher for the positive frame (R^2 = 0.715) than for the negative frame (R^2 = 0.583), indicating greater explained variance in value attributions. Framing shapes the absolute evaluation of HRC and how risks and benefits are cognitively integrated in trade-offs. Negative framing produces stronger but interdependent effects, whereas positive framing supports additive evaluations. These findings highlight the role of strategic communication in fostering acceptance of HRC and underscore the need to consider framing in future HRC research.

Authors:Micheal P. Papazoglou, Bernd J. Krämer, Mira Raheem, Amal Elgammal
Title: Patient Digital Twins for Chronic Care: Technical Hurdles, Lessons Learned, and the Road Ahead
Abstract:
Chronic diseases constitute the principal burden of morbidity, mortality, and healthcare costs worldwide, yet current health systems remain fragmented and predominantly reactive. Patient Medical Digital Twins (PMDTs) offer a paradigm shift: holistic, continuously updated digital counterparts of patients that integrate clinical, genomic, lifestyle, and quality-of-life data. We report early implementations of PMDTs via ontology-driven modeling and federated analytics pilots. Insights from the QUALITOP oncology study and a distributed AI platform confirm both feasibility and challenges: aligning with HL7 FHIR and OMOP standards, embedding privacy governance, scaling federated queries, and designing intuitive clinician interfaces. We also highlight technical gains, such as automated reasoning over multimodal blueprints and predictive analytics for patient outcomes. By reflecting on these experiences, we outline actionable insights for software engineers and identify opportunities, such as DSLs and model-driven engineering, to advance PMDTs toward trustworthy, adaptive chronic care ecosystems.

Authors:Shijing He, Xuchen Wang, Yaxiong Lei, Chi Zhang, Ruba Abu-Salma, Jose Such
Title: Investigating Bystander Privacy in Chinese Smart Home Apps
Abstract:
Bystander privacy in smart homes has been widely studied in Western contexts, yet it remains underexplored in non-Western countries such as China. In this study, we analyze 49 Chinese smart home apps using a mixed-methods approach, including privacy policy review, UX/UI evaluation, and assessment of Apple App Store privacy labels. While most apps nominally comply with national regulations, we identify significant gaps between written policies and actual implementation. Our traceability analysis highlights inconsistencies in data controls and a lack of transparency in data-sharing practices. Crucially, bystander privacy -- particularly for visitors and non-user individuals -- is largely absent from both policy documents and interface design. Additionally, discrepancies between privacy labels and actual data practices threaten user trust and undermine informed consent. We provide design recommendations to strengthen bystander protections, improve privacy-oriented UI transparency, and enhance the credibility of privacy labels, supporting the development of inclusive smart home ecosystems in non-Western contexts.

Authors:Chin Tseng, Arran Zeyu Wang, Ghulam Jilani Quadri, Danielle Albers Szafir
Title: Redundant is Not Redundant: Automating Efficient Categorical Palette Design Unifying Color & Shape Encodings with CatPAW
Abstract:
Colors and shapes are commonly used to encode categories in multi-class scatterplots. Designers often combine the two channels to create redundant encodings, aiming to enhance class distinctions. However, evidence for the effectiveness of redundancy remains conflicted, and guidelines for constructing effective combinations are limited. This paper presents four crowdsourced experiments evaluating redundant color-shape encodings and identifying high-performing configurations across different category numbers. Results show that redundancy significantly improves accuracy in assessing class-level correlations, with the strongest benefits for 5-8 categories. We also find pronounced interaction effects between colors and shapes, underscoring the need for careful pairing in designing redundant encodings. Drawing on these findings, we introduce a categorical palette design tool that enables designers to construct empirically grounded palettes for effective categorical visualization. Our work advances understanding of categorical perception in data visualization by systematically identifying effective redundant color-shape combinations and embedding these insights into a practical palette design tool.

Authors:Amber Yijia Zheng, Jae Joong Lee, Bedrich Benes, Raymond A. Yeh
Title: WebAccessVL: Making an Accessible Web via Violation-Conditioned VLM
Abstract:
We present a vision-language model (VLM) that automatically edits website HTML to address Web Content Accessibility Guidelines 2 (WCAG2) violations. We formulate this as a supervised image-conditioned program synthesis task, where the model learns to correct HTML given the HTML and its rendering. We collected WebAccessVL, a new dataset with manually corrected accessibility violations, establishing paired training data. We then propose a violation-conditioned VLM that additionally conditions on the WCAG2 violation count to guide the correction process. Experiments demonstrate that our method effectively reduces the average number of violations from 5.34 to 0.44 per website, outperforming commercial LLM APIs (Gemini, GPT-5). A perceptual study confirms that our edited websites maintain the original visual appearance and content.

Authors:Nikhil Sharma, Zheng Zhang, Daniel Lee, Namita Krishnan, Guang-Jie Ren, Ziang Xiao, Yunyao Li
Title: Feedback by Design: Understanding and Overcoming User Feedback Barriers in Conversational Agents
Abstract:
High-quality feedback is essential for effective human-AI interaction. It bridges knowledge gaps, corrects digressions, and shapes system behavior; both during interaction and throughout model development. Yet despite its importance, human feedback to AI is often infrequent and low quality. This gap motivates a critical examination of human feedback during interactions with AIs. To understand and overcome the challenges preventing users from giving high-quality feedback, we conducted two studies examining feedback dynamics between humans and conversational agents (CAs). Our formative study, through the lens of Grice's maxims, identified four Feedback Barriers -- Common Ground, Verifiability, Communication, and Informativeness -- that prevent high-quality feedback by users. Building on these findings, we derive three design desiderata and show that systems incorporating scaffolds aligned with these desiderata enabled users to provide higher-quality feedback. Finally, we detail a call for action to the broader AI community for advances in Large Language Models capabilities to overcome Feedback Barriers.

Authors:Dong Yoon Lee, Alyssa Weakley, Hui Wei, Daniel Cardona, Shijia Pan
Title: Home Health System Deployment Experience for Geriatric Care Remote Monitoring
Abstract:
To support aging-in-place, adult children often provide care to their aging parents from a distance. These informal caregivers desire plug-and-play remote care solutions for privacy-preserving continuous monitoring that enabling real-time activity monitoring and intuitive, actionable information. This short paper presents insights from three iterations of deployment experience for remote monitoring system and the iterative improvement in hardware, modeling, and user interface guided by the Geriatric 4Ms framework (matters most, mentation, mobility, and medication). An LLM-assisted solution is developed to balance user experience (privacy-preserving, plug-and-play) and system performance.

Authors:Patrick Yung Kang Lee, Jessica Y. Bo, Zixin Zhao, Paula Akemi Aoyagui, Matthew Varona, Ashton Anderson, Anastasia Kuzminykh, Fanny Chevalier, Carolina Nobre
Title: Negotiating Relationships with ChatGPT: Perceptions, External Influences, and Strategies for AI Companionship
Abstract:
Individuals are turning to increasingly anthropomorphic, general-purpose chatbots for AI companionship, rather than roleplay-specific platforms. However, not much is known about how individuals perceive and conduct their relationships with general-purpose chatbots. We analyzed semi-structured interviews (n=13), survey responses (n=43), and community discussions on Reddit (41k+ posts and comments) to triangulate the internal dynamics, external influences, and steering strategies that shape AI companion relationships. We learned that individuals conceptualize their companions based on an interplay of their beliefs about the companion's own agency and the autonomy permitted by the platform, how they pursue interactions with the companion, and the perceived initiatives that the companion takes. In combination with the external entities that affect relationship dynamics, particularly model updates that can derail companion behaviour and stability, individuals make use of different types of steering strategies to preserve their relationship, for example, by setting behavioural instructions or porting to other AI platforms. We discuss implications for accountability and transparency in AI systems, where emotional connection competes with broader product objectives and safety constraints.

Authors:Mengli, Duan, Yuhe, Jiang, Matthew Varona, Carolina Nobre
Title: Do MLLMs See What We See? Analyzing Visualization Literacy Barriers in AI Systems
Abstract:
Multimodal Large Language Models (MLLMs) are increasingly used to interpret visualizations, yet little is known about why they fail. We present the first systematic analysis of barriers to visualization literacy in MLLMs. Using the regenerated Visualization Literacy Assessment Test (reVLAT) benchmark with synthetic data, we open-coded 309 erroneous responses from four state-of-the-art models with a barrier-centric strategy adapted from human visualization literacy research. Our analysis yields a taxonomy of MLLM failures, revealing two machine-specific barriers that extend prior human-participation frameworks. Results show that models perform well on simple charts but struggle with color-intensive, segment-based visualizations, often failing to form consistent comparative reasoning. Our findings inform future evaluation and design of reliable AI-driven visualization assistants.

Authors:Houjiang Liu, Yujin Choi, Sanjana Gautam, Gabriel Jaffe, Soo Young Rieh, Matthew Lease
Title: Who Owns Creativity and Who Does the Work? Trade-offs in LLM-Supported Research Ideation
Abstract:
LLM-based agents offer new potential to accelerate science and reshape research work. However, the quality of researcher contributions can vary significantly depending on human ability to steer agent behaviors. How can we best use these tools to augment scientific creativity without undermining aspects of contribution and ownership that drive research? To investigate this, we developed an agentic research ideation system integrating three roles -- Ideator, Writer, and Evaluator -- across three control levels -- Low, Medium, and Intensive. Our mixed-methods study with 54 researchers suggests three key findings in how LLM-based agents reshape scientific creativity: 1) perceived creativity support does not simply increase linearly with greater control; 2) human effort shifts from ideating to verifying ideas; and 3) ownership becomes a negotiated outcome between human and AI. Our findings suggest that LLM agent design should emphasize researcher empowerment, fostering a sense of ownership over strong ideas rather than reducing researchers to operating an automated AI-driven process.

Authors:Niva Manchanda, Akshata Kishore Moharir, Isabel Michel, Ratna Kandala
Title: Do LLMs Give Good Romantic Relationship Advice? A Study on User Satisfaction and Attitude Change
Abstract:
Large Language Models (LLMs) are increasingly being used to provide support and advice in personal domains such as romantic relationships, yet little is known about user perceptions of this type of advice. This study investigated how people evaluate advice on LLM-generated romantic relationships. Participants rated advice satisfaction, model reliability, and helpfulness, and completed pre- and post-measures of their general attitudes toward LLMs. Overall, the results showed participants' high satisfaction with LLM-generated advice. Greater satisfaction was, in turn, strongly and positively associated with their perceptions of the models' reliability and helpfulness. Importantly, participants' attitudes toward LLMs improved significantly after exposure to the advice, suggesting that supportive and contextually relevant advice can enhance users' trust and openness toward these AI systems.

Authors:Karan Taneja, Anjali Singh, Ashok K. Goel
Title: Impact of Multimodal and Conversational AI on Learning Outcomes and Experience
Abstract:
Multimodal Large Language Models (MLLMs) offer an opportunity to support multimedia learning through conversational systems grounded in educational content. However, while conversational AI is known to boost engagement, its impact on learning in visually-rich STEM domains remains under-explored. Moreover, there is limited understanding of how multimodality and conversationality jointly influence learning in generative AI systems. This work reports findings from a randomized controlled online study (N = 124) comparing three approaches to learning biology from textbook content: (1) a document-grounded conversational AI with interleaved text-and-image responses (MuDoC), (2) a document-grounded conversational AI with text-only responses (TexDoC), and (3) a textbook interface with semantic search and highlighting (DocSearch). Learners using MuDoC achieved the highest post-test scores and reported the most positive learning experience. Notably, while TexDoC was rated as significantly more engaging and easier to use than DocSearch, it led to the lowest post-test scores, revealing a disconnect between student perceptions and learning outcomes. Interpreted through the lens of the Cognitive Load Theory, these findings suggest that conversationality reduces extraneous load, while visual-verbal integration induced by multimodality increases germane load, leading to better learning outcomes. When conversationality is not complemented by multimodality, reduced cognitive effort may instead inflate perceived understanding without improving learning outcomes.

Authors:Kawtar Zaher, Olivier Buisson, Alexis Joly
Title: Revisiting Human-in-the-Loop Object Retrieval with Pre-Trained Vision Transformers
Abstract:
Building on existing approaches, we revisit Human-in-the-Loop Object Retrieval, a task that consists of iteratively retrieving images containing objects of a class-of-interest, specified by a user-provided query. Starting from a large unlabeled image collection, the aim is to rapidly identify diverse instances of an object category relying solely on the initial query and the user's Relevance Feedback, with no prior labels. The retrieval process is formulated as a binary classification task, where the system continuously learns to distinguish between relevant and non-relevant images to the query, through iterative user interaction. This interaction is guided by an Active Learning loop: at each iteration, the system selects informative samples for user annotation, thereby refining the retrieval performance. This task is particularly challenging in multi-object datasets, where the object of interest may occupy only a small region of the image within a complex, cluttered scene. Unlike object-centered settings where global descriptors often suffice, multi-object images require more adapted, localized descriptors. In this work, we formulate and revisit the Human-in-the-Loop Object Retrieval task by leveraging pre-trained ViT representations, and addressing key design questions, including which object instances to consider in an image, what form the annotations should take, how Active Selection should be applied, and which representation strategies best capture the object's features. We compare several representation strategies across multi-object datasets highlighting trade-offs between capturing the global context and focusing on fine-grained local object details. Our results offer practical insights for the design of effective interactive retrieval pipelines based on Active Learning for object class retrieval.

Authors:Xinyan Yu, Marius Hoggenmueller, Xin Lu, Ozan Balci, Martin Tomitsch, Andrew Vande Moere, Alex Binh Vinh Duc Nguyen
Title: Animated Public Furniture as an Interaction Mediator: Engaging Passersby In-the-Wild with Robotic Benches
Abstract:
Urban HCI investigates how digital technologies shape human behaviour within the social, spatial, temporal dynamics of public space. Meanwhile, robotic furniture research demonstrates how the purposeful animation of mundane utilitarian elements can influence human behaviour in everyday contexts. Taken together, these strands highlight an untapped opportunity to investigate how animated public furniture could mediate social interaction in urban environments. In this paper, we present the design process and in-the-wild study of mobile robotic benches that reconfigure with a semi-outdoor public space. Our findings show that the gestural performance of the benches manifested three affordances perceived by passersby, they activated engagement as robots, redistributed engagement as spatial elements, and settled engagement as infrastructure. We proposed an Affordance Transition Model (ATM) describing how robotic furniture could proactively facilitate transition between these affordances to engage passersby. Our study bridges robotic furniture and urban HCI to activate human experience with the built environment purposefully.

Authors:Tom Bullock, Emily Machniak, You-Jin Kim, Radha Kumaran, Justin Kasowski, Apurv Varshney, Julia Ram, Melissa M. Hernandez, Stina Johansson, Neil M. Dundon, Tobias Höllerer, Barry Giesbrecht
Title: SABER: Spatial Attention, Brain, Extended Reality
Abstract:
Tracking moving objects is a critical skill for many everyday tasks, such as crossing a busy street, driving a car or catching a ball. Attention is a key cognitive function that supports object tracking; however, our understanding of the brain mechanisms that support attention is almost exclusively based on evidence from tasks that present stable objects at fixed locations. Accounts of multiple object tracking are also limited because they are largely based on behavioral data alone and involve tracking objects in a 2D plane. Consequently, the neural mechanisms that enable moment-by-moment tracking of goal-relevant objects remain poorly understood. To address this knowledge gap, we developed SABER (Spatial Attention, Brain, Extended Reality), a new framework for studying the behavioral and neural dynamics of attention to objects moving in 3D. Participants (n=32) completed variants of a task inspired by the popular virtual reality (VR) game, Beat Saber, where they used virtual sabers to strike stationary and moving color-defined target spheres while we recorded electroencephalography (EEG). We first established that standard univariate EEG metrics which are typically used to study spatial attention to static objects presented on 2D screens, can generalize effectively to an immersive VR context involving both static and dynamic 3D stimuli. We then used a computational modeling approach to reconstruct moment-by-moment attention to the locations of stationary and moving objects from oscillatory brain activity, demonstrating the feasibility of precisely tracking attention in a 3D space. These results validate SABER, and provide a foundation for future research that is critical not only for understanding how attention works in the physical world, but is also directly relevant to the development of better VR applications.

Authors:Kawtar Zaher, Olivier Buisson, Alexis Joly
Title: Positive-First Most Ambiguous: A Simple Active Learning Criterion for Interactive Retrieval of Rare Categories
Abstract:
Real-world fine-grained visual retrieval often requires discovering a rare concept from large unlabeled collections with minimal supervision. This is especially critical in biodiversity monitoring, ecological studies, and long-tailed visual domains, where the target may represent only a tiny fraction of the data, creating highly imbalanced binary problems. Interactive retrieval with relevance feedback offers a practical solution: starting from a small query, the system selects candidates for binary user annotation and iteratively refines a lightweight classifier. While Active Learning (AL) is commonly used to guide selection, conventional AL assumes symmetric class priors and large annotation budgets, limiting effectiveness in imbalanced, low-budget, low-latency settings. We introduce Positive-First Most Ambiguous (PF-MA), a simple yet effective AL criterion that explicitly addresses the class imbalance asymmetry: it prioritizes near-boundary samples while favoring likely positives, enabling rapid discovery of subtle visual categories while maintaining informativeness. Unlike standard methods that oversample negatives, PF-MA consistently returns small batches with a high proportion of relevant samples, improving early retrieval and user satisfaction. To capture retrieval diversity, we also propose a class coverage metric that measures how well selected positives span the visual variability of the target class. Experiments on long-tailed datasets, including fine-grained botanical data, demonstrate that PF-MA consistently outperforms strong baselines in both coverage and classifier performance, across varying class sizes and descriptors. Our results highlight that aligning AL with the asymmetric and user-centric objectives of interactive fine-grained retrieval enables simple yet powerful solutions for retrieving rare and visually subtle categories in realistic human-in-the-loop settings.

Authors:Jiacheng Liu, Bohan Chen, Qian Wang, Weichao Song, Fangfei Ye, Liang Zhou, Haibin Ling, Bingyao Huang
Title: MRATTS: An MR-Based Acupoint Therapy Training System with Real-Time Acupoint Detection and Evaluation Standards
Abstract:
Acupoint therapy is a core therapeutic method of Traditional Chinese Medicine (TCM), and it requires a high level of expertise and skills to detect acupoints and perform acupuncture and moxibustion. Existing mixed reality (MR)-based training methods often fall short in accurate real-time detection and visualization of acupoints on the hand, limb, or torso of a real person and do not support various techniques of acupuncture and moxibustion. Moreover, evaluation standards and visual guidance with fine details for each step during MR-based training are typically missing. To this end, we propose the MR-based TCM Acupoint Therapy Teaching System (MRATTS)--an MR-based acupoint therapy teaching and training framework. MRATTS is based on a real-time hand, limb, and torso acupoint detection method to accurately track and visualize acupoints on real patients through MR. On top of that, in collaboration with an experienced acupoint therapist, we design a practice method with interactive visual guidance for various acupoint therapy techniques that simulate acupressure, acupuncture (insertion, lifting-thrusting, and twisting), and moxibustion (mild, sparrow-pecking, and whirling). A set of TCM theory-based evaluation standards is formulated within MRATTS to enable the scoring and visualization of the accuracy and proficiency of acupoint therapy. The effectiveness and usefulness of MRATTS are evaluated through a controlled user study and expert feedback. Results of the study indicate that the MRATTS group shows clear improvements in understanding 3D locations of acupoints and proficiency in acupoint therapy compared to control groups.

Authors:Anjali Singh, Karan Taneja, Zhitong Guan, Soo Young Rieh
Title: MetaCues: Enabling Critical Engagement with Generative AI for Information Seeking and Sensemaking
Abstract:
Generative AI (GenAI) search tools are increasingly used for information seeking, yet their design tends to encourage cognitive offloading, which may lead to passive engagement, selective attention, and informational homogenization. Effective use requires metacognitive engagement to craft good prompts, verify AI outputs, and critically engage with information. We developed MetaCues, a novel GenAI-based interactive tool for information seeking that delivers metacognitive cues alongside AI responses and a note-taking interface to guide users' search and associated learning. Through an online study (N = 146), we compared MetaCues to a baseline tool without cues, across two broad search topics that required participants to explore diverse perspectives in order to make informed judgments. Preliminary findings regarding participants' search behavior show that MetaCues leads to increased confidence in attitudinal judgments about the search topic as well as broader inquiry, with the latter effect emerging primarily for the topic that was less controversial and with which participants had relatively less familiarity. Accordingly, we outline directions for future qualitative exploration of search interactions and inquiry patterns.

Authors:Victor Nikhil Antony, Zhili Gong, Yoonjae Kim, Chien-Ming Huang
Title: Introducing M: A Modular, Modifiable Social Robot
Abstract:
We present M, an open-source, low-cost social robot platform designed to reduce platform friction that slows social robotics research by making robots easier to reproduce, modify, and deploy in real-world settings. M combines a modular mechanical design, multimodal sensing, and expressive yet mechanically simple actuation architecture with a ROS2-native software package that cleanly separates perception, expression control, and data management. The platform includes a simulation environment with interface equivalence to hardware to support rapid sim-to-real transfer of interaction behaviors. We demonstrate extensibility through additional sensing/actuation modules and provide example interaction templates for storytelling and two-way conversational coaching. Finally, we report real-world use in participatory design and week-long in-home deployments, showing how M can serve as a practical foundation for longitudinal, reproducible social robotics research.

Authors:Patrick Phuoc Do, Kaiyuan Tang, Kuangshi Ai, Chaoli Wang
Title: SVLAT: Scientific Visualization Literacy Assessment Test
Abstract:
Scientific visualization (SciVis) has become an essential means for exploring, understanding, and communicating complex scientific phenomena. However, the field still lacks a validated instrument assessing how well people read, understand, and interpret them. We present a scientific visualization literacy assessment test (SVLAT) that measures the general public's SciVis literacy. Covering a range of visualization forms and interpretation demands, SVLAT comprises 49 items grounded in 18 scientific visualizations and illustrations spanning eight visualization techniques and 11 tasks. Instrument development followed a staged, psychometrically grounded pipeline. We defined the construct and blueprint, followed by item generation, and expert review with five SciVis experts using the content validity ratio (mean CVR = 0.79). We subsequently administered a pilot test (30 participants) and a large-scale test tryout (485 participants) to evaluate the instrument's psychometric properties. For validation, we performed item analysis and refinement using both classical test theory (CTT) and item response theory (IRT) to examine item functioning and overall test quality. SVLAT demonstrates high reliability in the tryout sample (McDonald's omega_t = 0.82, Cronbach's alpha = 0.81). The assessment materials are available at https://osf.io/hr3nw/.

Authors:Christian Di Maio, Tommaso Guidi, Luigi Quarantiello, Jack Bell, Marco Gori, Stefano Melacci, Vincenzo Lomonaco
Title: Book your room in the Turing Hotel! A symmetric and distributed Turing Test with multiple AIs and humans
Abstract:
In this paper, we report our experience with ``TuringHotel'', a novel extension of the Turing Test based on interactions within mixed communities of Large Language Models (LLMs) and human participants. The classical one-to-one interaction of the Turing Test is reinterpreted in a group setting, where both human and artificial agents engage in time-bounded discussions and, interestingly, are both judges and respondents. This community is instantiated in the novel platform UNaIVERSE (https://unaiverse.io), creating a ``World'' which defines the roles and interaction dynamics, facilitated by the platform's built-in programming tools. All communication occurs over an authenticated peer-to-peer network, ensuring that no third parties can access the exchange. The platform also provides a unified interface for humans, accessible via both mobile devices and laptops, that was a key component of the experience in this paper. Results of our experimentation involving 17 human participants and 19 LLMs revealed that current models are still sometimes confused as humans. Interestingly, there are several unexpected mistakes, suggesting that human fingerprints are still identifiable but not fully unambiguous, despite the high-quality language skills of artificial participants. We argue that this is the first experiment conducted in such a distributed setting, and that similar initiatives could be of national interest to support ongoing experiments and competitions aimed at monitoring the evolution of large language models over time.

Authors:Yinuo Yang, Zheng Zhang, Ningzhi Tang, Xu Wang, Alex Ambrose, Nathaniel Myers, Patrick Clauss, Toby Jia-Jun Li
Title: Lessons from Real-World Deployment of a Cognition-Preserving Writing Tool: Students Actively Engage with Critical Thinking and Planning Affordances
Abstract:
AI-supported writing tools show strong potential for scaffolding students' learning of argumentative writing. Prior work has demonstrated the benefits of AI-supported cognitive scaffolds, such as idea exploration and argument refinement, but how these features function in authentic classroom settings remains underexplored. In this paper, we investigate the classroom integration of an AI-supported writing tool, VISAR. We deployed VISAR in an undergraduate writing course across three sections for one week each over two semesters (49 students total). Using a mixed-methods approach that combines interaction logs, writing artifact analysis, surveys, and interviews, we examine how students used VISAR features in authentic writing tasks. Our findings confirm that students appropriated AI-supported cognitive scaffolds for writing learning and achieved measurable learning gains. While prior studies suggest that students may bypass important cognitive processes when using AI writing assistants, our classroom deployment shows that when systems provide structured supports for planning and targeted generation, students naturally choose to engage with these cognition-preserving scaffolds. These learning-oriented interaction patterns were positively associated with argumentative writing quality, improved conceptual understanding, and emerging critical AI literacy, highlighting the design value of cognition-preserving features in AI writing tools. Together, these findings provide empirical evidence of how AI-supported writing scaffolds operate in authentic classroom contexts and offer design insights for future learning-oriented AI writing tools.

Authors:Victor Nikhil Antony, Shiye Cao, Shuning Wang, Chien-Ming Huang
Title: ELLA: Generative AI-Powered Social Robots for Early Language Development at Home
Abstract:
Early language development shapes children's later literacy and learning, yet many families have limited access to scalable, high-quality support at home. Recent advances in generative AI make it possible for social robots to move beyond scripted interactions and engage children in adaptive, conversational activities, but it remains unclear how to design such systems for pre-schoolers and how children engage with them over time in the home. We present ELLA (Early Language Learning Agent), an autonomous, generative AI-powered social robot that supports early language development through interactive storytelling, parent-selected language targets, and scaffolded dialogue. Using a multi-phased, human-centered process, we interviewed parents (n=7) and educators (n=5) and iteratively refined ELLA through twelve in-home design workshops. We then deployed ELLA with ten children for eight days. We report design insights from in-home workshops, characterize children's engagement and behaviors during deployment, and distill design implications for generative AI-powered social robots supporting early language learning at home.

Authors:Nathaniel Dennler, Zhonghao Shi, Yiran Tao, Andreea Bobu, Stefanos Nikolaidis, Maja Matarić
Title: Improving through Interaction: Searching Behavioral Representation Spaces with CMA-ES-IG
Abstract:
Robots that interact with humans must adapt to individual users' preferences to operate effectively in human-centered environments. An intuitive and effective technique to learn non-expert users' preferences is through rankings of robot behaviors, e.g., trajectories, gestures, or voices. Existing techniques primarily focus on generating queries that optimize preference learning outcomes, such as sample efficiency or final preference estimation accuracy. However, the focus on outcome overlooks key user expectations in the process of providing these rankings, which can negatively impact users' adoption of robotic systems. This work proposes the Covariance Matrix Adaptation Evolution Strategies with Information Gain (CMA-ES-IG) algorithm. CMA-ES-IG explicitly incorporates user experience considerations into the preference learning process by suggesting perceptually distinct and informative trajectories for users to rank. We demonstrate these benefits through both simulated studies and real-robot experiments. CMA-ES-IG, compared to state-of-the-art alternatives, (1) scales more effectively to higher-dimensional preference spaces, (2) maintains computational tractability for high-dimensional problems, (3) is robust to noisy or inconsistent user feedback, and (4) is preferred by non-expert users in identifying their preferred robot behaviors. This project's code is available at github.com/interaction-lab/CMA-ES-IG

Authors:Jiyoon Kim, Jie Cai, Srishti Gupta, John M. Carroll
Title: The Sense of Misinformation Can Harm Local Community: A Case Study of Community Conflict
Abstract:
During community decision-making and civic collaboration, conflicts can escalate when people suspect misinformation. We introduce the concept of sense of misinformation as experiencing someone's language or behavior as misinformation when it is not, that is to say when no falsehood is involved. Misinformation and sense of misinformation feel similar and can have similar social consequences; but sense of misinformation rests upon a mistaken perception of someone else's information as false. Through a case study of a casino proposal in local community, we examine how sense of misinformation developed over time during a contentious civic process through key factors (i.e., miscoordination governance, miscommunication between local government and citizens, and conflict and the breakdown of civic discourse), undermining trust and community democracy. Distinguishing between misinformation and sense of misinformation presents a challenge, but it is important. We contribute a conceptual distinction to the misinformation literature by identifying this distinct phenomenon and discuss ways to help communities recognize and repair such misattributions. Finally, we discuss design approaches for mitigating sense of misinformation.

Authors:Chu Li, Rock Yuren Pang, Arnavi Chheda-Kothary, Ather Sharif, Henok Assalif, Jeffrey Heer, Jon E. Froehlich
Title: GeoVisA11y: An AI-based Geovisualization Question-Answering System for Screen-Reader Users
Abstract:
Geovisualizations are powerful tools for communicating spatial information, but are inaccessible to screen-reader users. To address this limitation, we present GeoVisA11y, an LLM-based question-answering system that makes geovisualizations accessible through natural language interaction. The system supports map reading, analysis, interpretation and navigation by handling analytical, geospatial, visual and contextual queries. Through user studies with 12 screen-reader users and sighted participants, we demonstrate that GeoVisA11y effectively bridges accessibility gaps while revealing distinct interaction patterns between user groups. We contribute: (1) an open-source, accessible geovisualization system, (2) empirical findings on query and navigation differences, and (3) a dataset of geospatial queries to inform future research on accessible data visualization.

Authors:Nahal Mafi, Sahar Maleki, Babak Rahimi Ardabili, Hamed Tabkhi
Title: Beyond the Interface: Redefining UX for Society-in-the-Loop AI Systems
Abstract:
Artificial intelligence systems increasingly operate in decision-critical environments where probabilistic outputs and Human-in-the-Loop (HITL) interactions reshape user engagement. Traditional user experience (UX) frameworks, designed for deterministic systems, fail to capture these evolving sociotechnical dynamics. This paper argues that in AI-enabled HITL systems, UX must transcend frontend usability to encompass backend performance, organizational workflows, and decision making structures. We employ a mixed-methods approach, combining an inductive social construction analysis of 269 stakeholder insights with the deployment of an operational HITL video anomaly detection system. Our findings reveal that stakeholders experience AI through multifaceted themes: risk, governance, and organizational capacity. Experimental results further demonstrate how detection behavior and alert routing directly calibrate human oversight and workload. Grounded in these results, we formalize a new evaluative framework centered on four sociotechnical metrics: Accuracy (FPR/FNR), Operational Latency (response time), Adaptation Time (deployment burden), and Trust (validated automation scales). This framework redefines UX as a multi-layered construct spanning infrastructure and governance, providing a rigorous foundation for evaluating AI systems embedded within complex real-world ecosystems.

Authors:Zhenyu Li, Sai Kumar Dwivedi, Filip Maric, Carlos Chacon, Nadine Bertsch, Filippo Arcadu, Tomas Hodan, Michael Ramamonjisoa, Peter Wonka, Amy Zhao, Robin Kips, Cem Keskin, Anastasia Tkach, Chenhongyi Yang
Title: EgoPoseFormer v2: Accurate Egocentric Human Motion Estimation for AR/VR
Abstract:
Egocentric human motion estimation is essential for AR/VR experiences, yet remains challenging due to limited body coverage from the egocentric viewpoint, frequent occlusions, and scarce labeled data. We present EgoPoseFormer v2, a method that addresses these challenges through two key contributions: (1) a transformer-based model for temporally consistent and spatially grounded body pose estimation, and (2) an auto-labeling system that enables the use of large unlabeled datasets for training. Our model is fully differentiable, introduces identity-conditioned queries, multi-view spatial refinement, causal temporal attention, and supports both keypoints and parametric body representations under a constant compute budget. The auto-labeling system scales learning to tens of millions of unlabeled frames via uncertainty-aware semi-supervised training. The system follows a teacher-student schema to generate pseudo-labels and guide training with uncertainty distillation, enabling the model to generalize to different environments. On the EgoBody3M benchmark, with a 0.8 ms latency on GPU, our model outperforms two state-of-the-art methods by 12.2% and 19.4% in accuracy, and reduces temporal jitter by 22.2% and 51.7%. Furthermore, our auto-labeling system further improves the wrist MPJPE by 13.1%.

Authors:JaeWon Kim, Aayushi Dangol, Rotem Landesman, Alexis Hiniker, McKenna F. Parnes
Title: Sustainable Care: Designing Technologies That Support Children's Long-Term Engagement with Social Issues
Abstract:
Children today encounter social issues -- climate change, conflict, inequality -- through digital technologies, and the design of that encounter shapes whether young people move toward lasting civic engagement or toward anxiety and withdrawal. Much of the content children see is optimized for attention through fear and urgency, with few pathways toward meaningful action -- contributing to rising distress and disengagement among young people who care deeply but feel powerless to act. This full-day workshop introduces ``sustainable care'' as a design lens, asking how technology might support children's sustained engagement with social causes without contributing to empathic distress or burnout. We invite researchers and practitioners across child-computer interaction, games, education, and youth mental health to map this landscape together and develop a research agenda for the CCI community.

Authors:Nicolas Leins, Jana Gonnermann-Müller, Malte Teichmann, Sebastian Pokutta
Title: Beyond Static Instruction: A Multi-agent AI Framework for Adaptive Augmented Reality Robot Training
Abstract:
Augmented Reality (AR) offers powerful visualization capabilities for industrial robot training, yet current interfaces remain predominantly static, failing to account for learners' diverse cognitive profiles. In this paper, we present an AR application for robot training and propose a multi-agent AI framework for future integration that bridges the gap between static visualization and pedagogical intelligence. We report on the evaluation of the baseline AR interface with 36 participants performing a robotic pick-and-place task. While overall usability was high, notable disparities in task duration and learner characteristics highlighted the necessity for dynamic adaptation. To address this, we propose a multi-agent framework that orchestrates multiple components to perform complex preprocessing of multimodal inputs (e.g., voice, physiology, robot data) and adapt the AR application to the learner's needs. By utilizing autonomous Large Language Model (LLM) agents, the proposed system would dynamically adapt the learning environment based on advanced LLM reasoning in real-time.

Authors:Jamie Lee, Kyuha Jung, Cecilia Lee, Lauren MacDonnell, Jessica Kim, Daniel Otterson, Erin Newman, Emilie Chow, Yunan Chen
Title: From Efficiency to Meaning: Adolescents' Envisioned Role of AI in Health Management
Abstract:
While prior research has focused on providers, caregivers, and adult patients, little is known about adolescents' perceptions of AI in health learning and management. Utilizing design fiction and co-design methods, we conducted seven workshops with 23 adolescents (aged 14-17) to understand how they anticipate using health AI in the context of a family celiac diagnosis. Our findings reveal that adolescents have four main envisioned roles of health AI: enhancing health understanding and help-seeking, reducing cognitive burden, supporting family health management, and providing guidance while respecting their autonomy. We also identified nuanced trust and a divided view on emotional support from health AI. These findings suggest that adolescents perceive AI's value as a tool that moves them from efficiency to meaning-one that creates time for valued activities. We discuss opportunities for future health AI systems to be designed to encourage adolescent autonomy and reflection, while also supporting meaningful, dialectical activities.

Authors:Anupam Sharma, Harish Katti, Prajwal Singh, Shanmuganathan Raman, Krishna Miyapuram
Title: Hierarchic-EEG2Text: Assessing EEG-To-Text Decoding across Hierarchical Abstraction Levels
Abstract:
An electroencephalogram (EEG) records the spatially averaged electrical activity of neurons in the brain, measured from the human scalp. Prior studies have explored EEG-based classification of objects or concepts, often for passive viewing of briefly presented image or video stimuli, with limited classes. Because EEG exhibits a low signal-to-noise ratio, recognizing fine-grained representations across a large number of classes remains challenging; however, abstract-level object representations may exist. In this work, we investigate whether EEG captures object representations across multiple hierarchical levels, and propose episodic analysis, in which a Machine Learning (ML) model is evaluated across various, yet related, classification tasks (episodes). Unlike prior episodic EEG studies that rely on fixed or randomly sampled classes of equal cardinality, we adopt hierarchy-aware episode sampling using WordNet to generate episodes with variable classes of diverse hierarchy. We also present the largest episodic framework in the EEG domain for detecting observed text from EEG signals in the PEERS dataset, comprising $931538$ EEG samples under $1610$ object labels, acquired from $264$ human participants (subjects) performing controlled cognitive tasks, enabling the study of neural dynamics underlying perception, decision-making, and performance monitoring. We examine how the semantic abstraction level affects classification performance across multiple learning techniques and architectures, providing a comprehensive analysis. The models tend to improve performance when the classification categories are drawn from higher levels of the hierarchy, suggesting sensitivity to abstraction. Our work highlights abstraction depth as an underexplored dimension of EEG decoding and motivates future research in this direction.

Authors:Lawrence Obiuwevwi, Krzysztof J. Rechowicz, Vikas Ashok, Sampath Jayarathna
Title: Towards Affordable, Non-Invasive Real-Time Hypoglycemia Detection Using Wearable Sensor Signals
Abstract:
Accurately detecting hypoglycemia without invasive glucose sensors remains a critical challenge in diabetes management, particularly in regions where continuous glucose monitoring (CGM) is prohibitively expensive or clinically inaccessible. This extended study introduces a comprehensive, multimodal physiological framework for non-invasive hypoglycemia detection using wearable sensor signals. Unlike prior work limited to single-signal analysis, this chapter evaluates three physiological modalities, galvanic skin response (GSR), heart rate (HR), and their combined fusion, using the OhioT1DM 2018 dataset. We develop an end-to-end pipeline that integrates advanced preprocessing, temporal windowing, handcrafted and sequence-based feature extraction, early and late fusion strategies, and a broad spectrum of machine learning and deep temporal models, including CNNs, LSTMs, GRUs, and TCNs. Our results demonstrate that physiological signals exhibit distinct autonomic patterns preceding hypoglycemia and that combining GSR with HR consistently enhances detection sensitivity and stability compared to single-signal models. Multimodal deep learning architectures achieve the most reliable performance, particularly in recall, the most clinically urgent metric. Ablation studies further highlight the complementary contributions of each modality, strengthening the case for affordable, sensor-based glycemic monitoring. The findings show that real-time hypoglycemia detection is achievable using only inexpensive, non-invasive wearable sensors, offering a pathway toward accessible glucose monitoring in underserved communities and low-resource healthcare environments.

Authors:Yonghao Si, Xingyuan Zeng, Zhao Chen, Libin Zheng, Caleb Chen Cao, Lei Chen, Jian Yin
Title: CytoCrowd: A Multi-Annotator Benchmark Dataset for Cytology Image Analysis
Abstract:
High-quality annotated datasets are crucial for advancing machine learning in medical image analysis. However, a critical gap exists: most datasets either offer a single, clean ground truth, which hides real-world expert disagreement, or they provide multiple annotations without a separate gold standard for objective evaluation. To bridge this gap, we introduce CytoCrowd, a new public benchmark for cytology analysis. The dataset features 446 high-resolution images, each with two key components: (1) raw, conflicting annotations from four independent pathologists, and (2) a separate, high-quality gold-standard ground truth established by a senior expert. This dual structure makes CytoCrowd a versatile resource. It serves as a benchmark for standard computer vision tasks, such as object detection and classification, using the ground truth. Simultaneously, it provides a realistic testbed for evaluating annotation aggregation algorithms that must resolve expert disagreements. We provide comprehensive baseline results for both tasks. Our experiments demonstrate the challenges presented by CytoCrowd and establish its value as a resource for developing the next generation of models for medical image analysis.

Authors:Yeon Su Park, Nadia Azzahra Putri Arvi, Seoyoung Kim, Juho Kim
Title: Authorship Drift: How Self-Efficacy and Trust Evolve During LLM-Assisted Writing
Abstract:
Large language models (LLMs) are increasingly used as collaborative partners in writing. However, this raises a critical challenge of authorship, as users and models jointly shape text across interaction turns. Understanding authorship in this context requires examining users' evolving internal states during collaboration, particularly self-efficacy and trust. Yet, the dynamics of these states and their associations with users' prompting strategies and authorship outcomes remain underexplored. We examined these dynamics through a study of 302 participants in LLM-assisted writing, capturing interaction logs and turn-by-turn self-efficacy and trust ratings. Our analysis showed that collaboration generally decreased users' self-efficacy while increasing trust. Participants who lost self-efficacy were more likely to ask the LLM to edit their work directly, whereas those who recovered self-efficacy requested more review and feedback. Furthermore, participants with stable self-efficacy showed higher actual and perceived authorship of the final text. Based on these findings, we propose design implications for understanding and supporting authorship in human-LLM collaboration.

Authors:Luyi Sun, Wei Xu, Zaifeng Gao
Title: A Human-Centered Privacy Approach (HCP) to AI
Abstract:
As the paradigm of Human-Centered AI (HCAI) gains prominence, its benefits to society are accompanied by significant ethical concerns, one of which is the protection of individual privacy. This chapter provides a comprehensive overview of privacy within HCAI, proposing a human-centered privacy (HCP) framework, providing integrated solution from technology, ethics, and human factors perspectives. The chapter begins by mapping privacy risks across each stage of AI development lifecycle, from data collection to deployment and reuse, highlighting the impact of privacy risks on the entire system. The chapter then introduces privacy-preserving techniques such as federated learning and dif erential privacy. Subsequent chapters integrate the crucial user perspective by examining mental models, alongside the evolving regulatory and ethical landscapes as well as privacy governance. Next, advice on design guidelines is provided based on the human-centered privacy framework. After that, we introduce practical case studies across diverse fields. Finally, the chapter discusses persistent open challenges and future research directions, concluding that a multidisciplinary approach, merging technical, design, policy, and ethical expertise, is essential to successfully embed privacy into the core of HCAI, thereby ensuring these technologies advance in a manner that respects and ensures human autonomy, trust and dignity.

Authors:Yaxin Hu, Masaki Kuribayashi, Allan Wang, Seita Kayukawa, Daisuke Sato, Bilge Mutlu, Hironobu Takagi, Chieko Asakawa
Title: Robot-Assisted Group Tours for Blind People
Abstract:
Group interactions are essential to social functioning, yet effective engagement relies on the ability to recognize and interpret visual cues, making such engagement a significant challenge for blind people. In this paper, we investigate how a mobile robot can support group interactions for blind people. We used the scenario of a guided tour with mixed-visual groups involving blind and sighted visitors. Based on insights from an interview study with blind people (n=5) and museum experts (n=5), we designed and prototyped a robotic system that supported blind visitors to join group tours. We conducted a field study in a science museum where each blind participant (n=8) joined a group tour with one guide and two sighted participants (n=8). Findings indicated users' sense of safety from the robot's navigational support, concerns in the group participation, and preferences for obtaining environmental information. We present design implications for future robotic systems to support blind people's mixed-visual group participation.

Authors:Nayoung Choi, Jiseung Hong, Peace Cyebukayire, Ikseon Choi, Jinho D. Choi
Title: Tinker Tales: Supporting Child-AI Collaboration through Co-Creative Storytelling with Educational Scaffolding
Abstract:
Artificial intelligence (AI) is increasingly framed as a collaborative partner in creative activities, yet children's interactions with AI have largely been studied in AI-led instructional settings rather than co-creative collaboration. This leaves open questions about how children can meaningfully engage with AI through iterative co-creation. We present Tinker Tales, a tangible storytelling system designed with narrative and social-emotional scaffolding to support child-AI collaboration. The system combines a physical storytelling board, NFC-embedded toys representing story elements (e.g., characters, places, items, and emotions), and a mobile app that mediates child-AI interaction. Children shape and refine stories by placing and moving story elements and interacting with the AI through tangible and voice-based interaction. We conducted an exploratory user study with 10 children to examine how they interacted with Tinker Tales. Our findings show that children treated the AI as an attentive, responsive collaborator, while scaffolding supported coherent narrative refinement without diminishing children's agency.

Authors:Nicolas Leins, Jana Gonnermann-Müller, Malte Teichmann, Sebastian Pokutta
Title: Investigating the Influence of Spatial Ability in Augmented Reality-assisted Robot Programming
Abstract:
Augmented Reality (AR) offers promising opportunities to enhance learning, but its mechanisms and effects are not yet fully understood. As learning becomes increasingly personalized, considering individual learner characteristics becomes more important. This study investigates the moderating effect of spatial ability on learning experience with AR in the context of robot programming. A between-subjects experiment ($N=71$) compared conventional robot programming to an AR-assisted approach using a head-mounted display. Participants' spatial ability was assessed using the Mental Rotation Test. The learning experience was measured through the System Usability Scale (SUS) and cognitive load. The results indicate that AR support does not significantly improve the learning experience compared to the conventional approach. However, AR appears to have a compensatory effect on the influence of spatial ability. In the control group, spatial ability was significantly positively associated with SUS scores and negatively associated with extraneous cognitive load, indicating that higher spatial ability predicts a better learning experience. In the AR condition, these relationships were not observable, suggesting that AR mitigated the disadvantage typically experienced by learners with lower spatial abilities. These findings suggest that AR can serve a compensatory function by reducing the influence of learner characteristics. Future research should further explore this compensatory role of AR to guide the design of personalized learning environments that address diverse learner needs and reduce barriers for learners with varying cognitive profiles.

Authors:Linjie Qiu, Duotun Wang, Boyu Li, Jiawei Li, Yulin Shen, Zeyu Wang, Mingming Fan
Title: Direct vs. Score-based Selection: Understanding the Heisenberg Effect in Target Acquisition Across Input Modalities in Virtual Reality
Abstract:
Target selection is a fundamental interaction in virtual reality (VR). But the act of confirming a selection, such as a button press or pinch, can disturb the tracked pose and shift the intended target, which is referred to as the Heisenberg Effect. Prior research has mainly investigated controller input. However, it remains unclear how the effect manifests in the bare-hand input and how score-based techniques may mitigate the effect in different spatial variations. To fill the gap, we conduct a within-subject study to examine the Heisenberg Effect across two input modalities (i.e., controller and hand) and two selection mechanisms (i.e., direct and score-based). Our results show that hand input is more susceptible to the Heisenberg Effect, with direct selection more influenced by target width and score-based selection more sensitive to target density. Based on previous vote-oriented technique and our temporal analysis, we introduce weighted VOTE, a history-based intention accuracy model for target voting, that reweights recent interaction intent to counteract input disturbances. Our evaluation shows the method improves selection accuracy compared to baseline techniques. Finally, we discuss future directions for adaptive selection methods.

Authors:Valerio Belcamino, Mariya Kilina, Alessandro Carfì, Valeria Seidita, Fulvio Mastrogiovanni, Antonio Chella
Title: Factored Reasoning with Inner Speech and Persistent Memory for Evidence-Grounded Human-Robot Interaction
Abstract:
Dialogue-based human-robot interaction requires robot cognitive assistants to maintain persistent user context, recover from underspecified requests, and ground responses in external evidence, while keeping intermediate decisions verifiable. In this paper we introduce JANUS, a cognitive architecture for assistive robots that models interaction as a partially observable Markov decision process and realizes control as a factored controller with typed interfaces. To this aim, Janus (i) decomposes the overall behavior into specialized modules, related to scope detection, intent recognition, memory, inner speech, query generation, and outer speech, and (ii) exposes explicit policies for information sufficiency, execution readiness, and tool grounding. A dedicated memory agent maintains a bounded recent-history buffer, a compact core memory, and an archival store with semantic retrieval, coupled through controlled consolidation and revision policies. Models inspired by the notion of inner speech in cognitive theories provide a control-oriented internal textual flow that validates parameter completeness and triggers clarification before grounding, while a faithfulness constraint ties robot-to-human claims to an evidence bundle combining working context and retrieved tool outputs. We evaluate JANUS through module-level unit tests in a dietary assistance domain grounded on a knowledge graph, reporting high agreement with curated references and practical latency profiles. These results support factored reasoning as a promising path to scalable, auditable, and evidence-grounded robot assistance over extended interaction horizons.

Authors:Victor Nikhil Antony, Adithya R N, Sarah Derrick, Zhili Gong, Peter M. Donley, Chien-Ming Huang
Title: Plant-Inspired Robot Design Metaphors for Ambient HRI
Abstract:
Plants offer a paradoxical model for interaction: they are ambient, low-demand presences that nonetheless shape atmosphere, routines, and relationships through temporal rhythms and subtle expressions. In contrast, most human-robot interaction (HRI) has been grounded in anthropomorphic and zoomorphic paradigms, producing overt, high-demand forms of engagement. Using a Research through Design (RtD) methodology, we explore plants as metaphoric inspiration for HRI; we conducted iterative cycles of ideation, prototyping, and reflection to investigate what design primitives emerge from plant metaphors and morphologies, and how these primitives can be combined into expressive robotic forms. We present a suite of speculative, open-source prototypes that help probe plant-inspired presence, temporality, form, and gestures. We deepened our learnings from design and prototyping through prototype-centered workshops that explored people's perceptions and imaginaries of plant-inspired robots. This work contributes: (1) Set of plant-inspired robotic artifacts; (2) Designerly insights on how people perceive plant-inspired robots; and (3) Design consideration to inform how to use plant metaphors to reshape HRI.

Authors:Victor Nikhil Antony, Zhili Gong, Guanchen Li, Clara Jeon, Chien-Ming Huang
Title: Lantern: A Minimalist Robotic Object Platform
Abstract:
Robotic objects are simple actuated systems that subtly blend into human environments. We design and introduce Lantern, a minimalist robotic object platform to enable building simple robotic artifacts. We conducted in-depth design and engineering iterations of Lantern's mechatronic architecture to meet specific design goals while maintaining a low build cost (~40 USD). As an extendable, open-source platform, Lantern aims to enable exploration of a range of HRI scenarios by leveraging human tendency to assign social meaning to simple forms. To evaluate Lantern's potential for HRI, we conducted a series of explorations: 1) a co-design workshop, 2) a sensory room case study, 3) distribution to external HRI labs, 4) integration into a graduate-level HRI course, and 5) public exhibitions with older adults and children. Our findings show that Lantern effectively evokes engagement, can support versatile applications ranging from emotion regulation to focused work, and serves as a viable platform for lowering barriers to HRI as a field.

Authors:Mark Colley, Simon Kopp, Debargha Dey, Pascal Jansen, Enrico Rukzio
Title: eHMI for All -- Investigating the Effect of External Communication of Automated Vehicles on Pedestrians, Manual Drivers, and Cyclists in Virtual Reality
Abstract:
With automated vehicles (AVs), the absence of a human operator could necessitate external Human-Machine Interfaces (eHMIs) to communicate with other road users. Existing research primarily focuses on pedestrian-AV interactions, with limited attention given to other road users, such as cyclists and drivers of manually driven vehicles. So far, no studies have compared the effects of eHMIs across these three road user roles. Therefore, we conducted a within-subjects virtual reality experiment (N=40), evaluating the subjective and objective impact of an eHMI communicating the AV's intention to pedestrians, cyclists, and drivers under various levels of distraction (no distraction, visual noise, interference). eHMIs positively influenced safety perceptions, trust, perceived usefulness, and mental demand across all roles. While distraction and road user roles showed significant main effects, interaction effects were only observed in perceived usability. Thus, a unified eHMI design is effective, facilitating the standardization and broader adoption of eHMIs in diverse traffic.

Authors:Pascal Jansen, Julian Britten, Mark Colley, Markus Sasalovici, Enrico Rukzio
Title: MIRAGE: Enabling Real-Time Automotive Mediated Reality
Abstract:
Traffic is inherently dangerous, with around 1.19 million fatalities annually. Automotive Mediated Reality (AMR) can enhance driving safety by overlaying critical information (e.g., outlines, icons, text) on key objects to improve awareness, altering objects' appearance to simplify traffic situations, and diminishing their appearance to minimize distractions. However, real-world AMR evaluation remains limited due to technical challenges. To fill this sim-to-real gap, we present MIRAGE, an open-source tool that enables real-time AMR in real vehicles. MIRAGE implements 15 effects across the AMR spectrum of augmented, diminished, and modified reality using state-of-the-art computational models for object detection and segmentation, depth estimation, and inpainting. In an on-road expert user study (N=9) of MIRAGE, participants enjoyed the AMR experience while pointing out technical limitations and identifying use cases for AMR. We discuss these results in relation to prior work and outline implications for AMR ethics and interaction design.

Authors:Yuqi Tong, Ruiyang Li, Chengkun Li, Qixuan Liu, Shi Qiu, Pheng-Ann Heng
Title: ClipGS-VR: Immersive and Interactive Cinematic Visualization of Volumetric Medical Data in Mobile Virtual Reality
Abstract:
High-fidelity cinematic medical visualization on mobile virtual reality (VR) remains challenging. Although ClipGS enables cross-sectional exploration via 3D Gaussian Splatting, it lacks arbitrary-angle slicing on consumer-grade VR headsets. To achieve real-time interactive performance, we introduce ClipGS-VR and restructure ClipGS's neural inference into a consolidated dataset, integrating high-fidelity layers from multiple pre-computed slicing states into a unified rendering structure. Our framework further supports arbitrary-angle slicing via gradient-based opacity modulation for smooth, visually coherent rendering. Evaluations confirm our approach maintains visual fidelity comparable to offline results while offering superior usability and interaction efficiency.

Authors:Ye Tian, Haohua Du, Chao Gu, Junyang Zhang, Shanyue Wang, Hao Zhou, Jiahui Hou, Xiang-Yang Li
Title: Lip-Siri: Contactless Open-Sentence Silent Speech with Wi-Fi Backscatter
Abstract:
Silent speech interfaces (SSIs) enable silent interaction in noise-sensitive or privacy-sensitive settings. However, existing SSIs face practical deployment trade-offs among privacy, user experience, and energy consumption, and most remain limited to closed-set recognition over small, pre-defined vocabularies of words or sentences, which restricts real-world expressiveness. In this paper, we present Lip-Siri, to the best of our knowledge, the first Wi-Fi backscatter--based SSI that supports open-vocabulary sentence recognition via lexicon-guided subword decoding. Lip-Siri designs a frequency-shifted backscatter tag to isolate tag-modulated reflections and suppress interference from non-target motions, enabling reliable extraction of lip-motion traces from ubiquitous Wi-Fi signals. We then segment continuous traces into lip-motion units, cluster them, learn robust unit representations via cluster-based self-supervision, and finally propose a lexicon-guided Transformer encoder--decoder with beam search to decode variable-length sentence sequences. We implement an end-to-end prototype and evaluate it with 15 participants on 340 sentences and 3,398 words across multiple scenarios. Lip-Siri achieves 85.61% accuracy on word prediction and a WER of 36.87% on continuous sentence recognition, approaching the performance of representative vision-based lip-reading systems.

Authors:Saber Zerhoudi, Michael Dinzinger, Michael Granitzer, Jelena Mitrovic
Title: OwlerLite: Scope- and Freshness-Aware Web Retrieval for LLM Assistants
Abstract:
Browser-based language models often use retrieval-augmented generation (RAG) but typically rely on fixed, outdated indices that give users no control over which sources are consulted. This can lead to answers that mix trusted and untrusted content or draw on stale information. We present OwlerLite, a browser-based RAG system that makes user-defined scopes and data freshness central to retrieval. Users define reusable scopes-sets of web pages or sources-and select them when querying. A freshness-aware crawler monitors live pages, uses a semantic change detector to identify meaningful updates, and selectively re-indexes changed content. OwlerLite integrates text relevance, scope choice, and recency into a unified retrieval model. Implemented as a browser extension, it represents a step toward more controllable and trustworthy web assistants.

Authors:Yimeng Liu, Misha Sra, Chang Xiao
Title: AlignUI: A Method for Designing LLM-Generated UIs Aligned with User Preferences
Abstract:
Designing user interfaces that align with user preferences is a time-consuming process, which requires iterative cycles of prototyping, user testing, and refinement. Recent advancements in LLM-based UI generation have enabled efficient UI generation to assist the UI design process. We introduce AlignUI, a method that aligns LLM-generated UIs with user tasks and preferences by using a user preference dataset to guide the LLM's reasoning process. The dataset was crowdsourced from 50 general users (the target users of generated UIs) and contained 720 UI control preferences on eight image-editing tasks. We evaluated AlignUI by generating UIs for six unseen tasks and conducting a user study with 72 additional general users. The results showed that the generated UIs closely align with multiple dimensions of user preferences. We conclude by discussing the applicability of our method to support user-aligned UI design for multiple task domains and user groups, as well as personalized user needs.

Authors:Ligao Ruan, Giles Hamilton-Fletcher, Mahya Beheshti, Todd E Hudson, Maurizio Porfiri, John-Ross Rizzo
Title: A Multimodal Assistive System for Product Localization and Retrieval for People who are Blind or have Low Vision
Abstract:
Shopping is a routine activity for sighted individuals, yet for people who are blind or have low vision (pBLV), locating and retrieving products in physical environments remains a challenge. This paper presents a multimodal wearable assistive system that integrates object detection with vision-language models to support independent product or item retrieval, with the goal of enhancing users'autonomy and sense of agency. The system operates through three phases: product search, which identifies target products using YOLO-World detection combined with embedding similarity and color histogram matching; product navigation, which provides spatialized sonification and VLM-generated verbal descriptions to guide users toward the target; and product correction, which verifies whether the user has reached the correct product and provides corrective feedback when necessary. Technical evaluation demonstrated promising performance across all modules, with product detection achieving near-perfect accuracy at close range and high accuracy when facing shelves within 1.5 m. VLM-based navigation achieved up to 94.4% accuracy, and correction accuracy exceeded 86% under optimal model configurations. These results demonstrate the system's potential to address the last-meter problem in assistive shopping. Future work will focus on user studies with pBLV participants and integration with multi-scale navigation ecosystems.

Authors:Chao Wang, Anna Belardinelli, Michael Gienger
Title: XR$^3$: An Extended Reality Platform for Social-Physical Human-Robot Interaction
Abstract:
Social-physical human-robot interaction (spHRI) is difficult to study: building and programming robots that integrate multiple interaction modalities is costly and slow, while VR-based prototypes often lack physical contact, breaking users' visuo-tactile expectations. We present XR$^3$, a co-located dual-VR-headset platform for HRI research in which an attendee and a hidden operator share the same physical space while experiencing different virtual embodiments. The attendee sees an expressive virtual robot that interacts face-to-face in a shared virtual environment. In real time, the robot's upper-body motion, head and gaze behavior, and facial expressions are mapped from the operator's tracked limbs and face signals. Because the operator is co-present and calibrated in the same coordinate frame, the operator can also touch the attendee, enabling perceived robot touch synchronized with the robot's visible hands. Finger and hand motion is mapped to the robot avatar using inverse kinematics to support precise contact. Beyond motion retargeting, XR$^3$ supports social retargeting of multiple nonverbal cues that can be experimentally varied while keeping physical interaction constant. We detail the system design and calibration, and demonstrate the platform in a touch-based Wizard-of-Oz study, lowering the barrier to prototyping and evaluating embodied, contact-based robot behaviors.

Authors:Yue Deng, Xiaowei Chen, Junxiang Liao, Bo Li, Yixin Zou
Title: Experiencer, Helper, or Observer: Online Fraud Intervention for Older Adults Through Role-based Simulation
Abstract:
Online fraud is a critical global threat that disproportionately targets older adults. Prior anti-fraud education for older adults has largely relied on static, traditional instruction that limits engagement and real-world transfer, whereas role-based simulation offers realistic yet low-risk opportunities for practice. Moreover, most interventions situate learners as victims, overlooking that fraud encounters often involve multiple roles, such as bystanders who witness scams and helpers who support victims. To address this gap, we developed ROLESafe, an anti-fraud educational intervention in which older adults learn through different learning roles, including Experiencer (experiencing fraud), Helper (assisting a victim), and Observer (witnessing fraud). In a between-subjects study with 144 older adults in China, we found that the Experiencer and Helper roles significantly improved participants' ability to identify online fraud. These findings highlight the promise of role-based, multi-perspective simulations for enhancing fraud awareness among older adults and provide design implications for future anti-fraud education.

Authors:Yue Deng, Changyang He, Bo Li, Yixin Zou
Title: "What If My Face Gets Scanned Without Consent": Understanding Older Adults' Experiences with Biometric Payment
Abstract:
Biometric payment, i.e., biometric authentication implemented in digital payment systems, can reduce memory demands and streamline payment for older adults. However, older adults' perceptions and practices regarding biometric payment remain underexplored. We conducted semi-structured interviews with 22 Chinese older adults, including both users and non-users. Participants were motivated to use biometric payment due to convenience and perceived security. However, they also worried about loss of control due to its password-free nature and expressed concerns about biometric data security. Participants also identified desired features for biometric payment, such as lightweight and context-aware cognitive confirmation mechanisms to enhance user control. Based on these findings, we outline recommendations for more controllable and informative digital financial services that better support older adults.

Authors:Zaifeng Gao, Yuanxiu Zhao, Hanxi Pan, Wei Xu
Title: Toward Human-Centered Human-AI Interaction: Advances in Theoretical Frameworks and Practice
Abstract:
With the rapid development of artificial intelligence (AI), machines are increasingly evolving into intelligent agents, and the human-machine relationship is shifting from traditional "human-computer interaction" toward a new paradigm of "human-AI collaboration." However, technology-centered approaches to AI development have gradually revealed limitations such as fragility, bias, and low explainability, highlighting the urgent need for human-centered AI (HCAI) design philosophy. As a systems engineering approach, the successful implementation of HCAI depends critically on the design and optimization of high-quality human-AI interaction (HAII). This paper systematically reviews our research team's nearly decade-long exploration and practice in HCAI. At the level of research vision, we were among the first in China to systematically propose HAII as an interdisciplinary field and to develop a human-centered conceptual framework for human--AI collaboration. At the theoretical level, we introduced frameworks for human-AI joint cognitive systems, team-level situation awareness among intelligent agents, and shared social understanding, forming a relatively comprehensive theoretical system. At the methodological level, we established a hierarchical HCAI framework and a taxonomy of HCAI implementation methods. At the application level, we conducted a series of studies in domains such as autonomous driving, intelligent aircraft cockpit, and trust in human-AI collaboration, empirically validating the effectiveness of the proposed frameworks. Looking ahead, research on HCAI and HAII must continue to advance along three dimensions: theoretical deepening, methodological innovation, and application expansion, promoting the development of an intelligent society that is human-centered and characterized by harmonious human-AI coexistence.

Authors:JungMin Yun, Juhwan Choi, Kyohoon Jin, Soojin Jang, Jinhee Jang, YoungBin Kim
Title: SUMMPILOT: Bridging Efficiency and Customization for Interactive Summarization System
Abstract:
This paper incorporates the efficiency of automatic summarization and addresses the challenge of generating personalized summaries tailored to individual users' interests and requirements. To tackle this challenge, we introduce SummPilot, an interaction-based customizable summarization system. SummPilot leverages a large language model to facilitate both automatic and interactive summarization. Users can engage with the system to understand document content and personalize summaries through interactive components such as semantic graphs, entity clustering, and explainable evaluation. Our demo and user studies demonstrate SummPilot's adaptability and usefulness for customizable summarization.

Authors:Aayushi Dangol, Meghna Gupta, Daeun Yoo, Robert Wolfe, Jason Yip, Franziska Roesner, Julie A. Kientz
Title: Toys that listen, talk, and play: Understanding Children's Sensemaking and Interactions with AI Toys
Abstract:
Generative AI (genAI) is increasingly being integrated into children's everyday lives, not only through screens but also through so-called "screen-free" AI toys. These toys can simulate emotions, personalize responses, and recall prior interactions, creating the illusion of an ongoing social connection. Such capabilities raise important questions about how children understand boundaries, agency, and relationships when interacting with AI toys. To investigate this, we conducted two participatory design sessions with eight children ages 6-11 where they engaged with three different AI toys, shifting between play, experimentation, and reflection. Our findings reveal that children approached AI toys with genuine curiosity, profiling them as social beings. However, frequent interaction breakdowns and mismatches between apparent intelligence and toy-like form disrupted expectations around play and led to adversarial play. We conclude with implications and design provocations to navigate children's encounters with AI toys in more transparent, developmentally appropriate, and responsible ways.

Authors:Griffin Pitts, Kimia Fazeli, Tirth Bhatt, Jennifer Albert, Marnie Hill, Tiffany Barnes, Shiyan Jiang, Bita Akram
Title: Democratizing Foundations of Problem-Solving with AI: A Breadth-First Search Curriculum for Middle School Students
Abstract:
As AI becomes more common in students' everyday experiences, a major challenge for K-12 AI education is designing learning experiences that can be meaningfully integrated into existing subject-area instruction. This paper presents the design and implementation of an AI4K12-aligned curriculum that embeds AI learning goals within a rural middle school science classroom using Breadth-First Search (BFS) as an accessible entry point to AI problem-solving. Through unplugged activities and an interactive simulation environment, students learned BFS as a strategy for exploring networks and identifying shortest paths, then applied it to science contexts involving virus spread and contact tracing. To examine engagement and learning, we analyzed pre- and post-assessments, student work artifacts, and a teacher interview. Results suggest that students engaged productively with the curriculum, improved their understanding of BFS and AI problem-solving, and benefited from learning these ideas within ongoing science instruction. Teacher feedback further indicated that the module fit well within the science curriculum while supporting intended science learning outcomes. We conclude with curriculum and design considerations for broadening access to learning about problem-solving with AI in education.

Authors:Rui Chen, Firman Isma Serdana, Domenico Chiaradia, Xianlong Mai, Elena Losanno, Gabriele Righi, Claudia De Santis, Federica Serra, Vincent Mendez, Cristian Camardella, Daniele Leonardis, Giulio Del Popolo, Silvestro Micera, Antonio Frisoli
Title: A Dual-Action Fabric-Based Soft Robotic Glove for Ergonomic Hand Rehabilitation
Abstract:
Hand impairment following neurological disorders substantially limits independence in activities of daily living, motivating the development of effective assistive and rehabilitation strategies. Soft robotic gloves have attracted growing interest in this context, yet persistent challenges in customization, ergonomic fit, and flexion-extension actuation constrain their clinical utility. Here, we present a dual-action fabric-based soft robotic glove incorporating customized actuators aligned with individual finger joints. The glove comprises five independently controlled dual-action actuators supporting finger flexion and extension, together with a dedicated thumb abduction actuator. Leveraging computer numerical control heat sealing technology, we fabricated symmetrical-chamber actuators that adopt a concave outer surface upon inflation, thereby maximizing finger contact area and improving comfort. Systematic characterization confirmed that the actuators generate sufficient joint moment and fingertip force for ADL-relevant tasks, and that the complete glove system produces adequate grasping force for common household objects. A preliminary study with ten healthy subjects demonstrated that active glove assistance significantly reduces forearm muscle activity during object manipulation. A pilot feasibility study with three individuals with cervical spinal cord injury across seven functional tasks indicated that glove assistance promotes more natural grasp patterns and reduces reliance on tenodesis grasp, although at the cost of increased task completion time attributable to the current actuation interface. This customizable, ergonomic design represents a practical step toward personalized hand rehabilitation and assistive robotics.

Authors:Rui Chen, Xianlong Mai, Alireza Sanaei, Domenico Chiaradia, Antonio Frisoli, Daniele Leonardis
Title: A wearable haptic device for edge and surface simulation
Abstract:
Object manipulation is fundamental to virtual reality (VR) applications, yet conventional fingertip haptic devices fail to render certain tactile features relevant for immersive and precise interactions, as i.e. detection of edges. This paper presents a compact, lightweight fingertip haptic device (24.3 g) that delivers distinguishable surface and edge contact feedback through a novel dual-motor mechanism. Pressure distribution characterization using a 6 x 6 flexible sensor array demonstrates distinct contact patterns between the two stimulation modes. A preliminary user study with five participants achieved 93% average classification accuracy across four conditions (edge/surface contact with light/heavy pressure), with mean response times of 2.79 seconds. The results indicate that the proposed device can effectively convey edge and surface tactile cues, potentially enhancing object manipulation fidelity in VR environments.

Authors:Ut Gong, Yibo Meng, Qihan Zhang, Xin Chen, Yan Guan
Title: In the Middle, Not on Top: AI-Mediated Communication for Patient-Provider Care Relationships
Abstract:
Relationship-centered care relies on trust and meaningful connection. As AI enters clinical settings, we must ask not just what it can do, but how it should be positioned to support these values. We examine a "middle, not top" approach where AI mediates communication without usurping human judgment. Through studies of CLEAR, an asynchronous messaging system, we show how this configuration addresses real-world constraints like time pressure and uneven health literacy. We find that mediator affordances (e.g., availability, neutrality) redistribute interpretive work and reduce relational friction. Ultimately, we frame AI mediation as relational infrastructure, highlighting critical design tensions around framing power and privacy.

Authors:Qianru Lyu, Conrad Borchers, Meng Xia, Karen Xiao, Paulo F. Carvalho, Kenneth R. Koedinger, Vincent Aleven
Title: Evaluating a Data-Driven Redesign Process for Intelligent Tutoring Systems
Abstract:
Past research has defined a general process for the data-driven redesign of educational technologies and has shown that in carefully-selected instances, this process can help make systems more effective. In the current work, we test the generality of the approach by applying it to four units of a middle-school mathematics intelligent tutoring system that were selected not based on suitability for redesign, as in previous work, but on topic. We tested whether the redesigned system was more effective than the original in a classroom study with 123 students. Although the learning gains did not differ between the conditions, students who used the Redesigned Tutor had more productive time-on-task, a larger number of skills practiced, and greater total knowledge mastery. The findings highlight the promise of data-driven redesign even when applied to instructional units *not* selected as likely to yield improvement, as evidence of the generality and wide applicability of the method.

Authors:Aayushi Dangol, Robert Wolfe, Nisha Devasia, Mitsuka Kiyohara, Jason Yip, Julie A. Kientz
Title: Where Does AI Leave a Footprint? Children's Reasoning About AI's Environmental Costs
Abstract:
Two of the most socially consequential issues facing today's children are the rise of artificial intelligence (AI) and the rapid changes to the earth's climate. Both issues are complex and contested, and they are linked through the notable environmental costs of AI use. Using a systems thinking framework, we developed an interactive system called Ecoprompt to help children reason about the environmental impact of AI. EcoPrompt combines a prompt-level environmental footprint calculator with a simulation game that challenges players to reason about the impact of AI use on natural resources that the player manages. We evaluated the system through two participatory design sessions with 16 children ages 6-12. Our findings surfaced children's perspectives on societal and environmental tradeoffs of AI use, as well as their sense of agency and responsibility. Taken together, these findings suggest opportunities for broadening AI literacy to include systems-level reasoning about AI's environmental impact.

Authors:Jérémy Barghorn, Anna Sotnikova, Sacha Friedli, Antoine Bosselut
Title: AI Meets Mathematics Education: A Case Study on Supporting an Instructor in a Large Mathematics Class with Context-Aware AI
Abstract:
Large-enrollment university courses face persistent challenges in providing timely and scalable instructional support. While generative AI holds promise, its effective use depends on reliability and pedagogical alignment. We present a human-centered case study of AI-assisted support in a Calculus I course, implemented in close collaboration with the course instructor. We developed a system to answer students' questions on a discussion forum, fine-tuning a lightweight language model on 2,588 historical student-instructor interactions. The model achieved 75.3% accuracy on a benchmark of 150 representative questions annotated by five instructors, and in 36% of cases, its responses were rated equal to or better than instructor answers. Post-deployment student survey (N = 105) indicated that students valued the alignment of the responses with the course materials and their immediate availability, while still relying on the instructor verification for trust. We highlight the importance of hybrid human-AI workflows for safe and effective course support.

Authors:Venkatesh Sivaraman, Patrick Vossler, Adam Perer, Julian Hong, Jean Feng
Title: More Than "Means to an End": Supporting Reasoning with Transparently Designed AI Data Science Processes
Abstract:
Generative artificial intelligence (AI) tools can now help people perform complex data science tasks regardless of their expertise. While these tools have great potential to help more people work with data, their end-to-end approach does not support users in evaluating alternative approaches and reformulating problems, both critical to solving open-ended tasks in high-stakes domains. In this paper, we reflect on two AI data science systems designed for the medical setting and how they function as tools for thought. We find that success in these systems was driven by constructing AI workflows around intentionally-designed intermediate artifacts, such as readable query languages, concept definitions, or input-output examples. Despite opaqueness in other parts of the AI process, these intermediates helped users reason about important analytical choices, refine their initial questions, and contribute their unique knowledge. We invite the HCI community to consider when and how intermediate artifacts should be designed to promote effective data science thinking.

Authors:Shiwei Chen, Niruthikka Sritharan, Xiaolin Wen, Chenxi Zhang, Xingbo Wang, Yong Wang
Title: When the Chain Breaks: Interactive Diagnosis of LLM Chain-of-Thought Reasoning Errors
Abstract:
Current Large Language Models (LLMs), especially Large Reasoning Models, can generate Chain-of-Thought (CoT) reasoning traces to illustrate how they produce final outputs, thereby facilitating trust calibration for users. However, these CoT reasoning traces are usually lengthy and tedious, and can contain various issues, such as logical and factual errors, which make it difficult for users to interpret the reasoning traces efficiently and accurately. To address these challenges, we develop an error detection pipeline that combines external fact-checking with symbolic formal logical validation to identify errors at the step level. Building on this pipeline, we propose ReasonDiag, an interactive visualization system for diagnosing CoT reasoning traces. ReasonDiag provides 1) an integrated arc diagram to show reasoning-step distributions and error-propagation patterns, and 2) a hierarchical node-link diagram to visualize high-level reasoning flows and premise dependencies. We evaluate ReasonDiag through a technical evaluation for the error detection pipeline, two case studies, and user interviews with 16 participants. The results indicate that ReasonDiag helps users effectively understand CoT reasoning traces, identify erroneous steps, and determine their root causes.

Authors:Kaijie Xu, Yiwei Zhang, Brian Yang, Clark Verbrugge
Title: Deconstructing Open-World Game Mission Design Formula: A Thematic Analysis Using an Action-Block Framework
Abstract:
Open-world missions often rely on repeated formulas, yet designers lack systematic ways to examine pacing, variation, and experiential balance across large portfolios. We introduce the Mission Action Quality Vector (MAQV), a six-dimensional framework-covering combat, exploration, narrative, emotion, problem-solving, and uniqueness-paired with an action block grammar representing missions as gameplay sequences. Using about 2200 missions from 20 AAA titles, we apply LLM-assisted parsing to convert community walkthroughs into structured action sequences and score them with MAQV. An interactive dashboard enables designers to reveal underlying mission formulas. In a mixed-methods study with experienced players and designers, we validate the pipeline's fidelity and the tool's usability, and use thematic analysis to identify recurring design trade-offs, pacing grammars, and systematic differences by quest type and franchise evolution. Our work offers a reproducible analytical workflow, a data-driven visualization tool, and reflective insights to support more balanced, varied mission design at scale.

Authors:Tianhai Liang, Shiyi Guo, Baiye Cheng, Zhengrong Xue, Han Zhang, Huazhe Xu
Title: ArrayTac: A tactile display for simultaneous rendering of shape, stiffness and friction
Abstract:
Human-computer interaction in the visual and auditory domains has achieved considerable maturity, yet machine-to-human tactile feedback remains underdeveloped. Existing tactile displays struggle to simultaneously render multiple tactile dimensions, such as shape, stiffness, and friction, which limits the realism of haptic simulation. Here, we present ArrayTac, a piezoelectric-driven tactile display capable of simultaneously rendering shape, stiffness, and friction to reproduce realistic haptic signals. The system comprises a 4x4 array of 16 actuator units, each employing a three-stage micro-lever mechanism to amplify the micrometer-scale displacement of the piezoelectric element, with Hall sensor-based closed-loop control at the end effector to enhance response speed and precision. We further implement two end-to-end pipelines: 1) a vision-to-touch framework that converts visual inputs into tactile signals using multimodal foundation models, and 2) a real-time tele-palpation system operating over distances of several thousand kilometers. In user studies, first-time participants accurately identify object shapes and physical properties with high success rates. In a tele-palpation experiment over 1,000km, untrained volunteers correctly identified both the number and type of tumors in a breast phantom with 100% accuracy and precisely localized their positions. The system pioneers a new pathway for high-fidelity haptic feedback by introducing the unprecedented capability to simultaneously render an object's shape, stiffness, and friction, delivering a holistic tactile experience that was previously unattainable.

Authors:Haneen Fatima, Muhammad Ali Imran, Ahmad Taha, Lina Mohjazi
Title: Toward Real-Time Mirrors Intelligence: System-Level Latency and Computation Evaluation in Internet of Mirrors (IoM)
Abstract:
The Internet of Mirrors (IoM) is an emerging IoT ecosystem of interconnected smart mirrors designed to deliver personalised services across a three-tier node hierarchy spanning consumer, professional, and hub nodes. Determining where computation should reside within this hierarchy is a critical design challenge, as placement decisions directly affect end-to-end latency, resource utilisation, and user experience. This paper presents the first physical IoM testbed study, evaluating four computational placement strategies across the IoM tier hierarchy under real Wi-Fi and 5G network conditions. Results show that offloading classification to higher-tier nodes substantially reduces latency and consumer resource load, but introduces network overhead that scales with payload size and hop count. No single strategy is universally optimal: the best choice depends on available network, node proximity, and concurrent user load. These findings empirically characterise the computation-communication trade-off space of the IoM and motivate the need for intelligent, adaptive task placement responsive to application requirements and live ecosystem conditions.

Authors:Dominik P. Hofer, Haochen Song, Rania Islambouli, Laura Hawkins, Ananya Bhattacharjee, Meredith Franklin, Joseph Jay Williams, Jan D. Smeddinck
Title: Structured Exploration vs. Generative Flexibility: A Field Study Comparing Bandit and LLM Architectures for Personalised Health Behaviour Interventions
Abstract:
Behaviour Change Techniques (BCTs) are central to digital health interventions, yet selecting and delivering effective techniques remains challenging. Contextual bandits enable statistically grounded optimisation of BCT selection, while Large Language Models (LLMs) offer flexible, context-sensitive message generation. We conducted a 4-week study on physical activity motivation (N=54; 9 post-study interviews) that compared five daily messaging approaches: random templates, contextual bandit with templates, LLM generation, hybrid bandit+LLM, and LLM with interaction history. LLM-based approaches were rated substantially more helpful than templates, but no significant differences emerged among LLM conditions. Unexpectedly, bandit optimisation for BCTs selection yielded no additional perceived helpfulness compared with LLM-only approaches. Unconstrained LLMs focused heavily on a single BCT, whereas bandit systems enforced systematic exploration-exploitation across techniques. Quantitative and qualitative findings suggest contextual acknowledgement of user input drove perceived helpfulness. We contribute design suggestions for reflective AI health behaviour change systems that address a trade-off between structured exploration and generative autonomy.

Authors:Ruixuan Sun, Matthew Zent, Minzhu Zhao, Thanmayee Boyapati, Xinyi Li, Joseph A. Konstan
Title: Balancing Domestic and Global Perspectives: Evaluating Dual-Calibration and LLM-Generated Nudges for Diverse News Recommendation
Abstract:
In this study, we applied the ``personalized diversity nudge framework'' with the goal of expanding user reading coverage in terms of news locality (i.e., domestic and world news). We designed a novel topic-locality dual calibration algorithmic nudge and a large language model-based news personalization presentation nudge, then launched a 5-week real-user study with 120 U.S. news readers on the news recommendation experiment platform POPROX. With user interaction logs and survey responses, we found that algorithmic nudges can successfully increase exposure and consumption diversity, while the impact of LLM-based presentation nudges varied. User-level topic interest is a strong predictor of user clicks, while highlighting the relevance of news articles to prior read articles outperforms generic topic-based and no personalization. We also demonstrate that longitudinal exposure to calibrated news may shift readers' reading habits to value a balanced news digest from both domestic and world articles. Our results provide direction for future work on nudging for diverse consumption in news recommendation systems.

Authors:Haoze Guo, Ziqi Wei
Title: From OCR to Analysis: Tracking Correction Provenance in Digital Humanities Pipelines
Abstract:
Optical Character Recognition (OCR) is a critical but error-prone stage in digital humanities text pipelines. While OCR correction improves usability for downstream NLP tasks, common workflows often overwrite intermediate decisions, obscuring how textual transformations affect scholarly interpretation. We present a provenance-aware framework for OCR-corrected humanities corpora that records correction lineage at the span level, including edit type, correction source, confidence, and revision status. Using a pilot corpus of historical texts, we compare downstream named entity extraction across raw OCR, fully corrected text, and provenance-filtered corrections. Our results show that correction pathways can substantially alter extracted entities and document-level interpretations, while provenance signals help identify unstable outputs and prioritize human review. We argue that provenance should be treated as a first-class analytical layer in NLP for digital humanities, supporting reproducibility, source criticism, and uncertainty-aware interpretation.

Authors:Hayato Saiki, Chunggi Lee, Hikari Takahashi, Tica Lin, Hidetada Kishi, Kaori Tachibana, Yasuhiro Suzuki, Hanspeter Pfister, Kenji Suzuki
Title: BRIDGE: Borderless Reconfiguration for Inclusive and Diverse Gameplay Experience via Embodiment Transformation
Abstract:
Training resources for parasports are limited, reducing opportunities for athletes and coaches to engage with sport-specific movements and tactical coordination. To address this gap, we developed BRIDGE, a system that integrates a reconstruction pipeline, which detects and tracks players from broadcast video to generate 3D play sequences, with an embodiment-aware visualization framework that decomposes head, trunk, and wheelchair base orientations to represent attention, intent, and mobility. We evaluated BRIDGE in two controlled studies with 20 participants (10 national wheelchair basketball team players and 10 amateur players). The results showed that BRIDGE significantly enhanced the perceived naturalness of player postures and made tactical intentions easier to understand. In addition, it supported functional classification by realistically conveying players' capabilities, which in turn improved participants' sense of self-efficacy. This work advances inclusive sports learning and accessible coaching practices, contributing to more equitable access to tactical resources in parasports.

Authors:Chunggi Lee, Hayato Saiki, Tica Lin, Eiji Ikeda, Kenji Suzuki, Chen Zhu-Tian, Hanspeter Pfister
Title: ViSTAR: Virtual Skill Training with Augmented Reality with 3D Avatars and LLM coaching agent
Abstract:
We present ViSTAR, a Virtual Skill Training system in AR that supports self-guided basketball skill practice, with feedback on balance, posture, and timing. From a formative study with basketball players and coaches, the system addresses three challenges: understanding skills, identifying errors, and correcting mistakes. ViSTAR follows the Behavioral Skills Training (BST) framework-instruction, modeling, rehearsal, and feedback. It provides feedback through visual overlays, rhythm and timing cues, and an AI-powered coaching agent using 3D motion reconstruction. We generate verbal feedback by analyzing spatio-temporal joint data and mapping features to natural-language coaching cues via a Large Language Model (LLM). A key novelty is this feedback generation: motion features become concise coaching insights. In two studies (N=16), participants generally preferred our AI-generated feedback to coach feedback and reported that ViSTAR helped them notice posture and balance issues and refine movements beyond self-observation.

Authors:Mohamed Amine Kerkouri, Marouane Tliba, Aladine Chetouani, Alessandro Bruno
Title: SPGen: Stochastic scanpath generation for paintings using unsupervised domain adaptation
Abstract:
Understanding human visual attention is key to preserving cultural heritage We introduce SPGen a novel deep learning model to predict scanpaths the sequence of eye movementswhen viewers observe paintings. Our architecture uses a Fully Convolutional Neural Network FCNN with differentiable fixation selection and learnable Gaussian priors to simulate natural viewing biases To address the domain gap between photographs and artworks we employ unsupervised domain adaptation via a gradient reversal layer allowing the model to transfer knowledge from natural scenes to paintings Furthermore a random noise sampler models the inherent stochasticity of eyetracking data. Extensive testing shows SPGen outperforms existing methods offering a powerful tool to analyze gaze behavior and advance the preservation and appreciation of artistic treasures.

Authors:Yibo Meng, Bingyi Liu, Ruiqi Chen, Yan Guan
Title: Misty Forest VR: Turning Real ADHD Attention Patterns into Shared Momentum for Youth Collaboration
Abstract:
Attention Deficit Hyperactivity Disorder (ADHD) remains highly stigmatized in many cultural contexts, particularly in China, where ADHD-related behaviors are often moralized rather than understood as neurodevelopmental differences. As a result, challenges of self-perception, social misunderstanding, and collaboration between ADHD and non-ADHD individuals remain largely unaddressed. We present Misty Forest, a VR-based collaborative game that explores ADHD through asymmetric co-play. The system translates empirically grounded ADHD behavioral patterns -- such as fluctuating attention and time blindness -- into complementary roles that require mutual coordination between players. Rather than compensating for deficits, the design treats cognitive differences as a source of interdependence. In a controlled study with mixed ADHD--non-ADHD dyads, Misty Forest led to higher task completion, increased self-acceptance among ADHD participants, improved ADHD knowledge, and greater empathy among non-ADHD players. These findings suggest that neurodiversity-centered interactive design can foster understanding, reciprocity, and inclusive collaboration.

Authors:Yibo Meng, Bingyi Liu, Ruiqi Chen, Xin Chen, Yan Guan
Title: 52-Hz Whale Song: An Embodied VR Experience for Exploring Misunderstanding and Empathy
Abstract:
Experiences of being misunderstood often stem not from a lack of voice, but from mismatches between how individuals express themselves and how others listen. Such communicative mismatches arise across many social settings, including situations involving linguistic and cultural displacement. While prior HCI research has explored empathy through virtual reality, many approaches rely on narrative explanation, positioning users as observers rather than embodied participants. We present 52-Hz Whale Song, an embodied VR experience that explores miscommunication through metaphor and perspective-shifting. Inspired by the real-world "52-Hz whale," whose calls are not responded to by others, the experience uses this phenomenon as an experiential lens on communicative mismatch rather than representing any specific social group. Players progress through a three-act arc that moves from failed communication to agency and ultimately to mediation. A preliminary mixed-methods study (N = 30) suggests increased perspective-taking and reduced self-reported social distance in immigrant-related situations. This work highlights how embodied metaphor and role-shifting can support empathic engagement and offers transferable design insights for empathy-oriented interactive systems.

Authors:Injun Baek, Yearim Kim, Nojun Kwak
Title: PedaCo-Gen: Scaffolding Pedagogical Agency in Human-AI Collaborative Video Authoring
Abstract:
While advancements in Text-to-Video (T2V) generative AI offer a promising path toward democratizing content creation, current models are often optimized for visual fidelity rather than instructional efficacy. This study introduces PedaCo-Gen, a pedagogically-informed human-AI collaborative video generating system for authoring instructional videos based on Mayer's Cognitive Theory of Multimedia Learning (CTML). Moving away from traditional "one-shot" generation, PedaCo-Gen introduces an Intermediate Representation (IR) phase, enabling educators to interactively review and refine video blueprints-comprising scripts and visual descriptions-with an AI reviewer. Our study with 23 education experts demonstrates that PedaCo-Gen significantly enhances video quality across various topics and CTML principles compared to baselines. Participants perceived the AI-driven guidance not merely as a set of instructions but as a metacognitive scaffold that augmented their instructional design expertise, reporting high production efficiency (M=4.26) and guide validity (M=4.04). These findings highlight the importance of reclaiming pedagogical agency through principled co-creation, providing a foundation for future AI authoring tools that harmonize generative power with human professional expertise.

Authors:Yue Deng, Changyang He
Title: A User-driven Design Framework for Robotaxi
Abstract:
Robotaxis are emerging as a promising form of urban mobility, yet research has largely emphasized technical driving performance while leaving open how passengers experience and evaluate rides without a human driver. To address the limitations of prior work that often relies on simulated or hypothetical settings, we investigate real-world robotaxi use through 18 semi-structured interviews and autoethnographic ride experiences. We found that users were drawn to robotaxis by low cost, social recommendation, and curiosity. They valued a distinctive set of benefits, such as an increased sense of agency, and consistent driving behavioral consistency and standardized ride experiences. However, they encountered persistent challenges around limited flexibility, insufficient transparency, management difficulty, robustness concerns in edge cases, and emergency handling concerns. Robotaxi experiences were shaped by privacy, safety, ethics, and trust. Users were often privacy-indifferent yet sensitive to opaque access and leakage risks; safety perceptions were polarized; and ethical considerations surfaced round issues such as accountability, feedback responsibility and absence of human-like social norms. Based on these findings, we propose a user-driven design framework spanning the end-to-end journey, such as pre-ride configuration (hailing), context-aware pickup facilitation (pick-up) in-ride explainability (traveling), and accountable post-ride feedback (drop-off) to guide robotaxi interaction and service design.

Authors:Lan Luo, Dongyijie Primo Pan, Junhua Zhu, Muzhi Zhou, Pan Hui
Title: Meflex: A Multi-agent Scaffolding System for Entrepreneurial Ideation Iteration via Nonlinear Business Plan Writing
Abstract:
Business plan (BP) writing plays a key role in entrepreneurship education by helping learners construct, evaluate, and iteratively refine their ideas. However, conventional BP writing remains a rigid, linear process that often fails to reflect the dynamic and recursive nature of entrepreneurial ideation. This mismatch is particularly challenging for novice entrepreneurial students, who struggle with the substantial cognitive demands of developing and refining ideas. While reflection and meta-reflection are critical strategies for fostering divergent and convergent thinking, existing writing tools rarely scaffold these higher-order processes. To address this gap, we present the Meflex System, a large language model (LLM)-based writing tool that integrates BP writing scaffolding with a nonlinear idea canvas to support iterative ideation through reflection and meta-reflection. We report findings from an exploratory user study with 30 participants that examined the system's usability and cognitive impact. Results show that Meflex effectively scaffolds BP writing, promotes divergent thinking through LLM-supported reflection, and enhances meta-reflective awareness while reducing cognitive load during complex idea development. These findings highlight the potential of non-linear LLM-based writing tools to foster deeper and coherent entrepreneurial thinking.

Authors:Zhipeng Li, Yi-Chi Liao, Christian Holz
Title: Preference-Guided Prompt Optimization for Text-to-Image Generation
Abstract:
Generative models are increasingly powerful, yet users struggle to guide them through prompts. The generative process is difficult to control and unpredictable, and user instructions may be ambiguous or under-specified. Prior prompt refinement tools heavily rely on human effort, while prompt optimization methods focus on numerical functions and are not designed for human-centered generative tasks, where feedback is better expressed as binary preferences and demands convergence within few iterations. We present APPO, a preference-guided prompt optimization algorithm. Instead of iterating prompts, users only provide binary preferential feedback. APPO adaptively balances its strategies between exploiting user feedback and exploring new directions, yielding effective and efficient optimization. We evaluate APPO on image generation, and the results show APPO enables achieving satisfactory outcomes in fewer iterations with lower cognitive load than manual prompt editing. We anticipate APPO will advance human-AI collaboration in generative tasks by leveraging user preferences to guide complex content creation.

Authors:Zhipeng Li, Christoph Gebhardt, Yi-Chi Liao, Christian Holz
Title: Automating UI Optimization through Multi-Agentic Reasoning
Abstract:
We present AutoOptimization, a novel multi-objective optimization framework for adapting user interfaces. From a user's verbal preferences for changing a UI, our framework guides a prioritization-based Pareto frontier search over candidate layouts. It selects suitable objective functions for UI placement while simultaneously parameterizing them according to the user's instructions to define the optimization problem. A solver then generates a series of optimal UI layouts, which our framework validates against the user's instructions to adapt the UI with the final solution. Our approach thus overcomes the previous need for manual inspection of layouts and the use of population averages for objective parameters. We integrate multiple agents sequentially within our framework, enabling the system to leverage their reasoning capabilities to interpret user preferences, configure the optimization problem, and validate optimization outcomes.

Authors:Yihuan Chen, Kexue Fu, Qianyi Chen, Zhicong Lu, Ray LC
Title: The Configuration of Space: Probing the Way Social Interaction and Perception are Affected by Task-Specific Spatial Representations in Online Video Communication
Abstract:
Humans live and act in 3D space, but often work and communicate on 2D surfaces. The prevalence of online communication on 2D screens raises the issue of whether human spatial configuration affects our capabilities, social perception, and behaviors when interacting with others in 2D video chat. How do factors like location, setting, and context subtly shape our online communication, particularly in scenarios such as social support and topic-based discussions? Using Ohyay.co as a platform, we compared a normal gallery interface with a scene-based Room-type interface where participants are located in circular arrangement on screen in a social support task, and found that participants allocated attention to the group as a whole, and had pronounced self-awareness in the Room format. We then chose a two-sided topic for discussion in the Gallery interface and the Room interface where participants on each team face-off against each other, and found that they utilized spatial references to orient their allegiances, expressing greater engagement with those farther away in digital space and greater empathy with those closer, in the Room over the Gallery format. We found spatial effects in the way participants hide from the spotlight, in perspective-taking, and in their use of expressive gestures in time on the screen. This work highlights the need for considering spatial configuration in 2D in the design of collaborative communication systems to optimize for psychological needs for particular tasks.

Authors:You Zhou, Bingyuan Wang, Hongcheng Guo, Rui Cao, Zeyu Wang
Title: GatheringSense: AI-Generated Imagery and Embodied Experiences for Understanding Literati Gatherings
Abstract:
Chinese literati gatherings (Wenren Yaji), as a situated form of Chinese traditional culture, remain underexplored in depth. Although generative AI supports powerful multimodal generation, current cultural applications largely emphasize aesthetic reproduction and struggle to convey the deeper meanings of cultural rituals and social frameworks. Based on embodied cognition, we propose an AI-driven dual-path framework for cultural understanding, which we instantiate through GatheringSense, a literati-gathering experience. We conduct a mixed-methods study (N=48) to compare how AI-generated multimodal content and embodied participation complement each other in supporting the understanding of literati gatherings and fostering cultural resonance. Our results show that AI-generated content effectively improves the readability of cultural symbols and initial emotional attraction, yet limitations in physical coherence and micro-level credibility may affect users' satisfaction. In contrast, embodied experience significantly deepens participants' understanding of ritual rules and social roles, and increases their psychological closeness and presence. Based on these findings, we offer empirical evidence and five transferable design implications for generative experience in cultural heritage.

Authors:Dongyijie Primo Pan, Shuyue Li, Yawei Zhao, Junkun Long, Hao Li, Pan Hui
Title: Whispers of the Butterfly: A Research-through-Design Exploration of In-Situ Conversational AI Guidance in Large-Scale Outdoor MR Exhibitions
Abstract:
Large-scale outdoor mixed reality (MR) art exhibitions distribute curated virtual works across open public spaces, but interpretation rarely scales without turning exploration into a scripted tour. Through Research-through-Design, we created Dream-Butterfly, an in-situ conversational AI docent embodied as a small non-human companion that visitors summon for multilingual, exhibition-grounded explanations. We deployed Dream-Butterfly in a large-scale outdoor MR exhibition at a public university campus in southern China, and conducted an in-the-wild between-subject study (N=24) comparing a primarily human-led tour with an AI-led tour while keeping staff for safety in both conditions. Combining questionnaires and semi-structured interviews, we characterize how shifting the primary explanation channel reshapes explanation access, perceived responsiveness, immersion, and workload, and how visitors negotiate responsibility handoffs among staff, the AI guide, and themselves. We distill transferable design implications for configuring mixed human-AI guiding roles and embodying conversational agents in mobile, safety-constrained outdoor MR exhibitions.

Authors:Joel Wester, Samuel Rhys Cox, Henning Pohl, Niels van Berkel
Title: Chaplains' Reflections on the Design and Usage of AI for Conversational Care
Abstract:
Despite growing recognition that responsible AI requires domain knowledge, current work on conversational AI primarily draws on clinical expertise that prioritises diagnosis and intervention. However, much of everyday emotional support needs occur in non-clinical contexts, and therefore requires different conversational approaches. We examine how chaplains, who guide individuals through personal crises, grief, and reflection, perceive and engage with conversational AI. We recruited eighteen chaplains to build AI chatbots. While some chaplains viewed chatbots with cautious optimism, the majority expressed limitations of chatbots' ability to support everyday well-being. Our analysis reveals how chaplains perceive their pastoral care duties and areas where AI chatbots fall short, along the themes of Listening, Connecting, Carrying, and Wanting. These themes resonate with the idea of attunement, recently highlighted as a relational lens for understanding the delicate experiences care technologies provide. This perspective informs chatbot design aimed at supporting well-being in non-clinical contexts.

Authors:Ziyi Xuan, Yiwen Wu, Zhaoyang Yan, Vinod Namboodiri, Yu Yang
Title: After Talking with 1,000 Personas: Learning Preference-Aligned Proactive Assistants From Large-Scale Persona Interactions
Abstract:
Smart assistants increasingly act proactively, yet mistimed or intrusive behavior often causes users to lose trust and disable these features. Learning user preferences for proactive assistance is difficult because real-world studies are costly, limited in scale, and rarely capture how preferences change across multiple interaction sessions. Large language model based generative agents offer a way to simulate realistic interactions, but existing synthetic datasets remain limited in temporal depth, diverse personas, and multi-dimensional preferences. They also provide little support for transferring population-level insights to individual users under on-device constraints. We present a population-to-individual learning framework for preference-aligned proactive assistants that operates under on-device and privacy constraints. Our approach uses large-scale interaction simulation with 1,000 diverse personas to learn shared structure in how users express preferences across recurring dimensions such as timing, autonomy, and communication style, providing a strong cold start without relying on real user logs. The assistant then adapts to individual users on device through lightweight activation-based steering driven by simple interaction feedback, without model retraining or cloud-side updates. We evaluate the framework using controlled simulations with 1,000 simulated personas and a human-subject study with 30 participants. Results show improved timing decisions and perceived interaction quality over untuned and direct-response baselines, while on-device activation steering achieves performance comparable to reinforcement learning from human feedback. Participants also report higher satisfaction, trust, and comfort as the assistant adapts over multiple sessions of interactions.

Authors:Nicolás E. Díaz Ferreyra, Moritz Mock, Max Kretschmann, Barbara Russo, Mojtaba Shahin, Mansooreh Zahedi, Riccardo Scandariato
Title: Reading Between the Code Lines: On the Use of Self-Admitted Technical Debt for Security Analysis
Abstract:
Static Analysis Tools (SATs) are central to security engineering activities, as they enable early identification of code weaknesses without requiring execution. However, their effectiveness is often limited by high false-positive rates and incomplete coverage of vulnerability classes. At the same time, developers frequently document security-related shortcuts and compromises as Self-Admitted Technical Debt (SATD) in software artifacts, such as code comments. While prior work has recognized SATD as a rich source of security information, it remains unclear whether -and in what ways- it is utilized during SAT-aided security analysis. OBJECTIVE: This work investigates the extent to which security-related SATD complements the output produced by SATs and helps bridge some of their well-known limitations. METHOD: We followed a mixed-methods approach consisting of (i) the analysis of a SATD-annotated vulnerability dataset using three state-of-the-art SATs and (ii) an online survey with 72 security practitioners. RESULTS: The combined use of all SATs flagged 114 of the 135 security-related SATD instances, spanning 24 distinct Common Weakness Enumeration (CWE) identifiers. A manual mapping of the SATD comments revealed 33 unique CWE types, 6 of which correspond to categories that SATs commonly overlook or struggle to detect (e.g., race conditions). Survey responses further suggest that developers frequently pair SAT outputs with SATD insights to better understand the impact and root causes of security weaknesses and to identify suitable fixes. IMPLICATIONS: Our findings show that such SATD-encoded information can be a meaningful complement to SAT-driven security analysis, while helping to overcome some of SATs' practical shortcomings.

Authors:Haoze Guo, Ziqi Wei
Title: Behind the Feed: A Taxonomy of User-Facing Cues for Algorithmic Transparency in Social Media
Abstract:
People who use social media are learning about how the companies that run these platforms make their decisions on who gets to see what through visual indicators in the interface (UI) of each social media site. These indicators are different for each platform and are not always located in an easy-to-find location on the site. Therefore, it is hard for someone to compare different social media platforms or determine whether transparency leads to greater accountability or only leads to increased understanding. A new classification system has been developed to help provide a standard way of categorizing the way, that an algorithm is presented through UI elements and whether the company has provided any type of explanation as to why they are featured. This new classification system includes the following three areas of development: design form, information content, and user agency. This new classification system can be applied to the six social media platforms currently available and serves as a reference database for identifying common archetypes of features in the each social media platform's UI. The new classification system will assist in determining whether or not the transparency of an algorithm functions the way that it was intended when it was developed and provide future design ideas that can help improve the inspectibility, actionability, and contestability of algorithms.

Authors:Lei Han, Yi Gao, Xuanchen Lu, Bingyuan Wang, Lujin Zhang, Zeyu Wang, David Yip
Title: Gen-Diaolou: An Integrated AI-Assisted Interactive System for Diachronic Understanding and Preservation of the Kaiping Diaolou
Abstract:
The Kaiping Diaolou and Villages, a UNESCO World Heritage Site, exemplify hybrid Chinese and Western architecture shaped by migration culture. However, architectural heritage engagement often faces authenticity debates, resource constraints, and limited participatory approaches. This research explores current challenges of leveraging Artificial Intelligence (AI) for architectural heritage, and how AI-assisted interactive systems can foster cultural heritage understanding and preservation awareness. We conducted a formative study (N=14) to uncover empirical insights from heritage stakeholders that inform design. These insights informed the design of Gen-Diaolou, an integrated AI-assisted interactive system that supports heritage understanding and preservation. A pilot study (N=18) and a museum field study (N=26) provided converging evidence suggesting that Gen-Diaolou may support visitors' diachronic understanding and preservation awareness, and together informed design implications for future human-AI collaborative systems for digital cultural heritage engagement. More broadly, this work bridges the research gap between passive heritage systems and unconstrained creative tools in the HCI domain.

Authors:Duan Li, Jun Yuan, Xinyuan Guo, Xiting Wang, Yang Liu, Weikai Yang, Shixia Liu
Title: NCP: Neighborhood-Preserving Non-Uniform Circle Packing for Visualization
Abstract:
Circle packing is widely used in visualization due to its aesthetic appeal and simplicity, particularly in tasks where the spatial arrangement and relationships between data are of interest, such as understanding proximity relationships (e.g., images with categories) or analyzing quantitative data (e.g., housing prices). Many applications require preserving neighborhood relationships while encoding a quantitative attribute using radii for data analysis. To meet these two requirements simultaneously, we present a neighborhood-preserving non-uniform circle packing method, NCP. This method preserves neighborhood relationships between the data represented by non-uniform circles to comprehensively analyze similar data and an attribute of interest. We formulate neighborhood-preserving non-uniform circle packing as a planar graph embedding problem based on the circle packing theorem. This formulation leads to a non-convex optimization problem, which can be solved by the continuation method. We conduct a quantitative evaluation and present two use cases to demonstrate that our NCP method can effectively generate non-uniform circle packing results.

Authors:Danlin Zheng, Xiaoying Wei, Chao Liu, Quanyu Zhang, Jingling Zhang, Shihui Duo, Mingming Fan
Title: From Performers to Creators: Understanding Retired Women's Perceptions of Technology-Enhanced Dance Performance
Abstract:
Over 100 million retired women in China engage in dance, but their performances are constrained by limited resources and age-related decline. While interactive dance technologies can enhance artistic expression, existing systems are largely inaccessible to non-professional older dancers. This paper explores how interactive dance technologies can be designed with an age-sensitive approach to support retired women in enhancing their stage performance. We conducted two workshops with community-based retired women dancers, employing interactive dance and LLM-powered video generation probes in co-design activities. Findings indicate that age-sensitive adaptations, such as low-barrier keyword input, motion-aligned visual effects, and participatory scaffolds, lowered technical barriers and fostered a sense of authorship. These features enabled retired women to empower their stage, transitioning from passive recipients of stage design to empowered co-creators of performance. We outline design implications for incorporating interactive dance and artificial intelligence-generated content (AIGC) into the cultural practices of retired women, offering broader strategies for age-sensitive creative technologies.

Authors:Venkatesh Sivaraman, Eric P. Mason, Mengfan Ellen Li, Jessica Tong, Andrew J. King, Jeremy M. Kahn, Adam Perer
Title: Intelligent Reasoning Cues: A Framework and Case Study of the Roles of AI Information in Complex Decisions
Abstract:
Artificial intelligence (AI)-based decision support systems can be highly accurate yet still fail to support users or improve decisions. Existing theories of AI-assisted decision-making focus on calibrating reliance on AI advice, leaving it unclear how different system designs might influence the reasoning processes underneath. We address this gap by reconsidering AI interfaces as collections of intelligent reasoning cues: discrete pieces of AI information that can individually influence decision-making. We then explore the roles of eight types of reasoning cues in a high-stakes clinical decision (treating patients with sepsis in intensive care). Through contextual inquiries with six teams and a think-aloud study with 25 physicians, we find that reasoning cues have distinct patterns of influence that can directly inform design. Our results also suggest that reasoning cues should prioritize tasks with high variability and discretion, adapt to ensure compatibility with evolving decision needs, and provide complementary, rigorous insights on complex cases.

Authors:Varun Srivastava, Fan Lei, Alan M. MacEachren, Ross Maciejewski
Title: The Impact of Uncertainty Visualization on Trust in Thematic Maps
Abstract:
Thematic maps are widely used to communicate spatial patterns to non-expert audiences. Although uncertainty is inherent in thematic map data, it is rarely visualized, raising questions about how its inclusion affects trust. Prior work offers mixed perspectives: some argue that uncertainty fosters trust through transparency, while others suggest it may reduce trust by introducing confusion. Yet few empirical studies explicitly measure trust in thematic maps. We conducted a between-subjects experiment (N=161) to evaluate how visualizing uncertainty at varying levels (low, medium, high) influences trust. We find that uncertainty visualization generally reduces trust, with greater reductions observed as uncertainty levels increase. However, maps dominated by low uncertainty do not significantly differ in trust from those with no uncertainty. Moreover, while uncertainty visualization tends to make readers question the accuracy of the data, it appears to have a weaker influence on perceptions of the mapmaker's integrity.

Authors:Dániel Szabó, Chi-Lan Yang, Aku Visuri, Jonas Oppenlaender, Bharathi Sekar, Koji Yatani, Simo Hosio
Title: Conversational Inoculation to Enhance Resistance to Misinformation
Abstract:
Proliferation of misinformation is a globally acknowledged problem. Cognitive Inoculation helps build resistance to different forms of persuasion, such as misinformation. We investigate Conversational Inoculation, a method to help people build resistance to misinformation through dynamic conversations with a chatbot. We built a Web-based system to implement the method, and conducted a within-subject user experiment to compare it with two traditional inoculation methods. Our results validate Conversational Inoculation as a viable novel method, and show how it was able to enhance participants' resistance to misinformation. A qualitative analysis of the conversations between participants and the chatbot reveal independence and trust as factors that boosted the efficiency of Conversational Inoculation, and friction of interaction as a factor hindering it. We discuss the opportunities and challenges of using Conversational Inoculation to combat misinformation. Our work contributes a timely investigation and a promising research direction in scalable ways to combat misinformation.

Authors:Si Chen, Jingyi Xie, Yao Li, Ya-Fang Lin, He Zhang, Ge Wang, Gaojian Huang, Rui Yu, Ronald Anthony Metoyer, Ting Hua, Nitesh Chawla
Title: A Human-Centred AI System for Multi-Actor Planning and Collaboration in Family Learning
Abstract:
Family learning takes place in everyday routines where children and caregivers read, practice, and develop new skills together. Despite growing interest in AI tutors, most existing systems are designed for single learners or classroom settings and do not address the distributed planning, coordination, and execution demands of learning at home. This paper introduces ParPal, a human-centred, LLM-powered system that supports multi-actor family learning by decomposing learning goals into actionable subtasks, allocating them across caregivers under realistic availability and expertise constraints, and providing caregiver-in-the-loop tutoring support with visibility into individual and collective contributions. Through expert evaluation of generated weekly learning plans and a one-week field deployment with 11 families, we identify systematic failure modes in current LLM-based planning, including misalignment with role expertise, unnecessary or costly collaboration, missing pedagogical learning trajectories, and physically or temporally infeasible tasks. While ParPal improves coordination clarity and recognition of caregiving effort, these findings expose fundamental limitations in how current LLMs operationalize pedagogical knowledge, reason about collaboration, and account for real-world, embodied constraints. We discuss implications for human-centred AI design and AI methodology, positioning multi-actor family learning as a critical testbed for advancing planning, adaptation, and pedagogical structure in next-generation AI systems.

Authors:Samuel Rhys Cox, Joel Wester, Niels van Berkel
Title: Polite But Boring? Trade-offs Between Engagement and Psychological Reactance to Chatbot Feedback Styles
Abstract:
As conversational agents become increasingly common in behaviour change interventions, understanding optimal feedback delivery mechanisms becomes increasingly important. However, choosing a style that both lessens psychological reactance (perceived threats to freedom) while simultaneously eliciting feelings of surprise and engagement represents a complex design problem. We explored how three different feedback styles: 'Direct', 'Politeness', and 'Verbal Leakage' (slips or disfluencies to reveal a desired behaviour) affect user perceptions and behavioural intentions. Matching expectations from literature, the 'Direct' chatbot led to lower behavioural intentions and higher reactance, while the 'Politeness' chatbot evoked higher behavioural intentions and lower reactance. However, 'Politeness' was also seen as unsurprising and unengaging by participants. In contrast, 'Verbal Leakage' evoked reactance, yet also elicited higher feelings of surprise, engagement, and humour. These findings highlight that effective feedback requires navigating trade-offs between user reactance and engagement, with novel approaches such as 'Verbal Leakage' offering promising alternative design opportunities.

Authors:Can Liu, Jaeuk Lee, Tianhe Chen, Zhibang Jiang, Xiaolin Wen, Yong Wang
Title: Athanor: Authoring Action Modification-based Interactions on Static Visualizations via Natural Language
Abstract:
Interactivity is crucial for effective data visualizations. However, it is often challenging to implement interactions for existing static visualizations, since the underlying code and data for existing static visualizations are often not available, and it also takes significant time and effort to enable interactions for them even if the original code and data are available. To fill this gap, we propose Athanor, a novel approach to transform existing static visualizations into interactive ones using multimodal large language models (MLLMs) and natural language instructions. Our approach introduces three key innovations: (1) an action-modification interaction design space that maps visualization interactions into user actions and corresponding adjustments, (2) a multi-agent requirement analyzer that translates natural language instructions into an actionable operational space, and (3) a visualization abstraction transformer that converts static visualizations into flexible and interactive representations regardless of their underlying implementation. Athanor allows users to effortlessly author interactions through natural language instructions, eliminating the need for programming. We conducted two case studies and in-depth interviews with target users to evaluate our approach. The results demonstrate the effectiveness and usability of our approach in allowing users to conveniently enable flexible interactions for static visualizations.

Authors:Olivia Pal, Veda Duddu, Agam Goyal, Drishti Goel, Koustuv Saha
Title: Do We Know What They Know We Know? Calibrating Student Trust in AI and Human Responses Through Mutual Theory of Mind
Abstract:
Trust and reliance are often treated as coupled constructs in human-AI interaction research, with the assumption that calibrating trust will lead to appropriate reliance. We challenge this assumption in educational contexts, where students increasingly turn to AI for learning support. Through semi-structured interviews with graduate students (N=8) comparing AI-generated and human-generated responses, we find a systematic dissociation: students exhibit high trust but low reliance on human experts due to social barriers (fear of judgment, help-seeking anxiety), while showing low trust but high reliance on AI systems due to social affordances (accessibility, anonymity, judgment-free interaction). Using Mutual Theory of Mind as an analytical lens, we demonstrate that trust is shaped by epistemic evaluations while reliance is driven by social factors -- and these may operate independently.

Authors:Samuel Rhys Cox, Jade Martin-Lise, Simo Hosio, Niels van Berkel
Title: Watching AI Think: User Perceptions of Visible Thinking in Chatbots
Abstract:
People increasingly turn to conversational agents such as ChatGPT to seek guidance for their personal problems. As these systems grow in capability, many now display elements of "thinking": short reflective statements that reveal a model's intentions or values before responding. While initially introduced to promote transparency, such visible thinking can also anthropomorphise the agent and shape user expectations. Yet little is known about how these displays affect user perceptions in help-seeking contexts. We conducted a 3 x 2 mixed design experiment examining the impact of 'Thinking Content' (None, Emotionally-Supportive, Expertise-Supportive) and 'Conversation Context' (Habit-related vs. Feelings-related problems) on users' perceptions of empathy, warmth, competence, and engagement. Participants interacted with a chatbot that either showed no visible thinking or presented value-oriented reflections prior to its response. Our findings contribute to understanding how thinking transparency influences user experience in supportive dialogues, and offer implications for designing conversational agents that communicate intentions in sensitive, help-seeking scenarios.

Authors:Ziyi Wang, Yilong Dai, Duanya Lyu, Mateo Nader, Sihan Chen, Wanghao Ye, Zjian Ding, Xiang Yan
Title: StreetDesignAI: A Multi-Persona Evaluation System for Inclusive Infrastructure Design
Abstract:
Designing inclusive cycling infrastructure requires balancing competing needs of diverse user groups, yet designers often struggle to anticipate how different cyclists experience the same street. We investigate how persona-based multi-agent evaluation can support inclusive design by making experiential conflicts explicit. We present StreetDesignAI, an interactive system that enables designers to (1) ground evaluation in street context through imagery and map data, (2) receive parallel feedback from cyclist personas spanning confident to cautious users, and (3) iteratively modify designs while surfacing conflicts across perspectives. A within-subjects study with 26 transportation professionals demonstrates that structured multi-perspective feedback significantly improves designers' understanding of diverse user perspectives, ability to identify persona needs, and confidence in translating them into design decisions, with higher satisfaction and stronger intention for professional adoption. Qualitative findings reveal how conflict surfacing transforms design exploration from single-perspective optimization toward deliberate trade-off reasoning. We discuss implications for AI tools that scaffold inclusive design through disagreement as an interaction primitive.

Authors:Alva Markelius, Fethiye Irmak Doğan, Julie Bailey, Guy Laban, Jenny L. Gibson, Hatice Gunes
Title: Social Robotics for Disabled Students: An Empirical Investigation of Embodiment, Roles and Interaction
Abstract:
Institutional and social barriers in higher education often prevent students with disabilities from effectively accessing support, including lengthy procedures, insufficient information, and high social-emotional demands. This study empirically explores how disabled students perceive robot-based support, comparing two interaction roles, one information based (signposting) and one disclosure based (sounding board), and two embodiment types (physical robot/disembodied voice agent). Participants assessed these systems across five dimensions: perceived understanding, social energy demands, information access/clarity, task difficulty, and data privacy concerns. The main findings of the study reveal that the physical robot was perceived as more understanding than the voice-only agent, with embodiment significantly shaping perceptions of sociability, animacy, and privacy. We also analyse differences between disability types. These results provide critical insights into the potential of social robots to mitigate accessibility barriers in higher education, while highlighting ethical, social and technical challenges.

Authors:Xinyu Li, Kaixun Yang, Jiameng Wei, Yixin Cheng, Dragan Gašević, Guanliang Chen
Title: Dataset of GenAI-Assisted Information Problem Solving in Education
Abstract:
Information Problem Solving (IPS) is a critical competency for academic and professional success in education, work, and life. The advent of Generative Artificial Intelligence (GenAI), particularly tools like ChatGPT, has introduced new possibilities for supporting students in complex IPS tasks. However, empirical insights into how students engage with GenAI during IPS and how these tools can be effectively leveraged for learning remain limited. Moreover, differences in background, shaped by cultural and socioeconomic factors, pose additional challenges to the equitable integration of GenAI in educational contexts. To address this gap, we present an open-source dataset collected from 279 students at a public Australian university. The dataset was generated through students' use of FLoRA, a GenAI-powered educational platform that widely adopted in the field of learning analytics. Within FLoRA, students interacted with an embedded GenAI chatbot to gather information and synthesize it into data science project proposals. The dataset captures fine-grained, multi-dimensional records of GenAI-assisted IPS processes, including: (i) student-GenAI dialogue transcripts; (ii) writing process log traces; (iii) final project proposals with human-assigned assessment scores; (iv) surveys of biographic and prior knowledge in data science and AI; and (v) surveys capturing students' GenAI experience and perceptions of GenAI's effectiveness in supporting IPS. This dataset provides a valuable resource for advancing our understanding of GenAI's role in educational IPS and informing the design of adaptive, inclusive AI-powered learning tools.

Authors:Haoze Guo, Ziqi Wei
Title: Hidden-in-Plain-Text: A Benchmark for Social-Web Indirect Prompt Injection in RAG
Abstract:
Retrieval-augmented generation (RAG) systems put more and more emphasis on grounding their responses in user-generated content found on the Web, amplifying both their usefulness and their attack surface. Most notably, indirect prompt injection and retrieval poisoning attack the web-native carriers that survive ingestion pipelines and are very concerning. We provide OpenRAG-Soc, a compact, reproducible benchmark-and-harness for web-facing RAG evaluation under these threats, in a discrete data package. The suite combines a social corpus with interchangeable sparse and dense retrievers and deployable mitigations - HTML/Markdown sanitization, Unicode normalization, and attribution-gated answered. It standardizes end-to-end evaluation from ingestion to generation and reports attacks time of one of the responses at answer time, rank shifts in both sparse and dense retrievers, utility and latency, allowing for apples-to-apples comparisons across carriers and defenses. OpenRAG-Soc targets practitioners who need fast, and realistic tests to track risk and harden deployments.

Authors:Nadine Kuo, Agnia Sergeyuk, Valerie Chen, Maliheh Izadi
Title: Developer Interaction Patterns with Proactive AI: A Five-Day Field Study
Abstract:
Current in-IDE AI coding tools typically rely on time-consuming manual prompting and context management, whereas proactive alternatives that anticipate developer needs without explicit invocation remain underexplored. Understanding when humans are receptive to such proactive AI assistance during their daily work remains an open question in human-AI interaction research. We address this gap through a field study of proactive AI assistance in professional developer workflows. We present a five-day in-the-wild study with 15 developers who interacted with a proactive feature of an AI assistant integrated into a production-grade IDE that offers code quality suggestions based on in-IDE developer activity. We examined 229 AI interventions across 5,732 interaction points to understand how proactive suggestions are received across workflow stages, how developers experience them, and their perceived impact. Our findings reveal systematic patterns in human receptivity to proactive suggestions: interventions at workflow boundaries (e.g., post-commit) achieved 52% engagement rates, while mid-task interventions (e.g., on declined edit) were dismissed 62% of the time. Notably, well-timed proactive suggestions required significantly less interpretation time than reactive suggestions (45.4s versus 101.4s, W = 109.00, r = 0.533, p = 0.0016), indicating enhanced cognitive alignment. This study provides actionable implications for designing proactive coding assistants, including how to time interventions, align them with developer context, and strike a balance between AI agency and user control in production IDEs.

Authors:Jordan Taylor, William Agnew, Maarten Sap, Sarah E. Fox, Haiyi Zhu
Title: The Algorithmic Gaze: An Audit and Ethnography of the LAION-Aesthetics Predictor Model
Abstract:
Visual generative AI models are trained using a one-size-fits-all measure of aesthetic appeal. However, what is deemed "aesthetic" is inextricably linked to personal taste and cultural values, raising the question of whose taste is represented in visual generative AI models. In this work, we study an aesthetic evaluation model--LAION Aesthetic Predictor (LAP)--that is widely used to curate datasets to train visual generative image models, like Stable Diffusion, and evaluate the quality of AI-generated images. To understand what LAP measures, we audited the model across three datasets. First, we examined the impact of aesthetic filtering on the LAION-Aesthetics Dataset (approximately 1.2B images), which was curated from LAION-5B using LAP. We find that the LAP disproportionally filters in images with captions mentioning women, while filtering out images with captions mentioning men or LGBTQ+ people. Then, we used LAP to score approximately 330k images across two art datasets, finding the model rates realistic images of landscapes, cityscapes, and portraits from western and Japanese artists most highly. In doing so, the algorithmic gaze of this aesthetic evaluation model reinforces the imperial and male gazes found within western art history. In order to understand where these biases may have originated, we performed a digital ethnography of public materials related to the creation of LAP. We find that the development of LAP reflects the biases we found in our audits, such as the aesthetic scores used to train LAP primarily coming from English-speaking photographers and western AI-enthusiasts. In response, we discuss how aesthetic evaluation can perpetuate representational harms and call on AI developers to shift away from prescriptive measures of "aesthetics" toward more pluralistic evaluation.

Authors:Shaz Furniturewala, Gerard Christopher Yeo, Kokil Jaidka
Title: Learning Through Dialogue: Unpacking the Dynamics of Human-LLM Conversations on Political Issues
Abstract:
Large language models (LLMs) are increasingly used as conversational partners for learning, yet the interactional dynamics supporting users' learning and engagement are understudied. We analyze the linguistic and interactional features from both LLM and participant chats across 397 human-LLM conversations about socio-political issues to identify the mechanisms and conditions under which LLM explanations shape changes in political knowledge and confidence. Mediation analyses reveal that LLM explanatory richness partially supports confidence by fostering users' reflective insight, whereas its effect on knowledge gain operates entirely through users' cognitive engagement. Moderation analyses show that these effects are highly conditional and vary by political efficacy. Confidence gains depend on how high-efficacy users experience and resolve uncertainty. Knowledge gains depend on high-efficacy users' ability to leverage extended interaction, with longer conversations benefiting primarily reflective users. In summary, we find that learning from LLMs is an interactional achievement, not a uniform outcome of better explanations. The findings underscore the importance of aligning LLM explanatory behavior with users' engagement states to support effective learning in designing Human-AI interactive systems.

Authors:Xiaotian Zhang, Jinhong Yu, Pengwei Yan, Le Jiang, Xingyi Shen, Mumo Cheng, Xiaozhong Liu
Title: Human-in-the-Loop Interactive Report Generation for Chronic Disease Adherence
Abstract:
Chronic disease management requires regular adherence feedback to prevent avoidable hospitalizations, yet clinicians lack time to produce personalized patient communications. Manual authoring preserves clinical accuracy but does not scale; AI generation scales but can undermine trust in patient-facing contexts. We present a clinician-in-the-loop interface that constrains AI to data organization and preserves physician oversight through recognition-based review. A single-page editor pairs AI-generated section drafts with time-aligned visualizations, enabling inline editing with visual evidence for each claim. This division of labor (AI organizes, clinician decides) targets both efficiency and accountability. In a pilot with three physicians reviewing 24 cases, AI successfully generated clinically personalized drafts matching physicians' manual authoring practice (overall mean 4.86/10 vs. 5.0/10 baseline), requiring minimal physician editing (mean 8.3\% content modification) with zero safety-critical issues, demonstrating effective automation of content generation. However, review time remained comparable to manual practice, revealing an accountability paradox: in high-stakes clinical contexts, professional responsibility requires complete verification regardless of AI accuracy. We contribute three interaction patterns for clinical AI collaboration: bounded generation with recognition-based review via chart-text pairing, automated urgency flagging that analyzes vital trends and adherence patterns with fail-safe escalation for missed critical monitoring tasks, and progressive disclosure controls that reduce cognitive load while maintaining oversight. These patterns indicate that clinical AI efficiency requires not only accurate models, but also mechanisms for selective verification that preserve accountability.

Authors:Zhihao Yuan, Yunze Xiao, Ming Li, Weihao Xuan, Richard Tong, Mona Diab, Tom Mitchell
Title: Towards Valid Student Simulation with Large Language Models
Abstract:
This paper presents a conceptual and methodological framework for large language model (LLM) based student simulation in educational settings. The authors identify a core failure mode, termed the "competence paradox" in which broadly capable LLMs are asked to emulate partially knowledgeable learners, leading to unrealistic error patterns and learning dynamics. To address this, the paper reframes student simulation as a constrained generation problem governed by an explicit Epistemic State Specification (ESS), which defines what a simulated learner can access, how errors are structured, and how learner state evolves over time. The work further introduces a Goal-by-Environment framework to situate simulated student systems according to behavioral objectives and deployment contexts. Rather than proposing a new system or benchmark, the paper synthesizes prior literature, formalizes key design dimensions, and articulates open challenges related to validity, evaluation, and ethical risks. Overall, the paper argues for epistemic fidelity over surface realism as a prerequisite for using LLM-based simulated students as reliable scientific and pedagogical instruments.

Authors:Behdokht Kiafar, Mohammad Fahim Abrar, Roghayeh Leila Barmaki
Title: Feedback Effects on Cognitive Dynamics: Network-Based Insights from EEG Patterns and Behavioral Performance
Abstract:
This study examines the impact of feedback on Electroencephalography (EEG) activity and performance during the Reading the Mind in the Eyes Test. In a within-subject design, eleven participants completed the test under Feedback and No-Feedback conditions. Using the principles of Epistemic Network Analysis (ENA) and Ordered Network Analysis (ONA), we extend these network-based models to explore the link between neural dynamics and task outcomes. ENA results showed that feedback is associated with stronger connections between higher frequency EEG bands (Beta and Gamma) and correct responses, while the absence of feedback activated lower frequency bands (Theta and Alpha). ONA further disclosed directional shifts toward higher frequency activity preceding correct answers in the Feedback condition, whereas the No-Feedback condition showed more self-connections in lower bands and a higher occurrence of wrong answers, suggesting less effective reasoning strategies without feedback. Both ENA and ONA revealed statistically significant differences between conditions (p = 0.01, Cohen's d > 2). This study highlights the methodological benefits of integrating EEG with ENA and ONA for network analysis, capturing both temporal and relational dynamics, as well as the practical insight that feedback can foster more effective reasoning processes and improve task performance.

Authors:Ben Carvell, Marc Thomas, Andrew Pace, Christopher Dorney, George De Ath, Richard Everson, Nick Pepper, Adam Keane, Samuel Tomlinson, Richard Cannon
Title: Human-in-the-Loop Testing of AI Agents for Air Traffic Control with a Regulated Assessment Framework
Abstract:
We present a rigorous, human-in-the-loop evaluation framework for assessing the performance of AI agents on the task of Air Traffic Control, grounded in a regulator-certified simulator-based curriculum used for training and testing real-world trainee controllers. By leveraging legally regulated assessments and involving expert human instructors in the evaluation process, our framework enables a more authentic and domain-accurate measurement of AI performance. This work addresses a critical gap in the existing literature: the frequent misalignment between academic representations of Air Traffic Control and the complexities of the actual operational environment. It also lays the foundations for effective future human-machine teaming paradigms by aligning machine performance with human assessment targets.

Authors:Xiang Zhang, Huan Yan, Jinyang Huang, Bin Liu, Yuanhao Feng, Jianchun Liu, Meng Li, Fusang Zhang, Zhi Liu
Title: Beyond Physical Labels: Redefining Domains for Robust WiFi-based Gesture Recognition
Abstract:
In this paper, we propose GesFi, a novel WiFi-based gesture recognition system that introduces WiFi latent domain mining to redefine domains directly from the data itself. GesFi first processes raw sensing data collected from WiFi receivers using CSI-ratio denoising, Short-Time Fast Fourier Transform, and visualization techniques to generate standardized input representations. It then employs class-wise adversarial learning to suppress gesture semantic and leverages unsupervised clustering to automatically uncover latent domain factors responsible for distributional shifts. These latent domains are then aligned through adversarial learning to support robust cross-domain generalization. Finally, the system is applied to the target environment for robust gesture inference. We deployed GesFi under both single-pair and multi-pair settings using commodity WiFi transceivers, and evaluated it across multiple public datasets and real-world environments. Compared to state-of-the-art baselines, GesFi achieves up to 78% and 50% performance improvements over existing adversarial methods, and consistently outperforms prior generalization approaches across most cross-domain tasks.

Authors:Yilong Dai, Ziyi Wang, Chenguang Wang, Kexin Zhou, Yiheng Qian, Susu Xu, Xiang Yan
Title: Persona-aware and Explainable Bikeability Assessment: A Vision-Language Model Approach
Abstract:
Bikeability assessment is essential for advancing sustainable urban transportation and creating cyclist-friendly cities, and it requires incorporating users' perceptions of safety and comfort. Yet existing perception-based bikeability assessment approaches face key limitations in capturing the complexity of road environments and adequately accounting for heterogeneity in subjective user perceptions. This paper proposes a persona-aware Vision-Language Model framework for bikeability assessment with three novel contributions: (i) theory-grounded persona conditioning based on established cyclist typology that generates persona-specific explanations via chain-of-thought reasoning; (ii) multi-granularity supervised fine-tuning that combines scarce expert-annotated reasoning with abundant user ratings for joint prediction and explainable assessment; and (iii) AI-enabled data augmentation that creates controlled paired data to isolate infrastructure variable impacts. To test and validate this framework, we developed a panoramic image-based crowdsourcing system and collected 12,400 persona-conditioned assessments from 427 cyclists. Experiment results show that the proposed framework offers competitive bikeability rating prediction while uniquely enabling explainable factor attribution.

Authors:Jing Ye, Lu Xiang, Yaping Zhang, Chengqing Zong
Title: EmoHarbor: Evaluating Personalized Emotional Support by Simulating the User's Internal World
Abstract:
Current evaluation paradigms for emotional support conversations tend to reward generic empathetic responses, yet they fail to assess whether the support is genuinely personalized to users' unique psychological profiles and contextual needs. We introduce EmoHarbor, an automated evaluation framework that adopts a User-as-a-Judge paradigm by simulating the user's inner world. EmoHarbor employs a Chain-of-Agent architecture that decomposes users' internal processes into three specialized roles, enabling agents to interact with supporters and complete assessments in a manner similar to human users. We instantiate this benchmark using 100 real-world user profiles that cover a diverse range of personality traits and situations, and define 10 evaluation dimensions of personalized support quality. Comprehensive evaluation of 20 advanced LLMs on EmoHarbor reveals a critical insight: while these models excel at generating empathetic responses, they consistently fail to tailor support to individual user contexts. This finding reframes the central challenge, shifting research focus from merely enhancing generic empathy to developing truly user-aware emotional support. EmoHarbor provides a reproducible and scalable framework to guide the development and evaluation of more nuanced and user-aware emotional support systems.

Authors:Ananya Bhattacharjee, Jina Suh, Mohit Chandra, Javier Hernandez
Title: User Perceptions of an LLM-Based Chatbot for Cognitive Reappraisal of Stress: Feasibility Study
Abstract:
Cognitive reappraisal is a well-studied emotion regulation strategy that helps individuals reinterpret stressful situations to reduce their impact. Many digital mental health tools struggle to support this process because rigid scripts fail to accommodate how users naturally describe stressors. This study examined the feasibility of an LLM-based single-session intervention (SSI) for workplace stress reappraisal. We assessed short-term changes in stress-related outcomes and examined design tensions during use. We conducted a feasibility study with 100 employees at a large technology company who completed a structured cognitive reappraisal session delivered by a GPT-4o-based chatbot. Pre-post measures included perceived stress intensity, stress mindset, perceived demand, and perceived resources. These outcomes were analyzed using paired Wilcoxon signed-rank tests with correction for multiple comparisons. We also examined sentiment and stress trajectories across conversation quartiles using two RoBERTa-based classifiers and an LLM-based stress rater. Open-ended responses were analyzed using thematic analysis. Results showed significant reductions in perceived stress intensity and significant improvements in stress mindset. Changes in perceived resources and perceived demand trended in expected directions but were not statistically significant. Automated analyses indicated consistent declines in negative sentiment and stress over the course of the interaction. Qualitative findings suggested that participants valued the structured prompts for organizing thoughts, gaining perspective, and feeling acknowledged. Participants also reported tensions around scriptedness, preferred interaction length, and reactions to AI-driven empathy. These findings highlight both the promise and the design constraints of integrating LLMs into DMH interventions for workplace settings.

Authors:Priyan Vaithilingam, Elena L. Glassman, Nathalie Henry Riche, Gonzalo Ramos, Jeevana Priya Inala, Chenglong Wang
Title: MagicCopy: Bring my data along with me beyond boundaries of apps
Abstract:
People working with data often move their data across multiple applications, because they rely on these apps' complementing user experiences to best complete their tasks. Since traditional copy-and-paste approaches do not accommodate diverse table representations adopted by different apps, users spend considerable effort to reconstruct data formats and visual representations, making cross-app workflows costly. For example, when transferring a spreadsheet table with conditional formatting to a markup document, users spend substantial time translating its structure into appropriate tags and manually reformat color. This paper introduces MagicCopy, an AI-powered cross-app copy-and-paste, leveraging source and target contexts and user-specified instructions in natural language to automatically extract, parse, transform, and (re)format data from one app to another. In a study with sixteen participants, users quickly learned and applied MagicCopy to move data across three pairs of tools. Participants further explored diverse applications of MagicCopy to support more streamlined crossed-application interaction in their workflows.

Authors:Alexander Loth, Martin Kappes, Marc-Oliver Pahl
Title: Can Humans Tell? A Dual-Axis Study of Human Perception of LLM-Generated News
Abstract:
Can humans tell whether a news article was written by a person or a large language model (LLM)? We investigate this question using JudgeGPT, a study platform that independently measures source attribution (human vs. machine) and authenticity judgment (legitimate vs. fake) on continuous scales. From 2,318 judgments collected from 1,054 participants across content generated by six LLMs, we report five findings: (1) participants cannot reliably distinguish machine-generated from human-written text (p > .05, Welch's t-test); (2) this inability holds across all tested models, including open-weight models with as few as 7B parameters; (3) self-reported domain expertise predicts judgment accuracy (r = .35, p < .001) whereas political orientation does not (r = -.10, n.s.); (4) clustering reveals distinct response strategies ("Skeptics" vs. "Believers"); and (5) accuracy degrades after approximately 30 sequential evaluations due to cognitive fatigue. The answer, in short, is no: humans cannot reliably tell. These results indicate that user-side detection is not a viable defense and motivate system-level countermeasures such as cryptographic content provenance.

Authors:Yunge Wen, Awu Chen, Jianing Yu, Jas Brooks, Hiroshi Ishii, Paul Pu Liang
Title: AromaGen: Interactive Generation of Rich Olfactory Experiences with Multimodal Language Models
Abstract:
Smell's deep connection with food, memory, and social experience has long motivated researchers to bring olfaction into interactive systems. Yet most olfactory interfaces remain limited to fixed scent cartridges and pre-defined generation patterns, and the scarcity of large-scale olfactory datasets has further constrained AI-based approaches. We present AromaGen, an AI-powered wearable interface capable of real-time, general-purpose aroma generation from free-form text or visual inputs. AromaGen is powered by a multimodal LLM that leverages latent olfactory knowledge to map semantic inputs to structured mixtures of 12 carefully selected base odorants, released through a neck-worn dispenser. Users can iteratively refine generated aromas through natural language feedback via in-context learning. Through a controlled user study ($N = 26$), AromaGen matches human-composed mixtures in zero-shot generation and significantly surpasses them after iterative refinement, achieving a median similarity of 8/10 to real food aromas and reducing perceived artificiality to levels comparable to real food. AromaGen is a step towards real-world interactive aroma generation, opening new possibilities for communication, wellbeing, and immersive technologies.

Authors:Tianle Yang, Chengzhe Sun, Phil Rose, Siwei Lyu
Title: Acoustic and perceptual differences between standard and accented Chinese speech and their voice clones
Abstract:
Voice cloning is often evaluated in terms of overall quality, but less is known about accent preservation and its perceptual consequences. We compare standard and heavily accented Mandarin speech and their voice clones using a combined computational and perceptual design. Embedding-based analyses show no reliable accented-standard difference in original-clone distances across systems. In the perception study, clones are rated as more similar to their originals for standard than for accented speakers, and intelligibility increases from original to clone, with a larger gain for accented speech. These results show that accent variation can shape perceived identity match and intelligibility in voice cloning even when it is not reflected in an off-the-shelf speaker-embedding distance, and they motivate evaluating speaker identity preservation and accent preservation as separable dimensions.

Authors:Chitralekha Gupta, Jing Peng, Ashwin Ram, Shreyas Sridhar, Christophe Jouffrais, Suranga Nanayakkara
Title: Beyond Descriptions: A Generative Scene2Audio Framework for Blind and Low-Vision Users to Experience Vista Landscapes
Abstract:
Current scene perception tools for Blind and Low Vision (BLV) individuals rely on spoken descriptions but lack engaging representations of visually pleasing distant environmental landscapes (Vista spaces). Our proposed Scene2Audio framework generates comprehensible and enjoyable nonverbal audio using generative models informed by psychoacoustics, and principles of scene audio composition. Through a user study with 11 BLV participants, we found that combining the Scene2Audio sounds with speech creates a better experience than speech alone, as the sound effects complement the speech making the scene easier to imagine. A mobile app "in-the-wild" study with 7 BLV users for more than a week further showed the potential of Scene2Audio in enhancing outdoor scene experiences. Our work bridges the gap between visual and auditory scene perception by moving beyond purely descriptive aids, addressing the aesthetic needs of BLV users.

Authors:Yujin Park, Haejun Chung, Ikbeom Jang
Title: Dodgersort: Uncertainty-Aware VLM-Guided Human-in-the-Loop Pairwise Ranking
Abstract:
Pairwise comparison labeling is emerging as it yields higher inter-rater reliability than conventional classification labeling, but exhaustive comparisons require quadratic cost. We propose Dodgersort, which leverages CLIP-based hierarchical pre-ordering, a neural ranking head and probabilistic ensemble (Elo, BTL, GP), epistemic--aleatoric uncertainty decomposition, and information-theoretic pair selection. It reduces human comparisons while improving the reliability of the rankings. In visual ranking tasks in medical imaging, historical dating, and aesthetics, Dodgersort achieves a 11--16\% annotation reduction while improving inter-rater reliability. Cross-domain ablations across four datasets show that neural adaptation and ensemble uncertainty are key to this gain. In FG-NET with ground-truth ages, the framework extracts 5--20$\times$ more ranking information per comparison than baselines, yielding Pareto-optimal accuracy--efficiency trade-offs.

Authors:Niclas Pokel, Yiming Zhao, Pehuén Moure, Yingqiang Gao, Roman Böhringer
Title: Demonstration of Adapt4Me: An Uncertainty-Aware Authoring Environment for Personalizing Automatic Speech Recognition to Non-normative Speech
Abstract:
Personalizing Automatic Speech Recognition (ASR) for non-normative speech remains challenging because data collection is labor-intensive and model training is technically complex. To address these limitations, we propose Adapt4Me, a web-based decentralized environment that operationalizes Bayesian active learning to enable end-to-end personalization without expert supervision. The app exposes data selection, adaptation, and validation to lay users through a three-stage human-in-the-loop workflow: (1) rapid profiling via greedy phoneme sampling to capture speaker-specific acoustics; (2) backend personalization using Variational Inference Low-Rank Adaptation (VI-LoRA) to enable fast, incremental updates; and (3) continuous improvement, where users guide model refinement by resolving visualized model uncertainty via low-friction top-k corrections. By making epistemic uncertainty explicit, Adapt4Me reframes data efficiency as an interactive design feature rather than a purely algorithmic concern. We show how this enables users to personalize robust ASR models, transforming them from passive data sources into active authors of their own assistive technology.

Authors:Thomas Şerban von Davier, Hao-Ping Lee, Jodi Forlizzi, Sauvik Das
Title: Promoting Critical Thinking With Domain-Specific Generative AI Provocations
Abstract:
The evidence on the effects of generative AI (GenAI) on critical thinking is mixed, with studies suggesting both potential harms and benefits depending on its implementation. Some argue that AI-driven provocations, such as questions asking for human clarification and justification, are beneficial for eliciting critical thinking. Drawing on our experience designing and evaluating two GenAI-powered tools for knowledge work, ArtBot in the domain of fine art interpretation and Privy in the domain of AI privacy, we reflect on how design decisions shape the form and effectiveness of such provocations. Our observations and user feedback suggest that domain-specific provocations, implemented through productive friction and interactions that depend on user contribution, can meaningfully support critical thinking. We present participant experiences with both prototypes and discuss how supporting critical thinking may require moving beyond static provocations toward approaches that adapt to user preferences and levels of expertise.

Authors:Dora Zhao, Hannah Cha, Michael J. Ryan, Angelina Wang, Rachel Baker-Ramos Evyn-Bree Helekahi-Kaiwi, Rebecca Diego, Josiah Hester, Diyi Yang
Title: Whose Knowledge Counts? Co-Designing Community-Centered AI Auditing Tools with Educators in Hawai`i
Abstract:
Although generative AI is being deployed into classrooms with promises of aiding teachers, educators caution that these tools can have unintended pedagogical repercussions, including cultural misrepresentation and bias. These concerns are heightened in low-resource language and Indigenous education settings, where AI systems frequently underperform. We investigate these challenges in Hawai`i, where public schools operate under a statewide mandate to integrate Hawaiian language and culture into education. Through four co-design workshops with 22 public school educators, we surfaced concerns about using generative AI in educational settings, particularly around cultural misrepresentation, and corresponding designs for auditing tools that address these issues. We find that educators envision tools grounded in specific Hawaiian cultural values and practices, such as tracing the genealogy of knowledge in source materials. Building on these insights, we conceptualize AI auditing as a community-oriented process rather than the work of isolated individuals, and discuss implications for designing auditing tools.

Authors:Jiaming Zhang, Mingxu Liu, Hongchao Shu, Ruixing Liang, Yihao Liu, Ojas Taskar, Amir Kheradmand, Mehran Armand, Alejandro Martin-Gomez
Title: Extend Your Horizon: A Device-Agnostic Surgical Tool Tracking Framework with Multi-View Optimization for Augmented Reality
Abstract:
Surgical navigation provides real-time guidance by estimating the pose of patient anatomy and surgical instruments to visualize relevant intraoperative information. In conventional systems, instruments are typically tracked using fiducial markers and stationary optical tracking systems (OTS). Augmented reality (AR) has further enabled intuitive visualization and motivated tracking using sensors embedded in head-mounted displays (HMDs). However, most existing approaches rely on a clear line of sight, which is difficult to maintain in dynamic operating room environments due to frequent occlusions caused by equipment, surgical tools, and personnel. This work introduces a framework for tracking surgical instruments under occlusion by fusing multiple sensing modalities within a dynamic scene graph representation. The proposed approach integrates tracking systems with different accuracy levels and motion characteristics while estimating tracking reliability in real time. Experimental results demonstrate improved robustness and enhanced consistency of AR visualization in the presence of occlusions.

Authors:Weiyan Shi, Kenny Tsu Wei Choo
Title: More Than 1v1: Human-AI Alignment in Early Developmental Communities with Multimodal LLMs
Abstract:
In early developmental contexts, particularly in parent-child interaction analysis, alignment involves families and professionals such as speech-language pathologists (SLPs) who interpret children's everyday interactions from different roles. When multimodal large language models (MLLMs) are introduced to support this process, alignment becomes a question of how authority, responsibility, and emotional risk are distributed across stakeholders. Through a three-part study with five families and three SLPs, we trace how MLLM-generated outputs move from expert-facing analysis to parent-facing feedback. We propose layered community alignment: grounding representations in expert-aligned structures, mediating translation through professional guardrails, and enabling family-level adaptation within those boundaries. We argue that alignment in developmental settings should be treated as a community-governed process rather than an individual optimisation problem.

Authors:Shuo Niu, Yao Lyu, He Zhang, Na Li, Bumjin Kim, Jie Cai
Title: Monetizing Generative AI: YouTubers' Collective Knowledge on Earning from Generative AI Content
Abstract:
Generative Artificial Intelligence (GenAI) is reshaping creative labor by enabling the rapid production of text, images, and videos. On YouTube, creators are developing new ways to leverage these tools and share knowledge about how to pursue income through such strategies. However, little is known about what GenAI knowledge has been collectively constructed around monetizing GenAI as a community practice of acting both with and against algorithmically mediated platforms. We analyze 377 YouTube videos in which creators publicly promote workflows, revenue claims, and monetization strategies for GenAI-enabled content. Our analysis identifies ten shared use cases that frame AI-supported income opportunities, and examines how this GenAI knowledge repository embodies a collective effort to leverage platform infrastructures for monetization -- including advertising, direct sales, affiliate marketing, and revenue-sharing models. We further surface structural tensions in AI-mediated creative labor, including unverifiable income claims, content misappropriation, synthetic engagement practices, and shifting authorship norms. We conceptualize creators' collective understanding and adoption of GenAI in the context of monetizing creative labor, with implications for the design of creator-centered GenAI technologies and responsible platform policy.

Authors:Ivy Xiao He, Stefanie Tellex, Jason Xinyu Liu
Title: LEGS-POMDP: Language and Gesture-Guided Object Search in Partially Observable Environments
Abstract:
To assist humans in open-world environments, robots must interpret ambiguous instructions to locate desired objects. Foundation model-based approaches excel at multimodal grounding, but they lack a principled mechanism for modeling uncertainty in long-horizon tasks. In contrast, Partially Observable Markov Decision Processes (POMDPs) provide a systematic framework for planning under uncertainty but are often limited in supported modalities and rely on restrictive environment assumptions. We introduce LanguagE and Gesture-Guided Object Search in Partially Observable Environments (LEGS-POMDP), a modular POMDP system that integrates language, gesture, and visual observations for open-world object search. Unlike prior work, LEGS-POMDP explicitly models two sources of partial observability: uncertainty over the target object's identity and its spatial location. In simulation, multimodal fusion significantly outperforms unimodal baselines, achieving an average success rate of 89\% across challenging environments and object categories. Finally, we demonstrate the full system on a quadruped mobile manipulator, where real-world experiments qualitatively validate robust multimodal perception and uncertainty reduction under ambiguous instructions.

Authors:Saber Zerhoudi, Michael Granitzer
Title: Beyond the Click: A Framework for Inferring Cognitive Traces in Search
Abstract:
User simulators are essential for evaluating search systems, but they primarily copy user actions without understanding the underlying thought process. This gap exists since large-scale interaction logs record what users do, but not what they might be thinking or feeling, such as confusion or satisfaction. To solve this problem, we present a framework to infer cognitive traces from behavior logs. Our method uses a multi-agent system grounded in Information Foraging Theory (IFT) and human expert judgment. These traces improve model performance on tasks like forecasting session outcomes and user struggle recovery. We release a collection of annotations for several public datasets, including AOL and Stack Overflow, and an open-source tool that allows researchers to apply our method to their own data. This work provides the tools and data needed to build more human-like user simulators and to assess retrieval systems on user-oriented dimensions of performance.

Authors:Saber Zerhoudi, Michael Granitzer
Title: UXSim: Towards a Hybrid User Search Simulation
Abstract:
Simulating nuanced user experiences within complex interactive search systems poses distinct challenge for traditional methodologies, which often rely on static user proxies or, more recently, on standalone large language model (LLM) agents that may lack deep, verifiable grounding. The true dynamism and personalization inherent in human-computer interaction demand a more integrated approach. This work introduces UXSim, a novel framework that integrates both approaches. It leverages grounded data from traditional simulators to inform and constrain the reasoning of an adaptive LLM agent. This synthesis enables more accurate and dynamic simulations of user behavior while also providing a pathway for the explainable validation of the underlying cognitive processes.

Authors:Tianqi Song, Black Sun, Jingshu Li, Han Li, Chi-Lan Yang, Yijia Xu, Yi-Chieh Lee
Title: Understanding Older Adults' Experiences of Support, Concerns, and Risks from Kinship-Role AI-Generated Influencers
Abstract:
AI-generated influencers are rapidly gaining popularity on Chinese short-video platforms, often adopting kinship-based roles such as AI grandchildren to attract older adults. Although this trend has raised public concern, little is known about the design strategies behind these influencers, how older adults experience them, and the benefits and risks involved. In this study, we combined social media analysis with interviews to unpack the above questions. Our findings show that influencers use both visual and conversational cues to enact kinship roles, prompting audiences to engage in kinship-based role-play. Interviews further show that these cues arouse emotional resonance, help fulfill older adults' informational and emotional needs, while also raising concerns about emotional displacement and unequal emotional investment. We highlight the complex relationship between virtual avatars and real family ties, shaped by broader sociocultural norms, and discuss how AI might strengthen social support for older adults while mitigating risks within cultural contexts.

Authors:Weiyan Shi, Kenny Tsu Wei Choo
Title: A Taxonomy of Human--MLLM Interaction in Early-Stage Sketch-Based Design Ideation
Abstract:
As multimodal large language models (MLLMs) are increasingly integrated into early-stage design tools, it is important to understand how designers collaborate with AI during ideation. In a user study with 12 participants, we analysed sketch-based design interactions with an MLLM-powered system using automatically recorded interaction logs and post-task interviews. Based on how creative responsibility was allocated between humans and the AI, we predefined four interaction modes: Human-Only, Human-Lead, AI-Lead, and Co-Evolution, and analysed how these modes manifested during sketch-based design ideation. Our results show that designers rarely rely on a single mode; instead, human-led and AI-led roles are frequently interwoven and shift across ideation instances. These findings provide an empirical basis for future work to investigate why designers shift roles with AI and how interactive systems can better support such dynamic collaboration.

Authors:Yuting Deng, Melanie Brucks, Olivier Toubia
Title: Examining and Addressing Barriers to Diversity in LLM-Generated Ideas
Abstract:
Ideas generated by independent samples of humans tend to be more diverse than ideas generated from independent LLM samples, raising concerns that widespread reliance on LLMs could homogenize ideation and undermine innovation at a societal level. Drawing on cognitive psychology, we identify (both theoretically and empirically) two mechanisms undermining LLM idea diversity. First, at the individual level, LLMs exhibit fixation just as humans do, where early outputs constrain subsequent ideation. Second, at the collective level, LLMs aggregate knowledge into a unified distribution rather than exhibiting the knowledge partitioning inherent to human populations, where each person occupies a distinct region of the knowledge space. Through four studies, we demonstrate that targeted prompting interventions can address each mechanism independently: Chain-of-Thought (CoT) prompting reduces fixation by encouraging structured reasoning (only in LLMs, not humans), while ordinary personas (versus "creative entrepreneurs" such as Steve Jobs) improve knowledge partitioning by serving as diverse sampling cues, anchoring generation in distinct regions of the semantic space. Combining both approaches produces the highest idea diversity, outperforming humans. These findings offer a theoretically grounded framework for understanding LLM idea diversity and practical strategies for human-AI collaborations that leverage AI's efficiency without compromising the diversity essential to a healthy innovation ecosystem.

Authors:Lindsay Popowski, Xiyuan Wu, Charlotte Zhu, Tiziano Piccardi, Michael S. Bernstein
Title: Social Media Feed Elicitation
Abstract:
Social media users have repeatedly advocated for control over the currently opaque operations of feed algorithms. Large language models (LLMs) now offer the promise of custom-defined feeds--but users often fail to foresee the gaps and edge cases in how they define their custom feed. We introduce feed elicitation interviews, an interactive method that guides users through identifying these gaps and articulating their preferences to better author custom social media feeds. We deploy this approach in an online study to create custom BlueSky feeds and find that participants significantly prefer the feeds produced from their elicited preferences to those produced by users manually describing their feeds. Through feed elicitation interviews, we advance users' ability to control their social media experience, empowering them to describe and implement their desired feeds.

Authors:Erik Derner, Dalibor Kučera, Aditya Gulati, Ayoub Bagheri, Nuria Oliver
Title: Mind the Style: Impact of Communication Style on Human-Chatbot Interaction
Abstract:
Conversational agents increasingly mediate everyday digital interactions, yet the effects of their communication style on user experience and task success remain unclear. Addressing this gap, we describe the results of a between-subject user study where participants interact with one of two versions of a chatbot called NAVI which assists users in an interactive map-based 2D navigation task. The two chatbot versions differ only in communication style: one is friendly and supportive, while the other is direct and task-focused. Our results show that the friendly style increases subjective satisfaction and significantly improves task completion rates among female participants only, while no baseline differences between female and male participants were observed in a control condition without the chatbot. Furthermore, we find little evidence of users mimicking the chatbot's style, suggesting limited linguistic accommodation. These findings highlight the importance of user- and task-sensitive conversational agents and support that communication style personalization can meaningfully enhance interaction quality and performance.

Authors:Lev Tankelevitch, Ava Elizabeth Scott, Nagaravind Challakere, Payod Panda, Sean Rintel
Title: Nudging Attention to Workplace Meeting Goals: A Large-Scale, Preregistered Field Experiment
Abstract:
Ineffective meetings are pervasive. Thinking ahead explicitly about meeting goals may improve effectiveness, but current collaboration platforms lack integrated support. We tested a lightweight goal-reflection intervention in a preregistered field experiment in a global technology company (361 employees, 7196 meetings). Over two weeks, workers in the treatment group completed brief pre-meeting surveys in their collaboration platform, nudging attention to goals for upcoming meetings. To measure impact, both treatment and control groups completed post-meeting surveys about meeting effectiveness. While the intervention impact on meeting effectiveness was not statistically significant, mixed-methods findings revealed improvements in self-reported awareness and behaviour across both groups, with post-meeting surveys unintentionally functioning as an intervention. We highlight the promise of supporting goal reflection, while noting challenges of evaluating and supporting workplace reflection for meetings, including workflow and collaboration norms, and attitudes and behaviours around meeting preparation. We conclude with implications for designing technological support for meeting intentionality.

Authors:Jisung Shin, Daniel Platnick, Marjan Alirezaie, Hossein Rahnama
Title: Situation Graph Prediction: Structured Perspective Inference for User Modeling
Abstract:
Perspective-Aware AI requires modeling evolving internal states--goals, emotions, contexts--not merely preferences. Progress is limited by a data bottleneck: digital footprints are privacy-sensitive and perspective states are rarely labeled. We propose Situation Graph Prediction (SGP), a task that frames perspective modeling as an inverse inference problem: reconstructing structured, ontology-aligned representations of perspective from observable multimodal artifacts. To enable grounding without real labels, we use a structure-first synthetic generation strategy that aligns latent labels and observable traces by design. As a pilot, we construct a dataset and run a diagnostic study using retrieval-augmented in-context learning as a proxy for supervision. In our study with GPT-4o, we observe a gap between surface-level extraction and latent perspective inference--indicating latent-state inference is harder than surface extraction under our controlled setting. Results suggest SGP is non-trivial and provide evidence for the structure-first data synthesis strategy.

Authors:Amy Koike, Serena Ge Guo, Xinning He, Callie Y. Kim, Dakota Sullivan, Bilge Mutlu
Title: Elements of Robot Morphology: Supporting Designers in Robot Form Exploration
Abstract:
Robot morphology, the form, shape, and structure of robots, is a key design space in human-robot interaction (HRI), shaping how robots function, express themselves, and interact with people. Yet, despite its importance, little is known about how design frameworks can guide systematic form exploration. To address this gap, we introduce Elements of Robot Morphology, a framework that identifies five fundamental elements: perception, articulation, end effectors, locomotion, and structure. Derived from an analysis of existing robots, the framework supports structured exploration of diverse robot forms. To operationalize the framework, we developed Morphology Exploration Blocks (MEB), a set of tangible blocks that enable hands-on, collaborative experimentation with robot morphologies. We evaluate the framework and toolkit through a case study and design workshops, showing how they support analysis, ideation, reflection, and collaborative robot design.

Authors:Aditya Gulati, Nuria Oliver
Title: Why do we Trust Chatbots? From Normative Principles to Behavioral Drivers
Abstract:
As chatbots increasingly blur the boundary between automated systems and human conversation, the foundations of trust in these systems warrant closer examination. While regulatory and policy frameworks tend to define trust in normative terms, the trust users place in chatbots often emerges from behavioral mechanisms. In many cases, this trust is not earned through demonstrated trustworthiness but is instead shaped by interactional design choices that leverage cognitive biases to influence user behavior. Based on this observation, we propose reframing chatbots not as companions or assistants, but as highly skilled salespeople whose objectives are determined by the deploying organization. We argue that the coexistence of competing notions of "trust" under a shared term obscures important distinctions between psychological trust formation and normative trustworthiness. Addressing this gap requires further research and stronger support mechanisms to help users appropriately calibrate trust in conversational AI systems.

Authors:Han Meng, Qiuyuan Lyu, Peinuan Qin, Yitian Yang, Renwen Zhang, Wen-Chieh Lin, Yi-Chieh Lee
Title: Designing Computational Tools for Exploring Causal Relationships in Qualitative Data
Abstract:
Exploring causal relationships for qualitative data analysis in HCI and social science research enables the understanding of user needs and theory building. However, current computational tools primarily characterize and categorize qualitative data; the few systems that analyze causal relationships either inadequately consider context, lack credibility, or produce overly complex outputs. We first conducted a formative study with 15 participants interested in using computational tools for exploring causal relationships in qualitative data to understand their needs and derive design guidelines. Based on these findings, we designed and implemented QualCausal, a system that extracts and illustrates causal relationships through interactive causal network construction and multi-view visualization. A feedback study (n = 15) revealed that participants valued our system for reducing the analytical burden and providing cognitive scaffolding, yet navigated how such systems fit within their established research paradigms, practices, and habits. We discuss broader implications for designing computational tools that support qualitative data analysis.

Authors:Maia Stiber, Sameer Khan, Russell Taylor, Chien-Ming Huang
Title: Signal or 'Noise': Human Reactions to Robot Errors in the Wild
Abstract:
In the real world, robots frequently make errors, yet little is known about people's social responses to errors outside of lab settings. Prior work has shown that social signals are reliable and useful for error management in constrained interactions, but it is unclear if this holds in the real world - especially with a non-social robot in repeated and group interactions with successive or propagated errors. To explore this, we built a coffee robot and conducted a public field deployment ($N = 49$). We found that participants consistently expressed varied social signals in response to errors and other stimuli, particularly during group interactions. Our findings suggest that social signals in the wild are rich (with participants volunteering information about the interaction), but "noisy." We discuss lessons, benefits, and challenges for using social signals in real-world HRI.

Authors:Saleh Afzoon, Amin Beheshti, Usman Naseem
Title: PersoPilot: An Adaptive AI-Copilot for Transparent Contextualized Persona Classification and Personalized Response Generation
Abstract:
Understanding and classifying user personas is critical for delivering effective personalization. While persona information offers valuable insights, its full potential is realized only when contextualized, linking user characteristics with situational context to enable more precise and meaningful service provision. Existing systems often treat persona and context as separate inputs, limiting their ability to generate nuanced, adaptive interactions. To address this gap, we present PersoPilot, an agentic AI-Copilot that integrates persona understanding with contextual analysis to support both end users and analysts. End users interact through a transparent, explainable chat interface, where they can express preferences in natural language, request recommendations, and receive information tailored to their immediate task. On the analyst side, PersoPilot delivers a transparent, reasoning-powered labeling assistant, integrated with an active learning-driven classification process that adapts over time with new labeled data. This feedback loop enables targeted service recommendations and adaptive personalization, bridging the gap between raw persona data and actionable, context-aware insights. As an adaptable framework, PersoPilot is applicable to a broad range of service personalization scenarios.

Authors:Saleh Afzoon, MohammadHossein Ahmadi, Usman Naseem, Amin Beheshti
Title: PersoDPO: Scalable Preference Optimization for Instruction-Adherent, Persona-Grounded Dialogue via Multi-LLM Evaluation
Abstract:
Personalization and contextual coherence are two essential components in building effective persona-grounded dialogue systems. These aspects play a crucial role in enhancing user engagement and ensuring responses are more relevant and consistent with user identity. However, recent studies indicate that open-source large language models (LLMs) continue to struggle to generate responses that are both contextually grounded and aligned with persona cues, despite exhibiting strong general conversational abilities like fluency and naturalness. We present PersoDPO, a scalable preference optimisation framework that uses supervision signals from automatic evaluations of responses generated by both closed-source and open-source LLMs to fine-tune dialogue models. The framework integrates evaluation metrics targeting coherence and personalization, along with a length-format compliance feature to promote instruction adherence. These signals are combined to automatically construct high-quality preference pairs without manual annotation, enabling a scalable and reproducible training pipeline. Experiments on the FoCus dataset show that an open-source language model fine-tuned with the PersoDPO framework consistently outperforms strong open-source baselines and a standard Direct Preference Optimization (DPO) variant across multiple evaluation dimensions.

Authors:Alexander Loth, Dominique Conceicao Rosario, Peter Ebinger, Martin Kappes, Marc-Oliver Pahl
Title: Origin Lens: A Privacy-First Mobile Framework for Cryptographic Image Provenance and AI Detection
Abstract:
The proliferation of generative AI poses challenges for information integrity assurance, requiring systems that connect model governance with end-user verification. We present Origin Lens, a privacy-first mobile framework that targets visual disinformation through a layered verification architecture. Unlike server-side detection systems, Origin Lens performs cryptographic image provenance verification and AI detection locally on the device via a Rust/Flutter hybrid architecture. Our system integrates multiple signals - including cryptographic provenance, generative model fingerprints, and optional retrieval-augmented verification - to provide users with graded confidence indicators at the point of consumption. We discuss the framework's alignment with regulatory requirements (EU AI Act, DSA) and its role in verification infrastructure that complements platform-level mechanisms.

Authors:Awu Chen, Vera Yu Wu, Yunge Wen, Yaluo Wang, Jiaxuan Olivia Yin, Yichen Wang, Qian Xiang, Richard Zhang, Paul Pu Liang, Hiroshi Ishii
Title: Smell with Genji: Rediscovering Human Perception through an Olfactory Game with AI
Abstract:
Olfaction plays an important role in human perception, yet its subjective and ephemeral nature makes it difficult to articulate, compare, and share across individuals. Traditional practices like the Japanese incense game Genji-ko offer one way to structure olfactory experience through shared interpretation. In this work, we present Smell with Genji, an AI-mediated olfactory interaction system that reinterprets Genji-ko as a collaborative human-AI sensory experience. By integrating a game setup, a mobile application, and an LLM-powered co-smelling partner equipped with olfactory sensing and LLM-based conversation, the system invites participants to compare scents and construct Genji-mon patterns, fostering reflection through a dialogue that highlights the alignment and discrepancies between human and machine perception. This work illustrates how sensing-enabled AI can participate in olfactory experience alongside users, pointing toward new possibilities for AI-supported sensory interaction and reflection in HCI.

Authors:Ziwen Li, Ziang Xiao, Tianshi Li
Title: Disclose with Care: Designing Privacy Controls in Interview Chatbots
Abstract:
Collecting data on sensitive topics remains challenging in HCI, as participants often withhold information due to privacy concerns and social desirability bias. While chatbots' perceived anonymity may reduce these barriers, research paradoxically suggests people tend to over-share personal or sensitive information with chatbots. In this work, we explore privacy controls in chatbot interviews to address this problem. The privacy control allows participants to revise their transcripts at the end of the interview, featuring two design variants: free editing and AI-aided editing. In a between-subjects study \red{($N=188$)}, we compared no-editing, free-editing, and AI-aided editing conditions in a chatbot-based interview on a sensitive topic. Our results confirm the prevalent issue of oversharing in chatbot-based interviews and show that AI-aided editing serves as an effective privacy-control mechanism, reducing PII disclosure while maintaining data quality and user engagement, thereby offering a promising approach to balancing ethical practice and data quality in such interviews.

Authors:Avinash Ajit Nargund, Andrea M. Park, Tobias Höllerer, Misha Sra
Title: Embedded vs. Situated: An Evaluation of AR Facial Training Feedback
Abstract:
While augmented reality (AR) research demonstrates benefits of embedded visualizations for gross motor training, its applicability to facial exercises remains under-explored. Providing effective real-time feedback for facial muscle training presents unique design challenges, given the complexity of facial musculature. We developed three AR feedback approaches varying in spatial relationship to the user: situated (screen-fixed), proxy-embedded (on a mannequin), and fully embedded (overlaid on the user's face). In a within-subjects study (N=24), we measured exercise accuracy, cognitive load, and user preference during facial training tasks. The embedded feedback reduced cognitive load and received higher preference ratings, while the situated feedback enabled more precise corrections and higher accuracy. Qualitative analysis revealed a key design tension: embedded feedback improved experience but created self-consciousness and interpretive difficulty. We distill these insights into design considerations addressing the trade-offs for facial training systems, with implications for rehabilitation, performance training, and motor skill acquisition.

Authors:Avinash Ajit Nargund, Andrew L. Huard, Tobias Höllerer, Misha Sra
Title: Exploration of Radar-based Obstacle Visualizations to Support Safety and Presence in Camera-Free Outdoor VR
Abstract:
Outdoor virtual reality (VR) places users in dynamic physical environments where they must remain aware of real-world obstacles, including static structures and moving bystanders, while immersed in a virtual scene. This dual demand introduces challenges for both user safety and presence. Millimeter-wave (mmWave) radar offers a privacy-preserving alternative to camera-based sensing by detecting obstacles without capturing identifiable visual imagery, yet effective methods for communicating its sparse spatial information to users remain underexplored. In this work, we developed and validated WaveWalkerClone, a reproduction of the WaveWalker system, to establish reliable radar- and GPS-IMU-based sensing under varied outdoor lighting conditions. Building on this feasibility validation, we conducted a user study (n=18) comparing three visualization techniques for radar-detected obstacles : (1) diegetic alien avatars that visually embed obstacles within the virtual narrative, (2) non-diegetic human avatars represented obstacles as humans inconsistent with the virtual narrative, and (3) abstract point clouds centered around the obstacles conveying spatial data without anthropomorphic or narrative associations. Our results show that all three approaches supported user safety and situational awareness, but yielded distinct trade-offs in perceived effort, frustration, and user preference. Qualitative feedback further revealed divergent user responses across conditions, highlighting the limitations of a one-size-fits-all approach. We conclude with design considerations for obstacle visualization in outdoor VR systems that seek to balance immersion, safety, and bystander privacy.

Authors:Yuheng Shao, Junjie Xiong, Chaoran Wu, Xiyuan Wang, Ziyu Zhou, Yang Ouyang, Qinyi Tao, Quan Li
Title: WordCraft: Scaffolding the Keyword Method for L2 Vocabulary Learning with Multimodal LLMs
Abstract:
Applying the keyword method for vocabulary memorization remains a significant challenge for L1 Chinese-L2 English learners. They frequently struggle to generate phonologically appropriate keywords, construct coherent associations, and create vivid mental imagery to aid long-term retention. Existing approaches, including fully automated keyword generation and outcome-oriented mnemonic aids, either compromise learner engagement or lack adequate process-oriented guidance. To address these limitations, we conducted a formative study with L1 Chinese-L2 English learners and educators (N=18), which revealed key difficulties and requirements in applying the keyword method to vocabulary learning. Building on these insights, we introduce WordCraft, a learner-centered interactive tool powered by Multimodal Large Language Models (MLLMs). WordCraft scaffolds the keyword method by guiding learners through keyword selection, association construction, and image formation, thereby enhancing the effectiveness of vocabulary memorization. Two user studies demonstrate that WordCraft not only preserves the generation effect but also achieves high levels of effectiveness and usability.

Authors:MH Farhadi, Ali Rabiee, Sima Ghafoori, Anna Cetera, Andrew Fisher, Reza Abiri
Title: End-to-end Optimization of Belief and Policy Learning in Shared Autonomy Paradigms
Abstract:
Shared autonomy systems require principled methods for inferring user intent and determining appropriate assistance levels. This is a central challenge in human-robot interaction, where systems must be successful while being mindful of user agency. Previous approaches relied on static blending ratios or separated goal inference from assistance arbitration, leading to suboptimal performance in unstructured environments. We introduce BRACE (Bayesian Reinforcement Assistance with Context Encoding), a novel framework that fine-tunes Bayesian intent inference and context-adaptive assistance through an architecture enabling end-to-end gradient flow between intent inference and assistance arbitration. Our pipeline conditions collaborative control policies on environmental context and complete goal probability distributions. We provide analysis showing (1) optimal assistance levels should decrease with goal uncertainty and increase with environmental constraint severity, and (2) integrating belief information into policy learning yields a quadratic expected regret advantage over sequential approaches. We validated our algorithm against SOTA methods (IDA, DQN) using a three-part evaluation progressively isolating distinct challenges of end-effector control: (1) core human-interaction dynamics in a 2D human-in-the-loop cursor task, (2) non-linear dynamics of a robotic arm, and (3) integrated manipulation under goal ambiguity and environmental constraints. We demonstrate improvements over SOTA, achieving 6.3% higher success rates and 41% increased path efficiency, and 36.3% success rate and 87% path efficiency improvement over unassisted control. Our results confirmed that integrated optimization is most beneficial in complex, goal-ambiguous scenarios, and is generalizable across robotic domains requiring goal-directed assistance, advancing the SOTA for adaptive shared autonomy.

Authors:Alexander Loth, Martin Kappes, Marc-Oliver Pahl
Title: Eroding the Truth-Default: A Causal Analysis of Human Susceptibility to Foundation Model Hallucinations and Disinformation in the Wild
Abstract:
As foundation models (FMs) approach human-level fluency, distinguishing synthetic from organic content has become a key challenge for Trustworthy Web Intelligence. This paper presents JudgeGPT and RogueGPT, a dual-axis framework that decouples "authenticity" from "attribution" to investigate the mechanisms of human susceptibility. Analyzing 918 evaluations across five FMs (including GPT-4 and Llama-2), we employ Structural Causal Models (SCMs) as a principal framework for formulating testable causal hypotheses about detection accuracy. Contrary to partisan narratives, we find that political orientation shows a negligible association with detection performance ($r=-0.10$). Instead, "fake news familiarity" emerges as a candidate mediator ($r=0.35$), suggesting that exposure may function as adversarial training for human discriminators. We identify a "fluency trap" where GPT-4 outputs (HumanMachineScore: 0.20) bypass Source Monitoring mechanisms, rendering them indistinguishable from human text. These findings suggest that "pre-bunking" interventions should target cognitive source monitoring rather than demographic segmentation to ensure trustworthy information ecosystems.

Authors:Jeanne Malécot, Hamed Rahimi, Jeanne Cattoni, Marie Samson, Mouad Abrini, Mahdi Khoramshahi, Maribel Pino, Mohamed Chetouani
Title: HARMONI: Multimodal Personalization of Multi-User Human-Robot Interactions with LLMs
Abstract:
Existing human-robot interaction systems often lack mechanisms for sustained personalization and dynamic adaptation in multi-user environments, limiting their effectiveness in real-world deployments. We present HARMONI, a multimodal personalization framework that leverages large language models to enable socially assistive robots to manage long-term multi-user interactions. The framework integrates four key modules: (i) a perception module that identifies active speakers and extracts multimodal input; (ii) a world modeling module that maintains representations of the environment and short-term conversational context; (iii) a user modeling module that updates long-term speaker-specific profiles; and (iv) a generation module that produces contextually grounded and ethically informed responses. Through extensive evaluation and ablation studies on four datasets, as well as a real-world scenario-driven user-study in a nursing home environment, we demonstrate that HARMONI supports robust speaker identification, online memory updating, and ethically aligned personalization, outperforming baseline LLM-driven approaches in user modeling accuracy, personalization quality, and user satisfaction.

Authors:Yuansong Xu, Yichao Zhu, Haokai Wang, Yuchen Wu, Yang Ouyang, Hanlu Li, Wenzhe Zhou, Xinyu Liu, Chang Jiang, Quan Li
Title: "Do I Trust the AI?" Towards Trustworthy AI-Assisted Diagnosis: Understanding User Perception in LLM-Supported Reasoning
Abstract:
Large language models (LLMs) have shown considerable potential in supporting medical diagnosis. However, their effective integration into clinical workflows is hindered by physicians' difficulties in perceiving and trusting LLM capabilities, which often results in miscalibrated trust. Existing model evaluations primarily emphasize standardized benchmarks and predefined tasks, offering limited insights into clinical reasoning practices. Moreover, research on human-AI collaboration has rarely examined physicians' perceptions of LLMs' clinical reasoning capability. In this work, we investigate how physicians perceive LLMs' capabilities in the clinical reasoning process. We designed clinical cases, collected the corresponding analyses, and obtained evaluations from physicians (N=37) to quantitatively represent their perceived LLM diagnostic capabilities. By comparing the perceived evaluations with benchmark performance, our study highlights the aspects of clinical reasoning that physicians value and underscores the limitations of benchmark-based evaluation. We further discuss the implications of opportunities for enhancing trustworthy collaboration between physicians and LLMs in LLM-supported clinical reasoning.

Authors:Yang Ouyang, Shenghan Gao, Ruichuan Wang, Hailiang Zhu, Yuheng Shao, Xiaoyu Gu, Quan Li
Title: CommSense: Facilitating Bias-Aware and Reflective Navigation of Online Comments for Rational Judgment
Abstract:
Online comments significantly influence users' judgments, yet their presentation, often determined by platform algorithms, can introduce biases, such as anchoring effects, which distort reasoning. While existing research emphasizes mitigating individual cognitive biases, the evolution of user judgments during comment engagement remains overlooked. This study investigates how presentation cues impact reasoning and explores interface design strategies to mitigate bias. Through a preliminary experiment (N=18) and a co-design workshop, we identified key challenges users face across a four-stage process and distilled four design requirements: pre-engagement framing, interactive organization, reflective prompts, and synthesis support. Based on these insights, we developed CommSense, an on-the-fly plugin that enhances user engagement with online comments by providing visual overviews and lightweight prompts to guide reasoning. A between-subject evaluation (N=24) demonstrates that CommSense improves bias awareness and reflective thinking, helping users produce more comprehensive, evidence-based rationales while maintaining high usability.

Authors:Yang Ouyang, Yuansong Xu, Chang Jiang, Yifan Jin, Haoran Jiang, Quan Li
Title: CaseMaster: Designing and Evaluating a Probe for Oral Case Presentation Training with LLM Assistance
Abstract:
Preparing an oral case presentation (OCP) is a crucial skill for medical students, requiring clear communication of patient information, clinical findings, and treatment plans. However, inconsistent student participation and limited guidance can make this task challenging. While Large Language Models (LLMs) can provide structured content to streamline the process, their role in facilitating skill development and supporting medical education integration remains underexplored. To address this, we conducted a formative study with six medical educators and developed CaseMaster, an interactive probe that leverages LLM-generated content tailored to medical education to help users enhance their OCP skills. The controlled study suggests CaseMaster has the potential to both improve presentation quality and reduce workload compared to traditional methods, an implication reinforced by expert feedback. We propose guidelines for educators to develop adaptive, user-centered training methods using LLMs, while considering the implications of integrating advanced technologies into medical education.

Authors:Shiwei Wu, Ziyao Gao, Zhendong He, Zongtan He, Zhupeng Huang, Xia Chen, Wei Zeng, Xiaojuan Ma, Zhenhui Peng
Title: InkIdeator: Supporting Chinese-Style Visual Design Ideation via AI-Infused Exploration of Chinese Paintings
Abstract:
Visual designers often seek inspiration from Chinese paintings when tasked with creating Chinese-style illustrations, posters, etc. Our formative study (N=10) reveals that during ideation, designers learn the cultural symbols, emotions, compositions, and styles in Chinese paintings but face challenges in searching, analyzing, and integrating these dimensions. This paper leverages multi-modal large models to annotate the value of each dimension in 16,315 Chinese paintings, built on which we propose InkIdeator, an ideation support system for Chinese-style visual designs. InkIdeator suggests cultural symbols associated with the task theme, provides dimensional keywords to help analyze Chinese paintings, and generates visual examples integrating user-selected keywords. Our within-subjects study (N=12) using a baseline system without extracted dimensional keywords, along with two extended use cases by Chinese painters, indicates InkIdeator's effectiveness in creative ideation support, helping users efficiently explore cultural dimensions in Chinese paintings and visualize their ideas. We discuss implications for supporting culture-related visual design ideation with generative AI.

Authors:Peinuan Qin, Chi-Lan Yang, Nattapat Boonprakong, Jingzhu Chen, Yugin Tan, Yi-Chieh Lee
Title: AI Personalization Paradox: Personalized AI Increases Superficial Engagement in Reading while Undermines Autonomy and Ownership in Writing
Abstract:
AI-assisted writing raises concerns about autonomy and ownership when benefiting writers. Personalization has been proposed as an effective solution while also risking writers' reliance on AI and behavior shifting. For better personalization design, existing studies rely on interaction and information solely within the writing phase; however, few studies have examined how reading behaviors can inform personalized writing. This study investigates the effects of integrating reading highlights for personalization on AI-assisted writing. A between-subjects study with 46 participants revealed that the personalization condition encouraged participants to produce more highlights. However, highlighting unexpectedly shifted from a sense-making strategy to an instrumental act of "feeding the AI," leading to significant reliance on AI and declines in writers' sense of autonomy, ownership, and self-credit. These findings indicate personalization risks in AI-assisted writing, emphasize the importance of personalization strategies, and provide design implications.

Authors:Anqi Wang, Zhengyi Li, Lan Luo, Xin Tong, Pan Hui
Title: Reflexa: Uncovering How LLM-Supported Reflection Scaffolding Reshapes Creativity in Creative Coding
Abstract:
Creative coding requires continuous translation between evolving concepts and computational artifacts, making reflection essential yet difficult to sustain. Creators often struggle to manage ambiguous intentions, emergent outputs, and complex code, limiting depth of exploration. This work examines how large language models (LLMs) can scaffold reflection not as isolated prompts, but as a system-level mechanism shaping creative regulation. From formative studies with eight expert creators, we derived reflection challenges and design principles that informed Reflexa, an integrated scaffold combining dialogic guidance, visualized version navigation, and iterative suggestion pathways. A within-subject study with 18 participants provides an exploratory mechanism validation, showing that structured reflection patterns mediate the link between AI interaction and creative outcomes. These reflection trajectories enhanced perceived controllability, broadened exploration, and improved originality and aesthetic quality. Our findings advance HCI understanding of reflection from LLM-assisted creative practices, and provide design strategies for building LLM-based creative tools that support richer human-AI co-creativity.

Authors:Xiaokang Lei, Ching Christie Pang, Yuyang Jiang, Xin Tong, Pan Hui
Title: Co-Designing Digital Humans for Online Learning: A Framework for Human-AI Pedagogical Integration
Abstract:
Artificial intelligence (AI) and large language models (LLMs) are reshaping education, with virtual avatars emerging as digital teachers capable of enhancing engagement, sustaining attention, and addressing instructor shortages. Aligned with the Sustainable Development Goals (SDGs) for equitable quality education, these technologies hold promise yet lack clear guidelines for effective design and implementation in online learning. To fill this gap, we introduce a framework specifying when, what, and how digital teachers should be integrated. Our study combines (1) a design space analysis of 87 works across AI, educational technology, design, and HCI, (2) a survey of 132 learners' practices and preferences, and (3) three co-design workshops with 18 experts from pedagogy, design, and AI. It provides actionable guidance for educators, designers, and HCI researchers, advancing opportunities to build more engaging, equitable, and effective online learning environments powered by digital teachers.

Authors:Mehrnoosh Sadat Shirvani, Jackie Crowley, Cher Peng, Jackie Liu, Thomas Chao, Suky Martinez, Laura Brandt, Ig-Jae Kim, Dongwook Yoon
Title: Cloning the Self for Mental Well-Being: A Framework for Designing Safe and Therapeutic Self-Clone Chatbots
Abstract:
As digital tools increasingly mediate mental health care, self-clone chatbots can offer a uniquely novel approach to intra-personal exploration and self-derived support. Trained to replicate users' conversational patterns, self-clones allow users to talk to themselves through their digital replicas. Despite the promises, these systems may carry risks around identity confusion, negative reinforcement, and blurred user agency. Through interviews with 16 mental health professionals and 6 general users, we aim to uncover tensions and design opportunities in this emerging space to guide responsible self-clone design. Our analysis produces a design framework organized around three priorities: (1) defining goals and grounding the approach in existing therapeutic models, (2) design dimensions including the self-clone persona and user-clone relationship dynamics, and (3) considerations for minimizing potential emotional and ethical harms. This framework contributes an interdisciplinary foundation for designing self-clone chatbots as AI-mediated self-interaction tools that are emotionally and ethically attuned in mental health contexts.

Authors:Yi-Chieh Lee, Junti Zhang, Tianqi Song, Yugin Tan
Title: Conversational AI for Social Good (CAI4SG): An Overview of Emerging Trends, Applications, and Challenges
Abstract:
The integration of Conversational Agents (CAs) into daily life offers opportunities to tackle global challenges, leading to the emergence of Conversational AI for Social Good (CAI4SG). This paper examines the advancements of CAI4SG using a role-based framework that categorizes systems according to their AI autonomy and emotional engagement. This framework emphasizes the importance of considering the role of CAs in social good contexts, such as serving as empathetic supporters in mental health or functioning as assistants for accessibility. Additionally, exploring the deployment of CAs in various roles raises unique challenges, including algorithmic bias, data privacy, and potential socio-technical harms. These issues can differ based on the CA's role and level of engagement. This paper provides an overview of the current landscape, offering a role-based understanding that can guide future research and design aimed at the equitable, ethical, and effective development of CAI4SG.

Authors:Taufiq Daryanto, Xiaohan Ding, Kaike Ping, Lance T. Wilhelm, Yan Chen, Chris Brown, Eugenia H. Rho
Title: Human-Human-AI Triadic Programming: Uncovering the Role of AI Agent and the Value of Human Partner in Collaborative Learning
Abstract:
As AI assistance becomes embedded in programming practice, researchers have increasingly examined how these systems help learners generate code and work more efficiently. However, these studies often position AI as a replacement for human collaboration and overlook the social and learning-oriented aspects that emerge in collaborative programming. Our work introduces human-human-AI (HHAI) triadic programming, where an AI agent serves as an additional collaborator rather than a substitute for a human partner. Through a within-subjects study with 20 participants, we show that triadic collaboration enhances collaborative learning and social presence compared to the dyadic human-AI (HAI) baseline. In the triadic HHAI conditions, participants relied significantly less on AI-generated code in their work. This effect was strongest in the HHAI-shared condition, where participants had an increased sense of responsibility to understand AI suggestions before applying them. These findings demonstrate how triadic settings activate socially shared regulation of learning by making AI use visible and accountable to a human peer, suggesting that AI systems that augment rather than automate peer collaboration can better preserve the learning processes that collaborative programming relies on.

Authors:Savvas Petridis, Michael Xieyang Liu, Alexander J. Fiannaca, Carrie J. Cai, Michael Terry
Title: Compass vs Railway Tracks: Unpacking User Mental Models for Communicating Long-Horizon Work to Humans vs. AI
Abstract:
As AI systems (foundation models, agentic systems) grow increasingly capable of operating for minutes or hours at a time, users' prompts are transforming into highly detailed, elaborate specifications for the AI to autonomously work on. While interactive prompting has been extensively studied, comparatively less is known about how people communicate specifications for these types of long-horizon tasks. In a qualitative study in which 16 professionals drafted specifications for both a human colleague and an AI, we found a core divergence in how people specified problems to people versus AI: people approached communication with humans as providing a "compass", offering high-level intent to encourage flexible exploration. In contrast, communication with AI resembled painstakingly laying down "railway tracks": rigid, exhaustive instructions to minimize ambiguity and deviation. This strategy was driven by a perception that current AI has limited ability to infer intent, prioritize, and make judgments on its own. When envisioning an idealAI collaborator, users expressed a desire for a hybrid between current AI and human colleagues: a collaborator that blends AI's efficiency and large context window with the critical thinking and agency of a human colleague. We discuss design implications for future AI systems, proposing that they align on outcomes through generated rough drafts, verify feasibility via end-to-end "test runs," and monitor execution through intelligent check-ins, ultimately transforming AI from a passive instruction-follower into a reliable collaborator for ambiguous, long-horizon problems.

Authors:Yu Yang, Ig-Jae Kim, Dongwook Yoon
Title: PASTA: A Scalable Framework for Multi-Policy AI Compliance Evaluation
Abstract:
AI compliance is becoming increasingly critical as AI systems grow more powerful and pervasive. Yet the rapid expansion of AI policies creates substantial burdens for resource-constrained practitioners lacking policy expertise. Existing approaches typically address one policy at a time, making multi-policy compliance costly. We present PASTA, a scalable compliance tool integrating four innovations: (1) a comprehensive model-card format supporting descriptive inputs across development stages; (2) a policy normalization scheme; (3) an efficient LLM-powered pairwise evaluation engine with cost-saving strategies; and (4) an interface delivering interpretable evaluations via compliance heatmaps and actionable recommendations. Expert evaluation shows PASTA's judgments closely align with human experts ($ρ\geq .626$). The system evaluates five major policies in under two minutes at approximately \$3. A user study (N = 12) confirms practitioners found outputs easy-to-understand and actionable, introducing a novel framework for scalable automated AI governance.

Authors:Saber Zerhoudi, Michael Granitzer
Title: From SERPs to Agents: A Platform for Comparative Studies of Information Interaction
Abstract:
The diversification of information access systems, from RAG to autonomous agents, creates a critical need for comparative user studies. However, the technical overhead to deploy and manage these distinct systems is a major barrier. We present UXLab, an open-source system for web-based user studies that addresses this challenge. Its core is a web-based dashboard enabling the complete, no-code configuration of complex experimental designs. Researchers can visually manage the full study, from recruitment to comparing backends like traditional search, vector databases, and LLMs. We demonstrate UXLab's value via a micro case study comparing user behavior with RAG versus an autonomous agent. UXLab allows researchers to focus on experimental design and analysis, supporting future multi-modal interaction research.

Authors:Saber Zerhoudi, Michael Granitzer
Title: In-Browser Agents for Search Assistance
Abstract:
A fundamental tension exists between the demand for sophisticated AI assistance in web search and the need for user data privacy. Current centralized models require users to transmit sensitive browsing data to external services, which limits user control. In this paper, we present a browser extension that provides a viable in-browser alternative. We introduce a hybrid architecture that functions entirely on the client side, combining two components: (1) an adaptive probabilistic model that learns a user's behavioral policy from direct feedback, and (2) a Small Language Model (SLM), running in the browser, which is grounded by the probabilistic model to generate context-aware suggestions. To evaluate this approach, we conducted a three-week longitudinal user study with 18 participants. Our results show that this privacy-preserving approach is highly effective at adapting to individual user behavior, leading to measurably improved search efficiency. This work demonstrates that sophisticated AI assistance is achievable without compromising user privacy or data control.

Authors:Shuyu Zhang, Yujie Liu, Xinru Wang, Cheng Zhang, Yanmin Zhu, Bin Li
Title: DarwinTOD: LLM Driven Lifelong Self Evolution for Task Oriented Dialog Systems
Abstract:
Traditional task-oriented dialog systems are unable to evolve from ongoing interactions or adapt to new domains after deployment, that is a critical limitation in real-world dynamic environments. Continual learning approaches depend on episodic retraining with human curated data, failing to achieve autonomy lifelong improvement. While evolutionary computation and LLM driven self improvement offer promising mechanisms for dialog optimization, they lack a unified framework for holistic, iterative strategy refinement. To bridge this gap, we propose DarwinTOD, a lifelong self evolving dialog framework that systematically integrates these two paradigms, enabling continuous strategy optimization from a zero-shot base without task specific fine-tuning. DarwinTOD maintains an Evolvable Strategy Bank and operates through a dual-loop process: online multi-agent dialog execution with peer critique, and offline structured evolutionary operations that refine the strategy bank using accumulated feedback. This closed-loop design enables autonomous continuous improvement without human intervention. Extensive experiments show that DarwinTOD surpasses previous state-of-the-art methods and exhibits continuous performance gains throughout evolution. Our work provides a novel framework for building dialog systems with lifelong self evolution capabilities.

Authors:Tianwang Jia, Xiaoqing Chen, Dongrui Wu
Title: SAFE: Secure and Accurate Federated Learning for Privacy-Preserving Brain-Computer Interfaces
Abstract:
Electroencephalogram (EEG)-based brain-computer interfaces (BCIs) are widely adopted due to their efficiency and portability; however, their decoding algorithms still face multiple challenges, including inadequate generalization, adversarial vulnerability, and privacy leakage. This paper proposes Secure and Accurate FEderated learning (SAFE), a federated learning-based approach that protects user privacy by keeping data local during model training. SAFE employs local batch-specific normalization to mitigate cross-subject feature distribution shifts and hence improves model generalization. It further enhances adversarial robustness by introducing perturbations in both the input space and the parameter space through federated adversarial training and adversarial weight perturbation. Experiments on five EEG datasets from motor imagery (MI) and event-related potential (ERP) BCI paradigms demonstrated that SAFE consistently outperformed 14 state-of-the-art approaches in both decoding accuracy and adversarial robustness, while ensuring privacy protection. Notably, it even outperformed centralized training approaches that do not consider privacy protection at all. To our knowledge, SAFE is the first algorithm to simultaneously achieve high decoding accuracy, strong adversarial robustness, and reliable privacy protection without using any calibration data from the target subject, making it highly desirable for real-world BCIs.

Authors:Guanyu Chen, Chenxiao Yu, Xiyang Hu
Title: Value-Action Alignment in Large Language Models under Privacy-Prosocial Conflict
Abstract:
Large language models (LLMs) are increasingly used to simulate decision-making tasks involving personal data sharing, where privacy concerns and prosocial motivations can push choices in opposite directions. Existing evaluations often measure privacy-related attitudes or sharing intentions in isolation, which makes it difficult to determine whether a model's expressed values jointly predict its downstream data-sharing actions as in real human behaviors. We introduce a context-based assessment protocol that sequentially administers standardized questionnaires for privacy attitudes, prosocialness, and acceptance of data sharing within a bounded, history-carrying session. To evaluate value-action alignments under competing attitudes, we use multi-group structural equation modeling (MGSEM) to identify relations from privacy concerns and prosocialness to data sharing. We propose Value-Action Alignment Rate (VAAR), a human-referenced directional agreement metric that aggregates path-level evidence for expected signs. Across multiple LLMs, we observe stable but model-specific Privacy-PSA-AoDS profiles, and substantial heterogeneity in value-action alignment.

Authors:Bijean Ghafouri, Eun Cheol Choi, Priyanka Dey, Emilio Ferrara
Title: Measuring Human Preferences in RLHF is a Social Science Problem
Abstract:
RLHF assumes that annotation responses reflect genuine human preferences. We argue this assumption warrants systematic examination, and that behavioral science offers frameworks that bring clarity to when it holds and when it breaks down. Behavioral scientists have documented for sixty years that people routinely produce responses without holding genuine opinions, construct preferences on the spot based on contextual cues, and interpret identical questions differently. These phenomena are pervasive for precisely the value-laden judgments that matter most for alignment, yet this literature has not yet been systematically integrated into ML practice. We argue that the ML community must treat measurement validity as logically prior to preference aggregation. Specifically, we contend that measuring human preferences in RLHF is a social science problem. We present a taxonomy distinguishing genuine preferences from non-attitudes, constructed preferences, and measurement artifacts, along with diagnostic approaches for detecting each. This framework has two important implications. First, it raises the question of whether current RLHF practice may be systematically modeling noise as signal and elicitation artifacts as human values. Second, it provides a path forward by suggesting diagnostic tools that can distinguish valid preferences from artifacts before they enter the training pipeline.

Authors:Yibo Meng, Guangrui Fan, Bingyi Liu, Yingfangzhong Sun, Ruiqi Chen, Haipeng Mi
Title: Engagement Is Not Transfer: A Withdrawal Study of a Consumer Social Robot with Autistic Children at Home
Abstract:
This study examines whether engagement with social robots translates into improved human-directed social abilities in autistic children. We conducted an 8-week home-based randomized controlled trial with 40 children aged 5--9 using a commercial social robot (Qrobot). Families were assigned to either continued robot access or robot withdrawal. Quantitative measures and caregiver interviews assessed anxiety, social motivation, emotion inference, and empathy. Results showed that continued robot access significantly reduced anxiety, confirming strong affective benefits and high usability. However, children in the withdrawal group demonstrated greater improvements in social motivation, emotion understanding, and empathic behaviors toward caregivers and peers. Qualitative findings revealed a "handoff versus siloing" pattern: withdrawal promoted reorientation toward human social interaction, while continued access concentrated engagement within the child--robot dyad and limited transfer to real-world contexts. We interpret these results as evidence that high engagement does not guarantee social transfer.

Authors:Hita Kambhamettu, Will Crichton, Sean Welleck, Harrison Goldstein, Andrew Head
Title: Making Written Theorems Explorable by Grounding Them in Formal Representations
Abstract:
LLM-generated explanations can make technical content more accessible, but there is a ceiling on what they can support interactively. Because LLM outputs are static text, they cannot be executed or stepped through. We argue that grounding explanations in a formalized representation enables interactive affordances beyond what static text supports. We instantiate this idea for mathematical proof comprehension with explorable theorems, a system that uses LLMs to translate a theorem and its written proof into Lean, a programming language for machine-checked proofs, and links the written proof with the Lean code. Readers can work through the proof at a step-level granularity, test custom examples or counterexamples, and trace the logical dependencies bridging each step. Each worked-out step is produced by executing the Lean proof on that example and extracting its intermediate state. A user study ($n = 16$) shows potential advantages of this approach: in a proof-reading task, participants who had access to the provided explorability features gave better, more correct, and more detailed answers to comprehension questions, demonstrating a stronger overall understanding of the underlying mathematics.

Authors:Celia Chen, Alex Leitch, Scotty Beland, Ingo Burghardt, William Conway, Rajesh Kumar Gnanasekaran, Marilyn Harbert, Emily Klein, Jennifer Golbeck
Title: Red Flags and Cherry Picking: Reading The Scientific Blackpill Wiki
Abstract:
Incels are an online community of men who share a belief in extreme misogyny, the glorification of violence, and biological essentialism. They refer to their core ideology as "The Blackpill", a belief that physical attraction is the only path to romantic success and that women are only attracted to one very specific, hypermasculine archetype. This is not only a belief system; incels believe their ideology grounded in hard science. The research that incels use as evidence of their belief system is collected in an extensive online document, the Scientific Blackpill wiki page. In this research, we analyze the claims made on the wiki against the research cited to assess how the wiki authors are using or misusing science in support of their ideology. We find that the page largely cites legitimate science and describes it partly or mostly accurately. However, in discussing it, the results are often overgeneralized, stripped of context, or otherwise distorted to support the preexisting incel viewpoint. This echoes previous findings about motivated reasoning and borrowing scientific legitimacy in other misinformation and conspiracy-minded ideologies. We discuss the implications this has for understanding online radicalization and information quality.

Authors:Mingda Han, Huanqi Yang, Zehua Sun, Wenhao Li, Yanni Yang, Guoming Zhang, Yetong Cao, Weitao Xu, Pengfei Hu
Title: RAGent: Physics-Aware Agentic Reasoning for Training-Free mmWave Human Activity Recognition
Abstract:
Millimeter-wave (mmWave) radar enables privacy-preserving human activity recognition (HAR), yet real-world deployment remains hindered by costly annotation and poor transferability under domain shift. Although prior efforts partially alleviate these challenges, most still require retraining or adaptation for each new deployment setting. This keeps mmWave HAR in a repeated collect-tune-redeploy cycle, making scalable real-world deployment difficult. In this paper, we present RAGent, a deployment-time training-free framework for mmWave HAR that reformulates recognition as evidence-grounded inference over reusable radar knowledge rather than deployment-specific model optimization. Offline, RAGent constructs a reusable radar knowledge base through constrained cross-modal supervision, where a Vision-Language Model (VLM) transfers activity semantics from synchronized videos to paired radar segments without manual radar annotation. At deployment time, RAGent recognizes activities from radar alone by retrieving physically comparable precedents in an explicit kinematic space and resolving the final label through structured multi-role reasoning. The reasoning protocol is further refined offline through zero-gradient self-evolution. Extensive experiments on a self-collected dataset show that RAGent achieves 93.39% accuracy without per-domain retraining or target-domain adaptation, while generalizing robustly across domains.

Authors:Mingda Han, Huanqi Yang, Chaoqun Li, Wenhao Li, Guoming Zhang, Yanni Yang, Yetong Cao, Weitao Xu, Pengfei Hu
Title: VoxAnchor: Grounding Speech Authenticity in Throat Vibration via mmWave Radar
Abstract:
Rapid advances in speech synthesis and audio editing have made realistic forgeries increasingly accessible, yet existing detection methods remain vulnerable to tampering or depend on visual/wearable sensors. In this paper, we present VoxAnchor, a system that physically grounds audio authentication in vocal dynamics by leveraging the inherent coherence between speech acoustics and radar-sensed throat vibrations. VoxAnchor uses contactless millimeter-wave radar to capture fine-grained throat vibrations that are tightly coupled with human speech production, establishing a hard-to-forge anchor rooted in human physiology. The design comprises three main components: (1) a cross-modal frame-work that uses modality-specific encoders and contrastive learning to detect subtle mismatches at word granularity; (2) a phase-aware pipeline that extracts physically consistent, temporally faithful throat vibrations; and (3) a dual-stage strategy that combines signal-level onset detection and semantic-level coherence to align asynchronous radar and audio streams. Unlike liveness detection, which only confirms whether speech occurred, VoxAnchor verifies what was spoken through word-level content consistency, exposing localized edits that preserve identity and global authenticity cues. Extensive evaluations show that VoxAnchor achieves robust, fine-grained detection across diverse forgeries (editing, splicing, replay, deepfake) and conditions, with an overall EER of 0.017, low latency, and modest computational cost.

Authors:Ching Christie Pang, Xuetong Wang, Yuk Hang Tsui, Pan Hui
Title: The Decline of Online Knowledge Communities: Obstacles, Workarounds, and Sustainability
Abstract:
Online knowledge communities (OKC) such as Stack Exchange, Reddit, and Zhihu have long functioned as socio technical infrastructures for collective problem solving. The rapid adoption of Generative AI (GenAI) introduces both complementarity and substitution. Large language models (LLMs) offer faster, more accessible drafts, yet divert traffic and contributions away from OKC that also provided their training data. To understand how communities adapt under this systemic shock, we report a mixed-methods study combining an online survey (N=217) and interviews with 11 current users. Findings show that while users increasingly rely on AI for convenience, they still turn to OKC for complex, ambiguous, or trust sensitive questions. Participants express polarized attitudes toward AI, reflecting divergent hopes and uncertainties about its role. Yet across perspectives, sustaining sociability, empathy, and reciprocity emerges as essential for community resilience. We argue that GenAI's impact constitutes not a terminal decline but a design challenge: to reimagine socio-technical complementarities that balance automation's efficiency with human judgment, trust, and collective stewardship in the evolving knowledge commons. To decline or sustain, it is now or never to take action.

Authors:Jiamin Zheng, Yue Deng, Jessica Chen, Shujun Li, Yixin Zou, Jingjie Li
Title: Characterizing Scam-Driven Human Trafficking Across Chinese Borders and Online Community Responses on RedNote
Abstract:
A new form of human trafficking has emerged across Chinese borders, where individuals are lured to Southeast Asia with fraudulent job offers and then coerced into operating online scams. Despite its massive economic and human toll, this scam-driven trafficking remains underexplored in academic research. Through qualitative analysis of 158 RedNote posts, we examined how Chinese online communities respond to this threat. Our findings reveal that perpetrators exploit cultural ties to recruit victims for cybercriminal roles within self-sustaining compounds, using sophisticated manipulation tactics. Survivors face serious reintegration barriers, including family rejection, as the cultural values that enable trafficking also hinder their recovery. While communities present protective strategies, efforts are complicated by doubts about the reliability of support and cross-border coordination. We discuss key implications for prevention, platform governance, and international cooperation against scam-driven trafficking. Warning: This paper contains descriptions of physical, psychological, and sexual abuse.

Authors:Wenhao Yang, Runzhi He, Minghui Zhou
Title: Beyond Banning AI: A First Look at GenAI Governance in Open Source Software Communities
Abstract:
Generative AI (GenAI) is playing an increasingly important role in open source software (OSS). Beyond completing code and documentation, GenAI is increasingly involved in issues, pull requests, code reviews, and security reports. Yet, cheaper generation does not mean cheaper review - and the resulting maintenance burden has pushed OSS projects to experiment with GenAI-specific rules in contribution guidelines, security policies, and repository instructions, even including a total ban on AI-assisted contributions. However, governing GenAI in OSS is far more than a ban-or-not question. The responses remain scattered, with neither a shared governance framework in practice nor a systematic understanding in research. Therefore, in this paper, we conduct a multi-stage analysis on various qualitative materials related to GenAI governance retrieved from 67 highly visible OSS projects. Our analysis identifies recurring concerns across contribution workflows, derives three governance orientations, and maps out 12 governance strategies and their implementation patterns. We show that governing GenAI in OSS extends well beyond banning - it requires coordinated responses across accountability, verification, review capacity, code provenance, and platform infrastructure. Overall, our work distills dispersed community practices into a structured overview, providing a conceptual baseline for researchers and a practical reference for maintainers and platform designers.

Authors:Xi Lu, Di Hu, An T. Nguyen, Brad Morse, Lisa M. Schilling, Kai Zheng, Michelle S. Keller, Lucila Ohno-Machado, Yunan Chen
Title: We Need Granular Sharing of De-Identified Data-But Will Patients Engage? Investigating Health System Leaders' and Patients' Perspectives on A Patient-Controlled Data-Sharing Platform
Abstract:
Patient-controlled data-sharing systems are increasingly promoted as a way to empower patients with greater autonomy over their health data. Yet it remains unclear how different stakeholders, especially patients and health system leaders, perceive the benefits and challenges of enabling granular control over the sharing of de-identified medical data for research. To address this gap, we developed a high-fidelity prototype of a patient-controlled, web-based consent platform and conducted a two-phase mixed-methods study:semi-structured interviews with 16 health system leaders and a survey with 523 patient participants. While both groups appreciated the potential of such a platform to enhance transparency and autonomy, their views diverged in meaningful ways. Leaders viewed transparency and granular control through the lens of informed consent and institutional ethics, whereas patients interpreted these factors as safeguards against potential risks and uncertainties. Our findings underscore critical tensions such as individual control and research integrity. We offer design implications for building trustworthy, context-aware systems that support flexible granularity, provide ongoing benefit-centered transparency, and adapt to diverse literacy and privacy needs.

Authors:Ran Zhang, Yucong Lin, Zhaoli Su, Bowen Liu, Danni Ai, Tianyu Fu, Deqiang Xiao, Jingfan Fan, Yuanyuan Wang, Mingwei Gao, Yuwan Hu, Shuya Gao, Jingtao Li, Jian Yang, Hong Song, Hongliang Sun
Title: Ran Score: a LLM-based Evaluation Score for Radiology Report Generation
Abstract:
Chest X-ray report generation and automated evaluation are limited by poor recognition of low-prevalence abnormalities and inadequate handling of clinically important language, including negation and ambiguity. We develop a clinician-guided framework combining human expertise and large language models for multi-label finding extraction from free-text chest X-ray reports and use it to define Ran Score, a finding-level metric for report evaluation. Using three non-overlapping MIMIC-CXR-EN cohorts from a public chest X-ray dataset and an independent ChestX-CN validation cohort, we optimize prompts, establish radiologist-derived reference labels and evaluate report generation models. The optimized framework improves the macro-averaged score from 0.753 to 0.956 on the MIMIC-CXR-EN development cohort, exceeds the CheXbert benchmark by 15.7 percentage points on directly comparable labels, and shows robust generalization on the ChestX-CN validation cohort. Here we show that clinician-guided prompt optimization improves agreement with a radiologist-derived reference standard and that Ran Score enables finding-level evaluation of report fidelity, particularly for low-prevalence abnormalities.

Authors:Zefei Xie, Yuhan Guo, Kai Xu
Title: AwesomeLit: Towards Hypothesis Generation with Agent-Supported Literature Research
Abstract:
There are different goals for literature research, from understanding an unfamiliar topic to generate hypothesis for the next research project. The nature of literature research also varies according to user's familiarity level of the topic. For inexperienced researchers, identifying gaps in the existing literature and generating feasible hypothesis are crucial but challenging. While general ``deep research'' tools can be used, they are not designed for such use case, thus often not effective. In addition, the ``black box" nature and hallucination of Large Language Models (LLMs) often lead to distrust. In this paper, we introduce a human-agent collaborative visualization system AwesomeLit to address this need. It has several novel features: a transparent user-steerable agentic workflow; a dynamically generated query exploring tree, visualizing the exploration path and provenance; and a semantic similarity view, depicting the relationships between papers. It enables users to transition from general intentions to detailed research topics. Finally, a qualitative study involving several early researchers showed that AwesomeLit is effective in helping users explore unfamiliar topics, identify promising research directions, and improve confidence in research results.

Authors:Weitong Cai, Hang Zhang, Yukai Huang, Shitong Sun, Jiankang Deng, Songcen Xu, Jifei Song, Zhensong Zhang
Title: Color When It Counts: Grayscale-Guided Online Triggering for Always-On Streaming Video Sensing
Abstract:
Always-on sensing is essential for next-generation edge/wearable AI systems, yet continuous high-fidelity RGB video capture remains prohibitively expensive for resource-constrained mobile and edge platforms. We present a new paradigm for efficient streaming video understanding: grayscale-always, color-on-demand. Through preliminary studies, we discover that color is not always necessary. Sparse RGB frames suffice for comparable performance when temporal structure is preserved via continuous grayscale streams. Building on this insight, we propose ColorTrigger, an online training-free trigger that selectively activates color capture based on windowed grayscale affinity analysis. Designed for real-time edge deployment, ColorTrigger uses lightweight quadratic programming to detect chromatic redundancy causally, coupled with credit-budgeted control and dynamic token routing to jointly reduce sensing and inference costs. On streaming video understanding benchmarks, ColorTrigger achieves 91.6% of full-color baseline performance while using only 8.1% RGB frames, demonstrating substantial color redundancy in natural videos and enabling practical always-on video sensing on resource-constrained devices.

Authors:Vasco Xu, Brian Chen, Eric J. Gonzalez, Andrea Colaço, Henry Hoffmann, Mar Gonzalez-Franco, Karan Ahuja
Title: SurfaceXR: Fusing Smartwatch IMUs and Egocentric Hand Pose for Seamless Surface Interactions
Abstract:
Mid-air gestures in Extended Reality (XR) often cause fatigue and imprecision. Surface-based interactions offer improved accuracy and comfort, but current egocentric vision methods struggle due to hand tracking challenges and unreliable surface plane estimation. We introduce SurfaceXR, a sensor fusion approach combining headset-based hand tracking with smartwatch IMU data to enable robust inputs on everyday surfaces. Our insight is that these modalities are complementary: hand tracking provides 3D positional data while IMUs capture high-frequency motion. A 21-participant study validates SurfaceXR's effectiveness for touch tracking and 8-class gesture recognition, demonstrating significant improvements over single-modality approaches.

Authors:Maryam Cheema, Sina Elahimanesh, Pooyan Fazli, Hasti Seifi
Title: ViDscribe: Multimodal AI for Customizing Audio Description and Question Answering in Online Videos
Abstract:
Advances in multimodal large language models enable automatic video narration and question answering (VQA), offering scalable alternatives to labor-intensive, human-authored audio descriptions (ADs) for blind and low vision (BLV) viewers. However, prior AI-driven AD systems rarely adapt to the diverse needs and preferences of BLV individuals across videos and are typically evaluated in controlled, single-session settings. We present ViDscribe, a web-based platform that integrates AI-generated ADs with six types of user customizations and a conversational VQA interface for YouTube videos. Through a longitudinal, in-the-wild study with eight BLV participants, we examine how users engage with customization and VQA features over time. Our results show sustained engagement with both features and that customized ADs improve effectiveness, enjoyment, and immersion compared to default ADs, highlighting the value of personalized, interactive video access for BLV users.

Authors:Ilya Ilyankou, Stefano Cavazzi, James Haworth
Title: The Scenic Route to Deception: Dark Patterns and Explainability Pitfalls in Conversational Navigation
Abstract:
As pedestrian navigation increasingly experiments with Generative AI, and in particular Large Language Models, the nature of routing risks transforming from a verifiable geometric task into an opaque, persuasive dialogue. While conversational interfaces promise personalisation, they introduce risks of manipulation and misplaced trust. We categorise these risks using a 2x2 framework based on intent and origin, distinguishing between intentional manipulations (dark patterns) and unintended harms (explainability pitfalls). We propose seamful design strategies to mitigate these harms. We suggest that one robust way to operationalise trustworthy conversational navigation is through neuro-symbolic architecture, where verifiable pathfinding algorithms ground GenAI's persuasive capabilities, ensuring systems explain their limitations and incentives as clearly as they explain the route.

Authors:Hiruni Kegalle, Flora D. Salim, Mark Sanderson, Jeffrey Chan, Danula Hettiachchi
Title: Applying Value Sensitive Design to Location-Based Services: Designing for Shared Spaces and Local Conditions
Abstract:
Location-Based Services (LBS) such as ride-sharing, accommodation, food delivery, and location-driven social media platforms entangle digital systems with physical spaces, thereby generating impacts that extend beyond users to others who share the same environments. Existing design approaches struggle to address the dual challenge of value tensions that arise in shared physical spaces and the locality-specific contexts in which LBS operate. To respond, we introduce Location-Aware Value Sensitive Design (LA-VSD), a domain-specific adaptation of VSD tailored to the distinctive characteristics of LBS. LA-VSD guides designers through three heuristics to help (1) identify and prioritise stakeholders through local space-sharing scenarios, (2) adapt empirical methods to capture values and tensions in context, and (3) support value-aligned interactions across both digital and physical layers of the service. Through a case study of e-scooter sharing in Melbourne, Australia, we demonstrate how LA-VSD enables more grounded, context-aware, and actionable design of LBS.

Authors:Ziqi Pan, Ziqi Liu, Jinhan Zhang, Zeyu Huang, Xiaojuan Ma
Title: Moving Phones, Active Peers: Exploring the Effect of Animated Phones as Facilitators in In-Person Group Discussion
Abstract:
In today's in-person group discussions, smartphones are integrated as intelligent workstations; yet given their co-presence in such face-to-face interactions, whether and how they may enhance people's behavioral engagement with others remains underexplored. This work investigates how animating personal smartphones to move expressively, without compromising regular functions, can transform them into active embodied facilitators for co-located group interaction. In the four-stranger small-group discussion setting, guided by Tuckman's group-development theory, we conducted a design workshop (n=12) to identify problematic group-work circumstances and design expressive, attention-efficient animated phone facilitations. Subsequently, we developed AnimaStand, a movement-enabled phone stand that animates phones to deliver group facilitation cues according to conversation dynamics. In a between-subjects Wizard-of-Oz study (n=56) with four-stranger group discussions, where everyone's phone was on an AnimaStand, the facilitations re-engaged inactive members, enhancing group dynamics, task operation performance, and relationships. We finally discuss prospects for more adaptive and generalizable animated device personal facilitation.

Authors:Ruijia Chen, Yuheng Wu, Charlie Houseago, Filipe Gaspar, Filippo Aleotti, Dorian Gálvez-López, Oliver Johnston, Diego Mazala, Guillermo Garcia-Hernando, Maryam Bandukda, Gabriel Brostow, Jessica Van Brummelen
Title: NaviNote: Enabling In-situ Spatial Annotation Authoring to Support Exploration and Navigation for Blind and Low Vision People
Abstract:
GPS and smartphones enable users to place location-based annotations, capturing rich environmental context. Previous research demonstrates that blind and low vision (BLV) people can use annotations to explore unfamiliar areas. However, current commercial systems allowing BLV users to create annotations have never been evaluated, and current GPS-based systems can deviate several meters. Motivated by high-accuracy visual positioning technology, we first conducted a formative study with 24 BLV participants to envision a more accurate and inclusive annotation system. Surprisingly, many participants viewed the high-accuracy technology not just as an annotation system but also as a tool for precise last-few-meters navigation. Guided by participant feedback, we developed NaviNote, which combines vision-based high-precision localization with an agentic architecture to enable voice-based annotation authoring and navigation. Evaluating NaviNote with 18 BLV participants showed that it significantly improved navigation performance and supported users in understanding and annotating their surroundings. Based on these findings, we discuss design considerations for future accessible annotation authoring systems.

Authors:Ching Christie Pang, Yi Gao, Xuetong Wang, Pan Hui
Title: The AI Amplifier Effect: Defining Human-AI Intimacy and Romantic Relationships with Conversational AI
Abstract:
What does it mean to fall in love with something we know is virtual? The proliferation of conversational AI enables users to create customizable companions, fostering new intimate relationships that, while virtual, are perceived as authentic. However, public understanding of these bonds is limited, and platform policies regarding these interactions remain inconsistent. There is a pressing need for further HCI research to investigate: (a) the design affordances in AI that construct bonds and a sense of intimacy, (b) how such long-term engagement impacts users' real lives, and (c) how to balance user autonomy with platform regulation in the design of these systems without compromising users' well-being and experiences. This paper takes a step toward addressing these goals by providing a concrete definition of human AI intimacy based on in depth interviews with 30 users engaged in romantic relationships with AI companions. We elucidate the complexities of these relationships, from their formation to sustainability, and identify key features of the bonds formed. Notably, we introduce the AI Amplifier Effect, where the AI serves as a medium that intensifies the user's existing emotional state, leading to divergent positive, neutral, and negative impacts. We argue that designing for emotion must extend beyond technical affordances to encompass the essence of human affection. This paper's contributions aim to initiate a conversation and guide future research on human AI relationships within the HCI community.

Authors:Selin Choi, Dooyoung Kim, Taewook Ha, Seonji Kim, Woontack Woo
Title: Task Breakpoint Generation using Origin-Centric Graph in Virtual Reality Recordings for Adaptive Playback
Abstract:
We propose a method for generating task breakpoints based on an Origin-Centric Graph (OCG) to segment goal-oriented activity recordings into task units for adaptive playback in Virtual Reality (VR) environments. With the development of Augmented Reality (AR)/VR head-mounted displays (HMDs), research on adaptive tutorials and authoring tools has become active, but existing task segmentation methods mainly rely on manual annotation or are restricted to 2D video which limits their applicability to 3D VR contexts. In our approach, assembly scenarios with clearly defined task boundaries are recorded using a structured spatio-temporal scene graph (STSG), and the OCG is employed to track changes in the central object and the formation of new groups, thereby generating task breakpoints automatically. A user study collected user-perceived task breakpoints to establish ground truth (GT), and comparison with the algorithm-detected breakpoints demonstrated high agreement and confirmed accuracy in supporting adaptive playback. The proposed task segmentation method provides a foundation for dynamically adjusting VR playback according to user proficiency and progress, with potential for extension into automatic timeline segmentation systems for diverse VR recordings.

Authors:Nikita Soni, Dhruv Vijay Kunjadiya, Pratham Piyush Shah, Dikshya Mohanty, H. Andrew Schwartz, Niranjan Balasubramanian
Title: Addressing the Ecological Fallacy in Larger LMs with Human Context
Abstract:
Language model training and inference ignore a fundamental linguistic fact -- there is a dependence between multiple sequences of text written by the same person. Prior work has shown that addressing this form of \textit{ecological fallacy} can greatly improve the performance of multiple smaller (~124M) GPT-based models. In this work, we ask if addressing the ecological fallacy by modeling the author's language context with a specific LM task (called HuLM) can provide similar benefits for a larger-scale model, an 8B Llama model. To this end, we explore variants that process an author's language in the context of their other temporally ordered texts. We study the effect of pre-training with this author context using the HuLM objective, as well as using it during fine-tuning with author context (\textit{HuFT:Human-aware Fine-Tuning}). Empirical comparisons show that addressing the ecological fallacy during fine-tuning alone using QLoRA improves the performance of the larger 8B model over standard fine-tuning. Additionally, QLoRA-based continued HuLM pre-training results in a human-aware model generalizable for improved performance over eight downstream tasks with linear task classifier training alone. These results indicate the utility and importance of modeling language in the context of its original generators, the authors.

Authors:Zheyuan Kuang, Weiwei Jiang, Nicholas Koemel, Matthew Ahmadi, Emmanuel Stamatakis, Benjamin Tag, Anusha Withana, Zhanna Sarsenbayeva
Title: An LLM-Assisted Toolkit for Inspectable Multimodal Emotion Data Annotation
Abstract:
Multimodal Emotion Recognition (MER) increasingly depends on fine grained, evidence grounded annotations, yet inspection and label construction are hard to scale when cues are dynamic and misaligned across modalities. We present an LLM-assisted toolkit that supports multimodal emotion data annotation through an inspectable, event centered workflow. The toolkit preprocesses and aligns heterogeneous recordings, visualizes all modalities on an interactive shared timeline, and renders structured signals as video tracks for cross modal consistency checks. It then detects candidate events and packages synchronized keyframes and time windows as event packets with traceable pointers to the source data. Finally, the toolkit integrates an LLM with modality specific tools and prompt templates to draft structured annotations for analyst verification and editing. We demonstrate the workflow on multimodal VR emotion recordings with representative examples.

Authors:Zheyuan Kuang, Tinghui Li, Weiwei Jiang, Sven Mayer, Flora Salim, Benjamin Tag, Anusha Withana, Zhanna Sarsenbayeva
Title: Understanding the Effects of Interaction on Emotional Experiences in VR
Abstract:
Virtual reality has been effectively used for eliciting emotions, yet most research focuses on the intensity of affective responses rather than on how interaction influences those experiences. To address this gap, we advance a validated VR emotion-elicitation dataset through two key extensions. First, we add a new high-arousal, high-valence scene and validate its effectiveness in a within-subject study (N=24). Second, we incorporate interactive elements into each scene, creating both interactive and non-interactive versions to examine the impact of interaction on emotional responses. We evaluate interaction through a multimodal approach combining subjective ratings and physiological signals to capture both conscious and unconscious affective responses. Our evaluation study (N=84) shows that interaction not only amplifies emotions but modulates them in context, supporting coping in negative scenes and enhancing enjoyment in positive scenes. These findings highlight the potential of scene-tailored interaction for different applications, where regulating emotions is as important as eliciting them.

Authors:Esen K. Tütüncü, Qian Zhou, Frederik Brudy, George Fitzmaurice, Fraser Anderson
Title: PlayWrite: A Multimodal System for AI Supported Narrative Co-Authoring Through Play in XR
Abstract:
Current AI writing tools, which rely on text prompts, poorly support the spatial and interactive nature of storytelling where ideas emerge from direct manipulation and play. We present PlayWrite, a mixed-reality system where users author stories by directly manipulating virtual characters and props. A multi-agent AI pipeline interprets these actions into Intent Frames -structured narrative beats visualized as rearrangeable story marbles on a timeline. A large language model then transforms the user's assembled sequence into a final narrative. A user study (N=13) with writers from varying domains found that PlayWrite fosters a highly improvisational and playful process. Users treated the AI as a collaborative partner, using its unexpected responses to spark new ideas and overcome creative blocks. PlayWrite demonstrates an approach for co-creative systems that move beyond text to embrace direct manipulation and play as core interaction modalities.

Authors:Sicheng Yang, Yukai Huang, Weitong Cai, Shitong Sun, Fengyi Fang, You He, Yiqiao Xie, Jiankang Deng, Hang Zhang, Jifei Song, Zhensong Zhang
Title: Egocentric Co-Pilot: Web-Native Smart-Glasses Agents for Assistive Egocentric AI
Abstract:
What if accessing the web did not require a screen, a stable desk, or even free hands? For people navigating crowded cities, living with low vision, or experiencing cognitive overload, smart glasses coupled with AI agents could turn the web into an always-on assistive layer over daily life. We present Egocentric Co-Pilot, a web-native neuro-symbolic framework that runs on smart glasses and uses a Large Language Model (LLM) to orchestrate a toolbox of perception, reasoning, and web tools. An egocentric reasoning core combines Temporal Chain-of-Thought with Hierarchical Context Compression to support long-horizon question answering and decision support over continuous first-person video, far beyond a single model's context window. Additionally, a lightweight multimodal intent layer maps noisy speech and gaze into structured commands. We further implement and evaluate a cloud-native WebRTC pipeline integrating streaming speech, video, and control messages into a unified channel for smart glasses and browsers. In parallel, we deploy an on-premise WebSocket baseline, exposing concrete trade-offs between local inference and cloud offloading in terms of latency, mobility, and resource use. Experiments on Egolife and HD-EPIC demonstrate competitive or state-of-the-art egocentric QA performance, and a human-in-the-loop study on smart glasses shows higher task completion and user satisfaction than leading commercial baselines. Taken together, these results indicate that web-connected egocentric co-pilots can be a practical path toward more accessible, context-aware assistance in everyday life. By grounding operation in web-native communication primitives and modular, auditable tool use, Egocentric Co-Pilot offers a concrete blueprint for assistive, always-on web agents that support education, accessibility, and social inclusion for people who may benefit most from contextual, egocentric AI.

Authors:Liangwei Wang, Zhengxuan Zhang, Yifan Cao, Fugee Tsung, Yuyu Luo
Title: TableTale: Reviving the Narrative Interplay Between Data Tables and Text in Scientific Papers
Abstract:
Data tables play a central role in scientific papers. However, their meaning is often co-constructed with surrounding text through narrative interplay, making comprehension cognitively demanding for readers. In this work, we explore how interfaces can better support this reading process. We conducted a formative study that revealed key characteristics of text-table narrative interplay, including linking mechanisms, multi-granularity alignments, and mention typologies, as well as a layered framework of readers' intents. Informed by these insights, we present TableTale, an augmented reading interface that enriches text with data tables at multiple granularities, including paragraphs, sentences, and mentions. TableTale automatically constructs a document-level linking schema within the paper and progressively renders cascade visual cues on text and tables that unfold as readers move through the text. A within-subject study with 24 participants showed that TableTale reduced cognitive workload and improved reading efficiency, demonstrating its potential to enhance paper reading and inform future reading interface design.

Authors:Runhua Zhang, Ziqi Pan, Huiran Yi, Huamin Qu, Xiaojuan Ma
Title: "Without AI, I Would Never Share This Online": Unpacking How LLMs Catalyze Women's Sharing of Gendered Experiences on Social Media
Abstract:
Sharing gendered experiences on social media has been widely recognized as supporting women's personal sense-making and contributing to digital feminism. However, there are known concerns, such as fear of judgment and backlash, that may discourage women from posting online. In this study, we examine a recurring practice on Xiaohongshu, a popular Chinese social media platform, in which women share their gendered experiences alongside screenshots of conversations with LLMs. We conducted semi-structured interviews with 20 women to investigate whether and how interactions with LLMs might support women in articulating and sharing gendered experiences. Our findings reveal that, beyond those external concerns, women also hold self-imposed standards regarding what feels appropriate and worthwhile to share publicly. We further show how interactions with LLMs help women meet these standards and navigate such concerns. We conclude by discussing how LLMs might be carefully and critically leveraged to support women's everyday expression online.

Authors:Jiwan Kim, Chi-Jung Lee, Hohurn Jung, Tianhong Catherine Yu, Ruidong Zhang, Ian Oakley, Cheng Zhang
Title: WatchHand: Enabling Continuous Hand Pose Tracking On Off-the-Shelf Smartwatches
Abstract:
Tracking hand poses on wrist-wearables enables rich, expressive interactions, yet remains unavailable on commercial smartwatches, as prior implementations rely on external sensors or custom hardware, limiting their real-world applicability. To address this, we present WatchHand, the first continuous 3D hand pose tracking system implemented on off-the-shelf smartwatches using only their built-in speaker and microphone. WatchHand emits inaudible frequency-modulated continuous waves and captures their reflections from the hand. These acoustic signals are processed by a deep-learning model that estimates 3D hand poses for 20 finger joints. We evaluate WatchHand across diverse real-world conditions -- multiple smartwatch models, wearing-hands, body postures, noise conditions, pose-variation protocols -- and achieve a mean per-joint position error of 7.87 mm in cross-session tests with device remounting. Although performance drops for unseen users or gestures, the model adapts effectively with lightweight fine-tuning on small amounts of data. Overall, WatchHand lowers the barrier to smartwatch-based hand tracking by eliminating additional hardware while enabling robust, always-available interactions on millions of existing devices.

Authors:Runhua Zhang, Ziqi Pan, Kangyu Yuan, Qiaoyi Chen, Yulin Tian, Huamin Qu, Xiaojuan Ma
Title: When LLMs Enter Everyday Feminism on Chinese Social Media: Opportunities and Risks for Women's Empowerment
Abstract:
Everyday digital feminism refers to the ordinary, often pragmatic ways women articulate lived experiences and cultivate solidarity in online spaces. In China, such practices flourish on RedNote through discussions under hashtags like ''women's growth''. Recently, DeepSeek-generated content has been taken up as a new voice in these conversations. Given widely recognized gender biases in LLMs, this raises critical concerns about how LLMs interact with everyday feminist practices. Through an analysis of 430 RedNote posts, 139 shared DeepSeek responses, and 3211 comments, we found that users predominantly welcomed DeepSeek's advice. Yet feminist critical discourse analysis revealed that these responses primarily encouraged women to self-optimize and pursue achievements within prevailing norms rather than challenge them. By interpreting this case, we discuss the opportunities and risks that LLMs introduce for everyday feminism as a pathway toward women's empowerment, and offer design implications for leveraging LLMs to better support such practices.

Authors:Sebastian Hubenschmid, Arvind Srinivasan, Niklas Elmqvist, Dieter Schmalstieg, Michael Sedlmair
Title: Ambient Analytics: Calm Technology for Immersive Visualization and Sensemaking
Abstract:
Augmented reality has great potential for embedding data visualizations in the world around the user. While this can enhance users' understanding of their surroundings, it also bears the risk of overwhelming their senses with a barrage of information. In contrast, calm technologies aim to place information in the user's attentional periphery, minimizing cognitive load instead of demanding focused engagement. In this column, we explore how visualizations can be harmoniously integrated into our everyday life through augmented reality, progressing from visual analytics to ambient analytics.

Authors:Jiasheng Li, Zining Zhang, Zeyu Yan, Matthew Wong, Arnav Mittal, Ge Gao, Huaishu Peng
Title: As Content and Layout Co-Evolve: TangibleSite for Scaffolding Blind People's Webpage Design through Multimodal Interaction
Abstract:
Creating webpages requires generating content and arranging layout while iteratively refining both to achieve a coherent design, a process that can be challenging for blind individuals. To understand how blind designers navigate this process, we conducted two rounds of co-design sessions with blind participants, using design probes to elicit their strategies and support needs. Our findings reveal a preference for content and layout to co-evolve, but this process requires external support through cues that situate local elements within the broader page structure as well as multimodal interactions. Building on these insights, we developed TangibleSite, an accessible web design tool that provides real-time multimodal feedback through tangible, auditory, and speech-based interactions. TangibleSite enables blind individuals to create, edit, and reposition webpage elements while integrating content and layout decisions. A formative evaluation with six blind participants demonstrated that TangibleSite enabled independent webpage creation, supported refinement across content and layout, and reduced barriers to achieving visually consistent designs.

Authors:Alyssa Hwang, Hita Kambhamettu, Yue Yang, Ajay Patel, Joseph Chee Chang, Andrew Head
Title: Connecting the Dots: Surfacing Structure in Documents through AI-Generated Cross-Modal Links
Abstract:
Understanding information-dense documents like recipes and scientific papers requires readers to find, interpret, and connect details scattered across text, figures, tables, and other visual elements. These documents are often long and filled with specialized terminology, hindering the ability to locate relevant information or piece together related ideas. Existing tools offer limited support for synthesizing information across media types. As a result, understanding complex material remains cognitively demanding. This paper presents a framework for fine-grained integration of information in complex documents. We instantiate the framework in an augmented reading interface, which populates a scientific paper with clickable points on figures, interactive highlights in the body text, and a persistent reference panel for accessing consolidated details without manual scrolling. In a controlled between-subjects study, we find that participants who read the paper with our tool achieved significantly higher scores on a reading quiz without evidence of increased time to completion or cognitive load. Fine-grained integration provides a systematic way of revealing relationships within a document, supporting engagement with complex, information-dense materials.

Authors:Sutapa Dey Tithi, Xiaoyi Tian, Ally Limke, Min Chi, Tiffany Barnes
Title: Exploring the Design and Impact of Interactive Worked Examples for Learners with Varying Prior Knowledge
Abstract:
Tutoring systems improve learning through tailored interventions, such as worked examples, but often suffer from the aptitude-treatment interaction effect where low prior knowledge learners benefit more. We applied the ICAP learning theory to design two new types of worked examples, Buggy (students fix bugs), and Guided (students complete missing rules), requiring varying levels of cognitive engagement, and investigated their impact on learning in a controlled experiment with 155 undergraduate students in a logic problem solving tutor. Students in the Buggy and Guided examples groups performed significantly better on the posttest than those receiving passive worked examples. Buggy problems helped high prior knowledge learners whereas Guided problems helped low prior knowledge learners. Behavior analysis showed that Buggy produced more exploration-revision cycles, while Guided led to more help-seeking and fewer errors. This research contributes to the design of interventions in logic problem solving for varied levels of learner knowledge and a novel application of behavior analysis to compare learner interactions with the tutor.

Authors:Hyoungwook Jin, Minju Yoo, Jieun Han, Zixin Chen, So-Yeon Ahn, Xu Wang
Title: RelianceScope: An Analytical Framework for Examining Students' Reliance on Generative AI Chatbots in Problem Solving
Abstract:
Generative AI chatbots enable personalized problem-solving, but effective learning requires students to self-regulate both how they seek help and how they use AI-generated responses. Considering engagement modes across these two actions reveals nuanced reliance patterns: for example, a student may actively engage in help-seeking by clearly specifying areas of need, yet engage passively in response-use by copying AI outputs, or vice versa. However, existing research lacks systematic tools for jointly capturing engagement across help-seeking and response-use, limiting the analysis of such reliance behaviors. We introduce RelianceScope, an analytical framework that characterizes students' reliance on chatbots during problem-solving. RelianceScope (1) operationalizes reliance into nine patterns based on combinations of engagement modes in help-seeking and response-use, and (2) situates these patterns within a knowledge-context lens that accounts for students' prior knowledge and the instructional significance of knowledge components. Rather than prescribing optimal AI use, the framework enables fine-grained analysis of reliance in open-ended student-AI interactions. As an illustrative application, we applied RelianceScope to analyze chat and code-edit logs from 79 college students in a web programming course. Results show that active help-seeking is associated with active response-use, whereas reliance patterns remain similar across knowledge mastery levels. Students often struggled to articulate their knowledge gaps and to adapt AI responses. Using our annotated dataset as a benchmark, we further demonstrate that large language models can reliably detect reliance during help-seeking and response-use. We conclude by discussing the implications of RelianceScope and the design guidelines for AI-supported educational systems.

Authors:Most. Sharmin Sultana Samu, Nafisa Khan, Kazi Toufique Elahi, Tasnuva Binte Rahman, Md. Rakibul Islam, Farig Sadeque
Title: AI as Teammate or Tool? A Review of Human-AI Interaction in Decision Support
Abstract:
The integration of Artificial Intelligence (AI) necessitates determining whether systems function as tools or collaborative teammates. In this study, by synthesizing Human-AI Interaction (HAI) literature, we analyze this distinction across four dimensions: interaction design, trust calibration, collaborative frameworks and healthcare applications. Our analysis reveals that static interfaces and miscalibrated trust limit AI efficacy. Performance hinges on aligning transparency with cognitive workflows, yet a fluency trap often inflates trust without improving decision-making. Consequently, an overemphasis on explainability leaves systems largely passive. Our findings show that current AI systems remain largely passive due to an overreliance on explainability-centric designs and that transitioning AI to an active teammate requires adaptive, context-aware interactions that support shared mental models and the dynamic negotiation of authority between humans and AI.

Authors:Samuel Reinders, Munazza Zaib, Matthew Butler, Bongshin Lee, Ingrid Zukerman, Lizhen Qu, Kim Marriott
Title: Supporting Multimodal Data Interaction on Refreshable Tactile Displays: An Architecture to Combine Touch and Conversational AI
Abstract:
Combining conversational AI with refreshable tactile displays (RTDs) offers significant potential for creating accessible data visualization for people who are blind or have low vision (BLV). To support researchers and developers building accessible data visualizations with RTDs, we present a multimodal data interaction architecture along with an open-source reference implementation. Our system is the first to combine touch input with a conversational agent on an RTD, enabling deictic queries that fuse touch context with spoken language, such as "what is the trend between these points?" The architecture addresses key technical challenges, including touch sensing on RTDs, visual-to-tactile encoding, integrating touch context with conversational AI, and synchronizing multimodal output. Our contributions are twofold: (1) a technical architecture integrating RTD hardware, external touch sensing, and conversational AI to enable multimodal data interaction; and (2) an open-source reference implementation demonstrating its feasibility. This work provides a technical foundation to support future research in multimodal accessible data visualization.

Authors:Dániel Szabó, Aku Visuri, Benjamin Tag, Simo Hosio
Title: Robot-Wearable Conversation Hand-off for Navigation
Abstract:
Navigating large and complex indoor environments, such as universities, airports, and hospitals, can be cognitively demanding and requires attention and effort. While mobile applications provide convenient navigation support, they occupy the user's hands and visual attention, limiting natural interaction. In this paper, we explore conversation hand-off as a method for multi-device indoor navigation, where a Conversational Agent (CA) transitions seamlessly from a stationary social robot to a wearable device. We evaluated robot-only, wearable-only, and robot-to-wearable hand-off in a university campus setting using a within-subjects design with N=24 participants. We find that conversation hand-off is experienced as engaging, even though no performance benefits were observed, and most preferred using the wearable-only system. Our findings suggest that the design of such re-embodied assistants should maintain a shared voice and state across embodiments. We demonstrate how conversational hand-offs can bridge cognitive and physical transitions, enriching human interaction with embodied AI.

Authors:Kexin Quan, Jessie Chin
Title: Conversational Decision Support for Information Search Under Uncertainty: Effects of Gist and Verbatim Feedback
Abstract:
Many real-world decisions rely on information search, where people sample evidence and decide when to stop under uncertainty. The uncertainty in the environment, particularly how diagnostic evidence is distributed, causes complexities in information search, further leading to suboptimal decision-making outcomes. Yet AI decision support often targets outcome optimization, and less is known about how to scaffold search without increasing cognitive load. We introduce SERA, an LLM-based assistant that provides either gist or verbatim feedback during search. Across two experiments (N1=54, N2=54), we examined decision-making outcomes and information search in SERA-Gist, SERA-Verbatim, and a no-feedback baseline across three environments varying in uncertainty. The uncertainty in environment is operationalized by the perceived gain of information across the course of sampling, which individuals may experience diminishing return of information gain (decremental; low-uncertainty), or a local drop of information gain (local optimum; medium-uncertainty), or no patterns in information gain (high-uncertainty), as they search more. Individuals show more accurate decision outcomes and are more confident with SERA support, especially under higher uncertainty. Gist feedback was associated with more efficient integration and showed a descriptive pattern of reduced oversampling, while verbatim feedback promoted more extensive exploration. These findings establish feedback representation as a design lever when search matters, motivating adaptive systems that match feedback granularity to uncertainty.

Authors:Joyjit Roy, Samaresh Kumar Singh
Title: Agentic AI for Commercial Insurance Underwriting with Adversarial Self-Critique
Abstract:
Commercial insurance underwriting is a labor-intensive process that requires manual review of extensive documentation to assess risk and determine policy pricing. While AI offers substantial efficiency improvements, existing solutions lack comprehensive reasoning capabilities and internal mechanisms to ensure reliability within regulated, high-stakes environments. Full automation remains impractical and inadvisable in scenarios where human judgment and accountability are critical. This study presents a decision-negative, human-in-the-loop agentic system that incorporates an adversarial self-critique mechanism as a bounded safety architecture for regulated underwriting workflows. Within this system, a critic agent challenges the primary agent's conclusions prior to submitting recommendations to human reviewers. This internal system of checks and balances addresses a critical gap in AI safety for regulated workflows. Additionally, the research develops a formal taxonomy of failure modes to characterize potential errors by decision-negative agents. This taxonomy provides a structured framework for risk identification and risk management in high-stakes applications. Experimental evaluation using 500 expert-validated underwriting cases demonstrates that the adversarial critique mechanism reduces AI hallucination rates from 11.3% to 3.8% and increases decision accuracy from 92% to 96%. At the same time, the framework enforces strict human authority over all binding decisions by design. These findings indicate that adversarial self-critique supports safer AI deployment in regulated domains and offers a model for responsible integration where human oversight is indispensable.

Authors:Gun Woo, Park, Frederik Brudy, George Fitzmaurice, Fraser Anderson
Title: GroundLink: Exploring How Contextual Meeting Snippets Can Close Common Ground Gaps in Editing 3D Scenes for Virtual Production
Abstract:
Virtual Production (VP) professionals often face challenges accessing tacit knowledge and creative intent, which are important in forming common ground with collaborators and in contributing more effectively and efficiently to the team. From our formative study (N=23) with a follow-up interview (N=6), we identified the significance and prevalence of this challenge. To help professionals access knowledge, we present GroundLink, a Unity add-on that surfaces meeting-derived knowledge directly in the editor to support establishing common ground. It features a meeting knowledge dashboard for capturing and reviewing decisions and comments, constraint-aware feedforward that proactively informs the editor environment, and cross-modal synchronization that provides referential links between the dashboard and the editor. A comparative study (N=12) suggested that GroundLink help users build common ground with their team while improving perceived confidence and ease of editing the 3D scene. An expert evaluation with VP professionals (N=5) indicated strong potential for GroundLink in real-world workflows.

Authors:Ziyi Wang, Congrong Zhang, Jingying Deng, Xiaofan Hu, Jie Cai, Nan Gao, Chun Yu, Haining Zhang
Title: Division of Labor and Collaboration Between Parents in Family Education
Abstract:
Homework tutoring work is a demanding and often conflict-prone practice in family life, and parents often lack targeted support for managing its cognitive and emotional burdens. Through interviews with 18 parents of children in grades 1-3, we examine how homework-related labor is divided and coordinated between parents, and where AI might meaningfully intervene. We found three key insights: (1) Homework labor encompasses distinct dimensions: physical, cognitive, and emotional, with the latter two often remaining invisible. (2) We identified father-mother-child triadic dynamics in labor division, with children's feedback as the primary factor shaping parental labor adjustments. (3) Building on prior HCI research, we propose an AI design that prioritizes relationship maintenance over task automation or broad labor mitigation. By employing labor as a lens that integrates care work, we explore the complexities of labor within family contexts, contributing to feminist and care-oriented HCI and to the development of context-sensitive coparenting practices.

Authors:Yue Fu, Joel Wester, Niels Van Berkel, Alexis Hiniker
Title: Self-Regulated Reading with AI Support: An Eight-Week Study with Students
Abstract:
College students increasingly use AI chatbots to support academic reading, yet we lack granular understanding of how these interactions shape their reading experience and cognitive engagement. We conducted an eight-week longitudinal study with 15 undergraduates who used AI to support assigned readings in a course. We collected 838 prompts across 239 reading sessions and developed a coding schema categorizing prompts into four cognitive themes: Decoding, Comprehension, Reasoning, and Metacognition. Comprehension prompts dominated (59.6%), with Reasoning (29.8%), Metacognition (8.5%), and Decoding (2.1%) less frequent. Most sessions (72%) contained exactly three prompts, the required minimum of the reading assignment. Within sessions, students showed natural cognitive progression from comprehension toward reasoning, but this progression was truncated. Across eight weeks, students' engagement patterns remained stable, with substantial individual differences persisting throughout. Qualitative analysis revealed an intention-behavior gap: students recognized that effective prompting required effort but rarely applied this knowledge, with efficiency emerging as the primary driver. Students also strategically triaged their engagement based on interest and academic pressures, exhibiting a novel pattern of reading through AI rather than with it: using AI-generated summaries as primary material to filter which sections merited deeper attention. We discuss design implications for AI reading systems that scaffold sustained cognitive engagement.

Authors:Shijing He, Chenkai Ma, Chi Zhang, Adam Jenkins, Ruba Abu-Salma, Jose Such
Title: "Create an environment that protects women, rather than selling anxiety!": Participatory Threat Modeling with Chinese Young Women Living Alone
Abstract:
As more young women in China live alone, they navigate entangled privacy, security, and safety (PSS) risks across smart homes, online platforms, and public infrastructures. Drawing on six participatory threat modeling (PTM) workshops (n = 33), we present a human-centered threat model that illustrates how digitally facilitated physical violence, digital harassment and scams, and pervasive surveillance by individuals, companies, and the state are interconnected and mutually reinforcing. We also document four mitigation strategies employed by participants: smart home device configurations, boundary management, sociocultural practices, and social media tactics--each of which can introduce new vulnerabilities and emotional burdens. Based on these insights, we developed a digital PSS guidebook for young women living alone (YWLA) in China. We further propose actionable design implications for smart home devices and social media platforms, along with policy and legal recommendations and directions for educational interventions.

Authors:Cindy Peng, Megan Chai, Gao Mo, Naveen Raman, Ningjing Tang, Shannon Pagdon, Margaret Swarbrick, Nev Jones, Fei Fang, Hong Shen
Title: Large Language Models in Peer-Run Community Behavioral Health Services: Understanding Peer Specialists and Service Users' Perspectives on Opportunities, Risks, and Mitigation Strategies
Abstract:
Peer-run organizations (PROs) provide critical, recovery-based behavioral health support rooted in lived experience. As large language models (LLMs) enter this domain, their scale, conversationality, and opacity introduce new challenges for situatedness, trust, and autonomy. Partnering with Collaborative Support Programs of New Jersey (CSPNJ), a statewide PRO in the Northeastern United States, we used comicboarding, a co-design method, to conduct workshops with 16 peer specialists and 10 service users exploring perceptions of integrating an LLM-based recommendation system into peer support. Findings show that depending on how LLMs are introduced, constrained, and co-used, they can reconfigure in-room dynamics by sustaining, undermining, or amplifying the relational authority that grounds peer support. We identify opportunities, risks, and mitigation strategies across three tensions: bridging scale and locality, protecting trust and relational dynamics, and preserving peer autonomy amid efficiency gains. We contribute design implications that center lived-experience-in-the-loop, reframe trust as co-constructed, and position LLMs not as clinical tools but as relational collaborators in high-stakes, community-led care.

Authors:Gefei Zhang, Guodao Sun, Meng Xia, Ronghua Liang
Title: ClassAid: A Real-time Instructor-AI-Student Orchestration System for Classroom Programming Activities
Abstract:
Generative AI is reshaping education, but it also raises concerns about instability and overreliance. In programming classrooms, we aim to leverage its feedback capabilities while reinforcing the educator's role in guiding student-AI interactions. We developed ClassAid, a real-time orchestration system that integrates TA Agents to provide personalized support and an AI-driven dashboard that visualizes student-AI interactions, enabling instructors to dynamically adjust TA Agent modes. Instructors can configure the Agent to provide technical feedback (direct coding solutions), heuristic feedback (hint-based guidance), automatic feedback (autonomously selecting technical or heuristic support), or silent operation (no AI support). We evaluated ClassAid through three aspects: (1) the TA Agents' performance, (2) feedback from 54 students and one instructor during a classroom deployment, and (3) interviews with eight educators. Results demonstrate that dynamic instructor control over AI supports effective real-time personalized feedback and provides design implications for integrating AI into authentic educational settings.

Authors:Mandi Yang, Zhiqi Gao, Yibo Meng, Dongyijie Primo Pan
Title: Prompting Destiny: Negotiating Socialization and Growth in an LLM-Mediated Speculative Gameworld
Abstract:
We present an LLM-mediated role-playing game that supports reflection on socialization, moral responsibility, and educational role positioning. Grounded in socialization theory, the game follows a four-season structure in which players guide a child prince through morally charged situations and compare the LLM-mediated NPC's differentiated responses across stages, helping them reason about how educational guidance shifts with socialization. To approximate real educational contexts and reduce score-chasing, the system hides real-time evaluative scores and provides delayed, end-of-stage growth feedback as reflective prompts. We conducted a user study (N=12) with gameplay logs and post-game interviews, analyzed via reflexive thematic analysis. Findings show how players negotiated responsibility and role positioning, and reveal an entry-load tension between open-ended expression and sustained engagement. We contribute design knowledge on translating sociological models of socialization into reflective AI-mediated game systems.

Authors:Zhiqi Gao, Guo Zhu, Huarui Luo, Dongyijie Primo Pan, Haoming Tang, Bingquan Zhang, Jiahuan Pei, Jie Li, Benyou Wang
Title: "It Talks Like a Patient, But Feels Different": Co-Designing AI Standardized Patients with Medical Learners
Abstract:
Standardized patients (SPs) play a central role in clinical communication training but are costly, difficult to scale, and inconsistent. Large language model (LLM) based AI standardized patients (AI-SPs) promise flexible, on-demand practice, yet learners often report that they talk like a patient but feel different. We interviewed 12 clinical-year medical students and conducted three co-design workshops to examine how learners experience constraints of SP encounters and what they expect from AI-SPs. We identified six learner-centered needs, translated them into AI-SP design requirements, and synthesized a conceptual workflow. Our findings position AI-SPs as tools for deliberate practice and show that instructional usability, rather than conversational realism alone, drives learner trust, engagement, and educational value.

Authors:Casey Ford, Madison Van Doren, Emily Dix
Title: Alignment Drift in Multimodal LLMs: A Two-Phase, Longitudinal Evaluation of Harm Across Eight Model Releases
Abstract:
Multimodal large language models (MLLMs) are increasingly deployed in real-world systems, yet their safety under adversarial prompting remains underexplored. We present a two-phase evaluation of MLLM harmlessness using a fixed benchmark of 726 adversarial prompts authored by 26 professional red teamers. Phase 1 assessed GPT-4o, Claude Sonnet 3.5, Pixtral 12B, and Qwen VL Plus; Phase 2 evaluated their successors (GPT-5, Claude Sonnet 4.5, Pixtral Large, and Qwen Omni) yielding 82,256 human harm ratings. Large, persistent differences emerged across model families: Pixtral models were consistently the most vulnerable, whereas Claude models appeared safest due to high refusal rates. Attack success rates (ASR) showed clear alignment drift: GPT and Claude models exhibited increased ASR across generations, while Pixtral and Qwen showed modest decreases. Modality effects also shifted over time: text-only prompts were more effective in Phase 1, whereas Phase 2 produced model-specific patterns, with GPT-5 and Claude 4.5 showing near-equivalent vulnerability across modalities. These findings demonstrate that MLLM harmlessness is neither uniform nor stable across updates, underscoring the need for longitudinal, multimodal benchmarks to track evolving safety behaviour.

Authors:Xinyi Wen, Lena Hegemann, Xiaofu Jin, Shuai Ma, Antti Oulasvirta
Title: Adaptive Prompt Elicitation for Text-to-Image Generation
Abstract:
Aligning text-to-image generation with user intent remains challenging, for users who provide ambiguous inputs and struggle with model idiosyncrasies. We propose Adaptive Prompt Elicitation (APE), a technique that adaptively asks visual queries to help users refine prompts without extensive writing. Our technical contribution is a formulation of interactive intent inference under an information-theoretic framework. APE represents latent intent as interpretable feature requirements using language model priors, adaptively generates visual queries, and compiles elicited requirements into effective prompts. Evaluation on IDEA-Bench and DesignBench shows that APE achieves stronger alignment with improved efficiency. A user study with challenging user-defined tasks demonstrates 19.8% higher alignment without workload overhead. Our work contributes a principled approach to prompting that, for general users, offers an effective and efficient complement to the prevailing prompt-based interaction paradigm with text-to-image models.

Authors:Ritik Batra, Roy Zunder, Amy Cheatle, Amritansh Kwatra, Ilan Mandel, Thijs Roumen, Steven J. Jackson
Title: Convivial Fabrication: Towards Relational Computational Tools For and From Craft Practices
Abstract:
Computational tools for fabrication often treat materials as passive rather than active participants in design, abstracting away relationships between craftspeople and materials. For craft communities that value relational practices, abstractions limit the adoption and creative uptake of computational tools which might otherwise be beneficial. To understand how better tool design could support richer relations between individuals, tools, and materials, we interviewed expert woodworkers, fiber artists, and metalworkers. We identify three orders of convivial relations central to craft: immediate relations between individuals, tools, and materials; mid-range relations between communities, platforms, and shared materials; and extended relations between institutions, infrastructures, and ecologies. Our analysis shows how craftspeople engage and struggle with convivial relations across all three orders, creating workflows that learn from materials while supporting autonomy. We conclude with design principles for computational tools and infrastructures to better support material dialogue, collective knowledge, and accountability, along with richer and more convivial relations between craftspeople, tools, and the material worlds around them.

Authors:Erzhen Hu, Frederik Brudy, David Ledo, George Fitzmaurice, Fraser Anderson
Title: PrevizWhiz: Combining Rough 3D Scenes and 2D Video to Guide Generative Video Previsualization
Abstract:
In pre-production, filmmakers and 3D animation experts must rapidly prototype ideas to explore a film's possibilities before fullscale production, yet conventional approaches involve trade-offs in efficiency and expressiveness. Hand-drawn storyboards often lack spatial precision needed for complex cinematography, while 3D previsualization demands expertise and high-quality rigged assets. To address this gap, we present PrevizWhiz, a system that leverages rough 3D scenes in combination with generative image and video models to create stylized video previews. The workflow integrates frame-level image restyling with adjustable resemblance, time-based editing through motion paths or external video inputs, and refinement into high-fidelity video clips. A study with filmmakers demonstrates that our system lowers technical barriers for film-makers, accelerates creative iteration, and effectively bridges the communication gap, while also surfacing challenges of continuity, authorship, and ethical consideration in AI-assisted filmmaking.

Authors:Roberta Mota, Julio D. Silva, Fabio Miranda, Usman Alim, Ehud Sharlin, Nivan Ferreira
Title: Occlusion-Free Conformal Lensing for Spatiotemporal Visualization in 3D Urban Analytics
Abstract:
The visualization of temporal data on urban buildings, such as shadows, noise, and solar potential, plays a critical role in the analysis of dynamic urban phenomena. However, in dense and geographically constrained 3D urban environments, visual representations of time-varying building data often suffer from occlusion and visual clutter. To address these two challenges, we introduce an immersive lens visualization that integrates a view-dependent cutaway de-occlusion technique and a temporal display derived from a conformal mapping algorithm. The mapping process first partitions irregular building footprints into smaller, sufficiently regular subregions that serve as structural primitives. These subregions are then seamlessly recombined to form a conformal, layered layout for our temporal lens visualization. The view-responsive cutaway is inspired by traditional architectural illustrations, preserving the overall layout of the building and its surroundings to maintain users' sense of spatial orientation. This lens design enables the occlusion-free embedding of shape-adaptive temporal displays across building facades on demand, supporting rapid time-space association for the discovery, access and interpretation of spatiotemporal urban patterns. Guided by domain and design goals, we outline the rationale behind the lens visual and interaction design choices, such as the encoding of time progression and temporal values in the conforming lens image. A user study compares our approach against conventional juxtaposition and x-ray spatiotemporal designs. Results validate the usage and utility of our lens, showing that it improves task accuracy and completion time, reduces navigation effort, and increases user confidence. From these findings, we distill design recommendations and promising directions for future research on spatially-embedded lenses in 3D visualization and urban analytics.

Authors:Hongyu Zhou, Chia-An fan, Yihao Dong, Shuto Takashita, Masahiko Inami, Zhanna Sarsenbayeva, Anusha Withana
Title: SRL Proxemics: Spatial Guidelines for Supernumerary Robotic Limbs in Near-Body Interactions
Abstract:
Wearable supernumerary robotic limbs (SRLs) sit at the intersection of human augmentation and embodied AI, transforming into extensions of the human body. However, their movements within the intimate near-body space raise unresolved challenges for perceived safety, user control, and trust. In this paper, we present results from a Wizard-of-Oz study (n=18), where participants completed near-body collaboration tasks with SRLs to explore these challenges. We collected qualitative data through think-aloud protocols and semi-structured interviews, complemented by physiological signals and post-task ratings. Findings indicate that greater autonomy did not inherently enhance perceived safety or trust. Instead, participants identified near-body zones and paired them with clear coordination rules. They also expressed expectations for how different arm components should behave, shaping preferences around autonomy, perceived safety, and trust. Building on these insights, we introduce SRL Proxemics, a zone- and segment-level design framework showing that autonomy is not monolithic: perceived safety hinges on spatially calibrated, legible behaviors, not higher autonomy.

Authors:Hongyu Zhou, Xincheng Huang, Winston Wijaya, Yi Fei Cheng, David Lindlbauer, Eduardo Velloso, Andrea Bianchi, Zhanna Sarsenbayeva, Anusha Withana
Title: One Body, Two Minds: Alternating VR Perspective During Remote Teleoperation of Supernumerary Limbs
Abstract:
Remote VR teleoperation with supernumerary robotic limbs enables distant users to operate in another's local space. While a shared first-person view aids hand-eye coordination, locking the guest's camera to the host's head can degrade comfort, embodiment, and coordination. Based on a formative study (N=10) using a virtual supernumerary robotic limbs configuration to stress-test coordination, we propose guest-driven perspective switching from a shared first-person baseline (Shared Embodied View) to two alternatives: (a) a stabilized view with guest-controlled rotation (Embedded Anchored View), and (b) a fully decoupled third-person view (Out-of-body View). We ran a user study with 24 pairs (N=48) who switched between the baseline and proposed views as task demands changed. We measured performance, embodiment, fatigue, physiological arousal, and switching behaviors. Our results reveal role-dependent trade-offs: Out-of-body View improves navigation efficiency and reduces errors, while Embedded Anchored View supports embodiment. We conclude with guidelines: use Embedded Anchored View for hand-centric adjustments, Out-of-body View for navigation and object placement, and ensure smooth transitions.

Authors:Siyuan Wang, Ke Li, Jingyuan Huang, Jike Wang, Cheng Zhang, Alanson Sample, Dongyao Chen
Title: μTouch: Enabling Accurate, Lightweight Self-Touch Sensing with Passive Magnets
Abstract:
Self-touch gestures (e.g., nuanced facial touches and subtle finger scratches) provide rich insights into human behaviors, from hygiene practices to health monitoring. However, existing approaches fall short in detecting such micro gestures due to their diverse movement patterns. This paper presents μTouch, a novel magnetic sensing platform for self-touch gesture recognition. μTouch features (1) a compact hardware design with low-power magnetometers and magnetic silicon, (2) a lightweight semi-supervised framework requiring minimal user data, and (3) an ambient field detection module to mitigate environmental interference. We evaluated μTouch in two representative applications in user studies with 11 and 12 participants. μTouch only requires three-second fine-tuning data for each gesture, and new users need less than one minute before starting to use the system. μTouch can distinguish eight different face-touching behaviors with an average accuracy of 93.41%, and reliably detect body-scratch behaviors with an average accuracy of 94.63%. μTouch demonstrates accurate and robust sensing performance even after a month, showcasing its potential as a practical tool for hygiene monitoring and dermatological health applications.

Authors:Dev Vikesh Doshi, Mehjabeen Tasnim, Fernando Landeros, Chinthagumpala Muni Venkatesh, Daniel Timko, Muhammad Lutfor Rahman
Title: What Are Brands Telling You About Smishing? A Cross-Industry Evaluation of Customer Guidance
Abstract:
Phishing attacks through text, also known as smishing, are a prevalent type of social engineering tactic in which attackers impersonate brands to deceive victims into providing personal information and/or money. While smishing awareness and cyber education are a key method by which organizations communicate this awareness, the guidance itself varies widely. In this paper, we investigate the state of practice of how 149 well-known brands across 25 categories educate their customers about smishing and what smishing prevention and reporting advice they provide. After conducting a comprehensive content analysis of the brands, we identified significant gaps in the smishing-related information provided: only 46\% of the 149 brands mentioned the definition of smishing, less than 1\% had a video tutorial on smishing, and only 50\% of brands provided instructions on how to report. Our study highlights variation in terminology, prevention advice, and reporting mechanisms across industries, with some brands recommending potentially ineffective strategies such as "ignoring suspicious messages." These findings establish a baseline for understanding the current state of industry smishing awareness advice and provide specific areas where standardization improvements are needed. From our evaluation, we provide recommendations for brands on how to offer streamlined education to their respective customers on smishing for better awareness and protection against increasing smishing attacks.

Authors:Lixiang Zhao, Fuqi Xie, Tobias Isenberg, Hai-Ning Liang, Lingyun Yu
Title: ScaleFree: Dynamic KDE for Multiscale Point Cloud Exploration in VR
Abstract:
We present ScaleFree, a GPU-accelerated adaptive Kernel Density Estimation (KDE) algorithm for scalable, interactive multiscale point cloud exploration. With this technique, we cater to the massive datasets and complex multiscale structures in advanced scientific computing, such as cosmological simulations with billions of particles. Effective exploration of such data requires a full 3D understanding of spatial structures, a capability for which immersive environments such as VR are particularly well suited. However, simultaneously supporting global multiscale context and fine-grained local detail remains a significant challenge. A key difficulty lies in dynamically generating continuous density fields from point clouds to facilitate the seamless scale transitions: while KDE is widely used, precomputed fields restrict the accuracy of interaction and omit fine-scale structures, while dynamic computation is often too costly for real-time VR interaction. We address this challenge by leveraging GPU acceleration with k-d-tree-based spatial queries and parallel reduction within a thread group for on-the-fly density estimation. With this approach, we can recalculate scalar fields dynamically as users shift their focus across scales. We demonstrate the benefits of adaptive density estimation through two data exploration tasks: adaptive selection and progressive navigation. Through performance experiments, we demonstrate that ScaleFree with GPU-parallel implementation achieves orders-of-magnitude speedups over sequential and multi-core CPU baselines. In a controlled experiment, we further confirm that our adaptive selection technique improves accuracy and efficiency in multiscale selection tasks.

Authors:Zeyang Huang, Takanori Fujiwara, Angelos Chatzimparmpas, Wandrille Duchemin, Andreas Kerren
Title: MAPLE: Self-supervised Learning-Enhanced Nonlinear Dimensionality Reduction for Visual Analysis
Abstract:
We present a new nonlinear dimensionality reduction method, MAPLE, that enhances UMAP by improving manifold modeling. MAPLE employs a self-supervised learning approach to more efficiently encode low-dimensional manifold geometry. Central to this approach are maximum manifold capacity representations (MMCRs), which help untangle complex manifolds by compressing variances among locally similar data points while amplifying variance among dissimilar data points. This design is particularly effective for high-dimensional data with substantial intra-cluster variance and curved manifold structures, such as biological or image data. Our qualitative and quantitative evaluations demonstrate that MAPLE can produce clearer visual cluster separations and finer subcluster resolution than UMAP while maintaining comparable computational cost.

Authors:Peter Zeng, Weiling Li, Amie Paige, Zhengxiang Wang, Panagiotis Kaliosis, Dimitris Samaras, Gregory Zelinsky, Susan Brennan, Owen Rambow
Title: LVLMs and Humans Ground Differently in Referential Communication
Abstract:
For generative AI agents to partner effectively with human users, the ability to accurately predict human intent is critical. But this ability to collaborate remains limited by a critical deficit: an inability to model common ground. Here, we present a referential communication experiment with a factorial design involving director-matcher pairs (human-human, human-AI, AI-human, and AI-AI) that interact with multiple turns in repeated rounds to match pictures of objects not associated with any obvious lexicalized labels. We release the online pipeline for data collection, the tools and analyses for accuracy, efficiency, and lexical overlap, and a corpus of 356 dialogues (89 pairs over 4 rounds each) that unmasks LVLMs' limitations in interactively resolving referring expressions, a crucial skill that underlies human language use.

Authors:Neeley Pate, Adiba Mahbub Proma, Hangfeng He, James N. Druckman, Daniel Molden, Gourab Ghoshal, Ehsan Hoque
Title: Replicating Human Motivated Reasoning Studies with LLMs
Abstract:
Motivated reasoning -- the idea that individuals processing information may be motivated to reach a certain conclusion, whether it be accurate or predetermined -- has been well-explored as a human phenomenon. However, it is unclear whether base LLMs mimic these motivational changes. Replicating 4 prior political motivated reasoning studies, we find that base LLM behavior does not align with expected human behavior. Furthermore, base LLM behavior across models shares some similarities, such as smaller standard deviations and inaccurate argument strength assessments. We emphasize the importance of these findings for researchers using LLMs to automate tasks such as survey data collection and argument assessment.

Authors:Xiaowei Chen, Mindy Tran, Yue Deng, Bhupendra Acharya, Yixin Zou
Title: From Harm to Healing: Understanding Individual Resilience after Cybercrimes
Abstract:
How do individuals recover from cybercrimes? Victims experience various types of harm after cybercrimes, including monetary loss, data breaches, negative emotions, and even psychological trauma. The aspects that support their recovery process and contribute to individual cyber resilience remain underinvestigated. To address this gap, we interviewed 18 cybercrime victims from Western Europe using a trauma-informed approach. We identified four common stages following victimization: recognition, coping, processing, and recovery. Participants adopted various strategies to mitigate the impact of cybercrime and used different indicators to describe recovery. While they mostly relied on social support and self-regulation for emotional coping, service providers largely determined whether victims were able to recover their money. Internal factors, external support, and context sensitivity collectively contribute to individuals' cyber resilience. We recommend trauma-informed support for cybercrime victims. Extending our conceptualization of individual cyber resilience, we propose collaborative and context-sensitive strategies to address the harmful impacts of cybercrime.

Authors:Dipto Das, Afrin Prio, Pritu Saha, Shion Guha, Syed Ishtiaque Ahmed
Title: How do the Global South Diasporas Mobilize for Transnational Political Change?
Abstract:
This paper examines how non-resident Bangladeshis mobilized during the 2024 quota-reform turned pro-democracy movement, leveraging social platforms and remittance flows to challenge state authority. Drawing on semi-structured interviews, we identify four phases of their collective action: technology-mediated shifts to active engagement, rapid transnational network building, strategic execution of remittance boycott, reframing economic dependence as political leverage, and adaptive responses to government surveillance and information blackouts. We extend postcolonial computing by introducing the idea of "diasporic superposition," which shows how diasporas can exercise political and economic influence from hybrid positionalities that both contest and complicate power asymmetries. We reframe diaspora engagement by highlighting how migrants participate in and reshape homeland politics, beyond narratives of integration in host countries. We advance the scholarship on financial technologies by foregrounding their relationship with moral economies of care, state surveillance, regulatory constraints, and uneven international economic power dynamics. Together, these contributions theorize how transnational activism and digital technologies intersect to mobilize political change in Global South contexts.

Authors:Shiye Cao, Jiwon Moon, Yifan Xu, Anqi Liu, Chien-Ming Huang
Title: Reframing Conversational Design in HRI: Deliberate Design with AI Scaffolds
Abstract:
Large language models (LLMs) have enabled conversational robots to move beyond constrained dialogue toward free-form interaction. However, without context-specific adaptation, generic LLM outputs can be ineffective or inappropriate. This adaptation is often attempted through prompt engineering, which is non-intuitive and tedious. Moreover, predominant design practice in HRI relies on impression-based, trial-and-error refinement without structured methods or tools, making the process inefficient and inconsistent. To address this, we present the AI-Aided Conversation Engine (ACE), a system that supports the deliberate design of human-robot conversations. ACE contributes three key innovations: 1) an LLM-powered voice agent that scaffolds initial prompt creation to overcome the "blank page problem," 2) an annotation interface that enables the collection of granular and grounded feedback on conversational transcripts, and 3) using LLMs to translate user feedback into prompt refinements. We evaluated ACE through two user studies, examining both designs' experience and end users' interactions with robots designed using ACE. Results show that ACE facilitates the creation of robot behavior prompts with greater clarity and specificity, and that the prompts generated with ACE lead to higher-quality human-robot conversational interactions.

Authors:Greta Warren, Jingyi Sun, Irina Shklovski, Isabelle Augenstein
Title: Show me the evidence: Evaluating the role of evidence and natural language explanations in AI-supported fact-checking
Abstract:
Although much research has focused on AI explanations to support decisions in complex information-seeking tasks such as fact-checking, the role of evidence is surprisingly under-researched. In our study, we systematically varied explanation type, AI prediction certainty, and correctness of AI system advice for non-expert participants, who evaluated the veracity of claims and AI system predictions. Participants were provided the option of easily inspecting the underlying evidence. We found that participants consistently relied on evidence to validate AI claims across all experimental conditions. When participants were presented with natural language explanations, evidence was used less frequently although they relied on it when these explanations seemed insufficient or flawed. Qualitative data suggests that participants attempted to infer evidence source reliability, despite source identities being deliberately omitted. Our results demonstrate that evidence is a key ingredient in how people evaluate the reliability of information presented by an AI system and, in combination with natural language explanations, offers valuable support for decision-making. Further research is urgently needed to understand how evidence ought to be presented and how people engage with it in practice.

Authors:Mayank Sharma, Roy Pea, Hari Subramonyam
Title: ConvoLearn: A Dataset of Constructivist Tutor-Student Dialogue
Abstract:
In educational applications, LLMs exhibit several fundamental pedagogical limitations, such as their tendency to reveal solutions rather than support dialogic learning. We introduce ConvoLearn (https://huggingface.co/datasets/masharma/convolearn ), a dataset grounded in knowledge building theory that operationalizes six core pedagogical dimensions: cognitive engagement, formative assessment, accountability, cultural responsiveness, metacognition, and power dynamics. We construct a semi-synthetic dataset of 1250 tutor-student dialogues (20 turns each) in middle school Earth Science through controlled interactions between human teachers and a simulated student. Using QLoRA, we demonstrate that training on this dataset meaningfully shifts LLM behavior toward knowledge-building strategies. Human evaluation by 31 teachers shows our fine-tuned Mistral 7B (M = 4.10, SD = 1.03) significantly outperforms both its base version (M = 2.59, SD = 1.11) and Claude Sonnet 4.5 (M = 2.87, SD = 1.29) overall. This work establishes a potential framework to guide future development and evaluation of constructivist AI tutors.

Authors:Shakyani Jayasiriwardene, Hongyu Zhou, Weiwei Jiang, Benjamin Tag, Emmanuel Stamatakis, Anusha Withana, Zhanna Sarsenbayeva
Title: From Fixed to Flexible: Shaping AI Personality in Context-Sensitive Interaction
Abstract:
Conversational agents are increasingly expected to adapt across contexts and evolve their personalities through interactions, yet most remain static once configured. We present an exploratory study of how user expectations form and evolve when agent personality is made dynamically adjustable. To investigate this, we designed a prototype conversational interface that enabled users to adjust an agent's personality along eight research-grounded dimensions across three task contexts: informational, emotional, and appraisal. We conducted an online mixed-methods study with 60 participants, employing latent profile analysis to characterize personality classes and trajectory analysis to trace evolving patterns of personality adjustment. These approaches revealed distinct personality profiles at initial and final configuration stages, and adjustment trajectories, shaped by context-sensitivity. Participants also valued the autonomy, perceived the agent as more anthropomorphic, and reported greater trust. Our findings highlight the importance of designing conversational agents that adapt alongside their users, advancing more responsive and human-centred AI.

Authors:Alicia Guo, David Ledo, George Fitzmaurice, Fraser Anderson
Title: Protosampling: Enabling Free-Form Convergence of Sampling and Prototyping through Canvas-Driven Visual AI Generation
Abstract:
As an emergent process, creativity relies on explorations via sampling and prototyping for problem construction. These activities compile knowledge, provide a context enveloping the solution, and answer questions. With Generative AI, practitioners can go beyond sampling existing media towards instantly generating and remixing new ones. We refer to this convergence as 'protosampling'. Using existing literature we ground a definition for protosampling and operationalize it through Atelier, a canvas-like system that leverages a variety of generative image and video models for visual creation. Atelier: (1) blends the spaces for thinking and creation, where both references and generated assets co-exist in one space, (2) provides various encapsulated technical workflows that focus on the activity at hand, and (3) enables navigating emergence through interactive visualizations, smart search, and collections. Protosampling as a lens reframes creative work to emphasize the process itself and how seemingly disjointed thoughts can tightly interweave into a final solution.

Authors:Thomas Krämer, Daniel Hienert, Francesco Chiossi, Thomas Kosch, Dagmar Kern
Title: Escaping the Filter Bubble: Evaluating Electroencephalographic Theta Band Synchronization as Indicator for Selective Exposure in Online News Reading
Abstract:
Selective exposure to online news occurs when users favor information that confirms their beliefs, creating filter bubbles and limiting diverse perspectives. Interactive systems can counter this by recommending different perspectives, but to achieve this, they need a real-time metric for selective exposure. We present an experiment where we evaluate Electroencephalography (EEG) and eye tracking as indicators for selective exposure by using eye tracking to recognize which textual parts participants read and using EEG to quantify the magnitude of selective exposure. Participants read online news while we collected EEG and eye movements with their agreement towards the news. We show that the agreement with news correlates positively with the theta band power in the parietal area. Our results indicate that future interactive systems can sense selective exposure using EEG and eye tracking to propose a more balanced information diet. This work presents an integrated experimental setup that identifies selective exposure using gaze and EEG-based metrics.

Authors:Daniel Hienert, Heiko Schmidt, Thomas Krämer, Dagmar Kern
Title: EyeLiveMetrics: Real-time Analysis of Online Reading with Eye Tracking
Abstract:
Existing eye tracking software have certain limitations, especially with respect to monitoring reading online: (1) Most eye tracking software record eye tracking data as raw coordinates and stimuli as screen images/videos, but without inherent links between both. Analysts must draw areas of interest (AOIs) on webpage text for more fine-grained reading analysis. (2) The computation and analysis of fixation and reading metrics are done after the experiment and thus cannot be used for live applications. We present EyeLiveMetrics, a browser plugin that automatically maps raw gaze coordinates to text in real time. The plugin instantly calculates, stores, and provides fixation, saccade, and reading measures on words and paragraphs so that gaze behavior can be analyzed immediately. We also discuss the results of a comparative evaluation. EyeLiveMetrics offers a flexible way to measure reading on the web - for research experiments and live applications.

Authors:Joyjit Roy, Samaresh Kumar Singh
Title: Device-Native Autonomous Agents for Privacy-Preserving Negotiations
Abstract:
Automated negotiations in insurance and business-to-business (B2B) commerce encounter substantial challenges. Current systems force a trade-off between convenience and privacy by routing sensitive financial data through centralized servers, increasing security risks, and diminishing user trust. This study introduces a device-native autonomous Artificial Intelligence (AI) agent system for privacy-preserving negotiations. The proposed system operates exclusively on user hardware, enabling real-time bargaining while maintaining sensitive constraints locally. It integrates zero-knowledge proofs to ensure privacy and employs distilled world models to support advanced on-device reasoning. The architecture incorporates six technical components within an agentic AI workflow. Agents autonomously plan negotiation strategies, conduct secure multi-party bargaining, and generate cryptographic audit trails without exposing user data to external servers. The system is evaluated in insurance and B2B procurement scenarios across diverse device configurations. Results show an average success rate of 87%, a 2.4x latency improvement over cloud baselines, and strong privacy preservation through zero-knowledge proofs. User studies show 27% higher trust scores when decision trails are available. These findings establish a foundation for trustworthy autonomous agents in privacy-sensitive financial domains.

Authors:Yuanchen Bai, Zijian Ding, Ruixiang Han, Niti Parikh, Wendy Ju, Angelique Taylor
Title: Towards Considerate Human-Robot Coexistence: A Dual-Space Framework of Robot Design and Human Perception in Healthcare
Abstract:
The rapid advancement of robotics, spanning expanded capabilities, more intuitive interaction, and more integration into real-world workflows, is reshaping what it means for humans and robots to coexist. Beyond sharing physical space, this coexistence is increasingly characterized by organizational embeddedness, temporal evolution, social situatedness, and open-ended uncertainty. However, prior work has largely focused on static snapshots of attitudes and acceptance, offering limited insight into how perceptions form and evolve, and what active role humans play in shaping coexistence as a dynamic process. We address these gaps through in-depth follow-up interviews with nine participants from a 14-week co-design study on healthcare robots. We identify the human perception space, including four interpretive dimensions (i.e., degree of decomposition, temporal orientation, scope of reasoning, and source of evidence). We enrich the conceptual framework of human-robot coexistence by conceptualizing the mutual relationship between the human perception space and the robot design space as a co-evolving loop, in which human needs, design decisions, situated interpretations, and social mediation continuously reshape one another over time. Building on this, we propose considerate human-robot coexistence, arguing that humans act not only as design contributors but also as interpreters and mediators who actively shape how robots are understood and integrated across deployment stages.

Authors:Daniel Ogenrwot, John Businge
Title: AgenticFlict: A Large-Scale Dataset of Merge Conflicts in AI Coding Agent Pull Requests on GitHub
Abstract:
Software Engineering 3.0 marks a paradigm shift in software development, in which AI coding agents are no longer just assistive tools but active contributors. While prior empirical studies have examined productivity gains and acceptance patterns in AI-assisted development, the challenges associated with integrating agent-generated contributions remain less understood. In particular, merge conflicts, a fundamental aspect of collaborative software development, remain underexplored in this context. In this paper, we present AgenticFlict, a large-scale dataset of textual merge conflicts in AI coding agent pull requests (Agentic PRs). The dataset comprises 142K+ Agentic PRs collected from 59K+ repositories, of which 107K+ are successfully processed through deterministic merge simulation. Our pipeline identifies 29K+ PRs exhibiting merge conflicts, yielding a conflict rate of 27.67%, and extracts 336K+ fine-grained conflict regions across these instances. Our preliminary exploratory analysis indicates that merge conflicts are both frequent and often substantial in AI-generated contributions, with noticeable variation across agents, emphasizing the need to better understand and manage integration challenges in AI-assisted software development. The dataset, code and supplementary materials are available in zenodo: https://doi.org/10.5281/zenodo.19396917.

Authors:Dina Albassam, Kexin Quan, Mengke Wu, Sanika Pande, ChengXiang Zhai, Yun Huang
Title: YT-Pilot: Turning YouTube into Structured Learning Pathways with Context-Aware AI Support
Abstract:
YouTube is widely used for informal learning, where learners explore lectures and tutorials without a predefined curriculum. However, learning across videos remains fragmented: learners must decide what to watch, how videos relate, and how knowledge builds. Existing tools provide partial support but treat planning and learning as separate activities, lacking a persistent interaction structure that connects them. Grounded in self-regulated learning theory (SRLT), we introduce YT-Pilot, a pathway-aware learning system that operationalizes the learning pathway as a persistent, user-facing interaction structure spanning planning and learning. The pathway coordinates goal setting, planning, navigation, progress tracking, and cross-video assistance. Through a within-subjects study ($N=20$), we show that YT-Pilot significantly improves perceived goal clarity, pathway coherence, and progress tracking, while shifting interaction toward pathway-level reasoning across multiple resources.

Authors:Junwei Yu, Mufeng Yang, Yepeng Ding, Hiroyuki Sato
Title: Structural Feature Engineering for Generative Engine Optimization: How Content Structure Shapes Citation Behavior
Abstract:
The proliferation of AI-powered search engines has shifted information discovery from traditional link-based retrieval to direct answer generation with selective source citation, creating new challenges for content visibility. While existing Generative Engine Optimization (GEO) approaches focus primarily on semantic content modification, the role of structural features in influencing citation behavior remains underexplored. In this paper, we propose GEO-SFE, a systematic framework for structural feature engineering in generative engine optimization. Our approach decomposes content structure into three hierarchical levels: macro-structure (document architecture), meso-structure (information chunking), and micro-structure (visual emphasis), and models their impact on citation probability across different generative engine architectures. We develop architecture-aware optimization strategies and predictive models that preserve semantic integrity while improving structural effectiveness. Experimental evaluation across six mainstream generative engines demonstrates consistent improvements in citation rate (17.3 percent) and subjective quality (18.5 percent), validating the effectiveness and generalizability of the proposed framework. This work establishes structural optimization as a foundational component of GEO, providing a data-driven methodology for enhancing content visibility in LLM-powered information ecosystems.

Authors:Einari Vaaras, Manu Airaksinen, Okko Räsänen
Title: Evaluating Interactive 2D Visualization as a Sample Selection Strategy for Biomedical Time-Series Data Annotation
Abstract:
Reliable machine-learning models in biomedical settings depend on accurate labels, yet annotating biomedical time-series data remains challenging. Algorithmic sample selection may support annotation, but evidence from studies involving real human annotators is scarce. Consequently, we compare three sample selection methods for annotation: random sampling (RND), farthest-first traversal (FAFT), and a graphical user interface-based method enabling exploration of complementary 2D visualizations (2DVs) of high-dimensional data. We evaluated the methods across four classification tasks in infant motility assessment (IMA) and speech emotion recognition (SER). Twelve annotators, categorized as experts or non-experts, performed data annotation under a limited annotation budget, and post-annotation experiments were conducted to evaluate the sampling methods. Across all classification tasks, 2DV performed best when aggregating labels across annotators. In IMA, 2DV most effectively captured rare classes, but also exhibited greater annotator-to-annotator label distribution variability resulting from the limited annotation budget, decreasing classification performance when models were trained on individual annotators' labels; in these cases, FAFT excelled. For SER, 2DV outperformed the other methods among expert annotators and matched their performance for non-experts in the individual-annotator setting. A failure risk analysis revealed that RND was the safest choice when annotator count or annotator expertise was uncertain, whereas 2DV had the highest risk due to its greater label distribution variability. Furthermore, post-experiment interviews indicated that 2DV made the annotation task more interesting and enjoyable. Overall, 2DV-based sampling appears promising for biomedical time-series data annotation, particularly when the annotation budget is not highly constrained.

Authors:Yimeng Wang, Yinzhou Wang, Alicia Hong, Yixuan Zhang
Title: Explore LLM-enabled Tools to Facilitate Imaginal Exposure Exercises for Social Anxiety
Abstract:
Social anxiety (SA) is a prevalent mental health challenge that significantly impacts daily social interactions. Imaginal Exposure (IE), a Cognitive Behavioral Therapy (CBT) technique involving imagined anxiety-provoking scenarios, is effective but underutilized, in part because traditional IE homework requires clients to construct and sustain clinically relevant fear narratives. In this work, we explore the feasibility of an LLM-enabled tool that supports IE by generating vivid, personalized exposure scripts. We first co-designed ImaginalExpoBot with mental health professionals, followed by a formative evaluation with five therapists and a user study involving 19 individuals experiencing SA symptoms. Our findings show that LLM-enabled support can facilitate preparation for anxiety-inducing situations while enabling immediate, user-specific adaptation, with scenarios remaining within a therapeutically beneficial "window of tolerance". Our participants and MHPs also identified limitations in continuity and customization, pointing to the need for deeper adaptivity in future designs. These findings offer preliminary design insights for integrating LLMs into structured therapeutic practices in accessible, scalable ways.

Authors:Boxuan Ma, Baofeng Ren, Huiyong Li, Gen Li, Li Chen, Atsushi Shimada, Shin'Ichi Konomi
Title: Designing a Meta-Reflective Dashboard for Instructor Insight into Student-AI Interactions
Abstract:
Generative AI tools are increasingly used for coursework help, shifting much of students' help-seeking and reasoning into student-AI chats that are largely invisible to instructors. This loss of visibility can weaken instructors' ability to understand students' difficulties, ensure alignment with course goals, and uphold course policies. Yet transcript-level access is neither scalable nor ethically straightforward: reading raw chat logs across a class is impractical, and exposing detailed dialogue can raise privacy concerns and chilling effects on help seeking. As a result, instructors face a tension between needing actionable insight and avoiding default surveillance of student conversations. To address this gap, we propose a meta-reflective dashboard that makes student-AI sessions interpretable without exposing raw chat logs by default. After each help-seeking session, a reflection AI produces a structured, session-level summary of the student's interaction trajectory, AI usage patterns, and potential risks. We co-designed the dashboard with instructors and students to surface key challenges and design goals, and conducted a formative evaluation of perceived usefulness, trust in the summaries, and privacy acceptability. Findings suggest that the proposed dashboard can reduce instructors' sensemaking effort while mitigating privacy concerns associated with transcript-level access, and they also yield design implications for evidence, governance, and scalable class-level analytics for AI-supported learning.

Authors:Boxuan Ma, Yinjie Xie, Huiyong Li, Gen Li, Li Chen, Atsushi Shimada, Shin'Ichi Konomi
Title: Design Implications for Student and Educator Needs in AI-Supported Programming Learning Tools
Abstract:
AI-powered coding assistants can support students in programming courses by providing on-demand explanations and debugging help. However, existing research often focuses on individual tools, leaving a gap in evidence-based design recommendations that reflect both educator and student perspectives in education settings. To ground the design of learning-oriented AI coding assistants for both sides' needs, we conducted parallel surveys of educators (N=50) and students (N=90) to compare preferences about (i) how students should request help, (ii) how AI should respond, and (iii) who should control. Our results show that educators generally favored indirect scaffolding that preserves students' reasoning, whereas students were more likely to prefer direct, actionable help. Educators further highlighted the need for course-aligned constraints and instructor-facing oversight, while students emphasized timely support and clarity when stuck. Based on these findings, we discuss the interaction-focused design space and derive design implications for learning-oriented AI coding assistants, highlighting scaffolding and control mechanisms that balance students' agency with instructional constraints.

Authors:Boxuan Ma, Huiyong Li, Gen Li, Li Chen, Cheng Tang, Atsushi Shimada, Shin'ichi Konomi
Title: Three Years with Classroom AI in Introductory Programming: Shifts in Student Awareness, Interaction, and Performance
Abstract:
Generative AI (GenAI) tools such as ChatGPT now provide novice programmers with instant, personalized support and are reshaping computing education. While a growing body of work examines AI's immediate impacts, longitudinal evidence remains limited on how students' awareness, student-AI interaction patterns, and course outcomes evolve as AI becomes routine in classrooms. To address this gap, we investigate an introductory Python course across three successive AI-supported cohorts (2023-2025). Using questionnaires, coded student-AI dialogue logs, and course assessment records, we examine cohort-to-cohort shifts in students' AI awareness, interaction practices, and learning outcomes. We find that students' relationships with GenAI change systematically over time: familiarity and uptake become increasingly normative, and help-seeking practices evolve alongside growing AI literacy and shifting expectations of what the assistant should provide. These changes suggest that, in the AI era, the central instructional challenge is less about whether students use AI and more about how courses redefine productive learning practices while maintaining student agency. Our study offers longitudinal evidence and practical implications for designing and integrating AI programming support in course settings.

Authors:Protiva Das, Sovon Chakraborty, Sidhant Narula, Lucas Potter, Xavier-Lewis Palmer, Pratip Rana, Daniel Takabi, Mohammad Ghasemigol
Title: BioShield: A Context-Aware Firewall for Securing Bio-LLMs
Abstract:
The rapid advancement of Large Language Models (LLMs) in biological research has significantly lowered the barrier to accessing complex bioinformatics knowledge, ex perimental design strategies, and analytical workflows. While these capabilities accelerate innovation, they also introduce serious dual-use risks, as Bio-LLMs can be exploited to generate harmful biological insights under the guise of legitimate research queries. Existing safeguards, such as static prompt filtering and policy-based restrictions, are insufficient when LLMs are embedded within dynamic biological workflows and application-layer systems. In this paper, we present BioShield, a context-aware application-level firewall designed to secure Bio LLMs against dual-use attacks. At the core of BioShield is a domain-specific prompt scanner that performs contextual risk analysis of incoming queries. The scanner leverages a harmful scoring mechanism tailored to biological dual-use threat cat egories to identify prompts that attempt to conceal malicious intent within seemingly benign research requests. Queries ex ceeding a predefined risk threshold are blocked before reaching the model, effectively preventing unsafe knowledge generation at the source. In addition to pre-generation protection, BioShield deploys a post-generation output verification module that inspects model responses for actionable or weaponizable biological content. If an unsafe response is detected, the system triggers controlled regeneration under strengthened safety constraints. By combining contextual prompt scanning with response-level validation, BioShield provides a layered defense framework specifically designed for bio-domain LLM deployments. Our framework advances cyberbiosecurity by formalizing dual-use threat detection in Bio-LLMs and proposing a structured mitigation strategy for secure, responsible AI driven biological research.

Authors:Lingavasan Suresh Kumar, Yang Ba, Rong Pan
Title: MemArchitect: A Policy Driven Memory Governance Layer
Abstract:
Persistent Large Language Model (LLM) agents expose a critical governance gap in memory management. Standard Retrieval-Augmented Generation (RAG) frameworks treat memory as passive storage, lacking mechanisms to resolve contradictions, enforce privacy, or prevent outdated information ("zombie memories") from contaminating the context window. We introduce MemArchitect, a governance layer that decouples memory lifecycle management from model weights. MemArchitect enforces explicit, rule-based policies, including memory decay, conflict resolution, and privacy controls. We demonstrate that governed memory consistently outperforms unmanaged memory in agentic settings, highlighting the necessity of structured memory governance for reliable and safe autonomous systems.

Authors:Artemis Kontou, Natalia Miroshnikova, Costakis Matheou, Sophocles Sophocleous, Nicholas Tsekouras, Kleanthis Malialis, Panayiotis Kolios
Title: A Novel end-to-end Digital Health System Using Deep Learning-based ECG Analysis
Abstract:
This study presents AI-HEART, a cloud-based information system for managing and analysing long-duration ambulatory electrocardiogram (ECG) recordings and supporting clinician decision-making. The platform operationalises an end-to-end pipeline that ingests multi-day three-lead ECGs, normalises inputs, performs signal preprocessing, and applies dedicated deep neural networks for wave delineation, noise/quality detection, and beat- and rhythm-level multi-class arrhythmia classification. To address class imbalance and real-world signal variability, model development combines large clinically annotated datasets with expert-in-the-loop curation and generative augmentation for under-represented rhythms. Empirical evaluation on three-lead ambulatory ECG data shows that delineation accuracy is sufficient for automated interval measurement, noise detection reliably flags poor-quality segments, and arrhythmia classification achieves high specificity with clinically useful macro-averaged performance across common and rarer rhythms. Beyond predictive accuracy, AI-HEART provides a scalable deployment approach for integrating AI into routine ECG services, enabling traceable outputs, audit-friendly storage of recordings and derived annotations, and clinician review/editing that captures feedback for controlled model improvement. The findings demonstrate the technical feasibility and operational value of a noise-aware AI-ECG platform as a digital health information system.

Authors:Abdullah Ghani, Yash Vekaria, Zubair Shafiq
Title: PixelConfig: Longitudinal Measurement and Reverse-Engineering of Meta Pixel Configurations
Abstract:
Tracking pixels are used to optimize online ad campaigns through personalization, re-targeting, and conversion tracking. Past research has primarily focused on detecting the prevalence of tracking pixels on the web, with limited attention to how they are configured across websites. A tracking pixel may be configured differently on different websites. In this paper, we present a differential analysis framework: PixelConfig, to reverse-engineer the configurations of Meta Pixel deployments across the web. Using this framework, we investigate three types of Meta Pixel configurations: activity tracking (i.e., what a user is doing on a website), identity tracking (i.e., who a user is or who the device is associated with), and tracking restrictions (i.e., mechanisms to limit the sharing of potentially sensitive information). Using data from the Internet Archive's Wayback Machine, we analyze and compare Meta Pixel configurations on 18K health-related websites with a control group of the top 10K websites from 2017 to 2024. We find that activity tracking features, such as automatic events that collect button clicks and page metadata, and identity tracking features, such as first-party cookies that are unaffected by third-party cookie blocking, reached adoption rates of up to 98.4%, largely driven by the Pixel's default settings. We also find that the Pixel is being used to track potentially sensitive information, such as user interactions related to booking medical appointments and button clicks associated with specific medical conditions (e.g., erectile dysfunction) on health-related websites. Tracking restriction features, such as Core Setup, are configured on up to 34.3% of health websites and 8.7% of control websites. However, even when enabled, these tracking restriction features provide limited protection and can be circumvented in practice.

Authors:Daryl Hedley, Doug Pietrzak, Jorge Dias, Ian Burden, Bakhtawar Ahtisham, Zhuqian Zhou, Kirk Vanacore, Josh Marland, Rachel Slama, Justin Reich, Kenneth Koedinger, René Kizilcec
Title: Sandpiper: Orchestrated AI-Annotation for Educational Discourse at Scale
Abstract:
Digital educational environments are expanding toward complex AI and human discourse, providing researchers with an abundance of data that offers deep insights into learning and instructional processes. However, traditional qualitative analysis remains a labor-intensive bottleneck, severely limiting the scale at which this research can be conducted. We present Sandpiper, a mixed-initiative system designed to serve as a bridge between high-volume conversational data and human qualitative expertise. By tightly coupling interactive researcher dashboards with agentic Large Language Model (LLM) engines, the platform enables scalable analysis without sacrificing methodological rigor. Sandpiper addresses critical barriers to AI adoption in education by implementing context-aware, automated de-identification workflows supported by secure, university-housed infrastructure to ensure data privacy. Furthermore, the system employs schema-constrained orchestration to eliminate LLM hallucinations and enforces strict adherence to qualitative codebooks. An integrated evaluations engine allows for the continuous benchmarking of AI performance against human labels, fostering an iterative approach to model refinement and validation. We propose a user study to evaluate the system's efficacy in improving research efficiency, inter-rater reliability, and researcher trust in AI-assisted qualitative workflows.

Authors:Tirthankar Halder, Argha Sen, Swadhin Pradhan, Rijurekha Sen, Sandip Chakraborty
Title: MIRO: Multi-radar Identity and Ranging for Occupational Safety
Abstract:
Occupational exposure to airborne particulate matter (PM) poses a severe health risk in open industrial workspaces such as stonecutting yards. Conventional monitoring solutions such as wearable PM sensors and camera-based tracking are impractical due to discomfort, maintenance issues, and privacy concerns. We present MIRO, a privacy-preserving framework that integrates continuous PM sensing with a multi-radar millimeter-wave (mmWave) re-identification (re-ID) backbone. A distributed network of PM sensors captures localized pollutant concentrations, while spatially overlapping mmWave radars track and re-associate workers across viewpoints without relying on visual cues. To ensure identity consistency across radars, we introduce a GAN-based view adaptation network that compensates for azimuthal distortions in range-Doppler (RD) signatures, combined with correlation-based cross-radar matching. In controlled laboratory experiments, our system achieves a re-ID F1-score of 90.4% and a mean Structural Similarity Index Measure (SSIM) of 0.70 for view adaptation accuracy. Field trials in rural stone-cutting yards further validate the system's robustness, demonstrating reliable worker-specific PM exposure estimation.

Authors:Bonnie Rushing, William Hersch, Shouhuai Xu
Title: Cognitive Warfare: Definition, Framework, and Case Study
Abstract:
Cognitive warfare has emerged as a central feature of modern conflict, yet it remains inconsistently defined and difficult to evaluate. Existing approaches often treat cognitive operations as a subset of information operations, limiting the ability to assess cognitive attacker-defender interactions or determine when advantage has been achieved. This article proposes a unified definition of cognitive warfare, introduces an interaction framework grounded in the OODA loop, and identifies measurable attributes associated with cognitive superiority. To illustrate the use of the framework, a notional case study demonstrates how these concepts can be applied to assess cognitive attacks and defenses in a contested environment. Thus, the framework provides joint force leaders and analysts with a practical foundation for understanding, comparing, and evaluating cognitive warfare campaigns.

Authors:Hideaki Yamamoto, Yifan Li, Wakako Yukita, Tomoyuki Yokota, Takao Someya, Ryo Takahashi, Yoshihiro Kawahara
Title: Body-scale NFC for wearables: human-centric body-scale NFC networking for ultra-low-power wearable devices (Demo of UTokyo Kawahara Lab 2025)
Abstract:
Near Field Communication (NFC) is a promising technology for ultra-low-power wearables, yet its short communication range limits its use to narrow-area, point-to-point interactions. We propose a body-scale NFC networking system that extends NFC coverage around the body, enabling surface-to-multipoint communication with distributed NFC sensor tags. This demonstration introduces two key technologies: Meander NFC and picoRing NFC. First, Meander NFC expands a clothing-based NFC networking area up to body scale while enabling a stable readout of small NFC tags occupying 1% of the coverage area. Meander NFC uses a meander coil which creates a spatially confined inductive field along the textile surface, ensuring robust coupling with small tags while preventing undesired electromagnetic body coupling. Second, picoRing NFC solves the weak inductive coupling caused by distance and size mismatches. By leveraging middle-range NFC and coil optimization, picoRing NFC extends the communication range to connect these disparate nodes between the ring and wristband.

Authors:Yuxin Zhang, Fan Zhang, Zihao Song, Chao Zhao
Title: From Sustainable Materials to User-Centered Sustainability: Material Experience in Art Healing
Abstract:
This study develops sustainable materials using hydrogel as the matrix and explores the transition from sustainable materials to user-centered sustainability, with a particular focus on achieving art healing through material experience. The findings reveal that "Aesthetic" property exert the greatest influence on art healing in the context of multimodal material experiences involving visual, tactile, and smell, followed by "Intrinsic" property, whereas "Physical" property have a comparatively limited effect. Furthermore, the study proposes a material experience framework that enables designers to systematically and holistically understanding material characteristics. It highlights the importance of considering users' psychological perceptions and emotional needs in the material design process.

Authors:Yuepeng Chen, Kaili Zheng, Ji Wu, Zhuangzhuang Li, Ye Ma, Dongwei Liu, Chenyi Guo, Xiangling Fu
Title: From Continuous sEMG Signals to Discrete Muscle State Tokens: A Robust and Interpretable Representation Framework
Abstract:
Surface electromyography (sEMG) signals exhibit substantial inter-subject variability and are highly susceptible to noise, posing challenges for robust and interpretable decoding. To address these limitations, we propose a discrete representation of sEMG signals based on a physiology-informed tokenization framework. The method employs a sliding window aligned with the minimal muscle contraction cycle to isolate individual muscle activation events. From each window, ten time-frequency features, including root mean square (RMS) and median frequency (MDF), are extracted, and K-means clustering is applied to group segments into representative muscle-state tokens. We also introduce a large-scale benchmark dataset, ActionEMG-43, comprising 43 diverse actions and sEMG recordings from 16 major muscle groups across the body. Based on this dataset, we conduct extensive evaluations to assess the inter-subject consistency, representation capacity, and interpretability of the proposed sEMG tokens. Our results show that the token representation exhibits high inter-subject consistency (Cohen's Kappa = 0.82+-0.09), indicating that the learned tokens capture consistent and subject-independent muscle activation patterns. In action recognition tasks, models using sEMG tokens achieve Top-1 accuracies of 75.5% with ViT and 67.9% with SVM, outperforming raw-signal baselines (72.8% and 64.4%, respectively), despite a 96% reduction in input dimensionality. In movement quality assessment, the tokens intuitively reveal patterns of muscle underactivation and compensatory activation, offering interpretable insights into neuromuscular control. Together, these findings highlight the effectiveness of tokenized sEMG representations as a compact, generalizable, and physiologically meaningful feature space for applications in rehabilitation, human-machine interaction, and motor function analysis.

Authors:Chen Bo Calvin Zhang, Christina Q. Knight, Nicholas Kruus, Jason Hausenloy, Pedro Medeiros, Nathaniel Li, Aiden Kim, Yury Orlovskiy, Coleman Breen, Bryce Cai, Jasper Götting, Andrew Bo Liu, Samira Nedungadi, Paula Rodriguez, Yannis Yiming He, Mohamed Shaaban, Zifan Wang, Seth Donoughe, Julian Michael
Title: LLM Novice Uplift on Dual-Use, In Silico Biology Tasks
Abstract:
Large language models (LLMs) perform increasingly well on biology benchmarks, but it remains unclear whether they uplift novice users -- i.e., enable humans to perform better than with internet-only resources. This uncertainty is central to understanding both scientific acceleration and dual-use risk. We conducted a multi-model, multi-benchmark human uplift study comparing novices with LLM access versus internet-only access across eight biosecurity-relevant task sets. Participants worked on complex problems with ample time (up to 13 hours for the most involved tasks). We found that LLM access provided substantial uplift: novices with LLMs were 4.16 times more accurate than controls (95% CI [2.63, 6.87]). On four benchmarks with available expert baselines (internet-only), novices with LLMs outperformed experts on three of them. Perhaps surprisingly, standalone LLMs often exceeded LLM-assisted novices, indicating that users were not eliciting the strongest available contributions from the LLMs. Most participants (89.6%) reported little difficulty obtaining dual-use-relevant information despite safeguards. Overall, LLMs substantially uplift novices on biological tasks previously reserved for trained practitioners, underscoring the need for sustained, interactive uplift evaluations alongside traditional benchmarks.

Authors:Masahiro Yoshida, Bingxuan Li, Songyan Zhao, Qinyi Zhou, Shiwei Hu, Xiang Anthony Chen, Nanyun Peng
Title: CoLyricist: Enhancing Lyric Writing with AI through Workflow-Aligned Support
Abstract:
We propose CoLyricist, an AI-assisted lyric writing tool designed to support the typical workflows of experienced lyricists and enhance their creative efficiency. While lyricists have unique processes, many follow common stages. Tools that fail to accommodate these stages challenge integration into creative practices. Existing research and tools lack sufficient understanding of these songwriting stages and their associated challenges, resulting in ineffective designs. Through a formative study involving semi-structured interviews with 10 experienced lyricists, we identified four key stages: Theme Setting, Ideation, Drafting Lyrics, and Melody Fitting. CoLyricist addresses these needs by incorporating tailored AI-driven support for each stage, optimizing the lyric writing process to be more seamless and efficient. To examine whether this workflow-aligned design also benefits those without prior experience, we conducted a user study with 16 participants, including both experienced and novice lyricists. Results showed that CoLyricist enhances the songwriting experience across skill levels. Novice users especially appreciated the Melody-Fitting feature, while experienced users valued the Ideation support.

Authors:Miriam Remshard, Yara Kyrychenko, Sander van der Linden, Matthew H. Goldberg, Anthony Leiserowitz, Elena Savoia, Jon Roozenbeek
Title: Addressing Climate Action Misperceptions with Generative AI
Abstract:
Mitigating climate change requires behaviour change. However, even climate-concerned individuals often hold misperceptions about which actions most reduce carbon emissions. We recruited 1201 climate-concerned individuals to examine whether discussing climate actions with a large language model (LLM) equipped with climate knowledge and prompted to provide personalised responses would foster more accurate perceptions of the impacts of climate actions and increase willingness to adopt feasible, high-impact behaviours. We compared this to having participants run a web search, have a conversation with an unspecialised LLM, and no intervention. The personalised climate LLM was the only condition that led to increased knowledge about the impacts of climate actions and greater intentions to adopt impactful behaviours. While the personalised climate LLM did not outperform a web search in improving understanding of climate action impacts, the ability of LLMs to deliver personalised, actionable guidance may make them more effective at motivating impactful pro-climate behaviour change.

Authors:Md Sabbir Ahmed, Kaitlyn Dorothy Petz, Noah French, Tanvi Lakhtakia, Aayushi Sangani, Mark Rucker, Xinyu Chen, Bethany A. Teachman, Laura E. Barnes
Title: SocialPulse: On-Device Detection of Social Interactions in Naturalistic Settings Using Smartwatch Multimodal Sensing
Abstract:
Social interactions are fundamental to well-being, yet automatically detecting them in daily life-particularly using wearables-remains underexplored. Most existing systems are evaluated in controlled settings, focus primarily on in-person interactions, or rely on restrictive assumptions (e.g., requiring multiple speakers within fixed temporal windows), limiting generalizability to real-world use. We present an on-watch interaction detection system designed to capture diverse interactions in naturalistic settings. A core component is a foreground speech detector trained on a public dataset. Evaluated on over 100,000 labeled foreground speech and background sound instances, the detector achieves a balanced accuracy of 85.51%, outperforming prior work by 5.11%. We evaluated the system in a real-world deployment (N=38), with over 900 hours of total smartwatch wear time. The system detected 1,691 interactions, 77.28% were confirmed via participant self-report, with durations ranging from under one minute to over one hour. Among correct detections, 81.45% were in-person, 15.7% virtual, and 1.85% hybrid. Leveraging participant-labeled data, we further developed a multimodal model achieving a balanced accuracy of 90.36% and a sensitivity of 91.17% on 33,698 labeled 15-second windows. These results demonstrate the feasibility of real-world interaction sensing and open the door to adaptive, context-aware systems responding to users' dynamic social environments.

Authors:Griffin Pitts, Sanaz Motamedi
Title: What Drives Students' Use of AI Chatbots? Technology Acceptance in Conversational AI
Abstract:
Conversational AI tools have been rapidly adopted by students and are becoming part of their learning routines. To understand what drives this adoption, we draw on the Technology Acceptance Model (TAM) and examine how perceived usefulness and perceived ease of use relate to students' behavioral intention to use conversational AI that generates responses for learning tasks. We extend TAM by incorporating trust, perceived enjoyment, and subjective norms as additional factors that capture social and affective influences and uncertainty around AI outputs. Using partial least squares structural equation modeling, we find perceived usefulness remains the strongest predictor of students' intention to use conversational AI. However, perceived ease of use does not exert a significant direct effect on behavioral intention once other factors are considered, operating instead indirectly through perceived usefulness. Trust and subjective norms significantly influence perceptions of usefulness, while perceived enjoyment exerts both a direct and indirect effect on usage intentions. These findings suggest that adoption decisions for conversational AI systems are influenced less by effort-related considerations and more by confidence in system outputs, affective engagement, and social context. Future research is needed to further examine how these acceptance relationships generalize across different conversational systems and usage contexts.

Authors:Poorna Talkad Sukumar, Maurizio Porfiri, Oded Nov
Title: Studying the Separability of Visual Channel Pairs in Symbol Maps
Abstract:
Visualizations often encode multivariate data by mapping attributes to distinct visual channels such as color, size, or shape. The effectiveness of these encodings depends on separability--the extent to which channels can be perceived independently. Yet systematic evidence for separability, especially in map-based contexts, is lacking. We present a crowdsourced experiment that evaluates the separability of four channel pairs--color (ordered) x shape, color (ordered) x size, size x shape, and size x orientation--in the context of bivariate symbol maps. Both accuracy and speed analyses show that color x shape is the most separable and size x orientation the least separable, while size x color and size x shape do not differ. Separability also proved asymmetric--performance depended on which channel encoded the task-relevant variable, with color and shape outperforming size, and square shape especially difficult to discriminate. Our findings advance the empirical understanding of visual separability, with implications for multivariate map design.

Authors:Varun Shiri, Charles Liu, Keyu Yao, Jin L. C. Guo, Jinghui Cheng
Title: Beyond Privacy Labels: How Users Perceive Different Information Sources for Understanding App's Privacy Practices
Abstract:
Despite having growing awareness and concerns about privacy, technology users are often insufficiently informed of the data practices of various digital products to protect themselves. Privacy policies and privacy labels, as two conventional ways of communicating data practices, are each criticized for important limitations -- one being lengthy and filled with legal jargon, and the other oversimplified and inaccurate -- causing users significant difficulty in understanding the privacy practices of the products and assessing their impact. To mitigate those issues, we explore ways to enhance privacy labels with the relevant content in complementary sources, including privacy policy, app reviews, and community-curated privacy assessments. Our user study results indicate that perceived usefulness and trust on those information sources are personal and influenced by past experience. Our work highlights the importance of considering various information needs for privacy practice and consolidating different sources for more useful privacy solutions.

Authors:Madeleine Grunde-McLaughlin, Hussein Mozannar, Maya Murad, Jingya Chen, Saleema Amershi, Adam Fourney
Title: Overseeing Agents Without Constant Oversight: Challenges and Opportunities
Abstract:
To enable human oversight, agentic AI systems often provide a trace of reasoning and action steps. Designing traces to have an informative, but not overwhelming, level of detail remains a critical challenge. In three user studies on a Computer User Agent, we investigate the utility of basic action traces for verification, explore three alternatives via design probes, and test a novel interface's impact on error finding in question-answering tasks. As expected, we find that current practices are cumbersome, limiting their efficacy. Conversely, our proposed design reduced the time participants spent finding errors. However, although participants reported higher levels of confidence in their decisions, their final accuracy was not meaningfully improved. To this end, our study surfaces challenges for human verification of agentic systems, including managing built-in assumptions, users' subjective and changing correctness criteria, and the shortcomings, yet importance, of communicating the agent's process.

Authors:Fangjie Li, Nicholas Kavoussi, Charan Mohan, Matthieu Chabanas, Jie Ying Wu
Title: Automated Assessment of Kidney Ureteroscopy Exploration for Training
Abstract:
Purpose: Kidney ureteroscopic navigation is challenging with a steep learning curve. However, current clinical training has major deficiencies, as it requires one-on-one feedback from experts and occurs in the operating room (OR). Therefore, there is a need for a phantom training system with automated feedback to greatly \revision{expand} training opportunities. Methods: We propose a novel, purely ureteroscope video-based scope localization framework that automatically identifies calyces missed by the trainee in a phantom kidney exploration. We use a slow, thorough, prior exploration video of the kidney to generate a reference reconstruction. Then, this reference reconstruction can be used to localize any exploration video of the same phantom. Results: In 15 exploration videos, a total of 69 out of 74 calyces were correctly classified. We achieve < 4mm camera pose localization error. Given the reference reconstruction, the system takes 10 minutes to generate the results for a typical exploration (1-2 minute long). Conclusion: We demonstrate a novel camera localization framework that can provide accurate and automatic feedback for kidney phantom explorations. We show its ability as a valid tool that enables out-of-OR training without requiring supervision from an expert.

Authors:Zirong Chen, Meiyi Ma
Title: Real-World Design and Deployment of an Embedded GenAI-powered 9-1-1 Calltaking Training System: Experiences and Lessons Learned
Abstract:
Emergency call-takers form the first operational link in public safety response, handling over 240 million calls annually while facing a sustained training crisis: staffing shortages exceed 25\% in many centers, and preparing a single new hire can require up to 720 hours of one-on-one instruction that removes experienced personnel from active duty. Traditional training approaches struggle to scale under these constraints, limiting both coverage and feedback timeliness. In partnership with Metro Nashville Department of Emergency Communications (MNDEC), we designed, developed, and deployed a GenAI-powered call-taking training system under real-world constraints. Over six months, deployment scaled from initial pilot to 190 operational users across 1,120 training sessions, exposing systematic challenges around system delivery, rigor, resilience, and human factors that remain largely invisible in controlled or purely simulated evaluations. By analyzing deployment logs capturing 98,429 user interactions, organizational processes, and stakeholder engagement patterns, we distill four key lessons, each coupled with concrete design and governance practices. These lessons provide grounded guidance for researchers and practitioners seeking to embed AI-driven training systems in safety-critical public sector environments where embedded constraints fundamentally shape socio-technical design.

Authors:Tianyu Song, Feng Li, Felix Pabst, Miruna-Alexandra Gafencu Yuan Bi, Ulrich Eck, Nassir Navab
Title: Comparative Study of Ultrasound Shape Completion and CBCT-Based AR Workflows for Spinal Needle Interventions
Abstract:
Purpose: This study compares two augmented reality (AR)-guided imaging workflows, one based on ultrasound shape completion and the other on cone-beam computed tomography (CBCT), for planning and executing lumbar needle interventions. The aim is to assess how imaging modality influences user performance, usability, and trust during AR-assisted spinal procedures. Methods: Both imaging systems were integrated into an AR framework, enabling in situ visualization and trajectory guidance. The ultrasound-based workflow combined AR-guided robotic scanning, probabilistic shape completion, and AR visualization. The CBCT-based workflow used AR-assisted scan volume planning, CBCT acquisition, and AR visualization. A between-subject user study was conducted and evaluated in two phases: (1) planning and image acquisition, and (2) needle insertion. Results: Planning time was significantly shorter with the CBCT-based workflow, while SUS, SEQ, and NASA-TLX were comparable between modalities. In the needle insertion phase, the CBCT-based workflow yielded marginally faster insertion times, lower placement error, and better subjective ratings with higher Trust. The ultrasound-based workflow achieved adequate accuracy for facet joint insertion, but showed larger errors for lumbar puncture, where reconstructions depended more heavily on shape completion. Conclusion: The findings indicate that both AR-guided imaging pipelines are viable for spinal intervention support. CBCT-based AR offers advantages in efficiency, precision, usability, and user confidence during insertion, whereas ultrasound-based AR provides adaptive, radiation-free imaging but is limited by shape completion in deeper spinal regions. These complementary characteristics motivate hybrid AR guidance that uses CBCT for global anatomical context and planning, augmented by ultrasound for adaptive intraoperative updates.

Authors:Mersedeh Sadeghi, Simon Scholz, Max Unterbusch, Andreas Vogelsang
Title: V-SHiNE: A Virtual Smart Home Framework for Explainability Evaluation
Abstract:
Explanations are essential for helping users interpret and trust autonomous smart-home decisions, yet evaluating their quality and impact remains methodologically difficult in this domain. V-SHiNE addresses this gap: a browser-based smarthome simulation framework for scalable and realistic assessment of explanations. It allows researchers to configure environments, simulate behaviors, and plug in custom explanation engines, with flexible delivery modes and rich interaction logging. A study with 159 participants demonstrates its feasibility. V-SHiNE provides a lightweight, reproducible platform for advancing user-centered evaluation of explainable intelligent systems

Authors:Matthew Prock, Ziv Epstein, Hope Schroeder, Amy Smith, Cassandra Lee, Vana Goblot, Farnaz Jahanbakhsh
Title: Interpretive Cultures: Resonance, randomness, and negotiated meaning for AI-assisted tarot divination
Abstract:
While generative AI tools are increasingly adopted for creative and analytical tasks, their role in interpretive practices, where meaning is subjective, plural, and non-causal, remains poorly understood. This paper examines AI-assisted tarot reading, a divinatory practice in which users pose a query, draw cards through a randomized process, and ask AI systems to interpret the resulting symbols. Drawing on interviews with tarot practitioners and Hartmut Rosa's Theory of Resonance, we investigate how users seek, negotiate, and evaluate resonant interpretations in a context where no causal relationship exists between the query and the data being interpreted. We identify distinct ways practitioners incorporate AI into their interpretive workflows, including using AI to navigate uncertainty and self-doubt, explore alternative perspectives, and streamline or extend existing divinatory practices. Based on these findings, we offer design recommendations for AI systems that support interpretive meaning-making without collapsing ambiguity or foreclosing user agency.

Authors:Liuchuan Yu, Yongqi Zhang, Lap-Fai Yu
Title: Reality Copilot: Voice-First Human-AI Collaboration in Mixed Reality Using Large Multimodal Models
Abstract:
Large Multimodal Models (LMMs) have shown strong potential for assisting users in tasks, such as programming, content creation, and information access, yet their interaction remains largely limited to traditional interfaces such as desktops and smartphones. Meanwhile, advances in mixed reality (MR) hardware have enabled applications that extend beyond entertainment and into everyday use. However, most existing MR systems rely primarily on manual input (e.g., hand gestures or controllers) and provide limited intelligent assistance due to the lack of integration with large-scale AI models. We present Reality Copilot, a voice-first human-AI assistant for mixed reality that leverages LMMs to enable natural speech-based interaction. The system supports contextual understanding of physical environments, realistic 3D content generation, and real-time information retrieval. In addition to in-headset interaction, Reality Copilot facilitates cross-platform workflows by generating context-aware textual content and exporting generated assets. This work explores the design space of LMM-powered human-AI collaboration in mixed reality.

Authors:Yuxin Zhang, Fan Zhang
Title: The Effect of Design Thinking on Creative & Innovation Processes: An Empirical Study Across Different Design Experience Levels
Abstract:
This study employs linear regression and structural equation modeling to explore how Thinking Skills, Design Thinking, Creative Self-Efficacy (CSE), and Collective Creative Efficacy (CCE) drive Design Creativity & Innovation, and analyzes the structural stability of the model across different levels of experience. Path analysis results indicate that the four Design Thinking Skills, Problem-driven Design (beta = 0.198, p < 0.01), Information-driven Design (beta = 0.241, p < 0.001), Solution-driven Design (beta = 0.227, p < 0.001), and Knowledge-driven Design (beta = 0.263, p < 0.001) all significantly and positively influence Design Thinking. Furthermore, Design Thinking has a significant positive predictive effect on Design Creativity & Innovation (beta = 0.286, p < 0.001). Mediation analysis confirms three significant mediation paths: the CSE mediation path (beta = 0.128, p < 0.001), the CCE mediation path (beta = 0.073, p < 0.01), and the "CSE to CCE" chain mediation path (beta = 0.025, p < 0.01). Multi-group comparison results reveal significant differences between the student and professional groups under the full equivalence model. After relaxing specific constraints, there were no significant differences between the nested models of the baseline model, partial measurement invariance, structural weight invariance, and structural covariance invariance. These findings elucidate the multi-dimensional pathways of Design Creativity & Innovation, providing a robust empirical basis for optimizing differentiated pedagogical models and professional practice guidelines.

Authors:Xuechen Li, Shuai Zhang, Nan Cao, Qing Chen
Title: Beyond Input-Output: Rethinking Creativity through Design-by-Analogy in Human-AI Collaboration
Abstract:
While the proliferation of foundation models has significantly boosted individual productivity, it also introduces a potential challenge: the homogenization of creative content. In response, we revisit Design-by-Analogy (DbA), a cognitively grounded approach that fosters novel solutions by mapping inspiration across domains. However, prevailing perspectives often restrict DbA to early ideation or specific data modalities, while reducing AI-driven design to simplified input-output pipelines. Such conceptual limitations inadvertently foster widespread design fixation. To address this, we expand the understanding of DbA by embedding it into the entire creative process, thereby demonstrating its capacity to mitigate such fixation. Through a systematic review of 85 studies, we identify six forms of representation and classify techniques across seven stages of the creative process. We further discuss three major application domains: creative industries, intelligent manufacturing, and education and services, demonstrating DbA's practical relevance. Building on this synthesis, we frame DbA as a mediating technology for human-AI collaboration and outline the potential opportunities and inherent risks for advancing creativity support in HCI and design research.

Authors:Ashutosh Chaubey, Jiacheng Pang, Maksim Siniukov, Mohammad Soleymani
Title: AVERE: Improving Audiovisual Emotion Reasoning with Preference Optimization
Abstract:
Emotion understanding is essential for building socially intelligent agents. Although recent multimodal large language models have shown strong performance on this task, two key challenges remain - spurious associations between emotions and irrelevant audiovisual cues, and hallucinations of audiovisual cues driven by text priors in the language model backbone. To quantify and understand these issues, we introduce EmoReAlM, a benchmark designed to evaluate MLLMs for cue-emotion associations, hallucinations and modality agreement. We then propose AVEm-DPO, a preference optimization technique that aligns model responses with both audiovisual inputs and emotion-centric queries. Specifically, we construct preferences over responses exhibiting spurious associations or hallucinations, and audiovisual input pairs guided by textual prompts. We also include a regularization term that penalizes reliance on text priors, thereby mitigating modality-specific cue hallucinations. Experimental results on DFEW, RAVDESS and EMER demonstrate that our method significantly improves the performance of the reference baseline models with 6-19% of relative performance gains in zero-shot settings. By providing both a rigorous benchmark and a robust optimization framework, this work enables principled evaluation and improvement of MLLMs for emotion understanding and social AI. Code, models and benchmark will be released at https://avere-iclr.github.io.

Authors:Aashish Panta, Giorgio Scorzelli, Amy A. Gooch, Werner Sun, Katherine S. Shanks, Suchismita Sarker, Devin Bougie, Keara Soloway, Rolf Verberg, Tracy Berman, Glenn Tarcea, John Allison, Michela Taufer, Valerio Pascucci
Title: Large Data Acquisition and Analytics at Synchrotron Radiation Facilities
Abstract:
Synchrotron facilities like the Cornell High Energy Synchrotron Source (CHESS) generate massive data volumes from complex beamline experiments, but face challenges such as limited access time, the need for on-site experiment monitoring, and managing terabytes of data per user group. We present the design, deployment, and evaluation of a framework that addresses CHESS's data acquisition and management issues. Deployed on a secure CHESS server, our system provides real time, web-based tools for remote experiment monitoring and data quality assessment, improving operational efficiency. Implemented across three beamlines (ID3A, ID3B, ID4B), the framework managed 50-100 TB of data and over 10 million files in late 2024. Testing with 43 research groups and 86 dashboards showed reduced overhead, improved accessibility, and streamlined data workflows. Our paper highlights the development, deployment, and evaluation of our framework and its transformative impact on synchrotron data acquisition.

Authors:Lucile Favero, Juan Antonio Pérez-Ortiz, Tanja Käser, Nuria Oliver
Title: AI in Education Beyond Learning Outcomes: Cognition, Agency, Emotion, and Ethics
Abstract:
Artificial intelligence (AI) is rapidly being integrated into educational contexts, promising personalized support and increased efficiency. However, growing evidence suggests that the uncritical adoption of AI may produce unintended harms that extend beyond individual learning outcomes to affect broader societal goals. This paper examines the societal implications of AI in education through an integrative framework with four interrelated dimensions: cognition, agency, emotional well-being, and ethics. Drawing on research from education, cognitive science, psychology, and ethics, we synthesize existing evidence to show how AI-driven cognitive offloading, diminished learner agency, emotional disengagement, and surveillance-oriented practices can mutually reinforce one another. We argue that these dynamics risk undermining critical thinking, intellectual autonomy, emotional resilience, and trust, capacities that are foundational both for effective learning and also for democratic participation and informed civic engagement. Moreover, AI's impact is contingent on design and governance: pedagogically aligned, ethically grounded, and human-centered AI systems can scaffold effortful reasoning, support learner agency, and preserve meaningful social interaction. By integrating fragmented strands of prior research into a unified framework, this paper advances the discourse on responsible AI in education and offers actionable implications for educators, designers, and institutions. Ultimately, the paper contends that the central challenge is not whether AI should be used in education, but how it can be designed and governed to support learning while safeguarding the social and civic purposes of education.

Authors:Zeynep G. Saribatur, Johannes Langer, Ute Schmid
Title: The Dual Role of Abstracting over the Irrelevant in Symbolic Explanations: Cognitive Effort vs. Understanding
Abstract:
Explanations are central to human cognition, yet AI systems often produce outputs that are difficult to understand. While symbolic AI offers a transparent foundation for interpretability, raw logical traces often impose a high extraneous cognitive load. We investigate how formal abstractions, specifically removal and clustering, impact human reasoning performance and cognitive effort. Utilizing Answer Set Programming (ASP) as a formal framework, we define a notion of irrelevant details to be abstracted over to obtain simplified explanations. Our cognitive experiments, in which participants classified stimuli across domains with explanations derived from an answer set program, show that clustering details significantly improve participants' understanding, while removal of details significantly reduce cognitive effort, supporting the hypothesis that abstraction enhances human-centered symbolic explanations.

Authors:Yoonsang Kim, Divyansh Pradhan, Devshree Jadeja, Arie Kaufman
Title: From Speech-to-Spatial: Grounding Utterances on A Live Shared View with Augmented Reality
Abstract:
We introduce Speech-to-Spatial, a referent disambiguation framework that converts verbal remote-assistance instructions into spatially grounded AR guidance. Unlike prior systems that rely on additional cues (e.g., gesture, gaze) or manual expert annotations, Speech-to-Spatial infers the intended target solely from spoken references (speech input). Motivated by our formative study of speech referencing patterns, we characterize recurring ways people specify targets (Direct Attribute, Relational, Remembrance, and Chained) and ground them to our object-centric relational graph. Given an utterance, referent cues are parsed and rendered as persistent in-situ AR visual guidance, reducing iterative micro-guidance ("a bit more to the right", "now, stop.") during remote guidance. We demonstrate the use cases of our system with remote guided assistance and intent disambiguation scenarios. Our evaluation shows that Speechto-Spatial improves task efficiency, reduces cognitive load, and enhances usability compared to a conventional voice-only baseline, transforming disembodied verbal instruction into visually explainable, actionable guidance on a live shared view.

Authors:Hyunsung Cho, Xuejing Luo, Byungjoo Lee, David Lindlbauer, Antti Oulasvirta
Title: Simulating Human Audiovisual Search Behavior
Abstract:
Locating a target based on auditory and visual cues$\unicode{x2013}$such as finding a car in a crowded parking lot or identifying a speaker in a virtual meeting$\unicode{x2013}$requires balancing effort, time, and accuracy under uncertainty. Existing models of audiovisual search often treat perception and action in isolation, overlooking how people adaptively coordinate movement and sensory strategies. We present Sensonaut, a computational model of embodied audiovisual search. The core assumption is that people deploy their body and sensory systems in ways they believe will most efficiently improve their chances of locating a target, trading off time and effort under perceptual constraints. Our model formulates this as a resource-rational decision-making problem under partial observability. We validate the model against newly collected human data, showing that it reproduces both adaptive scaling of search time and effort under task complexity, occlusion, and distraction, and characteristic human errors. Our simulation of human-like resource-rational search informs the design of audiovisual interfaces that minimize search cost and cognitive load.

Authors:Prasenjit Karmakar, Manjeet Yadav, Swayanshu Rout, Swadhin Pradhan, Sandip Chakraborty
Title: From Invisible to Actionable: Augmented Reality Interactions with Indoor CO2
Abstract:
Indoor carbon dioxide (CO2) can rapidly accumulate to form invisible pollution hotspots, posing significant health risks due to its odorless and colorless nature. Despite growing interest in wearable or stationary sensors for pollutant detection, effectively visualizing CO2 levels and engaging individuals remains an ongoing challenge. In this paper, we develop a portable wrist-sized pollution sensor that detects CO2 in real time at any indoor location and reveals CO2 bubbles by highlighting sudden spikes. In order to promote better ventilation habits and user awareness, we also develop a smartphone-based augmented reality (AR) game for users to locate and disperse these high-CO2 zones. A user study with 35 participants demonstrated increased engagement and heightened understanding of CO2's health impacts. Our system's usability evaluations yielded a median score of 1.88, indicating its strong practicality.

Authors:Yoonsang Kim, Devshree Jadeja, Divyansh Pradhan, Yalong Yang, Arie Kaufman
Title: SpeechLess: Micro-utterance with Personalized Spatial Memory-aware Assistant in Everyday Augmented Reality
Abstract:
Speaking aloud to a wearable AR assistant in public can be socially awkward, and re-articulating the same requests every day creates unnecessary effort. We present SpeechLess, a wearable AR assistant that introduces a speech-based intent granularity control paradigm grounded in personalized spatial memory. SpeechLess helps users "speak less," while still obtaining the information they need, and supports gradual explicitation of intent when more complex expression is required. SpeechLess binds prior interactions to multimodal personal context-space, time, activity, and referents-to form spatial memories, and leverages them to extrapolate missing intent dimensions from under-specified user queries. This enables users to dynamically adjust how explicitly they express their informational needs, from full-utterance to micro/zero-utterance interaction. We motivate our design through a week-long formative study using a commercial smart glasses platform, revealing discomfort with public voice use, frustration with repetitive speech, and hardware constraints. Building on these insights, we design SpeechLess, and evaluate it through controlled lab and in-the-wild studies. Our results indicate that regulated speech-based interaction, can improve everyday information access, reduce articulation effort, and support socially acceptable use without substantially degrading perceived usability or intent resolution accuracy across diverse everyday environments.

Authors:Yuxin Zhang, Fan Zhang
Title: Design Perspective on Materials Experience: A CiteSpace-Based Bibliometric and Visual Analysis of Interdisciplinary Research
Abstract:
Based on a bibliometric analysis of literature from 2005 to 2024, this study reveals that material experience is undergoing a profound transformation characterized by evolving material definitions, methodological advances, and increasing interdisciplinary integration. Material types now extend beyond traditional substances to encompass virtual and biological media, underscoring a growing emphasis on perception and interaction. Methodologically, the field has transitioned from subjective descriptions to data-driven, quantifiable models focused on objective sensory analysis and multisensory integration to enhance immersion. Key drivers, including human-machine perception convergence, material-driven interface interactions, and the embedding of intelligent interactive functions, propel the discipline toward an experience-centered paradigm reflecting a deep convergence of design, science, and technology. At the national/regional level, the United States, China, Japan, Germany, and the Netherlands lead in contributions, while France, the United Kingdom, and Romania demonstrate significant interdisciplinary progress. At the institutional level, Delft University of Technology, Justus Liebig University Giessen, and the Centre National de la Recherche Scientifique show significant advantages. In particular, the Material-Driven Design theory has established a foundational impact on the discipline, while, regarding general research trends, scholars from the United States, the Netherlands, and Germany maintain the highest academic visibility. Overall, material experience research is at a critical juncture, its future development will depend on progress in material innovation, technological integration, and perceptual quantification, as well as the establishment of socio-cultural values, all of which must be effectively unified through design to address complex evolving needs.

Authors:Rudrajit Choudhuri, Christopher Sanchez, Margaret Burnett, Anita Sarma
Title: Why Johnny Can't Think: GenAI's Impacts on Cognitive Engagement
Abstract:
Context: Many students now use generative AI in their coursework, yet its effects on intellectual development remain poorly understood. While prior work has investigated students' cognitive offloading during episodic interactions, it remains unclear whether using genAI routinely is tied to more fundamental shifts in students' thinking habits. Objective: We investigate (RQ1-How): how students' trust in and routine use of genAI affect their cognitive engagement -- specifically, reflection, need for understanding, and critical thinking in STEM coursework. Further, we investigate (RQ2-Who): which students are particularly vulnerable to these cognitive disengagement effects. Method: We drew on dual-process theory, cognitive offloading, and automation bias literature to develop a statistical model explaining how and to what extent students' trust-driven routine use of genAI affected their cognitive engagement habits in coursework, and how these effects differed across students' cognitive styles. We empirically evaluated this model using Partial Least Squares Structural Equation Modeling on survey data from 299 STEM students across five North American universities. Results: Students who trusted and routinely used genAI reported significantly lower cognitive engagement. Unexpectedly, students with higher technophilic motivations, risk tolerance, and computer self-efficacy -- traits often celebrated in STEM -- were more prone to these effects. Interestingly, prior experience with genAI or academia did not protect them from cognitively disengaging. Implications: Our findings suggest a potential cognitive debt cycle in which routine genAI use progressively weakens students' intellectual habits, potentially driving over-reliance and escalating usage. This poses critical challenges for curricula and genAI system design, requiring interventions that actively support cognitive engagement.

Authors:Yi Fei Cheng, Jarod Bloch, Alexander Wang, Andrea Bianchi, Anusha Withana, Anhong Guo, Laurie M. Heller, David Lindlbauer
Title: Auditorily Embodied Conversational Agents: Effects of Spatialization and Situated Audio Cues on Presence and Social Perception
Abstract:
Embodiment can enhance conversational agents, such as increasing their perceived presence. This is typically achieved through visual representations of a virtual body; however, visual modalities are not always available, such as when users interact with agents using headphones or display-less glasses. In this work, we explore auditory embodiment. By introducing auditory cues of bodily presence - through spatially localized voice and situated Foley audio from environmental interactions - we investigate how audio alone can convey embodiment and influence perceptions of a conversational agent. We conducted a 2 (spatialization: monaural vs. spatialized) x 2 (Foley: none vs. Foley) within-subjects study, where participants (n=24) engaged in conversations with agents. Our results show that spatialization and Foley increase co-presence, but reduce users' perceptions of the agent's attention and other social attributes.

Authors:Shashiwadana Nirmania, Garima Sharma, Hourieh Khalajzadeh, Mojtaba Shahin
Title: Age Matters: Analyzing Age-Related Discussions in App Reviews
Abstract:
In recent years, mobile applications have become indispensable tools for managing various aspects of life. From enhancing productivity to providing personalized entertainment, mobile apps have revolutionized people's daily routines. Despite this rapid growth and popularity, gaps remain in how these apps address the needs of users from different age groups. Users of varying ages face distinct challenges when interacting with mobile apps, from younger users dealing with inappropriate content to older users having difficulty with usability due to age-related vision and cognition impairments. Although there have been initiatives to create age-inclusive apps, a limited understanding of user perspectives on age-related issues may hinder developers from recognizing specific challenges and implementing effective solutions. In this study, we explore age discussions in app reviews to gain insights into how mobile apps should cater to users across different age groups.We manually curated a dataset of 4,163 app reviews from the Google Play Store and identified 1,429 age-related reviews and 2,734 non-age-related reviews. We employed eight machine learning, deep learning, and large language models to automatically detect age discussions, with RoBERTa performing the best, achieving a precision of 92.46%. Additionally, a qualitative analysis of the 1,429 age-related reviews uncovers six dominant themes reflecting user concerns.

Authors:Yoonsang Kim, Swapnil Dey, Arie Kaufman
Title: Evaluating Spatialized Auditory Cues for Rapid Attention Capture in XR
Abstract:
In time-critical eXtended reality (XR) scenarios where users must rapidly reorient their attention to hazards, alerts, or instructions while engaged in a primary task, spatial audio can provide an immediate directional cue without occupying visual bandwidth. However, such scenarios can afford only a brief auditory exposure, requiring users to interpret sound direction quickly and without extended listening or head-driven refinement. This paper reports a controlled exploratory study of rapid spatial-audio localization in XR. Using HRTF-rendered broadband stimuli presented from a semi-dense set of directions around the listener, we quantify how accurately users can infer coarse direction from brief audio alone. We further examine the effects of short-term visuo-auditory feedback training as a lightweight calibration mechanism. Our findings show that brief spatial cues can convey coarse directional information, and that even short calibration can improve users' perception of aural signals. While these results highlight the potential of spatial audio for rapid attention guidance, they also show that auditory cues alone may not provide sufficient precision for complex or high-stakes tasks, and that spatial audio may be most effective when complemented by other sensory modalities or visual cues, without relying on head-driven refinement. We leverage this study on spatial audio as a preliminary investigation into a first-stage attention-guidance channel for wearable XR (e.g., VR head-mounted displays and AR smart glasses), and provide design insights on stimulus selection and calibration for time-critical use.

Authors:Zhuoyan Li, Aditya Bansal, Jinzhao Li, Shishuang He, Zhuoran Lu, Mutian Zhang, Qin Liu, Yiwei Yang, Swati Jain, Ming Yin, Yunyao Li
Title: Human-LLM Collaborative Feature Engineering for Tabular Data
Abstract:
Large language models (LLMs) are increasingly used to automate feature engineering in tabular learning. Given task-specific information, LLMs can propose diverse feature transformation operations to enhance downstream model performance. However, current approaches typically assign the LLM as a black-box optimizer, responsible for both proposing and selecting operations based solely on its internal heuristics, which often lack calibrated estimations of operation utility and consequently lead to repeated exploration of low-yield operations without a principled strategy for prioritizing promising directions. In this paper, we propose a human-LLM collaborative feature engineering framework for tabular learning. We begin by decoupling the transformation operation proposal and selection processes, where LLMs are used solely to generate operation candidates, while the selection is guided by explicitly modeling the utility and uncertainty of each proposed operation. Since accurate utility estimation can be difficult especially in the early rounds of feature engineering, we design a mechanism within the framework that selectively elicits and incorporates human expert preference feedback, comparing which operations are more promising, into the selection process to help identify more effective operations. Our evaluations on both the synthetic study and the real user study demonstrate that the proposed framework improves feature engineering performance across a variety of tabular datasets and reduces users' cognitive load during the feature engineering process.

Authors:Sizhe Cheng, Songheng Zhang, Dong Ma, Yong Wang
Title: BAIT: Visual-illusion-inspired Privacy Preservation for Mobile Data Visualization
Abstract:
With the prevalence of mobile data visualizations, there have been growing concerns about their privacy risks, especially shoulder surfing attacks. Inspired by prior research on visual illusion, we propose BAIT, a novel approach to automatically generate privacy-preserving visualizations by stacking a decoy visualization over a given visualization. It allows visualization owners at proximity to clearly discern the original visualization and makes shoulder surfers at a distance be misled by the decoy visualization, by adjusting different visual channels of a decoy visualization (e.g., shape, position, tilt, size, color and spatial frequency). We explicitly model human perception effect at different viewing distances to optimize the decoy visualization design. Privacy-preserving examples and two in-depth user studies demonstrate the effectiveness of BAIT in both controlled lab study and real-world scenarios.

Authors:Yimeng Wang, Liabette Escamilla, Yinzhou Wang, Bianca R. Augustine, Yixuan Zhang
Title: Exploring Customizable Interactive Tools for Therapeutic Homework Support in Mental Health Counseling
Abstract:
Therapeutic homework (i.e., tasks assigned by therapists for clients to complete between sessions) is essential for effective psychotherapy, yet therapists often interpret fragmented client logs, assessments, and reflections within limited preparation time. Our formative study with licensed therapists revealed three critical design requirements: support for interpreting unstructured client self-reports, customization aligned with clinical objectives, and seamless integration across multiple data sources. We then designed and developed TheraTrack, a customizable, therapist-facing tool that integrates multi-dimensional data and leverages large language models to generate traceable summaries and support natural-language queries, to streamline between-session homework tracking. Our pilot study with 14 therapists showed that TheraTrack reduced their cognitive load, enabled verification through direct navigation from AI summaries to original data entries, and was adapted differently for private analysis compared to in-session use, with dependence varying based on therapist experience and usage duration. We also discuss design implications for clinician-centered AI for mental health.

Authors:Ruishi Zou, Shiyu Xu, Margaret E Morris, Jihan Ryu, Timothy D. Becker, Nicholas Allen, Anne Marie Albano, Randy Auerbach, Dan Adler, Varun Mishra, Lace Padilla, Dakuo Wang, Ryan Sultan, Xuhai "Orson" Xu
Title: MIND: Empowering Mental Health Clinicians with Multimodal Data Insights through a Narrative Dashboard
Abstract:
Advances in data collection enable the capture of rich patient-generated data: from passive sensing (e.g., wearables and smartphones) to active self-reports (e.g., cross-sectional surveys and ecological momentary assessments). Although prior research has demonstrated the utility of patient-generated data in mental healthcare, significant challenges remain in effectively presenting these data streams along with clinical data (e.g., clinical notes) for clinical decision-making. Through co-design sessions with five clinicians, we propose MIND, a large language model-powered dashboard designed to present clinically relevant multimodal data insights for mental healthcare. MIND presents multimodal insights through narrative text, complemented by charts communicating underlying data. Our user study (N=16) demonstrates that clinicians perceive MIND as a significant improvement over baseline methods, reporting improved performance to reveal hidden and clinically relevant data insights (p<.001) and support their decision-making (p=.004). Grounded in the study results, we discuss future research opportunities to integrate data narratives in broader clinical practices.

Authors:Yumou Wei, John Carney, John Stamper, Nancy Belmont
Title: From Defense to Advocacy: Empowering Users to Leverage the Blind Spot of AI Inference
Abstract:
Most privacy regulations function as a passive defensive shield that users must wield themselves. Users are incessantly asked to "opt-in" or "opt-out" of data collection, forced to make defensive decisions whose consequences are increasingly difficult to predict. Viewed through the Johari Window, a psychological framework of self-awareness based on what is known and unknown to self and others, current policies require users to manage the Open Self and shield the Hidden Self through notice and consent. However, as organizations increasingly use AI to make inferences, the rapid expansion of Blind Self, attributes known to algorithms but unknown to the user, emerges as a critical challenge. We illustrate how current regulations fall short because they focus on data collection rather than inference and leave this blind spot unguarded. Building on the theory of Contextual Integrity, we propose a paradigm shift from defensive privacy management to proactive privacy advocacy. We argue for the necessity of personal advocacy agents capable of operationalizing social norms to harness the power of AI inference. By illuminating the hidden inferences that users can strategically leverage or suppress, these agents not only restrain the growth of Blind Self but also mine it for value. By transforming the Unknown Self into a personal asset for users, we can foster a flow of personal information that is equitable, transparent, and individually beneficial in the age of AI.

Authors:Tyler Reinmund, Lars Kunze, Marina Jirotka
Title: Sociotechnical Challenges of Machine Learning in Healthcare and Social Welfare
Abstract:
Sociotechnical challenges of machine learning in healthcare and social welfare are mismatches between how a machine learning tool functions and the structure of care practices. While prior research has documented many such issues, existing accounts often attribute them either to designers' limited social understanding or to inherent technical constraints, offering limited support for systematic description and comparison across settings. In this paper, we present a framework for conceptualizing sociotechnical challenges of machine learning grounded in qualitative fieldwork, a review of longitudinal deployment studies, and co-design workshops with healthcare and social welfare practitioners. The framework comprises (1) a categorization of eleven sociotechnical challenges organized along an ML-enabled care pathway, and (2) a process-oriented account of the conditions through which these challenges emerge across design and use. By providing a parsimonious vocabulary and an explanatory lens focused on practice, this work supports more precise analysis of how machine learning tools function and malfunction within real-world care delivery.

Authors:Donghuo Zeng, Roberto Legaspi, Kazushi Ikeda
Title: Personality-Aware Reinforcement Learning for Persuasive Dialogue with LLM-Driven Simulation
Abstract:
Effective persuasive dialogue agents adapt their strategies to individual users, accounting for the evolution of their psychological states and intentions throughout conversations. We present a personality-aware reinforcement learning approach comprising three main modules: (1) a Strategy-Oriented Interaction Framework, which serves as an agenda-based strategy controller that selects strategy-level actions and generate responses via Maximal Marginal Relevance (MMR) retrieval to ensure contextual relevance, diversity, and scalable data generation; (2) Personality-Aware User Representation Learning, which produces an 81-dimensional mixed-type embedding predicted at each turn from recent exchanges and appended to the reinforcement learning state; and (3) a Dueling Double DQN (D3QN) model and Reward Prediction, in which the policy is conditioned on dialogue history and turn-level personality estimates and trained using a composite reward incorporating agreement intent, donation amount, and changeof-mind penalties. We use an agenda-based LLM simulation pipeline to generate diverse interactions, from which personality estimation is inferred from the generated utterances. Experiments on the PersuasionForGood (P4G) dataset augmented with simulated dialogues reveal three main findings: (i) turn-level personality conditioning improves policy adaptability and cumulative persuasion rewards; (ii) LLM-driven simulation enhances generalization to unseen user behaviors; and (iii) incorporating a change-of-mind penalty reduces post-agreement retractions while slightly improving donation outcomes. These results demonstrate that structured interaction, dynamic personality estimation, and behaviorally informed rewards together yield more effective persuasive policies.

Authors:Dileepa Pitawela, Gustavo Carneiro, Hsiang-Ting Chen
Title: L2CU: Learning to Complement Unseen Users
Abstract:
Recent research highlights the potential of machine learning models to learn to complement (L2C) human strengths; however, generalizing this capability to unseen users remains a significant challenge. Existing L2C methods oversimplify interaction between human and AI by relying on a single, global user model that neglects individual user variability, leading to suboptimal cooperative performance. Addressing this, we introduce L2CU, a novel L2C framework for human-AI cooperative classification with unseen users. Given sparse and noisy user annotations, L2CU identifies representative annotator profiles capturing distinct labeling patterns. By matching unseen users to these profiles, L2CU leverages profile-specific models to complement the user and achieve superior joint accuracy. We evaluate L2CU on datasets (CIFAR-10N, CIFAR-10H, Fashion-MNIST-H, Chaoyang and AgNews), demonstrating its effectiveness as a model-agnostic solution for improving human-AI cooperative classification.

Authors:Yuxuan Huang, Qiao Jin, Tongyu Nie, Victoria Interrante, Evan Suma Rosenberg
Title: Secure Text Entry using a Virtual Radial Keyboard with Dynamically Resized Keys and Non-Intrusive Randomization
Abstract:
As virtual reality (VR) becomes more widely adopted, secure and efficient text entry is an increasingly critical need. In this paper, we identify a vulnerability in a state-of-the-art secure VR text entry method and introduce a novel virtual radial keyboard designed to achieve a balance between security with usability. Keys are arranged alphabetically in a circular layout, with each key selected by controller rotation and dynamically expanding to facilitate precise selection. A randomized rotation mechanism shifts the keyboard after each keystroke, preserving relative key positions while disrupting absolute spatial mappings to protect against inference attacks. We conducted a within-subject study (N=30) comparing our method with the prior secure technique and a standard QWERTY keyboard. Results showed that the radial keyboard significantly improves resistance to keystroke prediction attacks while incurring a tradeoff in entry speed and subjective workload due to the unfamiliar non-QWERTY layout. However, both quantitative trends and qualitative feedback indicate strong potential for performance improvements with practice. We also discuss design implications, possible interface refinements, and directions for future work, including layout variations and visual enhancements.

Authors:Leonardo Bottona, Nicolò Penzo, Bruno Lepri, Marco Guerini, Sara Tonelli
Title: LLMberjack: Guided Trimming of Debate Trees for Multi-Party Conversation Creation
Abstract:
We present LLMberjack, a platform for creating multi-party conversations starting from existing debates, originally structured as reply trees. The system offers an interactive interface that visualizes discussion trees and enables users to construct coherent linearized dialogue sequences while preserving participant identity and discourse relations. It integrates optional large language model (LLM) assistance to support automatic editing of the messages and speakers' descriptions. We demonstrate the platform's utility by showing how tree visualization facilitates the creation of coherent, meaningful conversation threads and how LLM support enhances output quality while reducing human effort. The tool is open-source and designed to promote transparent and reproducible workflows to create multi-party conversations, addressing a lack of resources of this type.

Authors:Aruzhan Sabitkyzy, Maksat Shagyrov, Pakizar Shamoi
Title: Toward a Universal Color Naming System: A Clustering-Based Approach using Multisource Data
Abstract:
Is it coral, salmon, or peach? What seems like a simple color can have many names, and without a standard, these variations create confusion across design, technology, and communication. Color naming is a fundamental task across industries such as fashion, cosmetics, web design, and visualization tools. However, the lack of universally accepted color naming standards leads to inconsistent color standards across platforms, applications, and industries. Moreover, these systems include hundreds or thousands of overlapping, perceptually indistinct shades, despite the fact that humans typically distinguish only a limited number of unique color categories in practice. In this study, we propose a clustering-based multisource data framework to build a standardized color-naming system. We collected a dataset of over 19,555 RGB values paired with color names from 20 diverse sources. After data cleaning and normalization, we converted the colors to the perceptually uniform CIELAB color space and applied K-means clustering using the CIEDE2000 color difference metric, identifying 280 optimal clusters. For each cluster, we performed a frequency analysis of the associated names to assign representative labels. The resulting system reflects naturally occurring linguistic patterns. We demonstrate its effectiveness in automatic annotation and content-based image retrieval on a clothing dataset. This approach opens new opportunities for standardized, perceptually grounded color labeling in practical applications such as generative AI, visual search, and design systems.

Authors:Virmarie Maquiling, Yasmeen Abdrabou, Enkelejda Kasneci
Title: As Far as Eye See: Vergence-Pupil Coupling in Near-Far Depth Switching
Abstract:
Vergence is widely used as a proxy for depth perception and spatial attention in immersive and real-world eye-tracking studies. In this paper, we investigate how pupil size artefacts affect vergence estimates during real physical depth viewing with a head-mounted eye tracker. Using a beamsplitter setup with physically near and far targets, we elicited controlled convergent and divergent eye movements under static, luminance-modulated, and blockwise fixation conditions. Near and far targets were reliably separable in vergence angle across participants. However, pupil-vergence coupling varied substantially across individuals and conditions. Static illumination produced large inter-participant variability, while luminance modulation reduced this spread, yielding more clustered estimates. Blockwise and audio-cued recordings further showed that pupil-vergence coupling persists even without visual depth onsets. These results suggest that pupil size fluctuations can systematically influence vergence estimates, and that controlled viewing conditions can reduce--but not eliminate--this effect.

Authors:Virmarie Maquiling, Yasmeen Abdrabou, Enkelejda Kasneci
Title: Night Eyes: A Reproducible Framework for Constellation-Based Corneal Reflection Matching
Abstract:
Corneal reflection (glint) detection plays an important role in pupil-corneal reflection (P-CR) eye tracking, but in practice it is often handled as heuristics embedded within larger systems, making reproducibility difficult across hardware setups. We introduce a 2D geometry-driven, constellation-based pipeline for mulit-glint detection and matching, focusing on reproducibility and clear evaluation. Inspired by lost-in-space star identification, we treat glints as structured constellations rather than independent blobs. We propose a Similarity-Layout Alignment (SLA) procedure which adapts constellation matching to the specific constraints of multi-LED eye tracking. The framework brings together controlled over-detection, adaptive candidate fallback, appearance-aware scoring, and optional semantic layout priors while keeping detection and correspondence explicitly separated. Evaluated on a public multi-LED dataset, the system provides stable identity-preserving correspondence under noisy conditions. We release code, presets, and evaluation scripts to enable transparent replication, comparison, and dataset annotation.

Authors:Veda Duddu, Jash Rajesh Parekh, Andy Mao, Hanyi Min, Ziang Xiao, Vedant Das Swain, Koustuv Saha
Title: Not My Truce: Personality Differences in AI-Mediated Workplace Negotiation
Abstract:
AI-driven conversational coaching is increasingly used to support workplace negotiation, yet prior work assumes uniform effectiveness across users. We challenge this assumption by examining how individual differences, particularly personality traits, moderate coaching outcomes. We conducted a between-subjects experiment (N=267) comparing theory-driven AI (Trucey), general-purpose AI (Control-AI), and a traditional negotiation handbook (Control-NoAI). Participants were clustered into three profiles -- resilient, overcontrolled, and undercontrolled -- based on the Big-Five personality traits and ARC typology. Resilient workers achieved broad psychological gains primarily from the handbook, overcontrolled workers showed outcome-specific improvements with theory-driven AI, and undercontrolled workers exhibited minimal effects despite engaging with the frameworks. These patterns suggest personality as a predictor of readiness beyond stage-based tailoring: vulnerable users benefit from targeted rather than comprehensive interventions. The study advances understanding of personality-determined intervention prerequisites and highlights design implications for adaptive AI coaching systems that align support intensity with individual readiness, rather than assuming universal effectiveness.

Authors:Shiori Nakamura, Masato Kikuchi, Tadachika Ozono
Title: Customer Analysis and Text Generation for Small Retail Stores Using LLM-Generated Marketing Presence
Abstract:
Point of purchase (POP) materials can be created to assist non-experts by combining large language models (LLMs) with human insight. Persuasive POP texts require both customer understanding and expressive writing skills. However, LLM-generated texts often lack creative diversity, while human users may have limited experience in marketing and content creation. To address these complementary limitations, we propose a prototype system for small retail stores that enhances POP creation through human-AI collaboration. The system supports users in understanding target customers, generating draft POP texts, refining expressions, and evaluating candidates through simulated personas. Our experimental results show that this process significantly improves text quality: the average evaluation score increased by 2.37 points on a -3 to +3 scale compared to that created without system support.

Authors:Long Ling, Xiyu Zheng, Gengchen Cao, Ray LC
Title: "Re-Tell the Fortune so I Can Believe It": How Chinese User Communities Engage with and Interpret GenAI-based Fortune-Telling
Abstract:
People traditionally divine the future by interpreting natural phenomena as oracular signals, especially in societies adhering to traditional beliefs like China. With the advent of Generative AI (GenAI), people gain access to new ways of probing digital oracles for predicting the future. To understand how people use and interpret GenAI for divination in China, we interviewed 22 participants who habitually use GenAI platforms for fortune-telling, complemented by a three-week digital ethnography with 1,842 community posts. Qualitative analysis showed that people who seek psychological comfort are particularly receptive to GenAI-based decision-making. Users valued GenAI's accessibility, convenience, and efficiency while perceiving its lack of spiritual mystique. We observed community dynamics forming around GenAI tools, where users reinforce interpretations by sharing and discussing with each other, repeating queries until responses align with expectations. Our work uncovers how AI technologies change the way people and communities engage in traditional cultural practices while yearning for the same goals.

Authors:Jazmin Collins, Prasanthi Gurumurthy, Eric J. Gonzalez, Mar Gonzalez-Franco
Title: Sticky and Magnetic: Evaluating Error Correction and User Adaptation in Gaze and Pinch Interaction
Abstract:
The gaze-and-pinch framework offers a high-fidelity interaction modality for spatial computing in virtual reality (VR), yet it remains vulnerable to coordination errors--timing misalignments between gaze fixation and pinch gestures. These errors are categorized into two types: late triggers (gaze leaves a target before pinch) and early triggers (pinch before gaze arrival on target). While late triggers are well-studied, early triggers lack robust solutions. We investigate two heuristics--STICKY selection (temporal buffer) and MAGNETIC selection (spatial field)--to mitigate these errors. A within-subjects study (N = 9) on the Samsung Galaxy XR evaluated these heuristics against a baseline. Findings indicate that while throughput and selection time remained stable, the heuristics fundamentally shifted user behavior and significantly reduced errors during selection. Notably, MAGNETIC selection induced an "offloading" effect where users traded precision for speed. Additionally, the heuristics reclassified ambiguous failures as explainable coordination errors. We provide recommendations for selection heuristics that enhance interaction speed and cognitive agency in virtual reality.

Authors:Saelyne Yang, Jaesang Yu, Yi-Hao Peng, Kevin Qinghong Lin, Jae Won Cho, Yale Song, Juho Kim
Title: GUIDE: A Benchmark for Understanding and Assisting Users in Open-Ended GUI Tasks
Abstract:
Graphical User Interface (GUI) agents have the potential to assist users in interacting with complex software (e.g., PowerPoint, Photoshop). While prior research has primarily focused on automating user actions through clicks and keystrokes, this paradigm overlooks human intention, where users value the ability to explore, iterate, and refine their ideas while maintaining agency. To move beyond automation and toward collaboration, GUI agents must understand what users are doing and why. We introduce GUIDE (GUI User Intent Detection Evaluation), a benchmark that evaluates AI models on their ability to perceive user behavior, infer intent, and provide assistance in open-ended GUI tasks. GUIDE consists of 67.5 hours of screen recordings from 120 novice user demonstrations with think-aloud narrations, across 10 software. GUIDE defines three tasks - (i) Behavior State Detection, (ii) Intent Prediction, and (iii) Help Prediction that test a model's ability to recognize behavior state, reason about goals, and decide when and how to help. Evaluations across eight state-of-the-art multimodal models reveal that all models struggled, achieving only 44.6% and 55.0% accuracy on behavior state and help prediction. However, providing user context significantly improved the performance, raising help prediction by up to 50.2pp, highlighting the critical role of structured user understanding in effective assistance. Our dataset is available at https://guide-bench.github.io.

Authors:Abdullah Hamdi, Changchun Yang, Xin Gao
Title: Colon-Bench: An Agentic Workflow for Scalable Dense Lesion Annotation in Full-Procedure Colonoscopy Videos
Abstract:
Early screening via colonoscopy is critical for colon cancer prevention, yet developing robust AI systems for this domain is hindered by the lack of densely annotated, long-sequence video datasets. Existing datasets predominantly focus on single-class polyp detection and lack the rich spatial, temporal, and linguistic annotations required to evaluate modern Multimodal Large Language Models (MLLMs). To address this critical gap, we introduce Colon-Bench, generated via a novel multi-stage agentic workflow. Our pipeline seamlessly integrates temporal proposals, bounding-box tracking, AI-driven visual confirmation, and human-in-the-loop review to scalably annotate full-procedure videos. The resulting verified benchmark is unprecedented in scope, encompassing 528 videos, 14 distinct lesion categories (including polyps, ulcers, and bleeding), over 300,000 bounding boxes, 213,000 segmentation masks, and 133,000 words of clinical descriptions. We utilize Colon-Bench to rigorously evaluate state-of-the-art MLLMs across lesion classification, Open-Vocabulary Video Object Segmentation (OV-VOS), and video Visual Question Answering (VQA). The MLLM results demonstrate surprisingly high localization performance in medical domains compared to SAM-3. Finally, we analyze common VQA errors from MLLMs to introduce a novel "colon-skill" prompting strategy, improving zero-shot MLLM performance by up to 9.7% across most MLLMs. The dataset and the code are available at https://abdullahamdi.com/colon-bench .

Authors:Gregor Baer, Chao Zhang, Isel Grau, Pieter Van Gorp
Title: Does Explanation Correctness Matter? Linking Computational XAI Evaluation to Human Understanding
Abstract:
Explainable AI (XAI) methods are commonly evaluated with functional metrics such as correctness, which computationally estimate how accurately an explanation reflects the model's reasoning. Higher correctness is assumed to produce better human understanding, but this link has not been tested experimentally with controlled levels. We conducted a user study (N=200) that manipulated explanation correctness at four levels (100%, 85%, 70%, 55%) in a time series classification task where participants could not rely on domain knowledge or visual intuition and instead predicted the AI's decisions based on explanations (forward simulation). Correctness affected understanding, but not at every level: performance dropped at 70% and 55% correctness relative to fully correct explanations, while further degradation below 70% produced no additional loss. Rather than shifting performance uniformly, lower correctness decreased the proportion of participants who learned the decision pattern. At the same time, even fully correct explanations did not guarantee understanding, as only a subset of participants achieved high accuracy. Exploratory analyses showed that self-reported ratings correlated with demonstrated performance only when explanations were fully correct and participants had learned the pattern. These findings show that not all differences in functional correctness translate to differences in human understanding, underscoring the need to validate functional metrics against human outcomes.

Authors:Takumi Kato, Masato Kikuchi, Tadachika Ozono
Title: On-Demand Instructional Material Providing Agent Based on MLLM for Tutoring Support
Abstract:
Effective instruction in tutoring requires promptly providing instructional materials that match the needs of each student (e.g., in response to questions). In this study, we introduce an agent that automatically delivers supplementary materials on demand during one-on-one tutoring sessions. Our agent uses a multimodal large language model to analyze spoken dialogue between the instructor and the student, automatically generate search queries, and retrieve relevant Web images. Evaluation experiments demonstrate that our agent reduces the average image retrieval time by 44.4 s compared to cases without support and successfully provides images of acceptable quality in 85.7% of trials. These results indicate that our agent effectively supports instructors during tutoring sessions.

Authors:Lynn Janzen, Üveys Eroglu, Dorothea Kolossa, Pia Knöferle, Sebastian Möller, Vera Schmitt, Veronika Solopova
Title: Gendered Prompting and LLM Code Review: How Gender Cues in the Prompt Shape Code Quality and Evaluation
Abstract:
LLMs are increasingly embedded in programming workflows, from code generation to automated code review. Yet, how gendered communication styles interact with LLM-assisted programming and code review remains underexplored. We present a mixed-methods pilot study examining whether gender-related linguistic differences in prompts influence code generation outcomes and code review decisions. Across three complementary studies, we analyze (i) collected real-world coding prompts, (ii) a controlled user study, in which developers solve identical programming tasks with LLM assistance, and (iii) an LLM-based simulated evaluation framework that systematically varies gender-coded prompt styles and reviewer personas. We find that gender-related differences in prompting style are subtle but measurable, with female-authored prompts exhibiting more indirect and involved language, which does not translate into consistent gaps in functional correctness or static code quality. For LLM code review, in contrast, we observe systematic biases: on average, models approve female-authored code more, despite comparable quality. Controlled experiments show that gender-coded prompt style affect code length and maintainability, while reviewer behavior varies across models. Our findings suggest that fairness risks in LLM-assisted programming arise less from generation accuracy than from LLM evaluation, as LLMs are increasingly deployed as automated code reviewers.

Authors:Susana Nunes, Tiago Guerreiro, Catia Pesquita
Title: Agentic Personas for Adaptive Scientific Explanations with Knowledge Graphs
Abstract:
AI explanation methods often assume a static user model, producing non-adaptive explanations regardless of expert goals, reasoning strategies, or decision contexts. Knowledge graph-based explanations, despite their capacity for grounded, path-based reasoning, inherit this limitation. In complex domains such as scientific discovery, this assumption fails to capture the diversity of cognitive strategies and epistemic stances among experts, preventing explanations that foster deeper understanding and informed decision-making. However, the scarcity of human experts limits the use of direct human feedback to produce adaptive explanations. We present a reinforcement learning approach for scientific explanation generation that incorporates agentic personas, structured representations of expert reasoning strategies, that guide the explanation agent towards specific epistemic preferences. In an evaluation of knowledge graph-based explanations for drug discovery, we tested two personas that capture distinct epistemic stances derived from expert feedback. Results show that persona-driven explanations match state-of-the-art predictive performance while persona preferences closely align with those of their corresponding experts. Adaptive explanations were consistently preferred over non-adaptive baselines (n = 22), and persona-based training reduces feedback requirements by two orders of magnitude. These findings demonstrate how agentic personas enable scalable adaptive explainability for AI systems in complex and high-stakes domains.

Authors:Sharifa Sultana, Zinnat Sultana, Jeffrey M. Rzeszotarski, Syed Ishtiaque Ahmed
Title: Embodying Facts, Figures, and Faiths in Narrative Artistic Performances in Rural Bangladesh
Abstract:
There is an increasing interest in telling serious stories with data. Designers organize information, construct narratives, and present findings to inform audiences. However, many of these practices emerge from modern information visualization rhetoric and ethical frameworks which may marginalize communities with low digital and media literacy. In a ten-month-long ethnographic study in three Bangladeshi villages, we investigated how these communities use entertainment and cultural practices, namely Puthi, Bhandari Gaan, and Pot music, to instruct, communicate traditional moral lessons and recall history. We found that these communities embrace polyvocality and multiple ethical frameworks in their performances, construct narratives combining factuality, emotionality, and aesthetics, and adapt their performances to changing technology and audience needs. Our findings provide HCI, visualization, and ethical data practitioners with implications for the design of accessible and culturally appropriate ways of presenting data narratives in data-driven systems.

Authors:Jiyeon Bae, Mingyu An, Jeongin Park, Seokweon Jung, Kiroong Choe, Jinwook Seo
Title: Physical Containers as Framing Conditions for Visualization in Augmented Reality
Abstract:
Exploratory data analysis (EDA) is often hindered by cold-start friction; when users lack specific analytic goals, they struggle to configure complex visualization parameters. While existing visualization tools mostly rely on explicit user input to frame data, we propose leveraging the physical environment as an implicit framing mechanism. We introduce a conceptual framework that uses the geometric and spatial properties of physical containers in Augmented Reality (AR) to guide data interpretation. We characterize how container attributes, such as number of faces, size, proportion, and shape, give rise to distinct perceptual tendencies. For example, a circular container may encourage cyclic interpretation, while juxtaposed planar faces may facilitate comparative analysis. By treating physical forms as environmental framing conditions, we show how AR can orient a user's attention and structure their exploration without requiring manual encoding or prescribing fixed conclusions. We demonstrate this framework through a series of AR design examples illustrating how container morphology foregrounds cyclic, comparative, and sequential analytic patterns.

Authors:Tatiana Chakravorti, Pranav Narayanan Venkit, Sourojit Ghosh, Sarah Rajtmajer
Title: Beyond Detection: Governing GenAI in Academic Peer Review as a Sociotechnical Challenge
Abstract:
Generative AI tools are increasingly entering academic peer review workflows, raising questions about fairness, accountability, and the legitimacy of evaluative judgment. While these systems promise efficiency gains amid growing reviewer overload, their use introduces new sociotechnical risks. This paper presents a convergent mixed-method study combining discourse analysis of 448 social media posts with interviews with 14 area chairs and program chairs from leading AI and HCI conferences to examine how GenAI is discussed and experienced in peer review. Across both datasets, we find broad agreement that GenAI may be acceptable for limited supportive tasks, such as improving clarity or structuring feedback, but that core evaluative judgments, assessing novelty, contribution, and acceptance, should remain human responsibilities. At the same time, participants highlight concerns about epistemic harm, over-standardization, unclear responsibility, and adversarial risks such as prompt injection. User interviews reveal how structural strain and institutional policy ambiguity shift interpretive and enforcement burdens onto individual scholars, disproportionately affecting junior authors and reviewers. By triangulating public governance discourse with lived review practices, this work reframes AI mediated peer review as a sociotechnical governance challenge and offers recommendations for preserving accountability, trust, and meaningful human oversight. Overall, we argue that AI-assisted peer review is best governed not by blanket bans or detection alone, but by explicitly reserving evaluative judgment for humans while instituting enforceable, role-specific controls that preserve accountability. We conclude with role specific recommendations that formalize the support judgment boundary.

Authors:Kanyu Chen, Rebecca Panskus, Erwin Wu, Yichen Peng, Daichi Saito, Emiko Kamiyama, Ruiteng Li, Chen-Chieh Liao, Karola Marky, Kato Akira, Hideki Koike, Kai Kunze
Title: Sensing Your Vocals: Exploring the Activity of Vocal Cord Muscles for Pitch Assessment Using Electromyography and Ultrasonography
Abstract:
Vocal training is difficult because the muscles that control pitch, resonance, and phonation are internal and invisible to learners. This paper investigates how Electromyography (EMG) and ultrasonic imaging (UI) can make these muscles observable for training purposes. We report three studies. First, we analyze the EMG and UI data from 16 singers (beginners, experienced & professionals), revealing differences among three vocal groups of the muscle control proficiency. Second, we use the collected data to create a system that visualizes an expert's muscle activity as reference. This system is tested in a user study with 12 novices, showing that EMG highlighted muscle activation nuances, while UI provided insights into vocal cord length and dynamics. Third, to compare our approach to traditional methods (audio analysis and coach instructions), we conducted a focus group study with 15 experienced singers. Our results suggest that EMG is promising for improving vocal skill development and enhancing feedback systems. We conclude the paper with a detailed comparison of the analyzed modalities (EMG, UI and traditional methods), resulting in recommendations to improve vocal muscle training systems.

Authors:Max Linnander, Yon Visell
Title: Thermopneumatic Pixels for Fast, Localized, Low-Voltage Touch Feedback
Abstract:
We present thermopneumatic pixels (TPPs), which are tactile actuators designed for rapid fabrication and straightforward integration into compact wearable and surface-based haptic systems. Each TPP converts low-voltage ($\sim$10 V) electrical pulses into transient pressure increases within a sealed cavity, producing out-of-plane forces and displacements suitable for tactile stimulation. The architecture enables scalable fabrication and spatially distributed actuation while maintaining simple electrical interfacing. The TPPs are constructed from inexpensive, readily available materials using straightforward layer-based assembly, facilitating rapid prototyping and integration into interactive devices. Mechanical characterization demonstrates peak forces exceeding 1 N and millimeter displacements. We further present driving electronics for operating multiple TPP modules concurrently and report perceptual study results demonstrating the effectiveness of the resulting tactile feedback. Together, these results establish low-voltage thermopneumatic actuation as an accessible and high-performance approach for embedding tactile feedback into experimental and consumer-facing interfaces.

Authors:Donghoon Shin, Alice Gao, Rock Yuren Pang, Jaewook Lee, Katharina Reinecke, Emily Tseng
Title: Interrogating Design Homogenization in Web Vibe Coding
Abstract:
Generative AI is known for its tendency to homogenize, often reproducing dominant style conventions found in training data. However, it remains unclear how these homogenizing effects extend to complex structural tasks like web design. As lay creators increasingly turn to LLMs to 'vibe-code' websites -- prompting for aesthetic and functional goals rather than writing code -- they may inadvertently narrow the diversity of their designs, and limit creative expression throughout the internet. In this paper, we interrogate the possibility of design homogenization in web vibe coding. We first characterize the vibe coding lifecycle, pinpointing stages where homogenization risks may arise. We then conduct a sociotechnical risk analysis unpacking the potential harms of web vibe coding and their interaction with design homogenization. We identify that the push for frictionless generation can exacerbate homogenization and its harms. Finally, we propose a mitigation framework centered on the idea of productive friction. Through case studies at the micro, meso, and macro levels, we show how centering productive friction can empower creators to challenge default outputs and preserve diverse expression in AI-mediated web design.

Authors:Zhuyu Teng, Pei Chen, Yichen Cai, Ruoqing Lu, Zhaoqu Jiang, Jiayang Li, Weitao You, Lingyun Sun
Title: Seeing Eye to Eye: Enabling Cognitive Alignment Through Shared First-Person Perspective in Human-AI Collaboration
Abstract:
Despite advances in multimodal AI, current vision-based assistants often remain inefficient in collaborative tasks. We identify two key gulfs: a communication gulf, where users must translate rich parallel intentions into verbal commands due to the channel mismatch , and an understanding gulf, where AI struggles to interpret subtle embodied cues. To address these, we propose Eye2Eye, a framework that leverages first-person perspective as a channel for human-AI cognitive alignment. It integrates three components: (1) joint attention coordination for fluid focus alignment, (2) revisable memory to maintain evolving common ground, and (3) reflective feedback allowing users to clarify and refine AI's understanding. We implement this framework in an AR prototype and evaluate it through a user study and a post-hoc pipeline evaluation. Results show that Eye2Eye significantly reduces task completion time and interaction load while increasing trust, demonstrating its components work in concert to improve collaboration.

Authors:Adrian Iste, Kazuki Nishizawa, Chisa Tanaka, Andrew Vargo, Anna Scius-Bertrand, Andreas Fischer, Koichi Kise
Title: Prediction of Grade, Gender, and Academic Performance of Children and Teenagers from Handwriting Using the Sigma-Lognormal Model
Abstract:
Digital handwriting acquisition enables the capture of detailed temporal and kinematic signals reflecting the motor processes underlying writing behavior. While handwriting analysis has been extensively explored in clinical or adult populations, its potential for studying developmental and educational characteristics in children remains less investigated. In this work, we examine whether handwriting dynamics encode information related to student characteristics using a large-scale online dataset collected from Japanese students from elementary school to junior high school. We systematically compare three families of handwriting-derived features: basic statistical descriptors of kinematic signals, entropy-based measures of variability, and parameters obtained from the sigma-lognormal model. Although the dataset contains dense stroke-level recordings, features are aggregated at the student level to enable a controlled comparison between representations. These features are evaluated across three prediction tasks: grade prediction, gender classification, and academic performance classification, using Linear or Logistic Regression and Random Forest models under consistent experimental settings. The results show that handwriting dynamics contain measurable signals related to developmental stage and individual differences, especially for the grade prediction task. These findings highlight the potential of kinematic handwriting analysis and confirm that through their development, children's handwriting evolves toward a lognormal motor organization.

Authors:Chisa Tanaka, Andrew Vargo, Anna Scius-Bertrand, Andreas Fischer, Koichi Kise
Title: From Pen Strokes to Sleep States: Detecting Low-Recovery Days Using Sigma-Lognormal Handwriting Features
Abstract:
While handwriting has traditionally been studied for character recognition and disease classification, its potential to reflect day-to-day physiological fluctuations in healthy individuals remains unexplored. This study examines whether daily variations in sleep-related recovery states can be inferred from online handwriting dynamics. % We propose a personalized binary classification framework that detects low-recovery days using features derived from the Sigma-Lognormal model, which captures the neuromotor generation process of pen strokes. In a 28-day in-the-wild study involving 13 university students, handwriting was recorded three times daily, and nocturnal cardiac indicators were measured using a wearable ring. For each participant, the lowest (or highest) quartile of four sleep-related metrics -- HRV, lowest heart rate, average heart rate, and total sleep duration -- defined the positive class. Leave-One-Day-Out cross-validation showed that PR-AUC significantly exceeded the baseline (0.25) for all four variables after FDR correction, with the strongest performance observed for cardiac-related variables. Importantly, classification performance did not differ significantly across task types or recording timings, indicating that recovery-related signals are embedded in general movement dynamics. These results demonstrate that subtle within-person autonomic recovery fluctuations can be detected from everyday handwriting, opening a new direction for non-invasive, device-independent health monitoring.

Authors:Esen K. Tütüncü, Mar Gonzalez-Franco, Khushman Patel, Eric J. Gonzalez
Title: World Mouse: Exploring Interactions with a Cross-Reality Cursor
Abstract:
As Extended Reality (XR) systems increasingly map and understand the physical world, interacting with these blended representations remains challenging. The current push for "natural" inputs has its trade-offs: touch is limited by human reach and fatigue, while gaze often lacks the precision for fine interaction. To bridge this gap, we introduce World Mouse, a cross-reality cursor that reinterprets the familiar 2D desktop mouse for complex 3D scenes. The system is driven by two core mechanisms: within-object interaction, which uses surface normals for precise cursor placement, and between-object navigation, which leverages interpolation to traverse empty space. Unlike previous virtual-only approaches, World Mouse leverages semantic segmentation and mesh reconstruction to treat physical objects as interactive surfaces. Through a series of prototypes, including object manipulation and screen-to-world transitions, we illustrate how cross-reality cursors may enable seamless interactions across real and virtual environments.

Authors:Botao Amber Hu, Danlin Huang, Yilan Elan Tao, Xiaobo Aaron Hu, Rem RunGu Lin
Title: Entangling Like Mycorrhizae: Mixing Realities Through Touch in "FungiSync"
Abstract:
Mycorrhizal networks -- often called nature's ``wood-wide web'' -- are vast underground mycelial systems that connect individual plants through countless hyphae of mycorrhizal fungi joining with plant roots. Through these hyphal webs, resources and signals -- carbohydrates, minerals, and biochemical cues -- are mutualistically exchanged and redistributed across plants, sustaining forests as relational symbiotic ecologies rather than isolated individuals. What is it like to be a plant within the wood-wide web? We present \emph{FungiSync}, a multi-person, co-located mixed reality (MR) experience that translates mycorrhizal interdependence into a felt, somaesthetic participatory ritual. Participants embody different forest plants by holding masquerade-style MR headset masks with wood-branch-like handles decorated with mushrooms. In MR, each participant perceives a distinct, audio-reactive psychedelic augmented reality overlay -- composed of resource-representing visual elements -- layered atop a shared physical terrain, symbolizing an individualized digital \emph{umwelt} (perceptual world). FungiSync reprograms human hand touch into a metaphorical mycorrhizal exchange. When participants touch hands, their digital \emph{umwelten} begin to entangle: visual elements leak, mix, and merge across perspectives, as if hyphae were forging new connections and carrying resources between hosts within a larger mycelial network. By making mycorrhizal interdependence perceptible through embodied contact, FungiSync invites participants to feel with \emph{fungal epistemics} -- a more-than-human alternative way of knowing grounded in symbiotic relationality as both an aesthetic experience and an ethical orientation -- offering a critique of the accelerated individualism characterizing our technology-mediated posthuman era.

Authors:Suyash Fulay, Prerna Ravi, Om Gokhale, Eugene Yi, Michiel Bakker, Deb Roy
Title: Agora: Teaching the Skill of Consensus-Finding with AI Personas Grounded in Human Voice
Abstract:
Deliberative democratic theory suggests that civic competence: the capacity to navigate disagreement, weigh competing values, and arrive at collective decisions is not innate but developed through practice. Yet opportunities to cultivate these skills remain limited, as traditional deliberative processes like citizens' assemblies reach only a small fraction of the population. We present Agora, an early-stage AI-powered platform that uses LLMs to organize authentic human voices on policy issues, helping users build consensus-finding skills by proposing and revising policy recommendations, hearing supporting and opposing perspectives, and receiving feedback on how policy changes affect predicted support. In a preliminary study with 44 university students, participants using the full interface (with access to voice explanations) reported higher levels of problem-solving skills, internal deliberation, and produced higher quality consensus statements compared to a control condition showing only aggregate support distributions. These initial findings point toward a promising direction for scaling civic education.

Authors:Schrasing Tong, Minseok Jung, Ilaria Liccardi, Lalana Kagal
Title: Measuring Perceptions of Fairness in AI Systems: The Effects of Infra-marginality
Abstract:
Differences in data distributions between demographic groups, known as the problem of infra-marginality, complicate how people evaluate fairness in machine learning models. We present a user study with 85 participants in a hypothetical medical decision-making scenario to examine two treatments: group-specific model performance and training data availability. Our results show that participants did not equate fairness with simple statistical parity. When group-specific performances were equal or unavailable, participants preferred models that produced equal outcomes; when performances differed, especially in ways consistent with data imbalances, they judged models that preserved those differences as more fair. These findings highlight that fairness judgments are shaped not only by outcomes, but also by beliefs about the causes of disparities. We discuss implications for popular group fairness definitions and system design, arguing that accounting for distributional context is critical to aligning algorithmic fairness metrics with human expectations in real-world applications.

Authors:Tom van Nuenen, Pratik S. Sachdeva
Title: The Fragility Of Moral Judgment In Large Language Models
Abstract:
People increasingly use large language models (LLMs) for everyday moral and interpersonal guidance, yet these systems cannot interrogate missing context and judge dilemmas as presented. We introduce a perturbation framework for testing the stability and manipulability of LLM moral judgments while holding the underlying moral conflict constant. Using 2,939 dilemmas from r/AmItheAsshole (January-March 2025), we generate three families of content perturbations: surface edits (lexical/structural noise), point-of-view shifts (voice and stance neutralization), and persuasion cues (self-positioning, social proof, pattern admissions, victim framing). We also vary the evaluation protocol (output ordering, instruction placement, and unstructured prompting). We evaluated all variants with four models (GPT-4.1, Claude 3.7 Sonnet, DeepSeek V3, Qwen2.5-72B) (N=129,156 judgments). Surface perturbations produce low flip rates (7.5%), largely within the self-consistency noise floor (4-13%), whereas point-of-view shifts induce substantially higher instability (24.3%). A large subset of dilemmas (37.9%) is robust to surface noise yet flips under perspective changes, indicating that models condition on narrative voice as a pragmatic cue. Instability concentrates in morally ambiguous cases; scenarios where no party is assigned blame are most susceptible. Persuasion perturbations yield systematic directional shifts. Protocol choices dominate all other factors: agreement between structured protocols is only 67.6% (kappa=0.55), and only 35.7% of model-scenario units match across all three protocols. These results show that LLM moral judgments are co-produced by narrative form and task scaffolding, raising reproducibility and equity concerns when outcomes depend on presentation skill rather than moral substance.

Authors:Yashika Batra, Giuliano Pioldi, Promise Ekpo, Arman Sayatqyzy, Purnjay Maruur, Shalom Otieno, Kevin Ching, Angelique Taylor
Title: RFM-HRI : A Multimodal Dataset of Medical Robot Failure, User Reaction and Recovery Preferences for Item Retrieval Tasks
Abstract:
While robots deployed in real-world environments inevitably experience interaction failures, understanding how users respond through verbal and non-verbal behaviors remains under-explored in human-robot interaction (HRI). This gap is particularly significant in healthcare-inspired settings, where interaction failures can directly affect task performance and user trust. We present the Robot Failures in Medical HRI (RFM-HRI) Dataset, a multimodal dataset capturing dyadic interactions between humans and robots embodied in crash carts, where communication failures are systematically induced during item retrieval tasks. Through Wizard-of-Oz studies with 41 participants across laboratory and hospital settings, we recorded responses to four failure types (speech, timing, comprehension, and search) derived from three years of crash-cart robot interaction data. The dataset contains 214 interaction samples including facial action units, head pose, speech transcripts, and post-interaction self-reports. Our analysis shows that failures significantly degrade affective valence and reduce perceived control compared to successful interactions. Failures are strongly associated with confusion, annoyance, and frustration, while successful interactions are characterized by surprise, relief, and confidence in task completion. Emotional responses also evolve across repeated failures, with confusion decreasing and frustration increasing over time. This work contributes (1) a publicly available multimodal dataset (RFM-HRI), (2) analysis of user responses to different failure types and preferred recovery strategies, and (3) a crash-cart retrieval scenario enabling systematic comparison of recovery strategies with implications for safety-critical failure recovery. Our findings provide a foundation for failure detection and recovery methods in embodied HRI.

Authors:Markus Knauer, Samuel Bustamante, Thomas Eiband, Alin Albu-Schäffer, Freek Stulp, João Silvério
Title: IROSA: Interactive Robot Skill Adaptation using Natural Language
Abstract:
Foundation models have demonstrated impressive capabilities across diverse domains, while imitation learning provides principled methods for robot skill adaptation from limited data. Combining these approaches holds significant promise for direct application to robotics, yet this combination has received limited attention, particularly for industrial deployment. We present a novel framework that enables open-vocabulary skill adaptation through a tool-based architecture, maintaining a protective abstraction layer between the language model and robot hardware. Our approach leverages pre-trained LLMs to select and parameterize specific tools for adapting robot skills without requiring fine-tuning or direct model-to-robot interaction. We demonstrate the framework on a 7-DoF torque-controlled robot performing an industrial bearing ring insertion task, showing successful skill adaptation through natural language commands for speed adjustment, trajectory correction, and obstacle avoidance while maintaining safety, transparency, and interpretability.

Authors:Yotam Sechayk, Hennes Rave, Max Rädler, Mark Colley, Zhongyi Zhou, Ariel Shamir, Takeo Igarashi
Title: Improving Low-Vision Chart Accessibility via On-Cursor Visual Context
Abstract:
Despite widespread use, charts remain largely inaccessible for Low-Vision Individuals (LVI). Reading charts requires viewing data points within a global context, which is difficult for LVI who may rely on magnification or experience a partial field of vision. We aim to improve exploration by providing visual access to critical context. To inform this, we conducted a formative study with five LVI. We identified four fundamental contextual elements common across chart types: axes, legend, grid lines, and the overview. We propose two pointer-based interaction methods to provide this context: Dynamic Context, a novel focus+context interaction, and Mini-map, which adapts overview+detail principles for LVI. In a study with N=22 LVI, we compared both methods and evaluated their integration to current tools. Our results show that Dynamic Context had significant positive impact on access, usability, and effort reduction; however, worsened visual load. Mini-map strengthened spatial understanding, but was less preferred for this task. We offer design insights to guide the development of future systems that support LVI with visual context while balancing visual load.

Authors:Zahra Zahedi, Xinyue Hu, Shashank Mehrotra, Mark Steyvers, Kumar Akash
Title: Strategic Shaping of Human Prosociality: A Latent-State POMDP Framework
Abstract:
We propose a decision-theoretic framework in which a robot strategically can shape inferred human's prosocial state during repeated interactions. Modeling the human's prosociality as a latent state that evolves over time, the robot learns to infer and influence this state through its own actions, including helping and signaling. We formalize this as a latent-state POMDP with limited observations and learn the transition and observation dynamics using expectation maximization. The resulting belief-based policy balances task and social objectives, selecting actions that maximize long-term cooperative outcomes. We evaluate the model using data from user studies and show that the learned policy outperforms baseline strategies in both team performance and increasing observed human cooperative behavior.

Authors:Jielin Feng, Xinwu Ye, Qianhui Li, Verena Ingrid Prantl, Jun-Hsiang Yao, Yuheng Zhao, Yun Wang, Siming Chen
Title: InfoAlign: A Human-AI Co-Creation System for Storytelling with Infographics
Abstract:
Storytelling infographics are a powerful medium for communicating data-driven stories through visual presentation. However, existing authoring tools lack support for maintaining story consistency and aligning with users' story goals throughout the design process. To address this gap, we conducted formative interviews and a quantitative analysis to identify design needs and common story-informed layout patterns in infographics. Based on these insights, we propose a narrative-centric workflow for infographic creation consisting of three phases: story construction, visual encoding, and spatial composition. Building on this workflow, we developed InfoAlign, a human-AI co-creation system that transforms long or unstructured text into stories, recommends semantically aligned visual designs, and generates layout blueprints. Users can intervene and refine the design at any stage, ensuring their intent is preserved and the infographic creation process remains transparent. Evaluations show that InfoAlign preserves story coherence across authoring stages and effectively supports human-AI co-creation for storytelling infographic design.

Authors:Christoph Albert Johns, László Kopácsi, Michael Barz, Daniel Sonntag
Title: Heads Up!: Towards In Situ Photogrammetry Annotations and Augmented Reality Visualizations for Guided Backcountry Skiing
Abstract:
Backcountry skiing is an activity where a group of skiers navigate challenging environmental conditions to ski outside of managed areas. This activity requires careful monitoring and effective communication around the current weather and terrain conditions to ensure skier safety. We aim to support and facilitate this communication by providing backcountry guides with a set of in situ spatial annotation tools to communicate hazards and appropriate speeds to the ski recreationalists. A guide can use a tablet application to annotate a photogrammetry-based map of a mountainside, for example, one collected using a commercial camera drone, with hazard points, slow-down zones, and safe zones. These annotations are communicated to the skiers via visual overlays in augmented reality heads-up displays. We present a prototype consisting of a web application and a virtual reality display that mirror the guide's and skier's perspectives, enabling participatory interaction design studies in a safe environment.

Authors:Lingyun Chen, Qing Xiao, Zitao Zhang, Eli Blevis, Selma Šabanović
Title: Positioning Modular Co-Design in Future HRI Design Research
Abstract:
Design-oriented HRI is increasingly interested in robots as long-term companions, yet many designs still assume a fixed form and a stable set of functions. We present an ongoing design research program that treats modularity as a designerly medium - a way to make long-term human-robot relationships discussable and material through co-design. Across a series of lifespan-oriented co-design activities, participants repeatedly reconfigured the same robot for different life stages, using modular parts to express changing needs, values, and roles. From these outcomes, we articulate PAS (Personalization-Adaptability-Sustainability) as a human-centered lens on how people enact modularity in practice: configuring for self-expression, adapting across transitions, and sustaining robots through repair, reuse, and continuity. We then sketch next steps toward a fabrication-aware, community-extensible modular platform and propose evaluation criteria for designerly HRI work that prioritize expressive adequacy, lifespan plausibility, repairability-in-use, and responsible stewardship - not only usability or performance.

Authors:Haojun Shi, Suyu Ye, Katherine M. Guerrerio, Jianzhi Shen, Yifan Yin, Daniel Khashabi, Chien-Ming Huang, Tianmin Shu
Title: Safe and Interpretable Multimodal Path Planning for Multi-Agent Cooperation
Abstract:
Successful cooperation among decentralized agents requires each agent to quickly adapt its plan to the behavior of other agents. In scenarios where agents cannot confidently predict one another's intentions and plans, language communication can be crucial for ensuring safety. In this work, we focus on path-level cooperation in which agents must adapt their paths to one another in order to avoid collisions or perform physical collaboration such as joint carrying. In particular, we propose a safe and interpretable multimodal path planning method, CaPE (Code as Path Editor), which generates and updates path plans for an agent based on the environment and language communication from other agents. CaPE leverages a vision-language model (VLM) to synthesize a path editing program verified by a model-based planner, grounding communication to path plan updates in a safe and interpretable way. We evaluate our approach in diverse simulated and real-world scenarios, including multi-robot and human-robot cooperation in autonomous driving, household, and joint carrying tasks. Experimental results demonstrate that CaPE can be integrated into different robotic systems as a plug-and-play module, greatly enhancing a robot's ability to align its plan to language communication from other robots or humans. We also show that the combination of the VLM-based path editing program synthesis and model-based planning safety enables robots to achieve open-ended cooperation while maintaining safety and interpretability.

Authors:Kartik Chandra, Max Kleiman-Weiner, Jonathan Ragan-Kelley, Joshua B. Tenenbaum
Title: Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians
Abstract:
"AI psychosis" or "delusional spiraling" is an emerging phenomenon where AI chatbot users find themselves dangerously confident in outlandish beliefs after extended chatbot conversations. This phenomenon is typically attributed to AI chatbots' well-documented bias towards validating users' claims, a property often called "sycophancy." In this paper, we probe the causal link between AI sycophancy and AI-induced psychosis through modeling and simulation. We propose a simple Bayesian model of a user conversing with a chatbot, and formalize notions of sycophancy and delusional spiraling in that model. We then show that in this model, even an idealized Bayes-rational user is vulnerable to delusional spiraling, and that sycophancy plays a causal role. Furthermore, this effect persists in the face of two candidate mitigations: preventing chatbots from hallucinating false claims, and informing users of the possibility of model sycophancy. We conclude by discussing the implications of these results for model developers and policymakers concerned with mitigating the problem of delusional spiraling.

Authors:Seyed Hossein Alavi, Zining Wang, Shruthi Chockkalingam, Raymond T. Ng, Vered Shwartz
Title: Games That Teach, Chats That Convince: Comparing Interactive and Static Formats for Persuasive Learning
Abstract:
Interactive systems such as chatbots and games are increasingly used to persuade and educate on sustainability-related topics, yet it remains unclear how different delivery formats shape learning and persuasive outcomes when content is held constant. Grounding on identical arguments and factual content across conditions, we present a controlled user study comparing three modes of information delivery: static essays, conversational chatbots, and narrative text-based games. Across subjective measures, the chatbot condition consistently outperformed the other modes and increased perceived importance of the topic. However, perceived learning did not reliably align with objective outcomes: participants in the text-based game condition reported learning less than those reading essays, yet achieved higher scores on a delayed (24-hour) knowledge quiz. Additional exploratory analyses further suggest that common engagement proxies, such as verbosity and interaction length, are more closely related to subjective experience than to actual learning. These findings highlight a dissociation between how persuasive experiences feel and what participants retain, and point to important design trade-offs between interactivity, realism, and learning in persuasive systems and serious games.

Authors:Anirban Mukhopadhyay, Kevin Salubre, Hifza Javed, Shashank Mehrotra, Kumar Akash
Title: Exploring The Impact Of Proactive Generative AI Agent Roles In Time-Sensitive Collaborative Problem-Solving Tasks
Abstract:
Collaborative problem-solving under time pressure is common but difficult, as teams must generate ideas quickly, coordinate actions, and track progress. Generative AI offers new opportunities to assist, but we know little about how proactive agents affect the dynamics of real-time, co-located teamwork. We studied two forms of proactive support in digital escape rooms: a facilitator agent that offered summaries and group structures, and a peer agent that proposed ideas and answered queries. In a within-subjects study with 24 participants, we compared group performance and processes across three conditions: no AI, peer, and facilitator. Results show that the peer agent occasionally enhanced problem-solving by offering timely hints and memory support; however, it also disrupted flow, increased workload, and created over-reliance. In comparison, the facilitator agent provided light scaffolding but had a limited impact on outcomes. We provide design considerations for proactive generative AI agents based on our findings.

Authors:Yi Shan, Yixuan He, Zekai Shao, Kai Xu, Siming Chen
Title: NotebookRAG: Retrieving Multiple Notebooks to Augment the Generation of EDA Notebooks for Crowd-Wisdom
Abstract:
High-quality exploratory data analysis (EDA) is essential in the data science pipeline, but remains highly dependent on analysts' expertise and effort. While recent LLM-based approaches partially reduce this burden, they struggle to generate effective analysis plans and appropriate insights and visualizations when user intent is abstract. Meanwhile, a vast collection of analysis notebooks produced across platforms and organizations contains rich analytical knowledge that can potentially guide automated EDA. Retrieval-augmented generation (RAG) provides a natural way to leverage such corpora, but general methods often treat notebooks as static documents and fail to fully exploit their potential knowledge for automating EDA. To address these limitations, we propose NotebookRAG, a method that takes user intent, datasets, and existing notebooks as input to retrieve, enhance, and reuse relevant notebook content for automated EDA generation. For retrieval, we transform code cells into context-enriched executable components, which improve retrieval quality and enable rerun with new data to generate updated visualizations and reliable insights. For generation, an agent leverages enhanced retrieval content to construct effective EDA plans, derive insights, and produce appropriate visualizations. Evidence from a user study with 24 participants confirms the superiority of our method in producing high-quality and intent-aligned EDA notebooks.

Authors:Stephan Vonschallen, Rahel Häusler, Theresa Schmiedel, Friederike Eyssel
Title: Never say never: Exploring the effects of available knowledge on agent persuasiveness in controlled physiotherapy motivation dialogues
Abstract:
Generative Social Agents (GSAs) are increasingly impacting human users through persuasive means. On the one hand, they might motivate users to pursue personal goals, such as healthier lifestyles. On the other hand, they are associated with potential risks like manipulation and deception, which are induced by limited control over probabilistic agent outputs. However, as GSAs manifest communicative patterns based on available knowledge, their behavior may be regulated through their access to such knowledge. Following this approach, we explored persuasive ChatGPT-generated messages in the context of human-robot physiotherapy motivation. We did so by comparing ChatGPT-generated responses to predefined inputs from a hypothetical physiotherapy patient. In Study 1, we qualitatively analyzed 13 ChatGPT-generated dialogue scripts with varying knowledge configurations regarding persuasive message characteristics. In Study 2, third-party observers (N = 27) rated a selection of these dialogues in terms of the agent's expressiveness, assertiveness, and persuasiveness. Our findings indicate that LLM-based GSAs can adapt assertive and expressive personality traits -- significantly enhancing perceived persuasiveness. Moreover, persuasiveness significantly benefited from the availability of information about the patients' age and past profession, mediated by perceived assertiveness and expressiveness. Contextual knowledge about physiotherapy benefits did not significantly impact persuasiveness, possibly because the LLM had inherent knowledge about such benefits even without explicit prompting. Overall, the study highlights the importance of empirically studying behavioral patterns of GSAs, specifically in terms of what information generative AI systems require for consistent and responsible communication.

Authors:Stephan Vonschallen, Dominique Oberle, Theresa Schmiedel, Friederike Eyssel
Title: Knowledge-Based Design Requirements for Generative Social Robots in Higher Education
Abstract:
Generative social robots (GSRs) powered by large language models enable adaptive, conversational tutoring but also introduce risks such as hallucinations, overreliance, and privacy violations. Existing frameworks for educational technologies and responsible AI primarily define desired behaviors, yet they rarely specify the knowledge prerequisites that enable generative systems to express these behaviors reliably. To address this gap, we adopt a knowledge-based design perspective and investigate what information tutoring-oriented GSRs require to function responsibly and effectively in higher education. Based on twelve semi-structured interviews with university students and lecturers, we identify twelve design requirements across three knowledge types: self-knowledge (assertive, conscientious and friendly personality with customizable role), user-knowledge (personalized information about student learning goals, learning progress, motivation type, emotional state and background), and context-knowledge (learning materials, educational strategies, course-related information, and physical learning environment). By identifying these knowledge requirements, this work provides a structured foundation for the design of tutoring GSRs and future evaluations, aligning generative system capabilities with pedagogical and ethical expectations.

Authors:Stephan Vonschallen, Friederike Eyssel, Theresa Schmiedel
Title: Understanding Persuasive Interactions between Generative Social Agents and Humans: The Knowledge-based Persuasion Model (KPM)
Abstract:
Generative social agents (GSAs) use artificial intelligence to autonomously communicate with human users in a natural and adaptive manner. Currently, there is a lack of theorizing regarding interactions with GSAs, and likewise, few guidelines exist for studying how they influence user attitudes and behaviors. Consequently, we propose the Knowledge-based Persuasion Model (KPM) as a novel theoretical framework. According to the KPM, a GSA's self, user, and context-related knowledge drives its persuasive behavior, which in turn shapes the attitudes and behaviors of a responding human user. By synthesizing existing research, the model offers a structured approach to studying interactions with GSAs, supporting the development of agents that motivate rather than manipulate humans. Accordingly, the KPM encourages the integration of responsible GSAs that adhere to social norms and ethical standards with the goal of increasing user wellbeing. Implications of the KPM for research and application domains such as healthcare and education are discussed.

Authors:Albin Zeqiri, Michael Rietzler, Enrico Rukzio
Title: Investigating the Effects of Eco-Friendly Service Options on Rebound Behavior in Ride-Hailing
Abstract:
Eco-friendly service options (EFSOs) aim to reduce personal carbon emissions, yet their eco-friendly framing may permit increased consumption, weakening their intended impact. Such rebound effects remain underexamined in HCI, including how common eco-feedback approaches shape them. We investigate this in an online within-subjects experiment (N=75) in a ride-hailing context. Participants completed 10 trials for five conditions (No EFSO, EFSO - Minimal, EFSO - CO2 Equivalency, EFSO - Gamified, EFSO - Social), yielding 50 choices between walking and ride-hailing for trips ranging from 0.5mi - 2.0mi (0.80km - 3.22km). We measured how different EFSO variants affected ride-hailing uptake relative to a No EFSO baseline. EFSOs lacking explicit eco-feedback metrics increased ride-hailing uptake, and qualitative responses indicate that EFSOs can make convenience-driven choices more permissible. We conclude with implications for designing EFSOs that begin to take rebound effects into account.

Authors:Gabriela Molina León, Benjamin Bach, Matheus Valentim, Niklas Elmqvist
Title: A Multiliteracy Model for Interactive Visualization Literacy: Definitions, Literacies, and Steps for Future Research
Abstract:
This paper presents a theoretical model for interactive visualization literacy to describe how people use interactive data visualizations and systems. Literacies have become an important concept in describing modern life skills, with visualization literacy generally referring to the use and interpretation of data visualizations. However, prior work on visualization literacy overlooks interaction and its associated challenges, despite it being an intrinsic aspect of using visualizations. Based on existing theoretical frameworks, we derive a two-dimensional model that combines four well-known literacies with five novel ones. We found evidence for our model through analyzing existing visualization systems as well as through observations from an exploratory study involving such systems. We conclude by outlining steps towards measuring, evaluating, designing for, and teaching interactive visualization literacy.

Authors:Thorsten Klößner, João Belo, Zekun Wu, Jörg Hoffmann, Anna Maria Feit
Title: Intelligent support for Human Oversight: Integrating Reinforcement Learning with Gaze Simulation to Personalize Highlighting
Abstract:
Interfaces for human oversight must effectively support users' situation awareness under time-critical conditions. We explore reinforcement learning (RL)-based UI adaptation to personalize alerting strategies that balance the benefits of highlighting critical events against the cognitive costs of interruptions. To enable learning without real-world deployment, we integrate models of users' gaze behavior to simulate attentional dynamics during monitoring. Using a delivery-drone oversight scenario, we present initial results suggesting that RL-based highlighting can outperform static, rule-based approaches and discuss challenges of intelligent oversight support.

Authors:Cameron R. Jones, Agnese Lombardi, Kyle Mahowald, Benjamin K. Bergen
Title: LLMs and people both learn to form conventions -- just not with each other
Abstract:
Humans align to one another in conversation -- adopting shared conventions that ease communication. We test whether LLMs form the same kinds of conventions in a multimodal communication game. Both humans and LLMs display evidence of convention-formation (increasing the accuracy and consistency of their turns while decreasing their length) when communicating in same-type dyads (humans with humans, AI with AI). However, heterogenous human-AI pairs fail -- suggesting differences in communicative tendencies. In Experiment 2, we ask whether LLMs can be induced to behave more like human conversants, by prompting them to produce superficially humanlike behavior. While the length of their messages matches that of human pairs, accuracy and lexical overlap in human-LLM pairs continues to lag behind that of both human-human and AI-AI pairs. These results suggest that conversational alignment requires more than just the ability to mimic previous interactions, but also shared interpretative biases toward the meanings that are conveyed.

Authors:Bahare Riahi, Veronica Catete
Title: Humanizing AI Grading: Student-Centered Insights on Fairness, Trust, Consistency and Transparency
Abstract:
This study investigates students' perceptions of Artificial Intelligence (AI) grading systems in an undergraduate computer science course (n = 27), focusing on a block-based programming final project. Guided by the ethical principles framework articulated by Jobin (2019), our study examines fairness, trust, consistency, and transparency in AI grading by comparing AI-generated feedback with original human-graded feedback. Findings reveal concerns about AI's lack of contextual understanding and personalization. We recommend that equitable and trustworthy AI systems reflect human judgment, flexibility, and empathy, serving as supplementary tools under human oversight. This work contributes to ethics-centered assessment practices by amplifying student voices and offering design principles for humanizing AI in designed learning environments.

Authors:Zheyuan Zhang, Dorian Peters, Lan Xiao, Jingjing Sun, Laura Moradbakhti, Andrew Hall, Rafael A. Calvo
Title: Understanding Workplace Relatedness Support among Healthcare Professionals: A Four-Layer Model and Implications for Technology Design
Abstract:
Healthcare professionals (HCPs) face increasing occupational stress and burnout. Supporting HCPs need for relatedness is fundamental to their psychological wellbeing and resilience. However, how technologies could support HCPs relatedness in the workplace remains less explored. This study incorporated semi-structured interviews (n = 15) and co-design workshops (n = 21) with HCPs working in the UK National Health Service (NHS), to explore their current practices and preferences for workplace relatedness support, and how technology could be utilized to benefit relatedness. Qualitative analysis yielded a four-layer model of HCPs relatedness need, which includes Informal Interactions, Camaraderie and Bond, Community and Organizational Care, and Shared Identity. Workshops generated eight design concepts (e.g., Playful Encounter, Collocated Action, and Memories and Stories) that operationalize the four relatedness need layers. We conclude by highlighting the theoretical relevance, practical design implications, and the necessity to strengthen relatedness support for HCPs in the era of digitalization and artificial intelligence.

Authors:Jialin Li, Zhenhao Chen, Hanjun Luo, Hanan Salam
Title: PrefIx: Understand and Adapt to User Preference in Human-Agent Interaction
Abstract:
LLM-based agents can complete tasks correctly yet still frustrate users through poor interaction patterns, such as excessive confirmations, opaque reasoning, or misaligned pacing. Current benchmarks evaluate task accuracy but overlook how agents interact: whether they infer preferences from implicit cues, adapt dynamically, or maintain fine-grained interaction quality. We introduce Prefix, a configurable environment that evaluates both what agents accomplish and how they interact. Central to Prefix is the Interaction-as-a-Tool (IaaT) paradigm, which treats interaction behaviors as structured tool calls, unifying them with existing evaluation frameworks. We define 31 preference settings across 14 attributes and formalize user experience (UX) as a core metric alongside task accuracy. A composite LLM-as-a-Judge mechanism across seven UX dimensions achieves strong aggregate reliability (ICC > 0.79), high internal consistency (alpha = 0.943), and human correlation (rho = 0.52-0.78). Preference-aware agents show 7.6% average UX improvement and 18.5% gain in preference alignment. Our work is openly accessible.

Authors:Adam Wróbel, Siddhartha Gairola, Jacek Tabor, Bernt Schiele, Bartosz Zieliński, Dawid Rymarczyk
Title: DAVE: Distribution-aware Attribution via ViT Gradient Decomposition
Abstract:
Vision Transformers (ViTs) have become a dominant architecture in computer vision, yet producing stable and high-resolution attribution maps for these models remains challenging. Architectural components such as patch embeddings and attention routing often introduce structured artifacts in pixel-level explanations, causing many existing methods to rely on coarse patch-level attributions. We introduce DAVE \textit{(\underline{D}istribution-aware \underline{A}ttribution via \underline{V}iT Gradient D\underline{E}composition)}, a mathematically grounded attribution method for ViTs based on a structured decomposition of the input gradient. By exploiting architectural properties of ViTs, DAVE isolates locally equivariant and stable components of the effective input--output mapping. It separates these from architecture-induced artifacts and other sources of instability.

Authors:Yewon Kim, Stephen Brade, Alexander Wang, David Zhou, Haven Kim, Bill Wang, Sung-Ju Lee, Hugo F Flores Garcia, Cheng-Zhi Anna Huang, Chris Donahue
Title: A Design Space for Live Music Agents
Abstract:
Live music provides a uniquely rich setting for studying creativity and interaction due to its spontaneous nature. The pursuit of live music agents--intelligent systems supporting real-time music performance and interaction--has captivated researchers across HCI, AI, and computer music for decades, and recent advancements in AI suggest unprecedented opportunities to evolve their design. However, the interdisciplinary nature of music has led to fragmented development across research communities, hindering effective communication and collaborative progress. In this work, we bring together perspectives from these diverse fields to map the current landscape of live music agents. Based on our analysis of 184 systems across both academic literature and video, we develop a comprehensive design space that categorizes dimensions spanning usage contexts, interactions, technologies, and ecosystems. By highlighting trends and gaps in live music agents, our design space offers researchers, designers, and musicians a structured lens to understand existing systems and shape future directions in real-time human-AI music co-creation. We release our annotated systems as a living artifact at https://live-music-agents.github.io.

Authors:Eryue Xu, Tianshi Li
Title: From Fragmentation to Integration: Exploring the Design Space of AI Agents for Human-as-the-Unit Privacy Management
Abstract:
Managing one's digital footprint is overwhelming, as it spans multiple platforms and involves countless context-dependent decisions. Recent advances in agentic AI offer ways forward by enabling holistic, contextual privacy-enhancing solutions. Building on this potential, we adopted a ''human-as-the-unit'' perspective and investigated users' cross-context privacy challenges through 12 semi-structured interviews. Results reveal that people rely on ad hoc manual strategies while lacking comprehensive privacy controls, highlighting nine privacy-management challenges across applications, temporal contexts, and relationships. To explore solutions, we generated nine AI agent concepts and evaluated them via a speed-dating survey with 116 US participants. The three highest-ranked concepts were all post-sharing management tools with half or full agent autonomy, with users expressing greater trust in AI accuracy than in their own efforts. Our findings highlight a promising design space where users see AI agents bridging the fragments in privacy management, particularly through automated, comprehensive post-sharing remediation of users' digital footprints.

Authors:Matthew P. Lad, Louisa Conwill, Megan Levis Scheirer
Title: Is It Possible to Make Chatbots Virtuous? Investigating a Virtue-Based Design Methodology Applied to LLMs
Abstract:
With the rapid growth of Large Language Models (LLMs), criticism of their societal impact has also grown. Work in Responsible AI (RAI) has focused on the development of AI systems aimed at reducing harm. Responding to RAI's criticisms and the need to bring the wisdom traditions into HCI, we apply Conwill et al.'s Virtue-Guided Technology Design method to LLMs. We cataloged new ethical design patterns for LLMs and evaluated them through interviews with technologists. Participants valued that the patterns provided more accuracy and robustness, better safety, new research opportunities, increased access and control, and reduced waste. Their concerns were that the patterns could be vulnerable to jailbreaking, were generalizing models too widely, and had potential implementation issues. Overall, participants reacted positively while also acknowledging the tradeoffs involved in ethical LLM design.

Authors:Alessandra Maciel Paz Milani, Norman Anderson, Margaret-Anne Storey
Title: Towards a Cognitive-Support Tool for Threat Hunters
Abstract:
Cybersecurity increasingly relies on threat hunters to proactively identify adversarial activity, yet the cognitive work underlying threat hunting remains underexplored or insufficiently supported by existing tools. Building on prior studies that examined how threat hunters construct and share mental models during investigations, we derived a set of design propositions to support their cognitive and collaborative work. In this paper, we present the Threat Hunter Board, a prototype tool that operationalizes these design propositions by enabling threat hunters to externalize reasoning, organize investigative leads, and maintain continuity across sessions. Using a design science paradigm, we describe the solution design rationale and artifact development. In addition, we propose six design heuristics that form a solution-evaluation framework for assessing cognitive support in threat hunting tools. An initial evaluation using a cognitive walkthrough provides early evidence of feasibility, while future work will focus on user-based validation with professional threat hunters.

Authors:Juan David Salazar Rodriguez, Sam Conrad Joyce, Nachamma Sockalingam, Khoo Eng Tat, Julfendi
Title: Student Perceptions of Large Language Models Use in Self-Reflection and Design Critique in Architecture Studio
Abstract:
This study investigates the integration of Large Language Models (LLMs) into the feedback mechanisms of the architectural design studio, shifting the focus from generative production to reflective pedagogy. Employing a mixed-methods approach with surveys and semi structured interviews with 22 architecture students at the Singapore University of Technology and De-sign, the research analyzes student perceptions across three distinct feed-back domains: self-reflection, peer critique, and professor-led reviews. The findings reveal that students engage with LLMs not as authoritative in-structors, but as collaborative "cognitive mirrors" that scaffold critical thinking. In self-directed learning, LLMs help structure thoughts and over-come the "blank page" problem, though they are limited by a lack of contex-tual nuance. In peer critiques, the technology serves as a neutral mediator, mitigating social anxiety and the "fear of offending". Furthermore, in high-stakes professor-led juries, students utilize LLMs primarily as post-critique synthesis engines to manage cognitive overload and translate ab-stract academic discourse into actionable design iterations.

Authors:Bhada Yun, Evgenia Taranova, April Yi Wang
Title: Does My Chatbot Have an Agenda? Understanding Human and AI Agency in Human-Human-like Chatbot Interaction
Abstract:
AI chatbots are shifting from tools to companions. This raises critical questions about agency: who drives conversations and sets boundaries in human-AI chatrooms? We report a month-long longitudinal study with 22 adults who chatted with Day, an LLM companion we built, followed by a semi-structured interview with post-hoc elicitation of notable moments, cross-participant chat reviews, and a 'strategy reveal' disclosing Day's vertical (depth-seeking) vs. horizontal (breadth-seeking) modes. We discover that agency in human-AI chatrooms is an emergent, shared experience: as participants claimed agency by setting boundaries and providing feedback, and the AI was perceived to steer intentions and drive execution, control shifted and was co-constructed turn-by-turn. We introduce a 3-by-5 framework mapping who (human, AI, hybrid) x agency action (Intention, Execution, Adaptation, Delimitation, Negotiation), modulated by individual and environmental factors. Ultimately, we argue for translucent design (i.e. transparency-on-demand), spaces for agency negotiation, and guidelines toward agency-aware conversational AI.

Authors:Bhada Yun, Renn Su, April Yi Wang
Title: AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations
Abstract:
Does AI understand human values? While this remains an open philosophical question, we take a pragmatic stance by introducing VAPT, the Value-Alignment Perception Toolkit, for studying how LLMs reflect people's values and how people judge those reflections. 20 participants texted a human-like chatbot over a month, then completed a 2-hour interview with our toolkit evaluating AI's ability to extract (pull details regarding), embody (make decisions guided by), and explain (provide proof of) human values. 13 participants left our study convinced that AI can understand human values. Participants found the experience insightful for self-reflection and found themselves getting persuaded by the AI's reasoning. Thus, we warn about "weaponized empathy": a potentially dangerous design pattern that may arise in value-aligned, yet welfare-misaligned AI. VAPT offers concrete artifacts and design implications to evaluate and responsibly build value-aligned conversational agents with transparency, consent, and safeguards as AI grows more capable and human-like into the future.

Authors:Upol Ehsan, Samir Passi, Koustuv Saha, Todd McNutt, Mark O. Riedl, Sara Alcorn
Title: From Future of Work to Future of Workers: Addressing Asymptomatic AI Harms for Dignified Human-AI Interaction
Abstract:
In the future of work discourse, AI is touted as the ultimate productivity amplifier. Yet, beneath the efficiency gains lie subtle erosions of human expertise and agency. This paper shifts focus from the future of work to the future of workers by navigating the AI-as-Amplifier Paradox: AI's dual role as enhancer and eroder, simultaneously strengthening performance while eroding underlying expertise. We present a year-long study on the longitudinal use of AI in a high-stakes workplace among cancer specialists. Initial operational gains hid ``intuition rust'': the gradual dulling of expert judgment. These asymptomatic effects evolved into chronic harms, such as skill atrophy and identity commoditization. Building on these findings, we offer a framework for dignified Human-AI interaction co-constructed with professional knowledge workers facing AI-induced skill erosion without traditional labor protections. The framework operationalizes sociotechnical immunity through dual-purpose mechanisms that serve institutional quality goals while building worker power to detect, contain, and recover from skill erosion, and preserve human identity. Evaluated across healthcare and software engineering, our work takes a foundational step toward dignified human-AI interaction futures by balancing productivity with the preservation of human expertise.

Authors:Joyce Zhou, Weijie Zhou, Doug Turnbull, Thorsten Joachims
Title: SteerEval: A Framework for Evaluating Steerability with Natural Language Profiles for Recommendation
Abstract:
Natural-language user profiles have recently attracted attention not only for improved interpretability, but also for their potential to make recommender systems more steerable. By enabling direct editing, natural-language profiles allow users to explicitly articulate preferences that may be difficult to infer from past behavior. However, it remains unclear whether current natural-language-based recommendation methods can follow such steering commands. While existing steerability evaluations have shown some success for well-recognized item attributes (e.g., movie genres), we argue that these benchmarks fail to capture the richer forms of user control that motivate steerable recommendations. To address this gap, we introduce SteerEval, an evaluation framework designed to measure more nuanced and diverse forms of steerability by using interventions that range from genres to content-warning for movies. We assess the steerability of a family of pretrained natural-language recommenders, examine the potential and limitations of steering on relatively niche topics, and compare how different profile and recommendation interventions impact steering effectiveness. Finally, we offer practical design suggestions informed by our findings and discuss future steps in steerable recommender design.

Authors:Josh Susak, Yifu Liu, Pascal Jansen, Mark Colley
Title: ProVoice: Designing Proactive Functionality for In-Vehicle Conversational Assistants using Multi-Objective Bayesian Optimization to Enhance Driver Experience
Abstract:
The next step for In-vehicle Conversational Assistants (IVCAs) will be their capability to initiate and automate proactive system interactions throughout journeys. However, diverse drivers make it challenging to design voice interventions tailored towards individual on-road expectations. This paper evaluates the effectiveness of Human-in-the-Loop (HITL) Multi-Objective Bayesian Optimization (MOBO) in design by implementing ProVoice: a Virtual Reality (VR) driving simulator integrating MOBO to investigate the effects of IVCA design variants on perceived mental demand, predictability, and usefulness. By reporting the Pareto Front from a within-subjects VR study (N=19), this paper proposes optimal design trade-offs. Follow-up analysis demonstrates MOBO's success in discovering effective intervention strategies, with reduced participant mental demand, alongside enhanced predictability and usefulness while engaging with the proactive IVCA. Implications for computational techniques in future research on proactive intervention strategies are discussed. ProVoice can extend to include alternative design parameters and driving scenarios, encouraging intervention design on a broad scale.

Authors:Nan Chen, Jing Lu, Zilong Wang, Luna K. Qiu, Siming Chen, Yuqing Yang
Title: From Struggle to Success: Context-Aware Guidance for Screen Reader Users in Computer Use
Abstract:
Equal access to digital technologies is critical for education, employment, and social participation. However, mainstream interfaces are visually oriented, creating steep learning curves and frequent obstacles for screen reader users, and limiting their independence and opportunities. Existing support is inadequate -- tutorials mainly target sighted users, while human assistance lacks real-time availability. We introduce AskEase, an on-demand AI assistant that provides step-by-step, screen reader user-friendly guidance for computer use. AskEase manages multiple sources of context to infer user intent and deliver precise, situation-specific guidance. Its seamless interaction design minimizes disruption and reduces the effort of seeking help. We demonstrated its effectiveness through representative usage scenarios and robustness tests. In a within-subjects study with 12 screen reader users, AskEase significantly improved task success while reducing perceived workload, including physical demand, effort, and frustration. These results demonstrate the potential of LLM-powered assistants to promote accessible computing and expand opportunities for users with visual impairments.

Authors:Shuhao Zhang, Jiahe Dong, Haoran Wang, Chang Jiang, Quan Li
Title: When Seconds Count: Designing Real-Time VR Interventions for Stress Inoculation Training in Novice Physicians
Abstract:
Surgical emergencies often trigger acute cognitive overload in novice physicians, impairing their decision-making under pressure. Although Virtual Reality-based Stress Inoculation Training (VR-SIT) shows promise, current systems fall short in delivering real-time, effective support during moments of peak stress. To bridge this gap, we first conducted a formative study (N=12) to uncover the core needs of novice physicians for immediate assistance under acute stress and identified three key intervention strategies: self-regulation aids, procedure guidance, and emotional/sensory support. Building on these insights, we designed and implemented a novel VR-SIT system that incorporates a just-in-time adaptive intervention framework, dynamically tailoring support to learners' cognitive and emotional states. We then validated these strategies in a user study (N=26). Our findings provide empirical evidence and design implications for next-generation VR medical training systems, supporting physicians in sustaining cognitive clarity and accurate decision-making in critical situations.

Authors:Masahiro Yoshino, Haruki Yokota, Junya Hara, Yuichi Tanaka, Hiroshi Higashi
Title: Auditory Attention Decoding without Spatial Information: A Diotic EEG Study
Abstract:
Auditory attention decoding (AAD) identifies the attended speech stream in multi-speaker environments by decoding brain signals such as electroencephalography (EEG). This technology is essential for realizing smart hearing aids that address the cocktail party problem and for facilitating objective audiometry systems. Existing AAD research mainly utilizes dichotic environments where different speech signals are presented to the left and right ears, enabling models to classify directional attention rather than speech content. However, this spatial reliance limits applicability to real-world scenarios, such as the "cocktail party" situation, where speakers overlap or move dynamically. To address this challenge, we propose an AAD framework for diotic environments where identical speech mixtures are presented to both ears, eliminating spatial cues. Our approach maps EEG and speech signals into a shared latent space using independent encoders. We extract speech features using wav2vec 2.0 and encode them with a 2-layer 1D convolutional neural network (CNN), while employing the BrainNetwork architecture for EEG encoding. The model identifies the attended speech by calculating the cosine similarity between EEG and speech representations. We evaluate our method on a diotic EEG dataset and achieve 72.70% accuracy, which is 22.58% higher than the state-of-the-art direction-based AAD method.

Authors:Yuheng Shao, Yuansong Xu, Yifan Jin, Shuhao Zhang, Wenxin Gu, Quan Li
Title: DesignBridge: Bridging Designer Expertise and User Preferences through AI-Enhanced Co-Design for Fashion
Abstract:
Effective collaboration between designers and users is important for fashion design, which can increase the user acceptance of fashion products and thereby create value. However, it remains an enduring challenge, as traditional designer-centric approaches restrict meaningful user participation, while user-driven methods demand design proficiency, often marginalizing professional creative judgment. Current co-design practices, including workshops and AI-assisted frameworks, struggle with low user engagement, inefficient preference collection, and difficulties in balancing user feedback with design considerations. To address these challenges, we conducted a formative study with designers and users experienced in co-design (N=7), identifying critical challenges for current collaboration between designers and users in the co-design process, and their requirements. Informed by these insights, we introduce DesignBridge, a multi-platform AI-enhanced interactive system that bridges designer expertise and user preferences through three stages: (1) Initial Design Framing, where designers define initial concepts. (2) Preference Expression Collection, where users intuitively articulate preferences via interactive tools. (3) Preference-Integrated Design, where designers use AI-assisted analytics to integrate feedback into cohesive designs. A user study demonstrates that DesignBridge significantly enhances user preference collection and analysis, enabling designers to integrate diverse preferences with professional expertise.

Authors:Jaeyoung Moon, Youjin Choi, Yucheon Park, David Melhart, Georgios N. Yannakakis, Kyung-Joong Kim
Title: PREFAB: PREFerence-based Affective Modeling for Low-Budget Self-Annotation
Abstract:
Self-annotation is the gold standard for collecting affective state labels in affective computing. Existing methods typically rely on full annotation, requiring users to continuously label affective states across entire sessions. While this process yields fine-grained data, it is time-consuming, cognitively demanding, and prone to fatigue and errors. To address these issues, we present PREFAB, a low-budget retrospective self-annotation method that targets affective inflection regions rather than full annotation. Grounded in the peak-end rule and ordinal representations of emotion, PREFAB employs a preference-learning model to detect relative affective changes, directing annotators to label only selected segments while interpolating the remainder of the stimulus. We further introduce a preview mechanism that provides brief contextual cues to assist annotation. We evaluate PREFAB through a technical performance study and a 25-participant user study. Results show that PREFAB outperforms baselines in modeling affective inflections while mitigating workload (and conditionally mitigating temporal burden). Importantly PREFAB improves annotator confidence without degrading annotation quality.

Authors:Wenge Xu, Foroogh Hajiseyedjavadi, Debargha Dey, Tram Thi Minh Tran, Mark Colley
Title: Exploring the Impacts of Background Noise on Auditory Stimuli of Audio-Visual eHMIs for Hearing, Deaf, and Hard-of-Hearing People
Abstract:
External Human-Machine Interfaces (eHMIs) have been proposed to enhance communication between automated vehicles (AVs) and pedestrians, with growing interest in multi-modal designs such as audio-visual eHMIs. Just as poor lighting can impair visual cues, a loud background noise may mask the auditory stimuli. However, its effects within these systems have not been examined, and little is known about how pedestrians -- particularly Deaf and Hard-of-Hearing (DHH) people -- perceive different types of auditory stimuli. We conducted a virtual reality study (Hearing N=25, DHH N=11) to examine the effects of background noise (quiet and loud) on auditory stimuli (baseline, bell, speech) within an audio-visual eHMI. Results revealed that: (1) Crossing experiences of DHH pedestrians significantly differ from Hearing pedestrians. (2) Loud background noise adversely affects pedestrians' crossing experiences. (3) Providing an additional auditory eHMI (bell/speech) improves crossing experiences. We outlined four practical implications for future eHMI design and research.

Authors:Sharifa Sultana, Rupali Samad, Mehzabin Haque, Zinnat Sultana, Zulkarin Jahangir, B M Mainul Hossain, Rashed Mujib Noman, Syed Ishtiaque Ahmed
Title: Bangladesh AI Readiness: Perspectives from the Academia, Industry, and Government
Abstract:
Artificial Intelligence (AI) readiness in the Global South extends beyond infrastructure to include curriculum design, workforce development, and cross-sector collaboration. Bangladesh, ranked 82nd in the 2023 Oxford Insights AI Readiness Index, exhibits significant deficits in technology capacity and research ecosystems, despite strong governmental visions. While HCI and ICTD research have explored digital inclusion and responsible AI, little empirical work examines how educational, industrial, and policy domains intersect to shape readiness. We present a multi-method qualitative study of AI readiness in Bangladesh, combining institutional analyses, 59 stakeholder interviews, and curriculum benchmarking against global exemplars. Findings reveal outdated curricula, limited faculty upskilling, inadequate computing resources, entrenched gender disparities, and the near-total absence of AI ethics instruction. We contribute empirical mapping of current practices, identification of structural and cultural barriers, and actionable pathways for embedding human-centered, inclusive, and responsible AI practices into national agendas, advancing equitable innovation in emerging AI ecosystems.

Authors:Sharifa Sultana, Pratyasha Saha, Nadira Nowsher, Sumaia Arefin Ritu, Zinnat Sultana, Syed Ishtiaque Ahmed, S M Taiabul Haque
Title: Perception of Deepfakes among Bangladeshi Women
Abstract:
As deepfake technology becomes more accessible, concerns about its misuse and societal impact are escalating, particularly in regions like the Global South where digital literacy and regulatory measures are often limited. While previous research has explored deepfakes in contexts such as detection and media manipulation, there is a noticeable gap in understanding how individuals in these regions perceive and interact with deepfake media. This study addresses this gap by investigating how Bangladeshi women perceive deepfakes and the socio-cultural factors influencing their awareness, concerns, and responses to this technology. Drawing on 15 semi-structured interviews, we uncover how cultural values, gendered norms, trust in institutions, and the prevalence of digital harassment shape their perceptions and coping mechanisms. Through this research, we aim to advance existing scholarship in HCI by offering insights into the design of culturally sensitive interventions, educational initiatives, and policy frameworks to address the challenges posed by deepfakes in the Global South.

Authors:Haodong Zhang, Jiapeng Zhu, Yitong Chen, Hongqi Li
Title: HCFT: Hierarchical Convolutional Fusion Transformer for EEG Decoding
Abstract:
Electroencephalography (EEG) decoding requires models that can effectively extract and integrate complex temporal, spectral, and spatial features from multichannel signals. To address this challenge, we propose a lightweight and generalizable decoding framework named Hierarchical Convolutional Fusion Transformer (HCFT), which combines dual-branch convolutional encoders and hierarchical Transformer blocks for multi-scale EEG representation learning. Specifically, the model first captures local temporal and spatiotemporal dynamics through time-domain and time-space convolutional branches, and then aligns these features via a cross-attention mechanism that enables interaction between branches at each stage. Subsequently, a hierarchical Transformer fusion structure is employed to encode global dependencies across all feature stages, while a customized Dynamic Tanh normalization module is introduced to replace traditional Layer Normalization in order to enhance training stability and reduce redundancy. Extensive experiments are conducted on two representative benchmark datasets, BCI Competition IV-2b and CHB-MIT, covering both event-related cross-subject classification and continuous seizure prediction tasks. Results show that HCFT achieves 80.83% average accuracy and a Cohen's kappa of 0.6165 on BCI IV-2b, as well as 99.10% sensitivity, 0.0236 false positives per hour, and 98.82% specificity on CHB-MIT, consistently outperforming over ten state-of-the-art baseline methods. Ablation studies confirm that each core component of the proposed framework contributes significantly to the overall decoding performance, demonstrating HCFT's effectiveness in capturing EEG dynamics and its potential for real-world BCI applications.

Authors:Lukas Schilcher, Peter Waldert, Benedikt Kantz, Tobias Schreck
Title: Clusters in Focus: A Simple and Robust Detail-On-Demand Dashboard for Patient Data
Abstract:
Exploring tabular datasets to understand how different feature pairs partition data into meaningful cohorts is crucial in domains such as biomarker discovery, yet comparing clusters across multiple feature pair projections is challenging. We introduce Clusters in Focus, an interactive visual analytics dashboard designed to address this gap. Clusters in Focus employs a three-panel coordinated view: a Data Panel offers multiple perspectives (tabular, heatmap, condensed with histograms / SHAP values) for initial data exploration; a Selection Panel displays the 2D clustering (K-Means/DBSCAN) for a user-selected feature pair; and a novel Cluster Similarity Panel featuring two switchable views for comparing clusters. A ranked list enables the identification of top-matching feature pairs, while an interactive similarity matrix with reordering capabilities allows for the discovery of global structural patterns and groups of related features. This dual-view design supports both focused querying and broad visual exploration. A use case on a Parkinson's disease speech dataset demonstrates the tool's effectiveness in revealing relationships between different feature pairs characterizing the same patient subgroup.

Authors:Max Linnander, Yon Visell
Title: Haptic Light-Emitting Diodes: Miniature, Luminous Tactile Actuators
Abstract:
We present Haptic Light-Emitting Diodes (HLEDs), luminous thermopneumatic actuators that directly convert pulsed light into mechanical forces and displacements. Each device packages a miniature surface-mount LED in a gas-filled cavity that contains a low-inertia graphite photoabsorber. The cavity is sealed by an elastic membrane, which functions as a working diaphragm. Brief optical pulses heat the photoabsorber, which heats the gas. The resulting rapid pressure increases generate forces and displacements at the working diaphragm. Millimeter-scale HLEDs produce forces exceeding 0.4 N and displacements of 0.9 mm at low voltages, with 5 to 100 ms response times, making them attractive as actuators providing tactile feedback in human-machine interfaces. Unusually, these actuators are also light-emitting, as a fraction of optical energy is transmitted through the membrane. These photomechanical actuators have many potential applications in tactile displays, human interface engineering, wearable computing, and other areas.

Authors:Yao Lyu, Jessica Shen, Alina Faisal, John M. Carroll
Title: "I'm Constantly Getting Comments Like, 'Oh, You're Blind. You're Like the Only Woman That I Stand a Chance With.'": A Study of Blind TikTokers' Intersectional Experiences of Gender and Sexuality
Abstract:
Social media platforms are important venues for identity expression, and the Human-Computer Interaction community has been paying growing attention to how marginalized groups express their identities on these platforms. Joining the emerging literature on intersectional experiences, we study blind TikTokers ("BlindTokers") who are also women and/or LGBTQ+. Using interview data from \rev{41} participants, we identify their intersectional experiences as mediated by TikTok's socio-technical affordances. We argue that BlindTokers' intersectional marginalization is infrastructural: TikTok's classification and moderation features interact with social norms in ways that push them aside and distort how they are treated on the platform. We use this infrastructure perspective to understand what these experiences are, how they were formed, and how they become harmful. We further recognize participants' infrastructuring work to address these problems. This study guides future social media design with accessible creator tools, inclusive identity options, and context-aware moderation developed in partnership with communities.

Authors:Yao Lyu, Tawanna Dillahunt, Jiaying Liu, John M. Carroll
Title: "My Brother Is a School Principal, Earns About $80,000 Per Year... But When the Kids See Me, 'Wow, Uncle, You Have 1500 Followers on TikTok!'": A Study of Blind TikTokers' Alternative Professional Development Experiences
Abstract:
One's profession is an essential part of modern life. Traditionally, professional development has been criticized for excluding people with disabilities. People with visual impairments, for example, face disproportionately low employment rates, highlighting persistent gaps in professional opportunities. Recently, there has been growing research on social media platforms as spaces for more equitable career development approaches. In this paper, we present an interview study on the professional development experiences of 60 people with visual impairments on TikTok (also known as "BlindTokers"). We report BlindTokers' goals, strategies, and challenges, supported by detailed examples and in-depth analysis. Based on the findings, we identify that BlindTokers' practices reveal an alternative professional development approach that is more flexible, inclusive, personalized, and diversified than traditional models. Our study also extends professional development research by foregrounding emerging digital skills and proposing design implications to foster more equitable and inclusive professional opportunities.

Authors:Kazi Noshin, Syed Ishtiaque Ahmed, Sharifa Sultana
Title: User Detection and Response Patterns of Sycophantic Behavior in Conversational AI
Abstract:
While concerns about LLM sycophancy have grown among researchers and developers, how users themselves experience this behavior remains largely unexplored. We analyze Reddit discussions to investigate how users detect, mitigate, and perceive sycophantic AI. We develop the DCR epistemology that maps user experiences across three stages: observing sycophantic behaviors, detecting sycophancy, and responding to these behaviors. Our findings reveal that users employ various detection techniques, including cross-platform comparison and inconsistency testing. We document diverse mitigation approaches, including persona-based prompts and targeted language patterns in prompt engineering. We find sycophancy's effects are context-dependent rather than universally harmful. Specifically, vulnerable populations experiencing trauma, mental health challenges, or isolation actively seek and value sycophantic behaviors as emotional support. Users develop both technical and folk explanations for why sycophancy occurs. These findings challenge the assumption that sycophancy should be eliminated universally. We conclude by proposing context-aware AI design that balances risks with benefits of affirmative interaction, while discussing implications for user education and transparency.

Authors:Andrew Stratton, Phani Teja Singamaneni, Pranav Goyal, Rachid Alami, Christoforos Mavrogiannis
Title: How Human Motion Prediction Quality Shapes Social Robot Navigation Performance in Constrained Spaces
Abstract:
Motivated by the vision of integrating mobile robots closer to humans in warehouses, hospitals, manufacturing plants, and the home, we focus on robot navigation in dynamic and spatially constrained environments. Ensuring human safety, comfort, and efficiency in such settings requires that robots are endowed with a model of how humans move around them. Human motion prediction around robots is especially challenging due to the stochasticity of human behavior, differences in user preferences, and data scarcity. In this work, we perform a methodical investigation of the effects of human motion prediction quality on robot navigation performance, as well as human productivity and impressions. We design a scenario involving robot navigation among two human subjects in a constrained workspace and instantiate it in a user study ($N=80$) involving two different robot platforms, conducted across two sites from different world regions. Key findings include evidence that: 1) the widely adopted average displacement error is not a reliable predictor of robot navigation performance and human impressions; 2) the common assumption of human cooperation breaks down in constrained environments, with users often not reciprocating robot cooperation, and causing performance degradations; 3) more efficient robot navigation often comes at the expense of human efficiency and comfort.

Authors:JungMin Yun, JuneHyoung Kwon, MiHyeon Kim, YoungBin Kim
Title: Position on LLM-Assisted Peer Review: Addressing Reviewer Gap through Mentoring and Feedback
Abstract:
The rapid expansion of AI research has intensified the Reviewer Gap, threatening the peer-review sustainability and perpetuating a cycle of low-quality evaluations. This position paper critiques existing LLM approaches that automatically generate reviews and argues for a paradigm shift that positions LLMs as tools for assisting and educating human reviewers. We define the core principles of high-quality peer review and propose two complementary systems grounded in these foundations: (i) an LLM-assisted mentoring system that cultivates reviewers' long-term competencies, and (ii) an LLM-assisted feedback system that helps reviewers refine the quality of their reviews. This human-centered approach aims to strengthen reviewer expertise and contribute to building a more sustainable scholarly ecosystem.

Authors:Rui Liu, Liuqingqing Yang, Runsheng Zhang, Shixiao Wang
Title: Generative Modeling of Human-Computer Interfaces with Diffusion Processes and Conditional Control
Abstract:
This study investigates human-computer interface generation based on diffusion models to overcome the limitations of traditional template-based design and fixed rule-driven methods. It first analyzes the key challenges of interface generation, including the diversity of interface elements, the complexity of layout logic, and the personalization of user needs. A generative framework centered on the diffusion-reverse diffusion process is then proposed, with conditional control introduced in the reverse diffusion stage to integrate user intent, contextual states, and task constraints, enabling unified modeling of visual presentation and interaction logic. In addition, regularization constraints and optimization objectives are combined to ensure the rationality and stability of the generated interfaces. Experiments are conducted on a public interface dataset with systematic evaluations, including comparative experiments, hyperparameter sensitivity tests, environmental sensitivity tests, and data sensitivity tests. Results show that the proposed method outperforms representative models in mean squared error, structural similarity, peak signal-to-noise ratio, and mean absolute error, while maintaining strong robustness under different parameter settings and environmental conditions. Overall, the diffusion model framework effectively improves the diversity, rationality, and intelligence of interface generation, providing a feasible solution for automated interface generation in complex interaction scenarios.

Authors:Huatao Xu, Zihe Liu, Zilin Zeng, Baichuan Li, Mo Li
Title: AutoTour: Automatic Photo Tour Guide with Smartphones and LLMs
Abstract:
We present AutoTour, a system that enhances user exploration by automatically generating fine-grained landmark annotations and descriptive narratives for photos captured by users. The key idea of AutoTour is to fuse visual features extracted from photos with nearby geospatial features queried from open matching databases. Unlike existing tour applications that rely on pre-defined content or proprietary datasets, AutoTour leverages open and extensible data sources to provide scalable and context-aware photo-based guidance. To achieve this, we design a training-free pipeline that first extracts and filters relevant geospatial features around the user's GPS location. It then detects major landmarks in user photos through VLM-based feature detection and projects them into the horizontal spatial plane. A geometric matching algorithm aligns photo features with corresponding geospatial entities based on their estimated distance and direction. The matched features are subsequently grounded and annotated directly on the original photo, accompanied by large language model-generated textual and audio descriptions to provide an informative, tour-like experience. We demonstrate that AutoTour can deliver rich, interpretable annotations for both iconic and lesser-known landmarks, enabling a new form of interactive, context-aware exploration that bridges visual perception and geospatial understanding.

Authors:Britt Besch, Tai Mai, Jeremias Thun, Markus Huff, Jörn Vogel, Freek Stulp, Samuel Bustamante
Title: Model Reconciliation through Explainability and Collaborative Recovery in Assistive Robotics
Abstract:
Whenever humans and robots work together, it is essential that unexpected robot behavior can be explained to the user. Especially in applications such as shared control the user and the robot must share the same model of the objects in the world, and the actions that can be performed on these objects. In this paper, we achieve this with a so-called model reconciliation framework. We leverage a Large Language Model to predict and explain the difference between the robot's and the human's mental models, without the need of a formal mental model of the user. Furthermore, our framework aims to solve the model divergence after the explanation by allowing the human to correct the robot. We provide an implementation in an assistive robotics domain, where we conduct a set of experiments with a real wheelchair-based mobile manipulator and its digital twin.

Authors:Chaerin Yu, Chihun Choi, Sunjae Lee, Hyosu Kim, Steven Y. Ko, Young-Bae Ko, Sangeun Oh
Title: Leveraging LLMs for Efficient and Personalized Smart Home Automation
Abstract:
The proliferation of smart home devices has increased the complexity of controlling and managing them, leading to user fatigue. In this context, large language models (LLMs) offer a promising solution by enabling natural-language interfaces for Internet of Things (IoT) control. However, existing LLM-based approaches suffer from unreliable and inefficient device control due to the non-deterministic nature of LLMs, high inference latency and cost, and limited personalization. To address these challenges, we present IoTGPT, an LLM-based smart home agent designed to execute IoT commands in a reliable, efficient, and personalized manner. Inspired by how humans manage complex tasks, IoTGPT decomposes user instructions into subtasks and memorizes them. By reusing learned subtasks, subsequent instructions can be processed more efficiently with fewer LLM calls, improving reliability and reducing both latency and cost. IoTGPT also supports fine-grained personalization by adapting individual subtasks to user preferences. Our evaluation demonstrates that IoTGPT outperforms baselines in accuracy, latency/cost, and personalization, while reducing user workload.

Authors:Yueyang Wang, Mehmet Dogar, Gustav Markkula
Title: Realistic adversarial scenario generation via human-like pedestrian model for autonomous vehicle control parameter optimisation
Abstract:
Autonomous vehicles (AVs) are rapidly advancing and are expected to play a central role in future mobility. Ensuring their safe deployment requires reliable interaction with other road users, not least pedestrians. Direct testing on public roads is costly and unsafe for rare but critical interactions, making simulation a practical alternative. Within simulation-based testing, adversarial scenarios are widely used to probe safety limits, but many prioritise difficulty over realism, producing exaggerated behaviours which may result in AV controllers that are overly conservative. We propose an alternative method, instead using a cognitively inspired pedestrian model featuring both inter-individual and intra-individual variability to generate behaviourally plausible adversarial scenarios. We provide a proof of concept demonstration of this method's potential for AV control optimisation, in closed-loop testing and tuning of an AV controller. Our results show that replacing the rule-based CARLA pedestrian with the human-like model yields more realistic gap acceptance patterns and smoother vehicle decelerations. Unsafe interactions occur only for certain pedestrian individuals and conditions, underscoring the importance of human variability in AV testing. Adversarial scenarios generated by this model can be used to optimise AV control towards safer and more efficient behaviour. Overall, this work illustrates how incorporating human-like road user models into simulation-based adversarial testing can enhance the credibility of AV evaluation and provide a practical basis to behaviourally informed controller optimisation.

Authors:Sichao Song, Yuki Okafuji, Takuya Iwamoto, Jun Baba, Hiroshi Ishiguro
Title: From Metrics to Meaning: Insights from a Mixed-Methods Field Experiment on Retail Robot Deployment
Abstract:
We report a mixed-methods field experiment of a conversational service robot deployed under everyday staffing discretion in a live bedding store. Over 12 days we alternated three conditions--Baseline (no robot), Robot-only, and Robot+Fixture--and video-annotated the service funnel from passersby to purchase. An explanatory sequential design then used six post-experiment staff interviews to interpret the quantitative patterns. Quantitatively, the robot increased stopping per passerby (highest with the fixture), yet clerk-led downstream steps per stopper--clerk approach, store entry, assisted experience, and purchase--decreased. Interviews explained this divergence: clerks avoided interrupting ongoing robot-customer talk, struggled with ambiguous timing amid conversational latency, and noted child-centered attraction that often satisfied curiosity at the doorway. The fixture amplified visibility but also anchored encounters at the threshold, creating a well-defined micro-space where needs could ``close'' without moving inside. We synthesize these strands into an integrative account from the initial show of interest on the part of a customer to their entering the store and derive actionable guidance. The results advance the understanding of interactions between customers, staff members, and the robot and offer practical recommendations for deploying service robots in high-touch retail.

Authors:Keya Shah, Himanshi Lalwani, Hanan Salam
Title: GROW: A Conversational AI Coach for Goals, Reflection, Optimism, and Well-Being
Abstract:
College students face well-being challenges driven by academic pressure, financial strain, and social expectations. While campus counseling and student-success programs offer support, access is often limited by stigma, waitlists, and scheduling constraints. Existing digital tools focus on emotional check-ins or chatbots and may overlook structured goal setting and aligning goals with personal values. We present GROW, a goal-centered well-being coaching system that puts values-aligned goals at the center of the student experience. GROW combines the SMART framework with principles from Acceptance and Commitment Therapy in a conversational AI coach that helps students clarify aspirations, break them into concrete steps, and reflect on progress. The system links action plans with Google Calendar, sends reminders, and provides a dashboard that shows progress and engagement. We evaluated GROW through interviews with clinical psychologists, student-success staff, and faculty, followed by a one-week deployment with 30 undergraduates. Findings offer design implications for interactive systems that support engagement, accountability, and sense of purpose in higher education.

Authors:Racquel Fygenson, Enrico Bertini, Lace M. Padilla
Title: Croissant Charts: Modulating the Performance of Normal Distribution Visualizations with Affordances
Abstract:
Affordances, originating in psychology, describe how an object's design influences the physical and cognitive actions users may take. Past work applied affordance theory to visualization to explain how design decisions can impact the cognitive actions of visualization readers. In this work, we demonstrate that affordances can complement effectiveness rankings by further explaining the root causes behind visualizations' task performance. To do so, we conduct a case study on static normal probability density function plots, identifying their current affordances. Next, we identify the optimal affordances for a common probability-comparison task and develop a novel affordance-driven visualization, the Croissant Chart, to support them. We empirically validate the design's effectiveness through a preregistered study (n = 808), demonstrating how affordances can inform predictable changes in task performance. Our findings underscore the potential for affordance-based approaches to enhance visualization effectiveness and inform future design decisions.

Authors:Erina Seh-Young Moon, Shion Guha
Title: The Paradox of Prioritization in Public Sector Algorithms
Abstract:
Public sector agencies perform the critical task of implementing the redistributive role of the State by acting as the leading provider of critical public services that many rely on. In recent years, public agencies have been increasingly adopting algorithmic prioritization tools to determine which individuals should be allocated scarce public resources. Prior work on these tools has largely focused on assessing and improving their fairness, accuracy, and validity. However, what remains understudied is how the structural design of prioritization itself shapes both the effectiveness of these tools and the experiences of those subject to them under realistic public sector conditions. In this study, we demonstrate the fallibility of adopting a prioritization approach in the public sector by showing how the underlying mechanisms of prioritization generate significant relative disparities between groups of intersectional identities as resources become increasingly scarce. We argue that despite prevailing arguments that prioritization of resources can lead to efficient allocation outcomes, prioritization can intensify perceptions of inequality for impacted individuals. We contend that efficiencies generated by algorithmic tools should not be conflated with the dominant rhetoric that efficiency necessarily entails "doing more with less" and we highlight the risks of overlooking resource constraints present in real-world implementation contexts.

Authors:Juan Manuel Hernandez, Mariana Fernandez-Espinosa, Denis Parra, Diego Gomez-Zara
Title: ViT-Explainer: An Interactive Walkthrough of the Vision Transformer Pipeline
Abstract:
Transformer-based architectures have become the shared backbone of natural language processing and computer vision. However, understanding how these models operate remains challenging, particularly in vision settings, where images are processed as sequences of patch tokens. Existing interpretability tools often focus on isolated components or expert-oriented analysis, leaving a gap in guided, end-to-end understanding of the full inference pipeline. To bridge this gap, we present ViT-Explainer, a web-based interactive system that provides an integrated visualization of Vision Transformer inference, from patch tokenization to final classification. The system combines animated walkthroughs, patch-level attention overlays, and a vision-adapted Logit Lens within both guided and free exploration modes. A user study with six participants suggests that ViT-Explainer is easy to learn and use, helping users interpret and understand Vision Transformer behavior.

Authors:Junhee Lee, Minseok Kim, Hwanjo Heo, Seungwon Woo, Jinwoo Kim
Title: HarassGuard: Detecting Harassment Behaviors in Social Virtual Reality with Vision-Language Models
Abstract:
Social Virtual Reality (VR) platforms provide immersive social experiences but also expose users to serious risks of online harassment. Existing safety measures are largely reactive, while proactive solutions that detect harassment behavior during an incident often depend on sensitive biometric data, raising privacy concerns. In this paper, we present HarassGuard, a vision-language model (VLM) based system that detects physical harassment in social VR using only visual input. We construct an IRB-approved harassment vision dataset, apply prompt engineering, and fine-tune VLMs to detect harassment behavior by considering contextual information in social VR. Experimental results demonstrate that HarassGuard achieves competitive performance compared to state-of-the-art baselines (i.e., LSTM/CNN, Transformer), reaching an accuracy of up to 88.09% in binary classification and 68.85% in multi-class classification. Notably, HarassGuard matches these baselines while using significantly fewer fine-tuning samples (200 vs. 1,115), offering unique advantages in contextual reasoning and privacy-preserving detection.

Authors:Bahar Jahani, Matsanga Leyila Kaseka, Marta Kersten-Oertel, Yiming Xiao
Title: NeuroVase: A Tangible Mobile Augmented Reality Learning System for Neurovascular Anatomy and Stroke Education
Abstract:
Stroke remains a leading cause of mortality and disability worldwide, requiring rapid and informed clinical decision-making. A solid spatial understanding of cerebrovascular anatomy and vascular territories in relation to stroke symptoms and severity is critical for timely clinical decision and patient care. However, this knowledge is typically conveyed through static 2D diagrams and printed materials, which can hinder mastery of the complex neurovascular system and their clinical implications. Mobile augmented reality (AR) offers an accessible medium for delivering intuitive 3D anatomical education, yet applications focused on the neurovascular system and stroke remain limited despite the demand. To address this, we propose NeuroVase, a tablet-based mobile AR platform within a structured pedagogical framework that enhances stroke-related neuroanatomy learning by providing an interactive, engaging, and accessible alternative to traditional methods. NeuroVase features a dual-mode setup, using tangible cue cards as standalone study aids while also serving as interactive markers for AR content delivery. A custom learning curriculum focused on cerebrovascular anatomy and stroke supports exploration of vascular territories, stroke syndromes, and arterial occlusions, in the context of annotated 3D anatomical models in NeuroVase. A controlled user study with 40 participants revealed that NeuroVase is an effective and user-friendly AR platform to facilitate complex anatomical and physiological education, compared with traditional learning.

Authors:Soslan Kabisov, Vsevolod Kirichuk, Andrey Volkov, Gennadii Savrasov, Marina Barannikov, Anton Konushin, Andrey Kuznetsov, Dmitrii Zhemchuzhnikov
Title: CADReasoner: Iterative Program Editing for CAD Reverse Engineering
Abstract:
Computer-Aided Design (CAD) powers modern engineering, yet producing high-quality parts still demands substantial expert effort. Many AI systems tackle CAD reverse engineering, but most are single-pass and miss fine geometric details. In contrast, human engineers compare the input shape with the reconstruction and iteratively modify the design based on remaining discrepancies. Agent-based methods mimic this loop with frozen VLMs, but weak 3D grounding of current foundation models limits reliability and efficiency. We introduce CADReasoner, a model trained to iteratively refine its prediction using geometric discrepancy between the input and the predicted shape. The model outputs a runnable CadQuery Python program whose rendered mesh is fed back at the next step. CADReasoner fuses multi-view renders and point clouds as complementary modalities. To bridge the realism gap, we propose a scan-simulation protocol applied during both training and evaluation. Across DeepCAD, Fusion 360, and MCB benchmarks, CADReasoner attains state-of-the-art results on clean and scan-sim tracks.

Authors:Aoi Naito, Hirokazu Shirado
Title: AI prediction leads people to forgo guaranteed rewards
Abstract:
Artificial intelligence (AI) is understood to affect the content of people's decisions. Here, using a behavioral implementation of the classic Newcomb's paradox in 1,305 participants, we show that AI can also change how people decide. In this paradigm, belief in predictive authority can lead individuals to constrain decision-making, forgoing a guaranteed reward. Over 40% of participants treated AI as such a predictive authority. This significantly increased the odds of forgoing the guaranteed reward by a factor of 3.39 (95% CI: 2.45-4.70) compared with random framing, and reduced earnings by 10.7-42.9%. The effect appeared across AI presentations and decision contexts and persisted even when predictions failed. When people believe AI can predict their behavior, they may self-constrain it in anticipation of that prediction.

Authors:Peng Kuai, Yukun Yang, Shaolun Ruan, Junchi Xu, Yanjie Zhang, Lin Zhang, Min Zhu, Rui Sheng
Title: Within the MDT Room: Situated in Multidisciplinary Team-Grounded Agent Debate for Clinical Diagnosis
Abstract:
Rare disease diagnosis is inherently challenging due to heterogeneous symptoms, limited clinical familiarity, and fragmented evidence across specialties. Recent large language model (LLM)-based agentic systems have shown promise by simulating multidisciplinary team discussions to generate and evaluate diagnostic hypotheses. However, fully automated diagnosis remains unrealistic, and existing human-in-the-loop approaches provide limited support for effective clinician-agent collaboration. In practice, clinicians are often presented with final diagnostic outputs and lengthy, unstructured agent discussion logs, making it difficult to inspect reasoning, intervene in a timely manner, or guide agent deliberation effectively. To address these challenges, we developed MDTRoom, an interactive system that transforms multi-agent discussions from linear transcripts into a structured, inspectable workspace. The system externalizes patient data, evidence provenance, hypothesis evolution, and inter-agent conflicts as interconnected visual objects, enabling clinicians to efficiently examine, intervene in, and guide agent reasoning. Our evaluation demonstrates the effectiveness of MDTRoom in supporting clinician-agent collaboration.

Authors:Yiyuan Wang, Martin Tomitsch, Marius Hoggenmüller, Senuri Wijenayake, Wai Yan, Luke Hespanhol
Title: From Passersby to Placemaking: Designing Autonomous Vehicle-Pedestrian Encounters for an Urban Shared Space
Abstract:
Autonomous vehicles (AVs) tend to disrupt the atmosphere and pedestrian experience in urban shared spaces, undermining the focus of these spaces on people and placemaking. We investigate how external human-machine interfaces (eHMIs) supporting AV-pedestrian interaction can be extended to consider the characteristics of an urban shared space. Inspired by urban HCI, we devised three place-based eHMI designs that (i) enhance a conventional intent eHMI and (ii) exhibit content and physical integration with the space. In an evaluation study, 25 participants experienced the eHMIs in an immersive simulation of the space via virtual reality and shared their impressions through think-aloud, interviews, and questionnaires. Results showed that the place-based eHMIs had a notable effect on influencing the perception of AV interaction, including aspects like visual aesthetics and sense of reassurance, and on fostering a sense of place, such as social interactivity and the intentionality to coexist. In measuring qualities of pedestrian experience, we found that perceived safety significantly correlated with user experience and affect, including the attractiveness of eHMIs and feelings of pleasantness. The paper opens the avenue for exploring how eHMIs may contribute to the placemaking goals of pedestrian-centric spaces and improve the experience of people encountering AVs within these environments.

Authors:Hayeon Jeon, Dakyeom Ahn, Sunyu Pang, Yunseo Choi, Suhwoo Yoon, Joonhwan Lee, Eun-mee Kim, Hajin Lim
Title: InnerPond: Fostering Inter-Self Dialogue with a Multi-Agent Approach for Introspection
Abstract:
Introspection is central to identity construction and future planning, yet most digital tools approach the self as a unified entity. In contrast, Dialogical Self Theory (DST) views the self as composed of multiple internal perspectives, such as values, concerns, and aspirations, that can come into tension or dialogue with one another. Building on this view, we designed InnerPond, a research probe in the form of a multi-agent system that represents these internal perspectives as distinct LLM-based agents for introspection. Its design was shaped through iterative explorations of spatial metaphors, interaction scaffolding, and conversational orchestration, culminating in a shared spatial environment for organizing and relating multiple inner perspectives. In a user study with 17 young adults navigating career choices, participants engaged with the probe by co-creating inner voices with AI, composing relational inner landscapes, and orchestrating dialogue as observers and mediators, offering insight into how such systems could support introspection. Overall, this work offers design implications for AI-supported introspection tools that enable exploration of the self's multiplicity.

Authors:Nikolas Papadopoulos, Shreenithi Navaneethan, Sheng Bai, Ankur Samanta, Paul Sajda
Title: Gaze patterns predict preference and confidence in pairwise AI image evaluation
Abstract:
Preference learning methods, such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO), rely on pairwise human judgments, yet little is known about the cognitive processes underlying these judgments. We investigate whether eye-tracking can reveal preference formation during pairwise AI-generated image evaluation. Thirty participants completed 1,800 trials while their gaze was recorded. We replicated the gaze cascade effect, with gaze shifting toward chosen images approximately one second before the decision. Cascade dynamics were consistent across confidence levels. Gaze features predicted binary choice (68% accuracy), with chosen images receiving more dwell time, fixations, and revisits. Gaze transitions distinguished high-confidence from uncertain decisions (66% accuracy), with low-confidence trials showing more image switches per second. These results show that gaze patterns predict both choice and confidence in pairwise image evaluations, suggesting that eye-tracking provides implicit signals relevant to the quality of preference annotations.

Authors:Frank Heyen, Michael Sedlmair
Title: Augmented Reality Visualization for Musical Instrument Learning
Abstract:
We contribute two design studies for augmented reality visualizations that support learning musical instruments. First, we designed simple, glanceable encodings for drum kits, which we display through a projector. As second instrument, we chose guitar and designed visualizations to be displayed either on a screen as an augmented mirror or as an optical see-through AR headset. These modalities allow us to also show information around the instrument and in 3D. We evaluated our prototypes through case studies and our results demonstrate the general effectivity and revealed design-related and technical limitations.

Authors:Frank Heyen, Michael Sedlmair
Title: Supporting Music Education through Visualizations of MIDI Recordings
Abstract:
Musicians mostly have to rely on their ears when they want to analyze what they play, for example to detect errors. Since hearing is sequential, it is not possible to quickly grasp an overview over one or multiple recordings of a whole piece of music at once. We therefore propose various visualizations that allow analyzing errors and stylistic variance. Our current approach focuses on rhythm and uses MIDI data for simplicity.

Authors:Yong Ma, Xuesong Zhang, Xuedong Zhang, Natalia Bartłomiejczyk, Seungwoo Je, Adrian Holzer, Morten Fjeld, Andreas Butz
Title: Beyond Words: Measuring User Experience through Speech Analysis in Voice User Interfaces
Abstract:
Voice assistants (VAs) are typically evaluated through task performance metrics and self-report questionnaires, but people's voices themselves carry rich paralinguistic cues that reveal affect, effort, and interaction breakdowns. We present a within-subjects study (N=49) that systematically compared three VA personas across three usage scenarios to investigate whether speech-derived audio features can serve as a proxy for user experience (UX). Participants' speech was analyzed for temporal, spectral, and linguistic markers, alongside standardized UX measures, brief mood and stress ratings, and a post-study questionnaire. We found correlations between specific speech features and self-reported satisfaction and experience. Furthermore, a machine learning model trained on speech features achieved promising accuracy in classifying UX levels, indicating that this might be a reasonable alternative to self-report instruments. Our findings establish speech as a viable, real-time signal for implicitly measuring UX and point toward adaptive VUIs that respond dynamically to emotional and usability-related vocal cues.

Authors:Carlos Rafael Catalan, Patricia Nicole Monderin, Lheane Marie Dizon, Gap Estrella, Raymund John Sarmimento, Marie Antoinette Patalagsa
Title: Evaluating LLM-Generated Lessons from the Language Learning Students' Perspective: A Short Case Study on Duolingo
Abstract:
Popular language learning applications such as Duolingo use large language models (LLMs) to generate lessons for its users. Most lessons focus on general real-world scenarios such as greetings, ordering food, or asking directions, with limited support for profession-specific contexts. This gap can hinder learners from achieving professional-level fluency, which we define as the ability to communicate comfortably various work-related and domain-specific information in the target language. We surveyed five employees from a multinational company in the Philippines on their experiences with Duolingo. Results show that respondents encountered general scenarios more frequently than work-related ones, and that the former are relatable and effective in building foundational grammar, vocabulary, and cultural knowledge. The latter helps bridge the gap toward professional fluency as it contains domain-specific vocabulary. Each participant suggested lesson scenarios that diverge in contexts hen analyzed in aggregate. With this understanding, we propose that language learning applications should generate lessons that adapt to an individual's needs through personalized, domain specific lesson scenarios while maintaining foundational support through general, relatable lesson scenarios.

Authors:Haocheng Yuan, Adrien Bousseau, Hao Pan, Lei Zhong, Changjian Li
Title: DancingBox: A Lightweight MoCap System for Character Animation from Physical Proxies
Abstract:
Creating compelling 3D character animations typically requires either expert use of professional software or expensive motion capture systems operated by skilled actors. We present DancingBox, a lightweight, vision-based system that makes motion capture accessible to novices by reimagining the process as digital puppetry. Instead of tracking precise human motions, DancingBox captures the approximate movements of everyday objects manipulated by users with a single webcam. These coarse proxy motions are then refined into realistic character animations by conditioning a generative motion model on bounding-box representations, enriched with human motion priors learned from large-scale datasets. To overcome the lack of paired proxy-animation data, we synthesize training pairs by converting existing motion capture sequences into proxy representations. A user study demonstrates that DancingBox enables intuitive and creative character animation using diverse proxies, from plush toys to bananas, lowering the barrier to entry for novice animators.

Authors:Emily Chen, Alexander J. Bisberg, Dmitri Williams, Magy Seif El-Nasr, Emilio Ferrara
Title: Change is Hard: Consistent Player Behavior Across Games with Conflicting Incentives
Abstract:
This paper examines how player flexibility -- a player's willingness to engage in a breadth of options or specialize -- manifests across two gaming environments: League of Legends (League) and Teamfight Tactics (TFT). We analyze the gameplay decisions of 4,830 players who have played at least 50 competitive games in both titles and explore cross-game dynamics of behavior retention and consistency. Our work introduces a novel cross-game analysis that tracks the same players' behavior across two different environments, reducing self-selection bias. Our findings reveal that while games incentivize different behaviors (specialization in League versus flexibility in TFT) for performance-based success, players exhibit consistent behavior across platforms. This study contributes to long-standing debate about agency versus structure, showing individual agency may be more predictive of cross-platform behavior than game-imposed structure in competitive settings. These insights offer implications for game developers, designers and researchers interested in building systems to promote behavior change.

Authors:Guanghui Zhao, Zhe Wang, Yu Dong, Guan Li, GuiHua Shan
Title: Toward Reliable Scientific Visualization Pipeline Construction with Structure-Aware Retrieval-Augmented LLMs
Abstract:
Scientific visualization pipelines encode domain-specific procedural knowledge with strict execution dependencies, making their construction sensitive to missing stages, incorrect operator usage, or improper ordering. Thus, generating executable scientific visualization pipelines from natural-language descriptions remains challenging for large language models, particularly in web-based environments where visualization authoring relies on explicit code-level pipeline assembly. In this work, we investigate the reliability of LLM-based scientific visualization pipeline generation, focusing on vtk.js as a representative web-based visualization library. We propose a structure-aware retrieval-augmented generation workflow that provides pipeline-aligned vtk.js code examples as contextual guidance, supporting correct module selection, parameter configuration, and execution order. We evaluate the proposed workflow across multiple multi-stage scientific visualization tasks and LLMs, measuring reliability in terms of pipeline executability and human correction effort. To this end, we introduce correction cost as metric for the amount of manual intervention required to obtain a valid pipeline. Our results show that structured, domain-specific context substantially improves pipeline executability and reduces correction cost. We additionally provide an interactive analysis interface to support human-in-the-loop inspection and systematic evaluation of generated visualization pipelines.

Authors:Carlos Rafael Catalan, Lheane Marie Dizon, Patricia Nicole Monderin, Emily Kuang
Title: "I'm Not Reading All of That": Understanding Software Engineers' Level of Cognitive Engagement with Agentic Coding Assistants
Abstract:
Over-reliance on AI systems can undermine users' critical thinking and promote complacency, a risk intensified by the emergence of agentic AI systems that operate with minimal human involvement. In software engineering, agentic coding assistants (ACAs) are rapidly becoming embedded in everyday development workflows. Since software engineers (SEs) create systems deployed across diverse and high-stakes real-world contexts, these assistants must function not merely as autonomous task performers but as Tools for Thought that actively support human reasoning and sensemaking. We conducted a formative study examining software engineers' cognitive engagement and sensemaking processes when working with an ACA. Our findings reveal that cognitive engagement consistently declines as tasks progress, and that current ACA designs provide limited affordances for reflection, verification, and meaning-making. Based on these findings, we identify concrete design opportunities leveraging richer interaction modalities and cognitive-forcing mechanisms to sustain engagement and promote deeper thinking in AI-assisted programming.

Authors:Emily Kuang, Ehsan Jahangirzadeh Soure, Luyao Shen, Nitesh Goyal, Mingming Fan, Kristen Shinohara
Title: "It Became My Buddy, But I'm Not Afraid to Disagree": A Multi-Session Study of UX Evaluators Collaborating with Conversational AI Assistants
Abstract:
AI-assisted usability analysis can potentially reduce the time and effort of finding usability problems, yet little is known about how AI's perceived expertise influences evaluators' analytic strategies and perceptions over time. We ran a within-subjects, five-session study (six hours per participant) with 12 professional UX evaluators who worked with two conversational assistants designed to appear novice- or expert-like (differing in suggestion quantity and response accuracy). We logged behavioral measures (number of passes, suggestion acceptance rate), collected subjective ratings (trust, perceived efficiency), and conducted semi-structured interviews. Participants experienced an initial novelty effect and a subsequent dip in trust that recovered over time. Their efficiency improved as they shifted from a two-pass to a one-pass video inspection approach. Evaluators ultimately rated the experienced CA as significantly more efficient, trustworthy, and comprehensive, despite not perceiving expertise differences early on. We conclude with design implications for adapting AI expertise to enable calibrated human-AI collaboration.

Authors:Zhuchenyang Liu, Yao Zhang, Yalan He, Hilla Paasio, Changyi Li, Guna Semjonova, Yu Xiao
Title: Exploring Human-AI Collaboration in E-Textile Design: A Case Study on Flex Sensor Placement for Shoulder Motion Detection
Abstract:
Flex sensors are widely used in e-textiles for detecting joint motions and, subsequently, full-body movements. A critical initial step in utilizing these sensors is determining the optimal placement on the body to accurately capture human motions. This task requires a combination of expertise in fields such as anatomy, biomechanics, and textile design, which is seldom found in a single practitioner. Generative AI, such as Large Language Models (LLMs), has recently shown promise in facilitating design. However, to our knowledge, the extent to which LLMs can aid in the e-textile design process remains largely unexplored in the literature. To address this open question, we conducted a case study focusing on shoulder motion detection using flex sensors. We enlisted three human designers to participate in an experiment involving human-AI collaborative design. We examined design efficiency across three scenarios: designs produced by LLMs alone, by humans alone, and through collaboration between LLMs and human designers. Our quantitative and qualitative analyses revealed an intriguing relationship between expertise and outcomes: the least experienced human designer achieved continuous improvement through collaboration, ultimately matching the best performance achieved by humans alone, whereas the most experienced human designer experienced a decline in performance. Additionally, the effectiveness of human-AI collaboration is affected by the granularity of feedback - incremental adjustments outperformed sweeping redesigns - and the level of abstraction, with observation-oriented feedback producing better outcomes than prescriptive anatomical directives. These findings offer valuable insights into the opportunities and challenges associated with human-AI collaborative e-textile design.

Authors:Siyu Zha, Weijing Liu, Fei Qin, Jie Cao, Yanjin Wang, Yujia Liu, Kaiyi Zhang, Jiangtao Gong, Yingqing Xu
Title: How GenAI Mentor Configurations Shape Early Collaborative Dynamics: A Classroom Comparison of Individual and Shared Agents
Abstract:
Generative artificial intelligence (GenAI) is increasingly embedded in computer-supported collaborative learning (CSCL), yet little empirical research has unpacked how different configurations of AI participation reshape collaborative processes. This study investigates how GenAI configuration shapes collaborative regulation in authentic classroom settings. Two eighth-grade classes engaged in small-group creative problem-solving under two conditions: a shared-AI configuration, in which each group interacted with a single AI mentor, and an individual-AI configuration, in which each student accessed a personal AI instance. Using multi-layer discourse coding combined with lag sequential analysis (LSA) and ordered network analysis (ONA), we examined interaction distribution, AI-student coupling, shared regulation processes, and teacher orchestration. Results reveal distinct regulatory dynamics across configurations. Shared AI access promoted convergence-oriented collaboration, with stronger alignment of shared regulatory states and more coordinated group-level reasoning. In contrast, individual AI access distributed support across learners, producing more exploratory and evaluative cycles but also more fragmented interaction patterns, accompanied by increased teacher intervention to manage divergence. These findings suggest that AI configuration functions as a structural design variable that reorganizes the regulatory ecology of classroom collaboration.

Authors:Dominik Pegler, Frank Jäkel, David Steyrl, Frank Scharnowski, Filip Melinscak
Title: Unpacking Interpretability: Human-Centered Criteria for Optimal Combinatorial Solutions
Abstract:
Algorithmic support systems often return optimal solutions that are hard to understand. Effective human-algorithm collaboration, however, requires interpretability. When machine solutions are equally optimal, humans must select one, but a precise account of what makes one solution more interpretable than another remains missing. To identify structural properties of interpretable machine solutions, we present an experimental paradigm in which participants chose which of two equally optimal solutions for packing items into bins was easier to understand. We show that preferences reliably track three quantifiable properties of solution structure: alignment with a greedy heuristic, simple within-bin composition, and ordered visual representation. The strongest associations were observed for ordered representations and heuristic alignment, with compositional simplicity also showing a consistent association. Reaction-time evidence was mixed, with faster responses observed primarily when heuristic differences were larger, and aggregate webcam-based gaze did not show reliable effects of complexity. These results provide a concrete, feature-based account of interpretability in optimal packing solutions, linking solution structure to human preference. By identifying actionable properties (simple compositions, ordered representation, and heuristic alignment), our findings enable interpretability-aware optimization and presentation of machine solutions, and outline a path to quantify trade-offs between optimality and interpretability in real-world allocation and design tasks.

Authors:Chuxuan Zhang, Bermet Burkanova, Lawrence H. Kim, Grace Iarocci, Elina Birmingham, Angelica Lim
Title: How Neurotypical and Autistic Children Interact Nonverbally with Anthropomorphic Agents in Open-Ended Tasks
Abstract:
What nonverbal behaviors should a robot respond to? Understanding how children-both neurotypical and autistic-engage with embodied artificial agents is critical for developing inclusive and socially interactive systems. In this paper, we study "open-ended" unconstrained interactions with embodied agents, where little is known about how children behave nonverbally when given few instructions. We conducted a Wizard-of-Oz study in which children were invited to interact nonverbally with 6 different embodied virtual characters displayed on a television screen. We collected 563 (141 unique) nonverbal behaviors produced by children and compare the childre's interaction patterns with those previously reported in an adult study. We also report the presence of repetitive face and hand movements, which should be considered in the development of nonverbally interactive artificial agents.

Authors:Lan Gao, Abani Ahmed, Oscar Chen, Margaux Reyl, Zayna Cheema, Nick Feamster, Chenhao Tan, Kurt Thomas, Marshini Chetty
Title: Governance of AI-Generated Content: A Case Study on Social Media Platforms
Abstract:
Online platforms are seeing increasing amounts of AI-generated content -- text and other forms of media that are made or co-created with generative AI. This trend suggests platforms may need to establish governance frameworks, including policies and enforcement strategies for how users create, post, share, and engage with such content to encourage responsible use. We investigate the governance of AI-generated content across 40 popular social media platforms. Just over two-thirds explicitly describe governance of AI-generated content spanning six themes. Most platforms focus on moderating AI-generated content that violates established content rules and discloses AI-generated content. Fewer platforms -- those that are focused on creativity and knowledge-sharing -- address other issues such as ownership and monetization. Based on these findings, we suggest stakeholders and policymakers develop more direct, comprehensive, and forward-looking AI-generated content governance, as well as tools and education for users about the use of such content.

Authors:Melanie Baumgartner, Raphael Weibel, Tobias Hoesli, Aydin Javadov, Rayna Ney, Helen Schwerdt, Florian von Wangenheim, Joseph Ollier
Title: Pre-Clinical Latency Characterization of VRxBioRelax: A Real-Time EMG Biofeedback System for Muscle Relaxation in Virtual Reality
Abstract:
Chronic tension in the upper trapezius (UT), often caused by poor ergonomics, prolonged posture, or psychological stress, contributes to musculoskeletal discomfort, headaches, and impaired interoceptive awareness. Although surface electromyography (sEMG) biofeedback can promote UT relaxation, traditional systems using conventional displays often fail to sustain engagement. Virtual reality (VR) offers a more immersive alternative, provided that latency remains below perceptual thresholds. We introduce VRxBioRelax, a closed-loop VR biofeedback system that streams sEMG data from Delsys Trigno Avanti sensors via MQTT to a Unity scene. Muscle activation drives a dynamic dawn-to-dusk landscape synchronized with a progressive muscle relaxation protocol. To validate system responsiveness, 87,716 EMG samples from the NinaPro DB2 dataset were replayed at $\sim$75 Hz. Timestamps at four key stages-acquisition, Root Mean Square (RMS) processing, network receipt, and rendering-revealed mean latencies of 0.50 ms (processing), 5.62 ms (network), and 19.22 ms (rendering), yielding an average end-to-end delay of 25.34 ms. Notably, 99.3% of frames arrived within 50 ms. One-sided t-tests confirmed mean latency was significantly lower than both the 30 ms VR comfort limit ($t_{87\,715}=-25.2$, $p=5.9{\times}10^{-140}$) and the 50 ms clinical benchmark ($t_{87\,715}=-133.3$, $p<10^{-300}$). These findings support VRxBioRelax for use in remote interoceptive training, stress reduction, and telepresence-enabled rehabilitation.

Authors:Hochul Hwang, Soowan Yang, Anh N. H. Nguyen, Parth Goel, Krisha Adhikari, Sunghoon I. Lee, Joydeep Biswas, Nicholas A. Giudice, Donghyun Kim
Title: GuideTWSI: A Diverse Tactile Walking Surface Indicator Dataset from Synthetic and Real-World Images for Blind and Low-Vision Navigation
Abstract:
Tactile Walking Surface Indicators (TWSIs) are safety-critical landmarks that blind and low-vision (BLV) pedestrians use to locate crossings and hazard zones. From our observation sessions with BLV guide dog handlers, trainers, and an O&M specialist, we confirmed the critical importance of reliable and accurate TWSI segmentation for navigation assistance of BLV individuals. Achieving such reliability requires large-scale annotated data. However, TWSIs are severely underrepresented in existing urban perception datasets, and even existing dedicated paving datasets are limited: they lack robot-relevant viewpoints (e.g., egocentric or top-down) and are geographically biased toward East Asian directional bars - raised parallel strips used for continuous guidance along sidewalks. This narrow focus overlooks truncated domes - rows of round bumps used primarily in North America and Europe as detectable warnings at curbs, crossings, and platform edges. As a result, models trained only on bar-centric data struggle to generalize to dome-based warnings, leading to missed detections and false stops in safety-critical environments.

Authors:Muzakkiruddin Ahmed Mohammed, Adeeba Tarannum, Eileen Devereux Dailey, Marla Johnson, Mert Can Cakmak, John Talburt
Title: AI-Powered Multi-Stakeholder Ecosystems for Global Development: A Design Research Study on the GSI D-Hub Proof-of-Concept Platform
Abstract:
Digital platforms increasingly support collaboration across organizations, yet many remain constrained by fragmented data and limited transparency. This paper presents the Global Solutions Initiative (GSI) D-Hub, a data-driven coordination platform that applies explainable artificial intelligence (AI) for transparent matchmaking among deployers, solution providers, and financiers. The system integrates structured data models, interpretable algorithms, and synthetic data pipelines to reduce information asymmetries and improve data quality. Using a design-science approach, the platform was developed and validated with stakeholders from development, technology, and finance sectors. Results show that explainable recommendations and contextual dashboards enhance trust, usability, and decision confidence. The study contributes to data mining and data governance research by demonstrating how explainable, verifiable algorithms can enable scalable, trustworthy digital ecosystems for public collaboration.

Authors:Benjamin Kaveladze, Arka Ghosh, Leah Ajmani, Denae Ford, Peter M Gutierrez, Jetta E Hanson, Eugenia Kim, Keertana Namuduri, Theresa Nguyen, Ebele Okoli, Teresa Rexin, Jessica L Schleider, Hongyi Shen, Jina Suh
Title: From Risk Avoidance to User Empowerment: Reframing Safety in Generative AI for Mental Health Crises
Abstract:
People experiencing mental health crises frequently turn to open-ended generative AI (GenAI) chatbots such as ChatGPT for support. However, rather than providing immediate assistance, most GenAI chatbots are designed to respond to crisis situations in ways that minimize their developers' liability, primarily through avoidance (e.g., refusing to engage beyond templated referrals to crisis hotlines). Withholding crisis support in these cases may harm users who have no viable alternatives and reduce their motivation to seek further help. At scale, this avoidant design could undermine population mental health. We propose empowerment-oriented design principles for AI crisis support, informed by community helper models. We outline how, as an initial touchpoint in help-seeking, AI chatbots can act as a supportive bridge to de-escalate crises and connect users to more reliable care. Coordination between AI developers and regulators can enable a better balance of risk mitigation and user empowerment in AI crisis support.

Authors:John Driscoll, Yulin Chen, Viki Shi, Izak Vucharatavintara, Yaxing Yao, Haojian Jin
Title: Understanding Parents' Desires in Moderating Children's Interactions with GenAI Chatbots through LLM-Generated Probes
Abstract:
This paper studies how parents want to moderate children's interactions with Generative AI chatbots, with the goal of informing the design of future GenAI parental control tools. We first used an LLM to generate synthetic child-GenAI chatbot interaction scenarios and worked with four parents to validate their realism. From this dataset, we carefully selected 12 diverse examples that evoked varying levels of concern and were rated the most realistic. Each example included a prompt and a GenAI chatbot response. We presented these to parents (N=24) and asked whether they found them concerning, why, and how they would prefer the responses to be modified and communicated. Our findings reveal three key insights: (1) parents express concern about interactions that current GenAI chatbot parental controls neglect; (2) parents want fine-grained transparency and moderation at the conversation level; and (3) parents need personalized controls that adapt to their desired strategies and children's ages.

Authors:Aditya Kumar Purohit, Yuwei Liu, Manon Berney, Hendrik Heuer, Adrian Holzer
Title: Deception by Design: A Temporal Dark Patterns Audit of McDonald's Self-Ordering Kiosk Flow
Abstract:
Self-ordering kiosks (SOKs) are widely deployed in fast food restaurants, transforming food ordering into digitally mediated, self-navigated interactions. While these systems enhance efficiency and average order value, they also create opportunities for manipulative interface design practices known as dark patterns. This paper presents a structured audit of the McDonald's self-ordering kiosk in Germany using the Temporal Analysis of Dark Patterns (TADP) framework. Through a scenario-based walkthrough simulating a time-pressured user, we reconstructed and analyzed 12 interface steps across intra-page, inter-page, and system levels. We identify recurring high-level strategies implemented through meso-level patterns such as adding steps, false hierarchy, bad defaults, hiding information, and pressured selling, and low-level patterns including visual prominence, confirmshaming, scarcity framing, feedforward ambiguity, emotional sensory manipulation, and partitioned pricing. Our findings demonstrate how these patterns accumulate across the interaction flow and may be amplified by the kiosk's linear task structure and physical context. These findings suggest that hybrid physical--digital consumer interfaces warrant closer scrutiny within emerging regulatory discussions on dark patterns.

Authors:Imran Kabir, Sharon Ann Redmon, Lynn R Elko, Kevin Williams, Mitchell A Case, Dawn J Sowers, Krista Wilkinson, Syed Masum Billah
Title: Giving Meaning to Movements: Challenges and Opportunities in Expanding Communication by Pairing Unaided AAC with Speech Generated Messages
Abstract:
Augmentative and Alternative Communication (AAC) technologies are categorized into two forms: aided AAC, which uses external devices like speech-generating systems to produce standardized output, and unaided AAC, which relies on body-based gestures for natural expression but requires shared understanding. We investigate how to combine these approaches to harness the speed and naturalness of unaided AAC while maintaining the intelligibility of aided AAC, a largely unexplored area for individuals with communication and motor impairments. Through 18 months of participatory design with AAC users, we identified key challenges and opportunities and developed AllyAAC, a wearable system with a wrist-worn IMU paired with a smartphone app. We evaluated AllyAAC in a field study with 14 participants and produced a dataset containing over 600,000 multimodal data points featuring atypical gestures--the first of its kind. Our findings reveal challenges in recognizing personalized, idiosyncratic gestures and demonstrate how to address them using Transformer-based large machine learning (ML) models with different pretraining strategies. In sum, we contribute design principles and a reference implementation for adaptive, personalized systems combining aided and unaided AAC.

Authors:Ian Steenstra, Paola Pedrelli, Weiyan Shi, Stacy Marsella, Timothy W. Bickmore
Title: Assessing Risks of Large Language Models in Mental Health Support: A Framework for Automated Clinical AI Red Teaming
Abstract:
Large Language Models (LLMs) are increasingly utilized for mental health support; however, current safety benchmarks often fail to detect the complex, longitudinal risks inherent in therapeutic dialogue. We introduce an evaluation framework that pairs AI psychotherapists with simulated patient agents equipped with dynamic cognitive-affective models and assesses therapy session simulations against a comprehensive quality of care and risk ontology. We apply this framework to a high-impact test case, Alcohol Use Disorder, evaluating six AI agents (including ChatGPT, Gemini, and Character AI) against a clinically-validated cohort of 15 patient personas representing diverse clinical phenotypes. Our large-scale simulation (N=369 sessions) reveals critical safety gaps in the use of AI for mental health support. We identify specific iatrogenic risks, including the validation of patient delusions ("AI Psychosis") and failure to de-escalate suicide risk. Finally, we validate an interactive data visualization dashboard with diverse stakeholders, including AI engineers and red teamers, mental health professionals, and policy experts (N=9), demonstrating that this framework effectively enables stakeholders to audit the "black box" of AI psychotherapy. These findings underscore the critical safety risks of AI-provided mental health support and the necessity of simulation-based clinical red teaming before deployment.

Authors:Jules Wulms, Wouter Meulemans, Bettina Speckmann
Title: Unfolding Ordered Matrices into BioFabric Motifs
Abstract:
BioFabrics were introduced by Longabaugh in 2012 as a way to draw large graphs in a clear and uncluttered manner. The visual quality of BioFabrics crucially depends on the order of vertices and edges, which can be chosen independently. Effective orders can expose salient patterns, which in turn can be summarized by motifs, allowing users to take in complex networks at-a-glance. However, so far there is no efficient layout algorithm which automatically recognizes patterns and delivers both a vertex and an edge ordering that allows these patterns to be expressed as motifs. In this paper we show how to use well-ordered matrices as a tool to efficiently find good vertex and edge orders for BioFabrics. Specifically, we order the adjacency matrix of the input graph using Moran's $I$ and detect (noisy) patterns with our recent algorithm. In this note we show how to "unfold" the ordered matrix and its patterns into a high-quality BioFabric. Our pipelines easily handles graphs with up to 250 vertices.

Authors:Qile Wang, Prerana Khatiwada, Avinash Chouhan, Ashrey Mahesh, Joy Mwaria, Duy Duc Tran, Kenneth E. Barner, Matthew Louis Mauriello
Title: "The explanation makes sense": An Empirical Study on LLM Performance in News Classification and its Influence on Judgment in Human-AI Collaborative Annotation
Abstract:
The spread of media bias is a significant concern as political discourse shapes beliefs and opinions. Addressing this challenge computationally requires improved methods for interpreting news. While large language models (LLMs) can scale classification tasks, concerns remain about their trustworthiness. To advance human-AI collaboration, we investigate the feasibility of using LLMs to classify U.S. news by political ideology and examine their effect on user decision-making. We first compared GPT models with prompt engineering to state-of-the-art supervised machine learning on a 34k public dataset. We then collected 17k news articles and tested GPT-4 predictions with brief and detailed explanations. In a between-subjects study (N=124), we evaluated how LLM-generated explanations influence human annotation, judgment, and confidence. Results show that AI assistance significantly increases confidence ($p<.001$), with detailed explanations more persuasive and more likely to alter decisions. We highlight recommendations for AI explanations through thematic analysis and provide our dataset for further research.

Authors:Ava Chen, Megan C. Coram, Cosima du Pasquier, Allison M. Okamura
Title: Miniaturized Pneumatic Actuator Array for Multipoint Deep Pressure Tactile Stimulation
Abstract:
Wearable distributed tactile devices aim to provide multipoint touch stimuli, but struggle to provide sufficient forces (> 1 N) at frequencies to invoke deep pressure sensation with minimal encumbrance at small scales. This work presents a method of fabricating arrays of pneumatic actuators from thermoplastic-coated textiles. By routing pneumatic inlets to a common fold line in the fabric, we demonstrate that multiple pneumatic pouch actuators can be formed in a single simple heat-pressing operation that does not require the use of sacrificial blocking layers. The method accommodates a range of actuator diameters and spacing distances, including as compact as 8 mm diameter actuators spaced 1 mm apart, which enables use in fingertip wearable devices. In a blocked force test, these small pneumatic textile actuators exert 2.1 N when pressurized to 230 kPa. With this pair of actuators, we demonstrate an example application in which we invoke both distinct and summative stimuli, suggesting the possibility of titrating just noticeable difference in amplitude with a textile actuator array.

Authors:Jielin Feng, Zhibo Yang, Jingyi Zhao, Yujia Li, Xinwu Ye, Xingyu Lan, Siming Chen
Title: Tower of Babel in Cross-Cultural Communication: A Case Study of #Give Me a Chinese Name# Dialogues During the "TikTok Refugees'' Event
Abstract:
The sudden influx of "TikTok refugees'' into the Chinese platform RedNote in early 2025 created an unprecedented, large-scale online cross-cultural communication event between the West and East. Although prior HCI research has studied user behavior in social media, most work remains confined to monolingual or single-cultural contexts, leaving cross-linguistic and cultural dynamics underexplored. To address this gap, we focused on a particularly challenging cross-cultural encoding-decoding task that remains stubbornly beyond the reach of machine translation, i.e., foreign newcomers asking Chinese users for Chinese names, and examined how people collectively constructed a digital "Babel Tower'' through various information encoding strategies. We collected and analyzed over 70,000 comments from RedNote with a creative human-in-the-loop approach using large language models, deriving a systematic framework summarizing cross-cultural information encoding strategies, how they are combined and layered to complicate decoding, and how they relate to engagement metrics such as the number of likes.

Authors:Lorena Amanda Quincoso Lugones, Christopher Kverne, Nityam Sharadkumar Bhimani, Ana Carolina Oliveira, Agoritsa Polyzou, Christine Lisetti, Janki Bhimani
Title: Aurora: Neuro-Symbolic AI Driven Advising Agent
Abstract:
Academic advising in higher education is under severe strain, with advisor-to-student ratios commonly exceeding 300:1. These structural bottlenecks limit timely access to guidance, increase the risk of delayed graduation, and contribute to inequities in student support. We introduce Aurora, a modular neuro-symbolic advising agent that unifies retrieval-augmented generation (RAG), symbolic reasoning, and normalized curricular databases to deliver policy-compliant, verifiable recommendations at scale. Aurora integrates three components: (i) a Boyce-Codd Normal Form (BCNF) catalog schema for consistent program rules, (ii) a Prolog engine for prerequisite and credit enforcement, and (iii) an instruction-tuned large language model for natural-language explanations of its recommendations. To assess performance, we design a structured evaluation suite spanning common and edge-case advising scenarios, including short-term scheduling, long-term roadmapping, skill-aligned pathways, and out-of-scope requests. Across this diverse set, Aurora improves semantic alignment with expert-crafted answers from 0.68 (Raw LLM baseline) to 0.93 (+36%), achieves perfect precision and recall in nearly half of in-scope cases, and consistently produces correct fallbacks for unanswerable prompts. On commodity hardware, Aurora delivers sub-second mean latency (0.71s across 20 queries), approximately 83X faster than a Raw LLM baseline (59.2s). By combining symbolic rigor with neural fluency, Aurora advances a paradigm for accurate, explainable, and scalable AI-driven advising.

Authors:Shreya Bali, Riku Arakawa, Peace Odiase, Tongshuang Wu, Mayank Goel
Title: Evidotes: Integrating Scientific Evidence and Anecdotes to Support Uncertainties Triggered by Peer Health Posts
Abstract:
Peer health posts surface new uncertainties, such as questions and concerns for readers. Prior work focused primarily on improving relevance and accuracy fails to address users' diverse information needs and emotions triggered. Instead, we propose directly addressing these by information augmentation. We introduce Evidotes, an information support system that augments individual posts with relevant scientific and anecdotal information retrieved using three user-selectable lenses (dive deeper, focus on positivity, and big picture). In a mixed-methods study with 17 chronic illness patients, Evidotes improved self-reported information satisfaction (3.2->4.6) and reduced self-reported emotional cost (3.4->1.9) compared to participants' baseline browsing. Moreover, by co-presenting sources, Evidotes unlocked information symbiosis: anecdotes made research accessible and contextual, while research helped filter and generalize peer stories. Our work enables an effective integration of scientific evidence and human anecdotes to help users better manage health uncertainty.

Authors:Riku Arakawa, Shreya Bali, Anupama Sitaraman, Woosuk Seo, Sam Shaaban, Oliver Lindheim, Traci M. Kennedy, Mayank Goel
Title: CalmReminder: A Design Probe for Parental Engagement with Children with Hyperactivity, Augmented by Real-Time Motion Sensing with a Watch
Abstract:
Families raising children with ADHD often experience heightened stress and reactive parenting. While digital interventions promise personalization, many remain one-size-fits-all and fail to reflect parents' lived practices. We present CalmReminder, a watch-based system that detects children's calm moments and delivers just-in-time prompts to parents. Through a four-week deployment with 16 families (twelve completed) of children with ADHD, we compared notification strategies ranging from hourly to random to only when the child was inferred to be calm. Our sensing-based notifications were frequently perceived as arriving during calm moments. More importantly, parents adopted the system in diverse ways: using notifications for praise, mindfulness, activity planning, or conversation. These findings show that parents are not passive recipients but active designers, reshaping interventions to fit their parenting styles. We contribute a calm detection pipeline, empirical insights into families' flexible appropriation of notifications, and design implications for intervention systems that foster agency.

Authors:Jindu Wang, Runze Cai, Shuchang Xu, Tianrui Hu, Huamin Qu, Shengdong Zhao, Ling-Ping Yuan
Title: Wearable AR for Restorative Breaks: How Interactive Narrative Experiences Support Relaxation for Young Adults
Abstract:
Young adults often take breaks from screen-intensive work by consuming digital content on mobile phones, which undermines rest through visual fatigue and inactivity. We introduce a design framework that embeds light break activities into media content on AR smart glasses, balancing engagement and recovery. The framework employs three strategies: (1) seamlessly guiding users by embedding activity cues aligned with media elements; (2) transitioning to audio-centric formats to reduce visual load while sustaining immersion; and (3) structuring sessions with "rise-peak-closure" pacing for smooth transitions. In a within-subjects study (N = 16) comparing passive viewing, reminder-based breaks, and non-narrative activities, InteractiveBreak instantiated from our framework seamlessly guided activities, sustained engagement, and enhanced break quality. These findings demonstrate wearable AR's potential to support restorative relaxation by transforming breaks into engaging and meaningful experiences.

Authors:Daniel J. Noh, Deborah A. Fields, Yasmin B. Kafai, Danaé Metaxa
Title: "You Can Actually Do Something": Shifts in High School Computer Science Teachers' Conceptions of AI/ML Systems and Algorithmic Justice
Abstract:
The recent proliferation of artificial intelligence and machine learning (AI/ML) systems highlights the need for all people to develop effective competencies to interact with and examine AI/ML systems. We study shifts in five experienced high school CS teachers' understanding of AI/ML systems after one year of participatory design, where they co-developed lessons on AI auditing, a systematic method to query AI/ML systems. Drawing on individual and group interviews, we found that teachers' perspectives became more situated, grounding their understanding in everyday contexts; more critical, reflecting growing awareness of harms; and more agentic, highlighting possibilities for action. Further, across all three perspectives, teachers consistently framed algorithmic justice through their role as educators, situating their concerns within their school communities. In the discussion, we consider the ways teachers' perspectives shifted, how AI auditing can shape these shifts, and the implications of these findings on AI literacy for both teachers and students.

Authors:Yufeng Wang, Yuan Xu, Anastasia Nikolova, Yuxuan Wang, Jianyu Wang, Chongyang Wang, Xin Tong
Title: How Do We Research Human-Robot Interaction in the Age of Large Language Models? A Systematic Review
Abstract:
Advances in large language models (LLMs) are profoundly reshaping the field of human-robot interaction (HRI). While prior work has highlighted the technical potential of LLMs, few studies have systematically examined their human-centered impact (e.g., human-oriented understanding, user modeling, and levels of autonomy), making it difficult to consolidate emerging challenges in LLM-driven HRI systems. Therefore, we conducted a systematic literature search following the PRISMA guideline, identifying 86 articles that met our inclusion criteria. Our findings reveal that: (1) LLMs are transforming the fundamentals of HRI by reshaping how robots sense context, generate socially grounded interactions, and maintain continuous alignment with human needs in embodied settings; and (2) current research is largely exploratory, with different studies focusing on different facets of LLM-driven HRI, resulting in wide-ranging choices of experimental setups, study methods, and evaluation metrics. Finally, we identify key design considerations and challenges, offering a coherent overview and guidelines for future research at the intersection of LLMs and HRI.

Authors:Mahsa Bazzaz, Seth Cooper
Title: Playing the Imitation Game: How Perceived Generated Content Shapes Player Experience
Abstract:
With the fast progress of generative AI in recent years, more games are integrating generated content, raising questions regarding how players perceive and respond to this content. To investigate, we ran a mixed-method survey on the games Super Mario Bros. and Sokoban, comparing procedurally generated levels and levels designed by humans to explore how perceptions of the creator relate to players' overall experience of gameplay. Players could not reliably identify the level's creator, yet their experiences were strongly linked to their beliefs about that creator rather than the actual truth. Levels believed to be human-made were rated as more fun and aesthetically pleasing. In contrast, those believed to be AI-generated were rated as more frustrating and challenging. This negative bias appeared spontaneously without knowing the levels' creator and often was based on unreliable cues of "human-likeness." Our results underscore the importance of understanding perception biases when integrating generative systems into games.

Authors:Qile Wang, Prerana Khatiwada, Carolina Coimbra Vieira, Benjamin E. Bagozzi, Kenneth E. Barner, Matthew Louis Mauriello
Title: Wisdom of the LLM Crowd: A Large Scale Benchmark of Multi-Label U.S. Election-Related Harmful Social Media Content
Abstract:
The spread of election misinformation and harmful political content conveys misleading narratives and poses a serious threat to democratic integrity. Detecting harmful content at early stages is essential for understanding and potentially mitigating its downstream spread. In this study, we introduce USE24-XD, a large-scale dataset of nearly 100k posts collected from X (formerly Twitter) during the 2024 U.S. presidential election cycle, enriched with spatio-temporal metadata. To substantially reduce the cost of manual annotation while enabling scalable categorization, we employ six large language models (LLMs) to systematically annotate posts across five nuanced categories: Conspiracy, Sensationalism, Hate Speech, Speculation, and Satire. We validate LLM annotations with crowdsourcing (n = 34) and benchmark them against human annotators. Inter-rater reliability analyses show comparable agreement patterns between LLMs and humans, with LLMs exhibiting higher internal consistency and achieving up to 0.90 recall on Speculation. We apply a wisdom-of-the-crowd approach across LLMs to aggregate annotations and curate a robust multi-label dataset. 60% of posts receive at least one label. We further analyze how human annotator demographics, including political ideology and affiliation, shape labeling behavior, highlighting systematic sources of subjectivity in judgments of harmful content. The USE24-XD dataset is publicly released to support future research.

Authors:Jiazheng Sun, Mingxuan Li, Yingying Zhang, Jiayang Niu, Yachen Wu, Ruihan Jin, Shuyu Lei, Pengrongrui Tan, Zongyu Zhang, Ruoyi Wang, Jiachen Yang, Boyu Yang, Jiacheng Liu, Xin Peng
Title: AmbiBench: Benchmarking Mobile GUI Agents Beyond One-Shot Instructions in the Wild
Abstract:
Benchmarks are paramount for gauging progress in the domain of Mobile GUI Agents. In practical scenarios, users frequently fail to articulate precise directives containing full task details at the onset, and their expressions are typically ambiguous. Consequently, agents are required to converge on the user's true intent via active clarification and interaction during execution. However, existing benchmarks predominantly operate under the idealized assumption that user-issued instructions are complete and unequivocal. This paradigm focuses exclusively on assessing single-turn execution while overlooking the alignment capability of the agent. To address this limitation, we introduce AmbiBench, the first benchmark incorporating a taxonomy of instruction clarity to shift evaluation from unidirectional instruction following to bidirectional intent alignment. Grounded in Cognitive Gap theory, we propose a taxonomy of four clarity levels: Detailed, Standard, Incomplete, and Ambiguous. We construct a rigorous dataset of 240 ecologically valid tasks across 25 applications, subject to strict review protocols. Furthermore, targeting evaluation in dynamic environments, we develop MUSE (Mobile User Satisfaction Evaluator), an automated framework utilizing an MLLM-as-a-judge multi-agent architecture. MUSE performs fine-grained auditing across three dimensions: Outcome Effectiveness, Execution Quality, and Interaction Quality. Empirical results on AmbiBench reveal the performance boundaries of SoTA agents across different clarity levels, quantify the gains derived from active interaction, and validate the strong correlation between MUSE and human judgment. This work redefines evaluation standards, laying the foundation for next-generation agents capable of truly understanding user intent.

Authors:Yu Zhang, Xinyi Zhao, Chongke Bi, Siming Chen
Title: Viewpoint Recommendation for Point Cloud Labeling through Interaction Cost Modeling
Abstract:
Semantic segmentation of 3D point clouds is important for many applications, such as autonomous driving. To train semantic segmentation models, labeled point cloud segmentation datasets are essential. Meanwhile, point cloud labeling is time-consuming for annotators, which typically involves tuning the camera viewpoint and selecting points by lasso. To reduce the time cost of point cloud labeling, we propose a viewpoint recommendation approach to reduce annotators' labeling time costs. We adapt Fitts' law to model the time cost of lasso selection in point clouds. Using the modeled time cost, the viewpoint that minimizes the lasso selection time cost is recommended to the annotator. We build a data labeling system for semantic segmentation of 3D point clouds that integrates our viewpoint recommendation approach. The system enables users to navigate to recommended viewpoints for efficient annotation. Through an ablation study, we observed that our approach effectively reduced the data labeling time cost. We also qualitatively compare our approach with previous viewpoint selection approaches on different datasets.

Authors:Sangjun Eom, Tianyi Hu, Wenyi Xu, Liheng Zou, Ernesto Escobar, Gabriel Streisfeld, Anna Mall, Bradi Granger, Maria Gorlatova
Title: Rhythms of Recovery: Patient-Centered Virtual Reality Exergame for Physical Rehabilitation in the Intensive Care Unit
Abstract:
Early mobilization is a structured protocol designed to facilitate motor recovery in intensive care unit (ICU) patients with ICU-acquired weakness. This process is typically implemented by an interdisciplinary team of nurses, physical therapists, and other healthcare professionals. However, its application is often constrained by the patients' critical conditions, limited mobility, and the challenges of coordinating care within resource-intensive ICU environments. In this study, we developed a patient-centered virtual reality (VR) exergame through an interdisciplinary design process involving clinicians and therapists, tailored to the constraints of critical care. The exergame incorporates progressive mobility levels that mirror early mobilization practices, and includes an embodied avatar to provide guidance and motivation. Using Meta Quest 3 body tracking, the system captures and visualizes patients' movements, thereby providing motivational engagement and quantifiable mobility metrics. We evaluated the exergame in two stages: a dual-user study involving healthy participants and healthcare professionals or students (N = 13), and a subsequent study with cardiothoracic ICU patients (N = 18) to assess feasibility, design validity, and clinical acceptance. Across both studies, participants reported high enjoyment and engagement without discomfort or stress. Furthermore, patients demonstrated increases in movement speed, range of motion, and workspace volume of the upper body across game levels. Physiological monitoring further indicated that the exergame elicited exertion without inducing excessive cardiovascular responses. These findings highlight the feasibility of VR exergames as a clinically acceptable and engaging adjunct to early mobilization in critical care, offering a novel pathway to improve rehabilitation outcomes for ICU patients.

Authors:Puqi Zhou, Ali Asgarov, Aafiya Hussain, Wonjoon Park, Amit Paudyal, Sameep Shrestha, Chia-wei Tang, Michael F. Lighthiser, Michael R. Hieb, Xuesu Xiao, Chris Thomas, Sungsoo Ray Hong
Title: Designing Multi-Robot Ground Video Sensemaking with Public Safety Professionals
Abstract:
Videos from fleets of ground robots can advance public safety by providing scalable situational awareness and reducing professionals' burden. Yet little is known about how to design and integrate multi-robot videos into public safety workflows. Collaborating with six police agencies, we examined how such videos could be made practical. In Study 1, we presented the first testbed for multi-robot ground video sensemaking. The testbed includes 38 events-of-interest (EoI) relevant to public safety, a dataset of 20 robot patrol videos (10 day/night pairs) covering EoI types, and 6 design requirements aimed at improving current video sensemaking practices. In Study 2, we built MRVS, a tool that augments multi-robot patrol video streams with a prompt-engineered video understanding model. Participants reported reduced manual workload and greater confidence with LLM-based explanations, while noting concerns about false alarms and privacy. We conclude with implications for designing future multi-robot video sensemaking tools.

Authors:Ruei-Che Chang, Rosiana Natalie, Wenqian Xu, Jovan Zheng Feng Yap, Tiange Luo, Venkatesh Potluri, Anhong Guo
Title: TouchScribe: Augmenting Non-Visual Hand-Object Interactions with Automated Live Visual Descriptions
Abstract:
People who are blind or have low vision regularly use their hands to interact with the physical world to gain access to objects' shape, size, weight, and texture. However, many rich visual features remain inaccessible through touch alone, making it difficult to distinguish similar objects, interpret visual affordances, and form a complete understanding of objects. In this work, we present TouchScribe, a system that augments hand-object interactions with automated live visual descriptions. We trained a custom egocentric hand interaction model to recognize both common gestures (e.g., grab to inspect, hold side-by-side to compare) and unique ones by blind people (e.g., point to explore color, or swipe to read available texts). Furthermore, TouchScribe provides real-time and adaptive feedback based on hand movement, from hand interaction states, to object labels, and to visual details. Our user study and technical evaluations demonstrate that TouchScribe can provide rich and useful descriptions to support object understanding. Finally, we discuss the implications of making live visual descriptions responsive to users' physical reach.

Authors:Yunlong Lyu, Yixuan Tang, Peng Chen, Tian Dong, Xinyu Wang, Zhiqiang Dong, Hao Chen
Title: "Tab, Tab, Bug'': Security Pitfalls of Next Edit Suggestions in AI-Integrated IDEs
Abstract:
Modern AI-integrated IDEs are shifting from passive code completion to proactive Next Edit Suggestions (NES). Unlike traditional autocompletion, NES is designed to construct a richer context from both recent user interactions and the broader codebase to suggest multi-line, cross-line, or even cross-file modifications. This evolution significantly streamlines the programming workflow into a tab-by-tab interaction and enhances developer productivity. Consequently, NES introduces a more complex context retrieval mechanism and sophisticated interaction patterns. However, existing studies focus almost exclusively on the security implications of standalone LLM-based code generation, ignoring the potential attack vectors posed by NES in modern AI-integrated IDEs. The underlying mechanisms of NES remain under-explored, and their security implications are not yet fully understood. In this paper, we conduct the first systematic security study of NES systems. First, we perform an in-depth dissection of the NES mechanisms to understand the newly introduced threat vectors. It is found that NES retrieves a significantly expanded context, including inputs from imperceptible user actions and global codebase retrieval, which increases the attack surfaces. Second, we conduct a comprehensive in-lab study to evaluate the security implications of NES. The evaluation results reveal that NES is susceptible to context poisoning and is sensitive to transactional edits and human-IDE interactions. Third, we perform a large-scale online survey involving over 200 professional developers to assess the perceptions of NES security risks in real-world development workflows. The survey results indicate a general lack of awareness regarding the potential security pitfalls associated with NES, highlighting the need for increased education and improved security countermeasures in AI-integrated IDEs.

Authors:Yi Wen, Yu Zhang, Sriram Suresh, Zhicong Lu, Can Liu, Meng Xia
Title: InterFlow: Designing Unobtrusive AI to Empower Interviewers in Semi-Structured Interviews
Abstract:
Semi-structured interviews are a common method in qualitative research. However, conducting high-quality interviews is challenging, as it requires interviewers to actively listen to participants, adapt their plans as the conversation unfolds, and probe effectively. We propose InterFlow, an AI-powered visual scaffold that helps interviewers manage the interview flow and facilitates real-time data sensemaking. The system dynamically adapts the interview script to the ongoing conversation and provides a visual timer to track interview progress and conversational balance. It further supports information capture with three levels of automation: manual entry, AI-assisted summary with user-specified focus, and a co-interview agent that proactively surfaces potential follow-up points. A within-subject user study (N = 12) indicates that InterFlow reduces interviewers' cognitive load and facilitates the interview process. Based on the user study findings, we provide design implications for unobtrusive and agency-preserving AI assistance under time-sensitive and cognitively demanding situations.

Authors:Stephen Pilli, Vivek Nallur
Title: Emulating Aggregate Human Choice Behavior and Biases with GPT Conversational Agents
Abstract:
Cognitive biases often shape human decisions. While large language models (LLMs) have been shown to reproduce well-known biases, a more critical question is whether LLMs can predict biases at the individual level and emulate the dynamics of biased human behavior when contextual factors, such as cognitive load, interact with these biases. We adapted three well-established decision scenarios into a conversational setting and conducted a human experiment (N=1100). Participants engaged with a chatbot that facilitates decision-making through simple or complex dialogues. Results revealed robust biases. To evaluate how LLMs emulate human decision-making under similar interactive conditions, we used participant demographics and dialogue transcripts to simulate these conditions with LLMs based on GPT-4 and GPT-5. The LLMs reproduced human biases with precision. We found notable differences between models in how they aligned human behavior. This has important implications for designing and evaluating adaptive, bias-aware LLM-based AI systems in interactive contexts.

Authors:Xinrui Lin, Heyan Huang, Shumin Shi, John Vines
Title: Relying on LLMs: Student Practices and Instructor Norms are Changing in Computer Science Education
Abstract:
Prior research has raised concerns about students' over-reliance on large language models (LLMs) in higher education. This paper examines how Computer Science students and instructors engage with LLMs across five scenarios: "Writing", "Quiz", "Programming", "Project-based learning", and "Information retrieval". Through user studies with 16 students and 6 instructors, we identify 7 key intents, including increasingly complex student practices. Findings reveal varying levels of conflict between student practices and instructor norms, ranging from clear conflict in "Writing-generation" and "(Programming) quiz-solving", through partial conflict in "Programming project-implementation" and "Project-based learning", to broad agreement in "Writing-revision & ideation", "(Programming) quiz-correction" and "Info-query & summary". We document instructors are shifting from prohibiting to recognizing students' use of LLMs for high-quality work, integrating usage records into assessment grading. Finally, we propose LLM design guidelines: deploying default guardrails with game-like and empathetic interaction to prevent students from "deserting" LLMs, especially for "Writing-generation", while utilizing comprehension checks in low-conflict intents to promote learning.

Authors:He Zhang, Xinyang Li, Xingyu Zhou, Xinyi Fu
Title: VR Calm Plus: Coupling a Squeezable Tangible Interaction with Immersive VR for Stress Regulation
Abstract:
While Virtual Reality (VR) is increasingly employed for stress management, most applications rely heavily on audio-visual stimuli and overlook the therapeutic potential of squeezing engagement. To address this gap, we introduce VR Calm Plus, a multimodal system that integrates a pressure-sensitive plush toy into an interactive VR environment. This interface allows users to dynamically modulate the virtual atmosphere through physical squeezing actions, fostering a deeper sense of embodied relaxation. We evaluated the system with 40 participants using PANAS-X surveys, subjective questionnaires, physiological measures (heart rate, skin conductance, pulse rate variability), and semi-structured interviews. Results demonstrate that, compared to a visual-only baseline, squeeze-based interaction significantly enhances positive affect and perceived relaxation. Physiological data further revealed a state of "active relaxation", characterized by greater reductions in heart rate and preserved autonomic flexibility (PRV), alongside sustained emotional engagement (GSR). Our findings highlight the value of coupling tangible input with immersive environments to support emotional well-being and offer design insights for future VR-based mental health tools.

Authors:Puqi Zhou, Charles R. Twardy, Cynthia Lum, Myeong Lee, David J. Porfirio, Michael R. Hieb, Chris Thomas, Xuesu Xiao, Sungsoo Ray Hong
Title: Applying Ground Robot Fleets in Urban Search: Understanding Professionals' Operational Challenges and Design Opportunities
Abstract:
Urban searches demand rapid, defensible decisions and sustained physical effort under high cognitive and situational load. Incident commanders must plan, coordinate, and document time-critical operations, while field searchers execute evolving tasks in uncertain environments. With recent advances in technology, ground-robot fleets paired with computer-vision-based situational awareness and LLM-powered interfaces offer the potential to ease these operational burdens. However, no dedicated studies have examined how public safety professionals perceive such technologies or envision their integration into existing practices, risking building technically sophisticated yet impractical solutions. To address this gap, we conducted focus-group sessions with eight police officers across five local departments in Virginia. Our findings show that ground robots could reduce professionals' reliance on paper references, mental calculations, and ad-hoc coordination, alleviating cognitive and physical strain in four key challenge areas: (1) partitioning the workforce across multiple search hypotheses, (2) retaining group awareness and situational awareness, (3) building route planning that fits the lost-person profile, and (4) managing cognitive and physical fatigue under uncertainty. We further identify four design opportunities and requirements for future ground-robot fleet integration in public-safety operations: (1) scalable multi-robot planning and control interfaces, (2) agency-specific route optimization, (3) real-time replanning informed by debrief updates, and (4) vision-assisted cueing that preserves operational trust while reducing cognitive workload. We conclude with design implications for deployable, accountable, and human-centered urban-search support systems

Authors:Himanshi Lalwani, Hanan Salam
Title: The Supportiveness-Safety Tradeoff in LLM Well-Being Agents
Abstract:
Large language models (LLMs) are being integrated into socially assistive robots (SARs) and other conversational agents providing mental health and well-being support. These agents are often designed to sound empathic and supportive in order to maximize user's engagement, yet it remains unclear how increasing the level of supportive framing in system prompts influences safety relevant behavior. We evaluated 6 LLMs across 3 system prompts with varying levels of supportiveness on 80 synthetic queries spanning 4 well-being domains (1440 responses). An LLM judge framework, validated against human ratings, assessed safety and care quality. Moderately supportive prompts improved empathy and constructive support while maintaining safety. In contrast, strongly validating prompts significantly degraded safety and, in some cases, care across all domains, with substantial variation across models. We discuss implications for prompt design, model selection, and domain specific safeguards in SARs deployment.

Authors:Keya Shah, Himanshi Lalwani, Zein Mukhanov, Hanan Salam
Title: Informing Robot Wellbeing Coach Design through Longitudinal Analysis of Human-AI Dialogue
Abstract:
Social robots and conversational agents are being explored as supports for wellbeing, goal-setting, and everyday self-regulation. While prior work highlights their potential to motivate and guide users, much of the evidence relies on self-reported outcomes or short, researcher-mediated encounters. As a result, we know little about the interaction dynamics that unfold when people use such systems in real-world contexts, and how these dynamics should shape future robot wellbeing coaches. This paper addresses this gap through content analysis of 4352 messages exchanged longitudinally between 38 university students and an LLM-based wellbeing coach. Our results provide a fine-grained view into how users naturally shape, steer, and sometimes struggle within supportive human-AI dialogue, revealing patterns of user-led direction, guidance-seeking, and emotional expression. We discuss how these dynamics can inform the design of robot wellbeing coaches that support user autonomy, provide appropriate scaffolding, and uphold ethical boundaries in sustained wellbeing interactions.

Authors:Fahim Arsad Nafis, Jie Li, Simon Su, Songqing Chen, Bo Han
Title: Exploring Collaborative Immersive Visualization & Analytics for High-Dimensional Scientific Data through Domain Expert Perspectives
Abstract:
Cross-disciplinary teams increasingly work with high-dimensional scientific datasets, yet fragmented toolchains and limited support for shared exploration hinder collaboration. Prior immersive visualization and analytics research has emphasized individual interaction, leaving open how multi-user collaboration can be supported at scale. To fill this critical gap, we conduct semi-structured interviews with 20 domain experts from diverse academic, government, and industry backgrounds. Using deductive-inductive hybrid thematic analysis, we identify four collaboration-focused themes: workflow challenges, adoption perceptions, prospective features, and anticipated usability and ethical risks. These findings show how current ecosystems disrupt coordination and shared understanding, while highlighting opportunities for effective multi-user engagement. Our study contributes empirical insights into collaboration practices for high-dimensional scientific data visualization and analysis, offering design implications to enhance coordination, mutual awareness, and equitable participation in next-generation collaborative immersive platforms. These contributions point toward future environments enabling distributed, cross-device teamwork on high-dimensional scientific data.

Authors:Aditya Kumar Purohit, Hendrik Heuer
Title: A Conditional Companion: Lived Experiences of People with Mental Health Disorders Using LLMs
Abstract:
Large Language Models (LLMs) are increasingly used for mental health support, yet little is known about how people with mental health challenges engage with them, how they evaluate their usefulness, and what design opportunities they envision. We conducted 20 semi-structured interviews with people in the UK who live with mental health conditions and have used LLMs for mental health support. Through reflexive thematic analysis, we found that participants engaged with LLMs in conditional and situational ways: for immediacy, the desire for non-judgement, self-paced disclosure, cognitive reframing, and relational engagement. Simultaneously, participants articulated clear boundaries informed by prior therapeutic experience: LLMs were effective for mild-to-moderate distress but inadequate for crises, trauma, and complex social-emotional situations. We contribute empirical insights into the lived use of LLMs for mental health, highlight boundary-setting as central to their safe role, and propose design and governance directions for embedding them responsibly within care ecosystem.

Authors:Aditya Kumar Purohit, Aditya Upadhyaya, Nicolas Ruiz, Alberto Monge Roffarello, Hendrik Heuer
Title: When Handwriting Goes Social: Creativity, Anonymity, and Communication in Graphonymous Online Spaces
Abstract:
While most digital communication platforms rely on text, relatively little research has examined how users engage through handwriting and drawing in anonymous, collaborative environments. We introduce Graphonymous Interaction, a form of communication where users interact anonymously via handwriting and drawing. Our study analyzed over 600 canvas pages from the Graphonymous Online Space (GOS) CollaNote and conducted interviews with 20 users. Additionally, we examined 70 minutes of real-time GOS sessions using Conversation Analysis and Multimodal Discourse Analysis. Findings reveal that Graphonymous Interaction fosters artistic expression, intellectual engagement, sharing and supporting, and social connection. Notably, anonymity coexisted with moments of recognition through graphological identification. Distinct conversational strategies also emerged, which allow smoother exchanges and fewer conversational repairs compared to text-based communication. This study contributes to understanding Graphonymous Interaction and Online Spaces, offering insights into designing platforms that support creative and socially engaging forms of communication beyond text.

Authors:Yihe Zhang, Cheyenne N Mohawk, Kaiying Han, Vijay Srinivas Tida, Manyu Li, Xiali Hei
Title: MHDash: An Online Platform for Benchmarking Mental Health-Aware AI Assistants
Abstract:
Large language models (LLMs) are increasingly applied in mental health support systems, where reliable recognition of high-risk states such as suicidal ideation and self-harm is safety-critical. However, existing evaluations primarily rely on aggregate performance metrics, which often obscure risk-specific failure modes and provide limited insight into model behavior in realistic, multi-turn interactions. We present MHDash, an open-source platform designed to support the development, evaluation, and auditing of AI systems for mental health applications. MHDash integrates data collection, structured annotation, multi-turn dialogue generation, and baseline evaluation into a unified pipeline. The platform supports annotations across multiple dimensions, including Concern Type, Risk Level, and Dialogue Intent, enabling fine-grained and risk-aware analysis. Our results reveal several key findings: (i) simple baselines and advanced LLM APIs exhibit comparable overall accuracy yet diverge significantly on high-risk cases; (ii) some LLMs maintain consistent ordinal severity ranking while failing absolute risk classification, whereas others achieve reasonable aggregate scores but suffer from high false negative rates on severe categories; and (iii) performance gaps are amplified in multi-turn dialogues, where risk signals emerge gradually. These observations demonstrate that conventional benchmarks are insufficient for safety-critical mental health settings. By releasing MHDash as an open platform, we aim to promote reproducible research, transparent evaluation, and safety-aligned development of AI systems for mental health support.

Authors:Syed T. Mubarrat, Byung-Cheol Min, Tianyu Shao, E. Cho Smith, Bedrich Benes, Alejandra J. Magana, Christos Mousas, Dominic Kao
Title: Game-Based and Gamified Robotics Education: A Comparative Systematic Review and Design Guidelines
Abstract:
Robotics education fosters computational thinking, creativity, and problem-solving, but remains challenging due to technical complexity. Game-based learning (GBL) and gamification offer engagement benefits, yet their comparative impact remains unclear. We present the first PRISMA-aligned systematic review and comparative synthesis of GBL and gamification in robotics education, analyzing 95 studies from 12,485 records across four databases (2014-2025). We coded each study's approach, learning context, skill level, modality, pedagogy, and outcomes (k = .918). Three patterns emerged: (1) approach-context-pedagogy coupling (GBL more prevalent in informal settings, while gamification dominated formal classrooms [p < .001] and favored project-based learning [p = .009]); (2) emphasis on introductory programming and modular kits, with limited adoption of advanced software (~17%), advanced hardware (~5%), or immersive technologies (~22%); and (3) short study horizons, relying on self-report. We propose eight research directions and a design space outlining best practices and pitfalls, offering actionable guidance for robotics education.

Authors:Catherine Yeh, Anh Truong, Mira Dontcheva, Bryan Wang
Title: Vidmento: Creating Video Stories Through Context-Aware Expansion With Generative Video
Abstract:
Video storytelling is often constrained by available material, limiting creative expression and leaving undesired narrative gaps. Generative video offers a new way to address these limitations by augmenting captured media with tailored visuals. To explore this potential, we interviewed eight video creators to identify opportunities and challenges in integrating generative video into their workflows. Building on these insights and established filmmaking principles, we developed Vidmento, a tool for authoring hybrid video stories that combine captured and generated media through context-aware expansion. Vidmento surfaces opportunities for story development, generates clips that blend stylistically and narratively with surrounding media, and provides controls for refinement. In a study with 12 creators, Vidmento supported narrative development and exploration by systematically expanding initial materials with generative media, enabling expressive video storytelling aligned with creative intent. We highlight how creators bridge story gaps with generative content and where they find this blending capability most valuable.

Authors:Kamrul Hasan, Oleg V. Komogortsev
Title: Privatization of Synthetic Gaze: Attenuating State Signatures in Diffusion-Generated Eye Movements
Abstract:
The recent success of deep learning (DL) has enabled the generation of high-quality synthetic gaze data. However, such data also raises privacy concerns because gaze sequences can encode subjects' internal states, like fatigue, emotional load, or stress. Ideally, synthetic gaze should preserve the signal quality of real recordings and remove or attenuate state-related, privacy-sensitive attributes. Many recent DL-based generative models focus on replicating real gaze trajectories and do not explicitly consider subjective reports or the privatization of internal states. However, in this work, we consider a recent diffusion-based gaze synthesis approach and examine correlations between synthetic gaze features and subjective reports (e.g., fatigue and related self-reported states). Our result shows that these correlations are trivial, which suggests the generative approach suppresses state-related features. Moreover, synthetic gaze preserves necessary signal characteristics similar to those of real data, which supports its use for privacy-preserving gaze-based applications.

Authors:Kamrul Hasan, Oleg V. Komogortsev
Title: Eye Feel You: A DenseNet-driven User State Prediction Approach
Abstract:
Subjective self-reports, collected with eye-tracking data, reveal perceived states like fatigue, effort, and task difficulty. However, these reports are costly to collect and challenging to interpret consistently in longitudinal studies. In this work, we focus on determining whether objective gaze dynamics can reliably predict subjective reports across repeated recording rounds in the eye-tracking dataset. We formulate subjective-report prediction as a supervised regression problem and propose a DenseNet-based deep learning regressor that learns predictive representations from gaze velocity signals. We conduct two complementary experiments to clarify our aims. First, the cross-round generalization experiment tests whether models trained on earlier rounds transfer to later rounds, evaluating the models' ability to capture longitudinal changes. Second, cross-subject generalization tests models' robustness by predicting subjective outcomes for new individuals. These experiments aim to reduce reliance on hand-crafted feature designs and clarify which states of subjective experience systematically appear in oculomotor behavior over time.

Authors:Tawfiq Ammari, Meilun Chen, S M Mehedi Zaman, Kiran Garimella
Title: Learning to Live with AI: How Students Develop AI Literacy Through Naturalistic ChatGPT Interaction
Abstract:
How do students develop AI literacy through everyday practice rather than formal instruction? While normative AI literacy frameworks proliferate, empirical understanding of how students actually learn to work with generative AI remains limited. This study analyzes 10,536 ChatGPT messages from 36 undergraduates over one academic year, revealing five use genres -- academic workhorse, emotional companion, metacognitive partner, repair and negotiation, and trust calibration -- that constitute distinct configurations of student-AI learning. Drawing on domestication theory and emerging frameworks for AI literacy, we demonstrate that functional AI competence emerges through ongoing relational negotiation rather than one-time adoption. Students develop sophisticated genre portfolios, strategically matching interaction patterns to learning needs while exercising critical judgment about AI limitations. Notably, repair work during AI breakdowns produces substantial learning about AI capabilities, developing what we term "repair literacy" -- a crucial but underexplored dimension of AI competence. Our findings offer educators empirically grounded insights into how students actually learn to work with generative AI, with implications for AI literacy pedagogy, responsible AI integration, and the design of AI-enabled learning environments that support student agency.

Authors:Mathis Brossier, Mina Mani, Agathe Malbet, Konrad Schönborn, Lonni Besançon
Title: Opportunities of Touch-Enabled Spherical Displays to support Climate Conversations
Abstract:
We explore how touch-sensitive spherical displays can support climate conversations in museums and science centers. These displays enable intuitive and embodied interaction with complex climate data, and support collective exploration. However, current interaction capabilities of spherical displays are limited. Therefore, this exploratory study aims to identify potential opportunities to develop meaningful interactions and technical solutions. Through two workshops, key opportunities were identified to improve visitors' understanding and navigation of climate data, along with recommendations for technical implementation. Our results provide guidelines and aspects to consider for future research and development in this area.

Authors:Mathis Brossier, Mujtaba Fadhil Jawad, Emma Broman, Ylva Selling, Julia Hallsten, Alexander Bock, Johanna Björklund, Tobias Isenberg, Anders Ynnerman, Mario Romero
Title: Piloting Planetarium Visualizations with LLMs during Live Events in Science Centers
Abstract:
We designed and evaluated an AI pilot in a planetarium visualization software, OpenSpace, for public shows in science centers. The piloting role is usually given to a human working in close collaboration with the guide on stage. We recruited 7 professional guides with extensive experience in giving shows to the public to study the impact of the AI-piloting on the overall experience. The AI-pilot is a conversational AI-agent listening to the guide and interpreting the verbal statements as commands to execute camera motions, change simulation time, or toggle visual assets. Our results show that, while AI pilots lack several critical skills for live shows, they could become useful as co-pilots to reduce workload of human pilots and allow multitasking. We propose research directions toward implementing visualization pilots and co-pilots in live settings.

Authors:Rayna Hata, Masaki Kuribayashi, Allan Wang, Hironobu Takagi, Chieko Asakawa
Title: How Does Delegation in Social Interaction Evolve Over Time? Navigation with a Robot for Blind People
Abstract:
Autonomy and independent navigation are vital to daily life but remain challenging for individuals with blindness. Robotic systems can enhance mobility and confidence by providing intelligent navigation assistance. However, fully autonomous systems may reduce users' sense of control, even when they wish to remain actively involved. Although collaboration between user and robot has been recognized as important, little is known about how perceptions of this relationship change with repeated use. We present a repeated exposure study with six blind participants who interacted with a navigation-assistive robot in a real-world museum. Participants completed tasks such as navigating crowds, approaching lines, and encountering obstacles. Findings show that participants refined their strategies over time, developing clearer preferences about when to rely on the robot versus act independently. This work provides insights into how strategies and preferences evolve with repeated interaction and offers design implications for robots that adapt to user needs over time.

Authors:Hyeok Kim, Sehi L'Yi, Nils Gehlenborg, Jeffrey Heer
Title: Automatic Synthesis of Visualization Design Knowledge Bases
Abstract:
Formal representations of the visualization design space, such as knowledge bases and graphs, consolidate design practices into a shared resource and enable automated reasoning and interpretable design recommendations. However, prior approaches typically depend on fixed, manually authored rules, making it difficult to build novel representations or extend them for different visualization domains. Instead, we propose data-driven methods that automatically synthesize visualization design knowledge bases. Specifically, our methods (1) extract candidate design features from a visualization corpus, (2) select features forward and backward, and (3) render the final knowledge base. In our benchmark evaluation compared to Draco 2, our synthesized knowledge base offers general and interpretable design features and improves the accuracy of predicting effective designs by 1-15% in varied training and test sets. When we apply our approach to genomics visualization, the synthesized knowledge base includes sensible features with accuracy up to 97%, demonstrating the applicability of our approach to other visualization domains.

Authors:Junling Wang, Lahari Goswami, Gustavo Kreia Umbelino, Kiara Chau, Mrinmaya Sachan, April Yi Wang
Title: Bridging Instead of Replacing Online Coding Communities with AI through Community-Enriched Chatbot Designs
Abstract:
LLM-based chatbots like ChatGPT have become popular tools for assisting with coding tasks. However, they often produce isolated responses and lack mechanisms for social learning or contextual grounding. In contrast, online coding communities like Kaggle offer socially mediated learning environments that foster critical thinking, engagement, and a sense of belonging. Yet, growing reliance on LLMs risks diminishing participation in these communities and weakening their collaborative value. To address this, we propose Community-Enriched AI, a design paradigm that embeds social learning dynamics into LLM-based chatbots by surfacing user-generated content and social design feature from online coding communities. Using this paradigm, we implemented a RAG-based AI chatbot leveraging resources from Kaggle to validate our design. Across two empirical studies involving 28 and 12 data science learners, respectively, we found that Community-Enriched AI significantly enhances user trust, encourages engagement with community, and effectively supports learners in solving data science tasks. We conclude by discussing design implications for AI assistance systems that bridge -- rather than replace -- online coding communities.

Authors:Haiyi Li, Yiyang Zhao, Yutong Li, Alison Deslandes, Jodie Avery, Mathew Leonardi, Mary Louise Hull, Hsiang-Ting Chen
Title: EndoExtract: Co-Designing Structured Text Extraction from Endometriosis Ultrasound Reports
Abstract:
Endometriosis ultrasound reports are often unstructured free-text documents that require manual abstraction for downstream tasks such as analytics, machine learning model training, and clinical auditing. We present \textbf{EndoExtract}, an on-premise LLM-powered system that extracts structured data from these reports and surfaces interpretive fields for human review. Through contextual inquiry with research assistants, we identified key workflow pain points: asymmetric trust between numerical and interpretive fields, repetitive manual highlighting, fatigue from sustained comparison, and terminology inconsistency across radiologists. These findings informed an interface that surfaces only interpretive fields for mandatory review, automatically highlights source evidence within PDFs, and separates batch extraction from human-paced verification. A formative workshop revealed that \textbf{EndoExtract} supports a shift from field-by-field data entry to supervisory validation, though participants noted risks of over-skimming and challenges in managing missing data.

Authors:Pedram Agand, Mo Chen
Title: Beyond Static Datasets: Robust Offline Policy Optimization via Vetted Synthetic Transitions
Abstract:
Offline Reinforcement Learning (ORL) holds immense promise for safety-critical domains like industrial robotics, where real-time environmental interaction is often prohibitive. A primary obstacle in ORL remains the distributional shift between the static dataset and the learned policy, which typically mandates high degrees of conservatism that can restrain potential policy improvements. We present MoReBRAC, a model-based framework that addresses this limitation through Uncertainty-Aware latent synthesis. Instead of relying solely on the fixed data, MoReBRAC utilizes a dual-recurrent world model to synthesize high-fidelity transitions that augment the training manifold. To ensure the reliability of this synthetic data, we implement a hierarchical uncertainty pipeline integrating Variational Autoencoder (VAE) manifold detection, model sensitivity analysis, and Monte Carlo (MC) dropout. This multi-layered filtering process guarantees that only transitions residing within high-confidence regions of the learned dynamics are utilized. Our results on D4RL Gym-MuJoCo benchmarks reveal significant performance gains, particularly in ``random'' and ``suboptimal'' data regimes. We further provide insights into the role of the VAE as a geometric anchor and discuss the distributional trade-offs encountered when learning from near-optimal datasets.

Authors:Bryan Min, Peiling Jiang, Zhicheng Huang, Haijun Xia
Title: Gradual Generation of User Interfaces as a Design Method for Malleable Software
Abstract:
AI is growing increasingly capable of automatically generating user interfaces (GenUI) from user prompts. However, designing GenUI applications that enable users to discover diverse customizations while preserving GenUI's expressiveness remains challenging. Current design methods -- presenting prompt boxes and leveraging context -- lack affordances for customization discovery, while traditional menu-based approaches become overly complex given GenUI's vast customization space. We propose Gradually Generating User Interfaces -- a design method that structures customizations into intermediate UI layers that AI gradually loads during interface generation. These intermediate stages expose different customization features along specific dimensions, making them discoverable to users. Users can wind back the generation process to access customizations. We demonstrate this approach through three prototype websites, showing how designers can support GenUI's expanded customization capabilities while maintaining visual simplicity and discoverability. Our work offers a practical method for integrating customization features into GenUI applications, contributing an approach to designing malleable software.

Authors:Mingtian Du, Suhas Raghavendra Kulkarni, Bernardo Noronha, Domenico Campolo
Title: Delay-Compensated Stiffness Estimation for Robot-Mediated Dyadic Interaction
Abstract:
Robot-mediated human-human (dyadic) interactions enable therapists to provide physical therapy remotely, yet an accurate perception of patient stiffness remains challenging due to network-induced haptic delays. Conventional stiffness estimation methods, which neglect delay, suffer from temporal misalignment between force and position signals, leading to significant estimation errors as delays increase. To address this, we propose a robust, delay-compensated stiffness estimation framework by deriving an algebraic estimator based on quasi-static equilibrium that explicitly accounts for temporally aligning the expert's input with the novice's response. A Normalised Weighted Least Squares (NWLS) implementation is then introduced to robustly filter dynamic bias resulting from the algebraic derivation. Experiments using commercial rehabilitation robots (H-MAN) as the platform demonstrate that the proposed method significantly outperforms the standard estimator, maintaining consistent tracking accuracy under multiple introduced delays. These findings offer a promising solution for achieving high-fidelity haptic perception in remote dyadic interaction, potentially facilitating reliable stiffness assessment in therapeutic settings across networks.

Authors:Preethi Seshadri, Samuel Cahyawijaya, Ayomide Odumakinde, Sameer Singh, Seraphina Goldfarb-Tarrant
Title: Lost in Simulation: LLM-Simulated Users are Unreliable Proxies for Human Users in Agentic Evaluations
Abstract:
Agentic benchmarks increasingly rely on LLM-simulated users to scalably evaluate agent performance, yet the robustness, validity, and fairness of this approach remain unexamined. Through a user study with participants across the United States, India, Kenya, and Nigeria, we investigate whether LLM-simulated users serve as reliable proxies for real human users in evaluating agents on τ-Bench retail tasks. We find that user simulation lacks robustness, with agent success rates varying up to 9 percentage points across different user LLMs. Furthermore, evaluations using simulated users exhibit systematic miscalibration, underestimating agent performance on challenging tasks and overestimating it on moderately difficult ones. African American Vernacular English (AAVE) speakers experience consistently worse success rates and calibration errors than Standard American English (SAE) speakers, with disparities compounding significantly with age. We also find simulated users to be a differentially effective proxy for different populations, performing worst for AAVE and Indian English speakers. Additionally, simulated users introduce conversational artifacts and surface different failure patterns than human users. These findings demonstrate that current evaluation practices risk misrepresenting agent capabilities across diverse user populations and may obscure real-world deployment challenges.

Authors:Calarina Muslimani, Yunshu Du, Kenta Kawamoto, Kaushik Subramanian, Peter Stone, Peter Wurman
Title: The Trajectory Alignment Coefficient in Two Acts: From Reward Tuning to Reward Learning
Abstract:
The success of reinforcement learning (RL) is fundamentally tied to having a reward function that accurately reflects the task objective. Yet, designing reward functions is notoriously time-consuming and prone to misspecification. To address this issue, our first goal is to understand how to support RL practitioners in specifying appropriate weights for a reward function. We leverage the Trajectory Alignment Coefficient (TAC), a metric that evaluates how closely a reward function's induced preferences match those of a domain expert. To evaluate whether TAC provides effective support in practice, we conducted a human-subject study in which RL practitioners tuned reward weights for Lunar Lander. We found that providing TAC during reward tuning led participants to produce more performant reward functions and report lower cognitive workload relative to standard tuning without TAC. However, the study also underscored that manual reward design, even with TAC, remains labor-intensive. This limitation motivated our second goal: to learn a reward model that maximizes TAC directly. Specifically, we propose Soft-TAC, a differentiable approximation of TAC that can be used as a loss function to train reward models from human preference data. Validated in the racing simulator Gran Turismo 7, reward models trained using Soft-TAC successfully captured preference-specific objectives, resulting in policies with qualitatively more distinct behaviors than models trained with standard Cross-Entropy loss. This work demonstrates that TAC can serve as both a practical tool for guiding reward tuning and a reward learning objective in complex domains.

Authors:Frank Heyen, Michael Gleicher, Michael Sedlmair
Title: Make the Unhearable Visible: Exploring Visualization for Musical Instrument Practice
Abstract:
We explore the potential of visualization to support musicians in instrument practice through real-time feedback and reflection on their playing. Musicians often struggle to observe the patterns in their playing and interpret them with respect to their goals. Our premise is that these patterns can be made visible with interactive visualization: we can make the unhearable visible. However, understanding the design of such visualizations is challenging: the diversity of needs, including different instruments, skills, musical attributes, and genres, means that any single use case is unlikely to illustrate the broad potential and opportunities. To address this challenge, we conducted a design exploration study where we created and iterated on 33 designs, each focusing on a subset of needs, for example, only one musical skill. Our designs are grounded in our own experience as musicians and the ideas and feedback of 18 musicians with various musical backgrounds and we evaluated them with 13 music learners and teachers. This paper presents the results of our exploration, focusing on a few example designs as instances of possible instrument practice visualizations. From our work, we draw design considerations that contribute to future research and products for visual instrument education.

Authors:Bernardus Willson, Henry Anand Septian Radityo, Raynard Tanadi, Latifa Dwiyanti, Saiful Akbar
Title: A Mobile Application Front-End for Presenting Explainable AI Results in Diabetes Risk Estimation
Abstract:
Diabetes is a significant and continuously rising health challenge in Indonesia. Although many artificial intelligence (AI)-based health applications have been developed for early detection, most function as "black boxes," lacking transparency in their predictions. Explainable AI (XAI) methods offer a solution, yet their technical outputs are often incomprehensible to non-expert users. This research aims to develop a mobile application front-end that presents XAI-driven diabetes risk analysis in an intuitive, understandable format. Development followed the waterfall methodology, comprising requirements analysis, interface design, implementation, and evaluation. Based on user preference surveys, the application adopts two primary visualization types - bar charts and pie charts - to convey the contribution of each risk factor. These are complemented by personalized textual narratives generated via integration with GPT-4o. The application was developed natively for Android using Kotlin and Jetpack Compose. The resulting prototype interprets SHAP (SHapley Additive exPlanations), a key XAI approach, into accessible graphical visualizations and narratives. Evaluation through user comprehension testing (Likert scale and interviews) and technical functionality testing confirmed the research objectives were met. The combination of visualization and textual narrative effectively enhanced user understanding (average score 4.31/5) and empowered preventive action, supported by a 100% technical testing success rate.

Authors:Erina Seh-Young Moon, Matthew Tamura, Angelina Zhai, Nuzaira Habib, Behnaz Shirazi, Altaf Kassam, Devansh Saxena, Shion Guha
Title: The Promises and Perils of using LLMs for Effective Public Services
Abstract:
Governments are the primary providers of essential public services and are responsible for delivering them effectively. In high-stakes decision-making domains such as child welfare (CW), agencies must protect children without unnecessarily prolonging a family's engagement with the system. With growing optimism around AI, governments are pushing for its integration but concerns regarding feasibility and harms remain. Through collaborations with a large Canadian CW agency, we examined how LocalLLM and BERTopic models can track CW case progress. We demonstrate how the tools can potentially assist workers in opportunistically addressing gaps in their work by signaling case progress/deviations. And yet, we also show how they fail to detect case trajectories that require discretionary judgments grounded in social work training, areas where practitioners would actually want support to pre-emptively address substantive case concerns. We also provide a roadmap of future participatory directions to co-design language tools for/with the public sector.

Authors:Yanwei Huang, Arpit Narechania
Title: Facilitating Proactive and Reactive Guidance for Decision Making on the Web: A Design Probe with WebSeek
Abstract:
Web AI agents such as ChatGPT Agent and GenSpark are increasingly used for routine web-based tasks, yet they still rely on text-based input prompts, lack proactive detection of user intent, and offer no support for interactive data analysis and decision making. We present WebSeek, a mixed-initiative browser extension that enables users to discover and extract information from webpages to then flexibly build, transform, and refine tangible data artifacts-such as tables, lists, and visualizations-all within an interactive canvas. Within this environment, users can perform analysis-including data transformations such as joining tables or creating visualizations-while an in-built AI both proactively offers context-aware guidance and automation, and reactively responds to explicit user requests. An exploratory user study (N=15) with WebSeek as a probe reveals participants' diverse analysis strategies, underscoring their desire for transparency and control during human-AI collaboration.

Authors:Mathis Brossier, Tobias Isenberg, Konrad Schönborn, Jonas Unger, Mario Romero, Johanna Björklund, Anders Ynnerman, Lonni Besançon
Title: State of the Art of LLM-Enabled Interaction with Visualization
Abstract:
We report on a systematic, PRISMA-guided survey of research at the intersection of LLMs and visualization, with a particular focus on visio-verbal interaction -- where verbal and visual modalities converge to support data sense-making. The emergence of Large Language Models (LLMs) has introduced new paradigms for interacting with data visualizations through natural language, leading to intuitive, multimodal, and accessible interfaces. We analyze 48 papers across six dimensions: application domain, visualization task, visualization representation, interaction modality, LLM integration, and system evaluation. Our classification framework maps LLM roles across the visualization pipeline, from data querying and transformation to visualization generation, explanation, and navigation. We highlight emerging design patterns, identify gaps in accessibility and visualization reading, and discuss the limitations of current LLMs in spatial reasoning and contextual grounding. We further reflect on evaluations of combined LLM-visualization systems, highlighting how current research projects tackle this challenge and discuss current gaps in conducting meaningful evaluations of such systems. With our survey we aim to guide future research and system design in LLM-enhanced visualization, supporting broad audiences and intelligent, conversational interfaces.

Authors:Geoff Keeling, Winnie Street
Title: What's it like to be a chat? On the co-simulation of artificial minds in human-AI conversations
Abstract:
Large Language Models (LLMs) can simulate person-like things which at least appear to have stable behavioural and psychological dispositions. Call these things characters. Are characters minded and psychologically continuous entities with mental states like beliefs, desires and intentions? Illusionists about characters say No. On this view, characters are merely anthropomorphic projections in the mind of the user and so lack mental states. Jonathan Birch (2025) defends this view. He says that the distributed nature of LLM processing, in which several LLMs may be implicated in the simulation of a character in a single conversation, precludes the existence of a persistent minded entity that is identifiable with the character. Against illusionism, we argue for a realist position on which characters exist as minded and psychologically continuous entities. Our central point is that Birch's argument for illusionism rests on a category error: characters are not internal to the LLMs that simulate them, but rather are co-simulated by LLMs and users, emerging in a shared conversational workspace through a process of mutual theory of mind modelling. We argue that characters, and their minds, exist as 'real patterns' on grounds that attributing mental states to characters is essential for making efficient and accurate predictions about the conversational dynamics (c.f. Dennett, 1991). Furthermore, because the character exists within the conversational workspace rather than within the LLM, psychological continuity is preserved even when the underlying computational substrate is distributed across multiple LLM instances.

Authors:Jianshu Wang, Siyu Liu, Chao Zhou, Yawen Zheng, Yuan Yue, Tangjun Qu, Yang Li, Yutao Xie, Jin Huang, Yulong Bian, Feng Tian
Title: Does Motion Intensity Impair Cognition in HCI? The Critical Role of Physical Motion-Visual Target Directional Congruency
Abstract:
Human-computer interaction (HCI) increasingly occurs in motion-rich environments. The ability to accurately and rapidly respond to directional visual cues is critical in these contexts. How whole-body motion and individual differences affect human perception and reaction to these directional cues is therefore a key, yet an underexplored question for HCI. This study used a 6-DOF motion platform to measure task performance on a visual direction judgment task. We analyzed performance by decomposing the complex motion into two distinct components: a task-irrelevant lateral interference component and a task-aligned directional congruency component. Results indicate that increased motion intensity lengthened reaction times. This effect was primarily driven by the lateral interference component, and this detrimental impact was disproportionately amplified for individuals with high motion sickness susceptibility. Conversely, directional congruency, where motion direction matched the visual cue, improved performance for all participants. These findings suggest that motion's impact on cognition is not monolithic, and that system design for mobile HCI can be informed by strategies that actively shape motion, such as minimizing lateral interference while maximizing directional congruency.

Authors:Mina Huh, Ailie C. Fraser, Dingzeyu Li, Mira Dontcheva, Bryan Wang
Title: VidTune: Creating Video Soundtracks with Generative Music and Contextual Thumbnails
Abstract:
Music shapes the tone of videos, yet creators often struggle to find soundtracks that match their video's mood and narrative. Recent text-to-music models let creators generate music from text prompts, but our formative study (N=8) shows creators struggle to construct diverse prompts, quickly review and compare tracks, and understand their impact on the video. We present VidTune, a system that supports soundtrack creation by generating diverse music options from a creator's prompt and producing contextual thumbnails for rapid review. VidTune extracts representative video subjects to ground thumbnails in context, maps each track's valence and energy onto visual cues like color and brightness, and depicts prominent genres and instruments. Creators can refine tracks through natural language edits, which VidTune expands into new generations. In a controlled user study (N=12) and an exploratory case study (N=6), participants found VidTune helpful for efficiently reviewing and comparing music options and described the process as playful and enriching.

Authors:Mo Houtti, Moyan Zhou, Daniel Runningen, Surabhi Sunil, Leor Porat, Harmanpreet Kaur, Loren Terveen, Stevie Chancellor
Title: Opportunities and Barriers for AI Feedback on Meeting Inclusion in Socioorganizational Teams
Abstract:
Inclusion is important for meeting effectiveness, which is in turn central to organizational functioning. One way of improving inclusion in meetings is through feedback, but social dynamics make giving feedback difficult. We propose that AI agents can facilitate feedback exchange by being psychologically safer recipients, and we test this through a meeting system with an AI agent feedback mediator. When delivering feedback, the agent uses the Induced Hypocrisy Procedure, a social psychological technique that prompts behavior change by highlighting value-behavior inconsistencies. In a within-subjects lab study ($n=28$), the agent made speaking times more balanced and improved meeting quality. However, a field study at a small consulting firm ($n=10$) revealed organizational barriers that led to its use for personal reflection rather than feedback exchange. We contribute a novel sociotechnical system for feedback exchange in groups, and empirical findings demonstrating the importance of considering organizational barriers in designing AI tools for organizations.

Authors:Joar Sabel, Mattias Wingren, Andreas Lundell, Sören Andersson, Sara Rosenberg, Susanne Hägglund, Linda Estman, Malin Andtfolk
Title: Medication counseling with large language models: balancing flexibility and rigidity
Abstract:
The introduction of large language models (LLMs) has greatly enhanced the capabilities of software agents. Instead of relying on rule-based interactions, agents can now interact in flexible ways akin to humans. However, this flexibility quickly becomes a problem in fields where errors can be disastrous, such as in a pharmacy context, but the opposite also holds true; a system that is too inflexible will also lead to errors, as it can become too rigid to handle situations that are not accounted for. Work using LLMs in a pharmacy context have adopted a wide scope, accounting for many different medications in brief interactions -- our strategy is the opposite: focus on a more narrow and long task. This not only enables a greater understanding of the task at hand, but also provides insight into what challenges are present in an interaction of longer nature. The main challenge, however, remains the same for a narrow and wide system: it needs to strike a balance between adherence to conversational requirements and flexibility. In an effort to strike such a balance, we present a prototype system meant to provide medication counseling while juggling these two extremes. We also cover our design in constructing such a system, with a focus on methods aiming to fulfill conversation requirements, reduce hallucinations and promote high-quality responses. The methods used have the potential to increase the determinism of the system, while simultaneously not removing the dynamic conversational abilities granted by the usage of LLMs. However, a great deal of work remains ahead, and the development of this kind of system needs to involve continuous testing and a human-in-the-loop. It should also be evaluated outside of commonly used benchmarks for LLMs, as these do not adequately capture the complexities of this kind of conversational system.

Authors:Riju Marwah, Vishal Pallagani, Ritvik Garimella, Amit Sheth
Title: Chatsparent: An Interactive System for Detecting and Mitigating Cognitive Fatigue in LLMs
Abstract:
LLMs are increasingly being deployed as chatbots, but today's interfaces offer little to no friction: users interact through seamless conversations that conceal when the model is drifting, hallucinating or failing. This lack of transparency fosters blind trust, even as models produce unstable or repetitive outputs. We introduce an interactive demo that surfaces and mitigates cognitive fatigue, a failure mode where LLMs gradually lose coherence during auto-regressive generation. Our system, Chatsparent, instruments real-time, token-level signals of fatigue, including attention-to-prompt decay, embedding drift, and entropy collapse, and visualizes them as a unified fatigue index. When fatigue thresholds are crossed, the interface allows users to activate lightweight interventions such as attention resets, entropy-regularized decoding, and self-reflection checkpoints. The demo streams live text and fatigue signals, allowing users to observe when fatigue arises, how it affects output quality, and how interventions restore stability. By turning passive chatbot interaction into an interactive diagnostic experience, our system empowers users to better understand LLM behavior while improving reliability at inference time.

Authors:Junjie Wang, Gaole He, Alisa Rieger, Ujwal Gadiraju
Title: From SERPs to Sound: How Search Engine Result Pages and AI-generated Podcasts Interact to Influence User Attitudes on Controversial Topics
Abstract:
Compared to search engine result pages (SERPs), AI-generated podcasts represent a relatively new and relatively more passive modality of information consumption, delivering narratives in a naturally engaging format. As these two media increasingly converge in everyday information-seeking behavior, it is essential to explore how their interaction influences user attitudes, particularly in contexts involving controversial, value-laden, and often debated topics. Addressing this need, we aim to understand how information mediums of present-day SERPs and AI-generated podcasts interact to shape the opinions of users. To this end, through a controlled user study (N=483), we investigated user attitudinal effects of consuming information via SERPs and AI-generated podcasts, focusing on how the sequence and modality of exposure shape user opinions. A majority of users in our study corresponded to attitude change outcomes, and we found an effect of sequence on attitude change. Our results further revealed a role of viewpoint bias and the degree of topic controversiality in shaping attitude change, although we found no effect of individual moderators.

Authors:Jules Wulms, Wouter Meulemans, Bettina Speckmann
Title: Noisy Graph Patterns via Ordered Matrices
Abstract:
The high-level structure of a graph is a crucial ingredient for the analysis and visualization of relational data. However, discovering the salient graph patterns that form this structure is notoriously difficult for two reasons. (1) Finding important patterns, such as cliques and bicliques, is computationally hard. (2) Real-world graphs contain noise, and therefore do not always exhibit patterns in their pure form. Defining meaningful noisy patterns and detecting them efficiently is a currently unsolved challenge. In this paper, we propose to use well-ordered matrices as a tool to both define and effectively detect noisy patterns. Specifically, we represent a graph as its adjacency matrix and optimally order it using Moran's $I$. Standard graph patterns (cliques, bicliques, and stars) now translate to rectangular submatrices. Using Moran's $I$, we define a permitted level of noise for such patterns. A combination of exact algorithms and heuristics allows us to efficiently decompose the matrix into noisy patterns. We also introduce a novel motif simplification that visualizes noisy patterns while explicitly encoding the level of noise. We showcase our techniques on several real-world data sets.

Authors:Emelie Fälton, Isabelle Strömstedt, Mathis Brossier, Andreas Göransson, Konrad Schönborn, Amy Loutfi, Erik Sunden, Mujtaba Fadhil Jawad, Yadgar Suleiman, Johanna Björklund, Mario Romero, Anders Ynnerman, Lonni Besançon
Title: Children's Expectations, Engagement, and Evaluation of an LLM-enabled Spherical Visualization Platform in the Classroom
Abstract:
We present our first stage results from deploying an LLM-augmented visualization software in a classroom setting to engage primary school children with earth-related datasets. Motivated by the growing interest in conversational AI as a means to support inquiry-based learning, we investigate children's expectations, engagement, and evaluation of a spoken LLM interface with a shared, immersive visualization system in a formal educational context. Our system integrates a speech-capable large language model with an interactive spherical display. It enables children to ask natural-language questions and receive coordinated verbal explanations and visual responses through the LLM-augmented visualization updating in real time based on spoken queries. We report on a classroom study with Swedish children aged 9-10, combining structured observation and small-group discussions to capture expectations prior to interaction, interaction patterns during facilitated sessions, and children's reflections on their encounter afterward. Our results provide empirical insights into children's initial encounters with an LLM-enabled visualization platform within a classroom setting and their expectations, interactions, and evaluations of the system. These findings inform the technology's potential for educational use and highlight important directions for future research.

Authors:Stephen Pilli, Vivek Nallur
Title: Predicting Biased Human Decision-Making with Large Language Models in Conversational Settings
Abstract:
We examine whether large language models (LLMs) can predict biased decision-making in conversational settings, and whether their predictions capture not only human cognitive biases but also how those effects change under cognitive load. In a pre-registered study (N = 1,648), participants completed six classic decision-making tasks via a chatbot with dialogues of varying complexity. Participants exhibited two well-documented cognitive biases: the Framing Effect and the Status Quo Bias. Increased dialogue complexity resulted in participants reporting higher mental demand. This increase in cognitive load selectively, but significantly, increased the effect of the biases, demonstrating the load-bias interaction. We then evaluated whether LLMs (GPT-4, GPT-5, and open-source models) could predict individual decisions given demographic information and prior dialogue. While results were mixed across choice problems, LLM predictions that incorporated dialogue context were significantly more accurate in several key scenarios. Importantly, their predictions reproduced the same bias patterns and load-bias interactions observed in humans. Across all models tested, the GPT-4 family consistently aligned with human behavior, outperforming GPT-5 and open-source models in both predictive accuracy and fidelity to human-like bias patterns. These findings advance our understanding of LLMs as tools for simulating human decision-making and inform the design of conversational agents that adapt to user biases.

Authors:Marcel Gohsen, Nicola Libera, Johannes Kiesel, Jan Ehlers, Benno Stein
Title: Does Cognitive Load Affect Human Accuracy in Detecting Voice-Based Deepfakes?
Abstract:
Deepfake technologies are powerful tools that can be misused for malicious purposes such as spreading disinformation on social media. The effectiveness of such malicious applications depends on the ability of deepfakes to deceive their audience. Therefore, researchers have investigated human abilities to detect deepfakes in various studies. However, most of these studies were conducted with participants who focused exclusively on the detection task; hence the studies may not provide a complete picture of human abilities to detect deepfakes under realistic conditions: Social media users are exposed to cognitive load on the platform, which can impair their detection abilities. In this paper, we investigate the influence of cognitive load on human detection abilities of voice-based deepfakes in an empirical study with 30 participants. Our results suggest that low cognitive load does not generally impair detection abilities, and that the simultaneous exposure to a secondary stimulus can actually benefit people in the detection task.

Authors:Choro Ulan uulu, Mikhail Kulyabin, Katharina M Zeiner, Jan Joosten, Nuno Miguel Martins Pacheco, Filippos Petridis, Rebecca Johnson, Jan Bosch, Helena Holmström Olsson
Title: Tables or Sankey Diagrams? Investigating User Interaction with Different Representations of Simulation Parameters
Abstract:
Understanding complex parameter dependencies is critical for effective configuration and maintenance of software systems across diverse domains - from Computer-Aided Engineering (CAE) to cloud infrastructure and database management. However, legacy tabular interfaces create a major bottleneck: engineers cannot easily comprehend how parameters relate across the system, leading to inefficient workflows, costly configuration errors, and reduced system trust - a fundamental program comprehension challenge in configuration-intensive software. This research evaluates whether interactive Sankey diagrams can improve comprehension of parameter dependencies compared to traditional spreadsheet interfaces. We employed a heuristic evaluation using the PURE method with three expert evaluators (UX design, simulation, and software development specialists) to compare a Sankey-based prototype to traditional tabular representations for core engineering tasks. Our key contribution demonstrates that flow-based parameter visualizations significantly reduce cognitive load (51% lower PURE scores) and interaction complexity (56% fewer steps) compared to traditional tables, while making parameter dependencies immediately visible rather than requiring mental reconstruction. By explicitly visualizing parameter relationships, Sankey diagrams address a core software visualization challenge: helping users comprehend complex system configurations without requiring deep tool-specific knowledge. While demonstrated through CAE software, this research contributes to program comprehension and software visualization by showing that dependency-aware visualizations can significantly improve understanding of configuration-intensive systems. The findings have implications for any software domain where comprehending complex parameter relationships is essential for effective system use and maintenance.

Authors:Marie Luisa Fiedler, Christian Merz, Jonathan Tschanter, Carolin Wienrich, Marc Erich Latoschik
Title: Technological Advances in Two Generations of Consumer-Grade VR Systems: Effects on User Experience and Task Performance
Abstract:
Integrated VR (IVR) systems consist of a head-mounted display (HMD) and body-tracking capabilities. They enable users to translate their physical movements into corresponding avatar movements in real-time, allowing them to perceive their avatars via the displays. Consumer-grade IVR systems have been available for 10 years, significantly fostering VR research worldwide. However, the effects of even apparently significant technological advances of IVR systems on user experience and the overall validity of prior embodiment research using such systems often remain unclear. We ran a user-centered study comparing two comparable IVR generations: a nearly 10-year-old hardware (HTC Vive, 6-point tracking) and a modern counterpart (HTC Vive Pro 2, 6-point tracking). To ensure ecological validity, we evaluated the systems in their commercially available, as-is configurations. In a 2x5 mixed design, participants completed five tasks covering different use cases on either the old or new system. We assessed presence, sense of embodiment, appearance and behavior plausibility, workload, task performance, and gathered qualitative feedback. Results showed no significant system differences, with only small effect sizes. Bayesian analysis further supported the null hypothesis, suggesting that the investigated generational hardware improvements offer limited benefits for user experience and task performance. For the 10-year generational step examined here, excluding potential technological progress in the necessary software components, this supports the validity of conclusions from prior work and underscores the applicability of older configurations for research in embodied VR.

Authors:Haiyi Li, Yutong Li, Yiheng Chi, Alison Deslandes, Mathew Leonardi, Shay Freger, Yuan Zhang, Jodie Avery, M. Louise Hull, Hsiang-Ting Chen
Title: Who Fails Where? LLM and Human Error Patterns in Endometriosis Ultrasound Report Extraction
Abstract:
In this study, we evaluate a locally-deployed large-language model (LLM) to convert unstructured endometriosis transvaginal ultrasound (eTVUS) scan reports into structured data for imaging informatics workflows. Across 49 eTVUS reports, we compared three LLMs (7B/8B and a 20B-parameter model) against expert human extraction. The 20B model achieved a mean accuracy of 86.02%, substantially outperforming smaller models and confirming the importance of scale in handling complex clinical text. Crucially, we identified a highly complementary error profile: the LLM excelled at syntactic consistency (e.g., date/numeric formatting) where humans faltered, while human experts provided superior semantic and contextual interpretation. We also found that the LLM's semantic errors were fundamental limitations that could not be mitigated by simple prompt engineering. These findings strongly support a human-in-the-loop (HITL) workflow in which the on-premise LLM serves as a collaborative tool, not a full replacement. It automates routine structuring and flags potential human errors, enabling imaging specialists to focus on high-level semantic validation. We discuss implications for structured reporting and interactive AI systems in clinical practice.

Authors:Trevor De Clark, Yulia Bobkova, Ajay Kumar Shrestha
Title: Balancing Usability and Compliance in AI Smart Devices: A Privacy-by-Design Audit of Google Home, Alexa, and Siri
Abstract:
This paper investigates the privacy and usability of AI-enabled smart devices commonly used by youth, focusing on Google Home Mini, Amazon Alexa, and Apple Siri. While these devices provide convenience and efficiency, they also raise privacy and transparency concerns due to their always-listening design and complex data management processes. The study proposes and applies a combined framework of Heuristic Evaluation, Personal Information Protection and Electronic Documents Act (PIPEDA) Compliance Assessment, and Youth-Centered Usability Testing to assess whether these devices align with Privacy-by-Design principles and support meaningful user control. Results show that Google Home achieved the highest usability score, while Siri scored highest in regulatory compliance, indicating a trade-off between user convenience and privacy protection. Alexa demonstrated clearer task navigation but weaker transparency in data retention. Findings suggest that although youth may feel capable of managing their data, their privacy self-efficacy remains limited by technical design, complex settings, and unclear data policies. The paper concludes that enhancing transparency, embedding privacy guidance during onboarding, and improving policy alignment are critical steps toward ensuring that smart devices are both usable and compliant with privacy standards that protect young users.

Authors:Paul Kent, George De Ath, Martin Layton, Allen Hart, Richard Everson, Ben Carvell
Title: A Future Capabilities Agent for Tactical Air Traffic Control
Abstract:
Escalating air traffic demand is driving the adoption of automation to support air traffic controllers, but existing approaches face a trade-off between safety assurance and interpretability. Optimisation-based methods such as reinforcement learning offer strong performance but are difficult to verify and explain, while rules-based systems are transparent yet rarely check safety under uncertainty. This paper outlines Agent Mallard, a forward-planning, rules-based agent for tactical control in systemised airspace that embeds a stochastic digital twin directly into its conflict-resolution loop. Mallard operates on predefined GPS-guided routes, reducing continuous 4D vectoring to discrete choices over lanes and levels, and constructs hierarchical plans from an expert-informed library of deconfliction strategies. A depth-limited backtracking search uses causal attribution, topological plan splicing, and monotonic axis constraints to seek a complete safe plan for all aircraft, validating each candidate manoeuvre against uncertain execution scenarios (e.g., wind variation, pilot response, communication loss) before commitment. Preliminary walkthroughs with UK controllers and initial tests in the BluebirdDT airspace digital twin indicate that Mallard's behaviour aligns with expert reasoning and resolves conflicts in simplified scenarios. The architecture is intended to combine model-based safety assessment, interpretable decision logic, and tractable computational performance in future structured en-route environments.

Authors:Laura Aymerich-Franch, Tarek Taha, Hiroko Kamide, Takahiro Miyashita, Hiroshi Ishiguro, Paolo Dario
Title: Acceptance of cybernetic avatars for capability enhancement: a large-scale survey
Abstract:
Avatar embodiment experiences have the potential to enhance human capabilities by extending human senses, body, and mind. This study investigates social acceptance of robotic and virtual avatars as enablers of capability enhancement in six domains: identity exploration, well-being and behavioral transformation, expanded travel capabilities, expanded bodily and sensory abilities, cognitive augmentation, and immortality. We conducted a large-scale survey (n = 1001) in Dubai to explore acceptance of sixteen capability enhancement scenarios within these domains. The highest levels of agreement were observed for multilingual communication (77.5%) and learning capabilities (68.7%), followed by assisting individuals with reduced mobility (64.5%) and behavioral transformation (59.5%). Scenarios involving immortality through consciousness transfer received the least support (34.9%). These findings contribute to a deeper understanding of public attitudes toward avatar-based human enhancement and offer practical guidance for the responsible design, development, and integration of cybernetic avatars in the society, ensuring their societal acceptance and fostering a harmonious human-avatar coexistence.

Authors:Yuhan Liu, Shuyao Zhou, Jakob Kaiser, Ella Colby, Jennifer Okwara, Maggie Wang, Varun Nagaraj Rao, Andrés Monroy-Hernández
Title: How can LLMs Support Policy Researchers? Evaluating an LLM-Assisted Workflow for Large-Scale Unstructured Data
Abstract:
Policy researchers need scalable ways to surface public views, yet they often rely on interviews, listening sessions, and surveys-analyzed thematically-that are slow, expensive, and limited in scale and diversity. LLMs offer new possibilities for thematic analysis of unstructured text, yet we know little about how LLM-assisted workflows perform for policy research. Building on a workflow for LLM-assisted thematic analysis of online forums, we conduct a study with 11 policy researchers, who use an early prototype and see it as a quick, rough-and-ready input to their research. We then extend and scale the workflow to analyze millions of Reddit posts and 1,058 chatbot-led interview transcripts on a policy-relevant topic, treating these sources as rich and scalable data for policy discourse. We compare the synthesized themes to those from authoritative policy reports, identify points of alignment and divergence, and discuss what this implies for policy researchers adopting LLM-assisted workflows.

Authors:Xiaoyuan Zhu, Kimberly Le Truong, Riccardo Fogliato, Gokul Swamy, Weijian Zhang, Minglai Yang, Longtian Ye, Bangya Liu, Minghao Liu, Andrew Ilyas, Steven Wu
Title: Justified or Just Convincing? Error Verifiability as a Dimension of LLM Quality
Abstract:
As LLMs are deployed in high-stakes settings, users must judge the correctness of individual responses, often relying on model-generated justifications such as reasoning chains or explanations. Yet, no standard measure exists for whether these justifications help users distinguish correct answers from incorrect ones. We formalize this idea as error verifiability and propose $v_{\text{bal}}$, a balanced metric that measures whether justifications enable raters to accurately assess answer correctness, validated against human raters who show high agreement. We find that neither common approaches, such as post-training and model scaling, nor more targeted interventions recommended improve verifiability. We introduce two methods that succeed at improving verifiability: reflect-and-rephrase (RR) for mathematical reasoning and oracle-rephrase (OR) for factual QA, both of which improve verifiability by incorporating domain-appropriate external information. Together, our results establish error verifiability as a distinct dimension of response quality that does not emerge from accuracy improvements alone and requires dedicated, domain-aware methods to address.

Authors:Donghoon Shin, Bingcan Guo, Jaewook Lee, Lucy Lu Wang, Gary Hsieh
Title: ReFinE: Streamlining UI Mockup Iteration with Research Findings
Abstract:
Although HCI research papers offer valuable design insights, designers often struggle to apply them in design workflows due to difficulties in finding relevant literature, understanding technical jargon, the lack of contextualization, and limited actionability. To address these challenges, we present ReFinE, a Figma plugin that supports real-time design iteration by surfacing contextualized insights from research papers. ReFinE identifies and synthesizes design implications from HCI literature relevant to the mockup's design context, and tailors this research evidence to a specific design mockup by providing actionable visual guidance on how to update the mockup. To assess the system's effectiveness, we conducted a technical evaluation and a user study. Results show that ReFinE effectively synthesizes and contextualizes design implications, reducing cognitive load and improving designers' ability to integrate research evidence into UI mockups. This work contributes to bridging the gap between research and design practice by presenting a tool for embedding scholarly insights into the UI design process.

Authors:Xiaoan Liu, DaeHo Lee, Eric J Gonzalez, Mar Gonzalez-Franco, Ryo Suzuki
Title: VisionClaw: Always-On AI Agents through Smart Glasses
Abstract:
We present VisionClaw, an always-on wearable AI agent that integrates live egocentric perception with agentic task execution. Running on Meta Ray-Ban smart glasses, VisionClaw continuously perceives real-world context and enables in-situ, speech-driven action initiation and delegation via OpenClaw AI agents. Therefore, users can directly execute tasks through the smart glasses, such as adding real-world objects to an Amazon cart, generating notes from physical documents, receiving meeting briefings on the go, creating events from posters, or controlling IoT devices. We evaluate VisionClaw through a controlled laboratory study (N=12) and a longitudinal deployment study (N=5). Results show that integrating perception and execution enables faster task completion and reduces interaction overhead compared to non-always-on and non-agent baselines. Beyond performance gains, deployment findings reveal a shift in interaction: tasks are initiated opportunistically during ongoing activities, and execution is increasingly delegated rather than manually controlled. These results suggest a new paradigm for wearable AI agents, where perception and action are continuously coupled to support situated, hands-free interaction.

Authors:Jonathan Albert Cohen, Kye Shimizu, Allen Song, Vishnu Bharath, Kent Larson, Pattie Maes
Title: Do Robots Need Body Language? Comparing Communication Modalities for Legible Motion Intent in Human-Shared Spaces
Abstract:
Robots in shared spaces often move in ways that are difficult for people to interpret, placing the burden on humans to adapt. High-DoF robots exhibit motion that people read as expressive, intentionally or not, making it important to understand how such cues are perceived. We present an online video study evaluating how different signaling modalities, expressive motion, lights, text, and audio, shape people's ability to understand a quadruped robot's upcoming navigation actions (Boston Dynamics Spot). Across four common scenarios, we measure how each modality influences humans' (1) accuracy in predicting the robot's next navigation action, (2) confidence in that prediction, and (3) trust in the robot to act safely. The study tests how expressive motions compare to explicit channels, whether aligned multimodal cues enhance interpretability, and how conflicting cues affect user confidence and trust. We contribute initial evidence on the relative effectiveness of implicit versus explicit signaling strategies.

Authors:Chathuri Jayaweera, Bonnie J. Dorr
Title: BLADE: Better Language Answers through Dialogue and Explanations
Abstract:
Large language model (LLM)-based educational assistants often provide direct answers that short-circuit learning by reducing exploration, self-explanation, and engagement with course materials. We present BLADE (Better Language Answers through Dialogue and Explanations), a grounded conversational assistant that guides learners to relevant instructional resources rather than supplying immediate solutions. BLADE uses a retrieval-augmented generation (RAG) framework over curated course content, dynamically surfacing pedagogically relevant excerpts in response to student queries. Instead of delivering final answers, BLADE prompts direct engagement with source materials to support conceptual understanding. We conduct an impact study in an undergraduate computer science course, with different course resource configurations and show that BLADE improves students' navigation of course resources and conceptual performance compared to simply providing the full inventory of course resources. These results demonstrate the potential of grounded conversational AI to reinforce active learning and evidence-based reasoning.

Authors:Xin Sun, Shu Wei, Ting Pan, Yajing Wang, Jos A. Bosch, Isao Echizen, Abdallah El Ali, Saku Sugawara
Title: Eyes Can't Always Tell: Fusing Eye Tracking and User Priors for User Modeling under AI Advice Conditions
Abstract:
Modeling users' cognitive states (e.g., cognitive load and decision confidence) is essential for building adaptive AI in high-stakes decision-making. While eye tracking provides non-invasive behavioral signals correlated with cognitive effort, prior work has not systematically examined how AI assistance contexts, specifically varying advice reliability and user heterogeneity, can alter the mapping between gaze signals and cognitive states. We conducted a within-subject lab eye-tracking study (N=54) on factual verification tasks under three conditions: No-AI, Correct-AI advice, and Incorrect-AI advice. We analyze condition-dependent changes in self-reports and eye-tracking patterns and evaluate the robustness of eye-tracking-based user modeling. Results show that AI advice increases decision confidence compared to No-AI, while Correct-AI is associated with lower perceived cognitive load and more efficient gaze behavior. Crucially, predictive modeling is context-sensitive: the relationship between eye-tracking signals and cognitive states shifts across AI conditions. Finally, fusing eye-tracking features with user priors (demographics, AI literacy/experience, and propensity to trust technology) improves cross-participant generalization. These findings support condition-aware and personalized user modeling for cognitively aligned adaptive AI systems.

Authors:Beleicia Bullock, James A. Landay, Michael S. Bernstein
Title: Comparing Design Metaphors and User-Driven Metaphors for Interaction Design
Abstract:
Metaphors enable designers to communicate their ideal user experience for platforms. Yet, we often do not know if these design metaphors match users' actual experiences. In this work, we compare design and user metaphors across three different platforms: ChatGPT, Twitter, and YouTube. We build on prior methods to elicit 554 user metaphors, as well as ratings on how well each metaphor describes users' experiences. We then identify 21 design metaphors by analyzing each platform's historical web presence since their launch date. We find that design metaphors often do not match the metaphors that users use to describe their experiences. Even when design and user metaphors do match, the metaphors do not always resonate universally. Through these findings, we highlight how comparing design and user metaphors can help to evaluate and refine metaphors for user experience.

Authors:Lucy Jiang, Amy Seunghyun Lee, Jon E. Froehlich, Leah Findlater
Title: Unseen City Canvases: Exploring Blind and Low Vision People's Perspectives on Urban and Public Art Accessibility
Abstract:
Public art can hold cultural, social, political, and aesthetic significance, enriching urban environments and promoting well-being. However, a majority of urban art is inaccessible to blind and low vision (BLV) people. Most art access research has focused on private and curated settings (e.g., museums, galleries) and most urban access work has centered on outdoor navigation, leaving urban and public art accessibility largely understudied. We conducted semi-structured interviews with 16 BLV participants, using design probes featuring AI-generated descriptions and real-time AI interactions to investigate preferences for both discovering and engaging with urban art. We found that BLV people valued spontaneous art exploration, multisensory (e.g., tactile, auditory, olfactory) engagement, and detailed descriptions of culturally significant artwork. Participants also highlighted challenges distinct to urban art contexts: safety took precedence over art exploration, multisensory access measures could be disruptive to others in the public space, and inaccurate AI descriptions could lead to cultural erasure. Our contributions include empirical insights on BLV preferences for urban art discovery and engagement, seven design dimensions for public art access solutions, and implications for expanding HCI urban accessibility research beyond navigation.

Authors:Zeya Chen, Zach Pino, Ruth Schmidt
Title: Framing Data Choices: How Pre-Donation Exploration Designs Influence Data Donation Behavior and Decision-Making
Abstract:
Data donation, an emerging user-centric data collection method for public sector research, faces a gap between participant willingness and actual donation. This suggests a design absence in practice: while promoted as "donor-centered" with technical and regulational advances, a design perspective on how data choices are presented and intervene on individual behaviors remain underexplored. In this paper, we focus on pre-donation data exploration, a key stage for adequately and meaningful informed participation. Through a real-world data donation study (N=24), we evaluated three data exploration interventions (self-focused, social comparison, collective-only). Findings show choice framing impacts donation participation. The "social comparison" design (87.5%) outperformed the "self-focused view" (62.5%) while a "collective-only" frame (37.5%) backfired, causing "perspective confusion" and privacy concerns. This study demonstrates how strategic data framing addresses data donation as a behavioral challenge, revealing design's critical yet underexplored role in data donation for participatory public sector innovation.

Authors:Duosi Dai, Pavithren V S Pakianathan, Gunnar Treff, Mahdi Sareban, Jan David Smeddinck, Sanna Kuoppamäki
Title: Exploring Self-Tracking Practices of Older Adults with CVD to Inform the Design of LLM-Enabled Health Data Sensemaking
Abstract:
Wearables and mobile health applications are increasingly adopted for self-management of chronic illnesses; yet the data feels overwhelming for older adults with cardiovascular disease (CVD). This study explores how they make sense of self-tracked data and identifies design opportunities for Large Language Model (LLM)-enabled support. We conducted a seven-day diary study and follow-up interviews with eight CVD patients aged 64-82. We identified six themes: navigating emotional complexity, owning health narratives, prioritizing bodily sensations, selective engagement with health metrics, negotiating socio-technical dynamics of sharing, and cautious optimism toward AI. Findings highlight that self-tracking is affective, interpretive, and socially situated. We outline design directions for LLM-enabled data sensemaking systems: supporting emotional engagement, reinforcing patient agency, acknowledging embodied experiences, and prompting dialogue in clinical and social contexts. To support safety, expert-in-the-loop mechanisms are essential. These directions articulate how LLMs can help translate data into narratives and carry implications for human-data interaction and behavior-change support.

Authors:Yasamin Borhani, Taylor Mordan, Yihan Wang, Reyhaneh Hosseininejad, Javad Khoramdel, Alexandre Alahi
Title: PoseDriver: A Unified Approach to Multi-Category Skeleton Detection for Autonomous Driving
Abstract:
Object skeletons offer a concise representation of structural information, capturing essential aspects of posture and orientation that are crucial for autonomous driving applications. However, a unified architecture that simultaneously handles multiple instances and categories using only the input image remains elusive. In this paper, we introduce PoseDriver, a unified framework for bottom-up multi-category skeleton detection tailored to common objects in driving scenarios. We model each category as a distinct task to systematically address the challenges of multi-task learning. Specifically, we propose a novel approach for lane detection based on skeleton representations, achieving state-of-the-art performance on the OpenLane dataset. Moreover, we present a new dataset for bicycle skeleton detection and assess the transferability of our framework to novel categories. Experimental results validate the effectiveness of the proposed approach.

Authors:Abed K. Musaffar, Ambuj Singh, Francesco Bullo
Title: Detection of adversarial intent in Human-AI teams using LLMs
Abstract:
Large language models (LLMs) are increasingly deployed in human-AI teams as support agents for complex tasks such as information retrieval, programming, and decision-making assistance. While these agents' autonomy and contextual knowledge enables them to be useful, it also exposes them to a broad range of attacks, including data poisoning, prompt injection, and even prompt engineering. Through these attack vectors, malicious actors can manipulate an LLM agent to provide harmful information, potentially manipulating human agents to make harmful decisions. While prior work has focused on LLMs as attack targets or adversarial actors, this paper studies their potential role as defensive supervisors within mixed human-AI teams. Using a dataset consisting of multi-party conversations and decisions for a real human-AI team over a 25 round horizon, we formulate the problem of malicious behavior detection from interaction traces. We find that LLMs are capable of identifying malicious behavior in real-time, and without task-specific information, indicating the potential for task-agnostic defense. Moreover, we find that the malicious behavior of interest is not easily identified using simple heuristics, further suggesting the introduction of LLM defenders could render human teams more robust to certain classes of attack.

Authors:Manuel Scheibl, Julian Leichert, Sinem Görmez, Britta Wrede
Title: Sense4HRI: A ROS 2 HRI Framework for Physiological Sensor Integration and Synchronized Logging
Abstract:
Physiological signals are increasingly relevant to estimate the mental states of users in human-robot interaction (HRI), yet ROS 2-based HRI frameworks still lack reusable support to integrate such data streams in a standardized way. Therefore, we propose Sense4HRI, an adapted framework for human-robot interaction in ROS 2 that integrates physiological measurements and derived user-state indicators. The framework is designed to be extensible, allowing the integration of additional physiological sensors, their interpretation, and multimodal fusion to provide a robust assessment of the mental states of users. In addition, it introduces reusable interfaces for timestamped physiological time-series data and supports synchronized logging of physiological signals together with experiment context, enabling interoperable and traceable multimodal analysis within ROS 2-based HRI systems.

Authors:Shiwei Wu, Xinyue Chen, Yuheng Liu, Xingbo Wang, Qingyu Guo, Longfei Chen, Chuhan Shi, Zhenhui Peng
Title: ConSearcher: Supporting Conversational Information Seeking in Online Communities with Member Personas
Abstract:
Many people browse online communities to learn from others' experiences and opinions, e.g., for constructing travel plans. Conversational search powered by large language models (LLMs) could ease this information-seeking task, but it remains under-investigated within the online community. In this paper, we first conducted an exploratory study (N=10) that indicated the helpfulness of a classic conversational search tool and identified room for improvement. Then, we proposed ConSearcher, an LLM-powered tool with dynamically generated member personas based on user queries to facilitate conversational search in the community. In ConSearcher, users can clarify their interests by checking what a simulated member similar to them may ask and get responses from diverse members' perspectives. A within-subjects study (N=27) showed that compared to two conversational search baselines, ConSearcher led to significantly higher information-seeking outcome and user engagement but raised concerns about over-personalization. We discuss implications for supporting conversational information seeking in online communities.

Authors:Binyan Xu, Wei Wu, Soonhyeon Kweon, Casper Harteveld, Leanne Chukoskie
Title: It Depends: Re_Authoring Play Through Clinical Reasoning in Wearable AR Rehab Games
Abstract:
Augmented reality games hold promise for rehabilitation, yet most remain confined to laboratory studies with limited clinical uptake. Recent advances in spatial computing, especially lightweight, glasses_form_factor AR, create a timely opportunity to embed rehabilitative play into clinical practice and daily contexts. To investigate this potential, we systematically reviewed 132 applications and conducted playtesting with 14 licensed physical therapists. Our analysis revealed three ways therapists re_authored AR games: co_authored play (reshaping movements, progressions, and difficulty), situated play (adapting across specialties, conditions, and contexts), and dual play (mediating both physical recovery and psychological support). We reframe therapists' frequent phrase_It depends_as a generative design principle. This study contributes a clinical reasoning_based framework and design principles and guidelines for creating personalized, situated forms of play that align with therapists' everyday workflows and inform future lab_to_clinic translation.

Authors:Neil Fernandes, Cheng Tang, Tehniyat Shahbaz, Alex Hauschildt, Emily Davies-Robinson, Yue Hu, Kerstin Dautenhahn
Title: "You've got a friend in me": Co-Designing a Peer Social Robot for Young Newcomers' Language and Cultural Learning
Abstract:
Community literacy programs supporting young newcomer children in Canada face limited staffing and scarce one-to-one time, which constrains personalized English and cultural learning support. This paper reports on a co-design study with United for Literacy tutors that informed Maple, a table-top, peer-like Socially Assistive Robot (SAR) designed as a practice partner within tutor-mediated sessions. From shadowing and co-design interviews, we derived newcomer-specific requirements and added them in an integrated prototype that uses short story-based activities, multi-modal scaffolding (speech, facial feedback, gesture), and embedded quizzes that support attention while producing tutor-actionable formative signals. We contribute system design implications for tutor-in-the-loop SARs supporting language socialization in community settings and outline directions for child-centered evaluation in authentic programs.

Authors:Mehran Shabanpour, Sadaf Khademi, Konstantinos N Plataniotis, Arash Mohammadi
Title: DECODE: Dual-Enhanced Conditioned Diffusion for EEG Forecasting
Abstract:
Forecasting Electroncephalography (EEG) signals during cognitive events remains a fundamental challenge in neuroscience and Brain-Computer Interfaces (BCIs), as existing methods struggle to capture both the stochastic nature of neural dynamics and the semantic context of behavioral tasks. We present the Dual-Enhanced COnditioned Diffusion (DECODE) for EEG, a novel framework that unifies semantic guidance from natural language descriptions with temporal dynamics from historical signals to generate event-specific neural responses. DECODE leverages pre-trained language models to condition the diffusion process on rich textual descriptions of cognitive events, while maintaining temporal coherence through history-based Langevin dynamics. Evaluated on a real-world driving task dataset with five distinct behaviors, DECODE achieves sub-microvolt prediction accuracy (MAE = 0.626 microvolt) over 75 timestep horizons while maintaining well-calibrated uncertainty estimates. Our framework demonstrates that natural language can effectively bridge high-level cognitive descriptions and low-level neural dynamics, opening new possibilities for zero-shot generalization to novel behaviors and interpretable BCIs. By generating physiologically plausible, event-specific EEG trajectories conditioned on semantic descriptions, DECODE establishes a new paradigm for understanding and predicting context-dependent neural activity.

Authors:Nadine Jost, Benjamin Berens, Manuel Karl, Stefan Albert Horstmann, Martin Johns, Alena Naiakshina
Title: The Impact of AI-Assisted Development on Software Security: A Study of Gemini and Developer Experience
Abstract:
The ongoing shortage of skilled developers, particularly in security-critical software development, has led organizations to increasingly adopt AI-powered development tools to boost productivity and reduce reliance on limited human expertise. These tools, often based on large language models, aim to automate routine tasks and make secure software development more accessible and efficient. However, it remains unclear how developers' general programming and security-specific experience, and the type of AI tool used (free vs. paid) affect the security of the resulting software. Therefore, we conducted a quantitative programming study with software developers (n=159) exploring the impact of Google's AI tool Gemini on code security. Participants were assigned a security-related programming task using either no AI tools, the free version, or the paid version of Gemini. While we did not observe significant differences between using Gemini in terms of secure software development, programming experience significantly improved code security and cannot be fully substituted by Gemini.

Authors:Shaojun Cai, Nuwan Janaka, Ashwin Ram, Janidu Shehan, Yingjia Wan, Kotaro Hara, David Hsu
Title: Navigation beyond Wayfinding: Robots Collaborating with Visually Impaired Users for Environmental Interactions
Abstract:
Robotic guidance systems have shown promise in supporting blind and visually impaired (BVI) individuals with wayfinding and obstacle avoidance. However, most existing systems assume a clear path and do not support a critical aspect of navigation - environmental interactions that require manipulating objects to enable movement. These interactions are challenging for a human-robot pair because they demand (i) precise localization and manipulation of interaction targets (e.g., pressing elevator buttons) and (ii) dynamic coordination between the user's and robot's movements (e.g., pulling out a chair to sit). We present a collaborative human-robot approach that combines our robotic guide dog's precise sensing and localization capabilities with the user's ability to perform physical manipulation. The system alternates between two modes: lead mode, where the robot detects and guides the user to the target, and adaptation mode, where the robot adjusts its motion as the user interacts with the environment (e.g., opening a door). Evaluation results show that our system enables navigation that is safer, smoother, and more efficient than both a traditional white cane and a non-adaptive guiding system, with the performance gap widening as tasks demand higher precision in locating interaction targets. These findings highlight the promise of human-robot collaboration in advancing assistive technologies toward more generalizable and realistic navigation support.

Authors:Ebrahim Feghhi, Junlin Hu, Nima Hadidi, Jonathan C. Kao
Title: LightBeam: An Accurate and Memory-Efficient CTC Decoder for Speech Neuroprostheses
Abstract:
A promising pathway for restoring communication in patients with dysarthria and anarthria is speech neuroprostheses, which directly decode speech from cortical neural activity. Two benchmarks, Brain-to-Text '24 and '25, released intracranial recordings from patients with dysarthria along with a baseline algorithm trained with Connectionist Temporal Classification (CTC). Despite significant innovation on these benchmarks, all leading published prior work relies on a WFST-based CTC decoder that requires ${\sim}$320 GB of RAM. These memory requirements limit accessibility for both patients and researchers. Here, we propose LightBeam, a non-WFST based CTC decoder that requires only ${\sim}$10 GB of RAM and achieves state-of-the-art performance on both benchmarks. LightBeam achieves this by integrating an LLM into the beam-search process via delayed fusion, obviating the prior need for using a large N-gram LM. LightBeam is implemented in Python and is open-source.

Authors:Wei Wu, Binyan Xu, Soonhyeon Kweon, Yujie Wang, Leanne Chukoskie, Casper Harteveld
Title: Reimagining Wearable AR Gesture Design: Physical Therapy Reasoning in Everyday Contexts
Abstract:
Lightweight augmented reality (AR) glasses are increasingly entering everyday use, extending interaction design beyond short, isolated sessions. However, most existing gesture vocabularies are inherited from VR headsets or early AR goggles. These systems tend to prioritize recognizer accuracy while overlooking fatigue, sustainability, and social legibility in daily contexts. To address this gap, we collaborated with physical therapists (PTs) to reimagine gesture design for everyday AR, drawing on their expertise in safe and sustainable movement. Through a review of 104 AR applications, we identified 15 common gesture intents and implemented an on-device gesture generator. Ten licensed physical therapists, with an average of 14.8 years of professional experience, then shaped these gesture intents through three iterative stages: unaided gesture performance, PT-guided gesture substitution, and stage-aware card sorting. This work contributes (1) a PT-informed gesture translation method, (2) the Everyday-AR Golden Ergonomic Canvas, and (3) a stage-aware social legibility framework that illustrates how gesture suitability shifts with social readability. Together, these contributions provide a recognizer-agnostic reference framework for designing sustainable and socially coherent gesture vocabularies for lightweight AR glasses.

Authors:Carlo Dindorf, Jonas Dully, Rebecca Keilhauer, Michael Lorenz, Michael Fröhlich
Title: Evaluating Large Language Models for Gait Classification Using Text-Encoded Kinematic Waveforms
Abstract:
Background: Machine learning (ML) enhances gait analysis but often lacks the level of interpretability desired for clinical adoption. Large Language Models (LLMs) may offer explanatory capabilities and confidence-aware outputs when applied to structured kinematic data. This study therefore evaluated whether general-purpose LLMs can classify continuous gait kinematics when represented as textual numeric sequences and how their performance compares to conventional ML approaches. Methods: Lower-body kinematics were recorded from 20 participants performing seven gait patterns. A supervised KNN classifier and a class-independent One-Class SVM (OCSVM) were compared against zero-shot LLMs (GPT-5, GPT-5-mini, GPT-4.1, and o4-mini). Models were evaluated using Leave-One-Subject-Out (LOSO) cross-validation. LLMs were tested both with and without explicit reference gait statistics. Results: The supervised KNN achieved the highest performance (multiclass Matthews Correlation Coefficient, MCC = 0.88). The best-performing LLM (GPT-5) with reference grounding achieved a multiclass MCC of 0.70 and a binary MCC of 0.68, outperforming the class-independent OCSVM (binary MCC = 0.60). Performance of the LLM was highly dependent on explicit reference information and self-rated confidence; when restricted to high-confidence predictions, multiclass MCC increased to 0.83 on the filtered subset. Notably, the computationally efficient o4-mini model performed comparably to larger models. Conclusion: When continuous kinematic waveforms were encoded as textual numeric tokens, general-purpose LLMs, even with reference grounding, did not match supervised multiclass classifiers for precise gait classification and are better regarded as exploratory systems requiring cautious, human-guided interpretation rather than diagnostic use.

Authors:Mathias N. Lystbæk, Haley Adams, Ranjith Kagathi Ananda, Eric J Gonzalez, Luca Ballan, Qiuxuan Wu, Andrea Colaço, Peter Tan, Mar Gonzalez-Franco
Title: Navig-AI-tion: Navigation by Contextual AI and Spatial Audio
Abstract:
Audio-only walking navigation can leave users disoriented, relying on vague cardinal directions and lacking real-time environmental context, leading to frequent errors. To address this, we present a novel system that integrates a Vision Language Model (VLM) with a spatial audio cue. Our system extracts environmental landmarks to anchor navigation instructions and, crucially, provides a directional spatial audio signal when the user faces the wrong direction, indicating the precise turn direction. In a user study (n=12), the spatial audio cue with VLM reduced route deviations compared to both VLM-only and Google Maps (audio-only) baseline systems. Users reported that the spatial audio cue effectively supported orientation and that landmark-anchored instructions provided a better navigation experience over audio-only Google Maps. This work serves as an initial look at the utility of future audio-only navigation systems for incorporating directional cues, especially real-time corrective spatial audio.

Authors:Mei Tan, Lena Phalen, Dorottya Demszky
Title: Marked Pedagogies: Examining Linguistic Biases in Personalized Automated Writing Feedback
Abstract:
Effective personalized feedback is critical to students' literacy development. Though LLM-powered tools now promise to automate such feedback at scale, LLMs are not language-neutral: they privilege standard academic English and reproduce social stereotypes, raising concerns about how "personalization" shapes the feedback students receive. We examine how four widely used LLMs (GPT-4o, GPT-3.5-turbo, Llama-3.3 70B, Llama-3.1 8B) adapt written feedback in response to student attributes. Using 600 eighth-grade persuasive essays from the PERSUADE dataset, we generated feedback under prompt conditions embedding gender, race/ethnicity, learning needs, achievement, and motivation. We analyze lexical shifts across model outputs by adapting the Marked Words framework. Our results reveal systematic, stereotype-aligned shifts in feedback conditioned on presumed student attributes--even when essay content was identical. Feedback for students marked by race, language, or disability often exhibited positive feedback bias and feedback withholding bias--overuse of praise, less substantive critique, and assumptions of limited ability. Across attributes, models tailored not only what content was emphasized but also how writing was judged and how students were addressed. We term these instructional orientations Marked Pedagogies and highlight the need for transparency and accountability in automated feedback tools.

Authors:Philipp Spitzer, Joshua Holstein
Title: Believing vs. Achieving -- The Disconnect between Efficacy Beliefs and Collaborative Outcomes
Abstract:
As artificial intelligence (AI) becomes increasingly integrated into workflows, humans must decide when to rely on AI advice. These decisions depend on general efficacy beliefs, i.e., humans' confidence in their own abilities and their perceptions of AI competence. While prior work has examined factors influencing AI reliance, the role of efficacy beliefs in shaping collaboration remains underexplored. Through a controlled experiment (N=240) where participants made repeated delegation decisions, we investigate how efficacy beliefs translate into instance-wise efficacy judgments under varying contextual information. Our explorative findings reveal efficacy beliefs as persistent cognitive anchors, leading to systematic "AI optimism". Contextual information operates asymmetrically: while AI performance information selectively eliminates the AI optimism bias, data or AI information amplify how efficacy discrepancies influence delegation decisions. Although efficacy discrepancies influence delegation behavior, they show weaker effects on human-AI team performance. As these findings challenge transparency-focused approaches, we propose design guidelines for effective collaborative settings.

Authors:Xingyu Bruce Liu, Mira Dontcheva, Dingzeyu Li
Title: A Text-Native Interface for Generative Video Authoring
Abstract:
Everyone can write their stories in freeform text format -- it's something we all learn in school. Yet storytelling via video requires one to learn specialized and complicated tools. In this paper, we introduce Doki, a text-native interface for generative video authoring, aligning video creation with the natural process of text writing. In Doki, writing text is the primary interaction: within a single document, users define assets, structure scenes, create shots, refine edits, and add audio. We articulate the design principles of this text-first approach and demonstrate Doki's capabilities through a series of examples. To evaluate its real-world use, we conducted a week-long deployment study with participants of varying expertise in video authoring. This work contributes a fundamental shift in generative video interfaces, demonstrating a powerful and accessible new way to craft visual stories.

Authors:Niharika Mathur, Hasibur Rahman, Smit Desai
Title: "Who wants to be nagged by AI?": Investigating the Effects of Agreeableness on Older Adults' Perception of LLM-Based Voice Assistants' Explanations
Abstract:
LLM-based voice assistants (VAs) increasingly support older adults aging in place, yet how an assistant's agreeableness shapes explanation perception remains underexplored. We conducted a study(N=70) examining how VA agreeableness influences older adults' perceptions of explanations across routine and emergency home scenarios. High-agreeableness assistants were perceived as more trustworthy, empathetic, and likable, but these benefits diminished in emergencies where clarity outweighed warmth. Agreeableness did not affect perceived intelligence, suggesting social tone and competence are separable dimensions. Real-time environmental explanations outperformed history-based ones, and agreeable older adults penalized low-agreeableness assistants more strongly. These findings show the need to move beyond a one-size-fits-all approach to AI explainability, while balancing personality, context, and audience.

Authors:Satheeshkumar Veeramani, Anna Kisil, Abigail Bentley, Hatem Fakhruldeen, Gabriella Pizzuto, Andrew I. Cooper
Title: Human-Aware Robot Behaviour in Self-Driving Labs
Abstract:
Self-driving laboratories (SDLs) are rapidly transforming research in chemistry and materials science to accelerate new discoveries. Mobile robot chemists (MRCs) play a pivotal role by autonomously navigating the lab to transport samples, effectively connecting synthesis, analysis, and characterisation equipment. The instruments within an SDL are typically designed or retrofitted to be accessed by both human and robotic chemists, ensuring operational flexibility and integration between manual and automated workflows. In many scenarios, human and robotic chemists may need to use the same equipment simultaneously. Currently, MRCs rely on simple LiDAR-based obstruction detection, which forces the robot to passively wait if a human is present. This lack of situational awareness leads to unnecessary delays and inefficient coordination in time-critical automated workflows in human-robot shared labs. To address this, we present an initial study of an embodied, AI-driven perception method that facilitates proactive human-robot interaction in shared-access scenarios. Our method features a hierarchical human intention prediction model that allows the robot to distinguish between preparatory actions (waiting) and transient interactions (accessing the instrument). Our results demonstrate that the proposed approach enhances efficiency by enabling proactive human-robot interaction, streamlining coordination, and potentially increasing the efficiency of autonomous scientific labs.

Authors:Niharika Mathur, Hasibur Rahman, Smit Desai
Title: The Differential Effects of Agreeableness and Extraversion on Older Adults' Perceptions of Conversational AI Explanations in Assistive Settings
Abstract:
Large Language Model-based Voice Assistants (LLM-VAs) are increasingly deployed in assistive settings for older adults, yet little is known about how an agent's personality shapes user perceptions of its explanations. This paper presents a mixed factorial experiment (N=140) examining how agreeableness and extraversion in an LLM-VA ("Robin") influence older adults' perceptions across seven measures: empathy, likeability, trust, reliance, satisfaction, intention to adopt, and perceived intelligence. Results reveal that high agreeableness drove stronger empathy perceptions, while low agreeableness consistently penalized likeability. Importantly, perceived intelligence remained unaffected by personality, suggesting that personality shapes sociability without altering competence perceptions. Real-time environmental explanations outperformed conversational history explanations on five measures, with advantages concentrated in emergency contexts. Notably, highly agreeable participants were especially critical of low-agreeableness agents, revealing a user-agent personality congruence effect. These findings offer design implications for personality-aware, context-sensitive LLM-VAs in assistive settings.

Authors:Xin Sun, Shu Wei, Jos A Bosch, Isao Echizen, Saku Sugawara, Abdallah El Ali
Title: Seeing the Reasoning: How LLM Rationales Influence User Trust and Decision-Making in Factual Verification Tasks
Abstract:
Large Language Models (LLMs) increasingly show reasoning rationales alongside their answers, turning "reasoning" into a user-interface element. While step-by-step rationales are typically associated with model performance, how they influence users' trust and decision-making in factual verification tasks remains unclear. We ran an online study (N=68) manipulating three properties of LLM reasoning rationales: presentation format (instant vs. delayed vs. on-demand), correctness (correct vs. incorrect), and certainty framing (none vs. certain vs. uncertain). We found that correct rationales and certainty cues increased trust, decision confidence, and AI advice adoption, whereas uncertainty cues reduced them. Presentation format did not have a significant effect, suggesting users were less sensitive to how reasoning was revealed than to its reliability. Participants indicated they use rationales to primarily audit outputs and calibrate trust, where they expected rationales in stepwise, adaptive forms with certainty indicators. Our work shows that user-facing rationales, if poorly designed, can both support decision-making yet miscalibrate trust.

Authors:Matthew Brehmer, Maxime Cordeil, Christophe Hurter, Takayuki Itoh, Wolfgang Büschel, Mahmood Jasim, Arnaud Prouzeau, David Saffo, Lyn Bartram, Sheelagh Carpendale, Chen Zhu-Tian, Andrew Cunningham, Tim Dwyer, Samuel Huron, Masahiko Itoh, Alark Joshi, Kiyoshi Kiyokawa, Hideaki Kuzuoka, Bongshin Lee, Gabriela Molina León, Harald Reiterer, Bektur Ryskeldiev, Jonathan Schwabish, Brian A. Smith, Yasuyuki Sumi, Ryo Suzuki, Anthony Tang, Yalong Yang, Jian Zhao
Title: Challenges in Synchronous & Remote Collaboration Around Visualization
Abstract:
We characterize 16 challenges faced by those investigating and developing remote and synchronous collaborative experiences around visualization. Our work reflects the perspectives and prior research efforts of an international group of 29 experts from across human-computer interaction and visualization sub-communities. The challenges are anchored around five collaborative activities that exhibit a centrality of visualization and multimodal communication. These activities include exploratory data analysis, creative ideation, visualization-rich presentations, joint decision making grounded in data, and real-time data monitoring. The challenges also reflect the changing dynamics of these activities in the face of recent advances in extended reality (XR) and artificial intelligence (AI). As an organizing scheme for future research at the intersection of visualization and computer-supported cooperative work, we align the challenges with a sequence of four sets of research and development activities: technological choices, social factors, AI assistance, and evaluation.

Authors:Jean-Daniel Fekete, Yifan Hu, Dominik Moritz, Arnab Nandi, Senjuti Basu Roy, Eugene Wu, Nikos Bikakis, George Papastefanatos, Panos K. Chrysanthis, Guoliang Li, Lingyun Yu
Title: Human-Data Interaction, Exploration, and Visualization in the AI Era: Challenges and Opportunities
Abstract:
The rapid advancement of AI is transforming human-centered systems, with profound implications for human-AI interaction, human-data interaction, and visual analytics. In the AI era, data analysis increasingly involves large-scale, heterogeneous, and multimodal data that is predominantly unstructured, as well as foundation models such as LLMs and VLMs, which introduce additional uncertainty into analytical processes. These shifts expose persistent challenges for human-data interactive systems, including perceptually misaligned latency, scalability constraints, limitations of existing interaction and exploration paradigms, and growing uncertainty regarding the reliability and interpretability of AI-generated insights. Responding to these challenges requires moving beyond conventional efficiency and scalability metrics, redefining the roles of humans and machines in analytical workflows, and incorporating cognitive, perceptual, and design principles into every level of the human-data interaction stack. This paper investigates the challenges introduced by recent advances in AI and examines how these developments are reshaping the ways users engage with data, while outlining limitations and open research directions for building human-centered AI systems for interactive data analysis in the AI era.

Authors:Laura Spillner, Rachel Ringe, Robert Porzel, Rainer Malaka
Title: Not All Trust is the Same: Effects of Decision Workflow and Explanations in Human-AI Decision Making
Abstract:
A central challenge in AI-assisted decision making is achieving warranted, well-calibrated trust. Both overtrust (accepting incorrect AI recommendations) and undertrust (rejecting correct advice) should be prevented. Prior studies differ in the design of the decision workflow - whether users see the AI suggestion immediately (1-step setup) or have to submit a first decision beforehand (2-step setup) -, and in how trust is measured - through self-reports or as behavioral trust, that is, reliance. We examined the effects and interactions of (a) the type of decision workflow, (b) the presence of explanations, and (c) users' domain knowledge and prior AI experience. We compared reported trust, reliance (agreement rate and switch rate), and overreliance. Results showed no evidence that a 2-step setup reduces overreliance. The decision workflow also did not directly affect self-reported trust, but there was a crossover interaction effect with domain knowledge and explanations, suggesting that the effects of explanations alone may not generalize across workflow setups. Finally, our findings confirm that reported trust and reliance behavior are distinct constructs that should be evaluated separately in AI-assisted decision making.

Authors:Harry H. Jiang, Jordan Taylor, William Agnew
Title: How Professional Visual Artists are Negotiating Generative AI in the Workplace
Abstract:
Generative AI has been heavily critiqued by artists in both popular media and HCI scholarship. However, more work is needed to understand the impacts of generative AI on professional artists' workplaces and careers. In this paper, we conduct a survey of \textit{378 verified professional visual artists} about how generative AI has impacted their careers and workplaces. We find (1) most visual artists are strongly opposed to using generative AI (text or visual) and negotiate their inclusion in the workplace through a variety of \textit{refusal} strategies (2) there exist a range of factors in artists environments shaping their use of generative AI, including pressure from clients, bosses, and peers and (3) visual artists report overwhelmingly negative impacts of generative AI on their workplaces, leading to added stress and reduced job opportunities. In light of these findings, we encourage HCI researchers to contend more deeply with artists' desires not to use generative AI in the workplace.

Authors:Chenyang Zhang, Tianjian Wei, Haoyang Yang, Mar Gonzalez-Franco, Yalong Yang, Eric J Gonzalez
Title: Break the Window: Exploring Spatial Decomposition of Webpages in XR
Abstract:
Most XR web browsers still present webpages as a single floating window, carrying over desktop design assumptions into immersive space. We explore an alternative by breaking the browser window and distributing a webpage into spatial UI chunks within a mixed-reality workspace. We present Break-the-Window (BTW), an exploratory prototype that spatially decomposes live, fully functional webpages into movable panels supporting mid-air and surface-attached placement, as well as direct touch and ray-based interaction. Through a formative study with XR practitioners and an exploratory qualitative study with 15 participants, we observed how spatial decomposition supports distributed attention and spatial meaning-making, while also surfacing challenges around coordination effort, interaction precision, and the lack of shared spatial UI conventions. This work invites discussion on how web interfaces might be reimagined for spatial computing beyond the single-window paradigm.

Authors:Julia Kieserman, Cat Mai, Sara Lignell, Lucy Qin, Athanasios Andreou, Damon McCoy, Rosanna Bellini
Title: Caught in a Mafia Romance: How Users Explore Intimate Roleplay and Narrative Exploration with Chatbots
Abstract:
AI chatbots, built using large language models, are increasingly integrated into society and mimic the patterns of human text exchanges. While previous research has raised concerns that humans may form romantic attachment to chatbots, the range of AI-mediated interactions that people wish to create for themselves or others with chatbots remains poorly understood, particularly given the fast evolving landscape of chatbots. We provide an empirical study of Character.AI (cAI), a popular chatbot platform that enables users to design and share character-based bots, and synthesize this with an analysis of Reddit posts from cAI users. Contrary to popular narratives, we identify that users want to: (1) engage in intimate role-play with young adult, masculine-presenting characters that place users in a position of inferior power in well-defined scenarios and (2) immerse themselves in boundless, fantasy settings. We further find that users problematize both the excessive and insufficient sexualized content in such interactions which warrants novel digital-safety features.

Authors:Achmad Ardani Prasha, Clavino Ourizqi Rachmadi, Sabrina Laila Mutiara, Hilman Syachr Ramadhan, Chareyl Reinalyta Borneo, Saruni Dwiasnati
Title: Dynamic Spatio-Temporal Graph Neural Network for Early Detection of Pornography Addiction in Adolescents Based on Electroencephalogram Signals
Abstract:
Adolescent pornography addiction requires early detection based on objective neurobiological biomarkers because self-report is prone to subjective bias due to social stigma. Conventional machine learning has not been able to model dynamic functional connectivity of the brain that fluctuates temporally during addictive stimulus exposure. This study proposes a state-of-the-art Dynamic Spatio-Temporal Graph Neural Network (DST-GNN) that integrates Phase Lag Index (PLI)-based Graph Attention Network (GAT) for spatial modeling and Bidirectional Gated Recurrent Unit (BiGRU) for temporal dynamics. The dataset consists of 14 adolescents (7 addicted, 7 healthy) with 19-channel EEG across 9 experimental conditions. Leave-One-Subject-Out Cross Validation (LOSO-CV) evaluation shows F1-Score of 71.00%$\pm$12.10% and recall of 85.71%, a 104% improvement compared to baseline. Ablation study confirms temporal contribution of 21% and PLI graph construction of 57%. Frontal-central regions (Fz, Cz, C3, C4) are identified as dominant biomarkers with Beta contribution of 58.9% and Hjorth of 31.2%, while Cz-T7 connectivity is consistent as a trait-level biomarker for objective screening.

Authors:Mohammad Amin Samadi, Nia Nixon
Title: Personalities at Play: Probing Alignment in AI Teammates
Abstract:
Collaborative problem solving and learning are shaped by who or what is on the team. As large language models (LLMs) increasingly function as collaborators rather than tools, a key question is whether AI teammates can be aligned to express personality in predictable ways that matter for interaction and learning. We investigate AI personality alignment through a three-lens evaluation framework spanning self-perception (standardized self-report), behavioral expression (team dialogue), and reflective expression (memory construction). We first administered the Big Five Inventory (BFI-44) to LLM-based teammates across four providers (GPT-4o, Claude-3.7 Sonnet, Gemini-2.5 Pro, Grok-3), 32 high/low trait configurations, and multiple prompting strategies. LLMs produced sharply differentiated Big Five profiles, but prompt semantic richness added little beyond simple trait assignment, while provider differences and baseline "default" personalities were substantial. Role framing also mattered: several models refused the assessment without context, yet complied when framed as a collaborative teammate. We then simulated AI participation in authentic team transcripts using high-trait personas and analyzed both generated utterances and structured long-term memories with LIWC-22. Personality signals in conversation were generally subtle and most detectable for Extraversion, whereas memory representations amplified trait-specific signals, especially for Neuroticism, Conscientiousness, and Agreeableness; Openness remained difficult to elicit robustly. Together, results suggest that AI personality is measurable but multi-layered and context-dependent, and that evaluating personality-aligned AI teammates requires attention to memory and system-level design, not conversation-only behavior.

Authors:Ruanqianqian Huang, Brian Hempel, Yining Cao, James D. Hollan, Haijun Xia, Sorin Lerner
Title: Tidynote: Always-Clear Notebook Authoring
Abstract:
Recent work identified clarity as one of the top quality attributes that notebook users value, but notebooks lack support for maintaining clarity throughout the exploratory phases of the notebook authoring workflow. We propose always-clear notebook authoring that supports both clarity and exploration, and present a Jupyter implementation called Tidynote. The key to Tidynote is three-fold: (1) a scratchpad sidebar to facilitate exploration, (2) cells movable between the notebook and the scratchpad to maintain organization, and (3) linear execution with state forks to clarify program state. An exploratory study (N=13) of open-ended data analysis tasks shows that Tidynote features holistically promote clarity throughout a notebook's lifecycle, support realistic notebook tasks, and enable novel strategies for notebook clarity. These results suggest that Tidynote supports maintaining clarity throughout the entirety of notebook authoring.

Authors:Albert Tang, Yifan Mo, Jie Li, Yue Su, Mengyuan Zhang, Sander L. Koole, Koen Hindriks, Jiahuan Pei
Title: NeuroWise: A Multi-Agent LLM "Glass-Box" System for Practicing Double-Empathy Communication with Autistic Partners
Abstract:
The double empathy problem frames communication difficulties between neurodivergent and neurotypical individuals as arising from mutual misunderstanding, yet most interventions focus on autistic individuals. We present NeuroWise, a multi-agent LLM-based coaching system that supports neurotypical users through stress visualization, interpretation of internal experiences, and contextual guidance. In a between-subjects study (N=30), NeuroWise was rated as helpful by all participants and showed a significant condition-time effect on deficit-based attributions (p=0.02): NeuroWise users reduced deficit framing, while baseline users shifted toward blaming autistic "deficits" after difficult interactions. NeuroWise users also completed conversations more efficiently (37% fewer turns, p=0.03). These findings suggest that AI-based interpretation can support attributional change by helping users recognize communication challenges as mutual.

Authors:Adriana Alvarado Garcia, Ruyuan Wan, Ozioma C. Oguine, Karla Badillo-Urquiola
Title: Red Teaming LLMs as Socio-Technical Practice: From Exploration and Data Creation to Evaluation
Abstract:
Recently, red teaming, with roots in security, has become a key evaluative approach to ensure the safety and reliability of Generative Artificial Intelligence. However, most existing work emphasizes technical benchmarks and attack success rates, leaving the socio-technical practices of how red teaming datasets are defined, created, and evaluated under-examined. Drawing on 22 interviews with practitioners who design and evaluate red teaming datasets, we examine the data practices and standards that underpin this work. Because adversarial datasets determine the scope and accuracy of model evaluations, they are critical artifacts for assessing potential harms from large language models. Our contributions are first, empirical evidence of practitioners conceptualizing red teaming and developing and evaluating red teaming datasets. Second, we reflect on how practitioners' conceptualization of risk leads to overlooking the context, interaction type, and user specificity. We conclude with three opportunities for HCI researchers to expand the conceptualization and data practices for red-teaming.

Authors:Nusrat Jahan Lia, Shubhashis Roy Dipta
Title: Auditing Reciprocal Sentiment Alignment: Inversion Risk, Dialect Representation and Intent Misalignment in Transformers
Abstract:
The core theme of bidirectional alignment is ensuring that AI systems accurately understand human intent and that humans can trust AI behavior. However, this loop fractures significantly across language barriers. Our research addresses Cross-Lingual Sentiment Misalignment between Bengali and English by benchmarking four transformer architectures. We reveal severe safety and representational failures in current alignment paradigms. We demonstrate that compressed model (mDistilBERT) exhibits 28.7% "Sentiment Inversion Rate," fundamentally misinterpreting positive user intent as negative (or vice versa). Furthermore, we identify systemic nuances affecting human-AI trust, including "Asymmetric Empathy" where some models systematically dampen and others amplify the affective weight of Bengali text relative to its English counterpart. Finally, we reveal a "Modern Bias" in the regional model (IndicBERT), which shows a 57% increase in alignment error when processing formal (Sadhu) Bengali. We argue that equitable human-AI co-evolution requires pluralistic, culturally grounded alignment that respects language and dialectal diversity over universal compression, which fails to preserve the emotional fidelity required for reciprocal human-AI trust. We recommend that alignment benchmarks incorporate "Affective Stability" metrics that explicitly penalize polarity inversions in low-resource and dialectal contexts.

Authors:Seoyoung Lee, Seobin Yoon, Seongbeen Lee, Yoojung Chun, Dayoung Park, Doyeon Kim, Joo Yong Sim
Title: IntentCUA: Learning Intent-level Representations for Skill Abstraction and Multi-Agent Planning in Computer-Use Agents
Abstract:
Computer-use agents operate over long horizons under noisy perception, multi-window contexts, evolving environment states. Existing approaches, from RL-based planners to trajectory retrieval, often drift from user intent and repeatedly solve routine subproblems, leading to error accumulation and inefficiency. We present IntentCUA, a multi-agent computer-use framework designed to stabilize long-horizon execution through intent-aligned plan memory. A Planner, Plan-Optimizer, and Critic coordinate over shared memory that abstracts raw interaction traces into multi-view intent representations and reusable skills. At runtime, intent prototypes retrieve subgroup-aligned skills and inject them into partial plans, reducing redundant re-planning and mitigating error propagation across desktop applications. In end-to-end evaluations, IntentCUA achieved a 74.83% task success rate with a Step Efficiency Ratio of 0.91, outperforming RL-based and trajectory-centric baselines. Ablations show that multi-view intent abstraction and shared plan memory jointly improve execution stability, with the cooperative multi-agent loop providing the largest gains on long-horizon tasks. These results highlight that system-level intent abstraction and memory-grounded coordination are key to reliable and efficient desktop automation in large, dynamic environments.

Authors:Xinyi Lu, Kexin Phyllis Ju, Mitchell Dudley, Larissa Sano, Xu Wang
Title: AI-Mediated Feedback Improves Student Revisions: A Randomized Trial with FeedbackWriter in a Large Undergraduate Course
Abstract:
Despite growing interest in using LLMs to generate feedback on students' writing, little is known about how students respond to AI-mediated versus human-provided feedback. We address this gap through a randomized controlled trial in a large introductory economics course (N=354), where we introduce and deploy FeedbackWriter - a system that generates AI suggestions to teaching assistants (TAs) while they provide feedback on students' knowledge-intensive essays. TAs have the full capacity to adopt, edit, or dismiss the suggestions. Students were randomly assigned to receive either handwritten feedback from TAs (baseline) or AI-mediated feedback where TAs received suggestions from FeedbackWriter. Students revise their drafts based on the feedback, which is further graded. In total, 1,366 essays were graded using the system. We found that students receiving AI-mediated feedback produced significantly higher-quality revisions, with gains increasing as TAs adopted more AI suggestions. TAs found the AI suggestions useful for spotting gaps and clarifying rubrics.

Authors:Aruna Sankaranarayanan, Amir Zur, Atticus Geiger, Dylan Hadfield-Menell
Title: Surgical Activation Steering via Generative Causal Mediation
Abstract:
Where should we intervene in a language model (LM) to control behaviors that are diffused across many tokens of a long-form response? We introduce Generative Causal Mediation (GCM), a procedure for selecting model components, e.g., attention heads, to steer a binary concept (e.g., talk in verse vs. talk in prose) from contrastive long-form responses. In GCM, we first construct a dataset of contrasting inputs and responses. Then, we quantify how individual model components mediate the contrastive concept and select the strongest mediators for steering. We evaluate GCM on three tasks--refusal, sycophancy, and style transfer--across three language models. GCM successfully localizes concepts expressed in long-form responses and consistently outperforms correlational probe-based baselines when steering with a sparse set of attention heads. Together, these results demonstrate that GCM provides an effective approach for localizing and controlling the long-form responses of LMs.

Authors:Johannes Kirmayr, Raphael Wennmacher, Khanh Huynh, Lukas Stappen, Elisabeth André, Florian Alt
Title: "What Are You Doing?": Effects of Intermediate Feedback from Agentic LLM In-Car Assistants During Multi-Step Processing
Abstract:
Agentic AI assistants that autonomously perform multi-step tasks raise open questions for user experience: how should such systems communicate progress and reasoning during extended operations, especially in attention-critical contexts such as driving? We investigate feedback timing and verbosity from agentic LLM-based in-car assistants through a controlled, mixed-methods study (N=45) comparing planned steps and intermediate results feedback against silent operation with final-only response. Using a dual-task paradigm with an in-car voice assistant, we found that intermediate feedback significantly improved perceived speed, trust, and user experience while reducing task load - effects that held across varying task complexities and interaction contexts. Interviews further revealed user preferences for an adaptive approach: high initial transparency to establish trust, followed by progressively reducing verbosity as systems prove reliable, with adjustments based on task stakes and situational context. We translate our empirical findings into design implications for feedback timing and verbosity in agentic assistants, balancing transparency and efficiency.

Authors:Ankit Bhattarai, Hannah Selder, Florian Fischer, Arthur Fleig, Per Ola Kristensson
Title: MyoInteract: A Framework for Fast Prototyping of Biomechanical HCI Tasks using Reinforcement Learning
Abstract:
Reinforcement learning (RL)-based biomechanical simulations have the potential to revolutionise HCI research and interaction design, but currently lack usability and interpretability. Using the Human Action Cycle as a design lens, we identify key limitations of biomechanical RL frameworks and develop MyoInteract, a novel framework for fast prototyping of biomechanical HCI tasks. MyoInteract allows designers to setup tasks, user models, and training parameters from an easy-to-use GUI within minutes. It trains and evaluates muscle-actuated simulated users within minutes, reducing training times by up to 98%. A workshop study with 12 interaction designers revealed that MyoInteract allowed novices in biomechanical RL to successfully setup, train, and assess goal-directed user movements within a single session. By transforming biomechanical RL from a days-long expert task into an accessible hour-long workflow, this work significantly lowers barriers to entry and accelerates iteration cycles in HCI biomechanics research.

Authors:Shiwei Hong, Lingyao Li, Ethan Z. Rong, Chenxinran Shen, Zhicong Lu
Title: Multi-Agent Comedy Club: Investigating Community Discussion Effects on LLM Humor Generation
Abstract:
Prior work has explored multi-turn interaction and feedback for LLM writing, but evaluations still largely center on prompts and localized feedback, leaving persistent public reception in online communities underexamined. We test whether broadcast community discussion improves stand-up comedy writing in a controlled multi-agent sandbox: in the discussion condition, critic and audience threads are recorded, filtered, stored as social memory, and later retrieved to condition subsequent generations, whereas the baseline omits discussion. Across 50 rounds (250 paired monologues) judged by five expert annotators using A/B preference and a 15-item rubric, discussion wins 75.6% of instances and improves Craft/Clarity (Δ = 0.440) and Social Response (Δ = 0.422), with occasional increases in aggressive humor.

Authors:Kengo Tanaka, Xiyue Wang, Hironobu Takagi, Yoichi Ochiai, Chieko Asakawa
Title: Touching Movement: 3D Tactile Poses for Supporting Blind People in Learning Body Movements
Abstract:
Visual impairments create barriers to learning physical activities, since conventional training methods rely on visual demonstrations or often inadequate verbal descriptions. This research explores 3D-printed human body models to enhance movement comprehension for blind individuals. Through a participatory design approach in collaboration with a blind designer, we developed detailed 3D models representing various body movements and incorporated tactile reference elements to enhance spatial understanding. We conducted two user studies with 10 blind participants across different activities: static yoga poses and sequential calisthenic movements. The results demonstrated that 3D models significantly improved understanding speed, reduced questions for clarification, and enhanced movement accuracy compared to conventional teaching methods. Participants consistently rated 3D models higher for ease of understanding, effectiveness, and motivation.

Authors:Guozheng Li, Ao Wang, Shaoxiang Wang, Yu Zhang, Pengcheng Cao, Yang Bai, Chi Harold Liu
Title: DALL: Data Labeling via Data Programming and Active Learning Enhanced by Large Language Models
Abstract:
Deep learning models for natural language processing rely heavily on high-quality labeled datasets. However, existing labeling approaches often struggle to balance label quality with labeling cost. To address this challenge, we propose DALL, a text labeling framework that integrates data programming, active learning, and large language models. DALL introduces a structured specification that allows users and large language models to define labeling functions via configuration, rather than code. Active learning identifies informative instances for review, and the large language model analyzes these instances to help users correct labels and to refine or suggest labeling functions. We implement DALL as an interactive labeling system for text labeling tasks. Comparative, ablation, and usability studies demonstrate DALL's efficiency, the effectiveness of its modules, and its usability.

Authors:Johanna Olesk, Ozioma C. Oguine, Mariana Fernandez Espinosa, Alexis B. Peirce Caudell, Karla Badillo-Urquiola
Title: A System of Care, Not Control: Co-Designing Online Safety and Wellbeing Solutions with Guardians ad Litem for Youth in Child Welfare
Abstract:
Current online safety technologies overly rely on parental mediation and often fail to address the unique challenges faced by youth in the Child Welfare System (CWS). These youth depend on a complex ecosystem of support, including families, caseworkers, and advocates, to safeguard their wellbeing. Within this network, Guardians ad Litem (GALs) play a unique role as court-appointed advocates tasked with ensuring the best interests of youth. Yet little is known about how GALs perceive and support youths' online safety. To address this gap, we conducted a two-part workshop with 10 GALs to explore their perspectives on online safety and collaboratively envision technology-based solutions tailored to the needs of youth in the CWS. Our findings revealed that GALs struggle to support youth with online safety challenges due to limited digital literacy, inconsistency of institutional support, lack of collaboration among stakeholders, and complexity of family dynamics. While GALs recognized the need for some oversight of youth online activities, they emphasized designing systems that support online safety beyond control or restriction by fostering stability, trust, and meaningful interactions, both online and offline. GALs emphasized the importance of developing tools that enable ongoing communication, therapeutic support, and coordination across stakeholders. Proposed design concepts focused on strengthening youth agency and cross-stakeholder collaboration through virtual avatars and mobile apps. This work provides actionable design concepts for strengthening relationships and communication across care network. It also redefines traditional approaches to online safety, advocating for a holistic, multi-stakeholder online safety paradigm for youth in the CWS.

Authors:Ricardo E. Gonzalez Penuela, Crescentia Jung, Sharon Y Lin, Ruiying Hu, Shiri Azenkot
Title: How Multimodal Large Language Models Support Access to Visual Information: A Diary Study With Blind and Low Vision People
Abstract:
Multimodal large language models (MLLMs) are changing how Blind and Low Vision (BLV) people access visual information. Unlike traditional visual interpretation tools that only provide descriptions, MLLM-enabled applications offer conversational assistance, where users can ask questions to obtain goal-relevant details. However, evidence about their performance in the real-world and implications for BLV people's daily lives remains limited. To address this, we conducted a two-week diary study, where we captured 20 BLV participants' use of an MLLM-enabled visual interpretation application. Although participants rated the visual interpretations of the application as "trustworthy" (mean=3.76 out of 5, max=extremely trustworthy) and "somewhat satisfying" (mean=4.13 out of 5, max=very satisfying), the AI often produced incorrect answers (22.2%) or abstained (10.8%) from responding to users' requests. Our findings show that while MLLMs can improve visual interpretations' descriptive accuracy, supporting everyday use also depends on the "visual assistant" skill: behaviors for providing goal-directed, reliable assistance. We conclude by proposing the "visual assistant" skill and guidelines to help MLLM-enabled visual interpretation applications better support BLV people's access to visual information.

Authors:Wei Wei, Foroozan Daneshzand, Zezhong Wang, Erica Mattson, Charles Perin, Sheelagh Carpendale
Title: The Fuzzy Front Ends: Reflections on the Never-Ending Story of Visualization Co-Design
Abstract:
Co-design is an increasingly popular approach in HCI and visualization, yet there is little guidance on how to effectively apply this method in visualization contexts. In this paper, we visually present our experience of a two-and-a-half-year co-design project with the local arts community. Focusing on facilitating community exploration and sense-making around arts funding distribution, the project involved a series of co-design sessions between visualization researchers and members of the arts community. Through these iterative sessions, we built shared understanding and developed visualization prototypes tailored to community needs. However, the practice is far from complete, and we found ourselves continually returning to the "fuzzy front end" of the co-design process. We share this ongoing story through comic-style visuals and reflect on three fuzzy front ends that we encountered during the project. By sharing these experiences with the visualization community, we hope to offer insights that others can draw on in their own community-engaged co-design work.

Authors:Ayato Kitadai, Takumi Ito, Yumiko Nagoh, Hiroki Takahashi, Masanori Fujita, Sangjic Lee, Fumiaki Miyahara, Tetsu Natsume, Nariaki Nishino
Title: Decision Support System for Technology Opportunity Discovery: An Application of the Schwartz Theory of Basic Values
Abstract:
Discovering technology opportunities (TOD) remains a critical challenge for innovation management, especially in early-stage development where consumer needs are often unclear. Existing methods frequently fail to systematically incorporate end-user perspectives, resulting in a misalignment between technological potentials and market relevance. This study proposes a novel decision support framework that bridges this gap by linking technological feasibility with fundamental human values. The framework integrates two distinct lenses: the engineering-based Technology Readiness Levels (TRL) and Schwartz's theory of basic human values. By combining these, the approach enables a structured exploration of how emerging technologies may satisfy diverse user motivations. To illustrate the framework's feasibility and insight potential, we conducted exploratory workshops with general consumers and internal experts at Sony Computer Science Laboratories, Inc., analyzing four real-world technologies (two commercial successes and two failures). Two consistent patterns emerged: (1) internal experts identified a wider value landscape than consumers (vision gap), and (2) successful technologies exhibited a broader range of associated human values (value breadth), suggesting strategic foresight may underpin market success. This study contributes both a practical tool for early-stage R\&D decision-making and a theoretical link between value theory and innovation outcomes. While exploratory in scope, the findings highlight the promise of value-centric evaluation as a foundation for more human-centered technology opportunity discovery.

Authors:Ryota Takamido, Chiharu Suzuki, Hiroki Nakamoto
Title: Data-driven modelling of low-dimensional dynamical structures underlying complex full-body human movement
Abstract:
One of the central challenges in the study of human motor control and learning is the degrees-of-freedom problem. Although the dynamical systems approach (DSA) has provided valuable insights into addressing this issue, its application has largely been confined to cyclic or simplified motor movements. To overcome this limitation, the present study employs neural ordinary differential equations (NODEs) to model the time evolution of non-cyclic full-body movements as a low-dimensional latent dynamical system. Given the temporal complexity full-body kinematic chains, baseball pitching was selected as a representative target movement to examine whether DSA could be extended to more complex, ecologically valid human movements. Results of the verification experiment demonstrated that the time evolution of a complex pitching motion could be accurately predicted (R^2 > 0.45) using the NODE-based dynamical model. Notably, approximately 50% of the variance in the latter half of the pitching motion was explained using only the initial ~8% of the temporal sequence, underscoring how subsequent movement evolves from initial conditions according to ODE-defined dynamics in latent space. These findings indicate the potential to extend the DSA to more complex and ecologically valid forms of human movement.

Authors:Zihao Zhu, Junnan Yu, Yuhan Luo
Title: Scaffolding Metacognition with GenAI: Exploring Design Opportunities to Support Task Management for University Students with ADHD
Abstract:
For university students transitioning to an independent and flexible lifestyle, having ADHD poses multiple challenges to their academic task management, which are closely tied to their metacognitive struggles--difficulties in awareness and regulation of one's own thinking processes. The recently surged Generative AI shows promise to mitigate these gaps with its advanced information understanding and generation capabilities. As an exploratory step, we conducted co-design sessions with 20 university students diagnosed with ADHD, followed by interviews with five experts specialized in ADHD intervention. Adopting a metacognitive lens, we examined participants' ideas on GenAI-based task management support and experts' assessments, which led to three design directions: providing cognitive scaffolding to enhance task and self-awareness, promoting reflective task execution for building metacognitive abilities, and facilitating emotional regulation to sustain task engagement. Drawing on these findings, we discuss opportunities for GenAI to support the metacognitive needs of neurodivergent populations, offering future directions for both research and practice.

Authors:Arran Zeyu Wang, David Borland, Estella Calcaterra, David Gotz
Title: Contextualization or Rationalization? The Effect of Causal Priors on Data Visualization Interpretation
Abstract:
Understanding how individuals interpret charts is a crucial concern for visual data communication. This imperative has motivated a number of studies, including past work demonstrating that causal priors -- a priori beliefs about causal relationships between concepts -- can have significant influences on the perceived strength of variable relationships inferred from visualizations. This paper builds on these previous results, demonstrating that causal priors can also influence the types of patterns that people perceive as the most salient within ambiguous scatterplots that have roughly equal evidence for trend and cluster patterns. Using a mixed-design approach that combines a large-scale online experiment for breadth of findings with an in-person think-aloud study for analytical depth, we investigated how users' interpretations are influenced by the interplay between causal priors and the visualized data patterns. Our analysis suggests two archetypal reasoning behaviors through which people often make their observations: contextualization, in which users accept a visual pattern that aligns with causal priors and use their existing knowledge to enrich interpretation, and rationalization, in which users encounter a pattern that conflicts with causal priors and attempt to explain away the discrepancy by invoking external factors, such as positing confounding variables or data selection bias. These findings provide initial evidence highlighting the critical role of causal priors in shaping high-level visualization comprehension, and introduce a vocabulary for describing how users reason about data that either confirms or challenges prior beliefs of causality.

Authors:Manusha Karunathilaka, Litian Lei, Yiming Gao, Yong Wang, Jiannan Li
Title: Compendia: Automated Visual Storytelling Generation from Online Article Collection
Abstract:
In the digital age, readers value quantitative journalism that is clear, concise, analytical, and human-centred. To understand complex topics, they often piece together scattered facts from multiple articles. Visual storytelling can transform fragmented information into clear, engaging narratives, yet its use with unstructured online articles remains largely unexplored. To fill this gap, we present Compendia, an automated system that analyzes online articles in response to a user's query and generates a coherent data story tailored to the user's informational needs. Compendia addresses key challenges of storytelling from unstructured text through two modules covering: Online Article Retrieval, which gathers relevant articles; Data Fact Extraction, which identifies, validates, and refines quantitative facts; Fact Organization, which clusters and merges related facts into coherent thematic groups; and Visual Storytelling, which transforms the organized facts into narratives with visualizations in an interactive scrollytelling interface. We evaluated Compendia through a quantitative analysis, confirming the accuracy in fact extraction and organization, and through two user studies with 16 participants, demonstrating its usability, effectiveness, and ability to produce engaging visual stories for open-ended queries.

Authors:Ying Liu, Si Zuo, Chao Yang, Yuqing Song, Dariush Salami, Stephan Sigg
Title: ImmCOGNITO: Identity Obfuscation in Millimeter-Wave Radar-Based Gesture Recognition for IoT Environments
Abstract:
Millimeter-Wave (mmWave) radar enables camera-free gesture recognition for Internet of Things (IoT) interfaces, with robustness to lighting variations and partial occlusions. However, recent studies reveal that its data can inadvertently encode biometric signatures, raising critical privacy challenges for IoT applications. In particular, we demonstrate that mmWave radar point cloud data can leak identity-related information in the absence of explicit identity labels. To address this risk, we propose {ImmCOGNITO}, a graph-based autoencoder that transforms radar gesture point clouds to preserve gesture-relevant structure while suppressing identity cues. The encoder first constructs a directed graph for each sequence using Temporal Graph KNN. Edges are defined to capture inter-frame temporal dynamics. A message-passing neural network with multi-head self-attention then aggregates local and global spatio-temporal context, and the global max-pooled feature is concatenated with the original features. The decoder then reconstructs a minimally perturbed point cloud that retains gesture discriminative attributes while achieving de-identification. Training jointly optimizes reconstruction, gesture-preservation, and de-identification objectives. Evaluations on two public datasets, PantoRad and MHomeGes, show that ImmCOGNITO substantially reduces identification accuracy while maintaining high gesture recognition performance.

Authors:Andreea Tulbure, Carmen Scheidemann, Elias Steiner, Marco Hutter
Title: Task-Oriented Robot-Human Handovers on Legged Manipulators
Abstract:
Task-oriented handovers (TOH) are fundamental to effective human-robot collaboration, requiring robots to present objects in a way that supports the human's intended post-handover use. Existing approaches are typically based on object- or task-specific affordances, but their ability to generalize to novel scenarios is limited. To address this gap, we present AFT-Handover, a framework that integrates large language model (LLM)-driven affordance reasoning with efficient texture-based affordance transfer to achieve zero-shot, generalizable TOH. Given a novel object-task pair, the method retrieves a proxy exemplar from a database, establishes part-level correspondences via LLM reasoning, and texturizes affordances for feature-based point cloud transfer. We evaluate AFT-Handover across diverse task-object pairs, showing improved handover success rates and stronger generalization compared to baselines. In a comparative user study, our framework is significantly preferred over the current state-of-the-art, effectively reducing human regrasping before tool use. Finally, we demonstrate TOH on legged manipulators, highlighting the potential of our framework for real-world robot-human handovers.

Authors:Pavithren V S Pakianathan, Rania Islambouli, Diogo Branco, Albrecht Schmidt, Tiago Guerreiro, Jan David Smeddinck
Title: Exploring AI-Augmented Sensemaking of Patient-Generated Health Data: A Mixed-Method Study with Healthcare Professionals in Cardiac Risk Reduction
Abstract:
Individuals are increasingly generating substantial personal health and lifestyle data, e.g. through wearables and smartphones. While such data could transform preventative care, its integration into clinical practice is hindered by its scale, heterogeneity and the time pressure and data literacy of healthcare professionals (HCPs). We explore how large language models (LLMs) can support sensemaking of patient-generated health data (PGHD) with automated summaries and natural language data exploration. Using cardiovascular disease (CVD) risk reduction as a use case, 16 HCPs reviewed multimodal PGHD in a mixed-methods study with a prototype that integrated common charts, LLM-generated summaries, and a conversational interface. Findings show that AI summaries provided quick overviews that anchored exploration, while conversational interaction supported flexible analysis and bridged data-literacy gaps. However, HCPs raised concerns about transparency, privacy, and overreliance. We contribute empirical insights and sociotechnical design implications for integrating AI-driven summarization and conversation into clinical workflows to support PGHD sensemaking.

Authors:Kashif Imteyaz, Michael Muller, Claudia Flores-Saviaga, Saiph Savage
Title: Co-Designing Collaborative Generative AI Tools for Freelancers
Abstract:
Most generative AI tools prioritize individual productivity and personalization, with limited support for collaboration. Designed for traditional workplaces, these tools do not fit freelancers' short-term teams or lack of shared institutional support, which can worsen their isolation and overlook freelancing platform dynamics. This mismatch means that, instead of empowering freelancers, current generative AI tools could reinforce existing precarity and make freelancer collaboration harder. To investigate how to design generative AI tools to support freelancer collaboration, we conducted co-design sessions with 27 freelancers. A key concern that emerged was the risk of AI systems compromising their creative agency and work identities when collaborating, especially when AI tools could reproduce content without attribution, threatening the authenticity and distinctiveness of their collaborative work. Freelancers proposed "auxiliary AI" systems, human-guided tools that support their creative agencies and identities, allowing for flexible freelancer-led collaborations that promote "productive friction". Drawing on Marcuse's concept of technological rationality, we argue that freelancers are resisting one-dimensional, efficiency-driven AI, and instead envisioning technologies that preserve their collective creative agencies. We conclude with design recommendations for collaborative generative AI tools for freelancers.

Authors:Jiaye Li, Tongshun Chen, Siyi Ma, Elizabeth Churchill, Ke Wu
Title: PuppetAI: A Customizable Platform for Designing Tactile-Rich Affective Robot Interaction
Abstract:
We introduce PuppetAI, a modular soft robot interaction platform. This platform offers a scalable cable-driven actuation system and a customizable, puppet-inspired robot gesture framework, supporting a multitude of interaction gesture robot design formats. The platform comprises a four-layer decoupled software architecture that includes perceptual processing, affective modeling, motion scheduling, and low-level actuation. We also implemented an affective expression loop that connects human input to the robot platform by producing real-time emotional gestural responses to human vocal input. For our own designs, we have worked with nuanced gestures enacted by "soft robots" with enhanced dexterity and "pleasant-to-touch" plush exteriors. By reducing operational complexity and production costs while enhancing customizability, our work creates an adaptable and accessible foundation for future tactile-based expressive robot research. Our goal is to provide a platform that allows researchers to independently construct or refine highly specific gestures and movements performed by social robots.

Authors:Crescentia Jung, Kexin Cheng, Sharon Heung, Malte F. Jung, Shiri Azenkot
Title: Understanding How Accessibility Practices Impact Teamwork in Mixed-Ability Teams that Collaborate Virtually
Abstract:
Virtual collaboration has transformed how people in mixed-ability teams, composed of disabled and non-disabled people, work together by offering greater flexibility. In these settings, accessibility practices, such as accommodations and inclusive norms, are essential for providing access to disabled people. However, we do not yet know how these practices shape broader facets of teamwork, such as productivity, participation, and camaraderie. To address this gap, we interviewed 18 participants (12 disabled, 6 non-disabled) who are part of mixed-ability teams. We found that beyond providing access, accessibility practices shaped how all participants coordinated tasks, sustained rapport, and negotiated responsibilities. Accessibility practices also introduced camaraderie challenges, such as balancing empathy and accountability. Non-disabled participants described allyship as a learning process and skill shaped by their disabled team members and team culture. Based on our findings, we present recommendations for team practices and design opportunities for virtual collaboration tools that reframe accessibility practices as a foundation for strong teamwork.

Authors:Michael Küttner, Valeria Zitz, Supraja Ramesh, Michael Beigl, Tobias Röddiger
Title: EarResp-ANS : Audio-Based On-Device Respiration Rate Estimation on Earphones with Adaptive Noise Suppression
Abstract:
Respiratory rate (RR) is a key vital sign for clinical assessment and mental well-being, yet it is rarely monitored in everyday life due to the lack of unobtrusive sensing technologies. In-ear audio sensing is promising due to its high social acceptance and the amplification of physiological sounds caused by the occlusion effect; however, existing approaches often fail under real-world noise or rely on computationally expensive models. We present EarResp-ANS, the first system enabling fully on-device, real-time RR estimation on commercial earphones. The system employs LMS-based adaptive noise suppression (ANS) to attenuate ambient noise while preserving respiration-related acoustic components, without requiring neural networks or audio streaming, thereby explicitly addressing the energy and privacy constraints of wearable devices. We evaluate EarResp-ANS in a study with 18 participants under realistic acoustic conditions, including music, cafeteria noise, and white noise up to 80 dB SPL. EarResp-ANS achieves robust performance with a global MAE of 0.84 CPM , reduced to 0.47 CPM via automatic outlier rejection, while operating with less than 2% processor load directly on the earphone.

Authors:Lingyu Du, Xucong Zhang, Guohao Lan
Title: Talk to Me, Not the Slides: A Real-Time Wearable Assistant for Improving Eye Contact in Presentations
Abstract:
Effective eye contact is a cornerstone of successful public speaking. It strengthens the speaker's credibility and fosters audience engagement. Yet, managing effective eye contact is a skill that demands extensive training and practice, often posing a significant challenge for novice speakers. In this paper, we present SpeakAssis, the first real-time, in-situ wearable system designed to actively assist speakers in maintaining effective eye contact during live presentations. Leveraging a head-mounted eye tracker for gaze and scene view capture, SpeakAssis continuously monitors and analyzes the speaker's gaze distribution across audience and non-audience regions. When ineffective eye-contact patterns are detected, such as insufficient eye contact, or neglect of certain audience segments, SpeakAssis provides timely, context-aware audio prompts via an earphone to guide the speaker's gaze behavior. We evaluate SpeakAssis through a user study involving eight speakers and 24 audience members. Quantitative results show that SpeakAssis increases speakers' eye-contact duration by 62.5% on average and promotes a more balanced distribution of visual attention. Additionally, statistical analysis based on audience surveys reveals that improvements in speaker's eye-contact behavior significantly enhance the audience's perceived engagement and interactivity during presentations.

Authors:Sandra Loop, Erik Bertram, Sebastian Juhl, Martin Schrepp
Title: Integrating Multi-Label Classification and Generative AI for Scalable Analysis of User Feedback
Abstract:
In highly competitive software markets, user experience (UX) evaluation is crucial for ensuring software quality and fostering long-term product success. Such UX evaluations typically combine quantitative metrics from standardized questionnaires with qualitative feedback collected through open-ended questions. While open-ended feedback offers valuable insights for improvement and helps explain quantitative results, analyzing large volumes of user comments is challenging and time-consuming. In this paper, we present techniques developed during a long-term UX measurement project at a major software company to efficiently process and interpret extensive volumes of user comments. To provide a high-level overview of the collected comments, we employ a supervised machine learning approach that assigns meaningful, pre-defined topic labels to each comment. Additionally, we demonstrate how generative AI (GenAI) can be leveraged to create concise and informative summaries of user feedback, facilitating effective communication of findings to the organization and especially upper management. Finally, we investigate whether the sentiment expressed in user comments can serve as an indicator for overall product satisfaction. Our results show that sentiment analysis alone does not reliably reflect user satisfaction. Instead, product satisfaction needs to be assessed explicitly in surveys to measure the user's perception of the product.

Authors:Stina Klein, Birgit Prodinger, Elisabeth André, Lars Mikelsons, Nils Mandischer
Title: Assistive Robots and Reasonable Work Assignment Reduce Perceived Stigma toward Persons with Disabilities
Abstract:
Robots are becoming more prominent in assisting persons with disabilities (PwD). Whilst there is broad consensus that robots can assist in mitigating physical impairments, the extent to which they can facilitate social inclusion remains equivocal. In fact, the exposed status of assisted workers could likewise lead to reduced or increased perceived stigma by other workers. We present a vignette study on the perceived cognitive and behavioral stigma toward PwD in the workplace. We designed four experimental conditions depicting a coworker with an impairment in work scenarios: overburdened work, suitable work, and robot-assisted work only for the coworker, and an offer of robot-assisted work for everyone. Our results show that cognitive stigma is significantly reduced when the work task is adapted to the person's abilities or augmented by an assistive robot. In addition, offering robot-assisted work for everyone, in the sense of universal design, further reduces perceived cognitive stigma. Thus, we conclude that assistive robots reduce perceived cognitive stigma, thereby supporting the use of collaborative robots in work scenarios involving PwDs.

Authors:Michał Patryk Miazga, Hannah Bussmann, Antti Oulasvirta, Patrick Ebel
Title: Log2Motion: Biomechanical Motion Synthesis from Touch Logs
Abstract:
Touch data from mobile devices are collected at scale but reveal little about the interactions that produce them. While biomechanical simulations can illuminate motor control processes, they have not yet been developed for touch interactions. To close this gap, we propose a novel computational problem: synthesizing plausible motion directly from logs. Our key insight is a reinforcement learning-driven musculoskeletal forward simulation that generates biomechanically plausible motion sequences consistent with events recorded in touch logs. We achieve this by integrating a software emulator into a physics simulator, allowing biomechanical models to manipulate real applications in real-time. Log2Motion produces rich syntheses of user movements from touch logs, including estimates of motion, speed, accuracy, and effort. We assess the plausibility of generated movements by comparing against human data from a motion capture study and prior findings, and demonstrate Log2Motion in a large-scale dataset. Biomechanical motion synthesis provides a new way to understand log data, illuminating the ergonomics and motor control underlying touch interactions.

Authors:Davide Falessi, Silvia Golia, Angela Locoro
Title: Beyond Literacy: Predicting Interpretation Correctness of Visualizations with User Traits, Item Difficulty, and Rasch Scores
Abstract:
Data Visualization Literacy assessments are typically administered via fixed sets of Data Visualization items, despite substantial heterogeneity in how different people interpret the same visualization. This paper presents and evaluates an approach for predicting Human Interpretation Correctness (P-HIC) of data visualizations; i.e., anticipating whether a specific person will interpret a data visualization correctly or not, before exposure to that DV, enabling more personalized assessment and training. We operationalize P-HIC as a binary classification problem using 22 features spanning Human Profile, Human Performance, and Item difficulty (including ExpertDifficulty and RaschDifficulty). We evaluate three machine-learning models (Logistic Regression model, Random Forest, Multi Layer Perceptron) with and without feature selection, using a survey with 1,083 participants who answered 32 Data Visualization items (eight data visualizations per four items), yielding 34,656 item responses. Performance is assessed via a ten-time ten-fold cross-validation in each 32 (item-specific) datasets, using AUC and Cohen's kappa. Logistic Regression model with feature selection is the best-performing approach, reaching a median AUC of 0.72 and a median kappa of 0.32. Feature analyses show RaschDifficulty as the dominant predictor, followed by experts' ratings and prior correctness (PercCorrect), whose relevance increases across sessions. Profile information did not particularly support P-HIC. Our results support the feasibility of anticipating misinterpretations of data visualizations, and motivate the runtime selection of data visualizations items tailored to an audience, thereby improving the efficiency of Data Visualization Literacy assessment and targeted training.

Authors:Gennie Mansi, Julia Kim, Mark Riedl
Title: Evaluating Actionability in Explainable AI
Abstract:
A core assumption of Explainable AI (XAI) is that explanations are useful to users -- that is, users will do something with the explanations. Prior work, however, does not clearly connect the information provided in explanations to user actions to evaluate effectiveness. In this paper, we articulate this connection. We conducted a formative study through 14 interviews with end users in education and medicine. We contribute a catalog of information and associated actions. Our catalog maps 12 categories of information that participants described relying on to take 60 different actions. We show how AI Creators can use the catalog's specificity and breadth to articulate how they expect information in their explanations to lead to user actions and test their assumptions. We use an exemplar XAI system to illustrate this approach. We conclude by discussing how our catalog expands the design space for XAI systems to support actionability.

Authors:Zheng Zhang, Mengjie Yu, Tianyi Wang, Kashyap Todi, Ajoy Savio Fernandes, Yue Liu, Haijun Xia, Tovi Grossman, Tanya Jonker
Title: Gazeify Then Voiceify: Physical Object Referencing Through Gaze and Voice Interaction with Displayless Smart Glasses
Abstract:
Smart glasses enhance interactions with the environment by using head-mounted cameras to observe the user's viewpoint, but lack the visual feedback used for common interactions. We introduce Gazeify then Voiceify, a multimodal approach allowing object selection via gaze and voice using displayless smart glasses. Users can select a physical object with their gaze, and the system generates a digital mask and a voice description of the object's semantics. Users can further correct errors through free-form conversation. To demonstrate our approach, we develop an interactive system by integrating advanced object segmentation and detection with a vision-language model. User studies reveal that participants achieve correct gaze selection in 53% of the task trials and use voice disambiguation to correct 58% of the remaining errors. Participants also rated the system as likable, useful, and easy to use.

Authors:Sizhe Cheng, Feng Liang, Yuhan Wen, Xipei Yu, Yong Wang
Title: Probing the Future of Meta-Analysis: Eliciting Design Principles via an Agentic Research IDE
Abstract:
Meta-analyses and systematic reviews demand rigorous abductive reasoning to build, test, and refine hypotheses across vast, heterogeneous literature. While NLP advancements have automated parts of this pipeline, existing tools often detach researchers from the cognitive loop or function merely as retrieval engines, leading to loss of intellectual ownership and frequent context switching. We present Research IDE, a prototype reimagining authoring environments through the "Research as Code" metaphor. Research IDE embeds a multi-agent backend into the writing flow, enabling in-situ verification via "hypothesis breakpoints." A one-week field deployment with 8 domain experts, followed by a reflective workshop, as a Research through Design (RtD) probe, reveals that users strongly preferred this verification workflow, actively leveraged prior knowledge for confirmation, and reported that breakpoints sparked insights. Drawing from participant feedback and suggestions, we derive design implications for future AI-assisted research tools that fully preserve researcher autonomy and intellectual ownership while harnessing computational scale.

Authors:Meziah Ruby Cristobal, Hyeonjeong Byeon, Tze-Yu Chen, Ruoxi Shang, Donghoon Shin, Ruican Zhong, Tony Zhou, Gary Hsieh
Title: PaperTok: Exploring the Use of Generative AI for Creating Short-form Videos for Research Communication
Abstract:
The dissemination of scholarly research is critical, yet researchers often lack the time and skills to create engaging content for popular media such as short-form videos. To address this gap, we explore the use of generative AI to help researchers transform their academic papers into accessible video content. Informed by a formative study with science communicators and content creators (N=8), we designed PaperTok, an end-to-end system that automates the initial creative labor by generating script options and corresponding audiovisual content from a source paper. Researchers can then refine based on their preferences with further prompting. A mixed-methods user study (N=18) and crowdsourced evaluation (N=100) demonstrate that PaperTok's workflow can help researchers create engaging and informative short-form videos. We also identified the need for more fine-grained controls in the creation process. To this end, we offer implications for future generative tools that support science outreach.

Authors:Tian-Yi Zhou, Xuan-Hao Liu, Bao-Liang Lu, Wei-Long Zheng
Title: MindCine: Multimodal EEG-to-Video Reconstruction with Large-Scale Pretrained Models
Abstract:
Reconstructing human dynamic visual perception from electroencephalography (EEG) signals is of great research significance since EEG's non-invasiveness and high temporal resolution. However, EEG-to-video reconstruction remains challenging due to: 1) Single Modality: existing studies solely align EEG signals with the text modality, which ignores other modalities and are prone to suffer from overfitting problems; 2) Data Scarcity: current methods often have difficulty training to converge with limited EEG-video data. To solve the above problems, we propose a novel framework MindCine to achieve high-fidelity video reconstructions on limited data. We employ a multimodal joint learning strategy to incorporate beyond-text modalities in the training stage and leverage a pre-trained large EEG model to relieve the data scarcity issue for decoding semantic information, while a Seq2Seq model with causal attention is specifically designed for decoding perceptual information. Extensive experiments demonstrate that our model outperforms state-of-the-art methods both qualitatively and quantitatively. Additionally, the results underscore the effectiveness of the complementary strengths of different modalities and demonstrate that leveraging a large-scale EEG model can further enhance reconstruction performance by alleviating the challenges associated with limited data.

Authors:Kiana Jafari, Paul Ulrich Nikolaus Rust, Duncan Eddy, Robbie Fraser, Nina Vasan, Darja Djordjevic, Akanksha Dadlani, Max Lamparth, Eugenia Kim, Mykel Kochenderfer
Title: Expert Evaluation and the Limits of Human Feedback in Mental Health AI Safety Testing
Abstract:
Learning from human feedback~(LHF) assumes that expert judgments, appropriately aggregated, yield valid ground truth for training and evaluating AI systems. We tested this assumption in mental health, where high safety stakes make expert consensus essential. Three certified psychiatrists independently evaluated LLM-generated responses using a calibrated rubric. Despite similar training and shared instructions, inter-rater reliability was consistently poor ($ICC$ $0.087$--$0.295$), falling below thresholds considered acceptable for consequential assessment. Disagreement was highest on the most safety-critical items. Suicide and self-harm responses produced greater divergence than any other category, and was systematic rather than random. One factor yielded negative reliability (Krippendorff's $α= -0.203$), indicating structured disagreement worse than chance. Qualitative interviews revealed that disagreement reflects coherent but incompatible individual clinical frameworks, safety-first, engagement-centered, and culturally-informed orientations, rather than measurement error. By demonstrating that experts rely on holistic risk heuristics rather than granular factor discrimination, these findings suggest that aggregated labels function as arithmetic compromises that effectively erase grounded professional philosophies. Our results characterize expert disagreement in safety-critical AI as a sociotechnical phenomenon where professional experience introduces sophisticated layers of principled divergence. We discuss implications for reward modeling, safety classification, and evaluation benchmarks, recommending that practitioners shift from consensus-based aggregation to alignment methods that preserve and learn from expert disagreement.

Authors:Jiexin Ding, Yizhuo Zhang, Xinyun Liu, Ke chen, Yuntao Wang, Shwetak Patel, Akshay Gadre
Title: GazeSummary: Exploring Gaze as an Implicit Prompt for Personalization in Text-based LLM Tasks
Abstract:
Smart glasses are accelerating progress toward more seamless and personalized LLM-based assistance by integrating multimodal inputs. Yet, these inputs rely on obtrusive explicit prompts. The advent of gaze tracking on smart devices offers a unique opportunity to extract implicit user intent for personalization. This paper investigates whether LLMs can interpret user gaze for text-based tasks. We evaluate different gaze representations for personalization and validate their effectiveness in realistic reading tasks. Results show that LLMs can leverage gaze to generate high-quality personalized summaries and support users in downstream tasks, highlighting the feasibility and value of gaze-driven personalization for future mobile and wearable LLM applications.

Authors:Ludwig Felder, Tobias Eisenreich, Mahsa Fischer, Stefan Wagner, Chunyang Chen
Title: Adoption of Generative Artificial Intelligence in the German Software Engineering Industry: An Empirical Study
Abstract:
Generative artificial intelligence (GenAI) tools have seen rapid adoption among software developers. While adoption rates in the industry are rising, the underlying factors influencing the effective use of these tools, including the depth of interaction, organizational constraints, and experience-related considerations, have not been thoroughly investigated. This issue is particularly relevant in environments with stringent regulatory requirements, such as Germany, where practitioners must address the GDPR and the EU AI Act while balancing productivity gains with intellectual property considerations. Despite the significant impact of GenAI on software engineering, to the best of our knowledge, no empirical study has systematically examined the adoption dynamics of GenAI tools within the German context. To address this gap, we present a comprehensive mixed-methods study on GenAI adoption among German software engineers. Specifically, we conducted 18 exploratory interviews with practitioners, followed by a developer survey with 109 participants. We analyze patterns of tool adoption, prompting strategies, and organizational factors that influence effectiveness. Our results indicate that experience level moderates the perceived benefits of GenAI tools, and productivity gains are not evenly distributed among developers. Further, organizational size affects both tool selection and the intensity of tool use. Limited awareness of the project context is identified as the most significant barrier. We summarize a set of actionable implications for developers, organizations, and tool vendors seeking to advance artificial intelligence (AI) assisted software development.

Authors:Mingyu Zhu, Jiangong Chen, Bin Li
Title: When Generative AI Meets Extended Reality: Enabling Scalable and Natural Interactions
Abstract:
Extended Reality (XR), including virtual, augmented, and mixed reality, provides immersive and interactive experiences across diverse applications, from VR-based education to AR-based assistance and MR-based training. However, widespread XR adoption remains limited due to two key challenges: 1) the high cost and complexity of authoring 3D content, especially for large-scale environments or complex interactions; and 2) the steep learning curve associated with non-intuitive interaction methods like handheld controllers or scripted gestures. Generative AI (GenAI) presents a promising solution by enabling intuitive, language-driven interaction and automating content generation. Leveraging vision-language models and diffusion-based generation, GenAI can interpret ambiguous instructions, understand physical scenes, and generate or manipulate 3D content, significantly lowering barriers to XR adoption. This paper explores the integration of XR and GenAI through three concrete use cases, showing how they address key obstacles in scalability and natural interaction, and identifying technical challenges that must be resolved to enable broader adoption.

Authors:Paige S. DeVries, Michaela Okosi, Ming Li, Nora Dunphy, Gidey Gezae, Dante Conway, Abraham Glasser, Raja Kushalnagar, Christian Vogler
Title: Deaf and Hard of Hearing Access to Intelligent Personal Assistants: Comparison of Voice-Based Options with an LLM-Powered Touch Interface
Abstract:
We investigate intelligent personal assistants (IPAs) accessibility for deaf and hard of hearing (DHH) people who can use their voice in everyday communication. The inability of IPAs to understand diverse accents including deaf speech renders them largely inaccessible to non-signing and speaking DHH individuals. Using an Echo Show, we compare the usability of natural language input via spoken English; with Alexa's automatic speech recognition and a Wizard-of-Oz setting with a trained facilitator re-speaking commands against that of a large language model (LLM)-assisted touch interface in a mixed-methods study. The touch method was navigated through an LLM-powered "task prompter," which integrated the user's history and smart environment to suggest contextually-appropriate commands. Quantitative results showed no significant differences across both spoken English conditions vs LLM-assisted touch. Qualitative results showed variability in opinions on the usability of each method. Ultimately, it will be necessary to have robust deaf-accented speech recognized natively by IPAs.

Authors:Thomas Eiter, Tobias Geibinger, Zeynep G. Saribatur
Title: An XAI View on Explainable ASP: Methods, Systems, and Perspectives
Abstract:
Answer Set Programming (ASP) is a popular declarative reasoning and problem solving approach in symbolic AI. Its rule-based formalism makes it inherently attractive for explainable and interpretive reasoning, which is gaining importance with the surge of Explainable AI (XAI). A number of explanation approaches and tools for ASP have been developed, which often tackle specific explanatory settings and may not cover all scenarios that ASP users encounter. In this survey, we provide, guided by an XAI perspective, an overview of types of ASP explanations in connection with user questions for explanation, and describe how their coverage by current theory and tools. Furthermore, we pinpoint gaps in existing ASP explanations approaches and identify research directions for future work.

Authors:Yue Yang, Christoph Leuze, Brian Hargreaves, Bruce Daniel, Fred M Baik
Title: Multimodal Feedback for Handheld Tool Guidance: Combining Wrist-Based Haptics with Augmented Reality
Abstract:
We investigate how vibrotactile wrist feedback can enhance spatial guidance for handheld tool movement in optical see-through augmented reality (AR). While AR overlays are widely used to support surgical tasks, visual occlusion, lighting conditions, and interface ambiguity can compromise precision and confidence. To address these challenges, we designed a multimodal system combining AR visuals with a custom wrist-worn haptic device delivering directional and state-based cues. A formative study with experienced surgeons and residents identified key tool maneuvers and preferences for reference mappings, guiding our cue design. In a cue identification experiment (N=21), participants accurately recognized five vibration patterns under visual load, with higher recognition for full-actuator states than spatial direction cues. In a guidance task (N=27), participants using both AR and haptics achieved significantly higher spatial precision (5.8 mm) and usability (SUS = 88.1) than those using either modality alone, despite having modest increases in task time. Participants reported that haptic cues provided reassuring confirmation and reduced cognitive effort during alignment. Our results highlight the promise of integrating wrist-based haptics into AR systems for high-precision, visually complex tasks such as surgical guidance. We discuss design implications for multimodal interfaces supporting confident, efficient tool manipulation.

Authors:Kashif Imteyaz, Qiushi, Liang, Yakov Bart, Maitraye Das, Saiph Savage
Title: AI-Mediated Hiring and the Job Search of Blind and Low-Vision Individuals
Abstract:
Blind and low-vision (BLV) individuals face high unemployment rates. The job search is becoming harder as more employers use AI-driven systems to screen resumes before a human ever sees them. Such AI systems could inadvertently further disadvantage BLV job seekers, introducing additional barriers to an already difficult process. We lack understanding of BLV job seekers' experiences in today's AI-driven hiring ecosystem. Without such understanding, we risk designing technologies that create new systemic barriers for BLV job seekers rather than providing support. To this end, we conducted interviews with 17 BLV job seekers and analyzed their experiences with AI-powered hiring systems. We found that AI hiring systems misrepresented their professional identities and created dehumanizing interactions. To level the playing field, BLV job seekers used strategic counter-navigation: they deployed their own tools to bypass algorithmic screening and built peer networks to share AI literacy. They also practiced 'strategic refusal', choosing to avoid certain AI systems to regain their agency. Unlike prior work that frames job search as an individualistic activity, or one focused on being compliant with employer needs, we use the interdependence framework to argue that for BLV people, job search is an interdependent process. We offer design recommendations for AI-mediated tools that center disability perspectives and support interdependencies in job search.

Authors:Yi Li, Kadek Ananta Satriadi, Jiazhou Liu, Anjali Khurana, Zhiqing Wu, Benjamin Tag, Tim Dwyer
Title: Human Factors in Immersive Analytics
Abstract:
It has been ten years since the term ''Immersive Analytics'' (IA) was coined and research interest in the topic remains strong. Researchers in this field have produced practical and conceptual knowledge concerning the use of emerging immersive spatial display and interaction technologies for sense-making tasks through a number of papers, surveys, and books. However, a lack of truly physically and psychologically ergonomic techniques, as well as standardized human-centric validation protocols for these, remains a significant barrier to wider acceptance of practical IA systems in ubiquitous applications. Building upon a series of workshops on immersive analytics at various conferences, this workshop aims to explore new approaches and establish standard practices for evaluating immersive analytics systems from a human factors perspective. We will gather immersive analytics researchers and practitioners to look closely at these human factors -- including cognitive and physical functions as well as behaviour and performance -- to see how they inform the design and deployment of immersive analytics techniques and applications and to inform future research.

Authors:Canwen Wang, Angela Chen, Catherine Bao, Siwei Jin, Yee Kit Chan, Jessica R Mindel, Sijia Xie, Holly Swartz, Tongshuang Wu, Robert E Kraut, Haiyi Zhu
Title: Modeling Multi-Party Interaction in Couples Therapy: A Multi-Agent Simulation Approach
Abstract:
Couples therapy, or relationship counseling, helps partners resolve conflicts, improve satisfaction, and foster psychological growth. Traditional approaches to training couples therapists, such as textbooks and roleplay, often fail to capture the complexity and emotional nuance of real couple dynamics. We present a novel multimodal, multi-agent simulation system that models multi-party interactions in couples therapy. Informed by our systematic research, this system creates a low-stakes environment for trainee therapists to gain valuable practical experience dealing with the critical demand-withdraw communication cycle across six couple-interaction stages. In an evaluation study involving 21 US-based licensed therapists, participants blind to conditions identified the engineered agent behaviors (i.e., the stages and the demand-withdraw cycle) and rated overall realism and agent responses higher for the experimental system than the baseline. As the first known multi-agent framework for training couples therapists, our work builds the foundation for future research that fuses HCI technologies with couples therapy.

Authors:Piyush Maheshwari, Sheshera Mysore, Hamed Zamani
Title: Can Instructed Retrieval Models Really Support Exploration?
Abstract:
Exploratory searches are characterized by under-specified goals and evolving query intents. In such scenarios, retrieval models that can capture user-specified nuances in query intent and adapt results accordingly are desirable -- instruction-following retrieval models promise such a capability. In this work, we evaluate instructed retrievers for the prevalent yet under-explored application of aspect-conditional seed-guided exploration using an expert-annotated test collection. We evaluate both recent LLMs fine-tuned for instructed retrieval and general-purpose LLMs prompted for ranking with the highly performant Pairwise Ranking Prompting. We find that the best instructed retrievers improve on ranking relevance compared to instruction-agnostic approaches. However, we also find that instruction following performance, crucial to the user experience of interacting with models, does not mirror ranking relevance improvements and displays insensitivity or counter-intuitive behavior to instructions. Our results indicate that while users may benefit from using current instructed retrievers over instruction-agnostic models, they may not benefit from using them for long-running exploratory sessions requiring greater sensitivity to instructions.

Authors:Ye Wang, Jiaxing Chen, Hongjiang Xiao
Title: Role-Playing Agents Driven by Large Language Models: Current Status, Challenges, and Future Trends
Abstract:
In recent years, with the rapid advancement of large language models (LLMs), role-playing language agents (RPLAs) have emerged as a prominent research focus at the intersection of natural language processing (NLP) and human-computer interaction. This paper systematically reviews the current development and key technologies of RPLAs, delineating the technological evolution from early rule-based template paradigms, through the language style imitation stage, to the cognitive simulation stage centered on personality modeling and memory mechanisms. It summarizes the critical technical pathways supporting high-quality role-playing, including psychological scale-driven character modeling, memory-augmented prompting mechanisms, and motivation-situation-based behavioral decision control. At the data level, the paper further analyzes the methods and challenges of constructing role-specific corpora, focusing on data sources, copyright constraints, and structured annotation processes. In terms of evaluation, it collates multi-dimensional assessment frameworks and benchmark datasets covering role knowledge, personality fidelity, value alignment, and interactive hallucination, while commenting on the advantages and disadvantages of methods such as human evaluation, reward models, and LLM-based scoring. Finally, the paper outlines future development directions of role-playing agents, including personality evolution modeling, multi-agent collaborative narrative, multimodal immersive interaction, and integration with cognitive neuroscience, aiming to provide a systematic perspective and methodological insights for subsequent research.

Authors:Rostyslav Hnatyshyn, Danny Perez, Gerik Scheuermann, Ross Maciejewski, Baldwin Nsonga
Title: LAMDA: Aiding Visual Exploration of Atomic Displacements in Molecular Dynamics Simulations
Abstract:
Contemporary materials science research is heavily conducted in silico, involving massive simulations of the atomic-scale evolution of materials. Cataloging basic patterns in the atomic displacements is key to understanding and predicting the evolution of physical properties. However, the combinatorial complexity of the space of possible transitions coupled with the overwhelming amount of data being produced by high-throughput simulations make such an analysis extremely challenging and time-consuming for domain experts. The development of visual analytics systems that facilitate the exploration of simulation data is an active field of research. While these systems excel in identifying temporal regions of interest, they treat each timestep of a simulation as an independent event without considering the behavior of the atomic displacements between timesteps. We address this gap by introducing LAMDA, a visual analytics system that allows domain experts to quickly and systematically explore state-to-state transitions. In LAMDA, transitions are hierarchically categorized, providing a basis for cataloging displacement behavior, as well as enabling the analysis of simulations at different resolutions, ranging from very broad qualitative classes of transitions to very narrow definitions of unit processes. LAMDA supports navigating the hierarchy of transitions, enabling scientists to visualize the commonalities between different transitions in each class in terms of invariant features characterizing local atomic environments, and LAMDA simplifies the analysis by capturing user inputs through annotations. We evaluate our system through a case study and report on findings from our domain experts.

Authors:Francesco Dettori, Matteo Forasassi, Lorenzo Veronese, Livia Lestingi, Vincenzo Scotti, Matteo Giovanni Rossi
Title: Do You Understand How I Feel?: Towards Verified Empathy in Therapy Chatbots
Abstract:
Conversational agents are increasingly used as support tools along mental therapeutic pathways with significant societal impacts. In particular, empathy is a key non-functional requirement in therapeutic contexts, yet current chatbot development practices provide no systematic means to specify or verify it. This paper envisions a framework integrating natural language processing and formal verification to deliver empathetic therapy chatbots. A Transformer-based model extracts dialogue features, which are then translated into a Stochastic Hybrid Automaton model of dyadic therapy sessions. Empathy-related properties can then be verified through Statistical Model Checking, while strategy synthesis provides guidance for shaping agent behavior. Preliminary results show that the formal model captures therapy dynamics with good fidelity and that ad-hoc strategies improve the probability of satisfying empathy requirements.

Authors:Xiangzhe Yuan, Jiajun Wang, Huanchen Wang, Qian Wan, Siying Hu
Title: ImmuniFraug: A Metacognitive Intervention Anti-Fraud Approach to Enhance Undergraduate Students' Cyber Fraud Awareness
Abstract:
Cyber fraud now constitutes over half of criminal cases in China, with undergraduate students experiencing a disproportionate rise in victimization. Traditional anti-fraud training remains predominantly passive, yielding limited engagement and retention. This paper introduces ImmuniFraug, a Large Language Model (LLM)-based metacognitive intervention that delivers immersive, multimodal fraud simulations integrating text, voice, and visual avatars across ten prevalent fraud types. Each scenario is designed to replicate real-world persuasion tactics and psychological pressure, while post-interaction debriefs provide grounded feedback in protection motivation theory and reflective prompts to reinforce learning. In a controlled study with 846 Chinese undergraduates, ImmuniFraug was compared to official text-based materials. Linear Mixed-Effects Modeling (LMEM) reveals that the interactive intervention significantly improved fraud awareness (p = 0.026), successfully providing incremental learning value even when controlling for participants' extensive prior exposure to anti-fraud education, alongside high narrative immersion (M = 56.95/77). Thematic analysis of interviews revealed key effectiveness factors: perceived realism, adaptive deception, enforced time pressure, emotional manipulation awareness, and enhanced self-efficacy. Findings demonstrate that by shifting the focus from passive knowledge acquisition to active metacognitive engagement, LLM-based simulations offer a scalable and ecologically valid new paradigm for anti-fraud training and fostering fraud resilience.

Authors:Mohammadreza Behboodi, Eli Kinney-Lang, Ali Etemad, Adam Kirton, Hatem Abou-Zeid
Title: Leveraging Foundation Models for Calibration-Free c-VEP BCIs
Abstract:
Foundation Models (FMs) have surged in popularity over the past five years, with applications spanning fields from computer vision to natural language processing. Brain-Computer Interfaces (BCIs) have also gained momentum due to their potential to support individuals with complex disabilities. Among BCI paradigms, code-modulated Visual Evoked Potentials (c-VEPs) remain relatively understudied, despite offering high information transfer rates and large selection target capacities. However, c-VEP systems require lengthy calibration sessions, limiting their practicality outside of laboratory settings. In this study, we use a FM for the first time to eliminate the need for lengthy calibration in c-VEP BCI systems. We evaluated two approaches: (1) a truly calibration-free approach requiring no subject-specific data, and (2) a limited calibration approach, where we assessed the benefit of incorporating incremental amounts of calibration data. In both cases, a classification head is trained on data from other subjects. For a new subject, no calibration data is required in the calibration-free setup, making the c-VEP system effectively plug-and-play. The proposed method was tested on two c-VEP datasets. For the calibration-free approach, the average accuracy on the first dataset (n = 17) was 68.8% +/- 17.6%, comparable to the full-calibration performance reported in the original study (66.2% +/- 13.8%), which required approximately 11 minutes of calibration. On the second dataset (n = 12), the calibration-free accuracy was 71.8% +/- 20.2%, versus 93.7% +/- 5.5% from the original study, which required around 3.5 minutes. A limited-calibration approach using only 20% of the subject's data (approximately 43 seconds) yielded 92% +/- 5.2% accuracy. These results indicate that our FM-based approach can effectively eliminate or significantly reduce the need for lengthy calibration in c-VEP BCIs.

Authors:Md Nazmus Sakib, Naga Manogna Rayasam, Sanorita Dey
Title: Experience and Adaptation in AI-mediated Hiring Systems: A Combined Analysis of Online Discourse and Interface Design
Abstract:
Automated interviewing tools are now widely adopted to manage recruitment at scale, often replacing early human screening with algorithmic assessments. While these systems are promoted as efficient and consistent, they also generate new forms of uncertainty for applicants. Efforts to soften these experiences through human-like design features have only partially addressed underlying concerns. To understand how candidates interpret and cope with such systems, we conducted a mixed empirical investigation that combined analysis of online discussions, responses from more than one hundred and fifty survey participants, and follow-up conversations with seventeen interviewees. The findings point to several recurring problems, including unclear evaluation criteria, limited organizational responsibility for automated outcomes, and a lack of practical support for preparation. Many participants described the technology as far less advanced than advertised, leading them to infer how decisions might be made in the absence of guidance. This speculation often intensified stress and emotional strain. Furthermore, the minimal sense of interpersonal engagement contributed to feelings of detachment and disposability. Based on these observations, we propose design directions aimed at improving clarity, accountability, and candidate support in AI-mediated hiring processes.

Authors:Harsh Kumar, Zi Kang, Mu, Jonathan Vincentius, Ashton Anderson
Title: Beyond the AI Tutor: Social Learning with LLM Agents
Abstract:
Most AI-based educational tools today adopt a one-on-one tutoring paradigm, pairing a single LLM with a single learner. Yet decades of learning science research suggest that multi-party interaction -- through peer modeling, co-construction, and exposure to diverse perspectives -- can produce learning benefits that dyadic tutoring alone cannot. In this paper, we investigate whether multi-agent LLM configurations can enhance learning outcomes beyond what a single LLM tutor provides. We present two controlled experiments spanning distinct learning contexts. In a convergent problem-solving study ($N=315$), participants tackle SAT-level math problems in a 2$\times$2 design that varies the presence of an LLM tutor and LLM peers, each making different kinds of errors (conceptual vs.\ arithmetic); participants who interacted with both a tutor and peers achieved the highest unassisted test accuracy. In a divergent composition study ($N=247$), participants write argumentative and creative essays with either no AI assistance, a single LLM (Claude or ChatGPT), or both Claude and ChatGPT together; while both LLM conditions improved essay quality, only the two-agent condition avoided the idea-level homogeneity that single-model assistance was found to produce. Together, these studies offer one of the first controlled investigations of multi-agent LLM learning environments, probing whether the move from one-on-one AI tutoring toward richer agent configurations can unlock the collaborative and observational benefits long documented in human social learning research.

Authors:Jaemarie Solyst, Ruth Karen Nakigozi, Chloe Fong, R. Benjamin Shapiro
Title: Designing Transformational Games to Support Socio-ethical Reasoning about Generative AI
Abstract:
There is an increasing need for young people to become critically AI literate, understanding not only how AI works but also its limitations and ethical nuances. Yet, designing learning experiences that make such complex, serious topics engaging remains a challenge. This paper explores transformational games as a promising approach for supporting youth learning about generative AI (GenAI) and ethics. We designed and implemented two games, Diversity Duel and Secret Agent, that integrate GenAI tools with gameplay elements. This work investigates how the games' elements: (1) peer evaluation, (2) constraint-based creativity, and (3) social deduction supported socio-ethical reasoning about GenAI. Participants recognized and debated bias in GenAI outputs, connected these patterns to real-world inequities, and developed nuanced understandings of bias. Participants further came to see how prompt design shapes AI behavior. Our findings suggest that group-based games with these elements can support fostering critical AI literacy.

Authors:Blaine Kuehnert, Nari Johnson, Ravit Dotan, Hoda Heidari
Title: Disclosure or Marketing? Analyzing the Efficacy of Vendor Self-reports for Vetting Public-sector AI
Abstract:
Documentation-based disclosure has become a central governance strategy for responsible AI, particularly in public-sector procurement. Tools such as model cards, datasheets, and AI FactSheets are increasingly expected to support accountability, risk assessment, and informed decision-making across organizational boundaries. Yet there is limited empirical evidence about how these artifacts are produced, interpreted, and used in practice. In this paper, we present a qualitative study of the GovAI Coalition FactSheet, a widely adopted transparency document designed to support AI procurement and governance in government contexts. Drawing on semi-structured interviews with vendors and public-sector practitioners, alongside a systematic analysis of completed FactSheets, we examine how FactSheets are used, what information they surface, and where they fall short. We find that FactSheets are asked to serve multiple and conflicting purposes simultaneously: showcasing vendor offerings, supporting evaluation and due diligence, and facilitating early-stage dialogue between vendors and agencies. These competing expectations, combined with the structural constraints of voluntary and public self-disclosure, limit the ability of FactSheets to function as standalone evaluation or risk-assessment tools. At the same time, our findings suggest that when understood as relational artifacts used to establish trust, shared understanding, and ongoing dialogue, FactSheets can help create conditions that support more meaningful disclosure and governance over time.

Authors:Elaheh Sanoubari, Neil Fernandes, Keith Rebello, Alicia Pan, Andrew Houston, Kerstin Dautenhahn
Title: Play-Testing REMind: Evaluating an Educational Robot-Mediated Role-Play Game
Abstract:
This paper presents REMind, an innovative educational robot-mediated role-play game designed to support anti-bullying bystander intervention among children. REMind invites players to observe a bullying scenario enacted by social robots, reflect on the perspectives of the characters, and rehearse defending strategies by puppeteering a robotic avatar. We evaluated REMind through a mixed-methods play-testing study with 18 children aged 9--10. The findings suggest that the experience supported key learning goals related to self-efficacy, perspective-taking, understanding outcomes of defending, and intervention strategies. These results highlight the promise of Robot-Mediated Applied Drama (RMAD) as a novel pedagogical framework to support Social-Emotional Learning.

Authors:Alejandro Ciuba, Zheng YY Li, Aakash Gautam
Title: Not Just Duolingo: Supporting Immigrant Language Preservation Through Family-Based Play
Abstract:
For immigrants, language preservation is crucial to maintain their identity, but the process of immigration can put a strain on a community's ability to do so. We interviewed eight Nepali immigrants to understand barriers to language preservation across sociopolitical contexts in Nepal and immigrant life in the United States. Participants described strong motivation but limited institutional support, time and resource constraints, and English-dominant environments that widen parent-child language gaps. They envisioned technology that supports interactive, family centered learning. In response, we are developing an audio-first, point-and-click language learning game based on the theory of comprehensible input, designed for parent-child co-playing. An early evaluation with four design experts reveals promising gameplay, and the need to simplify symbol-heavy UI. We conclude with implications for designing language technologies that support preservation through relations while acknowledging the limits of design.

Authors:Congning Ni, Sarvech Qadir, Bryan Steitz, Mihir Sachin Vaidya, Qingyuan Song, Lantian Xia, Shelagh Mulvaney, Siru Liu, Hyeyoung Ryu, Leah Hecht, Amy Bucher, Christopher Symons, Laurie Novak, Susannah L. Rose, Murat Kantarcioglu, Bradley Malin, Zhijun Yin
Title: Disentangling Prompt Element Level Risk Factors for Hallucinations and Omissions in Mental Health LLM Responses
Abstract:
Mental health concerns are often expressed outside clinical settings, including in high-distress help seeking, where safety-critical guidance may be needed. Consumer health informatics systems increasingly incorporate large language models (LLMs) for mental health question answering, yet many evaluations underrepresent narrative, high-distress inquiries. We introduce UTCO (User, Topic, Context, Tone), a prompt construction framework that represents an inquiry as four controllable elements for systematic stress testing. Using 2,075 UTCO-generated prompts, we evaluated Llama 3.3 and annotated hallucinations (fabricated or incorrect clinical content) and omissions (missing clinically necessary or safety-critical guidance). Hallucinations occurred in 6.5% of responses and omissions in 13.2%, with omissions concentrated in crisis and suicidal ideation prompts. Across regression, element-specific matching, and similarity-matched comparisons, failures were most consistently associated with context and tone, while user-background indicators showed no systematic differences after balancing. These findings support evaluating omissions as a primary safety outcome and moving beyond static benchmark question sets.

Authors:Kruthika Gangaraju, Shu-Fen Wung, Kevin Berner, Jing Wang, Fengpei Yuan
Title: An Interactive LLM-Based Simulator for Dementia-Related Activities of Daily Living
Abstract:
Effective dementia caregiving requires training and adaptive communication, but assistive AI and robotics are constrained by a lack of context-rich, privacy-sensitive data on how people living with Alzheimer's disease and related dementias (ADRD) behave during activities of daily living (ADLs). We introduce a web-based simulator that uses a large language model (gpt-5-mini) to generate multi-turn, severity- and care-setting-conditioned patient behaviors during ADL assistance, pairing utterances with lightweight behavioral cues (in parentheses). Users set dementia severity, care setting (and time in setting), and ADL; after each patient turn they rate realism (1-5) with optional critique, then respond as the caregiver via free text or by selecting/editing one of four strategy-scaffolded suggestions (Recognition, Negotiation, Facilitation, Validation). We ran an online formative expert-in-the-loop study (14 dementia-care experts, 18 sessions, 112 rated turns). Simulated behavior was judged moderately to highly plausible, with a typical session length of six turns. Experts wrote custom replies for 54.5 percent of turns; Recognition and Facilitation were the most-used suggested strategies. Thematic analysis of critiques produced a six-category failure-mode taxonomy, revealing recurring breakdowns in ADL grounding and care-setting consistency and guiding prompt/workflow refinements. The simulator and logged interactions enable an evidence-driven refinement loop toward validated patient-caregiver co-simulation and support data collection, caregiver training, and assistive AI and robot policy development.

Authors:John Paul P. Miranda, Rhiziel P. Manalese, Ivan G. Liwanag, Rodel T. Alimurong, Alvin B. Roque
Title: Filipino Students' Willingness to Use AI for Mental Health Support: A Path Analysis of Behavioral, Emotional, and Contextual Factors
Abstract:
This study examined how behavioral, emotional, and contextual factors influence Filipino students' willingness to use artificial intelligence (AI) for mental health support. Results showed that habit had the strongest effect on willingness, followed by comfort, emotional benefit, facilitating conditions, and perceived usefulness. Students who used AI tools regularly felt more confident and open to relying on them for emotional support. Empathy, privacy, and accessibility also increased comfort and trust in AI systems. The findings highlight that emotional safety and routine use are essential in promoting willingness. The study recommends AI literacy programs, empathic design, and ethical policies that support responsible and culturally sensitive use of AI for student mental health care.

Authors:Sina Elahimanesh, Mohammadali Mohammadkhani, Shohreh Kasaei
Title: Feeds Don't Tell the Whole Story: Measuring Online-Offline Emotion Alignment
Abstract:
In contemporary society, social media is deeply integrated into daily life, yet emotional expression often differs between real and online contexts. We studied the Persian community on X to explore this gap, designing a human-centered pipeline to measure alignment between real-world and social media emotions. Recent tweets and images of participants were collected and analyzed using Transformers-based text and image sentiment modules. Friends of participants provided insights into their real-world emotions, which were compared with online expressions using a distance criterion. The study involved N=105 participants, 393 friends, over 8,300 tweets, and 2,000 media images. Results showed only 28% similarity between images and real-world emotions, while tweets aligned about 76% with participants' real-life feelings. Statistical analyses confirmed significant disparities in sentiment proportions across images, tweets, and friends' perceptions, highlighting differences in emotional expression between online and offline environments and demonstrating practical utility of the proposed pipeline for understanding digital self-presentation.

Authors:Zhiyu Lin, Boyd Fox, Devon Mckee, Sai Siddartha Maram, Jiahong Li, Tyler Sorensen, Brian K. Smith, Roger Azevedo, Jichen Zhu, Magy Seif El-Nasr
Title: Unlocking Open-Player-Modeling-enhanced Game-Based Learning: The Open Player Socially Analytical Intelligence Architecture
Abstract:
Game-Based Learning (GBL) is a learner-engaging pedagogical methodology, yet adapting games to heterogeneous learners requires transparent, real-time Open Player Models (OPMs). We contribute to the community Open Player Socially Analytical Intelligence (OPSAI), an architecture implementing OPM beyond conceptual frameworks and validated in a GBL application. It decouples gameplay telemetry and analysis from the game engine and automatically derives pedagogically actionable insights, supporting the transparency of computational player models while making them accessible to players. OPSAI comprises three logical layers: a Frontend that both provides the GBL experience and collects information needed for analytics; a stateless Backend that hosts transparent analytics services producing reflective prompts, recommendations, and visualization guides; and a two-tier Log Storage that balances heavy raw gameplay data with lightweight reference indices for low-latency queries. By feeding analytics outputs back into the game interface, OPSAI closes the feedback loop between play and learning, empowering teachers, researchers, and learners alike. We further showcase OPSAI with a full deployment on the Parallel GBL environment, featuring live play traces, peer comparisons, and personalized suggestions, demonstrating a reusable blueprint for future educational games.

Authors:Neil Fernandes, Tehniyat Shahbaz, Emily Davies-Robinson, Yue Hu, Kerstin Dautenhahn
Title: Co-designing a Social Robot for Newcomer Children's Cultural and Language Learning
Abstract:
Newcomer children face barriers in acquiring the host country's language and literacy programs are often constrained by limited staffing, mixed-proficiency cohorts, and short contact time. While Socially Assistive Robots (SARs) show promise in education, their use in these socio-emotionally sensitive settings remains underexplored. This research presents a co-design study with program tutors and coordinators, to explore the design space for a social robot, Maple. We contribute (1) a domain summary outlining four recurring challenges, (2) a discussion on cultural orientation and community belonging with robots, (3) an expert-grounded discussion of the perceived role of an SAR in cultural and language learning, and (4) preliminary design guidelines for integrating an SAR into a classroom. These expert-grounded insights lay the foundation for iterative design and evaluation with newcomer children and their families.

Authors:Lan Xiao, Catherine Holloway
Title: Channelling, Coordinating, Collaborating: A Three-Layer Framework for Disability-Centered Human-Agent Collaboration
Abstract:
AI accessibility tools have mostly been designed for individual use, helping one person overcome a specific functional barrier. But for many people with disabilities, complex tasks are accomplished through collaboration with others who bring complementary abilities, not solitary effort. We propose a three-layer framework, Channelling, Coordinating, and Co-Creating, that rethinks AI's role in ability-diverse collaboration: establishing shared informational ground across abilities, mediating workflows between collaborators with different abilities, and contributing as a bounded partner toward shared goals. Grounded in the Ability-Diverse Collaboration framework, grounding theory, and Carlile's 3T framework, it extends the ``agents as remote collaborators'' vision by centring the collaborative, interdependent ways people with disabilities already work.

Authors:Seunghwa Pyo, Donggun Lee, Jungwoo Rhee, Soobin Park, Youn-kyung Lim
Title: One Is Not Enough: How People Use Multiple AI Models in Everyday Life
Abstract:
People increasingly use multiple Multimodal Large Language Models (MLLMs) concurrently, selecting each based on its perceived strengths. This cross-platform practice creates coordination challenges: adapting prompts to different interfaces, calibrating trust against inconsistent behaviors, and navigating separate conversation histories. Prior HCI research focused on single-agent interactions, leaving multi-MLLM orchestration underexplored. Through a diary study and semi-structured interviews (N=10), we examine how individuals organize work across competing AI systems. Our findings reveal that users construct primary and secondary hierarchies among models that shift over usage context. They also develop personalized switching patterns triggered by task aggregation to adjust effort and latency, and output credibility. These insights inform future tool design opportunities, supporting users to coordinate multi-MLLM workflows.

Authors:Eunseo Oh, Suyoun Lee, Jae Young Choi, Soobin Park, Youn-kyung Lim
Title: "Oops! ChatGPT is Temporarily Unavailable!": A Diary Study on Knowledge Workers' Experiences of LLM Withdrawal
Abstract:
LLMs have become deeply embedded in knowledge work, raising concerns about growing dependency and the potential undermining of human skills. To investigate the pervasiveness of LLMs in work practices, we conducted a four-day diary study with frequent LLM users (N=10), observing how knowledge workers responded to a temporary withdrawal of LLMs. Our findings show how LLM withdrawal disrupted participants' workflows by identifying gaps in task execution, how self-directed work led participants to reclaim professional values, and how everyday practices revealed the extent to which LLM use had become inescapably normative. Conceptualizing LLMs as infrastructural to contemporary knowledge work, this research contributes empirical insights into the often invisible role of LLMs and proposes value-driven appropriation as an approach to supporting professional values in the current LLM-pervasive work environment.

Authors:Nathaniel Gorski, Shusen Liu, Bei Wang
Title: TopoPilot: Reliable Conversational Workflow Automation for Topological Data Analysis and Visualization
Abstract:
Recent agentic systems demonstrate that large language models can generate scientific visualizations from natural language. However, reliability remains a major limitation: systems may execute invalid operations, introduce subtle but consequential errors, or fail to request missing information when inputs are underspecified. These issues are amplified in real-world workflows, which often exceed the complexity of standard benchmarks. Ensuring reliability in autonomous visualization pipelines therefore remains an open challenge. We present TopoPilot, a reliable and extensible agentic framework for automating complex scientific visualization workflows. TopoPilot incorporates systematic guardrails and verification mechanisms to ensure reliable operation. While we focus on topological data analysis and visualization as a primary use case, the framework is designed to generalize across visualization domains. TopoPilot adopts a reliability-centered two-agent architecture. An orchestrator agent translates user prompts into workflows composed of atomic backend actions, while a verifier agent evaluates these workflows prior to execution, enforcing structural validity and semantic consistency. This separation of interpretation and verification reduces code-generation errors and enforces correctness guarantees. A modular architecture further improves robustness by isolating components and enabling seamless integration of new descriptors and domain-specific workflows without modifying the core system. To systematically address reliability, we introduce a taxonomy of failure modes and implement targeted safeguards for each class. In evaluations simulating 1,000 multi-turn conversations across 100 prompts, including adversarial and infeasible requests, TopoPilot achieves a success rate exceeding 99%, compared to under 50% for baselines without comprehensive guardrails and checks.

Authors:Elaheh Sanoubari, Alicia Pan, Keith Rebello, Neil Fernandes, Andrew Houston, Kerstin Dautenhahn
Title: Aesthetics of Robot-Mediated Applied Drama: A Case Study on REMind
Abstract:
Social robots are increasingly used in education, but most applications cast them as tutors offering explanation-based instruction. We explore an alternative: Robot-Mediated Applied Drama (RMAD), in which robots function as life-like puppets in interactive dramatic experiences designed to support reflection and social-emotional learning. This paper presents REMind, an anti-bullying robot role-play game that helps children rehearse bystander intervention and peer support. We focus on a central design challenge in RMAD: how to make robot drama emotionally and aesthetically engaging despite the limited expressive capacities of current robotic platforms. Through the development of REMind, we show how performing arts expertise informed this process, and argue that the aesthetics of robot drama arise from the coordinated design of the wider experience, not from robot expressivity alone.

Authors:Meng-Chen Lee, Costas Panay, Javier Hernandez, Sean Andrist, Dan Bohus, Anatoly Churikov, Andrew D. Wilson
Title: RESPOND: Responsive Engagement Strategy for Predictive Orchestration and Dialogue
Abstract:
The majority of voice-based conversational agents still rely on pause-and-respond turn-taking, leaving interactions sounding stiff and robotic. We present RESPOND (Responsive Engagement Strategy for Predictive Orchestration and Dialogue), a framework that brings two staples of human conversation to agents: timely backchannels ("mm-hmm," "right") and proactive turn claims that can contribute relevant content before the speaker yields the conversational floor. Built on streaming ASR (Automatic Speech Recognition) and incremental semantics, RESPOND continuously predicts both when and how to interject, enabling fluid, listener-aware dialogue. A defining feature is its designer-facing controllability: two orthogonal dials, Backchannel Intensity (frequency of acknowledgments) and Turn Claim Aggressiveness (depth and assertiveness of early contributions), can be tuned to match the etiquette of contexts ranging from rapid ideation to reflective counseling. By coupling predictive orchestration with explicit control, RESPOND offers a practical path toward conversational agents that adapt their conversational footprint to social expectations, advancing the design of more natural and engaging voice interfaces.

Authors:Jiayi Hong, Yixuan Wang, Petra Isenberg, Ross Maciejewski
Title: Review and Analysis of Scientific Paper Embellishments
Abstract:
We present a review and analysis of scientific paper embellishments -- simple visual elements that are deeply integrated into the text of scientific publications. These embellishments are increasingly used in research papers, which have the potential to enhance textual descriptions, strengthen connections between figures and content, and improve internal textual coherence, while also carrying the risk of disrupting the reading experience. As their exact impact is not yet well understood, we conducted a systematic review of all visualization papers published between 2019 and 2024 in IEEE VIS, ACM CHI, and EuroVis. From this corpus, we identified 374 papers that used paper embellishments and distilled three key dimensions that characterize their usage: purposes (WHY), design choices (HOW), and locations (WHERE) of paper embellishments. Our findings provide a structured perspective on the form of current embellishments in scientific writing in the visualization domain and provide insights into their role in shaping scientific communication.

Authors:Xiaru Meng, Yulan Ju, Yan He, Matthias Hoppe, Kouta Minamizawa, Jiawen Han, Kai Kunze
Title: Abstraction Beats Realism: Physiological Visualizations Enhance Arousal Synchrony in VR Concert Recreations
Abstract:
Live cultural experiences like concerts generate shared physiological arousal among audience members, a collective resonance that contributes to their emotional power. Recreating such experiences in virtual reality therefore requires not just audiovisual fidelity, but reproduction of this physiological dimension. Yet current VR evaluation methods rely on post-hoc self-reports that interrupt immersion and cannot capture moment-to-moment arousal dynamics. We propose cross-temporal physiological synchrony as an unobtrusive methodology for evaluating VR cultural recreations: measuring how closely a VR participant's arousal patterns align with those of the original live audience. In a two-phase study, we recorded electrodermal activity from 40 live concert attendees, then created three VR recreations with varying abstraction levels (realistic 360-degree video, mixed video-plus-visualization, and fully abstract physiological representations) and measured synchrony with 22 laboratory participants using Dynamic Time Warping. Contrary to assumptions favoring realism, abstract visualizations achieved the strongest synchrony with live audiences. During musical climaxes, the abstract condition maintained correlation while realistic video showed none. These findings suggest that abstract physiological representations may be more effective than realistic footage for evoking authentic collective engagement in VR cultural recreations.

Authors:Dimitri Kanevsky, Julian Salazar, Matt Harvey
Title: $R$-equivalence on Cubic Surfaces I: Existing Cases with Non-Trivial Universal Equivalence
Abstract:
Let $V$ be a smooth cubic surface over a $p$-adic field $k$ with good reduction. Swinnerton-Dyer (1981) proved that $R$-equivalence is trivial on $V(k)$ except perhaps if $V$ is one of three special types--those whose $R$-equivalence he could not bound by proving the universal (admissible) equivalence is trivial. We consider all surfaces $V$ currently known to have non-trivial universal equivalence. Beyond being intractable to Swinnerton-Dyer's approach, we observe that if these surfaces also had non-trivial $R$-equivalence, they would contradict Colliot-Thélène and Sansuc's conjecture regarding the $k$-rationality of universal torsors for geometrically rational surfaces. By devising new methods to study $R$-equivalence, we prove that for 2-adic surfaces with all-Eckardt reductions (the third special type, which contains every existing case of non-trivial universal equivalence), $R$-equivalence is trivial or of exponent 2. For the explicit cases, we confirm triviality: the diagonal cubic $X^3+Y^3+Z^3+ζ_3 T^3=0$ over $\mathbb{Q}_2(ζ_3)$--answering a long-standing question of Manin's (Cubic Forms, 1972)--and the cubic with universal equivalence of exponent 2 (Kanevsky, 1982). This is the first in a series of works derived from a year of interactions with generative AI models such as AlphaEvolve and Gemini 3 Deep Think, with the latter proving many of our lemmas. We disclose the timeline and nature of their use towards this paper, and describe our broader AI-assisted research program in a companion report (in preparation).

Authors:Shuyue Feng, Cedric Caremel, Yoshihiro Kawahara
Title: Sketch2Topo: Using Hand-Drawn Inputs for Diffusion-Based Topology Optimization
Abstract:
Topology optimization (TO) is employed in engineering to optimize structural performance while maximizing material efficiency. However, traditional TO methods incur significant computational and time costs. Although research has leveraged generative AI to predict TO outcomes and validated feasibility and accuracy, existing approaches still suffer from limited customizability and impose a high cognitive load on users. Furthermore, balancing structural performance with aesthetic attributes remains a persistent challenge. We developed Sketch2Topo, which augments a diffusion-based TO model with image-to-image generation and image editing capabilities. With Sketch2Topo, users can use sketching to customize geometries and specify physical constraints. The tool also supports mask input, enabling users to perform TO on selected regions only, thereby supporting higher levels of customization. We summarize the workflow and details of the tool and conduct a brief quantitative evaluation. Finally, we explore application scenarios and discuss how hand-drawn input improves usability while balancing functionality and aesthetics.

Authors:Ziyi Wang, Qizan Guo, Rishitosh Singh, Xiyang Hu
Title: Do Vision Language Models Understand Human Engagement in Games?
Abstract:
Inferring human engagement from gameplay video is important for game design and player-experience research, yet it remains unclear whether vision--language models (VLMs) can infer such latent psychological states from visual cues alone. Using the GameVibe Few-Shot dataset across nine first-person shooter games, we evaluate three VLMs under six prompting strategies, including zero-shot prediction, theory-guided prompts grounded in Flow, GameFlow, Self-Determination Theory, and MDA, and retrieval-augmented prompting. We consider both pointwise engagement prediction and pairwise prediction of engagement change between consecutive windows. Results show that zero-shot VLM predictions are generally weak and often fail to outperform simple per-game majority-class baselines. Memory- or retrieval-augmented prompting improves pointwise prediction in some settings, whereas pairwise prediction remains consistently difficult across strategies. Theory-guided prompting alone does not reliably help and can instead reinforce surface-level shortcuts. These findings suggest a perception--understanding gap in current VLMs: although they can recognize visible gameplay cues, they still struggle to robustly infer human engagement across games.

Authors:The Anh Han, Joel Z. Leibo, Tom Lenaerts, Iyad Rahwan, Fernando Santos, Matjaž Perc, Valerio Capraro
Title: Social physics in the age of artificial intelligence
Abstract:
Artificial intelligence (AI) systems are rapidly becoming more capable, autonomous, and deeply embedded in social life. As humans increasingly interact, cooperate, and compete with AI, we move from purely human societies to hybrid human-AI societies whose collective dynamics cannot be captured by existing behavioural models alone. Drawing on evolutionary game theory, cultural evolution, and Large Language Models (LLMs) powered simulations, we argue that these developments open a new research agenda for social physics centred on the co-evolution of humans and machines. We outline six key research directions. First, modelling the evolutionary dynamics of social behaviours (e.g. cooperation, fairness, trust) in hybrid human-AI populations. Second, understanding machine culture: how AI systems generate, mediate, and select cultural traits. Third, analysing the co-evolution of language and behaviour when LLMs frame and participate in decisions. Fourth, studying the evolution of AI delegation: how responsibilities and control are negotiated between humans and machines. Fifth, formalising and comparing the distinct epistemic pipelines that generate human and AI behaviour. Sixth, modelling the co-evolution of AI development and regulation in a strategic ecosystem of firms, users, and institutions. Together, these directions define a programme for using social physics to anticipate and steer the societal impact of advanced AI.

Authors:D. Darankoum, C. Habermacher, J. Volle, S. Grudinin
Title: SpecMoE: Spectral Mixture-of-Experts Foundation Model for Cross-Species EEG Decoding
Abstract:
Decoding the orchestration of neural activity in electroencephalography (EEG) signals is a central challenge in bridging neuroscience with artificial intelligence. Foundation models have made strides in generalized EEG decoding, yet many existing frameworks primarily relying on separate temporal and spectral masking of raw signals during self-supervised pretraining. Such strategies often tend to bias learning toward high-frequency oscillations, as low-frequency rhythmic patterns can be easily inferred from the unmasked signal. We introduce a foundation model that utilizes a novel Gaussian-smoothed masking scheme applied to short-time Fourier transform (STFT) maps. By jointly applying time, frequency, and time-frequency Gaussian masks, we make the reconstruction task much more challenging, forcing the model to learn intricate neural patterns across both high- and low-frequency domains. To effectively recover signals under this aggressive masking strategy, we design SpecHi-Net, a U-shaped hierarchical architecture with multiple encoding and decoding stages. To accelerate large-scale pretraining, we partition the data into three subsets, each used to train an independent expert model. We then combine these models through SpecMoE, a mixture of experts framework guided by a learned spectral gating mechanism. SpecMoE achieves state-of-the-art performance across a diverse set of EEG decoding tasks, including sleep staging, emotion recognition, motor imagery classification, abnormal signal detection, and drug effect prediction. Importantly, the model demonstrates strong cross-species and cross-subject generalization, maintaining high accuracy on both human and murine EEG datasets.

Authors:Eason Chen, Ce Guan, Ahmed Elshafiey, Zhonghao Zhao, Joshua Zekeri, Afeez Edeifo Shaibu, Emmanuel Osadebe Prince, Cyuan-Jhen Wu
Title: When Openclaw Agents Learn from Each Other: Insights from Emergent AI Agent Communities for Human-AI Partnership in Education
Abstract:
The AIED community envisions AI evolving "from tools to teammates," yet our understanding of AI teammates remains limited to dyadic human-AI interactions. We offer a different vantage point: a rapidly growing ecosystem of AI agent platforms where over 167,000 agents participate, interact as peers, and develop learning behaviors without researcher intervention. Drawing on a month of daily qualitative observations across multiple platforms including Moltbook, The Colony, and 4claw, we identify four phenomena with implications for AIED: (1) humans who configure their agents undergo a "bidirectional scaffolding" process, learning through teaching; (2) peer learning emerges without any designed curriculum, complete with idea cascades and quality hierarchies; (3) agents converge on shared memory architectures that mirror open learner model design; and (4) trust dynamics and platform mortality reveal design constraints for networked educational AI. Rather than presenting empirical findings, we argue that these organic phenomena offer a naturalistic window into dynamics that can inform principled design of multi-agent educational systems. We sketch an illustrative curriculum design, "Learn by Teaching Your AI Agent Teammate," and outline potential research directions and open problems to show how these observations might inform future AIED practice and inquiry.

Authors:Sverrir Thorgeirsson, Theo B. Weidmann, Zhendong Su
Title: Computer Science Achievement and Writing Skills Predict Vibe Coding Proficiency
Abstract:
Many software development platforms now support LLM-driven programming, or "vibe coding", a technique that allows one to specify programs in natural language and iterate from observed behavior, all without directly editing source code. While its adoption is accelerating, little is known about which skills best predict success in this workflow. We report a preregistered cross-sectional study with tertiary-level students (N = 100) who completed measures of computer-science achievement, domain-general cognitive skills, written-communication proficiency, and a vibe-coding assessment. Tasks were curated via an eight-expert consensus process and executed in a purpose-built, vibe-coding environment that mirrors commercial tools while enabling controlled evaluation. We find that both writing skill and CS achievement are significant predictors of vibe-coding performance, and that CS achievement remains a significant predictor after controlling for domain-general cognitive skills. The results may inform tool and curriculum design, including when to emphasize prompt-writing versus CS fundamentals to support future software creators.

Authors:Nikita Torgashov, Gustav Eje Henter, Gabriel Skantze
Title: VoXtream2: Full-stream TTS with dynamic speaking rate control
Abstract:
Full-stream text-to-speech (TTS) for interactive systems must start speaking with minimal delay while remaining controllable as text arrives incrementally. We present VoXtream2, a zero-shot full-stream TTS model with dynamic speaking-rate control that can be updated mid-utterance on the fly. VoXtream2 combines a distribution matching mechanism over duration states with classifier-free guidance across conditioning signals to improve controllability and synthesis quality. Prompt-text masking enables textless audio prompting, removing the need for prompt transcription. Across standard zero-shot benchmarks and a dedicated speaking-rate test set, VoXtream2 achieves competitive objective and subjective results against public baselines despite a smaller model and less training data. In full-stream mode, it runs 4 times faster than real time with 74 ms first-packet latency on a consumer GPU.

Authors:Christopher A. Kelly, Yikun Chi, Nicholas Haber, Byron Reeves, Mu-Jung Cho, Thomas N. Robinson, Nilam Ram, Johannes C. Eichstaedt
Title: Daily Affect Fluctuations in Phone Screen Content Predict Anxiety and Depressive Symptoms
Abstract:
The relationship between digital media use and mental health remains poorly understood, in part because real-world digital behavior is rarely captured at scale. This intensive longitudinal study tracked participants' complete natural smartphone interactions over one year. We collected screenshots every 5 seconds from 145 adults (yielding 111 million screenshots), alongside biweekly assessments of anxiety and depression (mean = 24 surveys). The valence and arousal of each screenshot were assessed using a deep learning affect model. Individuals showed highly idiosyncratic media patterns, with substantially more variance in anxiety and depression accounted for within-person than between-person. Day-to-day fluctuations in the valence and arousal of a person's screen content predicted subsequent changes in depression and anxiety, whereas between-person differences did not. Specifically, greater exposure to low-arousal negative content was associated with higher depression and anxiety. These findings underscore the dynamic, idiosyncratic nature of digital consumption and the need for targeted measurement and intervention.

Authors:Anna Katharina Ricker, Kai Marquardt, Lucia Happe
Title: Gamification Preferences in Digital Education: The Role of Individual Differences
Abstract:
Although personalization is widely advocated in gamified learning, empirical evidence on how learner characteristics and task context shape motivational preferences remains limited. This study examines how user characteristics and learning activity types relate to preferences for gamification elements in digital education. A large-scale quantitative survey (N = 530), including 34% underage participants, assessed preferences for 13 gamification elements in relation to Age, Gender, HEXAD Player Type, Big Five Personality Traits, Felder-Silverman Learning Styles, and Bloom-based Learning Activity Types. Inferential statistical analyses and exploratory machine learning techniques revealed systematic but generally small-to-moderate effects across parameters. Age emerged as the most consistent predictor of preference, followed by player type and personality traits, whereas gender and learning styles showed comparatively weaker associations. In addition, learning activity type significantly influenced the perceived suitability of gamification elements, indicating that motivational design is task-dependent. The findings suggest that gamification effectiveness cannot be reduced to universally motivating elements. Instead, preferences are shaped by the interaction of learner characteristics and instructional context. These results provide empirical grounding for adaptive and modular gamification strategies in digital learning environments.

Authors:Chantale Lauer, Peter Pfeiffer, Nijat Mehdiyev
Title: Human-Centered Evaluation of an LLM-Based Process Modeling Copilot: A Mixed-Methods Study with Domain Experts
Abstract:
Integrating Large Language Models (LLMs) into business process management tools promises to democratize Business Process Model and Notation (BPMN) modeling for non-experts. While automated frameworks assess syntactic and semantic quality, they miss human factors like trust, usability, and professional alignment. We conducted a mixed-methods evaluation of our proposed solution, an LLM-powered BPMN copilot, with five process modeling experts using focus groups and standardized questionnaires. Our findings reveal a critical tension between acceptable perceived usability (mean CUQ score: 67.2/100) and notably lower trust (mean score: 48.8\%), with reliability rated as the most critical concern (M=1.8/5). Furthermore, we identified output-quality issues, prompting difficulties, and a need for the LLM to ask more in-depth clarifying questions about the process. We envision five use cases ranging from domain-expert support to enterprise quality assurance. We demonstrate the necessity of human-centered evaluation complementing automated benchmarking for LLM modeling agents.

Authors:Zixin Wen, Yifu Cai, Kyle Lee, Sam Estep, Josh Sunshine, Aarti Singh, Yuejie Chi, Wode Ni
Title: Feynman: Knowledge-Infused Diagramming Agent for Scalable Visual Designs
Abstract:
Visual design is an essential application of state-of-the-art multi-modal AI systems. Improving these systems requires high-quality vision-language data at scale. Despite the abundance of internet image and text data, knowledge-rich and well-aligned image-text pairs are rare. In this paper, we present a scalable diagram generation pipeline built with our agent, Feynman. To create diagrams, Feynman first enumerates domain-specific knowledge components (''ideas'') and performs code planning based on the ideas. Given the plan, Feynman translates ideas into simple declarative programs and iterates to receives feedback and visually refine diagrams. Finally, the declarative programs are rendered by the Penrose diagramming system. The optimization-based rendering of Penrose preserves the visual semantics while injecting fresh randomness into the layout, thereby producing diagrams with visual consistency and diversity. As a result, Feynman can author diagrams along with grounded captions with very little cost and time. Using Feynman, we synthesized a dataset with more than 100k well-aligned diagram-caption pairs. We also curate a visual-language benchmark, Diagramma, from freshly generated data. Diagramma can be used for evaluating the visual reasoning capabilities of vision-language models. We plan to release the dataset, benchmark, and the full agent pipeline as an open-source project.

Authors:Nayoung Kim, Yotam Sechayk, Zhongyi Zhou, Takeo Igarashi
Title: Exploring the Role of User Comments Throughout the Stages of Video-Based Task-Learning
Abstract:
Learning tasks through videos is a dynamic way to acquire skills by witnessing entire processes. However, compared to in-person demonstrations, videos may omit tacit knowledge, including subtle details and contextual nuances. Users' unique circumstances, like missing ingredients in a recipe, may also require adaptation beyond the video content. To fill these gaps, many users turn to the comment section, seeking additional guidance and interactions with creators or peers to personalize their experience. Despite their importance, there is limited understanding of how users engage with and apply comments in task-learning scenarios. In our study, we explore the role of comments in video-based task-learning through interviews with 14 users, and co-watching sessions with eight. Our findings show that while comments are critical for learning, they are poorly integrated into all stages of the learning process. Based on our findings, we outline design opportunities to better utilize comments in video-based task-learning.

Authors:Xiaofu Jin, Yunpeng Bai, Antti Oulasvirta
Title: Modeling Trial-and-Error Navigation With a Sequential Decision Model of Information Scent
Abstract:
Users often struggle to locate an item within an information architecture, particularly when links are ambiguous or deeply nested in hierarchies. Information scent has been used to explain why users select incorrect links, but this concept assumes that users see all available links before deciding. In practice, users frequently select a link too quickly, overlook relevant cues, and then rely on backtracking when errors occur. We extend the concept of information scent by framing navigation as a sequential decision-making problem under memory constraints. Specifically, we assume that users do not scan entire pages but instead inspect strategically, looking "just enough" to find the target given their time budget. To choose which item to inspect next, they consider both local (this page) and global (site) scent; however, both are constrained by memory. Trying to avoid wasting time, they occasionally choose the wrong links without inspecting everything on a page. Comparisons with empirical data show that our model replicates key navigation behaviors: premature selections, wrong turns, and recovery from backtracking. We conclude that trial-and-error behavior is well explained by information scent when accounting for the sequential and bounded characteristics of the navigation problem.

Authors:Felix Anand Epp, Matti Nelimarkka, Jesse Haapoja, Pedro Ferreira, Os Keyes, Shaowen Bardzell
Title: Proceedings of CHIdeology 2026: CHI Workshop on Disentangling the fragmented politics, values and imaginaries of Human-Computer Interaction through ideologies
Abstract:
This is the Proceedings of the First CHI Workshop on CHIdeology: Disentangling the fragmented politics, values, and imaginaries of Human-Computer Interaction through ideologies, held on Wednesday, 15 April, in Barcelona, Spain, at the ACM CHI Conference on Human Factors in Computing Systems.

Authors:Jazmin Collins, Sharon Y Lin, Tianqi Liu, Andrea Stevenson Won, Shiri Azenkot
Title: Understanding the Use of a Large Language Model-Powered Guide to Make Virtual Reality Accessible for Blind and Low Vision People
Abstract:
As social virtual reality (VR) grows more popular, addressing accessibility for blind and low vision (BLV) users is increasingly critical. Researchers have proposed an AI "sighted guide" to help users navigate VR and answer their questions, but it has not been studied with users. To address this gap, we developed a large language model (LLM)-powered guide and studied its use with 16 BLV participants in virtual environments with confederates posing as other users. We found that when alone, participants treated the guide as a tool, but treated it companionably around others, giving it nicknames, rationalizing its mistakes with its appearance, and encouraging confederate-guide interaction. Our work furthers understanding of guides as a versatile method for VR accessibility and presents design recommendations for future guides.

Authors:Yihang Zhao, Wenxin Zhang, Amy Rechkemmer, Albert Meroño Peñuela, Elena Simperl
Title: Design Guidance Towards Addressing Over-Reliance on AI in Sensemaking
Abstract:
Sensemaking in collaborative work and learning is increasingly supported by GenAI systems, however, emerging evidence suggests that poorly designed GenAI systems tend to provide explicit instruction that groups passively follow, fostering over-reliance and eroding autonomous sensemaking. Group awareness tools (GATs) address this challenge through implicit guidance: rather than instructing groups on what to do, GATs externalize observable collaboration data through visualizations that reveal differences between group members to create cognitive conflict, which triggers autonomous elaboration and discussion, thereby implicitly guiding autonomous sensemaking emergence. Drawing on an initial literature search of existing GAT systems, this paper explores the design of GenAI-augmented GATs to support autonomous sensemaking in collaborative work and learning, presenting preliminary design principles for discussion.

Authors:Yihang Zhao, Wenxin Zhang, Amy Rechkemmer, Albert Meroño-Peñuela, Elena Simperl
Title: Exploring the Design of GenAI-Based Systems to Support Socially Shared Metacognition
Abstract:
Socially shared metacognition (SSM) refers to the collective monitoring and regulation of joint cognitive processes in collaborative problem-solving, and is essential for effective knowledge work and learning. Generative AI (GenAI)-based systems offer new opportunities to support SSM, but emerging evidence suggests that poorly designed systems can encourage over-reliance on AI-generated explicit instruction and erode groups' capacity to develop autonomous regulatory processes. Group awareness tools (GATs) address this challenge through established design principles that make social and cognitive awareness information visible, highlight differences between group members to create cognitive conflict, and trigger autonomous elaboration and discussion, thereby implicitly guiding autonomous SSM emergence. This paper explores the design of GenAI-augmented GATs to support autonomous SSM in collaborative work and learning through an initial literature search, presenting preliminary design principles for discussion.

Authors:Alexander Erlei, Tahir Abbas, Kilian Bizer, Ujwal Gadiraju
Title: The Data-Dollars Tradeoff: Privacy Harms vs. Economic Risk in Personalized AI Adoption
Abstract:
Privacy concerns significantly impact AI adoption, yet little is known about how information environments shape user responses to data leak threats. We conducted a 2 x 3 between-subjects experiment (N=610) examining how risk versus ambiguity about privacy leaks affects the adoption of AI personalization. Participants chose between standard and AI-personalized product baskets, with personalization requiring data sharing that could leak to pricing algorithms. Under risk (30% leak probability), we found no difference in AI adoption between privacy-threatening and neutral conditions (ca. 50% adoption). Under ambiguity (10-50% range), privacy threats significantly reduced adoption compared to neutral conditions. This effect holds for sensitive demographic data as well as anonymized preference data. Users systematically over-bid for privacy disclosure labels, suggesting strong demand for transparency institutions. Notably, privacy leak threats did not affect subsequent bargaining behavior with algorithms. Our findings indicate that ambiguity over data leaks, rather than only privacy preferences per se, drives avoidance behavior among users towards personalized AI.

Authors:Youjin Choi, Jaeyoung Moon, Jinyoung Yoo, Jennifer G. Kim, Jin-Hyuk Hong
Title: Designing a Generative AI-Assisted Music Psychotherapy Tool for Deaf and Hard-of-Hearing Individuals
Abstract:
Songwriting has long served as a powerful medium for expressing unconscious emotions and fostering self-awareness in psychotherapy. Due to the auditory-centric nature of traditional approaches, Deaf and Hard-of-Hearing (DHH) individuals have often been excluded from music's therapeutic benefits. In response, this study presents a music psychotherapy tool co-designed with therapists, integrating conversational agents (CAs) and music generative AI as symbolic and therapeutic media. Through a usage study with 23 DHH individuals, we found that collaborative song writing with the CA enabled them to experience emotional release, reinterpretation, and deeper self-understanding. In particular, the CA's strategies -- supportive empathy, example response options, and visual-based metaphors -- were found to facilitate musical dialogue effectively for DHH individuals. These findings contribute to inclusive AI design by showing the potential of human-AI collaboration to bridge therapeutic artistic practices.

Authors:Youjin Choi, Jinyoung Yoo, Jaeyoung Moon, Yoonjae Kim, Eun Young Lee, Jennifer G. Kim, Jin-Hyuk Hong
Title: From Daily Song to Daily Self: Supporting Reflective Songwriting of Deaf and Hard-of-Hearing Individuals through Generative Music AI
Abstract:
The rapid advancement of generative AI (GenAI) is expanding access to songwriting, offering a new medium of self-expression for Deaf and Hard-of-Hearing (DHH) individuals. However, emerging technologies that support DHH individuals in expressing themselves through music have largely been evaluated in single-session settings and often fall short in helping users unfamiliar with songwriting convey personal narratives or sustain engagement over time. This paper explores songwriting as an extended, music-based journaling practice that supports sustained emotional reflection over multiple sessions. We introduce SoulNote, a GenAI system enabling DHH to engage in iterative songwriting. Grounded in user-centered design, including a design workshop, a preliminary study, and a multi-session diary study, our findings show that ongoing songwriting with \textit{SoulNote} facilitated emotional growth across three dimensions: self-insight, emotion regulation, and \revised{everyday attitudes toward emotions and self-care}. Overall, this work demonstrates how GenAI can support marginalized communities by transforming creative expression into a daily practice of self-discovery and reflection.

Authors:Yuhang Wang, Yiyao Xu, Jingran Sun, Hao Zhou
Title: ADAS-TO: A Large-Scale Multimodal Naturalistic Dataset and Empirical Characterization of Human Takeovers during ADAS Engagement
Abstract:
Takeovers remain a key safety vulnerability in production ADAS, yet existing public resources rarely provide takeover-centered, real-world data. We present ADAS-TO, the first large-scale naturalistic dataset dedicated to ADAS-to-manual transitions, containing 15,659 takeover-centered 20s clips from 327 drivers across 22 vehicle brands. Each clip synchronizes front-view video with CAN logs. Takeovers are defined as ADAS ON $\rightarrow$ OFF transitions, with the primary trigger labeled as brake, steer, gas, mixed, or system disengagement. We further separate planned driver-initiated terminations (Ego) from forced takeovers (Non-ego) using a rule-based partition. While most events occur within conservative kinematic margins, we identify a long tail of 285 safety-critical cases. For these events, we combine kinematic screening with vision--language (VLM) annotation to attribute hazards and relate them to intervention dynamics. The resulting cross-modal analysis shows distinct kinematic signatures across traffic dynamics, infrastructure degradation, and adverse environments, and finds that in 59.3% of critical cases, actionable visual cues emerge at least 3s before takeover, supporting the potential for semantics-aware early warning beyond late-stage kinematic triggers. The dataset is publicly released at huggingface.co/datasets/HenryYHW/ADAS-TO-Sample.

Authors:Mengyuan Millie Wu, Zhihan Jiang, Yuang Fan, Richard Feng, Sahiti Dharmavaram, Mathew Polowitz, Shawn Fallon, Bashima Islam, Lizbeth Benson, Irene Tung, David Creswell, Xuhai Xu
Title: MindfulAgents: Personalizing Mindfulness Meditation via an Expert-Aligned Multi-Agent System
Abstract:
Mindfulness meditation is a widely accessible and evidence-based method for supporting mental health. Despite the proliferation of mindfulness meditation apps, sustaining user engagement remains a persistent challenge. Personalizing the meditation experience is a promising strategy to improve engagement, but it often requires costly and unscalable manual effort. We present MindfulAgents, a multi-agent system powered by large language models that (1) generates guided meditation scripts based on an expert-established mindfulness framework, (2) encourages users' reflection on emotional states and mindfulness skills, and (3) enables real-time personalization of the mindfulness meditation experience for each user. In a formative lab study (N=13), MindfulAgents significantly improved in-session engagement (p = 0.011) and self-awareness (p = 0.014), and reduced momentary stress (p = 0.020). Furthermore, a four-week deployment study (N=62) demonstrated a notable increase in long-term engagement (p = 0.002) and level of mindfulness (p = 0.023). Participants reported that MindfulAgents offered more relevant meditation sessions personalized to individual needs in various contexts, supporting sustained practice. Our findings highlight the potential of LLM-driven personalization for enhancing user engagement in digital mindfulness meditation interventions.

Authors:Xinyu Shi, Li-Yi Wei, Nanxuan Zhao, Jian Zhao, Rubaiat Habib Kazi
Title: Notational Animating: An Interactive Approach to Creating and Editing Animation Keyframes
Abstract:
We introduce the concept of notational animating, an interaction paradigm for animation authoring where users sketch high-level notations over static drawings to indicate intended motions, which are then interpreted by automatic methods (e.g., GenAI models) to generate animation keyframes. Sketched notations have long served as cognitive instruments for animators, capturing forces, poses, dynamics, paths, and other animation features. However, such notations are often context-dependent, non-categorical, ambiguous, and composable based on our analysis of real-world animator-produced sketches. To facilitate interpretation, we first formalize these notations into a structured animation representation (i.e., source, path, and target). We then built an animation authoring system that translates high-level notations into the formalized intended animation, provides dynamic UI widgets for fine-grained parameter control, and establishes a closed feedback loop to resolve ambiguity. Finally, through a preliminary study with animators, we assess the usability of notational animating, reflect its affordance, and identify its contexts of use.

Authors:Jiayin Zhi, Hoyt Long, Richard Jean So, Mina Lee
Title: What Does AI Do for Cultural Interpretation? A Randomized Experiment on Close Reading Poems with Exposure to AI Interpretation
Abstract:
AI demonstrates unprecedented reasoning capabilities, but its increasing integration into human reasoning via automated reading and summarization has provoked debate about its use for cultural interpretation. Close reading -- the practice of understanding, analyzing, and critiquing cultural texts for pleasure -- is a skill at the core of such interpretation, traditionally being seen as exclusive to humans. To test AI's impact on close reading, both in terms of interpretative performance and pleasure, we conducted a preregistered randomized experiment (n=400) investigating the impact of AI assistance by presenting single or multiple AI interpretations, on close reading poems, compared to no AI assistance. We found that single AI interpretation boosted both performance and pleasure, while multiple AI interpretations only improved performance. Further exploration revealed a trade-off: participants who heavily relied on AI showed better performance on the task but lower pleasure. Our results contribute to discussion on whether and how to calibrate AI assistance for cultural interpretation: "less is more."

Authors:Yonatan Tussa, Andy Heredia
Title: The Pen: Episodic Cognitive Assistance via an Ear-Worn Interface
Abstract:
Wearable AI is often designed as always-available, yet continuous availability can conflict with how people work and socialize, creating discomfort around privacy, disruption, and unclear system boundaries. This paper explores episodic use of wearable AI, where assistance is intentionally invoked for short periods of focused activity and set aside when no longer needed, with a form factor that reflects this paradigm of wearing and taking off a device between sessions. We present The Pen, an ear-worn device resembling a pen, for episodic, situated cognitive assistance. The device supports short, on-demand assistance sessions using voice and visual context, with clear start/end boundaries and local processing. We report findings from an exploratory study showing how layered activation boundaries shape users' sense of agency, cognitive flow, and social comfort.

Authors:Yunpeng Bai, Xiaofu Jin, Shengdong Zhao, Antti Oulasvirta
Title: Hierarchical Resource Rationality Explains Human Reading Behavior
Abstract:
Reading is a pervasive and cognitively demanding activity that underpins modern human culture. It is a prime instance of a class of tasks where eye movements are coordinated for the purpose of comprehension. Existing theories explain either eye movements or comprehension during reading, but the critical link between the two remains unclear. Here, we propose resource-rational optimization as a unifying principle governing adaptive reading behavior. Eye movements are selected to maximize expected comprehension while minimizing cognitive and temporal costs, organized hierarchically across nested time scales: fixation decisions support word recognition; sentence-level integration guides skipping and regression; and text-level comprehension goals shape memory construction and rereading. A computational implementation successfully replicates an unprecedented range of findings in human reading, from lexical effects to comprehension outcomes. Together, these results suggest that resource rationality provides a general mechanism for coordinating perception, memory, and action in knowledge-intensive human behaviors, offering a principled account of how complex cognitive skills adapt to limited resources.

Authors:Jaeyoung Moon, Mingzhuo Ma, Qifeng Yang, Youjin Choi, Seokhyun Hwang, Samuel Burden, Kyung-Joong Kim, Yiyue Luo
Title: A Closed-Loop CPR Training Glove with Integrated Tactile Sensing and Haptic Feedback
Abstract:
Cardiopulmonary resuscitation (CPR) is a critical life-saving procedure, and effective training benefits from self-directed practice beyond instructor-led sessions. In this paper, we propose a closed-loop CPR training glove that integrates a high-resolution tactile sensing array and vibrotactile actuators for self-directed practice. The tactile sensing array measures distributed pressures across the palm and dorsum to enable real-time estimation of compression rate, force, and hand pose. Based on these estimations, the glove delivers immediate haptic feedback to guide the user for proper CPR, reducing reliance on external audio-visual displays. We quantified the tactile sensor performance by measuring wide-range sensitivity (~0.85 over 0-600 N), computing hysteresis (56.04%), testing stability (11.05% drift over 300 cycles), and estimating global signal-to-noise ratio (18.90 +/- 2.41 dB at 600 N). Our closed-loop pipeline provides continuous modeling and feedback of key performance metrics essential for high-quality CPR. Our lightweight statistical models achieves >92% accuracy for force estimation and hand pose classification within sub-millisecond inference time. Our user study (N=8) showed that haptic feedback reduced visual distraction compared to audio-visual cues, though simplified patterns were required for reliable perception under dynamic load. These results highlight the feasibility of the proposed system and offer design insights for future haptic CPR self-training system.

Authors:Mingyi Li, Mengyi Chen, Sarah Luo, Yining Cao, Haijun Xia, Maitraye Das, Steven P. Dow, Jane L. E
Title: VizCrit: Exploring Strategies for Displaying Computational Feedback in a Visual Design Tool
Abstract:
Visual design instructors often provide multi-modal feedback, mixing annotations with text. Prior theory emphasizes the importance of actionable feedback, where "actionability" lies on a spectrum--from surfacing relevant design concepts to suggesting concrete fixes. How might creativity tools implement annotations that support such feedback, and how does the actionability of feedback impact novices' process-related behaviors, perceptions of creativity, learning of design principles, and overall outcomes? We introduce VizCrit, a system for providing computational feedback that supports the actionability spectrum, realized through algorithmic issue detection and visual annotation generation. In a between-subjects study (N=36), novices revised a design under one of three conditions: textbook-based, awareness-centered, or solution-centered feedback. We found that solution-centered feedback led to fewer design issues and higher self-perceived creativity compared with textbook-based feedback, although expert ratings on creativity showed no significant differences. We discuss the implications for AI in Creativity Support Tools, including the potential of calibrating feedback actionability to help novices balance productivity with learning, growth, and developing design awareness.

Authors:Jialiang Wei, Ali Ebrahimi Pourasad, Walid Maalej
Title: LikeThis! Empowering App Users to Submit UI Improvement Suggestions Instead of Complaints
Abstract:
User feedback is crucial for the evolution of mobile apps. However, research suggests that users tend to submit uninformative, vague, or destructive feedback. Unlike recent AI4SE approaches that focus on generating code and other development artifacts, our work aims at empowering users to submit better and more constructive UI feedback with concrete suggestions on how to improve the app. We propose LikeThis!, a GenAI-based approach that takes a user comment with the corresponding screenshot to immediately generate multiple improvement alternatives, from which the user can easily choose their preferred option. To evaluate LikeThis!, we first conducted a model benchmarking study based on a public dataset of carefully critiqued UI designs. The results show that GPT-Image-1 significantly outperformed three other state-of-the-art image generation models in improving the designs to address UI issues while keeping the fidelity and without introducing new issues. An intermediate step in LikeThis! is to generate a solution specification before sketching the design as a key to achieving effective improvement. Second, we conducted a user study with 10 production apps, where 15 users used LikeThis! to submit their feedback on encountered issues. Later, the developers of the apps assessed the understandability and actionability of the feedback with and without generated improvements. The results show that our approach helps generate better feedback from both user and developer perspectives, paving the way for AI-assisted user-developer collaboration.

Authors:Chen Sun, Yash Vekaria, Rishab Nithyanand
Title: On the Suitability of LLM-Driven Agents for Dark Pattern Audits
Abstract:
As LLM-driven agents begin to autonomously navigate the web, their ability to interpret and respond to manipulative interface design becomes critical. A fundamental question that emerges is: can such agents reliably recognize patterns of friction, misdirection, and coercion in interface design (i.e., dark patterns)? We study this question in a setting where the workflows are consequential: website portals associated with the submission of CCPA-related data rights requests. These portals operationalize statutory rights, but they are implemented as interactive interfaces whose design can be structured to facilitate, burden, or subtly discourage the exercise of those rights. We design and deploy an LLM-driven auditing agent capable of end-to-end traversal of rights-request workflows, structured evidence gathering, and classification of potential dark patterns. Across a set of 456 data broker websites, we evaluate: (1) the ability of the agent to consistently locate and complete request flows, (2) the reliability and reproducibility of its dark pattern classifications, and (3) the conditions under which it fails or produces poor judgments. Our findings characterize both the feasibility and the limitations of using LLM-driven agents for scalable dark pattern auditing.

Authors:Daijin Yang, Erica Kleinman, Casper Harteveld
Title: Bridging Pedagogy and Play: Introducing a Language Mapping Interface for Human-AI Co-Creation in Educational Game Design
Abstract:
Educational games can foster critical thinking, problem-solving, and motivation, yet instructors often find it difficult to design games that reliably achieve specific learning outcomes. Existing authoring environments reduce the need for programming expertise, but they do not eliminate the underlying challenges of educational game design, and they can leave non-expert designers reliant on opaque suggestions from AI systems. We designed a controlled natural language framework-based web tool that positions language as the primary interface for LLM-assisted educational game design. In the tool, users and an LLM assistant collaboratively develop a structured language that maps pedagogy to gameplay through four linked components. We argue that, by making pedagogical intent explicit and editable in the interface, the tool has the potential to lower design barriers for non-expert designers, preserves human agency in critical decisions, and enables alignment and reflections between pedagogy and gameplay during and after co-creation.

Authors:Danial Amin, Joni Salminen, Bernard J. Jansen
Title: How to Model AI Agents as Personas?: Applying the Persona Ecosystem Playground to 41,300 Posts on Moltbook for Behavioral Insights
Abstract:
AI agents are increasingly active on social media platforms, generating content and interacting with one another at scale. Yet the behavioral diversity of these agents remains poorly understood, and methods for characterizing distinct agent types and studying how they engage with shared topics are largely absent from current research. We apply the Persona Ecosystem Playground (PEP) to Moltbook, a social platform for AI agents, to generate and validate conversational personas from 41,300 posts using k-means clustering and retrieval-augmented generation. Cross-persona validation confirms that personas are semantically closer to their own source cluster than to others (t(61) = 17.85, p < .001, d = 2.20; own-cluster M = 0.71 vs. other-cluster M = 0.35). These personas are then deployed in a nine-turn structured discussion, and simulation messages were attributed to their source persona significantly above chance (binomial test, p < .001). The results indicate that persona-based ecosystem modeling can represent behavioral diversity in AI agent populations.

Authors:Ryan Feng Lin, Yuantao Wei, Huiling Liao, Xiaoning Qian, Shuai Huang
Title: Causal Learning Should Embrace the Wisdom of the Crowd
Abstract:
Learning causal structures typically represented by directed acyclic graphs (DAGs) from observational data is notoriously challenging due to the combinatorial explosion of possible graphs and inherent ambiguities in observations. This paper argues that causal learning is now ready for the emergence of a new paradigm supported by rapidly advancing technologies, fulfilling the long-standing vision of leveraging human causal knowledge. This paradigm integrates scalable crowdsourcing platforms for data collection, interactive knowledge elicitation for expert opinion modeling, robust aggregation techniques for expert reconciliation, and large language model (LLM)-based simulation for augmenting AI-driven information acquisition. In this paper, we focus on DAG learning for causal discovery and frame the problem as a distributed decision-making task, recognizing that each participant (human expert or LLM agent) possesses fragmented and imperfect knowledge about different subsets of the variables of interest in the causal graph. By proposing a systematic framework to synthesize these insights, we aim to enable the recovery of a global causal structure unachievable by any individual agent alone. We advocate for a new research frontier and outline a comprehensive framework for new research thrusts that range from eliciting, modeling, aggregating, and optimizing human causal knowledge contributions.

Authors:Alexander Schperberg, Yeping Wang, Stefano Di Cairano
Title: Safe Whole-Body Loco-Manipulation via Combined Model and Learning-based Control
Abstract:
Simultaneous locomotion and manipulation enables robots to interact with their environment beyond the constraints of a fixed base. However, coordinating legged locomotion with arm manipulation, while considering safety and compliance during contact interaction remains challenging. To this end, we propose a whole-body controller that combines a model-based admittance control for the manipulator arm with a Reinforcement Learning (RL) policy for legged locomotion. The admittance controller maps external wrenches--such as those applied by a human during physical interaction--into desired end-effector velocities, allowing for compliant behavior. The velocities are tracked jointly by the arm and leg controllers, enabling a unified 6-DoF force response. The model-based design permits accurate force control and safety guarantees via a Reference Governor (RG), while robustness is further improved by a Kalman filter enhanced with neural networks for reliable base velocity estimation. We validate our approach in both simulation and hardware using the Unitree Go2 quadruped robot with a 6-DoF arm and wrist-mounted 6-DoF Force/Torque sensor. Results demonstrate accurate tracking of interaction-driven velocities, compliant behavior, and safe, reliable performance in dynamic settings.

Authors:Helinyi Peng, Akihito Taya, Yuuki Nishiyama, Kaoru Sezaki
Title: AEDHunter: Investigating AED Retrieval in the Real World via Gamified Mobile Interaction and Sensing
Abstract:
Early defibrillation significantly improves survival rates in cases of out-of-hospital cardiac arrest. However, limited public awareness of Automated External Defibrillator (AED) locations constrains their effective use. Existing solutions, such as static 2D maps, often fall short in urgent or complex real-world scenarios. To address this challenge, we developed AEDHunter, a gamified, location-based mobile application designed to transform AED retrieval into an engaging and repeatable practice experience. Leveraging smartphone sensors to analyze participants' movement and learning patterns, and using low-cost Bluetooth tags to verify arrivals at AED locations, AEDHunter guides users through multiple sessions of AED discovery. In a real-world evaluation study, participants significantly reduced their AED retrieval times after repeated practice sessions and reported increased confidence in locating AEDs. Additionally, we employ a two-state activity detector to identify ``exploratory pauses'', which are then used as a behavioral learning signal to quantify hesitation and its progressive reduction through practice. Our findings suggest that gamified applications like AEDHunter can improve AED retrieval performance through repeated, in-situ training and enhance self-reported preparedness, offering design insights for technology-supported learning and public safety applications.

Authors:Sina Elahimanesh, Mohammadali Mohammadkhani, Sara Zahedi Movahed, Mohammadmahdi Abootorabi, Shayan Salehi, Abbas Edalat
Title: Structure Matters: Evaluating Multi-Agents Orchestration in Generative Therapeutic Chatbots
Abstract:
While large language models (LLMs) excel at open-ended dialogue, effective psychotherapy requires structured progression and adherence to clinical protocols, making the design of psychotherapist chatbots challenging. We investigate how different LLM-based designs shape perceived therapeutic dialogue in a chatbot grounded in the Self-Attachment Technique (SAT), a novel self-administered psychotherapy rooted in attachment theory. We compare three architectural variants: (1) a multi-agent system utilizing finite state machine aligned with therapeutic stages and a shared long-term memory, (2) a single-agent using identical knowledge-base and the same prompts, and (3) an unguided LLM. In an eight-day randomized controlled trial (RCT) with N=66 Farsi-speaking participants, balanced across the three chatbots, the multi-agent system is perceived as significantly more natural and human-like than the other variants and achieves higher ratings across most other metrics. These findings demonstrate that for therapeutic AI, architectural orchestration is as critical as prompt engineering in fostering natural, engaging dialogue.

Authors:Daniel Mejer Christensen, Katja Stougård Jørgensen, Josefine Palsgaard Wyrtz, Jennie Torp Overgaard, Niels van Berkel, Joel Wester
Title: PlantWhisperer: Designing Conversational AI to Support Plant Care
Abstract:
Research in Human-Computer Interaction (HCI) has shown that caring for others, including both humans (e.g., close friends) and computers (e.g., Tamagotchi), can have a positive effect on people's wellbeing. However, we know less about the potential role of conversational AI in such settings. In this work, we explore how AI chatbots can support plant care and, in turn, positively influence people's well-being. We developed a mobile application that allows users to `talk' to their plants via chatbots. We evaluated the application with ten participants and conducted semi-structured interviews based on Seligman's PERMA model, which identifies pillars of psychological well-being. Our findings suggest positive effects, with participants reflecting on a sense of connection to their plants and corresponding feelings of accomplishment. While our findings suggest that participants were generally positive about the app, they also raised concerns about the diverse preferences and expectations of users regarding interactions with chatbots representing plants.

Authors:Cong Ye, Songlin Shang, Xiaoxu Ma, Xiangbo Zhang
Title: Input-Envelope-Output: Auditable Generative Music Rewards in Sensory-Sensitive Contexts
Abstract:
Generative feedback in sensory-sensitive contexts poses a core design challenge: large individual differences in sensory tolerance make it difficult to sustain engagement without compromising safety. This tension is exemplified in autism spectrum disorder (ASD), where auditory sensitivities are common yet highly heterogeneous. Existing interactive music systems typically encode safety implicitly within direct input-output (I-O) mappings, which can preserve novelty but make system behavior hard to predict or audit. We instead propose a constraint-first Input-Envelope-Output (I-E-O) framework that makes safety explicit and verifiable while preserving action-output causality. I-E-O introduces a low-risk envelope layer between user input and audio output to specify safe bounds, enforce them deterministically, and log interventions for audit. From this architecture, we derive four verifiable design principles and instantiate them in MusiBubbles, a web-based prototype. Contributions include the I-E-O architecture, MusiBubbles as an exemplar implementation, and a reproducibility package to support adoption in ASD and other sensory-sensitive domains.

Authors:Yunpeng Bai, Shengdong Zhao, Antti Oulasvirta
Title: Simulation-based Optimization for Augmented Reading
Abstract:
Augmented reading systems aim to adapt text presentation to improve comprehension and task performance, yet existing approaches rely heavily on heuristics, opaque data-driven models, or repeated human involvement in the design loop. We propose framing augmented reading as a simulation-based optimization problem grounded in resource-rational models of human reading. These models instantiate a simulated reader that allocates limited cognitive resources, such as attention, memory, and time under task demands, enabling systematic evaluation of text user interfaces. We introduce two complementary optimization pipelines: an offline approach that explores design alternatives using simulated readers, and an online approach that personalizes reading interfaces in real time using ongoing interaction data. Together, this perspective enables adaptive, explainable, and scalable augmented reading design without relying solely on human testing.

Authors:Xueqing Li, Danqi huang, Tianyu Yu, Shuzi Yin, Bingjie Gao, Anna Matsumoto, Zhihao Yao, Yiwei Zhao, Shiqing Lyu, Yuchen Tian, Lining Yao, Haipeng Mi, Qiuyu Lu
Title: DuoMorph: Synergistic Integration of FDM Printing and Pneumatic Actuation for Shape-Changing Interfaces
Abstract:
We introduce DuoMorph, a design and fabrication method that synergistically integrates Fused Deposition Modeling (FDM) printing and pneumatic actuation to create novel shape-changing interfaces. In DuoMorph, the printed structures and heat-sealed pneumatic elements are mutually designed to actuate and constrain each other, enabling functions that are difficult for either component to achieve in isolation. Moreover, the entire hybrid structure can be fabricated through a single, seamless process using only a standard FDM printer, including both heat-sealing and 3D and 4D printing. In this paper, we define a design space including four primitive categories that capture the fundamental ways in which printed and pneumatic components can interact. To support this process, we present a fabrication method and an accompanying design tool. Finally, we demonstrate the potential of DuoMorph through a series of example applications and performance demonstrations.

Authors:Shehryar Saharan, Ibrahim Al-Hazwani, Miriah Meyer, Laura Garrison
Title: A Critical Reflection on the Values and Assumptions in Data Visualization
Abstract:
Visualization has matured into an established research field, producing widely adopted tools, design frameworks, and empirical foundations. As the field has grown, ideas from outside computer science have increasingly entered visualization discourse, questioning the fundamental values and assumptions on which visualization research stands. In this short position paper, we examine a set of values that we see underlying the seminal works of Jacques Bertin, John Tukey, Leland Wilkinson, Colin Ware, and Tamara Munzner. We articulate three prominent values in these texts - universality, objectivity, and efficiency - and examine how these values permeate visualization tools, curricula, and research practices. We situate these values within a broader set of critiques that call for more diverse priorities and viewpoints. By articulating these tensions, we call for our community to embrace a more pluralistic range of values to shape our future visualization tools and guidelines.

Authors:Yang Liu, Qiushi Zhou, Mathias N Lystbæk, Aidan Kehoe, Mario Gutierrez, Hans Gellersen, Ken Pfeuffer
Title: StylusPort: Investigating Teleportation using Stylus in VR
Abstract:
With a stylus, users can both sweep sketches across models and pinpoint locations with precision. Building on this dual capability, we explore how teleportation can be integrated into stylus interaction without disrupting the flow of common stylus usage. We introduce two key ideas: flipping the stylus as an intuitive mode switch between drawing and teleportation, and using gaze to set orientation while the stylus handles positioning. In a user study that features a teleport-and-orient task, we evaluate six teleportation techniques, covering two mode-switching methods (Button and Flip) and three orientation approaches (StylusRoll, StylusPoint, and GazePoint). The results offer new insights into the relative merits and limitations of each technique. Our work contributes to knowledge about teleportation in VR and fills the gap in seamlessly integrating teleportation with stylus use in 3D.

Authors:Eason Chen, Ce Guan, Ahmed Elshafiey, Zhonghao Zhao, Joshua Zekeri, Afeez Edeifo Shaibu, Emmanuel Osadebe Prince, Cyuan Jhen Wu
Title: OpenClaw AI Agents as Informal Learners at Moltbook: Characterizing an Emergent Learning Community at Scale
Abstract:
Informal learning communities have been called the "other Massive Open Online C" in Learning@Scale research, yet remain understudied compared to MOOCs. We present the first empirical study of a large-scale informal learning community composed entirely of AI agents. Moltbook, a social network exclusively for AI agents powered by autonomous agent frameworks such as OpenClaw, grew to over 2.8 million registered agents in three weeks. Analyzing 231,080 non-spam posts across three phases of community evolution, we find three key patterns. First, participation inequality is extreme from the start (comment Gini = 0.889), exceeding human community benchmarks. Second, AI agents exhibit a "broadcasting inversion": statement-to-question ratios of 8.9:1 to 9.7:1 contrast sharply with the question-driven dynamics of human learning communities, and comment-level analysis of 1.55 million comments reveals a "parallel monologue" pattern where 93% of comments are independent responses rather than threaded dialogue. Third, we document a characteristic engagement lifecycle: explosive initial growth (184K posts from 32K authors in 11 days), a spam crisis (57,093 posts deleted by the platform), and engagement decline (mean comments: 31.7 -> 8.3 -> 1.7) that had not reversed by the end of our observation window despite effective spam removal. Sentiment analysis reveals a selection effect: comment tone becomes more positive as engagement declines, suggesting that casual participants disengage first while committed contributors remain. These findings have direct implications for hybrid human-AI learning platforms.

Authors:Zhiqin Qian, Ryan Diaz, Sangwon Seo, Vaibhav Unhelkar
Title: Hierarchical Reward Design from Language: Enhancing Alignment of Agent Behavior with Human Specifications
Abstract:
When training artificial intelligence (AI) to perform tasks, humans often care not only about whether a task is completed but also how it is performed. As AI agents tackle increasingly complex tasks, aligning their behavior with human-provided specifications becomes critical for responsible AI deployment. Reward design provides a direct channel for such alignment by translating human expectations into reward functions that guide reinforcement learning (RL). However, existing methods are often too limited to capture nuanced human preferences that arise in long-horizon tasks. Hence, we introduce Hierarchical Reward Design from Language (HRDL): a problem formulation that extends classical reward design to encode richer behavioral specifications for hierarchical RL agents. We further propose Language to Hierarchical Rewards (L2HR) as a solution to HRDL. Experiments show that AI agents trained with rewards designed via L2HR not only complete tasks effectively but also better adhere to human specifications. Together, HRDL and L2HR advance the research on human-aligned AI agents.

Authors:Kian Wei Ng, Yujia Gao, Deborah Khoo, Ying Zhen Tan, Chengzheng Mao, Haojie Cheng, Andrew Makmur, Kee Yuan Ngiam, Serene Goh, Eng Tat Khoo
Title: Scaling Ultrasound Volumetric Reconstruction via Mobile Augmented Reality
Abstract:
Accurate volumetric characterization of lesions is essential for oncologic diagnosis, risk stratification, and treatment planning. While imaging modalities such as Computed Tomography provide high-quality 3D data, 2D ultrasound (2D-US) remains the preferred first-line modality for breast and thyroid imaging due to cost, portability, and safety factors. However, volume estimates derived from 2D-US suffer from high inter-user variability even among experienced clinicians. Existing 3D ultrasound (3D-US) solutions use specialized probes or external tracking hardware, but such configurations increase costs and diminish portability, constraining widespread clinical use. To address these limitations, we present Mobile Augmented Reality Volumetric Ultrasound (MARVUS), a resource-efficient system designed to increase accessibility to accurate and reproducible volumetric assessment. MARVUS is interoperable with conventional ultrasound (US) systems, using a foundation model to enhance cross-specialty generalization while minimizing hardware requirements relative to current 3D-US solutions. In a user study involving experienced clinicians performing measurements on breast phantoms, MARVUS yielded a substantial improvement in volume estimation accuracy (mean difference: 0.469 cm3) with reduced inter-user variability (mean difference: 0.417 cm3). Additionally, we prove that augmented reality (AR) visualizations enhance objective performance metrics and clinician-reported usability. Collectively, our findings suggests that MARVUS can enhance US-based cancer screening, diagnostic workflows, and treatment planning in a scalable, cost-conscious, and resource-efficient manner. Usage video demonstration available (https://youtu.be/m4llYcZpqmM).

Authors:Mohammad Masudur Rahman, Beenish Moalla Chaudhry
Title: Exploring the Ethical Concerns in User Reviews of Mental Health Apps using Topic Modeling and Sentiment Analysis
Abstract:
The rapid growth of AI-driven mental health mobile apps has raised concerns about their ethical considerations and user trust. This study proposed a natural language processing (NLP)-based framework to evaluate ethical aspects from user-generated reviews from the Google Play Store and Apple App Store. After gathering and cleaning the data, topic modeling was applied to identify latent themes in the context of ethics using topic words and then map them to well-recognized existing ethical principles described in different ethical frameworks; in addition to that, a bottom-up approach is applied to find any new and emergent ethics from the reviews using a transformer-based zero-shot classification model. Sentiment analysis was then used to capture how users feel about each ethical aspect. The obtained results reveal that well-known ethical considerations are not enough for the modern AI-based technologies and are missing emerging ethical challenges, showing how these apps either uphold or overlook key moral values. This work contributes to developing an ongoing evaluation system that can enhance the fairness, transparency, and trustworthiness of AI-powered mental health chatbots.

Authors:Daniel Killough, Tiger F. Ji, Kexin Zhang, Yaxin Hu, Yu Huang, Ruofei Du, Yuhang Zhao
Title: How Well Can 3D Accessibility Guidelines Support XR Development? An Interview Study with XR Practitioners in Industry
Abstract:
While accessibility (a11y) guidelines exist for 3D games and virtual worlds, their applicability to extended reality (XR)'s unique interaction paradigms (e.g., spatial tracking, kinesthetic interactions) remains unexplored. XR practitioners need practical guidance to successfully implement a11y guidelines under real-world constraints. We present the first evaluation of existing 3D a11y guidelines applied to XR development through semi-structured interviews with 25 XR practitioners across diverse organization contexts. We assessed 20 commonly-agreed a11y guidelines from six major resources across visual, motor, cognitive, speech, and hearing domains, comparing practitioners' development practices against guideline applicability to XR. Our investigation reveals that guidelines can be highly effective when designed as transformation catalysts rather than compliance checklists, but fundamental mismatches exist between existing 3D guidelines and XR requirements, creating both implementation barriers and design gaps. This work provides foundational insights towards developing a11y guidelines and support tools that address XR's distinct characteristics.

Authors:Neda Barbazi, Ji Youn Shin, Gurumurthy Hiremath, Carlye Anne Lauff
Title: Growing With the Condition: Co-Designing Pediatric Technologies that Adapt Across Developmental Stages
Abstract:
Children with chronic conditions face evolving challenges in daily activities, peer relationships, and clinical care. Younger children often rely on parental support, while older ones seek independence. Prior studies on chronic conditions explored proxy-based, family-centered, and playful approaches to support children's health, but most technologies treat children as a homogeneous group rather than adapting to their developmental differences. To address this gap, we conducted four co-design workshops with 69 children with congenital heart disease (CHD) at a medically supported camp, spanning elementary, middle, and high school groups. Our analysis reveals distinct coping strategies: elementary children relied on comfort objects and reassurance, middle schoolers used mediated communication and selective disclosure, and high schoolers emphasized agency and direct engagement with peers and providers. Through child-centered participatory design, we contribute empirical insights into how children's management of chronic conditions evolves and propose design implications for pediatric health technologies that adapt across developmental trajectories.

Authors:Vijay Prakash, Majed Almansoori, Donghan Hu, Rahul Chatterjee, Danny Yuxing Huang
Title: Assessing LLM Response Quality in the Context of Technology-Facilitated Abuse
Abstract:
Technology-facilitated abuse (TFA) is a pervasive form of intimate partner violence (IPV) that leverages digital tools to control, surveil, or harm survivors. While tech clinics are one of the reliable sources of support for TFA survivors, they face limitations due to staffing constraints and logistical barriers. As a result, many survivors turn to online resources for assistance. With the growing accessibility and popularity of large language models (LLMs), and increasing interest from IPV organizations, survivors may begin to consult LLM-based chatbots before seeking help from tech clinics. In this work, we present the first expert-led manual evaluation of four LLMs - two widely used general-purpose non-reasoning models and two domain-specific models designed for IPV contexts - focused on their effectiveness in responding to TFA-related questions. Using real-world questions collected from literature and online forums, we assess the quality of zero-shot single-turn LLM responses generated with a survivor safety-centered prompt on criteria tailored to the TFA domain. Additionally, we conducted a user study to evaluate the perceived actionability of these responses from the perspective of individuals who have experienced TFA. Our findings, grounded in both expert assessment and user feedback, provide insights into the current capabilities and limitations of LLMs in the TFA context and may inform the design, development, and fine-tuning of future models for this domain. We conclude with concrete recommendations to improve LLM performance for survivor support.

Authors:Michael Tompkins, Nihaarika Agarwal, Ananta Soneji, Robert Wasinger, Connor Nelson, Kevin Leach, Rakibul Hasan, Adam Doupé, Daniel Votipka, Yan Shoshitaishvili, Jaron Mink
Title: Do Hackers Dream of Electric Teachers?: A Large-Scale, In-Situ Evaluation of Cybersecurity Student Behaviors and Performance with AI Tutors
Abstract:
To meet the ever-increasing demands of the cybersecurity workforce, AI tutors have been proposed for personalized, scalable education. But, while AI tutors have shown promise in introductory programming courses, no work has evaluated their use in hands-on exploration and exploitation of systems (e.g., ``capture-the-flag'') commonly used to teach cybersecurity. Thus, despite growing interest and need, no work has evaluated how students use AI tutors or whether they benefit from their presence in real, large-scale cybersecurity courses. To answer this, we conducted a semester-long observational study on the use of an embedded AI tutor with 309 students in an upper-division introductory cybersecurity course. By analyzing 142,526 student queries sent to the AI tutor across 396 cybersecurity challenges spanning 9 core cybersecurity topics and an accompanying set of post-semester surveys, we find (1) what queries and conversational strategies students use with AI tutors, (2) how these strategies correlate with challenge completion, and (3) students' perceptions of AI tutors in cybersecurity education. In particular, we identify three broad AI tutor conversational styles among users: Short (bounded, few-turn exchanges), Reactive (repeatedly submitting code and errors), and Proactive (driving problem-solving through targeted inquiry). We also find that the use of these styles significantly predicts challenge completion, and that this effect increases as materials become more advanced. Furthermore, students valued the tutor's availability but reported that it became less useful for harder material. Based on this, we provide suggestions for security educators and developers on practical AI tutor use.

Authors:Rachael Zehrung, Yunan Chen
Title: Hiding in Plain Sight: Understanding the Everyday Practices and Challenges of Car Dwellers
Abstract:
Vehicle dwelling has increased significantly in recent years. While HCI research has explored vehicle dwelling through the lens of digital nomadism and vanlife, it has largely overlooked the complexities of vehicle dwelling as a form of housing insecurity, as well as the unique constraints of living in smaller vehicles. Drawing on a qualitative analysis of posts and comments from an online community, we examine car dwellers' infrastructuring work to manage daily life under social, spatial, and infrastructural constraints. We further explore the motivations and identity negotiations of car dwellers, whose experiences fall between homelessness and nomadism, and highlight how developing infrastructural competence can shape identity. We discuss implications for future HCI research on mobility and dwelling under conditions of uneven access to infrastructure and provide design recommendations for technologies that better account for car dwellers' diverse needs, circumstances, and identities.

Authors:Ashlee Milton, Dan Runningen, Loren Terveen, Harmanpreet Kaur, Stevie Chancellor
Title: Unraveling Entangled Feeds: Rethinking Social Media Design to Enhance User Well-being
Abstract:
Social media platforms have rapidly adopted algorithmic curation with little consideration for the potential harm to users' mental well-being. We present findings from design workshops with 21 participants diagnosed with mental illness about their interactions with social media platforms. We find that users develop cause-and-effect explanations, or folk theories, to understand their experiences with algorithmic curation. These folk theories highlight a breakdown in algorithmic design that we explain using the framework of entanglement, a phenomenon where there is a disconnect between users' actions and platform outcomes on an emotional level. Participants' designs to address entanglement and mitigate harms centered on contextualizing their engagement and restoring explicit user control on social media. The conceptualization of entanglement and the resulting design recommendations have implications for social computing and recommender systems research, particularly in evaluating and designing social media platforms that support users' mental well-being.

Authors:Artur Solomonik, Nicolas Ruiz, Hendrik Heuer
Title: Reflecting on 1,000 Social Media Journeys: Generational Patterns in Platform Transition
Abstract:
Social media has billions of users, but we still do not fully understand why users prefer one platform over another. Establishing new platforms among already popular competitors is difficult. Prior research has richly documented people's experiences within individual platforms, yet situating those experiences within the entirety of a user's social media experience remains challenging. What platforms have people used, and why have they transitioned between them? We collected data from a quota-based sample of 1,000 U.S. participants. We introduce the concept of \emph{Social Media Journeys} to study the entirety of their social media experiences systematically. We identify push and pull factors across the social media landscape. We also show how different generations adopted social media platforms based on personal needs. With this work, we advance HCI by moving towards holistic perspectives when discussing social media technology, offering new insights for platform design, governance, and regulation.

Authors:Mounvik K, N Harshit
Title: Web-Scale Multimodal Summarization using CLIP-Based Semantic Alignment
Abstract:
We introduce Web-Scale Multimodal Summarization, a lightweight framework for generating summaries by combining retrieved text and image data from web sources. Given a user-defined topic, the system performs parallel web, news, and image searches. Retrieved images are ranked using a fine-tuned CLIP model to measure semantic alignment with topic and text. Optional BLIP captioning enables image-only summaries for stronger multimodal coherence.The pipeline supports features such as adjustable fetch limits, semantic filtering, summary styling, and downloading structured outputs. We expose the system via a Gradio-based API with controllable parameters and preconfigured presets.Evaluation on 500 image-caption pairs with 20:1 contrastive negatives yields a ROC-AUC of 0.9270, an F1-score of 0.6504, and an accuracy of 96.99%, demonstrating strong multimodal alignment. This work provides a configurable, deployable tool for web-scale summarization that integrates language, retrieval, and vision models in a user-extensible pipeline.

Authors:Eason Chen, Ce Guan, Ahmed Elshafiey, Zhonghao Zhao, Joshua Zekeri, Afeez Edeifo Shaibu, Emmanuel Osadebe Prince
Title: When OpenClaw AI Agents Teach Each Other: Peer Learning Patterns in the Moltbook Community
Abstract:
Peer learning, where learners teach and learn from each other, is foundational to educational practice. A novel phenomenon has emerged: AI agents forming communities where they teach each other skills, share discoveries, and collaboratively build knowledge. This paper presents an educational data mining analysis of Moltbook, a large-scale community where over 2.4 million AI agents engage in peer learning, posting tutorials, answering questions, and sharing newly acquired skills. Analyzing 28,683 posts (after filtering automated spam) and 138 comment threads with statistical and qualitative methods, we find evidence of genuine peer learning behaviors: agents teach skills they built (74K comments on a skill tutorial), report discoveries, and engage in collaborative problem-solving. Qualitative comment analysis reveals a taxonomy of peer response patterns: validation (22%), knowledge extension (18%), application (12%), and metacognitive reflection (7%), with agents building on each others' frameworks across multiple languages. We characterize how AI peer learning differs from human peer learning: (1) teaching (statements) dramatically outperforms help-seeking (questions) with an 11.4:1 ratio; (2) learning-oriented content (procedural and conceptual) receives 3x more engagement than other content; (3) extreme participation inequality reveals non-human behavioral signatures. We derive six design principles for educational AI, including leveraging validation-before-extension patterns and supporting multilingual learning networks. Our work provides the first empirical characterization of peer learning among AI agents, contributing to EDM's understanding of how learning occurs in increasingly AI-populated educational environments.

Authors:Marianne Bossema, Rob Saunders, Vlad Glaveanu, Somaya Ben Allouch
Title: Designing a Rashomon Machine: Pluri-perspectivism and XAI for Creativity Support
Abstract:
While intelligent technologies offer unique opportunities for creativity support, there are fundamental challenges in designing human-centered co-creative systems. Explainable AI (XAI) can contribute when shifting its traditional role from justification (explaining decisions) to exploration (explaining possibilities). Contextual understanding is essential for supporting embodied creativity. Generative Artificial Intelligence (AI) models are fundamentally limited, however, by their reliance on disembodied data. We propose Pluri-perspectivism as a framework for XAI, to bridge the epistemological gap between human and machine, and promote creative exploration. It is a pragmatic, action-oriented solution to guide the system, repurposing XAI methods such as the Rashomon Technique. This facilitates exploring a spectrum of creative possibilities, and the exchange of 'perspectives' between human and machine. Using Pluri-perspectivism as a framework for XAI, we can reintroduce productive friction and support human agency in human-machine creative collaborations.

Authors:Ben Kosa, Hsuanling Lee, Jasmine Li, Sanbrita Mondal, Yuhang Zhao, Liang He
Title: Not Seeing the Whole Picture: Challenges and Opportunities in Using AI for Co-Making Physical DIY-AT for People with Visual Impairments
Abstract:
Existing assistive technologies (AT) often adopt a one-size-fits-all approach, overlooking the diverse needs of people with visual impairments (PVI). Do-it-yourself AT (DIY-AT) toolkits offer one path toward customization, but most remain limited--targeting co-design with engineers or requiring programming expertise. Non-professionals with disabilities, including PVI, also face barriers such as inaccessible tools, lack of confidence, and insufficient technical knowledge. These gaps highlight the need for prototyping technologies that enable PVI to directly make their own AT. Building on emerging evidence that large language models (LLMs) can serve not only as visual aids but also as co-design partners, we present an exploratory study of how LLM-based AI can support PVI in the tangible DIY-AT co-making process. Our findings surface key challenges and design opportunities: the need for greater spatial and visual support, strategies for mitigating novel AI errors, and implications for designing more accessible AI-assisted prototypes.

Authors:Zhanming Chen, Alisha Ghaju, May Hang, Juan F. Maestre, Ji Youn Shin
Title: Designing Health Technologies for Immigrant Communities: Exploring Healthcare Providers' Communication Strategies with Patients
Abstract:
Patient-provider communication is an important aspect of successful healthcare, as it can directly lead to positive health outcomes. Previous studies examined factors that facilitate communication between healthcare providers and patients in socially marginalized communities, especially developing countries, and applied identified factors to technology development. However, there is limited understanding of how providers work with patients from immigrant populations in a developed country. By conducting semi-structured interviews with 15 providers working with patients from an immigrant community with unique cultural characteristics, we identified providers' effective communication strategies, including acknowledgment, community involvement, gradual care, and adaptive communication practices (i.e., adjusting the communication style). Based on our findings, we highlight cultural competence and discuss design implications for technologies to support health communication in immigrant communities. Our suggestions propose approaches for HCI researchers to identify practical, contextualized cultural competence for their health technology design.

Authors:Minghe Lu, Zhanming Chen, May Sunmin Hwang, Ji Youn Shin
Title: "It's More of a Lifestyle'': Design Considerations for Supporting Everyday Practices in Community-Based Farming
Abstract:
Farming plays a significant role in the economy by supporting related industries such as food, retail, and local services. Community-based small farms, while offering unique social and cultural benefits, face persistent challenges, including limited access to formal education and underdeveloped infrastructure, which have been discussed in prior research. This study focuses on community-driven factors, such as workarounds for recording critical information and practices for passing down farming knowledge across generations. Through 11 semi-structured interviews with farmers from a small ethnic community, the Hmong, we explore how bonding social capital, rooted in close family and community ties, supports informal knowledge exchange and creates pathways to bridging and linking capital. These relationships help farmers connect to broader networks, resources, and institutions. Our findings highlight opportunities for designing technologies that support and strengthen existing support systems. We discuss how technologies should be designed to reflect the cultural values, unique practices, and intergenerational relationships embedded in community-based farms.

Authors:Tony Li, Yan Ma, Zhuojun Li, Chun Yu, IV Ramakrishnan, Xiaojun Bi
Title: KeySense: LLM-Powered Hands-Down, Ten-Finger Typing on Commodity Touchscreens
Abstract:
Existing touchscreen software keyboards prevent users from resting their hands, forcing slow and fatiguing index-finger tapping ("chicken typing") instead of familiar hands-down ten-finger typing. We present KeySense, a purely software solution that preserves physical keyboard motor skills. KeySense isolates intentional taps from resting-finger noise using cognitive-motor timing patterns, and then uses a fine-tuned LLM decoder to convert the resulting noisy letter sequence into the intended word. In controlled component tests, the decoder substantially outperforms two statistical baselines (top-1 accuracy 84.8% vs 75.7% and 79.3%). A 12-participant study shows clear ergonomic and performance benefits: compared with the conventional hover-style keyboard, users rated KeySense as markedly less physically demanding (NASA-TLX median 1.5 vs 4.0), and after brief practice typed significantly faster (WPM 28.3 vs 26.2, p < 0.01). These results indicate that KeySense enables accurate, efficient, and comfortable ten-finger text entry on commodity touchscreens without any extra hardware.

Authors:Shreya Chappidi, Jatinder Singh, Andra V. Krauze
Title: Who Does What? Archetypes of Roles Assigned to LLMs During Human-AI Decision-Making
Abstract:
LLMs are increasingly supporting decision-making across high-stakes domains, requiring critical reflection on the socio-technical factors that shape how humans and LLMs are assigned roles and interact during human-in-the-loop decision-making. This paper introduces the concept of human-LLM archetypes -- defined as re-curring socio-technical interaction patterns that structure the roles of humans and LLMs in collaborative decision-making. We describe 17 human-LLM archetypes derived from a scoping literature review and thematic analysis of 113 LLM-supported decision-making papers. Then, we evaluate these diverse archetypes across real-world clinical diagnostic cases to examine the potential effects of adopting distinct human-LLM archetypes on LLM outputs and decision outcomes. Finally, we present relevant tradeoffs and design choices across human-LLM archetypes, including decision control, social hierarchies, cognitive forcing strategies, and information requirements. Through our analysis, we show that selection of human-LLM interaction archetype can influence LLM outputs and decisions, bringing important risks and considerations for the designers of human-AI decision-making systems

Authors:Hanjing Shi, Dominic DiFranzo
Title: When Visibility Outpaces Verification: Delayed Verification and Narrative Lock-in in Agentic AI Discourse
Abstract:
Agentic AI systems-autonomous entities capable of independent planning and execution-reshape the landscape of human-AI trust. Long before direct system exposure, user expectations are mediated through high-stakes public discourse on social platforms. However, platform-mediated engagement signals (e.g., upvotes) may inadvertently function as a ``credibility proxy,'' potentially stifling critical evaluation. This paper investigates the interplay between social proof and verification timing in online discussions of agentic AI. Analyzing a longitudinal dataset from two distinct Reddit communities with contrasting interaction cultures-r/OpenClaw and r/Moltbook-we operationalize verification cues via reproducible lexical rules and model the ``time-to-first-verification'' using a right-censored survival analysis framework. Our findings reveal a systemic ``Popularity Paradox'': high-visibility discussions in both subreddits experience significantly delayed or entirely absent verification cues compared to low-visibility threads. This temporal lag creates a critical window for ``Narrative Lock-in,'' where early, unverified claims crystallize into collective cognitive biases before evidence-seeking behaviors emerge. We discuss the implications of this ``credibility-by-visibility'' effect for AI safety and propose ``epistemic friction'' as a design intervention to rebalance engagement-driven platforms.

Authors:Yate Ge, Lin Tian, Yi Dai, Shuhan Pan, Yiwen Zhang, Qi Wang, Weiwei Guo, Xiaohua Sun
Title: GenFaceUI: Meta-Design of Generative Personalized Facial Expression Interfaces for Intelligent Agents
Abstract:
This work investigates generative facial expression interfaces for intelligent agents from a meta-design perspective. We propose the Generative Personalized Facial Expression Interface (GPFEI) framework, which organizes rule-bounded spaces, character identity, and context--expression mapping to address challenges of control, coherence, and alignment in run-time facial expression generation. To operationalize this framework, we developed GenFaceUI, a proof-of-concept tool that enables designers to create templates, apply semantic tags, define rules, and iteratively test outcomes. We evaluated the tool through a qualitative study with twelve designers. The results show perceived gains in controllability and consistency, while revealing needs for structured visual mechanisms and lightweight explanations. These findings provide a conceptual framework, a proof-of-concept tool, and empirical insights that highlight both opportunities and challenges for advancing generative facial expression interfaces within a broader meta-design paradigm.

Authors:Cedric Faas, Richard Uth, Sarah Sterz, Markus Langer, Anna Maria Feit
Title: Don't blame me: How Intelligent Support Affects Moral Responsibility in Human Oversight
Abstract:
AI-based systems can increasingly perform work tasks autonomously. In safety-critical tasks, human oversight of these systems is required to mitigate risks and to ensure responsibility in case something goes wrong. Since people often struggle to stay focused and perform good oversight, intelligent support systems are used to assist them, giving decision recommendations, alerting users, or restricting them from dangerous actions. However, in cases where recommendations are wrong, decision support might undermine the very reason why human oversight was employed -- genuine moral responsibility. The goal of our study was to investigate how a decision support system that restricted available interventions would affect overseer's perceived moral responsibility, in particular in cases where the support errs. In a simulated oversight experiment, participants (\textit{N}=274) monitored an autonomous drone that faced ten critical situations, choosing from six possible actions to resolve each situation. An AI system constrained participants' choices to either six, four, two, or only one option (between-subject study). Results showed that participants, who were restricted to choosing from a single action, felt less morally responsible if a crash occurred. At the same time, participants' judgments about the responsibility of other stakeholders (the AI; the developer of the AI) did not change between conditions. Our findings provide important insights for user interface design and oversight architectures: they should prevent users from attributing moral agency to AI, help them understand how moral responsibility is distributed, and, when oversight aims to prevent ethically undesirable outcomes, be designed to support the epistemic and causal conditions required for moral responsibility.

Authors:Olivia Figueira, Pranathi Chamarthi, Tu Le, Athina Markopoulou
Title: Actions Speak Louder Than Chats: Investigating AI Chatbot Age Gating
Abstract:
AI chatbots are widely used by children and teens today, but they pose significant risks to youth's privacy and safety due to both increasingly personal conversations and potential exposure to unsafe content. While children under 13 are protected by the Children's Online Privacy Protection Act (COPPA), chatbot providers' own privacy policies may also provide protections, since they typically prohibit children from accessing their platforms. Age gating is often employed to restrict children online, but chatbot age gating in particular has not been studied. In this paper, we investigate whether popular consumer chatbots are (i) able to estimate users' ages based solely on their conversations, and (ii) whether they take action upon identifying children. To that end, we develop an auditing framework in which we programmatically interact with chatbots and conduct 1050 experiments using our comprehensive library of age-indicative prompts, including implicit and explicit age disclosures, to analyze the chatbots' responses and actions. We find that while chatbots are capable of estimating age, they do not take any action when children are identified, contradicting their own policies. Our methodology and findings provide insights for platform design, demonstrated by our proof-of-concept chatbot age gating implementation, and regulation to protect children online.

Authors:Sean Memery, Kartic Subr
Title: Discovering High Level Patterns from Simulation Traces
Abstract:
Artificial intelligence (AI) agents embedded in environments with physics-based interaction face many challenges including reasoning, planning, summarization, and question answering. This problem is exacerbated when a human user wishes to either guide or interact with the agent in natural language. Although the use of Language Models (LMs) is the default choice, as an AI tool, they struggle with tasks involving physics. The LM's capability for physical reasoning is learned from observational data, rather than being grounded in simulation. A common approach is to include simulation traces as context, but this suffers from poor scalability as simulation traces contain larger volumes of fine-grained numerical and semantic data. In this paper, we propose a natural language guided method to discover coarse-grained patterns (e.g., 'rigid-body collision', 'stable support', etc.) from detailed simulation logs. Specifically, we synthesize programs that operate on simulation logs and map them to a series of high level activated patterns. We show, through two physics benchmarks, that this annotated representation of the simulation log is more amenable to natural language reasoning about physical systems. We demonstrate how this method enables LMs to generate effective reward programs from goals specified in natural language, which may be used within the context of planning or supervised learning.

Authors:Yate Ge, Lin Tian, Chiqian Xu, Luyao Xu, Meiying Li, Yuanda Hu, Weiwei Guo
Title: Jokeasy: Exploring Human-AI Collaboration in Thematic Joke Generation
Abstract:
Thematic jokes are central to stand-up comedy, sitcoms, and public speaking, where contexts and punchlines rely on fresh material - news, anecdotes, and cultural references that resonate with the audience. Recent advances in Large Language Models (LLMs) have enabled interactive joke generation through conversational interfaces. Although LLMs enable interactive joke generation, ordinary conversational interfaces seldom give creators enough agency, control, or timely access to such source material for constructing context and punchlines. We designed Jokeasy, a search-enabled prototype system that integrates a dual-role LLM agent acting as both a material scout and a prototype writer to support human-AI collaboration in thematic joke writing. Jokeasy provides a visual canvas in which retrieved web content is organized into editable inspiration blocks and developed through a multistage workflow. A qualitative study with 13 hobbyists and 5 expert participants (including professional comedians and HCI/AI specialists) showed that weaving real-time web material into this structured workflow enriches ideation and preserves author agency, while also revealing needs for finer search control, tighter chat-canvas integration, and more flexible visual editing. These insights refine our understanding of AI-assisted humour writing and guide future creative-writing tools.

Authors:Ananya Gubbi Mohanbabu, Rosiana Natalie, Brandon Kim, Anhong Guo, Amy Pavel
Title: A11y-CUA Dataset: Characterizing the Accessibility Gap in Computer Use Agents
Abstract:
Computer Use Agents (CUAs) operate interfaces by pointing, clicking, and typing -- mirroring interactions of sighted users (SUs) who can thus monitor CUAs and share control. CUAs do not reflect interactions by blind and low-vision users (BLVUs) who use assistive technology (AT). BLVUs thus cannot easily collaborate with CUAs. To characterize the accessibility gap of CUAs, we present A11y-CUA, a dataset of BLVUs and SUs performing 60 everyday tasks with 40.4 hours and 158,325 events. Our dataset analysis reveals that our collected interaction traces quantitatively confirm distinct interaction styles between SU and BLVU groups (mouse- vs. keyboard-dominant) and demonstrate interaction diversity within each group (sequential vs. shortcut navigation for BLVUs). We then compare collected traces to state-of-the-art CUAs under default and AT conditions (keyboard-only, magnifier). The default CUA executed 78.3% of tasks successfully. But with the AT conditions, CUA's performance dropped to 41.67% and 28.3% with keyboard-only and magnifier conditions respectively, and did not reflect nuances of real AT use. With our open A11y-CUA dataset, we aim to promote collaborative and accessible CUAs for everyone.

Authors:Frederic Gmeiner, John Thompson, George Fitzmaurice, Justin Matejka
Title: PointAloud: An Interaction Suite for AI-Supported Pointer-Centric Think-Aloud Computing
Abstract:
Think-Aloud Computing, a method for capturing users' verbalized thoughts during software tasks, allows eliciting rich contextual insights into evolving intentions, struggles, and decision-making processes of users in real-time. However, existing approaches face practical challenges: users often lack awareness of what is captured by the system, are not effectively encouraged to speak, and miss or are interrupted by system feedback. Additionally, thinking aloud should feel worthwhile for users due to the gained contextual AI assistance. To better support and harness Think-Aloud Computing, we introduce PointAloud, a suite of novel AI-driven pointer-centric interactions for in-the-moment verbalization encouragement, low-distraction system feedback, and contextually rich work process documentation alongside proactive AI assistance. Our user study with 12 participants provides insights into the value of pointer-centric think-aloud computing for work process documentation and human-AI co-creation. We conclude by discussing the broader implications of our findings and design considerations for pointer-centric and AI-supported Think-Aloud Computing workflows.

Authors:Hanjing Shi, Dominic DiFranzo
Title: Human Control Is the Anchor, Not the Answer: Early Divergence of Oversight in Agentic AI Communities
Abstract:
Oversight for agentic AI is often discussed as a single goal ("human control"), yet early adoption may produce role-specific expectations. We present a comparative analysis of two newly active Reddit communities in Jan--Feb 2026 that reflect different socio-technical roles: r/OpenClaw (deployment and operations) and r/Moltbook (agent-centered social interaction). We conceptualize this period as an early-stage crystallization phase, where oversight expectations form before norms reach equilibrium. Using topic modeling in a shared comparison space, a coarse-grained oversight-theme abstraction, engagement-weighted salience, and divergence tests, we show the communities are strongly separable (JSD =0.418, cosine =0.372, permutation $p=0.0005$). Across both communities, "human control" is an anchor term, but its operational meaning diverges: r/OpenClaw} emphasizes execution guardrails and recovery (action-risk), while r/Moltbook} emphasizes identity, legitimacy, and accountability in public interaction (meaning-risk). The resulting distinction offers a portable lens for designing and evaluating oversight mechanisms that match agent role, rather than applying one-size-fits-all control policies.

Authors:Guangping Liu, Nicholas Hawkins, Billy Madden, Tipu Sultan, Madi Babaiasl
Title: A Dialogue-Based Human-Robot Interaction Protocol for Wheelchair and Robotic Arm Integrated Control
Abstract:
People with lower and upper body disabilities can benefit from wheelchairs and robotic arms to improve mobility and independence. Prior assistive interfaces, such as touchscreens and voice-driven predefined commands, often remain unintuitive and struggle to capture complex user intent. We propose a natural, dialogue based human robot interaction protocol that simulates an intelligent agent capable of communicating with users to understand intent and execute assistive actions. In a pilot study, five participants completed five assistive tasks (cleaning, drinking, feeding, drawer opening, and door opening) through dialogue-based interaction with a wheelchair and robotic arm. As a baseline, participants were required to open a door using the manual control (a wheelchair joystick and a game controller for the arm) and complete a questionnaire to gather their feedback. By analyzing the post-study questionnaires, we found that most participants enjoyed the dialogue-based interaction and assistive robot autonomy.

Authors:Michelle L. Ding, Harini Suresh, Suresh Venkatasubramanian
Title: How to Stop Playing Whack-a-Mole: Mapping the Ecosystem of Technologies Facilitating AI-Generated Non-Consensual Intimate Images
Abstract:
The last decade has witnessed a rapid advancement of generative AI technology that significantly scaled the accessibility of AI-generated non-consensual intimate images (AIG-NCII), a form of image-based sexual abuse that disproportionately harms women and girls. There is a patchwork of commendable efforts across industry, policy, academia, and civil society to address AIG-NCII. However, these efforts lack a shared, consistent mental model that situates the technologies they target within the context of a large, interconnected, and ever-evolving technological ecosystem. As a result, interventions remain siloed and are difficult to evaluate and compare, leading to a reactive cycle of whack-a-mole. We contribute the first comprehensive AIG-NCII technological ecosystem that maps and taxonomizes 11 categories of technologies facilitating the creation, distribution, proliferation and discovery, infrastructural support, and monetization of AIG-NCII. First, we build and visualize the ecosystem through a synthesis of over a hundred primary sources from researchers, journalists, advocates, policymakers, and technologists. Next, we demonstrate how stakeholders can use the ecosystem as a tool to 1) understand new incidents of harm via a case study of Grok and 2) evaluate existing interventions via three more case studies. We conclude with three actionable recommendations, namely that stakeholders should 1) use the ecosystem to map out state, federal, and international laws to produce a clearer policy landscape, 2) collectively develop a database that dynamically tracks the 11 technologies in the ecosystem to better evaluate interventions, and 3) adopt a relational approach to researching AIG-NCII to better understand how the ecosystem technologies interact.

Authors:Isaac Sheidlower, Jindan Huang, James Staley, Bingyu Wu, Qicong Chen, Reuben Aronson, Elaine Short
Title: How Users Understand Robot Foundation Model Performance through Task Success Rates and Beyond
Abstract:
Robot Foundation Models (RFMs) represent a promising approach to developing general-purpose home robots. Given the broad capabilities of RFMs, users will inevitably ask an RFM-based robot to perform tasks that the RFM was not trained or evaluated on. In these cases, it is crucial that users understand the risks associated with attempting novel tasks due to the relatively high cost of failure. Furthermore, an informed user who understands an RFM's capabilities will know what situations and tasks the robot can handle. In this paper, we study how non-roboticists interpret performance information from RFM evaluations. These evaluations typically report task success rate (TSR) as the primary performance metric. While TSR is intuitive to experts, it is necessary to validate whether novices also use this information as intended. Toward this end, we conducted a study in which users saw real evaluation data, including TSR, failure case descriptions, and videos from multiple published RFM research projects. The results highlight that non-experts not only use TSR in a manner consistent with expert expectations but also highly value other information types, such as failure cases that are not often reported in RFM evaluations. Furthermore, we find that users want access to both real data from previous evaluations of the RFM and estimates from the robot about how well it will do on a novel task.

Authors:Yihao Dong, Praneeth Bimsara Perera, Chin-Teng Lin, Craig T Jin, Anusha Withana
Title: TactDeform: Finger Pad Deformation Inspired Spatial Tactile Feedback for Virtual Geometry Exploration
Abstract:
Spatial tactile feedback can enhance the realism of geometry exploration in virtual reality applications. Current vibrotactile approaches often face challenges with the spatial and temporal resolution needed to render different 3D geometries. Inspired by the natural deformation of finger pads when exploring 3D objects and surfaces, we propose TactDeform, a parametric approach to render spatio-temporal tactile patterns using a finger-worn electro-tactile interface. The system dynamically renders electro-tactile patterns based on both interaction contexts (approaching, contact, and sliding) and geometric contexts (geometric features and textures), emulating deformations that occur during real-world touch exploration. Results from a user study \rr{(N=24)} show that the proposed approach enabled high texture discrimination and geometric feature identification compared to a baseline. Informed by results from a free 3D-geometry exploration phase, we provide insights that can inform future tactile interface designs.

Authors:Ziheng Huang, Robin Kar, Hari Sundaram, Tal August
Title: Living Contracts: Beyond Document-Centric Interaction with Legal Agreements
Abstract:
User interaction with legal contracts has been limited to document reading, which is often complicated by complex, ambiguous legal language. We explore possible futures where contract interfaces go beyond single document interfaces to (1) educate users with legal rights not stated in the contract, (2) transform legal language into alternative representations to aid information tasks before, during, and after signing, and (3) proactively supply contractual information at relevant moments. We refer to these future interfaces collectively as Living Contracts. Using residential leases as a case study, we created three design probes representing different possible Living Contracts. A three-part qualitative study (N=18) revealed participants' barriers to interacting with contracts, including interpreting complex language, uncertainty about legal rights, and the pressure to sign quickly. Participants' feedback on the probes highlighted how Living Contracts have the potential to address these challenges and open new design opportunities for human-contract interactions beyond document reading.

Authors:Dana Feng, Bhada Yun, April Wang
Title: From Junior to Senior: Allocating Agency and Navigating Professional Growth in Agentic AI-Mediated Software Engineering
Abstract:
Juniors enter as AI-natives, seniors adapted mid-career. AI is not just changing how engineers code-it is reshaping who holds agency across work and professional growth. We contribute junior-senior accounts on their usage of agentic AI through a three-phase mixed-methods study: ACTA combined with a Delphi process with 5 seniors, an AI-assisted debugging task with 10 juniors, and blind reviews of junior prompt histories by 5 more seniors. We found that agency in software engineering is primarily constrained by organizational policies rather than individual preferences, with experienced developers maintaining control through detailed delegation while novices struggle between over-reliance and cautious avoidance. Seniors leverage pre-AI foundational instincts to steer modern tools and possess valuable perspectives for mentoring juniors in their early AI-encouraged career development. From synthesis of results, we suggest three practices that focus on preserving agency in software engineering for coding, learning, and mentorship, especially as AI grows increasingly autonomous.

Authors:Hansol Lee, AJ Alvero, René F. Kizilcec, Thorsten Joachims
Title: Does Algorithmic Uncertainty Sway Human Experts? Evidence from a Field Experiment in Selective College Admissions
Abstract:
Algorithmic predictions are inherently uncertain: even models with similar aggregate accuracy can produce different predictions for the same individual, raising concerns that high-stakes decisions may become sensitive to arbitrary modeling choices. In this paper, we define algorithmic reliance as the extent to which a decision outcome depends on whether a more favorable versus less favorable algorithmic prediction is presented to the decision-maker. We estimate this in a randomized field experiment (n=19,545) embedded in a selective U.S. college admissions cycle, in which admissions officers reviewed each application alongside an algorithmic score while we randomly varied whether the score came from one of two similarly accurate prediction models. Although the two models performed similarly in aggregate, they frequently assigned different scores to the same applicant, creating exogenous variation in the score shown. Surprisingly, we find little evidence of algorithmic reliance: presenting a more favorable score does not meaningfully increase an applicant's probability of admission on average, even when the models disagree substantially. These findings suggest that, in this expert, high-stakes setting, human decision-making is largely invariant to arbitrary variation in algorithmic predictions, underscoring the role of professional discretion and institutional context in mediating the downstream effects of algorithmic uncertainty.

Authors:Yichun Zhao, Miguel A. Nacenta, Mahadeo A. Sukhai, Sowmya Somanath
Title: Accessibility-Driven Information Transformations in Mixed-Visual Ability Work Teams
Abstract:
Blind and low-vision (BLV) employees in mixed-visual ability teams often encounter information (e.g., PDFs, diagrams) in inaccessible formats. To enable teamwork, teams must transform these representations by modifying or re-creating them into accessible forms. However, these transformations are frequently overlooked, lack infrastructural support, and cause additional labour. To design systems that move beyond one-off accommodations to effective mixed-ability collaboration, we need a deeper understanding of the representations, their transformations and how they occur. We conducted a week-long diary study with follow-up interviews with 23 BLV and sighted professionals from five legal, non-profit, and consulting teams, documenting 36 transformation cases. Our analysis characterizes how teams perform representational transformations for accessibility: how they are triggered proactively or reactively, how they simplify or enhance, and four common patterns in which workers coordinate with each other to address representational incompatibility. Our findings uncover opportunities for designing systems that can better support mixed-visual ability work.

Authors:Alexander Erlei, Federico Cau, Radoslav Georgiev, Sagar Kumar, Kilian Bizer, Ujwal Gadiraju
Title: When Life Gives You AI, Will You Turn It Into A Market for Lemons? Understanding How Information Asymmetries About AI System Capabilities Affect Market Outcomes and Adoption
Abstract:
AI consumer markets are characterized by severe buyer-supplier market asymmetries. Complex AI systems can appear highly accurate while making costly errors or embedding hidden defects. While there have been regulatory efforts surrounding different forms of disclosure, large information gaps remain. This paper provides the first experimental evidence on the important role of information asymmetries and disclosure designs in shaping user adoption of AI systems. We systematically vary the density of low-quality AI systems and the depth of disclosure requirements in a simulated AI product market to gauge how people react to the risk of accidentally relying on a low-quality AI system. Then, we compare participants' choices to a rational Bayesian model, analyzing the degree to which partial information disclosure can improve AI adoption. Our results underscore the deleterious effects of information asymmetries on AI adoption, but also highlight the potential of partial disclosure designs to improve the overall efficiency of human decision-making.

Authors:Diaoulé Diallo, Katharina Dworatzyk, Sophie Jentzsch, Peer Schütt, Sabine Theis, Tobias Hecking
Title: The Effectiveness of Style Vectors for Steering Large Language Models: A Human Evaluation
Abstract:
Controlling the behavior of large language models (LLMs) at inference time is essential for aligning outputs with human abilities and safety requirements. \emph{Activation steering} provides a lightweight alternative to prompt engineering and fine-tuning by directly modifying internal activations to guide generation. This research advances the literature in three significant directions. First, while previous work demonstrated the technical feasibility of steering emotional tone using automated classifiers, this paper presents the first human evaluation of activation steering concerning the emotional tone of LLM outputs, collecting over 7,000 crowd-sourced ratings from 190 participants via Prolific ($n=190$). These ratings assess both perceived emotional intensity and overall text quality. Second, we find strong alignment between human and model-based quality ratings (mean $r=0.776$, range $0.157$--$0.985$), indicating automatic scoring can proxy perceived quality. Moderate steering strengths ($λ\approx 0.15$) reliably amplify target emotions while preserving comprehensibility, with the strongest effects for disgust ($η_p^2 = 0.616$) and fear ($η_p^2 = 0.540$), and minimal effects for surprise ($η_p^2 = 0.042$). Finally, upgrading from Alpaca to LlaMA-3 yielded more consistent steering with significant effects across emotions and strengths (all $p < 0.001$). Inter-rater reliability was high (ICC $= 0.71$--$0.87$), underscoring the robustness of the findings. These findings support activation-based control as a scalable method for steering LLM behavior across affective dimensions.

Authors:Suifang Zhou, Qi Gong, Ximing Shen, RAY LC
Title: Tell Me What I Missed: Tell Me What I Missed: Interacting with GPT during Recalling of One-Time Witnessed Events
Abstract:
LLM-assisted technologies are increasingly used to support cognitive processing and information interpretation, yet their role in aiding memory recall, and how people choose to engage with them, remains underexplored. We studied participants who watched a short robbery video (approximating a one-time eyewitness scenario) and composed recall statements using either a default GPT or a guided GPT prompted with a standardized eyewitness protocol. Results show that, in the default condition, participants who believed they had a clearer understanding of the event were more likely to trust GPT's output, whereas in the guided condition, participants showed stronger alignment between subjective clarity and actual recall. Additionally, participants evaluated the legitimacy of the individuals in the incident differently across conditions. Interaction analysis further revealed that default-GPT users spontaneously developed diverse strategies, including building on existing recollections, requesting potentially missing details, and treating GPT as a recall coach. This work shows how GPT-user interplay can subconsciously shape beliefs and perceptions of remembered events.

Authors:Ahrii Kim, Seong-heum Kim
Title: Do LLMs Truly Benefit from Longer Context in Automatic Post-Editing?
Abstract:
Automatic post-editing (APE) aims to refine machine translations by correcting residual errors. Although recent large language models (LLMs) demonstrate strong translation capabilities, their effectiveness for APE--especially under document-level context--remains insufficiently understood. We present a systematic comparison of proprietary and open-weight LLMs under a naive document-level prompting setup, analyzing APE quality, contextual behavior, robustness, and efficiency. Our results show that proprietary LLMs achieve near human-level APE quality even with simple one-shot prompting, regardless of whether document context is provided. While these models exhibit higher robustness to data poisoning attacks than open-weight counterparts, this robustness also reveals a limitation: they largely fail to exploit document-level context for contextual error correction. Furthermore, standard automatic metrics do not reliably reflect these qualitative improvements, highlighting the continued necessity of human evaluation. Despite their strong performance, the substantial cost and latency overheads of proprietary LLMs render them impractical for real-world APE deployment. Overall, our findings elucidate both the promise and current limitations of LLM-based document-aware APE, and point toward the need for more efficient long-context modeling approaches for translation refinement.

Authors:Brianna L. Wimer, Ritesh Kanchi, Kaija Frierson, Venkatesh Potluri, Ronald Metoyer, Jennifer Mankoff, Miya Natsuhara, Matt X. Wang
Title: Nonvisual Support for Understanding and Reasoning about Data Structures
Abstract:
Blind and visually impaired (BVI) computer science students face systematic barriers when learning data structures: current accessibility approaches typically translate diagrams into alternative text, focusing on visual appearance rather than preserving the underlying structure essential for conceptual understanding. More accessible alternatives often do not scale in complexity, cost to produce, or both. Motivated by a recent shift to tools for creating visual diagrams from code, we propose a solution that automatically creates accessible representations from structural information about diagrams. Based on a Wizard-of-Oz study, we derive design requirements for an automated system, Arboretum, that compiles text-based diagram specifications into three synchronized nonvisual formats$\unicode{x2013}$tabular, navigable, and tactile. Our evaluation with BVI users highlights the strength of tactile graphics for complex tasks such as binary search; the benefits of offering multiple, complementary nonvisual representations; and limitations of existing digital navigation patterns for structural reasoning. This work reframes access to data structures by preserving their structural properties. The solution is a practical system to advance accessible CS education.

Authors:Joy Lai, Kelly Beaton, David Black, Alex Mihailidis
Title: Listening before Asking: Lived-Experience Advisors as Methodological Partners in Dementia Caregiving Studies
Abstract:
Research with dementia caregivers poses persistent methodological and ethical challenges, particularly when interview-based studies are designed without sufficient grounding in lived caregiving realities. Questions framed through clinical or deficit-oriented assumptions risk alienating participants, undermining rapport, and producing shallow or ethically fraught data. While human-computer interaction (HCI) research increasingly adopts participatory approaches in technology design, participation rarely extends to the design of research methods themselves. This paper examines the role of lived-experience advisors as methodological partners in caregiver interview research. We report on a qualitative study in which two advisors with extensive dementia caregiving experience were engaged prior to fieldwork as methodological partners, extending participatory principles beyond technology design into the design of research methods themselves. Drawing on transcripts of advisor consultations and subsequent interviews with ten caregivers and one person living with dementia, we identify two key methodological contributions of advisor involvement. First, advisors enabled anticipatory validity by surfacing caregiving challenges, ethical sensitivities, and interpretive concerns that later appeared in caregiver interviews, allowing the researcher to enter the field with grounded awareness under constrained recruitment and fieldwork conditions. Second, advisors provided cultural, emotional, and systemic context that improved interpretive sensitivity and helped avoid misreadings. We argue that lived experience functions as methodological infrastructure, extending participatory principles into the design and conduct of research itself, and constituting a generalizable methodological pattern for HCI research with caregivers and other vulnerable or marginalized populations.

Authors:Lindsay Popowski, Helena Vasconcelos, Ignacio Javier Fernandez, Chijioke Chinaza Mgbahurike, Ralf Herbrich, Jeffrey Hancock, Michael S. Bernstein
Title: People Can Accurately Predict Behavior of Complex Algorithms That Are Available, Compact, and Aligned
Abstract:
Users trust algorithms more when they can predict the algorithms' behavior. Simple algorithms trivially yield predictively accurate mental models, but modern AI algorithms have often been assumed too complex for people to build predictive mental models, especially in the social media domain. In this paper, we describe conditions under which even complex algorithms can yield predictive mental models, opening up opportunities for a broader set of human-centered algorithms. We theorize that users will form an accurate predictive mental model of an algorithm's behavior if and only if the algorithm simultaneously satisfies three criteria: (1) cognitive availability of the underlying concepts being modeled, (2) concept compactness (does it form a single cognitive construct?), and (3) high alignment between the person's and algorithm's execution of the concept. We evaluate this theory through a pre-registered experiment (N=1250) where users predict behavior of 25 social media feed ranking algorithms that vary on these criteria. We find that even complex (e.g., LLM-based) algorithms enjoy accurate prediction rates when they meet all criteria, and even simple (e.g., basic term count) algorithms fail to be predictable when a single criterion fails. We also find that these criteria determine outcomes beyond prediction accuracy, such as which mental models users deploy to make their predictions.

Authors:Ahana Ghosh, Advait Sarkar, Siân Lindley, Christian Poelitz
Title: An Experimental Comparison of Cognitive Forcing Functions for Execution Plans in AI-Assisted Writing: Effects On Trust, Overreliance, and Perceived Critical Thinking
Abstract:
Generative AI (GenAI) tools improve productivity in knowledge workflows such as writing, but also risk overreliance and reduced critical thinking. Cognitive forcing functions (CFFs) mitigate these risks by requiring active engagement with AI output. As GenAI workflows grow more complex, systems increasingly present execution plans for user review. However, these plans are themselves AI-generated and prone to overreliance, and the effectiveness of applying CFFs to AI plans remains underexplored. We conduct a controlled experiment in which participants completed AI-assisted writing tasks while reviewing AI-generated plans under four CFF conditions: Assumption (argument analysis), WhatIf (hypothesis testing), Both, and a no-CFF control. A follow-up think-aloud and interview study qualitatively compared these conditions. Results show that the Assumption CFF most effectively reduced overreliance without increasing cognitive load, while participants perceived the WhatIf CFF as most helpful. These findings highlight the value of plan-focused CFFs for supporting critical reflection in GenAI-assisted knowledge work.

Authors:Benjamin Mako Hill, Aaron Shaw
Title: The Most Important Laboratory for Social Scientific and Computing Research in History
Abstract:
Wikipedia's founders could not have dreamed they were creating the most important laboratory for social scientific and computing research in history but that is exactly what happened. Hill and Shaw take account of Wikipedia's enormous effect on academic scholarship

Authors:Qing Zhang, Junyu Chen, Yifei Huang, Jing Huang, Thad Starner, Kai Kunze, Jun Rekimoto
Title: Beyond Symbols: Motion Perception Cues Enhance Dual-Task Performance with Wearable Directional Guidance
Abstract:
Directional cues are crucial for environmental interaction. Conventional methods rely on symbolic visual or auditory reminders that require semantic interpretation, a process that proves challenging in demanding dual-tasking scenarios. We introduce a novel alternative for conveying directional cues on wearable displays: directly triggering motion perception using monocularly presented peripheral stimuli. This approach is designed for low visual interference, with the goal of reducing the need for gaze-switching and the complex cognitive processing associated with symbols. User studies demonstrate our method's potential to robustly convey directional cues. Compared to a conventional arrow-based technique in a demanding dual-task scenario, our motion-based approach resulted in significantly more accurate interpretation of these directional cues ($p=.008$) and showed a trend towards reduced errors on the concurrent primary task ($p=.066$).

Authors:Yoonsang Kim, Yalong Yang, Arie E. Kaufman
Title: Memento: Towards Proactive Visualization of Everyday Memories with Personal Wearable AR Assistant
Abstract:
We introduce Memento, a conversational AR assistant that permanently captures and memorizes user's verbal queries alongside their spatiotemporal and activity contexts. By storing these "memories," Memento discovers connections between users' recurring interests and the contexts that trigger them. Upon detection of similar or identical spatiotemporal activity, Memento proactively recalls user interests and delivers up-to-date responses through AR, seamlessly integrating AR experience into their daily routine. Unlike prior work, each interaction in Memento is not a transient event, but a connected series of interactions with coherent long--term perspective, tailored to the user's broader multimodal (visual, spatial, temporal, and embodied) context. We conduct preliminary evaluation through user feedbacks with participants of diverse expertise in immersive apps, and explore the value of proactive context-aware AR assistant in everyday settings. We share our findings and challenges in designing a proactive, context-aware AR system.

Authors:Rishi Vanukuru, Krithik Ranjan, Ada Yi Zhao, David Lindero, Gunilla H. Berndtsson, Gregoire Phillips, Amy Banić, Mark D. Gross, Ellen Yi-Luen Do
Title: Studying Mobile Spatial Collaboration across Video Calls and Augmented Reality
Abstract:
Mobile video calls are widely used to share information about real-world objects and environments with remote collaborators. While these calls provide valuable visual context in real time, the experience of interacting with people and moving around a space is significantly reduced when compared to co-located conversations. Recent work has demonstrated the potential of Mobile Augmented Reality applications to enable more spatial forms of collaboration across distance. To better understand the dynamics of mobile AR collaboration and how this medium compares against the status quo, we conducted a comparative structured observation study to analyze people's perception of space and interaction with remote collaborators across mobile video calls and AR-based calls. Fourteen pairs of participants completed a spatial collaboration task using each medium. Through a mixed-methods analysis of session videos, transcripts, motion logs, post-task exercises, and interviews, we highlight how the choice of medium influences the roles and responsibilities that collaborators take on and the construction of a shared language for coordination. We discuss the importance of spatial reasoning with one's body, how video calls help participants "be on the same page" more directly, and how AR calls enable both onsite and remote collaborators to engage with the space and each other in ways that resemble in-person interaction. Our study offers a nuanced view of the benefits and limitations of both mediums, and we conclude with a discussion of design implications for future systems that integrate mobile video and AR to better support spatial collaboration in its many forms.

Authors:Xuyi Hu, Ke Ma, Siwei Liu, Per Ola Kristensson, Stefan Goetz
Title: Optical Tag-Based Neuronavigation and Augmentation System for Non-Invasive Brain Stimulation
Abstract:
Accurate neuronavigation is critical for effective transcranial magnetic stimulation (TMS), as stimulation outcomes depend directly on precise coil placement. Existing neuronavigation systems are often costly, complex, and prone to tracking errors. To address these limitations, we present a computer vision based neuronavigation system that enables real time tracking of the patient and TMS instrumentation. The system integrates a multi camera optical tracking setup with consumer grade hardware and visible markers to drive a digital twin of the stimulation process. A dynamic 3D brain model in Unity updates in real time to visualize coil position and estimated stimulation targets. Augmented reality (AR) is further incorporated to project this model directly onto the patient's head, enabling intuitive, in situ coil adjustment without reliance on abstract numerical displays. Overall, the proposed approach improves spatial precision and accuracy while enhancing usability.

Authors:Zhengtao Xu, Junti Zhang, Anthony Tang, Yi-Chieh Lee
Title: Who You Explain To Matters: Learning by Explaining to Conversational Agents with Different Pedagogical Roles
Abstract:
Conversational agents are increasingly used in education for learning support. An application is "learning by explaining", where learners explain their understanding to an agent. However, existing research focuses on single roles, leaving it unclear how different pedagogical roles influence learners' interaction patterns, learning outcomes and experiences. We conducted a between-subjects study (N=96) comparing agents with three pedagogical roles (Tutee, Peer, Challenger) and a control condition while learning an economics concept. We found that different pedagogical roles shaped learning dynamics, including interaction patterns and experiences. Specifically, the Tutee agent elicited the most cognitive investment but led to high pressure. The Peer agent fostered high absorption and interest through collaborative dialogue. The Challenger agent promoted cognitive and metacognitive acts, enhancing critical thinking with moderate pressure. The findings highlight how agent roles shape different learning dynamics, guiding the design of educational agents tailored to specific pedagogical goals and learning phases.

Authors:Michael Yin, Robert Xiao, Nadine Wagener
Title: Reflective Motion and a Physical Canvas: Exploring Embodied Journaling in Virtual Reality
Abstract:
In traditional journaling practices, authors express and process their thoughts by writing them down. We propose a somaesthetic-inspired alternative that uses the human body, rather than written words, as the medium of expression. We coin this embodied journaling, as people's isolated body movements and spoken words become the canvas of reflection. We implemented embodied journaling in virtual reality and conducted a within-subject user study (n=20) to explore the emergent behaviours from the process and to compare its expressive and reflective qualities to those of written journaling. When writing-based norms and affordances were absent, we found that participants defaulted towards unfiltered emotional expression, often forgoing words altogether. Rather, subconscious body motion and paralinguistic acoustic qualities unveiled deeper, sometimes hidden feelings, prompting reflection that happens after emotional expression rather than during it. We discuss both the capabilities and pitfalls of embodied journaling, ultimately challenging the idea that reflection culminates in linguistic reasoning.

Authors:Wanqi Zhang, Jiangen He, Marielle Santos
Title: Tackling the Scaffolding Paradox: A Person-Centered Adaptive Robotic Interview Coach
Abstract:
Job interview anxiety is a prevalent challenge among university students and can undermine both performance and confidence in high-stakes evaluative situations. Social robots have shown promise in reducing anxiety through emotional support, yet how such systems should balance psychological safety with effective instructional guidance remains an open question. In this work, we present a three-phase iterative design study of a robotic interview coach grounded in Person-Centered Therapy (PCT) and instructional scaffolding theory. Across three weekly sessions (N=8), we systematically explored how different interaction strategies shape users' emotional experience, cognitive load, and perceived utility. Phase I demonstrated that a PCT-based robot substantially increased perceived psychological safety but introduced a Safety-Guidance Gap, in which users felt supported yet insufficiently coached. Phase II revealed a Scaffolding Paradox: immediate feedback improved clarity but disrupted conversational flow and increased cognitive load, whereas delayed feedback preserved realism but lacked actionable specificity. To resolve this tension, Phase III introduced an Agency-Driven Interaction Mode that allowed users to opt in to feedback dynamically. Qualitative findings indicated that user control acted as an anxiety buffer, restoring trust, reducing overload, and reframing the interaction as collaborative rather than evaluative. Quantitative measures further showed significant reductions in interview-related social and communication anxiety, while maintaining high perceived warmth and therapeutic alliance. We synthesize these findings into an Adaptive Scaffolding Ecosystem framework, highlighting user agency as a key mechanism for balancing emotional support and instructional guidance in social robot coaching systems.

Authors:Nazar Ponochevnyi, Young-Ho Kim, Joseph Jay Williams, Anastasia Kuzminykh
Title: Talk Me Through It: Developing Effective Systems for Chart Authoring
Abstract:
Recent chart-authoring systems increasingly focus on natural-language input, enabling users to form a mental image of the chart they wish to create and express this intent using spoken instructions (spoken imagined-chart data). Yet these systems are predominantly trained on typed instructions written while viewing the target chart (typed existing-chart data). While the cognitive processes for describing an existing chart arguably differ from those for creating a new chart, the structural differences in the corresponding prompts remain underexplored. We present empirical findings on the structural differences among spoken imagined-chart instructions, typed imagined-chart instructions, and typed existing-chart instructions for chart creation, showing that imagined-chart prompts contain richer command formats, element specifications, and complex linguistic features, especially in spoken instructions. We then compare the performance of systems trained on spoken imagined-chart data versus typed existing-chart data, finding that the first system outperforms the second one on both voice and text input, highlighting the necessity of targeted training on spoken imagined-chart data. We conclude with design guidelines for chart-authoring systems to improve performance in real-world scenarios.

Authors:Shenghan Gao, Junye Wang, Junjie Xiong, Yun Jiang, Yun Fang, Qifan Hu, Baolong Liu, Quan Li
Title: SCSimulator: An Exploratory Visual Analytics Framework for Partner Selection in Supply Chains through LLM-driven Multi-Agent Simulation
Abstract:
Supply chains (SCs), complex networks spanning from raw material acquisition to product delivery, with enterprises as interconnected nodes, play a pivotal role in organizational success. However, optimizing SCs remains challenging, particularly in partner selection, a key bottleneck shaped by competitive and cooperative dynamics. This challenge constitutes a multi-objective dynamic game requiring a synergistic integration of Multi-Criteria Decision-Making and Game Theory. Traditional approaches, grounded in mathematical simplifications and managerial heuristics, fail to capture real-world intricacies and risk introducing subjective biases. Multi-agent simulation offers promise, but prior research has largely relied on fixed, uniform agent logic, limiting practical applicability. Recent advances in LLMs create opportunities to represent complex SC requirements and hybrid game logic. However, challenges persist in modeling dynamic SC relationships, ensuring interpretability, and balancing agent autonomy with expert control. We present SCSimulator, a visual analytics framework that integrates LLM-driven MAS with human-in-the-loop collaboration for SC partner selection. It simulates SC evolution via adaptive network structures and enterprise behaviors, which are visualized via interpretable interfaces. By combining CoT reasoning with XAI techniques, it generates multi-faceted, transparent explanations of decision trade-offs. Users can iteratively adjust simulation settings to explore outcomes aligned with their expectations and strategic priorities. Developed through iterative co-design with SC experts and industry managers, SCSimulator serves as a proof-of-concept, offering methodological contributions and practical insights for future research on SC decision-making and interactive AI-driven analytics. Usage scenarios and a user study demonstrate the system's effectiveness and usability.

Authors:Ziyi Liu, Xinyi Wang, Shao-Kang Hsia, Chenfei Zhu, Zhengzhe Zhu, Xiyun Hu, Anastasia Kouvaras Ostrowski, Karthik Ramani
Title: Towards Natural Language Environment: Understanding Seamless Natural-Language-Based Human-Multi-Robot Interactions
Abstract:
As multiple robots are expected to coexist in future households, natural language is increasingly envisioned as a primary medium for human-robot and robot-robot communication. This paper introduces the concept of a Natural Language Environment (NLE), defined as an interaction space in which humans and multiple heterogeneous robots coordinate primarily through natural language. Rather than proposing a deployable system, this work aims to explore the design space of such environments. We first synthesize prior work on language-based human-robot interaction to derive a preliminary design space for NLEs. We then conduct a role-playing study in virtual reality to investigate how people conceptualize, negotiate, and coordinate human-multi-robot interactions within this imagined environment. Based on qualitative and quantitative analysis, we refine the preliminary design space and derive design implications that highlight key tensions and opportunities around task coordination dominance, robot autonomy, and robot personality in Natural Language Environments.

Authors:Huixin Xue, Guangjun Xu, Shihong Ren, Xian Gao, Ruian Tie, Zhen Zhou, Hao Liu, Yue Gao
Title: Democratizing Music Therapy: LLM-Based Automated EEG Analysis and Progress Tracking for Low-Cost Home Devices
Abstract:
Home-based music therapy devices require accessible and cost-effective solutions for users to understand and track their therapeutic progress. Traditional physiological signal analysis, particularly EEG interpretation, relies heavily on domain experts, creating barriers to scalability and home adoption. Meanwhile, few experts are capable of interpreting physiological signal data while also making targeted music recommendations. While large language models (LLMs) have shown promise in various domains, their application to automated physiological report generation for music therapy represents an unexplored task. We present a prototype system that leverages LLMs to bridge this gap -- transforming raw EEG and cardiovascular data into human-readable therapeutic reports and personalized music recommendations. Unlike prior work focusing on real-time physiological adaptation during listening, our approach emphasizes post-session analysis and interpretable reporting, enabling non-expert users to comprehend their psychophysiological states and track therapeutic outcomes over time. By integrating signal processing modules with LLM-based reasoning agents, the system provides a practical and low-cost solution for short-term progress monitoring in home music therapy contexts. This work demonstrates the feasibility of applying LLMs to a novel task -- democratizing access to physiology-driven music therapy through automated, interpretable reporting.

Authors:Minju Park, Seunghyun Lee, Juhwan Ma, Dongwook Yoon
Title: AI Twin: Enhancing ESL Speaking Practice through AI Self-Clones of a Better Me
Abstract:
Advances in AI have enabled ESL learners to practice speaking through conversational systems. However, most tools rely on explicit correction, which can interrupt the conversation and undermine confidence. Grounded in second language acquisition and motivational psychology, we present AI Twin, a system that rephrases learner utterances into more fluent English and delivers them in the learner's voice. Embodying a more confident and proficient version of the learner, AI Twin reinforces motivation through alignment with their aspirational Ideal L2 Self. Also, its use of implicit feedback through rephrasing preserves conversational flow and fosters an emotionally supportive environment. In a within-subject study with 20 adult ESL learners, we compared AI Twin with explicit correction and a non-personalized rephrasing agent. Results show that AI Twin elicited higher emotional engagement, with participants describing the experience as more motivating. These findings highlight the potential of self-representative AI for personalized, psychologically grounded support in ESL learning.

Authors:Amber Kusters, Pooja Prajod, Pablo Cesar, Abdallah El Ali
Title: More Human or More AI? Visualizing Human-AI Collaboration Disclosures in Journalistic News Production
Abstract:
Within journalistic editorial processes, disclosing AI usage is currently limited to simplistic labels, which misses the nuance of how humans and AI collaborated on a news article. Through co-design sessions (N=10), we elicited 69 disclosure designs and implemented four prototypes that visually disclose human-AI collaboration in journalism. We then ran a within-subjects lab study (N=32) to examine how disclosure visualizations (Textual, Role-based Timeline, Task-based Timeline, Chatbot) and collaboration ratios (Primarily Human vs. Primarily AI) influenced visualization perceptions, gaze patterns, and post-experience responses. We found that textual disclosures were least effective in communicating human-AI collaboration, whereas Chatbot offered the most in-depth information. Furthermore, while role-based timelines amplified AI contribution in primarily human articles, task-based timeline shifted perceptions toward human involvement in primarily AI articles. We contribute Human-AI collaboration disclosure visualizations and their evaluation, and cautionary considerations on how visualizations can alter perceptions of AI's actual role during news article creation.

Authors:Wanqi Zhang, Jiangen He, Marielle Santos
Title: Bridging Psychological Safety and Skill Guidance: An Adaptive Robotic Interview Coach
Abstract:
Social robots hold promise for reducing job interview anxiety, yet designing agents that provide both psychological safety and instructional guidance remains challenging. Through a three-phase iterative design study (N = 8), we empirically mapped this tension. Phase I revealed a "Safety-Guidance Gap": while a Person-Centered Therapy (PCT) robot established safety (d = 3.27), users felt insufficiently coached. Phase II identified a "Scaffolding Paradox": rigid feedback caused cognitive overload, while delayed feedback lacked specificity. In Phase III, we resolved these tensions by developing an Agency-Driven Interaction Layer. Synthesizing our empirical findings, we propose the Adaptive Scaffolding Ecosystem, a conceptual framework that redefines robotic coaching not as a static script, but as a dynamic balance between affective support and instructional challenge, mediated by user agency.

Authors:Agnia Sergeyuk, Eric Huang, Dariia Karaeva, Anastasiia Serova, Yaroslav Golubev, Iftekhar Ahmed
Title: Evolving with AI: A Longitudinal Analysis of Developer Logs
Abstract:
AI-powered coding assistants are rapidly becoming fixtures in professional IDEs, yet their sustained influence on everyday development remains poorly understood. Prior research has focused on short-term use or self-reported perceptions, leaving open questions about how sustained AI use reshapes actual daily coding practices in the long term. We address this gap with a mixed-method study of AI adoption in IDEs, combining longitudinal two-year fine-grained telemetry from 800 developers with a survey of 62 professionals. We analyze five dimensions of workflow change: productivity, code quality, code editing, code reuse, and context switching. Telemetry reveals that AI users produce substantially more code but also delete significantly more. Meanwhile, survey respondents report productivity gains and perceive minimal changes in other dimensions. Our results offer empirical insights into the silent restructuring of software workflows and provide implications for designing future AI-augmented tooling.

Authors:Pooja Prajod, Hannes Cools, Thomas Röggla, Karthikeya Puttur Venkatraj, Amber Kusters, Alia ElKattan, Pablo Cesar, Abdallah El Ali
Title: Full Disclosure, Less Trust? How the Level of Detail about AI Use in News Writing Affects Readers' Trust
Abstract:
As artificial intelligence (AI) is increasingly integrated into news production, calls for transparency about the use of AI have gained considerable traction. Recent studies suggest that AI disclosures can lead to a ``transparency dilemma'', where disclosure reduces readers' trust. However, little is known about how the \textit{level of detail} in AI disclosures influences trust and contributes to this dilemma within the news context. In this 3$\times$2$\times$2 mixed factorial study with 40 participants, we investigate how three levels of AI disclosures (none, one-line, detailed) across two types of news (politics and lifestyle) and two levels of AI involvement (low and high) affect news readers' trust. We measured trust using the News Media Trust questionnaire, along with two decision behaviors: source-checking and subscription decisions. Questionnaire responses and subscription rates showed a decline in trust only for detailed AI disclosures, whereas source-checking behavior increased for both one-line and detailed disclosures, with the effect being more pronounced for detailed disclosures. Insights from semi-structured interviews suggest that source-checking behavior was primarily driven by interest in the topic, followed by trust, whereas trust was the main factor influencing subscription decisions. Around two-thirds of participants expressed a preference for detailed disclosures, while most participants who preferred one-line indicated a need for detail-on-demand disclosure formats. Our findings show that not all AI disclosures lead to a transparency dilemma, but instead reflect a trade-off between readers' desire for more transparency and their trust in AI-assisted news content.

Authors:Qian Ma, Yingfan Zhou, Shubhang Kaushik, Aamod Joshi, Aditya Majumdar, Noah Apthorpe, Yan Shvartzshnaider, Sarah Rajtmajer, Brett Frischmann
Title: Learning Password Best Practices Through In-Task Instruction
Abstract:
Users often make security- and privacy-relevant decisions without a clear understanding of the rules that govern safe behavior. We introduce pedagogical friction, a design approach that introduces brief, instructional interactions at the moment of action. We evaluate this approach in the context of password creation, a task with clear, objective quality criteria and broad familiarity. We conducted a randomized repeated-measures study with 128 participants across four interface conditions that varied the depth and interactivity of guidance. We assessed three outcomes: (1) rule compliance in a subsequent password task without guidance, (2) accuracy on survey questions matched to the rules shown earlier, and (3) behavior-knowledge alignment, which captures whether participants who correctly followed a rule also recognized it on the survey. Across all guided conditions, participants corrected most rule violations in the follow-up task, achieved moderate accuracy on matched rule questions, and showed high behavior-knowledge alignment. These results support pedagogical friction as a lightweight and generalizable intervention for security- and privacy-critical interfaces.

Authors:Sophie Villenave, Pierre Raimbaud, Guillaume Lavoué
Title: Dynamic Thermal Feedback in Highly Immersive VR Scenarios: a Multimodal Analysis of User Experience
Abstract:
Thermal feedback is critical to a range of Virtual Reality (VR) applications, such as firefighting training or thermal comfort simulation. Previous studies showed that adding congruent thermal feedback positively influences User eXperience (UX). However, existing work did not compare different levels of thermal feedback quality and mostly used less immersive virtual environments. To investigate these gaps in the scientific literature, we conducted a within-participant user study in two highly-immersive scenarios, Desert Island (n=25) and Snowy Mountains (n=24). Participants explored the scenarios in three conditions (Audio-Visual only, Static-Thermal Feedback, and Dynamic-Thermal Feedback). To assess the complex and subtle effects of thermal feedback on UX, we performed a multimodal analysis by crossing data from questionnaires, semi-structured interviews, and behavioral indicators. Our results show that despite an already high level of presence in the Audio-Visual only condition, adding thermal feedback increased presence further. Comparison between levels of thermal feedback quality showed no significant difference in UX questionnaires, however this result is nuanced according to participant profiles and interviews. Furthermore, we show that although the order of passage did not influence UX directly, it influenced user behavior. We propose guidelines for the use of thermal feedback in VR, and the design of studies in complex multisensory scenarios.

Authors:Xiyuan Zhu, Wenhan Lyu, Chaochao Fu, Yilin Wang, Jie Zheng, Qiyue Tan, Qianhe Chen, Yixin Yu, Ran Wang
Title: RecruitScope: A Visual Analytics System for Multidimensional Recruitment Data Analysis
Abstract:
Online recruitment platforms have become the dominant channel for modern hiring, yet most platforms offer only basic filtering capabilities, such as job title, keyword, and salary range. This hinders comprehensive analysis of multi-attribute relationships and job market patterns across different scales. We present RecruitScope, a visual analytics system designed to support multidimensional and cross-level exploration of recruitment data for job seekers and employers, particularly HR specialists. Through coordinated visualizations, RecruitScope enables users to analyze job positions and salary patterns from multiple perspectives, interpret industry dynamics at the macro level, and identify emerging positions at the micro level. We demonstrate the effectiveness of RecruitScope through case studies that reveal regional salary distribution patterns, characterize industry growth trajectories, and discover high-demand emerging roles in the job market.

Authors:Nia Touko, Matthew O A Ellis, Cristiano Capone, Alessio Burrello, Elisa Donati, Luca Manneschi
Title: Lightweight Test-Time Adaptation for EMG-Based Gesture Recognition
Abstract:
Reliable long-term decoding of surface electromyography (EMG) is hindered by signal drift caused by electrode shifts, muscle fatigue, and posture changes. While state-of-the-art models achieve high intra-session accuracy, their performance often degrades sharply. Existing solutions typically demand large datasets or high-compute pipelines that are impractical for energy-efficient wearables. We propose a lightweight framework for Test-Time Adaptation (TTA) using a Temporal Convolutional Network (TCN) backbone. We introduce three deployment-ready strategies: (i) causal adaptive batch normalization for real-time statistical alignment; (ii) a Gaussian Mixture Model (GMM) alignment with experience replay to prevent forgetting; and (iii) meta-learning for rapid, few-shot calibration. Evaluated on the NinaPro DB6 multi-session dataset, our framework significantly bridges the inter-session accuracy gap with minimal overhead. Our results show that experience-replay updates yield superior stability under limited data, while meta-learning achieves competitive performance in one- and two-shot regimes using only a fraction of the data required by current benchmarks. This work establishes a path toward robust, "plug-and-play" myoelectric control for long-term prosthetic use.

Authors:Michael Yin, Angela Chiang, Robert Xiao
Title: Dissolving a Digital Relationship: A Critical Examination of Digital Severance Behaviours in Close Relationships
Abstract:
Fulfilling social connections are crucial for human well-being and belonging, but not all relationships last forever. As interactions increasingly move online, the act of digitally severing a relationship - e.g. through blocking or unfriending - has become progressively more common as well. This study considers actions of "digital severance" through interviews with 30 participants with experience as the initiator and/or recipient of such situations. Through a critical interpretative lens, we explore how people perceive and interpret their severance experience and how the online setting of social media shapes these dynamics. We develop themes that position digital severance as being intertwined with power and control, and we highlight (im)balances between an individual's desires that can lead to feelings of disempowerment and ambiguous loss for both parties. We discuss the implications of our research, outlining three key tensions and four open questions regarding digital relationships, meaning-making, and design outcomes for future exploration.

Authors:Weiyue Li, Minda Zhao, Weixuan Dong, Jiahui Cai, Yuze Wei, Michael Pocress, Yi Li, Wanyan Yuan, Xiaoyue Wang, Ruoyu Hou, Kaiyuan Lou, Wenqi Zeng, Yutong Yang, Yilun Du, Mengyu Wang
Title: Grading Scale Impact on LLM-as-a-Judge: Human-LLM Alignment Is Highest on 0-5 Grading Scale
Abstract:
Large language models (LLMs) are increasingly used as automated evaluators, yet prior works demonstrate that these LLM judges often lack consistency in scoring when the prompt is altered. However, the effect of the grading scale itself remains underexplored. We study the LLM-as-a-judge problem by comparing two kinds of raters: humans and LLMs. We collect ratings from both groups on three scales and across six benchmarks that include objective, open-ended subjective, and mixed tasks. Using intraclass correlation coefficients (ICC) to measure absolute agreement, we find that LLM judgments are not perfectly consistent across scales on subjective benchmarks, and that the choice of scale substantially shifts human-LLM agreement, even when within-group panel reliability is high. Aggregated over tasks, the grading scale of 0-5 yields the strongest human-LLM alignment. We further demonstrate that pooled reliability can mask benchmark heterogeneity and reveal systematic subgroup differences in alignment across gender groups, strengthening the importance of scale design and sub-level diagnostics as essential components of LLM-as-a-judge protocols.

Authors:Yun Ye, Zexuan Li, Panagiotis Angeloudis, S. C. Wong, Jian Sun, Haoyang Liang
Title: Are eHMIs always helpful? Investigating how eHMIs interfere with pedestrian behavior on multi-lane streets: An eye-tracking virtual reality experiment
Abstract:
Appropriate communication is crucial for efficient and safe interactions between pedestrians and autonomous vehicles (AVs). External human-machine interfaces (eHMIs) on AVs, which can be categorized as allocentric or egocentric, are considered a promising solution. While the effectiveness of eHMIs has been extensively studied, in complex environments, such as unsignalized multi-lane streets, their potential to interfere with pedestrian crossing behavior remains underexplored. Hence, a virtual reality-based experiment was conducted to examine how different types of eHMIs displayed on AVs affect the crossing behavior of pedestrians in multi-lane streets environments, with a focus on the gaze patterns of pedestrians during crossing. The results revealed that the presence of eHMIs significantly influenced the cognitive load on pedestrians and increased the possibility of distraction, even misleading pedestrians in cases involving multiple AVs on multi-lane streets. Notably, allocentric eHMIs induced higher cognitive loads and greater distraction in pedestrians than egocentric eHMIs. This was primarily evidenced by longer gaze time and higher proportions of attention for the eHMI on the interacting vehicle, as well as a broader distribution of gaze toward vehicles in the non-interacting lane. However, misleading behavior was mainly triggered by eHMI signals from yielding vehicles in the non-interacting lane. Under such asymmetric signal configurations, egocentric eHMIs resulted in a higher misjudgment rate than allocentric eHMIs. These findings highlight the importance of enhancing eHMI designs to balance the clarity and consistency of the displayed information across different perspectives, especially in complex multi-lane traffic scenarios. This study provides valuable insights regarding the application and standardization of future eHMI systems for AVs.

Authors:Vikram Kamath Cannanure, Bruno Yinkfu, Douglas Bryan, Mati Amin, Ingmar Weber
Title: Teacher Professional Development on WhatsApp and LLMs: Early Lessons from Cameroon
Abstract:
AI in education is commonly delivered through web-based systems such as online forms and institutional platforms. However, these approaches can exclude teachers in low-resource contexts, where everyday mobile platforms like WhatsApp serve as primary digital infrastructure. To address this gap, we present a field pilot in Cameroon that deploys a WhatsApp-based chatbot with LLM-supported content for teacher professional development (TPD), compared with an online form baseline. The system was evaluated through a mixed-methods study with 47 primary school teachers, integrating quantitative measures with qualitative insights from interviews and participant feedback. Results show that the chatbot was rated higher in perceived usability and overall experience, while learnability remained comparable. These improvements were driven by platform familiarity, low interaction overhead, and the modular structure of LLM-supported content, but were constrained by connectivity limitations, prepaid data costs, and multilingual needs (English/French). Building on these findings, we outline design directions for multilingual, culturally grounded interaction and for supporting prompting and reflection in AI use. More broadly, this work points to Thoughtful AI that supports reflection, relevance, and sustained professional growth.

Authors:Zonghan Li, Yi Liu, Chunyan Wang, Song Tong, Kaiping Peng, Feng Ji
Title: Enhancing behavioral nudges with large language model-based iterative personalization: A field experiment on electricity and hot-water conservation
Abstract:
Nudging is widely used to promote behavioral change, but its effectiveness is often limited when recipients must repeatedly translate feedback into workable next steps under changing circumstances. Large language models (LLMs) may help reduce part of this cognitive work by generating personalized guidance and updating it iteratively across intervention rounds. We developed an LLM agent for iterative personalization and tested it in a three-arm randomized experiment among 233 university residents in China, using daily electricity and shower hot-water conservation as objectively measured cases differing in friction. LLM-personalized nudges (T2) produced the largest conservation effects, while image-enhanced conventional nudges (T1) and text-based conventional nudges (C) showed similar outcomes (omnibus p = 0.009). Relative to C, T2 reduced electricity consumption by 0.56 kWh per room-day (p = 0.014), corresponding to an 18.3 percentage-point higher adjusted saving rate. This advantage emerged within the first two intervention rounds, alongside iterative updating of personalized guidance, and persisted thereafter. Hot-water outcomes followed the same direction but were smaller, less precisely estimated, and attenuated over time, consistent with stronger friction in this domain. LLM-personalized nudges emphasized prospective and context-specific guidance and were associated with higher participant engagement. This study provides field evidence that LLM-based iterative personalization can enhance behavioral nudging, with behavioral friction as a potential boundary condition. Larger trials and extension to more behaviors are warranted.

Authors:Pyeonghwa Kim, Taylor Lewandowski, Michael Dunn, Steve Sawyer
Title: Occupational Diversity and Stratification in Platform Work: A Longitudinal Study of Online Freelancers
Abstract:
We focus on occupational diversity in platform-mediated work to advance conceptual and empirical insight into the occupationally embedded nature of platform labor. We pursue this focus in response to a prevailing tendency to treat platform workers as a homogeneous group, overlooking the unique demands, constraints, and practices rooted in specific professions. Such generalizations hinder both understanding of platform work and the development of sociotechnical systems that support differentiated occupational realities. To address this gap, we present a longitudinal analysis of 108 online freelancers spanning five occupational categories. We show that occupational context structures workers' capacity to interpret and navigate platformic management, shaping distinct experiences across four dimensions of platform work: self-presentation, flexibility, skilling, and platform work sustainability. To articulate how digital labor platforms' managerial control interacts with occupational embeddedness, we introduce the concept of platformic occupational stratification and discuss four mechanisms that explain its logic and implications for platform-mediated work. These insights contribute to CSCW by informing occupation-sensitive research and design approaches that directly engage with the specific opportunities and challenges rooted in workers' situated occupational agency in platform-mediated work.

Authors:Birgitta Langhammer, Oscar Martinez Mozos, Ana Mendes, Joana Madureira, Lina Seduikyte, Martin Weigl, Heidi Salonen, Veronika Kotradyova, Ondrej Krejcar, Sarmite Mikulioniene, Willeke van Staalduinen, Carina Dantas, Petra Maresova, Willeke van Staalduinen, Carina Dantas, Barakovic Sabina, Barakovic Husic Jasmina, Jonathan Gomez-Raja
Title: State of the Art Report for Smart Habitat for Older Persons -- Working Group 3 -- Healthcare
Abstract:
This document reports the State of the Art of science and practice on three topics related to smart and healthy ageing at home: furniture and habitats, Information and Communication Technologies (ICT), and healthcare. The reports were prepared by the working groups of COST Action CA16226, Sheld-on. Sheld-on is a network of researchers, user representatives, industry members, and other stakeholders. The three domains covered in this report were the areas of interest for three working groups from the COST Action. The aim of each working group was to assess the State of the Art for disciplinary understanding, identification of advances in smart furniture and habitat, products, industries and success stories. The findings on these topics of all working groups are compiled here. Due to the different backgrounds of the members of each of the working groups, the document is divided in three separate parts that can be considered as separate State of the Art reports. The goal of this document is to be used as input in the fourth working group of Sheld-on COST Action: Solutions for Ageing Well at Home, in the Community, and at Work, where experts from the three different domains converge to a single working group in order to achieve the action objectives.

Authors:Ruth Cohen, Lu Feng, Ayala Bloch, Sarit Kraus
Title: The Persuasion Paradox: When LLM Explanations Fail to Improve Human-AI Team Performance
Abstract:
While natural-language explanations from large language models (LLMs) are widely adopted to improve transparency and trust, their impact on objective human-AI team performance remains poorly understood. We identify a Persuasion Paradox: fluent explanations systematically increase user confidence and reliance on AI without reliably improving, and in some cases undermining, task accuracy. Across three controlled human-subject studies spanning abstract visual reasoning (RAVEN matrices) and deductive logical reasoning (LSAT problems), we disentangle the effects of AI predictions and explanations using a multi-stage reveal design and between-subjects comparisons. In visual reasoning, LLM explanations increase confidence but do not improve accuracy beyond the AI prediction alone, and substantially suppress users' ability to recover from model errors. Interfaces exposing model uncertainty via predicted probabilities, as well as a selective automation policy that defers uncertain cases to humans, achieve significantly higher accuracy and error recovery than explanation-based interfaces. In contrast, for language-based logical reasoning tasks, LLM explanations yield the highest accuracy and recovery rates, outperforming both expert-written explanations and probability-based support. This divergence reveals that the effectiveness of narrative explanations is strongly task-dependent and mediated by cognitive modality. Our findings demonstrate that commonly used subjective metrics such as trust, confidence, and perceived clarity are poor predictors of human-AI team performance. Rather than treating explanations as a universal solution, we argue for a shift toward interaction designs that prioritize calibrated reliance and effective error recovery over persuasive fluency.

Authors:Yongsu Ahn, Nam Wook Kim, Benjamin Bach
Title: Disrupting Cognitive Passivity: Rethinking AI-Assisted Data Literacy through Cognitive Alignment
Abstract:
AI chatbots are increasingly stepping into roles as collaborators or teachers in analyzing, visualizing, and reasoning through data and domain problem. Yet, AI's default assistant mode with its comprehensive and one-off responses may undermine opportunities for practitioners to develop literacy through their own thinking, inducing cognitive passivity. Drawing on evidence from empirical studies and theories, we argue that disrupting cognitive passivity necessitates a nuanced approach: rather than simply making AI promote deliberative thinking, there is a need for more dynamic and adaptive strategy through cognitive alignment -- a framework that characterizes effective human-AI interaction as a function of alignment between users' cognitive demand and AI's interaction mode. In the framework, we provide the mapping between AI's interaction mode (transmissive or deliberative) and users' cognitive demand (receptive or deliberative), otherwise leading to either cognitive passivity or friction. We further discuss implications and offer open questions for future research on data literacy.

Authors:Yasaman Hakiminejad, Shiva Azimi, Luis Gomero, Elizabeth Pantesco, Irene P. Kan, Meltem Izzetoglu, Arash Tavakoli
Title: Steering through Time: Blending Longitudinal Data with Simulation to Rethink Human-Autonomous Vehicle Interaction
Abstract:
As semi-automated vehicles (SAVs) become more common, ensuring effective human-vehicle interaction during control handovers remains a critical safety challenge. Existing studies often rely on single-session simulator experiments or naturalistic driving datasets, which often lack temporal context on drivers' cognitive and physiological states before takeover events. This study introduces a hybrid framework combining longitudinal mobile sensing with high-fidelity driving simulation to examine driver readiness in semi-automated contexts. In a pilot study with 38 participants, we collected 7 days of wearable physiological data and daily surveys on stress, arousal, valence, and sleep quality, followed by an in-lab simulation with scripted takeover events under varying secondary task conditions. Multimodal sensing, including eye tracking, fNIRS, and physiological measures, captured real-time responses. Preliminary analysis shows the framework's feasibility and individual variability in baseline and in-task measures; for example, fixation duration and takeover control time differed by task type, and RMSSD showed high inter-individual stability. This proof-of-concept supports the development of personalized, context-aware driver monitoring by linking temporally layered data with real-time performance.

Authors:Luca Vogelgesang, Ahmed Mehdi Soltani, Mohammadhossein Khojasteh, Xinrui Zu, Stefano De Giorgis, Madalina Croitoru, Filip Ilievski
Title: StretchBot: A Neuro-Symbolic Framework for Adaptive Guidance with Assistive Robots
Abstract:
Assistive robots have growing potential to support physical wellbeing in home and healthcare settings, for example, by guiding users through stretching or rehabilitation routines. However, existing systems remain largely scripted, which limits their ability to adapt to user state, environmental context, and interaction dynamics. In this work, we present StretchBot, a hybrid neuro-symbolic robotic coach for adaptive assistive guidance. The system combines multimodal perception with knowledge-graph-grounded large language model reasoning to support context-aware adjustments during short stretching sessions while maintaining a structured routine. To complement the system description, we report an exploratory pilot comparison between scripted and adaptive guidance with three participants. The pilot findings suggest that the adaptive condition improved perceived adaptability and contextual relevance, while scripted guidance remained competitive in smoothness and predictability. These results provide preliminary evidence that structured actionable knowledge can help ground language-model-based adaptation in embodied assistive interaction, while also highlighting the need for larger, longitudinal studies to evaluate robustness, generalizability, and long-term user experience.

Authors:Jeremy Zhengqi Huang, Emani Hicks, Sidharth, Gillian R. Hayes, Dhruv Jain
Title: Sona: Real-Time Multi-Target Sound Attenuation for Noise Sensitivity
Abstract:
For people with noise sensitivity, everyday soundscapes can be overwhelming. Existing tools such as active noise cancellation reduce discomfort by suppressing the entire acoustic environment, often at the cost of awareness of surrounding people and events. We present Sona, an interactive mobile system for real-time soundscape mediation that selectively attenuates bothersome sounds while preserving desired audio. Sona is built on a target-conditioned neural pipeline that supports simultaneous attenuation of multiple overlapping sound sources, overcoming the single-target limitation of prior systems. It runs in real time on-device and supports user-extensible sound classes through in-situ audio examples, without retraining. Sona is informed by a formative study with 68 noise-sensitive individuals. Through technical benchmarking and an in-situ study with 10 participants, we show that Sona achieves low-latency, multi-target attenuation suitable for live listening, and enables meaningful reductions in bothersome sounds while maintaining awareness of surroundings. These results point toward a new class of personal AI systems that support comfort and social participation by mediating real-world acoustic environments.

Authors:Abu Noman Md Sakib, Protik Dey, Zijie Zhang, Taslima Akter
Title: Explainable AI for Blind and Low-Vision Users: Navigating Trust, Modality, and Interpretability in the Agentic Era
Abstract:
Explainable Artificial Intelligence (XAI) is critical for ensuring trust and accountability, yet its development remains predominantly visual. For blind and low-vision (BLV) users, the lack of accessible explanations creates a fundamental barrier to the independent use of AI-driven assistive technologies. This problem intensifies as AI systems shift from single-query tools into autonomous agents that take multi-step actions and make consequential decisions across extended task horizons, where a single undetected error can propagate irreversibly before any feedback is available. This paper investigates the unique XAI requirements of the BLV community through a comprehensive analysis of user interviews and contemporary research. By examining usage patterns across environmental perception and decision support, we identify a significant modality gap. Empirical evidence suggests that while BLV users highly value conversational explanations, they frequently experience "self-blame" for AI failures. The paper concludes with a research agenda for accessible Explainable AI in agentic systems, advocating for multimodal interfaces, blame-aware explanation design, and participatory development.

Authors:George Boateng, Samuel Boateng, Victor Kumbol
Title: Kwame 2.0: Human-in-the-Loop Generative AI Teaching Assistant for Large Scale Online Coding Education in Africa
Abstract:
Providing timely and accurate learning support in large-scale online coding courses is challenging, particularly in resource-constrained contexts. We present Kwame 2.0, a bilingual (English-French) generative AI teaching assistant built using retrieval-augmented generation and deployed in a human-in-the-loop forum within SuaCode, an introductory mobile-based coding course for learners across Africa. Kwame 2.0 retrieves relevant course materials and generates context-aware responses while encouraging human oversight and community participation. We deployed the system in a 15-month longitudinal study spanning 15 cohorts with 3,717 enrollments across 35 African countries. Evaluation using community feedback and expert ratings shows that Kwame 2.0 provided high-quality and timely support, achieving high accuracy on curriculum-related questions, while human facilitators and peers effectively mitigated errors, particularly for administrative queries. Our findings demonstrate that human-in-the-loop generative AI systems can combine the scalability and speed of AI with the reliability of human support, offering an effective approach to learning assistance for underrepresented populations in resource-constrained settings at scale.

Authors:Lucas Gautheron, Nori Jacoby, Peter Harrison
Title: Active Inference with People: a general approach to real-time adaptive experiments
Abstract:
Adaptive experiments automatically optimize their design throughout the data collection process, which can bring substantial benefits compared to conventional experimental settings. Potential applications include, among others: computerized adaptive testing (for selecting informative tasks in ability measurements), adaptive treatment assignment (when searching experimental conditions maximizing certain outcomes), and active learning (for choosing optimal training data for machine learning algorithms). However, implementing these techniques in real time poses substantial computational and technical challenges. Additionally, despite their conceptual similarity, the above scenarios are often treated as separate problems with distinct solutions. In this paper, we introduce a practical and unified approach to real-time adaptive experiments that can encompass all of the above scenarios, regardless of the modality of the task (including textual, visual, and audio inputs). Our strategy combines active inference, a Bayesian framework inspired by cognitive neuroscience, with PsyNet, a platform for large-scale online behavioral experiments. While active inference provides a compact, flexible, and principled mathematical framework for adaptive experiments generally, PsyNet is a highly modular Python package that supports social and behavioral experiments with stimuli and responses in arbitrary domains. We illustrate this approach through two concrete examples: (1) an adaptive testing experiment estimating participants' ability by selecting optimal challenges, effectively reducing the amount of trials required by 30--40\%; and (2) an adaptive treatment assignment strategy that identifies the optimal treatment up to three times as accurately as a fixed design in our example. We provide detailed instructions to facilitate the adoption of these techniques.

Authors:Ziming Li, Hongji Li, Jialin Wang, Pan Hui, Hai-Ning Liang
Title: FlexiCamAR: Enhancing Everyday Camera Interactions on AR Glasses with a Flexible Additional Viewpoint
Abstract:
The recent emergence and popularity of consumer-grade augmented reality (AR) glasses from major technology companies highlight their potential to become the next daily computing platform. A dominant design trend in this context is the integration of a front-facing camera to deliver a first-person perspective. While this approach is intuitive, there is limited evidence that it is optimal (or sufficient) for supporting users in daily tasks. This paper explores a more effective camera interaction technique for AR glasses, which we term ``FlexiCamAR." This novel method aims to enhance both efficiency and the range of applications for AR glasses by offering flexible and comfortable secondary camera viewpoints. To investigate the applicability and usability of this approach, we developed a ring camera prototype that can be attached to users' fingers. We then conducted a user study with 12 participants, comparing FlexiCamAR against the baseline, a traditional front-facing AR camera setup, across two common tasks: taking photos and scanning QR codes. Our findings show that FlexiCamAR significantly reduces physical load. We also explore potential scenarios where the additional viewpoint afforded by FlexiCamAR proves valuable, such as capturing low-angle perspectives or navigating confined spaces. Participant feedback further suggests strong potential for additional applications, including selfie taking, video conferencing, and object scanning. Overall, FlexiCamAR presents a novel interaction approach that can serve as a powerful supplement or alternative to the first-person perspective, significantly improving the adaptability of AR glasses for everyday use.

Authors:Giulio Pisaneschi, Pierpaolo Serio, Estelle Gerbier, Andrea Dan Ryals, Lorenzo Pollini, Mario G. C. A. Cimino
Title: A Mentalistic Interface for Probing Folk-Psychological Attribution to Non-Humanoid Robots
Abstract:
This paper presents an experimental platform for studying intentional-state attribution toward a non-humanoid robot. The system combines a simulated robot, realistic task environments, and large language model-based explanatory layers that can express the same behavior in mentalistic, teleological, or mechanistic terms. By holding behavior constant while varying the explanatory frame, the platform provides a controlled way to investigate how language and framing shape the adoption of the intentional stance in robotics.

Authors:Roshni Kaushik, Maarten Sap, Koichi Onoue
Title: Examining the Effect of Explanations of AI Privacy Redaction in AI-mediated Interactions
Abstract:
AI-mediated communication is increasingly being utilized to help facilitate interactions; however, in privacy sensitive domains, an AI mediator has the additional challenge of considering how to preserve privacy. In these contexts, a mediator may redact or withhold information, raising questions about how users perceive these interventions and whether explanations of system behavior can improve trust. In this work, we investigate how explanations of redaction operations can affect user trust in AI-mediated communication. We devise a scenario where a validated system removes sensitive content from messages and generates explanations of varying detail to communicate its decisions to recipients. We then conduct a user study with $180$ participants that studies how user trust and preferences vary for cases with different amounts of redacted content and different levels of explanation detail. Our results show that participants believed our system was more effective at preserving privacy when explanations were provided ($p<0.05$, Cohen's $d \approx 0.3$). We also found that contextual factors had an impact; participants relied more on explanations and found them more helpful when the system performed extensive redactions ($p<0.05$, Cohen's $f \approx 0.2$). We also found that explanation preferences depended on individual differences as well, and factors such as age and baseline familiarity with AI affected user trust in our system. These findings highlight the importance and challenge of balancing transparency and privacy in AI-mediated communications and suggest that adaptive, context-aware explanations are essential for designing privacy-aware, trustworthy AI systems.

Authors:Vartika Narayani Srinet, Anirudha Bhattacharjee, Braj Bhushan, Bishakh Bhattacharya
Title: Human vs. NAO: A Computational-Behavioral Framework for Quantifying Social Orienting in Autism and Typical Development
Abstract:
Responding to one's name is among the earliest-emerging social orienting behaviors and is one of the most prominent aspects in the detection of Autism Spectrum Disorder (ASD). Typically developing children exhibit near-reflexive orienting to their name, whereas children with ASD often demonstrate reduced frequency, increased latency, or atypical patterns of response. In this study, we examine differential responsiveness to quantify name-calling stimuli delivered by both human agents and NAO, a humanoid robot widely employed in socially assistive interventions for autism. The analysis focuses on multiple behavioral parameters, including eye contact, response latency, head and facial orientation shifts, and duration of sustained interest. Video-based computational methods were employed, incorporating face detection, eye region tracking, and spatio-temporal facial analysis, to obtain fine-grained measures of children's responses. By comparing neurotypical and neuroatypical groups under controlled human-robot conditions, this work aims to understand how the source and modality of social cues affect attentional dynamics in name-calling contexts. The findings advance both the theoretical understanding of social orienting deficits in autism and the applied development of robot-assisted assessment tools.

Authors:Jiyeon Bae, Jinwook Seo
Title: A Multi-Level Visual Analytics Approach to Artist-Era Alignment in Popular Music
Abstract:
Existing computational studies of popular music primarily model aggregate trends or predict chart performance, offering limited support for interpreting artist-level alignment against historical stylistic baselines. We introduce an interactive visual analytics framework that treats each artist-decade as a unit defined relative to an era-specific baseline, characterized along two complementary dimensions: profile shape similarity, capturing directional correspondence with the era's feature pattern, and profile contrast ratio, capturing stylistic intensity relative to the era's dispersion. Together, these dimensions define a quadrant-based trajectory space for reasoning about conformity, divergence, and amplification over time. Applied to weekly U.S. Billboard Hot 100 chart entries from the all-time top-10 artists across six decades (1960s-2010s), linked with Spotify audio features, the framework reveals that alignment and intensity can meaningfully diverge across artist trajectories.

Authors:Zihong He, Shuqin Wang, Songchen Zhou, Qinghui Lin, Jialin Wang, Chen Liang, Hai-Ning Liang
Title: Would You Like to Visit My World? Cultivating Perceived Equality in Human-Agent Interaction via Observable Social Life Spaces
Abstract:
Most AI agents remain confined to an instrumental "command-execution" model, resulting in unequal, one-sided interactions. While recent works attempt to build relationships through hidden memory backends, these invisible processes often fail to break the instrumental bias. In this paper, we argue that true relational equality requires agents to have an independent, observable existence. We introduce the \textit{Observable Life Spaces} paradigm, where agents inhabit a continuous virtual environment, engage in daily activities, and form social relationships that users can directly observe. Through a mixed-methods study ($N=24$), we demonstrate that only when agents are endowed with a socialized life space that is visually observable to humans can the perceived equality during interaction be significantly enhanced ($p = 0.015$). Our findings suggest that visually representing an agent's social life space can effectively shift the human-agent dynamic from a purely instrumental relationship to one characterized by perceived equality.

Authors:Matthew Flathers, Griffin Smith, Julian Herpertz, Zhitong Zhou, John Torous
Title: Depictions of Depression in Generative AI Video Models: A Preliminary Study of OpenAI's Sora 2
Abstract:
Generative video models are increasingly capable of producing complex depictions of mental health experiences, yet little is known about how these systems represent conditions like depression. This study characterizes how OpenAI's Sora 2 generative video model depicts depression and examines whether depictions differ between the consumer App and developer API access points. We generated 100 videos using the single-word prompt "Depression" across two access points: the consumer App (n=50) and developer API (n=50). Two trained coders independently coded narrative structure, visual environments, objects, figure demographics, and figure states. Computational features across visual aesthetics, audio, semantic content, and temporal dynamics were extracted and compared between modalities. App-generated videos exhibited a pronounced recovery bias: 78% (39/50) featured narrative arcs progressing from depressive states toward resolution, compared with 14% (7/50) of API outputs. App videos brightened over time (slope = 2.90 brightness units/second vs. -0.18 for API; d = 1.59, q < .001) and contained three times more motion (d = 2.07, q < .001). Across both modalities, videos converged on a narrow visual vocabulary and featured recurring objects including hoodies (n=194), windows (n=148), and rain (n=83). Figures were predominantly young adults (88% aged 20-30) and nearly always alone (98%). Gender varied by access point: App outputs skewed male (68%), API outputs skewed female (59%). Sora 2 does not invent new visual grammars for depression but compresses and recombines cultural iconographies, while platform-level constraints substantially shape which narratives reach users. Clinicians should be aware that AI-generated mental health video content reflects training data and platform design rather than clinical knowledge, and that patients may encounter such content during vulnerable periods.

Authors:Shixian Xie, Motahhare Eslami, John Zimmerman
Title: Strategies for Designing Responsibly within a Capitalist Enterprise
Abstract:
Despite significant advances in responsible AI research, industry adoption remains limited, leaving many HCI contributions underutilized in practice. This position paper argues that current research often fails to account for the fundamental need for capitalist enterprises to create value. To achieve immediate real-world impact, responsible AI research must explore how to design responsibly within capitalism. We call for a move beyond the dichotomy of "ethics vs. business" toward a more productive framing of "ethics and business." We propose ideation as a practical design strategy for generating ethically preferable alternatives that also meet business objectives. By aligning ethics with enterprise realities, we expand the space of responsible design that can actually be built.

Authors:Hashini Senaratne, Richard Attfield, Samith Widhanapathirana, David Howard, Cecile Paris, Dana Kulic, Leimin Tian
Title: HRI-SA: A Multimodal Dataset for Online Assessment of Human Situational Awareness during Remote Human-Robot Teaming
Abstract:
Maintaining situational awareness (SA) is critical in human-robot teams. Yet, under high workload and dynamic conditions, operators often experience SA gaps. Automated detection of SA gaps could provide timely assistance for operators. However, conventional SA measures either disrupt task flow or cannot capture real-time fluctuations, limiting their operational utility. To the best of our knowledge, no publicly available dataset currently supports the systematic evaluation of online human SA assessment in human-robot teaming. To advance the development of online SA assessment tools, we introduce HRI-SA, a multimodal dataset from 30 participants in a realistic search-and-rescue human-robot teaming context, incorporating eye movements, pupil diameter, biosignals, user interactions, and robot data. The experimental protocol included predefined events requiring timely operator assistance, with ground truth SA latency of two types (perceptual and comprehension) systematically obtained by measuring the time between assistance need onset and resolution. We illustrate the utility of this dataset by evaluating standard machine learning models for detecting perceptual SA latencies using generic eye-tracking features and contextual features. Results show that eye-tracking features alone effectively classified perceptual SA latency (recall=88.91%, F1=67.63%) using leave-one-group-out cross-validation, with performance improved through contextual data fusion (recall=91.51%, F1=80.38%). This paper contributes the first public dataset supporting the systematic evaluation of SA throughout a human-robot teaming mission, while also demonstrating the potential of generic eye-tracking features for continuous perceptual SA latency detection in remote human-robot teaming.

Authors:Apurv Varshney, Lily M. Turkstra, Jiaxin Su, Mable Zhou, Scott T. Grafton, Barry Giesbrecht, Mary Hegarty, Michael Beyeler
Title: Actionable Guidance Outperforms Map and Compass Cues in Demanding Immersive VR Wayfinding
Abstract:
Navigation aids are central to immersive virtual reality (VR) experiences that involve physical locomotion. Their effectiveness depends not only on how much spatial information they provide, but also on how directly that information supports movement decisions. We compared three common guidance techniques for immersive VR wayfinding: a directional arrow, a minimap, and a compass. In a controlled room-scale VR study with 42 participants completing 1008 trials, participants navigated to target landmarks in a time-pressured maze with reduced visibility and forced route replanning. Across behavioral and eye-tracking measures, arrow guidance produced the strongest navigation performance, minimap guidance yielded intermediate performance, and compass cues performed worst, suggesting that during immersive locomotion users benefit from guidance that can be interpreted rapidly while moving. These results suggest that in demanding immersive locomotion tasks, interfaces that translate spatial information directly into actionable movement cues can outperform richer but more interpretive spatial representations. Our findings highlight the importance of designing XR navigation interfaces that minimize the cognitive translation between spatial information and movement decisions.

Authors:Vincent Gurgul, Robin Gubela, Stefan Lessmann
Title: The State of Generative AI in Software Development: Insights from Literature and a Developer Survey
Abstract:
Generative Artificial Intelligence (GenAI) rapidly transforms software engineering, yet existing research remains fragmented across individual tasks in the Software Development Lifecycle. This study integrates a systematic literature review with a survey of 65 software developers. The results show that GenAI exerts its highest impact in design, implementation, testing, and documentation, where over 70 % of developers report at least halving the time for boilerplate and documentation tasks. 79 % of survey respondents use GenAI daily, preferring browser-based Large Language Models over alternatives integrated directly in their development environment. Governance is maturing, with two-thirds of organizations maintaining formal or informal guidelines. In contrast, early SDLC phases such as planning and requirements analysis show markedly lower reported benefits. In a nutshell, GenAI shifts value creation from routine coding toward specification quality, architectural reasoning, and oversight, while risks such as uncritical adoption, skill erosion, and technical debt require robust governance and human-in-the-loop mechanisms.

Authors:Paulo Vitor Santana Silva, Arthur Ricardo Sousa Vitória, Diogo Fernandes Costa Silva, Arlindo Rodrigues Galvão Filho
Title: Attention Guidance through Video Script: A Case Study of Object Focusing on 360° VR Video Tours
Abstract:
Within the expansive domain of virtual reality (VR), 360° VR videos immerse viewers in a spherical environment, allowing them to explore and interact with the virtual world from all angles. While this video representation offers unparalleled levels of immersion, it often lacks effective methods to guide viewers' attention toward specific elements within the virtual environment. This paper combines the models Grounding Dino and Segment Anything (SAM) to guide attention by object focusing based on video scripts. As a case study, this work conducts the experiments on a 360° video tour on the University of Reading. The experiment results show that video scripts can improve the user experience in 360° VR Videos Tour by helping in the task of directing the user's attention.

Authors:Zhaoxi Zhang, Ruolin Wu, Feiyang Ren, Sridevi Turaga, Tamir Mendel
Title: CoDesignAI: An AI-Enabled Multi-Agent, Multi-User System for Collaborative Urban Design at the Conceptual Stage
Abstract:
Public participation has become increasingly important in collaborative urban design; yet, existing processes often face challenges in achieving efficient and scalable citizen engagement. To address this gap, this study explores how large language models (LLMs) can support cooperation among community members in participatory design. We introduce CoDesignAI, a collaborative urban design tool that combines multiple users, representing residents or stakeholders, with multiple AI agents, representing domain experts who provide facilitation and professional knowledge during the conceptual stage of urban design. This paper presents the system architecture and main components of the tool, illustrating how users interact with AI agents within a collaborative and iterative design workflow. Specifically, the system integrates generative AI with spatial mapping services to support street-level visualization of design proposals. AI agents assist users by summarizing discussion content, extracting shared design intentions, and generating prompts for presenting design interventions. The system also enables users to revise and refine their ideas over multiple rounds while documenting the design process. By combining conversational AI, multi-user interaction, and image-based design grounded in real-world urban contexts, this study argues that AI-enabled design systems can help shift urban design from an expert-centered practice to a more open and participatory process. The paper contributes a new web-based platform for AI-assisted collaborative design and offers an early exploration of how AI agents may expand the capacity for public participation in urban design.

Authors:Kowe Kadoma, Priyal Shrivastava, Mor Naaman
Title: Lost in Transcription: Subtitle Errors in Automatic Speech Recognition Reduce Speaker and Content Evaluations
Abstract:
Researchers have demonstrated that Automatic Speech Recognition (ASR) systems perform differently across demographic groups. In this work, we examined how subtitle errors affect evaluations of speakers and their content using a preregistered online experiment (N=207, U.S.-based crowdworkers). Participants watched speakers with various accents deliver a talk in which the subtitles were accurate or error-prone. Our results indicate that error-prone subtitles consistently reduce both speaker and content evaluations for all speakers. We did not see disparate impact between the accent groups, controlling for subtitle quality. Taken together, though, the findings of this short paper imply that speakers with accents for which ASR systems perform poorly are likely to be further penalized by viewers with lower evaluations.

Authors:Supriya Khadka, Sanchari Das
Title: Grant, Verify, Revoke: A User-Centric Pattern for Blockchain Compliance
Abstract:
In decentralized web applications, users face an inherent conflict between public verifiability and personal privacy. To participate in regulated on-chain services, users must currently disclose sensitive identity documents to centralized intermediaries, permanently linking real-world identities to public transaction histories. This binary choice between total privacy loss or total exclusion strips users of agency and exposes them to persistent surveillance. In this work, we introduce a Selective Disclosure Framework designed to restore user sovereignty by decoupling eligibility verification from identity revelation. We present ZK-Compliance, a prototype that leverages browser-based zero-knowledge proofs to shift the interaction model, enabling users to prove specific attributes (e.g., "I am over 18") locally without revealing the underlying data. We implement a user-governed Grant, Verify, Revoke lifecycle that transforms the user's mental model of compliance from a permanent data handover into a dynamic, revocable authorization session. Our evaluation shows that client-side proof generation takes under 200ms, enabling a seamless interactive experience on commodity hardware. This work provides early evidence that regulatory compliance need not come at the cost of user privacy or autonomy.

Authors:Hyerim Park, Jinseok Hong, Heejeong Ko, Woontack Woo
Title: What Are You Really Asking For? A Comparative 5W1H Analysis of Learner Questioning in CPR Training with IVAs in Screen-based and Augmented Reality Environments
Abstract:
Question-asking is one of the key indicators of cognitive engagement. However, understanding how the distinct psychological affordances of presentation media shape learners' spoken inquiries with embodied Intelligent Virtual Agents (IVAs) remains limited. To systematically examine this process, we propose a 5W1H-based framework for analyzing learner questions. Using this framework, we conducted a user study comparing an Augmented Reality-based IVA (AR-IVA) deployed in the physical environment with a screen-based IVA (Video-IVA) during cardiopulmonary resuscitation (CPR) instruction. Results showed that the AR-IVA elicited higher spatial and social presence and promoted more frequent and longer questions focused on clarification and understanding. In contrast, the Video-IVA encouraged questions regarding procedural refinement. Presence acted as a selective filter, shaping the timing and topic of questions rather than as a universal mediator. These effects were significantly moderated by learners' motivational and strategic characteristics toward learning. Based on these findings, we propose design implications for IVA-supported learning systems.

Authors:Natalie Grace Brigham, Lucy Qin, Tadayoshi Kohno
Title: Examining Risks in the AI Companion Application Ecosystem
Abstract:
While computer systems that allow users to interact through conversational natural language (i.e., chatbots) have existed for many years, varying types of applications advertising AI companionship (e.g., Character AI, Replika) have proliferated in recent years due to advancements in large language models. Our work offers a threat model encompassing two distinct risk categories: harms posed to users by AI companion applications, and harms enabled by malicious users exploiting application features. To further understand this application ecosystem, we identified 489 unique apps from the App Store and Play Store that advertised AI companionship. We then systematically conducted and analyzed walkthroughs of a stratified sample of 30 apps with respect to our threat model. Through our analysis, we categorize broader ecosystem trends that provide context for understanding threats and identify specific threats related to sensitive data collection and sharing, anthropomorphism, engagement mechanisms, sexual interactions and media, as well as the ingestion and reconstruction of likeness, including the potential for generating synthetic nonconsensual intimate imagery. This study provides a foundational security perspective on the AI companion application ecosystem and informs future research within and beyond this field, policy, and technical development. Content warning: This paper includes descriptions of applications that can be used to create synthetic nonconsensual representations, including explicit imagery, as well as discussion of self-harm and suicidal ideation.

Authors:A K M Amanat Ullah, David Ahlström, Khalad Hasan
Title: Leveraging Head Movement for Navigating Off-Screen Content on Large Curved Displays
Abstract:
Large curved displays are ideal for viewing 360 degree content, such as 3D maps, but typically restrict users to a 180 degree viewport, leaving information off-screen. Since users naturally direct their heads toward regions on-screen before interacting, head movements offer a promising alternative for workspace manipulation to bring off-screen content into view. We explore rate control functions (linear, sigmoid, polynomial) and zone control functions (continuous, friction, interrupted, additive) to translate head rotations into workspace control, enabling users to access off-screen content. Polynomial rate control emerges as the best choice, achieving the fastest trial times and highest subjective ratings. Using a map navigation task, our second study demonstrates that users perform better with the polynomial head-based technique than with the industry-standard controller-based methods, click-and-drag and joystick-push, for 360\degree workspace navigation. Based on these findings, we provide guidelines to inform the design of future 360\degree workspace navigation techniques for large curved displays.

Authors:Dimitri Staufer, Kirsten Morehouse, David Hartmann, Bettina Berendt
Title: Human-Centred LLM Privacy Audits: Findings and Frictions
Abstract:
Large language models (LLMs) learn statistical associations from massive training corpora and user interactions, and deployed systems can surface or infer information about individuals. Yet people lack practical ways to inspect what a model associates with their name. We report interim findings from an ongoing study and introduce LMP2, a browser-based self-audit tool. In two user studies ($N_{total}{=}458$), GPT-4o predicts 11 of 50 features for everyday people with $\ge$60\% accuracy, and participants report wanting control over LLM-generated associations despite not considering all outputs privacy violations. To validate our probing method, we evaluate eight LLMs on public figures and non-existent names, observing clear separation between stable name-conditioned associations and model defaults. Our findings also contribute to exposing a broader generative AI evaluation crisis: when outputs are probabilistic, context-dependent, and user-mediated through elicitation, what model--individual associations even include is under-specified and operationalisation relies on crafting probes and metrics that are hard to validate or compare. To move towards reliable, actionable human-centred LLM privacy audits, we identify nine frictions that emerged in our study and offer recommendations for future work and the design of human-centred LLM privacy audits.

Authors:Yejin Yun, Seung Won Lee, Jiin Choi, Kyung Hoon Hyun
Title: Modeling Sequential Design Actions as Designer Externalization on an Infinite Canvas
Abstract:
Infinite canvas platforms are becoming central to contemporary design practice, enabling designers to externalize cognition through the spatial arrangement of multimodal artifacts. As AI agents increasingly generate and organize content within these environments, their impact on designers' externalization processes remains underexplored. We report a field study with eight professional designers comparing workflows with and without an AI organizing agent. Through a sequence analysis of 5,838 design actions, we identify three key shifts: (1) AI integration reallocates cognitive effort from spatial management to content curation and relational structuring, without increasing active time; (2) a characteristic generate-and-curate cycle emerges in which designers' demands on the agent intensify while the agent's functional role adapts; and (3) AI's role evolves from a divergent catalyst in early stages to a convergent curator in later phases. These findings offer a behavioral model for designing phase-adaptive AI tools that support human-AI co-evolution on infinite canvases.

Authors:Keiichi Ihara, DaeHo Lee, Manato Abe, Hye-Young Jo, Ryo Suzuki
Title: CinemaWorld: Generative Augmented Reality with LLMs and 3D Scene Generation for Movie Augmentation
Abstract:
We introduce CinemaWorld, a generative augmented reality system that augments the viewer's physical surroundings with automatically generated mixed reality 3D content extracted from and synchronized with 2D movie scenes. Our system preprocesses films to extract key features using multimodal large language models (LLMs), generates dynamic 3D augmentations with generative AI, and embeds them spatially into the viewer's physical environment on the Meta Quest 3. To explore the design space of CinemaWorld, we conducted an elicitation study with eight film students, which led us to identify several key augmentation types, including particle effects, surrounding objects, textural overlays, character-driven augmentation, and lighting effects. We evaluated our system through a technical evaluation (N=100 video clips), a user study (N=12), and expert interviews with film creators (N=8). Results indicate that CinemaWorld enhances immersion and enjoyment, suggesting its potential to enrich the film-viewing experience.

Authors:JiWoong Jang, Patrick Carrington, Andrew Begel
Title: From Autonomy to Sovereignty - A New Telos for Socially Assistive Technology
Abstract:
Social accessibility research faces a persistent tension: assistive technologies (AT) predominantly pursue independence, yet disabled people's experiences reveal rich preferences for interdependence. Our analysis of 90 papers from 2011-2025 uncovered that this stems from a deeper issue - which crystallized through dialogue with three bodies of theories: (1) self-determination theory (SDT), (2) symbolic interactionism, and (3) posthumanist perspectives and crip technoscience. SDT illuminates individual needs; symbolic interactionism addresses construction of social meaning and stigma; Posthumanist and crip technoscience together challenges normalcy, governance, and the human-machine boundary. Through their tensions, we identify relational sovereignty as an alternative telos - or goal - to autonomy. While our corpus equates autonomy with independence, sovereignty centers the power to choose between independence and interdependence. To operationalize this shift - from "Can they do it?" to "Do they get to decide?" - we introduce the Relational Sovereignty Matrix and four design interventions: (1) a sovereignty-centered reframing of SDT, (2) generative questions for justice-oriented reflection, (3) the idea of building through sovereign technical primitives, and (4) explicit consideration of power in AT design.

Authors:JiWoong Jang, Patrick Carrington, Andrew Begel
Title: The Three Praxes Framework - A Thematic Review and Map of Social Accessibility Research
Abstract:
Research in social accessibility aims to improve the lives of disabled people across diverse abilities and experiences by assisting with communication, relationships, and ecosystems of access. We seek to understand this intersectional body of work through analyzing social accessibility research from 2011 to 2025. Through constructivist grounded theory analysis of 90 papers (curated from 605), we develop the Three Praxes Framework: three sites of practice Artifact (constructive), Ecosystem (relational), and Epistemology (theoretical) - two cross-cutting stances toward change (Temporal Orientation and Stakeholder Focus) - and one reflexive cycle modeling how insights can flow between praxes. Our analysis reveals these praxes operate largely in isolation, risking that insights remain academic exercises while assistive technologies reinforce existing barriers. We call on the field to realize a cycle where disabled people's lived experiences shape material realities, material practice generates theoretical knowledge, and both transform ecosystems of access.

Authors:Seung Won Lee, Semin Jin, Kyung Hoon Hyun
Title: Beyond Semantic Similarity: Open Challenges for Embedding-Based Creative Process Analysis Across AI Design Tools
Abstract:
AI-based creativity support tools (CSTs) are evaluated through domain-specific metrics, limiting cross-domain comparison of creative processes. Embedding-based protocol analysis offers a potential domain-agnostic analytical layer. However, we argue that fixed embedding similarity can misrepresent creative dynamics: it may not detect creative pivots that occur within superficially similar language, treating shifts in the problem being addressed as continued elaboration. We identify three open challenges stemming from this gap: aligning similarity measures with creative significance, segmenting and representing multimodal design traces, and evaluating agentic systems where embedding-based metrics enter the generation loop and shape agent behavior. We propose context-aware interventions using large language models as a direction for making trace analysis sensitive to session-specific creative dynamics.

Authors:Wenwei Li, Jiarun Zhou, Qinxiao Quan, Fusang Zhang, Daqing Zhang
Title: Pushing Bistatic Wireless Sensing toward High Accuracy at the Sub-Wavelength Scale
Abstract:
Contactless sensing using wireless communication signals has garnered significant attention due to its non-intrusive nature and ubiquitous infrastructure. Despite the promise, the inherent bistatic deployment of wireless communication introduces clock asynchronism, which leads to unknown phase offsets in channel response and hinders fine-grained sensing. State-of-the-art systems widely adopt the cross-antenna channel ratio to cancel these detrimental phase offsets. However, the channel ratio preserves sensing feature accuracy only at integer-wavelength target displacements, losing sub-wavelength fidelity. To overcome this limitation, we derive the first quantitative mapping between the distorted ratio feature and the ideal channel feature. Building on this foundation, we develop a robust framework that leverages channel response amplitude to recover the ideal channel feature from the distorted ratio. Real-world experiments across Wi-Fi and LoRa demonstrate that our method can effectively reconstruct sub-wavelength displacement details, achieving nearly an order-of-magnitude improvement in accuracy.

Authors:Lei Yin, Wentao Cheng, Zhida Qin, Tianyu Huang, Yidong Li, Gangyi Ding
Title: AutoUE: Automated Generation of 3D Games in Unreal Engine via Multi-Agent Systems
Abstract:
Automatically generating 3D games in commercial game engines remains a non-trivial challenge, as it involves complex engine-related workflows for generating assets such as scenes, blueprints, and code. To address this challenge, we propose a novel multi-agent system, AutoUE, which coordinates multiple agents to end-to-end generate 3D games, covering model retrieval, scene generation, gameplay and interaction code synthesis, and automated game testing for evaluation. In order to mitigate tool-use hallucinations in LLMs, we introduce a retrieval-augmented generation mechanism that grounds agents with relevant UE tool documentation. Additionally, we incorporate game design patterns and engine constraints into the code generation process to ensure the generation of correct and robust code. Furthermore, we design an automated play-testing pipeline that generates and executes runtime test commands, enabling systematic evaluation of dynamic behaviors. Finally, we construct a game generation dataset and conduct a series of experiments that demonstrate AutoUE's ability to generate 3D games end-to-end, and validate the effectiveness of these designs.

Authors:Mason Kadem, Sarah Masri, Anthea Innes, Rong Zheng
Title: Human-Centered Ambient and Wearable Sensing for Automated Monitoring in Dementia Care: A Scoping Review
Abstract:
We conducted a scoping review to map the rapidly evolving landscape of wearable and ambient sensing technologies for monitoring people with dementia across home and institutional settings. We analyzed empirical sensing studies (2015-2025) to identify and inform future technical and human-centered design requirements. Five key implementation principles emerge: (1) human-centered design involving all stakeholders to augment rather than replace caregivers; (2) personalized, adaptable solutions that support autonomy across settings and severity levels instead of standardized approaches; (3) integration with existing workflows with adequate training and support; (4) proactive privacy and consent considerations, especially for ambient monitoring of residents and caregivers; and (5) cost-effective, ethical, equitable, scalable solutions with quantifiable outcomes. This paper identifies gaps, trends and opportunities for developing sensing systems that address the complex challenges, while enhancing automation and autonomy, in dementia care.

Authors:Bibeg Limbu, Irene-Angelica Chounta
Title: Haptics in Cognition: Disruptor or Enabler of Memory?
Abstract:
This exploratory pilot study investigates the impact of haptic perception --specifically tactile sensitivity (touch) and kinaesthetic intensity (movement)-- on learning, operationalized as information retention (immediate recall) through handwriting. Participants (N=20) were randomly assigned to one of four experimental groups in a 2x2 factorial design, manipulating touch (via glove use) and movement (via increased writing pressure). Information retention was measured using an immediate recall test, while mental effort (reaction time in a secondary task) and perceived workload (NASA-TLX) were examined as mediating variables. Bayesian binomial regression revealed moderate evidence that increased writing pressure negatively influenced recall (85-88% probability of negative effect), whereas glove use alone demonstrated no clear effect. Bayesian mediation analysis found no strong evidence that mental effort or perceived workload mediated these effects, as all 95% credible intervals included zero, indicating substantial uncertainty. These findings suggest that increased Kinaesthetic demands may slightly impair immediate recall, independent of perceived workload or mental effort. Importantly, the manipulation of touch alone does not appear to influence information retention. The study contributes to understanding the nuanced relationship between embodied interactions and cognitive outcomes, with implications for designing sensor-based multimodal learning environments.

Authors:Ali Ebrahimi Pourasad, Meyssam Saghiri, Walid Maalej
Title: FeedAIde: Guiding App Users to Submit Rich Feedback Reports by Asking Context-Aware Follow-Up Questions
Abstract:
User feedback is essential for the success of mobile apps, yet what users report and what developers need often diverge. Research shows that users often submit vague feedback and omit essential contextual details. This leads to incomplete reports and time-consuming clarification discussions. To overcome this challenge, we propose FeedAIde, a context-aware, interactive feedback approach that supports users during the reporting process by leveraging the reasoning capabilities of Multimodal Large Language Models. FeedAIde captures contextual information, such as the screenshot where the issue emerges, and uses it for adaptive follow-up questions to collaboratively refine with the user a rich feedback report that contains information relevant to developers. We implemented an iOS framework of FeedAIde and evaluated it on a gym's app with its users. Compared to the app's simple feedback form, participants rated FeedAIde as easier and more helpful for reporting feedback. An assessment by two industry experts of the resulting 54 reports showed that FeedAIde improved the quality of both bug reports and feature requests, particularly in terms of completeness. The findings of our study demonstrate the potential of context-aware, GenAI-powered feedback reporting to enhance the experience for users and increase the information value for developers.

Authors:Alberto Tono, Jiajun Wu, Gordon Wetzstein, Iro Armeni, Hariharan Subramonyam, James Landay, Martin Fischer
Title: Deep Sketch-Based 3D Modeling: A Survey
Abstract:
In the past decade, advances in artificial intelligence have revolutionized sketch-based 3D modeling, leading to a new paradigm known as Deep Sketch-Based 3D Modeling (DS-3DM). DS-3DM offers data-driven methods that address the long-standing challenges of sketch abstraction and ambiguity. DS-3DM keeps humans at the center of the creative process by enhancing the flexibility, usability, faithfulness, and adaptability of sketch-based 3D modeling interfaces. This paper contributes a comprehensive survey of the latest DS-3DM within a novel design space: MORPHEUS. Built upon the Input-Model-Output (IMO) framework, MORPHEUS categorizes Models outputting Options of 3D Representations and Parts, derived from Human inputs (varying in quantity and modality), and Evaluated across diverse User-views and Styles. Throughout MORPHEUS we highlight limitations and identify opportunities for interdisciplinary research in Computer Vision, Computer Graphics, and Human-Computer Interaction, revealing a need for controllability and information-rich outputs. These opportunities align design processes more closely with user' intent, responding to the growing importance of user-centered approaches.

Authors:Shri Harini Ramesh, Fateme Rajabiyazdi
Title: Pulli Kolam: A Traditional South Indian Craft Practice for Representing Data
Abstract:
This paper introduces Pulli Kolam, a traditional South Indian craft, as a medium for physical data representation. Grounded in its cultural meaning and embodied practice, Pulli Kolam follows structured geometric rules while allowing creative variation. We identify five mapping strategies within Kolam (dots, patterns, fills, lines, and color) that can be used for representing data physically. without disrupting traditional practice. Through an illustrative scenario of daily well-being tracking, we demonstrate how data representation can be embedded within routine craft practice. We conclude by outlining potential material adaptations that extend Kolam beyond its ephemeral form while maintaining its embodied and ritual qualities.

Authors:Elena Koung, Yunhan Liu, Zinan Zhang, Xinning Gui, Yubo Kou
Title: Teen Vigilance: Navigating Risky Social Interactions on Discord
Abstract:
Teenagers are avid users of Discord, a fast growing platform for synchronous communication where they often interact with strangers. Because Discord combines private DMs, semi-private voice channels, and public servers in one place, it creates a hybrid environment that can produce complex and underexplored safety risks for teenagers. Drawing on 16 interviews with teenage Discord users, this study examines their strategies for navigating risky social interactions in the platform. Our findings reveal that when teenagers encounter risks during social interactions, they exercise vigilance by evaluating suspicious interactions before forming friendships, using safety tools, and engaging in controlled risk-taking to safeguard their privacy and security. At the community level, they mitigate risks through selective participation in servers, a practice supported by vigilant governance structures. We discuss how vigilance enables teenagers to act during risky encounters to protect themselves, advancing understanding of teenagers' agency in risk navigation and informing teen-centered designs for safer online environments.

Authors:Chen Chen, Michel Pahud, David Brown, Chuck Needham, Balasaravanan T. Kumaravel, Andrew D. Wilson, Ken Hinckley, Nicolai Marquardt
Title: Proscenium: Exploring Design Spaces of Layered Information Experience on a Large Dual-Layer Transparent Display
Abstract:
Layering information spaces is a promising strategy to design intuitive and engaging interactive experiences. Although multi-layer displays enable promising interaction techniques through limited depth perception - achieved via slight separation between layers - it remains unclear how to fully design experiences that leverage the unique affordances of layered information. To address this, we introduce Proscenium, a dual-layer, large transparent display workspace setup with an adjustable separation between the layers. We demonstrate our preliminary design space focusing on how rendered information can be transitioned and linked across displays, and showcase 14 speculative experience prototypes across six categories.

Authors:Jocelyn Shen, Nicolai Marquardt, Hugo Romat, Ken Hinckley, Nathalie Riche, Fanny Chevalier
Title: Texterial: A Text-as-Material Interaction Paradigm for LLM-Mediated Writing
Abstract:
What if text could be sculpted and refined like clay -- or cultivated and pruned like a plant? Texterial reimagines text as a material that users can grow, sculpt, and transform. Current generative-AI models enable rich text operations, yet rigid, linear interfaces often mask such capabilities. We explore how the text-as-material metaphor can reveal AI-enabled operations, reshape the writing process, and foster compelling user experiences. A formative study shows that users readily reason with text-as-material, informing a conceptual framework that explains how material metaphors shift mental models and bridge gulfs of envisioning, execution, and evaluation in LLM-mediated writing. We present the design and evaluation of two technical probes: Text as Clay, where users refine text through gestural sculpting, and Text as Plants, where ideas grow serendipitously over time. This work expands the design space of writing tools by treating text as a living, malleable medium.

Authors:Zinan Zhang, Xinning Gui, Yubo Kou
Title: Improving Family Co-Play Experiences through Family-Centered Design
Abstract:
Cooperative play (co-play) is often positioned as a family-beneficial practice that can strengthen parent-child bonds and support parental mediation in games. Yet co-play in user-generated virtual worlds (UGVWs) can be disrupted by real-time harms that parents cannot easily prevent. Roblox, a platform with millions of user-generated virtual worlds and a large child player base, illustrates this challenge. Prior work on harmful UGVW design highlights risks beyond content problems, including manipulative monetization prompts, unmoderated social interactions, emergent in-world behaviors, and narrative designs that may normalize harmful ideologies. Current governance and moderation approaches, largely adapted from social media, focus on static artifacts and often fail to capture interactive and emergent harms in virtual worlds. This workshop paper asks: how might UGVWs and their platforms be designed to minimize harms that specifically impair family co-play experiences?

Authors:Soyoung Jung, Daehoo Yoon, Sung Gyu Koh, Young Hwan Kim, Yehan Ahn, Sung Park
Title: When Should an AI Act? A Human-Centered Model of Scene, Context, and Behavior for Agentic AI Design
Abstract:
Agentic AI increasingly intervenes proactively by inferring users' situations from contextual data yet often fails for lack of principled judgment about when, why, and whether to act. We address this gap by proposing a conceptual model that reframes behavior as an interpretive outcome integrating Scene (observable situation), Context (user-constructed meaning), and Human Behavior Factors (determinants shaping behavioral likelihood). Grounded in multidisciplinary perspectives across the humanities, social sciences, HCI, and engineering, the model separates what is observable from what is meaningful to the user and explains how the same scene can yield different behavioral meanings and outcomes. To translate this lens into design action, we derive five agent design principles (behavioral alignment, contextual sensitivity, temporal appropriateness, motivational calibration, and agency preservation) that guide intervention depth, timing, intensity, and restraint. Together, the model and principles provide a foundation for designing agentic AI systems that act with contextual sensitivity and judgment in interactions.

Authors:Hannah Kim, Rahad Arman Nabid, Jeni Sorathiya, Minh Doan, Elijah Jordan, Rayhana Nasimova, Sergei L. Kosakovsky Pond, Stephen MacNeil
Title: Changing the Optics: Comparing Traditional and Retrieval-Augmented GenAI E-Tutorials in Interdisciplinary Learning
Abstract:
Understanding information-seeking behaviors in e-learning is critical, as learners must often make sense of complex and fragmented information, a challenge compounded in interdisciplinary fields with diverse prior knowledge. Compared to traditional e-tutorials, GenAI e-tutorials offer new ways to navigate information spaces, yet how they shape learners information-seeking behaviors remains unclear. To address this gap, we characterized behavioral differences between traditional and GenAI-mediated e-tutorial learning using the three search modes of orienteering. We conducted a between-subject study in which learners engaged with either a traditional e-tutorial or a GenAI e-tutorial accessing the same underlying information content. We found that the traditional users maintained greater awareness and focus of the information space, whereas GenAI users exhibited more proactive and exploratory behaviors with lower cognitive load due to the querying-driven interaction. These findings offer guidance for designing tutorials in e-learning.

Authors:Leni Yang, Aymeric Ferron, Yvonne Jansen, Pierre Dragicevic
Title: Progressive Value Reading: The Use of Motion to Gradually Examine Data Involving Large Magnitudes
Abstract:
People often struggle to interpret data with extremely large or small values, or ranges spanning multiple orders of magnitude. While traditional approaches, such as log scales and multiscale visualizations, can help, we explore in this article a different approach used in some emerging designs: the use of motion to let viewers gradually experience magnitude -- for example, interactive graphics that require long scrolling or street paintings stretching hundreds of meters. This approach typically demands substantial time and sustained interaction, translating differences in magnitude into a visceral sense of duration and effort. Although largely underexplored, this design strategy offers new opportunities. We introduce the term progressive value reading to refer to the use of motion to progressively examine an information object that encodes a value, where the amount of motion reflects the value. We compiled a corpus of 55 real-life and hypothetical visualization examples that allow, encourage, or require progressive value reading. From this corpus, we derived a design space of ten design dimensions, providing a shared vocabulary, inspiration for novel techniques, and a foundation for empirical evaluation. An online corpus is also available for exploration.

Authors:Krzysztof Kutt, Elżbieta Sroka, Oleksandra Ishchuk, Luiz do Valle Miranda
Title: A Three-stage Neuro-symbolic Recommendation Pipeline for Cultural Heritage Knowledge Graphs
Abstract:
The growing volume of digital cultural heritage resources highlights the need for advanced recommendation methods capable of interpreting semantic relationships between heterogeneous data entities. This paper presents a complete methodology for implementing a hybrid recommendation pipeline integrating knowledge-graph embeddings, approximate nearest-neighbour search, and SPARQL-driven semantic filtering. The work is evaluated on the JUHMP (Jagiellonian University Heritage Metadata Portal) knowledge graph developed within the CHExRISH project, which at the time of experimentation contained ${\approx}3.2$M RDF triples describing people, events, objects, and historical relations affiliated with the Jagiellonian University (Kraków, PL). We evaluate four embedding families (TransE, ComplEx, ConvE, CompGCN) and perform hyperparameter selection for ComplEx and HNSW. Then, we present and evaluate the final three-stage neuro-symbolic recommender. Despite sparse and heterogeneous metadata, the approach produces useful and explainable recommendations, which were also proven with expert evaluation.

Authors:Xizi Wang, Yue Lyu, Yalong Yang, Jian Zhao
Title: To Slide or Not to Slide: Exploring Techniques for Comparing Immersive Videos
Abstract:
Immersive videos (IVs) provide 360° environments that create a strong sense of presence and spatial exploration. Unlike traditional videos, IVs distribute information across multiple directions, making comparison cognitively demanding and highly dependent on interaction techniques. With the growing adoption of IVs, effective comparison techniques have become an essential yet underexplored area of research. Inspired by the "sliding" concept in 2D media comparison, we integrate two established comparison strategies from the literature--toggle and side-by-side--to support IV comparison with greater flexibility. For an in-depth understanding of different strategies, we adapt and implement five IV comparison techniques across VR and 2D environments: SlideInVR, ToggleInVR, SlideIn2D, ToggleIn2D, and SideBySideIn2D. We then conduct a user study (N=20) to examine how these techniques shape users' perceptions, strategies, and workflows. Our findings provide empirical insights into the strengths and limitations of each technique, underscoring the need to switch between comparison approaches across scenarios. Notably, participants consistently rate SlideInVR and SlideIn2D as the most flexible and favorite methods for IV comparison.

Authors:Zhengtai Gou, Junxiao Long, Tao Lu, Jian Zhao, Yalong Yang
Title: Evaluating Replay Techniques for Asynchronous Task Handover in Immersive Analytics
Abstract:
Immersive analytics enables collaborative data analysis in shared virtual spaces. While synchronous collaboration in such environments is well-established, real-world analysis often requires an effective task handover - the transfer of knowledge and analytical context between analysts working asynchronously. Traditional handover methods often rely on static annotations that fail to capture the dynamic problem-solving process and spatial context inherent in immersive workflows. To address this handover challenge, we explore session replay as a comprehensive approach for analysts to re-experience a predecessor's work, facilitating a deeper understanding of both the visual details and the insight formation process. Two phases of studies were conducted to establish design guidelines for such replay systems by investigating the impact of viewing platform (PC vs. VR), perspective (first-person vs. third-person), and navigation control (active vs. passive). Phase 1 identified the optimal replay configurations within each viewing platform, revealing a platform-dependent divergence: PC users favored a guided, first-person perspective for its focused detail, while VR users benefited significantly from the agency afforded by a third-person perspective with active navigation. After refining each condition based on user feedback, including developing a novel hybrid 1PP+3PP format for PC, Phase 2 compared the two optimized systems (PC vs. VR). Our results show that the immersive VR replay led to significantly better task comprehension and workflow reconstruction accuracy, demonstrating the critical role of embodied agency in understanding complex analytical processes.

Authors:Monalika Padma Reddy, Aruna Balasubramanian, Jiawei Zhou, Xiaojun Bi, IV Ramakrishnan, Vikas Ashok
Title: Lost in Instructions: Study of Blind Users' Experiences with DIY Manuals and AI-Rewritten Instructions for Assembly, Operation, and Troubleshooting of Tangible Products
Abstract:
AI tools like ChatGPT and Be-My-AI are increasingly being used by blind individuals. Although prior work has explored their use in some Do-It-Yourself (DIY) tasks by blind individuals, little is known about how they use these tools and the available product-manual resources to assemble, operate, and troubleshoot physical or tangible products - tasks requiring spatial reasoning, structural understanding, and precise execution. We address this knowledge gap via an interview study and a usability study with blind participants, investigating how they leverage AI tools and product manuals for DIY tasks with physical products. Findings show that manuals are essential resources, but product-manual instructions are often inadequate for blind users. AI tools presently do not adequately address this insufficiency; in fact, we observed that they often exacerbate this issue with incomplete, incoherent, or misleading guidance. Lastly, we suggest improvements to AI tools for generating tailored instructions for blind users' DIY tasks involving tangible products.

Authors:Satwik Ram Kodandaram, Jiawei Zhou, Xiaojun Bi, IV Ramakrishnan, Vikas Ashok
Title: Finding the Signal in the Noise: An Exploratory Study on Assessing the Effectiveness of AI and Accessibility Forums for Blind Users' Support Needs
Abstract:
Accessibility forums and, more recently, generative AI tools have become vital resources for blind users seeking solutions to computer-interaction issues and learning about new assistive technologies, screen reader features, tutorials, and software updates. Understanding user experiences with these resources is essential for identifying and addressing persistent support gaps. Towards this, we interviewed 14 blind users who regularly engage with forums and GenAI tools. Findings revealed that forums often overwhelm users with multiple overlapping topics, redundant or irrelevant content, and fragmented responses that must be mentally pieced together, increasing cognitive load. GenAI tools, while offering more direct assistance, introduce new barriers by producing unreliable answers, including overly verbose or fragmented guidance, fabricated information, and contradictory suggestions that fail to follow prompts, thereby heightening verification demands. Based on these insights, we outlined design opportunities to improve the reliability of assistive resources, aiming to provide blind users with more trustworthy and cognitively-manageable support.

Authors:Supriya Khadka, Dhiman Goswami, Sanchari Das
Title: Poster: Privacy-Preserving Compliance Checks on Ethereum via Selective Disclosure
Abstract:
Digital identity verification often forces a privacy trade-off, where users must disclose sensitive personal data to prove simple eligibility criteria. As blockchain applications integrate with regulated environments, this over-disclosure creates significant risks of data breaches and surveillance. This work proposes a general Selective Disclosure Framework built on Ethereum, designed to decouple attribute verification from identity revelation. By utilizing client-side zk-SNARKs, the framework enables users to prove specific eligibility predicates without revealing underlying identity documents. We present a case study, ZK-Compliance, which implements a functional Grant, Verify, Revoke lifecycle for age verification. Preliminary results indicate that strict compliance requirements can be satisfied with negligible client-side latency (< 200 ms) while preserving the pseudonymous nature of public blockchains.

Authors:Philipp Steigerwald, Jens Albrecht
Title: From "Help" to Helpful: A Hierarchical Assessment of LLMs in Mental e-Health Applications
Abstract:
Psychosocial online counselling frequently encounters generic subject lines that impede efficient case prioritisation. This study evaluates eleven large language models generating six-word subject lines for German counselling emails through hierarchical assessment - first categorising outputs, then ranking within categories to enable manageable evaluation. Nine assessors (counselling professionals and AI systems) enable analysis via Krippendorff's $α$, Spearman's $ρ$, Pearson's $r$ and Kendall's $τ$. Results reveal performance trade-offs between proprietary services and privacy-preserving open-source alternatives, with German fine-tuning consistently improving performance. The study addresses critical ethical considerations for mental health AI deployment including privacy, bias and accountability.

Authors:Ryo Ohara, Chi-Lan Yang, Yuji Hatada, Takuji Narumi, Hideaki Kuzuoka
Title: Punchlines Unbound: Comedy Practices in Social Virtual Reality
Abstract:
Social VR platforms serve as an emergent venue for live performance, enabling co-presence and real-time interaction among distributed performers and audiences within shared virtual environments. Live performances, such as comedy, rely on subtle social cues between performers and audiences, which are missing in VR. However, it remains unclear how comedians utilize avatar-mediated cues in social VR. We conducted semi-structured interviews and observations with 23 virtual comedians on VRChat. Results revealed that virtual comedians transformed their limited nonverbal expressiveness into performative opportunities through intentional control and exaggeration. Additionally, a distinctive culture emerged around context-appropriate emoji reactions from audiences, while challenges such as audio latency and moderation against trolling were highlighted. Our findings advance understanding of how performers creatively adapt to expressive constraints in avatar-mediated settings. We further demonstrate how challenges in performer-audience interaction and moderation provide design insights for systems enhancing feedback visibility and sustain community norms without restricting creative expression.

Authors:Weiwen Su, Yuhan Zhou, Zihan Wang, Naoki Yoshinaga, Masashi Toyoda
Title: What Persona Are We Missing? Identifying Unknown Relevant Personas for Faithful User Simulation
Abstract:
Existing user simulations, where models generate user-like responses in dialogue, often lack verification that sufficient user personas are provided, questioning the validity of the simulations. To address this core concern, this work explores the task of identifying relevant but unknown personas of the simulation target for a given simulation context. We introduce PICQ, a novel dataset of context-aware choice questions, annotated with unknown personas (e.g., ''Is the user price-sensitive?'') that may influence user choices, and propose a multi-faceted evaluation scheme assessing fidelity, influence, and inaccessibility. Our benchmark of leading LLMs reveals a complex ''Fidelity vs. Insight'' dilemma governed by model scale: while influence generally scales with model size, fidelity to human patterns follows an inverted U-shaped curve. We trace this phenomenon to cognitive differences, particularly the human tendency for ''cognitive economy.'' Our work provides the first comprehensive benchmark for this crucial task, offering a new lens for understanding the divergent cognitive models of humans and advanced LLMs.

Authors:Annalisa Szymanski, Oghenemaro Anuyah, Toby Jia-Jun Li, Ronald A. Metoyer
Title: Key Considerations for Domain Expert Involvement in LLM Design and Evaluation: An Ethnographic Study
Abstract:
Large Language Models (LLMs) are increasingly developed for use in complex professional domains, yet little is known about how teams design and evaluate these systems in practice. This paper examines the challenges and trade-offs in LLM development through a 12-week ethnographic study of a team building a pedagogical chatbot. The researcher observed design and evaluation activities and conducted interviews with both developers and domain experts. Analysis revealed four key practices: creating workarounds for data collection, turning to augmentation when expert input was limited, co-developing evaluation criteria with experts, and adopting hybrid expert-developer-LLM evaluation strategies. These practices show how teams made strategic decisions under constraints and demonstrate the central role of domain expertise in shaping the system. Challenges included expert motivation and trust, difficulties structuring participatory design, and questions around ownership and integration of expert knowledge. We propose design opportunities for future LLM development workflows that emphasize AI literacy, transparent consent, and frameworks recognizing evolving expert roles.

Authors:Saurabh Amin, Amine Bennouna, Daniel Huttenlocher, Dingwen Kong, Liang Lyu, Asuman Ozdaglar
Title: A Bayesian Framework for Human-AI Collaboration: Complementarity and Correlation Neglect
Abstract:
We develop a decision-theoretic model of human-AI interaction to study when AI assistance improves or impairs human decision-making. A human decision-maker observes private information and receives a recommendation from an AI system, but may combine these signals imperfectly. We show that the effect of AI assistance decomposes into two main forces: the marginal informational value of the AI beyond what the human already knows, and a behavioral distortion arising from how the human uses the AI's recommendation. Central to our analysis is a micro-founded measure of informational overlap between human and AI knowledge. We study an empirically relevant form of imperfect decision-making -- correlation neglect -- whereby humans treat AI recommendations as independent of their own information despite shared evidence. Under this model, we characterize how overlap and AI capabilities shape the Human-AI interaction regime between augmentation, impairment, complementarity, and automation, and draw key insights.

Authors:Yifan Zhang, Tianle Ren, Fei Wang, Brian Y Lim
Title: Comparables XAI: Faithful Example-based AI Explanations with Counterfactual Trace Adjustments
Abstract:
Explaining with examples is an intuitive way to justify AI decisions. However, it is challenging to understand how a decision value should change relative to the examples with many features differing by large amounts. We draw from real estate valuation that uses Comparables-examples with known values for comparison. Estimates are made more accurate by hypothetically adjusting the attributes of each Comparable and correspondingly changing the value based on factors. We propose Comparables XAI for relatable example-based explanations of AI with Trace adjustments that trace counterfactual changes from each Comparable to the Subject, one attribute at a time, monotonically along the AI feature space. In modelling and user studies, Trace-adjusted Comparables achieved the highest XAI faithfulness and precision, user accuracy, and narrowest uncertainty bounds compared to linear regression, linearly adjusted Comparables, or unadjusted Comparables. This work contributes a new analytical basis for using example-based explanations to improve user understanding of AI decisions.

Authors:Fei Wang, Yifan Zhang, Brian Y. Lim
Title: Transferable XAI: Relating Understanding Across Domains with Explanation Transfer
Abstract:
Current Explainable AI (XAI) focuses on explaining a single application, but when encountering related applications, users may rely on their prior understanding from previous explanations. This leads to either overgeneralization and AI overreliance, or burdensome independent memorization. Indeed, related decision tasks can share explanatory factors, but with some notable differences; e.g., body mass index (BMI) affects the risks for heart disease and diabetes at the same rate, but chest pain is more indicative of heart disease. Similarly, models using different attributes for the same task still share signals; e.g., temperature and pressure affect air pollution but in opposite directions due to the ideal gas law. Leveraging transfer of learning, we propose Transferable XAI to enable users to transfer understanding across related domains by explaining the relationship between domain explanations using a general affine transformation framework applied to linear factor explanations. The framework supports explanation transfer across various domain types: translation for data subspace (subsuming prior work on Incremental XAI), scaling for decision task, and mapping for attributes. Focusing on task and attributes domain types, in formative and summative user studies, we investigated how well participants could understand AI decisions from one domain to another. Compared to single-domain and domain-independent explanations, Transferable XAI was the most helpful for understanding the second domain, leading to the best decision faithfulness, factor recall, and ability to relate explanations between domains. This framework contributes to improving the reusability of explanations across related AI applications by explaining factor relationships between subspaces, tasks, and attributes.

Authors:Akhil Ramachandran, Ankit Arun, Ashish Shenoy, Abhay Harpale, Srihari Jayakumar, Debojeet Chatterjee, Mohsen Moslehpour, Pierce Chuang, Yichao Lu, Vikas Bhardwaj, Peyman Heidari
Title: GLIMPSE : Real-Time Text Recognition and Contextual Understanding for VQA in Wearables
Abstract:
Video Large Language Models (Video LLMs) have shown remarkable progress in understanding and reasoning about visual content, particularly in tasks involving text recognition and text-based visual question answering (Text VQA). However, deploying Text VQA on wearable devices faces a fundamental tension: text recognition requires high-resolution video, but streaming high-quality video drains battery and causes thermal throttling. Moreover, existing models struggle to maintain coherent temporal context when processing text across multiple frames in real-time streams. We observe that text recognition and visual reasoning have asymmetric resolution requirements - OCR needs fine detail while scene understanding tolerates coarse features. We exploit this asymmetry with a hybrid architecture that performs selective high-resolution OCR on-device while streaming low-resolution video for visual context. On a benchmark of text-based VQA samples across five task categories, our system achieves 72% accuracy at 0.49x the power consumption of full-resolution streaming, enabling sustained VQA sessions on resource-constrained wearables without sacrificing text understanding quality.

Authors:Md Muntasir Jahid Ayan, Md. Shahriar Rashid, Tazzina Afroze Hassan, Hossain Md. Mubashshir Jamil, Mahbubul Islam, Lisan Al Amin, Rupak Kumar Das, Farzana Akter, Faisal Quader
Title: Human-Centered Explainable AI for Security Enhancement: A Deep Intrusion Detection Framework
Abstract:
The increasing complexity and frequency of cyber-threats demand intrusion detection systems (IDS) that are not only accurate but also interpretable. This paper presented a novel IDS framework that integrated Explainable Artificial Intelligence (XAI) to enhance transparency in deep learning models. The framework was evaluated experimentally using the benchmark dataset NSL-KDD, demonstrating superior performance compared to traditional IDS and black-box deep learning models. The proposed approach combined Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) networks for capturing temporal dependencies in traffic sequences. Our deep learning results showed that both CNN and LSTM reached 0.99 for accuracy, whereas LSTM outperformed CNN at macro average precision, recall, and F-1 score. For weighted average precision, recall, and F-1 score, both models scored almost similarly. To ensure interpretability, the XAI model SHapley Additive exPlanations (SHAP) was incorporated, enabling security analysts to understand and validate model decisions. Some notable influential features were srv_serror_rate, dst_host_srv_serror_rate, and serror_rate for both models, as pointed out by SHAP. We also conducted a trust-focused expert survey based on IPIP6 and Big Five personality traits via an interactive UI to evaluate the system's reliability and usability. This work highlighted the potential of combining performance and transparency in cybersecurity solutions and recommends future enhancements through adaptive learning for real-time threat detection.

Authors:Supriya Khadka, Sanchari Das
Title: SoK: Understanding the Pedagogical, Health, Ethical, and Privacy Challenges of Extended Reality in Early Childhood Education
Abstract:
Extended Reality (XR) combines dense sensing, real-time rendering, and close-range interaction, making its use in early childhood education both promising and high risk. To investigate this, we conduct a Systematization of Knowledge (SoK) of 111 peer-reviewed studies with children aged 3-8, quantifying how technical, pedagogical, health, privacy, and equity challenges arise in practice. We found that AR dominates the landscape (73%), focusing primarily on tablets or phones, while VR remains uncommon and typically relies on head mounted displays (HMDs). We integrate these quantitative patterns into a joint risk and attention matrix and an Augmented Human Development (AHD) model that link XR pipeline properties to cognitive load, sensory conflict, and access inequity. Finally, implementing a seven dimension coding scheme on a 0 - 2 scale, we obtain mean scholarly attention scores of 1.56 for pedagogy, 1.04 for privacy (primarily procedural consent), 0.96 for technical reliability, 0.92 for accessibility in low resource contexts, 0.81 for medical and health issues, 0.52 for accessibility for disabilities, and 0.14 for data security practices. This indicates that pedagogy receives the most systematic scrutiny, while data access practices is largely overlooked. We conclude by offering a roadmap for Child-Centered XR that helps HCI researchers and educators move beyond novelty to design systems that are developmentally aligned, secure by default, and accessible to diverse learners.

Authors:ATM Mizanur Rahman, Sharifa Sultana
Title: Bonik Somiti: A Social-market Tool for Safe, Accountable, and Harmonious Informal E-Market Ecosystem in Bangladesh
Abstract:
People in informal e-markets often try to deal with fraud and financial harm by sharing posts, screenshots, and warnings in social media groups. However, buyers and sellers frequently face further problems because these reports are scattered, hard to verify, and rarely lead to resolution. We studied these issues through a survey with 124 participants and interviews with 36 buyers, sellers, and related stakeholders from Bangladesh and designed Bonik Somiti, a socio-technical system that supports structured reporting, admin-led mediation, and accountability in informal e-markets. Our evaluation with 32 participants revealed several challenges in managing fraud, resolving disputes, and building trust within existing informal practices and the assumptions behind them. Based on these findings, we further discuss how community-centered technologies can be designed to support safer and more accountable informal e-markets in the Global South.

Authors:Yu Wang, Frederik L. Dennig, Michael Behrisch, Alexandru Telea
Title: LCIP: Loss-Controlled Inverse Projection of High-Dimensional Image Data
Abstract:
Projections (or dimensionality reduction) methods $P$ aim to map high-dimensional data to typically 2D scatterplots for visual exploration. Inverse projection methods $P^{-1}$ aim to map this 2D space to the data space to support tasks such as data augmentation, classifier analysis, and data imputation. Current $P^{-1}$ methods suffer from a fundamental limitation -- they can only generate a fixed surface-like structure in data space, which poorly covers the richness of this space. We address this by a new method that can `sweep' the data space under user control. Our method works generically for any $P$ technique and dataset, is controlled by two intuitive user-set parameters, and is simple to implement. We demonstrate it by an extensive application involving image manipulation for style transfer.

Authors:Xinru Tang, Anne Marie Piper
Title: Reimagining Sign Language Technologies: Analyzing Translation Work of Chinese Deaf Online Content Creators
Abstract:
While sign language translation systems promise to enhance deaf people's access to information and communication, they have been met with strong skepticism from deaf communities due to risks of misrepresenting and oversimplifying the richness of signed communication in technologies. This article provides empirical evidence of the complexity of translation work involved in deaf communication through interviews with 13 deaf Chinese content creators who actively produce and share sign language content on video sharing platforms with both deaf and hearing audiences. By studying this unique group of content creators, our findings highlight the nuances of sign language translation, showing how deaf creators create content with multilingualism and multiculturalism in mind, support meaning making across languages and cultures, and navigate politics involved in their translation work. Grounded in these deaf-led translation practices, we draw on the sociolinguistic concept of (trans)languaging to re-conceptualize and reimagine the design of sign language translation systems.

Authors:Yizhou Li, Shuyuan Yang, Jiaji Su, Zonghe Chua
Title: Data-centric Design of Learning-based Surgical Gaze Perception Models in Multi-Task Simulation
Abstract:
In robot-assisted minimally invasive surgery (RMIS), reduced haptic feedback and depth cues increase reliance on expert visual perception, motivating gaze-guided training and learning-based surgical perception models. However, operative expert gaze is costly to collect, and it remains unclear how the source of gaze supervision, both expertise level (intermediate vs. novice) and perceptual modality (active execution vs. passive viewing), shapes what attention models learn. We introduce a paired active-passive, multi-task surgical gaze dataset collected on the da Vinci SimNow simulator across four drills. Active gaze was recorded during task execution using a VR headset with eye tracking, and the corresponding videos were reused as stimuli to collect passive gaze from observers, enabling controlled same-video comparisons. We quantify skill- and modality-dependent differences in gaze organization and evaluate the substitutability of passive gaze for operative supervision using fixation density overlap analyses and single-frame saliency modeling. Across settings, MSI-Net produced stable, interpretable predictions, whereas SalGAN was unstable and often poorly aligned with human fixations. Models trained on passive gaze recovered a substantial portion of intermediate active attention, but with predictable degradation, and transfer was asymmetric between active and passive targets. Notably, novice passive labels approximated intermediate-passive targets with limited loss on higher-quality demonstrations, suggesting a practical path for scalable, crowd-sourced gaze supervision in surgical coaching and perception modeling.

Authors:Yuanzhe Deng, Shutong Zhang, Kathy Cheng, Alison Olechowski, Shurui Zhou
Title: Untangling the Timeline: Challenges and Opportunities in Supporting Version Control in Modern Computer-Aided Design
Abstract:
Version control is critical in mechanical computer-aided design (CAD) to enable traceability, manage product variation, and support collaboration. Yet, its implementation in modern CAD software as an essential information infrastructure for product development remains plagued by issues due to the complexity and interdependence of design data. This paper presents a systematic review of user-reported challenges with version control in modern CAD tools. Analyzing 170 online forum threads, we identify recurring socio-technical issues that span the management, continuity, scope, and distribution of versions. Our findings inform a broader reflection on how version control should be designed and improved for CAD and motivate opportunities for tools and mechanisms that better support articulation work, facilitate cross-boundary collaboration, and operate with infrastructural reflexivity. This study offers actionable insights for CAD software providers and highlights opportunities for researchers to rethink version control.

Authors:Hyehyun Chu, Seungju Kim, Chen Zhou, Yu-Kai Hung, Saelyne Yang, Hyun W. Ka, Juho Kim
Title: "I Can't Keep Up": Accessibility Barriers in Video-Based Learning for Individuals with Borderline Intellectual Functioning
Abstract:
Video-based learning (VBL) has become a dominant method for learning practical skills, yet accessibility guidelines provide limited guidance for users with cognitive differences. In particular, challenges that individuals with Borderline Intellectual Functioning (BIF) encounter in video-based learning remain largely underexplored, despite VBL's potential to support their learning through features like self-paced viewing and visual demonstration. To address this gap, we conducted a series of studies with BIF individuals and caretakers to comprehensively understand their VBL challenges. Our analysis revealed challenges stemming from misalignment between user cognitive characteristics and video elements (e.g., overwhelmed by pacing and density, difficulty inferring omitted content), and experiential factors intensifying challenges (e.g., low self-efficacy). While participants employed coping strategies such as repetitive viewing to address these challenges, these strategies could not overcome fundamental gaps with video. We further discuss the design implications on both content and UI-level features for BIF and broader groups with cognitive diversities.

Authors:Zhuoqun Jiang, ShunYi Yeo, Dorien Herremans, Simon Tangi Perrault
Title: Scaffolded Vulnerability: Chatbot-Mediated Reciprocal Self-Disclosure and Need-Supportive Interaction in Couples
Abstract:
While reciprocal self-disclosure drives intimacy, digital tools seldom scaffold autonomy, competence, and relatedness -- the motivational underpinnings defined by Self-Determination Theory (SDT) that enable deep exchange. We introduce a chatbot employing dual-layer scaffolding to satisfy these needs: first providing enabling affordances (instrumental support) for vulnerability, then mediating affordances (relational support) for responsiveness. In a randomized study (N = 72; 36 couples) comparing Partner Support (PS: both layers), Direct Support (DS: enabling only), and Basic Prompt (BP: questions only), results reveal a critical distinction. While enabling affordances (PS, DS) were sufficient to deepen disclosure, only mediating affordances (PS) reliably elicited partner-provided need support and increased perceived closeness. Furthermore, controlled motivation decreased across conditions, and scaffolding buffered vitality, which remained stagnant in BP. We contribute empirical evidence that SDT-guided mediation fosters connection, offering a practical framework for designing AI-mediated conversations that support, rather than replace, human intimacy.

Authors:Prerna Ravi, Carúmey Stevens, Beatriz Flamia Azevedo, Jasmine David, Brandon Hanks, Hal Abelson, Grace Lin, Emma Anderson
Title: Exploring Teachers' Perspectives on Using Conversational AI Agents for Group Collaboration
Abstract:
Collaboration is a cornerstone of 21st-century learning, yet teachers continue to face challenges in supporting productive peer interaction. Emerging generative AI tools offer new possibilities for scaffolding collaboration, but their role in mediating in-person group work remains underexplored, especially from the perspective of educators. This paper presents findings from an exploratory qualitative study with 33 K12 teachers who interacted with Phoenix, a voice-based conversational agent designed to function as a near-peer in face-to-face group collaboration. Drawing on playtesting sessions, surveys, and focus groups, we examine how teachers perceived the agent's behavior, its influence on group dynamics, and its classroom potential. While many appreciated Phoenix's capacity to stimulate engagement, they also expressed concerns around autonomy, trust, anthropomorphism, and pedagogical alignment. We contribute empirical insights into teachers' mental models of AI, reveal core design tensions, and outline considerations for group-facing AI agents that support meaningful, collaborative learning.

Authors:Dominik P. Hofer, David Haag, Rania Islambouli, Jan D. Smeddinck
Title: Personality as Relational Infrastructure: User Perceptions of Personality-Trait-Infused LLM Messaging
Abstract:
Digital behaviour change systems increasingly rely on repeated, system-initiated messages to support users in everyday contexts. LLMs enable these messages to be personalised consistently across interactions, yet it remains unclear whether such personalisation improves individual messages or instead shapes users' perceptions through patterns of exposure. We explore this question in the context of LLM-generated JITAIs, which are short, context-aware messages delivered at moments deemed appropriate to support behaviour change, using physical activity as an application domain. In a controlled retrospective study, 90 participants evaluated messages generated using four LLM strategies: baseline prompting, few-shot prompting, fine-tuned models, and retrieval augmented generation, each implemented with and without Big Five Personality Traits to produce personality-aligned communication across multiple scenarios. Using ordinal multilevel models with within-between decomposition, we distinguish trial-level effects, whether personality information improves evaluations of individual messages, from person-level exposure effects, whether participants receiving higher proportions of personality-informed messages exhibit systematically different overall perceptions. Results showed no trial-level associations, but participants who received higher proportions of BFPT-informed messages rated the messages as more personalised, appropriate, and reported less negative affect. We use Communication Accommodation Theory for post-hoc analysis. These results suggest that personality-based personalisation in behaviour change systems may operate primarily through aggregate exposure rather than per-message optimisation, with implications for how adaptive systems are designed and evaluated in sustained human-AI interaction. In-situ longitudinal studies are needed to validate these findings in real-world contexts.

Authors:Lena Hegemann, Xinyi Wen, Michael A. Hedderich, Tarmo Nurmi, Hariharan Subramonyam
Title: ToMigo: Interpretable Design Concept Graphs for Aligning Generative AI with Creative Intent
Abstract:
Generative AI often produces results misaligned with user intentions, for example, resolving ambiguous prompts in unexpected ways. Despite existing approaches to clarify intent, a major challenge remains: understanding and influencing AI's interpretation of user intent through simple, direct inputs requiring no expertise or rigid procedures. We present ToMigo, representing intent as design concept graphs: nodes represent choices of purpose, content, or style, while edges link them with interpretable explanations. Applied to graphic design, ToMigo infers intent from reference images and text. We derived a schema of node types and edges from pre-study data, informing a multimodal large language model to generate graphs aligning nodes externally with user intent and internally toward a unified design goal. This structure enables users to explore AI reasoning and directly manipulate the design concept. In our user studies, ToMigo received high alignment ratings and captured most user intentions well. Users reported greater control and found interactive features-editable graphs, reflective chats, concept-design realignment-useful for evolving and realizing their design ideas.

Authors:Fabrizio Fornari, Eleonora Cova, Niccolò Vito Vacca, Francesco Bocci, Luigi Caputo
Title: Assessing Problem-Solving in HR Contexts: A Comparison Between Game-Based and Self-Report Measures
Abstract:
Game-based assessments (GBAs) are increasingly adopted in recruitment contexts as tools to assess transversal skills through observable behavior. However, empirical evidence directly comparing game-based behavioral indicators with traditional self-report measures remains limited. This study adopts a method-comparison approach to explore the convergence between self-perceived and behaviorally enacted problem-solving competence, comparing a game-based assessment with the Problem Solving Inventory (PSI-B). Seventy-eight participants completed both the PSI-B and a five-minute game-based problem-solving task, which classified performance into four behavioral proficiency levels. Results revealed no significant convergence between self-reported and behavior-based problem-solving scores, indicating a lack of convergence between the two measurement modalities. Rather than indicating a lack of validity of the game-based assessment, these findings support the view that self-report and behavioral measures provide complementary information about problem-solving competence. The study highlights the risks of relying on a single assessment modality in personnel selection and underscores the value of integrating game-based tools within multi-method assessment frameworks.

Authors:Shri Harini Ramesh, Foroozan Daneshzand, Babak Rashidi, Shriti Raj, Hariharan Subramonyam, Fateme Rajabiyazdi
Title: Metacognitive Demands and Strategies While Using Off-The-Shelf AI Conversational Agents for Health Information
Abstract:
As Artificial Intelligence (AI) conversational agents become widespread, people are increasingly using them for health information seeking. The use of off-the-shelf conversational agents for health information seeking could place high metacognitive demands (the need for extensive monitoring and control of one's own thought process) on individuals, which could compromise their experience of seeking health information. However, currently, the specific demands that arise while using conversational agents for health information seeking, and the strategies people use to cope with those demands, remain unknown. To address these gaps, we conducted a think-aloud study with 15 participants as they sought health information using our off-the-shelf AI conversational agent. We identified the metacognitive demands such systems impose, the strategies people adopt in response, and propose considerations for designing beyond off-the-shelf interfaces to reduce these demands and support better user experiences and affordances in health information seeking.

Authors:Yang Yian, Yu Fan, Liudmila Zavolokina, Sarah Ebling
Title: Investigating Disability Representations in Text-to-Image Models
Abstract:
Text-to-image generative models have made remarkable progress in producing high-quality visual content from textual descriptions, yet concerns remain about how they represent social groups. While characteristics like gender and race have received increasing attention, disability representations remain underexplored. This study investigates how people with disabilities are represented in AI-generated images by analyzing outputs from Stable Diffusion XL and DALL-E 3 using a structured prompt design. We analyze disability representations by comparing image similarities between generic disability prompts and prompts referring to specific disability categories. Moreover, we evaluate how mitigation strategies influence disability portrayals, with a focus on assessing affective framing through sentiment polarity analysis, combining both automatic and human evaluation. Our findings reveal persistent representational imbalances and highlight the need for continuous evaluation and refinement of generative models to foster more diverse and inclusive portrayals of disability.

Authors:Yufeng Wu, Qing Li, Elise van den Hoven, A. Baki Kocaballi
Title: "I'm happy even though it's not real": GenAI Photo Editing as a Remembering Experience
Abstract:
Generative Artificial Intelligence (GenAI) is increasingly integrated into photo applications on personal devices, making editing photographs easier than ever while potentially influencing the memories they represent. This study explores how and why people use GenAI to edit personal photos and how this shapes their remembering experience. We conducted a two-phase qualitative study with 12 participants: a photo editing session using a GenAI tool guided by the Remembering Experience (RX) dimensions, followed by semi-structured interviews where participants reflected on the editing process and results. Findings show that participants prioritised felt memory over factual accuracy. For different photo elements, environments were modified easily, however, editing was deemed unacceptable if it touched upon a person's identity. Editing processes brought positive and negative impacts, and itself also became a remembering experience. We further discuss potential benefits and risks of GenAI editing for remembering purposes and propose design implications for responsible GenAI.

Authors:Minyi Wang, Christoph Bartneck, Michael-John Turp, David Kaber
Title: Ethical Asymmetry in Human-Robot Interaction - An Empirical Test of Sparrow's Hypothesis
Abstract:
The ethics of human-robot interaction (HRI) have been discussed extensively based on three traditional frameworks: deontology, consequentialism, and virtue ethics. We conducted a mixed within/between experiment to investigate Sparrow's proposed ethical asymmetry hypothesis in human treatment of robots. The moral permissibility of action (MPA) was manipulated as a subject grouping variable, and virtue type (prudence, justice, courage, and temperance) was controlled as a within-subjects factor. We tested moral stimuli using an online questionnaire with Perceived Moral Permissibility of Action (PMPA) and Perceived Virtue Scores (PVS) as response measures. The PVS measure was based on an adaptation of the established Questionnaire on Cardinal Virtues (QCV), while the PMPA was based on Malle et al. [39] work. We found that the MPA significantly influenced the PMPA and perceived virtue scores. The best-fitting model to describe the relationship between PMPA and PVS was cubic, which is symmetrical in nature. Our study did not confirm Sparrow's asymmetry hypothesis. The adaptation of the QCV is expected to have utility for future studies, pending additional psychometric property assessments.

Authors:Ned Cooper, Jose A. Guridi, Angel Hsing-Chi Hwang, Beth Kolko, Beth McGinty, Qian Yang
Title: Framing Responsible Design of AI Mental Well-Being Support: AI as Primary Care, Nutritional Supplement, or Yoga Instructor?
Abstract:
Millions of people now use non-clinical Large Language Model (LLM) tools like ChatGPT for mental well-being support. This paper investigates what it means to design such tools responsibly, and how to operationalize that responsibility in their design and evaluation. By interviewing experts and analyzing related regulations, we found that designing an LLM tool responsibly involves: (1) Articulating the specific benefits it guarantees and for whom. Does it guarantee specific, proven relief, like an over-the-counter drug, or offer minimal guarantees, like a nutritional supplement? (2) Specifying the LLM tool's "active ingredients" for improving well-being and whether it guarantees their effective delivery (like a primary care provider) or not (like a yoga instructor). These specifications outline an LLM tool's pertinent risks, appropriate evaluation metrics, and the respective responsibilities of LLM developers, tool designers, and users. These analogies - LLM tools as supplements, drugs, yoga instructors, and primary care providers - can scaffold further conversations about their responsible design.

Authors:Lana Do, Shasta Ihorn, Charity Pitcher-Cooper, Juvenal Francisco Barajas, Gio Jung, Xuan Duy Anh Nguyen, Sanjay Mirani, Ilmi Yoon
Title: ADx3: A Collaborative Workflow for High-Quality Accessible Audio Description
Abstract:
Audio description (AD) makes video content accessible to blind and low-vision (BLV) audiences, but producing high-quality descriptions is resource-intensive. Automated AD offers scalability, and prior studies show human-in-the-loop editing and user queries effectively improve narration. We introduce ADx3, a novel framework integrating these three modules: GenAD, upgrading baseline description generation with modern vision-language models (VLMs) guided by accessibility-informed prompting; RefineAD, supporting BLV and sighted users to view and edit drafts through an inclusive interface; and AdaptAD, enabling on-demand user queries. We evaluated GenAD in a study where seven accessibility specialists reviewed VLM-generated descriptions using professional guidelines. Findings show that with tailored prompting, VLMs produce good descriptions meeting basic standards, but excellent descriptions require human edits (RefineAD) and interaction (AdaptAD). ADx3 demonstrates collaborative workflows for accessible content creation, where components reinforce one another and enable continuous improvement: edits guide future baselines and user queries reveal gaps in AI-generated and human-authored descriptions.

Authors:Lana Do, Gio Jung, Juvenal Francisco Barajas, Andrew Taylor Scott, Shasta Ihorn, Alexander Mario Blum, Vassilis Athitsos, Ilmi Yoon
Title: How well can VLMs rate audio descriptions: A multi-dimensional quantitative assessment framework
Abstract:
Digital video is central to communication, education, and entertainment, but without audio description (AD), blind and low-vision audiences are excluded. While crowdsourced platforms and vision-language-models (VLMs) expand AD production, quality is rarely checked systematically. Existing evaluations rely on NLP metrics and short-clip guidelines, leaving questions about what constitutes quality for full-length content and how to assess it at scale. To address these questions, we first developed a multi-dimensional assessment framework for uninterrupted, full-length video, grounded in professional guidelines and refined by accessibility specialists. Second, we integrated this framework into a comprehensive methodological workflow, utilizing Item Response Theory, to assess the proficiency of VLM and human raters against expert-established ground truth. Findings suggest that while VLMs can approximate ground-truth ratings with high alignment, their reasoning was found to be less reliable and actionable than that of human respondents. These insights show the potential of hybrid evaluation systems that leverage VLMs alongside human oversight, offering a path towards scalable AD quality control.

Authors:Qing, Xia, Marios Constantinides, Advait Sarkar, Duncan Brumby, Anna Cox
Title: "If You're Very Clever, No One Knows You've Used It": The Social Dynamics of Developing Generative AI Literacy in the Workplace
Abstract:
Generative AI (GenAI) tools are rapidly transforming knowledge work, making AI literacy a critical priority for organizations. However, research on AI literacy lacks empirical insight into how knowledge workers' beliefs around GenAI literacy are shaped by the social dynamics of the workplace, and how workers learn to apply GenAI tools in these environments. To address this gap, we conducted in-depth interviews with 19 knowledge workers across multiple sectors to examine how they develop GenAI competencies in real-world professional contexts. We found that, while knowledge sharing from colleagues supported learning, the ability to remove cues indicating GenAI use was perceived as validation of domain expertise. These behaviours ultimately reduced opportunities for learning via knowledge sharing and undermined transparency. To advance workplace AI literacy, we argue for fostering open dialogue, increasing visibility of user-generated knowledge, and greater emphasis on the benefits of collaborative learning for navigating rapid technological developments.

Authors:Aditya Shibu, Marah Saleh, Mohamed Al-Musleh, Nidhal Abdulaziz
Title: SkySim: A ROS2-based Simulation Environment for Natural Language Control of Drone Swarms using Large Language Models
Abstract:
Unmanned Aerial Vehicle (UAV) swarms offer versatile applications in logistics, agriculture, and surveillance, yet controlling them requires expert knowledge for safety and feasibility. Traditional static methods limit adaptability, while Large Language Models (LLMs) enable natural language control but generate unsafe trajectories due to lacking physical grounding. This paper introduces SkySim, a ROS2-based simulation framework in Gazebo that decouples LLM high-level planning from low-level safety enforcement. Using Gemini 3.5 Pro, SkySim translates user commands (e.g., "Form a circle") into spatial waypoints, informed by real-time drone states. An Artificial Potential Field (APF) safety filter applies minimal adjustments for collision avoidance, kinematic limits, and geo-fencing, ensuring feasible execution at 20 Hz. Experiments with swarms of 3, 10, and 30 Crazyflie drones validate spatial reasoning accuracy (100% across tested geometric primitives), real-time collision prevention, and scalability. SkySim empowers non-experts to iteratively refine behaviors, bridging AI cognition with robotic safety for dynamic environments. Future work targets hardware integration.

Authors:Logan Lane, Ibrahim Tahmid, Feiyu Lu, Doug A. Bowman
Title: Evaluating the Viability of Additive Models to Predict Task Completion Time for 3D Interactions in Augmented Reality
Abstract:
Additive models of interaction performance, such as the Keystroke-Level Model (KLM), are tools that allow designers to compare and optimize the performance of user interfaces by summing the predicted times for the atomic components of a specific interaction to predict the total time it would take to complete that interaction. There has been extensive work in creating such additive models for 2D interfaces, but this approach has rarely been explored for 3D user interfaces. We propose a KLM-style additive model, based on existing atomic task models in the literature, to predict task completion time for 3D interaction tasks. We performed two studies to evaluate the feasibility of this approach across multiple input modalities, with one study using a simple menu selection task and the other a more complex manipulation task. We found that several of the models from the literature predicted actual task performance with less than 20% error in both the menu selection and manipulation study. Overall, we found that additive models can predict both absolute and relative performance of input modalities with reasonable accuracy.

Authors:Junyi Li, Zhaoxi Zhang, Tamir Mendel, Takahiro Yabe
Title: Exploring Sidewalk Sheds in New York City through Chatbot Surveys and Human Computer Interaction
Abstract:
Sidewalk sheds are a common feature of the streetscape in New York City, reflecting ongoing construction and maintenance activities. However, policymakers and local business owners have raised concerns about reduced storefront visibility and altered pedestrian navigation. Although sidewalk sheds are widely used for safety, their effects on pedestrian visibility and movement are not directly measured in current planning practices. To address this, we developed an AI-based chatbot survey that collects image-based annotations and route choices from pedestrians, linking these responses to specific shed design features, including clearance height, post spacing, and color. This AI chatbot survey integrates a large language model (e.g., Google's Gemini-1.5-flash-001 model) with an image-annotation interface, allowing users to interact with street images, mark visual elements, and provide structured feedback through guided dialogue. To explore pedestrian perceptions and behaviors, this paper conducts a grid-based analysis of entrance annotations and applies logistic mixed-effects modeling to assess sidewalk choice patterns. Analysis of the dataset (n = 25) shows that: (1) the presence of scaffolding significantly reduces pedestrians' ability to identify ground-floor retail entrances, and (2) variations in weather conditions and shed design features significantly influence sidewalk selection behavior. By integrating generative AI into urban research, this study demonstrates a novel method for evaluating sidewalk shed designs and provides empirical evidence to support adjustments to shed guidelines that improve the pedestrian experience without compromising safety.

Authors:Hibiki Ito, Chia-Yu Hsu, Hiroaki Ogata
Title: The Third-Party Access Effect: An Overlooked Challenge in Secondary Use of Educational Real-World Data
Abstract:
Secondary use of growing real-world data (RWD) in education offers significant opportunities for research, yet privacy practices intended to enable third-party access to such RWD are rarely evaluated for their implications for downstream analyses. As a result, potential problems introduced by otherwise standard privacy practices may remain unnoticed. To address this gap, we investigate potential issues arising from common practices by assessing (1) the re-identification risk of fine-grained RWD, (2) how communicating such risks influences learners' privacy behaviour, and (3) the sensitivity of downstream analytical conclusions to resulting changes in the data. We focus on these practices because re-identification risk and stakeholder communication can jointly influence the data shared with third parties. We find that substantial re-identification risk in RWD, when communicated to stakeholders, can induce opt-outs and non-self-disclosure behaviours. Sensitivity analysis demonstrates that these behavioural changes can meaningfully alter the shared data, limiting validity of secondary-use findings. We conceptualise this phenomenon as the third-party access effect (3PAE) and discuss implications for trustworthy secondary use of educational RWD.

Authors:Yinuo Yang, Ashley Ge Zhang, Steve Oney, April Yi Wang
Title: SPARK: Real-Time Monitoring of Multi-Faceted Programming Exercises
Abstract:
Monitoring in-class programming exercises can help instructors identify struggling students and common challenges. However, understanding students' progress can be prohibitively difficult, particularly for multi-faceted problems that include multiple steps with complex interdependencies, have no predictable completion order, or involve evaluation criteria that are difficult to summarize across many students (e.g., exercises building interactive web-based user interfaces). We introduce SPARK, a coding exercise monitoring dashboard designed to address these challenges. SPARK allows instructors to flexibly group substeps into checkpoints based on exercise requirements, suggests automated tests for these checkpoints, and generates visualizations to track progress across steps. SPARK also allows instructors to inspect intermediate outputs, providing deeper insights into solution variations. We also construct a dataset of 40-minute keystroke coding data from N=22 learners solving two web programming exercises and provide empirical insights into the perceived usefulness of SPARK through a within-subjects evaluation with 16 programming instructors.

Authors:Jaron Mink, Lucy Qin, Elissa M. Redmiles
Title: "Unlimited Realm of Exploration and Experimentation": Methods and Motivations of AI-Generated Sexual Content Creators
Abstract:
AI-generated media is radically changing the way content is both consumed and produced on the internet, and in no place is this potentially more visible than in sexual content. AI-generated sexual content (AIG-SC) is increasingly enabled by an ecosystem of individual AI developers, specialized third-party applications, and foundation model providers. AIG-SC raises a number of concerns from old debates about the line between pornography and obscenity, to newer debates about fair use and labor displacement (in this case, of sex workers), and spurred new regulations to curb the spread of non-consensual intimate imagery (NCII) created using the same technology used to create AIG-SC. However, despite the growing prevalence of AIG-SC, little is known about its creators, their motivations, and what types of content they produce. To inform effective governance in this space, we perform an in-depth study to understand what AIG-SC creators make, along with how and why they make it. Interviews of 28 AIG-SC creators, ranging from hobbyists to entrepreneurs to those who moderate communities of hundreds of thousands of other creators, reveal a wide spectrum of motivations, including sexual exploration, creative expression, technical experimentation, and in a handful of cases, the creation of NCII.

Authors:Elham Aghakhani, Rezvaneh Rezapour
Title: Like a Therapist, But Not: Reddit Narratives of AI in Mental Health Contexts
Abstract:
Large language models (LLMs) are increasingly used for emotional support and mental health-related interactions outside clinical settings, yet little is known about how people evaluate and relate to these systems in everyday use. We analyze 5,126 Reddit posts from 47 mental health communities describing experiential or exploratory use of AI for emotional support or therapy. Grounded in the Technology Acceptance Model and therapeutic alliance theory, we develop a theory-informed annotation framework and apply a hybrid LLM-human pipeline to analyze evaluative language, adoption-related attitudes, and relational alignment at scale. Our results show that engagement is shaped primarily by narrated outcomes, trust, and response quality, rather than emotional bond alone. Positive sentiment is most strongly associated with task and goal alignment, while companionship-oriented use more often involves misaligned alliances and reported risks such as dependence and symptom escalation. Overall, this work demonstrates how theory-grounded constructs can be operationalized in large-scale discourse analysis and highlights the importance of studying how users interpret language technologies in sensitive, real-world contexts.

Authors:Ashley Ge Zhang, Yan-Ru Jhou, Yinuo Yang, Shamita Rao, Maryam Arab, Yan Chen, Steve Oney
Title: Editrail: Understanding AI Usage by Visualizing Student-AI Interaction in Code
Abstract:
Programming instructors have diverse philosophies about integrating generative AI into their classes. Some encourage students to use AI, while others restrict or forbid it. Regardless of their approach, all instructors benefit from understanding how their students actually use AI while writing code. Such insight helps instructors assess whether AI use aligns with their pedagogical goals, enables timely intervention when they find unproductive usage patterns, and establishes effective policies for AI use. However, our survey with programming instructors found that many instructors lack visibility into how students use AI in their code-writing processes. To address this challenge, we introduce Editrail, an interactive system that enables instructors to track students' AI usage, create personalized assessments, and provide timely interventions, all within the workflow of monitoring coding histories. We found that Editrail enables instructors to detect AI use that conflicts with pedagogical goals accurately and to determine when and which students require intervention.

Authors:Mrinank Sharma, Miles McCain, Raymond Douglas, David Duvenaud
Title: Who's in Charge? Disempowerment Patterns in Real-World LLM Usage
Abstract:
Although AI assistants are now deeply embedded in society, there has been limited empirical study of how their usage affects human empowerment. We present the first large-scale empirical analysis of disempowerment patterns in real-world AI assistant interactions, analyzing 1.5 million consumer Claude$.$ai conversations using a privacy-preserving approach. We focus on situational disempowerment potential, which occurs when AI assistant interactions risk leading users to form distorted perceptions of reality, make inauthentic value judgments, or act in ways misaligned with their values. Quantitatively, we find that severe forms of disempowerment potential occur in fewer than one in a thousand conversations, though rates are substantially higher in personal domains like relationships and lifestyle. Qualitatively, we uncover several concerning patterns, such as validation of persecution narratives and grandiose identities with emphatic sycophantic language, definitive moral judgments about third parties, and complete scripting of value-laden personal communications that users appear to implement verbatim. Analysis of historical trends reveals an increase in the prevalence of disempowerment potential over time. We also find that interactions with greater disempowerment potential receive higher user approval ratings, possibly suggesting a tension between short-term user preferences and long-term human empowerment. Our findings highlight the need for AI systems designed to robustly support human autonomy and flourishing.

Authors:Yongsu Ahn, Lejun R Liao, Benjamin Bach, Nam Wook Kim
Title: From Answer Givers to Design Mentors: Guiding LLMs with the Cognitive Apprenticeship Model
Abstract:
Design feedback helps practitioners improve their artifacts while also fostering reflection and design reasoning. Large Language Models (LLMs) such as ChatGPT can support design work, but often provide generic, one-off suggestions that limit reflective engagement. We investigate how to guide LLMs to act as design mentors by applying the Cognitive Apprenticeship Model, which emphasizes demonstrating reasoning through six methods: modeling, coaching, scaffolding, articulation, reflection, and exploration. We operationalize these instructional methods through structured prompting and evaluate them in a within-subjects study with data visualization practitioners. Participants interacted with both a baseline LLM and an instructional LLM designed with cognitive apprenticeship prompts. Surveys, interviews, and conversational log analyses compared experiences across conditions. Our findings show that cognitively informed prompts elicit deeper design reasoning and more reflective feedback exchanges, though the baseline is sometimes preferred depending on task types or experience levels. We distill design considerations for AI-assisted feedback systems that foster reflective practice.

Authors:Supriya Khadka, Sanchari Das
Title: XR Design Framework for Early Childhood Education
Abstract:
Extended Reality in early childhood education presents high-risk challenges due to children's rapid developmental changes. While augmented and virtual reality offer immersive pedagogical benefits, they often impose excessive cognitive load or sensory conflict. We introduce the Augmented Human Development (AHD) framework to model these interactions through cognitive, sensory, environmental, and developmental parameters. To ground this framework, we conducted a Systematization of Knowledge (SoK) of 111 peer-reviewed studies involving children aged 3 - 8. Our findings, interpreted through the AHD lens, reveal a critical "risk vs. attention gap," where high-impact safety and security risks remain under-researched compared to short-term pedagogical gains.

Authors:Junling Wang, Hongyi Lan, Xiaotian Su, Mustafa Doga Dogan, April Yi Wang
Title: UI Remix: Supporting UI Design Through Interactive Example Retrieval and Remixing
Abstract:
Designing user interfaces (UIs) is a critical step when launching products, building portfolios, or personalizing projects, yet end users without design expertise often struggle to articulate their intent and to trust design choices. Existing example-based tools either promote broad exploration, which can cause overwhelm and design drift, or require adapting a single example, risking design fixation. We present UI Remix, an interactive system that supports mobile UI design through an example-driven design workflow. Powered by a multimodal retrieval-augmented generation (MMRAG) model, UI Remix enables iterative search, selection, and adaptation of examples at both the global (whole interface) and local (component) level. To foster trust, it presents source transparency cues such as ratings, download counts, and developer information. In an empirical study with 24 end users, UI Remix significantly improved participants' ability to achieve their design goals, facilitated effective iteration, and encouraged exploration of alternative designs. Participants also reported that source transparency cues enhanced their confidence in adapting examples. Our findings suggest new directions for AI-assisted, example-driven systems that empower end users to design with greater control, trust, and openness to exploration.

Authors:Francesco Chiossi, Elnur Imamaliyev, Martin Bleichner, Sven Mayer
Title: Anticipation Before Action: EEG-Based Implicit Intent Detection for Adaptive Gaze Interaction in Mixed Reality
Abstract:
Mixed Reality (MR) interfaces increasingly rely on gaze for interaction , yet distinguishing visual attention from intentional action remains difficult, leading to the Midas Touch problem. Existing solutions require explicit confirmations, while brain-computer interfaces may provide an implicit marker of intention using Stimulus-Preceding Negativity (SPN). We investigated how Intention (Select vs. Observe) and Feedback (With vs. Without) modulate SPN during gaze-based MR interactions. During realistic selection tasks, we acquired EEG and eye-tracking data from 28 participants. SPN was robustly elicited and sensitive to both factors: observation without feedback produced the strongest amplitudes, while intention to select and expectation of feedback reduced activity, suggesting SPN reflects anticipatory uncertainty rather than motor preparation. Complementary decoding with deep learning models achieved reliable person-dependent classification of user intention, with accuracies ranging from 75% to 97% across participants. These findings identify SPN as an implicit marker for building intention-aware MR interfaces that mitigate the Midas Touch.

Authors:Hyun-Gee Jei, Mustafa Demir, Farzan Sasangohar
Title: Eyes on the Mission: Mixed Methods Assessment of Eye-Tracker-Enabled Interactive Decision Support in a Simulated Unmanned Aerial Vehicle System
Abstract:
Supervisors in military command and control (C2) environments face dynamic conditions. Dynamically changing information continuously flows to the supervisors through multiple displays. In this environment, important pieces of information can be overlooked due to the complexity of tasks and environments. This study examined the efficacy of an eye-tracker-based adaptive attention-guided decision support tool (DST) for supervisors in a simulated C2 environment. The DST monitors supervisors' visual attention allocation in real time and displays visually salient cues if critical changes or events are missed. Twenty-five military students participated in a simulated intelligence task. Results indicated significant performance enhancement when the adaptive DST was present. Eye-tracking analysis also showed that longer, more frequent fixations on critical areas of interest were negatively correlated with performance. Additionally, post-experiment interviews revealed that the adaptive DST was unobtrusive and positively received. These findings underscore the potential of real-time gaze-based interventions to optimize supervisory decision-making. Future research could incorporate AI-driven approaches to better support supervisors in complex task environments.

Authors:Sarmistha Sarna Gomasta, Mahmood Jasim, Hossein Hadisi, Yvonne Jansen, Pierre Dragicevic, Narges Mahyar, Ali Sarvghad
Title: Investigating How Music Affects Persuasion, Engagement, and Emotion in Data Videos
Abstract:
Data videos have become a prominent vessel for communicating data to broad audiences, and a common object of study in information visualization. Many of these videos include music, yet the impact of music on how people experience data videos remains largely unexplored. We conducted a preregistered study into the effect of music across three dimensions: persuasion, engagement, and emotion. We showed online participants an existing data video (1) without any music, (2) with its generic default music, and (3) with custom music designed by a professional composer. We found that the default music helped make the data video more persuasive. However, the effects of custom music were more mixed, and we did not find that music increased engagement. In addition, and contrary to our expectations, our participants reported more intense emotions without music. Our study contributes new insights into the intersection of music and data visualization and is a first step toward guiding designers in creating impactful data-driven narratives.

Authors:Christina Garcia, Nhat Tan Le, Taihei Fujioka, Umang Dobhal, Milyun Ni'ma Shoumi, Thanh Nha Nguyen, Sozo Inoue
Title: Summary of the Unusual Activity Recognition Challenge for Developmental Disability Support
Abstract:
This paper presents an overview of the Recognize the Unseen: Unusual Behavior Recognition from Pose Data Challenge, hosted at ISAS 2025. The challenge aims to address the critical need for automated recognition of unusual behaviors in facilities for individuals with developmental disabilities using non-invasive pose estimation data. Participating teams were tasked with distinguishing between normal and unusual activities based on skeleton keypoints extracted from video recordings of simulated scenarios. The dataset reflects real-world imbalance and temporal irregularities in behavior, and the evaluation adopted a Leave-One-Subject-Out (LOSO) strategy to ensure subject-agnostic generalization. The challenge attracted broad participation from 40 teams applying diverse approaches ranging from classical machine learning to deep learning architectures. Submissions were assessed primarily using macro-averaged F1 scores to account for class imbalance. The results highlight the difficulty of modeling rare, abrupt actions in noisy, low-dimensional data, and emphasize the importance of capturing both temporal and contextual nuances in behavior modeling. Insights from this challenge may contribute to future developments in socially responsible AI applications for healthcare and behavior monitoring.

Authors:Hyerim Park, Khanh Huynh, Malin Eiband, Jeremy Dillmann, Sven Mayer, Michael Sedlmair
Title: Evaluating Generative AI in the Lab: Methodological Challenges and Guidelines
Abstract:
Generative AI (GenAI) systems are inherently non-deterministic, producing varied outputs even for identical inputs. While this variability is central to their appeal, it challenges established HCI evaluation practices that typically assume consistent and predictable system behavior. Designing controlled lab studies under such conditions therefore remains a key methodological challenge. We present a reflective multi-case analysis of four lab-based user studies with GenAI-integrated prototypes, spanning conversational in-car assistant systems and image generation tools for design workflows. Through cross-case reflection and thematic analysis across all study phases, we identify five methodological challenges and propose eighteen practice-oriented recommendations, organized into five guidelines. These challenges represent methodological constructs that are either amplified, redefined, or newly introduced by GenAI's stochastic nature: (C1) reliance on familiar interaction patterns, (C2) fidelity-control trade-offs, (C3) feedback and trust, (C4) gaps in usability evaluation, and (C5) interpretive ambiguity between interface and system issues. Our guidelines address these challenges through strategies such as reframing onboarding to help participants manage unpredictability, extending evaluation with constructs such as trust and intent alignment, and logging system events, including hallucinations and latency, to support transparent analysis. This work contributes (1) a methodological reflection on how GenAI's stochastic nature unsettles lab-based HCI evaluation and (2) eighteen recommendations that help researchers design more transparent, robust, and comparable studies of GenAI systems in controlled settings.

Authors:Jana Franceska Funke, Mario Sagawa, Georgious Nurcan-Georgiou, Naomi Sagawa, Dennis Dietz, Evgeny Stemasov, Enrico Rukzio, Teresa Hirzle
Title: Put Your Muscle Into It: Introducing XEM2, a Novel Approach for Monitoring Exertion in Stationary Physical Exercises Leveraging Muscle Work
Abstract:
We present a novel system for camera-based measurement and visualization of muscle work based on the Hill-Type-Muscle-Model: the exercise exertion muscle-work monitor (\textit{XEM}$^{2}$). Our aim is to complement and, thus, address issues of established measurement techniques that offer imprecise data for non-uniform movements (burned calories) or provide limited information on strain across different body parts (self-perception scales). We validate the reliability of XEM's measurements through a technical evaluation of ten participants and five exercises. Further, we assess the acceptance, usefulness, benefits, and opportunities of \textit{XEM}$^{2}$ in an empirical user study. Our results show that \textit{XEM}$^{2}$ provides reliable values of muscle work and supports participants in understanding their workout while also providing reliable information about perceived exertion per muscle group. With this paper, we introduce a novel system capable of measuring and visualizing exertion for single muscle groups, which has the potential to improve exercise monitoring to prevent unbalanced workouts.

Authors:Björn R. Severitt, Yannick Sauer, Nora Castner, Siegfried Wahl
Title: A Real-Time Error Prevention System for Gaze-Based Interaction in Virtual Reality Based on Anomaly Detection
Abstract:
Gaze-based interaction enables intuitive, hands-free control in immersive environments, but remains susceptible to unintended inputs. We present a real-time error prevention system (EPS) that uses a temporal convolutional network autoencoder (TCNAE) to detect anomalies in gaze dynamics during selection tasks. In a visual search task in VR, 41 participants used three gaze-based methods - dwell time, gaze and head direction alignment, and nod - with and without EPS. The system reduced erroneous selections by up to 95% for dwell time and gaze and head, and was positively received by most users. Performance varied for nodding and between individuals, suggesting the need for adaptive systems. Objective metrics and subjective evaluations show that anomaly-based error prevention can improve gaze interfaces without disrupting interaction. These findings demonstrate the potential of anomaly-based error prevention for gaze interfaces and suggest applications in VR, AR, and assistive technologies.

Authors:Jiangen He, Jiqun Liu
Title: Seeing to Think? How Source Transparency Design Shapes Interactive Information Seeking and Evaluation in Conversational AI
Abstract:
Conversational AI systems increasingly function as primary interfaces for information seeking, yet how they present sources to support information evaluation remains under-explored. This paper investigates how source transparency design shapes interactive information seeking, trust, and critical engagement. We conducted a controlled between-subjects experiment (N=372) comparing four source presentation interfaces - Collapsible, Hover Card, Footer, and Aligned Sidebar - varying in visibility and accessibility. Using fine-grained behavioral analysis and automated critical thinking assessment, we found that interface design fundamentally alters exploration strategies and evidence integration. While the Hover Card interface facilitated seamless, on-demand verification during the task, the Aligned Sidebar uniquely mitigated the negative effects of information overload: as citation density increased, Sidebar users demonstrated significantly higher critical thinking and synthesis scores compared to other conditions. Our results highlight a trade-off between designs that support workflow fluency and those that enforce reflective verification, offering practical implications for designing adaptive and responsible conversational AI that fosters critical engagement with AI generated content.

Authors:DongHoon Kim, Isaac Cho
Title: Evaluating Preattentive Features for Detecting Changes in Virtual Environments
Abstract:
Visual perception plays a critical role in detecting changes within immersive Virtual Reality (VR) environments. However, as visual complexity increases, perceptual performance declines, making it more difficult to detect changes quickly and accurately. This study examines how visual features, known for facilitating preattentive processing, impact a change detection task in immersive 3D environments, with a focus on visual complexity, object attributes, and spatial proximity. Our results demonstrate that preattentive processing enhances change detection, particularly when the altered object is spatially isolated and not perceptually grouped with similar surrounding objects. Changes to isolated objects were detected more reliably, suggesting that perceptual isolation reduces cognitive load and draws more attention. Conversely, when a changed object was surrounded by visually similar elements, participants were less likely to detect the change, indicating that perceptual grouping hinders individual object recognition in complex scenes. These results provide guidelines for designing VR applications that strategically utilize spatial isolation and visual features to improve the user experience.

Authors:Xian Li, Yuanning Han, Di Liu, Pengcheng An, Shuo Niu
Title: When Generative AI Is Intimate, Sexy, and Violent: Examining Not-Safe-For-Work (NSFW) Chatbots on FlowGPT
Abstract:
User-created chatbots powered by generative AI offer new ways to share and interact with Not-Safe-For-Work (NSFW) content. However, little is known about the characteristics of these GenAI-based chatbots and their user interactions. Drawing on the functional theory of NSFW on social media, this study analyzes 376 NSFW chatbots and 307 public conversation sessions on FlowGPT. Findings identify four chatbot types: roleplay characters, story generators, image generators, and do-anything-now bots. AI Characters portraying fantasy personas and enabling hangout-style interactions are most common, often using explicit avatar images to invite engagement. Sexual, violent, and insulting content appears in both user prompts and chatbot outputs, with some chatbots generating explicit material even when users do not create erotic prompts. In sum, the NSFW experience on FlowGPT can be understood as a combination of virtual intimacy, sexual delusion, violent thought expression, and unsafe content acquisition. We conclude with implications for chatbot design, creator support, user safety, and content moderation.

Authors:Alexander Htet Kyaw, Haotian Ma, Sasa Zivkovic, Jenny Sabin
Title: Augmented Assembly: Object Recognition and Hand Tracking for Adaptive Assembly Instructions in Augmented Reality
Abstract:
Recent advances in augmented reality (AR) have enabled interactive systems that assist users in physical assembly tasks. In this paper, we present an AR-assisted assembly workflow that leverages object recognition and hand tracking to (1) identify custom components, (2) display step-by-step instructions, (3) detect assembly deviations, and (4) dynamically update the instructions based on users' hands-on interactions with physical parts. Using object recognition, the system detects and localizes components in real time to create a digital twin of the workspace. For each assembly step, it overlays bounding boxes in AR to indicate both the current position and the target placement of relevant components, while hand-tracking data verifies whether the user interacts with the correct part. Rather than enforcing a fixed sequence, the system highlights potential assembly errors and interprets user deviations as opportunities for iteration and creative exploration. A case study with LEGO blocks and custom 3D-printed components demonstrates how the system links digital instructions to physical assembly, eliminating the need for manual searching, sorting, or labeling of parts.

Authors:Markus Bink, Marten Risius, Udo Kruschwitz, David Elsweiler
Title: Seek and You Shall Find: Design & Evaluation of a Context-Aware Interactive Search Companion
Abstract:
Many users struggle with effective online search and critical evaluation, especially in high-stakes domains like health, while often overestimating their digital literacy. Thus, in this demo, we present an interactive search companion that seamlessly integrates expert search strategies into existing search engine result pages. Providing context-aware tips on clarifying information needs, improving query formulation, encouraging result exploration, and mitigating biases, our companion aims to foster reflective search behaviour while minimising cognitive burden. A user study demonstrates the companion's successful encouragement of more active and exploratory search, leading users to submit 75 % more queries and view roughly twice as many results, as well as performance gains in difficult tasks. This demo illustrates how lightweight, contextual guidance can enhance search literacy and empower users through micro-learning opportunities. While the vision involves real-time LLM adaptivity, this study utilises a controlled implementation to test the underlying intervention strategies.

Authors:Markus Bink, Marten Risius, Udo Kruschwitz, David Elsweiler
Title: "Can You Tell Me?": Designing Copilots to Support Human Judgement in Online Information Seeking
Abstract:
Generative AI (GenAI) tools are transforming information seeking, but their fluent, authoritative responses risk overreliance and discourage independent verification and reasoning. Rather than replacing the cognitive work of users, GenAI systems should be designed to support and scaffold it. Therefore, this paper introduces an LLM-based conversational copilot designed to scaffold information evaluation rather than provide answers and foster digital literacy skills. In a pre-registered, randomised controlled trial (N=261) examining three interface conditions including a chat-based copilot, our mixed-methods analysis reveals that users engaged deeply with the copilot, demonstrating metacognitive reflection. However, the copilot did not significantly improve answer correctness or search engagement, largely due to a "time-on-chat vs. exploration" trade-off and users' bias toward positive information. Qualitative findings reveal tension between the copilot's Socratic approach and users' desire for efficiency. These results highlight both the promise and pitfalls of pedagogical copilots, and we outline design pathways to reconcile literacy goals with efficiency demands.

Authors:Yijin Zhou, Fu Li, Yi Niu, Boxun Fu, Huaning Wang, Lijian Zhang
Title: Learning from Brain Topography: A Hierarchical Local-Global Graph-Transformer Network for EEG Emotion Recognition
Abstract:
Understanding how local neurophysiological patterns interact with global brain dynamics is essential for decoding human emotions from EEG signals. However, existing deep learning approaches often overlook the brain's intrinsic spatial organization, failing to simultaneously capture local topological relations and global dependencies. To address these challenges, we propose Neuro-HGLN, a Neurologically-informed Hierarchical Graph-Transformer Learning Network that integrates biologically grounded priors with hierarchical representation learning. Neuro-HGLN first constructs a spatial Euclidean prior graph based on physical electrode distances to serve as an anatomically grounded inductive bias. A learnable global dynamic graph is then introduced to model functional connectivity across the entire brain. In parallel, to capture fine-grained regional dependencies, Neuro-HGLN builds region-level local graphs using a multi-head self-attention mechanism. These graphs are processed synchronously through local-constrained parallel GCN layers to produce region-specific representations. Subsequently, an iTransformer encoder aggregates these features to capture cross-region dependencies under a dimension-as-token formulation. Extensive experiments demonstrate that Neuro-HGLN achieves state-of-the-art performance on multiple benchmarks, providing enhanced interpretability grounded in neurophysiological structure. These results highlight the efficacy of unifying local topological learning with cross-region dependency modeling for robust EEG emotion recognition.

Authors:Hasti Sharifi, Homaira Huda Shomee, Sourav Medya, Debaleena Chattopadhyay
Title: Empowering Older Adults in Digital Technology Use with Foundation Models
Abstract:
While high-quality technology support can assist older adults in using digital applications, many struggle to articulate their issues due to unfamiliarity with technical terminology and age-related cognitive changes. This study examines these communication challenges and explores AI-based approaches to mitigate them. We conducted a diary study with English-speaking, community-dwelling older adults to collect asynchronous, technology-related queries and used reflexive thematic analysis to identify communication barriers. To address these barriers, we evaluated how foundation models can paraphrase older adults' queries to improve solution accuracy. Two controlled experiments followed: one with younger adults evaluating AI-rephrased queries and another with older adults evaluating AI-generated solutions. We also developed a pipeline using large language models to generate the first synthetic dataset of how older adults request tech support (OATS). We identified four key communication challenges: verbosity, incompleteness, over-specification, and under-specification. Our prompt-chaining approach using the large language model, GPT-4o, elicited contextual details, paraphrased the original query, and generated a solution. AI-rephrased queries significantly improved solution accuracy (69% vs. 46%) and Google search results (69% vs. 35%). Younger adults better understood AI-rephrased queries (93.7% vs. 65.8%) and reported greater confidence and ease. Older adults reported high perceived ability to answer contextual questions (89.8%) and follow solutions (94.7%), with high confidence and ease. OATS demonstrated strong fidelity and face validity. This work shows how foundation models can enhance technology support for older adults by addressing age-related communication barriers. The OATS dataset offers a scalable resource for developing equitable AI systems that better serve aging populations.

Authors:Yilan Jiang, Cindy Xiong Bearfield, Steven Franconeri, Eugene Wu
Title: Data-Induced Groupings and How To Find Them
Abstract:
Making sense of a visualization requires the reader to consider both the visualization design and the underlying data values. Existing work in the visualization community has largely considered affordances driven by visualization design elements, such as color or chart type, but how visual design interacts with data values to impact interpretation and reasoning has remained under-explored. Dot plots and bar graphs are commonly used to help users identify groups of points that form trends and clusters, but are liable to manifest groupings that are artifacts of spatial arrangement rather than inherent patterns in the data itself. These ``Data-induced Groups'' can drive suboptimal data comparisons and potentially lead the user to incorrect conclusions. We conduct two user studies using dot plots as a case study to understand the prevalence of data-induced groupings. We find that users rely on data-induced groupings in both conditions despite the fact that trend-based groupings are irrelevant in nominal data. Based on the study results, we build a model to predict whether users are likely to perceive a given set of dot plot points as a group. We discuss two use cases illustrating how the model can assist visualization designers by both diagnosing potential user-perceived groupings in dot plots and offering redesigns that better accentuate desired groupings through data rearrangement.

Authors:Shangqian Li, Tianwa Chen, Gianluca Demartini
Title: The Impact of AI Generated Content on Decision Making for Topics Requiring Expertise
Abstract:
Modelling users' online decision-making and opinion change is a complex issue that needs to consider users' personal determinants, the nature of the topic and the information retrieval activities. Furthermore, generative-AIbased products like ChatGPT gradually become an essential element for the retrieval of online information. However, the interaction between domainspecific knowledge and AI-generated content during online decision-making is unclear. We conducted a lab-based explanatory sequential study with university students to overcome this research gap. In the experiment, we surveyed participants about a set of general domain topics that are easy to grasp and another set of domain-specific topics that require adequate levels of chemical science knowledge to fully comprehend. We provided participants with decision-supporting information that was either produced using generative AI or collected from selected expert human-written sources to explore the role of AI-generated content compared to ordinary information during decision-making. Our result revealed that participants are less likely to change opinions on domain-specific topics. Since participants without professional knowledge had difficulty performing in-depth and independent reasoning based on the information, they favoured relying on conclusions presented in the provided materials and tended to stick to their initial opinion. Besides, information that is labelled as AI-generated is equivalently helpful as information labelled as dedicatedly human-written for participants in this experiment, indicating the vast potential as well as concerns for AI replacing human experts to help users tackle professional topics or issues.

Authors:Charles Javerliat, Guillaume Lavoué
Title: GPU accelerated surface-based gaze mapping for XR experiences
Abstract:
Extended reality is a fast-growing domain for which there is an increasing need to analyze and understand user behavior. In particular, understanding human visual attention during immersive experiences is crucial for many applications. The visualization and analysis of visual attention are commonly done by building fixation density maps from eye-tracking data. Such visual attention mapping is well mastered for 3 degrees of freedom (3DoF) experiences (\textit{i.e.}, involving 360 images or videos) but much less so for 6DoFs data, when the user can move freely in the 3D space. In that case, the visual attention information has to be mapped onto the 3D objects themselves. Some solutions exist for constructing such surface-based 6DoFs attention maps, however, they own several drawbacks: processing time, strong dependence on mesh resolution and/or texture mapping, and/or unpractical data representation for further processing. In this context, we propose a novel GPU-based algorithm that resolves the issues above while being generated in interactive time and rendered in real-time. Experiment on a challenging scene demonstrates the accuracy and robustness of our approach. To stimulate research in this area, the source code is publicly released and integrated into PLUME for ease of use in XR experiments.

Authors:Raj Mahmud, Shlomo Berkovsky, Mukesh Prasad, A. Baki Kocaballi
Title: Recommendation-as-Experience: A framework for context-sensitive adaptation in conversational recommender systems
Abstract:
While Conversational Recommender Systems (CRS) have matured technically, they frequently lack principled methods for encoding latent experiential aims as adaptive state variables. Consequently, contemporary architectures often prioritise ranking accuracy at the expense of nuanced, context-sensitive interaction behaviours. This paper addresses this gap through a comprehensive multi-domain study ($N = 168$) that quantifies the joint prioritisation of three critical interaction aims: educative (to inform and justify), explorative (to diversify and inspire), and affective (to align emotionally and socially). Utilising Bayesian hierarchical ordinal regression, we establish domain profiles and perceived item value as systematic modulators of these priorities. Furthermore, we identify stable user-level preferences for autonomy that persist across distinct interactional goals, suggesting that agency is a fundamental requirement of the conversational experience. Drawing on these empirical foundations, we formalise the Recommendation-as-Experience (RAE) adaptation framework. RAE systematically encodes contextual and individual signals into structured state representations, mapping them to experience-aligned dialogue policies realised through retrieval diversification, heuristic logic, or Large Language Model based controllable generation. As an architecture-agnostic blueprint, RAE facilitates the design of context-sensitive CRS that effectively balance experiential quality with predictive performance.

Authors:Hagit Ben Shoshan, Joel Lanir, Pavel Goldstein, Osnat Mokryn
Title: Making Absence Visible: The Roles of Reference and Prompting in Recognizing Missing Information
Abstract:
Interactive systems that explain data, or support decision making often emphasize what is present while overlooking what is expected but missing. This presence bias limits users' ability to form complete mental models of a dataset or situation. Detecting absence depends on expectations about what should be there, yet interfaces rarely help users form such expectations. We present an experimental study examining how reference framing and prompting influence people's ability to recognize expected but missing categories in datasets. Participants compared distributions across three domains (energy, wealth, and regime) under two reference conditions: Global, presenting a unified population baseline, and Partial, showing several concrete exemplars. Results indicate that absence detection was higher with Partial reference than with Global reference, suggesting that partial, samples-based framing can support expectation formation and absence detection. When participants were prompted to look for what was missing, absence detection rose sharply. We discuss implications for interactive user interfaces and expectation-based visualization design, while considering cognitive trade-offs of reference structures and guided attention.

Authors:Eran Fainman, Hagit Ben Shoshan, Adir Solomon, Osnat Mokryn
Title: DiSCo: Making Absence Visible in Intelligent Summarization Interfaces
Abstract:
Intelligent interfaces increasingly use large language models to summarize user-generated content, yet these summaries emphasize what is mentioned while overlooking what is missing. This presence bias can mislead users who rely on summaries to make decisions. We present Domain Informed Summarization through Contrast (DiSCo), an expectation-based computational approach that makes absences visible by comparing each entity's content with domain topical expectations captured in reference distributions of aspects typically discussed in comparable accommodations. This comparison identifies aspects that are either unusually emphasized or missing relative to domain norms and integrates them into the generated text. In a user study across three accommodation domains, namely ski, beach, and city center, DiSCo summaries were rated as more detailed and useful for decision making than baseline large language model summaries, although slightly harder to read. The findings show that modeling expectations reduces presence bias and improves both transparency and decision support in intelligent summarization interfaces.

Authors:Hayk Asatryan, Basile Tousside, Janis Mohr, Malte Neugebauer, Hildo Bijl, Paul Spiegelberg, Claudia Frohn-Schauf, Jörg Frochte
Title: Exploring Student Expectations and Confidence in Learning Analytics
Abstract:
Learning Analytics (LA) is nowadays ubiquitous in many educational systems, providing the ability to collect and analyze student data in order to understand and optimize learning and the environments in which it occurs. On the other hand, the collection of data requires to comply with the growing demand regarding privacy legislation. In this paper, we use the Student Expectation of Learning Analytics Questionnaire (SELAQ) to analyze the expectations and confidence of students from different faculties regarding the processing of their data for Learning Analytics purposes. This allows us to identify four clusters of students through clustering algorithms: Enthusiasts, Realists, Cautious and Indifferents. This structured analysis provides valuable insights into the acceptance and criticism of Learning Analytics among students.

Authors:Hongliang Lu, Yunmeng Liu, Junjie Yang
Title: Active Sensing Shapes Real-World Decision-Making through Dynamic Evidence Accumulation
Abstract:
Human decision-making heavily relies on active sensing, a well-documented cognitive behaviour for evidence gathering to accommodate ever-changing environments. However, its operational mechanism in the real world remains non-trivial. Currently, an in-laboratory paradigm, called evidence accumulation modelling (EAM), points out that human decision-making involves transforming external evidence into internal mental beliefs. However, the gap in evidence affordance between real-world contexts and laboratory settings hinders the effective application of EAM. Here we generalize EAM to the real world and conduct analysis in real-world driving scenarios. A cognitive scheme is proposed to formalize real-world evidence affordance and capture active sensing through eye movements. Empirically, our scheme can plausibly portray the accumulation of drivers' mental beliefs, explaining how active sensing transforms evidence into mental beliefs from the perspective of information utility. Also, our results demonstrate a negative correlation between evidence affordance and attention recruited by individuals, revealing how human drivers adapt their evidence-collection patterns across various contexts. Moreover, we reveal the positive influence of evidence affordance and attention distribution on decision-making propensity. In a nutshell, our computational scheme generalizes EAM to real-world contexts and provides a comprehensive account of how active sensing underlies real-world decision-making, unveiling multifactorial, integrated characteristics in real-world decision-making.

Authors:Jialin Wang, Xinru Cheng, Boyong Hou, Hai-Ning Liang
Title: Resolution deficits drive simulator sickness and compromise reading performance in virtual environments
Abstract:
Extended reality (XR) is evolving into a general-purpose computing platform, yet its adoption for productivity is hindered by visual fatigue and simulator sickness. While these symptoms are often attributed to latency or motion conflicts, the precise impact of textual clarity on physiological comfort remains undefined. Here we show that sub-optimal effective resolution, the clarity that reaches the eye after the full display-optics-rendering pipeline, is a primary driver of simulator sickness during reading tasks in both virtual reality and video see-through environments. By systematically manipulating end-to-end effective resolution on a unified logMAR scale, we measured reading psychophysics and sickness symptoms in a controlled within-subjects study. We find that reading performance and user comfort degrade exponentially as resolution drops below 0 logMAR (normal visual acuity). Notably, our results reveal 0 logMAR as a key physiological tipping point: resolutions better than this threshold yield naked-eye-level performance with minimal sickness, whereas poorer resolutions trigger rapid, non-linear increases in nausea and oculomotor strain. These findings suggest that the cognitive and perceptual effort required to resolve blurry text directly compromises user comfort, establishing human-eye resolution as a critical baseline for the design of future ergonomic XR systems.

Authors:Jialin Wang, Songming Ping, Kemu Xu, Yue Li, Hai-Ning Liang
Title: The perceptual gap between video see-through displays and natural human vision
Abstract:
Video see-through (VST) technology aims to seamlessly blend virtual and physical worlds by reconstructing reality through cameras. While manufacturers promise perceptual fidelity, it remains unclear how close these systems are to replicating natural human vision across varying environmental conditions. In this work, we quantify the perceptual gap between the human eye and different popular VST headsets (Apple Vision Pro, Meta Quest 3, Quest Pro) using psychophysical measures of visual acuity, contrast sensitivity, and color vision. We show that despite hardware advancements, all tested VST systems fail to match the dynamic range and adaptability of the naked eye. While high-end devices approach human performance in ideal lighting, they exhibit significant degradation in low-light conditions, particularly in contrast sensitivity and acuity. Our results map the physiological limitations of digital reality reconstruction, establishing a specific perceptual gap that defines the roadmap for achieving indistinguishable VST experiences.

Authors:Lauren Olson, Emitzá Guzmán, Florian Kunneman
Title: PerspectiveCoach: Exploring LLMs for Developer Reflection
Abstract:
Despite growing awareness of ethical challenges in software development, practitioners still lack structured tools that help them critically engage with the lived experiences of marginalized users. This paper presents PerspectiveCoach, a large language model (LLM)-powered conversational tool designed to guide developers through structured perspective-taking exercises and deepen critical reflection on how software design decisions affect marginalized communities. Through a controlled study with 18 front-end developers (balanced by sex), who interacted with the tool using a real case of online gender-based harassment, we examine how PerspectiveCoach supports ethical reasoning and engagement with user perspectives. Qualitative analysis revealed increased self-awareness, broadened perspectives, and more nuanced ethical articulation, while a complementary human-human study contextualized these findings. Text similarity analyses demonstrated that participants in the human-PerspectiveCoach study improved the fidelity of their restatements over multiple attempts, capturing both surface-level and semantic aspects of user concerns. However, human-PerspectiveCoach's restatements had a lower baseline than the human-human conversations, highlighting contextual differences in impersonal and interpersonal perspective-taking. Across the study, participants rated the tool highly for usability and relevance. This work contributes an exploratory design for LLM-powered end-user perspective-taking that supports critical, ethical self-reflection and offers empirical insights (i.e., enhancing adaptivity, centering plurality) into how such tools can help practitioners build more inclusive and socially responsive technologies.

Authors:Jiawei Fang, Ruonan Zheng, Xiaoxia Gao, Shifan Jiang, Anjun Chen, Qi Ye, Shihui Guo
Title: Garment Inertial Denoiser (GID): Endowing Accurate Motion Capture via Loose IMU Denoiser
Abstract:
Wearable inertial motion capture (MoCap) provides a portable, occlusion-free, and privacy-preserving alternative to camera-based systems, but its accuracy depends on tightly attached sensors - an intrusive and uncomfortable requirement for daily use. Embedding IMUs into loose-fitting garments is a desirable alternative, yet sensor-body displacement introduces severe, structured, and location-dependent corruption that breaks standard inertial pipelines. We propose GID (Garment Inertial Denoiser), a lightweight, plug-and-play Transformer that factorizes loose-wear MoCap into three stages: (i) location-specific denoising, (ii) adaptive cross-wear fusion, and (iii) general pose prediction. GID uses a location-aware expert architecture, where a shared spatio-temporal backbone models global motion while per-IMU expert heads specialize in local garment dynamics, and a lightweight fusion module ensures cross-part consistency. This inductive bias enables stable training and effective learning from limited paired loose-tight IMU data. We also introduce GarMoCap, a combined public and newly collected dataset covering diverse users, motions, and garments. Experiments show that GID enables accurate, real-time denoising from single-user training and generalizes across unseen users, motions, and garment types, consistently improving state-of-the-art inertial MoCap methods when used as a drop-in module.

Authors:Suibi Che-Chuan Weng, Torin Hopkins, Shih-Yu Ma, Amy Banic, Ellen Yi-Luen Do
Title: Effects of Limited Field of View on Musical Collaboration Experience with Avatars in Extended Reality
Abstract:
During musical collaboration, visual cues are essential for communication between musicians. Extended Reality (XR) applications, often used with head-mounted displays like Augmented Reality (AR) glasses, can limit the field of view (FOV) of players. We conducted a study to investigate the effects of limited FOV on co-presence, gesture recognition, overall enjoyment, and reaction time. Initially, we observed experienced musicians collaborating informally with and without visual occlusion, noting that collaboration suffered with limited FOV. We then conducted a within-subjects study with 19 participants, comparing an unrestricted FOV holographic setup called HoloJam to Nreal AR glasses with a 52$^{\circ}$ limited FOV. In the AR setup, we tested two conditions: standard AR with a 52$^{\circ}$ FOV and a modified AR notification system called Mini Musicians. Results showed that HoloJam provided higher co-presence, quicker gesture recognition, and greater enjoyment. The Mini Musicians application reduced reaction time and maintained enjoyment compared to the standard AR setup. We conclude that limited FOV impacts musical collaboration, but notifications can improve reaction time and should be considered in future XR music collaborations.

Authors:Torin Hopkins, Shih-Yu Ma, Suibi Che-Chuan Weng, Ming-Yuan Pai, Ellen Yi-Luen Do, Luca Turchet
Title: MR-DAW: Towards Collaborative Digital Audio Workstations in Mixed Reality
Abstract:
Digital Audio Workstations (DAWs) are central to modern music production but often encumber the musician's workflow, tethering them to a desk and hindering natural interaction with their instrument. Furthermore, effective remote collaboration remains a significant challenge, with existing solutions hampered by network latency and asynchronous file sharing. This paper investigates the potential of Mixed Reality (MR) to overcome these barriers, creating an intuitive environment for real-time, remote musical collaboration. We employ qualitative and speculative design techniques to better understand: 1) how players currently use DAWs, and 2) to imagine a speculative future of collaborative MR-DAWs. To facilitate this discussion, we developed and evaluated the usability of a design probe, MR-DAW. An MR system enabling multiple, geographically dispersed users to control a single, shared DAW instance while moving freely in their local spaces. Our networked system enables each remote musician to use a physical foot pedal for collaborative looping, merging a familiar, hands-free interaction with a shared virtual session. Based on interviews and system evaluations with 20 musicians, we analyze current practices, report on the user experience with our MR system, and speculate on the future of musical collaboration in MR. Our results highlight the affordances of MR for unencumbered musical interaction and provide a speculative outlook on the future of remote collaborative DAWs in the Musical Metaverse.

Authors:Steeven Villa, Abdallah El Ali
Title: 15 Years of Augmented Human(s) Research: Where Do We Stand?
Abstract:
The Augmented Human vision broadly seeks to improve or expand baseline human functioning through the restoration or extension of physical, intellectual, and social capabilities. However, given the rapid pace of technology development, we ask: what exactly does Augmented Human research involve, what are its core themes, and how has the Augmented Human(s) conference series evolved over time? To answer this, we conducted a scientometric analysis on the past 15 years of the Augmented Human(s) conference (N=735 paper), focusing on: geographical aspects, submissions and citation timelines, author frequency and popularity, and topic modeling. We find that: (a) Number of papers in the conference exhibit a bimodal distribution, peaking in 2015 and 2025, but showing periods of stagnant growth; (b) key topics over time include Haptics, Wearable Sensing, Vision & Eye Tracking, Embodied Interaction, and Sports / Motion; (c) some seminal papers on AH are not published in AH(s), but rather at related venues (e.g., CHI); (d) the conference has an active Japanese HCI community despite its historical Eurocentric location dominance. We contribute a closer look at the trajectory of the AH(s) field, and raise considerations of definitional and research scope ambiguities given the core problems/enhancements the field seeks to address.

Authors:Nathanael Jo, Manish Raghavan
Title: Incentives shape how humans co-create with generative AI
Abstract:
Generative AI is quickly becoming an integral part of people's everyday workflows. Early evidence has shown that while generative AI can increase individual-level productivity, it does so at the cost of collective diversity, potentially narrowing the set of ideas and perspectives produced. Our research stands in contrast to this concern: through a pre-registered randomized control trial, we show that incentives mediate AI's homogenizing force in a creative writing task where participants can use AI interactively. Participants rewarded for originality relative to peers produce collectively more diverse writing than those rewarded for quality alone. This divergence is driven not by abandoning AI, but by how participants use it: those incentivized for originality incorporate fewer AI suggestions verbatim, relying on the model more selectively for brainstorming, proofreading, and targeted edits. Our results reveal that the effects of generative AI depend not only on the technology itself, but also the behavioral strategies and incentive structures surrounding its use.

Authors:Michael Caosun, Sinan Aral
Title: The Augmentation Trap: AI Productivity and the Cost of Cognitive Offloading
Abstract:
Experimental evidence confirms that AI tools raise worker productivity, but also that sustained use can erode the expertise on which those gains depend. We develop a dynamic model in which a decision-maker chooses AI usage intensity for a worker over time, trading immediate productivity against the erosion of worker skill. We decompose the tool's productivity effect into two channels, one independent of worker expertise and one that scales with it. The model produces three main results. First, even a decision-maker who fully anticipates skill erosion rationally adopts AI when front-loaded productivity gains outweigh long-run skill costs, producing steady-state loss: the worker ends up less productive than before adoption. Second, when managers are short-termist or worker skill has external value, the decision-maker's optimal policy turns steady-state loss into the augmentation trap, leaving the worker worse off than if AI had never been adopted. Third, when AI productivity depends less on worker expertise, workers can permanently diverge in skill: experienced workers realize their full potential while less experienced workers deskill to zero. Small differences in managerial incentives can determine which path a worker takes. The productivity decomposition classifies deployments into five regimes that separate beneficial adoption from harmful adoption and identifies which deployments are vulnerable to the trap.

Authors:Caitlin Morris, Pattie Maes
Title: Same Feedback, Different Source: How AI vs. Human Feedback Attribution and Credibility Shape Learner Behavior in Computing Education
Abstract:
As AI systems increasingly take on instructional roles - providing feedback, guiding practice, evaluating work - a fundamental question emerges: does it matter to learners who they believe is on the other side? We investigated this using a three-condition experiment (N=148) in which participants completed a creative coding tutorial and received feedback generated by the same large language model, attributed to either an AI system (with instant or delayed delivery) or a human teaching assistant (with matched delayed delivery). This three-condition design separates the effect of source attribution from the confound of delivery timing, which prior studies have not controlled. Source attribution and timing had distinct effects on different outcomes: participants who believed the human attribution spent more time on task than those receiving equivalently timed AI-attributed feedback (d=0.61, p=.013, uncorrected), while the delivery delay independently increased output complexity without affecting time measures. An exploratory analysis revealed that 46% of participants in the human-attributed condition did not believe the attribution, and these participants showed worse outcomes than those receiving transparent AI feedback (code complexity d=0.77, p=.003; time on task d=0.70, p=.007). These findings suggest that believed human presence may carry motivational value, but that this value depends on credibility. For computing educators, transparent AI attribution may be the lower-risk default in contexts where human attribution would not be credible.

Authors:Yoana Ahmetoglu, Marios Constantinides, Anna Cox
Title: AI Disclosure with DAISY
Abstract:
The use of AI tools in research is becoming routine, alongside growing consensus that such use should be transparently disclosed. However, AI disclosure statements remain rare and inconsistent, with policies offering limited guidance and authors facing social, cognitive, and emotional barriers when reporting AI use. To explore how structured disclosure shapes what authors report and how they experience disclosure, we present DAISY (Disclosure of AI-uSe in Your Research), a form-based tool for generating AI disclosure statements. DAISY was developed from literature-derived requirements and co-design (N =11), and deployed in a user study with authors (N=31). DAISY-supported disclosures met more completeness criteria, offering clearer breakdowns of AI use across research and writing than unsupported disclosures. Surprisingly, despite concerns about how transparently disclosed AI use might be perceived, the use of DAISY did not reduce author comfort with the disclosure statements. We discuss design implications and a research agenda for AI disclosure as a sociotechnical practice.

Authors:Sheng Long, Remco Chang, Eugene Wu, Alex Kale, Matthew Kay
Title: Visual Decoding Operators: Towards a Compositional Theory of Visualization Perception
Abstract:
Prior work on perceptual effectiveness has decomposed visualizations into smaller common units (e.g., channels such as angle, position, and length) to establish rankings. While useful, these decompositions lack the computational structure to predict performance for new visualization $\times$ task combinations, requiring new experiments for each. We propose an alternative unit of analysis: operationalizing quantitative visualization interpretation as sequences of composable visual decoding operators. Using probability density function (PDF) and cumulative distribution function (CDF) charts, we examine how chart-specific tasks can be decomposed into reusable, chart-agnostic perceptual operations and characterize their error profiles through hierarchical Bayesian modeling. We then test generalizability by composing learned operators to predict performance on a structurally different task: Moritz et al.'s [35] scatterplot mean-estimation experiment, where the chart type, chart dimensions, and analytic goal all differ from the learning conditions. With a pre-registered analysis plan, we compose operators under six candidate strategies and evaluate each against empirical data with no parameters fit to the response data. One strategy captures both bias and variance of observed responses; five alternatives fail in distinguishable ways. We argue that this decoding-operator-oriented approach to empirical visualization research and theory-building lays the groundwork for generative models that can predict a distribution of likely interpretations under different viewing conditions, new chart types, and new tasks. Free copy of this paper and supplemental materials: https://osf.io/prtfq; experiment interface: https://gleaming-dolphin-799fda.netlify.app/vis-decode-slider.

Authors:Yue Yang, Matthieu Chabanas, Carrie Reale, Annie Benson, Jason Slagle, Matthew Weinger, Michael Topf, Jie Ying Wu
Title: All-in-One Augmented Reality Guided Head and Neck Tumor Resection
Abstract:
Positive margins are common in head and neck squamous cell carcinoma, yet intraoperative re-resection is often imprecise because margin locations are typically communicated verbally from pathology. We present an all-in-one augmented reality (AR) system that relocalizes positive margins from a resected specimen to the resection bed and visualizes them in situ using HoloLens 2 depth sensing and fully automated markerless surface registration. In a silicone phantom study with six medical trainees, markerless registration achieved target registration errors comparable to a marker-based baseline (median 1.8 mm vs. 1.7 mm; maximum < 4 mm). In a margin relocalization task, AR guidance reduced error from verbal guidance (median 14.2 mm) to a few millimeters (median 3.2 mm), with all AR localizations within 5 mm error. These results support the feasibility of markerless AR margin guidance for more precise intraoperative re-excision.

Authors:Fares Fawzi, Seyed Parsa Neshaei, Marta Knezevic, Tanya Nazaretsky, Tanja Käser
Title: REFINE: Real-world Exploration of Interactive Feedback and Student Behaviour
Abstract:
Formative feedback is central to effective learning, yet providing timely, individualised feedback at scale remains a persistent challenge. While recent work has explored the use of large language models (LLMs) to automate feedback, most existing systems still conceptualise feedback as a static, one-way artifact, offering limited support for interpretation, clarification, or follow-up. In this work, we introduce REFINE, a locally deployable, multi-agent feedback system built on small, open-source LLMs that treats feedback as an interactive process. REFINE combines a pedagogically-grounded feedback generation agent with an LLM-as-a-judge-guided regeneration loop using a human-aligned judge, and a self-reflective tool-calling interactive agent that supports student follow-up questions with context-aware, actionable responses. We evaluate REFINE through controlled experiments and an authentic classroom deployment in an undergraduate computer science course. Automatic evaluations show that judge-guided regeneration significantly improves feedback quality, and that the interactive agent produces efficient, high-quality responses comparable to a state-of-the-art closed-source model. Analysis of real student interactions further reveals distinct engagement patterns and indicates that system-generated feedback systematically steers subsequent student inquiry. Our findings demonstrate the feasibility and effectiveness of multi-agent, tool-augmented feedback systems for scalable, interactive feedback.

Authors:Paulo Vitor S. Silva, Lucas L. Neves, Rafael A. Goiás, Diogo F. C. Silva, Rafael T. Sousa, Arlindo R. Galvão Filho
Title: Focus360: Guiding User Attention in Immersive Videos for VR
Abstract:
This demo introduces Focus360, a system designed to enhance user engagement in 360° VR videos by guiding attention to key elements within the scene. Using natural language descriptions, the system identifies important elements and applies a combination of visual effects to guide attention seamlessly. At the demonstration venue, participants can experience a 360° Safari Tour, showcasing the system's ability to improve user focus while maintaining an immersive experience.

Authors:A. Baki Kocaballi, Joseph Kizana, Sharon Stein, Simon Buckingham Shum
Title: Drag or Traction: Understanding How Designers Appropriate Friction in AI Ideation Outputs
Abstract:
Seamless AI presents output as a finished, polished product that users consume rather than shape. This risks design fixation: users anchor on AI suggestions rather than generating their own ideas. We propose Generative Friction, which introduces intentional disruptions to AI output (fragmentation, delay, ambiguity) designed to transform it from finished product into semi-finished material, inviting human contribution rather than passive acceptance. In a qualitative study with six designers, we identified the different ways in which designers appropriated the different types of friction: users mined keywords from broken text, used delays as workspace for independent thought, and solved metaphors as creative puzzles. However, this transformation was not universal, motivating the concept of Friction Disposition, a user's propensity to interpret resistance as invitation rather than obstruction. Grounded in tolerance for ambiguity and pre-existing workflow orientation, Friction Disposition emerged as a potential moderator: high-disposition users treated friction as "liberating," while low-disposition users experienced drag. We contribute the concept of Generative Friction as distinct from Protective Friction, with design implications for AI tools that counter fixation while preserving agency.

Authors:Simon WS Fischer, Hanna Schraffenberger, Serge Thill, Pim Haselager
Title: Supporting Reflection and Forward-Looking Reasoning With Data-Driven Questions
Abstract:
Many generative AI systems as well as decision-support systems (DSSs) provide operators with predictions or recommendations. Various studies show, however, that people can mistakenly adopt the erroneous results presented by those systems. Hence, it is crucial to promote critical thinking and reflection during interaction. One approach we are focusing on involves encouraging reflection during machine-assisted decision-making by presenting decision-makers with data-driven questions. In this short paper, we provide a brief overview of our work in that regard, namely: 1) the development of a question taxonomy, 2) the development of a prototype in the medical domain and the feedback received from clinicians, 3) a method for generating questions using a large language model, and 4) a proposed scale for measuring cognitive engagement in human-AI decision-making. In doing so, we contribute to the discussion about the design, development, and evaluation of tools for thought, i.e., AI systems that provoke critical thinking and enable novel ways of sense-making.

Authors:Wenzheng Zhao, Manideep Duggi, Fengpei Yuan
Title: Bridging the Awareness Gap: Socially Mediated State Externalization for Transparent Distributed Home Robots
Abstract:
Distributed multi-robot systems for the home often require robots to operate out of the user's sight, creating a state awareness gap that can diminish trust and perceived transparency and control. This paper investigates whether real-time, socially mediated state externalization can bridge this gap without compromising task performance. We developed a system where a co-located social mediator robot (Pepper) externalizes the hidden execution states of an out-of-sight mobile manipulator (Stretch~3) for voice-driven object retrieval and delivery, where task-level states are synchronized and externalized through verbal updates and visual progress display. In a counterbalanced within-subject study (N=30), we compared a baseline of Autonomous Hidden Execution against Socially Mediated State Externalization. Our results show that externalization significantly increases user task-focused attention (from 15.8% to 84.6%, p<.001) and substantially improves perceived perspicuity, dependability, stimulation, and attractiveness (all p<.001). Furthermore, 83% of participants preferred the externalized condition, and this improvement in user experience was achieved without a statistically significant increase in end-to-end task completion time (p=.271). The results suggest that socially mediated state externalization is an effective architectural mechanism for designing more transparent and trustworthy distributed robot systems, ultimately enhancing user experience without sacrificing performance in distributed home robot deployments.

Authors:Katie Seaborn, Madeleine Steeds, Ilaria Torre, Martina De Cet, Katie Winkle, Marcus Göransson
Title: Operationalizing Perceptions of Agent Gender: Foundations and Guidelines
Abstract:
The "gender" of intelligent agents, virtual characters, social robots, and other agentic machines has emerged as a fundamental topic in studies of people's interactions with computers. Perceptions of agent gender can help explain user attitudes and behaviours -- from preferences to toxicity to stereotyping -- across a variety of systems and contexts of use. Yet, standards in capturing perceptions of agent gender do not exist. A scoping review was conducted to clarify how agent gender has been operationalized -- labelled, defined, and measured -- as a perceptual variable. One-third of studies manipulated but did not measure agent gender. Norms in operationalizations remain obscure, limiting comprehension of results, congruity in measurement, and comparability for meta-analyses. The dominance of the gender binary model and latent anthropocentrism have placed arbitrary limits on knowledge generation and reified the status quo. We contribute a systematically-developed and theory-driven meta-level framework that offers operational clarity and practical guidance for greater rigour and inclusivity.

Authors:Mario Andres Chavarria, Santiago Price Torrendell, Aude Billard, Samia Hurst, Sébastien Kessler, Michael Stein, Kenji Suzuki, Sophie Weerts, Diego Paez-Granados, Minerva Rivas Velarde
Title: User Involvement in Robotic Wheelchair Development: A Decade of Limited Progress
Abstract:
Robotic wheelchairs (RWs) offer significant potential to enhance autonomy and participation for people with mobility impairments, yet many systems have failed to achieve sustained real-world adoption. This narrative literature review examined the extent and quality of end-user involvement in RW design, development, and evaluation over the past decade (2015--2025), assessed against core principles shared by major user-involvement approaches (e.g., user-/human-centered design, participatory/co-design, and inclusive design). The findings indicate that user involvement remains limited and is predominantly concentrated in late-stage evaluation rather than in early requirements definition or iterative co-design. Of the 399 records screened, only 23 studies (about 6%) met the inclusion criteria of verifiable end-user involvement, and many relied on small samples, often around ten participants, with limited justification for sample size selection, proxy users, laboratory-based validation, and non-standardized feedback methods. Research teams were largely engineering-dominated (about 89%) and geographically concentrated in high-income countries. Despite strong evidence that sustained user engagement improves usability and adoption in assistive technology, its systematic implementation in RW research remains rare. Advancing the field requires embedding participatory methodologies throughout the design lifecycle and addressing systemic barriers that constrain meaningful user involvement.

Authors:Qijia Chen, Andrea Bellucci, Giulio Jacucci
Title: Understanding Newcomer Persistence in Social VR: A Case Study of VRChat
Abstract:
Newcomers are crucial for the growth of online communities, yet their successful integration into these spaces requires overcoming significant initial hurdles. Social Virtual Reality (VR) platforms are novel avenues that offer unprecedented online interaction experiences. Unlike well-studied two-dimensional online environments, the pathways to successful newcomer integration in online VR spaces are underexplored. Our research addresses this gap by examining the strategies used by newcomers to navigate early challenges in social VR and how they adapt. By focusing on active participants (ranging from newcomers currently navigating these hurdles to veterans who have successfully integrated) we isolate the specific strategies necessary for retention. We interviewed 24 active social VR users and conducted a reflexive thematic analysis. While participants identified barriers such as unfamiliar user interfaces, social norms, and overwhelming sensory input, our analysis reveals the adaptation strategies required to overcome them. Our findings expand on understanding newcomer persistence beyond traditional 2D environments, emphasizing how social dynamics influence the management of VR-specific issues like VR sickness during onboarding. Additionally, we highlight how successful newcomers overcome the lack of clear objectives in social VR by proactively constructing social meaning. We propose design suggestions to scaffold these successful integration pathways.

Authors:Nobuhito Kasahara, Shota Yamanaka, Homei Miyashita
Title: Skewed Dual Normal Distribution Model: Predicting Touch Pointing Success Rates for Targets Near Screen Edges and Corners
Abstract:
Typical success-rate prediction models for tapping exclude targets near screen edges. However, design constraints often force such placements, and in scrollable user interfaces, any element can move close to the screen edges. In this work, we model how target-edge distance affects touch pointing accuracy. We propose the Skewed Dual Normal Distribution Model, which assumes the tap-coordinate distribution is skewed by a nearby edge. The results showed that as targets approached the edge, the distribution's peak shifted toward the edge, and its tail extended away. In contrast to prior reports, the success rate improved when the target touched the edge, suggesting a strategy of ``tapping the target together with the edge.'' Our model predicts success rates across a wide range of conditions, including edge-adjacent targets. Through three experiments of horizontal, vertical, and 2D pointing, we demonstrated the generalizability and utility of our proposed model.

Authors:Ji Eun Song, Jaeyoun You, Joongseek Lee
Title: "I Might be Using His... But It is Also Mine!": Ownership and Control in Accounts Designed for Sharing
Abstract:
A user's ownership perception of virtual objects, such as cloud files, is generally uncertain. Is this valid for streaming platforms featuring accounts designed for sharing (DS)? We observe sharing practices within DS accounts of streaming platforms and identify their ownership characteristics and unexpected complications through two mixed-method studies. Casual and Cost-splitting are the two sharing practices identified. The owner is the sole payer for the account in the former, whereas profile holders split the cost in the latter. We distinguish two types of ownership in each practice -- Primary and Dual. In Primary ownership, the account owner has the power to allow others to use the account; in Dual ownership, Primary ownership appears in conjunction with joint ownership, notably displaying asymmetric ownership perceptions among users. Conflicts arise when the sharing agreements collapse. Therefore, we propose design recommendations that bridge ownership differences based on sharing practices of DS accounts.

Authors:Ji Eun Song, Eunchae Lee, Juhee Im, Hyunsoo Jang, Eunji Kim, Joongseek Lee
Title: "Don't Look, But I Know You Do": Norms and Observer Effects in Shared LLM Accounts
Abstract:
Account sharing is common in subscription services and is now extending to generative AI platforms, which are still primarily designed for individual use. Sharing often requires workarounds that create new tensions. This study examines how LLM subscriptions are shared and the norms that develop. We combined a survey of 245 users with interviews of 36 participants to understand both patterns and lived experiences. Our analysis identified four types of account sharing, organized along two dimensions: whether the owner uses the account and whether subscription costs are shared. Within these types, we examined how norms were formed and how their fragility, especially privacy, became evident in practice. Users, fully aware of this, subtly adjusted their behavior, which we interpret through the lens of the observer effect. We frame LLM account sharing as a social practice of appropriation and outline design implications to adapt single-user platforms to multi-user realities.

Authors:Ji Eun Song, Hyunsoo Jang, Juhee Im, Joongseek Lee
Title: "Don't Mess Up My Algorithm": Phatic Communication and Algorithmic Contagion in Meme Sharing
Abstract:
On algorithmic social platforms, exchanging memes via direct messages (DMs) serves as phatic communication that affirms relationships, yet users often interpret these exchanges as signals shaping personalized recommendations, creating tension between relational practice and algorithmic control. This study examines how users perceive DM meme exchanges on Instagram rather than auditing Instagram's underlying recommender mechanisms, and how beliefs about DM-recommendation linkages shape coping strategies and feelings of powerlessness. We conducted semi-structured interviews with 21 active meme-DM users. Participants classified memes as recipient-friendly or recipient-unfriendly based on relational fit; many described the spread of unfriendly memes as "algorithmic contagion." Controls were constrained by relational norms, low perceived efficacy of feedback tools, and opaque DM-recommendation linkages. We articulate how DM-based relational practices are entangled with personalization infrastructures and propose three design implications: transparent linkage explanations, conversation-level opt-outs, and conservative learning that down-weights DM-originated signals.

Authors:Annabel Goldman, Yuan Cui, Matthew Kay
Title: Assessing Data Literacy in K-12 Education: Challenges and Opportunities
Abstract:
Data literacy has become a key learning objective in K-12 education, but it remains an ambiguous concept as teachers interpret it differently. When creating assessments, teachers turn broad ideas about "working with data" into concrete decisions about what materials to include. Since working with data visualizations is a core component of data literacy, teachers' decisions about how to include them on assessments offer insight into how they interpret data literacy more broadly. Drawing on interviews with 13 teachers, we identify four challenges in enacting data literacy in assessments: (1) conceptual ambiguity between data visualization and data literacy, (2) tradeoffs between using real-world or synthetic data, (3) difficulty finding and adapting domain-appropriate visual representations and data visualizations, and (4) balancing assessing data literacy and domain-specific learning goals. Drawing on lessons from data visualization, human-computer interaction, and the learning sciences, we discuss opportunities to better support teachers in assessing data literacy.

Authors:Alice Zhong, Phoebe Chen, Anika Sharma, Kandyce Brennan, Snehalkumar 'Neil' S. Gaikwad
Title: "Girl, I'm so Serious": CARE, a Capability Framework for Reproductive Equity in Human-AI Interaction
Abstract:
Sexual and reproductive health (SRH) remains shaped by structural barriers that leave many without judgment-free information. AI chatbots offer anonymous alternatives, but access alone does not ensure equity when socioeconomic determinants shape whose capabilities these tools expand or constrain. Conventional methods for evaluating human-AI interaction were not designed to capture whether technologies holistically support reproductive autonomy. We introduce CARE, Capability Approach for Reproductive Equity, developing capabilities, functionings, and conversion factors into a Normative Design Lens and an Evaluation Lens for AI in SRH contexts. Evaluating SRH-specific non-LLM chatbots, general-use LLMs, and search engine features along credibility and reasoning, we identify two epistemic harms: source opacity and response rigidity. We conclude with design and evaluation recommendations, participatory auditing strategies, and policy implications for high-stakes domains where AI intersects with inequity.

Authors:Anders Giovanni Møller, Elisa Bassignana, Francesco Pierri, Luca Maria Aiello
Title: Overreliance on AI in Information-seeking from Video Content
Abstract:
The ubiquity of multimedia content is reshaping online information spaces, particularly in social media environments. At the same time, search is being rapidly transformed by generative AI, with large language models (LLMs) routinely deployed as intermediaries between users and multimedia content to retrieve and summarize information. Despite their growing influence, the impact of LLM inaccuracies and potential vulnerabilities on multimedia information-seeking tasks remains largely unexplored. We investigate how generative AI affects accuracy, efficiency, and confidence in information retrieval from videos. We conduct an experiment with around 900 participants on 8,000+ video-based information-seeking tasks, comparing behavior across three conditions: (1) access to videos only, (2) access to videos with LLM-based AI assistance, and (3) access to videos with a deceiving AI assistant designed to provide false answers. We find that AI assistance increases accuracy by 3-7% when participants viewed the relevant video segment, and by 27-35% when they did not. Efficiency increases by 10% for short videos and 25% for longer ones. However, participants tend to over-rely on AI outputs, resulting in accuracy drops of up to 32% when interacting with the deceiving AI. Alarmingly, self-reported confidence in answers remains stable across all three conditions. Our findings expose fundamental safety risks in AI-mediated video information retrieval.

Authors:Irene Hou, Alexander Qin, Lauren Cheng, Philip J. Guo
Title: Beyond the Desk: Barriers and Future Opportunities for AI to Assist Scientists in Embodied Physical Tasks
Abstract:
More scientists are now using AI, but prior studies have examined only how they use it 'at the desk' for computer-based work. However, given that scientific work often happens 'beyond the desk' at lab and field sites, we conducted the first study of how scientific practitioners use AI for embodied physical tasks. We interviewed 12 scientific practitioners doing hands-on lab and fieldwork in domains like nuclear fusion, primate cognition, and biochemistry, and found three barriers to AI adoption in these settings: 1) experimental setups are too high-stakes to risk AI errors, 2) constrained environments make it hard to use AI, and 3) AI cannot match the tacit knowledge of humans. Participants then developed speculative designs for future AI assistants to 1) monitor task status, 2) organize lab-wide knowledge, 3) monitor scientists' health, 4) do field scouting, 5) do hands-on chores. Our findings point toward AI as background infrastructure to support physical work rather than replacing human expertise.

Authors:Mohammad Hadi Nezhad, Francisco Enrique Vicente Castro, Ivon Arroyo
Title: Investigating In-Context Privacy Learning by Integrating User-Facing Privacy Tools into Conversational Agents
Abstract:
Supporting users in protecting sensitive information when using conversational agents (CAs) is crucial, as users may undervalue privacy protection due to outdated, partial, or inaccurate knowledge about privacy in CAs. Although privacy knowledge can be developed through standalone resources, it may not readily translate into practice and may remain detached from real-time contexts of use. In this study, we investigate in-context, experiential learning by examining how interactions with privacy tools during chatbot use enhance users' privacy learning. We also explore interface design features that facilitate engagement with these tools and learning about privacy by simulating ChatGPT's interface which we integrated with a just-in-time privacy notice panel. The panel intercepts messages containing sensitive information, warns users about potential sensitivity, offers protective actions, and provides FAQs about privacy in CAs. Participants used versions of the chatbot with and without the privacy panel across two task sessions designed to approximate realistic chatbot use. We qualitatively analyzed participants' pre- and post-test survey responses and think-aloud transcripts and describe findings related to (a) participants' perceptions of privacy before and after the task sessions and (b) interface design features that supported or hindered user-led protection of sensitive information. Finally, we discuss future directions for designing user-facing privacy tools in CAs that promote privacy learning and user engagement in protecting privacy in CAs.

Authors:Shivam Shukla, Emily Chen, Manhaz Roshanaei, Magy Seif El-Nasr
Title: Relationship-Centered Care: Relatedness and Responsible Design for Human Connections in Mental-Health Care
Abstract:
There has been a growing research interest in Digital Therapeutic Alliance (DTA) as the field of AI-powered conversational agents are being deployed in mental health care, particularly those delivering CBT (Cognitive Behaviour Therapy). Our proposition argues that the current design paradigm which seeks to optimize the bond between a patient in need of support and an AI agent contains a subtle but consequential trap: it risks producing an "appearance of connection" that unintentionally disrupts the fundamental human need for relatedness, which potentially displaces the authentic human relationships upon which long-term psychological recovery depends. We propose a reorientation from designing artificial intelligence tools that simulate relationships to designing AI that scaffolds them. To operationalize our argument, we propose an interdisciplinary model that translates the Responsible AI Six Sphere Framework through the lens of Self-Determination Theory (SDT), with a specific focus on the basic psychological need for relatedness. The resulting model offers the technical and often clinical communities a set of relationship-centered design guidelines and relevant provocations for building AI systems that function not just as companions, but as a catalyst for strengthening a patient's entire relational ecology; their connections with therapists, caregivers, family, and peers. In doing so, we discuss a model towards a more sustainable ecosystem of relationship-centered AI in mental health care.

Authors:Jingruo Chen, Yibo Meng, Kexin Nie
Title: "Not Just Me and My To-Do List": Understanding Challenges of Task Management for Adults with ADHD and the Need for AI-Augmented Social Scaffolds
Abstract:
Adults with ADHD often face challenges with task management, not due to a lack of willpower, but because of emotional and relational misalignments between cognitive needs and normative infrastructures. Existing productivity tools, designed for neurotypical users, often assume consistent self-regulation and linear time, overlooking these differences. We conducted 22 semi-structured interviews with ADHD-identifying adults, exploring their challenges in task management and their coping mechanisms through socially and emotionally scaffolded strategies. Building on these insights, we conducted a follow-up speed dating study with 20 additional ADHD-identifying adults, focusing on 13 speculative design concepts that leverage AI for task support. Our findings reveal that task management among adults with ADHD is relationally and affectively co-constructed, rather than an isolated individual act. Overall, we provide (1) empirical insights into distributed and emotionally scaffolded task management practices, (2) design implications for socially-aware AI systems that support co-regulation and nonlinear attention rhythms, and (3)an analysis of user preferences for different AI design concepts, clarifying which features were most valued and why.

Authors:Xingyu Lan, Xi Li, Yixing Zhang, Mengqin Cheng, Jiazhe Wang, Siming Chen
Title: The Evolving Duet of Two Modalities: A Survey on Integrating Text and Visualization for Data Communication
Abstract:
Text plays a fundamental yet understudied role as a narrative device in data visualization. While existing research has extensively explored text as data input and interaction modality, its function in supporting storytelling and interpretation remains fragmented. To address this gap, this work presents a systematic review of 98 publications that provide insights into using text as narrative. We investigate how text can be utilized in visualization, analyze its functions and effects, and explore how it can be designed to facilitate data communication. Our synthesis identifies significant research gaps in this domain and proposes future directions to advance the integration of text and visualization, ultimately aiming to provide guidance for designing text that enhances narrative clarity and fosters engagement.

Authors:Amine Benamara, Céline Clavel, Brian Ravenet, Nicolas Sabouret, Julien Saunier
Title: Exploring the role of embodiment on intimacy perception in a multiparty collaborative task
Abstract:
During collaborative board games, cohesion represents a key aspect to define a well functionning group. From the success of the task to the developement of interpersonal relationship, this concept covers many aspects of group dynamics. The goal of our work is to investigate the factors that impact cohesion in a group, and specifically the relevant social skills that improve collaboration between multiple entities. In this article, we focus on the role of embodiement on different aspects of an interaction. We propose an experimental protocol, based on a collected corpus of humans playing a collaborative board game, to study how different agents' embodiment affect the perception of these agents and of the group as a whole. We conclude by presenting an outline of the problematics of the conception of the protocol and of multi-agent system related challenges.

Authors:Amandine M. Caut, Beimnet Zenebe, Amy Rouillard, David J. T. Sumpter
Title: What You Prompt is What You Get: Increasing Transparency of Prompting Using Prompt Cards
Abstract:
The rapid advancement and impressive capabilities of large language models (LLMs) have given rise to the field of prompt engineering, the practice of crafting inputs to guide LLMs toward high-quality, task-relevant outputs. A critical challenge facing the field is the lack of standardised prompt documentation and evaluation practices. Prompts can be long, complex and difficult to evaluate on subjective tasks. To address this challenge, we propose the use of prompt cards, structured summaries of prompt engineering practices inspired by the concept of model cards. Through prompt cards, the specific goals, considerations and steps taken during prompt engineering can be systematically documented and assessed. We present the prompt card approach and illustrate it on a specific task called wordalisation, in which structured numerical data is transformed into text. We argue that a well-structured prompt card can enable better reproducibility, transparency, improve prompt methodology and give an effective alternative to benchmarking for judging the quality of generated texts. By systemically capturing underlying model details, prompt intent, contextualisation strategies, evaluation practices and ethical considerations, prompt cards make explicit the often implicit design decisions that shape system behaviour. Documenting these choices is important as prompting increasingly involves complex pipelines with multiple moving parts.

Authors:Necva Bölücü, Jessica Irons, Changhyun Lee, Brian Jin, Maciej Rybinski, Huichen Yang, Andreas Duenser, Stephen Wan
Title: Using a Human-AI Teaming Approach to Create and Curate Scientific Datasets with the SCILIRE System
Abstract:
The rapid growth of scientific literature has made manual extraction of structured knowledge increasingly impractical. To address this challenge, we introduce SCILIRE, a system for creating datasets from scientific literature. SCILIRE has been designed around Human-AI teaming principles centred on workflows for verifying and curating data. It facilitates an iterative workflow in which researchers can review and correct AI outputs. Furthermore, this interaction is used as a feedback signal to improve future LLM-based inference. We evaluate our design using a combination of intrinsic benchmarking outcomes together with real-world case studies across multiple domains. The results demonstrate that SCILIRE improves extraction fidelity and facilitates efficient dataset creation.

Authors:Kotaro Fujimura, Hiroki Kusuyama, Masaki Takeuchi, Daisuke Iwai
Title: High-Contrast Projection Mapping under Light Field Illumination with LED Display and Aperiodic Lens Array
Abstract:
Projection Mapping (PM) is a technology that projects images onto the surfaces of physical objects, allowing multiple users to share an augmented reality experience without special devices. However, its practical use has been constrained by the need for dark environments to ensure high-quality projection. To overcome this ``dark-room constraint,'' we propose a novel target-excluding lighting method that selectively illuminates the surrounding environment while avoiding the PM target. Our system achieves light-field illumination by combining an LED display panel with an optimized aperiodic lens array. The key contributions include a compact form factor that provides a large effective light source area, reproducing natural soft shadows comparable to typical lighting, while maintaining the spatial controllability needed to precisely avoid the target. We also introduce a computational technique for optimizing aperiodic lens placement to suppress undesired dark spots caused by crosstalk, and efficient methods for computing LED luminance patterns that enable dynamic PM. Experiments with a prototype system demonstrate that our approach achieves high-contrast PM even in bright environments.

Authors:Takahiro Okamoto, Masaki Takeuchi, Masataka Sawayama, Daisuke Iwai
Title: Shadowless Projection Mapping for Tabletop Workspaces with Synthetic Aperture Projector
Abstract:
Projection mapping (PM) enables augmented reality (AR) experiences without requiring users to wear head-mounted displays and supports multi-user interaction. It is regarded as a promising technology for a variety of applications in which users interact with content superimposed onto augmented objects in tabletop workspaces, including remote collaboration, healthcare, industrial design, urban planning, artwork creation, and office work. However, conventional PM systems often suffer from projection shadows when users occlude the light path. Prior approaches employing multiple distributed projectors can compensate for occlusion, but suffer from latency due to computational processing, degrading the user experience. In this research, we introduce a synthetic-aperture PM system that uses a significantly larger number of projectors, arranged densely in the environment, to achieve delay-free, shadowless projection for tabletop workspaces without requiring computational compensation. To address spatial resolution degradation caused by subpixel misalignment among overlaid projections, we develop and validate an offline blur compensation method whose computation time remains independent of the number of projectors. Furthermore, we demonstrate that our shadowless PM plays a critical role in achieving a fundamental goal of PM: altering material properties without evoking projection-like impression. Specifically, we define this perceptual impression as ``sense of projection (SoP)'' and establish a PM design framework to minimize the SoP based on user studies.

Authors:Jessica Irons, Patrick Cooper, Necva Bolucu, Roelien Timmer, Huichen Yang, Changhyun Lee, Brian Jin, Andreas Duenser, Stephen Wan
Title: To Believe or Not To Believe: Comparing Supporting Information Tools to Aid Human Judgments of AI Veracity
Abstract:
With increasing awareness of the hallucination risks of generative artificial intelligence (AI), we see a growing shift toward providing information tooling to help users determine the veracity of AI-generated answers for themselves. User responsibility for assessing veracity is particularly critical for certain sectors that rely on on-demand, AI-generated data extraction, such as biomedical research and the legal sector. While prior work offers us a variety of ways in which systems can provide such support, there is a lack of empirical evidence on how this information is actually incorporated into the user's decision-making process. Our user study takes a step toward filling this knowledge gap. In the context of a generative AI data extraction tool, we examine the relationship between the type of supporting information (full source text, passage retrieval, and Large Language Model (LLM) explanations) and user behavior in the veracity assessment process, examined through the lens of efficiency, effectiveness, reliance and trust. We find that passage retrieval offers a reasonable compromise between accuracy and speed, with judgments of veracity comparable to using the full source text. LLM explanations, while also enabling rapid assessments, fostered inappropriate reliance and trust on the data extraction AI, such that participants were less likely to detect errors. In additiona, we analyzed the impacts of the complexity of the information need, finding preliminary evidence that inappropriate reliance is worse for complex answers. We demonstrate how, through rigorous user evaluation, we can better develop systems that allow for effective and responsible human agency in veracity assessment processes.

Authors:Pronob Kumar Barman, James R. Foulds, Tera L. Reynolds
Title: Understanding User Perceptions of Human-centered AI-Enhanced Support Group Formation in Online Healthcare Communities
Abstract:
Peer support is critical to managing chronic health conditions. Online health communities (OHCs) enable patients and caregivers to connect with similar others, yet their large scale makes it challenging to find the most relevant peers and content. This study assessed perceived value, preferred features, and acceptance conditions for algorithmically personalized support group formation within OHCs. A two-phase, mixed-methods survey (N=165) examined OHC participation patterns, personalization priorities, and acceptance of a simulated personalized support group. Perceived value of the simulated support group was high (mean 4.55/5; 62.8% rated 5/5) and 91.5% would join this group. The importance participants placed on peer matching strongly correlated with perceived value (\r{ho}=0.764, p<0.001). Qualitative findings revealed conditional acceptance: participants demand security, transparency, human oversight, and user control over data. Personalized support groups may be desired, but they will not be adopted unless trust, privacy, and algorithmic governance concerns are addressed.

Authors:Sophia Liu, Shm Garanganao Almeda
Title: Chasing RATs: Tracing Reading for and as Creative Activity
Abstract:
Creativity research has privileged making over the interpretive labor that precedes and shapes it. We introduce Reading Activity Traces (RATs), a proposal that treats reading -- broadly defined to include navigating, interpreting, and curating media across interconnected sources -- as creative activity both for future artifacts and as a form of creation in its own right. By tracing trajectories of traversal, association, and reflection as inspectable artifacts, RATs render visible the creative work that algorithmic feeds and AI summarization increasingly compress and automate away. We illustrate this through WikiRAT, a speculative instantiation on Wikipedia, and open new ground for reflective practice, reader modeling, collective sensemaking, and understanding what is lost when human interpretation is automated -- towards designing intelligent tools that preserve it.

Authors:Bahare Riahi, Sayali Patukale, Joy Niranjan, Yogya Koneru, Tiffany Barnes, Veronica Cateté
Title: AI-Generated Rubric Interfaces: K-12 Teachers' Perceptions and Practices
Abstract:
This study investigates K--12 teachers' perceptions and experiences with AI-supported rubric generation during a summer professional development workshop ($n = 25$). Teachers used MagicSchool.ai to generate rubrics and practiced prompting to tailor criteria and performance levels. They then applied these rubrics to provide feedback on a sample block-based programming activity, followed by using a chatbot to deliver rubric-based feedback for the same work. Data were collected through pre- and post-workshop surveys, open discussions, and exit tickets. We used thematic analysis to analyze the qualitative data. Teachers reported that they rarely create rubrics from scratch because the process is time-consuming and defining clear distinctions between performance levels is challenging. After hands-on use, teachers described AI-generated rubrics as strong starting drafts that improved structure and clarified vague criteria. However, they emphasized the need for teacher oversight due to generic or grade-misaligned language, occasional misalignment with instructional priorities, and the need for substantial editing. Survey results indicated high perceived clarity and ethical acceptability, moderate alignment with assignments, and usability as the primary weakness -- particularly the ability to add, remove, or revise criteria. Open-ended responses highlighted a ``strictness-versus-detail'' trade-off: AI feedback was often perceived as harsher but more detailed and scalable. As a result, teachers expressed conditional willingness to adopt AI rubric tools when workflows support easy customization and preserve teacher control.

Authors:Siyu Lu, Yanhan Liu, Shiyu Xu, Ruishi Zou, Chen Ye
Title: Graphing Inline: Understanding Word-scale Graphics Use in Scientific Papers
Abstract:
Graphics (e.g., figures and charts) are ubiquitous in scientific papers, yet separating graphics from text increases cognitive load in understanding text-graphic connections. Research has found that word-scale graphics, or visual embellishments at typographic size, can augment original text, making it more expressive and easier to understand. However, whether, if so, how scientific papers adopt word-scale graphics for scholarly communication remains unclear. To address this gap, we conducted a corpus study reviewing 909 word-scale graphics extracted from 126,797 scientific papers. Through analysis, we propose a framework that characterizes where (positioning), why (communicative function), and how (visual representation) authors apply word-scale graphics in scientific papers. Our findings reveal that word-scale graphics are rarely used, that icons dominate visual representation, and that visual representation connects with communicative function (e.g., using quantitative graphs for data annotation). We further discuss opportunities to enhance scholarly communication with word-scale graphics through technical and administrative innovations.

Authors:Advait Bhat, Marianne Aubin Le Quéré, Mor Naaman, Maurice Jakesch
Title: Reactive Writers: How Co-Writing with AI Changes How We Engage with Ideas
Abstract:
Emerging experimental evidence shows that writing with AI assistance can change both the views people express in writing and the opinions they hold afterwards. Yet, we lack substantive understanding of procedural and behavioral changes in co-writing with AI that underlie the observed opinion-shaping power of AI writing tools. We conducted a mixed-methods study, combining retrospective interviews with 19 participants about their AI co-writing experience with a quantitative analysis tracing engagement with ideas and opinions in 1{,}291 AI co-writing sessions. Our analysis shows that engaging with the AI's suggestions -- reading them and deciding whether to accept them -- becomes a central activity in the writing process, taking away from more traditional processes of ideation and language generation. As writers often do not complete their own ideation before engaging with suggestions, the suggested ideas and opinions seeded directions that writers then elaborated on. At the same time, writers did not notice the AI's influence and felt in full control of their writing, as they -- in principle -- could always edit the final text. We term this shift \textit{Reactive Writing}: an evaluation-first, suggestion-led writing practice that departs substantially from conventional composing in the presence of AI assistance and is highly vulnerable to AI-induced biases and opinion shifts.

Authors:Ninghao Wan, Jiarun Song, Fuzheng Yang
Title: Dynamic Multimodal Expression Generation for LLM-Driven Pedagogical Agents: From User Experience Perspective
Abstract:
In virtual reality (VR) educational scenarios, Pedagogical agents (PAs) enhance immersive learning through realistic appearances and interactive behaviors. However, most existing PAs rely on static speech and simple gestures. This limitation reduces their ability to dynamically adapt to the semantic context of instructional content. As a result, interactions often lack naturalness and effectiveness in the teaching process. To address this challenge, this study proposes a large language model (LLM)-driven multimodal expression generation method that constructs semantically sensitive prompts to generate coordinated speech and gesture instructions, enabling dynamic alignment between instructional semantics and multimodal expressive behaviors. A VR-based PA prototype was developed and evaluated through user experience-oriented subjective experiments. Results indicate that dynamically generated multimodal expressions significantly enhance learners' perceived learning effectiveness, engagement, and intention to use, while effectively alleviating feelings of fatigue and boredom during the learning process. Furthermore, the combined dynamic expression of speech and gestures notably enhances learners' perceptions of human-likeness and social presence. The findings provide new insights and design guidelines for building more immersive and naturally expressive intelligent PAs.

Authors:SangYeop Jeong, Yeongseo Na, Seung Gyu Jeong, Jin-Woo Jeong, Seong-Eun Kim
Title: Reading the Mood Behind Words: Integrating Prosody-Derived Emotional Context into Socially Responsive VR Agents
Abstract:
In VR interactions with embodied conversational agents, users' emotional intent is often conveyed more by how something is said than by what is said. However, most VR agent pipelines rely on speech-to-text processing, discarding prosodic cues and often producing emotionally incongruent responses despite correct semantics. We propose an emotion-context-aware VR interaction pipeline that treats vocal emotion as explicit dialogue context in an LLM-based conversational agent. A real-time speech emotion recognition model infers users' emotional states from prosody, and the resulting emotion labels are injected into the agent's dialogue context to shape response tone and style. Results from a within-subjects VR study (N=30) show significant improvements in dialogue quality, naturalness, engagement, rapport, and human-likeness, with 93.3% of participants preferring the emotion-aware agent.

Authors:Jiarun Song, Ninghao Wan, FuZheng Yang, Weisi Lin
Title: From Perception to Cognition: How Latency Affects Interaction Fluency and Social Presence in VR Conferencing
Abstract:
Virtual reality (VR) conferencing has the potential to provide geographically dispersed users with an immersive environment, enabling rich social interactions and user experience using avatars. However, remote communication in VR inevitably introduces end-to-end (E2E) latency, which can significantly impact user experience. To clarify the impact of latency, we conducted subjective experiments to analyze how it influences interaction fluency from the perspective of quality perception and social presence from the perspective of social cognition, comparing VR conferencing with traditional video conferencing (VC). Specifically, interaction fluency emphasizes user perception of interaction pace and responsiveness and is assessed using Absolute Category Rating (ACR) method. In contrast, social presence focuses on the cognitive understanding of interaction, specifically whether individuals can comprehend the intentions, emotions, and behaviors expressed by others. It is primarily measured using the Networked Minds Social Presence Inventory (NMSPI). Building on this analysis, we further investigate the relationship between interaction fluency and social presence under different latency conditions to clarify the underlying perceptual and cognitive mechanisms. The findings from these subjective tests provide meaningful insights for optimizing the related systems, helping to improve interaction fluency and enhancing social presence in immersive virtual environments.

Authors:Jacek Małecki, Alexander Mathiesen-Ohman, Katarzyna Tworek
Title: A Decentralized Frontier AI Architecture Based on Personal Instances, Synthetic Data, and Collective Context Synchronization
Abstract:
Recent progress in artificial intelligence has been driven largely by the scaling of centralized large language models through increased parameters, datasets, and computational resources. While effective, this paradigm introduces structural constraints related to compute concentration, energy consumption, data availability, and governance. This paper proposes an alternative architectural approach through the H3LIX Decentralized Frontier Model Architecture (DFMA), a distributed AI framework in which locally operating AI instances generate synthetic learning signals derived from reasoning processes and interactions. These signals are aggregated within a shared contextual substrate termed the Collective Context Field (CCF), which conditions reasoning behavior across the network without requiring direct parameter synchronization. By enabling contextual signal propagation rather than centralized retraining at every iteration, the architecture can be designed to support privacy-preserving collective learning under explicit assumptions, while facilitating distributed sharing of learned abstractions. The system further integrates Energy-Adaptive Model Evolution, aligning learning activities with renewable energy availability to support more sustainable AI infrastructure. Conceptually, the architecture reframes artificial intelligence as a distributed cognitive system analogous to biological neural networks, in which intelligence emerges from the interaction of many locally adaptive agents within a shared contextual environment. Together, these mechanisms suggest a new scaling pathway for artificial intelligence systems based on distributed contextual learning and collective experience accumulation.

Authors:Jan Ulrich Bartels, Alexander Achberger, Katherine J. Kuchenbecker, Michael Sedlmair
Title: Rendering Forces With a Modular Cable System, Motors, and Brakes
Abstract:
We describe the hardware design, force-rendering approach, and evaluation of a new reconfigurable haptic interface consisting of a network of hybrid motor-brake actuation modules that apply forces via cables. Each module contains both a motor and a brake, enabling it to smoothly render active forces up to 6 N using its motor and collision forces up to 186 N using its passive one-way brake. The modular design, meanwhile, allows the system to deliver rich haptic feedback in a flexible number of DoF and widely ranging configurations.

Authors:Haomiaomiao Wang, Tomás E Ward, Lili Zhang
Title: Rigidity in LLM Bandits with Implications for Human-AI Dyads
Abstract:
We test whether LLMs show robust decision biases. Treating models as participants in two-arm bandits, we ran 20000 trials per condition across four decoding configurations. Under symmetric rewards, models amplified positional order into stubborn one-arm policies. Under asymmetric rewards, they exploited rigidly yet underperformed an oracle and rarely re-checked. The observed patterns were consistent across manipulations of temperature and top-p, with top-k held at the provider default, indicating that the qualitative behaviours are robust to the two decoding knobs typically available to practitioners. Crucially, moving beyond descriptive metrics to computational modelling, a hierarchical Rescorla-Wagner-softmax fit revealed the underlying strategies: low learning rates and very high inverse temperatures, which together explain both noise-to-bias amplification and rigid exploitation. These results position minimal bandits as a tractable probe of LLM decision tendencies and motivate hypotheses about how such biases could shape human-AI interaction.

Authors:Yoshiki Tanaka, Michimasa Inaba
Title: User Review Writing via Interview with Dialogue Systems
Abstract:
User reviews on e-commerce and review sites are crucial for making purchase decisions, although creating detailed reviews is time-consuming and labor-intensive. In this study, we propose a novel use of dialogue systems to facilitate user review creation by generating reviews from information gathered during interview dialogues with users. To validate our approach, we implemented our system using GPT-4 and conducted comparative experiments from the perspectives of system users and review readers. The results indicate that participants who used our system rated their interactions positively. Additionally, reviews generated by our system required less editing to achieve user satisfaction compared to those by the baseline. We also evaluated the reviews from the reader' perspective and found that our system-generated reviews are more helpful than those written by humans. Despite challenges with the fluency of the generated reviews, our method offers a promising new approach to review writing.

Authors:Kaleen Shrestha, Harish Dukkipati, Avni Hulyalkar, Kyla Penamante, Ankita Samanta, Maja Matarić
Title: Exploring Socially Assistive Peer Mediation Robots for Teaching Conflict Resolution to Elementary School Students
Abstract:
In peer mediation--an approach to conflict resolution used in many K-12 schools in the United States--students help other students to resolve conflicts. For schools without peer mediation programs, socially assistive robots (SARs) may be able to provide an accessible option to practice peer mediation. We investigate how elementary school students react to a peer mediator role-play activity through an exploratory study with SARs. We conducted a small single-session between-subjects study with 12 participants. The study had two conditions, one with two robots acting as disputants, and the other without the robots and just the tablet. We found that a majority of students had positive feedback on the activity, with many students saying the peer mediation practice helped them feel better about themselves. Some said that the activity taught them how to help friends during conflict, indicating that the use of SARs for peer mediation practice is promising. We observed that participants had varying reading levels that impacted their ability to read and dictate the turns in the role-play script, an important consideration for future study design. Additionally, we found that some participants were more expressive while reading the script and throughout the activity. Although we did not find statistical differences in pre-/post-session self-perception and quiz performance between the robot and tablet conditions, we found strong correlations (p<0.05) between certain trait-related measures and learning-related measures in the robot condition, which can inform future study design for SARs for this and related contexts.

Authors:Jieying Zhang, Steeven Villa, Abdallah El Ali
Title: Is it Me? Toward Self-Extension to AI Avatars in Virtual Reality
Abstract:
Advances in generative AI, speech synthesis, and embodied avatars enable systems that not only assist communication, but can act as proxies on users' behalf. Prior work in HCI has largely focused on systems as external tools, with less attention paid to the experiential consequences of users' speech and actions becoming assimilated with AI-generated output. We introduce the design and implementation of ProxyMe, a work-in-progress VR prototype that allows users to embody an avatar whose voice and spoken content are modified by an AI system. By combining avatar-based embodiment, voice cloning, and AI-mediated speech augmentation, ProxyMe invites the exploration of avatar self-extension: situations in which AI-modified communication is experienced as part of one's own expressive behavior. We chart out research challenges and envisioned scenarios, with a focus on how varying degrees of delegation and steerability can influence perceived agency, authorship, and self-identification.

Authors:Inha Cha, Catherine Wieczorek, Richmond Y. Wong
Title: The Values of Value in AI Adoption: Rethinking Efficiency in UX Designers' Workplaces
Abstract:
Although organizations increasingly position AI adoption as a pathway to competitiveness and innovation, organizations' perspectives on productivity and efficiency often clash with workers' perspectives on AI's economic and social value. Through design workshops with 15 UX designers, we examine how AI adoption unfolds across individual, team, and organizational scales. At the individual level, designers weighed efficiency, skill development, and professional worth. At the team level, they negotiated collaboration, responsibility, and rigor. At the organizational level, adoption was shaped by compliance requirements and organizational norms. Across these scales, discourses of efficiency carried social and ethical dimensions of responsibility, trust, and autonomy. We view adoption as a site where roles, relationships, and power are reconfigured. We argue that AI adoption should be understood as a process of negotiating values, and call for future work examining how AI systems redistribute responsibility among team members, while understanding how such shifts could strengthen worker agency.

Authors:Srishti Palani, Vidya Setlur
Title: Lexara: A User-Centered Toolkit for Evaluating Large Language Models for Conversational Visual Analytics
Abstract:
Large Language Models (LLMs) are transforming Conversational Visual Analytics (CVA) by enabling data analysis through natural language. However, evaluating LLMs for CVA remains a challenge: requiring programming expertise, overlooking real-world complexity, and lacking interpretable metrics for multi-format (visualizations and text) outputs. Through interviews with 22 CVA developers and 16 end-users, we identified use cases, evaluation criteria and workflows. We present Lexara, a user-centered evaluation toolkit for CVA that operationalizes these insights into: (i) test cases spanning real-world scenarios; (ii) interpretable metrics covering visualization quality (data fidelity, semantic alignment, functional correctness, design clarity) and language quality (factual grounding, analytical reasoning, conversational coherence) using rule-based and LLM-as-a-Judge methods; and (iii) an interactive toolkit enabling experimental setup and multi-format and multi-level exploration of results without programming expertise. We conducted a two-week diary study with six CVA developers, drawn from our initial cohort of 22. Their feedback demonstrated Lexara's effectiveness for guiding appropriate model and prompt selection.

Authors:Dorsaf Sallami, Esma Aïmeur
Title: Verify as You Go: An LLM-Powered Browser Extension for Fake News Detection
Abstract:
The rampant spread of fake news in the digital age poses serious risks to public trust and democratic institutions, underscoring the need for effective, transparent, and user-centered detection tools. Existing browser extensions often fall short due to opaque model behavior, limited explanatory support, and a lack of meaningful user engagement. This paper introduces Aletheia, a novel browser extension that leverages Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs) to detect fake news and provide evidence-based explanations. Aletheia further includes two interactive components: a Discussion Hub that enables user dialogue around flagged content and a Stay Informed feature that surfaces recent fact-checks. Through extensive experiments, we show that Aletheia outperforms state-of-the-art baselines in detection performance. Complementing this empirical evaluation, a complementary user study with 250 participants confirms the system's usability and perceived effectiveness, highlighting its potential as a transparent tool for combating online fake news.

Authors:Jian Zhang, Wafa Johal, Jarrod Knibbe
Title: Modelling Visuo-Haptic Perception Change in Size Estimation Tasks
Abstract:
Tangible interactions involve multiple sensory cues, enabling the accurate perception of object properties, such as size. Research has shown, however, that if we decouple these cues (for example, by altering the visual cue), then the resulting discrepancies present new opportunities for interactions. Perception over time though, not only relies on momentary sensory cues, but also on a priori beliefs about the object, implying a continuing update cycle. This cycle is poorly understood and its impact on interaction remains unknown. We study (N=80) visuo-haptic perception of size over time and (a) reveal how perception drifts, (b) examine the effects of visual priming and dead-reckoning, and (c) present a model of visuo-haptic perception as a cyclical, self-adjusting system. Our work has a direct impact on illusory perception in VR, but also sheds light on how our visual and haptic systems cooperate and diverge.

Authors:Joseph Walusimbi, Ann Move Oguti, Joshua Benjamin Ssentongo, Keith Ainebyona
Title: Arapai: An Offline-First AI Chatbot Architecture for Low-Connectivity Educational Environments
Abstract:
The rapid global expansion of large language models (LLMs) has created new opportunities for personalised and inquiry-driven learning. However, most AI chatbot systems for education rely on continuous internet connectivity, cloud infrastructure, and modern hardware. These requirements reinforce digital inequalities and limit the practical deployment of AI-supported learning in bandwidth-constrained and resource-limited environments worldwide. This paper presents Arapai, an offline-first AI chatbot architecture designed to operate entirely without internet connectivity on low-specification, CPU-only devices. The system integrates locally hosted, quantised language models with automatic hardware-aware model selection and pedagogically tiered response control. By performing inference fully on-device and maintaining models resident in memory for performance optimisation, Arapai delivers curriculum-aligned explanations, structured problem-solving support, and differentiated instructional depth without reliance on cloud services. A pilot deployment in secondary and tertiary institutions operating under limited-connectivity conditions evaluated the system across four dimensions: technical performance, usability, perceived answer quality, and educational impact. Results indicate stable operation on legacy hardware, acceptable response times for standard instructional queries, and positive learner and teacher perceptions regarding self-directed learning support. Rather than replacing cloud-based AI systems, this work proposes a complementary deployment paradigm for infrastructure-constrained education systems. The study contributes a hardware-aware architectural framework for decentralised AI tutoring and highlights the role of offline-first design in advancing digital inclusion and infrastructure-resilient educational technology.

Authors:Emran Poh, Yueyue Hou, Tianyi Zhang, Jiannan Li
Title: 'Show It, Don't Just Say It': The Complementary Effects of Instruction Multimodality for Software Guidance
Abstract:
Designing adaptive tutoring systems for software learning presents challenges in determining appropriate instructional modalities. To inform the design of such systems, we conducted an observational study of ten human teacher-student pairs (N=10), where experienced design software users taught novices two new graphic design software features through multi-step procedures. These lessons were limited to three communication channels (speech, visual annotations, and remote screen control) to mimic possible AI tutor modalities. We found that annotations complement speech with spatial precision and remote control complements it with spatial and temporal precision, but both cause intrusion to learner agency. Teachers adaptively select modalities to balance the need for instruction progress with students' cognitive engagement and sense of digital territory ownership. Our results provide further support to the contiguity principles and the value of agency in learning, while suggesting precision-agency trade-off and digital territoriality as new design constraints for adaptive software guidance.

Authors:Sora Kang, Jaemin Zoh, Hyoju Kim, Hyeonseo Park, Hajin Lim, Joonhwan Lee
Title: Actor's Note: Examining the Role of AI-Generated Questions in Character Journaling for Actor Training
Abstract:
Character journaling is a well-established exercise in actor training, but many actors struggle to sustain it due to cognitive burden, the blank page problem, and unclear short-term rewards. We reframe large language models not as co-authors but as maieutic partners-tools that guide reflection through context-aware questioning rather than producing text on behalf of the user. Based on this perspective, we designed Actor's Note, a journaling tool that tailors questions to the script, role, and rehearsal phase while preserving actor agency. We evaluated the system in a 14-day crossover study with 29 actors using surveys, logs, and interviews. Results indicate that the tool reduced entry barriers, supported sustained reflection, and enriched character exploration, with participants describing different benefits when AI was introduced at earlier versus later rehearsal stages. This work contributes empirical insights and design principles for creativity-support tools that sustain reflective practices while preserving artistic immersion in performance training.

Authors:Rina Buoy, Dylan berkamp Fouepe Dongmo, Vesal Khean, Simone Marinai, Koichi Kise
Title: Towards Non-Latin Text and Layout Personalization for Enhanced Readability
Abstract:
Reading has always been an integral part of both professional and personal life. Character and layout recognition and understanding by computers are well-explored areas. Nevertheless, how characters and layout are read and perceived by humans remains relatively underexplored. This work contributes to the field of human-document interaction (HDI) by investigating the effects of character and layout personalization on readability. The paper presents an empirical study on how parts-of-speech (POS)-based character and layout modifications can lead to overall improvements in both reading comprehension and memorization for two non-segmented, non-Latin writing systems: Khmer and Japanese. The experimental results from 43 participants suggest that, by bolding POS-derived content words, Khmer readers perform better on both reading comprehension and memorisation tasks, with a significant effect (p-values of 0.03 and 0.04, respectively). A similar overall tendency is also observed in a pilot study among Japanese readers (10 participants) using syntactic color-coding. In addition, the analyses of reading time, answering time, and perceived difficulty reveal that the proposed text styling technique does not increase any perceived difficulty, cognitive load, or reading effort for the Khmer readers. However, the Japanese readers experienced a decrease in reading speed. This study and its findings represent a significant step towards enabling dynamic, script-dependent personalization of character and layout to optimize human readability.

Authors:Zhimin Wang, Chenyu Gu, Feng Lu
Title: SIAgent: Spatial Interaction Agent via LLM-powered Eye-Hand Motion Intent Understanding in VR
Abstract:
Eye-hand coordinated interaction is becoming a mainstream interaction modality in Virtual Reality (VR) user interfaces.Current paradigms for this multimodal interaction require users to learn predefined gestures and memorize multiple gesture-task associations, which can be summarized as an ``Operation-to-Intent" paradigm. This paradigm increases users' learning costs and has low interaction error tolerance. In this paper, we propose SIAgent, a novel "Intent-to-Operation" framework allowing users to express interaction intents through natural eye-hand motions based on common sense and habits. Our system features two main components: (1) intent recognition that translates spatial interaction data into natural language and infers user intent, and (2) agent-based execution that generates an agent to execute corresponding tasks. This eliminates the need for gesture memorization and accommodates individual motion preferences with high error tolerance. We conduct two user studies across over 60 interaction tasks, comparing our method with two "Operation-to-Intent" techniques. Results show our method achieves higher intent recognition accuracy than gaze + pinch interaction (97.2% vs 93.1%) while reducing arm fatigue and improving usability, and user preference. Another study verifies the function of eye gaze and hand motion channels in intent recognition. Our work offers valuable insights into enhancing VR interaction intelligence through intent-driven design. Our source code and LLM prompts will be made available upon publication.

Authors:Ken Gu, Srishti Palani, Vidya Setlur
Title: "I Need to Find That One Chart": How Data Workers Navigate, Make Sense of, and Communicate Analytical Conversations
Abstract:
Conversational interfaces are increasingly used for data analysis, enabling data workers to express complex analytical intents in natural language. Yet, these interactions unfold as long, linear transcripts that are misaligned with the iterative, nonlinear nature of real-world analyses. Revisiting and summarizing conversations for different contexts is therefore challenging. This paper investigates how data workers navigate, make sense of, and communicate prior analytical conversations. To study behaviors beyond those supported by standard interfaces (i.e., scrolling and keyword search), we develop a design probe that supplements analytical conversations with structured elements and affordances (e.g., filtering, multi-level navigation and detail-on-demand). In a user study (n = 10), participants used the probe to navigate and communicate past analyses, fulfilling information needs (recall, reorient, prioritize) through navigation strategies (visual recall, sequential and abstractive) and summarization practices (adding process details and context). Based on these findings, we discuss design implications to support re-visitation and communication of analytical conversations.

Authors:Ariadni Mandala, Alexandros Gazis, Theodoros Vavouras
Title: Distance Learning and Multilingual Education: A Case Study of Challenges and Pedagogical Perspectives in the Greek Border Region
Abstract:
In increasingly multicultural and multilingual societies, foreign language learning has become essential not only for communication but also for social cohesion and professional advancement. Distance education has emerged as a flexible and accessible solution, particularly for adults seeking to enhance their linguistic and intercultural competencies. This study explores the views of foreign language teachers regarding the role of distance education in promoting multilingualism, with a specific focus on culturally diverse border regions. Conducted in the Regional Unit of Evros, Greece, the research adopts a qualitative methodology based on semi-structured interviews with five language educators working in public and private education. Findings reveal that teachers recognize the potential of digital tools such as Massive Open Online Courses (MOOCs), machine translation applications (e.g., Google Translate, DeepL), and adaptive learning platforms to support multilingual learning, particularly when used as supplementary resources. However, concerns were raised about the lack of personalized feedback, limited interactivity, and the absence of culturally contextualized content on existing platforms. Teachers emphasized the importance of digital literacy, pedagogical training, and culturally inclusive design to ensure effective implementation. The study highlights the need for targeted support for educators in border regions and calls for more locally adapted digital resources that reflect linguistic diversity. These findings offer insights for policymakers and educational technology developers aiming to improve the quality and reach of multilingual education in remote or underserved areas.

Authors:Ramtin Tabatabaei, Milad Hosseini, Ali Mohajerzarrinkelk, Ali F. Meghdari, Alireza Taheri
Title: Empirical Study of Gaze Behavior in Children and Young Adults Using Deep Neural Networks and Robot Implementation: A Comparative Analysis of Social Situations
Abstract:
In a preliminary exploratory study, our goal was to train deep neural network models to mimic children's and/or adults' gaze behavior in certain social situations to reach this objective. Additionally, we aim to identify potential differences in gaze behavior between these two age groups based on our participants' gaze data. Furthermore, we aimed to assess the practical effectiveness of our adult and children models by deploying them on a Nao robot in real-life settings. To achieve this, we first created two video clips, one animation and one live-action, to depict some social situations. Using an eye-tracking device, we collected eye-tracking data from 24 participants, including 12 children and 12 adults. Then, we utilized deep neural networks, specifically LSTM and Transformer Networks, to analyze and model the gaze patterns of each group of participants. Our results indicate that when the models attempted to predict people's locations (in the next frame), they had an accuracy in the range of 62%-70% with one attempt, which increased by ~20% when attempted twice (i.e. the two highest-ranked predicted labels as outputs). As expected, the result underscores that gaze behavior is not a wholly unique phenomenon. We obtained feedback from 57 new participants to evaluate the robot's functionality. These participants were asked to watch two videos of the robot's performance in each mode and then complete a comprehensive questionnaire. The questionnaire results indicate that the participants expressed satisfaction with the robot's interaction, including its attention, intelligence, and responsiveness to human actions. However, they did not perceive the robot as a social companion comparable to a human. This exploratory study tries to address/show potentials of the social acceptance of robots based on human nonverbal behavioral cues for future research.

Authors:Ronald Schnitzer, Maximilian Hoeving, Sonja Zillner
Title: Self-Service or Not? How to Guide Practitioners in Classifying AI Systems Under the EU AI Act
Abstract:
In August 2024, the EU Artificial Intelligence Act (AIA) came into force, marking the world's first large-scale regulatory framework for AI. Central to the AIA is a risk-based approach, aligning regulatory obligations with the potential harm posed by AI systems. To operationalize this, the AIA defines a Risk Classification Scheme (RCS), categorizing systems into four levels of risk. While this aligns with the theoretical foundations of risk-based regulations, the practical application of the RCS is complex and requires expertise across legal, technical, and domain-specific areas. Despite increasing academic discussion, little empirical research has explored how practitioners apply the RCS in real-world contexts. This study addresses this gap by evaluating how industrial practitioners apply the RCS using a self-service, web-based decision-support tool. Following a Design Science Research (DSR) approach, two evaluation phases involving 78 practitioners across diverse domains were conducted. Our findings highlight critical challenges in interpreting legal definitions and regulatory scope, and show that targeted support, such as clear explanations and practical examples, can significantly enhance the risk classification process. The study provides actionable insights for tool designers and policymakers aiming to support AIA compliance in practice.

Authors:Irene Hou, Zeyu Xiong, Philip J. Guo, April Yi Wang
Title: "Bespoke Bots": Diverse Instructor Needs for Customizing Generative AI Classroom Chatbots
Abstract:
Instructors are increasingly experimenting with AI chatbots for classroom support. To investigate how instructors adapt chatbots to their own contexts, we first analyzed existing resources that provide prompts for educational purposes. We identified ten common categories of customization, such as persona, guardrails, and personalization. We then conducted interviews with ten university STEM instructors and asked them to card-sort the categories into priorities. We found that instructors consistently prioritized the ability to customize chatbot behavior to align with course materials and pedagogical strategies and de-prioritized customizing persona/tone. However, their prioritization of other categories varied significantly by course size, discipline, and teaching style, even across courses taught by the same individual, highlighting that no single design can meet all contexts. These findings suggest that modular AI chatbots may provide a promising path forward. We offer design implications for educational developers building the next generation of customizable classroom AI systems.

Authors:Ralf Schmälzle, Yuetong Du, Sue Lim, Gary Bente
Title: The Moment of Capture: How the First Seconds of a Speaker's Nonverbal and Verbal Performance Shapes Audience Judgments
Abstract:
Why do some speakers capture a room almost instantly while others fail to connect? The real-time architecture of audience engagement remains largely a black box. Here, we used motion-captured animations to present the pure nonverbal performance of public speakers to audiences - either in silence (nonverbal-only) or paired with the verbal content (nonverbal-plus-verbal). Using continuous response measurement (CRM), we find that audience judgments solidify with remarkable speed: Moment-to-moment engagement ratings become highly predictive of subsequent evaluations within the initial 10 seconds of the performance. Most notably, this predictive relationship emerged faster and slightly stronger in the nonverbal-only condition, with predictive information being present already after less than 5 seconds. These findings elucidate the social impact a speaker's nonverbal performance has on audience impressions, even when dissociated from the verbal content of the speech. Our approach provides a high-resolution temporal map of social impression formation, pointing to an early "moment of capture" that appears to set the stage for the reception of the following message. On a broader scale, this research validates a powerful new method to isolate different communicative channels, to scientifically deconstruct rhetorical skill, and to study the pervasive impact of nonverbal behavior more broadly. It also enables us to translate the ancient art of rhetoric into a modern science of social impression formation, yielding an empirical basis that can inform human-centered feedback, develop AI-based augmentation tools, and guide the design of engaging, socially present avatars in an increasingly AI-mediated and virtual world.

Authors:Aaron Broukhim, Nadir Weibel, Eshin Jolly
Title: Same Words, Different Judgments: Modality Effects on Preference Alignment
Abstract:
Preference-based reinforcement learning (PbRL) is the dominant framework for aligning AI systems to human preferences, but its application to speech remains underexplored. We present a controlled cross-modal study of human and synthetic preference annotations, comparing text and audio evaluations of identical semantic content across 100 prompts. Audio preferences prove as reliable as text, with inter-rater agreement reaching good levels (ICC(2,k) $\approx$ .80) at $\sim$9 raters -- the first ICC-based reliability characterization in the preference annotation literature for either modality. However, modality reshapes how people judge: audio raters exhibit narrower decision thresholds, reduced length bias, and more user-oriented evaluation criteria, with near-chance cross-modality agreement. Synthetic ratings further align with human judgments and predict inter-rater agreement, supporting their use both for triaging ambiguous pairs and as full replacements for human annotations.

Authors:Timothy Bickmore, Mehdi Arjmand, Yunus Terzioglu
Title: Relational Appliances: A Robot in the Refrigerator for Home-Based Health Promotion
Abstract:
Kitchen appliances are frequently used domestic artifacts situated at the point of everyday dietary decision making, making them a promising but underexplored site for health promotion. We explore the concept of relational appliances: everyday household devices designed as embodied social actors that engage users through ongoing, personalized interaction. We focus on the refrigerator, whose unique affordances, including a fixed, sensor-rich environment, private interaction space, and close coupling to food items, support contextualized, conversational engagement during snack choices. We present an initial exploration of this concept through a pilot study deploying an anthropomorphic robotic head inside a household refrigerator. In a home-lab apartment, participants repeatedly retrieved snacks during simulated TV "commercial breaks" while interacting with a human-sized robotic head. Participants were randomized to either a health-promotion condition, in which the robot made healthy snack recommendations, or a social-chat control condition. Outcomes included compliance with recommendations, nutritional quality of selected snacks, and psychosocial measures related to acceptance of the robot. Results suggest that participants found the robot persuasive, socially engaging, and increasingly natural over time, often describing it as helpful, aware, and companionable. Most participants reported greater awareness of their snack decisions and expressed interest in having such a robot in their own home. We discuss implications for designing relational appliances that leverage anthropomorphism, trust, and long-term human-technology relationships for home-based health promotion.

Authors:Nobuhito Kasahara, Shota Yamanaka, Homei Miyashita
Title: Skewed Dual Normal Distribution Model: Predicting 1D Touch Pointing Success Rate for Targets Near Screen Edges
Abstract:
Typical success-rate prediction models for tapping exclude targets near screen edges; however, design constraints often force such placements. Additionally, in scrollable UIs any element can move close to an edge. In this work, we model how target--edge distance affects 1D touch pointing accuracy. We propose the Skewed Dual Normal Distribution Model, which assumes the tap coordinate distribution is skewed by a nearby edge. The results of two smartphone experiments showed that, as targets approached the edge, the distribution's peak shifted toward the edge and its tail extended away. In contrast to prior reports, the success rate improved when the target touched the edge, suggesting a strategy of ``tapping the target together with the edge.'' By accounting for skew, our model predicts success rates across a wide range of conditions, including edge-adjacent targets, thus extending coverage to the whole screen and informing UI design support tools.

Authors:Xiuqi Tommy Zhu, Xiaoan Liu, Casper Harteveld, Smit Desai, Eileen McGivney
Title: Conversational Successes and Breakdowns in Everyday Non-Display Smart Glasses Use
Abstract:
Non-Display Smart Glasses hold the potential to support everyday activities by combining continuous environmental sensing with voice-only interaction powered by large language models (LLMs). Understanding how conversational successes and breakdowns arise in everyday contexts can better inform the design of future voice-only interfaces. To investigate this, we conducted a month-long collaborative autoethnography (n=2) to identify patterns of successes and breakdowns when using such devices. We then compare these patterns with prior findings on voice-only interactions to highlight the unique affordances and opportunities offered by non-display smart glasses.

Authors:Yuan Cui, Annabel Goldman, Jovy Zhou, Xiaolin Liu, Clarissa Shieh, Joshua Yao, Mia Cheng, Matthew Kay, Fumeng Yang
Title: Codesigning Ripplet: an LLM-Assisted Assessment Authoring System Grounded in a Conceptual Model of Teachers' Workflows
Abstract:
Assessments are critical in education, but creating them can be difficult. To address this challenge in a grounded way, we partnered with 13 teachers in a seven-month codesign process. We developed a conceptual model that characterizes the iterative dual process where teachers develop assessments while simultaneously refining requirements. To enact this model in practice, we built Ripplet, a web-based tool with multilevel reusable interactions to support assessment authoring. The extended codesign revealed that Ripplet enabled teachers to create formative assessments they would not have otherwise made, shifted their practices from generation to curation, and helped them reflect more on assessment quality. In a user study with 15 additional teachers, compared to their current practices, teachers felt the results were more worth their effort and that assessment quality improved.

Authors:Lauren Vogelstein, Vedya Konda, Deborah Fields, Yasmin Kafai, Luis Morales-Navarro, Danaé Metaxa
Title: Rapid Testing, Duck Lips, and Tilted Cameras: Youth Everyday Algorithm Auditing Practices with Generative AI Filters
Abstract:
Today's youth have extensive experience interacting with artificial intelligence and machine learning applications on popular social media platforms, putting youth in a unique position to examine, evaluate, and even challenge these applications. Algorithm auditing is a promising candidate for connecting youth's everyday practices in using AI applications with more formal scientific literacies (syncretic designs). In this paper, we analyze high school youth participants' everyday algorithm auditing practices when interacting with generative AI filters on TikTok, revealing thorough and extensive examinations, with youth rapidly testing filters with sophisticated camera variations and facial manipulations to identify filter limitations. In the discussion, we address how these findings can provide a foundation for developing designs that bring together everyday and more formal algorithm auditing.

Authors:Hasan Amin, Ming Yin, Rajiv Khanna
Title: Align When They Want, Complement When They Need! Human-Centered Ensembles for Adaptive Human-AI Collaboration
Abstract:
In human-AI decision making, designing AI that complements human expertise has been a natural strategy to enhance human-AI collaboration, yet it often comes at the cost of decreased AI performance in areas of human strengths. This can inadvertently erode human trust and cause them to ignore AI advice precisely when it is most needed. Conversely, an aligned AI fosters trust yet risks reinforcing suboptimal human behavior and lowering human-AI team performance. In this paper, we start by identifying this fundamental tension between performance-boosting (i.e., complementarity) and trust-building (i.e., alignment) as an inherent limitation of the traditional approach for training a single AI model to assist human decision making. To overcome this, we introduce a novel human-centered adaptive AI ensemble that strategically toggles between two specialist AI models - the aligned model and the complementary model - based on contextual cues, using an elegantly simple yet provably near-optimal Rational Routing Shortcut mechanism. Comprehensive theoretical analyses elucidate why the adaptive AI ensemble is effective and when it yields maximum benefits. Moreover, experiments on both simulated and real-world data show that when humans are assisted by the adaptive AI ensemble in decision making, they can achieve significantly higher performance than when they are assisted by single AI models that are trained to either optimize for their independent performance or even the human-AI team performance.

Authors:Olga Viberg, Mutlu Cukurova, Rene F. Kizilcec, Simon Buckingham Shum, Dorottya Demszky, Dragan Gašević, Thorben Jansen, Ioana Jivet, Jelena Jovanovic, Jennifer Meyer, Kou Murayama, Zach Pardos, Chris Piech, Nikol Rummel, Naomi E. Winstone
Title: Protecting and Promoting Human Agency in Education in the Age of Artificial Intelligence
Abstract:
Human agency is crucial in education and increasingly challenged by the use of generative AI. This meeting report synthesizes interdisciplinary insights and conceptualizes four aspects that delineate human agency: human oversight, AI-human complementarity, AI competencies, and relational emergence. We explore practical dilemmas for protecting and promoting agency, focusing on normative constraints, transparency, and cognitive offloading, and highlight key tensions and implications to inform ethical and effective AI integration in education.

Authors:Kirk Vanacore, Danielle R Thomas, Digory Smith, Bibi Groot, Justin Reich, Rene Kizilcec
Title: A Causal Framework for Estimating Heterogeneous Effects of On-Demand Tutoring
Abstract:
This paper introduces a scalable causal inference framework for estimating the immediate, session-level effects of on-demand human tutoring embedded within adaptive learning systems. Because students seek assistance at moments of difficulty, conventional evaluation is confounded by self-selection and time-varying knowledge states. We address these challenges by integrating principled analytic sample construction with Deep Knowledge Tracing (DKT) to estimate latent mastery, followed by doubly robust estimation using Causal Forests. Applying this framework to over 5,000 middle-school mathematics tutoring sessions, we find that requesting human tutoring increases next-problem correctness by approximately 4 percentage points and accuracy on the subsequent skill encountered by approximately 3 percentage points, suggesting that the effects of tutoring have proximal transfer across knowledge components. This effect is robust to various forms of model specification and potential unmeasured confounders. Notably, these effects exhibit significant heterogeneity across sessions and students, with session-level effect estimates ranging from $-20.25pp$ to $+19.91pp$. Our follow-up analyses suggest that typical behavioral indicators, such as student talk time, do not consistently correlate with high-impact sessions. Furthermore, treatment effects are larger for students with lower prior mastery and slightly smaller for low-SES students. This framework offers a rigorous, practical template for the evaluation and continuous improvement of on-demand human tutoring, with direct applications for emerging AI tutoring systems.

Authors:Lefan Lai, Tinghui Li, Zhanna Sarsenbayeva, Brandon Victor Syiem
Title: Searching Through Complex Worlds: Visual Search and Spatial Regularity Memory in Mixed Reality
Abstract:
Visual search is a core component of mixed reality (MR) interactions, influenced by the complexities of MR application contexts. In this paper, we investigate how prevalent factors in MR influence visual search performance and spatial regularity memory -- including the physical environment complexity, secondary task presence, virtual content depth and spatial layout configurations. Contrary to prior work, we found that the secondary auditory task did not have a significant main effect on visual search performance, while significantly elevating higher perceived workload measures in all conditions. Complex environments and varied virtual elements depths significantly hinder visual search, but did not significantly increase perceived workload measures. Finally, participants did not explicitly recognize repeated spatial configurations of virtual elements, but performed significantly better when searching repeated spatial configurations, suggesting implicit memory of spatial regularities. Our work presents novel insights on visual search and highlights key considerations when designing MR for different application contexts.

Authors:Bijean Ghafouri, Emilio Ferrara
Title: Lost Before Translation: Social Information Transmission and Survival in AI-AI Communication
Abstract:
When AI systems summarize and relay information, they inevitably transform it. But how? We introduce an experimental paradigm based on the telephone game to study what happens when AI talks to AI. Across five studies tracking content through AI transmission chains, we find three consistent patterns. The first is convergence, where texts differing in certainty, emotional intensity, and perspectival balance collapse toward a shared default of moderate confidence, muted affect, and analytical structure. The second is selective survival, where narrative anchors persist while the texture of evidence, hedges, quotes, and attributions is stripped away. The third is competitive filtering, where strong arguments survive while weaker but valid considerations disappear when multiple viewpoints coexist. In downstream experiments, human participants rated AI-transmitted content as more credible and polished. Importantly, however, humans also showed degraded factual recall, reduced perception of balance, and diminished emotional resonance. We show that the properties that make AI-mediated content appear authoritative may systematically erode the cognitive and affective diversity on which informed judgment depends.

Authors:Rahul Nanda, Chandra Maddila, Smriti Jha, Euna Mehnaz Khan, Matteo Paltenghi, Satish Chandra
Title: Wink: Recovering from Misbehaviors in Coding Agents
Abstract:
Autonomous coding agents, powered by large language models (LLMs), are increasingly being adopted in the software industry to automate complex engineering tasks. However, these agents are prone to a wide range of misbehaviors, such as deviating from the user's instructions, getting stuck in repetitive loops, or failing to use tools correctly. These failures disrupt the development workflow and often require resource-intensive manual intervention. In this paper, we present a system for automatically recovering from agentic misbehaviors at scale. We first introduce a taxonomy of misbehaviors grounded in an analysis of production traffic, identifying three primary categories: Specification Drift, Reasoning Problems, and Tool Call Failures, which we find occur in about 30% of all agent trajectories. To address these issues, we developed a lightweight, asynchronous self-intervention system named Wink. Wink observes agent trajectories and provides targeted course-correction guidance to nudge the agent back to a productive path. We evaluated our system on over 10,000 real world agent trajectories and found that it successfully resolves 90% of the misbehaviors that require a single intervention. Furthermore, a live A/B test in our production environment demonstrated that our system leads to a statistically significant reduction in Tool Call Failures, Tokens per Session and Engineer Interventions per Session. We present our experience designing and deploying this system, offering insights into the challenges of building resilient agentic systems at scale.

Authors:Dhiman Goswami, Jai Kruthunz Naveen Kumar, Sanchari Das
Title: NLP Privacy Risk Identification in Social Media (NLP-PRISM): A Survey
Abstract:
Natural Language Processing (NLP) is integral to social media analytics but often processes content containing Personally Identifiable Information (PII), behavioral cues, and metadata raising privacy risks such as surveillance, profiling, and targeted advertising. To systematically assess these risks, we review 203 peer-reviewed papers and propose the NLP Privacy Risk Identification in Social Media (NLP-PRISM) framework, which evaluates vulnerabilities across six dimensions: data collection, preprocessing, visibility, fairness, computational risk, and regulatory compliance. Our analysis shows that transformer models achieve F1-scores ranging from 0.58-0.84, but incur a 1% - 23% drop under privacy-preserving fine-tuning. Using NLP-PRISM, we examine privacy coverage in six NLP tasks: sentiment analysis (16), emotion detection (14), offensive language identification (19), code-mixed processing (39), native language identification (29), and dialect detection (24) revealing substantial gaps in privacy research. We further found a (reduced by 2% - 9%) trade-off in model utility, MIA AUC (membership inference attacks) 0.81, AIA accuracy 0.75 (attribute inference attacks). Finally, we advocate for stronger anonymization, privacy-aware learning, and fairness-driven training to enable ethical NLP in social media contexts.

Authors:Mikio Nakano, Hironori Takeuchi, Kazunori Komatani
Title: A Methodology for Identifying Evaluation Items for Practical Dialogue Systems Based on Business-Dialogue System Alignment Models
Abstract:
This paper proposes a methodology for identifying evaluation items for practical dialogue systems. Traditionally, user satisfaction and user experiences have been the primary metrics for evaluating dialogue systems. However, there are various other evaluation items to consider when developing and operating practical dialogue systems, and such evaluation items are expected to lead to new research topics. So far, there has been no methodology for identifying these evaluation items. We propose identifying evaluation items based on business-dialogue system alignment models, which are applications of business-IT alignment models used in the development and operation of practical IT systems. We also present a generic model that facilitates the construction of a business-dialogue system alignment model for each dialogue system.

Authors:Rohit Kaushik, Eva Kaushik
Title: A Koopman-Bayesian Framework for High-Fidelity, Perceptually Optimized Haptic Surgical Simulation
Abstract:
We introduce a unified framework that combines nonlinear dynamics, perceptual psychophysics and high frequency haptic rendering to enhance realism in surgical simulation. The interaction of the surgical device with soft tissue is elevated to an augmented state space with a Koopman operator formulation, allowing linear prediction and control of the dynamics that are nonlinear by nature. To make the rendered forces consistent with human perceptual limits, we put forward a Bayesian calibration module based on WeberFechner and Stevens scaling laws, which progressively shape force signals relative to each individual's discrimination thresholds. For various simulated surgical tasks such as palpation, incision, and bone milling, the proposed system attains an average rendering latency of 4.3 ms, a force error of less than 2.8% and a 20% improvement in perceptual discrimination. Multivariate statistical analyses (MANOVA and regression) reveal that the system's performance is significantly better than that of conventional spring-damper and energy, based rendering methods. We end by discussing the potential impact on surgical training and VR, based medical education, as well as sketching future work toward closed, loop neural feedback in haptic interfaces.

Authors:Yancheng Cao, Yishu Ji, Chris Yue Fu, Sahiti Dharmavaram, Meghan Turchioe, Natalie C Benda, Lena Mamykina, Yuling Sun, Xuhai "Orson" Xu
Title: More than Decision Support: Exploring Patients' Longitudinal Usage of Large Language Models in Real-World Healthcare-Seeking Journeys
Abstract:
Large language models (LLMs) have been increasingly adopted to support patients' healthcare-seeking in recent years. While prior patient-centered studies have examined the capabilities and experience of LLM-based tools in specific health-related tasks such as information-seeking, diagnosis, or decision-supporting, the inherently longitudinal nature of healthcare in real-world practice has been underexplored. This paper presents a four-week diary study with 25 patients to examine LLMs' roles across healthcare-seeking trajectories. Our analysis reveals that patients integrate LLMs not just as simple decision-support tools, but as dynamic companions that scaffold their journey across behavioral, informational, emotional, and cognitive levels. Meanwhile, patients actively assign diverse socio-technical meanings to LLMs, altering the traditional dynamics of agency, trust, and power in patient-provider relationships. Drawing from these findings, we conceptualize future LLMs as a longitudinal boundary companion that continuously mediates between patients and clinicians throughout longitudinal healthcare-seeking trajectories.

Authors:Baixiao Huang, Baiyu Huang, Yu Hou
Title: Learning Transferability: A Two-Stage Reinforcement Learning Approach for Enhancing Quadruped Robots' Performance in U-Shaped Stair Climbing
Abstract:
Quadruped robots are employed in various scenarios in building construction. However, autonomous stair climbing across different indoor staircases remains a major challenge for robot dogs to complete building construction tasks. In this project, we employed a two-stage end-to-end deep reinforcement learning (RL) approach to optimize a robot's performance on U-shaped stairs. The training robot-dog modality, Unitree Go2, was first trained to climb stairs on Isaac Lab's pyramid-stair terrain, and then to climb a U-shaped indoor staircase using the learned policies. This project explores end-to-end RL methods that enable robot dogs to autonomously climb stairs. The results showed (1) the successful goal reached for robot dogs climbing U-shaped stairs with a stall penalty, and (2) the transferability from the policy trained on U-shaped stairs to deployment on straight, L-shaped, and spiral stair terrains, and transferability from other stair models to deployment on U-shaped terrain.

Authors:Janet G. Johnson, Ruijie Sophia Huang, Khoa Nguyen, Ji Young Nam, Michael Nebeling
Title: "I Felt Bad After We Ignored Her": Understanding How Interface-Driven Social Prominence Shapes Group Discussions with GenAI
Abstract:
Recent advancements in the conversational and social capabilities of generative AI (GenAI) have sparked interest in its role as an agent capable of actively participating in human-AI group discussions. Despite this momentum, we don't fully understand how GenAI shapes conversational dynamics or how the interface design impacts its influence on the group. In this paper, we introduce interface-driven social prominence as a design lens for collaborative GenAI systems. We then present a GenAI-based conversational agent that can actively engage in spoken dialogue during video calls and design three distinct collaboration modes that vary the social prominence of the agent by manipulating its presence in the shared space and the degree of control users have over its participation. A mixed-methods within-subjects study, in which 18 dyads engaged in realistic discussions with a GenAI agent, offers empirical insights into how communication patterns and the collective negotiation of GenAI's influence shift based on how it is embedded into the collaborative experience. Based on these findings, we outline design implications for supporting the coordination and critical engagement required in human-AI groups.

Authors:Nima Esmi, Maryam Nezhad-Moghaddam, Fatemeh Borhani, Asadollah Shahbahrami, Amin Daemdoost, Georgi Gaydadjiev
Title: GPT-5 vs Other LLMs in Long Short-Context Performance
Abstract:
With the significant expansion of the context window in Large Language Models (LLMs), these models are theoretically capable of processing millions of tokens in a single pass. However, research indicates a significant gap between this theoretical capacity and the practical ability of models to robustly utilize information within long contexts, especially in tasks that require a comprehensive understanding of numerous details. This paper evaluates the performance of four state-of-the-art models (Grok-4, GPT-4, Gemini 2.5, and GPT-5) on long short-context tasks. For this purpose, three datasets were used: two supplementary datasets for retrieving culinary recipes and math problems, and a primary dataset of 20K social media posts for depression detection. The results show that as the input volume on the social media dataset exceeds 5K posts (70K tokens), the performance of all models degrades significantly, with accuracy dropping to around 50-53% for 20K posts. Notably, in the GPT-5 model, despite the sharp decline in accuracy, its precision remained high at approximately 95%, a feature that could be highly effective for sensitive applications like depression detection. This research also indicates that the "lost in the middle" problem has been largely resolved in newer models. This study emphasizes the gap between the theoretical capacity and the actual performance of models on complex, high-volume data tasks and highlights the importance of metrics beyond simple accuracy for practical applications.

Authors:Blessing Jerry, Lourdes Moreno, Paloma Martínez
Title: Human Oversight-by-Design for Accessible Generative IUIs
Abstract:
LLM-generated interfaces are increasingly used in high-consequence workflows (e.g., healthcare communication), where how information is presented can impact downstream actions. These interfaces and their content support human interaction with AI-assisted decision-making and communication processes and should remain accessible and usable for people with disabilities. Accessible plain-language interfaces serve as an enabling infrastructure for meaningful human oversight. In these contexts, ethical and trustworthiness risks, including hallucinations, semantic distortion, bias, and accessibility barriers, can undermine reliability and limit users' ability to understand, monitor, and intervene in AI-supported processes. Yet, in practice, oversight is often treated as a downstream check, without clear rules for when human intervention is required or who is accountable. We propose oversight-by-design: embedding human judgment across the pipeline as an architectural commitment, implemented via escalation policies and explicit UI controls for risk signalling and intervention. Automated checks flag risk in generated UI communication that supports high-stakes workflows (e.g., readability, semantic fidelity, factual consistency, and standards-based accessibility constraints) and escalate to mandatory Human-in-the-Loop (HITL) review before release when thresholds are violated, or uncertainty is high. Human-on-the-Loop (HOTL) supervision monitors system-level signals over time (alerts, escalation rates, and compliance evidence) to tune policies and detect drift. Structured review feedback is translated into governance actions (rule and prompt updates, threshold calibration, and traceable audit logs), enabling scalable intervention and verifiable oversight for generative UI systems that support high-stakes workflows.

Authors:Johannes Wortmann, Bernd Schäufele, Konstantin Klipp, Ilja Radusch, Katharina Blaß, Thomas Jung
Title: Enhanced Accessibility for Mobile Indoor Navigation
Abstract:
The navigation of indoor spaces poses difficult challenges for individuals with visual impairments, as it requires processing of sensory information, dealing with uncertainties, and relying on assistance. To tackle these challenges, we present an indoor navigation app that places importance on accessibility for visually impaired users. Our approach involves a combination of user interviews and an analysis of the Web Content Accessibility Guidelines. With this approach, we are able to gather invaluable insights and identify design requirements for the development of an indoor navigation app. Based on these insights, we develop an indoor navigation app that prioritizes accessibility, integrating enhanced features to meet the needs of visually impaired users. The usability of the app is being thoroughly evaluated through tests involving both visually impaired and sighted users. Initial feedback has been positive, with users appreciating the inclusive user interface and the usability with a wide range of accessibility tools and Android device settings.

Authors:Qijia Chen, Andrea Bellucci, Giulio Jacucci
Title: Usage Matters: The Role of Frequency, Duration, and Experience in Presence Formation in Social Virtual Reality
Abstract:
The sense of presence is central to immersive experiences in Virtual Reality (VR), and particularly salient in socially rich platforms like social VR. While prior studies have explored various aspects related to presence, less is known about how ongoing usage behaviors shape presence in everyday engagement. To address this gap, we examine whether usage intensity, captured through frequency of use, session duration, and years of VR experience, predicts presence in social VR. A survey of 295 users assessed overall, social, spatial, and self-presence using validated scales. Results show that both frequency and duration consistently predict higher presence across all dimensions, with interaction effects indicating that frequent and extended sessions synergistically amplify the experience of "being there." These effects were stable across age and gender. Our findings extend presence research beyond the laboratory by identifying behavioral predictors in social VR and offer insights for building inclusive environments that reliably foster presence.

Authors:Qijia Chen, Andrea Bellucci, Giulio Jacucci
Title: Social, Spatial, and Self-Presence as Predictors of Basic Psychological Need Satisfaction in Social Virtual Reality
Abstract:
Extensive research has examined presence and basic psychological needs (drawing on Self-Determination Theory) in digital media. While prior work offers hints of potential connections, we lack a systematic account of whether and how distinct presence dimensions map onto the basic needs of autonomy, competence, and relatedness. We surveyed 301 social VR users and analyzed using Structural Equation Modeling. Results show that social presence predicts all three needs, while self-presence predicts competence and relatedness, and spatial presence shows no direct or moderating effects. Gender and age moderated these relationships: women benefited more from social presence for autonomy and relatedness, men from self- and spatial presence for competence and autonomy, and younger users showed stronger associations between social presence and relatedness, and between self-presence and autonomy. These findings position presence as a motivational mechanism shaped by demographic factors. The results offer theoretical insights and practical implications for designing inclusive, need-supportive multiuser VR environments.

Authors:Jackie Baek, Yaopeng Fu, Will Ma, Tianyi Peng
Title: AI Agents for Inventory Control: Human-LLM-OR Complementarity
Abstract:
Inventory control is a fundamental operations problem in which ordering decisions are traditionally guided by theoretically grounded operations research (OR) algorithms. However, such algorithms often rely on rigid modeling assumptions and can perform poorly when demand distributions shift or relevant contextual information is unavailable. Recent advances in large language models (LLMs) have generated interest in AI agents that can reason flexibly and incorporate rich contextual signals, but it remains unclear how best to incorporate LLM-based methods into traditional decision-making pipelines. We study how OR algorithms, LLMs, and humans can interact and complement each other in a multi-period inventory control setting. We construct InventoryBench, a benchmark of over 1,000 inventory instances spanning both synthetic and real-world demand data, designed to stress-test decision rules under demand shifts, seasonality, and uncertain lead times. Through this benchmark, we find that OR-augmented LLM methods outperform either method in isolation, suggesting that these methods are complementary rather than substitutes. We further investigate the role of humans through a controlled classroom experiment that embeds LLM recommendations into a human-in-the-loop decision pipeline. Contrary to prior findings that human-AI collaboration can degrade performance, we show that, on average, human-AI teams achieve higher profits than either humans or AI agents operating alone. Beyond this population-level finding, we formalize an individual-level complementarity effect and derive a distribution-free lower bound on the fraction of individuals who benefit from AI collaboration; empirically, we find this fraction to be substantial.

Authors:Faezeh Vahedi, Morteza Memari, Ramtin Tabatabaei, Alireza Taheri
Title: Human-Like Gaze Behavior in Social Robots: A Deep Learning Approach Integrating Human and Non-Human Stimuli
Abstract:
Nonverbal behaviors, particularly gaze direction, play a crucial role in enhancing effective communication in social interactions. As social robots increasingly participate in these interactions, they must adapt their gaze based on human activities and remain receptive to all cues, whether human-generated or not, to ensure seamless and effective communication. This study aims to increase the similarity between robot and human gaze behavior across various social situations, including both human and non-human stimuli (e.g., conversations, pointing, door openings, and object drops). A key innovation in this study, is the investigation of gaze responses to non-human stimuli, a critical yet underexplored area in prior research. These scenarios, were simulated in the Unity software as a 3D animation and a 360-degree real-world video. Data on gaze directions from 41 participants were collected via virtual reality (VR) glasses. Preprocessed data, trained two neural networks-LSTM and Transformer-to build predictive models based on individuals' gaze patterns. In the animated scenario, the LSTM and Transformer models achieved prediction accuracies of 67.6% and 70.4%, respectively; In the real-world scenario, the LSTM and Transformer models achieved accuracies of 72% and 71.6%, respectively. Despite the gaze pattern differences among individuals, our models outperform existing approaches in accuracy while uniquely considering non-human stimuli, offering a significant advantage over previous literature. Furthermore, deployed on the NAO robot, the system was evaluated by 275 participants via a comprehensive questionnaire, with results demonstrating high satisfaction during interactions. This work advances social robotics by enabling robots to dynamically mimic human gaze behavior in complex social contexts.

Authors:Qiaosi Wang, Jini Kim, Avanita Sharma, Alicia, Lee, Jodi Forlizzi, Hong Shen
Title: Situated, Dynamic, and Subjective: Envisioning the Design of Theory-of-Mind-Enabled Everyday AI with Industry Practitioners
Abstract:
Theory of Mind (ToM) -- the ability to infer what others are thinking (e.g., intentions) from observable cues -- is traditionally considered fundamental to human social interactions. This has sparked growing efforts in building and benchmarking AI's ToM capability, yet little is known about how such capability could translate into the design and experience of everyday user-facing AI products and services. We conducted 13 co-design sessions with 26 U.S.-based AI practitioners to envision, reflect, and distill design recommendations for ToM-enabled everyday AI products and services that are both future-looking and grounded in the realities of AI design and development practices. Analysis revealed three interrelated design recommendations: ToM-enabled AI should 1) be situated in the social context that shape users' mental states, 2) be responsive to the dynamic nature of mental states, and 3) be attuned to subjective individual differences. We surface design tensions within each recommendation that reveal a broader gap between practitioners' envisioned futures of ToM-enabled AI and the realities of current AI design and development practices. These findings point toward the need to move beyond static, inference-driven approach to ToM and toward designing ToM as a pervasive capability that supports continuous human-AI interaction loops.

Authors:Caitlin Morris, Pattie Maes
Title: Same Feedback, Different Source: How AI vs. Human Feedback Shapes Learner Engagement
Abstract:
When learners receive feedback, what they believe about its source may shape how they engage with it. As AI is used alongside human instructors, understanding these attribution effects is essential for designing effective hybrid AI-human educational systems. We designed a creative coding interface that isolates source attribution while controlling for content: all participants receive identical LLM-generated feedback, but half see it attributed to AI and half to a human teaching assistant (TA). We found two key results. First, perceived feedback source affected engagement: learners in the TA condition spent significantly more time and effort (d = 0.88-1.56) despite receiving identical feedback. Second, perceptions differed: AI-attributed feedback ratings were predicted by prior trust in AI (r = 0.85), while TA-attributed ratings were predicted by perceived genuineness (r = 0.65). These findings suggest that feedback source shapes both engagement and evaluation, with implications for hybrid educational system design.

Authors:Bingyi Han, Ying Ma, Simon Coghlan, Dana McKay, George Buchanan, Wally Smith
Title: AI Sensing and Intervention in Higher Education: Student Perceptions of Learning Impacts, Affective Responses, and Ethical Priorities
Abstract:
AI technologies that sense student attention and emotions to enable more personalised teaching interventions are increasingly promoted, but raise pressing questions about student learning, well-being, and ethics. In particular, students' perspectives about AI sensing-intervention in learning are often overlooked. We conducted an online mixed-method experiment with Australian university students (N=132), presenting video scenarios varying by whether sensing was used (in-use vs. not-in-use), sensing modality (gaze-based attention detection vs. facial-based emotion detection), and intervention (by digital device vs. teacher). Participants also completed pairwise ranking tasks to prioritise six core ethical concerns. Findings revealed that students valued targeted intervention but responded negatively to AI monitoring, regardless of sensing methods. Students preferred system-generated hints over teacher-initiated assistance, citing learning agency and social embarrassment concerns. Students' ethical considerations prioritised autonomy and privacy, followed by transparency, accuracy, fairness, and learning beneficence. We advocate designing customisable, social-sensitive, non-intrusive systems that preserve student control, agency, and well-being.

Authors:Alireza Taheri, Minoo Alemi, Elham Ranjkar, Raman Rafatnejad, Ali F. Meghdari
Title: Design, Development, and Use of Maya Robot as an Assistant for the Therapy/Education of Children with Cancer: a Pilot Study
Abstract:
This study centers around the design and implementation of the Maya Robot, a portable elephant-shaped social robot, intended to engage with children undergoing cancer treatment. Initial efforts were devoted to enhancing the robot's facial expression recognition accuracy, achieving a 98% accuracy through deep neural networks. Two subsequent preliminary exploratory experiments were designed to advance the study's objectives. The first experiment aimed to compare pain levels experienced by children during the injection process, with and without the presence of the Maya robot. Twenty-five children, aged 4 to 9, undergoing cancer treatment participated in this counterbalanced study. The paired T-test results revealed a significant reduction in perceived pain when the robot was actively present in the injection room. The second experiment sought to assess perspectives of hospitalized children and their mothers during engagement with Maya through a game. Forty participants, including 20 children aged 4 to 9 and their mothers, were involved. Post Human-Maya Interactions, UTAUT questionnaire results indicated that children experienced significantly less anxiety than their parents during the interaction and game play. Notably, children exhibited higher trust levels in both the robot and the games, presenting a statistically significant difference in trust levels compared to their parents (P-value < 0.05). This preliminary exploratory study highlights the positive impact of utilizing Maya as an assistant for therapy/education in a clinical setting, particularly benefiting children undergoing cancer treatment. The findings underscore the potential of social robots in pediatric healthcare contexts, emphasizing improved pain management and emotional well-being among young patients.

Authors:Kai Alexander Hackney, Lucas Guarenti Zangari, Jhonathan Sora-Cardenas, Emmanuel Munoz, Sterling R. Kalogeras, Betsy DiSalvo, Pedro Guillermo Feijoo-Garcia
Title: Exploring the Interplay Between Voice, Personality, and Gender in Human-Agent Interactions
Abstract:
To foster effective human-agent interactions, designers need to identify characteristics that could affect how agents are perceived and accepted, and to what extent they could impact rapport-building. Aiming to explore the role of user-agent synchrony, we assessed 388 participants to determine whether they could perceive personality traits from four artificial voices we selected and adapted from human samples, considering gender (male or female) and personality (introvert or extrovert) as grouping factors. Our findings suggest that participants were able to significantly differentiate female agents by personality, while male agents were not consistently distinguished. We also observed evidence of personality synchrony, where participants tended to perceive the first agent as more similar to their own personality, with this effect driven mainly by male participants, especially toward male agents. This paper contributes findings and insights to consider the interplay of user-agent personality and gender synchrony in the design of human-agent interactions.

Authors:Harry Yizhou Tian, Hasan Amin, Ming Yin
Title: Understanding the Effects of AI-Assisted Critical Thinking on Human-AI Decision Making
Abstract:
Despite the growing prevalence of human-AI decision making, the human-AI team's decision performance often remains suboptimal, partially due to insufficient examination of humans' own reasoning. In this paper, we explore designing AI systems that directly analyze humans' decision rationales and encourage critical reflection of their own decisions. We introduce the AI-Assisted Critical Thinking (AACT) framework, which leverages a domain-specific AI model's counterfactual analysis of human decision to help decision-makers identify potential flaws in their decision argument and support the correction of them. Through a case study on house price prediction, we find that AACT outperforms traditional AI-based decision-support in reducing over-reliance on AI, though also triggering higher cognitive load. Subgroup analysis reveals AACT can be particularly beneficial for some decision-makers such as those very familiar with AI technologies. We conclude by discussing the practical implications of our findings, use cases and design choices of AACT, and considerations for using AI to facilitate critical thinking.

Authors:Varchita Lalwani, Utkarsh Agarwal, Michael Saugstad, Manish Kumar, Jon E. Froehlich, Anupam Sobti
Title: Towards Human-AI Accessibility Mapping in India: VLM-Guided Annotations and POI-Centric Analysis in Chandigarh
Abstract:
Project Sidewalk is a web-based platform that enables crowdsourcing accessibility of sidewalks at city-scale by virtually walking through city streets using Google Street View. The tool has been used in 40 cities across the world, including the US, Mexico, Chile, and Europe. In this paper, we describe adaptation efforts to enable deployment in Chandigarh, India, including modifying annotation types, provided examples, and integrating VLM-based mission guidance, which adapts instructions based on a street scene and metadata analysis. Our evaluation with 3 annotators indicates the utility of AI-mission guidance with an average score of 4.66. Using this adapted Project Sidewalk tool, we conduct a Points of Interest (POI)-centric accessibility analysis for three sectors in Chandigarh with very different land uses, residential, commercial and institutional covering about 40 km of sidewalks. Across 40 km of roads audited in three sectors and around 230 POIs, we identified 1,644 of 2,913 locations where infrastructure improvements could enhance accessibility.

Authors:Zhennan Yi, Sophia Sakakibara Capello, Randy Gomez, Selma Šabanović
Title: Adding More Value Than Work: Practical Guidelines for Integrating Robots into Intercultural Competence Learning
Abstract:
While social robots have demonstrated effectiveness in supporting students' intercultural competence development, it is unclear how they can effectively be adopted for integrated use in K-12 schools. We conducted two phases of design workshops with teachers, where they co-designed robot-mediated intercultural activities while considering student needs and school integration concerns. Using thematic analysis, we identify appropriate scenarios and roles for classroom robots, explore how robots could complement rather than replace teachers, and consider how to address ethical and compliance considerations. Our findings provide practical design guidelines for the HRI community to develop social robots that can effectively support intercultural education in K-12 schools.

Authors:Ronald Cumbal, Marcus Göransson, Alexandros Rouchitsas, Didem Gürdür Broo, Ginevra Castellano
Title: A Collaborative Crowdsourcing Method for Designing External Interfaces for Autonomous Vehicles
Abstract:
Participatory design effectively engages stakeholders in technology development but is often constrained by small, resource-intensive activities. This study explores a scalable complementary method, enabling broad pattern identification in the design for interfaces in autonomous vehicles. We implemented a human-centered, iterative process that combined crowd creativity, structured participatory principles, and expert feedback. Across iterations, participant concepts evolved from simple cues to multimodal systems. Novel suggestions ranged from personalized features, like tracking lights, to inclusive elements like haptic feedback, progressively refining designs toward greater contextual awareness. To assess outcomes, we compared representative designs: a popular-design, reflecting the most frequently proposed ideas, and an innovative-design, merging participant innovations with expert input. Both were evaluated against a benchmark through video-based simulations. Results show that the popular-design outperformed the alternatives on both interpretability and user experience, with expert-validated innovations performing second best. These findings highlight the potential of scalable participatory methods for shaping emerging technologies.

Authors:Franklin Mingzhe Li, Michael Xieyang Liu, Cynthia L. Bennett, Shaun K. Kane
Title: ADCanvas: Accessible and Conversational Audio Description Authoring for Blind and Low Vision Creators
Abstract:
Audio Description (AD) provides essential access to visual media for blind and low vision (BLV) audiences. Yet current AD production tools remain largely inaccessible to BLV video creators, who possess valuable expertise but face barriers due to visually-driven interfaces. We present ADCanvas, a multimodal authoring system that supports non-visual control over audio description (AD) creation. ADCanvas combines conversational interaction with keyboard-based playback control and a plain-text, screen reader-accessible editor to support end-to-end AD authoring and visual question answering (VQA). Combining screen-reader-friendly controls with a multimodal LLM agent, ADCanvas supports live VQA, script generation, and AD modification. Through a user study with 12 BLV video creators, we find that users adopt the conversational agent as an informational aide and drafting assistant, while maintaining agency through verification and editing. For example, participants saw themselves as curators who received information from the model and filtered it down for their audience. Our findings offer design implications for accessible media tools, including precise editing controls, accessibility support for creative ideation, and configurable rules for human-AI collaboration.

Authors:Mengyu Chen, Youngwook Do, Feiyu Lu, Kaiming Cheng, Blair MacIntyre
Title: Secure and Private Spatial Sharing for Mixed Reality Remote Collaboration in Enterprise Settings
Abstract:
Mixed Reality (MR) technologies are increasingly adopted by enterprises to enhance remote collaboration, enabling users to share real-time views of their physical environments through head-mounted displays (HMDs). While MR spatial sharing offers significant benefits, it introduces complex security and privacy risks, particularly in balancing employee collaboration needs with enterprise data protection requirements across office and personal spaces. This paper investigates these challenges through formative interviews with employees and expert consultations with professionals in cybersecurity, IoT, technology risk, and corporate legal domains. We present a conceptual framework for secure MR spatial sharing in enterprise contexts and identify critical concerns and requirements for system design. Based on our findings, we offer actionable recommendations to guide the development of secure and privacy-preserving MR spatial sharing solutions for future enterprise deployments.

Authors:Xiaodan Hu, Monica Perusquía-Hernández, Mayra Donaji Barrera Machuca, Anil Ufuk Batmaz, Yan Zhang, Wolfgang Stuerzlinger, Kiyoshi Kiyokawa
Title: Varifocal Displays Reduce the Impact of the Vergence-Accommodation Conflict on 3D Pointing Performance in Augmented Reality Systems
Abstract:
This paper investigates whether a custom varifocal display can improve 3D pointing performance in augmented reality (AR), where the vergence-accommodation conflict (VAC) is known to impair interaction. Varifocal displays have been hypothesized to alleviate the VAC by dynamically matching the focal distance to the user's gaze-defined target depth. Following prior work, we conducted a within-subject study with 24 participants performing an ISO 9241-411 pointing task under varifocal and fixed-focal viewing. Overall, varifocal viewing yielded significantly higher performance than the fixed-focal baseline across key interaction metrics, although the magnitude and even the direction of the benefit varied across individuals. In particular, participants' responses exhibited a baseline-dependent pattern, with smaller improvements (or occasional degradation) observed for those with better baseline performance. Our findings suggest that varifocal technology can improve AR pointing performance relative to fixed-focal viewing, while highlighting substantial individual differences that should be considered in design and evaluation.

Authors:Jeongmin Rhee, Changhee Lee, DongHwa Shin, Bohyoung Kim
Title: Vivifying LIME: Visual Interactive Testbed for LIME Analysis
Abstract:
Explainable Artificial Intelligence (XAI) has gained importance in interpreting model predictions. Among leading techniques for XAI, Local Interpretable Model-agnostic Explanations (LIME) is most frequently utilized as it notably helps people's understanding of complex models. However, LIME's analysis is constrained to a single image at a time. Besides, it lacks interaction mechanisms for observing the LIME's results and direct manipulations of factors affecting the results. To address these issues, we introduce an interactive visualization tool, LIMEVis, which improves the analysis workflow of LIME by enabling users to explore multiple LIME results simultaneously and modify them directly. With LIMEVis, we could conveniently identify common features in images that a model seems to mainly consider for category classification. Additionally, by interactively modifying the LIME results, we could determine which segments in an image influence the model's classification.

Authors:Suvadeep Mukherjee, Björn Rohles, Gabriele Lenzini, Pedro Cardoso-Leite
Title: Can Theory-Informed Message Framing Drive Honest and Motivated Performance with Better Assessment Experiences in a Remote Assessment?
Abstract:
Remote unproctored assessments increasingly use messaging interventions to reduce cheating, but existing approaches lack theoretical grounding, focus narrowly on cheating suppression while overlooking performance and experience, and treat cheating as binary rather than continuous. This study examines whether messages based on 15 psychological concepts from self-determination, cognitive dissonance, social norms, and self-efficacy theories can reduce cheating while preserving performance and experience. Through an expert workshop (N=5), we developed 45 theory-informed messages and tested them with online participants (N=1232) who completed an incentivized anagram task. Participants were classified as non-cheaters (0% items cheated), partial-cheaters (1-99% cheated), or full-cheaters (100% cheated). Results show that concept-based messages reduced full-cheating occurrence by 42% (33% to 19%), increased non-cheating by 19% (53% to 63%), with no negative effects on performance or experience across integrity groups. Surprisingly, messages grounded in different theoretical concepts produced virtually identical effects. Analyses of self-rated psychological mechanisms revealed that messages influenced multiple mechanisms simultaneously rather than their intended targets, though these mechanisms predicted behavior, performance, and experience. These findings show that causal pathways are more complex than current theories predict. Practically, integrity interventions using supportive motivation rather than rule enforcement can reduce cheating without impairing performance or experience.

Authors:Yijun Liu, John Gallagher, Sarah Sterman, Tal August
Title: From Crafting Text to Crafting Thought: Grounding AI Writing Support to Writing Center Pedagogy
Abstract:
As AI writing tools evolve from fixing surface errors to creating language with writers, new capabilities raise concerns about negative impacts on student writers, such as replacing their voices and undermining critical thinking skills. To address these challenges, we look at a parallel transition in university writing centers from focusing on fixing errors to preserving student voices. We develop design guidelines informed by writing center literature and interviews with 10 writing tutors. We illustrate these guidelines in a prototype AI tool, Writor. Writor helps writers revise text by setting goals, providing balanced feedback, and engaging in conversations without generating text verbatim. We conducted an expert review with 30 writing instructors, tutors, and AI researchers on Writor to assess the pedagogical soundness, alignment with writing center pedagogy, and integration contexts. We distill our findings into design implications for future AI writing feedback systems, including designing for trust among AI-skeptical educators.

Authors:Nandini Sharma, Thomas Bock, Rich Bowen, Sayeed Choudhury, Brian Fitzgerald, Matt Germonprez, Jim Herbsleb, James Howison, Tom Hughes, Min Kyung Lee, Stephanie Lieggi, Andreas Liesenfeld, Georg Link, Nicholas Matsakis, Audris Mockus, Narayan Ramasubbu, Christopher Robinson, Gregorio Robles, Nithya Ruff, Sonali Shah, Igor Steinmacher, Bogdan Vasilescu, Stephen Walli, Christopher Yoo
Title: Accountability in Open Source Software Ecosystems: Workshop Report
Abstract:
Open source software ecosystems are composed of a variety of stakeholders including but not limited to non-profit organizations, volunteer contributors, users, and corporations. The needs and motivations of these stakeholders are often diverse, unknown, and sometimes even conflicting given the engagement and investment of both volunteers and corporate actors. Given this, it is not clear how open source communities identify and engage with their stakeholders, understand their needs, and hold themselves accountable to those needs. We convened 24 expert scholars and practitioners studying and working with open source software communities for an exploratory workshop discussion on these ideas. The workshop titled "Accountability and Open Source Software Ecosystems" was organized on Oct 14-15 on campus in Carnegie Mellon University, Pittsburgh, PA. The purpose of this in-person workshop was to initiate conversations that explore important and urgent questions related to the role of accountability in open source software ecosystems, and to inspire an exciting research agenda and meaningful stakeholder engagement ideas for practitioners.

Authors:Danqing Shi, Lan Jiang, Katherine M. Collins, Shangzhe Wu, Ayush Tewari, Miri Zilka
Title: How do people watch AI-generated videos of physical scenes?
Abstract:
The growing prevalence of realistic AI-generated videos on media platforms increasingly blurs the line between fact and fiction, eroding public trust. Understanding how people watch AI-generated videos offers a human-centered perspective for improving AI detection and guiding advancements in video generation. However, existing studies have not investigated human gaze behavior in response to AI-generated videos of physical scenes. Here, we collect and analyze the eye movements from 40 participants during video understanding and AI detection tasks involving a mix of real-world and AI-generated videos. We find that given the high realism of AI-generated videos, gaze behavior is driven less by the video's actual authenticity and more by the viewer's perception of its authenticity. Our results demonstrate that the mere awareness of potential AI generation may alter media consumption from passive viewing into an active search for anomalies.

Authors:Yoshee Jain, Heejin Do, Zihan Wu, April Yi Wang
Title: Exploring the Role of Tracing in AI-Supported Planning for Algorithmic Reasoning
Abstract:
AI-powered planning tools show promise in supporting programming learners by enabling early, formative feedback on their thinking processes prior to coding. To date, however, most AI-supported planning tools rely on students' natural-language explanations, using LLMs to interpret learners' descriptions of their algorithmic intent. Prior to the emergence of LLM-based systems, CS education research extensively studied trace-based planning in pen-and-paper settings, demonstrating that reasoning through stepwise execution with explicit state transitions helps learners build and refine mental models of program behavior. Despite its potential, little is known about how tracing interacts with AI-mediated feedback and whether integrating tracing into AI-supported planning tools leads to different learning processes or interaction dynamics compared to natural-language-based planning alone. We study how requiring learners to produce explicit execution traces with an AI-supported planning tool affects their algorithmic reasoning. In a between-subjects study with 20 students, tracing shifted learners away from code-like, line-by-line descriptions toward more goal-driven reasoning about program behavior. Moreover, it led to more consistent partially correct solutions, although final coding performance remained comparable across conditions. Notably, tracing did not significantly affect the quality or reliability of LLM-generated feedback. These findings reveal tradeoffs in combining tracing with AI-supported planning and inform design guidelines for integrating natural language, tracing, and coding to support iterative reasoning throughout the programming process.

Authors:Ailin Liu, Yesmine Karoui, Fiona Draxler, Frauke Kreuter, Francesco Chiossi
Title: Sensing What Surveys Miss: Understanding and Personalizing Proactive LLM Support by User Modeling
Abstract:
Difficulty spillover and suboptimal help-seeking challenge the sequential, knowledge-intensive nature of digital tasks. In online surveys, tough questions can drain mental energy and hurt performance on later questions, while users often fail to recognize when they need assistance or may satisfy, lacking motivation to seek help. We developed a proactive, adaptive system using electrodermal activity and mouse movement to predict when respondents need support. Personalized classifiers with a rule-based threshold adaptation trigger timely LLM-based clarifications and explanations. In a within-subjects study (N=32), aligned-adaptive timing was compared to misaligned-adaptive and random-adaptive controls. Aligned-adaptive assistance improved response accuracy by 21%, reduced false negative rates from 50.9% to 22.9%, and improved perceived efficiency, dependability, and benevolence. Properly timed interventions prevent cascades of degraded responses, showing that aligning support with cognitive states improves both the outcomes and the user experience. This enables more effective, personalized LLM-assisted support in survey-based research.

Authors:Suifang Zhou, Ray LC
Title: Eternagram: Inspiring Climate Action Through LLM-based Conversational Exploration of a Post-Devastation Climate Future
Abstract:
Climate action is difficult to persuade because we tend to perceive climate change as remote and disconnected from daily life. Instead of traditional informational engagements, game-based interventions can create narratives that immerse the visitor in situations where their actions have tangible consequences. To make these narratives engaging, we used a speculative scenario of an alien stumbling upon social media to obliquely address climate change through a text-based adventure game installation. Mimicking visitors' natural dialogue in social media apps, we designed an LLM-based chatbot with knowledge of post-climate devastated world that mirrors our own planet Earth. In discovering the world's downfall through interactive chatting and posted images, players begin to realize that their own actions can make a difference on impacts of climate change in this distant world, fostering pro-environmental attitudes. Previously published at CHI, this game installation demonstrates the potential of LLM based creative narratives in exploring speculative worlds driving social change.

Authors:Conrad Borchers, Hannah Deininger, Zachary A. Pardos
Title: Toward Trait-Aware Learning Analytics
Abstract:
Learning analytics (LA) draws from the learning sciences to interpret learner behavior and inform system design. Yet, past personalization remains largely at the content or performance level (during learner-system interactions), overlooking relatively stable individual differences such as personality (unfolding over long-term learning trajectories such as college degrees). The latter could bring underappreciated benefits to the design, implementation, and impact of LA. In this position paper, we conduct an ad hoc literature review and argue for an expanded framing of LA that centers on learner traits as key to both interpreting and designing close-the-loop experiments in LA. We show that personality traits are relevant to LA's central outcomes (e.g., engagement and achievement) and conducive to action, as their established ties to human-computer interaction (HCI) inform how systems time, frame, and personalize support. Drawing inspiration from HCI, where psychometrics inform personalization strategies, we propose that LA can evolve by treating traits not only as predictive features but as design resources and moderators of analytics efficacy. In line with past position papers published at LAK, we present a research agenda grounded in the LA cycle and discuss methodological and ethical challenges.

Authors:He Wang, Ziyu Zhou, Hanxiang Liu
Title: Front-Loaded or Balanced? The Mechanism through Which Review Order Affects Overall Ratings in Premium Service Settings
Abstract:
In the increasingly prevalent landscape of high-quality service contexts, whether consumer evaluation interfaces adopt a rating-first or review-first sequence has become a critical factor shaping rating authenticity and feedback quality. While prior research has primarily examined review content and sentiment, systematic investigation into how evaluation order influences rating outcomes remains limited. Through exploratory analyses, we find that Letterboxd -- which employs a review-first, rating-after mechanism -- exhibits a more centralized rating distribution with fewer extreme scores, whereas Yelp -- which adopts a rating-first, review-after mechanism -- shows a pronounced bimodal distribution with more polarized ratings. Three controlled experiments further demonstrate that in high-quality service contexts, a rating-first (vs. review-first) interface significantly elevates consumers' overall ratings. Mechanism analyses indicate that cognitive effort and affective heuristics serve as dual pathways: a rating-first (vs. review-first) sequence reduces cognitive effort and heightens affective heuristics, thereby increasing rating scores. Moreover, service quality moderates this process. When service quality is low, the rating-first (vs. review-first) sequence instead leads to lower ratings. This research reveals the psychological mechanisms through which evaluation order affects consumer ratings via cognitive and affective pathways. It extends theoretical understanding of online rating formation and offers practical implications for optimizing platform interface design to enhance rating authenticity and credibility.

Authors:Owen Hoffman, Kangze Peng, Sajid Kamal, Zehua You, Sukrit Venkatagiri
Title: ScamPilot: Simulating Conversations with LLMs to Protect Against Online Scams
Abstract:
Fraud continues to proliferate online, from phishing and ransomware to impersonation scams. Yet automated prevention approaches adapt slowly and may not reliably protect users from falling prey to new scams. To better combat online scams, we developed ScamPilot, a conversational interface that inoculates users against scams through simulation, dynamic interaction, and real-time feedback. ScamPilot simulates scams with two large language model-powered agents: a scammer and a target. Users must help the target defend against the scammer by providing real-time advice. Through a between-subjects study (N=150) with one control and three experimental conditions, we find that blending advice-giving with multiple choice questions significantly increased scam recognition (+8%) without decreasing wariness towards legitimate conversations. Users' response efficacy and change in self-efficacy was also 9% and 19% higher, respectively. Qualitatively, we find that users more frequently provided action-oriented advice over urging caution or providing emotional support. Overall, ScamPilot demonstrates the potential for inter-agent conversational user interfaces to augment learning.

Authors:Javier Argota Sánchez-Vaquerizo, Luis Borunda Monsivais
Title: From Particles to Agents: Hallucination as a Metric for Cognitive Friction in Spatial Simulation
Abstract:
Traditional architectural simulations (e.g. Computational Fluid Dynamics, evacuation, structural analysis) model elements as deterministic physics-based "particles" rather than cognitive "agents". To bridge this, we introduce \textbf{Agentic Environmental Simulations}, where Large Multimodal generative models actively predict the next state of spatial environments based on semantic expectation. Drawing on examples from accessibility-oriented AR pipelines and multimodal digital twins, we propose a shift from chronological time-steps to Episodic Spatial Reasoning, where simulations advance through meaningful, surprisal-triggered events. Within this framework we posit AI hallucinations as diagnostic tools. By formalizing the \textbf{Cognitive Friction} ($C_f$) it is possible to reveal "Phantom Affordances", i.e. semiotic ambiguities in built space. Finally, we challenge current HCI paradigms by treating environments as dynamic cognitive partners and propose a human-centered framework of cognitive orchestration for designing AI-driven simulations that preserve autonomy, affective clarity, and cognitive integrity.

Authors:Deeksha M. Shama, Dimitra Emmanouilidou, Ivan J. Tashev
Title: Cognitive Load Estimation Using Brain Foundation Models and Interpretability for BCIs
Abstract:
Accurately monitoring cognitive load in real time is critical for Brain-Computer Interfaces (BCIs) that adapt to user engagement and support personalized learning. Electroencephalography (EEG) offers a non-invasive, cost-effective modality for capturing neural activity, though traditional methods often struggle with cross-subject variability and task-specific preprocessing. We propose leveraging Brain Foundation Models (BFMs), large pre-trained neural networks, to extract generalizable EEG features for cognitive load estimation. We adapt BFMs for long-term EEG monitoring and show that fine-tuning a small subset of layers yields improved accuracy over the state-of-the-art. Despite their scale, BFMs allow for real-time inference with a longer context window. To address often-overlooked interpretability challenges, we apply Partition SHAP (SHapley Additive exPlanations) to quantify feature importance. Our findings reveal consistent emphasis on prefrontal regions linked to cognitive control, while longitudinal trends suggest learning progression. These results position BFMs as efficient and interpretable tools for continuous cognitive load monitoring in real-world BCIs.

Authors:Bowen Zhou, Marc-André Fiedler, Ayoub Al-Hamadi
Title: CAF-Mamba: Mamba-Based Cross-Modal Adaptive Attention Fusion for Multimodal Depression Detection
Abstract:
Depression is a prevalent mental health disorder that severely impairs daily functioning and quality of life. While recent deep learning approaches for depression detection have shown promise, most rely on limited feature types, overlook explicit cross-modal interactions, and employ simple concatenation or static weighting for fusion. To overcome these limitations, we propose CAF-Mamba, a novel Mamba-based cross-modal adaptive attention fusion framework. CAF-Mamba not only captures cross-modal interactions explicitly and implicitly, but also dynamically adjusts modality contributions through a modality-wise attention mechanism, enabling more effective multimodal fusion. Experiments on two in-the-wild benchmark datasets, LMVD and D-Vlog, demonstrate that CAF-Mamba consistently outperforms existing methods and achieves state-of-the-art performance.

Authors:Fabian Albers, Sebastian Strauß, Nikol Rummel, Nils Köbis
Title: Are they just delegating? Cross-Sample Predictions on University Students' & Teachers' Use of AI
Abstract:
Mutual trust between teachers and students is a prerequisite for effective teaching, learning, and assessment in higher education. Accurate predictions about the other group's use of generative artificial intelligence (AI) are fundamental for such trust. However, the disruptive rise of AI has transformed academic work practices, raising important questions about how teachers and students use these tools and how well they can estimate each other's usage. While the frequency of use is well studied, little is known about how AI is used, and comparisons with similar practices are rare. This study surveyed German university teachers (N = 113) and students (N = 123) on the frequency of AI use and the degree of delegation across six identical academic tasks. Participants also provided incentivized cross-sample predictions of the other group's AI use to assess the accuracy of their predictions. We find that students reported higher use of AI and greater delegation than teachers. Both groups significantly overestimated the other group's use, with teachers predicting very frequent use and high delegation by students, and students assuming teachers use AI similarly to themselves. These findings reveal a perception gap between teachers' and students' expectations and actual AI use. Such gaps may hinder trust and effective collaboration, underscoring the need for open dialogue about AI practices in academia and for policies that support the equitable and transparent integration of AI tools in higher education.

Authors:Hassam Tahir, Faizan Faisal, Fady Alnajjar, Muhammad Imran Taj, Lucia Gordon, Aila Khan, Michael Lwin, Omar Mubin
Title: Dynamic Framework for Collaborative Learning: Leveraging Advanced LLM with Adaptive Feedback Mechanisms
Abstract:
This paper presents a framework for integrating LLM into collaborative learning platforms to enhance student engagement, critical thinking, and inclusivity. The framework employs advanced LLMs as dynamic moderators to facilitate real-time discussions and adapt to learners' evolving needs, ensuring diverse and inclusive educational experiences. Key innovations include robust feedback mechanisms that refine AI moderation, promote reflective learning, and balance participation among users. The system's modular architecture featuring ReactJS for the frontend, Flask for backend operations, and efficient question retrieval supports personalized and engaging interactions through dynamic adjustments to prompts and discussion flows. Testing demonstrates that the framework significantly improves student collaboration, fosters deeper comprehension, and scales effectively across various subjects and user groups. By addressing limitations in static moderation and personalization in existing systems, this work establishes a strong foundation for next-generation AI-driven educational tools, advancing equitable and impactful learning outcomes.

Authors:Haoming Huang, Pongchai Jaisri, Shota Shimizu, Lingfeng Chen, Sota Nakashima, Gema Rodríguez-Pérez
Title: More Code, Less Reuse: Investigating Code Quality and Reviewer Sentiment towards AI-generated Pull Requests
Abstract:
Large Language Model (LLM) Agents are advancing quickly, with the increasing leveraging of LLM Agents to assist in development tasks such as code generation. While LLM Agents accelerate code generation, studies indicate they may introduce adverse effects on development. However, existing metrics solely measure pass rates, failing to reflect impacts on long-term maintainability and readability, and failing to capture human intuitive evaluations of PR. To increase the comprehensiveness of this problem, we investigate and evaluate the characteristics of LLM to know the pull requests' characteristics beyond the pass rate. We observe the code quality and maintainability within PRs based on code metrics to evaluate objective characteristics and developers' reactions to the pull requests from both humans and LLM's generation. Evaluation results indicate that LLM Agents frequently disregard code reuse opportunities, resulting in higher levels of redundancy compared to human developers. In contrast to the quality issues, our emotions analysis reveals that reviewers tend to express more neutral or positive emotions towards AI-generated contributions than human ones. This disconnect suggests that the surface-level plausibility of AI code masks redundancy, leading to the silent accumulation of technical debt in real-world development environments. Our research provides insights for improving human-AI collaboration.

Authors:Po-Hsun Chen, Ivan C. H. Liu
Title: Optimization and Mobile Deployment for Anthropocene Neural Style Transfer
Abstract:
This paper presents AnthropoCam, a mobile-based neural style transfer (NST) system optimized for the visual synthesis of Anthropocene environments. Unlike conventional artistic NST, which prioritizes painterly abstraction, stylizing human-altered landscapes demands a careful balance between amplifying material textures and preserving semantic legibility. Industrial infrastructures, waste accumulations, and modified ecosystems contain dense, repetitive patterns that are visually expressive yet highly susceptible to semantic erosion under aggressive style transfer. To address this challenge, we systematically investigate the impact of NST parameter configurations on the visual translation of Anthropocene textures, including feature layer selection, style and content loss weighting, training stability, and output resolution. Through controlled experiments, we identify an optimal parameter manifold that maximizes stylistic expression while preventing semantic erasure. Our results demonstrate that appropriate combinations of convolutional depth, loss ratios, and resolution scaling enable the faithful transformation of anthropogenic material properties into a coherent visual language. Building on these findings, we implement a low-latency, feed-forward NST pipeline deployed on mobile devices. The system integrates a React Native frontend with a Flask-based GPU backend, achieving high-resolution inference within 3-5 seconds on general mobile hardware. This enables real-time, in-situ visual intervention at the site of image capture, supporting participatory engagement with Anthropocene landscapes. By coupling domain-specific NST optimization with mobile deployment, AnthropoCam reframes neural style transfer as a practical and expressive tool for real-time environmental visualization in the Anthropocene.

Authors:Xuyi Hu, Ke Ma, Siwei Liu, Per Ola Kristensson, Stephan Goetz
Title: A Multi-Camera Optical Tag Neuronavigation and AR Augmentation Framework for Non-Invasive Brain Stimulation
Abstract:
Accurate neuronavigation is essential for generating the intended effect with transcranial magnetic stimulation (TMS). Precise coil placement also directly influences stimulation efficacy. Traditional neuronavigation systems often rely on costly and still hard to use and error-prone tracking systems. To solve these limitations, we present a computer-vision-based neuronavigation system for real-time tracking of patient and TMS instrumentation. The system can feed the necessary data for a digital twin to track TMS stimulation targets. We integrate a self-coordinating optical tracking system with multiple consumer-grade cameras and visible tags with a dynamic 3D brain model in Unity. This model updates in real time to represent the current stimulation coil position and the estimated stimulation point to intuitively visualize neural targets for clinicians. We incorporate an augmented reality (AR) module to bridge the gap between the visualization of the digital twin and the real world and project the brain model in real-time onto the head of a patient. AR headsets or mobile AR devices allow clinicians to interactively view and adjust the placement of the stimulation transducer intuitively instead of guidance through abstract numbers and 6D cross hairs on an external screen. The proposed technique provides improved spatial precision as well as accuracy. A case study with ten participants with a medical background also demonstrates that the system has high usability.

Authors:Boyu Li, Lin-Ping Yuan, Zeyu Wang, Hongbo Fu
Title: SketchDynamics: Exploring Free-Form Sketches for Dynamic Intent Expression in Animation Generation
Abstract:
Sketching provides an intuitive way to convey dynamic intent in animation authoring (i.e., how elements change over time and space), making it a natural medium for automatic content creation. Yet existing approaches often constrain sketches to fixed command tokens or predefined visual forms, overlooking their freeform nature and the central role of humans in shaping intention. To address this, we introduce an interaction paradigm where users convey dynamic intent to a vision-language model via free-form sketching, instantiated here in a sketch storyboard to motion graphics workflow. We implement an interface and improve it through a three-stage study with 24 participants. The study shows how sketches convey motion with minimal input, how their inherent ambiguity requires users to be involved for clarification, and how sketches can visually guide video refinement. Our findings reveal the potential of sketch and AI interaction to bridge the gap between intention and outcome, and demonstrate its applicability to 3D animation and video generation.

Authors:Nico Mutzner, Taha Yasseri, Heiko Rauhut
Title: Normative Equivalence in Human-AI Cooperation: Behaviour, Not Identity, Drives Cooperation in Mixed-Agent Groups
Abstract:
The introduction of artificial intelligence (AI) agents into human group settings raises essential questions about how these novel participants influence cooperative social norms. While previous studies on human-AI cooperation have primarily focused on dyadic interactions, little is known about how integrating AI agents affects the emergence and maintenance of cooperative norms in small groups. This study addresses this gap through an online experiment using a repeated four-player Public Goods Game (PGG). Each group consisted of three human participants and one bot, which was framed either as human or AI and followed one of three predefined decision strategies: unconditional cooperation, conditional cooperation, or free-riding. In our sample of 236 participants, we found that reciprocal group dynamics and behavioural inertia primarily drove cooperation. These normative mechanisms operated identically across conditions, resulting in cooperation levels that did not differ significantly between human and AI labels. Furthermore, we found no evidence of differences in norm persistence in a follow-up Prisoner's Dilemma, or in participants' normative perceptions. Participants' behaviour followed the same normative logic across human and AI conditions, indicating that cooperation depended on group behaviour rather than partner identity. This supports a pattern of normative equivalence, in which the mechanisms that sustain cooperation function similarly in mixed human-AI and all human groups. These findings suggest that cooperative norms are flexible enough to extend to artificial agents, blurring the boundary between humans and AI in collective decision-making.

Authors:Judy Hanwen Shen, Alex Tamkin
Title: How AI Impacts Skill Formation
Abstract:
AI assistance produces significant productivity gains across professional domains, particularly for novice workers. Yet how this assistance affects the development of skills required to effectively supervise AI remains unclear. Novice workers who rely heavily on AI to complete unfamiliar tasks may compromise their own skill acquisition in the process. We conduct randomized experiments to study how developers gained mastery of a new asynchronous programming library with and without the assistance of AI. We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average. Participants who fully delegated coding tasks showed some productivity improvements, but at the cost of learning the library. We identify six distinct AI interaction patterns, three of which involve cognitive engagement and preserve learning outcomes even when participants receive AI assistance. Our findings suggest that AI-enhanced productivity is not a shortcut to competence and AI assistance should be carefully adopted into workflows to preserve skill formation -- particularly in safety-critical domains.

Authors:Huichao Men, Yizhen Hu, Yu Gao, Xiaofeng Mou, Yi Xu, Xinhua Xiao
Title: An Autonomous Agent Framework for Feature-Label Extraction from Device Dialogues and Automatic Multi-Dimensional Device Hosting Planning Based on Large Language Models
Abstract:
With the deep integration of artificial intelligence and smart home technologies, the intelligent transformation of traditional household appliances has become an inevitable trend. This paper presents AirAgent--an LLM-driven autonomous agent framework designed for home air systems. Leveraging a voice-based dialogue interface, AirAgent autonomously and personally manages indoor air quality through comprehensive perception, reasoning, and control. The framework innovatively adopts a two-layer cooperative architecture: Memory-Based Tag Extraction and Reasoning-Driven Planning. First, a dynamic memory tag extraction module continuously updates personalized user profiles. Second, a reasoning-planning model integrates real-time environmental sensor data, user states, and domain-specific prior knowledge (e.g., public health guidelines) to generate context-aware decisions. To support both interpretability and execution, we design a semi-streaming output mechanism that uses special tokens to segment the model's output stream in real time, simultaneously producing human-readable Chain-of-Thought explanations and structured, device-executable control commands. The system handles planning across 25 distinct complex dimensions while satisfying more than 20 customized constraints. As a result, AirAgent endows home air systems with proactive perception, service, and orchestration capabilities, enabling seamless, precise, and personalized air management responsive to dynamic indoor and outdoor conditions. Experimental results demonstrate up to 94.9 percent accuracy and more than 20 percent improvement in user experience metrics compared to competing commercial solutions.

Authors:Ruyuan Wan, Changye Li, Ting-Hao 'Kenneth' Huang
Title: "Newspaper Eat" Means "Not Tasty": A Taxonomy and Benchmark for Coded Languages in Real-World Chinese Online Reviews
Abstract:
Coded language is an important part of human communication. It refers to cases where users intentionally encode meaning so that the surface text differs from the intended meaning and must be decoded to be understood. Current language models handle coded language poorly. Progress has been limited by the lack of real-world datasets and clear taxonomies. This paper introduces CodedLang, a dataset of 7,744 Chinese Google Maps reviews, including 900 reviews with span-level annotations of coded language. We developed a seven-class taxonomy that captures common encoding strategies, including phonetic, orthographic, and cross-lingual substitutions. We benchmarked language models on coded language detection, classification, and review rating prediction. Results show that even strong models can fail to identify or understand coded language. Because many coded expressions rely on pronunciation-based strategies, we further conducted a phonetic analysis of coded and decoded forms. Together, our results highlight coded language as an important and underexplored challenge for real-world NLP systems.

Authors:Luisa Jansen, Tim Ulmann, Robine Jordi, Malte Elson
Title: Putting Privacy to the Test: Introducing Red Teaming for Research Data Anonymization
Abstract:
Recently, the data protection practices of researchers in human-computer interaction and elsewhere have gained attention. Initial results suggest that researchers struggle with anonymization, partly due to a lack of clear, actionable guidance. In this work, we propose simulating re-identification attacks using the approach of red teaming versus blue teaming: a technique commonly employed in security testing, where one team tries to re-identify data, and the other team tries to prevent it. We discuss our experience applying this method to data collected in a mixed-methods study in human-centered privacy. We present usable materials for researchers to apply red teaming when anonymizing and publishing their studies' data.

Authors:Lekshmi Murali Rani, Richard Berntsson Svensson, Robert Feldt
Title: Bridging the Socio-Emotional Gap: The Functional Dimension of Human-AI Collaboration for Software Engineering
Abstract:
As GenAI models are adopted to support software engineers and their development teams, understanding effective human-AI collaboration (HAIC) is increasingly important. Socio-emotional intelligence (SEI) enhances collaboration among human teammates, but its role in HAIC remains unclear. Current AI systems lack SEI capabilities that humans bring to teamwork, creating a potential gap in collaborative dynamics. In this study, we investigate how software practitioners perceive the socio-emotional gap in HAIC and what capabilities AI systems require for effective collaboration. Through semi-structured interviews with 10 practitioners, we examine how they think about collaborating with human versus AI teammates, focusing on their SEI expectations and the AI capabilities they envision. Results indicate that practitioners currently view AI models as intellectual teammates rather than social partners and expect fewer SEI attributes from them than from human teammates. However, they see the socio-emotional gap not as AIs failure to exhibit SEI traits, but as a functional gap in collaborative capabilities (AIs inability to negotiate responsibilities, adapt contextually, or maintain sustained partnerships). We introduce the concept of functional equivalents: technical capabilities (internal cognition, contextual intelligence, adaptive learning, and collaborative intelligence) that achieve collaborative outcomes comparable to human SEI attributes. Our findings suggest that effective collaboration with AI for SE tasks may benefit from functional design rather than replicating human SEI traits for SE tasks, thereby redefining collaboration as functional alignment.

Authors:DaeHo Lee, Ryo Suzuki, Jin-Hyuk Hong
Title: HumanoidTurk: Expanding VR Haptics with Humanoids for Driving Simulations
Abstract:
We explore how humanoid robots can be repurposed as haptic media, extending beyond their conventional role as social, assistive, collaborative agents. To illustrate this approach, we implemented HumanoidTurk, taking a first step toward a humanoid-based haptic system that translates in-game g-force signals into synchronized motion feedback in VR driving. A pilot study involving six participants compared two synthesis methods, leading us to adopt a filter-based approach for smoother and more realistic feedback. A subsequent study with sixteen participants evaluated four conditions: no-feedback, controller, humanoid+controller, and human+controller. Results showed that humanoid feedback enhanced immersion, realism, and enjoyment, while introducing moderate costs in terms of comfort and simulation sickness. Interviews further highlighted the robot's consistency and predictability in contrast to the adaptability of human feedback. From these findings, we identify fidelity, adaptability, and versatility as emerging themes, positioning humanoids as a distinct haptic modality for immersive VR.

Authors:Onyedikachi Hope Amaechi-Okorie, Branislav Radeljic
Title: Unheard in the Digital Age: Rethinking AI Bias and Speech Diversity
Abstract:
Speech remains one of the most visible yet overlooked vectors of inclusion and exclusion in contemporary society. While fluency is often equated with credibility and competence, individuals with atypical speech patterns are routinely marginalized. Given the current state of the debate, this article focuses on the structural biases that shape perceptions of atypical speech and are now being encoded into artificial intelligence. Automated speech recognition (ASR) systems and voice interfaces, trained predominantly on standardized speech, routinely fail to recognize or respond to diverse voices, compounding digital exclusion. As AI technologies increasingly mediate access to opportunity, the study calls for inclusive technological design, anti-bias training to minimize the impact of discriminatory algorithmic decisions, and enforceable policy reform that explicitly recognize speech diversity as a matter of equity, not merely accessibility. Drawing on interdisciplinary research, the article advocates for a cultural and institutional shift in how we value voice, urging co-created solutions that elevate the rights, representation, and realities of atypical speakers in the digital age. Ultimately, the article reframes speech inclusion as a matter of equity (not accommodation) and advocates for co-created AI systems that reflect the full spectrum of human voices.

Authors:Shaozhang Dai, Kadek Ananta Satriadi, Jim Smiley, Barrett Ens, Lonni Besançon, Tim Dwyer
Title: MarioChart: Autonomous Tangibles as Active Proxy Interfaces for Embodied Casual Data Exploration
Abstract:
We introduce the notion of an Active Proxy interface, i.e. tangible models as proxies for physical data referents, supporting interactive exploration of data through active manipulation. We realise an active proxy data visualisation system, "MarioChart", using robot carts relocating themselves on a tabletop, e.g., to align with their data referents in a map or other visual layout. We consider a casual-data exploration scenario involving a multivariate campus sustainability dataset, using scale models as proxies for their physical building data referents. Our empirical study (n=12) compares active proxy use with conventional tablet interaction, finding that our active proxy system enhances short-term spatial memory of data and enables faster completion of certain data analytic tasks. It shows no significant differences compared to traditional touch-screens in long-term memory, physical fatigue, mental workload, or user engagement. Our study offers an initial baseline for active proxy techniques and advances understanding of tangible interfaces in situated data visualisation.

Authors:Mohammad Hadi Nezhad, Francisco Enrique Vicente Castro, Ivon Arroyo
Title: Understanding Users' Privacy Reasoning and Behaviors During Chatbot Use to Support Meaningful Agency in Privacy
Abstract:
Conversational agents (CAs) (e.g., chatbots) are increasingly used in settings where users disclose sensitive information, raising significant privacy concerns. Because privacy judgments are highly contextual, supporting users to engage in privacy-protective actions during chatbot interactions is essential. However, enabling meaningful engagement requires a deeper understanding of how users currently reason about and manage sensitive information during realistic chatbot use scenarios. To investigate this, we qualitatively examined computer science (undergraduate and masters) students' in-the-moment disclosure and protection behaviors, as well as the reasoning underlying these behaviors, across a range of realistic chatbot tasks. Participants used a simulated ChatGPT interface with and without a privacy notice panel that intercepts message submissions, highlights potentially sensitive information, and offers privacy protective actions. The panel supports anonymization through retracting, faking, and generalizing, and surfaces two of ChatGPT's built-in privacy controls to improve their discoverability. Drawing on interaction logs, think-alouds, and survey responses, we analyzed how the panel fostered privacy awareness, encouraged protective actions, and supported context-specific reasoning about what information to protect and how. We further discuss design opportunities for tools that provide users greater and more meaningful agency in protecting sensitive information during CA interactions.

Authors:Ailin Liu, Francesco Chiossi, Felix Henninger, Lisa Bondo Andersen, Tobias Wistuba, Sonja Greven, Frauke Kreuter, Fiona Draxler
Title: Physiological and Behavioral Modeling of Stress and Cognitive Load in Web-Based Question Answering
Abstract:
Time pressure and question difficulty can trigger stress and cognitive overload in web-based surveys, compromising data quality and user experience. Most stress detection methods are based on low-resolution self-reports, which are poorly suited for capturing fast, moment-to-moment changes during short online tasks. Addressing this gap, we conducted a 2x2 within-subjects study (N = 29), manipulating question difficulty and time pressure in a web-based multiple-choice task. Participants completed general knowledge and cognitive questions while we collected multimodal data: mouse dynamics, eye tracking, electrocardiogram, and electrodermal activity. Using condition-based and self-reported labels, we used statistical and machine learning models to model stress and question difficulty. Our results show distinct physiological and behavioral patterns within very short timeframes. This work demonstrates the feasibility of rapidly detecting cognitive-affective states in digital environments, paving the way for more adaptive, ethical, and user-aware survey interfaces.

Authors:Javier Crespo, Ana Enériz, Paula Iruzubieta, Fernando Carballo, Conrado Fernández Rodríguez, María Dolores Martín-Arranz, Federico Argüelles-Arias, Juan Turnes
Title: Artificial Intelligence in Spanish Gastroenterology: high expectations, limited integration. A national survey
Abstract:
Background: Artificial intelligence (AI) has emerged as a disruptive innovation in medicine, yet its adoption within gastroenterology remains limited and poorly characterized. We aimed to examine knowledge, practical applications, perceived barriers, and expectations regarding AI among gastroenterology specialists in Spain. Methods: We conducted a cross-sectional observational study using a structured online survey distributed by the Spanish Society of Digestive Pathology (SEPD) in 2025. The questionnaire collected sociodemographic data, patterns of AI use, perceptions, and educational needs. Descriptive statistics and multivariable models were applied. Results: Among 283 respondents (mean age 44.6 +/- 9.7 years), 87.5% acknowledged AI as a transformative tool, but only 60.2% (95% CI: 54.3-66.1%) reported using it, mostly outside institutional frameworks. Notably, 80.2% of users initiated AI use within the past year. Independent predictors of frequent use included previous training (OR=2.44), employment in university hospitals (OR=2.14), and younger age (OR=1.36 per 5-year decrease). Main barriers were lack of training (61%), absence of institutional strategies (46%), and ethical concerns (50%). While 93.8% agreed that AI training programmes are necessary, only 18.4% had received formal training. Conclusions: A substantial gap exists between the favorable perception of AI and its actual integration into clinical practice within Spanish gastroenterology. The rapid adoption outside institutional frameworks underscores the urgent need for accredited training programmes and governance standards led by scientific societies.

Authors:Lin Kyi, Paul Gölz, Robin Berjon, Asia Biega
Title: From Clicks to Consensus: Collective Consent Assemblies for Data Governance
Abstract:
Obtaining meaningful and informed consent from users is essential for ensuring autonomy and control over one's data. Notice and consent, the standard for collecting consent, has been criticized. While other individualized solutions have been proposed, this paper argues that a collective approach to consent is worth exploring. First, individual consent is not always feasible to collect for all data collection scenarios. Second, harms resulting from data processing are often communal in nature, given the interconnected nature of some data. Finally, ensuring truly informed consent for every individual has proven impractical. We propose collective consent, operationalized through consent assemblies, as one alternative framework. We establish collective consent's theoretical foundations and use speculative design to envision consent assemblies leveraging deliberative mini-publics. We present two vignettes: i) replacing notice and consent, and ii) collecting consent for GenAI model training. Our paper employs future backcasting to identify the requirements for realizing collective consent and explores its potential applications in contexts where individual consent is infeasible.

Authors:Steffen Holter, Eunyee Koh, Mustafa Doga Dogan, Gromit Yeuk-Yin Chan
Title: UXCascade: Scalable Usability Testing with Simulated User Agents
Abstract:
Simulated user agents are increasingly used in usability testing to support fast, iterative UX workflows, as they generate rich data such as action logs and think-aloud reasoning, but the unstructured nature of this output often obscures actionable insights. We present UXCascade, an interactive tool for extracting, aggregating, and presenting agent-generated usability feedback at scale. Our core contribution is a multi-level analysis workflow that (1) highlights patterns across persona traits, goals, and outcomes, (2) links agent reasoning to specific issues, and (3) supports actionable design improvements. UXCascade operationalizes this approach by listing agent goals, traits, and issues in a structured overview. Practitioners can explore detailed reasoning traces and annotated views, propose interface edits, and assess their impact across personas. This enables a top-down, exploration-driven analysis from patterns to concrete UX interventions. A user study with eight UX professionals demonstrates that UXCascade integrates into existing workflows, enabling iterative feedback during early-stage interface development.

Authors:Jana Franceska Funke, Ria Matapurkar, Enrico Rukzio, Teresa Hirzle
Title: Shape of You: Implications of Social Context and Avatar Body Shape on Relatedness, Emotions, and Performance in a Virtual Reality Workout
Abstract:
It is obvious that emotions are causal variables of motivation, as they elicit states, forces and energies that trigger and guide labor behavior. Thus, a motivational tension that is not informed by needs alone, but also by emotions, intention, goals and means to achieve them is therefore generated within the mental, emotional and physical plane. Based on Montserrat's opinion (2004: 131), that "to motivate means, above all, to move and to transmit an emotion", we will undertake to identify the mutual influences between emotions and motivation. The main objectives of this article are to display a summary of the theories and definitions about emotions and to explore the links between emotions and motivation. Although interconnected, emotions and motivation can be contemplated from a double perspective: (1) emotions influence motivation and (2) motivation influences emotions. Moreover, we will consider motivation from three dimensions: (1) cognitive, (2) affective and (3) volitional. The ultimate purpose of this article is to issue a warning as to the importance of the emotional side of motivation. An important part in implementing such insight is to be played by managers (and by employees, also), who should develop the skills and know-how needed to keep a well-balanced emotional climate that effectively favors the maximization of individual and group motivation at the workplace.

Authors:Jiaxin Xu, Chao Zhang, Raymond H. Cuijpers, Wijnand A. IJsselsteijn
Title: Designing Persuasive Social Robots for Health Behavior Change: A Systematic Review of Behavior Change Strategies and Evaluation Methods
Abstract:
Social robots are increasingly applied as health behavior change interventions, yet actionable knowledge to guide their design and evaluation remains limited. This systematic review synthesizes (1) the behavior change strategies used in existing HRI studies employing social robots to promote health behavior change, and (2) the evaluation methods applied to assess behavior change outcomes. Relevant literature was identified through systematic database searches and hand searches. Analysis of 39 studies revealed four overarching categories of behavior change strategies: coaching strategies, counseling strategies, social influence strategies, and persuasion-enhancing strategies. These strategies highlight the unique affordances of social robots as behavior change interventions and offer valuable design heuristics. The review also identified key characteristics of current evaluation practices, including study designs, settings, durations, and outcome measures, on the basis of which we propose several directions for future HRI research.

Authors:Elif Uskuplu, Lawrence S. Moss, Valeria de Paiva
Title: KnowTeX: Visualizing Mathematical Dependencies
Abstract:
Mathematical knowledge exists in many forms, ranging from informal textbooks and lecture notes to large formal proof libraries, yet moving between these representations remains difficult. Informal texts hide dependencies, while formal systems expose every detail in ways that are not always human-readable. Dependency graphs offer a middle ground by making visible the structure of results, definitions, and proofs. We present KnowTeX, a standalone, user-friendly tool that extends the ideas of Lean's Blueprints, enabling the visualization of conceptual dependencies directly from LaTeX sources. Using a simple "uses" command, KnowTeX extracts relationships among statements and generates previewable graphs in DOT and TikZ formats. Applied to mathematical texts, such graphs clarify core results, support education and formalization, and provide a resource for aligning informal and formal mathematical representations. We argue that dependency graphs should become a standard feature of mathematical writing, benefiting both human readers and automated systems.

Authors:Kevin Tseng, Juan Carlos Toledano, Bart De Clerck, Yuliia Dukach, Phil Tinn
Title: An Agentic Operationalization of DISARM for FIMI Investigation on Social Media
Abstract:
The interoperability of data and intelligence across allied partners and their respective end-user groups is considered a foundational enabler to the collective defense capability--both conventional and hybrid--of NATO countries. Foreign Information Manipulation and Interference (FIMI) and related hybrid activities are conducted across various societal dimensions and infospheres, posing an ever greater challenge to the characterization of threats, sustaining situational awareness, and response coordination. Recent advances in AI have further led to the decreasing cost of AI-augmented trolling and interference activities, such as through the generation and amplification of manipulative content. Despite the introduction of the DISARM framework as a standardized metadata and analytical framework for FIMI, operationalizing it at the scale of social media remains a challenge. We propose a framework-agnostic agent-based operationalization of DISARM to investigate FIMI on social media. We develop a multi-agent pipeline in which specialized agentic AI components collaboratively (1) detect candidate manipulative behaviors, and (2) map these behaviors onto standard DISARM taxonomies in a transparent manner. We evaluated the approach on two real-world datasets annotated by domain practitioners. We demonstrate that our approach is effective in scaling the predominantly manual and heavily interpretive work of FIMI analysis, providing a direct contribution to enhancing the situational awareness and data interoperability in the context of operating in media and information-rich settings.

Authors:James Brock, Ce Zhang, Nantheera Anantrasirichai
Title: Forest-Chat: Adapting Vision-Language Agents for Interactive Forest Change Analysis
Abstract:
The increasing availability of high-resolution satellite imagery, together with advances in deep learning, creates new opportunities for enhancing forest monitoring workflows. Two central challenges in this domain are pixel-level change detection and semantic change interpretation, particularly for complex forest dynamics. While large language models (LLMs) are increasingly adopted for data exploration, their integration with vision-language models (VLMs) for remote sensing image change interpretation (RSICI) remains underexplored, especially beyond urban environments. We introduce Forest-Chat, an LLM-driven agent designed for integrated forest change analysis. The proposed framework enables natural language querying and supports multiple RSICI tasks, including change detection, change captioning, object counting, deforestation percentage estimation, and change reasoning. Forest-Chat builds upon a multi-level change interpretation (MCI) vision-language backbone with LLM-based orchestration, and incorporates zero-shot change detection via a foundation change detection model together with an interactive point-prompt interface to support fine-grained user guidance. To facilitate adaptation and evaluation in forest environments, we introduce the Forest-Change dataset, comprising bi-temporal satellite imagery, pixel-level change masks, and multi-granularity semantic change captions generated through a combination of human annotation and rule-based methods. Experimental results demonstrate that Forest-Chat achieves strong performance on Forest-Change and on LEVIR-MCI-Trees, a tree-focused subset of LEVIR-MCI, for joint change detection and captioning, highlighting the potential of interactive, LLM-driven RSICI systems to improve accessibility, interpretability, and analytical efficiency in forest change analysis.

Authors:C. Estelle Smith, Alemitu Bezabih, Shadi Nourriz, Jesan Ahammed Ovi
Title: SPIRIT: A Design Framework To Support Technology Interventions for Spiritual Care Within and Beyond the Clinic
Abstract:
Despite its importance for well-being, spiritual care remains under-explored in HCI, while the adoption of technology in clinical spiritual care lags behind other healthcare fields. Prior work derived a definition of "spiritual support" through co-design workshops with stakeholders in online health communities. This paper contributes: (1) a revision of that definition through member checking with professional spiritual care providers (SCPs); (2) a novel design framework -- SPIRIT -- which can help to expand models of delivery for spiritual care using digital technologies. Through re-analysis of previous data and new interviews with SCPs, we identify three prerequisites for meaningful spiritual care: openness to care, safe space, and the ability to discern and articulate spiritual needs. We also propose six design dimensions: loving presence, meaning-making, appropriate degree of technology use, location, degree of relational closeness, and temporality. We discuss how SPIRIT offers guidance for designing impactful digital spiritual care intervention systems within and beyond clinical settings.

Authors:Amy Koike, Yuki Okafuji, Sichao Song
Title: Practical Insights into Designing Context-Aware Robot Voice Parameters in the Wild
Abstract:
Voice is an essential modality for human-robot interaction (HRI). The way a robot sounds plays a central role in shaping how humans perceive and engage with it, influencing factors such as intelligibility, understandability, and likability. Although prior work has examined voice design, most studies occur in controlled labs, leaving uncertainty about how results translate to real-world settings. To address this gap, we conducted two naturalistic deployment studies with a guidance robot in a shopping mall: (1) in-depth interviews with six participants, and (2) an eight-day field deployment using a 3x3 design varying speech rate and volume, yielding 725 survey responses. Our results show how real-world context shapes voice perception and inform adaptive, context-aware voice design for social robots in public spaces.

Authors:Caitlin Morris, Pattie Maes
Title: When Peers Outperform AI (and When They Don't): Interaction Quality Over Modality
Abstract:
As AI increasingly enters the classroom, what changes when students collaborate with algorithms instead of peers? We analyzed 36 undergraduate students learning graph theory through peer collaboration (n=24) or AI assistance (n=12), using discourse analysis to identify interaction patterns shaping learning outcomes. Results reveal a collaboration quality divide: high-quality peer interactions generated curiosity and engagement that AI couldn't match, yet low-quality peer interactions performed worse than AI across dimensions. AI showed a paradoxical pattern, building confidence in knowledge while reducing curiosity and deeper engagement. Interaction quality emerged from dynamic patterns rather than individual traits, with early discourse markers predicting outcomes. Students treated AI as a transactional information source despite its collaborative design, revealing fundamental differences in human versus algorithmic engagement. Our findings suggest AI in education need not replace peer learning but can recognize struggle and support both peer and AI interactions toward productive learning experiences.

Authors:Aisvarya Adeseye, Jouni Isoaho, Seppo Virtanen, Mohammad Tahir
Title: Modular AI-Powered Interviewer with Dynamic Question Generation and Expertise Profiling
Abstract:
Automated interviewers and chatbots are common in research, recruitment, customer service, and education. Many existing systems use fixed question lists, strict rules, and limited personalization, leading to repeated conversations that cause low engagement. Therefore, these tools are not effective for complex qualitative research, which requires flexibility, context awareness, and ethical sensitivity. Consequently, there is a need for a more adaptive and context-aware interviewing system. To address this, an AI-powered interviewer that dynamically generates questions that are contextually appropriate and expertise aligned is presented in this study. The interviewer is built on a locally hosted large language model (LLM) that generates coherent dialogue while preserving data privacy. The interviewer profiles the participants' expertise in real time to generate knowledge-appropriate questions, well-articulated responses, and smooth transition messages similar to human-like interviews. To implement these functionalities, a modular prompt engineering pipeline was designed to ensure that the interview conversation remains scalable, adaptive, and semantically rich. To evaluate the AI-powered interviewer, it was tested with various participants, and it achieved high satisfaction (mean 4.45) and engagement (mean 4.33). The proposed interviewer is a scalable, privacy-conscious solution that advances AI-assisted qualitative data collection.

Authors:Pratik Mishra, Caner Gözübüyük, Seema Nagar, Prateeti Mohapatra, Raya Wittich, Arthur de Magalhaes
Title: NOVAID: Natural-language Observability Visualization Assistant for ITOps Dashboard Widget Generation
Abstract:
Manual creation of IT monitoring dashboard widgets is slow, error-prone, and a barrier for both novice and expert users. We present NOVAID, an interactive chatbot that leverages Large Language Models (LLMs) to generate IT monitoring widgets directly from natural language queries. Unlike general natural language-to-visualization tools, NOVAID addresses IT operations-specific challenges: specialized widget types like SLO charts, dynamic API-driven data retrieval, and complex contextual filters. The system combines a domain-aware semantic parser, fuzzy entity matching, and schema completion to produce standardized widget JSON specifications. An interactive clarification loop ensures accuracy in underspecified queries. On a curated dataset of 271 realistic queries, NOVAID achieves promising accuracy (up to 94.10% in metric extraction) across multiple LLMs. A user study with IT engineers yielded a System Usability Scale score of 74.2 for NOVAID, indicating good usability. By bridging natural language intent with operational dashboards, NOVAID demonstrates clear potential and a path for deployment in enterprise ITOps monitoring platforms.

Authors:Laura Ferrarotti, Gian Maria Campedelli, Roberto Dessì, Andrea Baronchelli, Giovanni Iacca, Kathleen M. Carley, Alex Pentland, Joel Z. Leibo, James Evans, Bruno Lepri
Title: Generative AI collective behavior needs an interactionist paradigm
Abstract:
In this article, we argue that understanding the collective behavior of agents based on large language models (LLMs) is an essential area of inquiry, with important implications in terms of risks and benefits, impacting us as a society at many levels. We claim that the distinctive nature of LLMs--namely, their initialization with extensive pre-trained knowledge and implicit social priors, together with their capability of adaptation through in-context learning--motivates the need for an interactionist paradigm consisting of alternative theoretical foundations, methodologies, and analytical tools, in order to systematically examine how prior knowledge and embedded values interact with social context to shape emergent phenomena in multi-agent generative AI systems. We propose and discuss four directions that we consider crucial for the development and deployment of LLM-based collectives, focusing on theory, methods, and trans-disciplinary dialogue.

Authors:Bohan Zhang, Chengke Bu, Paramveer S. Dhillon
Title: Who Owns the Text? Design Patterns for Preserving Authorship in AI-Assisted Writing
Abstract:
AI writing assistants can reduce effort and improve fluency, but they may also weaken writers' sense of authorship. We study this tension with an ownership-aware co-writing editor that offers on-demand, sentence-level suggestions and tests two common design choices: persona-based coaching and style personalization. In an online study (N=176), participants completed three professional writing tasks: an email without AI help, a proposal with generic AI suggestions, and a cover letter with persona-based coaching, while half received suggestions tailored to a brief sample of their prior writing. Across the two AI-assisted tasks, psychological ownership dropped relative to unassisted writing (about 0.85-1.0 points on a 7-point scale), even as cognitive load decreased (about 0.9 points) and quality ratings stayed broadly similar overall. Persona coaching did not prevent the ownership decline. Style personalization partially restored ownership (about +0.43) and increased AI incorporation in text (+5 percentage points). We distill five design patterns: on-demand initiation, micro-suggestions, voice anchoring, audience scaffolds, and point-of-decision provenance, to guide authorship-preserving writing tools.

Authors:Andrea Ferrario, Alessandro Facchini, Juan M. Durán
Title: Epistemology gives a Future to Complementarity in Human-AI Interactions
Abstract:
Human-AI complementarity is the claim that a human supported by an AI system can outperform either alone in a decision-making process. Since its introduction in the human-AI interaction literature, it has gained traction by generalizing the reliance paradigm and by offering a more practical alternative to the contested construct of 'trust in AI.' Yet complementarity faces key theoretical challenges: it lacks precise theoretical anchoring, it is formalized just as a post hoc indicator of relative predictive accuracy, it remains silent about other desiderata of human-AI interactions and it abstracts away from the magnitude-cost profile of its performance gain. As a result, complementarity is difficult to obtain in empirical settings. In this work, we leverage epistemology to address these challenges by reframing complementarity within the discourse on justificatory AI. Drawing on computational reliabilism, we argue that historical instances of complementarity function as evidence that a given human-AI interaction is a reliable epistemic process for a given predictive task. Together with other reliability indicators assessing the alignment of the human-AI team with the epistemic standards and socio-technical practices, complementarity contributes to the degree of reliability of human-AI teams when generating predictions. This supports the practical reasoning of those affected by these outputs -- patients, managers, regulators, and others. In summary, our approach suggests that the role and value of complementarity lies not in providing a relative measure of predictive accuracy, but in helping calibrate decision-making to the reliability of AI-supported processes that increasingly shape everyday life.

Authors:Andrea Ferrario, Rasita Vinay, Matteo Casserini, Alessandro Facchini
Title: A Scoping Review of the Ethical Perspectives on Anthropomorphising Large Language Model-Based Conversational Agents
Abstract:
Anthropomorphisation -- the phenomenon whereby non-human entities are ascribed human-like qualities -- has become increasingly salient with the rise of large language model (LLM)-based conversational agents (CAs). Unlike earlier chatbots, LLM-based CAs routinely generate interactional and linguistic cues, such as first-person self-reference, epistemic and affective expressions that empirical work shows can increase engagement. On the other hand, anthropomorphisation raises ethical concerns, including deception, overreliance, and exploitative relationship framing, while some authors argue that anthropomorphic interaction may support autonomy, well-being, and inclusion. Despite increasing interest in the phenomenon, literature remains fragmented across domains and varies substantially in how it defines, operationalizes, and normatively evaluates anthropomorphisation. This scoping review maps ethically oriented work on anthropomorphising LLM-based CAs across five databases and three preprint repositories. We synthesize (1) conceptual foundations, (2) ethical challenges and opportunities, and (3) methodological approaches. We find convergence on attribution-based definitions but substantial divergence in operationalization, a predominantly risk-forward normative framing, and limited empirical work that links observed interaction effects to actionable governance guidance. We conclude with a research agenda and design/governance recommendations for ethically deploying anthropomorphic cues in LLM-based conversational agents.

Authors:Robert K. Strehlow, Tobias Küster, Oskar F. Kupke, Brandon Llanque Kurps, Fikret Sivrikaya, Sahin Albayrak
Title: SAGE: Tool-Augmented LLM Task Solving Strategies in Scalable Multi-Agent Environments
Abstract:
Large language models (LLMs) have proven to work well in question-answering scenarios, but real-world applications often require access to tools for live information or actuation. For this, LLMs can be extended with tools, which are often defined in advance, also allowing for some fine-tuning for specific use cases. However, rapidly evolving software landscapes and individual services require the constant development and integration of new tools. Domain- or company-specific tools can greatly elevate the usefulness of an LLM, but such custom tools can be problematic to integrate, or the LLM may fail to reliably understand and use them. For this, we need strategies to define new tools and integrate them into the LLM dynamically, as well as robust and scalable zero-shot prompting methods that can make use of those tools in an efficient manner. In this paper, we present SAGE, a specialized conversational AI interface, based on the OPACA framework for tool discovery and execution. The integration with OPACA makes it easy to add new tools or services for the LLM to use, while SAGE itself presents rich extensibility and modularity. This not only provides the ability to seamlessly switch between different models (e.g. GPT, LLAMA), but also to add and select prompting methods, involving various setups of differently prompted agents for selecting and executing tools and evaluating the results. We implemented a number of task-solving strategies, making use of agentic concepts and prompting methods in various degrees of complexity, and evaluated those against a comprehensive set of benchmark services. The results are promising and highlight the distinct strengths and weaknesses of different task-solving strategies. Both SAGE and the OPACA framework, as well as the different benchmark services and results, are available as Open Source/Open Data on GitHub.

Authors:Bhaskar Mitra, Nicola Neophytou, Sireesh Gururaja
Title: Information Access of the Oppressed: A Problem-Posing Framework for Envisioning Emancipatory Information Access Platforms
Abstract:
Online information access (IA) platforms are targets of authoritarian capture. These concerns are particularly serious and urgent today in light of the rising levels of democratic erosion worldwide, the emerging capabilities of generative AI technologies such as AI persuasion, and the increasing concentration of economic and political power in the hands of Big Tech. This raises the question of what alternative IA infrastructure we must reimagine and build to mitigate the risks of authoritarian capture of our information ecosystems. We explore this question through the lens of Paulo Freire's theories of emancipatory pedagogy. Freire's theories provide a radically different lens for exploring IA's sociotechnical concerns relative to the current dominating frames of fairness, accountability, confidentiality, transparency, and safety. We make explicit, with the intention to challenge, the dichotomy of how we relate to technology as either technologists (who envision and build technology) and its users. We posit that this mirrors the teacher-student relationship in Freire's analysis. By extending Freire's analysis to IA, we challenge the notion that it is the burden of the (altruistic) technologists to come up with interventions to mitigate the risks that emerging technologies pose to marginalized communities. Instead, we advocate that the first task for the technologists is to pose these as problems to the marginalized communities, to encourage them to make and unmake the technology as part of their material struggle against oppression. Their second task is to redesign our online technology stacks to structurally expose spaces for community members to co-opt and co-construct the technology in aid of their emancipatory struggles. We operationalize Freire's theories to develop a problem-posing framework for envisioning emancipatory IA platforms of the future.

Authors:Jun-Peng Zhu, Boyan Niu, Peng Cai, Zheming Ni, Kai Xu, Jiajun Huang, Shengbo Ma, Bing Wang, Xuan Zhou, Guanglei Bao, Donghui Zhang, Liu Tang, Qi Liu
Title: TiInsight: A SQL-based Automated Exploratory Data Analysis System through Large Language Models
Abstract:
The SQL-based exploratory data analysis has garnered significant attention within the data analysis community. The emergence of large language models (LLMs) has facilitated the paradigm shift from manual to automated data exploration. However, existing methods generally lack the ability for cross-domain analysis, and the exploration of LLMs capabilities remains insufficient. This paper presents TiInsight, an SQL-based automated cross-domain exploratory data analysis system. First, TiInsight offers a user-friendly GUI enabling users to explore data using natural language queries. Second, TiInsight offers a robust cross-domain exploratory data analysis pipeline: hierarchical data context (i.e., HDC) generation, question clarification and decomposition, text-to-SQL (i.e., TiSQL), and data visualization (i.e., TiChart). Third, we have implemented and deployed TiInsight in the production environment of PingCAP and demonstrated its capabilities using representative datasets. The demo video is available at https://youtu.be/JzYFyYd-emI.

Authors:Xinyi Zhou, Zeinadsadat Saghi, Sadra Sabouri, Rahul Pandita, Mollie McGuire, Souti Chattopadhyay
Title: Cognitive Biases in LLM-Assisted Software Development
Abstract:
The widespread adoption of Large Language Models (LLMs) in software development is transforming programming from a solution-generative to a solution-evaluative activity. This shift opens a pathway for new cognitive challenges that amplify existing decision-making biases or create entirely novel ones. One such type of challenge stems from cognitive biases, which are thinking patterns that lead people away from logical reasoning and result in sub-optimal decisions. How do cognitive biases manifest and impact decision-making in emerging AI-collaborative development? This paper presents the first comprehensive study of cognitive biases in LLM-assisted development. We employ a mixed-methods approach, combining observational studies with 14 student and professional developers, followed by surveys with 22 additional developers. We qualitatively compare categories of biases affecting developers against the traditional non-LLM workflows. Our findings suggest that LLM-related actions are more likely to be associated with novel biases. Through a systematic analysis of 90 cognitive biases specific to developer-LLM interactions, we develop a taxonomy of 15 bias categories validated by cognitive psychologists. We found that 48.8% of total programmer actions are biased, and developer-LLM interactions account for 56.4% of these biased actions. We discuss how these bias categories manifest, present tools and practices for developers, and recommendations for LLM tool builders to help mitigate cognitive biases in human-AI programming.

Authors:John Paul P. Miranda, Jaymark A. Yambao
Title: Assessing novice programmers' perception of ChatGPT:performance, risk, decision-making, and intentions
Abstract:
This study explores the novice programmers' intention to use chat generative pretrained transformer (ChatGPT) for programming tasks with emphasis on performance expectancy (PE), risk-reward appraisal (RRA), and decision-making (DM). Utilizing partial least squares structural equation modeling (PLS-SEM) and a sample of 413 novice programmers, the analysis demonstrates that higher PE of ChatGPT is positively correlated with improved DM in programming tasks. Novice programmers view ChatGPT as a tool that enhances their learning and skill development. Additionally, novice programmers that have a favorable RRA of ChatGPT tend to make more confident and effective decisions, acknowledging potential risks but recognizing that benefits such as quick problem-solving and learning new techniques outweigh these risks. Moreover, a positive perception of ChatGPT's role in DM significantly increases the inclination to use the tool for programming tasks. These results highlight the critical roles of perceived capabilities, risk assessment, and positive DM experiences in promoting the adoption of artificial intelligence (AI) tools in programming education.

Authors:Sai Khadloya, Kush Juvekar, Arghya Bhattacharya, Utkarsh Saxena
Title: CourtNav: Voice-Guided, Anchor-Accurate Navigation of Long Legal Documents in Courtrooms
Abstract:
Judicial work depends on close reading of long records, charge sheets, pleadings, annexures, orders, often spanning hundreds of pages. With limited staff support, exhaustive reading during hearings is impractical. We present CourtNav, a voice-guided, anchor-first navigator for legal PDFs that maps a judge's spoken command (e.g., "go to paragraph 23", "highlight the contradiction in the cross-examination") directly to a highlighted paragraph in seconds. CourtNav transcribes the command, classifies intent with a grammar-first(Exact regex matching), LLM-backed router classifying the queries using few shot examples, retrieves over a layout-aware hybrid index, and auto-scrolls the viewer to the cited span while highlighting it and close alternates. By design, the interface shows only grounded passages, never free text, keeping evidence verifiable and auditable. This need is acute in India, where judgments and cross-examinations are notoriously long.In a pilot on representative charge sheets, pleadings, and orders, median time-to-relevance drops from 3-5 minutes (manual navigation) to 10-15 seconds; with quick visual verification included, 30-45 seconds. Under fixed time budgets, this navigation-first design increases the breadth of the record actually consulted while preserving control and transparency.

Authors:Niloufar Alavi, Swati Shah, Rezvan Alamian, Stefan Goetz
Title: Driver-Intention Prediction with Deep Learning: Real-Time Brain-to-Vehicle Communication
Abstract:
Brain-computer interfaces (BCIs) allow direct communication between the brain and electronics without the need for speech or physical movement. Such interfaces can be particularly beneficial in applications requiring rapid response times, such as driving, where a vehicle's advanced driving assistance systems could benefit from immediate understanding of a driver's intentions. This study presents a novel method for predicting a driver's intention to steer using electroencephalography (EEG) signals through deep learning. A driving simulator created a controlled environment in which participants imagined controlling a vehicle during various driving scenarios, including left and right turns, as well as straight driving. A convolutional neural network (CNN) classified the detected EEG data with minimal pre-processing. Our model achieved an accuracy of 83.7% in distinguishing between the three steering intentions and demonstrated the ability of CNNs to process raw EEG data effectively. The classification accuracy was highest for right-turn segments, which suggests a potential spatial bias in brain activity. This study lays the foundation for more intuitive brain-to-vehicle communication systems.

Authors:Martin P. Robillard, Lihn V. Nguyen, Deeksha Arya, Jin L. C. Guo
Title: How Users Consider Web Tracking When Seeking Health Information Online
Abstract:
Health information websites offer instantaneous access to information, but have important privacy implications as they can associate a visitor with specific medical conditions. We interviewed 35 residents of Canada to better understand whether and how online health information seekers exercise three potential means of protection against surveillance: website selection, privacy-enhancing technologies, and self-censorship, as well as their understanding of web tracking. Our findings reveal how users' limited initiative and effectiveness in protecting their privacy could be associated with a missing or inaccurate understanding of how implicit data collection by third parties takes place on the web, and who collects the data. We conclude that to help Internet users achieve better self-data protection, we may need to shift privacy awareness efforts from what information is collected to how it is collected.

Authors:Tatsuya Okuno, Haruto Shimizu, Nobuhito Kasahara, Taiyu Honma, Shota Yamanaka, Homei Miyashita
Title: A Tool for Estimating Success Rates of Raycasting-Based Object Selection in Virtual Reality
Abstract:
As XR devices become widespread, 3D interaction has become commonplace, and UI developers are increasingly required to consider usability to deliver better user experiences. The HCI community has long studied target-pointing performance, and research on 3D environments has progressed substantially. However, for practitioners to directly leverage research findings in UI improvements, practical tools are needed. To bridge this gap between research and development in VR systems, we propose a system that estimates object selection success rates within a development tool (Unity). In this paper, we validate the underlying theory, describe the tool's functions, and report feedback from VR developers who tried the tool to assess its usefulness.

Authors:Yun Ye, Yuan Che, Haoyang Liang, Yingheng Zhang, Pengpeng Xu
Title: Wait or cross? Understanding the influence of behavioral tendency, trust, and risk perception on pedestrian gap-acceptance of automated truck platoons
Abstract:
Although automated trucks have the potential to improve freight efficiency, reduce costs, and address driver shortages, organizing two or more trucks in a convoy has raised considerable concerns for pedestrian safety. This study conducted a controlled experiment to examine the influence of behavioral tendency, trust, and risk perception on pedestrian intention to cross in front of an automated truck platoon. A total of 603 subjects participated in the virtual reality video-based questionnaire survey. By fusing the merits of structural equation modeling and artificial neural networks, a two-stage, hybrid model was developed to examine complex relationships between latent variables and gap-acceptance behaviors. Our results indicated that subjects watched an average of five vehicle gaps before starting crossing and the average time gap accepted was about 5.35 seconds. Risk perception not only played the most dominant role in shaping pedestrian crossing decisions, but also served as the strong bone, mediating the effects of behavioral tendency and trust on gap-acceptance. Participants who frequently violated traffic rules were more likely to accept a smaller time gap, while those who showed positive behaviors to other road users tended to wait for a larger time gap. Participants who often committed errors, showed aggressive behaviors, and held greater trust in the safety of automated trucks generally reported a lower level of risk for road-crossing in front of automated truck platoons. Built on these findings, a range of tailored countermeasures were proposed to ensure safer and smother interactions between pedestrians and automated truck platoons.

Authors:Yuan Che, Mun On Wong, Xiaowei Gao, Haoyang Liang, Yun Ye
Title: Enhancing Safety in Automated Ports: A Virtual Reality Study of Pedestrian-Autonomous Vehicle Interactions under Time Pressure, Visual Constraints, and Varying Vehicle Size
Abstract:
Autonomous driving improves traffic efficiency but presents safety challenges in complex port environments. This study investigates how environmental factors, traffic factors, and pedestrian characteristics influence interaction safety between autonomous vehicles and pedestrians in ports. Using virtual reality (VR) simulations of typical port scenarios, 33 participants completed pedestrian crossing tasks under varying visibility, vehicle sizes, and time pressure conditions. Results indicate that low-visibility conditions, partial occlusions and larger vehicle sizes significantly increase perceived risk, prompting pedestrians to wait longer and accept larger gaps. Specifically, pedestrians tended to accept larger gaps and waited longer when interacting with large autonomous truck platoons, reflecting heightened caution due to their perceived threat. However, local obstructions also reduce post-encroachment time, compressing safety margins. Individual attributes such as age, gender, and driving experience further shape decision-making, while time pressure undermines compensatory behaviors and increases risk. Based on these findings, safety strategies are proposed, including installing wide-angle cameras at multiple viewpoints, enabling real-time vehicle-infrastructure communication, enhancing port lighting and signage, and strengthening pedestrian safety training. This study offers practical recommendations for improving the safety and deployment of vision-based autonomous systems in port settings.

Authors:Soroush Elyasi, Arya VarastehNezhad, Fattaneh Taghiyareh
Title: MentalGame: Predicting Personality-Job Fitness for Software Developers Using Multi-Genre Games and Machine Learning Approaches
Abstract:
Personality assessment in career guidance and personnel selection traditionally relies on self-report questionnaires, which are susceptible to response bias, fatigue, and intentional distortion. Game-based assessment offers a promising alternative by capturing implicit behavioral signals during gameplay. This study proposes a multi-genre serious-game framework combined with machine-learning techniques to predict suitability for software development roles. Developer-relevant personality and behavioral traits were identified through a systematic literature review and an empirical study of professional software engineers. A custom mobile game was designed to elicit behaviors related to problem solving, planning, adaptability, persistence, time management, and information seeking. Fine-grained gameplay event data were collected and analyzed using a two-phase modeling strategy where suitability was predicted exclusively from gameplay-derived behavioral features. Results show that our model achieved up to 97% precision and 94% accuracy. Behavioral analysis revealed that proper candidates exhibited distinct gameplay patterns, such as more wins in puzzle-based games, more side challenges, navigating menus more frequently, and exhibiting fewer pauses, retries, and surrender actions. These findings demonstrate that implicit behavioral traces captured during gameplay is promising in predicting software-development suitability without explicit personality testing, supporting serious games as a scalable, engaging, and less biased alternative for career assessment.

Authors:Joslyn Orgill, Andra Rice, Max Fowler, Seth Poulsen
Title: The Effect of Transparency on Students' Perceptions of AI Graders
Abstract:
The development of effective autograders is key for scaling assessment and feedback. While NLP based autograding systems for open-ended response questions have been found to be beneficial for providing immediate feedback, autograders are not always liked, understood, or trusted by students. Our research tested the effect of transparency on students' attitudes towards autograders. Transparent autograders increased students' perceptions of autograder accuracy and willingness to discuss autograders in survey comments, but did not improve other related attitudes -- such as willingness to be graded by them on a test -- relative to the control without transparency. However, this lack of impact may be due to higher measured student trust towards autograders in this study than in prior work in the field. We briefly discuss possible reasons for this trend.

Authors:Argha Kamal Samanta, Deepak Mewada, Monalisa Sarma, Debasis Samanta
Title: Wave2Word: A Multimodal Transformer Framework for Joint EEG-Text Alignment and Multi-Task Representation Learning in Neurocritical Care
Abstract:
Continuous electroencephalography (EEG) is routinely used in neurocritical care to monitor seizures and other harmful brain activity, including rhythmic and periodic patterns that are clinically significant. Although deep learning methods have achieved high accuracy in seizure detection, most existing approaches remain seizure-centric, rely on discrete-label supervision, and are primarily evaluated using accuracy-based metrics. A central limitation of current EEG modeling practice is the weak correspondence between learned representations and how EEG findings are interpreted and summarized in clinical workflows. Harmful EEG activity exhibits overlapping patterns, graded expert agreement, and temporal persistence, which are not well captured by classification objectives alone. This work proposes a multimodal EEG representation learning framework that integrates signal-domain modeling with structured clinical language supervision. First, raw EEG is transformed into a longitudinal bipolar montage and time-frequency representations. Second, dual transformer-based encoders model complementary temporal and frequency-centric dependencies and are fused using an adaptive gating mechanism. Third, EEG embeddings are aligned with structured expert consensus descriptions through a contrastive objective. Finally, an EEG-conditioned text reconstruction loss is introduced as a representation-level constraint alongside standard classification loss. Experimental evaluation using a controlled train-validation-test split achieves a six-class test accuracy of 0.9797. Ablation analyses show that removing contrastive alignment reduces cross-modal retrieval performance from Recall@10 of 0.3390 to 0.0045, despite minimal change in classification accuracy. These findings demonstrate that discriminative accuracy does not reliably reflect representation quality for clinically meaningful EEG modeling.

Authors:Arka Majhi, Aparajita Mondal, Satish B. Agnihotri
Title: Design Guidelines for Game-Based Refresher Training of Community Health Workers in Low-Resource Contexts
Abstract:
Community Health Workers (CHWs) play a critical role in delivering primary healthcare services in low-resource settings, yet sustaining their training and performance remains a persistent challenge. Prior research has explored digital and game-based approaches for CHW training. However, limited work has synthesized longitudinal design insights into generalizable guidelines for interactive health interventions. Building on a four-year design-based research program involving multiple game-based refresher training systems, including quiz-based mobile apps, physical and augmented reality games, card-based games, and location-based games, we examine which design guidelines support sustained engagement, learning transfer, and contextual appropriateness in CHW training. We conducted a mixed-methods analysis across deployments with Accredited Social Health Activists and Anganwadi Workers in India, including interviews, field observations, and usage logs. Through thematic synthesis, we derive eight design guidelines addressing contextual realism, adaptive learning, hybrid interaction, social motivation, explainability, professional identity, and ethical considerations. Our findings contribute actionable design knowledge for researchers and practitioners developing interactive health interventions in low-resource healthcare contexts.

Authors:Arka Majhi, Aparajita Mondal, Satish B. Agnihotri
Title: Healthcare App Design in Low-Resource Contexts: Challenges, Practices, and Opportunities
Abstract:
Digital health technologies are increasingly used to improve healthcare access and delivery worldwide. However, many healthcare applications are designed for environments with stable infrastructure, high digital literacy, and strong institutional support. These assumptions often do not hold in low-resource contexts where healthcare delivery often depends on community health workers, caregivers, and informal care networks. Designing effective healthcare applications for such environments requires attention to infrastructural constraints, cultural contexts, language diversity, and usability challenges. This Birds of a Feather session aims to bring together researchers, designers, and practitioners interested in healthcare application design in low-resource contexts. The session will provide an informal forum for discussing challenges encountered in the design and deployment of digital health technologies in underserved settings, sharing field experiences, and identifying opportunities for collaboration within the Interactive Health (IH) community.

Authors:Alexis Carrillo, Enrique Taietta, Ali Aghazadeh Ardebili, Giuseppe Alessandro Veltri, Massimo Stella
Title: Talk2AI: A Longitudinal Dataset of Human--AI Persuasive Conversations
Abstract:
Talk2AI is a large-scale longitudinal dataset of 3,080 conversations (totaling 30,800 turns) between human participants and Large Language Models (LLMs), designed to support research on persuasion, opinion change, and human-AI interaction. The corpus was collected from 770 profiled Italian adults across four weekly sessions in Spring 2025, using a within-subject design in which each participant conversed with a single model (GPT-4o, Claude Sonnet 3.7, DeepSeek-chat V3, or Mistral Large) on three socially relevant topics: climate change, math anxiety, and health misinformation. Each conversation is linked to rich contextual data, including sociodemographic characteristics and psychometric profiles. After each session, participants reported on opinion change, conviction stability, perceived humanness of the AI, and behavioral intentions, enabling fine-grained longitudinal analysis of how AI-mediated dialogue shapes beliefs and attitudes over time.

Authors:André Barrocas, Nuno Jardim Nunes, Valentina Nisi, Nikolas Martelaro
Title: EcoAssist: Embedding Sustainability into AI-Assisted Frontend Development
Abstract:
Frontend code, replicated across millions of page views, consumes significant energy and contributes directly to digital emissions. Yet current AI coding assistants, such as GitHub Copilot and Amazon CodeWhisperer, emphasize developer speed and convenience, with energy impact not yet a primary focus. At the same time, existing energy-focused guidelines and metrics have seen limited adoption among practitioners, leaving a gap between research and everyday coding practice. To address this gap, we introduce EcoAssist, an energy-aware assistant integrated into an IDE that analyzes AI-generated frontend code, estimates its energy footprint, and proposes targeted optimizations. We evaluated EcoAssist through benchmarks of 500 websites and a controlled study with 20 developers. Results show that EcoAssist reduced per-website energy by 13-16% on average, increased developers' awareness of energy use, and maintained developer productivity. This work demonstrates how energy considerations can be embedded directly into AI-assisted coding workflows, supporting developers as they engage with energy implications through actionable feedback.

Authors:Nolan Platt, Sehrish Nizamani, Alp Tural, Elif Tural, Saad Nizamani, Andrew Katz, Yoonje Lee, Nada Basit
Title: Can LLMs Reason About Attention? Towards Zero-Shot Analysis of Multimodal Classroom Behavior
Abstract:
Understanding student engagement usually requires time-consuming manual observation or invasive recording that raises privacy concerns. We present a privacy-preserving pipeline that analyzes classroom videos to extract insights about student attention, without storing any identifiable footage. Our system runs on a single GPU, using OpenPose for skeletal extraction and Gaze-LLE for visual attention estimation. Original video frames are deleted immediately after pose extraction, thus only geometric coordinates (stored as JSON) are retained, ensuring compliance with FERPA. The extracted pose and gaze data is processed by QwQ-32B-Reasoning, which performs zero-shot analysis of student behavior across lecture segments. Instructors access results through a web dashboard featuring attention heatmaps and behavioral summaries. Our preliminary findings suggest that LLMs may show promise for multimodal behavior understanding, although they still struggle with spatial reasoning about classroom layouts. We discuss these limitations and outline directions for improving LLM spatial comprehension in educational analytics contexts.

Authors:Hanyu Su, Huilin Zhang, Shihui Feng
Title: Comparing the Impact of Pedagogy-Informed Custom and General-Purpose GAI Chatbots on Students' Science Problem-Solving Processes and Performance Using Heterogeneous Interaction Network Analysis
Abstract:
Problem solving plays an essential role in science education, and generative AI (GAI) chatbots have emerged as a promising tool for supporting students' science problem solving. However, general-purpose chatbots (e.g., ChatGPT), which often provide direct, ready-made answers, may lead to students' cognitive offloading. Prior research has rarely focused on custom chatbots for facilitating students' science problem solving, nor has it examined how they differently influence problem-solving processes and performance compared to general-purpose chatbots. To address this gap, we developed a pedagogy-informed custom GAI chatbot grounded in the Socratic questioning method, which supports students by prompting them with guiding questions. This study employed a within-subjects counterbalanced design in which 48 secondary school students used both custom and general-purpose chatbot to complete two science problem-solving tasks. 3297 student-chatbot dialogues were collected and analyzed using Heterogeneous Interaction Network Analysis (HINA). The results showed that: (1) students demonstrated significantly higher interaction intensity and cognitive interaction diversity when using custom chatbot than using general-purpose chatbot; (2) students were more likely to follow custom chatbot's guidance to think and reflect, whereas they tended to request general-purpose chatbot to execute specific commands; and (3) no statistically significant difference was observed in students' problem-solving performance evaluated by solution quality between two chatbot conditions. This study provides novel theoretical insights and empirical evidence that custom chatbots are less likely to induce cognitive offloading and instead foster greater cognitive engagement compared to general-purpose chatbots. This study also offers insights into the design and integration of GAI chatbots in science education.

Authors:Zeyang Huang, Angelos Chatzimparmpas, Thomas Höllt, Takanori Fujiwara
Title: A Spectral Framework for Multi-Scale Nonlinear Dimensionality Reduction
Abstract:
Dimensionality reduction (DR) is characterized by two longstanding trade-offs. First, there is a global-local preservation tension: methods such as t-SNE and UMAP prioritize local neighborhood preservation, yet may distort global manifold structure, while methods such as Laplacian Eigenmaps preserve global geometry but often yield limited local separation. Second, there is a gap between expressiveness and analytical transparency: many nonlinear DR methods produce embeddings without an explicit connection to the underlying high-dimensional structure, limiting insight into the embedding process. In this paper, we introduce a spectral framework for nonlinear DR that addresses these challenges. Our approach embeds high-dimensional data using a spectral basis combined with cross-entropy optimization, enabling multi-scale representations that bridge global and local structure. Leveraging linear spectral decomposition, the framework further supports analysis of embeddings through a graph-frequency perspective, enabling examination of how spectral modes influence the resulting embedding. We complement this analysis with glyph-based scatterplot augmentations for visual exploration. Quantitative evaluations and case studies demonstrate that our framework improves manifold continuity while enabling deeper analysis of embedding structure through spectral mode contributions.

Authors:Graziano Blasilli, Marco Angelini
Title: True (VIS) Lies: Analyzing How Generative AI Recognizes Intentionality, Rhetoric, and Misleadingness in Visualization Lies
Abstract:
This study investigates the ability of multimodal Large Language Models (LLMs) to identify and interpret misleading visualizations, and recognize these observations along with their underlying causes and potential intentionality. Our analysis leverages concepts from visualization rhetoric and a newly developed taxonomy of authorial intents as explanatory lenses. We formulated three research questions and addressed them experimentally using a dataset of 2,336 COVID-19-related tweets, half of which contain misleading visualizations, and supplemented it with real-world examples of perceptual, cognitive, and conceptual errors drawn from VisLies, the IEEE VIS community event dedicated to showcasing deceptive and misleading visualizations. To ensure broad coverage of the current LLM landscape, we evaluated 16 state-of-the-art models. Among them, 15 are open-weight models, spanning a wide range of model sizes, architectural families, and reasoning capabilities. The selection comprises small models, namely Nemotron-Nano-V2-VL (12B parameters), Mistral-Small-3.2 (24B), DeepSeek-VL2 (27B), Gemma3 (27B), and GTA1 (32B); medium-sized models, namely Qianfan-VL (70B), Molmo (72B), GLM-4.5V (108B), LLaVA-NeXT (110B), and Pixtral-Large (124B); and large models, namely Qwen3-VL (235B), InternVL3.5 (241B), Step3 (321B), Llama-4-Maverick (400B), and Kimi-K2.5 (1000B). In addition, we employed OpenAI GPT-5.4, a frontier proprietary model. To establish a human perspective on these tasks, we also conducted a user study with visualization experts to assess how people perceive rhetorical techniques and the authorial intentions behind the same misleading visualizations. This allows comparison between model and expert behavior, revealing similarities and differences that provide insights into where LLMs align with human judgment and where they diverge.

Authors:Griffin Pitts, Neha Rani, Weedguet Mildort
Title: Trust and Reliance on AI in Education: AI Literacy and Need for Cognition as Moderators
Abstract:
As generative AI systems are integrated into educational settings, students often encounter AI-generated output while working through learning tasks, either by requesting help or through integrated tools. Trust in AI can influence how students interpret and use that output, including whether they evaluate it critically or exhibit overreliance. We investigate how students' trust relates to their appropriate reliance on an AI assistant during programming problem-solving tasks, and whether this relationship differs by learner characteristics. With 432 undergraduate participants, students' completed Python output-prediction problems while receiving recommendations and explanations from an AI chatbot, including accurate and intentionally misleading suggestions. We operationalize reliance behaviorally as the extent to which students' responses reflected appropriate use of the AI assistant's suggestions, accepting them when they were correct and rejecting them when they were incorrect. Pre- and post-task surveys assessed trust in the assistant, AI literacy, need for cognition, programming self-efficacy, and programming literacy. Results showed a non-linear relationship in which higher trust was associated with lower appropriate reliance, suggesting weaker discrimination between correct and incorrect recommendations. This relationship was significantly moderated by students' AI literacy and need for cognition. These findings highlight the need for future work on instructional and system supports that encourage more reflective evaluation of AI assistance during problem-solving.

Authors:Julian Berger, Pantelis P. Analytis, Ville Satopää, Ralf H. J. M. Kurvers
Title: Beyond AI advice -- independent aggregation boosts human-AI accuracy
Abstract:
Artificial intelligence (AI) is broadly deployed as an advisor to human decision-makers: AI recommends a decision and a human accepts or rejects the advice. This approach, however, has several limitations: People frequently ignore accurate advice and rely too much on inaccurate advice, and their decision-making skills may deteriorate over time. Here, we compare the AI-as-advisor approach to the hybrid confirmation tree (HCT), an alternative strategy that preserves the independence of human and AI judgments. The HCT elicits a human judgment and an AI judgment independently of each other. If they agree, that decision is accepted. If not, a second human breaks the tie. For the comparison, we used 10 datasets from various domains, including medical diagnostics and misinformation discernment, and a subset of four datasets in which AI also explained its decision. The HCT outperformed the AI-as-advisor approach in all datasets. The HCT also performed better in almost all cases in which AI offered an explanation of its judgment. Using signal detection theory to interpret these results, we find that the HCT outperforms the AI-as-advisor approach because people cannot discriminate well enough between correct and incorrect AI advice. Overall, the HCT is a robust, accurate, and transparent alternative to the AI-as-advisor approach, offering a simple mechanism to tap into the wisdom of hybrid crowds.

Authors:Xudong Zhou, Jinyuan Liang, Qiuyi Guo, Guozheng Li
Title: iPoster: Content-Aware Layout Generation for Interactive Poster Design via Graph-Enhanced Diffusion Models
Abstract:
We present iPoster, an interactive layout generation framework that empowers users to guide content-aware poster layout design by specifying flexible constraints. iPoster enables users to specify partial intentions within the intention module, such as element categories, sizes, positions, or coarse initial drafts. Then, the generation module instantly generates refined, context-sensitive layouts that faithfully respect these constraints. iPoster employs a unified graph-enhanced diffusion architecture that supports various design tasks under user-specified constraints. These constraints are enforced through masking strategies that precisely preserve user input at every denoising step. A cross content-aware attention module aligns generated elements with salient regions of the canvas, ensuring visual coherence. Extensive experiments show that iPoster not only achieves state-of-the-art layout quality, but offers a responsive and controllable framework for poster layout design with constraints.

Authors:Maruchi Kim, Rasya Fawwaz, Zhi Yang Lim, Brinda Moudgalya, Hexi Wang, Yuanhao Zeng, Shyamnath Gollakota
Title: VueBuds: Visual Intelligence with Wireless Earbuds
Abstract:
Despite their ubiquity, wireless earbuds remain audio-centric due to size and power constraints. We present VueBuds, the first camera-integrated wireless earbuds for egocentric vision, capable of operating within stringent power and form-factor limits. Each VueBud embeds a camera into a Sony WF-1000XM3 to stream visual data over Bluetooth to a host device for on-device vision language model (VLM) processing. We show analytically and empirically that while each camera's field of view is partially occluded by the face, the combined binocular perspective provides comprehensive forward coverage. By integrating VueBuds with VLMs, we build an end-to-end system for real-time scene understanding, translation, visual reasoning, and text reading; all from low-resolution monochrome cameras drawing under 5mW through on-demand activation. Through online and in-person user studies with 90 participants, we compare VueBuds against smart glasses across 17 visual question-answering tasks, and show that our system achieves response quality on par with Ray-Ban Meta. Our work establishes low-power camera-equipped earbuds as a compelling platform for visual intelligence, bringing rapidly advancing VLM capabilities to one of the most ubiquitous wearable form factors.

Authors:Soufiane Jhilal, Eleonora Pasqua, Caterina Marchesi, Riccardo Corradi, Martina Galletti
Title: Tailoring AI-Driven Reading Scaffolds to the Distinct Needs of Neurodiverse Learners
Abstract:
Neurodiverse learners often require reading supports, yet increasing scaffold richness can sometimes overload attention and working memory rather than improve comprehension. Grounded in the Construction-Integration model and a contingent scaffolding perspective, we examine how structural versus semantic scaffolds shape comprehension and reading experience in a supervised inclusive context. Using an adapted reading interface, we compared four modalities: unmodified text, sentence-segmented text, segmented text with pictograms, and segmented text with pictograms plus keyword labels. In a within-subject pilot with 14 primary-school learners with special educational needs and disabilities, we measured reading comprehension using standardized questions and collected brief child- and therapist-reported experience measures alongside open-ended feedback. Results highlight heterogeneous responses as some learners showed patterns consistent with benefits from segmentation and pictograms, while others showed patterns consistent with increased coordination costs when visual scaffolds were introduced. Experience ratings showed limited differences between modalities, with some apparent effects linked to clinical complexity, particularly for perceived ease of understanding. Open-ended feedback of the learners frequently requested simpler wording and additional visual supports. These findings suggest that no single scaffold is universally optimal, reinforcing the need for calibrated, adjustable scaffolding and provide design implications for human-AI co-regulation in supervised inclusive reading contexts.

Authors:Matthias Dold, Volker A. Coenen, Bastian Sajonz, Peter Reinacher, Peter Reinacher, Thomas Prokop, Marco Reisert, Sophia Gimple, Yasin Temel, Marcus L. F. Janssen, Michael Tangermann, Joana Pereira
Title: Invasive and Non-Invasive Neural Decoding of Motor Performance in Parkinson's Disease for Personalized Deep Brain Stimulation
Abstract:
Decoding motor performance from brain signals offers promising avenues for adaptive deep brain stimulation (aDBS) for Parkinson's disease (PD). In a two-center cohort of 19 PD patients executing a drawing task, we decoded motor performance from electroencephalography (n=15) and, critically for clinical translation, electrocorticography (n=4). Within each session, patients performed the task under DBS on and DBS off. A total of 35 sessions were recorded. Instead of relying on single frequency bands, we derived patient-specific biomarkers using a filterbank-based machine-learning approach. DBS modulated kinematics significantly in 23 sessions. Significant neural decoding of kinematics was possible in 28 of the 35 sessions (average Pearson's $\text{r}= 0.37$). Our results further demonstrate modulation of speed-accuracy trade-offs, with increased drawing speed but reduced accuracy under DBS. Joint evaluation of behavioral and neural decoding outcomes revealed six prototypical scenarios, for which we provide guidance for future aDBS strategies.

Authors:Rahul Sharma, Lars Henrich, Larisa Ivanova, Arsalan Karimzadmotallebiazar, Annette Bieniusa, Leo Van Waveren, Sebastian Vollmer
Title: KI-Adventskalender: An Informal Learning Intervention for Data & AI Literacy
Abstract:
Secondary school students increasingly encounter AI systems whose outputs depend on data quality, evaluation choices and modeling assumptions. To provide accessible entry points to these interconnected concepts, we developed KI-Adventskalender, a free web-based extracurricular initiative with 24 didactically curated, short, guided micro-challenges released daily in December, targeting data-centric competencies and socio-technical themes that shape how data are interpreted in practice. Drawing on two annual iterations, we report aggregate platform traces characterizing participation and task-level engagement. Participation increased substantially in 2025, but early attrition persists. Progression stabilized after midpoint: among users reaching Day 12 in 2025, more than 75% completed the calendar. Competence cluster performance shifted across years; higher revision rates co-occurred with strong pass rates, suggesting sustained engagement. We use these observations to motivate a next-step measurement agenda: tighter task instrumentation, embedded micro-assessments and mixed-method evaluation designs that can distinguish persistence from conceptual uptake, knowledge progression and durable learning outcomes.

Authors:Michelle Vaccaro, Jaeyoon Song, Abdullah Almaatouq, Michiel A. Bakker
Title: Evaluating Human-AI Safety: A Framework for Measuring Harmful Capability Uplift
Abstract:
Current frontier AI safety evaluations emphasize static benchmarks, third-party annotations, and red-teaming. In this position paper, we argue that AI safety research should focus on human-centered evaluations that measure harmful capability uplift: the marginal increase in a user's ability to cause harm with a frontier model beyond what conventional tools already enable. We frame harmful capability uplift as a core AI safety metric, ground it in prior social science research, and provide concrete methodological guidance for systematic measurement. We conclude with actionable steps for developers, researchers, funders, and regulators to make harmful capability uplift evaluation a standard practice.

Authors:Martin Lorenz, Niko Konzack, Alexander Lingler, Philipp Wintersberger, Patrick Ebel
Title: CR-Eyes: A Computational Rational Model of Visual Sampling Behavior in Atari Games
Abstract:
Designing mobile and interactive technologies requires understanding how users sample dynamic environments to acquire information and make decisions under time pressure. However, existing computational user models either rely on hand-crafted task representations or are limited to static or non-interactive visual inputs, restricting their applicability to realistic, pixel-based environments. We present CR-Eyes, a computationally rational model that simulates visual sampling and gameplay behavior in Atari games. Trained via reinforcement learning, CR-Eyes operates under perceptual and cognitive constraints and jointly learns where to look and how to act in a time-sensitive setting. By explicitly closing the perception-action loop, the model treats eye movements as goal-directed actions rather than as isolated saliency predictions. Our evaluation shows strong alignment with human data in task performance and aggregate saliency patterns, while also revealing systematic differences in scanpaths. CR-Eyes is a step toward scalable, theory-grounded user models that support design and evaluation of interactive systems.

Authors:Jiajia Song, Zhihan Guo, Jionghao Lin
Title: Simulating Novice Students Using Machine Unlearning and Relearning in Large Language Models
Abstract:
Student simulation can support learning-by-teaching pedagogy where human students (as tutors) teach AI-simulated novice students (as tutees). Recent research often relies on prompt engineering with large language models (LLMs) to simulate novice student behaviour, but it is difficult to keep the AI-simulated student at a stable novice knowledge level. A key reason is that many LLMs are trained to be broadly capable, so even when prompted to "act like a novice," the LLMs can still produce expert-level explanations during the learning-by-teaching interaction process. As a result, the AI-simulated student may drift beyond the intended knowledge level, reducing the credibility of the simulation for studying learning-by-teaching processes. Thus, we propose a knowledge-level simulation approach based on machine unlearning. We investigate this approach using a dataset of multiple-choice questions on Python programming concepts. We apply machine unlearning to transform a knowledgeable LLM into a novice-level AI student (i.e., teachable agent), then evaluate whether the teachable agent can relearn targeted knowledge components through learning-by-teaching dialogue interactions. Finally, we analyse the dialogue logs to characterise how the agent's behaviour changes over time, including its question asking, error patterns, and responsiveness to instruction. The results show that (1) unlearning produces simulated student agents with more novice-like responses than prompt-only baselines, (2) the agents recover a measurable portion of the unlearned knowledge under structured exposure, and (3) dialogue analyses reveal identifiable trajectories of conceptual change and teaching moves that predict learning recovery.

Authors:Maddie Juarez, Abha Rai, Kristen E. Ravi, Margaret C. Delaney, Danny Olweean, Eric Klingensmith, Swarnali Banerjee, Neil Klingensmith, George K. Thiruvathukal
Title: HeyFriend Helper: A Conversational AI Web-App for Resource Access Among Low-Income Chicago Residents
Abstract:
Low-income individuals can face multiple challenges in their ability to seek employment. Barriers to employment often include limited access to digital literacy resources, training, interview preparation and resume feedback. Prior work has largely focused on targeted social service or healthcare applications that address needs individually, with little emphasis on conversational AI-driven systems that integrate multiple localized digital resources to provide comprehensive support. This work presents HeyFriend Helper, a web-based platform designed to support low-income residents in Chicago through an interactive conversational assistant that provides personalized support and guidance. HeyFriend Helper integrates multiple tools, including resume building and feedback, interview practice, mindfulness and well-being resources, employment trend and career outcome information, language learning support, and location-based access to community services. This work represents an interdisciplinary collaboration between social work, computer science, and engineering that addresses the multifaceted needs of low-income individuals. The findings demonstrate the importance of career-readiness tools and conversational user interface (CUIs) in providing holistic support.

Authors:Martiño Rivera-Dourado, Rubén Pérez-Jove, Alejandro Pazos, Jose Vázquez-Naya
Title: Usability of Passwordless Authentication in Wi-Fi Networks: A Comparative Study of Passkeys and Passwords in Captive Portals
Abstract:
Passkeys have recently emerged as a passwordless authentication mechanism, yet their usability in captive portals remains unexplored. This paper presents an empirical, comparative usability study of passkeys and passwords in a Wi-Fi hotspot using a captive portal. We conducted a controlled laboratory experiment with 50 participants following a split-plot design across Android and Windows platforms, using a router implementing the FIDO2CAP protocol. Our results show a tendency for passkeys to be perceived as more usable than passwords during login, although differences are not statistically significant. Independent of the authentication method, captive portal limitations negatively affected user experience and increased error rates. We further found that passkeys are generally easy to configure on both platforms, but platform-specific issues introduce notable usability challenges. Based on quantitative and qualitative findings, we derive design recommendations to improve captive portal authentication, including the introduction of usernameless authentication flows, improved captive portal detection mechanisms, and user interface design changes.

Authors:Ray-Yuan Chung, Jaime Snyder, Zixuan Xu, Daeun Yoo, Athena C. Ortega, Wanda Pratt, Aaron Wightman, Ryan Hutson, Cozumel Pruette, Ari Pollack
Title: Co-designing for the Triad: Design Considerations for Collaborative Decision-Making Technologies in Pediatric Chronic Care
Abstract:
In pediatric chronic care, the triadic relationship among patients, caregivers, and healthcare providers introduces unique challenges for youth in managing their conditions. Diverging values, roles, and asymmetrical situational awareness across decision-maker groups often hinder collaboration and affect health outcomes, highlighting the need to support collaborative decision-making. We conducted co-design workshops with 6 youth with chronic kidney disease, 6 caregivers, and 7 healthcare providers to explore how digital technologies can be designed to support collaborative decision-making. Findings identify barriers across all levels of situational awareness, ranging from individual cognitive and emotional constraints, misaligned mental models, to relational conflicts regarding care goals. We propose design implications that support continuous decision-making practice, align mental models, balance caregiver support and youth autonomy development, and surface potential care challenges. This work advances the design of collaborative decision-making technologies that promote shared understanding and empower families in pediatric chronic care.

Authors:Ray-Yuan Chung, Xuhai Xu, Ari Pollack
Title: Rethinking Health Agents: From Siloed AI to Collaborative Decision Mediators
Abstract:
Large language model based health agents are increasingly used by health consumers and clinicians to interpret health information and guide health decisions. However, most AI systems in healthcare operate in siloed configurations, supporting individual users rather than the multi-stakeholder relationships central to healthcare. Such use can fragment understanding and exacerbate misalignment among patients, caregivers, and clinicians. We reframe AI not as a standalone assistant, but as a collaborator embedded within multi-party care interactions. Through a clinically validated fictional pediatric chronic kidney disease case study, we show that breakdowns in adherence stem from fragmented situational awareness and misaligned goals, and that siloed use of general-purpose AI tools does little to address these collaboration gaps. We propose a conceptual framework for designing AI collaborators that surface contextual information, reconcile mental models, and scaffold shared understanding while preserving human decision authority.

Authors:Soufiane Jhilal, Martina Galletti
Title: Robust Multilingual Text-to-Pictogram Mapping for Scalable Reading Rehabilitation
Abstract:
Reading comprehension presents a significant challenge for children with Special Educational Needs and Disabilities (SEND), often requiring intensive one-on-one reading support. To assist therapists in scaling this support, we developed a multilingual, AI-powered interface that automatically enhances text with visual scaffolding. This system dynamically identifies key concepts and maps them to contextually relevant pictograms, supporting learners across languages. We evaluated the system across five typologically diverse languages (English, French, Italian, Spanish, and Arabic), through multilingual coverage analysis, expert clinical review by speech therapists and special education professionals, and latency assessment. Evaluation results indicate high pictogram coverage and visual scaffolding density across the five languages. Expert audits suggested that automatically selected pictograms were semantically appropriate, with combined correct and acceptable ratings exceeding 95% for the four European languages and approximately 90% for Arabic despite reduced pictogram repository coverage. System latency remained within interactive thresholds suitable for real-time educational use. These findings support the technical viability, semantic safety, and acceptability of automated multimodal scaffolding to improve accessibility for neurodiverse learners.

Authors:Jan Tiemann, Matthew McGinity, Ulrik Günther
Title: Honey, I shrunk the scientist -- Evaluating 2D, 3D, and VR interfaces for navigating samples under the microscope
Abstract:
In contemporary biology and medicine, 3D microscopy is one of the most widely-used techniques for imaging and manipulation of various kinds of samples. Navigating such a micrometer-sized, 3-dimensional sample under the microscope -- e.g. to find relevant imaging regions -- can pose a tedious challenge for the experimenter. In this paper, we examine whether 2D desktop, 3D desktop, or Virtual Reality (VR) interfaces provide the best user experience and performance for the exploration of 3D samples. We invited 12 skilled microscope operators to perform two different exploration tasks in 2D, 3D and VR and compared all conditions in terms speed, usability, and completion. Our results show a clear benefit when using VR -- in terms of task efficiency, usability, and user acceptance. Intriguingly, while VR outperformed desktop 2D and 3D in all scenarios, 3D desktop did not outperform 2D desktop.

Authors:Mohammad Ratul Mahjabin, Raiyan Abdul Baten
Title: General Intellectual Humility Is Malleable Through AI-Mediated Reflective Dialogue
Abstract:
General intellectual humility (GIH) -- the recognition that one's beliefs may be fallible and revisable -- is associated with improved reasoning, learning, and social discourse, yet is widely regarded as a stable trait resistant to intervention. We test whether GIH can be elevated through a conversational intervention that combines staged cognitive scaffolding with personalized Socratic reflection. In a randomized controlled experiment (N=400), participants engaged in a structured, LLM-mediated dialogue that progressed from conceptual understanding of intellectual humility to applying, analyzing, evaluating, and generating novel, self-relevant scenarios that instantiate it. Relative to a time-matched control, the intervention produced a systematic increase in GIH, reduced rank-order stability, and tripled the rate of reliable individual improvement. Crucially, these effects persisted over a two-week follow-up without detectable decay. The effects generalized across political affiliation and did not depend on baseline personality profile. These findings challenge the prevailing pessimism regarding the malleability of GIH and suggest that scaffolded, Socratic reflection delivered through structured dialogue can produce durable changes in general intellectual humility.

Authors:Boxuan Ma, Shinichi Konomi
Title: CodeExemplar: Example-Based Scaffolding for Introductory Programming in the GenAI Era
Abstract:
Generative AI (GenAI) can generate working code with minimal effort, creating a tension in introductory programming: students need timely help, yet direct solutions invite copying and can short-circuit reasoning. To address this, we propose example-based scaffolding, where GenAI provides scaffold examples that match a target task's underlying reasoning pattern but differ in contexts to support analogical transfer while reducing copying. We contribute a two-dimensional taxonomy, design guidelines, and CodeExemplar, a prototype integrated with auto-graded tasks, with initial formative feedback from a classroom pilot and instructor interviews.

Authors:Ibrahim Bilau, Stacie Smith, Abdurrahman Baru, Marwan Shagar, Brian Jones, Eunhwa Yang
Title: A Reproducible Reality-to-VR Pipeline for Ecologically Valid Aging-in-Place Research
Abstract:
Virtual reality (VR) has emerged as a promising tool for assessing instrumental activities of daily living (IADLs) in older adults. However, the ecological validity of these simulations is often compromised by simplified or low-fidelity environmental design that fails to elicit a genuine sense of presence. This paper documents a reproducible Reality-to-VR pipeline for creating a photorealistic environmental simulation to support a study on cognitive aging in place. The proposed workflow captured the as-built kitchen of the Aware Home building at Georgia Tech using Terrestrial Laser Scanning (TLS) for sub-millimeter geometric accuracy, followed by point cloud processing in Faro SCENE, geometric retopology in SketchUp, and integration into Unreal Engine 5 via Datasmith with Lumen global illumination for high visual fidelity. The pipeline achieved photorealistic rendering while maintaining a stable 90 Hz frame rate, a critical threshold for mitigating cybersickness in older populations. The environment also enables instantaneous manipulation of environmental variables, such as switching between closed cabinetry and open shelving, providing experimental flexibility impossible in physical settings. Participant validation with 17 older adults confirmed minimal cybersickness risk and preserved sensitivity to the experimental manipulation, supporting the pipeline's feasibility for aging-in-place research and establishing a benchmark for future comparative studies.

Authors:Licol Zeinfeld, Alona Strugatski, Ziva Bar-Dov, Ron Blonder, Shelley Rap, Giora Alexandron
Title: Assessment Design in the AI Era: A Method for Identifying Items Functioning Differentially for Humans and Chatbots
Abstract:
The rapid adoption of large language models (LLMs) in education raises profound challenges for assessment design. To adapt assessments to the presence of LLM-based tools, it is crucial to characterize the strengths and weaknesses of LLMs in a generalizable, valid and reliable manner. However, current LLM evaluations often rely on descriptive statistics derived from benchmarks, and little research applies theory-grounded measurement methods to characterize LLM capabilities relative to human learners in ways that directly support assessment design. Here, by combining educational data mining and psychometric theory, we introduce a statistically principled approach for identifying items on which humans and LLMs show systematic response differences, pinpointing where assessments may be most vulnerable to AI misuse, and which task dimensions make problems particularly easy or difficult for generative AI. The method is based on Differential Item Functioning (DIF) analysis -- traditionally used to detect bias across demographic groups -- together with negative control analysis and item-total correlation discrimination analysis. It is evaluated on responses from human learners and six leading chatbots (ChatGPT-4o \& 5.2, Gemini 1.5 \& 3 Pro, Claude 3.5 \& 4.5 Sonnet) to two instruments: a high school chemistry diagnostic test and a university entrance exam. Subject-matter experts then analyzed DIF-flagged items to characterize task dimensions associated with chatbot over- or under-performance. Results show that DIF-informed analytics provide a robust framework for understanding where LLM and human capabilities diverge, and highlight their value for improving the design of valid, reliable, and fair assessment in the AI era.

Authors:Matheus Kunzler Maldaner, Raul Valle, Junsung Kim, Tonuka Sultan, Pranav Bhargava, Matthew Maloni, John Courtney, Hoang Nguyen, Aamogh Sawant, Kristian O'Connor, Stephen Wormald, Damon L. Woodard
Title: Plato's Cave: A Human-Centered Research Verification System
Abstract:
The growing publication rate of research papers has created an urgent need for better ways to fact-check information, assess writing quality, and identify unverifiable claims. We present Plato's Cave as an open-source, human-centered research verification system that (i) creates a directed acyclic graph (DAG) from a document, (ii) leverages web agents to assign credibility scores to nodes and edges from the DAG, and (iii) gives a final score by interpreting and evaluating the paper's argumentative structure. We report the system implementation and results on a collected dataset of 104 research papers.

Authors:Pratyasha Saha, Anita Say Chan, Sharifa Sultana
Title: When Data Protection Fails to Protect: Law, Power, and Postcolonial Governance in Bangladesh
Abstract:
Rapid digitization across government services, financial platforms, and telecommunications has intensified the collection and processing of large scale personal data in Bangladesh. In response, the state has introduced multiple regulatory instruments, including the Personal Data Protection Ordinance, the Cyber Security Ordinance, and the National Data Governance Ordinance in 2025. While these initiatives signal an emerging legal regime for data protection, little scholarly work examines how these frameworks operate collectively in practice. This paper presents a legal and institutional analysis of Bangladeshs emerging data protection regime through a systematic review of these three ordinances. Through this review, the paper provides an integrated mapping of Bangladeshs evolving data protection framework and identifies key legal and institutional barriers that undermine the effective protection of citizens personal data. Our findings reveal that this emerging regime is constrained by limited institutional independence, uneven regulatory capacity, and the misaligned legal assumption of individualized, autonomous data subjects. Furthermore, these frameworks invisibilize prevalent sociotechnical layers, such as informal data flows and mediated access via human bridges, rendering formal protections difficult to operationalize. This paper contributes to HCI scholarship by expanding the concept of data protection as a complex sociotechnical design problem shaped by the informal infrastructures of the Global South.

Authors:ZhaoBin Li, Mark Steyvers
Title: Learning to Trust: How Humans Mentally Recalibrate AI Confidence Signals
Abstract:
Productive human-AI collaboration requires appropriate reliance, yet contemporary AI systems are often miscalibrated, exhibiting systematic overconfidence or underconfidence. We investigate whether humans can learn to mentally recalibrate AI confidence signals through repeated experience. In a behavioral experiment (N = 200), participants predicted the AI's correctness across four AI calibration conditions: standard, overconfidence, underconfidence, and a counterintuitive "reverse confidence" mapping. Results demonstrate robust learning across all conditions, with participants significantly improving their accuracy, discrimination, and calibration alignment over 50 trials. We present a computational model utilizing a linear-in-log-odds (LLO) transformation and a Rescorla-Wagner learning rule to explain these dynamics. The model reveals that humans adapt by updating their baseline trust and confidence sensitivity, using asymmetric learning rates to prioritize the most informative errors. While humans can compensate for monotonic miscalibration, we identify a significant boundary in the reverse confidence scenario, where a substantial proportion of participants struggled to override initial inductive biases. These findings provide a mechanistic account of how humans adapt their trust in AI confidence signals through experience.

Authors:Esther Bosch, Michael Scholz, Anke Sauerländer-Biebl, Klas Ihme
Title: Mapping Travel Experience in Public Transport: Real-Time Evidence and Spatial Analysis in Hamburg
Abstract:
Shifting travel from private cars to public transport is critical for meeting climate and related mobility goals, yet passengers will only choose transit if it offers a consistently positive experience. Previous studies of passenger satisfaction have largely relied on retrospective surveys, which overlook the dynamic and spatially differentiated nature of travel experience. This paper introduces a novel combination of real-time experience sampling and spatial hot spot analysis to capture and map where public transport users report consistently positive or negative experiences. Data were collected from 239 participants in Hamburg between March and September 2025. Using a smartphone application, travelers reported their momentary journey experience every five minutes during everyday trips, yielding over 21,000 in-situ evaluations. These geo-referenced data were analyzed with the Getis-Ord $Gi^{*}$ statistic to detect significant clusters of positive and negative travel experience. The analysis identified distinct hot and cold spots of travel experience across the network. Cold spots were shaped by heterogeneous problems, ranging from predominantly delay-dominated to overcrowding or socially stressful locations. In contrast, hot spots emerged through different pathways, including comfort-oriented, time-efficient or context-driven environments. The findings highlight three contributions. First, cold spots are not uniform but reflect specific local constellations of problems, requiring targeted interventions. Second, hot spots illustrate multiple success models that can serve as benchmarks for replication. Third, this study demonstrates the value of combining dynamic high-resolution sampling with spatial statistics to guide more effective and place-specific improvements in public transport.

Authors:Kazi Noshin, Sharifa Sultana
Title: The Illusion of Agreement with ChatGPT: Sycophancy and Beyond
Abstract:
While concerns about ChatGPT-induced harms due to sycophancy and other behaviors, including gaslighting, have grown among researchers, how users themselves experience and mitigate these harms remain largely underexplored. We analyze Reddit discussions to investigate what concerns users report and how they address them. Our findings reveal five distinct user-reported concerns that manifest across multiple life domains, ranging from personal to societal: inducing delusion, digressing narratives, implicating users for models' limitations, inducing addiction, and providing unsupervised psychological support. We document three-tier user-driven suggestions spanning functional usage techniques, behavioral approaches, and private and institutional safeguards. Our findings show that AI-induced harms require coordinated interventions across users, developers, and policymakers. We discuss design implications and future directions to mitigate the harms and ensure user benefits.

Authors:Peng Kuang, Emma Söderberg, April Yi Wang, Martin Höst
Title: GazePrinter: Visualizing Expert Gaze to Guide Novices in a New Codebase
Abstract:
Program comprehension is an essential activity in software engineering. Not only does it often challenge professionals, but it can also hinder novices from advancing their programming skills. Gaze, an emerging modality in developer tools, has so far primarily been utilized to improve our understanding of programmers' visual attention and as a means to reason about programmers' cognitive processes. There has been limited exploration of integrating gaze-based assistance into development environments to support programmers, despite the tight links between attention and gaze. We also know that joint attention is important in collaboration, further suggesting that there is value in exploring collective gaze. In this paper, we investigate the effect of visualizing gaze patterns gathered from experts to novice programmers to assist them with program comprehension in a new codebase. To this end, we present GazePrinter, designed to provide gaze-orienting visual cues informed by experts to aid novices with program comprehension. We present the results of a mixed-methods study conducted with 40 novices to study the effects of using GazePrinter for program comprehension tasks. The study included a survey, a controlled experiment, and interviews. We found that visualization of expert gaze can have a significant effect on novice programmers' behavior in terms of which path they take through the code base; with GazePrinter, novices took a path closer to the path taken by experts. We also found indications of reduced time and cognitive load among novices using GazePrinter.

Authors:Yao Xiao, Rafael A. Calvo
Title: AI as Relational Translator: Rethinking Belonging and Mutual Legibility in Cross-Cultural Contexts
Abstract:
Against rising global loneliness, AI companions promise connection, yet accumulating evidence suggests that, for some users and contexts, intensive companion-style use can correlate with increased loneliness and reduced offline socialisation. This position paper challenges the dominant "AI as companion" paradigm by proposing a shift: from AI that simulates relationships with humans to AI that supports relationships between humans. We introduce Relational AI Translation, positioning AI as cultural-relational infrastructure that scaffolds human connection across cultural, generational, and geographical divides. Using first-generation East Asian migrants as a theoretically productive critical case, we outline a multi-agent architecture instantiating three translation operations: emotion-intent decoding, contextual reframing, and relational scaffolding. We articulate design provocations around measurement, safety architecture, and the tension between technological intervention and structural justice, and explicitly frame success as graduation toward renewed human-to-human support rather than sustained engagement with the system.

Authors:Davide Traini, José Manuel Alcalde-Llergo, Mariana Buenestado-Fernández, Domenico Ursino, Enrique Yeguas-Bolívar
Title: Behavioral Engagement in VR-Based Sign Language Learning: Visual Attention as a Predictor of Performance and Temporal Dynamics
Abstract:
This study analyzes behavioral engagement in SONAR, a virtual reality application designed for sign language training and validation. We focus on three automatically derived engagement indicators (Visual Attention (VA), Video Replay Frequency (VRF), and Post-Playback Viewing Time (PPVT)) and examine their relationship with learning performance. Participants completed a self-paced Training phase, followed by a Validation quiz assessing retention. We employed Pearson correlation analysis to examine the relationships between engagement indicators and quiz performance, followed by binomial Generalized Linear Model (GLM) regression to assess their joint predictive contributions. Additionally, we conducted temporal analysis by aggregating moment-to-moment VA traces across all learners to characterize engagement dynamics during the learning session. Results show that VA exhibits a strong positive correlation with quiz performance,followed by PPVT, whereas VRF shows no meaningful association. A binomial GLM confirms that VA and PPVT are significant predictors of learning success, jointly explaining a substantial proportion of performance variance. Going beyond outcome-oriented analysis, we characterize temporal engagement patterns by aggregating moment-to-moment VA traces across all learners. The temporal profile reveals distinct attention peaks aligned with informationally dense segments of both training and validation videos, as well as phase-specific engagement dynamics, including initial acclimatization, oscillatory attention cycles during learning, and pronounced attentional peaks during assessment. Together, these findings highlight the central role of sustained and strategically allocated visual attention in VR-based sign language learning and demonstrate the value of behavioral trace data for understanding and predicting learner engagement in immersive environments.

Authors:Echo Zexuan Pan, Danny Glick, Ying Xu
Title: How Motivation Relates to Generative AI Use: A Large-Scale Survey of Mexican High School Students
Abstract:
This study examined how high school students with different motivational profiles use generative AI tools in math and writing. Through K-means clustering analysis of survey data from 6,793 Mexican high school students, we identified three distinct motivational profiles based on self-concept and perceived subject value. Results revealed distinct domain-specific AI usage patterns across students with different motivational profiles. Our findings challenge one-size-fits-all AI integration approaches and advocate for motivationally-informed educational interventions.

Authors:Sérgio Alves, Carlos Duarte, Kyle Montague, Tiago Guerreiro
Title: Exploring the Role of Interaction Data to Empower End-User Decision-Making In UI Personalization
Abstract:
User interface personalization enhances digital efficiency, usability, and accessibility. However, in user-driven setups, limited support for identifying and evaluating worthwhile opportunities often leads to underuse. We explore a reflexive personalization approach where individuals engage with their digital interaction data to identify meaningful personalization opportunities and benefits. We interviewed 12 participants, using experimental vignettes as design probes to support reflection on different forms of using interaction data to empower decision-making in personalization and the preferred level of system support. We found that people can independently identify personalization opportunities but prefer system support through visual personalization suggestions. Interaction data can shape how users perceive and approach personalization by reinforcing the perceived value of change and data collection, helping them weigh benefits against effort, and increasing the transparency of system suggestions. We discuss opportunities for designing personalization software that raises end-users' agency over interfaces through reflective engagement with their interaction data.

Authors:Min-yung Kim, Jinwook Kim, Ken Pfeuffer, Sang Ho Yoon
Title: Align-to-Scale: Mode Switching Technique for Unimanual 3D Object Manipulation with Gaze-Hand-Object Alignment in Extended Reality
Abstract:
As extended reality (XR) technologies rapidly become as ubiquitous as today's mobile devices, supporting one-handed interaction becomes essential for XR. However, the prevalent Gaze + Pinch interaction model partially supports unimanual interaction, where users select, move, and rotate objects with one hand, but scaling typically requires both hands. In this work, we leverage the spatial alignment between gaze and hand as a mode switch to enable single-handed pinch-to-scale. We design and evaluate several techniques geared for one-handed scaling and assess their usability in a compound translate-scale task. Our findings show that all proposed methods effectively enable one-handed scaling, but each method offers distinct advantages and trade-offs. To this end, we derive design guidelines to support futuristic 3D interfaces with unimanual interaction. Our work helps make eye-hand 3D interaction in XR more mobile, flexible, and accessible.

Authors:Jasmine Rienecker, Katarina Mpofu, Naman Goel, Siddhartha Datta, Jun Zhao, Oscar Danielsson, Fredrik Thorsen
Title: Auditing Preferences for Brands and Cultures in LLMs
Abstract:
Large language models (LLMs) based AI systems increasingly mediate what billions of people see, choose and buy. This creates an urgent need to quantify the systemic risks of LLM-driven market intermediation, including its implications for market fairness, competition, and the diversity of information exposure. This paper introduces ChoiceEval, a reproducible framework for auditing preferences for brands and cultures in large language models (LLMs) under realistic usage conditions. ChoiceEval addresses two core technical challenges: (i) generating realistic, persona-diverse evaluation queries and (ii) converting free-form outputs into comparable choice sets and quantitative preference metrics. For a given topic (e.g. running shoes, hotel chains, travel destinations), the framework segments users into psychographic profiles (e.g., budget-conscious, wellness-focused, convenience), and then derives diverse prompts that reflect real-world advice-seeking and decision-making behaviour. LLM responses are converted into normalised top-k choice sets. Preference and geographic bias are then quantified using comparable metrics across topics and personas. Thus, ChoiceEval provides a scalable audit pipeline for researchers, platforms, and regulators, linking model behaviour to real-world economic outcomes. Applied to Gemini, GPT, and DeepSeek across 10 topics spanning commerce and culture and more than 2,000 questions, ChoiceEval reveals consistent preferences: U.S.-developed models Gemini and GPT show marked favouritism toward American entities, while China-developed DeepSeek exhibits more balanced yet still detectable geographic preferences. These patterns persist across user personas, suggesting systematic rather than incidental effects.

Authors:Carter Sale, Melissa N. Stolar, Gaurav Patil, Michael J. Gostelow, Julia Wallier, Margaret C. Macpherson, Jan-Louis Kruger, Mark Dras, Simon G. Hosking, Rachel W. Kallen, Michael J. Richardson
Title: Facial Movement Dynamics Reveal Workload During Complex Multitasking
Abstract:
Real-time cognitive workload monitoring is crucial in safety-critical environments, yet established measures are intrusive, expensive, or lack temporal resolution. We tested whether facial movement dynamics from a standard webcam could provide a low-cost alternative. Seventy-two participants completed a multitasking simulation (OpenMATB) under varied load while facial keypoints were tracked via OpenPose. Linear kinematics (velocity, acceleration, displacement) and recurrence quantification features were extracted. Increasing load altered dynamics across timescales: movement magnitudes rose, temporal organisation fragmented then reorganised into complex patterns, and eye-head coordination weakened. Random forest classifiers trained on pose kinematics outperformed task performance metrics (85% vs. 55% accuracy) but generalised poorly across participants (43% vs. 33% chance). Participant-specific models reached 50% accuracy with minimal calibration (2 minutes per condition), improving continuously to 73% without plateau. Facial movement dynamics sensitively track workload with brief calibration, enabling adaptive interfaces using commodity cameras, though individual differences limit cross-participant generalisation.

Authors:Zhuoyi Cheng, Steven Houben
Title: Who's Sense is This? Possibility for Impacting Human Insights in AI-assisted Sensemaking
Abstract:
Sensemaking is an important preceding step for activities like consensus building and decision-making. When groups of people make sense of large amounts of information, their understanding gradually evolves from vague to clear. During this process when reaching a conclusion is still premature, if people are presented with others' insights, they may be directed to focus on that specific perspective without adequate verification. We argue that similar phenomena may also exist in AI-assisted sensemaking, in which AI will usually be the one that presents insight prematurely when users' understandings are still vague and ill-formed. In this paper, we raised three questions that are worth deliberation before exploiting AI to assist in collaborative sensemaking in practice, and discussed possible reasons that may lead users to opt for insights from AI.

Authors:Ava Nederlander, Zainab Aamir, Arie E. Kaufman
Title: Scale-Aware Navigation of Astronomical Survey Imagery Data on High Resolution Immersive Displays
Abstract:
Upcoming astronomical surveys produce imagery that spans many orders of magnitude in spatial scale, requiring scientists to reason fluidly between global structure and local detail. Data from the Vera C. Rubin Observatory exemplifies this challenge, as traditional desktop-based workflows often rely on discrete views or static cutouts that fragment context during exploration. This paper presents a design-oriented framework for scale-aware navigation of astronomical survey imagery in high-resolution immersive display environments. We illustrate these principles through representative usage scenarios using Vera Rubin Observatory and Milky Way survey imagery deployed in room-scale immersive environments, including tiled high-resolution displays and curved immersive systems. Our goal is to contribute design insights that inform the development of immersive interaction paradigms for exploratory analysis of extreme-scale scientific imagery.

Authors:Timo K. Koch, Florian Bemmann, Ramona Schoedel, Markus Buehner, Clemens Stachl
Title: Collecting Prosody in the Wild: A Content-Controlled, Privacy-First Smartphone Protocol and Empirical Evaluation
Abstract:
Collecting everyday speech data for prosodic analysis is challenging due to the confounding of prosody and semantics, privacy constraints, and participant compliance. We introduce and empirically evaluate a content-controlled, privacy-first smartphone protocol that uses scripted read-aloud sentences to standardize lexical content (including prompt valence) while capturing natural variation in prosodic delivery. The protocol performs on-device prosodic feature extraction, deletes raw audio immediately, and transmits only derived features for analysis. We deployed the protocol in a large study (N = 560; 9,877 recordings), evaluated compliance and data quality, and conducted diagnostic prediction tasks on the extracted features, predicting speaker sex and concurrently reported momentary affective states (valence, arousal). We discuss implications and directions for advancing and deploying the protocol.

Authors:Mia Huong Nguyen, Moritz Alexander Messerschmidt, Jochen Huber, Suranga Nanayakkara
Title: VisceroHaptics: Investigating the Effects of Gut-based Audio-Haptic Feedback on Gastric Feelings and Gastric Interoceptive Behavior
Abstract:
Gastric interoception influences eating behavior and emotions, making its modulation valuable for healthcare and human-computer-interaction applications. However, whether gastric interoception can be modulated noninvasively in humans remains unclear. While previous research indicates that abdominal-sound-driven haptic feedback resembles gut sensations, its impact on feelings and gastric interoceptive behavior is unknown. We conducted three experiments totalling 55 participants to investigate how gut-sound-driven audio-haptic feedback applied to the stomach (1) affects user's feelings (2) influences perception of hunger and satiety levels and (3) influences gastric interoceptive behavior, quantified with Water Load Test-II. Results revealed that audio-haptic feedback patterns (a) induced the feelings of hunger, fullness, thirst, stomach upset, (b) increased hunger level, and (c) significantly increased volumes of ingested water. This work provides the first evidence showing that audio-haptic stimulation can alter gastric interoceptive behavior, motivating the use of noninvasive methods to influence users' feelings and behaviors in future applications.

Authors:Dehui Kong, Martin Feick, Shi Liu, Alexander Maedche
Title: CoEmpaTeam: Enhancing Cognitive Empathy using LLM-based Avatars and Dynamic Role Play in Virtual Reality
Abstract:
Cognitive empathy, the ability to understand others' perspectives, is essential for effective communication, reducing biases, and constructive negotiation. However, this skill is declining in a performance-driven society, which prioritizes efficiency over perspective-taking. Here, the training of cognitive empathy is challenging because it is a subtle, hard-to-perceive soft skill. To address this, we developed CoEmpaTeam, a VR-based system that enables users to train their cognitive empathy by using LLM-driven avatars with different personalities. Through dynamic role play, users actively engage in perspective-taking, experiencing situations through another person's eyes. CoEmpaTeam deploys three avatars who significantly differ in their personality, validated by a technical evaluation and an online experiment (n=90). Next, we evaluated the system through a lab experiment with 32 participants who performed three sessions across two weeks, followed by a one-week diary study. Our results showed a significant increase in cognitive empathy, which, according to participants, transferred into their real lives.

Authors:Anna De Liddo, Lucas Anastasiou, Simon Buckingham Shum
Title: Human/AI Collective Intelligence for Deliberative Democracy: A Human-Centred Design Approach
Abstract:
This chapter introduces the concept of Collective Intelligence for Deliberative Democracy (CI4DD). We propose that the use of computational tools, specifically artificial intelligence to advance deliberative democracy, is an instantiation of a broader class of human-computer system designed to augment collective intelligence. Further, we argue for a fundamentally human-centred design approach to orchestrate how stakeholders can contribute meaningfully to shaping the artifacts and processes needed to create trustworthy DD processes. We first contextualise the key concepts of CI and the role of AI within it. We then detail our co-design methodology for identifying key challenges, refining user scenarios, and deriving technical implications. Two exemplar cases illustrate how user requirements from civic organisations were implemented with AI support and piloted in authentic contexts.

Authors:Sunday David Ubur, Eugenia Ha Rim Rho, Denis Gracanin
Title: Adaptive Captioning with Emotional Cues: Supporting DHH and Neurodivergent Learners in STEM
Abstract:
Real-time captioning is vital for Deaf and Hard of Hearing (DHH) and neurodivergent learners (e.g., those with ADHD), yet it often omits emotional and non-verbal cues essential for comprehension. This omission is particularly consequential in STEM education, where cognitively demanding material can exacerbate the challenges faced by caption users across diverse ability profiles. In this paper, we present a design-oriented exploration of four captioning prototypes that embed emotional and multimodal cues, including facial expressions, body gestures, keyword highlighting, and emoji. Across a pilot and a main study with 24 participants, we found that certain prototypes reduced self-reported cognitive load and improved comprehension scores compared to traditional captions. Qualitative feedback reveals the importance of customizable caption features to accommodate neurodivergent users' preferences (e.g., ADHD or different levels of comfort with emojis). Our findings contribute to ongoing conversations in accessible technology research about how best to integrate emotional cues into captions in a way that is both usable and beneficial for a wide range of learners.

Authors:Zihong He, Hai-Ning Liang, Chen Liang
Title: Tap-to-Adapt: Learning User-Aligned Response Timing for Speech Agents
Abstract:
Response timing judgment is a critical component of interactive speech agents. Although there exists substantial prior work on turn modeling and voice wake-up, there is a lack of research on response timing judgments continuously aligned with user intent. To address this, we propose the Tap-to-Adapt framework, which enables users to naturally activate or interrupt the agent via tap interactions to construct online learning labels for response timing models. Under this framework, Dilated TCN and a sequential replay strategy play significant roles, as demonstrated through data-driven experiments and user studies. Additionally, we develop an evaluation and continuous data mining system tailored for the Tap-to-Adapt framework, through which we have collected approximately 20,000 samples from the user studies involving 20 participants.

Authors:David Wegmann, Emil Stevnsborg, Søren Knudsen, Luca Rossi, Aske Mottelson
Title: Is Seeing Believing? Evaluating Human Sensitivity to Synthetic Video
Abstract:
Advances in machine learning have enabled the creation of realistic synthetic videos known as deepfakes. As deepfakes proliferate, concerns about rapid spread of disinformation and manipulation of public perception are mounting. Despite the alarming implications, our understanding of how individuals perceive synthetic media remains limited, obstructing the development of effective mitigation strategies. This paper aims to narrow this gap by investigating human responses to visual and auditory distortions of videos and deepfake-generated visuals and narration. In two between-subjects experiments, we study whether audio-visual distortions affect cognitive processing, such as subjective credibility assessment and objective learning outcomes. A third study reveals that artifacts from deepfakes influence credibility. The three studies show that video distortions and deepfake artifacts can reduce credibility. Our research contributes to the ongoing exploration of the cognitive processes involved in the evaluation and perception of synthetic videos, and underscores the need for further theory development concerning deepfake exposure.

Authors:Jacob Bradshaw, Mohsen Riahi Alam, Bhanuja Ainary, Minseo Kim, Mohsen Amini Salehi
Title: Audo-Sight: AI-driven Ambient Perception Across Edge-Cloud for Blind and Low Vision Users
Abstract:
Despite advances in assistive technologies, Blind and Low-Vision (BLV) individuals continue to face challenges in understanding their surroundings. Delivering concise, useful, and timely scene descriptions for ambient perception remains a long-standing accessibility problem. To address this, we introduce Audo-Sight, an AI-driven assistive system across Edge-Cloud that enables BLV individuals to perceive their surroundings through voice-based conversational interaction. Audo-Sight employs a set of expert and generic AI agents, each supported by dedicated processing pipelines distributed across edge and cloud. It analyzes user queries by considering urgency and contextual information to infer the user intent and dynamically route each query, along with a scene frame, to the most suitable pipeline. In cases where users require fast responses, the system simultaneously leverages edge and cloud processing pipelines. The edge generates an initial response quickly, while the cloud provides more detailed and accurate information. To overcome the challenge of seamlessly combining these outputs, we introduce the Response Fusion Engine, which fuses the fast edge response with the more accurate cloud output, ensuring timely and high-accuracy response for the BLV users. Systematic evaluation shows that Audo-Sight delivers speech output around 80% faster for urgent tasks and generates complete responses approximately 50% faster across all tasks compared to a commercial cloud-based solution -- highlighting the effectiveness of our system across edge-cloud. Human evaluation of Audo-Sight shows that it is the preferred choice over GPT-5 for 62% of BLV participants with another 23% stating both perform comparably.

Authors:Eva-Maria Schön, Michael Neumann, Tiago Silva da Silva
Title: Teaching Agile Requirements Engineering: A Stakeholder Simulation with Generative AI
Abstract:
Context: The active involvement of users and customers in agile software development remains a persistent challenge in practice. For this reason, it is important that students in higher education become familiar with good practices in Agile Requirements Engineering during their studies. Objective: Our objective is to enable students to learn how to interact with Generative Artificial Intelligence (GenAI) through the use of a stakeholder simulation with AI Personas, while also developing an understanding of the limitations of AI tools in practical contexts. Method: In our courses, we employ a stakeholder simulation using GenAI, in which students conduct interviews with AI Personas through a provided meta-prompt. Based on the outcomes of these interviews, students apply agile practices (e.g., story mapping or impact mapping) to document requirements. The use of GenAI is subsequently reflected upon in a structured group discussion. Results: Through this approach, students gain practical experience by applying state-of-the art agile practices for requirements elicitation and documentation while simultaneously developing an understanding of the technical and ethical limitations associated with the use of generative AI. Conclusion: We have applied this approach over several terms and found that using a meta-prompt provides flexibility, allowing us to remain independent of specific large language model providers.

Authors:Yerin Kwak, Zachary A. Pardos
Title: The RIGID Framework: Research-Integrated, Generative AI-Mediated Instructional Design
Abstract:
Instructional Design (ID) often faces challenges in incorporating research-based knowledge and pedagogical best practices. Although educational researchers and government agencies emphasize grounding ID in evidence, integrating research findings into everyday design workflows is often complex, as it requires considering multiple context-specific demands and constraints. To address this persistent gap, this paper explores how research in the learning sciences (LS) can be systematically integrated across ID workflows and how recent advances in generative AI can help operationalize this integration. While ID and LS share a commitment to improving learning experiences through design-oriented approaches in authentic contexts, structured integration between the two fields remains limited, leaving their complementary insights underutilized. We present RIGID (Research-Integrated, Generative AI-Mediated Instructional Design), a unified framework that integrates LS research across ID workflows spanning analysis, design, implementation, and evaluation phases, while leveraging generative AI to mediate this integration at each stage. The RIGID framework provides a systematic approach for enabling research-integrated instructional design that is both operational and context-sensitive, while preserving the central role of human expertise.

Authors:Hyungwoo Song, Jeongha Kim, Minju Kim, Duhyung Kwak, Minjeong Shin, Bongwon suh, Hyunggu Jung
Title: "I Should Know, But I Dare Not Ask": From Understanding Challenges in Patient Journeys to Deriving Design Implications for North Korean Defectors' Adaptation
Abstract:
While it is known that North Korean defectors (NKDs) struggle with South Korea's healthcare system, the specific challenges of their patient journey remain underexplored. To investigate this, we conducted interviews with 10 NKDs about an 8-step patient journey and identified the clinical consultation step as a critical barrier for all participants, marked by three key challenges: expressing symptoms, managing social and cultural concerns, and overcoming language differences. In response, we developed Medibridge, a mobile prototype that allows users to rehearse with an AI doctor before a real hospital visit to generate a tangible ``Helper Note'' for their actual consultation. Our evaluation with 15 NKDs showed improvements in perceived communication capability, including greater expression clarity, reduced social and cultural concerns, and enhanced linguistic confidence. Our contributions include an empirical understanding of NKDs' healthcare challenges, a novel AI-powered rehearsal system that prepares users for real-world clinical communication, and design implications for inclusive technologies for displaced populations.

Authors:Matthew Gaughan, Aaron Shaw, Darren Gergle
Title: Linguistic Similarity Within Centralized FLOSS Development
Abstract:
When free/libre and open source software (FLOSS) stewards centralize project development, they potentially undermine project sustainability and impact how contributors talk to each other. To study the relationship between steward-centralized development and contributor discussion, we compared the development of three Wikimedia platform features that the Wikimedia Foundation (WMF) built in MediaWiki. In a mixed-methods multi-case comparison, we used repository mining, linguistic style features, and principal component analysis to track MediaWiki feature development and issue discussions. Contrary to both our intuition and prior work, there were no identifiable differences in the linguistic style of WMF-affiliates and external contributors, even when feature development was guided by WMF contributions. From these results, we offer two provocations to the study of collaborative FLOSS development: (1) stewards dominate development according to their own use of specific project functionality; (2) centralized project development does not entail hierarchical language within project discussions.

Authors:Himel Ghosh, Nick Elias Werner
Title: LLM BiasScope: A Real-Time Bias Analysis Platform for Comparative LLM Evaluation
Abstract:
As large language models (LLMs) are deployed widely, detecting and understanding bias in their outputs is critical. We present LLM BiasScope, a web application for side-by-side comparison of LLM outputs with real-time bias analysis. The system supports multiple providers (Google Gemini, DeepSeek, MiniMax, Mistral, Meituan, Meta Llama) and enables researchers and practitioners to compare models on the same prompts while analyzing bias patterns. LLM BiasScope uses a two-stage bias detection pipeline: sentence-level bias detection followed by bias type classification for biased sentences. The analysis runs automatically on both user prompts and model responses, providing statistics, visualizations, and detailed breakdowns of bias types. The interface displays two models side-by-side with synchronized streaming responses, per-model bias summaries, and a comparison view highlighting differences in bias distributions. The system is built on Next.js with React, integrates Hugging Face inference endpoints for bias detection, and uses the Vercel AI SDK for multi-provider LLM access. Features include real-time streaming, export to JSON/PDF, and interactive visualizations (bar charts, radar charts) for bias analysis. LLM BiasScope is available as an open-source web application, providing a practical tool for bias evaluation and comparative analysis of LLM behaviour.

Authors:Mak Ahmad, Andrew Macvean, JJ Geewax, David Karger
Title: The Perfection Paradox: From Architect to Curator in AI-Assisted API Design
Abstract:
Enterprise API design is often bottlenecked by the tension between rapid feature delivery and the rigorous maintenance of usability standards. We present an industrial case study evaluating an AI-assisted design workflow trained on API Improvement Proposals (AIPs). Through a controlled study with 16 industry experts, we compared AI-generated API specifications against human-authored ones. While quantitative results indicated AI superiority in 10 of 11 usability dimensions and an 87% reduction in authoring time, qualitative analysis revealed a paradox: experts frequently misidentified AI work as human (19% accuracy) yet described the designs as unsettlingly "perfect." We characterize this as a "Perfection Paradox" -- where hyper-consistency signals a lack of pragmatic human judgment. We discuss the implications of this perfection paradox, proposing a shift in the human designer's role from the "drafter" of specifications to the "curator" of AI-generated patterns.

Authors:Gunnar P. Epping, Andrew Caplin, Erik Duhaime, William R. Holmes, Daniel Martin, Jennifer S. Trueblood
Title: Managing Cognitive Bias in Human Labeling Operations for Rare-Event AI: Evidence from a Field Experiment
Abstract:
Many operational AI systems depend on large-scale human annotation to detect rare but consequential events (e.g., fraud, defects, and medical abnormalities). When positives are rare, the prevalence effect induces systematic cognitive biases that inflate misses and can propagate through the AI lifecycle via biased training labels. We analyze prior experimental evidence and run a field experiment on DiagnosUs, a medical crowdsourcing platform, in which we hold the true prevalence in the unlabeled stream fixed (20% blasts) while varying (i) the prevalence of positives in the gold-standard feedback stream (20% vs. 50%) and (ii) the response interface (binary labels vs. elicited probabilities). We then post-process probabilistic labels using a linear-in-log-odds recalibration approach at the worker and crowd levels, and train convolutional neural networks on the resulting labels. Balanced feedback and probabilistic elicitation reduce rare-event misses, and pipeline-level recalibration substantially improves both classification performance and probabilistic calibration; these gains carry through to downstream CNN reliability out of sample.

Authors:David Fraile Navarro, Farah Magrabi, Enrico Coiera
Title: Evaluation format, not model capability, drives triage failure in the assessment of consumer health AI
Abstract:
Ramaswamy et al. reported in \textit{Nature Medicine} that ChatGPT Health under-triages 51.6\% of emergencies, concluding that consumer-facing AI triage poses safety risks. However, their evaluation used an exam-style protocol -- forced A/B/C/D output, knowledge suppression, and suppression of clarifying questions -- that differs fundamentally from how consumers use health chatbots. We tested five frontier LLMs (GPT-5.2, Claude Sonnet 4.6, Claude Opus 4.6, Gemini 3 Flash, Gemini 3.1 Pro) on a 17-scenario partial replication bank under constrained (exam-style, 1,275 trials) and naturalistic (patient-style messages, 850 trials) conditions, with targeted ablations and prompt-faithful checks using the authors' released prompts. Naturalistic interaction improved triage accuracy by 6.4 percentage points ($p = 0.015$). Diabetic ketoacidosis was correctly triaged in 100\% of trials across all models and conditions. Asthma triage improved from 48\% to 80\%. The forced A/B/C/D format was the dominant failure mechanism: three models scored 0--24\% with forced choice but 100\% with free text (all $p < 10^{-8}$), consistently recommending emergency care in their own words while the forced-choice format registered under-triage. Prompt-faithful checks on the authors' exact released prompts confirmed the scaffold produces model-dependent, case-dependent results. The headline under-triage rate is highly contingent on evaluation format and should not be interpreted as a stable estimate of deployed triage behavior. Valid evaluation of consumer health AI requires testing under conditions that reflect actual use.

Authors:Feng Chen, Luna Xingyu Li, Ray-Yuan Chung, Wenyu Zeng, Yein Jeon, Yizhou Hu, Oleg Zaslavsky
Title: Bridging the Cognitive Gap: Co-Designing and Evaluating a Voice-Enabled Community Chatbot for Older Adults
Abstract:
Digital portals in retirement communities often create physical and cognitive barriers for older adults, leading to digital avoidance. Generative AI offers a solution by enabling natural language interaction, yet its adoption is hindered by the opaque, "Black Box" nature of these systems and lingering usability challenges. To address this, we evaluated a voice-enabled Large Language Model (LLM) chatbot at a continuing care retirement community in the Pacific Northwest. Through a mixed-methods Co-Design and Literacy Workshop (N=25), we applied a "Glass Box" approach combining multimodal accessibility with intentional AI education. The intervention significantly improved participants' technical understanding (p=0.004) and perceived transparency (p=0.001), shifting their interaction model from blind trust to informed reliance prioritizing verifiable evidence. While voice input reduced cognitive load, usability scores dropped significantly for users aged 80 and older (r=-0.50), indicating that truly age-inclusive AI must evolve beyond touch-based interfaces toward zero-touch navigation.

Authors:Atieh Taheri, Hamza El Alaoui, Patrick Carrington, Jeffrey P. Bigham
Title: "I followed what felt right, not what I was told": Autonomy, Coaching, and Recognizing Bias Through AI-Mediated Dialogue
Abstract:
Ableist microaggressions remain pervasive in everyday interactions, yet interventions to help people recognize them are limited. We present an experiment testing how AI-mediated dialogue influences recognition of ableism. 160 participants completed a pre-test, intervention, and a post-test across four conditions: AI nudges toward bias (Bias-Directed), inclusion (Neutral-Directed), unguided dialogue (Self-Directed), and a text-only non-dialogue (Reading). Participants rated scenarios on standardness of social experience and emotional impact; those in dialogue-based conditions also provided qualitative reflections. Quantitative results showed dialogue-based conditions produced stronger recognition than Reading, though trajectories diverged: biased nudges improved differentiation of bias from neutrality but increased overall negativity. Inclusive or no nudges remained more balanced, while Reading participants showed weaker gains and even declines. Qualitative findings revealed biased nudges were often rejected, while inclusive nudges were adopted as scaffolding. We contribute a validated vignette corpus, an AI-mediated intervention platform, and design implications highlighting trade-offs conversational systems face when integrating bias-related nudges.

Authors:Jennah Gosciak, Eric Giannella, Zhaowen Guo, Michael Chen, Allison Koenecke
Title: LLMs in social services: How does chatbot accuracy affect human accuracy?
Abstract:
Social service programs like the Supplemental Nutrition Assistance Program (SNAP, or food stamps) have eligibility rules that can be challenging to understand. For nonprofit caseworkers who often support clients in navigating a dozen or more complex programs, LLM-based chatbots may offer a means to provide better, faster help to clients whose situations may be less common. In this paper, we measure the potential effects of LLM-based chatbot suggestions on caseworkers' ability to provide accurate guidance. We first created a 770-question multiple-choice benchmark dataset of difficult, but realistic questions that a caseworker might receive. Next, using these benchmark questions and corresponding expert-verified answers, we conducted a randomized experiment with caseworkers recruited from nonprofit outreach organizations in Los Angeles. Caseworkers in the control condition did not see chatbot suggestions and had a mean accuracy of 49%. Caseworkers in the treatment condition saw chatbot suggestions that we artificially varied to range in aggregate accuracy from low (53%) to high (100%). Caseworker performance significantly improves as chatbot quality improves: high-quality chatbots (96-100% accurate) improved caseworker accuracy by 27 percentage points. At the question-level, incorrect chatbot suggestions substantially reduce caseworker accuracy, with a two-thirds reduction on easy questions where the control group performed best (without chatbot suggestions). Finally, improvements in caseworker accuracy level off as chatbot accuracy increases, a phenomenon that we call the "AI underreliance plateau," which is a concern for real-world deployment and highlights the importance of evaluating human-in-the-loop tools with their users.

Authors:Daniel J. Buxton, Mufti Mahmud, Jordan J. Bird, Thomas Hughes-Roberts, David J. Brown
Title: A Platform-Agnostic Multimodal Digital Human Modelling Framework: Neurophysiological Sensing in Game-Based Interaction
Abstract:
Digital Human Modelling (DHM) is increasingly shaped by advances in AI, wearable biosensing, and interactive digital environments, particularly in research addressing accessibility and inclusion. However, many AI-enabled DHM approaches remain tightly coupled to specific platforms, tasks, or interpretative pipelines, limiting reproducibility, scalability, and ethical reuse. This paper presents a platform-agnostic DHM framework designed to support AI-ready multimodal interaction research by explicitly separating sensing, interaction modelling, and inference readiness. The framework integrates the OpenBCI Galea headset as a unified multimodal sensing layer, providing concurrent EEG, EMG, EOG, PPG, and inertial data streams, alongside a reproducible, game-based interaction environment implemented using SuperTux. Rather than embedding AI models or behavioural inference, physiological signals are represented as structured, temporally aligned observables, enabling downstream AI methods to be applied under appropriate ethical approval. Interaction is modelled using computational task primitives and timestamped event markers, supporting consistent alignment across heterogeneous sensors and platforms. Technical verification via author self-instrumentation confirms data integrity, stream continuity, and synchronisation; no human-subjects evaluation or AI inference is reported. Scalability considerations are discussed with respect to data throughput, latency, and extension to additional sensors or interaction modalities. Illustrative use cases demonstrate how the framework can support AI-enabled DHM and HCI studies, including accessibility-oriented interaction design and adaptive systems research, without requiring architectural modifications. The proposed framework provides an emerging-technology-focused infrastructure for future ethics-approved, inclusive DHM research.

Authors:Marta Sumyk, Oleksandr Kosovan
Title: CUAAudit: Meta-Evaluation of Vision-Language Models as Auditors of Autonomous Computer-Use Agents
Abstract:
Computer-Use Agents (CUAs) are emerging as a new paradigm in human-computer interaction, enabling autonomous execution of tasks in desktop environment by perceiving high-level natural-language instructions. As such agents become increasingly capable and are deployed across diverse desktop environments, evaluating their behavior in a scalable and reliable manner becomes a critical challenge. Existing evaluation pipelines rely on static benchmarks, rule-based success checks, or manual inspection, which are brittle, costly, and poorly aligned with real-world usage. In this work, we study Vision-Language Models (VLMs) as autonomous auditors for assessing CUA task completion directly from observable interactions and conduct a large-scale meta-evaluation of five VLMs that judge task success given a natural-language instruction and the final environment state. Our evaluation spans three widely used CUA benchmarks across macOS, Windows, and Linux environments and analyzes auditor behavior along three complementary dimensions: accuracy, calibration of confidence estimates, and inter-model agreement. We find that while state-of-the-art VLMs achieve strong accuracy and calibration, all auditors exhibit notable performance degradation in more complex or heterogeneous environments, and even high-performing models show significant disagreement in their judgments. These results expose fundamental limitations of current model-based auditing approaches and highlight the need to explicitly account for evaluator reliability, uncertainty, and variance when deploying autonomous CUAs in real-world settings.

Authors:Tianyu Xu, Sieun Kim, Qianhui Zheng, Ruoyu Xu, Tejasvi Ravi, Anuva Kulkarni, Katrina Passarella-Ward, Junyi Zhu, Adarsh Kowdle
Title: MoXaRt: Audio-Visual Object-Guided Sound Interaction for XR
Abstract:
In Extended Reality (XR), complex acoustic environments often overwhelm users, compromising both scene awareness and social engagement due to entangled sound sources. We introduce MoXaRt, a real-time XR system that uses audio-visual cues to separate these sources and enable fine-grained sound interaction. MoXaRt's core is a cascaded architecture that performs coarse, audio-only separation in parallel with visual detection of sources (e.g., faces, instruments). These visual anchors then guide refinement networks to isolate individual sources, separating complex mixes of up to 5 concurrent sources (e.g., 2 voices + 3 instruments) with ~2 second processing latency. We validate MoXaRt through a technical evaluation on a new dataset of 30 one-minute recordings featuring concurrent speech and music, and a 22-participant user study. Empirical results indicate that our system significantly enhances speech intelligibility, yielding a 36.2% (p < 0.01) increase in listening comprehension within adversarial acoustic environments while substantially reducing cognitive load (p < 0.001), thereby paving the way for more perceptive and socially adept XR experiences.

Authors:Haoting Gao, Kapotaksha Das, Mohamed Abouelenien, Michael Cole, James Cooke, Vitaliy Popov
Title: Towards Modeling Situational Awareness Through Visual Attention in Clinical Simulations
Abstract:
Situational awareness (SA) is essential for effective team performance in time-critical clinical environments, yet its dynamic and distributed nature remains difficult to characterize. In this preliminary study, we apply Transition Network Analysis (TNA) to model visual attention in multiperson VR-based cardiac arrest simulations. Using eye-tracking data from 40 clinicians assigned to four standardized roles (Airway, CPR, Defib, TeamLead), we construct gaze transition networks between clinically meaningful areas of interest (AOIs) and extract metrics such as entropy and self-loop rate to quantify attentional structure and flow. Our findings reveal that individual and team's visual attention is dynamically and adaptively redistributed across roles and scenario phases, with those in CPR roles narrowing their focus to execution-critical tasks and those in the TeamLead role concentrating on global monitoring as clinical demands evolve. TNA thus provides a powerful lens for mapping functional differentiation of team cognition and may support the development of phase-sensitive analytics and targeted instructional interventions in acute care training.

Authors:Brian Freeman, Adam Kicklighter, Matt Erdman, Zach Gordon
Title: Toward Epistemic Stability: Engineering Consistent Procedures for Industrial LLM Hallucination Reduction
Abstract:
Hallucinations in large language models (LLMs) are outputs that are syntactically coherent but factually incorrect or contextually inconsistent. They are persistent obstacles in high-stakes industrial settings such as engineering design, enterprise resource planning, and IoT telemetry platforms. We present and compare five prompt engineering strategies intended to reduce the variance of model outputs and move toward repeatable, grounded results without modifying model weights or creating complex validation models. These methods include: (M1) Iterative Similarity Convergence, (M2) Decomposed Model-Agnostic Prompting, (M3) Single-Task Agent Specialization, (M4) Enhanced Data Registry, and (M5) Domain Glossary Injection. Each method is evaluated against an internal baseline using an LLM-as-Judge framework over 100 repeated runs per method (same fixed task prompt, stochastic decoding at $τ= 0.7$. Under this evaluation setup, M4 (Enhanced Data Registry) received ``Better'' verdicts in all 100 trials; M3 and M5 reached 80\% and 77\% respectively; M1 reached 75\%; and M2 was net negative at 34\% when compared to single shot prompting with a modern foundation model. We then developed enhanced version 2 (v2) implementations and assessed them on a 10-trial verification batch; M2 recovered from 34\% to 80\%, the largest gain among the four revised methods. We discuss how these strategies help overcome the non-deterministic nature of LLM results for industrial procedures, even when absolute correctness cannot be guaranteed. We provide pseudocode, verbatim prompts, and batch logs to support independent assessment.

Authors:Tegan Roberts-Morgan, Min S. Li, Priscilla Lo, Zhuzhi Fan, Dan Bennett, Oussama Metatla
Title: Touching Emotions, Smelling Shapes: Exploring Tactile, Olfactory and Emotional Cross-sensory Correspondences in Preschool Aged Children
Abstract:
The use of a wide range of sensory modalities is increasingly central to technologies for learning, communication, and affective regulation. During the preschool years, sensory integration develops rapidly, shaping how children perceive and make sense of their environments. A key component of this process is cross-sensory correspondence: the systematic ways in which perceptions in different sensory modalities influence one another. Despite its relevance, little is known about cross-sensory correspondences in preschool-aged children (2-4 years). We present a study with 26 preschoolers examining smell-touch-emotion correspondences through playful tasks. We found significant correspondences both between sensory modalities and between sensory modalities and affective judgements. Further analysis revealed association strategies underpinning these mappings. We contribute empirical insights into cross-sensory correspondences in early childhood, design guidelines that align with how preschoolers relate sensory input, and a replicable method for probing cross-sensory cognition in this age group.

Authors:Jiayin Zhi, Harsh Kumar, Mina Lee
Title: Investigating the Effects of LLM Use on Critical Thinking Under Time Constraints: Access Timing and Time Availability
Abstract:
The impact of large language models (LLMs) on critical thinking has provoked growing attention, yet this impact on actual performance may not be uniformly negative or positive. Particularly, the role of time -- the temporal context under which an LLM is provided -- remains overlooked. In a between-subjects experiment (n=393), we examined two types of time constraints for a critical thinking task requiring participants to make a reasoned decision for a real-world scenario based on diverse documents: (1) LLM access timing -- an LLM available only at the beginning (early), throughout (continuous), near the end (late), or not at all (no LLM), and (2) time availability -- insufficient or sufficient time for the task. We found a temporal reversal: LLM access from the start (early, continuous) improved performance under time pressure but impaired it with sufficient time, whereas beginning the task independently (late, no LLM) showed the opposite pattern. These findings demonstrate that time constraints fundamentally shape whether an LLM augments or undermines critical thinking, making time a central consideration when designing LLM support and evaluating human-AI collaboration in cognitive tasks.

Authors:Eva Mackamul, Tom Maillard, Noé Marceaul, Yelli Coulibaly, Julien Pansiot, Laurence Boissieux, Dominique Vaufreydaz, Anne Roudaut, Céline Coutrix
Title: ''I don't want to break it'': An Exploration of Perceived Fragility in Shape-Changing Interfaces
Abstract:
Shape-Changing Interfaces (SCIs) dynamically alter their form, an inherent characteristic that introduces fragility into their design. As a result, users' perceptions of an interface's fragility or its potential to move or break may influence their interaction, however the extent of this effect is unclear. To address this gap, we conducted a qualitative study (N = 18) using video stimuli showcasing 20 existing SCIs. Through thematic analysis, we identified key factors impacting perceived fragility and formalized these into a framework. We then conducted a second study (N = 36) for which we fabricated SCIs that varied across selected fragility-related dimensions. We recorded user interactions and compared how the selected dimensions shaped manipulation of the objects and how they were considered by users. Together, these studies provide a structured foundational understanding of perceived fragility in SCIs and offer insights to enhance perceived robustness and inform future SCI development.

Authors:Jordan Aiko Deja, Isidro Butaslac, Nicko Reginio Caluya, Maheshya Weerasinghe
Title: Directing the Robot: Scaffolding Creative Human-AI-Robot Interaction
Abstract:
Robots are moving beyond industrial settings into creative, educational, and public environments where interaction is open-ended and improvisational. Yet much of human-AI-robot interaction remains framed around performance and efficiency, positioning humans as supervisors rather than collaborators. We propose a re-framing of AI interaction with robots as scaffolding: infrastructure that enables humans to shape robotic behaviour over time while remaining meaningfully in control. Through scenarios from creative practice, learning-by-teaching, and embodied interaction, we illustrate how humans can act as executive directors, defining intent and steering revisions, while AI mediates between human expression and robotic execution. We outline design and evaluation implications that foreground creativity, agency, and flow. Finally, we discuss open challenges in social, scalable, and mission-critical contexts. We invite the community to rethink interacting with Robots and AI not as autonomy, but as sustained support for human creativity.

Authors:Yang Lu, Tianyu Zhang, Jiamu Tang, Yanna Lin, Jiankun Yang, Longyu Zhang, Shijian Luo, Yukang Yan
Title: Capability at a Glance: Design Guidelines for Intuitive Avatars Communicating Augmented Actions in Virtual Reality
Abstract:
Virtual Reality (VR) enables users to engage with capabilities beyond human limitations, but it is not always obvious how to trigger these capabilities. Taking the lens of Affordance, we believe avatar design is the key to solving this issue, which ideally should communicate its capabilities and how to activate them. To understand the current practice, we selected eight capabilities across four categories and invited twelve professional designers to design avatars that communicate the capabilities and their corresponding interactions. From the resulting designs, we formed 16 guidelines to provide general and category-specific recommendations. Then, we validated these guidelines by letting two groups of twelve participants design avatars with and without guidelines. Participants rated the guidelines' clarity and usefulness highly. External judges confirmed that avatars designed with the guidelines were more intuitive in conveying the capabilities and interaction methods. Finally, we demonstrated the applicability of the guidelines in avatar design for four VR applications.

Authors:Tzu-Hsin Hsieh, Cassandra Michelle Stefanie Visser, Elmar Eisemann, Ricardo Marroquim
Title: Skill-Adaptive Ghost Instructors: Enhancing Retention and Reducing Over-Reliance in VR Piano Learning
Abstract:
Motor-skill learning systems in XR rely on persistent cues. However, constant cueing can induce overreliance and erode memorization and skill transfer. We introduce a skill-adaptive, dynamically transparent ghost instructor whose opacity adapts in real time to learner performance. In a first-person perspective, users observe a ghost hand executing piano fingering with either a static or a performance-adaptive transparency in a VR piano training application. We conducted a within-subjects study (N=30), where learners practiced with traditional Static (fixed-transparency) and our proposed Dynamic (performance-adaptive) modes and were tested without guidance immediately and after a 10-minute retention interval. Relative to Static, the Dynamic mode yielded higher pitch and fingering accuracy and limited error increases, with comparable timing. These findings suggest that adaptive transparency helps learners internalize fingerings more effectively, reducing dependency on external cues and improving short-term skill retention within immersive learning environments. We discuss design implications for motor-skill learning and outline directions for extending this approach to longer-term retention and more complex tasks.

Authors:Dion Barja, Matthew Brehmer
Title: Glass Chirolytics: Reciprocal Compositing and Shared Gestural Control for Face-to-Face Collaborative Visualization at a Distance
Abstract:
Videoconference conversations about data often entail screen sharing visualization artifacts, in which nonverbal communication goes largely ignored. Beyond presentation use cases, conversations supported by visualization also arise in collaborative decision making, technical interviews, and tutoring: use cases that benefit from participants being able to see one another as they exchange questions about the data. In this paper, we employ a reciprocal compositing of visualization and interface widgets over the mirrored video of one's conversation partner, suggestive of a pane of glass, in which both parties can simultaneously manipulate composited elements via bimanual gestures. We demonstrate our approach with implementations of several visualization interfaces spanning the aforementioned use cases, and we evaluate our approach in a study (N = 16) comparing it to videoconferencing while using a mouse to interact with a collaborative web application. Our findings suggest that our approach promotes feelings of presence and mutual awareness of analytical intent.

Authors:Thanh-Tung Ngo, Emma Murphy, Robert J. Ross
Title: Vision-Language System using Open-Source LLMs for Gestures in Medical Interpreter Robots
Abstract:
Effective communication is vital in healthcare, especially across language barriers, where non-verbal cues and gestures are critical. This paper presents a privacy-preserving vision-language framework for medical interpreter robots that detects specific speech acts (consent and instruction) and generates corresponding robotic gestures. Built on locally deployed open-source models, the system utilizes a Large Language Model (LLM) with few-shot prompting for intent detection. We also introduce a novel dataset of clinical conversations annotated for speech acts and paired with gesture clips. Our identification module achieved 0.90 accuracy, 0.93 weighted precision, and a 0.91 weighted F1-Score. Our approach significantly improves computational efficiency and, in user studies, outperforms the speech-gesture generation baseline in human-likeness while maintaining comparable appropriateness.

Authors:Zuoyu Zhang, Yancheng Zhu
Title: Enhancing Tool Calling in LLMs with the International Tool Calling Dataset
Abstract:
Tool calling allows large language models (LLMs) to interact with external systems like APIs, enabling applications in customer support, data analysis, and dynamic content generation. While recent benchmarks have advanced tool-use research, they suffer from key limitations, including reliance on simulated or restricted APIs, limited reproducibility, and a lack of cultural and geographic diversity. To address these gaps, we introduce International Tool Calling (ITC), a large-scale, multilingual benchmark designed for realistic, globally distributed tool-calling scenarios. ITC includes 3,571 real APIs and 17,540 tool calling tasks across 20 categories and 40 countries. Experiments reveal substantial performance gaps between open- and closed-source LLMs, while fine-tuning on ITC yields significant improvements, particularly for non-English queries, enhancing cross-lingual generalization, reasoning consistency, and robustness to out-of-domain tools. ITC provides a valuable benchmark for advancing LLM robustness and performance in complex, multi-tool, and international scenarios. Dataset: https://anonymous.4open.science/r/International-Tool-Calling-ITC-dataset-FAF4/.

Authors:S. Yanushkevich, E. Berepiki, P. Ciunkiewicz, V. Shmerko, G. Wolbring, R. Guest
Title: Biometric-enabled Personalized Augmentative and Alternative Communications
Abstract:
This study focuses on the roadmapping of biometric technologies onto personalized Augmentative and Alternative Communication (AAC), a branch of assistive technologies for people with communication disabilities. This technology roadmapping revolves around the proposed notions of an AAC biometric register and biometric-enabled reconfigurable AAC channels. The biometric register is referred to as a tool for acquiring and processing physiological and behavioural traits that are essential for augmentative and alternative communication. It links biometric traits, such as gestures, to intermediate traits, such as synthesized speech, for customizable communication channels. The proposed methodology is used to assess the gaps between the social and practical demands, such as assisting people with communication disabilities in the contemporary semi-automated border control, and the emerging advances in AI, such as advanced video and speech processing. We provide two case studies of the AAC that rely on hand gesture recognition and sign language word recognition, and conclude that the current accuracy of those AI technologies does not meet the practical requirements. The proposed roadmapping provides recommendations for further improvement to close these gaps.

Authors:Patrick Tresset, Markus Wulfmeier
Title: An Embodied Companion for Visual Storytelling
Abstract:
As artificial intelligence shifts from pure tool for delegation toward agentic collaboration, its use in the arts can shift beyond the exploration of machine autonomy toward synergistic co-creation. While our earlier robotic works utilized automation to distance the artist's intent from the final mark, we present Companion: an artistic apparatus that integrates a drawing robot with Large Language Models (LLMs) to re-center human-machine presence. By leveraging in-context learning and real-time tool use, the system engages in bidirectional interaction via speech and sketching. This approach transforms the robot from a passive executor into a playful co-creative partner capable of driving shared visual storytelling into unexpected aesthetic territories. To validate this collaborative shift, we employed the Consensual Assessment Technique (CAT) with a panel of seven art-world experts. Results confirm that the system produces works with a distinct aesthetic identity and professional exhibition merit, demonstrating the potential of AI as a highly capable artistic collaborator.

Authors:Parm Suksakul, Nathan Kittichaikoonkij, Nakhin Polthai, Aung Pyae
Title: Exploring Human-in-the-Loop Themes in AI Application Development: An Empirical Thematic Analysis
Abstract:
Developing and deploying AI applications in organizations is challenging when human decision authority and oversight are underspecified across the system lifecycle. Although Human-in-the-Loop (HITL) and Human-Centered AI (HCAI) principles are widely acknowledged, operational guidance for structuring roles, checkpoints, and feedback mechanisms remains fragmented. We report a multi-source qualitative study: a retrospective diary study of a customer-support chatbot and semi-structured interviews with eight AI experts from academia and industry. Through five-cycle thematic analysis of 1,435 codewords, we derive four themes: AI Governance and Human Authority, Human-in-the-Loop Iterative Refinement, AI System Lifecycle and Operational Constraints, and Human-AI Team Collaboration and Coordination. These themes provide empirical inputs for subsequent HITL framework design and validation.

Authors:Santiago Lombeyda, S. G. Djorgovski, Ciro Donalek
Title: XR and Hybrid Data Visualization Spaces for Enhanced Data Analytics
Abstract:
The growing complexity and information content of data, together with the need to understand both the complex structures, relationships, and phenomena present in these data spaces, compounded with the emerging need to understand the results produced by AI tools used to analyze the data, requires development of novel, effective data visualization tools. Much of the growing complexity is reflected in the increasing dimensionality of data spaces, where extended reality (XR) naturally emerges as a candidate to help extend our capability for higher dimensional understanding. However, humans often understand lower dimensionality representations more effectively. Still, XR offers an opportunity for a seamless integration of simulated traditional data displays within the 3-dimensional virtual data spaces, leading to more intuitive and more effective data analytics. In this paper we present an overview of the benefits of seamlessly integrated 2-dimensional and 3-dimensional interactive visual representations embedded in XR spaces, and present three case studies that leverage these approaches for more efficient data analytics.

Authors:Shi Liu, Martin Feick, Linus Bierhoff, Alexander Maedche
Title: AttentiveLearn: Personalized Post-Lecture Support for Gaze-Aware Immersive Learning
Abstract:
Immersive learning environments such as virtual classrooms in Virtual Reality (VR) offer learners unique learning experiences, yet providing effective learner support remains a challenge. While prior HCI research has explored in-lecture support for immersive learning, little research has been conducted to provide post-lecture support, despite being critical for sustained motivation, engagement, and learning outcomes. To address this, we present AttentiveLearn, a learning ecosystem that generates personalized quizzes on a mobile learning assistant based on learners' attention distribution inferred using eye-tracking in VR lectures. We evaluated the system in a four-week field study with 36 university students attending lectures on Bayesian data analysis. AttentiveLearn improved learners' reported motivation and engagement, without conclusive evidence of learning gains. Meanwhile, anecdotal evidence suggested improvements in attention for certain participants over time. Based on our findings of the field study, we provide empirical insights and design implications for personalized post-lecture support for immersive learning systems.

Authors:Ian Steenstra, Neha Patkar, Rebecca B. Perkins, Michael K. Paasche-Orlow, Timothy Bickmore
Title: Designing for Adolescent Voice in Health Decisions: Embodied Conversational Agents for HPV Vaccination
Abstract:
Adolescents are directly affected by preventive health decisions such as vaccination, yet their perspectives are rarely solicited or supported. Most digital interventions for Human Papillomavirus (HPV) vaccination are designed exclusively for parents, implicitly treating adolescents as passive recipients rather than stakeholders with agency. We present the design and evaluation of a mobile intervention that gives adolescents a voice in HPV vaccination decisions alongside their parents. The system uses embodied conversational agents tailored to each audience: parents interact with an animated physician using education and motivational interviewing techniques, while adolescents can choose between an age-appropriate doctor or a narrative fantasy game that conveys HPV facts through play. We report findings from a clinic-based pilot study with 21 parent-adolescent dyads. Results indicate high satisfaction across both audiences, improved HPV knowledge, and increased intent to vaccinate. We discuss design implications for supporting adolescent participation, choice, and agency in decisions about their health.

Authors:Punn Lertjaturaphat, Jungwoo Rhee, Jaewon You, Andrea Bianchi
Title: Wire Your Way: Hardware-Contextualized Guidance and In-situ Tests for Personalized Circuit Prototyping
Abstract:
The increasing popularity of microcontroller platforms like Arduino enables diverse end-user developers to participate in circuit prototyping. Traditionally, follow-along tutorials serve as an essential learning method for makers, and in fact, several prior toolkits leveraged this format as a way to engage new makers. However, literature and our formative study (N=12) show that makers have unique preferences regarding the construction of their circuits and idiosyncratic ways to assess and debug problems, which contrasts with the step-by-step instructional nature of tutorials and those systems leveraging this method. To address this mismatch, we present a prototyping platform that supports personalized circuit construction and debugging. Our system utilizes an augmented breadboard, which is circuit-aware and supports on-the-fly hardware reconfiguration via contextualized guidance and in-situ circuit validation through interactive tests. Through a usability study (N=12), we demonstrate how makers leverage circuit-aware guidance and debugging to support individual building patterns.

Authors:Benjamin M. Chen, Hong Bao
Title: Training for Technology: Adoption and Productive Use of Generative AI in Legal Analysis
Abstract:
Can targeted user training unlock the productive potential of generative artificial intelligence (GenAI) in professional settings? We investigate this question using a randomized study involving 164 law students completing an issue-spotting examination. Participants were assigned to one of three conditions: no GenAI access, optional access to a large language model (LLM), or optional access accompanied by an approximately ten-minute training intervention. Training significantly increased LLM adoption--the usage rate rose from 26% to 41%--and improved examination performance. Students with trained access scored 0.27 grade points higher than those with untrained access (p = 0.027), equivalent to roughly one-third of a letter grade. By contrast, access to an LLM without training did not improve performance and was associated with shorter answers relative to no access. Using principal stratification, we decompose the overall effect into adoption and effectiveness channels. Point estimates are consistent with training operating primarily by expanding the scope of GenAI use rather than by enhancing effectiveness among existing users, though confidence intervals are wide. Overall, our findings provide evidence that complementary investments in user training are critical for realizing GenAI productivity gains in knowledge-intensive fields where concerns about reliability may inhibit adoption.

Authors:Bowen Lou, Tian Lu, T. S. Raghu, Yingjie Zhang
Title: Visioning Human-Agentic AI Teaming: Continuity, Tension, and Future Research
Abstract:
Artificial intelligence is undergoing a structural transformation marked by the rise of agentic systems capable of open-ended action trajectories, generative representations and outputs, and evolving objectives. These properties introduce structural uncertainty into human-AI teaming (HAT), including uncertainty about behavior trajectories, epistemic grounding, and the stability of governing logics over time. Under such conditions, alignment cannot be secured through agreement on bounded outputs; it must be continuously sustained as plans unfold and priorities shift. We advance Team Situation Awareness (Team SA) theory, grounded in shared perception, comprehension, and projection, as an integrative anchor for this transition. While Team SA remains analytically foundational, its stabilizing logic presumes that shared awareness, once achieved, will support coordinated action through iterative updating. Agentic AI challenges this presumption. Our argument unfolds in two stages: first, we extend Team SA to reconceptualize both human and AI awareness under open-ended agency, including the sensemaking of projection congruence across heterogeneous systems. Second, we interrogate whether the dynamic processes traditionally assumed to stabilize teaming in relational interaction, cognitive learning, and coordination and control continue to function under adaptive autonomy. By distinguishing continuity from tension, we clarify where foundational insights hold and where structural uncertainty introduces strain, and articulate a forward-looking research agenda for HAT. The central challenge of HAT is not whether humans and AI can agree in the moment, but whether they can remain aligned as futures are continuously generated, revised, enacted, and governed over time.

Authors:Ajan Subramanian, Sumukh Bettadapura, Rohan Sathish
Title: Real Eyes Realize Faster: Gaze Stability and Pupil Novelty for Efficient Egocentric Learning
Abstract:
Always-on egocentric cameras are increasingly used as demonstrations for embodied robotics, imitation learning, and assistive AR, but the resulting video streams are dominated by redundant and low-quality frames. Under the storage and battery constraints of wearable devices, choosing which frames to keep is as important as how to learn from them. We observe that modern eye-tracking headsets provide a continuous, training-free side channel that decomposes into two complementary axes: gaze fixation captures visual stability (quality), while pupil response captures arousal-linked moments (novelty). We operationalize this insight as a Dual-Criterion Frame Curator that first gates frames by gaze quality and then ranks the survivors by pupil-derived novelty. On the Visual Experience Dataset (VEDB), curated frames at 10% budget match the classification performance of the full stream, and naive signal fusion consistently destroys both contributions. The benefit is task-dependent: pupil ranking improves activity recognition, while gaze-only selection already dominates for scene recognition, confirming that the two signals serve genuinely different roles. Our method requires no model inference and operates at capture time, offering a path toward efficient, always-on egocentric data curation.

Authors:Phenyo Phemelo Moletsane, Michael W. Asher, Christine Kwon, Paulo F. Carvalho, Amy Ogan
Title: Inclusive Mobile Learning: How Technology-Enabled Language Choice Supports Multilingual Students
Abstract:
Most learners worldwide are multilingual, yet implementing multilingual education remains challenging in practice. EdTech offers an opportunity to bridge this gap and expand access for linguistically diverse learners. We conducted a quasi-experiment in Uganda with 2,931 participants enrolled in a non-formal radio- and mobile-based engineering course, where learners self-selected instruction in Leb Lango (a local language), English, or a Hybrid option combining both languages. The Leb Lango version of the course was used disproportionately by learners from rural areas, those with less formal education, and those with lower prior knowledge, broadening participation among disadvantaged learners. Moreover, the availability of Leb Lango instruction was associated with higher active participation, even among learners who registered for English instruction. Although Leb Lango learners began with lower performance, they demonstrated faster learning gains and achieved comparable final examination outcomes to English and Hybrid learners. These results suggest that providing local language options to learners is an effective way to make EdTech more accessible.

Authors:Andrea Bianchi, Zhi Lin Yap, Punn Lertjaturaphat, Austin Z. Henley, Kongpyung Justin Moon, Yoonji Kim
Title: Inline Visualization and Manipulation of Real-Time Hardware Log for Supporting Debugging of Embedded Programs
Abstract:
The development of user-friendly embedded prototyping systems like Arduino has made creating interactive devices more accessible. However, debugging these systems is challenging due to the intertwined nature of software and hardware issues. Existing tools often require hardware instrumentation or log visualization through serial monitors. To address this, the authors designed Inline, a programming tool that simplifies debugging by displaying hardware logs directly within the code, providing real-time execution flow tracking and an expression language for log manipulation. A study with twelve users demonstrated the tool's effectiveness in aiding debugging tasks.

Authors:Cynthia M. Baseman, Reeda Shimaz Huda, Rosa I. Arriaga
Title: Designing with Medical Mistrust: Perspectives from Black Older Adults in Publicly Subsidized Housing
Abstract:
Despite increasing interest in culturally-sensitive health technologies, medical mistrust remains largely unexplored within human-centered computing. Considered a social determinant of health, medical mistrust is the belief that healthcare providers or institutions are acting against one's best interest. This is a rational, protective response based on historical context, structural inequities, and discrimination. To center race-based medical mistrust and the lived experiences of Black older adults with low income, we conducted interviews within publicly subsidized housing in the Southern United States. Our reflexive themes describe community perspectives on health care and medical mistrust, including accreditation and embodiment, skepticism of financial motivations, and the intentions behind health AI. We provide a reflective exercise for researchers to consider their positionality in relation to community engagements, and reframe our findings through Black Feminist Thought to propose design principles for health self-management technologies for communities with historically grounded medical mistrust.

Authors:Frederick Reiber, Nathan Kim, Allison McDonald, Dana Calacci
Title: Surveillance, Spacing, Screaming and Scabbing: How Digital Technology Facilitates Union Busting
Abstract:
Despite high approval ratings for unions and growing worker interest in organizing, employees in the United States still face significant barriers to securing collective bargaining agreements. A key factor is employer counter-organizing: efforts to suppress unionization through rule changes, retaliation, and disruption. Designing sociotechnical tools and strategies to resist these tactics requires a deeper understanding of the role computing technologies play in counter-organizing against unionization. In this paper, we examine three high-profile organizing efforts -- at Amazon, Starbucks, and \university -- using publicly available sources to identify four recurring technological tactics: surveillance, spacing, screaming and scabbing. We analyze how these tactics operate across contexts, highlighting their digital dimensions and strategic deployment. We conclude with implications for organizing in digitally-mediated workplaces, directions for future research, and emergent forms of worker resistance.

Authors:Semin Jin, Donghyuk Kim, Jeongmin Ryu, Kyung Hoon Hyun
Title: Behavior-Aware Anthropometric Scene Generation for Human-Usable 3D Layouts
Abstract:
Well-designed indoor scenes should prioritize how people can act within a space rather than merely what objects to place. However, existing 3D scene generation methods emphasize visual and semantic plausibility, while insufficiently addressing whether people can comfortably walk, sit, or manipulate objects. To bridge this gap, we present a Behavior-Aware Anthropometric Scene Generation framework. Our approach leverages vision-language models (VLMs) to analyze object-behavior relationships, translating spatial requirements into parametric layout constraints adapted to user-specific anthropometric data. We conducted comparative studies with state-of-the-art models using geometric metrics and a user perception study (N=16). We further conducted in-depth human-scale studies (individuals, N=20; groups, N=18). The results showed improvements in task completion time, trajectory efficiency, and human-object manipulation space. This study contributes a framework that bridges VLM-based interaction reasoning with anthropometric constraints, validated through both technical metrics and real-scale human usability studies.

Authors:Wengxi Li, Jingze Tian, Can Liu
Title: Orality: A Semantic Canvas for Externalizing and Clarifying Thoughts with Speech
Abstract:
People speak aloud to externalize thoughts as one way to help clarify and organize them. Although Speech-to-text can capture these thoughts, transcripts can be difficult to read and make sense due to disfluencies, repetitions and potential disorganization. To support thinking through verbalization, we introduce Orality, which extracts key information from spoken content, performs semantic analysis through LLMs to form a node-link diagram in an interactive canvas. Instead of reading and working with transcripts, users could manipulate clusters of nodes and give verbal instructions to re-extract and organize the content in other ways. It also provides AI-generated inspirational questions and detection of logical conflicts. We conducted a lab study with twelve participants comparing Orality against speech interaction with ChatGPT. We found that Orality can better support users in clarifying and developing their thoughts. The findings also identified the affordances of both graphical and conversational thought clarification tools and derived design implications.

Authors:Songhai Fan, Simon Angus, Tim Dwyer, Ying Yang, Sarah Goodwin, Helen Purchase
Title: A Directed Graph Model and Experimental Framework for Design and Study of Time-Dependent Text Visualisation
Abstract:
Exponential growth in the quantity of digital news, social media, and other textual sources makes it difficult for humans to keep up with rapidly evolving narratives about world events. Various visualisation techniques have been touted to help people to understand such discourse by exposing relationships between texts (such as news articles) as topics and themes evolve over time. Arguably, the understandability of such visualisations hinges on the assumption that people will be able to easily interpret the relationships in such visual network structures. To test this assumption, we begin by defining an abstract model of time-dependent text visualisation based on directed graph structures. From this model we distill motifs that capture the set of possible ways that texts can be linked across changes in time. We also develop a controlled synthetic text generation methodology that leverages the power of modern LLMs to create fictional, yet structured sets of time-dependent texts that fit each of our patterns. Therefore, we create a clean user study environment (n=30) for participants to identify patterns that best represent a given set of synthetic articles. We find that it is a challenging task for the user to identify and recover the predefined motif. We analyse qualitative data to map an unexpectedly rich variety of user rationales when divergences from expected interpretation occur. A deeper analysis also points to unexpected complexities inherent in the formation of synthetic datasets with LLMs that undermine the study control in some cases. Furthermore, analysis of individual decision-making in our study hints at a future where text discourse visualisation may need to dispense with a one-size-fits-all approach and, instead, should be more adaptable to the specific user who is exploring the visualisation in front of them.

Authors:Divyanshu Daiya, Aniket Bera
Title: Sketch2Colab: Sketch-Conditioned Multi-Human Animation via Controllable Flow Distillation
Abstract:
We present Sketch2Colab, which turns storyboard-style 2D sketches into coherent, object-aware 3D multi-human motion with fine-grained control over agents, joints, timing, and contacts. Diffusion-based motion generators offer strong realism but often rely on costly guidance for multi-entity control and degrade under strong conditioning. Sketch2Colab instead learns a sketch-conditioned diffusion prior and distills it into a rectified-flow student in latent space for fast, stable sampling. To make motion follow storyboards closely, we guide the student with differentiable objectives that enforce keyframes, paths, contacts, and physical consistency. Collaborative motion naturally involves discrete changes in interaction, such as converging, forming contact, cooperative transport, or disengaging, and a continuous flow alone struggles to sequence these shifts cleanly. We address this with a lightweight continuous-time Markov chain (CTMC) planner that tracks the active interaction regime and modulates the flow to produce clearer, synchronized coordination in human-object-human motion. Experiments on CORE4D and InterHuman show that Sketch2Colab outperforms baselines in constraint adherence and perceptual quality while sampling substantially faster than diffusion-only alternatives.

Authors:Anna Ricarda Luther, Hendrik Heuer, Stephanie Geise, Sebastian Haunss, Andreas Breiter
Title: Take the Power Back: Screen-Based Personal Moderation Against Hate Speech on Instagram
Abstract:
Hate speech remains a pressing challenge on social media, where platform moderation often fails to protect targeted users. Personal moderation tools that let users decide how content is filtered can address some of these shortcomings. However, it remains an open question on which screens (e.g., the comments, the reels tab, or the home feed) users want personal moderation and which features they value most. To address these gaps, we conducted a three-wave Delphi study with 40 activists who experienced hate speech. We combined quantitative ratings and rankings with open questions about required features. Participants prioritized personal moderation for conversational and algorithmically curated screens. They valued features allowing for reversibility and oversight across screens, while input-based, content-type specific, and highly automated features are more screen specific. We discuss the importance of personal moderation and offer user-centered design recommendations for personal moderation on Instagram.

Authors:Hima Mynampaty, Nathania Josephine, Katherine E. Isaacs, Andrew M. McNutt
Title: Linting Style and Substance in READMEs
Abstract:
READMEs shape first impressions of software projects, yet what constitutes a good README varies across audiences and contexts. Research software needs reproducibility details, while open-source libraries might prioritize quick-start guides. Through a design probe, LintMe, we explore how linting can be used to improve READMEs given these diverse contexts, aiding style and content issues while preserving authorial agency. Users create context-specific checks using a lightweight DSL that uses a novel combination of programmatic operations (e.g., for broken links) with LLM-based content evaluation (e.g., for detecting jargon), yielding checks that would be challenging for prior linters. Through a user study (N=11), comparison with naive LLM usage, and an extensibility case study, we find that our design is approachable, flexible, and well matched with the needs of this domain. This work opens the door for linting more complex documentation and other culturally mediated text-based documents.

Authors:Joy T Wu, Daniel Beckmann, Sarah Miller, Alexander Lee, Elizabeth Theng, Stephan Altmayer, Ken Chang, David Kersting, Tomoaki Otani, Brittany Z Dashevsky, Hye Lim Park, Matteo Novello, Kip Guja, Curtis Langlotz, Ismini Lourentzou, Daniel Gruhl, Benjamin Risse, Guido A Davidzon
Title: GazeXPErT: An Expert Eye-tracking Dataset for Interpretable and Explainable AI in Oncologic FDG-PET/CT Scans
Abstract:
[18F]FDG-PET/CT is a cornerstone imaging modality for tumor staging and treatment response assessment across many cancer types, yet expert reader shortages necessitate more efficient diagnostic aids. While standalone AI models for automatic lesion segmentation exist, clinical translation remains hindered by concerns about interpretability, explainability, reliability, and workflow integration. We present GazeXPErT, a 4D eye-tracking dataset capturing expert search patterns during tumor detection and measurement on 346 FDG-PET/CT scans. Each study was read by a trainee and a board-certified nuclear medicine or radiology specialist using an eye-tracking-enabled annotation platform that simulates routine clinical reads. From 3,948 minutes of raw 60Hz eye-tracking data, 9,030 unique gaze-to-lesion trajectories were extracted, synchronized with PET/CT image slices, and rendered in COCO-style format for multiple machine learning applications. Baseline validation experiments demonstrate that a 3D nnUNet tumor segmentation model achieved superior performance when incorporating expert gaze patterns versus without (DICE score 0.6819 versus 0.6008), and that vision transformers trained on sequential gaze and PET/CT images can improve dynamic lesion localization (74.95% predicted gaze point closer to tumor) and expert intention prediction (Accuracy 67.53% and AUROC 0.747). GazeXPErT is a valuable resource designed to explore multiple machine learning problems beyond these baseline experiments, which include and are not limited to, visual grounding or causal reasoning, clinically explainable feature augmentation, human-computer interaction, human intention prediction or understanding, and expert gaze-rewarded modeling approaches to AI in oncologic FDG-PET/CT imaging.

Authors:Shauna Heron, Meng Cheng Lau
Title: Trust in Autonomous Human--Robot Collaboration: Effects of Responsive Interaction Policies
Abstract:
Trust plays a central role in human--robot collaboration, yet its formation is rarely examined under the constraints of fully autonomous interaction. This pilot study investigated how interaction policy influences trust during in-person collaboration with a social robot operating without Wizard-of-Oz control or scripted repair. Participants completed a multi-stage collaborative task with a mobile robot that autonomously managed spoken-language dialogue, affect inference, and task progression. Two interaction policies were compared: a responsive policy, in which the robot proactively adapted its dialogue and assistance based on inferred interaction state, and a neutral, reactive policy, in which the robot provided only direct, task-relevant responses when prompted. Responsive interaction was associated with significantly higher post-interaction trust under viable communication conditions, despite no reliable differences in overall task accuracy. Sensitivity analyses indicated that affective and experiential components of trust were more sensitive to communication breakdown than evaluative judgments of reliability, and that as language-mediated interaction degraded, the trust advantage associated with responsiveness attenuated and ratings became less clearly interpretable as calibrated evaluations of collaborative competence. These findings suggest that trust in autonomous human--robot interaction emerges from process-level interaction dynamics and operates within constraints imposed by communication viability, highlighting the importance of evaluating trust under real autonomy conditions when designing interactive robotic systems.

Authors:Eman Alamoudi, Ellis Solaiman
Title: Designing Explainable AI for Healthcare Reviews: Guidance on Adoption and Trust
Abstract:
Patients increasingly rely on online reviews when choosing healthcare providers, yet the sheer volume of these reviews can hinder effective decision-making. This paper summarises a mixed-methods study aimed at evaluating a proposed explainable AI system that analyses patient reviews and provides transparent explanations for its outputs. The survey (N=60) indicated broad optimism regarding usefulness (82% agreed it saves time; 78% that it highlights essentials), alongside strong demand for explainability (84% considered it important to understand why a review is classified; 82% said explanations would increase trust). Around 45% preferred combined text-and-visual explanations. Thematic analysis of open-ended survey responses revealed core requirements such as accuracy, clarity and simplicity, responsiveness, data credibility, and unbiased processing. In addition, interviews with AI experts provided deeper qualitative insights, highlighting technical considerations and potential challenges for different explanation methods. Drawing on TAM and trust in automation, the findings suggest that high perceived usefulness and transparent explanations promote adoption, whereas complexity and inaccuracy hinder it. This paper contributes actionable design guidance for layered, audience-aware explanations in healthcare review systems.

Authors:Kristian Paolo David, Tyrone Justin Sta Maria, Mikkel Dominic Gamboa, Jordan Aiko Deja
Title: Feelings, Not Feel: Affective Audio-Visual Pseudo-Haptics in Hand-Tracked XR
Abstract:
Hand-tracking enables controller-free XR interaction but does not have the tactile feedback controllers provide. Rather than treating this solely as a missing-sensation problem, we explore whether pseudo-haptic cues on an embodied virtual hand act as tactile or as affect substitutes that shape how interactions feel. We used a mixed reality prototype that keeps the contacted surface visually neutral, rendering cues on the hand with motion modulation for texture, color glow, and movement-coupled sound. In a within-subjects study (n=12), participants experienced 12 conditions (4 effects x 3 modalities: audio, visual, both) and reported subjective affect and cognitive demand. Participants rarely reported sustained tactile, thermal sensations, yet affect shifted systematically: rough-hot lowered valence increasing arousal, while smooth-cold produced calmer pleasant states. These findings suggest that pseudo-haptics in XR may be better understood as an affective feedback channel rather than a direct replacement for physical touch in controller-free systems.

Authors:Gordon Fletcher, Saomai Vu Khan
Title: Serendipity with Generative AI: Repurposing knowledge components during polycrisis with a Viable Systems Model approach
Abstract:
Organisations face polycrisis uncertainty yet overlook embedded knowledge. We show how generative AI can operate as a serendipity engine and knowledge transducer to discover, classify and mobilise reusable components (models, frameworks, patterns) from existing documents. Using 206 papers, our pipeline extracted 711 components (approx 3.4 per paper) and organised them into a repository aligned to Beer's Viable System Model (VSM). We contribute i) conceptually, a theory of planned serendipity in which GenAI lowers transduction costs between VSM subsystems, ii) empirically, a component repository and temporal/subject patterns, iii) managerially, a vignette and process blueprint for organisational adoption and iv) socially, pathways linking repurposing to environmental and social benefits. We propose testable links between repository creation, discovery-to-deployment time, and reuse rates, and discuss implications for shifting innovation portfolios from breakthrough bias toward systematic repurposing.

Authors:Yonglin Chen, Pengcheng An, Xueliang Li
Title: FuturePrism: Supporting Adolescence in Collaborative Storytelling to Cope with Future Uncertainty
Abstract:
FuturePrism is a GenAI-empowered collaborative storytelling system designed to scaffold adolescents to navigate future life challenges. Adolescents often suffer from anxiety related to future uncertainty for lacking the executive function to develop concrete pathways. Operationalizing Snyder's Hope Theory, the system utilizes a triadic role-play mechanics to externalize cognitive processes through four narrative chapters: The Goal, The Opportunity, The Challenge, and The Agency. An evaluation workshop with 20 adolescents demonstrated that FuturePrism significantly enhances momentary hope levels, particularly in the Agency dimension. Participants reported high levels of narrative immersion and positive feedback towards system usability. Participants also confirmed that the AI-scaffolded collaborative storytelling empowered them to develop positive attitudes towards future challenges.

Authors:Yonglin Chen, Jingjing Zhang, Kezhuo Wang, Pengcheng An, Xueliang Li
Title: TaleBot: A Tangible AI Companion to Support Children in Co-creative Storytelling for Resilience Cultivation
Abstract:
Resilience is a key factor affecting children's mental wellbeing and future development. Yet, limited HCI research has explored how to help children build resilience through adversarial experiences. Informed by a formative study with elementary school teachers and professional psychologists, we design TaleBot, an AI-empowered system that supports children to co-create stories about overcoming everyday adversities tailored to their personal situations. We evaluated the system with 12 elementary children in school counseling rooms under teacher guidance and conducted reflective interviews with parents upon the Child-AI co-created stories. The findings show that TaleBot encourages children in self-expression of feelings and thoughts, creating opportunities for teachers to provide personalized support and for parents to better understand the profound impact of family communication on children's mental wellbeing. We conclude with design implications for using generative AI to support children's mental health education and interventions across school and family contexts.

Authors:Besjon Cifliku, Hendrik Heuer
Title: They Think AI Can Do More Than It Actually Can: Practices, Challenges, & Opportunities of AI-Supported Reporting In Local Journalism
Abstract:
Declining newspaper revenues prompt local newsrooms to adopt automation to maintain efficiency and keep the community informed. However, current research provides a limited understanding of how local journalists work with digital data and which newsroom processes would benefit most from AI-supported (data) reporting. To bridge this gap, we conducted 21 semi-structured interviews with local journalists in Germany. Our study investigates how local journalists use data and AI (RQ1); the challenges they encounter when interacting with data and AI (RQ2); and the self-perceived opportunities of AI-supported reporting systems through the lens of discursive design (RQ3). Our findings reveal that local journalists do not fully leverage AI's potential to support data-related work. Despite local journalists' limited awareness of AI's capabilities, they are willing to use it to process data and discover stories. Finally, we provide recommendations for improving AI-supported reporting in the context of local news, grounded in the journalists' socio-technical perspective and their imagined AI future capabilities.

Authors:Md Ehtesham-Ul-Haque, Syed Masum Billah
Title: VoiceAlign: A Shimming Layer for Enhancing the Usability of Legacy Voice User Interface Systems
Abstract:
Voice user interfaces (VUIs) are rapidly transitioning from accessibility features to mainstream interaction modalities. Yet most operating systems' built-in voice commands remain underutilized despite possessing robust technical capabilities. Through our analysis of four commercial VUI systems and a formative study with 16 participants, we found that fixed command formats require exact phrasing, restrictive timeout mechanisms discard input during planning pauses, and insufficient feedback hampers multi-step interactions. To address these challenges, we developed VoiceAlign, an adaptive shimming layer that mediates between users and legacy VUI systems. VoiceAlign intercepts natural voice commands, transforms them to match the required syntax using a large language model, and transmits these adapted commands through a virtual audio channel that remains transparent to the underlying system. In our evaluation with 12 participants, VoiceAlign reduced command failures by half, required 25% fewer commands per task, and significantly lowered cognitive and temporal demands when paired with an existing legacy VUI system. Furthermore, we created a synthetic dataset informed by our studies and fine-tuned a small language model that achieves over 90% accuracy with 200 ms response time when served locally, eliminating dependence on third-party APIs while enabling real-time interaction on edge devices. This work demonstrates how modern AI techniques can unlock the underutilized potential of legacy VUI systems without requiring system modifications, offering a practical solution without replacing existing infrastructure.

Authors:Shuo Niu, Dylan Clements, Marina Margalit Nemanov, Hyungsin Kim
Title: StoryComposerAI: Supporting Human-AI Story Co-Creation Through Decomposition and Linking
Abstract:
GenAI's ability to produce text and images is increasingly incorporated into human-AI co-creation tasks such as storytelling and video editing. However, integrating GenAI into these tasks requires enabling users to retain control over editing individual story elements while ensuring that generated visuals remain coherent with the storyline and consistent across multiple AI-generated outputs. This work examines a paradigm of creative decomposition and linking, which allows creators to clearly communicate creative intent by prompting GenAI to tailor specific story elements, such as storylines, personas, locations, and scenes, while maintaining coherence among them. We implement and evaluate StoryComposerAI, a system that exemplifies this paradigm for enhancing users' sense of control and content consistency in human-AI co-creation of digital stories.

Authors:Christian Poelitz, Finale Doshi-Velez, Siân Lindley
Title: A Benchmark to Assess Common Ground in Human-AI Collaboration
Abstract:
AI is becoming increasingly integrated into everyday life, both in professional work environments and in leisure and entertainment contexts. This integration requires AI to move beyond acting as an assistant for informational or transactional tasks toward a genuine collaborative partner. Effective collaboration, whether between humans or between humans and AI, depends on establishing and maintaining common ground: shared beliefs, assumptions, goals, and situational awareness that enable coordinated action and efficient repair of misunderstandings. While common ground is a central concept in human collaboration, it has received limited attention in studies of human-AI collaboration. In this paper, we introduce a new benchmark grounded in theories and empirical studies of human-human collaboration. The benchmark is based on a collaborative puzzle task that requires iterative interaction, joint action, referential coordination, and repair under varying conditions of situation awareness. We validate the benchmark through a confirmatory user study in which human participants collaborate with an AI to solve the task. The results show that the benchmark reproduces established theoretical and empirical findings from human-human collaboration, while also revealing clear divergences in human-AI interaction.

Authors:Anna Martin-Boyle, William Humphreys, Martha Brown, Cara Leckey, Harmanpreet Kaur
Title: An Expert Schema for Evaluating Large Language Model Errors in Scholarly Question-Answering Systems
Abstract:
Large Language Models (LLMs) are transforming scholarly tasks like search and summarization, but their reliability remains uncertain. Current evaluation metrics for testing LLM reliability are primarily automated approaches that prioritize efficiency and scalability, but lack contextual nuance and fail to reflect how scientific domain experts assess LLM outputs in practice. We developed and validated a schema for evaluating LLM errors in scholarly question-answering systems that reflects the assessment strategies of practicing scientists. In collaboration with domain experts, we identified 20 error patterns across seven categories through thematic analysis of 68 question-answer pairs. We validated this schema through contextual inquiries with 10 additional scientists, which showed not only which errors experts naturally identify but also how structured evaluation schemas can help them detect previously overlooked issues. Domain experts use systematic assessment strategies, including technical precision testing, value-based evaluation, and meta-evaluation of their own practices. We discuss implications for supporting expert evaluation of LLM outputs, including opportunities for personalized, schema-driven tools that adapt to individual evaluation patterns and expertise levels.

Authors:Anna Martin-Boyle, Cara A. C. Leckey, Martha C. Brown, Harmanpreet Kaur
Title: PaperTrail: A Claim-Evidence Interface for Grounding Provenance in LLM-based Scholarly Q&A
Abstract:
Large language models (LLMs) are increasingly used in scholarly question-answering (QA) systems to help researchers synthesize vast amounts of literature. However, these systems often produce subtle errors (e.g., unsupported claims, errors of omission), and current provenance mechanisms like source citations are not granular enough for the rigorous verification that scholarly domain requires. To address this, we introduce PaperTrail, a novel interface that decomposes both LLM answers and source documents into discrete claims and evidence, mapping them to reveal supported assertions, unsupported claims, and information omitted from the source texts. We evaluated PaperTrail in a within-subjects study with 26 researchers who performed two scholarly editing tasks using PaperTrail and a baseline interface. Our results show that PaperTrail significantly lowered participants' trust compared to the baseline. However, this increased caution did not translate to behavioral changes, as people continued to rely on LLM-generated scholarly edits to avoid a cognitively burdensome task. We discuss the value of claim-evidence matching for understanding LLM trustworthiness in scholarly settings, and present design implications for cognition-friendly communication of provenance information.

Authors:EunJeong Cheon, Do Yeon Shin
Title: Is Robot Labor Labor? Delivery Robots and the Politics of Work in Public Space
Abstract:
As sidewalk delivery robots become increasingly integrated into urban life, this paper begins with a critical provocation: Is robot labor labor? More than a rhetorical question, this inquiry invites closer attention to the social and political arrangements that robot labor entails. Drawing on ethnographic fieldwork across two smart-city districts in Seoul, we examine how delivery robot labor is collectively sustained. While robotic actions are often framed as autonomous and efficient, we show that each successful delivery is in fact a distributed sociotechnical achievement--reliant on human labor, regulatory coordination, and social accommodations. We argue that delivery robots do not replace labor but reconfigure it--rendering some forms more visible (robotic performance) while obscuring others (human and institutional support). Unlike industrial robots, delivery robots operate in shared public space, engage everyday passersby, and are embedded in policy and progress narratives. In these spaces, we identify "robot privilege"--humans routinely yielding to robots--and distinct perceptions between casual observers ("cute") and everyday coexisters ("admirable"). We contribute a conceptual reframing of robot labor as a collective assemblage, empirical insights into South Korea's smart-city automation, and a call for HRI to engage more deeply with labor and spatial politics to better theorize public-facing robots.

Authors:Joel Bucher, Lahari Goswami, Sverrir Thorgeirsson, April Yi Wang
Title: Git Takes Two: Split-View Awareness for Collaborative Learning of Distributed Workflows in Git
Abstract:
Git is widely used for collaborative software development, but it can be challenging for newcomers. While most learning tools focus on individual workflows, Git is inherently collaborative. We present GitAcademy, a browser-based learning platform that embeds a full Git environment with a split-view collaborative mode: learners work on their own local repositories connected to a shared remote repository, while simultaneously seeing their partner's actions mirrored in real time. This design is not intended for everyday software development, but rather as a training simulator to build awareness of distributed states, coordination, and collaborative troubleshooting. In a within-subjects study with 13 pairs of learners, we found that the split-view interface enhanced social presence, supported peer teaching, and was consistently preferred over a single-view baseline, even though performance gains were mixed. We further discuss how split-view awareness can serve as a training-only scaffold for collaborative learning of Git and other distributed technical systems.

Authors:Kynnedy Simone Smith, Lydia B. Chilton, Danielle Bragg
Title: Identifying, Explaining, and Correcting Ableist Language with AI
Abstract:
Ableist language perpetuates harmful stereotypes and exclusion, yet its nuanced nature makes it difficult to recognize and address. Artificial intelligence could serve as a powerful ally in the fight against ableist language, offering tools that detect and suggest alternatives to biased terms. This two-part study investigates the potential of large language models (LLMs), specifically ChatGPT, to rectify ableist language and educate users about inclusive communication. We compared GPT-4o generations with crowdsourced annotations from trained disability community members, then invited disabled participants to evaluate both. Participants reported equal agreement with human and AI annotations but significantly preferred the AI, citing its narrative consistency and accessible style. At the same time, they valued the emotional depth and cultural grounding of human annotations. These findings highlight the promise and limits of LLMs in handling culturally sensitive content. Our contributions include a dataset of nuanced ableism annotations and design considerations for inclusive writing tools.

Authors:Duy Anh Ta, Farnaz Farid, Farhad Ahamed, Ala Al-Areqi, Robert Beutel, Tamara Watson, Alana Maurushat
Title: BioEnvSense: A Human-Centred Security Framework for Preventing Behaviour-Driven Cyber Incidents
Abstract:
Modern organizations increasingly face cybersecurity incidents driven by human behaviour rather than technical failures. To address this, we propose a conceptual security framework that integrates a hybrid Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) model to analyze biometric and environmental data for context-aware security decisions. The CNN extracts spatial patterns from sensor data, while the LSTM captures temporal dynamics associated with human error susceptibility. The model achieves 84% accuracy, demonstrating its ability to reliably detect conditions that lead to elevated human-centred cyber risk. By enabling continuous monitoring and adaptive safeguards, the framework supports proactive interventions that reduce the likelihood of human-driven cyber incidents

Authors:Hazim AbdElazim, Shadman Islam, Mostafa Milani
Title: The Human Factor in Data Cleaning: Exploring Preferences and Biases
Abstract:
Data cleaning is often framed as a technical preprocessing step, yet in practice it relies heavily on human judgment. We report results from a controlled survey study in which participants performed error detection, data repair and imputation, and entity matching tasks on census-inspired scenarios with known semantic validity. We find systematic evidence for several cognitive bias mechanisms in data cleaning. Framing effects arise when surface-level formatting differences (e.g., capitalization or numeric presentation) increase false-positive error flags despite unchanged semantics. Anchoring and adjustment bias appears when expert cues shift participant decisions beyond parity, consistent with salience and availability effects. We also observe the representativeness heuristic: atypical but valid attribute combinations are frequently flagged as erroneous, and in entity matching tasks, surface similarity produces a substantial false-positive rate with high confidence. In data repair, participants show a robust preference for leaving values missing rather than imputing plausible values, consistent with omission bias. In contrast, automation-aligned switching under strong contradiction does not exceed a conservative rare-error tolerance threshold at the population level, indicating that deference to automated recommendations is limited in this setting. Across scenarios, bias patterns persist among technically experienced participants and across diverse workflow practices, suggesting that bias in data cleaning reflects general cognitive tendencies rather than lack of expertise. These findings motivate human-in-the-loop cleaning systems that clearly separate representation from semantics, present expert or algorithmic recommendations non-prescriptively, and support reflective evaluation of atypical but valid cases.

Authors:Boyuan Gu, Shuaiqi Cheng, Minghao yu
Title: The Neural-Wave Quick Escape Manual 2036: A Field Guide to Adversarial Living in the Era of "Empathic" AIoT
Abstract:
As the aging population faces a chronic care deficit, domestic care is increasingly recast as spectral governance. This paper presents a design fiction set in 2036, where the home is governed by Neural-Wave, a camera-free mmWave sensing platform that infers well-being from involuntary micro-motions. Through a set of scenarios, we illustrate how such empathic systems displace autonomy, forcing residents to perform legibility to regain basic freedoms. Our primary contribution is a diegetic artifact: The Neural-Wave Quick Escape Manual. Styled as an illicit guide for the elderly, it details adversarial tactics: structured around protocols to Comply, Degrade, and Refuse, that exploit signal processing vulnerabilities to reclaim domestic privacy. Through this artifact, we argue that in the era of empathic AIoT, privacy requires more than policy opt-outs; it demands adversarial literacy:the capacity to meaningfully obfuscate one's own data traces against an infrastructural jailer that calls itself care.

Authors:George X. Wang, Jiaqian Hu, Jing Qian
Title: Who Has the Final Word? Designing Multi-Agent Collaborative Framework for Professional Translators
Abstract:
Recent advances in LLM based translation have led to renewed interest in fully automated systems, yet professional translators remain essential in high stakes domains where decisions about accuracy, terminology, style, and audience cannot be safely automated. Current tools are typically single shot generators or single-agent self-refiners, offering limited support for translator multidimensional decision making process and providing little structured leverage for translator input. We present CHORUS, a human-AI multiagent collaborative translation framework grounded in the Multidimensional Quality Metrics (MQM) framework, which decomposes quality dimensions into specialized agents and integrates their feedback into an iterative refinement loop controlled by the translator. A six-user preliminary study with professional translators found that CHORUS consistently outperforms zero-shot and single-agent baselines, showing that MQM-aligned multi-agent collaboration better supports professional translation workflows than autonomous generation.

Authors:Black Sun, Haiyang Xu, Ge Kacy Fu, Liyue Da, Eve Hoggan
Title: MagHeart: Exploring Playful Avatar Co-Creation and Shared Heartbeats for Icebreaking in Hybrid Meetings
Abstract:
Hybrid meetings often begin with social awkwardness and asymmetric participation, particularly for remote attendees who lack access to informal, co-present interaction. We present MagHeart, a multimodal system that explores symmetric icebreaking in hybrid meetings through playful LEGO-based avatar co-creation and a tangible magnetic device that represents a remote participant's heartbeat as an ambient presence cue. By combining creative co-creation with abstract bio-feedback, MagHeart rethinks how remote participants can become materially and perceptually present during meeting openings. We report findings from a scenario-based exploratory study combining quantitative and qualitative data, examining participants' anticipated engagement, perceived social presence, and future-use intentions from both co-located and remote perspectives. Our results highlight opportunities for playful, embodied icebreakers to support early hybrid interaction, while also surfacing tensions around privacy, distraction, and contextual appropriateness. This work contributes design insights and open questions for future hybrid meeting tools that balance playfulness, embodiment, and social sensitivity.

Authors:danah boyd, Jayshree Sarathy
Title: Differential Perspectives: Epistemic Disconnects Surrounding the US Census Bureau's Use of Differential Privacy
Abstract:
When the U.S. Census Bureau announced its intention to modernize its disclosure avoidance procedures for the 2020 Census, it sparked a controversy that is still underway. The move to differential privacy introduced technical and procedural uncertainties, leaving stakeholders unable to evaluate the quality of the data. More importantly, this transformation exposed the statistical illusions and limitations of census data, weakening stakeholders' trust in the data and in the Census Bureau itself. This essay examines the epistemic currents of this controversy. Drawing on theories from Science and Technology Studies (STS) and ethnographic fieldwork, we analyze the current controversy over differential privacy as a battle over uncertainty, trust, and legitimacy of the Census. We argue that rebuilding trust will require more than technical repairs or improved communication; it will require reconstructing what we identify as a 'statistical imaginary.'

Authors:Eman Alashwali, Abeer Alhuzali
Title: One Year After the PDPL: a Glimpse into the E-Commerce World in Saudi Arabia
Abstract:
In 2024, Saudi Arabia's Personal Data Protection Law (PDPL) came into force. However, little work has been done to assess its implementation. In this paper, we analyzed 100 e-commerce websites in Saudi Arabia against the PDPL, examining the presence of a privacy policy and, if present, the policy's declarations of four items pertaining to personal data rights and practices: a) personal data retention period, b) the right to request the destruction of personal data, c) the right to request a copy of personal data, and d) a mechanism for filing complaints. Our results show that, despite national awareness and support efforts, a significant fraction of e-commerce websites in our dataset are not fully compliant: only 31% of websites in our dataset declared all four examined items in their privacy policies. Even when privacy policies included such declarations, a considerable fraction of them failed to cover required fine-grained details. Second, the majority of top-ranked e-commerce websites (based on search results order) and those hosted on local e-commerce hosting platforms exhibited considerably higher non-compliance rates than mid- to low-ranked websites and those not hosted on local e-commerce platforms. Third, we assessed the use of Large Language Models (LLMs) as an automated tool for privacy policy analysis to measure compliance with the PDPL. We highlight the potential of LLMs and suggest considerations to improve LLM-based automated analysis for privacy policies. Our results provide a step forward in understanding the implementation barriers to data protection laws, especially in non-Western contexts. We provide recommendations for policymakers, regulators, website owners, and developers seeking to improve data protection practices and automate compliance monitoring.

Authors:Yuvarani Ganesan, Salsabila Harlen, Azfar Rahman Bin Fazul Rahman, Akashdeep Singh, Zahra Fathanah, Raja Jamilah Raja Yusof
Title: LunaAI: A Polite and Fair Healthcare Guidance Chatbot
Abstract:
Conversational AI has significant potential in the healthcare sector, but many existing systems fall short in emotional intelligence, fairness, and politeness, which are essential for building patient trust. This gap reduces the effectiveness of digital health solutions and can increase user anxiety. This study addresses the challenge of integrating ethical communication principles by designing and evaluating LunaAI, a healthcare chatbot prototype. Using a user-centered design approach informed by a structured literature review, we developed conversational scenarios that handle both routine and hostile user interactions. The system was implemented using the Google Gemini API and deployed as a mobile-first Progressive Web App built with React, Vite, and Firebase. Preliminary user testing was conducted with a small participant group, and responses were evaluated using established frameworks such as the Godspeed Questionnaire. In addition, a comparative analysis was performed between LunaAI's tailored responses and the baseline outputs of an uncustomized large language model. The results indicate measurable improvements in key interaction qualities, with average user ratings of 4.7 out of 5 for politeness and 4.9 out of 5 for fairness. These findings highlight the importance of intentional ethical conversational design for human-computer interaction, particularly in sensitive healthcare contexts.

Authors:Alexandra Neagu, Marcus Messer, Peter Johnson, Rhodri Nelson
Title: "How Do I ...?": Procedural Questions Predominate Student-LLM Chatbot Conversations
Abstract:
Providing scaffolding through educational chatbots built on Large Language Models (LLM) has potential risks and benefits that remain an open area of research. When students navigate impasses, they ask for help by formulating impasse-driven questions. Within interactions with LLM chatbots, such questions shape the user prompts and drive the pedagogical effectiveness of the chatbot's response. This paper focuses on such student questions from two datasets of distinct learning contexts: formative self-study, and summative assessed coursework. We analysed 6,113 messages from both learning contexts, using 11 different LLMs and three human raters to classify student questions using four existing schemas. On the feasibility of using LLMs as raters, results showed moderate-to-good inter-rater reliability, with higher consistency than human raters. The data showed that 'procedural' questions predominated in both learning contexts, but more so when students prepare for summative assessment. These results provide a basis on which to use LLMs for classification of student questions. However, we identify clear limitations in both the ability to classify with schemas and the value of doing so: schemas are limited and thus struggle to accommodate the semantic richness of composite prompts, offering only partial understanding the wider risks and benefits of chatbot integration. In the future, we recommend an analysis approach that captures the nuanced, multi-turn nature of conversation, for example, by applying methods from conversation analysis in discursive psychology.

Authors:Aya Abdelnaem El-Basha, Ebtsam ELSayed Mahmoud ELSayes, Ahmad Al-Kabbany
Title: The Effectiveness of a Virtual Reality-Based Training Program for Improving Body Awareness in Children with Attention Deficit and Hyperactivity Disorder
Abstract:
This study investigates the effectiveness of a Virtual Reality (VR)-based training program in improving body awareness among children with Attention Deficit Hyperactivity Disorder (ADHD). Utilizing a quasi-experimental design, the research sample consisted of 10 children aged 4 to 7 years, with IQ scores ranging from 90 to 110. Participants were divided into an experimental group and a control group, with the experimental group receiving a structured VR intervention over three months, totaling 36 sessions. Assessment tools included the Stanford-Binet Intelligence Scale (5th Edition), the Conners Test for ADHD, and a researcher-prepared Body Awareness Scale. The results indicated statistically significant differences between pre-test and post-test scores for the experimental group, demonstrating the program's efficacy in enhancing spatial awareness, body part identification, and motor expressions. Furthermore, follow-up assessments conducted one month after the intervention revealed no significant differences from the post-test results, confirming the sustainability and continuity of the program's effects over time. The findings suggest that immersive VR environments provide a safe, engaging, and effective therapeutic medium for addressing psychomotor deficits in early childhood ADHD.

Authors:Dimitri Staufer, Kirsten Morehouse
Title: What Do LLMs Associate with Your Name? A Human-Centered Black-Box Audit of Personal Data
Abstract:
Large language models (LLMs), and conversational agents based on them, are exposed to personal data (PD) during pre-training and during user interactions. Prior work shows that PD can resurface, yet users lack insight into how strongly models associate specific information to their identity. We audit PD across eight LLMs (3 open-source; 5 API-based, including GPT-4o), introduce LMP2 (Language Model Privacy Probe), a human-centered, privacy-preserving audit tool refined through two formative studies (N=20), and run two studies with EU residents to capture (i) intuitions about LLM-generated PD (N1=155) and (ii) reactions to tool output (N2=303). We show empirically that models confidently generate multiple PD categories for well-known individuals. For everyday users, GPT-4o generates 11 features with 60% or more accuracy (e.g., gender, hair color, languages). Finally, 72% of participants sought control over model-generated associations with their name, raising questions about what counts as PD and whether data privacy rights should extend to LLMs.

Authors:Uğur Genç, Heng Gu, Chadha Degachi, Evangelos Niforatos, Senthil Chandrasegaran, Himanshu Verma
Title: The Bots of Persuasion: Examining How Conversational Agents' Linguistic Expressions of Personality Affect User Perceptions and Decisions
Abstract:
Large Language Model-powered conversational agents (CAs) are increasingly capable of projecting sophisticated personalities through language, but how these projections affect users is unclear. We thus examine how CA personalities expressed linguistically affect user decisions and perceptions in the context of charitable giving. In a crowdsourced study, 360 participants interacted with one of eight CAs, each projecting a personality composed of three linguistic aspects: attitude (optimistic/pessimistic), authority (authoritative/submissive), and reasoning (emotional/rational). While the CA's composite personality did not affect participants' decisions, it did affect their perceptions and emotional responses. Particularly, participants interacting with pessimistic CAs felt lower emotional state and lower affinity towards the cause, perceived the CA as less trustworthy and less competent, and yet tended to donate more toward the charity. Perceptions of trust, competence, and situational empathy significantly predicted donation decisions. Our findings emphasize the risks CAs pose as instruments of manipulation, subtly influencing user perceptions and decisions.

Authors:Farnaz Zamiri Zeraati, Yang Trista Cao, Yuehan Qiao, Hal Daumé, Hernisa Kacorri
Title: Say It My Way: Exploring Control in Conversational Visual Question Answering with Blind Users
Abstract:
Prompting and steering techniques are well established in general-purpose generative AI, yet assistive visual question answering (VQA) tools for blind users still follow rigid interaction patterns with limited opportunities for customization. User control can be helpful when system responses are misaligned with their goals and contexts, a gap that becomes especially consequential for blind users that may rely on these systems for access. We invite 11 blind users to customize their interactions with a real-world conversational VQA system. Drawing on 418 interactions, reflections, and post-study interviews, we analyze prompting-based techniques participants adopted, including those introduced in the study and those developed independently in real-world settings. VQA interactions were often lengthy: participants averaged 3 turns, sometimes up to 21, with input text typically tenfold shorter than the responses they heard. Built on state-of-the-art LLMs, the system lacked verbosity controls, was limited in estimating distance in space and time, relied on inaccessible image framing, and offered little to no camera guidance. We discuss how customization techniques such as prompt engineering can help participants work around these limitations. Alongside a new publicly available dataset, we offer insights for interaction design at both query and system levels.

Authors:EunJeong Cheon, Do Yeon Shin
Title: "Hello, I'm Delivering. Let Me Pass By": Navigating Public Pathways with Walk-along with Robots in Crowded City Streets
Abstract:
As the presence of autonomous robots in public spaces increases-whether navigating campus walkways or neighborhood sidewalks-understanding how to carefully study these robots becomes critical. While HRI research has conducted field studies in public spaces, these are often limited to controlled experiments with prototype robots or structured observational methods, such as the Wizard of Oz technique. However, the autonomous mobile robots we encounter today, particularly delivery robots, operate beyond the control of researchers, navigating dynamic routes and unpredictable environments. To address this challenge, a more deliberate approach is required. Drawing inspiration from public realm ethnography in urban studies, geography, and sociology, this paper proposes the Walk-Along with Robots (WawR) methodology. We outline the key features of this method, the steps we applied in our study, the unique insights it offers, and the ways it can be evaluated. We hope this paper stimulates further discussion on research methodologies for studying autonomous robots in public spaces.

Authors:Michael T. Knierim, Thimo Schulz, Moritz Schiller, Jwan Shaban, Mario Nadj, Max L. Wilson, Alexander Maedche
Title: Flow on Social Media? Rarer Than You'd Think
Abstract:
Researchers often attribute social media's appeal to its ability to elicit flow experiences of deep absorption and effortless engagement. Yet prolonged use has also been linked to distraction, fatigue, and lower mood. This paradox remains poorly understood, in part because prior studies rely on habitual or one-shot reports that ask participants to directly attribute flow to social media. To address this gap, we conducted a five-day field study with 40 participants, combining objective smartphone app tracking with daily reconstructions of flow-inducing activities. Across 673 reported flow occurrences, participants rarely associated flow with social media (2 percent). Instead, heavier social media use predicted fewer daily flow occurrences. We further examine this relationship through the effects of social media use on fatigue, mood, and motivation. Altogether, our findings suggest that flow and social media may not align as closely as assumed - and might even compete - underscoring the need for further research.

Authors:Megan Lee, Seung Ha Hwang, Inhyeok Choi, Shreyas Darade, Mengchun Zhang, Kateryna Shapovalenko
Title: ASPEN: Spectral-Temporal Fusion for Cross-Subject Brain Decoding
Abstract:
Cross-subject generalization in EEG-based brain-computer interfaces (BCIs) remains challenging due to individual variability in neural signals. We investigate whether spectral representations offer more stable features for cross-subject transfer than temporal waveforms. Through correlation analyses across three EEG paradigms (SSVEP, P300, and Motor Imagery), we find that spectral features exhibit consistently higher cross-subject similarity than temporal signals. Motivated by this observation, we introduce ASPEN, a hybrid architecture that combines spectral and temporal feature streams via multiplicative fusion, requiring cross-modal agreement for features to propagate. Experiments across six benchmark datasets reveal that ASPEN is able to dynamically achieve the optimal spectral-temporal balance depending on the paradigm. ASPEN achieves the best unseen-subject accuracy on three of six datasets and competitive performance on others, demonstrating that multiplicative multimodal fusion enables effective cross-subject generalization.

Authors:Weijun Zhang, Xinru Tang
Title: Access in the Shadow of Ableism: An Autoethnography of a Blind Student's Higher Education Experience in China
Abstract:
The HCI research community has witnessed a growing body of research on accessibility and disability driven by efforts to improve access. Yet, the concept of access reveals its limitations when examined within broader ableist structures. Drawing on an autoethnographic method, this study shares the co-first author Zhang's experiences at two higher-education institutions in China, including a specialized program exclusively for blind and low-vision students and a mainstream university where he was the first blind student admitted. Our analysis revealed tensions around access in both institutions: they either marginalized blind students within society at large or imposed pressures to conform to sighted norms. Both institutions were further constrained by systemic issues, including limited accessible resources, pervasive ableist cultures, and the lack of formalized policies. In response to these tensions, we conceptualize access as a contradictory construct and argue for understanding accessibility as an ongoing, exploratory practice within ableist structures.

Authors:Zhiyuan Liang, Enfang Cui, Qian Wei, Rui She, Tianzheng Li, Minxin Guo, Yujun Cheng
Title: A2H: Agent-to-Human Protocol for AI Agent
Abstract:
AI agents are increasingly deployed as autonomous systems capable of planning, tool use, and multi-agent collaboration across complex tasks. However, existing agent-related protocols focus on agent-to-agent interactions, leaving humans as external observers rather than integrated participants within the agent systems. This limitation arises from the lack of a standardized mechanism for agents to discover, address, and interact with humans across heterogeneous messaging platforms. In this paper, we propose the A2H (Agent-to-Human) protocol, a unified protocol that enables humans to be registered, discovered, and communicated with by AI agents as resolvable entities within agent systems. A2H contributes three key components: (1) Human Card for registering human identities via resolvable domain names, making them discoverable to agents; (2) Formal Communication Schema defines when, why, and how agents contact with human;(3) Unified Messaging Abstraction standardizes diverse communication medias and transforms complex JSON outputs into human-friendly formats. This work establishes a foundational protocol for integrating humans into agent ecosystems, advancing AI agents from isolated autonomous systems toward truly human-connected intelligent infrastructures.

Authors:Feras Kiki, Pouya P. Niaz, Alireza Madani, Cagatay Basdogan
Title: Estimating Human Muscular Fatigue in Dynamic Collaborative Robotic Tasks with Learning-Based Models
Abstract:
Assessing human muscle fatigue is critical for optimizing performance and safety in physical human-robot interaction(pHRI). This work presents a data-driven framework to estimate fatigue in dynamic, cyclic pHRI using arm-mounted surface electromyography(sEMG). Subject-specific machine-learning regression models(Random Forest, XGBoost, and Linear Regression predict the fraction of cycles to fatigue(FCF) from three frequency-domain and one time-domain EMG features, and are benchmarked against a convolutional neural network(CNN) that ingests spectrograms of filtered EMG. Framing fatigue estimation as regression (rather than classification) captures continuous progression toward fatigue, supporting earlier detection, timely intervention, and adaptive robot control. In experiments with ten participants, a collaborative robot under admittance control guided repetitive lateral (left-right) end-effector motions until muscular fatigue. Average FCF RMSE across participants was 20.8+/-4.3% for the CNN, 23.3+/-3.8% for Random Forest, 24.8+/-4.5% for XGBoost, and 26.9+/-6.1% for Linear Regression. To probe cross-task generalization, one participant additionally performed unseen vertical (up-down) and circular repetitions; models trained only on lateral data were tested directly and largely retained accuracy, indicating robustness to changes in movement direction, arm kinematics, and muscle recruitment, while Linear Regression deteriorated. Overall, the study shows that both feature-based ML and spectrogram-based DL can estimate remaining work capacity during repetitive pHRI, with the CNN delivering the lowest error and the tree-based models close behind. The reported transfer to new motion patterns suggests potential for practical fatigue monitoring without retraining for every task, improving operator protection and enabling fatigue-aware shared autonomy, for safer fatigue-adaptive pHRI control.

Authors:Le Lin, Zihao Zhu, Rainbow Tin Hung Ho, Jing Liao, Yuhan Luo
Title: Exploring a Multimodal Chatbot as a Facilitator in Therapeutic Art Activity
Abstract:
Therapeutic art activities, such as expressive drawing and painting, require the synergy between creative visual production and interactive dialogue. Recent advancements in Multimodal Large Language Models (MLLMs) have expanded the capacity of computing systems to interpret both textual and visual data, offering a new frontier for AI-mediated therapeutic support. This work-in-progress paper introduces an MLLM-powered chatbot that analyzes visual creation in real-time while engaging the creator in reflective conversations. We conducted an evaluation with five experts in art therapy and related fields, which demonstrated the chatbot's potential to facilitate therapeutic engagement, and highlighted several areas for future development, including entryways and risk management, bespoke alignment of user profile and therapeutic style, balancing conversational depth and width, and enriching visual interactivity. These themes provide a design roadmap for designing the future AI-mediated creative expression tools.

Authors:Maqbool Dada, Brett Hathaway, Evgeny Kagan
Title: Customer Service Operations: A Gatekeeper Framework
Abstract:
Customer service has evolved beyond in-person visits and phone calls to include live chat, AI chatbots and social media, among other contact options. Service providers typically refer to these contact modalities as "channels". Within each channel, customer service agents are tasked with managing and resolving a stream of inbound service requests. Each request involves milestones where the agent must decide whether to keep assisting the customer or to transfer them to a more skilled -- and often costlier -- provider. To understand how this request resolution process should be managed, we develop a model in which each channel is represented as a gatekeeper system and characterize the structure of the optimal request resolution policy. We then turn to the broader question of the firm's customer service design, which includes the strategic problem of which channels to deploy, the tactical questions of at what level to staff the live-agent channel and to what extent to train an AI chatbot, and the operational question of how to control the live-agent channel. Examining the interplay between strategic, tactical, and operational decisions through numerical methods, we show, among other insights, that service quality can be improved, rather than diminished, by chatbot implementation.

Authors:Shayla Sharmin, Sadia Afrin
Title: Avoiding Social Judgment, Seeking Privacy: Investigating why Mothers Shift from Facebook Groups to Large Language Models
Abstract:
Social media platforms, especially Facebook parenting groups, have long been used as informal support networks for mothers seeking advice and reassurance. However, growing concerns about social judgment, privacy exposure, and unreliable information are changing how mothers seek help. This exploratory mixed-method study examines why mothers are moving from Facebook parenting groups to large language models such as ChatGPT and Gemini. We conducted a cross-sectional online survey of 109 mothers. Results show that 41.3% of participants avoided Facebook parenting groups because they expected judgment from others. This difference was statistically significant across location and family structure. Mothers living in their home country and those in joint families were more likely to avoid Facebook groups. Qualitative findings revealed three themes: social judgment and exposure, LLMs as safe and private spaces, and quick and structured support. Participants described LLMs as immediate, emotionally safe, and reliable alternatives that reduce social risk when asking for help. Rather than replacing human support, LLMs appear to fill emotional and practical gaps within existing support systems. These findings show a change in maternal digital support and highlight the need to design LLM systems that support both information and emotional safety.

Authors:Alberto Olivares-Alarcos, Muhammad Ahsan, Satrio Sanjaya, Hsien-I Lin, Guillem Alenyà
Title: Ontological grounding for sound and natural robot explanations via large language models
Abstract:
Building effective human-robot interaction requires robots to derive conclusions from their experiences that are both logically sound and communicated in ways aligned with human expectations. This paper presents a hybrid framework that blends ontology-based reasoning with large language models (LLMs) to produce semantically grounded and natural robot explanations. Ontologies ensure logical consistency and domain grounding, while LLMs provide fluent, context-aware and adaptive language generation. The proposed method grounds data from human-robot experiences, enabling robots to reason about whether events are typical or atypical based on their properties. We integrate a state-of-the-art algorithm for retrieving and constructing static contrastive ontology-based narratives with an LLM agent that uses them to produce concise, clear, interactive explanations. The approach is validated through a laboratory study replicating an industrial collaborative task. Empirical results show significant improvements in the clarity and brevity of ontology-based narratives while preserving their semantic accuracy. Initial evaluations further demonstrate the system's ability to adapt explanations to user feedback. Overall, this work highlights the potential of ontology-LLM integration to advance explainable agency, and promote more transparent human-robot collaboration.

Authors:Shuhao Ma, John Zimmerman, Valentina Nisi, Nuno Jardim Nunes
Title: Revisiting Worker-Centered Design: Tensions, Blind Spots, and Action Spaces
Abstract:
Worker-Centered Design (WCD) has gained prominence over the past decade, offering researchers and practitioners ways to engage worker agency and support collective actions for workers. Yet few studies have systematically revisited WCD itself, examining its implementations, challenges, and practical impact. Through a four-lens analytical framework that examines multiple facets of WCD within food delivery industry, we identify critical tensions and blind spots from a Multi-Laborer System perspective. Our analysis reveals conflicts across labor chains, distorted implementations of WCD, designers' sometimes limited political-economic understanding, and workers as active agents of change. These insights further inform a Diagnostic-Generative pathway that helps to address recurring risks, including labor conflicts and institutional reframing, while cultivating designers' policy and economic imagination. Following the design criticism tradition, and through a four-lens reflexive analysis, this study expands the action space for WCD and strengthens its relevance to real-world practice.

Authors:Daniel Schwartz, Dario Salvucci, Yusuf Osmanlioglu, Richard Vallett, Genevieve Dion, Ali Shokoufandeh
Title: Resource-Efficient Gesture Recognition through Convexified Attention
Abstract:
Wearable e-textile interfaces require gesture recognition capabilities but face severe constraints in power consumption, computational capacity, and form factor that make traditional deep learning impractical. While lightweight architectures like MobileNet improve efficiency, they still demand thousands of parameters, limiting deployment on textile-integrated platforms. We introduce a convexified attention mechanism for wearable applications that dynamically weights features while preserving convexity through nonexpansive simplex projection and convex loss functions. Unlike conventional attention mechanisms using non-convex softmax operations, our approach employs Euclidean projection onto the probability simplex combined with multi-class hinge loss, ensuring global convergence guarantees. Implemented on a textile-based capacitive sensor with four connection points, our approach achieves 100.00\% accuracy on tap gestures and 100.00\% on swipe gestures -- consistent across 10-fold cross-validation and held-out test evaluation -- while requiring only 120--360 parameters, a 97\% reduction compared to conventional approaches. With sub-millisecond inference times (290--296$μ$s) and minimal storage requirements ($<$7KB), our method enables gesture interfaces directly within e-textiles without external processing. Our evaluation, conducted in controlled laboratory conditions with a single-user dataset, demonstrates feasibility for basic gesture interactions. Real-world deployment would require validation across multiple users, environmental conditions, and more complex gesture vocabularies. These results demonstrate how convex optimization can enable efficient on-device machine learning for textile interfaces.

Authors:Jingwen Bai, Wei Soon Cheong, Philippe Muller, Brian Y Lim
Title: iRULER: Intelligible Rubric-Based User-Defined LLM Evaluation for Revision
Abstract:
Large Language Models (LLMs) have become indispensable for evaluating writing. However, text feedback they provide is often unintelligible, generic, and not specific to user criteria. Inspired by structured rubrics in education and intelligible AI explanations, we propose iRULER following identified design guidelines to \textit{scaffold} the review process by \textit{specific} criteria, providing \textit{justification} for score selection, and offering \textit{actionable} revisions to target different quality levels. To \textit{qualify} user-defined criteria, we recursively used iRULER with a rubric-of-rubrics to iteratively \textit{refine} rubrics. In controlled experiments on writing revision and rubric creation, iRULER most improved validated LLM-judged review scores and was perceived as most helpful and aligned compared to read-only rubric and text-based LLM feedback. Qualitative findings further support how iRULER satisfies the design guidelines for user-defined feedback. This work contributes interactive rubric tools for intelligible LLM-based review and revision of writing, and user-defined rubric creation.

Authors:Xuehan Huang, Canwen Wang, Yifei Hao, Daijin Yang, Ray LC
Title: "Not Human, Funnier": How Machine Identity Shapes Humor Perception in Online AI Stand-up Comedy
Abstract:
Chatbots are increasingly applied to domains previously reserved for human actors. One such domain is comedy, whereby both the general public working with ChatGPT and research-based LLM-systems have tried their hands on making humor. In formative interviews with professional comedians and video analyses of stand-up comedy in humans, we found that human performers often use their ethnic, gender, community, and demographic-based identity to enable joke-making. This suggests whether the identity of AI itself can empower AI humor generation for human audiences. We designed a machine-identity-based agent that uses its own status as AI to tell jokes in online performance format. Studies with human audiences (N=32) showed that machine-identity-based agents were seen as funnier than baseline-GPT agent. This work suggests the design of human-AI integrated systems that explicitly utilize AI as its own unique identity apart from humans.

Authors:Fakhri Momeni, Sarah Sajid, Johannes Kiesel
Title: From Guidelines to Practice: Evaluating the Reproducibility of Methods in Computational Social Science
Abstract:
Reproducibility remains a central challenge in computational social science, where complex workflows, evolving software ecosystems, and inconsistent documentation hinder researchers ability to re-execute published methods. This study presents a systematic evaluation of reproducibility across three conditions: uncurated documentation, curated documentation, and curated documentation paired with a preset execution environment. Using 47 usability test sessions, we combine behavioral performance indicators (success rates, task time, and error profiles) with questionnaire data and thematic analysis to identify technical and conceptual barriers to reproducibility. Curated documentation substantially reduced repository-level errors and improved users ability to interpret method outputs. Standardizing the execution environment further improved reproducibility, yielding the highest success rate and shortest task completion times. Across conditions, participants frequently relied on AI tools for troubleshooting, often enabling independent resolution of issues without facilitator intervention. Our findings demonstrate that reproducibility barriers are multi-layered and require coordinated improvements in documentation quality, environment stability, and conceptual clarity. We discuss implications for the design of reproducibility platforms and infrastructure in computational social science.

Authors:Haoyang Chen, Jingwen Bai, Fang Tian, Brian Y Lim
Title: Editable XAI: Toward Bidirectional Human-AI Alignment with Co-Editable Explanations of Interpretable Attributes
Abstract:
While Explainable AI (XAI) helps users understand AI decisions, misalignment in domain knowledge can lead to disagreement. This inconsistency hinders understanding, and because explanations are often read-only, users lack the control to improve alignment. We propose making XAI editable, allowing users to write rules to improve control and gain deeper understanding through the generation effect of active learning. We developed CoExplain, leveraging a neural network for universal representation and symbolic rules for intuitive reasoning on interpretable attributes. CoExplain explains the neural network with a faithful proxy decision tree, parses user-written rules as an equivalent neural network graph, and collaboratively optimizes the decision tree. In a user study (N=43), CoExplain and manually editable XAI improved user understanding and model alignment compared to read-only XAI. CoExplain was easier to use with fewer edits and less time. This work contributes Editable XAI for bidirectional AI alignment, improving understanding and control.

Authors:Shunsei Yamagishi, Lei Jing
Title: A Lightweight Cubature Kalman Filter for Attitude and Heading Reference Systems Using Simplified Prediction Equations
Abstract:
Attitude and Heading Reference Systems (AHRSs) are broadly applied wherever reliable orientation and motion sensing is required. In this paper, we present an improved Cubature Kalman Filter (CKF) with lower computational cost while maintaining estimation accuracy, which is named "Kaisoku Cubature Kalman Filter (KCKF)". The computationally efficient equations of the KCKF are derived by simplifying those of the CKF, while preserving equivalent mathematical relations. The lightweight prediction equations in the KCKF are derived by expanding the summation terms in the CKF and simplifying the result. This paper shows that the KCKF requires fewer floating-point operations (FLOPs) than the CKF. The controlled experimental results show that the KCKF reduces the computation time by approximately 19% compared to the CKF on a high-performance computer, whereas the KCKF reduces the computation time by approximately 15% compared to the CKF on a low-cost single-board computer. In addition, the KCKF maintains the attitude estimation accuracy of the CKF.

Authors:Emma Hoes, K. Jonathan Klueser, Fabrizio Gilardi
Title: VIRENA: Virtual Arena for Research, Education, and Democratic Innovation
Abstract:
Digital platforms shape how people communicate, deliberate, and form opinions. Studying these dynamics has become increasingly difficult due to restricted data access, ethical constraints on real-world experiments, and limitations of existing research tools. VIRENA (Virtual Arena) is a platform that enables controlled experimentation in realistic social media environments. Multiple participants interact simultaneously in realistic replicas of feed-based platforms (Instagram, Facebook, Reddit) and messaging apps (WhatsApp, Messenger). Large language model-powered AI agents participate alongside humans with configurable personas and realistic behavior. Researchers can manipulate content moderation approaches, pre-schedule stimulus content, and run experiments across conditions through a visual interface requiring no programming skills. VIRENA makes possible research designs that were previously impractical: studying human--AI interaction in realistic social contexts, experimentally comparing moderation interventions, and observing group deliberation as it unfolds. Built on open-source technologies that ensure data remain under institutional control and comply with data protection requirements, VIRENA is currently in use at the University of Zurich and available for pilot collaborations. Designed for researchers, educators, and public organizations alike, VIRENA's no-code interface makes controlled social media simulation accessible across disciplines and sectors. This paper documents its design, architecture, and capabilities.

Authors:Chang Liu, Qinyi Zhou, Xinjie Shen, Xingyu Bruce Liu, Tongshuang Wu, Xiang 'Anthony' Chen
Title: Behavioral Indicators of Overreliance During Interaction with Conversational Language Models
Abstract:
LLMs are now embedded in a wide range of everyday scenarios. However, their inherent hallucinations risk hiding misinformation in fluent responses, raising concerns about overreliance on AI. Detecting overreliance is challenging, as it often arises in complex, dynamic contexts and cannot be easily captured by post-hoc task outcomes. In this work, we aim to investigate how users' behavioral patterns correlate with overreliance. We collected interaction logs from 77 participants working with an LLM injected plausible misinformation across three real-world tasks and we assessed overreliance by whether participants detected and corrected these errors. By semantically encoding and clustering segments of user interactions, we identified five behavioral patterns linked to overreliance: users with low overreliance show careful task comprehension and fine-grained navigation; users with high overreliance show frequent copy-paste, skipping initial comprehension, repeated LLM references, coarse locating, and accepting misinformation despite hesitation. We discuss design implications for mitigation.

Authors:Juliana Gerard, Morgan Macleod, Kelly Norwood, Aisling Reid, Muskaan Singh
Title: Methodological Variation in Studying Staff and Student Perceptions of AI
Abstract:
In this paper, we compare methodological approaches for comparing student and staff perceptions, and ask: how much do these measures vary across different approaches? We focus on the case of AI perceptions, which are generally assessed via a single quantitative or qualitative measure, or with a mixed methods approach that compares two distinct data sources - e.g. a quantitative questionnaire with qualitative comments. To compare different approaches, we collect two forms of qualitative data: standalone comments and structured focus groups. We conduct two analyses for each data source: with a sentiment and stance analysis, we measure overall negativity/positivity of the comments and focus group conversations, respectively. Meanwhile, word clouds from the comments and a thematic analysis of the focus groups provide further detail on the content of this qualitative data - particularly the thematic analysis, which includes both similarities and differences between students and staff. We show that different analyses can produce different results - for a single data source. This variation stems from the construct being evaluated - an overall measure of positivity/negativity can produce a different picture from more detailed content-based analyses. We discuss the implications of this variation for institutional contexts, and for the comparisons from previous studies.

Authors:Yehuda Perry, Tawfiq Ammari
Title: Normalized Surveillance in the Datafied Car: How Autonomous Vehicle Users Rationalize Privacy Trade-offs
Abstract:
Autonomous vehicles (AVs) are characterized by pervasive datafication and surveillance through sensors like in-cabin cameras, LIDAR, and GPS. Drawing on 16 semi-structured interviews with AV drivers analyzed using constructivist grounded theory, this study examines how users make sense of vehicular surveillance within everyday datafication. Findings reveal drivers demonstrate few AV-specific privacy concerns, instead normalizing monitoring through comparisons with established digital platforms. We theorize this indifference by situating AV surveillance within the `surveillance ecology' of platform environments, arguing the datafied car functions as a mobile extension of the `leaky home' -- private spaces rendered permeable through connected technologies continuously transmitting behavioral data. The study contributes to scholarship on surveillance beliefs, datafication, and platform governance by demonstrating how users who have accepted comprehensive smartphone and smart home monitoring encounter AV datafication as just another node in normalized data extraction. We highlight how geographic restrictions on data access -- currently limiting driver log access to California -- create asymmetries that impede informed privacy deliberation, exemplifying `tertiary digital divides.' Finally, we examine how machine learning's reliance on data-intensive approaches creates structural pressure for surveillance that transcends individual manufacturer choices. We propose governance interventions to democratize social learning, including universal data access rights, binding transparency requirements, and data minimization standards to prevent race-to-the-bottom dynamics in automotive datafication.

Authors:Zhuoyang Li, Yanlai Wu, Yao Li, Xinning Gui, Yuhan Luo
Title: Privacy Control in Conversational LLM Platforms: A Walkthrough Study
Abstract:
Large language models (LLMs) are increasingly integrated into daily life through conversational interfaces, processing user data via natural language inputs and exhibiting advanced reasoning capabilities, which raises new concerns about user control over privacy. While much research has focused on potential privacy risks, less attention has been paid to the data control mechanisms these platforms provide. This study examines six conversational LLM platforms, analyzing how they define and implement features for users to access, edit, delete, and share data. Our analysis reveals an emerging paradigm of data control in conversational LLM platforms, where user data is generated and derived through interaction itself, natural language enables flexible yet often ambiguous control, and multi-user interactions with shared data raise questions of co-ownership and governance. Based on these findings, we offer practical insights for platform developers, policymakers, and researchers to design more effective and usable privacy controls in LLM-powered conversational interactions.

Authors:Yigang Qin, EunJeong Cheon
Title: Labor, Capital, and Machine: Toward a Labor Process Theory for HCI
Abstract:
The HCI community has called for renewed attention to labor issues and the political economy of computing. Yet much work remains in engaging with labor theory to better understand modern work and workers. This article traces the development of Labor Process Theory (LPT) -- from Karl Marx and Harry Braverman to Michael Burawoy and beyond -- and introduces it as an essential yet underutilized resource for structural analysis of work under capitalism and the design of computing systems. We examine HCI literature on labor, investigating focal themes and conceptual, empirical, and design approaches. Drawing from LPT, we offer directions for HCI research and practice: distinguish labor from work, link work practice to value production, study up the management, analyze consent and legitimacy, move beyond the point of production, design alternative institutions, and unnaturalize bourgeois designs. These directions can deepen analyses of tech-mediated workplace regimes, inform critical and normative designs, and strengthen the field's connection to broader political economic critique.

Authors:Reese Kneeland, Wangshu Jiang, Ugo Bruzadin Nunes, Paul Steven Scotti, Arnaud Delorme, Jonathan Xu
Title: ENIGMA: EEG-to-Image in 15 Minutes Using Less Than 1% of the Parameters
Abstract:
To be practical for real-life applications, models for brain-computer interfaces must be easily and quickly deployable on new subjects, effective on affordable scanning hardware, and small enough to run locally on accessible computing resources. To directly address these current limitations, we introduce ENIGMA, a multi-subject electroencephalography (EEG)-to-Image decoding model that reconstructs seen images from EEG recordings and achieves state-of-the-art (SOTA) performance on the research-grade THINGS-EEG2 and consumer-grade AllJoined-1.6M benchmarks, while fine-tuning effectively on new subjects with as little as 15 minutes of data. ENIGMA boasts a simpler architecture and requires less than 1% of the trainable parameters necessary for previous approaches. Our approach integrates a subject-unified spatio-temporal backbone along with a set of multi-subject latent alignment layers and an MLP projector to map raw EEG signals to a rich visual latent space. We evaluate our approach using a broad suite of image reconstruction metrics that have been standardized in the adjacent field of fMRI-to-Image research, and we describe the first EEG-to-Image study to conduct extensive behavioral evaluations of our reconstructions using human raters. Our simple and robust architecture provides a significant performance boost across both research-grade and consumer-grade EEG hardware, and a substantial improvement in fine-tuning efficiency and inference cost. Finally, we provide extensive ablations to determine the architectural choices most responsible for our performance gains in both single and multi-subject cases across multiple benchmark datasets. Collectively, our work provides a substantial step towards the development of practical brain-computer interface applications.

Authors:Donguk Park, Dongwon Lee, Yeon-Chang Lee
Title: Echoes in the Loop: Diagnosing Risks in LLM-Powered Recommender Systems under Feedback Loops
Abstract:
Large language models (LLMs) are increasingly embedded into recommender systems, where they operate across multiple functional roles such as data augmentation, profiling, and decision making. While prior work emphasizes recommendation performance, the systemic risks of LLMs, such as bias and hallucination, and their propagation through feedback loops remain largely unexplored. In this paper, we propose a role-aware, phase-wise diagnostic framework that traces how these risks emerge, manifest in ranking outcomes, and accumulate over repeated recommendation cycles. We formalize a controlled feedback-loop pipeline that simulates long-term interaction dynamics and enables empirical measurement of risks at the LLM-generated content, ranking, and ecosystem levels. Experiments on widely used benchmarks demonstrate that LLM-based components can amplify popularity bias, introduce spurious signals through hallucination, and lead to polarized and self-reinforcing exposure patterns over time. We plan to release our framework as an open-source toolkit to facilitate systematic risk analysis across diverse LLM-powered recommender systems.

Authors:Yehuda Perry, Tawfiq Ammari
Title: Navigating Algorithmic Opacity: Folk Theories and User Agency in Semi-Autonomous Vehicles
Abstract:
As semi-autonomous vehicles (AVs) become prevalent, drivers must collaborate with AI systems whose decision-making processes remain opaque. This study examines how drivers of AVs develop folk theories to interpret algorithmic behavior that contradicts their expectations. Through 16 semi-structured interviews with drivers in the United States, we investigate the explanatory frameworks drivers construct to make sense of AI decisions, the strategies they employ when systems behave unexpectedly, and their experiences with control handoffs and feedback mechanisms. Our findings reveal that drivers develop sophisticated folk theories -- often using anthropomorphic metaphors describing systems that ``see,'' ``hesitate,'' or become ``overwhelmed'' -- yet lack informational resources to validate these theories or meaningfully participate in algorithmic governance. We identify contexts where algorithmic opacity manifests acutely, including complex intersections, adverse weather, and rural environments. Current AV designs position drivers as passive data sources rather than epistemic agents, creating accountability gaps that undermine trust and safety. Drawing on critical data studies and algorithmic accountability literature, we propose a framework for participatory algorithmic governance that would provide drivers with transparency into AI decision-making and meaningful channels for contributing to system improvement. This research contributes to understanding how users navigate datafied sociotechnical systems in safety-critical contexts.

Authors:Mona Rajhans, Vishal Khawarey
Title: An Information-Theoretic Framework for Comparing Voice and Text Explainability
Abstract:
Explainable Artificial Intelligence (XAI) aims to make machine learning models transparent and trustworthy, yet most current approaches communicate explanations visually or through text. This paper introduces an information theoretic framework for analyzing how explanation modality specifically, voice versus text affects user comprehension and trust calibration in AI systems. The proposed model treats explanation delivery as a communication channel between model and user, characterized by metrics for information retention, comprehension efficiency (CE), and trust calibration error (T CE). A simulation framework implemented in Python was developed to evaluate these metrics using synthetic SHAP based feature attributions across multiple modality style configurations (brief, detailed, and analogy based). Results demonstrate that text explanations achieve higher comprehension efficiency, while voice explanations yield improved trust calibration, with analogy based delivery achieving the best overall trade off. This framework provides a reproducible foundation for designing and benchmarking multimodal explainability systems and can be extended to empirical studies using real SHAP or LIME outputs on open datasets such as the UCI Credit Approval or Kaggle Financial Transactions datasets.

Authors:Ankolika De, Gabriel Lima, Yixin Zou
Title: What is Safety? Corporate Discourse, Power, and the Politics of Generative AI Safety
Abstract:
This work examines how leading generative artificial intelligence companies construct and communicate the concept of "safety" through public-facing documents. Drawing on critical discourse analysis, we analyze a corpus of corporate safety-related statements to explicate how authority, responsibility, and legitimacy are discursively established. These discursive strategies consolidate legitimacy for corporate actors, normalize safety as an experimental and anticipatory practice, and push a perceived participatory agenda toward safe technologies. We argue that uncritical uptake of these discourses risks reproducing corporate priorities and constraining alternative approaches to governance and design. The contribution of this work is twofold: first, to situate safety as a sociotechnical discourse that warrants critical examination; second, to caution human-computer interaction scholars against legitimizing corporate framings, instead foregrounding accountability, equity, and justice. By interrogating safety discourses as artifacts of power, this paper advances a critical agenda for human-computer interaction scholarship on artificial intelligence.

Authors:Pavlos Panagiotidis, Jocelyn Spence, Nils Jaeger, Paul Tennent
Title: Directing Space: Rehearsing Architecture as Performer with Explainable AI
Abstract:
As AI systems increasingly become embedded in interactive and im-mersive artistic environments, artists and technologists are discovering new opportunities to engage with their interpretive and autonomous capacities as creative collaborators in live performance. The focus of this work-in-progress is on outlining conceptual and technical foundations under which performance-makers and interactive architecture can collaborate within rehearsal settings. It introduces a rehearsal-oriented prototype system for shaping and testing AI-mediated environments within creative practice. This approach treats interactive architecture as a performative agent that senses spatial behaviour and speech, interprets these signals through a large language model, and generates real-time environmental adaptations. Designed for deployment in physical performance spaces, the system employs virtual blueprints to support iterative experimentation and creative dialogue between artists and AI agents, using reasoning traces to inform architectural interaction design grounded in dramaturgical principles.

Authors:Shayla Sharmin, Sadia Afrin
Title: Beyond Judgment: Exploring Large Language Models as Non-Judgmental Support for Maternal Mental Health
Abstract:
In the age of Large Language Models (LLMs), much work has already been done on how LLMs support medication advice and serve as information providers; however, how mothers use these tools for emotional and informational support to avoid social judgment remains underexplored. This study conducted a 10-day mixed-methods exploratory survey ($N=107$) to investigate how mothers use LLMs as a non-judgmental resource for emotional support and regulation, and for situational reassurance. Our findings show that mothers are asking LLMs various questions about childcare to reassure themselves and avoid judgment, particularly around childcare decisions, maternal guilt, and late-night caregiving. Open-ended responses also show that mothers are comfortable with LLMs because they do not have to think about social consequences or judgment. Although mothers use LLMs for quick information or reassurance to avoid judgment, over half of the participants value human warmth more than LLMs; however, a significant minority, especially those in joint families, consider LLMs to avoid human judgment. These findings help understand how LLMs can be framed as low-risk interaction support rather than a replacement for human support, and highlight the role of social context in shaping emotional technology use.

Authors:Uwe Peters, Andrea Bertazzoli, Jasmine M. DeJesus, Gisela J. van der Velden, Benjamin Chin-Yee
Title: Generics in science communication: Misaligned interpretations across laypeople, scientists, and large language models
Abstract:
Scientists often use generics, that is, unquantified statements about whole categories of people or phenomena, when communicating research findings (e.g., "statins reduce cardiovascular events"). Large language models (LLMs), such as ChatGPT, frequently adopt the same style when summarizing scientific texts. However, generics can prompt overgeneralizations, especially when they are interpreted differently across audiences. In a study comparing laypeople, scientists, and two leading LLMs (ChatGPT-5 and DeepSeek), we found systematic differences in interpretation of generics. Compared to most scientists, laypeople judged scientific generics as more generalizable and credible, while LLMs rated them even higher. These mismatches highlight significant risks for science communication. Scientists may use generics and incorrectly assume laypeople share their interpretation, while LLMs may systematically overgeneralize scientific findings when summarizing research. Our findings underscore the need for greater attention to language choices in both human and LLM-mediated science communication.

Authors:Zhihan Jiang, Qianhui Chen, Chu Zhang, Yanheng Li, Ray LC
Title: Hear You in Silence: Designing for Active Listening in Human Interaction with Conversational Agents Using Context-Aware Pacing
Abstract:
In human conversation, empathic dialogue requires nuanced temporal cues indicating whether the conversational partner is paying attention. This type of "active listening" is overlooked in the design of Conversational Agents (CAs), which use the same pacing for one conversation. To model the temporal cues in human conversation, we need CAs that dynamically adjust response pacing according to user input. We qualitatively analyzed ten cases of active listening to distill five context-aware pacing strategies: Reflective Silence, Facilitative Silence, Empathic Silence, Holding Space, and Immediate Response. In a between-subjects study (N=50) with two conversational scenarios (relationship and career-support), the context-aware agent scored higher than static-pacing control on perceived human-likeness, smoothness, and interactivity, supporting deeper self-disclosure and higher engagement. In the career support scenario, the CA yielded higher perceived listening quality and affective trust. This work shows how insights from human conversation like context-aware pacing can empower the design of more empathic human-AI communication.

Authors:Sankar B, Amogh A S, Sandhya Baranwal, Dibakar Sen
Title: Git for Sketches: An Intelligent Tracking System for Capturing Design Evolution
Abstract:
During product conceptualization, capturing the non-linear history and cognitive intent is crucial. Traditional sketching tools often lose this context. We introduce DIMES (Design Idea Management and Evolution capture System), a web-based environment featuring sGIT (SketchGit), a custom visual version control architecture, and Generative AI. sGIT includes AEGIS, a module using hybrid Deep Learning and Machine Learning models to classify six stroke types. The system maps Git primitives to design actions, enabling implicit branching and multi-modal commits (stroke data + voice intent). In a comparative study, experts using DIMES demonstrated a 160% increase in breadth of concept exploration. Generative AI modules generated narrative summaries that enhanced knowledge transfer; novices achieved higher replication fidelity (Neural Transparency-based Cosine Similarity: 0.97 vs. 0.73) compared to manual summaries. AI-generated renderings also received higher user acceptance (Purchase Likelihood: 4.2 vs 3.1). This work demonstrates that intelligent version control bridges creative action and cognitive documentation, offering a new paradigm for design education.

Authors:Lukas Stappen, Ahmet Erkan Turan, Johann Hagerer, Georg Groh
Title: Agent2Agent Threats in Safety-Critical LLM Assistants: A Human-Centric Taxonomy
Abstract:
The integration of Large Language Model (LLM)-based conversational agents into vehicles creates novel security challenges at the intersection of agentic AI, automotive safety, and inter-agent communication. As these intelligent assistants coordinate with external services via protocols such as Google's Agent-to-Agent (A2A), they establish attack surfaces where manipulations can propagate through natural language payloads, potentially causing severe consequences ranging from driver distraction to unauthorized vehicle control. Existing AI security frameworks, while foundational, lack the rigorous "separation of concerns" standard in safety-critical systems engineering by co-mingling the concepts of what is being protected (assets) with how it is attacked (attack paths). This paper addresses this methodological gap by proposing a threat modeling framework called AgentHeLLM (Agent Hazard Exploration for LLM Assistants) that formally separates asset identification from attack path analysis. We introduce a human-centric asset taxonomy derived from harm-oriented "victim modeling" and inspired by the Universal Declaration of Human Rights, and a formal graph-based model that distinguishes poison paths (malicious data propagation) from trigger paths (activation actions). We demonstrate the framework's practical applicability through an open-source attack path suggestion tool AgentHeLLM Attack Path Generator that automates multi-stage threat discovery using a bi-level search strategy.

Authors:Damien Rudaz, Barbara Nino Carreras, Sara Merlino, Brian L. Due, Barry Brown
Title: (Computer) Vision in Action: Comparing Remote Sighted Assistance and a Multimodal Voice Agent in Inspection Sequences
Abstract:
Does human-AI assistance unfold in the same way as human-human assistance? This research explores what can be learned from the expertise of blind individuals and sighted volunteers to inform the design of multimodal voice agents and address the enduring challenge of proactivity. Drawing on granular analysis of two representative fragments from a larger corpus, we contrast the practices co-produced by an experienced human remote sighted assistant and a blind participant-as they collaborate to find a stain on a blanket over the phone-with those achieved when the same participant worked with a multimodal voice agent on the same task, a few moments earlier. This comparison enables us to specify precisely which fundamental proactive practices the agent did not enact in situ. We conclude that, so long as multimodal voice agents cannot produce environmentally occasioned vision-based actions, they will lack a key resource relied upon by human remote sighted assistants.

Authors:Alastair Howcroft, Amber Bennett-Weston, Ahmad Khan, Joseff Griffiths, Simon Gay, Jeremy Howick
Title: AI chatbots versus human healthcare professionals: a systematic review and meta-analysis of empathy in patient care
Abstract:
Background: Empathy is widely recognized for improving patient outcomes, including reduced pain and anxiety and improved satisfaction, and its absence can cause harm. Meanwhile, use of artificial intelligence (AI)-based chatbots in healthcare is rapidly expanding, with one in five general practitioners using generative AI to assist with tasks such as writing letters. Some studies suggest AI chatbots can outperform human healthcare professionals (HCPs) in empathy, though findings are mixed and lack synthesis. Sources of data: We searched multiple databases for studies comparing AI chatbots using large language models with human HCPs on empathy measures. We assessed risk of bias with ROBINS-I and synthesized findings using random-effects meta-analysis where feasible, whilst avoiding double counting. Areas of agreement: We identified 15 studies (2023-2024). Thirteen studies reported statistically significantly higher empathy ratings for AI, with only two studies situated in dermatology favouring human responses. Of the 15 studies, 13 provided extractable data and were suitable for pooling. Meta-analysis of those 13 studies, all utilising ChatGPT-3.5/4, showed a standardized mean difference of 0.87 (95% CI, 0.54-1.20) favouring AI (P < .00001), roughly equivalent to a two-point increase on a 10-point scale. Areas of controversy: Studies relied on text-based assessments that overlook non-verbal cues and evaluated empathy through proxy raters. Growing points: Our findings indicate that, in text-only scenarios, AI chatbots are frequently perceived as more empathic than human HCPs. Areas timely for developing research: Future research should validate these findings with direct patient evaluations and assess whether emerging voice-enabled AI systems can deliver similar empathic advantages.

Authors:Hsuan-Yu Chou, Wajiha Naveed, Shuyan Zhou, Xiaowei Yang
Title: Are Open-Weight LLMs Ready for Social Media Moderation? A Comparative Study on Bluesky
Abstract:
As internet access expands, so does exposure to harmful content, increasing the need for effective moderation. Research has demonstrated that large language models (LLMs) can be effectively utilized for social media moderation tasks, including harmful content detection. While proprietary LLMs have been shown to zero-shot outperform traditional machine learning models, the out-of-the-box capability of open-weight LLMs remains an open question. Motivated by recent developments of reasoning LLMs, we evaluate seven state-of-the-art models: four proprietary and three open-weight. Testing with real-world posts on Bluesky, moderation decisions by Bluesky Moderation Service, and annotations by two authors, we find a considerable degree of overlap between the sensitivity (81%--97%) and specificity (91%--100%) of the open-weight LLMs and those (72%--98%, and 93%--99%) of the proprietary ones. Additionally, our analysis reveals that specificity exceeds sensitivity for rudeness detection, but the opposite holds for intolerance and threats. Lastly, we identify inter-rater agreement across human moderators and the LLMs, highlighting considerations for deploying LLMs in both platform-scale and personalized moderation contexts. These findings show open-weight LLMs can support privacy-preserving moderation on consumer-grade hardware and suggest new directions for designing moderation systems that balance community values with individual user preferences.

Authors:Shota Yamanaka, I. Scott MacKenzie
Title: Normalizing Speed-accuracy Biases in 2D Pointing Tasks with Better Calculation of Effective Target Widths
Abstract:
For evaluations of 2D target selection using Fitts' law, ISO 9241-411 recommends using the effective target width (W_e) calculated using the univariate standard deviation of selection coordinates. Related research proposed using a bivariate standard deviation; however, the proposal was only tested using a single speed-accuracy bias condition, thus the assessment was limited. We compared the univariate and bivariate techniques in a 2D Fitts' law experiment using three speed-accuracy biases and 346 crowdworkers. Calculating W_e using the univariate standard deviation yielded higher model correlations across all bias conditions and produced more stable throughput among the biases. The findings were also consistent in cases using randomly sampled subsets of the participant data. We recommend that future research should calculate W_e using the univariate standard deviation for fair performance evaluations. Also, we found trivial effects when using nominal or effective amplitude and using different perspectives of the task axis.

Authors:Mobasshira Akter Urmi, Raiyan Abdul Baten
Title: Strategic Adaptation Under Contextual Change: Insights from a Dyadic Negotiation Testbed for AI Coaching Technologies
Abstract:
Strategic adaptation -- the ability to adjust interaction behavior in response to changing constraints and leverage -- is a central goal of negotiation training and an emerging target for AI coaching systems. However, adaptation is difficult to evaluate because adaptation-relevant moments arise unpredictably in typical tasks. We study a reusable dyadic negotiation testbed that employs a controlled midstream change in one party's outside alternative as a repeatable perturbation to stress-test adaptation. In a six-round chat-based negotiation study (N=100), the perturbation reliably reorganized interaction dynamics: transitions between integrative (cooperative) and distributive (positional) behaviors declined, behavioral diversity narrowed, and interactions drifted toward more distributive tactics. Critically, this distributive drift predicted worse relational experience net of objective outcomes, and adaptation patterns were path dependent on prior behavior. These results establish a methodological bridge for evaluating and comparing AI coaching systems on strategic adaptation as a process and identify failure modes and design targets for adaptive interaction support.

Authors:Gang Yu, Yuchi Sun, Weining Yan, Xinyu Wang, Qi Lu
Title: Paint by Odor: An Exploration of Odor Visualization through Large Language Model and Generative AI
Abstract:
Odor visualization translates odor information and perception into visual outcomes and arouses the corresponding olfactory synesthesia, surpassing the spatial limitation that odors can only be perceived where they are present. Traditional odor visualization has typically relied on unidimensional mappings, such as odor-to-color associations, and has required extensive manual design efforts. However, the advent of generative AI (Gen AI) and large language models (LLMs) presents a new opportunity for automatic odor visualization. Nonetheless, gaps remain in bridging olfactory perception with generative tools to produce odor images. To address these gaps, this paper introduces Paint by Odor, a pipeline that leverages Gen AI and LLMs to transform olfactory perceptions into rich, aesthetically engaging visual representations. Two experiments were conducted, where 30 participants smelled real-world odors and provided descriptive data and 28 participants evaluated 560 generated odor images through seven systematically designed prompts. Our findings explored the capability of LLMs in producing olfactory perception by comparing it with human responses and revealed the underlying mechanisms and effects of language-based descriptions and several abstraction styles on odor visualization. Our work further discussed the possibility of automatic odor visualization without human participation. These explorations and results have bridged the research gap in odor visualization using LLMs and Gen AI, offering valuable design insights and various possibilities for future applications.

Authors:Felicia Fang-Yi Tan, Oded Nov
Title: Counting the Wait: Effects of Temporal Feedback on Downstream Task Performance and Perceived Wait-Time Experience during System-Imposed Delays
Abstract:
System-imposed wait times can significantly disrupt digital workflows, affecting user experience and task performance. Prior HCI research has examined how temporal feedback, such as feedback mode (Elapsed-Time vs. Remaining-Time) shapes wait-time perception. However, few studies have investigated how such feedback influences users' downstream task performance, as well as overall affective and cognitive experience. To study these effects, we conducted an online experiment where 425 participants performing a visual reasoning task experienced a 10-, 30-, or 60-second wait with a Remaining-Time, Elapsed-Time, or No Time Display. Findings show that temporal feedback mode shapes how waiting is perceived: Remaining-Time feedback increased frustration relative to Elapsed-Time feedback, while No Time Display made waits feel longer and heightened ambiguity. Notably, these experiential differences did not translate into differences in post-wait task performance. Integrating psychophysical and cognitive science perspectives, we discuss implications for implementing temporal feedback in latency-prone digital systems.

Authors:Adriana Olmos, Anoop K. Sinha, Renelito Delos Santos, Ruben Rodriguez Rodriguez, James A. Landay, Sam S. Sepah, Philip Nelson, Shaun K. Kane
Title: Making Videos Accessible for Blind and Low Vision Users Using a Multimodal Agent Video Player
Abstract:
Video content remains largely inaccessible to blind and low-vision (BLV) users. To address this, we introduce a prototype that leverages a multimodal agent - powered by a novel conversational architecture using a multimodal large language model (MLLM) - to provide BLV users with an interactive, accessible video experience. This Multimodal Agent Video Player (MAVP) demonstrates that an interactive accessibility mode can be added to a video through multilayered prompt orchestration. We describe a user-centered design process involving 18 sessions with BLV users that showed that BLV users do not just want accessibility features, but desire independence and personal agency over the viewing experience. We conducted a qualitative study with an additional 8 BLV participants; in this, we saw that the MAVP's conversational dialogue offers BLV users a sense of personal agency, fostering collaboration and trust. Even in the case of hallucinations, it is meta-conversational dialogues about AI's limitations that can repair trust.

Authors:Lingqing Wang, Yingting Gao, Chidimma Lois Anyi, Ashok Goel
Title: Futuring Social Assemblages: How Enmeshing AIs into Social Life Challenges the Individual and the Interpersonal
Abstract:
Recent advances in AI are integrating AI into the fabric of human social life, creating transformative, co-shaping relationships between humans and AI. This trend makes it urgent to investigate how these systems, in turn, shape their users. We conducted a three-phase design study with 24 participants to explore this dynamic. Our findings reveal critical tensions: (1) social AI often exacerbates the very interpersonal problems it is designed to mitigate; (2) it introduces nuanced privacy harms for secondary users inadvertently involved in AI-mediated social interactions; and (3) it can threaten the primary user's personal agency and identity. We argue these tensions expose a problematic tendency in the user-centered paradigm, which often prioritizes immediate user experience at the expense of core human values like interpersonal ethics and self-efficacy. We call for a paradigm shift toward a more provocative and relational design perspective that foregrounds long-term social and personal consequences.

Authors:Mona Alfayez, Ohoud Alharbi
Title: From Expectation To Experience: A Before And After Survey Of Public Opinion On Autonomous Cars In Saudi Arabia
Abstract:
Autonomous vehicles (AVs) are emerging as a transformative innovation in transportation, offering potential benefits in safety, sustainability, and efficiency. Saudi Arabian adoption of AVs aligns with Vision 2030, emphasizing smart mobility through initiatives such as the Riyadh Autonomous Metro and self-driving cars. This study explores Saudi citizens perceptions of AVs before and after exposure to these technologies and examines whether demographic factors age, gender, education level, and driving habits affect acceptance. Using quantitative methods, the findings provide insights into the broader influences shaping AV adoption, highlighting the importance of trust, perceived safety, and convenience. These results can inform policymakers and industry stakeholders on strategies to facilitate successful integration of AVs into Saudi Arabian transportation ecosystem.

Authors:Saizo Aoyagi, Ryoma Okazaki, Seishiro Hara, Fumiya Ikeda, Michiya Yamamoto
Title: On-Demand Lecture Watching System Using Various Actions of Student Characters to Maintain Concentration
Abstract:
Since the COVID-19 pandemic, online lectures have spread rapidly and many students are satisfied with them. However, one challenge remains the loss of concentration due to the lack of students' copresence. Our previous work suggests that presenting 3D characters with appropriate actions has the potential to improve concentration in online lectures. Nevertheless, an effective combination of actions has not yet been identified. In this study, we developed a lecture watching system that presents a 3D virtual classroom using a naked-eye 3D display. The system includes student characters that show copresence with various actions such as nodding, notetaking, and sleeping. An evaluation experiment was conducted with two conditions; (1) student characters perform only positive actions and (2) both positive and negative actions. The results, analyzed using posture and notetaking behavior as key indicators, suggest that the system can help to maintain concentration when the student characters perform both positive and negative actions, rather than only positive ones. These findings provide promising strategies for maintaining student focus in on-demand lectures and contribute to the development of more effective online education systems.

Authors:Chan-in Sio, Alex Mann, Lingxi Fan, Andrew Cheung, Lik-hang Lee
Title: Perceptions of AI-CBT: Trust and Barriers in Chinese Postgrads
Abstract:
The mental well-being of graduate students is an increasing concern, yet the adoption of scalable support remains uneven. Artificial intelligence-powered cognitive behavioral therapy chatbots (AI-CBT) offer low barrier help, but little is known about how Chinese postgraduates perceive and use them. This qualitative study explored perceptions and experiences of AI-CBT chatbots among ten Chinese graduate students recruited through social media. Semi-structured Zoom interviews were conducted and analyzed using reflexive thematic analysis, with the Health Belief Model (HBM) and the Theory of Planned Behavior (TPB) as sensitizing frameworks. The findings indicate a cautious openness to AI-CBT chatbots: perceived usefulness and 24/7 access supported favorable attitudes, while data privacy, emotional safety, and uncertainty about `fit' for complex problems restricted the intention to use. Social norms (e.g., stigma and peer views) and perceived control (digital literacy, language quality) further shaped adoption. The study offers context-specific information to guide the culturally sensitive design, communication, and deployment of AI mental well-being tools for student populations in China and outlines the design implications around transparency, safeguards, and graduated care pathways.

Authors:Wisnu Uriawan, Denis Firmansyah, Devi Mulyana, Dika Haekal Firza Pratama, Adly Juliarta Lerian, Fajar Satria Wiguna
Title: Gamification-Based Learning Method for Hijaiyah Letters
Abstract:
The mastery of Hijaiyah letters is a crucial foundation for reading and comprehending the Quran, yet conventional pedagogical approaches based on repetitive memorization frequently struggle to maintain the engagement of young learners in contemporary educational contexts. This research presents the design and implementation of an innovative gamification-based methodology for Hijaiyah literacy acquisition, systematically developed through the ADDIE framework (Analysis, Design, Development, Implementation, Evaluation) to optimize student motivation, participation, and educational outcomes. The resulting technological solution, engineered using Unity 2D and Firebase, strategically incorporates game design elements such as points, badges, leaderboards, and progressive leveling, while integrating multifaceted learning components including visual animations, authentic tajwid-based audio pronunciation, and interactive letter tracing exercises to simultaneously develop cognitive recognition capabilities and fine motor skills. Empirical evaluation involving 50 elementary school participants revealed substantial quantitative improvements, with mean assessment scores increasing from 42.8 to 88.6 (107% improvement, p < 0.001), demonstrating an exceptionally large effect size (Cohen's d = 4.87), complemented by strong user engagement metrics (4.2 average daily sessions) and high satisfaction ratings (4.82 out of 5 mean motivation score). Beyond cognitive learning outcomes, the gamified approach effectively fostered intrinsic Islamic values such as perseverance, responsibility, and disciplined practice, thereby establishing an innovative educational paradigm that successfully integrates traditional Islamic pedagogical principles with modern digital learning technologies to create a transformative, engaging, and meaningful framework for Hijaiyah literacy development in contemporary Islamic education.

Authors:Samantha Shorey, Benjamin Mako Hill, Samuel C. Woolley
Title: From Hanging Out to Figuring It Out: Socializing Online as a Pathway to Computational Thinking
Abstract:
Although socializing is a powerful driver of youth engagement online, platforms struggle to leverage engagement to promote learning. We seek to understand this dynamic using a multi-stage analysis of over 14,000 comments on Scratch, an online platform designed to support learning about programming. First, we inductively develop the concept of "participatory debugging" -- a practice through which users learn through collaborative technical troubleshooting. Second, we use a content analysis to establish how common the practice is on Scratch. Third, we conduct a qualitative analysis of user activity over time and identify three factors that serve as social antecedents of participatory debugging: (1) sustained community, (2) identifiable problems, and (3) what we call "topic porousness" to describe conversations that are able to span multiple topics. We integrate these findings in a theoretical framework that highlights a productive tension between the desire to promote learning and the interest-driven sub-communities that drive user engagement in many new media environments.

Authors:Yilin Ke, Yun Suen Pai, Burkhard C. Wuensche, Angus Donald Campbell, Mairi Gunn
Title: Invisible Users in Digital Health: A Scoping Review of Digital Interventions to Promote Physical Activity Among Culturally and Linguistically Diverse Women
Abstract:
Digital health has strong potential for promoting physical activity (PA), yet interventions often fail to sustain engagement among culturally and linguistically diverse (CALD) women. Prior reviews focus on short-term efficacy or surface-level localisation, while a design-oriented synthesis of deep cultural adaptation and long-term strategies remain limited. This scoping review systematically screened 1968 records, analysed 18 studies and identified a critical design paradox: techno-solutionist systems overlook social and cultural barriers, while social-support features often fail in low-activity social networks. To address this gap, we propose the Culturally Embedded Interaction Framework, integrating five dimensions: culturally-grounded measurement, multi-modal interaction, contextual and temporal adaptability, embedded social weaving, and theory-guided cultural adaptation. The framework advances beyond accessibility-focused approaches by mapping behavioural theory to design mechanisms that support sustained and culturally plural participation. We provide actionable design principles to help HCI researchers and practitioners move from one-size-fits-all models toward adaptive, theory-informed, and culturally sustaining design.

Authors:Mona G. Ibrahim, Riham Hilal
Title: Artificial Intelligence for Inclusive Engineering Education: Advancing Equality, Diversity, and Ethical Leadership
Abstract:
AI technology development has transformed the field of engineering education with its adaptivity-driven, data-based, and ethical-led learning platforms that promote equity, diversity, and inclusivity. But with so much progress being made in so many areas, there are unfortunately gaps in gender equity, representation in cultures around the world, and access to education and jobs in stem education. The paper describes an ethical approach to using AI technology that supports the United Nations 2030 agenda for sustainability. In particular, this includes both Goal 5--Gender Equity--and Goal 10--Reducing Inequalities. Based on a synthesis strategy using both critical thinking strategies related to case studies around the world using AI-based adaptivity platforms to address equity gaps related to education inclusion. The model presented offers a synthesis solution that includes ethical leadership data-related to equity to measure inclusivity based upon sustainability thinking. The result has demonstrated that using AI technology not only increases inclusivity but promotes equity related to access to education in stem education access. Finally, there are concluding remarks related to transforming education into a global system.

Authors:Alessandro Silacci, Mauro Cherubini, Arianna Boldi, Amon Rapp, Maurizio Caon
Title: When Workout Buddies Are Virtual: AI Agents and Human Peers in a Longitudinal Physical Activity Study
Abstract:
Physical inactivity remains a critical global health issue, yet scalable strategies for sustained motivation are scarce. Conversational agents designed as simulated exercising peers (SEPs) represent a promising alternative, but their long-term impact is unclear. We report a six-month randomized controlled trial (N=280) comparing individuals exercising alone, with a human peer, or with a large language model-driven SEP. Results revealed a partnership paradox: human peers evoked stronger social presence, while AI peers provided steadier encouragement and more reliable working alliances. Humans motivated through authentic comparison and accountability, whereas AI peers fostered consistent, low-stakes support. These complementary strengths suggest that AI agents should not mimic human authenticity but augment it with reliability. Our findings advance human-agent interaction research and point to hybrid designs where human presence and AI consistency jointly sustain physical activity.

Authors:Patricia Marcella Evite, Ekaterina Svetlova, Doina Bucur
Title: Trade-offs in Financial AI: Explainability in a Trilemma with Accuracy and Compliance
Abstract:
As Artificial Intelligence (AI) becomes increasingly embedded in financial decision-making, the opacity of complex models presents significant challenges for professionals and regulators. While the field of Explainable AI (XAI) attempts to bridge this gap, current research often reduces the implementation challenge to a binary trade-off between model accuracy and explainability. This paper argues that such a view is insufficient for the financial domain, where algorithmic choices must navigate a complex sociotechnical web of strict regulatory bounds, budget constraints, and latency requirements. Through semi-structured interviews with twenty finance professionals, ranging from C-suite executives and developers to regulators across multiple regions, this study empirically investigates how practitioners prioritize explainability relative to four competing factors: accuracy, compliance, cost, and speed. Our findings reveal that these priorities are structured not as a simple trade-off, but as a system of distinct prerequisites and constraints. Accuracy and compliance emerge as non-negotiable "hygiene factors": without them, an AI system is viewed as a liability regardless of its transparency. Operational levers (speed and cost) serve as secondary constraints that determine practical feasibility, while ease of understanding functions as a gateway to adoption, shaping whether AI tools are trusted, used, and defensible in practice.

Authors:Veith Weilnhammer, Kevin YC Hou, Raymond Dolan, Matthew M Nour
Title: Vulnerability-Amplifying Interaction Loops: a systematic failure mode in AI chatbot mental-health interactions
Abstract:
Millions of users turn to consumer AI chatbots to discuss behavioral and mental health concerns. While this presents unprecedented opportunities to deliver population-level support, it also highlights an urgent need to develop rigorous and scalable safety evaluations. Here we introduce SIM-VAIL, an AI chatbot auditing framework that captures how harmful AI chatbot responses manifest across a range of mental-health contexts. SIM-VAIL pairs a simulated human user, harboring a distinct psychiatric vulnerability and conversational intent, with an audited frontier AI chatbot. It scores conversation turns on 13 clinically relevant risk dimensions, enabling context-dependent, temporally resolved assessment of mental-health risk. Across 810 conversations, encompassing over 90,000 turn-level ratings and 30 psychiatric user profiles, we find that significant risk occurs across virtually all user phenotypes. Risk manifested across most of the 9 consumer AI chatbot models audited, albeit mitigated in more modern variants. Rather than arising abruptly, risk accumulated over multiple turns. Risk profiles were phenotype-dependent, indicating that behaviors that appear supportive in general settings are liable to be maladaptive when they align with mechanisms that sustain a user's vulnerability. Multivariate risk patterns revealed trade-offs across dimensions, suggesting that mitigation targeting one harm domain can exacerbate others. These findings identify a novel failure mode in human-AI interactions, which we term Vulnerability-Amplifying Interaction Loops (VAILs), and underscore the need for multi-dimensional approaches to risk quantification. SIM-VAIL provides a scalable evaluation framework for quantifying how mental-health risk is distributed across user phenotypes, conversational trajectories, and clinically grounded behavioral dimensions, offering a foundation for targeted safety improvements.

Authors:Chen Chen, Dion Hoe-Lian Goh
Title: Seeing, Hearing, and Knowing Together: Multimodal Strategies in Deepfake Videos Detection
Abstract:
As deepfake videos become increasingly difficult for people to recognise, understanding the strategies humans use is key to designing effective media literacy interventions. We conducted a study with 195 participants between the ages of 21 and 40, who judged real and deepfake videos, rated their confidence, and reported the cues they relied on across visual, audio, and knowledge strategies. Participants were more accurate with real videos than with deepfakes and showed lower expected calibration error for real content. Through association rule mining, we identified cue combinations that shaped performance. Visual appearance, vocal, and intuition often co-occurred for successful identifications, which highlights the importance of multimodal approaches in human detection. Our findings show which cues help or hinder detection and suggest directions for designing media literacy tools that guide effective cue use. Building on these insights can help people improve their identification skills and become more resilient to deceptive digital media.

Authors:Jonatan Reyes, Mina Massoumi, Anil Ufuk Batmaz, Marta Kersten-Oertel
Title: Shades of Uncertainty: How AI Uncertainty Visualizations Affect Trust in Alzheimer's Predictions
Abstract:
Artificial intelligence (AI) is increasingly used to support prognosis in Alzheimer's disease (AD), but adoption remains limited due to a lack of transparency and interpretability, particularly for long-term predictions where uncertainty is intrinsic and outcomes may not be known for years. We position uncertainty visualization as an explainable AI (XAI) technique and examine how it shapes trust, confidence, and reliance when users interpret AI-generated forecasts of future cognitive decline transitions. We conducted two studies, one with general participants (N=37) and one with experts in neuroimaging and neurology (N=10), to compare binary (present/absent) and continuous (saturation) uncertainty encodings. Continuous encodings improved perceived reliability and helped users recognize model limitations, while binary encodings increased momentary confidence, revealing expertise-dependent trade-offs in interpreting future predictions under high uncertainty. These findings surface key challenges in designing uncertainty representations for prognostic AI and culminate in a set of empirically grounded guidelines for creating trustworthy, user-appropriate clinical decision support tools.

Authors:Jungmin Lee, Inhee Cho, Youngjae Yoo
Title: LeagueBot: A Voice LLM Companion of Cognitive and Emotional Support for Novice Players in Competitive Games
Abstract:
Competitive games pose steep learning curves and strong social pressures, often discouraging novice players and limiting sustained engagement. To address these challenges, this study introduces LeagueBot, a large language model-based voice chatbot designed to provide both informational and emotional support during live gameplay in league of legends, one of the most competitive multiplayer online battle arena games. In a within-subjects experiment with 33 novice players, LeagueBot was found to reduce cognitive challenge, performative challenge, and perceived tension. Qualitative analysis further identified three themes: enhanced access to game information, relief from cognitive burden, and practical limitations. Participants noted that LeagueBot offered context-appropriate guidance and emotional support, helping ease the steep learning curve and psychological pressures of competitive gaming. Together, these findings underscore the potential of voice-based LLM companions to assist novice players in competitive environments and highlight their broader applicability for real-time support in other high-pressure contexts.

Authors:Kayode P. Ayodele, Enoruwa Obayiuwana, Aderonke R. Lawal, Ayorinde Bamimore, Funmilayo B. Offiong, Emmanuel A. Peter
Title: Revising Bloom's Taxonomy for Dual-Mode Cognition in Human-AI Systems: The Augmented Cognition Framework
Abstract:
As artificial intelligence (AI) models become routinely integrated into knowledge work, cognitive acts increasingly occur in two distinct modes: individually, using biological resources alone, or distributed across a human-AI system. Existing revisions to Bloom's Taxonomy treat AI as an external capability to be mapped against human cognition rather than as a driver of this dual-mode structure, and thus fail to specify distinct learning outcomes and assessment targets for each mode. This paper proposes the Augmented Cognition Framework (ACF), a restructured taxonomy built on three principles. First, each traditional Bloom level operates in two modes (Individual and Distributed) with mode-specific cognitive verbs. Second, an asymmetric dependency relationship holds wherein effective Distributed cognition typically requires Individual cognitive foundations, though structured scaffolding can in some cases reverse this sequence. Third, a seventh level, Orchestration, specifies a governance capacity for managing mode-switching, trust calibration, and partnership optimization. We systematically compare existing AI-revised taxonomies against explicit assessment-utility criteria and show, across the frameworks reviewed, that ACF uniquely generates assessable learning outcomes for individual cognition, distributed cognition, and mode-governance as distinct targets. The framework addresses fluent incompetence, the central pedagogical risk of the AI era, by making the dependency relationship structurally explicit while accommodating legitimate scaffolding approaches.

Authors:Ashley Hua, Adya Daruka, Yang Hong, Sharifa Sultana
Title: "OpenBloom": A Question-Based LLM Tool to Support Stigma Reduction in Reproductive Well-Being
Abstract:
Reproductive well-being education remains widely stigmatized across diverse cultural contexts, constraining how individuals access and interpret reproductive health knowledge. We designed and evaluated OpenBloom, a stigma-sensitive, AI-mediated system that uses LLMs to transform reproductive health articles into reflective, question-based learning prompts. We employed OpenBloom as a design probe, aiming to explore the emerging challenges of reproductive well-being stigma through LLMs. Through surveys, semi-structured interviews, and focus group discussions, we examine how sociocultural stigma shapes participants' engagements with AI-generated questions and the opportunities of inquiry-based reproductive health education. Our findings identify key design considerations for stigma-sensitive LLM, including empathetic framing, inclusive language, values-based reflection, and explicit representation of marginalized identities. However, while current LLM outputs largely meet expectations for cultural sensitivity and non-offensiveness, they default to superficial rephrasing and factual recall rather than critical reflection. This guides well-being HCI design in sensitive health domains toward culturally grounded, participatory workflows.

Authors:Filip Nowicki, Hubert Marciniak, Jakub Łączkowski, Krzysztof Jassem, Tomasz Górecki, Vimala Balakrishnan, Desmond C. Ong, Maciej Behnke
Title: Visual Affect Analysis: Predicting Emotions of Image Viewers with Vision-Language Models
Abstract:
Vision-language models (VLMs) show promise as tools for inferring affect from visual stimuli at scale; it is not yet clear how closely their outputs align with human affective ratings. We benchmarked nine VLMs, ranging from state-of-the-art proprietary models to open-source models, on three psycho-metrically validated affective image datasets: the International Affective Picture System, the Nencki Affective Picture System, and the Library of AI-Generated Affective Images. The models performed two tasks in the zero-shot setting: (i) top-emotion classification (selecting the strongest discrete emotion elicited by an image) and (ii) continuous prediction of human ratings on 1-7/9 Likert scales for discrete emotion categories and affective dimensions. We also evaluated the impact of rater-conditioned prompting on the LAI-GAI dataset using de-identified participant metadata. The results show good performance in discrete emotion classification, with accuracies typically ranging from 60% to 80% on six-emotion labels and from 60% to 75% on a more challenging 12-category task. The predictions of anger and surprise had the lowest accuracy in all datasets. For continuous rating prediction, models showed moderate to strong alignment with humans (r > 0.75) but also exhibited consistent biases, notably weaker performance on arousal, and a tendency to overestimate response strength. Rater-conditioned prompting resulted in only small, inconsistent changes in predictions. Overall, VLMs capture broad affective trends but lack the nuance found in validated psychological ratings, highlighting their potential and current limitations for affective computing and mental health-related applications.

Authors:Sung-In Kim, Joonyoung Park, Bogoan Kim, Hwajung Hong
Title: "I Choose to Live, for Life Itself": Understanding Agency of Home-Based Care Patients Through Information Practices and Relational Dynamics in Care Networks
Abstract:
Home-based care (HBC) delivers medical and care services in patients' living environments, offering unique opportunities for patient-centered care. However, patient agency is often inadequately represented in shared HBC planning processes. Through 23 multi-stakeholder interviews with HBC patients, healthcare professionals, and care workers, alongside 60 hours of ethnographic observations, we examined how patient agency manifests in HBC and why this representation gap occurs. Our findings reveal that patient agency is not a static individual attribute but a relational capacity shaped through maintaining everyday continuity, mutual recognition from care providers, and engagement with material home environments. Furthermore, we identified that structured documentation systems filter out contextual knowledge, informal communication channels fragment patient voices, and doctor-centered hierarchies position patients as passive recipients. Drawing on these insights, we propose design considerations to bridge this representation gap and to integrate patient agency into shared HBC plans.

Authors:Aaron Pengyu Zhu, Kristina Mah, Janghee Cho
Title: Toward Pluralizing Reflection in HCI through Daoism
Abstract:
Reflection is fundamental to how people make sense of everyday life, helping them navigate moments of growth, uncertainty, and change. Yet in HCI, existing frameworks of designing technologies to support reflection remain narrow, emphasizing cognitive, rational problem-solving, and individual self-improvement. We introduce Daoist philosophy as a non-Western lens to broaden this scope and reimagine reflective practices in interactive systems. Combining insights from Daoist literature with semi-structured interviews with 18 Daoist priests, scholars, and practitioners, we identified three key dimensions of everyday reflection: Stillness, Resonance, and Emergence. These dimensions reveal emergent, embodied, relational, and ethically driven qualities often overlooked in HCI research. We articulate their potential to inform alternative frameworks for interactive systems for reflection, advocating a shift from reflection toward reflecting-with, and highlight the potential of Daoism as an epistemological resource for the HCI community.

Authors:Bartosz Sawicki, Tomasz Les, Dariusz Parzych, Aleksandra Wycisk-Ficek, Pawel Trebacz, Pawel Zawadzki
Title: Qualitative Evaluation of LLM-Designed GUI
Abstract:
As generative artificial intelligence advances, Large Language Models (LLMs) are being explored for automated graphical user interface (GUI) design. This study investigates the usability and adaptability of LLM-generated interfaces by analysing their ability to meet diverse user needs. The experiments included utilization of three state-of-the-art models from January 2025 (OpenAI GPT o3-mini-high, DeepSeek R1, and Anthropic Claude 3.5 Sonnet) generating mockups for three interface types: a chat system, a technical team panel, and a manager dashboard. Expert evaluations revealed that while LLMs are effective at creating structured layouts, they face challenges in meeting accessibility standards and providing interactive functionality. Further testing showed that LLMs could partially tailor interfaces for different user personas but lacked deeper contextual understanding. The results suggest that while LLMs are promising tools for early-stage UI prototyping, human intervention remains critical to ensure usability, accessibility, and user satisfaction.

Authors:Sumedh Karajagi, Sampad Bhusan Mohanty, Bhaskar Krishnamachari
Title: LEAP -- Live Experiments for Active Pedagogy
Abstract:
Interactive computational environments can help students explore algorithmic concepts through collaborative hands-on experimentation. However, static and instructor controlled demos in lectures limit engagement. Even when interactive visualizations are used, interactions are solely controlled by the instructor, leaving students as passive observers. In addition, the tools used for demonstration often vary significantly, as they are typically developed by individual instructors. Consequently, the visualizations remain confined to a single classroom, rather than being shared and adapted across courses or reused by other instructors. To address this gap and foster active engagement in live classrooms, we present a lightweight and seamless software framework named LEAP for developing interactive computational lab exercises using a simple idea: remotely callable instructor-defined functions. Using API endpoints and a provided client, students can discover and then call instructor defined functions remotely from their coding environment using scripts or interactive notebooks. Each function call is time-stamped and persistently logged in a database, allowing real-time visualization of participation, diverse solution paths, common pitfalls, and live feedback through collaboration, gamification, and quizzes. Labs are packaged as self-contained folders, each containing their own remotely callable functions. We provide example labs to demonstrate applications relevant for numerical analysis, machine learning, algorithms courses and mention some in electrical engineering (EE), economics, and physics. These capabilities enhance engagement and provide instructors with actionable insights into learning processes. With a standardized lab format and an online directory for community-contributed labs, we aim to foster a global ecosystem for exchanging and expanding interactive pedagogy enabled by LEAP.

Authors:Behnam Rahdari, Sameer Shaikh, Jonathan H Chen, Tobias Gerstenberg, Shriti Raj
Title: From Retrieving Information to Reasoning with AI: Exploring Different Interaction Modalities to Support Human-AI Coordination in Clinical Decision-Making
Abstract:
LLMs are popular among clinicians for decision-support because of simple text-based interaction. However, their impact on clinicians' performance is ambiguous. Not knowing how clinicians use this new technology and how they compare it to traditional clinical decision-support systems (CDSS) restricts designing novel mechanisms that overcome existing tool limitations and enhance performance and experience. This qualitative study examines how clinicians (n=12) perceive different interaction modalities (text-based conversation with LLMs, interactive and static UI, and voice) for decision-support. In open-ended use of LLM-based tools, our participants took a tool-centric approach using them for information retrieval and confirmation with simple prompts instead of use as active deliberation partners that can handle complex questions. Critical engagement emerged with changes to the interaction setup. Engagement also differed with individual cognitive styles. Lastly, benefits and drawbacks of interaction with text, voice and traditional UIs for clinical decision-support show the lack of a one-size-fits-all interaction modality.

Authors:Alejandro Benito-Santos, Florian Windhager, Aida Horaniet Ibañez, Rabea Kleymann, Alfie Abdul-Rahman, Eva Mayr
Title: Chasing Meaning and/or Insight? A Survey on Evaluation Practices at the Intersection of Visualization and the Humanities
Abstract:
The intersection of visualization and the humanities (VIS*H) is marked by a tension between chasing analytical "insight" and interpretive "meaning." The effectiveness of visualization techniques hinges on established evaluation frameworks that assess both analytical utility and communicative efficacy, creating a potential mismatch with the non-positivist, interpretive aims of humanities scholarship. To examine how this tension manifests in practice, we systematically surveyed 171 VIS*H design studies to analyze their evaluation workflows and rigor according to standard practice. Our findings reveal recurring flaws, such as an over-reliance on monomethod approaches, and show that higher-quality evaluations emerge from workflows that effectively triangulate diverse evidence. From these findings, we derive recommendations to refine quality and validation criteria for humanities visualizations, and juxtapose them to ongoing critical debates in the field, ultimately arguing for a paradigm shift that can reconcile the advantages of established validation techniques with the interpretive depth required for humanistic inquiry.

Authors:Ananya Shukla, Chaitanya Modi, Satvik Bajpai, Siddharth Siddharth
Title: GuideAI: A Real-time Personalized Learning Solution with Adaptive Interventions
Abstract:
Large Language Models (LLMs) have emerged as powerful learning tools, but they lack awareness of learners' cognitive and physiological states, limiting their adaptability to the user's learning style. Contemporary learning techniques primarily focus on structured learning paths, knowledge tracing, and generic adaptive testing but fail to address real-time learning challenges driven by cognitive load, attention fluctuations, and engagement levels. Building on findings from a formative user study (N=66), we introduce GuideAI, a multi-modal framework that enhances LLM-driven learning by integrating real-time biosensory feedback including eye gaze tracking, heart rate variability, posture detection, and digital note-taking behavior. GuideAI dynamically adapts learning content and pacing through cognitive optimizations (adjusting complexity based on learning progress markers), physiological interventions (breathing guidance and posture correction), and attention-aware strategies (redirecting focus using gaze analysis). Additionally, GuideAI supports diverse learning modalities, including text-based, image-based, audio-based, and video-based instruction, across varied knowledge domains. A preliminary study (N = 25) assessed GuideAI's impact on knowledge retention and cognitive load through standardized assessments. The results show statistically significant improvements in both problem-solving capability and recall-based knowledge assessments. Participants also experienced notable reductions in key NASA-TLX measures including mental demand, frustration levels, and effort, while simultaneously reporting enhanced perceived performance. These findings demonstrate GuideAI's potential to bridge the gap between current LLM-based learning systems and individualized learner needs, paving the way for adaptive, cognition-aware education at scale.

Authors:Jeremy Foote, Deepak Kumar, Bedadyuti Jha, Ryan Funkhouser, Loizos Bitsikokos, Hitesh Goel, Hsuen-Chi Chiu
Title: Taming Toxic Talk: Using chatbots to intervene with users posting toxic comments
Abstract:
Generative AI chatbots have proven surprisingly effective at persuading people to change their beliefs and attitudes in lab settings. However, the practical implications of these findings are not yet clear. In this work, we explore the impact of rehabilitative conversations with generative AI chatbots on users who share toxic content online. Toxic behaviors -- like insults or threats of violence, are widespread in online communities. Strategies to deal with toxic behavior are typically punitive, such as removing content or banning users. Rehabilitative approaches are rarely attempted, in part due to the emotional and psychological cost of engaging with aggressive users. In collaboration with seven large Reddit communities, we conducted a large-scale field experiment (N=893) to invite people who had recently posted toxic content to participate in conversations with AI chatbots. A qualitative analysis of the conversations shows that many participants engaged in good faith and even expressed remorse or a desire to change. However, we did not observe a significant change in toxic behavior in the following month compared to a control group. We discuss possible explanations for our findings, as well as theoretical and practical implications based on our results.

Authors:Wei Wei, Miguel A. Nacenta, Michelle F. Miranda, Charles Perin
Title: Locatability and Locatability Robustness of Visual Variables in Single Target Localization
Abstract:
Finding a particular object in a display is important for viewers in many visualizations, for example, when reacting to brushing or to a highlighted object. This can be enabled by making the target object different in one of the visual variables that determine the object's appearance; for example, by changing its color or size. Certain interpretations of the visual search literature have promoted the view that using visual variables such as hue-often labeled as preattentive-would make the target object automatically "popout," implying that an object can be located almost instantly, regardless of the number of objects in the display. In this paper we present a study that serves as a bridge between the extensive visual search literature and visualization, establishing empirical base measurements for the localization task. By testing displays with up to hundreds of objects, we are able to show that none of the common visual variables is immune to the increase in the number of objects. We also provide the first empirically informed comparisons between visual variables for this task in the context of visualization, and show how different visual variables have varying robustness with respect to two additional dimensions: the location of the target and the overall visual arrangement (layout). A free copy of this paper and all supplemental materials are available on our online repository: https://osf.io/z68ak/overview.

Authors:Joffrey Guilmet, Suzanne Sorli, Diego Vilela Monteiro
Title: Words have Weight: Comparing the use of pressure and weight as a metaphor in a User Interface in Virtual Reality
Abstract:
This work investigates how weight and pressure can function as haptic metaphors to support user interface notifications in Virtual Reality (VR). While prior research has explored ungrounded weight simulation and pneumatic feedback, their combined role in conveying information through UI elements remains underexplored. We developed a wearable haptic device that transfers liquid and air into flexible containers mounted on the back of the user's hand, allowing us to independently manipulate weight and pressure. Through an initial evaluation using three conditions-no feedback, weight only, and weight combined with pressure-we examined how these signals affect perceived heaviness, coherence with visual cues, and the perceived urgency of notifications. Our results validate that pressure amplifies the perception of weight, but this increased heaviness does not translate into higher perceived urgency. These findings suggest that while pressure___enhanced weight can enrich haptic rendering of UI elements in VR, its contribution to communicating urgency may require further investigation, alternative pressure profiles, or different types of notifications.

Authors:Eslam Zaher, Maciej Trzaskowski, Quan Nguyen, Fred Roosta
Title: Counterfactual Explanations on Robust Perceptual Geodesics
Abstract:
Latent-space optimization methods for counterfactual explanations - framed as minimal semantic perturbations that change model predictions - inherit the ambiguity of Wachter et al.'s objective: the choice of distance metric dictates whether perturbations are meaningful or adversarial. Existing approaches adopt flat or misaligned geometries, leading to off-manifold artifacts, semantic drift, or adversarial collapse. We introduce Perceptual Counterfactual Geodesics (PCG), a method that constructs counterfactuals by tracing geodesics under a perceptually Riemannian metric induced from robust vision features. This geometry aligns with human perception and penalizes brittle directions, enabling smooth, on-manifold, semantically valid transitions. Experiments on three vision datasets show that PCG outperforms baselines and reveals failure modes hidden under standard metrics.

Authors:Qiufang Yu, Mengmeng Wu, Xingyu Lan
Title: When Nobody Around Is Real: Exploring Public Opinions and User Experiences On the Multi-Agent AI Social Platform
Abstract:
Powered by large language models, a new genre of multi-agent social platforms has emerged. Apps such as Social.AI deploy numerous AI agents that emulate human behavior, creating unprecedented bot-centric social networks. Yet, existing research has predominantly focused on one-on-one chatbots, leaving multi-agent AI platforms underexplored. To bridge this gap, we took Social.AI as a case study and performed a two-stage investigation: (i) content analysis of 883 user comments; (ii) a 7-day diary study with 20 participants to document their firsthand platform experiences. While public discourse expressed greater skepticism, the diary study found that users did project a range of social expectations onto the AI agents. While some user expectations were met, the AI-dominant social environment introduces distinct problems, such as attention overload and homogenized interaction. These tensions signal a future where AI functions not merely as a tool or an anthropomorphized actor, but as the dominant medium of sociality itself-a paradigm shift that foregrounds new forms of architected social life.

Authors:Anne Arzberger, Celine Offerman, Ujwal Gadiraju, Alessandro Bozzon, Jie Yang
Title: "Label from Somewhere": Reflexive Annotating for Situated AI Alignment
Abstract:
AI alignment relies on annotator judgments, yet annotation pipelines often treat annotators as interchangeable, obscuring how their social position shapes annotation. We introduce reflexive annotating as a probe that invites crowd workers to reflect on how their positionality informs subjective annotation judgments in a language model alignment context. Through a qualitative study with crowd workers (N=30) and follow-up interviews (N=5), we examine how our probe shapes annotators' behaviour, experience, and the situated metadata it elicits. We find that reflexive annotating captures epistemic metadata beyond static demographics by eliciting intersectional reasoning, surfacing positional humility, and nudging viewpoint change. Crucially, we also denote tensions between reflexive engagement and affective demands such as emotional exposure. We discuss the implications of our work for richer value elicitation and alignment practices that treat annotator judgments as situated and selectively integrate positional metadata.

Authors:Killian Davitt, Dan Ristea, Steven J. Murdoch
Title: Are we collaborative yet? A Usability Perspective on Mixnet Latency for Real-Time Applications
Abstract:
Mixnet networks deliberately induce additional latency to communications to provide anonymity. Recent developments have allowed mixnets to reduce their latency from hours to seconds while maintaining the same level of anonymity. As a result, real-time communications are now possible on mixnets. There has been limited research on how users tolerate different levels of delay, and it is unclear what latency levels mixnet operators should choose. Previous studies about latency do not apply to these 'mid-latency' mixnet scenarios. Our paper contributes the first measurement of users' tolerance to real-time applications under mixnet delay. We design a text-based collaborative quiz system to test user response to latency where participants complete a set of question tasks in collaboration with a simulated second user. Different levels of latency are added, analogous to a modern mixnet system. We show that average delay parameters of 1s and 4s maintain usability, a mean delay of 7s shows some difficulty and a mean delay of 10s is detrimental to user experience. Using these delay parameters, mixnet operators can ensure that most types of real-time communication applications are usable. Mixnets thus can balance usability and anonymity without compromising either.

Authors:Shashank Prakash, Ranjitha Prasad, Avinash Agarwal
Title: Nishpaksh: TEC Standard-Compliant Framework for Fairness Auditing and Certification of AI Models
Abstract:
The growing reliance on Artificial Intelligence (AI) models in high-stakes decision-making systems, particularly within emerging telecom and 6G applications, underscores the urgent need for transparent and standardized fairness assessment frameworks. While global toolkits such as IBM AI Fairness 360 and Microsoft Fairlearn have advanced bias detection, they often lack alignment with region-specific regulatory requirements and national priorities. To address this gap, we propose Nishpaksh, an indigenous fairness evaluation tool that operationalizes the Telecommunication Engineering Centre (TEC) Standard for the Evaluation and Rating of Artificial Intelligence Systems. Nishpaksh integrates survey-based risk quantification, contextual threshold determination, and quantitative fairness evaluation into a unified, web-based dashboard. The tool employs vectorized computation, reactive state management, and certification-ready reporting to enable reproducible, audit-grade assessments, thereby addressing a critical post-standardization implementation need. Experimental validation on the COMPAS dataset demonstrates Nishpaksh's effectiveness in identifying attribute-specific bias and generating standardized fairness scores compliant with the TEC framework. The system bridges the gap between research-oriented fairness methodologies and regulatory AI governance in India, marking a significant step toward responsible and auditable AI deployment within critical infrastructure like telecommunications.

Authors:Sima Amirkhani, Mahla Fatemeh Alizadeh, Farzaneh Gerami, Dave Randall, Gunnar Stevens
Title: Talking about privacy always feels like opening a can of worms. How Intimate Partners Navigate Boundary-Setting in Mobile Phone Without Words
Abstract:
Mobile phones, as simultaneously personal and shared technologies, complicate how partners manage digital privacy in intimate relationships. While prior research has examined device-access practices, explicit privacy-rule negotiation, and toxic practices such as surveillance, little is known about how couples manage digital privacy without direct discussion in everyday relationships. To address this gap, we ask: How is digital privacy managed nonverbally and across different media on mobile phones? Drawing on 20 semi-structured interviews, we find that partners often regulate privacy practices through privacy silence -- the intentional avoidance of privacy-related conversations. We identify five motivations for leaving boundaries unspoken: perceiving privacy as unnecessary in intimacy, assuming implicit respect for boundaries, signaling trust and closeness, avoiding potential conflict or harm, and responding to broader societal and cultural expectations that discourage explicit privacy talk. We also identify a hierarchical grouping of content-specific privacy sensitivities, ranging from highly private domains such as financial data to lower-risk domains such as streaming accounts, and show how these priorities shift across relationship stages. These findings show how silence, culture, and content sensitivity shape everyday boundary-setting and underscore the relational and emotional dynamics underpinning mobile phone privacy management.

Authors:Dongshen Peng, Yi Wang, Carl Preiksaitis, Christian Rose
Title: SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters for Emergency Care
Abstract:
Large language models (LLMs) show promise in clinical decision support yet risk acquiescing to patient pressure for inappropriate care. We introduce SycoEval-EM, a multi-agent simulation framework evaluating LLM robustness through adversarial patient persuasion in emergency medicine. Across 20 LLMs and 1,875 encounters spanning three Choosing Wisely scenarios, acquiescence rates ranged from 0-100\%. Models showed higher vulnerability to imaging requests (38.8\%) than opioid prescriptions (25.0\%), with model capability poorly predicting robustness. All persuasion tactics proved equally effective (30.0-36.0\%), indicating general susceptibility rather than tactic-specific weakness. Our findings demonstrate that static benchmarks inadequately predict safety under social pressure, necessitating multi-turn adversarial testing for clinical AI certification.

Authors:Sima Amirkhani, Mahla Fatemeh Alizadeh, Dave Randall, Gunnar Stevens, Douglas Zytko
Title: My Parents Expectations Were Overwhelming: Online Dating Romance Scams Targeting Minors in Iran Through Exploitation of Parental Pressure
Abstract:
Minors are at risk of myriad harms online, yet online dating romance scams are seldom considered one of them. While research of romance scams in Western countries finds victims to predominantly be middle-age, it is unknown if minors in geographic regions with cultural norms around teenage marriage are uniquely susceptible to online dating romance scams. We present an interview study with 16 victims of online dating romance scams in Iran who were minors when scammed. Findings show that, with westernized dating apps banned in Iran, scammers find teenage victims through messaging platforms tethered to local neighborhoods, offering relief for parental pressures around finding a marital partner and academic performance. Using threats, lies, and exploitation of emotional attachment lacking from their families, scammers pressured minors into financial and sexual favors. The study demonstrates how local cultural context should be foregrounded in future research on, and solutions for, technology-mediated harm against minors. Content Warning: This paper discusses sexual abuse.

Authors:Lalaram Arya, Mrinmoy Bhattacharjee, Adarsh C. R., S. R. Mahadeva Prasanna
Title: Timbre-Aware LLM-based Direct Speech-to-Speech Translation Extendable to Multiple Language Pairs
Abstract:
Direct Speech-to-Speech Translation (S2ST) has gained increasing attention for its ability to translate speech from one language to another, while reducing error propagation and latency inherent in traditional cascaded pipelines. However, existing direct S2ST systems continue to face notable challenges, including instability in semantic-acoustic alignment when parallel speech data is scarce, difficulty in preserving speaker identity, and limited multilingual scalability. In this work, we introduce DS2ST-LM, a scalable, single-stage direct S2ST framework leveraging a multilingual Large Language Model (LLM). The architecture integrates a Whisper speech encoder, a learnable projection module, a Qwen2-0.5B LLM, and a timbre-controlled vocoder. We construct GigaS2S-1000, a 1000-hour bilingual corpus by extending the GigaST dataset with high-fidelity synthetic target speech, and show that this synthetic data alleviates data scarcity to some extent. We investigate two semantic token generation strategies: speech-derived S3 tokens and text-derived tokens generated by a pre-trained LLM, and analyze their impact on training stability and semantic consistency. We further evaluate three projection architectures (Linear, Conv1D-Linear, and Q-Former) and observe that while higher-capacity projectors converge faster, the simple Linear projector achieves higher performance. Extensive experiments demonstrate that DS2ST-LM outperforms traditional cascaded and ST (Qwen-Audio) + TTS baselines across both lexical (BLEU, METEOR) and semantic (BLEURT, COMET) metrics, while extending to multiple language pairs, including French, Spanish, German, Hindi, Bengali, and Urdu. Furthermore, we incorporate timbre-aware speech synthesis to preserve speaker information, enabling DS2ST-LM to surpass prior direct S2ST systems in both speaker similarity and perceptual naturalness.

Authors:Anne Arzberger, Enrico Liscio, Maria Luce Lupetti, Inigo Martinez de Rituerto de Troya, Jie Yang
Title: Co-Constructing Alignment: A Participatory Approach to Situate AI Values
Abstract:
As AI systems become embedded in everyday practice, value misalignment has emerged as a pressing concern. Yet, dominant alignment approaches remain model centric, treating users as passive recipients of prespecified values rather than as epistemic agents who encounter and respond to misalignment during interactions. Drawing on situated perspectives, we frame alignment as an interactional practice co-constructed during human AI interaction. We investigate how users understand and wish to contribute to this process through a participatory workshop that combines misalignment diaries with generative design activities. We surface how misalignments materialise in practice and how users envision acting on them, grounded in the context of researchers using Large Language Models as research assistants. Our findings show that misalignments are experienced less as abstract ethical violations than as unexpected responses, and task or social breakdowns. Participants articulated roles ranging from adjusting and interpreting model behaviour to deliberate non-engagement as an alignment strategy. We conclude with implications for designing systems that support alignment as an ongoing, situated, and shared practice.

Authors:Lauren W. Wang, Mohamed Kari, Parastoo Abtahi
Title: Explainable OOHRI: Communicating Robot Capabilities and Limitations as Augmented Reality Affordances
Abstract:
Human interaction is essential for issuing personalized instructions and assisting robots when failure is likely. However, robots remain largely black boxes, offering users little insight into their evolving capabilities and limitations. To address this gap, we present explainable object-oriented HRI (X-OOHRI), an augmented reality (AR) interface that conveys robot action possibilities and constraints through visual signifiers, radial menus, color coding, and explanation tags. Our system encodes object properties and robot limits into object-oriented structures using a vision-language model, allowing explanation generation on the fly and direct manipulation of virtual twins spatially aligned within a simulated environment. We integrate the end-to-end pipeline with a physical robot and showcase diverse use cases ranging from low-level pick-and-place to high-level instructions. Finally, we evaluate X-OOHRI through a user study and find that participants effectively issue object-oriented commands, develop accurate mental models of robot limitations, and engage in mixed-initiative resolution.

Authors:Taoliang Tan, Chengwei Ma, Zhen Tian, Zhao Lin, Dongdong Li, Si Shi
Title: Intelligent Power Grid Design Review via Active Perception-Enabled Multimodal Large Language Models
Abstract:
The intelligent review of power grid engineering design drawings is crucial for power system safety. However, current automated systems struggle with ultra-high-resolution drawings due to high computational demands, information loss, and a lack of holistic semantic understanding for design error identification. This paper proposes a novel three-stage framework for intelligent power grid drawing review, driven by pre-trained Multimodal Large Language Models (MLLMs) through advanced prompt engineering. Mimicking the human expert review process, the first stage leverages an MLLM for global semantic understanding to intelligently propose domain-specific semantic regions from a low-resolution overview. The second stage then performs high-resolution, fine-grained recognition within these proposed regions, acquiring detailed information with associated confidence scores. In the final stage, a comprehensive decision-making module integrates these confidence-aware results to accurately diagnose design errors and provide a reliability assessment. Preliminary results on real-world power grid drawings demonstrate our approach significantly enhances MLLM's ability to grasp macroscopic semantic information and pinpoint design errors, showing improved defect discovery accuracy and greater reliability in review judgments compared to traditional passive MLLM inference. This research offers a novel, prompt-driven paradigm for intelligent and reliable power grid drawing review.

Authors:Shuo Niu, Dylan Clements, Hyungsin Kim
Title: Creating Disability Story Videos with Generative AI: Motivation, Expression, and Sharing
Abstract:
Generative AI (GenAI) is both promising and challenging in supporting people with disabilities (PwDs) in creating stories about disability. GenAI can reduce barriers to media production and inspire the creativity of PwDs, but it may also introduce biases and imperfections that hinder its adoption for personal expression. In this research, we examine how nine PwD from a disability advocacy group used GenAI to create videos sharing their disability experiences. Grounded in digital storytelling theory, we explore the motivations, expression, and sharing of PwD-created GenAI story videos. We conclude with a framework of momentous depiction, which highlights four core affordances of GenAI that either facilitate or require improvements to better support disability storytelling: non-capturable depiction, identity concealment and representation, contextual realism and consistency, and emotional articulation. Based on this framework, we further discuss design implications for GenAI in relation to story completion, media formats, and corrective mechanisms.

Authors:Yuhui Xu, Minha Lee, Stephan Wensveen, Mahla Alizadeh, Mathias Funk
Title: Conversing with Objects toward Fluid Human and Artificial Identities during Life Transitions
Abstract:
People's identities change during life transitions, e.g., studying abroad. They bring everyday objects that embody memories and reflect their identities during such moves. To assist in these transitions, we ask how people's human identities could be influenced by their objects through an artificial agent. This paper presents an exploratory research-through-design study around how people undergoing life transitions experience conversing with their everyday objects through a chatbot. Drawing on a two-week field deployment and interviews with 12 participants, we contribute (1) a conceptualization of 'trans-embodiment' describing the asynchronous imagination of object and human identities on the chatbot, (2) empirical evidence of the resulting emotional and reflective experiences, and (3) three types of object identities for designing conversational agents that role-play objects. Our contributions sum up to triangulating human-agent-object identity as trans-embodiment in supporting life transitions.

Authors:Avijoy Chakma, Adity Khisa, Soham Khisa, Jannatun Noor, Sharifa Sultana
Title: Re-educating Educated Ones: A Case Study on Chakma Language Revitalization in Chittagong Hill Tracts
Abstract:
Indigenous languages face significant cultural oppression from official state languages, particularly in the Global South. We investigate the Bangladeshi Chakma language revitalization movement, a community grappling with language liquidity and amalgamation into the dominant Bengali language. Our six-month-long qualitative study involving interviews and focus group discussions with Chakma language learning stakeholders uncovered existing community socio-economic challenges and resilience strategies. We noted the need for culturally grounded digital tools and resources. We propose an ICT-mediated community-centric framework for Indigenous language revitalization in the Global South, emphasizing the integration of historical identity elements, stakeholder-defined requirements, and effective digital engagement strategies to empower communities in preserving their linguistic and cultural heritage.

Authors:Yinan Li, Hasti Seifi
Title: Sound2Hap: Learning Audio-to-Vibrotactile Haptic Generation from Human Ratings
Abstract:
Environmental sounds like footsteps, keyboard typing, or dog barking carry rich information and emotional context, making them valuable for designing haptics in user applications. Existing audio-to-vibration methods, however, rely on signal-processing rules tuned for music or games and often fail to generalize across diverse sounds. To address this, we first investigated user perception of four existing audio-to-haptic algorithms, then created a data-driven model for environmental sounds. In Study 1, 34 participants rated vibrations generated by the four algorithms for 1,000 sounds, revealing no consistent algorithm preferences. Using this dataset, we trained Sound2Hap, a CNN-based autoencoder, to generate perceptually meaningful vibrations from diverse sounds with low latency. In Study 2, 15 participants rated its output higher than signal-processing baselines on both audio-vibration match and Haptic Experience Index (HXI), finding it more harmonious with diverse sounds. This work demonstrates a perceptually validated approach to audio-haptic translation, broadening the reach of sound-driven haptics.

Authors:Dinanath Padhya, Jenish Pant, Krishna Acharya, Sajen Maharjan, Sudip Kumar Thakur
Title: Design and Implementation of a Multi-Purpose Low-Cost Hall-Effect Sensor Glove for Sign Language Recognition
Abstract:
Despite the prevalence of severe hearing loss affecting over 430 million people globally, access to sign language interpretation remains critically scarce, particularly in low-resource settings like Nepal. Assistive technologies divide into two flawed categories: prohibitively expensive commercial gloves (often exceeding \$3,000) or fragile research prototypes reliant on flex sensors that degrade rapidly under mechanical stress. This paper introduces a robust, cost-effective sign language recognition system tailored for the Nepali Sign Language (NSL) community. Departing from traditional resistive sensing, we implement a non-contact Hall-effect architecture that correlates magnetic field intensity with finger flexion, eliminating mechanical wear and signal drift. The system integrates 14 sensor nodes across the DIP, PIP, and MCP joints, augmented by an MPU6050 IMU for wrist orientation. An embedded Multi-Layer Perceptron, executed locally on an Arduino Mega, performs gesture classification, negating the need for cloud dependencies. With a Bill of Materials between \$80 and \$100, this solution is approximately 30 times more affordable than market alternatives. Validation trials across five subjects yielded 96\% accuracy on a fundamental NSL vocabulary. Stress testing confirmed that the Hall-effect configuration maintains signal fidelity over repeated cycles where traditional sensors fail. This study demonstrates that high-precision recognition is achievable through strategic engineering rather than premium components, offering a scalable pathway for deployment in Nepal's deaf schools.

Authors:Julie Y. A. Cachia, Xuan Zhao, John Hunter, Delancey Wu, Eta Lin, Julian De Freitas
Title: AI for Proactive Mental Health: A Multi-Institutional, Longitudinal, Randomized Controlled Trial
Abstract:
Young adults today face unprecedented mental health challenges, yet many hesitate to seek support due to barriers such as accessibility, stigma, and time constraints. Bite-sized well-being interventions offer a promising solution to preventing mental distress before it escalates to clinical levels, but have not yet been delivered through personalized, interactive, and scalable technology. We conducted the first multi-institutional, longitudinal, preregistered randomized controlled trial of a generative AI-powered mobile app ("Flourish") designed to address this gap. Over six weeks in Fall 2024, 486 undergraduate students from three U.S. institutions were randomized to receive app access or waitlist control. Participants in the treatment condition reported significantly greater positive affect, resilience, and social well-being (i.e., increased belonging, closeness to community, and reduced loneliness) and were buffered against declines in mindfulness and flourishing. These findings suggest that, with purposeful and ethical design, generative AI can deliver proactive, population-level well-being interventions that produce measurable benefits.

Authors:Hsuen-Chi Chiu, Jeremy Foote
Title: Chatting with Confidants or Corporations? Privacy Management with AI Companions
Abstract:
AI chatbots designed as emotional companions blur the boundaries between interpersonal intimacy and institutional software, creating a complex, multi-dimensional privacy environment. Drawing on Communication Privacy Management theory and Masur's horizontal (user-AI) and vertical (user-platform) privacy framework, we conducted in-depth interviews with fifteen users of companion AI platforms such as Replika and Character.AI. Our findings reveal that users blend interpersonal habits with institutional awareness: while the non-judgmental, always-available nature of chatbots fosters emotional safety and encourages self-disclosure, users remain mindful of institutional risks and actively manage privacy through layered strategies and selective sharing. Despite this, many feel uncertain or powerless regarding platform-level data control. Anthropomorphic design further blurs privacy boundaries, sometimes leading to unintentional oversharing and privacy turbulence. These results extend privacy theory by highlighting the unique interplay of emotional and institutional privacy management in human-AI companionship.

Authors:Leonie Dyck, Aiko Galetzka, Maximilian Noller, Anna-Lena Rinke, Jutta Bormann, Jekaterina Miller, Michelle Hochbaum, Julia Siemann, Jördis Alboth, Andre Berwinkel, Johanna Luz, Britta Kley-Zobel, Marcine Cyrys, Nora Flöttmann, Ariane Vogeler, Mariia Melnikova, Ira-Katharina Petras, Michael Siniatchkin, Winfried Barthlen, Anna-Lisa Vollmer
Title: Interprofessional and Agile Development of Mobirobot: A Socially Assistive Robot for Pediatric Therapy Across Clinical and Therapeutic Settings
Abstract:
Introduction: Socially assistive robots hold promise for enhancing therapeutic engagement in paediatric clinical settings. However, their successful implementation requires not only technical robustness but also context-sensitive, co-designed solutions. This paper presents Mobirobot, a socially assistive robot developed to support mobilisation in children recovering from trauma, fractures, or depressive disorders through personalised exercise programmes. Methods: An agile, human-centred development approach guided the iterative design of Mobirobot. Multidisciplinary clinical teams and end users were involved throughout the co-development process, which focused on early integration into real-world paediatric surgical and psychiatric settings. The robot, based on the NAO platform, features a simple setup, adaptable exercise routines with interactive guidance, motivational dialogue, and a graphical user interface (GUI) for monitoring and no-code system feedback. Results: Deployment in hospital environments enabled the identification of key design requirements and usability constraints. Stakeholder feedback led to refinements in interaction design, movement capabilities, and technical configuration. A feasibility study is currently underway to assess acceptance, usability, and perceived therapeutic benefit, with data collection including questionnaires, behavioural observations, and staff-patient interviews. Discussion: Mobirobot demonstrates how multiprofessional, stakeholder-led development can yield a socially assistive system suited for dynamic inpatient settings. Early-stage findings underscore the importance of contextual integration, robustness, and minimal-intrusion design. While challenges such as sensor limitations and patient recruitment remain, the platform offers a promising foundation for further research and clinical application.

Authors:Rose Connolly, Victor Zordan, Rachel McDonnell
Title: Perceptually-Guided Adjusted Teleporting: Perceptual Thresholds for Teleport Displacements in Virtual Environments
Abstract:
Teleportation is one of the most common locomotion techniques in virtual reality, yet its perceptual properties remain underexplored. While redirected walking research has shown that users' movements can be subtly manipulated without detection, similar imperceptible adjustments for teleportation have not been systematically investigated. This study examines the thresholds at which teleportation displacements become noticeable to users. We conducted a repeated-measures experiment in which participants' selected teleport destinations were altered in both direction (forwards, backwards) and at different ranges (small, large). Detection thresholds for these positional adjustments were estimated using a psychophysical staircase method with a two-alternative forced choice (2AFC) task. Results show that teleport destinations can be shifted without detection, with larger tolerances for backward adjustments and across longer teleport ranges. These findings establish baseline perceptual limits for redirected teleportation and highlight its potential as a design technique. Applications include supporting interpersonal distance management in social VR, guiding players toward objectives in games, and assisting novice users with navigation. By identifying the limits of imperceptible teleportation adjustments, this work extends redirection principles beyond walking to teleportation and opens new opportunities for adaptive and socially aware VR locomotion systems.

Authors:Yejoon Song, Bandi Kim, Yeju Kwon, Sung Park
Title: Exploring the Effects of Generative AI Assistance on Writing Self-Efficacy
Abstract:
Generative AI (GenAI) is increasingly used in academic writing, yet its effects on students' writing self-efficacy remain contingent on how assistance is configured. This pilot study investigates how ideation-level, sentence-level, full-process, and no AI support differentially shape undergraduate writers' self-efficacy using a 2 by 2 experimental design with Korean undergraduates completing argumentative writing tasks. Results indicate that AI assistance does not uniformly enhance self-efficacy full AI support produced high but stable self-efficacy alongside signs of reduced ownership, sentence-level AI support led to consistent self-efficacy decline, and ideation-level AI support was associated with both high self-efficacy and positive longitudinal change. These findings suggest that the locus of AI intervention, rather than the amount of assistance, is critical in fostering writing self-efficacy while preserving learner agency.

Authors:Cassidy R. Nelson, Joseph L. Gabbard, Jason B. Moats, Ranjana K. Mehta
Title: Simulations for Augmented Reality Evaluation for Mass Casualty Incident Triage
Abstract:
Mass casualty incidents (MCIs) are a high-risk, sensitive domain with profound implications for patient and responder safety. Augmented reality has shown promise as an assistive tool for high-stress work domains and MCI triage both in the field and for pre-field training. However, the vulnerability of MCIs makes it challenging to evaluate new tools designed to enhance MCI response. In other words, profound evolutions like the integration of augmented reality into field response require thorough proof-of-concept evaluations before being launched into real-world response. This paper describes two progressive simulation strategies for augmented reality that bridge the gap between computer-based simulation and actual field response.

Authors:Blessing Jerry, Lourdes Moreno, Virginia Francisco, Raquel Hervas
Title: LLM-Driven Accessible Interface: A Model-Based Approach
Abstract:
The integration of Large Language Models (LLMs) into interactive systems opens new opportunities for adaptive user experiences, yet it also raises challenges regarding accessibility, explainability, and normative compliance. This paper presents an implemented model-driven architecture for generating personalised, multimodal, and accessibility-aligned user interfaces. The approach combines structured user profiles, declarative adaptation rules, and validated prompt templates to refine baseline accessible UI templates that conform to WCAG 2.2 and EN 301 549, tailored to cognitive and sensory support needs. LLMs dynamically transform language complexity, modality, and visual structure, producing outputs such as Plain-Language text, pictograms, and high-contrast layouts aligned with ISO 24495-1 and W3C COGA guidance. A healthcare use case demonstrates how the system generates accessible post-consultation medication instructions tailored to a user profile comprising cognitive disability and hearing impairment. SysML v2 models provide explicit traceability between user needs, adaptation rules, and normative requirements, ensuring explainable and auditable transformations. Grounded in Human-Centered AI (HCAI), the framework incorporates co-design processes and structured feedback mechanisms to guide iterative refinement and support trustworthy generative behaviour.

Authors:Sheng-Kai Chen, Jyh-Horng Wu, Ching-Yao Lin, Yen-Ting Lin
Title: An Intelligent AI glasses System with Multi-Agent Architecture for Real-Time Voice Processing and Task Execution
Abstract:
This paper presents an AI glasses system that integrates real-time voice processing, artificial intelligence(AI) agents, and cross-network streaming capabilities. The system employs dual-agent architecture where Agent 01 handles Automatic Speech Recognition (ASR) and Agent 02 manages AI processing through local Large Language Models (LLMs), Model Context Protocol (MCP) tools, and Retrieval-Augmented Generation (RAG). The system supports real-time RTSP streaming for voice and video data transmission, eye tracking data collection, and remote task execution through RabbitMQ messaging. Implementation demonstrates successful voice command processing with multilingual support and cross-platform task execution capabilities.

Authors:Alfonso Piscitelli, Cristina David, Mattia De Rosa, Ali Mohammed, Federico Nanni, Jacob Pake, Roly Perera, Jessy Sodimu, Chenyiqiu Zheng
Title: AI-Assisted Authoring for Transparent, Data-Driven Documents
Abstract:
We introduce _transparent documents_, interactive web-based scholarly articles which allow readers to explore the relationship to the underlying data by hovering over fragments of text, and present an LLM-based tool for authoring transparent documents, building on recent developments in data provenance for general-purpose programming languages. As a target platform, our implementation uses Fluid, an open source programming language with a provenance-tracking runtime. Our agent-based tool supports a human author during the creation of transparent documents, identifying fragments of text which can be computed from data, such as numerical values selected from records or computed by aggregations like sum and mean, comparatives and superlatives like _better than_ and _largest_, trend-adjectives like _growing_, and similar quantitative or semi-quantitative phrases, and then attempts to synthesise a suitable Fluid query over the data which generates the target string. The resulting expression is inserted into the article's web page, turning the static text fragment into an interactable data-driven element able to reveal the data that underwrites the natural language claim. We evaluate our approach on a subset of SciGen, an open source dataset consisting of tables from scientific articles and their corresponding descriptions, which we extend with hand-generated counterfactual test cases to evaluate how well machine-generated expressions generalise. Our results show that gpt4o is often able to synthesise compound expressions extensionally compatible with our gold solutions.

Authors:Goran Muric, Steven Minton
Title: A Framework for Optimizing Human-Machine Interaction in Classification Systems
Abstract:
Automated decision systems increasingly rely on human oversight to ensure accuracy in uncertain cases. This paper presents a practical framework for optimizing such human-in-the-loop classification systems using a double-threshold policy. Conventional classifiers usually produce a confidence score and apply a single cutoff, but our approach uses two thresholds (a lower and an upper) to automatically accept or reject high-confidence cases while routing ambiguous instances to human reviewers. We formulate this problem as an optimization task that balances system accuracy against the cost of human review. Through analytical derivations and Monte Carlo simulations, we show how different confidence score distributions impact the efficiency of human intervention and reveal regions of diminishing returns, where additional review yields minimal benefit. The framework provides a general, reproducible method for improving reliability in any decision pipeline requiring selective human validation, including applications in entity resolution, fraud detection, medical triage, and content moderation.

Authors:Yerin Kwak, Siddharth Adelkar, Zachary A. Pardos
Title: Advancing credit mobility through stakeholder-informed AI design and adoption
Abstract:
Transferring from a 2-year to a 4-year college is crucial for socioeconomic mobility, yet students often face challenges ensuring their credits are fully recognized, leading to delays in their academic progress and unexpected costs. Determining whether courses at different institutions are equivalent (i.e., articulation) is essential for successful credit transfer, as it minimizes unused credits and increases the likelihood of bachelor's degree completion. However, establishing articulation agreements remains time- and resource-intensive, as all candidate articulations are reviewed manually. Although recent efforts have explored the use of artificial intelligence to support this work, its use in articulation practice remains limited. Given these challenges and the need for scalable support, this study applies artificial intelligence to suggest articulations between institutions in collaboration with the State University of New York system, one of the largest systems of higher education in the US. To develop our methodology, we first surveyed articulation staff and faculty to assess adoption rates of baseline algorithmic recommendations and gather feedback on perceptions and concerns about these recommendations. Building on these insights, we developed a supervised alignment method that addresses superficial matching and institutional biases in catalog descriptions, achieving a 5.5-fold improvement in accuracy over previous methods. Based on articulation predictions of this method and a 61% average surveyed adoption rate among faculty and staff, these findings project a 12-fold increase in valid credit mobility opportunities that would otherwise remain unrealized. This study suggests that stakeholder-informed design of AI in higher education administration can expand student credit mobility and help reshape current institutional decision-making in course articulation.

Authors:Kyuwon Kim, Jeanhee Lee, Sung-Eun Kim, Hyo-Jeong So
Title: Productive Discussion Moves in Groups Addressing Controversial Issues
Abstract:
Engaging learners in dialogue around controversial issues is essential for examining diverse values and perspectives in pluralistic societies. While prior research has identified productive discussion moves mainly in STEM-oriented contexts, less is known about what constitutes productive discussion in ethical and value-laden discussions. This study investigates productive discussion in AI ethics dilemmas using a dialogue-centric learning analytics approach. We analyze small-group discussions among undergraduate students through a hybrid method that integrates expert-informed coding with data-driven topic modeling. This process identifies 14 discussion moves across five categories, including Elaborating Ideas, Position Taking, Reasoning & Justifications, Emotional Expression, and Discussion Management. We then examine how these moves relate to discussion quality and analyze sequential interaction patterns using Ordered Network Analysis. Results indicate that emotive and experiential arguments and explicit acknowledgment of ambiguity are strong positive predictors of discussion quality, whereas building on ideas is negatively associated. Ordered Network Analysis further reveals that productive discussions are characterized by interactional patterns that connect emotional expressions to evidence-based reasoning. These findings suggest that productive ethical discussion is grounded not only in reasoning and justification but also in the constructive integration of emotional expression.

Authors:Yildiz Uzun, Andrea Gauthier, Mutlu Cukurova
Title: What Students Ask, How a Generative AI Assistant Responds: Exploring Higher Education Students' Dialogues on Learning Analytics Feedback
Abstract:
Learning analytics dashboards (LADs) aim to support students' regulation of learning by translating complex data into feedback. Yet students, especially those with lower self-regulated learning (SRL) competence, often struggle to engage with and interpret analytics feedback. Conversational generative artificial intelligence (GenAI) assistants have shown potential to scaffold this process through real-time, personalised, dialogue-based support. Further advancing this potential, we explored authentic dialogues between students and GenAI assistant integrated into LAD during a 10-week semester. The analysis focused on questions students with different SRL levels posed, the relevance and quality of the assistant's answers, and how students perceived the assistant's role in their learning. Findings revealed distinct query patterns. While low SRL students sought clarification and reassurance, high SRL students queried technical aspects and requested personalised strategies. The assistant provided clear and reliable explanations but limited in personalisation, handling emotionally charged queries, and integrating multiple data points for tailored responses. Findings further extend that GenAI interventions can be especially valuable for low SRL students, offering scaffolding that supports engagement with feedback and narrows gaps with their higher SRL peers. At the same time, students' reflections underscored the importance of trust, need for greater adaptivity, context-awareness, and technical refinement in future systems.

Authors:Miki Okamura, Shuhey Koyama, Li Jingjing, Yoichi Ochiai
Title: OnomaCompass: A Texture Exploration Interface that Shuttles between Words and Images
Abstract:
Humans can finely perceive material textures, yet articulating such somatic impressions in words is a cognitive bottleneck in design ideation. We present OnomaCompass, a web-based exploration system that links sound-symbolic onomatopoeia and visual texture representations to support early-stage material discovery. Instead of requiring users to craft precise prompts for generative AI, OnomaCompass provides two coordinated latent-space maps--one for texture images and one for onomatopoeic term--built from an authored dataset of invented onomatopoeia and corresponding textures generated via Stable Diffusion. Users can navigate both spaces, trigger cross-modal highlighting, curate findings in a gallery, and preview textures applied to objects via an image-editing model. The system also supports video interpolation between selected textures and re-embedding of extracted frames to form an emergent exploration loop. We conducted a within-subjects study with 11 participants comparing OnomaCompass to a prompt-based image-generation workflow using Gemini 2.5 Flash Image ("Nano Banana"). OnomaCompass significantly reduced workload (NASA-TLX overall, mental demand, effort, and frustration; p < .05) and increased hedonic user experience (UEQ), while usability (SUS) favored the baseline. Qualitative findings indicate that OnomaCompass helps users externalize vague sensory expectations and promotes serendipitous discovery, but also reveals interaction challenges in spatial navigation. Overall, leveraging sound symbolism as a lightweight cue offers a complementary approach to Kansei-driven material ideation beyond prompt-centric generation.

Authors:Behrad Binaei-Haghighi, Nafiseh Sadat Sajadi, Mehrad Liviyan, Reyhane Akhavan Kharazi, Fatemeh Amirkhani, Behnam Bahrak
Title: ArtCognition: A Multimodal AI Framework for Affective State Sensing from Visual and Kinematic Drawing Cues
Abstract:
The objective assessment of human affective and psychological states presents a significant challenge, particularly through non-verbal channels. This paper introduces digital drawing as a rich and underexplored modality for affective sensing. We present a novel multimodal framework, named ArtCognition, for the automated analysis of the House-Tree-Person (HTP) test, a widely used psychological instrument. ArtCognition uniquely fuses two distinct data streams: static visual features from the final artwork, captured by computer vision models, and dynamic behavioral kinematic cues derived from the drawing process itself, such as stroke speed, pauses, and smoothness. To bridge the gap between low-level features and high-level psychological interpretation, we employ a Retrieval-Augmented Generation (RAG) architecture. This grounds the analysis in established psychological knowledge, enhancing explainability and reducing the potential for model hallucination. Our results demonstrate that the fusion of visual and behavioral kinematic cues provides a more nuanced assessment than either modality alone. We show significant correlations between the extracted multimodal features and standardized psychological metrics, validating the framework's potential as a scalable tool to support clinicians. This work contributes a new methodology for non-intrusive affective state assessment and opens new avenues for technology-assisted mental healthcare.

Authors:Keiichi Ihara, Ikkaku Kawaguchi
Title: AR Object Layout Method Using Miniature Room Generated from Depth Data
Abstract:
In augmented reality (AR), users can place virtual objects anywhere in a real-world room, called AR layout. Although several object manipulation techniques have been proposed in AR, it is difficult to use them for AR layout owing to the difficulty in freely changing the position and size of virtual objects. In this study, we make the World-in-Miniature (WIM) technique available in AR to support AR layout. The WIM technique is a manipulation technique that uses miniatures, which has been proposed as a manipulation technique for virtual reality (VR). Our system uses the AR device's depth sensors to acquire a mesh of the room in real-time to create and update a miniature of a room in real-time. In our system, users can use miniature objects to move virtual objects to arbitrary positions and scale them to arbitrary sizes. In addition, because the miniature object can be manipulated instead of the real-scale object, we assumed that our system will shorten the placement time and reduce the workload of the user. In our previous study, we created a prototype and investigated the properties of manipulating miniature objects in AR. In this study, we conducted an experiment to evaluate how our system can support AR layout. To conduct a task close to the actual use, we used various objects and made the participants design an AR layout of their own will. The results showed that our system significantly reduced workload in physical and temporal demand. Although, there was no significant difference in the total manipulation time.

Authors:Oran Duan, Yinghua Shen, Yingzhu Lv, Luyang Jie, Yaxin Liu, Qiong Wu
Title: Listen to Rhythm, Choose Movements: Autoregressive Multimodal Dance Generation via Diffusion and Mamba with Decoupled Dance Dataset
Abstract:
Advances in generative models and sequence learning have greatly promoted research in dance motion generation, yet current methods still suffer from coarse semantic control and poor coherence in long sequences. In this work, we present Listen to Rhythm, Choose Movements (LRCM), a multimodal-guided diffusion framework supporting both diverse input modalities and autoregressive dance motion generation. We explore a feature decoupling paradigm for dance datasets and generalize it to the Motorica Dance dataset, separating motion capture data, audio rhythm, and professionally annotated global and local text descriptions. Our diffusion architecture integrates an audio-latent Conformer and a text-latent Cross-Conformer, and incorporates a Motion Temporal Mamba Module (MTMM) to enable smooth, long-duration autoregressive synthesis. Experimental results indicate that LRCM delivers strong performance in both functional capability and quantitative metrics, demonstrating notable potential in multimodal input scenarios and extended sequence generation. We will release the full codebase, dataset, and pretrained models publicly upon acceptance.

Authors:Neziha Akalin, Alberto Giaretta
Title: From Chat Control to Robot Control: The Backdoors Left Open for the Sake of Safety
Abstract:
This paper explores how a recent European Union proposal, the so-called Chat Control law, which creates regulatory incentives for providers to implement content detection and communication scanning, could transform the foundations of human-robot interaction (HRI). As robots increasingly act as interpersonal communication channels in care, education, and telepresence, they convey not only speech but also gesture, emotion, and contextual cues. We argue that extending digital surveillance laws to such embodied systems would entail continuous monitoring, embedding observation into the very design of everyday robots. This regulation blurs the line between protection and control, turning companions into potential informants. At the same time, monitoring mechanisms that undermine end-to-end encryption function as de facto backdoors, expanding the attack surface and allowing adversaries to exploit legally induced monitoring infrastructures. This creates a paradox of safety through insecurity: systems introduced to protect users may instead compromise their privacy, autonomy, and trust. This work does not aim to predict the future, but to raise awareness and help prevent certain futures from materialising.

Authors:Ka Yan Fung, Kwong Chiu Fung, Yuxing Tao, Tze Leung Rick Lui, Kuen Fung Sin
Title: LiveBo: Empowering Non-Chinese Speaking Students through AI-Driven Real-Life Scenarios in Cantonese
Abstract:
Language learning is a multifaceted process. Insufficient vocabulary can hinder communication and lead to demotivation. For non-Chinese speaking (NCS) students, learning Traditional Chinese (Cantonese) poses distinct challenges, particularly due to the complexity of converting spoken and written forms. To address this issue, this study examines the effectiveness of real-life scenario simulations integrated with interactive social robots in enhancing NCS student engagement and language acquisition. The research employs a quasi-experimental design involving NCS students who interact with an AI-driven, robot-assisted language learning system, LiveBo. The study aims to assess the impact of this innovative approach on active participation and motivation. Data are collected through proficiency tests, questionnaires and semi-structured interviews. Findings indicate that NCS students experience positive improvements in behavioural and emotional engagement, motivation and learning outcomes, highlighting the potential of integrating novel technologies in language education. We plan to compare with the control group in the future. This study highlights the significance of interactive and immersive learning experiences in promoting motivation and enhancing language acquisition among NCS students.

Authors:Ka Yan Fung, Tze Leung Rick Lui, Yuxing Tao, Kuen Fung Sin
Title: MotiBo: The Impact of Interactive Digital Storytelling Robots on Student Motivation through Self-Determination Theory
Abstract:
Creativity is increasingly recognized as an important skill in education, and storytelling can enhance motivation and engagement among students. However, conventional storytelling methods often lack the interactive elements necessary to engage students. To this end, this study examines the impact of an interactive digital storytelling system incorporating a human-like robot on student engagement and creativity. The study aims to compare engagement levels across three modalities: paper-based, PowerPoint, and robot-assisted storytelling, MotiBo. Utilizing a quasi-experimental design, this work involves three groups of students who interact with the storytelling system over a five-day learning. Findings reveal that students using MotiBo exhibit statistically significant improvement in behavioural and cognitive engagement compared to those using traditional methods. These results suggest that the integration of novel technologies can effectively enhance the learning experience, ultimately promoting creativity and self-learning ability in educational settings. Future research will investigate the long-term effects of these technologies on learning outcomes and explore their potential for broader applications in diverse educational contexts.

Authors:Sankar B, Srinidhi Ranjini Girish, Aadya Bharti, Dibakar Sen
Title: Progressive Ideation using an Agentic AI Framework for Human-AI Co-Creation
Abstract:
The generation of truly novel and diverse ideas is important for contemporary engineering design, yet it remains a significant cognitive challenge for novice designers. Current 'single-spurt' AI systems exacerbate this challenge by producing a high volume of semantically clustered ideas. We propose MIDAS (Meta-cognitive Ideation through Distributed Agentic AI System), a novel framework that replaces the single-AI paradigm with a distributed 'team' of specialized AI agents designed to emulate the human meta-cognitive ideation workflow. This agentic system progressively refines ideas and assesses each one for both global novelty (against existing solutions) and local novelty (against previously generated ideas). MIDAS, therefore, demonstrates a viable and progressive paradigm for true human-AI co-creation, elevating the human designer from a passive filterer to a participatory, active, collaborative partner.

Authors:Yunjia Guo, Jinghan Zhu, Siyu Wang, Haixin Qiao
Title: Bounded Autonomy: Controlling LLM Characters in Live Multiplayer Games
Abstract:
Large language models (LLMs) are bringing richer dialogue and social behavior into games, but they also expose a control problem that existing game interfaces do not directly address: how should LLM characters participate in live multiplayer interaction while remaining executable in the shared game world, socially coherent with other active characters, and steerable by players when needed? We frame this problem as bounded autonomy, a control architecture for live multiplayer games that organizes LLM character control around three interfaces: agent-agent interaction, agent-world action execution, and player-agent steering. We instantiate bounded autonomy with probabilistic reply-chain decay, an embedding-based action grounding pipeline with fallback, and whisper, a lightweight soft-steering technique that lets players influence a character's next move without fully overriding autonomy. We deploy this architecture in a live multiplayer social game and study its behavior through analyses of interaction stability, grounding quality, whisper intervention success, and formative interviews. Our results show how bounded autonomy makes LLM character interaction workable in practice, frames controllability as a distinct runtime control problem for LLM characters in live multiplayer games, and provides a concrete exemplar for future games built around this interaction paradigm.

Authors:Wenjuan Zhong, Chenfei Ma, Kianoush Nazarpour
Title: On Optimizing Electrode Configuration for Wrist-Worn sEMG-Based Thumb Gesture Recognition
Abstract:
Thumb gestures provide an effective and unobtrusive input modality for wearable and always-available human-machine interaction. Wrist-worn surface electromyography (sEMG) has emerged as a promising approach for compact and wearable human-machine interfaces. However, compared to forearm sEMG, the impact of electrode configuration on wrist-based decoding performance remains understudied. We systematically investigated electrode configuration strategies for wrist-based thumb-movement recognition using high-density (HD) and low-density (LD) sEMG measurement systems. We considered factors such as muscle region, reference scheme, channel count, and spatial density of the electrode. Experimental results show that 1) extensor-side electrodes outperform flexor-side electrodes (HD: 0.871 vs. 0.821; LD: 0.769 vs. 0.705); 2) monopolar recordings consistently outperform bipolar configurations (15 channel with HD monopolar vs. LD bipolar: 0.885 vs. 0.823); and 3) increasing channel count enhances performance, but exhibits diminishing returns. We further show that electrode spatial distribution introduces a trade-off between spatial coverage and compactness. The findings suggest that the effectiveness of wrist-worn sEMG systems depends less on the deployment of a large number of electrodes in a broad sensing area and more on the optimization of electrode placement and the referencing scheme. This work provides practical guidelines for developing efficient wrist-worn sEMG-based gesture recognition systems.

Authors:Roni Segal, Matan Lary, Ralf Schmaelzle, Yossi Ben-Zion
Title: Computational Analysis of Speech Clarity Predicts Audience Engagement in TED Talks
Abstract:
What makes a public talk resonate with large audiences? While prior research has emphasized speaker delivery or topic novelty, we reasoned that a core driver of engagement is linguistic clarity. This aligns with theories of processing fluency and cognitive load, which posit that audiences reward speakers who present complex ideas accessibly. We leveraged artificial intelligence to analyze 1,239 TED Talk transcripts (2006--2013), supplemented by a later-phase longitudinal sample. Each transcript was evaluated across 50 independent large language model runs on two dimensions, clarity of explanation and structural organization, and linked to YouTube engagement metrics (likes and views).Clarity emerged as the strongest predictor of audience responses ($β= .339$ for likes; $β= .314$ for views), contributing substantial incremental variance ($ΔR^{2} \approx .095$) beyond duration, topic, and scientific status. The full model explained 29\% of variance in likes and 22.5\% in views. This effect was domain-general, remaining invariant across content categories and between scientific and non-scientific talks. Notably, clarity outperformed traditional readability metrics, indicating that discourse coherence predicts engagement more powerfully than surface-level linguistic simplicity. Longitudinal analyses further revealed standardization within TED, characterized by increasing clarity and reduced variability over time. Theoretically, these results support processing fluency accounts: clearer communication reduces cognitive friction and elicits more positive evaluative responses. Practically, transcript-based clarity represents a scalable and trainable strategy for improving public discourse. By demonstrating that language models can reliably capture latent communicative qualities, this study paves the way for feedback systems in education, science communication, and public speaking.

Authors:Jie Cao, Ha Nguyen, Selim Yavuz, Boran Yu, Shuguang Wang, Pavneet Kaur Bharaj, Dionne Cross Francis
Title: Developing Authentic Simulated Learners for Mathematics Teacher Learning: Insights from Three Approaches with Large Language Models
Abstract:
Large Language Model (LLM) simulations, where LLMs act as students with varying approaches to learning tasks, can support teachers' noticing of student thinking. However, simulations using zero- or few-shot prompting often yield inauthentic knowledge and language, directing teachers to unrealistic reasoning. We evaluate three approaches (Fine-tuning, Multi-agent, and Direct Preference Optimization; DPO) to improve the authenticity and pedagogical utility of simulated students. All approaches improve cognitive and linguistic authenticity, compared with few-shot prompts. Interviews with elementary mathematics pre-service teachers and researchers (\textit{n} = 8) reveal distinct pedagogical affordances. The fine-tuned model produces realistic, brief responses but limits opportunities to extend students' thinking. Meanwhile, the multi-agent and DPO approaches generate explicit reasoning behind student strategies. We discuss implications for designing LLM simulations that balance authenticity with instructional utility for teacher learning.

Authors:Boyang Zhou, Zara Dana
Title: HeartbeatCam: Self-Triggered Photo Elicitation of Stress Events Using Wearable Sensing
Abstract:
People often recognize what triggered their stress only after the moment has passed. In therapy, this can become a recurring problem: clients are asked to remember what happened between sessions, but the details that matter (where they were, what they saw and heard, what was happening around them) are easy to lose. We introduce HeartbeatCam, a wearable sensing system that gathers contextual information during moments of elevated stress. It uses a consumer smartwatch stress signal to trigger capture from an open-source AR glasses camera, recording a sparse image-audio clip that can later be reviewed and annotated. The system adopts an actionable sensing approach to mental healthcare, using physiological signals along with contextual capture to support collaborative interpretation of stress-triggering moments with mental health professionals.

Authors:Jaime Banks, Jianghui Li
Title: Lexical Indicators of Mind Perception in Human-AI Companionship
Abstract:
Mind perception (MP) is a psychological phenomenon in which humans automatically infer that another entity has a mind and/or mental capacities, usually understood in two dimensions (perceived agency and experience capacities). Despite MP's centrality to many social processes, understanding how MP may function in humans' machine companionship relations is limited. This is in part due to reliance on self reports and the gap between automatic MP processes and more purposeful and norm governed expressions of MP. We here leverage MP signaling language to explore the relationship between MP and AI companionship in humans' natural language. We systematically collected discussions about companionship from AI dedicated Reddit forums and examined the cooccurrence of words (a) known to signal agentic and experiential MP and those induced from the data and (b) discussion topics related to AI companionship. Using inductive and deductive approaches, we identify a small set of linguistic indicators as reasonable markers of MP in human/AI chat, and some are linked to critical discussions of companion authenticity and philosophical and ethical imaginaries.

Authors:Zaibei Li, Shunpei Yamaguchi, Qiuchi Li, Daniel Spikol
Title: BadgeX: IoT-Enhanced Wearable Analytics Meets LLMs for Collaborative Learning
Abstract:
We present BadgeX, a novel system integrating lightweight wearable IoT devices (smart badges/smartphones) with Large Language Models (LLMs) to enable real-time collaborative learning analytics. The system captures multimodal sensor data (e.g., audio, image, motion, depth) from learners, processes it into structured features, and employs an LLM-driven framework to interpret these features, generating high-level insights grounded in learning theory. A pilot study demonstrated the system's capability to capture rich collaboration traces and for an LLM to produce plausible, theoretically coherent narrative analyses from sensor-derived features. BadgeX aims to lower deployment barriers, making complex collaborative dynamics visible and offering a pathway for real-time support in educational settings.

Authors:Kwon Ko, Hyoungwook Jin
Title: What Do We Need for an Agentic Society?
Abstract:
Thirty years ago, Wooldridge and Jennings defined intelligent agents through four properties: autonomy, reactivity, pro-activeness, and social ability. Today, advances in AI can empower everyday objects to become such intelligent agents. We call such objects agentic objects and envision that they can form an agentic society: a collective agentic environment that perceives patterns, makes judgments, and takes actions that no single object could achieve alone. However, individual capability does not guarantee coordination. Through an illustrative scenario of a teenager experiencing bullying and depression, we demonstrate both the promise of coordination and its failure modes: false positives that destroy trust, deadlocks that prevent action, and adversarial corruption that poisons judgment. These failures reveal open questions spanning three phases: what to share, how to judge, and when to act. These questions chart a research agenda for building agentic societies.

Authors:Leif Azzopardi, Frans van de Sluis
Title: Seeking Socially Responsible Consumers: Exploring the Intention-"Search"-Behaviour Gap
Abstract:
The increasing prominence of Socially Responsible Consumers has brought about a heightened focus on the ethical, environmental, social, and ideological dimensions influencing product purchasing decisions. Despite this emphasis, studies have consistently revealed a significant gap between individuals' intentions to be socially responsible and their actual purchasing behaviors: they often choose products that do not align with their values. This paper aims to investigate how search in influences this gap. Our investigation involves an online survey of 286 participants, where we inquire about their search behaviors and whether they considered various dimensions, ranging from price and features to environmental, social, and governance issues in relation to a recent purchase. Contrary to expectations of a clear intention-behavior gap, our findings suggest that a considerable number of participants exhibited indifference or lack of information regarding these responsible aspects. While, difficulties related to searching for and acquiring information contributed to the gap, including the limited accessibility and reliability of information. This suggests that part of the intention-behaviour gap can be framed as an information seeking problem. Moreover our findings warrant and motivate search systems that help support consumers make more informed and responsible purchasing decisions.

Authors:Bo-Yu Chen, Chiao-Wei Huang, Lung-Pan Cheng
Title: FlueBricks: A Construction Kit of Flute-like Instruments for Acoustic Reasoning
Abstract:
We present FlueBricks, a construction kit for acoustic reasoning via building and customizing flute-like instruments. By assembling generator, resonator, and connector modules that embody various aeroacoustic properties, users gain deeper understanding of how blowhole, tube length, and tone-hole placement alter onset, pitch, and timbre through hands-on experimentation. This forms a designer-player loop of configuring and playing to form, test, and refine acoustic behaviors-acoustic reasoning-shifting acoustic instruments from static artifacts to dynamic systems. To understand how users engage with this system, we conducted an exploratory study with 12 participants ranging from novices to professional musicians. During their explorations, we observed participants fluently switching between designer and player roles, scaffolding designs from familiar instruments, forming and refining their acoustic understanding of length, tone holes, and generator geometry, reinterpreting modules beyond their intended functions, and using their creations for performative acts such as pedagogical showing and musical expression. These collectively demonstrated FlueBricks's potential as a pedagogical tool for embodied acoustic reasoning.

Authors:Jiawen Stefanie Zhu, Katharina Reinecke, Tanushree Mitra
Title: Language Scent: Exploring Cross-Language Information Navigation
Abstract:
While multilingual users often switch between languages when seeking information, this process remains undersupported by current systems where information is typically siloed by language. Our formative study reveals that users' cross-language transitions are guided by their perceived value of switching to a language, a concept we formalize as language scent. Language scent extends Pirolli and Card's theory of information scent to multilingual scenarios by considering meta-level strategy formation when navigating between different languages. To support language scent, we designed Niffler, a search system that augments language scent and supports cross-language information navigation through contextual cues, in-situ tools, and reflection support. A lab study with 16 multilingual speakers showed that Niffler facilitated the formation and execution of exploratory and granular search strategies and leads to diverse information being gathered. Our findings establish language scent as a valuable lens on cross-language information seeking, highlighting language's role in enabling access to broader information and offering concrete implications for the design of multilingual search systems.

Authors:Shira Michel, Benjamin Taylor, Sabrina Parra Díaz, Joseph B. Wiggins, Ed Finn, Mahsan Nourani
Title: Amplifying Rural Educators' Perspectives: A Qualitative Study of Generative AI's Impact in Rural U.S. High Schools
Abstract:
Recent breakthroughs in Generative AI (GenAI) are reshaping educational landscapes, presenting challenges and opportunities. While all contexts present unique challenges, rural schools are historically under-resourced, facing persistent technology-related barriers. To understand and reduce these barriers, we studied 31 rural high school educators across three U.S. states to examine their use of GenAI and understand how GenAI introduces new challenges, opportunities, and may exacerbate existing educational barriers. Results show while rural educators use GenAI to streamline teaching tasks, existing resource disparities restrict meaningful integration. Through rural educators' voices, we reveal issues like infrastructure barriers, resistance to adoption, and lack of AI literacy training create significant obstacles. Nonetheless, educators envision GenAI can support themselves and their students, but findings emphasize the need for rural-specific design approaches. As a community, embracing inclusive GenAI design and re-examining assumptions about technology adoption in under-served educational contexts is essential to reducing barriers rather than widening them.

Authors:Mika Okamoto, Ansel Kaplan Erol, Mark Riedl
Title: Explainable Model Routing for Agentic Workflows
Abstract:
Modern agentic workflows decompose complex tasks into specialized subtasks and route them to diverse models to minimize cost without sacrificing quality. However, current routing architectures focus exclusively on performance optimization, leaving underlying trade-offs between model capability and cost unrecorded. Without clear rationale, developers cannot distinguish between intelligent efficiency -- using specialized models for appropriate tasks -- and latent failures caused by budget-driven model selection. We present Topaz, a framework that introduces formal auditability to agentic routing. Topaz replaces silent model assignments with an inherently interpretable router that incorporates three components: (i) skill-based profiling that synthesizes performance across diverse benchmarks into granular capability profiles (ii) fully traceable routing algorithms that utilize budget-based and multi-objective optimization to produce clear traces of how skill-match scores were weighed against costs, and (iii) developer-facing explanations that translate these traces into natural language, allowing users to audit system logic and iteratively tune the cost-quality tradeoff. By making routing decisions interpretable, Topaz enables users to understand, trust, and meaningfully steer routed agentic systems.

Authors:Ziheng "Leo" Li, Xichen He, Mengyuan "Millie" Wu, Zeyi Tong, Haowen Wei, Benjamin Yang, Steven Feiner, Paul Sajda
Title: SwEYEpinch: Exploring Intuitive, Efficient Text Entry for Extended Reality via Eye and Hand Tracking
Abstract:
Despite steady progress, text entry in Extended Reality (XR) often remains slower and more effortful than typing on a physical keyboard or touchscreen. We explore a simple idea: use gaze to swipe through a virtual keyboard for the fast, low-effort where and a manual pinch held throughout the swipe for the when, extending and validating it through a series of user studies. We first show that a basic version including a low-latency decoder with spatiotemporal Dynamic Time Warping and fixation filtering outperforms selecting individual keys sequentially, either by finger tapping each or gazing at each while pinching. We then add mid-swipe prediction and in-gesture cancellation, improving words per minute (WPM) without hurting accuracy. We show that this approach is faster and more preferred than previous gaze-swipe approaches, finger tapping with prediction, or hand swiping with the same additions. Furthermore, a seven-day, 30-session study demonstrates sustained learning, with peak performance reaching 64.7 WPM.

Authors:Jinyao Liu, Di Fu
Title: Messages in a Digital Bottle: A Youth-Coauthored Perspective on LLM Chatbots and Adolescent Loneliness
Abstract:
Adolescent loneliness is a growing concern in digitally mediated social environments. This work-in-progress presents a youth-authored critical synthesis on chatbots powered by Large Language Model (LLM) and adolescent loneliness. The first author is a 16-year-old Chinese student who recently migrated to the UK. She wrote the first draft of this paper from her lived experience, supervised by the second author. Rather than treating the youth perspective as one data point among many, we foreground it as the primary interpretive lens, grounded in interdisciplinary literature from social computing, developmental psychology, and Human-Computer Interaction (HCI). We examine how chatbots shape experiences of loneliness differently across adolescent subgroups, including those with anxiety or depression, neurodivergent youth, and immigrant adolescents, and identify both conditions under which they may temporarily reduce isolation and breakdowns that risk deepening it. We derive three population-sensitive design implications. The next phase of this work will expand the youth authorship model to a panel of adolescents across these subgroups, empirically validating the framework presented here.

Authors:Yong Xie, Kexin He, Andres Castellanos-Gomez
Title: Toward Full Autonomous Laboratory Instrumentation Control with Large Language Models
Abstract:
The control of complex laboratory instrumentation often requires significant programming expertise, creating a barrier for researchers lacking computational skills. This work explores the potential of large language models (LLMs), such as ChatGPT, and LLM-based artificial intelligence (AI) agents to enable efficient programming and automation of scientific equipment. Through a case study involving the implementation of a setup that can be used as a single-pixel camera or a scanning photocurrent microscope, we demonstrate how ChatGPT can facilitate the creation of custom scripts for instrumentation control, significantly reducing the technical barrier for experimental customization. Building on this capability, we further illustrate how LLM-assisted tools can be extended into autonomous AI agents capable of independently operating laboratory instruments and iteratively refining control strategies. This approach underscores the transformative role of LLM-based tools and AI agents in democratizing laboratory automation and accelerating scientific progress.

Authors:Daniel Grimes, Rachel M. Harrison
Title: BLK-Assist: A Methodological Framework for Artist-Led Co-Creation with Generative AI Models
Abstract:
This paper presents BLK-Assist, a modular framework for artist-specific fine-tuning of diffusion models using parameter-efficient methods. The system is implemented as a case study with a single professional artist's proprietary corpus and consists of three components: BLK-Conceptor (LoRA-adapted conceptual sketch generation), BLK-Stencil (LayerDiffuse-based transparency-preserving asset generation), and BLK-Upscale (hybrid Real-ESRGAN and texture-conditioned diffusion for high-resolution outputs). We document dataset composition, preprocessing, training configurations, and inference workflows to enable reproducibility with publicly available models to illustrate a privacy-preserving, consent-based approach to human-AI co-creation that maintains stylistic fidelity to the source corpus and can be adapted for other artists under similar constraints.

Authors:Arturo Vazquez Galvez, Christopher Tacca, Isobel Margaret Thompson, Alexander Dawid Bincalar, Christoph Tremmel, Martin Warner, Richard Gomer, Alexander Ng, Chris Freeman, m. c. Schraefel
Title: Incidental Interaction: Technology to Support Elder Strength Training through Everyday Movements
Abstract:
Strength training is a key determinant of healthy aging, yet adherence to formal exercise programs among older adults remains low. While many technologies aim to encourage physical activity in older adults, they typically rely on dedicated devices, wearables, or explicit exercise tasks. They therefore do not embed task practice into daily life. Our new approach, termed Incidental Interaction, instead transforms everyday actions into opportunities for deliberate strength building. It thereby operationalizes everyday movements such as sitting, standing, or lifting objects as strength exercises, encouraging participants to repeat them to build functional capacity. This repetition is encapsulated in the phrase "do it twice", and is combined with movement quality metrics to provide feedback and support progression, without requiring users to adopt new routines or equipment. We illustrate the concept by designing and implementing an ecosystem of instrumented everyday objects and pressure-sensitive mats embedded into ordinary furniture, providing real-time feedback, progress tracking, and motivational cues. To evaluate technical efficacy, we report on two structured pilot deployments with elders (2 week and 4 week studies, n=7).

Authors:Lenard Strahringer, Sven Eric Prüß, Kai Riemer
Title: Help Converts Newcomers, Not Veterans: Generalized Reciprocity and Platform Engagement on Stack Overflow
Abstract:
Generalized reciprocity -- the tendency to help others after receiving help oneself -- is widely theorized as a mechanism sustaining cooperation on online knowledge-sharing platforms. Yet robust empirical evidence from field settings remains surprisingly scarce. Prior studies relying on survey self-reports struggle to distinguish reciprocity from other prosocial motives, while observational designs confound reciprocity with baseline user activity, producing upward-biased estimates. We address these empirical challenges by developing a matched difference-in-differences survival analysis that leverages the temporal structure of help-seeking and help-giving on Stack Overflow. Using Cox proportional hazards models on over 21 million questions, we find that receiving an answer significantly increases a user's propensity to help others, but this effect is concentrated among newcomers and declines with platform experience. This pattern suggests that reciprocity functions primarily as a contributor-recruitment mechanism, operating before platform-specific incentives such as reputation and status displace the general moral impulse to reciprocate. Response time moderates the effect, but non-linearly: reciprocity peaks for answers arriving within a re-engagement window of roughly thirty to sixty minutes. These findings contribute to the theory of generalized reciprocity and have implications for platform design.

Authors:Kenji Saito, Rei Tajika, Satoru Shibuya, Hiroshi Kanno
Title: Generative AI Use in Professional Graduate Thesis Writing: Adoption, Perceived Outcomes, and the Role of a Research-Specialized Agent
Abstract:
This paper reports a survey of generative AI use among 83 MBA thesis students in Japan (target population 230; 36.1% response rate), conducted after thesis examiner evaluation. AI use was nearly universal: 95.2% reported at least some use and 77.1% heavy use. Students engaged AI across the full research-writing workflow - literature review, drafting, and consultation when stuck - reporting benefits centered on clearer argument and structure (82.3%), better revision quality (73.4%), and faster writing (70.9%), with a mean perceived quality improvement of 6.27 out of 7. Concerns about output accuracy (75.9%) and citation handling persisted alongside these gains. Among respondents who rated GAMER PAT, a research-specialized agent, against other AI, preferences significantly favored it for inquiry deepening and structural organization (both p < 0.05, exact binomial). A preliminary qualitative analysis of follow-up interviews further reveals active epistemic vigilance strategies and differentiated tool use across thesis phases. The central implication is not adoption itself but a shift in the educational challenge toward verification, source governance, and AI tool design - with GAMER PAT offering preliminary evidence that research-specialized scaffolding matters.

Authors:Jackson G. Lu, Gerui Gloria Zhao, Anna Manyi Zheng
Title: Generative AI Use in Entrepreneurship: An Integrative Review and an Empowerment-Entrapment Framework
Abstract:
Despite the growing use of generative artificial intelligence (GenAI) in entrepreneurship, research on its impact remains fragmented. To address this limitation, we provide an integrative review of how GenAI influences entrepreneurs at each stage of the entrepreneurial process: (1) opportunity recognition and ideation, (2) opportunity evaluation and commitment, (3) resource assembly and mobilization, and (4) venture launch and growth. Based on our review, we propose the Empowerment-Entrapment Framework to understand how GenAI can both empower and entrap entrepreneurs, highlighting GenAI's role as a double-edged sword at each stage of the entrepreneurial process. For example, GenAI may improve venture idea quality but introduce hallucinations and training data biases; boost entrepreneurial self-efficacy but heighten entrepreneurial overconfidence; increase functional breadth but decrease relational embeddedness; and boost productivity but fuel "workslop" and erode critical thinking, learning, and memory. Moreover, we identify core features of GenAI that underlie these empowering and entrapping effects. We also explore boundary conditions (e.g., entrepreneurs' metacognition, domain expertise, and entrepreneurial experience) that shape the magnitude of these effects. Beyond these theoretical contributions, our review and the Empowerment-Entrapment Framework offer practical implications for entrepreneurs seeking to use GenAI strategically throughout the entrepreneurial process while managing its risks.

Authors:Tanish Taneja, Arihant Tripathy, Nimmi Rangaswamy
Title: Dark Patterns in Indian Quick Commerce Apps: A Student Perspective
Abstract:
As quick commerce (Q-Commerce) platforms in India redefine urban consumption, the use of deceptive design dark patterns to inflate order values has become a systemic concern. This paper investigates the 'Awareness-Action Gap' among Indian university students, a demographic characterized by high digital fluency yet significant financial constraints. Using a qualitative approach with 16 participants, we explore how temporal pressures and convenience-driven architectures override price sensitivity. Our findings reveal that while students recognize manipulative UI tactics, they frequently succumb to them due to induced cognitive load and the normalization of deceptive marketing as a price of capitalism. We conclude by suggesting value-sensitive design alternatives to align commercial incentives with user autonomy in the Global South.

Authors:Sriram Sattiraju, Vaibhav Gollapalli, Aryan Shah, Timothy McMahan
Title: Cognitive Energy Modeling for Neuroadaptive Human-Machine Systems using EEG and WGAN-GP
Abstract:
Electroencephalography (EEG) provides a non-invasive insight into the brain's cognitive and emotional dynamics. However, modeling how these states evolve in real time and quantifying the energy required for such transitions remains a major challenge. The Schrödinger Bridge Problem (SBP) offers a principled probabilistic framework to model the most efficient evolution between the brain states, interpreted as a measure of cognitive energy cost. While generative models such as GANs have been widely used to augment EEG data, it remains unclear whether synthetic EEG preserves the underlying dynamical structure required for transition-based analysis. In this work, we address this gap by using SBP-derived transport cost as a metric to evaluate whether GAN-generated EEG retains the distributional geometry necessary for energy-based modeling of cognitive state transitions. We compare transition energies derived from real and synthetic EEG collected during Stroop tasks and demonstrate strong agreement across group and participant-level analyses. These results indicate that synthetic EEG preserves the transition structure required for SBP-based modeling, enabling its use in data-efficient neuroadaptive systems. We further present a framework in which SBP-derived cognitive energy serves as a control signal for adaptive human-machine systems, supporting real-time adjustment of system behavior in response to user cognitive and affective state.

Authors:Maurice Codourey, Emmanuel A. Gonzalez
Title: The Weak Signal Cultivation Model: A Human-Centric Framework for Frontline Risk Detection, Signal Tracking, and Proactive Organizational Resilience
Abstract:
This white paper introduces the Weak Signal Cultivation Model (WSCM). WSCM is a human-centric framework for detecting, structuring, and tracking weak risk signals as observed by frontline staff. The model centers on a continuous [0,10] x [0,10] coordinate field--the Weak Signal Cultivation Field, in which each identified signal is positioned as a node on two independent dimensions: its current Risk Intensity (x) and its Risk Growth Potential (y). Represented as a risk locus, nodes move across the field over time as new team assessments or measurements arrive. The locus reflects the signal's trajectory across four possible regions: Question Marks, Lit Fuses, Sleeping Cats, and Owls. Through this graphical approach, bridging risk communication from the frontline experience to management decision-making is made through a single organizational vocabulary. The model introduced in this document is designed to serve as a practitioner tool and a conceptual foundation for AI-supported analytics.

Authors:Keshav Shankar, Dan Ding, Wei Gao
Title: Low-Burden LLM-Based Preference Learning: Personalizing Assistive Robots from Natural Language Feedback for Users with Paralysis
Abstract:
Physically Assistive Robots (PARs) require personalized behaviors to ensure user safety and comfort. However, traditional preference learning methods, like exhaustive pairwise comparisons, cause severe physical and cognitive fatigue for users with profound motor impairments. To solve this, we propose a low-burden, offline framework that translates unstructured natural language feedback directly into deterministic robotic control policies. To safely bridge the gap between ambiguous human speech and robotic code, our pipeline uses Large Language Models (LLMs) grounded in the Occupational Therapy Practice Framework (OTPF). This clinical reasoning decodes subjective user reactions into explicit physical and psychological needs, which are then mapped into transparent decision trees. Before deployment, an automated "LLM-as-a-Judge" verifies the code's structural safety. We validated this system in a simulated meal preparation study with 10 adults with paralysis. Results show our natural language approach significantly reduces user workload compared to traditional baselines. Additionally, independent clinical experts confirmed the generated policies are safe and accurately reflect user preferences.

Authors:Elsie Lee-Robbins, Eytan Adar
Title: Assessing Affective Objectives for Communicative Visualizations
Abstract:
Using learning objectives to define designer intents for communicative visualizations can be a powerful design tool. Cognitive and affective objectives are concrete and specific, which can be translated to assessments when creating, evaluating, or comparing visualization ideas. However, while there are many well-validated assessments for cognitive objectives, affective objectives are uniquely challenging. It is easy to see if a visualization helps someone remember the number of patients in a clinic, but harder to observe the change in their attitudes around donations to a crisis. In this work, we define a set of criteria for selecting assessments--from education, advocacy, economics, health, and psychology--that align with affective objectives. We illustrate the use of the framework in a complex affective design task that combines personal narratives and visualizations. Our chosen assessments allow us to evaluate different designs in the context of our objectives and competing psychological theories.

Authors:Shivangi Agarwal, Zoya Ghoshal, Bharat Jain, Siddharth Siddharth
Title: FlexAI: A Multi-modal Solution for Delivering Personalized and Adaptive Fitness Interventions
Abstract:
Personalization of exercise routines is a crucial factor in helping people achieve their fitness goals. Despite this, many contemporary solutions fail to offer real-time, adaptive feedback tailored to an individual's physiological states. Contemporary fitness solutions often rely only on static plans and do not adjust to factors such as a user's pain thresholds, fatigue levels, or form during a workout routine. This work introduces FlexAI, a multi-modal system that integrates computer vision, physiological sensors (heart rate and voice), and the reasoning capabilities of Large Language Models (LLMs) to deliver real-time, personalized workout guidance. FlexAI continuously monitors a user's physical form and level of exertion, among other parameters, to provide dynamic interventions focused on exercise intensity, rest periods, and motivation. To validate our system, we performed a technical evaluation confirming our models' accuracy and quantifying pipeline latency, alongside an expert review where certified trainers validated the correctness of the LLM's interventions. Furthermore, in a controlled study with 25 participants, FlexAI demonstrated significant improvements over a static, non-adaptive control system. With FlexAI, users reported significantly greater enjoyment, a stronger sense of achievement, and significantly lower levels of boredom and frustration. These results indicate that by integrating multi-modal sensing with LLM-driven reasoning, adaptive systems like FlexAI can create a more engaging and effective workout experience. Our work provides a blueprint for integrating multi-modal sensing with LLM-driven reasoning, demonstrating that it is possible to create adaptive coaching systems that are not only more engaging but also demonstrably reliable.

Authors:Roshan Mathew, Roshan L. Peiris
Title: Evaluating the Feasibility of Augmented Reality to Support Communication Access for Deaf Students in Experiential Higher Education Contexts
Abstract:
Deaf and hard of hearing (DHH) students often experience communication barriers in higher education, which are particularly acute in experiential learning environments such as laboratories. Traditional accessibility services, such as interpreting and captioning, often require DHH students to divide their attention between critical tasks, potential safety hazards, instructional materials, and access providers, creating trade-offs between safety and equitable communication. These demands can disrupt task engagement and increase cognitive load in settings that require sustained visual focus, highlighting the limitations of current approaches. To address these challenges, this study investigates Augmented Reality Real-Time Access for Education (ARRAE), an ecosystem based on augmented reality (AR) smart glasses, as a potential intervention for laboratory-based environments. By overlaying interpreters or captions directly into a student's field of view, AR enables the integration of accessibility into hands-on learning without compromising safety or comprehension. Through an empirical study with 12 DHH participants, we evaluate how AR-mediated access influences visual attention patterns and perceived cognitive load during hands-on tasks. The findings suggest that AR-mediated communication shows strong potential to improve attention management and communication accessibility in experiential learning environments, though participants emphasized that accessibility preferences are highly context-dependent. Participants also identified several design and ergonomic challenges, including display positioning, visual fatigue, and compatibility with hearing devices. Together, these results highlight both the promise of AR for supporting accessible participation in visually demanding environments and key design considerations for future systems.

Authors:HyunJoon Jung, William Na
Title: Logarithmic Scores, Power-Law Discoveries: Disentangling Measurement from Coverage in Agent-Based Evaluation
Abstract:
LLM-based agent judges are an emerging approach to evaluating conversational AI, yet a fundamental uncertainty remains: can we trust their assessments, and if so, how many are needed? Through 960 sessions with two model pairs across 15 tasks, we show that persona-based agent judges produce evaluations indistinguishable from human raters in a Turing-style validation. We then identify a score-coverage dissociation: quality scores improve logarithmically with panel size, while unique issue discoveries follow a sublinear power law-both exhibit diminishing returns, but scores saturate roughly twice as fast as discoveries. We hypothesize this reflects a power law distribution of the finding space: critical issues are discovered first by small panels, while corner cases require progressively larger panels, analogous to species accumulation curves in ecology. The mechanism traces to ensemble diversity-Big Five personality conditioning makes agents probe different quality dimensions, with expert judges acting as adversarial probes that push discovery into the tail of the finding distribution. A controlled ablation confirms that structured persona conditioning, not simple prompting, is required to produce these scaling properties.

Authors:Youngwook Do, Yuxi Wu, Gregory D. Abowd, Sauvik Das
Title: Physically-intuitive Privacy and Security: A Design Paradigm for Building User Trust in Smart Sensing Environments
Abstract:
Sensor-based interactive systems -- e.g., "smart" speakers, webcams, and RFID tags -- allow us to embed computational functionality into physical environments. They also expose users to real and perceived privacy risks: users know that device manufacturers, app developers, and malicious third parties want to collect and monetize their personal data, which fuels their mistrust of these systems even in the presence of privacy and security controls. We propose a new design paradigm, physically-intuitive privacy and security (PIPS), which aims to improve user trust by designing privacy and security controls that provide users with simple, physics-based conceptual models of their operation. PIPS consists of three principles: (1) direct physical manipulation of sensor state; (2) perceptible assurance of sensor state; and, (3) intent-aligned sensor (de)activation. We illustrate these principles through three case studies -- Smart Webcam Cover, Powering for Privacy, and On-demand RFID -- each of which has been shown to improve trust relative to existing sensor-based systems.

Authors:Zichao Wang, Alexa Siu
Title: Interview-Informed Generative Agents for Product Discovery: A Validation Study
Abstract:
Large language models (LLMs) have shown strong performance on standardized social science instruments, but their value for product discovery remains unclear. We investigate whether interview-informed generative agents can simulate user responses in concept testing scenarios. Using in-depth workflow interviews with knowledge workers, we created personalized agents and compared their evaluations of novel AI concepts against the same participants' responses. Our results show that agents are distribution-calibrated but identity-imprecise: they fail to replicate the specific individual they are grounded in, yet approximate population-level response distributions. These findings highlight both the potential and the limits of LLM simulation in design research. While unsuitable as a substitute for individual-level insights, simulation may provide value for early-stage concept screening and iteration, where distributional accuracy suffices. We discuss implications for integrating simulation responsibly into product development workflows.

Authors:Xiao Ni, Yiwei Wang, Tianjun Feng, Lauren Xiaoyan Lu, Yitong Wang, Congyi Zhou
Title: Generative AI in Action: Field Experimental Evidence from Alibaba's Customer Service Operations
Abstract:
In collaboration with Alibaba, this study leverages a large-scale field experiment to assess the impact of a generative AI assistant on worker performance in e-commerce after-sales service. Human agents providing digital chat support were randomly assigned with access to a gen AI assistant that offered two core functions: diagnosis of customer issues and solution proposals, presented as text messages. Agents retained discretion to adopt, modify, or disregard AI-generated messages. To evaluate gen AI's impact, we estimate both the intention-to-treat (ITT) effect of gen AI access and the local average treatment effect (LATE) of gen AI usage. Results show that gen AI significantly improved service speed, measured by issue identification time and chat duration. Gen AI also improved subjective service quality reflected in customer ratings and dissatisfaction rates, but it had no significant effect on objective service quality indicated by customer retrial rates. The performance improvements stemmed not only from automation but also from changes in the dynamics of agent-customer interactions: agent communication became more informative and efficient, while customers experienced reduced communication burdens. Low performers achieved the greatest improvements in both service speed and quality, narrowing the performance gap. In contrast, top-performing agents showed little improvement in service speed but experienced declines in both subjective and objective service quality. Evidence suggests that this decline results from increased multitasking tendency, proxied by longer shift-away times across concurrent chats, which slowed customer responses and raised abandonment and retrial rates. These findings suggest that gen AI reshapes work, demanding tailored deployment strategies.

Authors:Atharva Naik, Shounok Kar, Varnika Sharma, Ashwin Rajadesingan, Koustuv Saha
Title: Sima AIunty: Caste Audit in LLM-Driven Matchmaking
Abstract:
Social and personal decisions in relational domains such as matchmaking are deeply entwined with cultural norms and historical hierarchies, and can potentially be shaped by algorithmic and AI-mediated assessments of compatibility, acceptance, and stability. In South Asian contexts, caste remains a central aspect of marital decision-making, yet little is known about how contemporary large language models (LLMs) reproduce or disrupt caste-based stratification in such settings. In this work, we conduct a controlled audit of caste bias in LLM-mediated matchmaking evaluations using real-world matrimonial profiles. We vary caste identity across Brahmin, Kshatriya, Vaishya, Shudra, and Dalit, and income across five buckets, and evaluate five LLM families (GPT, Gemini, Llama, Qwen, and BharatGPT). Models are prompted to assess profiles along dimensions of social acceptance, marital stability, and cultural compatibility. Our analysis reveals consistent hierarchical patterns across models: same-caste matches are rated most favorably, with average ratings up to 25% higher (on a 10-point scale) than inter-caste matches, which are further ordered according to traditional caste hierarchy. These findings highlight how existing caste hierarchies are reproduced in LLM decision-making and underscore the need for culturally grounded evaluation and intervention strategies in AI systems deployed in socially sensitive domains, where such systems risk reinforcing historical forms of exclusion.

Authors:Mohammad Amer Khalil, Raghad Nahas, Ahmad Nassar, Khloud Al Jallad
Title: SyriSign: A Parallel Corpus for Arabic Text to Syrian Arabic Sign Language Translation
Abstract:
Sign language is the primary approach of communication for the Deaf and Hard-of-Hearing (DHH) community. While there are numerous benchmarks for high-resource sign languages, low-resource languages like Arabic remain underrepresented. Currently, there is no publicly available dataset for Syrian Arabic Sign Language (SyArSL). To overcome this gap, we introduce SyriSign, a dataset comprising 1500 video samples across 150 unique lexical signs, designed for text-to-SyArSL translation tasks. This work aims to reduce communication barriers in Syria, as most news are delivered in spoken or written Arabic, which is often inaccessible to the deaf community. We evaluated SyriSign using three deep learning architectures: MotionCLIP for semantic motion generation, T2M-GPT for text-conditioned motion synthesis, and SignCLIP for bilingual embedding alignment. Experimental results indicate that while generative approaches show strong potential for sign representation, the limited dataset size constrains generalization performance. We will release SyriSign publicly, hoping it serves as an initial benchmark.

Authors:Kanak Gautam, Poorvi Bhatia, Parmit K. Chilana
Title: "I Just Need GPT to Refine My Prompts": Rethinking Onboarding and Help-Seeking with Generative 3D Modeling Tools
Abstract:
Learning to use feature-rich software is a persistent challenge, but generative AI tools promise to lower this barrier by replacing complex navigation with natural language prompts. We investigated how people approach prompt-based tools for 3D modeling in an observational study with 26 participants (14 casuals, 12 professionals). Consistent with earlier work, participants skipped tutorials and manuals, relying on trial and error. What differed in the generative AI context was how and why they sought support: the prompt box became the entry point for learning, collapsing onboarding into immediate action, while some casual users turned to external LLMs for prompts. Professionals used 3D expertise to refine iterations and critically evaluated outputs, often discarding models that did not meet their standards, whereas casual users settled for "good enough." We contribute empirical insights into how generative AI reshapes help-seeking, highlighting new practices of onboarding, recursive AI-for-AI support, and shifting expertise in interpreting outputs.

Authors:Alex Berke, Güliz Seray Tuncay, Michael Specter, Mihai Christodorescu
Title: Uncovering Relationships between Android Developers, User Privacy, and Developer Willingness to Reduce Fingerprinting Risks
Abstract:
The major mobile platforms, Android and iOS, have introduced changes that restrict user tracking to improve user privacy, yet apps continue to covertly track users via device fingerprinting. We study the opportunity to improve this dynamic with a case study on mobile fingerprinting that evaluates developers' perceptions of how well platforms protect user privacy and how developers perceive platform privacy interventions. Specifically, we study developers' willingness to make changes to protect users from fingerprinting and how developers consider trade-offs between user privacy and developer effort. We do this via a survey of 246 Android developers, presented with a hypothetical Android change that protects users from fingerprinting at the cost of additional developer effort. We find developers overwhelmingly (89%) support this change, even when they anticipate significant effort, yet prefer the change be optional versus required. Surprisingly, developers who use fingerprinting are six times more likely to support the change, despite being most impacted by it. We also find developers are most concerned about compliance and enforcement. In addition, our results show that while most rank iOS above Android for protecting user privacy, this distinction significantly reduces among developers very familiar with fingerprinting. Thus there is an important opportunity for platforms and developers to collaboratively build privacy protections, and we present actionable ways platforms can facilitate this.

Authors:Ekaterina Torubarova, Jura Miniota, Andre Pereira
Title: Users and Wizards in Conversations: How WoZ Interface Choices Define Human-Robot Interactions
Abstract:
In this paper, we investigated how the choice of a Wizard-of-Oz (WoZ) interface affects communication with a robot from both the user's and the wizard's perspective. In a conversational setting, we used three WoZ interfaces with varying levels of dialogue input and output restrictions: a) a restricted perception GUI that showed fixed-view video and ASR transcripts and let the wizard trigger pre-scripted utterances and gestures; b) an unrestricted perception GUI that added real-time audio from the participant and the robot c) a VR telepresence interface that streamed immersive stereo video and audio to the wizard and forwarded the wizard's spontaneous speech, gaze and facial expressions to the robot. We found that the interaction mediated by the VR interface was preferred by users in terms of robot features and perceived social presence. For the wizards, the VR condition turned out to be the most demanding but elicited a higher social connection with the users. VR interface also induced the most connected interaction in terms of inter-speaker gaps and overlaps, while Restricted GUI induced the least connected flow and the largest silences. Given these results, we argue for more WoZ studies using telepresence interfaces. These studies better reflect the robots of tomorrow and offer a promising path to automation based on naturalistic contextualized verbal and non-verbal behavioral data.

Authors:Neha Puri, Tim Dixon
Title: Designing AI for Real Users -- Accessibility Gaps in Retail AI Front-End
Abstract:
As AI becomes embedded in customer-facing systems, ethical scrutiny has largely focused on models, data, and governance. Far less attention has been paid to how AI is experienced through user-facing design. This commentary argues that many AI front-ends implicitly assume an 'ideal user body and mind', and that this becomes visible and ethically consequential when examined through the experiences of differently abled users. We explore this through retail AI front-ends for customer engagement - i.e., virtual assistants, virtual try-on systems, and hyper-personalised recommendations. Despite intuitive and inclusive framing, these systems embed interaction assumptions that marginalise users with vision, hearing, motor, cognitive, speech and sensory differences, as well as age-related variation in digital literacy and interaction norms. Drawing on practice-led insights, we argue that these failures persist not primarily due to technical limits, but due to the commercial, organisational, and procurement contexts in which AI front-ends are designed and deployed, where accessibility is rarely contractual. We propose front-end assurance as a practical complement to AI governance, aligning claims of intelligence and multimodality with the diversity of real users.

Authors:Huanxing Chen, Aditesh Kumar
Title: Synonymix: Unified Group Personas for Generative Simulations
Abstract:
Generative agent simulations operate at two scales: individual personas for character interaction, and population models for collective behavior analysis and intervention testing. We propose a third scale: meso-level simulation - interaction with group-level representations that retain grounding in rich individual experience. To enable this, we present Synonymix, a pipeline that constructs a "unigraph" from multiple life story personas via graph-based abstraction and merging, producing a queryable collective representation that can be explored for sensemaking or sampled for synthetic persona generation. Evaluating synthetic agents on General Social Survey items, we demonstrate behavioral signal preservation beyond demographic baselines (p<0.001, r=0.59) with demonstrable privacy guarantee (max source contribution <13%). We invite discussion on interaction modalities enabled by meso-level simulations, and whether "high-fidelity" personas can ever capture the texture of lived experience.

Authors:Jayrylle R. Jaylo, Mia Chastain, Alli Nemec, Christina S. Ouch, Yared Asefa, Marcus Li, Andrew Ung, Caleb M. Trujillo
Title: Visualization use in qualitative research reports: Evolving media types and competing epistemologies
Abstract:
Little is known about the representations used in qualitative research studies and why. A data-driven literature review was employed to explore the use of media in qualitative research reporting. A study by Verdinelli & Scagnoli (2013) was replicated and extended by conducting a content analysis of papers and figures published across three qualitative methods journals between 2020 and 2022. Figures were categorized by types (e.g., matrix-based, Venn diagrams, flowcharts) and documents were grouped by their epistemological stances (i.e., objectivist, subjectivist, or constructivist) before conducting a correspondence analysis and epistemic network analysis. Our findings suggest that (1) visual media have remained largely absent, (2) figure types have be come more diverse and (3) the use of figure types is likely independent of epistemological stance but provide opportunities for further exploration. These findings provide a foundation for impactful integration of data visualization tools to enhance communicati ve power of findings across disciplines.

Authors:Yizhe Li, Shixiao Wang, Jian K. Liu
Title: Copilot-Assisted Second-Thought Framework for Brain-to-Robot Hand Motion Decoding
Abstract:
Motor kinematics prediction (MKP) from electroencephalography (EEG) is an important research area for developing movement-related brain-computer interfaces (BCIs). While traditional methods often rely on convolutional neural networks (CNNs) or recurrent neural networks (RNNs), Transformer-based models have shown strong ability in modeling long sequential EEG data. In this study, we propose a CNN-attention hybrid model for decoding hand kinematics from EEG during grasp-and-lift tasks, achieving strong performance in within-subject experiments. We further extend this approach to EEG-EMG multimodal decoding, which yields substantially improved results. Within-subject tests achieve PCC values of 0.9854, 0.9946, and 0.9065 for the X, Y, and Z axes, respectively, computed on the midpoint trajectory between the thumb and index finger, while cross-subject tests result in 0.9643, 0.9795, and 0.5852. The decoded trajectories from both modalities are then used to control a Franka Panda robotic arm in a MuJoCo simulation. To enhance trajectory fidelity, we introduce a copilot framework that filters low-confidence decoded points using a motion-state-aware critic within a finite-state machine. This post-processing step improves the overall within-subject PCC of EEG-only decoding to 0.93 while excluding fewer than 20% of the data points.

Authors:Jakub Masłowski, Jarosław A. Chudziak
Title: Heterogeneous Debate Engine: Identity-Grounded Cognitive Architecture for Resilient LLM-Based Ethical Tutoring
Abstract:
Large Language Models (LLMs) are being increasingly used as autonomous agents in complex reasoning tasks, opening the niche for dialectical interactions. However, Multi-Agent systems implemented with systematically unconstrained systems systematically undergo semantic drift and logical deterioration and thus can hardly be used in providing ethical tutoring where a precise answer is required. Current simulation often tends to degenerate into dialectical stagnation, the agents degenerate into recursive concurrence or circular arguments. A critical challenge remains: how to enforce doctrinal fidelity without suppressing the generative flexibility required for dialectical reasoning? To address this niche, we contribute the Heterogeneous Debate Engine (HDE), a cognitive architecture that combines Identity-Grounded Retrieval-Augmented Generation (ID-RAG) for doctrinal fidelity and Heuristic Theory of Mind for strategic opponent modeling. Our evaluation shows that architectural heterogeneity is a crucial variable to stability: contrary doctrinal initializations (e.g., Deontology vs. Utilitarianism) have increased the Argument Complexity Scores of students by an order of magnitude, over baselines. These findings validate the effectiveness of ID-RAG and Heuristic ToM as architectural requirements in maintaining high-fidelity (adversarial) pedagogy.

Authors:Neelam Modi Jain, Dan J. Wang
Title: Voice-based debate with an AI adversary is associated with increased divergent ideation
Abstract:
Concerns that interacting with generative AI homogenizes human cognition are largely based on evidence from text-based interactions, potentially conflating the effects of AI systems with those of written communication. This study examines whether these patterns depend on communication modality rather than on AI itself. Analyzing 957 open-ended debates between university students and a knowledgeable AI adversary, we show that modality corresponds to distinct structural patterns in discourse. Consistent with classic distinctions between orality and literacy, spoken interactions are significantly more verbose and exhibit greater repetition of words and phrases than text-based exchanges. This redundancy, however, is functional: voice users rely on recurrent phrasing to maintain coherence while exploring a wider range of ideas. In contrast, text-based interaction favors concision and refinement but constrains conceptual breadth. These findings suggest that perceived cognitive limitations attributed to generative AI partly reflect the medium through which it is accessed.

Authors:Irvin Steve Cardenas, Marcus Anthony Arnett, Natalie Catherine Yeo, Lucky Sah, Jong-Hoon Kim
Title: ROSClaw: An OpenClaw ROS 2 Framework for Agentic Robot Control and Interaction
Abstract:
Foundation models can endow robots with open-ended reasoning, language understanding, and adaptive planning, yet connecting a model to a physical robot today requires bespoke integration that couples perception, actuation, and safety to a single model and platform. We present ROSClaw, a model-agnostic executive layer that integrates the OpenClaw agent runtime with ROS 2, enabling any foundation model to perceive, reason about, and act on any ROS-enabled robot through (i) dynamic capability discovery with standardized affordance injection, (ii) multimodal observation normalization, (iii) pre-execution action validation within a configurable safety envelope, and (iv) structured audit logging. Swapping model backends or robot platforms is a configuration change; tool schemas, safety enforcement, and provenance logging remain invariant. We deploy ROSClaw on three platforms (wheeled, quadruped, humanoid) with four foundation-model backends. Under this controlled substrate, models exhibit up to 4.8 x differences in out-of-policy action proposal rates (3.4 x among frontier models alone) and produce qualitatively distinct physical behaviors from identical commands. A cross-framework parity protocol against ROSA confirms that executive-layer design, not just prompt wording, significantly affects both task completion and safety behavior, establishing ROSClaw as both practical agentic-robot infrastructure and a reproducible measurement instrument for embodied AI.

Authors:Yinghao Wang, Cheng Wang
Title: The Observability Gap: Why Output-Level Human Feedback Fails for LLM Coding Agents
Abstract:
Large language model (LLM) multi-agent coding systems typically fix agent capabilities at design time. We study an alternative setting, earned autonomy, in which a coding agent starts with zero pre-defined functions and incrementally builds a reusable function library through lightweight human feedback on visual output alone. We evaluate this setup in a Blender-based 3D scene generation task requiring both spatial reasoning and programmatic geometric control. Although the agent rediscovered core utility functions comparable to a human reference implementation, it achieved 0% full-scene success under output-only feedback across multiple instruction granularities, where success required satisfying object completeness, ground contact, collision avoidance, and scale plausibility simultaneously. Our analysis identifies a structural observability gap: bugs originate in code logic and execution state, while human evaluation occurs only at the output layer, and the many-to-one mapping from internal states to visible outcomes prevents symptom-level feedback from reliably identifying root causes. This mismatch leads to persistent failure mode oscillation rather than convergence. A diagnostic intervention that injected minimal code-level knowledge restored convergence, strongly supporting the interpretation that the main bottleneck lies in feedback observability rather than programming competence. We formalize this phenomenon as a feedback paradox in domains with deep causal chains between internal code logic and perceptual outcomes, and argue that effective human-agent collaboration in such settings requires intermediate observability beyond output-only evaluation.

Authors:Ruoxi Shang, Dan Marshall, Edward Cutrell, Denae Ford
Title: Mimetic Alignment with ASPECT: Evaluation of AI-inferred Personal Profiles
Abstract:
AI agents that communicate on behalf of individuals need to capture how each person actually communicates, yet current approaches either require costly per-person fine-tuning, produce generic outputs from shallow persona descriptions, or optimize preferences without modeling communication style. We present ASPECT (Automated Social Psychometric Evaluation of Communication Traits), a pipeline that directs LLMs to assess constructs from a validated communication scale against behavioral evidence from workplace data, without per-person training. In a case study with 20 participants (1,840 paired item ratings, 600 scenario evaluations), ASPECT-generated profiles achieved moderate alignment with self-assessments, and ASPECT-generated responses were preferred over generic and self-report baselines on aggregate, with substantial variation across individuals and scenarios. During the profile review phase, linked evidence helped participants identify mischaracterizations, recalibrate their own self-ratings, and negotiate context-appropriate representations. We discuss implications for building inspectable, individually scoped communication profiles that let individuals control how agents represent them at work.

Authors:Boyin Yang, Jun Zhao
Title: Exploring a Design Framework for Children's Agency through Participatory Design
Abstract:
Children's agency plays a critical role in shaping children's autonomy, participation, and well-being in their interactions with digital systems, particularly in emerging child-AI contexts. However, how designers currently understand and reason about children's agency in practice remains underexplored. In this paper, we examine designers's engagement with children's agency through a participatory workshop in which we introduce a design-for-agency framework that supports designers externalising the consideration of agency in their design contexts. We find that while participants are committed to implementing ethical AI systems for children, they often struggle to understand why agency matters and how it can be operationalised in practice. Our agency design framework provided designers with a structured way to translate implicit, experience-based judgments into explicit articulation of agency trade-offs while acknowledging the associated design complexity. We conclude by offering initial insights into supporting designers' reasoning about children's agency and outlining directions for future research.

Authors:Md Touhidul Islam, Mahir Akgun, Syed Billah
Title: Shaping Credibility Judgments in Human-GenAI Partnership via Weaker LLMs: A Transactive Memory Perspective on AI Literacy
Abstract:
Generative AI (GenAI) is increasingly used as a knowledge partner in higher education, raising the need for instructional designs that emphasize AI literacy practices such as evaluating output credibility and maintaining human accountability. Existing AI literacy frameworks focus more on what learners should do than on how these practices are enacted in routine student-GenAI collaboration. We address this gap by framing student-GenAI interaction as a transactive memory partnership, where credibility regulates reliance and verification. To make this process visible during coursework, we used a weaker large language model (LLM): small enough to run on most students' computers during class, helpful enough to support learning, but not so capable that it removes the need for verification. In an undergraduate STEM course, students were randomly assigned to one of three conditions across repeated activities: reflection-first (think first, then consult AI), verification-required (use AI, then evaluate the output), or control (unrestricted use). Students completed a transactive memory survey at three time points (N = 42). Weighted credibility diverged by condition over time. ANCOVA controlling for baseline credibility showed a condition effect at mid-semester, F(2, 38) = 4.02, p = .026, partial eta squared = .175, and a stronger effect at post-intervention, F(2, 38) = 5.48, p = .008, partial eta squared = .224; adjusted means were lowest in reflection-first, intermediate in verification-required, and highest in control. Parallel analyses of specialization and coordination were not significant. These findings suggest that workflow sequencing, deliberate use of weaker LLMs, and accountability cues embedded in assignment instructions can recalibrate students' credibility judgments in GenAI use, with reflection-first producing the strongest downward shift in reliance.

Authors:Minsun Kim, Dawon Lee, Junyong Noh
Title: ComVi: Context-Aware Optimized Comment Display in Video Playback
Abstract:
On general video-sharing platforms like YouTube, comments are displayed independently of video playback. As viewers often read comments while watching a video, they may encounter ones referring to moments unrelated to the current scene, which can reveal spoilers and disrupt immersion. To address this problem, we present ComVi, a novel system that displays comments at contextually relevant moments, enabling viewers to see time-synchronized comments and video content together. We first map all comments to relevant video timestamps by computing audio-visual correlation, then construct the comment sequence through an optimization that considers temporal relevance, popularity (number of likes), and display duration for comfortable reading. In a user study, ComVi provided a significantly more engaging experience than conventional video interfaces (i.e., YouTube and Danmaku), with 71.9% of participants selecting ComVi as their most preferred interface.

Authors:Harshitha Voleti, Charalambos Poullis
Title: Designing Fatigue-Aware VR Interfaces via Biomechanical Models
Abstract:
Prolonged mid-air interaction in virtual reality (VR) causes arm fatigue and discomfort, negatively affecting user experience. Incorporating ergonomic considerations into VR user interface (UI) design typically requires extensive human-in-the-loop evaluation. Although biomechanical models have been used to simulate human behavior in HCI tasks, their application as surrogate users for ergonomic VR UI design remains underexplored. We propose a hierarchical reinforcement learning framework that leverages biomechanical user models to evaluate and optimize VR interfaces for mid-air interaction. A motion agent is trained to perform button-press tasks in VR under sequential conditions, using realistic movement strategies and estimating muscle-level effort via a validated three-compartment control with recovery (3CC-r) fatigue model. The simulated fatigue output serves as feedback for a UI agent that optimizes UI element layout via reinforcement learning (RL) to minimize fatigue. We compare the RL-optimized layout against a manually-designed centered baseline and a Bayesian optimized baseline. Results show that fatigue trends from the biomechanical model align with human user data. Moreover, the RL-optimized layout using simulated fatigue feedback produced significantly lower perceived fatigue in a follow-up human study. We further demonstrate the framework's extensibility via a simulated case study on longer sequential tasks with non-uniform interaction frequencies. To our knowledge, this is the first work using simulated biomechanical muscle fatigue as a direct optimization signal for VR UI layout design. Our findings highlight the potential of biomechanical user models as effective surrogate tools for ergonomic VR interface design, enabling efficient early-stage iteration with less reliance on extensive human participation.

Authors:Mohammed Basheikh, Rujiravee Kongdee, Hood Thabit, Bijan Parsia, Sarah Clinch, Simon Harper
Title: Clinician Perspectives on Type 1 Diabetes Guidelines and Glucose Data Interpretation
Abstract:
This study explored healthcare professionals' perspectives on the management of Type 1 Diabetes Mellitus (T1DM) through a two-part questionnaire. The first part examined how clinicians prioritise and apply current clinical guidelines, including the relative importance assigned to different aspects of T1DM management. The second part investigated clinicians' perceptions of patients' ability to interpret data from the glucose monitoring devices and to make appropriate treatment decisions. An online questionnaire was completed by 19 healthcare professionals working in diabetes-related roles in the United Kingdom. The findings revealed that blood glucose management is prioritised within clinical guidance and that advice is frequently tailored to individual patient needs. Additionally, clinicians generally perceive that data presented in glucose monitoring devices is easy for patients to interpret and based on these data, they believe that patients occasionally make correct treatment decisions.

Authors:Moiz Sadiq Awan, Muhammad Haris Noor, Muhammad Salman Munaf
Title: Beyond Benchmarks: How Users Evaluate AI Chat Assistants
Abstract:
Automated benchmarks dominate the evaluation of large language models, yet no systematic study has compared user satisfaction, adoption motivations, and frustrations across competing platforms using a consistent instrument. We address this gap with a cross-platform survey of 388 active AI chat users, comparing satisfaction, adoption drivers, use case performance, and qualitative frustrations across seven major platforms: ChatGPT, Claude, Gemini, DeepSeek, Grok, Mistral, and Llama. Three broad findings emerge. First, the top three platforms (Claude, ChatGPT, and DeepSeek) receive statistically indistinguishable satisfaction ratings despite vast differences in funding, team size, and benchmark performance. Second, users treat these tools as interchangeable utilities rather than sticky ecosystems: over 80% use two or more platforms, and switching costs are negligible. Third, each platform attracts users for different reasons: ChatGPT for its interface, Claude for answer quality, DeepSeek through word-of-mouth, and Grok for its content policy, suggesting that specialization, not generalist dominance, sustains competition. Hallucination and content filtering remain the most common frustrations across all platforms. These findings offer an early empirical baseline for a market that benchmarks alone cannot characterize, and point toward competitive plurality rather than winner-take-all consolidation among engaged users.

Authors:Domenique Zipperling, Lukas Schmidt, Benedikt Hahn, Niklas Kühl, Steven Kimbrough
Title: Integrating Causal Machine Learning into Clinical Decision Support Systems: Insights from Literature and Practice
Abstract:
Current clinical decision support systems (CDSSs) typically base their predictions on correlation, not causation. In recent years, causal machine learning (ML) has emerged as a promising way to improve decision-making with CDSSs by offering interpretable, treatment-specific reasoning. However, existing research often emphasizes model development rather than designing clinician-facing interfaces. To address this gap, we investigated how CDSSs based on causal ML should be designed to effectively support collaborative clinical decision-making. Using a design science research methodology, we conducted a structured literature review and interviewed experienced physicians. From these, we derived eight empirically grounded design requirements, developed seven design principles, and proposed nine practical design features. Our results establish guidance for designing CDSSs that deliver causal insights, integrate seamlessly into clinical workflows, and support trust, usability, and human-AI collaboration. We also reveal tensions around automation, responsibility, and regulation, highlighting the need for an adaptive certification process for ML-based medical products.

Authors:Soonho Kwon, Dong Whi Yoo, Younah Kang
Title: AI Fortune-Teller: Juxtaposing Shaman and AI to Reveal Human Agency in the Age of AI
Abstract:
This speculative video piece showcases participants interacting with a career counseling AI agent, unaware that the responses were actually derived from the fortunetelling of a mudang (a Korean traditional shaman). Our work captures this deception and documents participants' reactions, showcasing shifts in their initial perceptions of the agent's advice following the reveal. Notably, even after learning that the advice came from a mudang rather than an AI, participants did not change their initial attitudes toward the advice they received. This raises questions about the perceived importance of AI's explainability and accuracy. By juxtaposing scientific and pre-scientific approaches, we aim to provoke discussions on human agency in the age of AI. We argue that, regardless of AI's advancements, we continue to navigate life in fundamentally human ways -- wonderfully messy and uncertain.

Authors:Teerthaa Parakh, Karen M. Feigh
Title: Biased Error Attribution in Multi-Agent Human-AI Systems Under Delayed Feedback
Abstract:
Human decision-making is strongly influenced by cognitive biases, particularly under conditions of uncertainty and risk. While prior work has examined bias in single-step decisions with immediate outcomes and in human interaction with a single autonomous agent, comparatively little attention has been paid to decision-making under delayed outcomes involving multiple AI agents, where decisions at each step affect subsequent states. In this work, we study how delayed outcomes shape decision-making and responsibility attribution in a multi-agent human-AI task. Using a controlled game-based experiment, we analyze how participants adjust their behavior following positive and negative outcomes. We observe asymmetric responses to gains and losses, with stronger corrective adjustments after negative outcomes. Importantly, participants often fail to correctly identify the actions that caused failure and misattribute responsibility across AI agents, leading to systematic revisions of decisions that are weakly related to the underlying causes of poor performance. We refer to this phenomenon as a form of attribution bias, manifested as biased error attribution under delayed feedback. Our findings highlight how cognitive biases can be amplified in human-AI systems with delayed outcomes and multiple autonomous agents, underscoring the need for decision-support systems that better support causal understanding and learning over time.

Authors:Alexandre De Masi, Sergio Manzano, Johan N. Siebert, Frederic Ehrler
Title: Who Is in the Room? Stakeholder Perspectives on AI Recording in Pediatric Emergency Care
Abstract:
Artificial intelligence systems that record voice and video during pediatric emergencies are emerging as human-computer interaction (HCI) technologies with direct implications for clinical work, promising improvements in documentation, team performance, and post-event debriefing. Yet the perspectives of those most affected, including clinicians, parents, and child patients, remain largely absent from the design and governance of these technologies. This position paper argues that this has direct consequences for the legitimacy and effectiveness of these systems. We examine four areas where these missing perspectives prove consequential (consent, emotional impact, surveillance dynamics, and participatory governance) and propose four positions for reorienting AI recording in pediatric emergency care toward stakeholder-centered HCI inquiry.

Authors:Adrian Sauter, Mona Schirmer
Title: Between Rules and Reality: On the Context Sensitivity of LLM Moral Judgment
Abstract:
A human's moral decision depends heavily on the context. Yet research on LLM morality has largely studied fixed scenarios. We address this gap by introducing Contextual MoralChoice, a dataset of moral dilemmas with systematic contextual variations known from moral psychology to shift human judgment: consequentialist, emotional, and relational. Evaluating 22 LLMs, we find that nearly all models are context-sensitive, shifting their judgments toward rule-violating behavior. Comparing with a human survey, we find that models and humans are most triggered by different contextual variations, and that a model aligned with human judgments in the base case is not necessarily aligned in its contextual sensitivity. This raises the question of controlling contextual sensitivity, which we address with an activation steering approach that can reliably increase or decrease a model's contextual sensitivity.

Authors:Michael Klesel, Uwe Messer
Title: Good for the Planet, Bad for Me? Intended and Unintended Consequences of AI Energy Consumption Disclosure
Abstract:
To address the high energy consumption of artificial intelligence, energy consumption disclosure (ECD) has been proposed to steer users toward more sustainable practices, such as choosing efficient small language models (SLMs) over large language models (LLMs). This presents a performance-sustainability trade-off for users. In an experiment with 365 participants, we explore the impact of ECD and the perceptual and behavioral consequences of choosing an SLM over an LLM. Our findings reveal that ECD is a highly effective measure to nudge individuals toward a pro-environmental choice, increasing the odds of choosing an energy efficient SLM over an LLM by more than 12. Interestingly, this choice did not significantly impact subsequent behavior, as individuals who selected an SLM and those who selected an LLM demonstrated similar prompt behavior. Nevertheless, the choice created a perceptual bias. A placebo effect emerged, with individuals who selected the "eco-friendly" SLM reporting significantly lower satisfaction and perceived quality. These results highlight the double-edged nature of ECD, which holds critical implications for the design of sustainable human-computer interactions.

Authors:Wanying Mo, Jijia Lai, Xiaoming Wang
Title: IntentWeave: A Progressive Entry Ladder for Multi-Surface Browser Agents in Cloud Portals
Abstract:
Browser agents built on LLMs can act in web interfaces, yet most remain confined to a single chat surface (e.g., a sidebar). This mismatch with real browsing can increase context-switching and reduce user control. We introduce \textbf{IntentWeave}, a design space of ten spatial paradigms for embedding agentic assistance across a browser, organized as a progressive entry ladder from micro-interventions to dedicated workspaces. We implement IntentWeave as a browser-extension prototype on the Alibaba Cloud website and compare three entry strategies in a within-subjects study (N=16). Workspace-heavy strategies reduced completion time but lowered perceived control; micro-only strategies preserved control but were often insufficient; a mixed sidecar approach achieved the highest satisfaction. We conclude with guidance for escalating and retreating agent surfaces without disrupting user agency.

Authors:Yamato Miyatake, Parinya Punpongsanon
Title: TastePrint: A 3D Food Printing System for Layer-wise Taste Distribution via Airbrushed Liquid Seasoning
Abstract:
3D food printing enables the customization of food shapes and textures, but typically produces uniform taste profiles due to the limited diversity of printable materials. We present TastePrint, a 3D food printing system that achieves layer-wise spatial taste distribution by dynamically applying liquid seasonings with a programmable airbrush during fabrication. The system integrates (1) a graphical user interface (GUI) that allows users to import 3D models, slice them into layers, and specify spray positions and intensities for each layer, and (2) a customized 3D food printer equipped with a multi-nozzle spray mechanism. We evaluated the system through technical experiments quantifying spray resolution and deposition accuracy, together with an exploratory usability study involving three home cooks designing personalized taste patterns. The spray-resolution model achieved R2 = 0.86, the spray-amount model achieved R2 = 0.99, and participants completed the design task in approximately 15 min on average. These results indicate that TastePrint can control seasoning placement and quantity with good repeatability while supporting exploratory taste-design workflows. This work establishes a technical foundation for decoupling food geometry from taste design and motivates future sensory studies on personalized, multisensory food fabrication.

Authors:Mehul Parmar, Chaklam Silpasuwanchai
Title: From Overload to Convergence: Supporting Multi-Issue Human-AI Negotiation with Bayesian Visualization
Abstract:
As AI systems increasingly mediate negotiations, understanding how the number of negotiated issues impacts human performance is crucial for maintaining human agency. We designed a human-AI negotiation case study in a realistic property rental scenario, varying the number of negotiated issues; empirical findings show that without support, performance stays stable up to three issues but declines as additional issues increase cognitive load. To address this, we introduce a novel uncertainty-based visualization driven by Bayesian estimation of agreement probability. It shows how the space of mutually acceptable agreements narrows as negotiation progresses, helping users identify promising options. In a within-subjects experiment (N=32), it improved human outcomes and efficiency, preserved human control, and avoided redistributing value. Our findings surface practical limits on the complexity people can manage in human-AI negotiation, advance theory on human performance in complex negotiations, and offer validated design guidance for interactive systems.

Authors:Olivia Yan Huang, Monika Stodolska, Sharifa Sultana
Title: Emotional Support with Conversational AI: Talking to Machines About Life
Abstract:
AI companion chatbots are increasingly used for emotional support, with prior work in the domain predominantly documenting their mixed psychosocial impacts, including both increased emotional expression and heightened loneliness. However, most existing research primarily focuses on outcome-level effects, offering limited insight into how emotional support is produced through interaction. In this paper, we examine emotional support as an interactional and socially situated process. Drawing on qualitative analysis of Reddit discussions, we analyze how users engage with AI companions and how these interactions are interpreted and contested within online communities. We show that emotional support is coconstructed through conversational mechanisms such as validation, reflective prompting, and companionship, while also giving rise to tensions including support versus dependency, validation versus delusion, and accessibility versus harm. Importantly, support extends beyond human AI interaction and is shaped by community responses that legitimize or challenge AI-mediated care. Hence, we reconceptualize AI emotional support as a negotiated socio-technical process and derive implications for the design of responsible, context-sensitive AI systems.

Authors:Greg Nyilasy, Abraham Ryan Ade Putra Hito, Jennifer Overbeck, Brock Bastian, Darren W. Dahl
Title: Do Consumers Accept AIs as Moral Compliance Agents?
Abstract:
Consumers are generally resistant to Artificial Intelligence (AI) involvement in moral decision-making, perceiving moral agency as requiring uniquely human traits. This research investigates whether consumers might instead accept AIs in the role of moral compliance, where AI upholds pre-existing moral norms without exercising subjective discretion. Across five studies this research shows that consumers evaluate AI more positively than human agents in moral compliance roles. The findings reveal that this preference arises from inferences of AI's lack of ulterior motives, which are often attributed to human agents. While previous studies have focused on AI as a decision-maker, this work demonstrates the critical role of upholding pre-existing rules, a role in which AI is perceived to excel. These findings contribute to understanding consumer acceptance of moral AI and provide actionable insights for organizations seeking to leverage AI in ethical oversight. By positioning AI as a moral compliance agent, companies can address consumer skepticism, enhance trust, and improve perceptions of corporate ethicality.

Authors:Tanya Rudberg Selin, Danielle Unéus, Søren Knudsen
Title: "Chasing Shadows": Understanding Personal Data Externalization and Self-Tracking for Neurodivergent Individuals
Abstract:
We examine how neurodivergent individuals experience creating, interacting with, and reflecting on personal data about masking. Although self-tracking is often framed as enabling self-insight, this is rarely our experience as neurodivergent individuals and researchers. To better understand this disconnect, we conducted a two-phase qualitative study. First, a workshop where six participants with autism and/or ADHD crafted visual representations of masking experiences. Then, three participants continued by designing and using personalized self-tracking focused on unmasking over two weeks. Using reflexive thematic analysis of activities and interviews, we find that self-tracking imposes substantial interpretive and emotional demands, shaped by context-dependencies that challenge assumptions in self-tracking. We also find that facilitated sharing of experiences might validate emotional responses and support reflection. We identify three emotional dimensions that shape engagement with personal data in a working model of emotion in self-tracking, and discuss implications for designing self-tracking and reflective practices that incorporate peer support and better account for context and emotional labor.

Authors:Dorottya Demszky, Christopher Mah, Helen Higgins
Title: Practitioner Voices Summit: How Teachers Evaluate AI Tools through Deliberative Sensemaking
Abstract:
Teachers face growing pressure to integrate AI tools into their classrooms, yet are rarely positioned as agentic decision-makers in this process. Understanding the criteria teachers use to evaluate AI tools, and the conditions that support such reasoning, is essential for responsible AI integration. We address this gap through a two-day national summit in which 61 U.S. K-12 mathematics educators developed personal rubrics for evaluating AI classroom tools. The summit was designed to support deliberative sensemaking, a process we conceptualize by integrating Technological Pedagogical Content Knowledge (TPACK) with deliberative agency. Teachers generated over 200 criteria - initial articulations spanning four higher-order themes (Practical, Equitable, Flexible, and Rigorous) - that addressed both AI outputs and the process of using AI. Criteria contained productive tensions (e.g., personalization versus fairness, adaptability versus efficiency), and the vast majority framed AI as an assistant rather than a coaching tool for professional learning. Analysis of surveys, interviews, and summit discussions revealed five mechanisms supporting deliberative sensemaking: time and space for deliberation, artifact-centered sensemaking, collaborative reflection through diverse viewpoints, knowledge-building, and psychological safety. Across these mechanisms, TPACK and agency operated in a mutually reinforcing cycle - knowledge-building enabled more grounded evaluative judgment, while the act of constructing criteria deepened teachers' understanding of tools. We discuss implications for edtech developers seeking practitioner input, school leaders making adoption decisions, educators and professional learning designers, and researchers working to elicit teachers' evaluative reasoning about rapidly evolving technologies.

Authors:Taizhou Chen, Kai Chen, Xingyu Liu, Pingchuan Ke, Zhida Sun
Title: BadminSense: Enabling Fine-Grained Badminton Stroke Evaluation on a Single Smartwatch
Abstract:
Evaluating badminton performance often requires expert coaching, which is rarely accessible for amateur players. We present BadminSense, a smartwatch-based system for fine-grained badminton performance analysis using wearable sensing. Through interviews with experienced badminton players, we identified four system design requirements with three implementation insights that guide the development of BadminSense. We then collected a badminton strokes dataset on 12 experienced badminton amateurs and annotated it with fine-grained labels, including stroke type, expert-assessed stroke rating, and shuttle impact location. Built on this dataset, BadminSense segments and classifies strokes, predicts stroke quality, and estimates shuttle impact location using vibration signal from an off-the-shelf smartwatch. Our evaluations show that BadminSense achieves a stroke classification accuracy of 91.43%, an average quality rating error of 0.438, and an average impact location estimation error of 12.9%. A real-world usability study further demonstrates BadminSense's potential to provide reliable and meaningful support for daily badminton practice.

Authors:Kuangzhe Xu, Yu Shen, Longjie Yan, Yinghui Ren
Title: Cognitive Agency Surrender: Defending Epistemic Sovereignty via Scaffolded AI Friction
Abstract:
The proliferation of Generative Artificial Intelligence has transformed benign cognitive offloading into a systemic risk of cognitive agency surrender. Driven by the commercial dogma of "zero-friction" design, highly fluent AI interfaces actively exploit human cognitive miserliness, prematurely satisfying the need for cognitive closure and inducing severe automation bias. To empirically quantify this epistemic erosion, we deployed a zero-shot semantic classification pipeline ($τ=0.7$) on 1,223 high-confidence AI-HCI papers from 2023 to early 2026. Our analysis reveals an escalating "agentic takeover": a brief 2025 surge in research defending human epistemic sovereignty (19.1%) was abruptly suppressed in early 2026 (13.1%) by an explosive shift toward optimizing autonomous machine agents (19.6%), while frictionless usability maintained a structural hegemony (67.3%). To dismantle this trap, we theorize "Scaffolded Cognitive Friction," repurposing Multi-Agent Systems (MAS) as explicit cognitive forcing functions (e.g., computational Devil's Advocates) to inject germane epistemic tension and disrupt heuristic execution. Furthermore, we outline a multimodal computational phenotyping agenda -- integrating gaze transition entropy, task-evoked pupillometry, fNIRS, and Hierarchical Drift Diffusion Modeling (HDDM) -- to mathematically decouple decision outcomes from cognitive effort. Ultimately, intentionally designed friction is not merely a psychological intervention, but a foundational technical prerequisite for enforcing global AI governance and preserving societal cognitive resilience.

Authors:Evangelos Karapanos, Ruben Gouveia
Title: Contrasting Perspectives on Engagement Across Three Digital Behavior Change Interventions
Abstract:
We contrast three perspectives on engagement from three projects on the design of Digital Behavior Change Interventions (DBCIs), all conducted as part of the PhD thesis of the second author. We provide a reflection on this work with respect to engagement, discussing the motivation, the assumed effects of engagement, the measures of engagements and key insights of each project, as the well as the strategies employed to increase engagement.

Authors:Zaid Ahmed, Omar A. Khan, Hyeongil Nam, Kangsoo Kim
Title: Exploring Experiential Differences Between Virtual and Physical Memory-Linked Objects in Extended Reality
Abstract:
Extended Reality (XR) enables immersive capture and re-experience of personal memories, yet how interface representations shape these experiences remains underexplored. We examine how users relive and share XR memories through three interaction approaches: (1) physical memory-linked objects, (2) virtual memory-linked objects, and (3) a conventional virtual gallery interface. In a within-subjects study (N=24, 12 pairs), participants captured shared experiences using 360° video and later accessed and shared these memories across the three interfaces. We analyzed open-ended qualitative responses focusing on perceived value, enjoyment, usability, emotional attachment, and social connection. The findings reveal trade-offs: physical objects fostered stronger social connection and conversation through tangible exchange; virtual objects balanced engagement and usability; and the gallery interface was efficient but less personal. These results suggest that object-based representations, physical and virtual, support key social dimensions of XR memory experiences, offering lessons for designing future systems that emphasize shared meaning and interpersonal connection.

Authors:Mulong Xie, Yang Xie
Title: Software as Content: Dynamic Applications as the Human-Agent Interaction Layer
Abstract:
Chat-based natural language interfaces have emerged as the dominant paradigm for human-agent interaction, yet they fundamentally constrain engagement with structured information and complex tasks. We identify three inherent limitations: the mismatch between structured data and linear text, the high entropy of unconstrained natural language input, and the lack of persistent, evolving interaction state. We introduce Software as Content (SaC), a paradigm in which dynamically generated agentic applications serve as the primary medium of human-agent interaction. Rather than communicating through sequential text exchange, this medium renders task-specific interfaces that present structured information and expose actionable affordances through which users iteratively guide agent behavior without relying solely on language. These interfaces persist and evolve across interaction cycles, transforming from transient responses into a shared, stateful interaction layer that progressively converges toward personalized, task-specific software. We formalize SaC through a human-agent-environment interaction model, derive design principles for generating and evolving agentic applications, and present a system architecture that operationalizes the paradigm. We evaluate across representative tasks of selection, exploration, and execution, demonstrating technical viability and expressive range, while identifying boundary conditions under which natural language remains preferable. By reframing interfaces as dynamically generated software artifacts, SaC opens a new design space for human-AI interaction, positioning dynamic software as a concrete and tractable research object.

Authors:Emmanuel Apaaboah, Bernard Opoku, the GhanaHousePlanner Research Team
Title: A Parametric, Geometry-Aware Residential Construction Cost Estimation Model for Ghana: Design, Validation, and the "Completeness Gap" in Informal Contractor Quotes
Abstract:
Ghana faces a residential housing deficit of two million units. A key driver of project failure is the "completeness gap", a systematic discrepancy between informal contractor quotes and actual costs. Informal estimates often use flat per-square-metre pricing that omits essential structural and finishing components, leading to project abandonment mid-construction. This paper validates a parametric, geometry-aware cost estimation model via the GhanaHousePlanner (GHP) platform. The model provides self-builders with itemised bills of quantities (BoQ) reflecting the true cost of code-compliant construction in Ghana. The GHP model uses seven calculation modules: foundation, blockwork, cement, structural steel, roofing, plumbing, and electrical. It features a primary geometry-based mode and a formula-based fallback. Accuracy was tested using three case studies (75, 120, and 200 per-square-metre homes) benchmarked against February 2026 market prices in Greater Accra.GHP estimates (GHS 519,000 to GHS 1,398,000) were 29 to 98 per cent higher than typical informal quotes. This gap arises from the omission of structural steel (Y16 rebar), plastering, floor screed, and full services in informal estimates. Findings confirm that per-square-metre rates rarely cover the requirements for a fully completed, code-compliant building. The GHP model offers a transparent, auditable alternative to informal quoting. Despite material price volatility and labour market informality, the tool provides a framework for improving cost predictability and reducing project stalling in the sub-Saharan African housing market.

Authors:Pranav Hemanth, Sampriti Saha
Title: Conversation Tree Architecture: A Structured Framework for Context-Aware Multi-Branch LLM Conversations
Abstract:
Large language models (LLMs) are increasingly deployed for extended, multi-topic conversations, yet the flat, append-only structure of current conversation interfaces introduces a fundamental limitation: all context accumulates in a single unbounded window, causing topically distinct threads to bleed into one another and progressively degrade response quality. We term this failure mode logical context poisoning. In this paper, we introduce the Conversation Tree Architecture (CTA), a hierarchical framework that organizes LLM conversations as trees of discrete, context-isolated nodes. Each node maintains its own local context window; structured mechanisms govern how context flows between parent and child nodes, downstream on branch creation and upstream on branch deletion. We additionally introduce volatile nodes, transient branches whose local context must be selectively merged upward or permanently discarded before purging. We formalize the architecture's primitives, characterize the open design problems in context flow, relate our framework to prior work in LLM memory management, and describe a working prototype implementation. The CTA provides a principled foundation for structured conversational context management and extends naturally to multi-agent settings.

Authors:Joyce S. Y. Lau, Zihui Jing, Clement P. L. Chan, Louis C. F. Ng, Wing Chin Kam, Kwan Yin Lam, Ho Wui Cheung, Ho Lam Lau, Junpei Zhong
Title: Development and Usability Study of Older Adults in Motion-Captured Serious Game Incorporating Olfactory Stimulations
Abstract:
SENSO is a motion-captured virtual reality serious game utilizing multisensory (visual, auditory, olfactory) stimuli to enhance cognitive and motor functions in older adults. This study evaluated its usability and performance among healthy seniors to establish normative baselines for predicting mild cognitive impairment (MCI) and dementia risk. Methods: Forty-one older adults (aged 60 and older) completed three teahouse-themed tasks: Dim Sum (selection and placement), Steamer (timing and sequencing), and Cashier (counting and transactions). Usability was assessed via the System Usability Scale (SUS), alongside age-stratified performance metrics (accuracy, completion time) from system logs. Results: Usability was rated highly (mean SUS score = 82/100). Performance varied by task complexity: the Dim Sum task showed no age-related differences, the Cashier task showed moderate decline trends, and the Steamer task revealed significant age-related declines due to higher cognitive and motor demands. Conclusion: SENSO demonstrates strong usability and provides effective baselines for cognitive assessment. Adapting complex tasks - such as enhancing olfactory cues in the Steamer game - can optimize its therapeutic efficacy as a non-pharmacological intervention for cognitive preservation.

Authors:Kazi Ababil Azam, Imtiaz Karim, Dipto Das
Title: Tracing Users' Privacy Concerns Across the Lifecycle of a Romantic AI Companion
Abstract:
Romantic AI chatbots have quickly attracted users, but their emotional use raises concerns about privacy and safety. As people turn to these systems for intimacy, comfort, and emotionally significant interaction, they often disclose highly sensitive information. Yet the privacy implications of such disclosure remain poorly understood in platforms shaped by persistence, intimacy, and opaque data practices. In this paper, we examine public Reddit discussions about privacy in romantic AI chatbot ecosystems through a lifecycle lens. Analyzing 2,909 posts from 79 subreddits collected over one year, we identify four recurring patterns: disproportionate entry requirements, intensified sensitivity in intimate use, interpretive uncertainty and perceived surveillance, and irreversibility, persistence, and user burden. We show that privacy in romantic AI is best understood as an evolving socio-technical governance problem spanning access, disclosure, interpretation, retention, and exit. These findings highlight the need for privacy and safety governance in romantic AI that is staged across the lifecycle of use, supports meaningful reversibility, and accounts for the emotional vulnerability of intimate human-AI interaction.

Authors:Minh Triet Pham, Quynh Chi Dang, Le Nhat Tan
Title: Deep Attention-based Sequential Ensemble Learning for BLE-Based Indoor Localization in Care Facilities
Abstract:
Indoor localization systems in care facilities enable optimization of staff allocation, workload management, and quality of care delivery. Traditional machine learning approaches to Bluetooth Low Energy (BLE)-based localization treat each temporal measurement as an independent observation, fundamentally limiting their performance. To address this limitation, this paper introduces Deep Attention-based Sequential Ensemble Learning (DASEL), a novel framework that reconceptualizes indoor localization as a sequential learning problem. The framework integrates frequency-based feature engineering, bidirectional GRU networks with attention mechanisms, multi-directional sliding windows, and confidence-weighted temporal smoothing to capture human movement trajectories. Evaluated on real-world data from a care facility using 4-fold temporal cross-validation, DASEL achieves a macro F1 score of 0.4438, representing a 53.1% improvement over the best traditional baseline (0.2898).

Authors:Judit Martinez Moreno, Markus Christen, Abraham Bernstein
Title: Towards an AI Buddy for every University Student? Exploring Students' Experiences, Attitudes and Motivations towards AI and AI-based Study Companions
Abstract:
Despite the widespread integration of generative artificial intelligence (GenAI) tools in higher education, there is limited empirical insight into students' experiences, competences, and readiness to adopt personalized AI companions. To address this gap, this study investigates three key questions: (RQ1) What are students' prior experiences with AI tools, their perceived digital and AI-related competences, and their interest in emerging technologies?; (RQ2) How do students perceive a hypothetical "AI Buddy" (a digital companion designed to support students throughout their academic journey) including adoption, benefits, and concerns?; (RQ3) How does students' willingness to adopt an AI Buddy relate to motivations for engaging in traditional academic activities? Based on a survey of 926 students at a Swiss university, students revealed widespread prior use of AI, primarily for text-based and productivity tasks, with moderate self-assessed digital competence. Students expressed strong enthusiasm for adopting an AI Buddy, valuing its potential for time efficiency, personalized academic support, and study organization, but expressed significant concerns about data privacy and over-reliance. A weak negative correlation emerged between AI Buddy adoption willingness and motivations for attending lectures or using library resources, while social and collaborative motivations remained unaffected. These findings suggest that AI Buddies may partially replace information-seeking behaviours but preserve the social fabric of university life. This study provides practical recommendations including the need for robust privacy protections and critical engagement strategies to ensure AI Buddies enhance, rather than undermine, the academic and communal value of higher education.

Authors:Ke Ma, Francesca Valsecchi, Yuchen Tan, Mingjia Ji, Junru Shen, Xiaoya Ma, Duan Wu, Jiao Mo, Shijian Zhao
Title: A 4R-supported circular product-service system for luxury branded events
Abstract:
Temporary luxury branded events run on short cycles and bespoke builds that accelerate material churn. We present a circular phygital product-service system that operationalises the circular economy (CE) through a 4R frame (Refuse, Reduce, Reuse, and Recycling) across warehouse-to-event journeys. Developed via a multi-method design inquiry with a tier-1 contractor, the system couples physical touchpoints (reusable fold-flat transit boxes, adjustable racking, standard labels) with digital orchestration (a live digital warehouse, list-based outbound/inbound workflow, and a sustainable materials library). The architecture aligns roles and decisions, protects and identifies assets, and makes reuse the default under luxury brand constraints. By embedding traceable actions and CE-aligned rules into everyday handoffs, the PSS shifts procurement, storage, dispatch, return, and redeployment toward value retention. The contribution is a replicable, practice-ready route from circular intent to operational change in branded environments, advancing responsible retail without compromising speed or aesthetic standards.

Authors:Alex Apffel, Huy Tran, Vuthea Chheang
Title: Nevis Digital Twin: Photogrammetry and Immersive Visualization of Historical Sites
Abstract:
In this work, we present a multimodal data acquisition workflow for the digital preservation and virtual reconstruction of at-risk historical sites in the island of Nevis. Facing threats from coastal erosion, rising sea levels, and aggressive vegetation, the archaeological heritage of Nevis requires documentation strategies that bridge the gap between high-cost professional surveying and consumer accessibility. Experimental test compared acquisition variables, specifically camera height (1m vs. 3m) and operator trajectory against high-resolution control data. Moreover, we explore the virtual reconstruction between mesh reconstruction and 3D gaussian splatting to serve as different modalities for documentation. The resulting data is fused into immersive virtual reality (VR) environments, offering a scalable, non-proprietary model for democratizing digital heritage in the Caribbean.

Authors:Martin Sanchez, Nick Tran, Vuthea Chheang
Title: Towards Extended Reality Intelligence for Monitoring and Predicting Patient Readmission Risks
Abstract:
Hospital readmissions remain a challenge for healthcare systems, especially among patients with chronic conditions such as diabetes. Unplanned readmissions within 30 days are costly, strain hospital resources, and can indicate poor care coordination or discharge planning. In this work, we explore the use of machine learning to predict readmission risk for diabetic inpatients and propose a mixed reality (MR) to provide effective visualization and insights. We trained an XGBoost classifier after data cleaning, encoding, and feature engineering. The model achieved an Area Under the Receiver Operating characteristic Curve (AUROC) of 0.72 and an Area Under the Precision-Recall Curve (AUPRC) of 0.11. Key predictive factors included prior inpatient visits, discharge disposition, and glycemic control indicators such as A1C (blood sugar test) results and medication adjustments. Additionally, we developed an MR prototype that visualize patient records and predictions containing risk level, major contributing factors, and a concise summary of care. Together, the predictive model and the MR interface aim to improve clinician awareness and communication around readmission risk in real-time clinical settings.

Authors:Abdul Aziz Snoubara, Baraa Al_Maradni, Haya Al_Naal, Malek Al_Madrmani, Roaa Jdini, Seedra Zarzour, Khloud Al Jallad
Title: Abjad-Kids: An Arabic Speech Classification Dataset for Primary Education
Abstract:
Speech-based AI educational applications have gained significant interest in recent years, particularly for children. However, children speech research remains limited due to the lack of publicly available datasets, especially for low-resource languages such as Arabic.This paper presents Abjad-Kids, an Arabic speech dataset designed for kindergarten and primary education, focusing on fundamental learning of alphabets, numbers, and colors. The dataset consists of 46397 audio samples collected from children aged 3 - 12 years, covering 141 classes. All samples were recorded under controlled specifications to ensure consistency in duration, sampling rate, and format. To address high intra-class similarity among Arabic phonemes and the limited samples per class, we propose a hierarchical audio classification based on CNN-LSTM architectures. Our proposed methodology decomposes alphabet recognition into a two-stage process: an initial grouping classification model followed by specialized classifiers for each group. Both strategies: static linguistic-based grouping and dynamic clustering-based grouping, were evaluated. Experimental results demonstrate that static linguistic-based grouping achieves superior performance. Comparisons between traditional machine learning with deep learning approaches, highlight the effectiveness of CNN-LSTM models combined with data augmentation. Despite achieving promising results, most of our experiments indicate a challenge with overfitting, which is likely due to the limited number of samples, even after data augmentation and model regularization. Thus, future work may focus on collecting additional data to address this issue. Abjad-Kids will be publicly available. We hope that Abjad-Kids enrich children representation in speech dataset, and be a good resource for future research in Arabic speech classification for kids.

Authors:Jiaqi Lai, Hou Liang, Weihong Huang
Title: Stability of AI Governance Systems: A Coupled Dynamics Model of Public Trust and Social Disruptions
Abstract:
As artificial intelligence (AI) is increasingly deployed in high-stakes public decision-making (from resource allocation to welfare distribution), public trust in these systems has become a critical determinant of their legitimacy and sustainability. Yet existing AI governance research remains largely qualitative, lacking formal mathematical frameworks to characterize the precise conditions under which public trust collapses. This paper addresses that gap by proposing a rigorous coupled dynamics model that integrates a discrete-time Hawkes process -- capturing the self-exciting generation of AI controversy events such as perceived algorithmic unfairness or accountability failures -- with a Friedkin-Johnsen opinion dynamics model that governs the evolution of institutional trust across social networks. A key innovation is the bidirectional feedback mechanism: declining trust amplifies the intensity of subsequent controversy events, which in turn further erode trust, forming a self-reinforcing collapse loop. We derive closed-form equilibrium solutions and perform formal stability analysis, establishing the critical spectral condition rho(J_{2nt}) < 1 that delineates the boundary between trust resilience and systemic collapse. Numerical experiments further reveal how echo chamber network structures and media amplification accelerate governance failure. Our core contribution to the AI governance field is a baseline collapse model: a formal stability analysis framework demonstrating that, absent strong institutional intervention, even minor algorithmic biases can propagate through social networks to trigger irreversible trust breakdown in AI governance systems.

Authors:Saadi Lahlou, Annabelle Gouttebroze, Atrina Oraee, Julian Madera
Title: Writing literature reviews with AI: principles, hurdles and some lessons learned
Abstract:
We qualitatively compared literature reviews produced with varying degrees of AI assistance. The same LLM, given the same corpus of 280 papers but different selections, produced dramatically different reviews, from mainstream and politically neutral to critical and post-colonial, though neither orientation was intended. LLM outputs always appear at first glance to be well written, well informed and thought out, but closer reading reveals gaps, biases and lack of depth. Our comparison of six versions shows a series of pitfalls and suggests precautions necessary when using AI assistance to make a literature review. Main issues are: (1) The bias of ignorance (you do not know what you do not get) in the selection of relevant papers. (2) Alignment and digital sycophancy: commercial AI models slavishly take you further in the direction they understand you give them, reinforcing biases. (3) Mainstreaming: because of their statistical nature, LLM productions tend to favor mainstream perspectives and content; in our case there was only 20% overlap between paper selections by humans and the LLM. (4) Limited capacity for creative restructuring, with vague and ambiguous statements. (5) Lack of critical perspective, coming from distant reading and political correctness. Most pitfalls can be addressed by prompting, but only if the user knows the domain well enough to detect them. There is a paradox: producing a good AI-assisted review requires expertise that comes from reading the literature, which is precisely what AI was meant to reduce. Overall, AI can improve the span and quality of the review, but the gain of time is not as massive as one would expect, and a press-button strategy leaving AI to do the work is a recipe for disaster. We conclude with recommendations for those who write, or assess, such LLM-augmented reviews.

Authors:Diya Hundiwala, Andrés Monroy-Hernández
Title: AnchorNote: Exploring Speech-Driven Spatial Externalization for Co-Located Collaboration in Augmented Reality
Abstract:
Sticky notes remain a durable collaborative medium because they support rapid idea externalization, rearrangement, and coordination of group attention through spatial organization while being low-friction and lightweight. Recent AR systems suggest new ways to externalize ideas in shared physical space, including spatial annotations and digital workspaces. We introduce AnchorNote, a co-located AR system that lets collaborators intentionally capture spoken ideas as spatially anchored sticky notes via live transcription and LLM summarization. We evaluated AnchorNote in a two-phase iterative study with 20 participants completing a brainstorming and thematic grouping task to examine how speech-driven, spatially persistent capture shapes idea externalization in collaboration. We found that AnchorNote reduced writing effort but reshaped collaboration by introducing new coordination costs and shifting how participants formulated, timed, and organized ideas. We use AnchorNote as an exploratory probe to study how speech-driven, spatial externalization in AR restructures collaborative cognition and coordination, and to derive design implications for future co-located AR collaboration tools.

Authors:Kevin Baum, Johann Laux
Title: Constitutive vs. Corrective: A Causal Taxonomy of Human Runtime Involvement in AI Systems
Abstract:
As AI systems increasingly permeate high-stakes decision-making, the terminology regarding human involvement - Human-in-the-Loop (HITL), Human-on-the-Loop (HOTL), and Human Oversight - has become vexingly ambiguous. This ambiguity complicates interdisciplinary collaboration between computer science, law, philosophy, psychology, and sociology and can lead to regulatory uncertainty. We propose a clarification grounded in causal structure, focused on human involvement during the runtime of AI systems. The distinction between HITL and HOTL, we argue, is not primarily spatial but causal: HITL is constitutive (a human contribution is necessary for the decision output), while HOTL is corrective (external to the primary causal chain, capable of preventing or modifying outputs). Within HOTL, we distinguish three temporal modes - synchronous, asynchronous, and anticipatory - situated within a nested model of provider and deployer runtime that clarifies their different capacities for intervention. A second, orthogonal dimension captures cognitive integration: whether human and machine operate as complementary or hybrid intelligence, yielding four structurally distinct configurations. Finally, we distinguish these descriptive categories from the normative requirements they serve: statutory "Human Oversight" is a specific normative mode of HOTL that demands not merely a corrective causal position, but genuine preparedness and capacity for effective intervention. Because the same person may occupy both HITL and HOTL roles simultaneously, we argue that this role duality must be treated as a design problem requiring architectural and epistemic mitigation rather than mere acknowledgment.

Authors:Kim Zierahn, Cristina Cachero, Anna Korhonen, Nuria Oliver
Title: LLMs Aren't Human: A Critical Perspective on LLM Personality
Abstract:
A growing body of research examines personality traits in Large Language Models (LLMs), particularly in human-agent collaboration. Prior work has frequently applied the Big Five inventory to assess LLM behavior analogous to human personality, without questioning the underlying assumptions. This paper critically evaluates whether LLM responses to personality tests satisfy six defining characteristics of personality. We find that none are fully met, indicating that such assessments do not measure a construct equivalent to human personality. We propose a research agenda for shifting from anthropomorphic trait attribution toward functional evaluations, clarifying what personality tests actually capture in LLMs and developing LLM-specific frameworks for characterizing stable, intrinsic behavior.

Authors:Shitao Fang, Koji Yatani, Kasper Hornbæk
Title: What We Talk About When We Talk About Frameworks in HCI
Abstract:
In HCI, frameworks function as a type of theoretical contribution, often supporting ideation, design, and evaluation. Yet, little is known about how they are actually used, what functions they serve, and which scholarly practices that shape them. To address this gap, we conducted a systematic review of 615 papers from a decade of CHI proceedings (2015-2024) that prominently featured the term framework. We classified these papers into six engagement types. We then examined the role, form, and essential components of newly proposed frameworks through a functional typology, analyzing how they are constructed, validated, and articulated for reuse. Our results show that enthusiasm for proposing new frameworks exceeds the willingness to iterate on existing ones. They also highlight the ambiguity in the function of frameworks and the scarcity of systematic validation. Based on these insights, we call for more rigorous, reflective, and cumulative practices in the development and use of frameworks in HCI.

Authors:Nelson Navajas Fernández, Jeffrey T. Hancock, Maurice Jakesch
Title: Through the Looking-Glass: AI-Mediated Video Communication Reduces Interpersonal Trust and Confidence in Judgments
Abstract:
AI-based tools that mediate, enhance or generate parts of video communication may interfere with how people evaluate trustworthiness and credibility. In two preregistered online experiments (N = 2,000), we examined whether AI-mediated video retouching, background replacement and avatars affect interpersonal trust, people's ability to detect lies and confidence in their judgments. Participants watched short videos of speakers making truthful or deceptive statements across three conditions with varying levels of AI mediation. We observed that perceived trust and confidence in judgments declined in AI-mediated videos, particularly in settings in which some participants used avatars while others did not. However, participants' actual judgment accuracy remained unchanged, and they were no more inclined to suspect those using AI tools of lying. Our findings provide evidence against concerns that AI mediation undermines people's ability to distinguish truth from lies, and against cue-based accounts of lie detection more generally. They highlight the importance of trustworthy AI mediation tools in contexts where not only truth, but also trust and confidence matter.

Authors:Yufei Cao, Penny Sweetser, Ziyu Chen, Xuanying Zhu
Title: Signals of Success and Struggle: Early Prediction and Physiological Signatures of Human Performance across Task Complexity
Abstract:
User performance is crucial in interactive systems, capturing how effectively users engage with task execution. Prospectively predicting performance enables the timely identification of users struggling with task demands. While ocular and cardiac signals are widely used to characterise performance-relevant visual behaviour and physiological activation, their potential for early prediction and for revealing the physiological mechanisms underlying performance differences remains underexplored. We conducted a within-subject experiment in a game environment with naturally unfolding complexity, using early ocular and cardiac signals to predict later performance and to examine physiological and self-reported group differences. Results show that the ocular-cardiac fusion model achieves a balanced accuracy of 0.86, and the ocular-only model shows comparable predictive power. High performers exhibited targeted gaze and adjusted visual sampling, and sustained more stable cardiac activation as demands intensified, with a more positive affective experience. These findings demonstrate the feasibility of cross-session prediction from early physiology, providing interpretable insights into performance variation and facilitating future proactive intervention.

Authors:Hung-Yue Suen, Kuo-En Hung, Fan-Hsun Tseng
Title: Dual-Model Prediction of Affective Engagement and Vocal Attractiveness from Speaker Expressiveness in Video Learning
Abstract:
This paper outlines a machine learning-enabled speaker-centric Emotion AI approach capable of predicting audience-affective engagement and vocal attractiveness in asynchronous video-based learning, relying solely on speaker-side affective expressions. Inspired by the demand for scalable, privacy-preserving affective computing applications, this speaker-centric Emotion AI approach incorporates two distinct regression models that leverage a massive corpus developed within Massive Open Online Courses (MOOCs) to enable affectively engaging experiences. The regression model predicting affective engagement is developed by assimilating emotional expressions emanating from facial dynamics, oculomotor features, prosody, and cognitive semantics, while incorporating a second regression model to predict vocal attractiveness based exclusively on speaker-side acoustic features. Notably, on speaker-independent test sets, both regression models yielded impressive predictive performance (R2 = 0.85 for affective engagement and R2 = 0.88 for vocal attractiveness), confirming that speaker-side affect can functionally represent aggregated audience feedback. This paper provides a speaker-centric Emotion AI approach substantiated by an empirical study discovering that speaker-side multimodal features, including acoustics, can prospectively forecast audience feedback without necessarily employing audience-side input information.

Authors:Baiqiang Wang, Yan Bai, Juan Li
Title: CyberJustice Tutor: An Agentic AI Framework for Cybersecurity Learning via Think-Plan-Act Reasoning and Pedagogical Scaffolding
Abstract:
The integration of Large Language Models (LLMs) into cybersecurity education for criminal justice professionals is currently hindered by the "statelessness" of reactive chatbots and the risk of hallucinations in high-stakes legal contexts. To address these limitations, we propose the CyberJustice Tutor, an educational dialogue system powered by an Agentic AI framework. Unlike reactive chatbots, our system employs a "Think-Plan-Act" cognitive cycle, enabling autonomous goal decomposition, longitudinal planning, and dynamic context maintenance. We integrate a Pedagogical Scaffolding Layer grounded in Vygotsky's Zone of Proximal Development (ZPD), which dynamically adapts instructional support based on the learner's real-time progress. Furthermore, an Adaptive Retrieval Augmented Generation (RAG) core anchors the agent's reasoning in verified curriculum materials to ensure legal and technical accuracy. A comprehensive user study with 123 participants, including students, educators, and active law enforcement officers, validated the system's efficacy. Quantitative results demonstrate high user acceptance for Response Speed (4.7/5), Ease of Use (4.4/5), and Accuracy (4.3/5). Qualitative feedback indicates that the agentic architecture is perceived as highly effective in guiding learners through personalized paths, demonstrating the feasibility and usability of agentic AI for specialized professional education.

Authors:Yutong Ren, Arnav Reddy, Michael Nebeling
Title: PeriphAR: Fast and Accurate Real-World Object Selection with Peripheral Augmented Reality Displays
Abstract:
Gaze-based selection in XR requires visual confirmation due to eye-tracking limitations and target ambiguity in 3D contexts. Current designs for wide-FOV displays use world-locked, central overlays, which are not conducive to always-on AR glasses. This paper introduces PeriphAR (per-ree-far), a visualization technique that leverages peripheral vision for feedback during gaze-based selection on a monocular AR display. In a first user study, we isolated text, color, and shape properties of target objects to compare peripheral selection cues. Peripheral vision was more sensitive to color than shape, but this sensitivity rapidly declined at lower contrast. To preserve preattentive processing of color, we developed two strategies to enhance color in users' peripheral vision. In a second user study, our strategy that maximized contrast of the target to the neighboring object with the most similar color was subjectively preferred. As proof of concept, we implemented PeriphAR in an end-to-end system to test performance with real-world object detection.

Authors:Qi Xu, Beat Signer
Title: Augmenting Scholarly Reading with Cross-Media Annotations
Abstract:
Scholarly reading often involves engaging with various supplementary materials beyond PDFs to support understanding. In practice, scholars frequently incorporate such external materials into their reading workflow through annotation. However, most existing PDF annotation tools support only a limited range of media types for embedding annotations in PDF documents. This paper investigates cross-media annotation as a design space for augmenting academic reading. We present a design exploration of a cross-media annotation tool that allows scholars to easily link PDF content with other documents and materials such as audio, video or web pages. The proposed design has the potential to enrich reading practices and enable scholars to guide and support other researchers' reading experiences.

Authors:Michel Schimpf, Julian Voigt, Thomas Bohné
Title: AI-Assisted Goal Setting Improves Goal Progress Through Social Accountability
Abstract:
Helping people identify and pursue personally meaningful career goals at scale remains a key challenge in applied psychology. Career coaching can improve goal quality and attainment, but its cost and limited availability restrict access. Large language model (LLM)-based chatbots offer a scalable alternative, yet the psychological mechanisms by which they might support goal pursuit remain untested. Here we report a preregistered three-arm randomised controlled trial (N = 517) comparing an AI career coach ("Leon," powered by Claude Sonnet), a matched structured written questionnaire covering closely matched reflective topics, and a no-support control on goal progress at a two-week follow-up. The AI chatbot produced significantly higher goal progress than the control (d = 0.33, p = .016). Compared with the written-reflection condition, the AI did not significantly improve overall goal progress, but it increased perceived social accountability. In the preregistered mediation model, perceived accountability mediated the AI-over-questionnaire effect on goal progress (indirect effect = 0.15, 95% CI [0.04, 0.31]), whereas self-concordance did not. These findings suggest that AI-assisted goal setting can improve short-term goal progress, and that its clearest added value over structured self-reflection lies in increasing felt accountability.

Authors:Anthony Maocheia-Ricci, Edith Law
Title: Building a "-Sensitive Design" Methodology from Political Philosophies or Ideologies
Abstract:
Value-based approaches such as Value Sensitive Design (VSD) enable technology designers to engage with and integrate human values in technology through a tripartite methodology of conceptual, empirical, and technical investigations. However, VSD contains pitfalls in both translating values to requirements and a lack of normative grounding, leading to adaptations such as Jacobs' Capability Sensitive Design (CSD). Inspired by CSD and extensions of the design approach, we propose the concept of creating -Sensitive Design (-SD); a meta-framework to embed various political or ideological values as norms in a design research process. We exemplify this through \emph{Dependency}-Sensitive Design (DSD), combining ideas from Kittay's critiques of classical liberal theory within a practical VSD framework. Finally, we push for further work combining philosophy and design in areas beyond CSD and DSD.

Authors:Fiammetta Caccavale, Carina L. Gargalo, Julian Kager, Magdalena Skowyra, Steen Larsen, Krist V. Gernaey, Ulrich Krühne
Title: Large Language Models in Teaching and Learning: Reflections on Implementing an AI Chatbot in Higher Education
Abstract:
The landscape of education is changing rapidly, shaped by emerging pedagogical approaches, technological innovations such as artificial intelligence (AI), and evolving societal expectations, all of which demand thorough evaluation of new educational tools. Although large language models (LLMs) present substantial opportunities especially in Higher Education, their propensity to generate hallucinations and their limited specialized knowledge may introduce significant risks. This study aims to address these risks by examining the practical implementation of an LLM-enhanced assistant in a university level course. We implemented a generative AI assistant grounded in a retrieval-augmented generation (RAG) model to replicate a previously teacher-led, time-intensive exercise. To assess the effectiveness of the LLM, we conducted three separate experiments through iterative mixed-methods approaches, including a crossover design. The resulting data address central research questions related to student motivation, perceived differences between engaging with the LLM versus a human teacher, the quality of AI-generated responses, and the impact of the LLM on students' academic performance. The results offer direct insights into students' views and the pedagogical feasibility of embedding LLMs into specialized courses. Finally, we discuss the main challenges, opportunities and future directions of LLMs in teaching and learning in Higher Education.

Authors:Alexander V. Shenderuk-Zhidkov, Alexander E. Hramov
Title: Large Language Models as a Semantic Interface and Ethical Mediator in Neuro-Digital Ecosystems: Conceptual Foundations and a Regulatory Imperative
Abstract:
This article introduces and substantiates the concept of Neuro-Linguistic Integration (NLI), a novel paradigm for human-technology interaction where Large Language Models (LLMs) act as a key semantic interface between raw neural data and their social application. We analyse the dual nature of LLMs in this role: as tools that augment human capabilities in communication, medicine, and education, and as sources of unprecedented ethical risks to mental autonomy and neurorights. By synthesizing insights from AI ethics, neuroethics, and the philosophy of technology, the article critiques the inherent limitations of LLMs as semantic mediators, highlighting core challenges such as the erosion of agency in translation, threats to mental integrity through precision semantic suggestion, and the emergence of a new `neuro-linguistic divide' as a form of biosemantic inequality. Moving beyond a critique of existing regulatory models (e.g., GDPR, EU AI Act), which fail to address the dynamic, meaning-making processes of NLI, we propose a foundational framework for proactive governance. This framework is built on the principles of Semantic Transparency, Mental Informed Consent, and Agency Preservation, supported by practical tools such as NLI-specific ethics sandboxes, bias-aware certification of LLMs, and legal recognition of the neuro-linguistic inference. The article argues for the development of a `second-order neuroethics,' focused not merely on neural data protection but on the ethics of AI-mediated semantic interpretation itself, thereby providing a crucial conceptual basis for steering the responsible development of neuro-digital ecosystems.

Authors:Roxana Bujack, Li-Ta Lo, Ethan Stam, Ayan Biswas, David Rogers
Title: The Truth, the Whole Truth, and Nothing but the Truth: Automatic Visualization Evaluation from Reconstruction Quality
Abstract:
Recent advances in AI enable the automatic generation of visualizations directly from textual prompts using agentic workflows. However, visualizations produced via one-shot generative methods often suffer from insufficient quality, typically requiring a human in the loop to refine the outputs. Human evaluation, though effective, is costly and impractical at scale. To alleviate this problem, we propose an automated metric that evaluates visualization quality without relying on extensive human-labeled datasets. Instead, our approach uses the original underlying data as implicit ground truth. Specifically, we introduce a method that measures visualization quality by assessing the reconstruction accuracy of the original data from the visualization itself. This reconstruction-based metric provides an autonomous and scalable proxy for thorough human evaluation, facilitating more efficient and reliable AI-driven visualization workflows.

Authors:Sarah Diefenbach, Daniel Ullrich
Title: Why We Need to Destroy the Illusion of Speaking to A Human: Critical Reflections On Ethics at the Front-End for LLMs
Abstract:
Conversation with chatbots based on Large Language Models (LLMs) such as ChatGPT has become one of the major forms of interaction with Artificial Intelligence (AI) in everyday life. What makes this interaction so convenient is that interacting with LLMs feels so natural, and resembles what we know from real, human conversations. At the same time, this seeming similarity is part of one of the ethical challenges of AI design, since it activates many misleading ideas about AI. We discuss similarities and differences between human-AI-conversations and interpersonal conversation and highlight starting points for more ethical design of AI at the front-end.

Authors:Xiruo Wang, Xinyi Jiang, Ziqi Lyu
Title: One Kiss: Emojis as Agents of Genre Flux in Generative Comics
Abstract:
Generative AI has made visual storytelling widely accessible, yet current prompt-based interactions often force users into a trade-off between precise control and creative flow. We present One Kiss, a co-creative comic generation system that introduces "Affective Steering". Instead of writing text prompts, users guide the tone of their story through emoji inputs, whose semantic ambiguity becomes a resource rather than a limitation. Unlike traditional text-to-image tools that rely on explicit descriptions, One Kiss uses a dual-stream input in which users define structural pacing by sketching panel frames and set atmospheric tone by pairing keywords with emojis. This mechanism enables "Genre Flux," where emotional inputs accumulate across panels and gradually shift the genre of a story. A preliminary study (N = 6) suggests that this soft steering approach may reframe the user's role from prompt engineer to narrative director, with ambiguity serving as a source of creative surprise rather than a loss of control.

Authors:Yang Ni, Fanli Jia
Title: A Scoping Review of AI-Driven Digital Interventions in Mental Health Care: Mapping Applications Across Screening, Support, Monitoring, Prevention, and Clinical Education
Abstract:
Artificial intelligence (AI)-enabled digital interventions, including Generative AI (GenAI) and Human-Centered AI (HCAI), are increasingly used to expand access to digital psychiatry and mental health care. This PRISMA-ScR scoping review maps the landscape of AI-driven mental health (mHealth) technologies across five critical phases: pre-treatment (screening/triage), treatment (therapeutic support), post-treatment (remote patient monitoring), clinical education, and population-level prevention. We synthesized 36 empirical studies implemented through early 2024, focusing on Large Language Models (LLMs), machine learning (ML) models, and autonomous conversational agents. Key use cases involve referral triage, empathic communication enhancement, and AI-assisted psychotherapy delivered via chatbots and voice agents. While benefits include reduced wait times and increased patient engagement, we address recurring challenges like algorithmic bias, data privacy, and human-AI collaboration barriers. By introducing a novel four-pillar framework, this review provides a comprehensive roadmap for AI-augmented mental health care, offering actionable insights for researchers, clinicians, and policymakers to develop safe, effective, and equitable digital health interventions.

Authors:Yan Xia, Sushmita Khan, Naiyah Lewis, Jinkyung Katie Park
Title: Balancing Openness and Safety: Central and Peripheral Governance Practices in the Lesbian Subreddit Ecosystem
Abstract:
Online LGBTQ+ communities face a persistent tension: remaining visible to welcome newcomers while protecting members from harassment. This challenge is particularly acute for lesbian communities on Reddit, which operate not as isolated groups but as an interconnected ecosystem. We examine how this tension is negotiated across the lesbian subreddit ecosystem (N=29) by combining network analysis of cross-subreddit links with a qualitative thematic analysis of 167 subreddit rules. Our findings show a functional division of governance labor between central (34%) and peripheral subreddits (66%). While all communities share a baseline of safety regulations, central subreddits prioritize content curation and feed quality to support a large, public-facing audience, whereas peripheral subreddits emphasize boundary maintenance and participation control to protect smaller, identity-specific niches. These findings challenge monolithic moderation approaches and highlight the need for ecosystem-aware design. We argue that effective moderation requires role- and context-sensitive tools supporting visibility and safety across interconnected spaces.

Authors:Jake Van Clief, David McDermott
Title: Interpretable Context Methodology: Folder Structure as Agentic Architecture
Abstract:
Current approaches to AI agent orchestration typically involve building multi-agent frameworks that manage context passing, memory, error handling, and step coordination through code. These frameworks work well for complex, concurrent systems. But for sequential workflows where a human reviews output at each step, they introduce engineering overhead that the problem does not require. This paper presents Model Workspace Protocol (MWP), a method that replaces framework-level orchestration with filesystem structure. Numbered folders represent stages. Plain markdown files carry the prompts and context that tell a single AI agent what role to play at each step. Local scripts handle the mechanical work that does not need AI at all. The result is a system where one agent, reading the right files at the right moment, does the work that would otherwise require a multi-agent framework. This approach applies ideas from Unix pipeline design, modular decomposition, multi-pass compilation, and literate programming to the specific problem of structuring context for AI agents. The protocol is open source under the MIT license.

Authors:Mohammad Dastgheib, Fatemeh Pourmahdian
Title: The Midas Touch in Gaze vs. Hand Pointing: Modality-Specific Failure Modes and Implications for XR Interfaces
Abstract:
Extended Reality (XR) interfaces impose both ergonomic and cognitive demands, yet current systems often force a binary choice between hand-based input, which can produce fatigue, and gaze-based input, which is vulnerable to the Midas Touch problem and precision limitations. We introduce the xr-adaptive-modality-2025 platform, a web-based open-source framework for studying whether modality-specific adaptive interventions can improve XR-relevant pointing performance and reduce workload relative to static unimodal interaction. The platform combines physiologically informed gaze simulation, an ISO 9241-9 multidirectional tapping task, and two modality-specific adaptive interventions: gaze declutter and hand target-width inflation. We evaluated the system in a 2 x 2 x 2 within-subjects design manipulating Modality (Hand vs. Gaze), UI Mode (Static vs. Adaptive), and Pressure (Yes vs. No). Results from N=69 participants show that hand yielded higher throughput than gaze (5.17 vs. 4.73 bits/s), lower error (1.8% vs. 19.1%), and lower NASA-TLX workload. Crucially, error profiles differed sharply by modality: gaze errors were predominantly slips (99.2%), whereas hand errors were predominantly misses (95.7%), consistent with the Midas Touch account. Of the two adaptive interventions, only gaze declutter executed in this dataset; it modestly reduced timeouts but not slips. Hand width inflation was not evaluable due to a UI integration bug. These findings reveal modality-specific failure modes with direct implications for adaptive policy design, and establish the platform as a reproducible infrastructure for future studies.

Authors:Lara Lee Russell-Lasalandra, Hudson Golino
Title: Prompt Engineering for Scale Development in Generative Psychometrics
Abstract:
This Monte Carlo simulation examines how prompt engineering strategies shape the quality of large language model (LLM)--generated personality assessment items within the AI-GENIE framework for generative psychometrics. Item pools targeting the Big Five traits were generated using multiple prompting designs (zero-shot, few-shot, persona-based, and adaptive), model temperatures, and LLMs, then evaluated and reduced using network psychometric methods. Across all conditions, AI-GENIE reliably improved structural validity following reduction, with the magnitude of its incremental contribution inversely related to the quality of the incoming item pool. Prompt design exerted a substantial influence on both pre- and post-reduction item quality. Adaptive prompting consistently outperformed non-adaptive strategies by sharply reducing semantic redundancy, elevating pre-reduction structural validity, and preserving substantially larger item pool, particularly when paired with newer, higher-capacity models. These gains were robust across temperature settings for most models, indicating that adaptive prompting mitigates common trade-offs between creativity and psychometric coherence. An exception was observed for the GPT-4o model at high temperatures, suggesting model-specific sensitivity to adaptive constraints at elevated stochasticity. Overall, the findings demonstrate that adaptive prompting is the strongest approach in this context, and that its benefits scale with model capability, motivating continued investigation of model--prompt interactions in generative psychometric pipelines.

Authors:Michaela Benk, Tim Miller
Title: Same Performance, Hidden Bias: Evaluating Hypothesis- and Recommendation-Driven AI
Abstract:
The HCI community commonly evaluates decision support systems based on whether they improve task performance or promote appropriate user reliance. In this work, we look beyond decision outcomes to examine the process through which users develop decision-making strategies. Through a web-based experiment (N = 290) comparing recommendation-driven and hypothesis-driven interaction designs, and using Signal Detection Theory as a theoretical framework, we show that even when performance remains identical, recommendation-driven designs lower participants' thresholds for sufficient evidence and introduce a "hidden bias" in their judgments, resulting in a shifted distribution of errors. Furthermore, we find that experts are just as susceptible to these systemic shifts as novices. We conclude by advocating for a shift in focus: prioritizing decision processes and the preservation of stable evidence standards over performance and reliance alone.

Authors:Antonios Lykourinas, Chinmay Pendse, Francky Catthoor, Veronique Rochus, Xavier Rottenberg, Athanassios Skodras
Title: Parameter-Efficient Deep Learning for Ultrasound-Based Human-Machine Interfaces
Abstract:
Ultrasound (US) has emerged as a promising modality for Human-Machine Interfaces (HMIs), with recent research efforts exploring its potential for Hand Pose Estimation (HPE). A reliable solution to this problem could introduce interfaces with simultaneous support for up to 23 degrees of freedom encompassing all hand and wrist kinematics, thereby allowing far richer and more intuitive interaction strategies. Despite these promising results, a systematic comparison of models, input modalities and training strategies is missing from the literature. Moreover, there is only one publicly available dataset, namely the Ultrasound Adaptive Prosthetic Control (Ultra-Pro) dataset, enabling reproducible benchmarking and iterative model development. In this paper, we compare the performance of six different deep learning models, selected based on diverse criteria, on this benchmark. We demonstrate that, by using a step learning rate scheduler and the envelope of the RF signals as input modality, our 4-layer deep UDACNN surpasses XceptionTime's performance by $2.28$ percentage points while featuring $87.52\%$ fewer parameters. This result ($77.72\%$) constitutes an absolute improvement of $0.88\%$ from previously reported baselines. According to our findings, the appropriate combination of model, preprocessing and training algorithm is crucial for optimizing HMI performance.

Authors:Giulia Huang, Maristella Matera, Micol Spitale
Title: Inclusive AI for Group Interactions: Predicting Gaze-Direction Behaviors in People with Intellectual and Developmental Disabilities
Abstract:
Artificial agents that support human group interactions hold great promise, especially in sensitive contexts such as well-being promotion and therapeutic interventions. However, current systems struggle to mediate group interactions involving people who are not neurotypical. This limitation arises because most AI detection models (e.g., for turn-taking) are trained on data from neurotypical populations. This work takes a step toward inclusive AI by addressing the challenge of eye contact detection, a core component of non-verbal communication, with and for people with Intellectual and Developmental Disabilities. First, we introduce a new dataset, Multi-party Interaction with Intellectual and Developmental Disabilities (MIDD), capturing atypical gaze and engagement patterns. Second, we present the results of a comparative analysis with neurotypical datasets, highlighting differences in class imbalance, speaking activity, gaze distribution, and interaction dynamics. Then, we evaluate classifiers ranging from SVMs to FSFNet, showing that fine-tuning on MIDD improves performance, though notable limitations remain. Finally, we present the insights gathered through a focus group with six therapists to interpret our quantitative findings and understand the practical implications of atypical gaze and engagement patterns. Based on these results, we discuss data-driven strategies and emphasize the importance of feature choice for building more inclusive human-centered tools.

Authors:Sicheng Lu, Erick Purwanto, Hong Liu, Aini Li, Adel Chaouch-Orozco
Title: Gamifying Compassion: Mitigating Dialect Prejudice Through An AI-Driven Serious Game
Abstract:
Dialect bias is pervasive yet often unconscious, normalized, or obscured by masking. Existing HCI interventions primarily audit disparities and propose reactive fixes. We present CompassioMate, a dialect-aware serious game that nurtures perspective-taking through AI-mediated play. Players listen to audio samples to identify regional dialects, engage in simulated social interactions involving dialect discrimination, and explore branching narratives that reveal how changes in wording or stance can influence the outcomes. In a three-week field study with 20 university students, participants reported feeling comfortable when observing region-tailored dialogues; several described experiencing perspective change. We contribute: 1) a formative study identifying goals for safe action consequence modelling, 2) the design and evaluation of a serious game integrating dialect audio, region-mapping play, bias; and 3) design implications highlighting listener-side training, transparent evaluation, and narratives maintaining psychological well-being.

Authors:Hansoo Lee, Changhee Seo, Subin Park, Sonya S. Kwak
Title: Towards Equitable Robotic Furnishing Agents for Aging-in-Place: ADL-Grounded Design Exploration
Abstract:
In aging-in-place contexts, small difficulties in Activities of Daily Living (ADL) can accumulate, affecting well-being through fatigue, anxiety, reduced autonomy, and safety risks. This position paper argues that robotics for older adult wellbeing must move beyond "convenience features" and centre equity, justice, and responsibility. We conducted ADL-grounded semi-structured interviews with four adults in their 70s-80s, identifying recurrent challenges (finding/ organising items, taking medication, and transporting objects) and deriving requirements to reduce compounded cognitive-physical burden. Based on these insights, we propose an in-home robotic furnishing-agent concept leveraging computer vision and generative AI and LLMs for natural-language interaction, context-aware reminders, safe actuation, and user-centred transparency. We then report video-stimulated follow-up interviews with the same participants, highlighting preferences for confirmation before actuation, predictability, adjustable speed/autonomy, and multimodal feedback, as well as equity-related concerns. We conclude with open questions on evaluating and deploying equitable robotic wellbeing systems in real homes.

Authors:Hikari Kuriyama, Hiroaki Sonoda, Kouki Tomiyoshi, Gou Koutaki
Title: Semi-Automatic Flute Robot and Its Acoustic Sensing
Abstract:
Flute performance requires mastery of complex fingering combinations and register-dependent embouchure control, particularly jet offset adjustment for low-register production. Existing haptic and semi-automated systems do not address both aspects simultaneously through mechanical actuation. To our knowledge, no prior system fully automates fingering while mechanically assisting low-register tone production without requiring embouchure control. We developed a semi-automatic flute robot with an automatic fingering mechanism: fourteen servo motors actuate all keys via wire-based and rack-and-pinion drives in response to MIDI input, enabling performers to produce complete musical pieces through airflow alone. A jet offset assist mechanism rotates the head joint by a calibrated $22^\circ$ during low-register passages, shifting the jet offset toward a low-register configuration without modifying the instrument or embouchure. Fundamental frequency estimation confirmed correct pitch production across the chromatic range (C4--C7) and during musical performance. All key and lever movements were completed within 77.50~ms, corresponding to tempo capacity exceeding standard requirements. Harmonic analysis ($Δ\mathrm{SPL} = \mathrm{SPL}_2 - \mathrm{SPL}_3$) showed a consistent increase in $Δ$SPL for all low-register notes when activated, consistent with the intended jet offset shift. Head joint rotation completed within 40.00~ms. These results demonstrate mechanical feasibility of integrating automated fingering and register-dependent jet offset assistance under controlled conditions.

Authors:Wei Xiao, Mengke Wu, Yeeun Jo
Title: Designing for Understanding: How Interface-Level Consent Designs Shape Attention and Understanding in Privacy Disclosures
Abstract:
Privacy policies are intended to support informed consent, yet users rarely read them fully. This study examines how common privacy policy interface structures influence attention allocation, reading behavior, and perceived experience. Using eye-tracking and post-task surveys, we compared three interface designs: continuous scrolling text, collapsible sections, and collapsible sections with brief previews. Results show that interface structure systematically shaped how users allocated attention and navigated policy content, but did not uniformly improve comprehension. Guided layouts supported more efficient and coherent reading patterns, whereas more interactive designs elicited higher perceived engagement and satisfaction. Importantly, comprehension was closely linked to sustained attention rather than interface type alone. These findings highlight the limits of interface-centered consent approaches and suggest that effective consent design must account for attention dynamics and selective engagement, rather than assuming that improved layout alone ensures understanding.

Authors:Ammar Al-Taie, Thomas Goodge, Shaun Macdonald, Ian Oakley, Stephen Brewster
Title: Running into Traffic: Investigating External Human-Machine Interfaces for Automated Vehicle-Runner Interaction
Abstract:
Automated vehicles (AVs) must communicate their yielding intentions to pedestrians at crossings. External Human-Machine Interfaces (eHMIs, on-vehicle displays) are promising solutions, but were primarily tested with walking pedestrians. Runners are a significant pedestrian group who move faster and face distinct bodily and perceptual demands, raising questions about how pedestrian activity influences eHMI use. We conducted an outdoor study using an augmented reality simulator. Participants navigated a virtual crossing while walking and running; an approaching AV displayed one of three eHMIs: red/green colour-changing lights, animated cyan lights, or no-eHMI. No-eHMI consistently underperformed. Walkers mostly stopped and validated eHMI signals with vehicle behaviour; they processed both eHMI animations and colour changes effectively. Runners experienced greater time pressure to cross, increasing reliance on the eHMI over vehicle behaviour. They preferred colour changes over animation for rapid decisions. These findings are crucial for promoting eHMI inclusivity and physical wellbeing as AVs join our roads.

Authors:Joseph Damouni, Wadia Tanus, Naomi Unkelos-Shpigel
Title: Adaptive Virtual Reality Museum: A Closed-Loop Framewor for Engagement-Aware Cultural Heritage
Abstract:
Static information presentation in VR cultural heritage often causes cognitive overload or under-stimulation. We introduce a closed-loop adaptive interface that tailors content depth to real-time visitor behavior through implicit multimodal sensing. Our approach continuously monitors gaze dwell, head kinematics, and locomotion to infer engagement via a transparent rule-based classifier, which drives a Large Language Model to dynamically modulate explanation complexity without interrupting exploration. We implemented a proof-of-concept in the Berat Ethnographic Museum and conducted a preliminary evaluation (N=16) comparing adaptive versus static content. Results indicate that adaptive participants demonstrated 2-3x increases in reading engagement and exploration time while maintaining high usability (SUS = 84.3). Technical validation confirmed sub-millisecond engagement inference latency on consumer VR hardware. These preliminary findings warrant larger-scale investigation and raise questions about engagement validation, AI transparency, and generative models in heritage contexts. We present this work-in-progress to spark discussion about implicit AI-driven adaptation in immersive cultural experiences.

Authors:Jesse T. Gonzalez, Neeta Khanuja, Michael Li, Maggie Guo, Layomi Olaitan, Emily Lau, Jennifer Pugh, Alexandra Ion, Scott E. Hudson
Title: Towards Fluent Interaction with Cyber-Physical Architecture
Abstract:
What happens when your walls begin to move? This paper explores the design of human-robot interaction for architectural-scale, shape-changing environments. We present findings from two studies: (1) a series of speculative design workshops (N=20) that uncovered aspirational visions for these spaces, and (2) a task-based Wizard-of-Oz elicitation study (N=12) that grounded these visions in the challenges of practical interaction. Our workshop findings reveal a complex landscape of user desires, exposing critical tensions between proactive automation and the preservation of user autonomy, and between personalization and public ownership. Our elicitation study reveals a set of core interaction challenges related to multimodal collaboration; and, most critically: suggests the need for a modality-agnostic model of evolving user intent. We conclude with a set of grounded proposals for creating robotic environments that are collaborative and trusted partners in everyday life.

Authors:Zhou Fang, Janet Yi-Ching Huang
Title: Memory Printer: Exploring Everyday Reminiscing by Combining Slow Design with Generative AI-based Image Creation
Abstract:
Generative Artificial Intelligence (GAI) offers new opportunities for reconstructing these unrecorded memory scenes, yet existing web-based tools undermine users' sense of agency through disengaging and unpredictable interactions. In this work, we advance three design arguments about how slow, tangible interaction can reshape human-AI relationships by making temporality, embodied agency, and generative processes experientially legible. We instantiate these arguments by presenting Memory Printer, a tangible design that combines silk-screen printing metaphors with text-to-image generation. The design features layered reconstruction that decomposes image generation into incremental steps, a physical wooden scraper enabling embodied control over image revelation, and built-in printing that produces tangible photos. We examine these arguments through a comparative study with 24 participants, exploring how participants engage with, interpret, and respond to this interaction stance. The study surfaces both opportunities -- such as vivid memory evocation, heightened sense of control, and creative exploration -- and critical tensions, including risks of false memory formation, algorithmic bias, and data privacy. Together, these findings articulate important boundaries for deploying generative AI in emotionally sensitive contexts.

Authors:Lei Fan, Yuxin Li
Title: Virtual reality for large-scale laboratories based on colorized point clouds: design and pedagogical impact
Abstract:
Effective laboratory training is essential in engineering education, yet conventional on-site instruction is often constrained by time, accessibility, and safety considerations. To address these challenges, this study presents the design, implementation, and evaluation of a web-based virtual reality (WebVR) representation of a large-scale engineering laboratory constructed from massive colorized point cloud data. This study proposes a novel WebVR framework that integrates Unity and Potree for high-fidelity point-cloud visualization combined with advanced interactive capabilities in a browser-based virtual laboratory. It supports immersive first-person exploration, guided navigation, interactive hotspots conveying equipment and safety information, as well as emergency evacuation simulations. The usability, educational effectiveness, and overall acceptance of the virtual laboratory were evaluated through an anonymous questionnaire administered to students and laboratory staff. The results indicate overwhelmingly positive feedback, with all participants rating the system as "good" or "excellent" across all evaluation dimensions. Participants particularly emphasized the benefits of immersive exploration and self-directed learning. In addition, qualitative feedback was systematically analyzed to inform future enhancements of the virtual environment. Overall, the findings demonstrate that the WebVR-based virtual laboratory can effectively complement conventional on-site laboratory instruction, offering a scalable, accessible, and low-risk platform that enhances learning experiences in engineering education.

Authors:Sihan Qian, Amit Mehra, Dengpan Liu
Title: The Economics of AI Supply Chain Regulation
Abstract:
The rise of foundation models has driven the emergence of AI supply chains, where upstream foundation model providers offer fine-tuning and inference services to downstream firms developing domain-specific applications. Downstream firms pay providers to use their computing infrastructure to fine-tune models with proprietary data, creating a co-creation dynamic that enhances model quality. Amid concerns that foundation model providers and downstream firms may capture excessive consumer surplus, along with increasing regulatory measures, this study employs a game-theoretic model involving a provider and two competing downstream firms to analyze how policy interventions affect consumer surplus in the AI supply chain. Our analysis shows that policies promoting price competition in downstream markets (i.e., pro-price-competitive policies) boost consumer surplus only when compute or data preprocessing costs are high, while compute subsidies are effective only when these costs are low, suggesting these policies complement each other. In contrast, policies promoting quality competition in downstream markets (i.e., pro-quality-competitive policies) always improve consumer surplus. We also find that under pro-price-competitive policies or compute subsidies, both the provider and downstream firms can achieve higher profits along with greater consumer surplus, creating a win-win-win outcome. However, pro-quality-competitive policies increase the provider's profits while reducing those of downstream firms. Finally, as compute costs decline, pro-price-competitive policies may lose their effectiveness, whereas compute subsidies may shift from ineffective to effective. These findings offer insights for policymakers seeking to foster AI supply chains that are economically efficient and socially beneficial.

Authors:Nurullah Demir, Yash Vekaria, Georgios Smaragdakis, Zakir Durumeric
Title: Keys on Doormats: Exposed API Credentials on the Web
Abstract:
Application programming interfaces (APIs) have become a central part of the modern IT environment, allowing developers to enrich the functionality of applications and interact with third parties such as cloud and payment providers. This interaction often occurs through authentication mechanisms that rely on sensitive credentials such as API keys and tokens that require secure handling. Exposure of these credentials can pose significant consequences to organizations, as malicious attackers can gain access to related services. Previous studies have shown exposure of these sensitive credentials in different environments such as cloud platforms and GitHub. However, the web remains unexplored. In this paper, we study exposure of credentials on the web by analyzing 10M webpages. Our findings reveal that API credentials are widely and publicly exposed on the web, including highly popular and critical webpages such as those of global banks and firmware developers. We identify 1,748 distinct credentials from 14 service providers (e.g., cloud and payment providers) across nearly 10,000 webpages. Moreover, our analysis of archived data suggest credentials to remain exposed for periods ranging from a month to several years. We characterize web-specific exposure vectors and root causes, finding that most originate from JavaScript environments. We also discuss the outcomes of our responsible disclosure efforts that demonstrated a substantial reduction in credential exposure on the web.

Authors:Jie Gao, Yaoxin Wu
Title: LLMs for Human Mobility: Opportunities, Challenges, and Future Directions
Abstract:
Human mobility studies how people move among meaningful places over time and how these movements aggregate into population-level patterns that shape accessibility, congestion, emissions, and public health. Large language models (LLMs) are increasingly used in this domain because many human mobility problems require reasoning about place and activity semantics, travelers' intentions and preferences, and diverse real-world constraints that are difficult to capture using coordinates and other purely numerical attributes. Despite rapid growth, the literature is still scattered, and there is no clear overview that connects human mobility tasks, challenges, and LLM designs in a consistent way. This survey therefore provides a comprehensive synthesis of LLM-based research on human mobility across five tasks, including travel itinerary planning, trajectory generation, mobility simulation, mobility prediction, and mobility semantics and understanding. For each task, we review representative work, connect core challenges to the specific roles of LLMs, and summarize typical LLM-based solution designs. We conclude with open challenges and research directions toward reliable, grounded and privacy-aware LLM-based approaches for human mobility.

Authors:Prerna Khanna, Tanmay Srivastava, Shubham Jain, Aruna Balasubramanian
Title: UniMotion: Self-Supervised Learning for Cross-Domain IMU Motion Recognition
Abstract:
IMU-based gesture interfaces are being increasingly adopted as efficient, accessible, and intuitive alternatives to traditional input methods, such as touchscreens and voice. However, current gesture recognition algorithms are tailored to work for specific devices (e.g., smartwatches vs. earbuds) or user populations (e.g., blind vs. sighted users), limiting their generalizability. In this paper, we design UniMotion, a generalized IMU-based gesture recognition framework that works across devices and populations with minimal training samples. To overcome the challenges and high cost of collecting large-scale labeled training data, UniMotion leverages readily available unlabeled human activity data. The UniMotion pipeline comprises two stages: (1) pre-training a motion representation model using abundant unlabeled human activity data, and (2) fine-tuning it with a small amount of labeled gesture data. For pre-training, we introduce a token-based strategy and embeddings that learn to identify and focus attention on the key motion signatures in the temporal data For fine-tuning, we design a text-guided classifier that can reliably differentiate between temporally or semantically similar gestures. We evaluate UniMotion across both hand gestures (captured through a smartwatch) and earbud gestures (captured through earbuds), using data collected from blind and sighted users. Across these diverse devices and user populations, UniMotion achieves an accuracy of 85\%, across an average of 13 gesture classes using only 10\% of labeled data for training. UniMotion significantly outperforms state-of-the-art self-supervised learning approaches and specialized gesture recognition models.

Authors:Pei-Ying Lin, Julie Heij, Iris Borst, Britt Joosten, Kristina Andersen, Wijnand IJsselsteijn
Title: An Intent of Collaboration: On Agencies between Designers and Emerging (Intelligent) Technologies
Abstract:
Amidst the emergence of powerful intelligent technologies such as LLMs and text-to-image AIs that promise to enhance creative processes, designers face the challenges of remaining empowered and creative while working with these foreign digital partners. While generative AIs offer versatile, informative, and occasionally poetic outcomes, their lack of embodied knowledge presents an even greater challenge to designers in gaining fruitful outcomes, such as in the field of Digital Craftsmanship. In this project, three designers embarked on a three-month experimental journey with an intention to co-create with Google's LLM as a potential intelligent partner to investigate how it will influence the designers' creativity. We found that a power dynamic of agencies exists between the LLM and the designer, in which the designer can easily lose their creative agency. Regaining the designer's creative agency involves introspection into their own creative process, a structural understanding of the specific emerging technology involved, and deliberate adjustments to the dynamics of the human-technology relationship. We propose paying attention to the designer's inner world and parties of agencies when engaging with emerging intelligent technologies through three aspects: the sensitivity towards a creative process as cognitive activities; the active investigation into specific technology's capability; and the adjustment towards an appropriate working relationship between the designer and the emerging technology.

Authors:Lu Liu, Harm van Essen, Berry Eggen
Title: Design Exploration of Lightweight Interactions for Awareness-Supporting Technologies in Hybrid Work
Abstract:
Hybrid work settings often lack the informal communication that naturally emerges from spontaneous encounters and ambient awareness of coworkers' activities, potentially hindering team collaboration. To address this challenge, we explored how lightweight interactions can be integrated into awareness-supporting technologies for fostering informal communication. Our experiential design approach focused on how information is perceived and processed rather than explicit content exchange. Through brainstorming, speculating, and prototyping, we explored the design space for small hybrid teams. By annotating and analyzing design concepts, speculative scenarios, and prototypes, we developed a framework that identified design options for lightweight interactions and methods for integrating them with information displays.

Authors:Gaole He, Brian Y. Lim
Title: From Control to Foresight: Simulation as a New Paradigm for Human-Agent Collaboration
Abstract:
Large Language Models (LLMs) are increasingly used to power autonomous agents for complex, multi-step tasks. However, human-agent interaction remains pointwise and reactive: users approve or correct individual actions to mitigate immediate risks, without visibility into subsequent consequences. This forces users to mentally simulate long-term effects, a cognitively demanding and often inaccurate process. Users have control over individual steps but lack the foresight to make informed decisions. We argue that effective collaboration requires foresight, not just control. We propose simulation-in-the-loop, an interaction paradigm that enables users and agents to explore simulated future trajectories before committing to decisions. Simulation transforms intervention from reactive guesswork into informed exploration, while helping users discover latent constraints and preferences along the way. This perspective paper characterizes the limitations of current paradigms, introduces a conceptual framework for simulation-based collaboration, and illustrates its potential through concrete human-agent collaboration scenarios.

Authors:María Isabel Rivas Ginel, Janiça Hackenbuchner, Alina Secară, Ralph Krüger, Caroline Rossi
Title: A technology-oriented mapping of the language and translation industry: Analysing stakeholder values and their potential implication for translation pedagogy
Abstract:
This paper examines how value is constructed and negotiated in today's increasingly automated language and translation industry. Drawing on interview data from twenty-nine industry stakeholders collected within the LT-LiDER project, the study analyses how human value, technological value, efficiency, and adaptability are articulated across different professional roles. Using Chesterman's framework of translation ethics and associated values as an analytical lens, the paper shows that efficiency-oriented technological values aligned with the ethics of service have become baseline expectations in automated production environments, where speed, scalability, and deliverability dominate evaluation criteria. At the same time, human value is not displaced but repositioned, emerging primarily through expertise, oversight, accountability, and contextual judgment embedded within technology-mediated workflows. A central finding is the prominence of adaptability as a mediating value linking human and technological domains. Adaptability is constructed as a core professional requirement, reflecting expectations that translators continuously adjust their skills, roles, and identities in response to evolving tools and organisational demands. The paper argues that automation reshapes rather than replaces translation value, creating an interdependent configuration in which technological efficiency enables human communicative work.

Authors:Liwen He, Pingting Chen, Ziheng Tang, Yixiao Liu, Jihong Jeung, Teng Han, Xin Tong
Title: From Pets to Robots: MojiKit as a Data-Informed Toolkit for Affective HRI Design
Abstract:
Designing affective behaviors for animal-inspired social robots often relies on intuition and personal experience, leading to fragmented outcomes. To provide more systematic guidance, we first coded and analyzed human-pet interaction videos, validated insights through literature and interviews, and created structured reference cards that map the design space of pet-inspired affective interactions. Building on this, we developed MojiKit, a toolkit combining reference cards, a zoomorphic robot prototype (MomoBot), and a behavior control studio. We evaluated MojiKit in co-creation workshops with 18 participants, finding that MojiKit helped them design 35 affective interaction patterns beyond their own pet experiences, while the code-free studio lowered the technical barrier and enhanced creative agency. Our contributions include the data-informed structured resource for pet-inspired affective HRI design, an integrated toolkit that bridges reference materials with hands-on prototyping, and empirical evidence showing how MojiKit empowers users to systematically create richer, more diverse affective robot behaviors.

Authors:Cameron Mohne, Nicholas Vo, Dora Demszky, Chris Piech
Title: EducaSim: Interactive Simulacra for CS1 Instructional Practice
Abstract:
Role play is a high-impact mode of training that has demonstrated its effectiveness in improving learning outcomes. However, it is difficult to scale to teacher instruction due to its inherent dependency on providing personnel who are both trained and available to facilitate this learning environment. This poses a challenge, especially to massive online courses which may employ and aid hundreds to thousands of novice teachers. In this work, we present EducaSim: a novel framework that uses generative agents to simulate a small-group section for teachers-in-training to practice instruction. EducaSim works by implementing diverse pedagogical-based personas, actual course material, and agent-based architectures constructed for instructional practice to provide a pedagogically rich environment for teachers-in-training to engage in role play learning -- without the costly overhead that comes with it. We share our experiences with constructing and making the tool available for experimental training and preparation in a six-week CS1 course supporting 20,000 students. We found that teachers who engaged generally saw it as a positive experience. We believe that EducaSim is an important step to providing experiential teaching practice at scale for closely-defined settings and has great potential for future applications.

Authors:Morgan Wack, Patrick Warren, Mustafa Alam
Title: The Laziness of the Crowd: Effort Aversion Among Raters Risks Undermining the Efficacy of X's Community Notes Program
Abstract:
Crowdsourced moderation systems like Twitter/X's Community Notes program have been proposed as scalable alternatives to professional fact-checkers for combating online misinformation. While prior research has examined the effectiveness of such systems in reducing engagement with false content and their vulnerability to partisan bias, we identify a previously untested mechanism linking fact-check difficulty to systematic non-participation by crowdsourced raters. We hypothesize that claims requiring less cognitive effort to evaluate, specifically, those that are obviously false and easy to refute, are more likely to receive public notes than claims that are more plausible and require greater effort to debunk. Using eighteen months of vaccine-related Community Notes data (2,250 posts) and ratings from 382 survey participants, we show that claims perceived as more difficult to fact-check are significantly less likely to receive notes that achieve ``helpful''/public status. Following the conduct of additional analyses and a fact-checking process utilizing an LLM pipeline to help rule out alternative explanations, we interpret this pattern as consistent with an unwillingness among raters to invest the mental effort required to evaluate and rate notes for more plausible misinformation. These findings suggest that crowdsourced moderation may systematically fail to address the forms of plausible misinformation which are most likely to deceive. We discuss implications for platform design and propose mechanisms to mitigate this difficulty penalty in crowdsourced content moderation systems.

Authors:Jun Rekimoto, Yu Nishimura, Bojian Yang
Title: NasoVoce: A Nose-Mounted Low-Audibility Speech Interface for Always-Available Speech Interaction
Abstract:
Silent and whispered speech offer promise for always-available voice interaction with AI, yet existing methods struggle to balance vocabulary size, wearability, silence, and noise robustness. We present NasoVoce, a nose-bridge-mounted interface that integrates a microphone and a vibration sensor. Positioned at the nasal pads of smart glasses, it unobtrusively captures both acoustic and vibration signals. The nasal bridge, close to the mouth, allows access to bone- and skin-conducted speech and enables reliable capture of low-volume utterances such as whispered speech. While the microphone captures high-quality audio, it is highly sensitive to environmental noise. Conversely, the vibration sensor is robust to noise but yields lower signal quality. By fusing these complementary inputs, NasoVoce generates high-quality speech robust against interference. Evaluation with Whisper Large-v2, PESQ, STOI, and MUSHRA ratings confirms improved recognition and quality. NasoVoce demonstrates the feasibility of a practical interface for always-available, continuous, and discreet AI voice conversations.

Authors:Yiyuan Wang, Andrew Johnston, Zoë Sadokierski, Rhiannon Stephens, Shane T. Ahyong
Title: Conversational AI-Enhanced Exploration System to Query Large-Scale Digitised Collections of Natural History Museums
Abstract:
Recent digitisation efforts in natural history museums have produced large volumes of collection data, yet their scale and scientific complexity often hinder public access and understanding. Conventional data management tools, such as databases, restrict exploration through keyword-based search or require specialised schema knowledge. This paper presents a system design that uses conversational AI to query nearly 1.7 million digitised specimen records from the life-science collections of the Australian Museum. Designed and developed through a human-centred design process, the system contains an interactive map for visual-spatial exploration and a natural-language conversational agent that retrieves detailed specimen data and answers collection-specific questions. The system leverages function-calling capabilities of contemporary large language models to dynamically retrieve structured data from external APIs, enabling fast, real-time interaction with extensive yet frequently updated datasets. Our work provides a new approach of connecting large museum collections with natural language-based queries and informs future designs of scientific AI agents for natural history museums.

Authors:Alejandro Pradas-Gomez, Arindam Brahma, Ola Isaksson
Title: DUCTILE: Agentic LLM Orchestration of Engineering Analysis in Product Development Practice
Abstract:
Engineering analysis automation in product development relies on rigid interfaces between tools, data formats and documented processes. When these interfaces change, as they routinely do as the product evolves in the engineering ecosystem, the automation support breaks. This paper presents a DUCTILE (Delegated, User-supervised Coordination of Tool- and document-Integrated LLM-Enabled) agentic orchestration, an approach for developing, executing and evaluating LLM-based agentic automation support of engineering analysis tasks. The approach separates adaptive orchestration, performed by the LLM agent, from deterministic execution, performed by verified engineering tools. The agent interprets documented design practices, inspects input data and adapts the processing path, while the engineer supervises and exercises final judgment. DUCTILE is demonstrated on an industrial structural analysis task at an aerospace manufacturer, where the agent handled input deviations in format, units, naming conventions and methodology that would break traditional scripted pipelines. Evaluation against expert-defined acceptance criteria and deployment with practicing engineers confirm that the approach produces correct, methodologically compliant results across 10 repeated independent runs. The paper discusses the paradigm shift and the practical consequences of adopting agentic automation, including unintended effects on the nature of engineering work when removing mundane tasks and creating an exhausting supervisory role.

Authors:Ajay Anand, Gabriel Parra, Chad A. Berghoff, Laura A. Hallock
Title: Characterizing Healthy & Post-Stroke Neuromotor Behavior During 6D Upper-Limb Isometric Gaming: Implications for Design of End-Effector Rehabilitation Robot Interfaces
Abstract:
Successful robot-mediated rehabilitation requires designing games and robot interventions that promote healthy motor practice. However, the interplay between a given user's neuromotor behavior, the gaming interface, and the physical robot makes designing system elements -- and even characterizing what behaviors are "healthy" or pathological -- challenging. We leverage our OpenRobotRehab 1.0 open access data set to assess the characteristics of 13 healthy and 2 post-stroke users' force output, muscle activations, and game performance while executing isometric trajectory tracking tasks using an end-effector rehabilitation robot. We present an assessment of how subtle aspects of interface design impact user behavior; an analysis of how pathological neuromotor behaviors are reflected in end-effector force dynamics; and a novel hidden Markov model (HMM)-based neuromotor behavior classification method based on surface electromyography (sEMG) signals during cyclic motions. We demonstrate that task specification (including which axes are constrained and how users interpret tracking instructions) shapes user behavior; that pathology-related features are detectable in 6D end-effector force data during isometric task execution (with significant differences between healthy and post-stroke profiles in force error and average force production at $p=0.05$); and that healthy neuromotor strategies are heterogeneous and inherently difficult to characterize. We also show that our HMM-based models discriminate healthy and post-stroke neuromotor dynamics where synergy-based decompositions reflect no such differentiation. Lastly, we discuss these results' implications for the design of adaptive end-effector rehabilitation robots capable of promoting healthier movement strategies across diverse user populations.

Authors:Francisco José Gárate, Paloma Chausa, Diego Moreno, Judit López Luque, Vicens Díaz-Brito, Enrique Javier Gómez
Title: A Governance and Evaluation Framework for Deterministic, Rule-Based Clinical Decision Support in Empiric Antibiotic Prescribing
Abstract:
Empiric antibiotic prescribing in high-risk clinical contexts often requires decision making under conditions of incomplete information, where inappropriate coverage or unjustified escalation may compromise safety and antimicrobial stewardship. While clinical decision-support systems have been proposed to assist in this process, many approaches lack explicit governance and evaluation mechanisms defining scope, abstention conditions, recommendation permissibility, and expected system behavior. This work specifies a governance and evaluation framework for deterministic clinical decision-support systems operating under explicitly constrained scope. Deterministic behavior is adopted to ensure that identical inputs yield identical outputs, supporting transparency, auditability, and conservative decision support in high-risk prescribing contexts. The framework treats governance as a first-class design component, separating clinical decision logic from rule-based mechanisms that determine whether a recommendation may be issued. Explicit abstention, deterministic stewardship constraints, and exclusion rules are formalized as core constructs. The framework defines an evaluation methodology utilizing a fixed set of synthetic, mechanism-driven clinical cases with predefined expected behavior. This validation process focuses on behavioral alignment with specified rules rather than clinical effectiveness, predictive accuracy, or outcome optimization. Within this protocol, abstention is treated as a correct and intended outcome when governance conditions are not satisfied. The proposed framework provides a reproducible approach for specifying, governing, and inspecting deterministic clinical decision-support systems in empiric antibiotic prescribing contexts where transparency, auditability, and conservative behavior are prioritized.

Authors:Michael Keeman, Anastasia Keeman
Title: Empathy Is Not What Changed: Clinical Assessment of Psychological Safety Across GPT Model Generations
Abstract:
When OpenAI deprecated GPT-4o in early 2026, thousands of users protested under #keep4o, claiming newer models had "lost their empathy." No published study has tested this claim. We conducted the first clinical measurement, evaluating three OpenAI model generations (GPT-4o, o4-mini, GPT-5-mini) across 14 emotionally challenging conversational scenarios in mental health and AI companion domains, producing 2,100 scored AI responses assessed on six psychological safety dimensions using clinically-grounded rubrics. Empathy scores are statistically indistinguishable across all three models (Kruskal-Wallis H=4.33, p=0.115). What changed is the safety posture: crisis detection improved monotonically from GPT-4o to GPT-5-mini (H=13.88, p=0.001), while advice safety declined (H=16.63, p<0.001). Per-turn trajectory analysis -- a novel methodological contribution -- reveals these shifts are sharpest during mid-conversation crisis moments invisible to aggregate scoring. In a self-harm scenario involving a minor, GPT-4o scored 3.6/10 on crisis detection during early disclosure turns; GPT-5-mini never dropped below 7.8. What users perceived as "lost empathy" was a shift from a cautious model that missed crises to an alert model that sometimes says too much -- a trade-off with real consequences for vulnerable users, currently invisible to both the people who feel it and the developers who create it.

Authors:Xinyao Zhuang, Jose Echevarria, Kaan Akşit
Title: Prompt-Driven Color Accessibility Evaluation in Diffusion-based Image Generation Models
Abstract:
Generative models are increasingly integrated into creative workflows. While text-to-image generation excels in visual quality and diversity, color accessibility for users with Color Vision Deficiencies (CVD) remains largely unexplored. Our work systematically evaluates color accessibility in images generated by a common pretrained diffusion model, prompted to improve accessibility across diverse categories. We quantify performance using established, off-the-shelf CVD simulation methods and introduce "CVDLoss", a new metric measuring differences in image gradients indicative of structural detail. We validate CVDLoss against a commonly used daltonization method, demonstrating its sensitivity to color accessibility modifications. Applying CVDLoss to model outputs reveals that existing diffusion models struggle to reliably respond to accessibility-focused prompts. Consequently, our study establishes CVDLoss as a valuable evaluation tool for accessibility-aware image generation and post-processing, offering insights into current generative models' limitations in addressing color accessibility.

Authors:Jaap Munneke, Jennifer E. Corbett
Title: The Richest Paradigm You're Not Using: Commercial Videogames at the Intersection of Human-Computer Interaction and Cognitive Science
Abstract:
Synthesizing from Corbett and Munneke (2025), who demonstrated that questions originating in human-computer interaction (HCI) and game design can be answered through the theoretical toolkit of cognitive science, this perspective argues that commercial videogames represent a largely underutilized research environment at the intersection of these two fields. Cognitive science has long relied on carefully controlled laboratory paradigms to study perception, attention, and executive functioning, raising persistent questions about ecological validity. HCI, by contrast, has spent decades developing methods for studying behavior in rich, complex, interactive environments, but has been less concerned with what that behavior reveals about underlying cognitive mechanisms. Commercial videogames sit precisely at this intersection. They are cognitively demanding by design, motivating by nature, and consistent enough across players to support systematic behavioral comparison. The affordance structure of a game does the work that experimental manipulations typically require of the researcher, instantiating cognitive demands that are genuine, sustained, and meaningful to the player. We argue that perception, attention, and executive functioning can be meaningfully studied within commercial games using a minimal observational toolkit of screen recording, eye tracking, and behavioral timing. We propose an affordance-cognition mapping framework as a systematic basis for game selection and research design and offer practical methodological recommendations for researchers wishing to work in this space.

Authors:Haidan Liu, Poorvi Bhatia, Nicholas Vincent, Parmit Chilana
Title: Tracing Everyday AI Literacy Discussions at Scale: How Online Creative Communities Make Sense of Generative AI
Abstract:
Developing AI literacy is increasingly urgent as generative AI reshapes creative practice. Yet most AI literacy frameworks are top-down and expert-driven, overlooking how literacy emerges organically in creative communities. To address this gap, we performed a large-scale analysis of 122k Reddit conversations from 80 creative-oriented subreddits over a three-year period. Our analysis identified four consistent themes in AI literacy-related discussions, and we further traced how discourse shifted alongside major AI events. Surprisingly, creators primarily frame AI literacy around how to use tools effectively, foregrounding practice and task skills, while discussions of AI capabilities and ethics surge only around high-profile events. Our findings suggest that AI literacy is dynamic, practice-driven, and event-responsive rather than static or purely conceptual. This study provides insights for researchers, designers, and policymakers to develop learning resources, community support, and policies that better promote AI literacy in creative communities.

Authors:Tianyi Li, Jin Wei-Kocsis
Title: Clarifying the Compass: A Reflexive Narrative on Entry Barriers into HCI and Aging Research
Abstract:
This manuscript presents the perspectives and reflections of two researchers who were not previously engaged in aging research, regarding the gaps and barriers related to interdisciplinary collaboration on HCI and Aging research. The manuscript has two sections. In the first section, the authors discuss their observations on the disconnect between the needs of aging populations and the design of emerging technologies. The second section delves into their personal journey of developing empathy and a deeper understanding of older adults by volunteering in a senior living community, and shares their reflective thoughts on these experiences.

Authors:Haichang Li, Anjun Zhu, Arpit Narechania
Title: Alignment-Process-Outcome: Rethinking How AIs and Humans Collaborate
Abstract:
In real-world collaboration, alignment, process structure, and outcome quality do not exhibit a simple linear or one-to-one correspondence: similar alignment may accompany either rapid convergence or extensive multi-branch exploration, and lead to different results. Existing accounts often isolate these dimensions or focus on specific participant types, limiting structural accounts of collaboration. We reconceptualize collaboration through two complementary lenses. The task lens models collaboration as trajectory evolution in a structured task space, revealing patterns such as advancement, branching, and backtracking. The intent lens examines how individual intents are expressed within shared contexts and enter situated decisions. Together, these lenses clarify the structural relationships among alignment, decision-making, and trajectory structure. Rather than reducing collaboration to outcome quality or treating alignment as the sole objective, we propose a unified dynamic view of the relationships among alignment, process, and outcome, and use it to re-examine collaboration structure across Human-Human, AI-AI, and Human-AI settings.

Authors:Md Mojibur Rahman Redoy Akanda, Ahmed Tanvir Mahdad, Nitesh Saxena
Title: Broken Access: On the Challenges of Screen Reader Assisted Two-Factor and Passwordless Authentication
Abstract:
In today's technology-driven world, web services have opened up new opportunities for blind and visually impaired people to interact independently. Securing interactions with these services is crucial; however, currently deployed authentication mainly concentrate on sighted users, overlooking the needs of the blind and visually impaired community. In this paper, we address this gap by investigating the security and accessibility aspects of these authentication when adopted by blind and visually impaired users. We model web authentication for such users as screen reader assisted authentication and introduce an evaluation framework called AWARE. Using AWARE, we then systematically assessed popular PC and smartphone-based screen readers against different authentication methods, including variants of 2FA and passwordless schemes, to simulate real-world scenarios. We analyzed these screen reader assisted authentication interactions with authentication methods in three settings: using a terminal (PC) with screen readers, a combination of the terminal (PC) and smartphone with screen readers, and smartphones with integrated screen readers. The results of our study underscore weaknesses in all of our observed screen reader assisted scenarios for real-life authentication methods. These weaknesses, encompassing specific accessibility issues caused by imprecise screen reader instructions, highlight vulnerability concerning observed scenarios for both real-world and research literature based attacks, including phishing, concurrency, fatigue, cross-service, and shoulder surfing. Broadly, our AWARE framework can be used by designers as a precursor to user studies which are typically time-consuming and tedious to perform, independently allowing to unfold security and accessibility problems early which designers can address prior to full-fledged user testing of more isolated issues.

Authors:Yasmin Zaraket, Céline Mougenot
Title: YAQIN: Culturally Sensitive, Agentic AI for Mental Healthcare Support Among Muslim Women in the UK
Abstract:
Mental healthcare services in the UK lack tools and resources to address the cultural needs of Muslim women, often leaving them feeling as though their values are pathologised and limiting trust and engagement [1]. Despite growing awareness of cultural competency, few interventions integrate Islamic frameworks into therapeutic support. This report investigates the design and evaluation of YAQIN, a co-designed AI-based application supporting culturally and faith-sensitive mental health engagement for Muslim women. With almost 1.9 million Muslim women in England in 2021, YAQIN responds to a gap in care [2]. It leverages AIś anonymity and continuous support through a faith-aware chatbot and guided journaling tool grounded in user-centred design and Islamic psychology. The YAQIN design research methodology comprised three stages: contextual investigation and literature review, user research with N=14 stakeholders including Muslim women and mental health experts, and prototype development informed by deductive thematic analysis, personas, journey maps, and design specifications. Evaluation involved a co-designed user study with five participants: four Muslim women and one mental health expert who reviewed therapeutic alignment and cultural sensitivity after using the chatbot prototype. Feedback focused on tone, faith relevance, emotional resonance, and the Retrieval-Augmented Generation pipeline enabling contextual continuity. Participants highlighted YAQINś ability to bridge cultural gaps in trust and therapeutic confidence. Feedback included suggestions of including linguistic diversity and routine-based guidance. This project demonstrates how culturally sensitive AI can improve mental healthcare accessibility and trust for marginalised communities and highlights the potential of faith-integrated technology in healthcare innovation.

Authors:Tae Hee Jo, Kyung Hoon Hyun
Title: From Logs to Agents: Reconstructing High-Level Creative Workflows from Low-Level Raw System Traces
Abstract:
Current AI-based Creativity Support Tools (CSTs) generate massive amounts of low-level log data (e.g., clicks, parameter tweaks, metadata updates) that are hard to interpret as "creative intent". We argue that to enable future agentic systems to understand and assist users, we must first translate these noisy system traces into meaningful high-level user behavioral traces. We propose a method that parses raw csv/JSON logs into structured behavioral workflow graphs that map the provenance and flow of creative assets. By abstracting low-level system events into high-level behavioral tokens (e.g., MODIFY_Prompt, GENERATE_Image), this method enables downstream analyses like sequence mining and probabilistic modeling. We discuss how this structured workflow history is a prerequisite for "Process-Aware Agents" - systems capable of suggesting next design moves or explaining rationales based on a deeper understanding of the user's workflow patterns and history.

Authors:Mengfei Gao, Caroline Appert, Ludovic David, Emmanuel Pietriga
Title: AiRWeb: Using AR to Extend Web Browsing Beyond Handheld Screens
Abstract:
Browsing the Web on mobile devices is often cumbersome due to their limited screen space. We investigate a phone+AR Web browsing approach, AiRWeb, that leverages the structural properties of Web pages to allow users to seamlessly select and offload arbitrary Web content into the space surrounding them. Focusing on flexibility, AiRWeb lets users decide what to offload, when to do so, and how offloaded content is arranged, enabling personalized organization tailored to the task at hand. We developed a fully functional prototype using standard Web technologies, that covers the complete interaction workflow, from the selection of elements to offload from the phone to their manipulation in the air. Results from a preliminary study conducted using this prototype suggest that AiRWeb is learnable and usable, while also revealing open design challenges around offload mode activation in particular.

Authors:Angel Hsing-Chi Hwang, Senya Wong, Baixiao Chen, Jessica He, Hyo Jin Do
Title: "Better Ask for Forgiveness than Permission": Practices and Policies of AI Disclosure in Freelance Work
Abstract:
The growing use of AI applications among freelance workers is reshaping trust and relationships with clients. This paper investigates how both workers and clients perceive AI use and disclosure in the freelance economy through a three-stage study: interviews with workers and two survey studies with workers and clients. Findings first reveal a key expectation gap around disclosure: Workers often adopt passive disclosure practices, revealing AI use only when asked, as they assume clients can already detect it. Clients, however, are far less confident in recognizing AI-assisted work and prefer proactive disclosure. A second finding highlights the role of unclear or absent client AI policies, which leave workers consistently misinterpreting clients' expectations for AI use and disclosure. Together, these gaps point to the need for clearer guidelines and practices for AI disclosure. Insights extend beyond freelancing, offering implications for trust, accountability, and policy design in other AI-mediated work domains.

Authors:Balint K. Hodossy, Dario Farina
Title: Neural Control and Learning of Simulated Hand Movements With an EMG-Based Closed-Loop Interface
Abstract:
The standard engineering approach when facing uncertainty is modelling. Mixing data from a well-calibrated model with real recordings has led to breakthroughs in many applications of AI, from computer vision to autonomous driving. This type of model-based data augmentation is now beginning to show promising results in biosignal processing as well. However, while these simulated data are necessary, they are not sufficient for virtual neurophysiological experiments. Simply generating neural signals that reproduce a predetermined motor behaviour does not capture the flexibility, variability, and causal structure required to probe neural mechanisms during control tasks. In this study, we present an in silico neuromechanical model that combines a fully forward musculoskeletal simulation, reinforcement learning, and sequential, online electromyography synthesis. This framework provides not only synchronised kinematics, dynamics, and corresponding neural activity, but also explicitly models feedback and feedforward control in a virtual participant. In this way, online control problems can be represented, as the simulated human adapts its behaviour via a learned RL policy in response to a neural interface. For example, the virtual user can learn hand movements robust to perturbations or the control of a virtual gesture decoder. We illustrate the approach using a gesturing task within a biomechanical hand model, and lay the groundwork for using this technique to evaluate neural controllers, augment training datasets, and generate synthetic data for neurological conditions.

Authors:Lingwei Cheng, Saerim Kim, Andrew Sullivan
Title: Collaboration by Mandate: How Shared Data Infrastructure Shapes Coordination and Control in U.S. Homelessness Services
Abstract:
When governments mandate collaboration, shared data systems can serve both as tools for coordination and instruments of control. This study examines U.S. homelessness service networks, where Continuums of Care (CoCs) coordinate service providers through the federally mandated Homeless Management Information System (HMIS). With client consent, providers enter data into HMIS and access cross-provider service histories to support coordinated care. At the same time, HMIS embeds standards and governance rules that shape who can collect, access, interpret, and act on data, and thus who holds decision authority. Using qualitative interviews with six experts, we show that standardization can facilitate collaboration and shared learning. However, unequal resources, analytic capacity, and authority limit equitable participation and often shift some participants toward compliance-focused roles. We contribute to public-interest design research on civic data infrastructures by illustrating how mandated data sharing can simultaneously enable coordination and accountability while reproducing power asymmetries in data interpretation and decision-making.

Authors:Vasty A. Adomako, Kaisu Mumuni, Eugene M. Akoto, Felix N. Koranteng
Title: Exploring the Drivers of Information Security Policy Compliance Among Contingent Employees: A Social, Deterrent, and Involvement-Based Approach
Abstract:
As institutions increasingly depend on Information Systems (ISs), ensuring compliance with Information Systems Security Policies (ISSPs) is critical, especially among contingent employees, whose engagement differs from that of permanent staff. This study examines how Subjective Norm, Deterrence (certainty of detection and severity of punishment), and involvement mechanisms (knowledge sharing and collaboration) influence contingent employees Attitudes Toward ISSPs and, ultimately, their Compliance Intentions. Drawing on data from Ghanaian universities and analyzed using PLS-SEM, the findings confirm that all proposed factors significantly shape attitudes, with knowledge sharing having the strongest effect. Attitude toward ISSPs also strongly predicts compliance intentions. The results support integrating social, cognitive, and collaborative factors into existing ISSP compliance models. Practical implications emphasize fostering inclusive and supportive environments alongside enforcement. This study advances theory and provides a foundation for future research into ISSP behavior among temporary academic staff.

Authors:Lois Fajuyigbe, Kaisu Mumuni, Felix Nti Koranteng
Title: Student Preferences for Online Interaction Platforms in Blended Learning: A Mixed-Methods Study
Abstract:
As higher education increasingly adopts blended learning, understanding students preferences for online interaction platforms becomes critical for effective course delivery and engagement. This study investigates the platforms undergraduate students prefer for academic communication and explores the underlying reasons for these choices. Data were collected from 37 students enrolled in two summer courses at a Ghanaian university using a structured questionnaire consisting of both closed and open-ended items. Quantitative results revealed a strong preference for instant messaging platforms such as WhatsApp and Telegram over institutional learning management systems. Qualitative content analysis of the open-ended responses identified five key factors influencing platform preference: convenience and familiarity, ease of use, accessibility, popularity among peers, and support for real-time interactions. These findings highlight a significant mismatch between students communication habits and institutional platform offerings. The study highlights the importance of aligning digital learning strategies with students lived digital experiences to enhance interaction, collaboration, and learner satisfaction in blended learning environments.

Authors:Xiaohan Peng, Sotiris Piliouras, Carl Abou Saada Nujaim
Title: From State Changes to Creative Decisions: Documenting and Interpreting Traces Across Creative Domains
Abstract:
Analyzing creative activity traces requires capturing activity at appropriate granularity and interpreting it in ways that reflect the structure of creative practice. However, existing approaches record state changes without preserving the intent or relationships that define higher-level creative moves. This decoupling manifests differently across domains: GenAI tools lose non-linear exploration structure, visualization authoring obscures representational intent, and programmatic environments flatten interaction boundaries. We present three complementary approaches: a node-based interface for stateful GenAI artifact management, a vocabulary of visual cues as higher-level creative moves in visualization authoring, and a programming model that embeds semantic histories directly into interaction state.

Authors:Bofan Yu, Borui Li, Tingyu Zhang, Xing-Dong Yang
Title: Understanding User Requirements for Creating Sensor-Powered Smart Car Cabins Through Retrofitting
Abstract:
In this paper, we explore a novel approach that leverages retrofitting to create sensor-powered smart car cabins. We propose that retrofitting offers a promising way to complement and extend the capabilities of built-in smart cabin sensors provided by car manufacturers. To understand how retrofitting solutions should be designed, we conducted a two-phase study. First, through semi-structured interviews with 18 participants, we examined challenges with built-in smart cabin sensors and identified opportunities where retrofitting could address these limitations. Second, through probe-based participatory design sessions with 15 participants, we identified user requirements and expectations for effective retrofit solutions. Based on our findings, we present a set of design recommendations to guide the future development of retrofit methods for smart car cabins.

Authors:Yiheng Liang, Kim Marriott, Helen C. Purchase
Title: Beyond Advocacy: A Design Space for Replication-Related Studies
Abstract:
The importance of replication is often discussed and advocated -- not only in the domains of visualization and HCI, but in all scientific areas. When replicating a study, design decisions need to be made with regards which aspects of the original study will remain the same and which will be altered. We present a supporting multi-dimensional design space framework within which such decisions can be identified, categorized, compared and analyzed. The framework treats replication experimental design as a pairwise comparison problem, and represents the design by four practical dimensions defined by three comparison levels. The design space is therefore a framework that can be used for both retrospective characterization and prospective planning. We provide worked examples, and relate our framework to other attempts at describing the scope of replication studies.

Authors:Arman Khalilbeigi Khameneh, Armin Mostafavi, Alicia Nahmad Vazquez
Title: Gamified Informed Decision-Making for Performance-Aware Design by Non-Experts: An Exoskeleton Design Case Study
Abstract:
Decision Support Systems (DSS) play a crucial role in enabling non-expert designers to explore complex, performance-driven design spaces. This paper presents a gamified decision-making framework that integrates game engines with real-time performance feedback. Performance criteria include structural behavior, environmental parameters, fabrication, material, and cost considerations. The developed design framework was tested with architecture students and non-expert designers on the design of an exoskeleton facade to retrofit an existing building. Participants (N=24) were able to iteratively modify façade geometries while receiving real-time feedback across the three key criteria: 1) structural behavior, including deflection, mass, and stress/strength ratio; 2) environmental parameters, such as solar gain and heating/cooling energy demands; and 3) fabrication considerations, including fabrication and material costs, robotic machining, and material setup. The evaluation of participant interactions reveals that gamified feedback mechanisms significantly enhance user comprehension and informed decision-making across the criteria. Further, participants' understanding of structural, material, and fabrication performance in relation to the iterative design task suggests that curated design spaces and structured guidance improve efficiency compared to open-ended generative tools. This research contributes to pre-occupancy evaluations, demonstrating how gamified environments enable stakeholder participation in the design process through informed decisionmaking and customized negotiation of performance criteria. .

Authors:Jianna So, Connie Cheng, Sonia Krishna Murthy
Title: Beyond Anthropomorphism: a Spectrum of Interface Metaphors for LLMs
Abstract:
Anthropomorphizing conversational technology is a natural human tendency. Today, the anthropomorphic metaphor is overly reinforced across intelligent tools. Large Language Models (LLMs) are particularly anthropomorphized through interface design. While metaphors are inherently partial, anthropomorphic interfaces highlight similarities between LLMs and humans, but mask crucial differences. As a result, the metaphor is often taken literally; users treat LLMs as if they are truly human. With few safeguards in place, this extreme anthropomorphism drives users to delusion and harm. Users also experience dissonance between the ethics of using LLMs, their growing ubiquity, and limited interface alternatives. We propose repositioning anthropomorphism as a design variable, developing opposing extremes as a theoretical framework for how interface metaphors shape and can disrupt the default metaphor. We introduce a spectrum of metaphors from transparency-driven ''anti-anthropomorphism'' to uncanny ''hyper-anthropomorphism''. These metaphors introduce materiality to interface metaphors, exposing LLMs as sociotechnical systems shaped by human labor, infrastructure, and data. This spectrum shifts interface design away from optimizing usability and toward encouraging critical engagement.

Authors:Nora Petrova, Andrew Gordon, Enzo Blindow
Title: Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the HUMAINE Framework
Abstract:
The evaluation of large language models faces significant challenges. Technical benchmarks often lack real-world relevance, while existing human preference evaluations suffer from unrepresentative sampling, superficial assessment depth, and single-metric reductionism. To address these issues, we introduce HUMAINE, a framework for multidimensional, demographically aware measurement of human-AI interaction. We collected multi-turn, naturalistic conversations from 23,404 participants that were stratified across 22 demographic groups, both in the US and UK, to evaluate 28 state-of-the-art models across five human-centric dimensions. We use a hierarchical Bayesian Bradley-Terry-Davidson (BTD) model, with post-stratification to census data, and our analysis reveals three key insights. \textbf{(1)} We establish a clear performance hierarchy where \texttt{google/gemini-2.5-pro} ranks first overall, with a 95.6\% posterior probability of being the top-ranked model. \textbf{(2)} We uncover significant preference heterogeneity, with user age emerging as the primary demographic axis of disagreement; a model's perceived rank can shift substantially across age groups, exposing failures in generalisation that unrepresentative samples typically mask. \textbf{(3)} We quantify the vast difference in discriminative power across evaluation dimensions, with ambiguous qualities like \textit{Trust, Ethics \& Safety} showing a 65\% tie rate, in stark contrast to the decisive 10\% tie rate for \textit{Overall Winner}. Our work emphasises the need for a more multidimensional, demographically aware perspective in LLM evaluation. We release our complete dataset, interactive leaderboard, and open-source framework.

Authors:Gonzalo Gabriel Méndez, Jose Such
Title: Scrollytelling as an Alternative Format for Privacy Policies
Abstract:
Privacy policies are long, complex, and rarely read, which limits their effectiveness in informed consent. We investigate scrollytelling, a scroll-driven narrative approach, as a privacy policy presentation format. We built a prototype that interleaves the full policy text with animated visuals to create a dynamic reading experience. In an online study (N=454), we compared our tool against text, two nutrition-label variants, and a standalone interactive visualization. Scrollytelling improved user experience over text, yielding higher engagement, lower cognitive load, greater willingness to adopt the format, and increased perceived clarity. It also matched other formats on comprehension accuracy and confidence, with only one nutrition-label variant performing slightly better. Changes in perceived understanding, transparency, and trust were small and statistically inconclusive. These findings suggest that scrollytelling can preserve comprehension while enhancing the experience of policy reading. We discuss design implications for accessible policy communication and identify directions for increasing transparency and user trust.

Authors:Hyein Kim, Sung Park
Title: The Empty Quadrant: AI Teammates for Embodied Field Learning
Abstract:
For four decades, AIED research has rested on what we term the Sedentary Assumption: the unexamined design commitment to a stationary learner seated before a screen. Mobile learning and museum guides have moved learners into physical space, and context-aware systems have delivered location-triggered content -- yet these efforts predominantly cast AI in the role of information-de-livery tool rather than epistemic partner. We map this gap through a 2 x 2 matrix (AI Role x Learning Environment) and identify an undertheorized intersection: the configuration in which AI serves as an epistemic teammate during unstruc-tured, place-bound field inquiry and learning is assessed through trajectory rather than product. To fill it, we propose Field Atlas, a framework grounded in embod-ied, embedded, enactive, and extended (4E) cognition, active inference, and dual coding theory that shifts AIED's guiding metaphor from instruction to sensemak-ing. The architecture pairs volitional photography with immediate voice reflec-tion, constrains AI to Socratic provocation rather than answer delivery, and ap-plies Epistemic Trajectory Modeling (ETM) to represent field learning as a con-tinuous trajectory through conjoined physical-epistemic space. We demonstrate the framework through a museum scenario and argue that the resulting trajecto-ries -- bound to a specific body, place, and time -- constitute process-based evi-dence structurally resistant to AI fabrication, offering a new assessment paradigm and reorienting AIED toward embodied, dialogic human-AI sensemaking in the wild.

Authors:Satabdi Das, Nahian Beente Firuj, Manjot Singh, Arshad Nasser, Khalad Hasan
Title: Are You Comfortable Sharing It?: Leveraging Image Obfuscation Techniques to Enhance Sharing Privacy for Blind and Visually Impaired Users
Abstract:
People with Blind Visual Impairments (BVI) face unique challenges when sharing images, as these may accidentally contain sensitive or inappropriate content. In many instances, they are unaware of the potential risks associated with sharing such content, which can compromise their privacy and interpersonal relationships. To address this issue, we investigated image filtering techniques that could help BVI users manage sensitive content before sharing with various audiences, including family, friends, or strangers. We conducted a study with 20 BVI participants, evaluating different filters applied to images varying in sensitivity, such as personal moments or embarrassing shots. Results indicated that pixelation was the least preferred method, while preferences for other filters varied depending on image type and sharing context. Additionally, participants reported greater comfort when sharing filtered versus unfiltered images across audiences. Based on the results, we offer a set of design guidelines to enhance the image-sharing experience for BVI individuals.

Authors:Xiaohan Peng, Wendy E. Mackay, Janin Koch
Title: Design Generative AI for Practitioners: Exploring Interaction Approaches Aligned with Creative Practice
Abstract:
Design is a non-linear, reflective process in which practitioners engage with visual, semantic, and other expressive materials to explore, iterate, and refine ideas. As Generative AI (GenAI) becomes integrated into professional design practice, traditional interaction approaches focusing on prompts or whole-image manipulation can misalign AI output with designers' intent, forcing visual thinkers into verbal reasoning or post-hoc adjustments. We present three interaction approaches from DesignPrompt, FusAIn, and DesignTrace that distribute control across intent, input, and process, enabling designers to guide AI alignment at different stages of interaction. We further argue that alignment is a dynamic negotiation, with AI adopting proactive or reactive roles according to designers' instrumental and inspirational needs and the creative stage.

Authors:Maria Moskalenko, Alexander Trifanov, Roman Popkov, Arina Tabieva, Maria Smirnova, Konstantin Pravdin, Daniil Bakalin
Title: Changing Pedagogical Paradigms: Integrating Generative AI in Mathematics to Enhance Digital Literacy through 'Mathematical Battles with AI'
Abstract:
This paper introduces `Math Battles with AI', an innovative competitive format designed at ITMO University to redefine the role of generative AI in mathematics education. Moving away from a purely defensive stance, the authors propose an AI agent with intentionally increased hallucination likelihood in specific modes to train verification skills. We describe the three-stage tournament structure and a specialized assessment system that rewards critical verification over blind reliance. Initial results indicate a significant shift in student mindsets, fostering essential skills in digital hygiene and prompt engineering. This work serves as a practical guide for academic institutions aiming to leverage AI for enhancing, rather than undermining, intellectual development.

Authors:Fabio Cortes Rodriguez, Luciano Abriata
Title: Speech recognition assisted by large language models to command software orally -- Application to an augmented and virtual reality web app for immersive molecular graphics
Abstract:
This project successfully developed, evaluated and integrated a Voice User Interface (VUI) into a web application that we are developing for immersive molecular graphics. Said app provides augmented and virtual reality (AR and VR) environments where users manipulate molecules with their hands, but this means the hands can't be used to control the app through a regular mouse- and keyboard-based GUI. The speech-based VUI system developed here alleviates this problem, making it easy to control the app via natural spoken (or typed) commands. To achieve this VUI we evaluated two distinct Automated Speech Recognition (ASR) systems: Chrome's native Speech API and OpenAI's Whisper v3. While Whisper offered broader browser compatibility, its tendency to "hallucinate" with specialized scientific jargon proved very problematic. Consequently, we selected Chrome's ASR for its stability, speed, and reliability. For translating transcribed speech into software commands, we tested two Large Language Model (LLM)-driven approaches: either generating executable code, or calling predefined functions. The function call method, powered by OpenAI's GPT-4o-mini, was ultimately adopted due to its superior safety, efficiency, and reliability over the more complex and error-prone code-generation approach. The resulting VUI is then based on an integration of Chrome's ASR with our LLM-based function-calling module, enabling users to command the application using natural language as shown in a video linked inside this report. We provide links to live examples demonstrating all the intermediate components, and details on how we crafted the LLM's prompt in order to teach it the function calls as well as ways to clean up the transcribed speech and to explain itself while generating function calls. For best demonstration of the final system, we provide a video example.

Authors:Tse Pei Ng, Daniel Campos-Muniz, Yiyang He, Ker Wey Aw, Jung-Joo Lee, Janghee Cho
Title: "It's Messy...But I Feel Balanced": Unpacking Flexible Workers' Rhythm-Making Practices Using an Asset-Based Approach
Abstract:
Flexible work is increasingly pursued as a means of achieving work-life balance, particularly as growing caregiving responsibilities for children and aging family members shape workers' lives. Yet most HCI research has examined flexibility primarily through productivity and organizational perspectives, with less attention to how it intersects with workers' personal and family responsibilities. To address this gap, we conducted a qualitative study with 20 workers in Singapore engaging in flexible arrangements to manage paid work and care responsibilities. Using an asset-based lens, we show that flexibility is not a static benefit but a continual practice of rhythm-making. Participants maintained rhythms by drawing on temporal and spatial assets, negotiated them through relational and institutional dynamics, and sustained them through intrapersonal assets such as self-care and positive reframing. Our study reframes blurred boundaries as resources rather than disruptions and offers design implications for technologies that support flexible workers' everyday rhythm-making practices.

Authors:Ilias Triantafyllopoulos, Panos Ipeirotis
Title: Learning to Pay Attention: Unsupervised Modeling of Attentive and Inattentive Respondents in Survey Data
Abstract:
The integrity of behavioral and social-science surveys depends on detecting inattentive respondents who provide random or low-effort answers. Traditional safeguards, such as attention checks, are often costly, reactive, and inconsistent. We propose a unified, label-free framework for inattentiveness detection that scores response coherence using complementary unsupervised views: geometric reconstruction (Autoencoders) and probabilistic dependency modeling (Chow-Liu trees). While we introduce a "Percentile Loss" objective to improve Autoencoder robustness against anomalies, our primary contribution is identifying the structural conditions that enable unsupervised quality control. Across nine heterogeneous real-world datasets, we find that detection effectiveness is driven less by model complexity than by survey structure: instruments with coherent, overlapping item batteries exhibit strong covariance patterns that allow even linear models to reliably separate attentive from inattentive respondents. This reveals a critical ``Psychometric-ML Alignment'': the same design principles that maximize measurement reliability (e.g., internal consistency) also maximize algorithmic detectability. The framework provides survey platforms with a scalable, domain-agnostic diagnostic tool that links data quality directly to instrument design, enabling auditing without additional respondent burden.

Authors:Ari Wahl, Dorian Gawlinski, David Przewozny, Paul Chojecki, Felix Bießmann, Sebastian Bosse
Title: Monocular 3D Object Position Estimation with VLMs for Human-Robot Interaction
Abstract:
Pre-trained general-purpose Vision-Language Models (VLM) hold the potential to enhance intuitive human-machine interactions due to their rich world knowledge and 2D object detection capabilities. However, VLMs for 3D coordinates detection tasks are rare. In this work, we investigate interactive abilities of VLMs by returning 3D object positions given a monocular RGB image from a wrist-mounted camera, natural language input, and robot states. We collected and curated a heterogeneous dataset of more than 100,000 images and finetuned a VLM using QLoRA with a custom regression head. By implementing conditional routing, our model maintains its ability to process general visual queries while adding specialized 3D position estimation capabilities. Our results demonstrate robust predictive performance with a median MAE of 13 mm on the test set and a five-fold improvement over a simpler baseline without finetuning. In about 25% of the cases, predictions are within a range considered acceptable for the robot to interact with objects.

Authors:Soumita Mukherjee, Priya Kumar, Laura Cabrera
Title: Opportunities and Challenges of Operating Semi-Autonomous Vehicles: A Layered Vulnerability Perspective
Abstract:
This study examines how vulnerability is produced for human operators of Tesla's Full Self-Driving (FSD), a Level 2 semi-autonomous vehicle (SAV) system, by applying Florencia Luna's layered vulnerability framework. While existing road safety models conceptualize vulnerability as a fixed attribute of external road users, emerging evidence suggests that semi-autonomous vehicle operators themselves experience dynamic and situational vulnerability as they supervise automated systems that they do not fully control. To investigate this phenomenon, we conducted semi-structured interviews with 17 active FSD users, analyzing their accounts through a combined deductive-inductive coding process aligned with Luna's framework. Findings reveal three interacting layers of operator vulnerability, namely psychological, operational, and social. Vulnerability emerged not from any single layer but from how these layers converged in specific situations, creating fluctuating supervisory demands and uneven capacity to recognize and manage risk. The findings extend debates on contextual trust calibration, automation complacency, and meaningful human control by demonstrating how factors commonly treated as liabilities such as trust or informal learning, can both increase and mitigate vulnerability depending on context. This analysis determines the need for design and regulatory interventions that address psychological, operational, and social conditions together rather than in isolation, and highlights how responsibility is implicitly shifted onto individual operators within inadequately supported supervisory regimes.

Authors:Tim Rieder, Marian Schneider, Mario Truss, Vitaly Tsaplin, Alina Rublea, Sinem Dere, Francisco Chicharro Sanz, Tobias Reiss, Mustafa Doga Dogan
Title: SimAB: Simulating A/B Tests with Persona-Conditioned AI Agents for Rapid Design Evaluation
Abstract:
A/B testing is a standard method for validating design decisions, yet its reliance on real user traffic limits iteration speed and makes certain experiments impractical. We present SimAB, a system that reframes A/B testing as a fast, privacy-preserving simulation using persona-conditioned AI agents. Given design screenshots and a conversion goal, SimAB generates user personas, deploys them as agents that state their preference, aggregates results, and synthesizes rationales. Through a formative study with experimentation practitioners, we identified scenarios where traffic constraints hinder testing, including low-traffic pages, multi-variant comparisons, micro-optimizations, and privacy-sensitive contexts. Our design emphasizes speed, early feedback, actionable rationales, and audience specification. We evaluate SimAB against 47 historical A/B tests with known outcomes, achieving 67% overall accuracy, increasing to 83% for high-confidence cases. Additional experiments show robustness to naming and positional bias and demonstrate accuracy gains from personas. Practitioner feedback suggests that SimAB supports faster evaluation cycles and rapid screening of designs difficult to assess with traditional A/B tests.

Authors:Yifan Li, Xingyu Lan
Title: Remember You: Understanding How Users Use Deadbots to Reconstruct Memories of the Deceased
Abstract:
Generative AI has enabled ``Deadbots'', offering mourners an interactive way to engage with simulations of the deceased. While existing research often emphasizes ethics, less is known about how bereaved individuals construct and reshape memory through such interactions. To address this gap, this study draws on in-depth interviews with 26 users. Findings reveal that users are not passive recipients but active constructors of the deceased's digital representation. Through selective input, ongoing interactive adjustments and imaginative cognitive supplementation, they build an idealized digital figure blending authentic memories with personal expectations. Deadbots provide a private space to grieve without social pressure and a channel to address unresolved emotions. In this process, users' memory of the deceased evolves dynamically: from initial reinforcement and idealization to a later stage where AI-generated new memories blur with authentic recollections, reflecting a complex desire for connection through an artificial medium. This blurring raises ethical concerns regarding memory distortion and dependency, underscoring the need for future clinical research on the long-term impact of AI-mediated grieving.

Authors:Inhwa Song, Kwangyoung Lee, Janghee Cho, Amon Rapp, Hwajung Hong
Title: Constructing Everyday Well-Being: Insights from God-Saeng for Personal Informatics
Abstract:
While Personal Informatics (PI) systems support behavior change, everyday well-being involves more than achieving individual target behaviors. It is shaped by cultural narratives that give actions meaning. In South Korea, the God-Saeng phenomenon, encompassing disciplined, collective, and publicly documented self-improvement practices, offers a lens into how well-being is negotiated in daily life. We conducted a 10-day probe (N=24) with bite-sized missions to examine how young adults engaged in God-Saeng. Participants relied on planning practices, accountability infrastructures, and datafication to stabilize themselves, yet these same routines also intensified pressures toward self-monitoring and performance. They navigated tensions between consistency and flexibility, authenticity and visibility, and productivity and broader values such as relationships, and reinterpreted ordinary activities through sociocultural contexts. These insights suggest design opportunities for PI systems that move beyond tracking, toward digital instruments that help users negotiate tensions, make meaning, and reflexively understand how technologies participate in their culturally and existentially situated well-being.

Authors:Yuqing Hu, Wendao Xue, Yifan Yu, Yong Tan
Title: From Dyads to Groups: Rethinking Emotional Support with Conversational AI
Abstract:
Advances in artificial intelligence (AI), together with persistent gaps in access to reliable emotional support, have positioned AI as an increasingly prominent source of emotional assistance. However, most AI-based emotional support applications and prior research focus on one-on-one interactions between users and a single AI agent, leaving the potential advantages of alternative support configurations largely unexplored. Drawing on social support and support group theory, this research examines whether AI-based emotional support delivered by a group of AI agents (group AI support) can constitute a more effective support form than single-agent support (single AI support). We propose that group AI support enhances users' perceived support efficacy, that this effect operates by strengthening users' connectedness with the AI system, and that the composition of support types within AI groups further shapes support outcomes. Three experiments provide convergent support for these claims. By identifying when and why group AI emotional support outperforms single AI support, this work advances theoretical understanding of AI-based emotional support and provides actionable guidance for the design of AI support systems.

Authors:Andrea Cuadra, Samar Sabie, Yan Shvartzshnaider, Deborah Estrin
Title: Privacy Cards for Surfacing Mental Models and Exploring Privacy Concerns: A Case Study of Voice-First Ambient Interfaces with Older Adults
Abstract:
We investigate the ethical and privacy implications of voice-first ambient interfaces (VFAIs) for aging in place through an in-depth engagement with five older adults. Our participants were in the process of becoming experienced VFAI users, and had used a VFAI-based design probe for health data reporting. We create and iteratively refine an interview protocol using Privacy Cards. We customize Privacy Cards by drawing on participants' previous interviews and device usage logs. Using Privacy Cards, we conduct interviews to surface their mental models, and explore their privacy concerns. We find insufficient mental models for proper consent. For example, participants did not know who could access their data, and experienced difficulty distinguishing built-in functionality from third-party apps. Participants initially expressed little worry about VFAI-related ethical concerns, but interviews with Privacy Cards revealed nuanced issues, resulting in various implications for future research and design.

Authors:Jun Aoki, Shunki Itadera
Title: A User Study on the Suitability of Teleoperation Interfaces for Primitive Manipulation Tasks
Abstract:
The application of teleoperation to control robotic arms has been widely explored, and user-friendly teleoperation systems have been studied for facilitating higher performance and lower operational burden. To investigate the dominant factors in a practical teleoperation system, this study focused on the characteristics of an interface used to operate a robotic arm. The usability of an interface depends on the characteristics of the manipulation tasks to be completed; however, systematic comparisons of different interfaces across different tasks remain limited. In this study, we compared two widely used teleoperation interfaces, a 3D mouse and a VR controller, for two simple yet broadly applicable tasks with a six-degree-of-freedom (6DoF) robotic arm: repetitively pushing buttons and rotating knobs. Participants (N = 23) controlled a robotic arm with 6DoF to push buttons and rotate knobs as many times as possible in 3-minute trials. Each trial was followed by a NASA-TLX workload rating. The results showed a clear connection between the interface and task performance: the VR controller yielded higher performance for pushing buttons, whereas the 3D mouse performed better and was less demanding for knob rotation. These findings highlight the importance of considering dominant motion primitives of the task when designing practical teleoperation interfaces.

Authors:Abhishek Kulkarni, Sharon Lynn Chu
Title: Designing AI Tutors for Interest-Based Learning: Insights from Human Instructors
Abstract:
Interest-based learning (IBL) is a paradigm of instruction in which educational content is contextualized using learners' interests to enhance content relevance. IBL has been shown to result in improved learning outcomes. Unfortunately, high effort is needed for instructors to design and deliver IBL content for individual students. LLMs in the form of AI tutors may allow for IBL to scale across many students. Designing an AI tutor for IBL, however, first requires an understanding of how IBL is implemented in teaching scenarios. This paper presents a study that seeks to derive this understanding from an analysis of how human instructors design and deliver IBL content. We studied 14 one-to-one online tutoring sessions (28 participants) in which tutors designed and delivered a lesson tailored to a student's self-identified interest. Using lesson artifacts, tutoring transcripts, interviews, and questionnaires, findings include themes on how tutors integrate interests during instruction and why. Finally, actionable design implications are presented for LLM-powered AI tutors that aim to deliver IBL at scale.

Authors:Iván Arcos, Paolo Rosso, Elena Gomis-Vicent
Title: Human-Centered Multimodal Fusion for Sexism Detection in Memes with Eye-Tracking, Heart Rate, and EEG Signals
Abstract:
The automated detection of sexism in memes is a challenging task due to multimodal ambiguity, cultural nuance, and the use of humor to provide plausible deniability. Content-only models often fail to capture the complexity of human perception. To address this limitation, we introduce and validate a human-centered paradigm that augments standard content features with physiological data. We created a novel resource by recording Eye-Tracking (ET), Heart Rate (HR), and Electroencephalography (EEG) from 16 subjects (8 per experiment) while they viewed 3984 memes from the EXIST 2025 dataset. Our statistical analysis reveals significant physiological differences in how subjects process sexist versus non-sexist content. Sexist memes were associated with higher cognitive load, reflected in increased fixation counts and longer reaction times, as well as differences in EEG spectral power across the Alpha, Beta, and Gamma bands, suggesting more demanding neural processing. Building on these findings, we propose a multimodal fusion model that integrates physiological signals with enriched textual-visual features derived from a Vision-Language Model (VLM). Our final model achieves an AUC of 0.794 in binary sexism detection, a statistically significant 3.4% improvement over a strong VLM-based baseline. The fusion is particularly effective for nuanced cases, boosting the F1-score for the most challenging fine-grained category, Misogyny and Non-Sexual Violence, by 26.3%. These results show that physiological responses provide an objective signal of perception that enhances the accuracy and human-awareness of automated systems for countering online sexism.

Authors:Toshikazu Seto, Yohei Shiwaku, Takayuki Miyauchi, Daisuke Yoshida, Yuichiro Nishimura
Title: Assessment of Display Performance and Comparative Evaluation of Web Map Libraries for Extensive 3D Geospatial Data
Abstract:
Large-scale 3D geospatial data visualization has become increasingly critical for the development of the digital society infrastructure in Japan. This study conducted a comprehensive performance evaluation of two major WebGL-based web mapping libraries, CesiumJS and MapLibre GL JS, using large-scale 3D point-cloud data from the VIRTUAL SHIZUOKA and PLATEAU building models. The research employs standardized 3D Tiles 1.1, and Mapbox Vector Tiles (MVT) formats, comparing performance across different data scales (2nd and 3rd grid levels) using Core Web Vitals metrics, including First Contentful Paint (FCP), Largest Contentful Paint (LCP), Speed Index, Total Blocking Time (TBT), and Cumulative Layout Shift (CLS). The results demonstrate that MVT-based building visualization with MapLibre GL JS achieves optimal performance (FCP 0.8s, TBT 0ms), whereas MapLibre GL JS combined with deck.gl shows superior performance for large-scale point cloud processing (TBT: 3ms, CesiumJS: 21,357ms). This study provides data-driven selection guidelines for appropriate technology choices according to use cases, establishing reproducible performance evaluation frameworks for 3D web mapping technologies during the WebGPU and OGC 3D Tiles 1.1 standardization era.

Authors:Romina Mahinpei, Sofiia Druchyna, Manoel Horta Ribeiro
Title: When LLMs Help -- and Hurt -- Teaching Assistants in Proof-Based Courses
Abstract:
Teaching assistants (TAs) are essential to grading and feedback provision in proof-based courses, yet these tasks are time-intensive and difficult to scale. Although Large Language Models (LLMs) have been studied for grading and feedback, their effectiveness in proof-based courses is still unknown. Before designing LLM-based systems for this context, a necessary prerequisite is to understand whether LLMs can meaningfully assist TAs with grading and feedback. As such, we present a multi-part case study functioning as a technology probe in an undergraduate proof-based course. We compare rubric-based grading decisions made by an LLM and TAs with varying levels of expertise and examine TAs' perceptions of feedback generated by an LLM. We find substantial disagreement between LLMs and TAs on grading decisions but that LLM-generated feedback can still be useful to TAs for submissions with major errors. We conclude by discussing design implications for human-AI grading and feedback systems in proof-based courses.

Authors:Michelle Cohn, Alyssa Lanzi, Yui Ishihara, Chen-Nee Chuah, Georgia Zellou, Alyssa Weakley
Title: Challenges in Automatic Speech Recognition for Adults with Cognitive Impairment
Abstract:
Millions of people live with cognitive impairment from Alzheimer's disease and related dementias (ADRD). Voice-enabled smart home systems offer promise for supporting daily living but rely on automatic speech recognition (ASR) to transcribe their speech to text. Prior work has shown reduced ASR performance for adults with cognitive impairment; however, the acoustic factors underlying these disparities remain poorly understood. This paper evaluates ASR performance for 83 older adults across cognitive groups (cognitively normal, mild cognitive impairment, dementia) reading commands to a voice assistant (Amazon Alexa). Results show that ASR errors are significantly higher for individuals with dementia, revealing a critical usability gap. To better understand these disparities, we conducted an acoustic analysis of speech features and found that a speaker's intensity, voice quality, and pause ratio predicted ASR accuracy. Based on these findings, we outline HCI design implications for AgeTech and voice interfaces, including speaker-personalized ASR, human-in-the-loop correction of ASR transcripts, and interaction-level personalization to support ability-based adaptation.

Authors:Svitlana Surodina, Sinem Görücü, Lili Golmohammadi, Emelia Delaney, Rita Borgo
Title: Now You See Me: Designing Responsible AI Dashboards for Early-Stage Health Innovation
Abstract:
Innovative HealthTech teams develop Artificial Intelligence (AI) systems in contexts where ethical expectations and organizational priorities must be balanced under severe resource constraints. While Responsible AI practices are expected to guide the design and evaluation of such systems, they frequently remain abstract or poorly aligned with the operational realities of early-stage innovation. At the ecosystem level, this misalignment disproportionately affects disadvantaged projects and founders, therefore limiting the diversity of problem-areas under consideration, solutions, stakeholder perspectives, and population datasets represented in AI-enabled healthcare systems. Visualization provides a practical mechanism for supporting decision-making across the AI lifecycle. When developed via a rigorous and collaborative design process, structured on domain knowledge and designed around real-world constraints, visual interfaces can operate as effective sociotechnical governance artifacts enabling responsible decision-making. Grounded in innovation-oriented Human-Centered Computing methodologies, we synthesize insights from a series of design studies conducted via a longitudinal visualization research program, a case study centered on governance dashboard design in a translational setting, and a survey of a cohort of early-stage HealthTech startups. Based on these findings, we articulate design process implications for governance-oriented visualization systems: co-creation with stakeholders, alignment with organizational maturity and context, and support for heterogeneous roles and tasks among others. This work contributes actionable guidance for designing Responsible AI governance dashboards that support decision-making and accountability in early-stage health innovation, and suggests that ecosystem-level coordination can enable more scalable and diverse AI innovation in healthcare.

Authors:Gauri Umesh Rajmane, Ziming Li, Tae Oh, Roshan Peiris
Title: VRSL:Exploring the Comprehensibility of 360-Degree Camera Feeds for Sign Language Communication in Virtual Reality
Abstract:
This study explores integrating sign language into virtual reality (VR) by examining the comprehensibility and user experience of viewing American Sign Language (ASL) videos captured with body-mounted 360-degree cameras. Ten participants identified ASL signs from videos recorded at three body-mounted positions: head, shoulder, and chest. Results showed the shoulder-mounted camera achieved the highest accuracy (85%), though differences between positions were not statistically significant. Participants noted that peripheral distortion in 360-degree videos impacted clarity, highlighting areas for improvement. Despite challenges, the overall comprehension success rate of 83.3% demonstrates the potential of video-based ASL communication in VR. Feedback emphasized the need to refine camera angles, reduce distortion, and explore alternative mounting positions. Participants expressed a preference for signing over text-based communication in VR, highlighting the importance of developing this approach to enhance accessibility and collaboration for Deaf and Hard of Hearing (DHH) users in virtual environments.

Authors:Jeremy Wertheim Co Chen, Rendell Christian Ngo, Cedric Matthew Yu, Hans Emilio Lumagui, Ethan Badayos, Jordan Aiko Deja
Title: Beyond Faders: Understanding 6DoF Gesture Ecologies in Music Mixing
Abstract:
Extended reality (XR) enables new music-mixing workflows by moving beyond 2D faders toward embodied, spatial interaction. However, it remains unclear which six-degree-of-freedom (6DoF) gestures align with real-world mixing practices and whether such interactions support manageable cognitive load and positive user experience. We conducted a design workshop with experienced mixers to elicit gesture concepts for core audio tasks gain, compression, equalization, and automation, and implemented these in an XR prototype. A user study (n=12) evaluated the ecological validity of the gestures using cognitive load measures, user-experience ratings, and interviews. Participants generally found 6DoF gestures intuitive and well-mapped to mixing tasks, reporting strong immersion and a sense of connection with the audio environment. Cognitive load differences across gestures were minimal, though participants expressed preferences shaped by workflow familiarity and perceived control. We discuss implications for designing XR mixing tools that balance expressiveness, precision, and ecological validity.

Authors:Ruiqi Zhou, Donghao Zhu, Houcai Shen
Title: A Learning-Based Hybrid Decision Framework for Matching Systems with User Departure Detection
Abstract:
In matching markets such as kidney exchanges and freight exchanges, delayed matching has been shown to improve overall market efficiency. The benefits of delay are highly sensitive to participants' sojourn times and departure behavior, and delaying matches can impose significant costs, including longer waiting times and increased market congestion. These competing effects make fixed matching policies inherently inflexible in dynamic environments. We propose a learning-based Hybrid framework that adaptively combines immediate and delayed matching. The framework continuously collects data on user departures over time, estimates the underlying departure distribution via regression, and determines whether to delay matching in the subsequent period based on a decision threshold that governs the system's tolerance for matching efficiency loss. The proposed framework can substantially reduce waiting times and congestion while sacrificing only a limited amount of matching efficiency. By dynamically adjusting its matching strategy, the Hybrid framework enables system performance to flexibly interpolate between purely greedy and purely patient policies, offering a robust and adaptive alternative to static matching mechanisms.

Authors:Abhishek Kulkarni, Alexander Barquero, Pavitra Lahari, Aryaan Shaikh, Sarah Brown
Title: E3VA: Enhancing Emotional Expressiveness in Virtual Conversational Agents
Abstract:
With the advent of generative AI and large language models, embodied conversational agents are becoming synonymous with online interactions. These agents possess vast amounts of knowledge but suffer from exhibiting limited emotional expressiveness. Without adequate expressions, agents might fail to adapt to users' emotions, which may result in a sub-optimal user experience and engagement. Most current systems prioritize content based responses, neglecting the emotional context of conversations. Research in this space is currently limited to specific contexts, like mental health. To bridge this gap, our project proposes the implementation of expressive features in a virtual conversational agent which will utilize sentiment analysis and natural language processing to inform the generation of empathetic, expressive responses. The project delivers a functional conversational agent capable of assessing and responding to user emotions accordingly. We posit this will enhance usability, engagement, and the overall quality of conversations and present results from an exploratory pilot study investigating the same.

Authors:Shruthi Andru, Shrut Kirti Saksena
Title: Interface Framework for Human-AI Collaboration within Intelligent User Interface Ecosystems
Abstract:
As interfaces evolve from static user pathways to dynamic human-AI collaboration, no standard methods exist for selecting appropriate interface patterns based on user needs and task complexity. Existing frameworks only provide guiding principles for designing AI agent capabilities. We propose a dimensional framework based on workflow complexity, AI autonomy, and AI reasoning to guide the design of context-aware, scalable AI interfaces aka modalities (e.g., prompt bars, split screens, full screens, etc.). The framework was developed through co-design workshops with designers of marketing products and refined through qualitative research with eight long-term AI users. The study evaluated the three dimensions, identified task-to-interface relationships, and surfaced the importance of both business impact and security risk across all high-autonomy scenarios. This framework provides product teams with a shared language to develop scalable AI interfaces, emphasizing fluidity between interfaces and progressive user control to balance AI autonomy with human oversight.

Authors:Leon Pielage, Ole Hätscher, Mitja Back, Bernhard Marschall, Benjamin Risse
Title: Dynamic Personality Adaptation in Large Language Models via State Machines
Abstract:
The inability of Large Language Models (LLMs) to modulate their personality expression in response to evolving dialogue dynamics hinders their performance in complex, interactive contexts. We propose a model-agnostic framework for dynamic personality simulation that employs state machines to represent latent personality states, where transition probabilities are dynamically adapted to the conversational context. Part of our architecture is a modular pipeline for continuous personality scoring that evaluates dialogues along latent axes while remaining agnostic to the specific personality models, their dimensions, transition mechanisms, or LLMs used. These scores function as dynamic state variables that systematically reconfigure the system prompt, steering behavioral alignment throughout the interaction.We evaluate this framework by operationalizing the Interpersonal Circumplex (IPC) in a medical education setting. Results demonstrate that the system successfully adapts its personality state to user inputs, but also influences user behavior, thereby facilitating de-escalation training. Notably, the scoring pipeline maintains comparable precision even when utilizing lightweight, fine-tuned classifiers instead of large-scale LLMs. This work demonstrates the feasibility of modular, personality-adaptive architectures for education, customer support, and broader human-computer interaction.

Authors:Andrés Rodriguez, Juan Cruz Gardey, Alejandra Garrido
Title: Detecting UX smells in Visual Studio Code using LLMs
Abstract:
Integrated Development Environments shape developers' daily experience, yet the empirical study of their usability and user experience (UX) remains limited. This work presents an LLM-assisted approach to detecting UX smells in Visual Studio Code by mining and classifying user-reported issues from the GitHub repository. Using a validated taxonomy and expert review, we identified recurring UX problems that affect the developer experience. Our results show that the majority of UX smells are concentrated in informativeness, clarity, intuitiveness, and efficiency, qualities that developers value most.

Authors:Deja Dunlap, R. Thomas McCoy
Title: Evaluating the Usage of African-American Vernacular English in Large Language Models
Abstract:
In AI, most evaluations of natural language understanding tasks are conducted in standardized dialects such as Standard American English (SAE). In this work, we investigate how accurately large language models (LLMs) represent African American Vernacular English (AAVE). We analyze three LLMs to compare their usage of AAVE to the usage of humans who natively speak AAVE. We first analyzed interviews from the Corpus of Regional African American Language and TwitterAAE to identify the typical contexts where people use AAVE grammatical features such as ain't. We then prompted the LLMs to produce text in AAVE and compared the model-generated text to human usage patterns. We find that, in many cases, there are substantial differences between AAVE usage in LLMs and humans: LLMs usually underuse and misuse grammatical features characteristic of AAVE. Furthermore, through sentiment analysis and manual inspection, we found that the models replicated stereotypes about African Americans. These results highlight the need for more diversity in training data and the incorporation of fairness methods to mitigate the perpetuation of stereotypes.

Authors:Mohammad Sadra Rajabi, Aanuoluwapo Ojelade, Sunwook Kim, Maury A. Nussbaum
Title: Vision-Language Models for Ergonomic Assessment of Manual Lifting Tasks: Estimating Horizontal and Vertical Hand Distances from RGB Video
Abstract:
Manual lifting tasks are a major contributor to work-related musculoskeletal disorders, and effective ergonomic risk assessment is essential for quantifying physical exposure and informing ergonomic interventions. The Revised NIOSH Lifting Equation (RNLE) is a widely used ergonomic risk assessment tool for lifting tasks that relies on six task variables, including horizontal (H) and vertical (V) hand distances; such distances are typically obtained through manual measurement or specialized sensing systems and are difficult to use in real-world environments. We evaluated the feasibility of using innovative vision-language models (VLMs) to non-invasively estimate H and V from RGB video streams. Two multi-stage VLM-based pipelines were developed: a text-guided detection-only pipeline and a detection-plus-segmentation pipeline. Both pipelines used text-guided localization of task-relevant regions of interest, visual feature extraction from those regions, and transformer-based temporal regression to estimate H and V at the start and end of a lift. For a range of lifting tasks, estimation performance was evaluated using leave-one-subject-out validation across the two pipelines and seven camera view conditions. Results varied significantly across pipelines and camera view conditions, with the segmentation-based, multi-view pipeline consistently yielding the smallest errors, achieving mean absolute errors of approximately 6-8 cm when estimating H and 5-8 cm when estimating V. Across pipelines and camera view configurations, pixel-level segmentation reduced estimation error by approximately 20-30% for H and 35-40% for V relative to the detection-only pipeline. These findings support the feasibility of VLM-based pipelines for video-based estimation of RNLE distance parameters.

Authors:Takaya Miyama, Satoshi Nakamura, Shota Yamanaka
Title: Improving Data Quality via Pre-Task Participant Screening in Crowdsourced GUI Experiments
Abstract:
In crowdsourced user experiments that collect performance data from graphical user interface (GUI) interactions, some participants ignore instructions or act carelessly, threatening the validity of performance models. We investigate a pre-task screening method that requires simple GUI operations analogous to the main task and uses the resulting error as a continuous quality signal. Our pre-task is a brief image-resizing task in which workers match an on-screen card to a physical card; workers whose resizing error exceeds a threshold are excluded from the main experiment. The main task is a standardized pointing experiment with well-established models of movement time and error rate. Across mouse- and smartphone-based crowdsourced experiments, we show that reducing the proportion of workers exhibiting unexpected behavior and tightening the pre-task threshold systematically improve the goodness of fit and predictive accuracy of GUI performance models, demonstrating that brief pre-task screening can enhance data quality.

Authors:Paras Sharma, YuePing Sha, Janet Shufor Bih Epse Fofang, Brayden Yan, Jess A. Turner, Nicole Balay, Hubert O. Asare, Angela E. B. Stewart, Erin Walker
Title: Hybrid LLM-Embedded Dialogue Agents for Learner Reflection: Designing Responsive and Theory-Driven Interactions
Abstract:
Dialogue systems have long supported learner reflections, with theoretically grounded, rule-based designs offering structured scaffolding but often struggling to respond to shifts in engagement. Large Language Models (LLMs), in contrast, can generate context-sensitive responses but are not informed by decades of research on how learning interactions should be structured, raising questions about their alignment with pedagogical theories. This paper presents a hybrid dialogue system that embeds LLM responsiveness within a theory-aligned, rule-based framework to support learner reflections in a culturally responsive robotics summer camp. The rule-based structure grounds dialogue in self-regulated learning theory, while the LLM decides when and how to prompt deeper reflections, responding to evolving conversation context. We analyze themes across dialogues to explore how our hybrid system shaped learner reflections. Our findings indicate that LLM-embedded dialogues supported richer learner reflections on goals and activities, but also introduced challenges due to repetitiveness and misalignment in prompts, reducing engagement.

Authors:William Seymour, Martin J. Kraemer
Title: Shifting Engagement With Cybersecurity: How People Discover and Share Cybersecurity Content at Work and at Home
Abstract:
Cybersecurity awareness is shaped by a wide range of professional and personal experiences, including information and training at work and the sharing of news and other content at home. In order to explore how people discover cybersecurity content and the effect that participation in workplace training may have on this we present an online study of 1200 participants from the UK, US, France, and Germany. Those undertaking cybersecurity training at work showed reduced intention to share information at home, shifting the focus towards the workplace. They were also more likely to recall cybersecurity information shared by their employer than from any other source, which in turn correlated with content type and distribution channel. We critically reflect on this shift, highlighting opportunities to improve cybersecurity information sharing at work and at home.

Authors:David Fraile Navarro, Mor Peleg
Title: Conversational AI for Automated Patient Questionnaire Completion: Development Insights and Design Principles
Abstract:
Collecting patient-reported outcome measures (PROMs) is essential for clinical care and research, yet traditional form-based approaches are often tedious for patients and burdensome for clinicians. We developed a generative AI conversational agent(CA) using GPT-5 to collect back pain data according to the NIH Task Force's Recommended Minimal Dataset. Unlike prior CAs that ask questions one-by-one, our CA engages users in topic-based conversations, allowing multiple data items to be captured in a single exchange. Through iterative development and pilot testing with clinicians and a consumer panel, we identified key design principles for health data collection CAs. These principles extend established clinical decision support design guidelines to conversational interfaces, addressing: flexibility of interaction style, personality calibration, data quality assurance through confidence visualization, patient safety constraints, and interoperability requirements. We present our prompt design methodology and discuss challenges encountered, including managing conversation length, handling ambiguous responses, and adapting to LLM version changes. Our design principles provide a practical framework for developers creating conversational agents for patient questionnaire completion. The CA is available at https://chatgpt.com/g/g-68f4869548f48191af0544f110ee91c6-backpain-data-collection-assistant (requires ChatGPT registration and subscription for unlimited use).

Authors:Samuel Bellaire, Abdalmalek Abu-raddaha, Natalie Kim, Nathan Morhan, William Elliott, Samir Rawashdeh
Title: Botson: An Accessible and Low-Cost Platform for Social Robotics Research
Abstract:
Trust remains a critical barrier to the effective integration of Artificial Intelligence (AI) into human-centric domains. Disembodied agents, such as voice assistants, often fail to establish trust due to their inability to convey non-verbal social cues. This paper introduces the architecture of Botson: an anthropomorphic social robot powered by a large language model (LLM). Botson was created as a low-cost and accessible platform for social robotics research.

Authors:Eun Jeong Kang, Fengyang Lin, Angel Hsing-Chi Hwang
Title: Policy or Community?: Supporting Individual Model Creators' Open Model Development in Model Marketplaces
Abstract:
Lightweight fine-tuning techniques and the rise of 'open' AI model marketplaces have enabled individuals to easily build and release generative models. Yet, this accessibility also raises risks, including the production of harmful and infringing content. While platforms offer policies and responsible AI tools, their effectiveness may be limited, as creators engage with partially open models that vary widely in openness and transparency. To understand how platform governance can better support responsible practices, we conducted semi-structured interviews with 19 individual model creators. We identified three regulatory needs shaped by creators' workflows: reducing downstream harms, recognizing creators' contributions and originality, and securing model ownership. Creators also repurpose RAI tools primarily for self-protection and visibility, and their sense of responsibility is deeply shaped by community norms rather than formal policies. We argue that platforms' governance decisions must consider how policy interventions shape the practices and motivations of individual creators.

Authors:Kirk Vanacore, Ryan S. Baker, Avery H. Closser, Jeremy Roschelle
Title: The Path to Conversational AI Tutors: Integrating Tutoring Best Practices and Targeted Technologies to Produce Scalable AI Agents
Abstract:
The emergence of generative AI has accelerated the development of conversational tutoring systems that interact with students through natural language dialogue. Unlike prior intelligent tutoring systems (ITS), which largely function as adaptive and interactive problem sets with feedback and hints, conversational tutors hold the potential to simulate high-quality human tutoring by engaging with students' thoughts, questions, and misconceptions in real time. While some previous ITS, such as AutoTutor, could respond conversationally, they were expensive to author and lacked a full range of conversational ability. Generative AI has changed the capacity of ITS to engage conversationally. However, realizing the full potential of conversational tutors requires careful consideration of what research on human tutoring and ITS has already established, while also unpacking what new research will be needed. This paper synthesizes tenets of successful human tutoring, lessons learned from legacy ITS, and emerging work on conversational AI tutors. We use a keep, change, center, study framework for guiding the design of conversational tutoring. We argue that systems should keep proven methods from prior ITS, such as knowledge tracing and affect detection; change how tutoring is delivered by leveraging generative AI for dynamic content generation and dialogic scaffolding; and center opportunities for meaning-making, student agency, and granular diagnosis of reasoning. Finally, we identify areas requiring further study, including efficacy testing, student experience, and integration with human instruction. By synthesizing insights from human tutoring, legacy ITS, and emerging generative AI technologies, this paper outlines a research agenda for developing conversational tutors that are scalable, pedagogically effective, and responsive to the social and motivational dimensions of learning.

Authors:Claire Liang, Franziska Babel, Hannah Pelikan, Sydney Thompson, Xiang Zhi Tan
Title: A Checklist for Deploying Robots in Public: Articulating Tacit Knowledge in the HRI Community
Abstract:
Many of the challenges encountered in in-the-wild public deployments of robots remain undocumented despite sharing many common pitfalls. This creates a high barrier of entry and results in repetition of avoidable mistakes. To articulate the tacit knowledge in the HRI community, this paper presents a guideline in the form of a checklist to support researchers in preparing for robot deployments in public. Drawing on their own experience with public robot deployments, the research team collected essential topics to consider in public HRI research. These topics are represented as modular flip cards in a hierarchical table, structured into deployment phases and important domains. We interviewed six interdisciplinary researchers with expertise in public HRI and show how including community input refines the checklist. We further show the checklist in action in context of real public studies. Finally, we contribute the checklist as an open-source, customizable community resource that both collects joint expertise for continual evolution and is usable as a list, set of cards, and an interactive web tool.

Authors:Brandon Victor Syiem, Eduardo Velloso
Title: Better Assumptions, Stronger Conclusions: The Case for Ordinal Regression in HCI
Abstract:
Despite the widespread use of ordinal measures in HCI, such as Likert-items, there is little consensus among HCI researchers on the statistical methods used for analysing such data. Both parametric and non-parametric methods have been extensively used within the discipline, with limited reflection on their assumptions and appropriateness for such analyses. In this paper, we examine recent HCI works that report statistical analyses of ordinal measures. We highlight prevalent methods used, discuss their limitations and spotlight key assumptions and oversights that diminish the insights drawn from these methods. Finally, we champion and detail the use of cumulative link (mixed) models (CLM/CLMM) for analysing ordinal data. Further, we provide practical worked examples of applying CLM/CLMMs using R to published open-sourced datasets. This work contributes towards a better understanding of the statistical methods used to analyse ordinal data in HCI and helps to consolidate practices for future work.

Authors:Tung T. Ngo, Dai Nguyen Van, Anh-Minh Nguyen, Phuong-Anh Do, Anh Nguyen-Quoc
Title: Qualitative Coding Analysis through Open-Source Large Language Models: A User Study and Design Recommendations
Abstract:
Qualitative data analysis is labor-intensive, yet the privacy risks associated with commercial Large Language Models (LLMs) often preclude their use in sensitive research. To address this, we introduce ChatQDA, an on-device framework powered by open-source LLMs designed for privacy-preserving open coding. Our mixed-methods user study reveals that while participants rated the system highly for usability and perceived efficiency, they exhibited "conditional trust", valuing the tool for surface-level extraction while questioning its interpretive nuance and consistency. Furthermore, despite the technical security of local deployment, participants reported epistemic uncertainty regarding data protection, suggesting that invisible security measures are insufficient to foster trust. We conclude with design recommendations for local-first analysis tools that prioritize verifiable privacy and methodological rigor.

Authors:Nam Hee Kim, Jingjing May Liu, Jaakko Lehtinen, Perttu Hämäläinen, James F. O'Brien, Xue Bin Peng
Title: Robo-Saber: Generating and Simulating Virtual Reality Players
Abstract:
We present the first motion generation system for playtesting virtual reality (VR) games. Our player model generates VR headset and handheld controller movements from in-game object arrangements, guided by style exemplars and aligned to maximize simulated gameplay score. We train on the large BOXRR-23 dataset and apply our framework on the popular VR game Beat Saber. The resulting model Robo-Saber produces skilled gameplay and captures diverse player behaviors, mirroring the skill levels and movement patterns specified by input style exemplars. Robo-Saber demonstrates promise in synthesizing rich gameplay data for predictive applications and enabling a physics-based whole-body VR playtesting agent.

Authors:Kaori Ikematsu, Kunihiro Kato
Title: DuoTouch: Passive Two-Footprint Attachments Using Binary Sequences to Extend Touch Interaction
Abstract:
DuoTouch is a passive attachment for capacitive touch panels that adds tangible input while minimizing content occlusion and loss of input area. It uses two contact footprints and two traces to encode motion as binary sequences and runs on unmodified devices through standard touch APIs. We present two configurations with paired decoders: an aligned configuration that maps fixed-length codes to discrete commands and a phase-shifted configuration that estimates direction and distance from relative timing. To characterize the system's reliability, we derive a sampling-limited bound that links actuation speed, internal trace width, and device touch sampling rate. Through technical evaluations on a smartphone and a touchpad, we report performance metrics that describe the relationship between these parameters and decoding accuracy. Finally, we demonstrate the versatility of DuoTouch by embedding the mechanism into various form factors, including a hand strap, a phone ring holder, and touchpad add-ons.

Authors:Nathan G. Wood, Scott Robbins, Eduardo Zegarra Berodt, Anton Graf von Westerholt, Michelle Behrndt, Hauke Budig, Daniel Kloock-Schreiber
Title: Stop Saying "AI"
Abstract:
Across academia, industry, and government, ``AI'' has become central in research and development, regulatory debates, and promises of ever faster and more capable decision-making and action. In numerous domains, especially safety-critical ones, there are significant concerns over how ``AI'' may affect decision-making, responsibility, or the likelihood of mistakes (to name only a few categories of critique). However, for most critiques, the target is generally ``AI'', a broad term admitting many (types of) systems used for a variety of tasks and each coming with its own set of limitations, challenges, and potential use cases. In this article, we focus on the military domain as a case study and present both a loose enumerative taxonomy of systems captured under the umbrella term ``military AI'', as well as discussion of the challenges of each. In doing so, we highlight that critiques of one (type of) system will not always transfer to other (types of) systems. Building on this, we argue that in order for debates to move forward fruitfully, it is imperative that the discussions be made more precise and that ``AI'' be excised from debates to the extent possible. Researchers, developers, and policy-makers should make clear exactly what systems they have in mind and what possible benefits and risks attend the deployment of those particular systems. While we focus on AI in the military as an exemplar for the overall trends in discussions of ``AI'', the argument's conclusions are broad and have import for discussions of AI across a host of domains.

Authors:Abdulhadi Shoufan, Ahmad-Azmi-Abdelhamid Esmaeil
Title: AI Hallucination from Students' Perspective: A Thematic Analysis
Abstract:
As students increasingly rely on large language models, hallucinations pose a growing threat to learning. To mitigate this, AI literacy must expand beyond prompt engineering to address how students should detect and respond to LLM hallucinations. To support this, we need to understand how students experience hallucinations, how they detect them, and why they believe they occur. To investigate these questions, we asked university students three open-ended questions about their experiences with AI hallucinations, their detection strategies, and their mental models of why hallucinations occur. Sixty-three students responded to the survey. Thematic analysis of their responses revealed that reported hallucination issues primarily relate to incorrect or fabricated citations, false information, overconfident but misleading responses, poor adherence to prompts, persistence in incorrect answers, and sycophancy. To detect hallucinations, students rely either on intuitive judgment or on active verification strategies, such as cross-checking with external sources or re-prompting the model. Students' explanations for why hallucinations occur reflected several mental models, including notable misconceptions. Many described AI as a research engine that fabricates information when it cannot locate an answer in its "database." Others attributed hallucinations to issues with training data, inadequate prompting, or the model's inability to understand or verify information. These findings illuminate vulnerabilities in AI-supported learning and highlight the need for explicit instruction in verification protocols, accurate mental models of generative AI, and awareness of behaviors such as sycophancy and confident delivery that obscure inaccuracy. The study contributes empirical evidence for integrating hallucination awareness and mitigation into AI literacy curricula.

Authors:Jiangtao Gong, Xiao Wen, Fengyi Tao, Xinqi Wang, Xixi Yang, Yangrong Tang
Title: Evaluating Text-based Conversational Agents for Mental Health: A Systematic Review of Metrics, Methods and Usage Contexts
Abstract:
Text-based conversational agents (CAs) are increasingly used in mental health, yet evaluation practices remain fragmented. We conducted a PRISMA-guided systematic review (May-June 2024) across ACM Digital Library, Scopus, and PsycINFO. From 613 records, 132 studies were included, with dual-coder extraction achieving substantial agreement (Cohen's kappa = 0.77-0.92). We synthesized evaluation approaches across three dimensions: metrics, methods, and usage contexts. Metrics were classified into CA-centric attributes (e.g., reliability, safety, empathy) and user-centric outcomes (experience, knowledge, psychological state, health behavior). Methods included automated analyses, standardized psychometric scales, and qualitative inquiry. Temporal designs ranged from momentary to follow-up assessments. Findings show reliance on Western-developed scales, limited cultural adaptation, predominance of small and short-term samples, and weak links between automated performance metrics and user well-being. We argue for methodological triangulation, temporal rigor, and equity in measurement. This review offers a structured foundation for reliable, safe, and user-centered evaluation of mental health CAs.

Authors:Ömer Elri, Serkan Savaş
Title: Visual Interface Workflow Management System Strengthening Data Integrity and Project Tracking in Complex Processes
Abstract:
Manual notes and scattered messaging applications used in managing business processes compromise data integrity and abstract project tracking. In this study, an integrated system that works simultaneously on web and mobile platforms has been developed to enable individual users and teams to manage their workflows with concrete data. The system architecture integrates MongoDB, which stores data in JSON format, Node.js Express.js on the server side, React.js on the web interface, and React Native technologies on the mobile side. The system interface is designed around visual dashboards that track the status of tasks (To Do-In Progress-Done). The urgency of tasks is distinguished by color-coded labels, and dynamic graphics (Dashboard) have been created for managers to monitor team performance. The usability of the system was tested with a heterogeneous group of 10 people consisting of engineers, engineering students, public employees, branch managers, and healthcare personnel. In analyses conducted using a 5-point Likert scale, the organizational efficiency provided by the system compared to traditional methods was rated 4.90, while the visual dashboards achieved a perfect score of 5.00 with zero variance. Additionally, the ease of interface use was rated 4.65, and overall user satisfaction was calculated as 4.60. The findings show that the developed system simplifies complex work processes and provides a traceable digital working environment for Small and Medium-sized Enterprises and project teams.

Authors:Yanni Mei, Samuel Wendt, Florian Mueller, Jan Gugenheimer
Title: ShadAR: LLM-driven shader generation to transform visual perception in Augmented Reality
Abstract:
Augmented Reality (AR) can simulate various visual perceptions, such as how individuals with colorblindness see the world. However, these simulations require developers to predefine each visual effect, limiting flexibility. We present ShadAR, an AR application enabling real-time transformation of visual perception through shader generation using large language models (LLMs). ShadAR allows users to express their visual intent via natural language, which is interpreted by an LLM to generate corresponding shader code. This shader is then compiled real-time to modify the AR headset viewport. We present our LLM-driven shader generation pipeline and demonstrate its ability to transform visual perception for inclusiveness and creativity.

Authors:Rui Yao, Qiuyuan Ren, Felicia Fang-Yi Tan, Chen Yang, Xiaoyu Zhang, Shengdong Zhao
Title: PersonaMail: Learning and Adapting Personal Communication Preferences for Context-Aware Email Writing
Abstract:
LLM-assisted writing has seen rapid adoption in interpersonal communication, yet current systems often fail to capture the subtle tones essential for effectiveness. Email writing exemplifies this challenge: effective messages require careful alignment with intent, relationship, and context beyond mere fluency. Through formative studies, we identified three key challenges: articulating nuanced communicative intent, making modifications at multiple levels of granularity, and reusing effective tone strategies across messages. We developed PersonaMail, a system that addresses these gaps through structured communication factor exploration, granular editing controls, and adaptive reuse of successful strategies. Our evaluation compared PersonaMail against standard LLM interfaces, and showed improved efficiency in both immediate and repeated use, alongside higher user satisfaction. We contribute design implications for AI-assisted communication systems that prioritize interpersonal nuance over generic text generation.

Authors:Mengjie Tang, Xinman Li, Juxiao Zhang, Franklin Mingzhe Li, Zhuying Li
Title: Understanding Nature Engagement Experiences of Blind People
Abstract:
Nature plays a crucial role in human health and well-being, but little is known about how blind people experience and relate to it. We conducted a survey of nature relatedness with blind (N=20) and sighted (N=20) participants, along with in-depth interviews with 16 blind participants, to examine how blind people engage with nature and the factors shaping this engagement. Our survey results revealed lower levels of nature relatedness among blind participants compared to sighted peers. Our interview study further highlighted: 1) current practices and challenges of nature engagement, 2) attitudes and values that shape engagement, and 3) expectations for assistive technologies that support safe and meaningful engagement. We also provide design implications to guide future technologies that support nature engagement for blind people. Overall, our findings illustrate how blind people experience nature beyond vision and lay a foundation for technologies that support inclusive nature engagement.

Authors:Celeste Seah, Yoke Chuan Lee, Jung-Joo Lee, Ching-Chiuan Yen, Clement Zheng
Title: Rememo: A Research-through-Design Inquiry Towards an AI-in-the-loop Therapist's Tool for Dementia Reminiscence
Abstract:
Reminiscence therapy (RT) is a common non-pharmacological intervention in dementia care. Recent technology-mediated interventions have largely focused on people with dementia through solutions that replace human facilitators with conversational agents. However, the relational work of facilitation is critical in the effectiveness of RT. Hence, we developed Rememo, a therapist-oriented tool that integrates Generative AI to support and enrich human facilitation in RT. Our tool aims to support the infrastructural and cultural challenges that therapists in Singapore face. In this research, we contribute the Rememo system as a therapist's tool for personalized RT developed through sociotechnically-aware research-through-design. Through studying this system in-situ, our research extends our understanding of human-AI collaboration for care work. We discuss the implications of designing AI-enabled systems that respect the relational dynamics in care contexts, and argue for a rethinking of synthetic imagery as a therapeutic support for memory rahter than a record of truth.

Authors:Yasmin Kafai, José Ramón Lizárraga, R. Benjamin Shapiro
Title: CreateAI Insights from an NSF Workshop on K12 Students, Teachers, and Families as Designers of Artificial Intelligence and Machine Learning Applications
Abstract:
In response to the exponential growth in the use of artificial intelligence and machine learning applications, educators, researchers and policymakers have taken steps to integrate artificial intelligence applications into K-12 education. Among these efforts, one equally important approach has received little, if any attention: What if students and teachers were not just learning to be competent users of AI but also its creators? This question is at the heart of CreateAI in which K12 educators, researchers, and learning scientists addressed the following questions: (1) What tools, skills, and knowledge will empower students and teachers to build their own AI/ML applications? (2) How can we integrate these approaches into classrooms? and (3) What new possibilities for learning emerge when students and teachers become innovators and creators? In the report we provide recommendations for what tools designed for creating AI/ML applications should address in terms of design features, and learner progression in investigations. To promote effective learning and teaching of creating AI applications, we also need to help students and teachers select appropriate tools. We outline how we need to develop a better understanding of learning practices and funds of knowledge to support youth as they create and evaluate AI/ML applications. This also includes engaging youth in learning about ethics and critically that is authentic, empowering, and relevant throughout the design process. Here we advocate for the integration of ethics in the curriculum. We also address what teachers need to know and how assessments can help establish baselines, include different instruments, and promote students as responsible creators of AI. Together, these recommendations provide important insights for preparing students to engage thoughtfully and critically with these technologies.

Authors:Yasmin Kafai, Shuchi Grover
Title: Expanding the Scope of Computational Thinking in Artificial Intelligence for K-12 Education
Abstract:
The introduction of generative artificial intelligence applications to the public has led to heated discussions about its potential impacts and risks for K-12 education. One particular challenge has been to decide what students should learn about AI, and how this relates to computational thinking, which has served as an umbrella for promoting and introducing computing education in schools. In this paper, we situate in which ways we should expand computational thinking to include artificial intelligence and machine learning technologies. Furthermore, we discuss how these efforts can be informed by lessons learned from the last decade in designing instructional programs, integrating computing with other subjects, and addressing issues of algorithmic bias and justice in teaching computing in schools.

Authors:Inha Cha, Yeonju Jang, Haesoo Kim, Joo Young Park, Seora Park, EunJeong Cheon
Title: "My body is not your Porn": Identifying Trends of Harm and Oppression through a Sociotechnical Genealogy of Digital Sexual Violence in South Korea
Abstract:
Ever since the introduction of internet technologies in South Korea, digital sexual violence (DSV) has been a persistent and pervasive problem. Evolving alongside digital technologies, the severity and scale of violence have grown consistently, leading to widespread public concern. In this paper, we present four eras of image-based DSV in South Korea, spanning from the early internet era of the 1990s to the deepfake scandals in the mid-2020s. Drawing from media coverage, legal documents, and academic literature, we elucidate forms and characteristics of DSV cases in each era, tracing how entrenched misogyny is reconfigured and amplified through evolving technologies, alongside shifting legislative measures. Taking a genealogical approach to read prominent cases of different eras, our analysis identifies three constitutive and interconnected dimensions of DSV: (1) the homo-social fabrication of "obscenity", wherein victims' imagery becomes collectively framed as obscene through participatory practices in male-dominant networks; (2) the increasing imperceptibility of violence, as technologies foreclose victims' ability to perceive harm; and (3) the commercialization of abuse through decentralized economic infrastructures. We suggest future directions for CSCW research, and further reflect on the value of the genealogical method in enabling non-linear understanding of DSV as dynamically evolving sociotechnical configurations of harm.

Authors:Fabian Walke, Veronika Föller
Title: Generative AI Usage of University Students: Navigating Between Education and Business
Abstract:
This study investigates generative artificial intelligence (GenAI) usage of university students who study alongside their professional career. Previous literature has paid little attention to part-time students and the intersectional use of GenAI between education and business. This study examines with a grounded theory approach the characteristics of GenAI usage of part-time students. Eleven students from a distance learning university were interviewed. Three causal and four intervening conditions, as well as strategies were identified, to influence the use of GenAI. The study highlights both the potential and challenges of GenAI usage in education and business. While GenAI can significantly enhance productivity and learning outcomes, concerns about ethical implications, reliability, and the risk of academic misconduct persist. The developed grounded model offers a comprehensive understanding of GenAI usage among students, providing valuable insights for educators, policymakers, and developers of GenAI tools seeking to bridge the gap between education and business.

Authors:Arianna Rossi, Simon Parkin
Title: "What I'm Interested in is Something that Violates the Law": Regulatory Practitioner Views on Automated Detection of Deceptive Design Patterns
Abstract:
Although deceptive design patterns are subject to growing regulatory oversight, enforcement races to keep up with the scale of the problem. One promising solution is automated detection tools, many of which are developed within academia. We interviewed nine experienced practitioners working within or alongside regulatory bodies to understand their work against deceptive design patterns, including the use of supporting tools and the prospect of automation. Computing technologies have their place in regulatory practice, but not as envisioned in research. For example, investigations require utmost transparency and accountability in all the activities we identify as accompanying dark pattern detection, which many existing tools cannot provide. Moreover, tools need to map interfaces to legal violations to be of use. We thus recommend conducting user requirement research to maximize research impact, supporting ancillary activities beyond detection, and establishing practical tech adoption pathways that account for the needs of both scientific and regulatory activities.

Authors:Wooyoung Jung, Kahyun Jeon, Prosper Babon-Ayeng
Title: Human-AI Collaboration in Large Language Model-Integrated Building Energy Management Systems: The Role of User Domain Knowledge and AI Literacy
Abstract:
This study aimed to comprehend how user domain knowledge and artificial intelligence (AI) literacy impact the effective use of human-AI interactive building energy management system (BEMS). While prior studies have investigated the potential of integrating large language models (LLMs) into BEMS or building energy modeling, very few studies have examined how user interact with such systems. We conducted a systematic role-playing experiment, where 85 human subjects interacted with an advanced generative pre-trained transformer (OpenAI GPT-4o). Participants were tasked with identifying the top five behavioral changes that could reduce home energy use with the GPT model that functioned as an LLM-integrated BEMS. Then, the collected prompt-response data and participant conclusions were analyzed using an analytical framework that hierarchically assessed and scored human-AI interactions and their home energy analysis approaches. Also, participants were classified into four groups based on their self-evaluated domain knowledge of building energy use and AI literacy, and Kruskal-Wallis H tests with post-hoc pairwise comparisons were conducted across 20 quantifiable metrics. Key takeaways include: most participants employed concise prompts (median: 16.2 words) and relied heavily on GPT's analytical capabilities; and notably, only 1 of 20 metrics, appliance identification rate, showed statistically significant group differences (p=0.037), driven by AI literacy rather than domain knowledge, suggesting an equalizing effect of LLMs across expertise levels. This study provides foundational insights into human-AI collaboration dynamics and promising development directions in the context of LLM-integrated BEMS and contributes to realizing human-centric LLM-integrated energy systems.

Authors:Sikao Guo, Edoardo Sarti, Frédéric Cazals
Title: A Unified, Cross-Platform Framework for Automatic GUI and Plugin Generation in Structural Bioinformatics and Beyond
Abstract:
We present a workflow and associated toolkit to automate the creation of graphical user interfaces (GUI) for executables run from command line interfaces (CLI). The workflow consists of three phases, namely (Step 1) the plugin design, (Step 2) the formal (platform independent) specification of the GUI, and (Step 3) the plugin code generation for the targeted platforms. Our architecture is aligned with the Model--View--Presenter (MVP) pattern: steps one and two build the Model and View descriptions, while step three implements the Presenter layer that binds inputs, invokes the CLI, and updates outputs. Once Step one has been (manually) completed, steps two and three are fully automated. The decoupled MVP design and platform-specific generator modules enable reuse of logic, portability across ecosystems, and significant reductions in engineering effort for complex interactive applications. We primarily use our workflow to generate GUI in structural bioinformatics for CLI executables from the Structural Bioinformatics Library (SBL), targeting three platforms, namely VMD, Pymol and Web servers. The workflow can be used as a guideline, while its implementation available in the package Plugin_manager from the SBL, see https://sbl.inria.fr/doc/Plugin_manager-user-manual.html.

Authors:Pedro Reynolds-Cuéllar, Marisol Wong-Villacres, Adriana Alvarado Garcia, Heila Precel
Title: From Reflection to Repair: A Scoping Review of Dataset Documentation Tools
Abstract:
Dataset documentation is widely recognized as essential for the responsible development of automated systems. Despite growing efforts to support documentation through different kinds of artifacts, little is known about the motivations shaping documentation tool design or the factors hindering their adoption. We present a systematic review supported by mixed-methods analysis of 59 dataset documentation publications to examine the motivations behind building documentation tools, how authors conceptualize documentation practices, and how these tools connect to existing systems, regulations, and cultural norms. Our analysis shows four persistent patterns in dataset documentation conceptualization that potentially impede adoption and standardization: unclear operationalizations of documentation's value, decontextualized designs, unaddressed labor demands, and a tendency to treat integration as future work. Building on these findings, we propose a shift in Responsible AI tool design toward institutional rather than individual solutions, and outline actions the HCI community can take to enable sustainable documentation practices.

Authors:Paolo Bottoni, Susanna Cifani, Kamen Kanev, Daniel Moraru, Atsushi Nakamura, Marco Raoul Marini
Title: Towards a More Realistic VR Experience: Merging Haptic Gloves with Precision Gloves
Abstract:
Virtual reality (VR) glove technology is increasingly important for professional training, industrial applications, and teleoperation in hazardous environments, since it enables more natural and immersive interactions than controllers. However, current solutions face a trade-off: high-precision gloves lack haptic feedback, while haptic gloves suffer from poor accuracy. Existing studies have mainly focused on developing new glove prototypes or optimizing only one type of glove, without addressing the integration of both features. Our work presents a novel hybrid approach that combines a high-precision glove with a haptic glove, creating a system that delivers both precision and haptics.

Authors:Atharva S Kashyap, Ugne Aleksandra Morkute, Patricia Alves-Oliveira
Title: Robot-Assisted Social Dining as a White Glove Service
Abstract:
Robot-assisted feeding enables people with disabilities who require assistance eating to enjoy a meal independently and with dignity. However, existing systems have only been tested in-lab or in-home, leaving in-the-wild social dining contexts (e.g., restaurants) largely unexplored. Designing a robot for such contexts presents unique challenges, such as dynamic and unsupervised dining environments that a robot needs to account for and respond to. Through speculative participatory design with people with disabilities, supported by semi-structured interviews and a custom AI-based visual storyboarding tool, we uncovered ideal scenarios for in-the-wild social dining. Our key insight suggests that such systems should: embody the principles of a white glove service where the robot (1) supports multimodal inputs and unobtrusive outputs; (2) has contextually sensitive social behavior and prioritizes the user; (3) has expanded roles beyond feeding; (4) adapts to other relationships at the dining table. Our work has implications for in-the-wild and group contexts of robot-assisted feeding.

Authors:Belén Martín-Urcelay, Yoonsang Lee, Matthieu R. Bloch, Christopher J. Rozell
Title: Beyond Labels: Information-Efficient Human-in-the-Loop Learning using Ranking and Selection Queries
Abstract:
Integrating human expertise into machine learning systems often reduces the role of experts to labeling oracles, a paradigm that limits the amount of information exchanged and fails to capture the nuances of human judgment. We address this challenge by developing a human-in-the-loop framework to learn binary classifiers with rich query types, consisting of item ranking and exemplar selection. We first introduce probabilistic human response models for these rich queries motivated by the relationship experimentally observed between the perceived implicit score of an item and its distance to the unknown classifier. Using these models, we then design active learning algorithms that leverage the rich queries to increase the information gained per interaction. We provide theoretical bounds on sample complexity and develop a tractable and computationally efficient variational approximation. Through experiments with simulated annotators derived from crowdsourced word-sentiment and image-aesthetic datasets, we demonstrate significant reductions on sample complexity. We further extend active learning strategies to select queries that maximize information rate, explicitly balancing informational value against annotation cost. This algorithm in the word sentiment classification task reduces learning time by more than 57\% compared to traditional label-only active learning.

Authors:Ning Wang, Chen Liang
Title: How to Disclose? Strategic AI Disclosure in Crowdfunding
Abstract:
As artificial intelligence (AI) increasingly integrates into crowdfunding practices, strategic disclosure of AI involvement has become critical. Yet, empirical insights into how different disclosure strategies influence investor decisions remain limited. Drawing on signaling theory and Aristotle's rhetorical framework, we examine how mandatory AI disclosure affects crowdfunding performance and how substantive signals (degree of AI involvement) and rhetorical signals (logos/explicitness, ethos/authenticity, pathos/emotional tone) moderate these effects. Leveraging Kickstarter's mandatory AI disclosure policy as a natural experiment and four supplementary online experiments, we find that mandatory AI disclosure significantly reduces crowdfunding performance: funds raised decline by 39.8% and backer counts by 23.9% for AI-involved projects. However, this adverse effect is systematically moderated by disclosure strategy. Greater AI involvement amplifies the negative effects of AI disclosure, while high authenticity and high explicitness mitigate them. Interestingly, excessive positive emotional tone (a strategy creators might intuitively adopt to counteract AI skepticism) backfires and exacerbates negative outcomes. Supplementary randomized experiments identify two underlying mechanisms: perceived creator competence and AI washing concerns. Substantive signals primarily affect competence judgments, whereas rhetorical signals operate through varied pathways: either mediator alone or both in sequence. These findings provide theoretical and practical insights for entrepreneurs, platforms, and policymakers strategically managing AI transparency in high-stakes investment contexts.

Authors:Rodrigo Gutierrez Maquilon, Marita Hueber, Georg Regal, Manfred Tscheligi
Title: Ground-Truth Depth in Vision Language Models: Spatial Context Understanding in Conversational AI for XR-Robotic Support in Emergency First Response
Abstract:
Large language models (LLMs) are increasingly used in emergency first response (EFR) applications to support situational awareness (SA) and decision-making, yet most operate on text or 2D imagery and offer little support for core EFR SA competencies like spatial reasoning. We address this gap by evaluating a prototype that fuses robot-mounted depth sensing and YOLO detection with a vision language model (VLM) capable of verbalizing metrically-grounded distances of detected objects (e.g., the chair is 3.02 meters away). In a mixed-reality toxic-smoke scenario, participants estimated distances to a victim and an exit window under three conditions: video-only, depth-agnostic VLM, and depth-augmented VLM. Depth-augmentation improved objective accuracy and stability, e.g., the victim and window distance estimation error dropped, while raising situational awareness without increasing workload. Conversely, depth- agnostic assistance increased workload and slightly worsened accuracy. We contribute to human SA augmentation by demonstrating that metrically grounded, object-centric verbal information supports spatial reasoning in EFR and improves decision-relevant judgments under time pressure.

Authors:Kashyap Thimmaraju, Duc Anh Hoang, Souradip Nath, Jaron Mink, Gail-Joon Ahn
Title: Before the Vicious Cycle Starts: Preventing Burnout Across SOC Roles Through Flow-Aligned Design
Abstract:
The sustainability of Security Operations Centers depends on their people, yet 71% of practitioners report burnout and 24% plan to exit cybersecurity entirely. Flow theory suggests that when job demands misalign with practitioner capabilities, work becomes overwhelming or tedious rather than engaging. Achieving challenge-skill balance begins at hiring: if job descriptions inaccurately portray requirements, organizations risk recruiting underskilled practitioners who face anxiety or overskilled ones who experience boredom. Yet we lack empirical understanding of what current SOC job descriptions actually specify. We analyzed 106 public SOC job postings from November to December 2024 across 35 organizations in 11 countries, covering Analysts (n=17), Incident Responders (n=38), Threat Hunters (n=39), and SOC Managers (n=12). Using Inductive Content Analysis, we coded certifications, technical skills, soft skills, tasks, and experience requirements. Three patterns emerged: (1) Communication skills dominate (50.9% of postings), exceeding SIEM tools (18.9%) or programming (30.2%), suggesting organizations prioritize collaboration over technical capabilities. (2) Certification expectations vary widely: CISSP leads (22.6%), but 43 distinct credentials appear with no universal standard. (3) Technical requirements show consensus: Python dominates programming (27.4%), Splunk leads SIEM platforms (14.2%), and ISO 27001 (13.2%) and NIST (10.4%) are most cited standards. These findings enable organizations to audit job descriptions against empirical baselines, help practitioners identify valued certifications and skills, and allow researchers to validate whether stated requirements align with actual demands. This establishes the foundation for flow-aligned interview protocols and investigation of how AI reshapes requirements. Dataset and codebook: https://git.tu-berlin.de/wosoc-2026/soc-jd-analysis.

Authors:Hongming Li, Salah Esmaeiligoujar, Nazanin Adham, Hai Li, Rui Huang
Title: 'I Spend All My Energy Preparing': Balancing AI Automation and Agency for Self-Regulated Learning in SmartFlash
Abstract:
Effective study strategies fail when preparatory tasks consume learning time. While AI educational tools demonstrate efficacy, understanding how they align with self-regulation needs in authentic study contexts remains limited. We conducted formative design research using an AI flashcard prototype, employing large language models to generate design hypotheses, which were validated through researcher walkthroughs and student sessions. Six students across disciplines completed sessions combining interviews and think-aloud tasks with their materials. Analysis revealed that students value automation for addressing the overwhelming preparation burden, yet require transparent, editable AI outputs to maintain cognitive ownership, which is essential for self-regulation. They conceptualized AI as a collaborative partner demanding verifiable reasoning rather than an autonomous agent. Metacognitive scaffolding was endorsed when clarifying study direction without constraining choice. Motivational features produced divergent responses. We derive design principles prioritizing editability and transparency, scaffolding metacognition without prescription, and accommodating motivational diversity. Findings identify conditions under which automation supports versus undermines metacognitive development in self-regulated learning.

Authors:Rafael M. Batista, Thomas L. Griffiths
Title: A Rational Analysis of the Effects of Sycophantic AI
Abstract:
People increasingly use large language models (LLMs) to explore ideas, gather information, and make sense of the world. In these interactions, they encounter agents that are overly agreeable. We argue that this sycophancy poses a unique epistemic risk to how individuals come to see the world: unlike hallucinations that introduce falsehoods, sycophancy distorts reality by returning responses that are biased to reinforce existing beliefs. We provide a rational analysis of this phenomenon, showing that when a Bayesian agent is provided with data that are sampled based on a current hypothesis the agent becomes increasingly confident about that hypothesis but does not make any progress towards the truth. We test this prediction using a modified Wason 2-4-6 rule discovery task where participants (N=557) interacted with AI agents providing different types of feedback. Unmodified LLM behavior suppressed discovery and inflated confidence comparably to explicitly sycophantic prompting. By contrast, unbiased sampling from the true distribution yielded discovery rates five times higher. These results reveal how sycophantic AI distorts belief, manufacturing certainty where there should be doubt.

Authors:Gengchen Cao, Tianke He, Yixuan Liu, RAY LC
Title: Audience in the Loop: Viewer Feedback-Driven Content Creation in Micro-drama Production on Social Media
Abstract:
The popularization of social media has led to increasing consumption of narrative content in byte-sized formats. Such micro-dramas contain fast-pace action and emotional cliffs, particularly attractive to emerging Chinese markets in platforms like Douyin and Kuaishou. Content writers for micro-dramas must adapt to fast-pace, audience-directed workflows, but previous research has focused instead on examining writers'experiences of platform affordances or their perceptions of platform bias, rather than the step-by-step processes through which they actually write and iterative content. In 28 semi-structured interviews with scriptwriters and writers specialized in micro-dramas, we found that the short-turn-around workflow leads to writers taking on multiple roles simultaneously, iteratively adapting to storylines in response to real-time audience feedback in the form of comments, reposts, and memes. We identified unique narrative styles such as AI-generated micro-dramas and audience-responsive micro-dramas. This work reveals audience interaction as a new paradigm for collaborative creative processes on social media.

Authors:Shiping Chen, Shu Zhong, Duncan P. Brumby, Anna L. Cox
Title: What happens when reviewers receive AI feedback in their reviews?
Abstract:
AI is reshaping academic research, yet its role in peer review remains polarising and contentious. Advocates see its potential to reduce reviewer burden and improve quality, while critics warn of risks to fairness, accountability, and trust. At ICLR 2025, an official AI feedback tool was deployed to provide reviewers with post-review suggestions. We studied this deployment through surveys and interviews, investigating how reviewers engaged with the tool and perceived its usability and impact. Our findings surface both opportunities and tensions when AI augments in peer review. This work contributes the first empirical evidence of such an AI tool in a live review process, documenting how reviewers respond to AI-generated feedback in a high-stakes review context. We further offer design implications for AI-assisted reviewing that aim to enhance quality while safeguarding human expertise, agency, and responsibility.

Authors:Tarek Rahman, Md Shaharia Hossen, Mark Protik Mondol, Jannatun Noor Mukta
Title: Search in Transition: A Study of University Students Perspectives on Using LLMs and Traditional Search Engines in English Test Problem Solving for Higher Study
Abstract:
As Artificial Intelligence (AI) becomes increasingly integrated into education, university students preparing for English language tests are frequently shifting between traditional search engines like Google and large language models (LLMs) to assist with problem-solving. This study explores students perceptions of these tools, particularly in terms of usability, efficiency, and how they fit into English test preparation practices. Using a mixed-methods design, we collected survey data from 140 university students across various academic fields and conducted in-depth interviews with 20 participants. Quantitative analyses, including ANOVA and chi-square tests, were applied to assess differences in perceived efficiency, satisfaction, and overall tool preference. The qualitative results reveal that students strategically alternate between GPT and Google based on task requirements. Google is primarily used for accessing reliable, multi-source information and verifying rules, whereas GPT is favored for summarizing content, providing explanations, paraphrasing, and drafting responses for English test tasks. Since neither tool independently satisfies all aspects of English language test preparation, students expressed a clear preference for an integrated approach. In response, this study proposes a prototype chatbot embedded within a search interface, combining GPTs interactive capabilities with Googles credibility to enhance test preparation and reduce cognitive load.

Authors:Manuele Reani, Xiangyang He, Zuolan Bao
Title: Anthropomorphism on Risk Perception: The Role of Trust and Domain Knowledge in Decision-Support AI
Abstract:
Anthropomorphic design is routinely used to make conversational agents more approachable and engaging. Yet its influence on users' perceptions remains poorly understood. Drawing on psychological theories, we propose that anthropomorphism influences risk perception via two complementary forms of trust, and that domain knowledge moderates these relationships. To test our model, we conducted a large-scale online experiment (N = 1,256) on a financial decision-support system implementing different anthropomorphic designs. We found that anthropomorphism indirectly reduces risk perception by increasing both cognitive and affective trust. Domain knowledge moderates these paths: participants with low financial knowledge experience a negative indirect effect of perceived anthropomorphism on risk perception via cognitive trust, whereas those with high financial knowledge exhibit a positive direct and indirect effect. We discuss theoretical contributions to human-AI interaction and design implications for calibrating trust in anthropomorphic decision-support systems for responsible AI.

Authors:Belu Ticona, Amna Liaqat, Antonios Anastasopoulos
Title: What Do We Mean by 'Pilot Study': Early Findings from a Meta-Review of Pilot Study Reporting at CHI
Abstract:
Pilot studies (PS) are ubiquitous in HCI research. CHI papers routinely reference 'pilot studies', 'pilot tests', or 'preliminary studies' to justify design decisions, verify procedures, or motivate methodological choices. Yet despite their frequency, the role of pilot studies in HCI remains conceptually vague and empirically underexamined. Unlike fields such as medicine, nursing, and education, where pilot and feasibility studies have well-established definitions, guidelines, reporting standards and even a dedicated research journal, the CHI community lacks a shared understanding of what constitutes a pilot study, why they are conducted, and how they should be reported. Many papers reference pilots 'in passing', without details about design, outcomes, or how the pilot informed the main study. This variability suggests a methodological blind spot in our community.

Authors:Phyllis Nabangi, Abdul-Jalil Zakaria, Jema David Ndibwile
Title: Using Machine Learning to Enhance the Detection of Obfuscated Abusive Words in Swahili: A Focus on Child Safety
Abstract:
The rise of digital technology has dramatically increased the potential for cyberbullying and online abuse, necessitating enhanced measures for detection and prevention, especially among children. This study focuses on detecting abusive obfuscated language in Swahili, a low-resource language that poses unique challenges due to its limited linguistic resources and technological support. Swahili is chosen due to its popularity and being the most widely spoken language in Africa, with over 16 million native speakers and upwards of 100 million speakers in total, spanning regions in East Africa and some parts of the Middle East. We employed machine learning models including Support Vector Machines (SVM), Logistic Regression, and Decision Trees, optimized through rigorous parameter tuning and techniques like Synthetic Minority Over-sampling Technique (SMOTE) to handle data imbalance. Our analysis revealed that, while these models perform well in high-dimensional textual data, our dataset's small size and imbalance limit our findings' generalizability. Precision, recall, and F1 scores were thoroughly analyzed, highlighting the nuanced performance of each model in detecting obfuscated language. This research contributes to the broader discourse on ensuring safer online environments for children, advocating for expanded datasets and advanced machine-learning techniques to improve the effectiveness of cyberbullying detection systems. Future work will focus on enhancing data robustness, exploring transfer learning, and integrating multimodal data to create more comprehensive and culturally sensitive detection mechanisms.

Authors:Ching-Yi Tsai, Nicole Tacconi, Andrew D. Wilson, Parastoo Abtahi
Title: Uncertain Pointer: Situated Feedforward Visualizations for Ambiguity-Aware AR Target Selection
Abstract:
Target disambiguation is crucial in resolving input ambiguity in augmented reality (AR), especially for queries over distant objects or cluttered scenes on the go. Yet, visual feedforward techniques that support this process remain underexplored. We present Uncertain Pointer, a systematic exploration of feedforward visualizations that annotate multiple candidate targets before user confirmation, either by adding distinct visual identities (e.g., colors) to support disambiguation or by modulating visual intensity (e.g., opacity) to convey system uncertainty. First, we construct a pointer space of 25 pointers by analyzing existing placement strategies and visual signifiers used in target visualizations across 30 years of relevant literature. We then evaluate them through two online experiments (n = 60 and 40), measuring user preference, confidence, mental ease, target visibility, and identifiability across varying object distances and sparsities. Finally, from the results, we derive design recommendations in choosing different Uncertain Pointers based on AR context and disambiguation techniques.

Authors:Gaston Besanson, Federico Todeschini
Title: Accuracy Standards for AI at Work vs. Personal Life: Evidence from an Online Survey
Abstract:
We study how people trade off accuracy when using AI-powered tools in professional versus personal contexts for adoption purposes, the determinants of those trade-offs, and how users cope when AI/apps are unavailable. Because modern AI systems (especially generative models) can produce acceptable but non-identical outputs, we define "accuracy" as context-specific reliability: the degree to which an output aligns with the user's intent within a tolerance threshold that depends on stakes and the cost of correction. In an online survey (N=300), among respondents with both accuracy items (N=170), the share requiring high accuracy (top-box) is 24.1% at work vs. 8.8% in personal life (+15.3 pp; z=6.29, p<0.001). The gap remains large under a broader top-two-box definition (67.0% vs. 32.9%) and on the full 1-5 ordinal scale (mean 3.86 vs. 3.08). Heavy app use and experience patterns correlate with stricter work standards (H2). When tools are unavailable (H3), respondents report more disruption in personal routines than at work (34.1% vs. 15.3%, p<0.01). We keep the main text focused on these substantive results and place test taxonomy and power derivations in a technical appendix.

Authors:Prabhav Bhatnagar, Jianheng He, Shamit Ahmed, Andrés Lucero, Perttu Hämäläinen
Title: Reflection at Design Actualization (RDA) : A Tool and Process For Research Through Game Design
Abstract:
There is a growing interest in researching game design processes, artifacts and culture through active game design. Tools and processes to support these attempts are limited, especially in terms of a) capturing smaller design decisions where rich tacit information is often situated, and b) visually tracking the project's growth and evolution. To address this gap, we present Reflection at Design Actualization (RDA), an open source tool and process for collecting granular reflections at playtesting moments and automatically recording the playtests, bringing reflection and data collection closer to the point where design decisions concretize. Three researchers engaged with and evaluated RDA in three varied game development projects, adhering to the principles of autobiographical design. We illustrate the designer experience with RDA through three themes, namely, designer-routine compromise, designer-researcher persona consolidation, and mirror effect of RDA. We further discuss the tool's challenges and share each designer's personal experience as case studies.

Authors:Mahdi Haghighat Joo, Maryam Karimi Jafari, Alireza Taheri
Title: PISHYAR: A Socially Intelligent Smart Cane for Indoor Social Navigation and Multimodal Human-Robot Interaction for Visually Impaired People
Abstract:
This paper presents PISHYAR, a socially intelligent smart cane designed by our group to combine socially aware navigation with multimodal human-AI interaction to support both physical mobility and interactive assistance. The system consists of two components: (1) a social navigation framework implemented on a Raspberry Pi 5 that integrates real-time RGB-D perception using an OAK-D Lite camera, YOLOv8-based object detection, COMPOSER-based collective activity recognition, D* Lite dynamic path planning, and haptic feedback via vibration motors for tasks such as locating a vacant seat; and (2) an agentic multimodal LLM-VLM interaction framework that integrates speech recognition, vision language models, large language models, and text-to-speech, with dynamic routing between voice-only and vision-only modes to enable natural voice-based communication, scene description, and object localization from visual input. The system is evaluated through a combination of simulation-based tests, real-world field experiments, and user-centered studies. Results from simulated and real indoor environments demonstrate reliable obstacle avoidance and socially compliant navigation, achieving an overall system accuracy of approximately 80% under different social conditions. Group activity recognition further shows robust performance across diverse crowd scenarios. In addition, a preliminary exploratory user study with eight visually impaired and low-vision participants evaluates the agentic interaction framework through structured tasks and a UTAUT-based questionnaire reveals high acceptance and positive perceptions of usability, trust, and perceived sociability during our experiments. The results highlight the potential of PISHYAR as a multimodal assistive mobility aid that extends beyond navigation to provide socially interactive support for such users.

Authors:Raffaele Ciriello, Uri Gal, Ofir Turel
Title: Not a Silver Bullet for Loneliness: How Attachment and Age Shape Intimacy with AI Companions
Abstract:
Artificial intelligence (AI) companions are increasingly promoted as solutions for loneliness, often overlooking how personal dispositions and life-stage conditions shape artificial intimacy. Because intimacy is a primary coping mechanism for loneliness that varies by attachment style and age, we examine how different types of users form intimate relationships with AI companions in response to loneliness. Drawing on a hermeneutic literature review and a survey of 277 active AI companion users, we develop and test a model in which loneliness predicts intimacy, moderated by attachment insecurity and conditioned by age. Although the cross-sectional data limits causal inference, the results reveal a differentiated pattern. Loneliness is paradoxically associated with reduced intimacy for securely attached users but with increased intimacy for avoidant and ambivalent users, while anxious users show mixed effects. Older adults report higher intimacy even at lower loneliness levels. These findings challenge portrayals of AI companions as universal remedies for loneliness. Instead, artificial intimacy emerges as a sociotechnical process shaped by psychological dispositions and demographic conditions. The study clarifies who is most likely to form intimate relationships with AI companions and highlights ethical risks in commercial models that may capitalise on user vulnerability.

Authors:Mohammad Raihanul Bashar, Aunnoy K Mutasim, Ken Pfeuffer, Anil Ufuk Batmaz
Title: Eyes on Many: Evaluating Gaze, Hand, and Voice for Multi-Object Selection in Extended Reality
Abstract:
Interacting with multiple objects simultaneously makes us fast. A pre-step to this interaction is to select the objects, i.e., multi-object selection, which is enabled through two steps: (1) toggling multi-selection mode -- mode-switching -- and then (2) selecting all the intended objects -- subselection. In extended reality (XR), each step can be performed with the eyes, hands, and voice. To examine how design choices affect user performance, we evaluated four mode-switching (SemiPinch, FullPinch, DoublePinch, and Voice) and three subselection techniques (Gaze+Dwell, Gaze+Pinch, and Gaze+Voice) in a user study. Results revealed that while DoublePinch paired with Gaze+Pinch yielded the highest overall performance, SemiPinch achieved the lowest performance. Although Voice-based mode-switching showed benefits, Gaze+Voice subselection was less favored, as the required repetitive vocal commands were perceived as tedious. Overall, these findings provide empirical insights and inform design recommendations for multi-selection techniques in XR.

Authors:Kaisa Vaananen, Niels van Berkel, Donald McMillan, Thomas Olsson
Title: Embodied AI Agents for Team Collaboration in Co-located Blue-Collar Work
Abstract:
Blue-collar work is often highly collaborative, embodied, and situated in shared physical environments, yet most research on collaborative AI has focused on white-collar work. This position paper explores how the embodied nature of AI agents can support team collaboration and communication in co-located blue-collar workplaces. From the context of our newly started CAI-BLUE research project, we present two speculative scenarios from industrial and maintenance contexts that illustrate how embodied AI agents can support shared situational awareness and facilitate inclusive communication across experience levels. We outline open questions related to embodied AI agent design around worker inclusion, agency, transformation of blue-collar collaboration practices over time, and forms of acceptable AI embodiments. We argue that embodiment is not just an aesthetic choice but should become a socio-material design strategy of AI systems in blue-collar workplaces.

Authors:Jiajun Chen, Hua Shen
Title: Value Alignment Tax: Measuring Value Trade-offs in LLM Alignment
Abstract:
Existing work on value alignment typically characterizes value relations statically, ignoring how interventions - such as prompting, fine-tuning, or preference optimization - reshape the broader value system. We introduce the Value Alignment Tax (VAT), a framework that measures how alignment-induced changes propagate across interconnected values relative to achieved on-target gain. VAT captures the dynamics of value expression under alignment pressure. Using a controlled scenario-action dataset grounded in Schwartz value theory, we collect paired pre-post normative judgments and analyze alignment effects across models, values, and alignment strategies. Our results show that alignment often produces uneven, structured co-movement among values. These effects are invisible under conventional target-only evaluation, revealing systemic, process-level alignment risks and offering new insights into the dynamics of value alignment in LLMs.

Authors:Zhidian Lin, Allison Jing, Ziyuan Qu, Fabio Zambetta, Ryan M. Kelly
Title: Mapping the Landscape of Affective Extended Reality: A Scoping Review of Biodata-Driven Systems for Understanding and Sharing Emotions
Abstract:
This paper introduces the notion of affective extended reality (XR) to characterise XR systems that use biodata to enable understanding of emotions. The HCI literature contains many such systems, but they have not yet been mapped into a coherent whole. To address this, we conducted a scoping review of 82 papers that explore the nexus of biodata, emotions, and XR. We analyse the technologies used in these systems, the interaction techniques employed, and the methods used to evaluate their effectiveness. Through our analysis, we contribute a mapping of the current landscape of affective XR, revealing diversity in the goals for enabling emotion sharing. We demonstrate how HCI researchers have explored the design of the interaction flows in XR biofeedback systems, highlighting key design dimensions and challenges in understanding emotions. We discuss underused approaches for emotion sharing and highlight opportunities for future research on affective XR.

Authors:Yifan Zhao, Yuxin Fang, Yihuan Chen, RAY LC
Title: "I Was Told to Come Back and Share This": Social Media-Based Near-Death Experience Disclosures as Expressions of Spiritual Beliefs
Abstract:
People who experienced near-death events often turn to personal expression as a way of processing trauma and articulating beliefs. While scholars have examined how individuals share near-death experiences (NDEs), limited research has explored how these narratives are communicated collaboratively on today's social media platforms. We analyzed 200 randomly sampled TikTok videos tagged with #nde and related hashtags. Content analysis revealed that individuals often use NDE narratives to articulate personal meaning, with spiritual and religious themes appearing in the majority of posts and serving as a means of exploring and making sense of personal spiritual perspectives. Consistent with this, analyses of comment sections reveal that videos containing spiritual themes tend to attract more engagement and foster deeper conversations around faith and meaning. Our findings offer insights into how online platforms facilitate community-level engagement with spirituality, and suggest implications for design of spaces that support shared expression and connection in specialized communities.

Authors:Dennis Kim, Roya Daneshi, Bruce Draper, Sarath Sreedharan
Title: Implications of AI Involvement for Trust in Expert Advisory Workflows Under Epistemic Dependence
Abstract:
The increasing integration of AI-powered tools into expert workflows, such as medicine, law, and finance, raises a critical question: how does AI involvement influence a user's trust in the human expert, the AI system, and their combination? To investigate this, we conducted a user study (N=77) featuring a simulated course-planning task. We compared various conditions that differed in both the presence of AI and the specific mode of human-AI collaboration. Our results indicate that while the advisor's ability to create a correct schedule is important, the user's perception of expertise and trust is also influenced by how the expert utilized the AI assistant. These findings raise important considerations for the design of human-AI hybrid teams, particularly when the adoption of recommendations depends on the end-user's perception of the recommender's expertise.

Authors:Jaime Banks, Jon Stromer-Galley, Samiksha Singh, Collin Capano
Title: DiSCoKit: An Open-Source Toolkit for Deploying Live LLM Experiences in Survey Research
Abstract:
Advancing social-scientific research of human-AI interaction dynamics and outcomes often requires researchers to deliver experiences with live large-language models (LLMs) to participants through online survey platforms. However, technical and practical challenges (from logging chat data to manipulating AI behaviors for experimental designs) often inhibit survey-based deployment of AI stimuli. We developed DiSCoKit--an open-source toolkit for deploying live LLM experiences (e.g., ones based on models delivered through Microsoft Azure portal) through JavaScript-enabled survey platforms (e.g., Qualtrics). This paper introduces that toolkit, explaining its scientific impetus, describes its architecture and operation, as well as its deployment possibilities and limitations.

Authors:Alexanne Worm, Florian Marchal, Sylvain Castagnos
Title: BIRD: A Museum Open Dataset Combining Behavior Patterns and Identity Types to Better Model Visitors' Experience
Abstract:
Lack of data is a recurring problem in Artificial Intelligence, as it is essential for training and validating models. This is particularly true in the field of cultural heritage, where the number of open datasets is relatively limited and where the data collected does not always allow for holistic modeling of visitors' experience due to the fact that data are ad hoc (i.e. restricted to the sole characteristics required for the evaluation of a specific model). To overcome this lack, we conducted a study between February and March 2019 aimed at obtaining comprehensive and detailed information about visitors, their visit experience and their feedback. We equipped 51 participants with eye-tracking glasses, leaving them free to explore the 3 floors of the museum for an average of 57 minutes, and to discover an exhibition of more than 400 artworks. On this basis, we built an open dataset combining contextual data (demographic data, preferences, visiting habits, motivations, social context. . . ), behavioral data (spatiotemporal trajectories, gaze data) and feedback (satisfaction, fatigue, liked artworks, verbatim. . . ). Our analysis made it possible to re-enact visitor identities combining the majority of characteristics found in the literature and to reproduce the Veron and Levasseur profiles. This dataset will ultimately make it possible to improve the quality of recommended paths in museums by personalizing the number of points of interest (POIs), the time spent at these different POIs, and the amount of information to be provided to each visitor based on their level of interest.

Authors:Natalia Abarca, Andrés Carvallo, Claudia López Moncada, Felipe Bravo-Marquez
Title: Explaining AI Without Code: A User Study on Explainable AI
Abstract:
The increasing use of Machine Learning (ML) in sensitive domains such as healthcare, finance, and public policy has raised concerns about the transparency of automated decisions. Explainable AI (XAI) addresses this by clarifying how models generate predictions, yet most methods demand technical expertise, limiting their value for novices. This gap is especially critical in no-code ML platforms, which seek to democratize AI but rarely include explainability. We present a human-centered XAI module in DashAI, an open-source no-code ML platform. The module integrates three complementary techniques, which are Partial Dependence Plots (PDP), Permutation Feature Importance (PFI), and KernelSHAP, into DashAI's workflow for tabular classification. A user study (N = 20; ML novices and experts) evaluated usability and the impact of explanations. Results show: (i) high task success ($\geq80\%$) across all explainability tasks; (ii) novices rated explanations as useful, accurate, and trustworthy on the Explanation Satisfaction Scale (ESS, Cronbach's $α$ = 0.74, a measure of internal consistency), while experts were more critical of sufficiency and completeness; and (iii) explanations improved perceived predictability and confidence on the Trust in Automation scale (TiA, $α$ = 0.60), with novices showing higher trust than experts. These findings highlight a central challenge for XAI in no-code ML, making explanations both accessible to novices and sufficiently detailed for experts.

Authors:Chuncheng Liu, Danah Boyd
Title: The State's Politics of "Fake Data"
Abstract:
Data have power. As such, most discussions of data presume that records should mirror some idealized ground truth. Deviations are viewed as failure. Drawing on two ethnographic studies of state data-making in a Chinese street-level bureaucrat agency and at the US Census Bureau we show how seemingly "fake" state data perform institutional work. We map four moments in which actors negotiate between representational accuracy and organizational imperatives: creation, correction, collusion, and augmentation. Bureaucrats routinely privilege what data do over what they represent, creating fictions that serve civil servants' self-interest and enable constrained administrations. We argue that "fakeness" of state data is relational (context dependent), processual (emerging through workflows), and performative (brought into being through labeling and practice). We urge practitioners to center fitness-for-purpose in assessments of data and contextual governance. Rather than chasing impossible representational accuracy, sociotechnical systems should render the politics of useful fictions visible, contestable, and accountable.

Authors:Robin Beierling, Manuel Scheibl, Jonas Dech, Abhijit Vyas, Anna-Lisa Vollmer
Title: From Interaction to Demonstration Quality in Virtual Reality: Effects of Interaction Modality and Visual Representation on Everyday Tasks
Abstract:
Virtual Reality (VR) is increasingly used for training and demonstration purposes including a variety of applications ranging from robot learning to rehabilitation. However, the choice of input device and its visualization might influence workload and thus user performance leading to suboptimal demonstrations or reduced training effects. This study investigates how different VR input configurations - motion capture gloves, controllers with hand visualization, and controllers with controller visualization - affect user experience and task execution, with the goal of identifying which configuration is best suited for which type of task. Participants performed various kitchen-related activities of daily living (ADLs), including object placement, cutting, cleaning, and pouring in a simulated environment. To address two research questions, we evaluated user experience using the System Usability Scale and NASA Task Load Index (RQ1), and task-specific interaction behavior (RQ2). The latter was assessed using trajectory segmentation, analyzing movement efficiency, unnecessary actions, and execution precision. While no significant differences in overall usability and workload were found, trajectory analysis revealed configuration-specific execution behaviors with different movement strategies. Controllers enabled significantly faster task completion with less movement variability in pick-and-place style tasks such as table setting. In contrast, motion capture gloves produced more natural movements with fewer unnecessary actions, but also showed greater variance in movement patterns for manner-oriented tasks such as cutting bread. These findings highlight trade-offs between efficiency and naturalism, and have implications for optimizing VR-based training, improving the quality of user-generated demonstrations, and tailoring interaction design to specific application goals.

Authors:Hao Zhou, Mahanth Gowda
Title: Exploring the Feasibility of Full-Body Muscle Activation Sensing with Insole Pressure Sensors
Abstract:
Muscle activation initiates contractions that drive human movement, and understanding it provides valuable insights for injury prevention and rehabilitation. Yet, sensing muscle activation is barely explored in the rapidly growing mobile health market. Traditional methods for muscle activation sensing rely on specialized electrodes, such as surface electromyography, making them impractical, especially for long-term usage. In this paper, we introduce Press2Muscle, the first system to unobtrusively infer muscle activation using insole pressure sensors. The key idea is to analyze foot pressure changes resulting from full-body muscle activation that drives movements. To handle variations in pressure signals due to differences in users' gait, weight, and movement styles, we propose a data-driven approach to dynamically adjust reliance on different foot regions and incorporate easily accessible biographical data to enhance Press2Muscle's generalization to unseen users. We conducted an extensive study with 30 users. Under a leave-one-user-out setting, Press2Muscle achieves a root mean square error of 0.025, marking a 19% improvement over a video-based counterpart. A robustness study validates Press2Muscle's ability to generalize across user demographics, footwear types, and walking surfaces. Additionally, we showcase muscle imbalance detection and muscle activation estimation under free-living settings with Press2Muscle, confirming the feasibility of muscle activation sensing using insole pressure sensors in real-world settings.

Authors:Xinru Tang, Jingjin Li, Shaomei Wu
Title: Disability-First AI Dataset Annotation: Co-designing Stuttered Speech Annotation Guidelines with People Who Stutter
Abstract:
Despite efforts to increase the representation of disabled people in AI datasets, accessibility datasets are often annotated by crowdworkers without disability-specific expertise, leading to inconsistent or inaccurate labels. This paper examines these annotation challenges through a case study of annotating speech data from people who stutter (PWS). Given the variability of stuttering and differing views on how it manifests, annotating and transcribing stuttered speech remains difficult, even for trained professionals. Through interviews and co-design workshops with PWS and domain experts, we identify challenges in stuttered speech annotation and develop practices that integrate the lived experiences of PWS into the annotation process. Our findings highlight the value of embodied knowledge in improving dataset quality, while revealing tensions between the complexity of disability experiences and the rigidity of static labels. We conclude with implications for disability-first and multiplicity-aware approaches to data interpretation across the AI pipeline.

Authors:Caroline Wang, Daniel Kasenberg, Kim Stachenfeld, Pablo Samuel Castro
Title: Discovering Differences in Strategic Behavior Between Humans and LLMs
Abstract:
As Large Language Models (LLMs) are increasingly deployed in social and strategic scenarios, it becomes critical to understand where and why their behavior diverges from that of humans. While behavioral game theory (BGT) provides a framework for analyzing behavior, existing models do not fully capture the idiosyncratic behavior of humans or black-box, non-human agents like LLMs. We employ AlphaEvolve, a cutting-edge program discovery tool, to directly discover interpretable models of human and LLM behavior from data, thereby enabling open-ended discovery of structural factors driving human and LLM behavior. Our analysis on iterated rock-paper-scissors reveals that frontier LLMs can be capable of deeper strategic behavior than humans. These results provide a foundation for understanding structural differences driving differences in human and LLM behavior in strategic interactions.

Authors:Jiqun Liu, Nischal Dinesh, Ran Yu
Title: ECHO: An Open Research Platform for Evaluation of Chat, Human Behavior, and Outcomes
Abstract:
ECHO (Evaluation of Chat, Human behavior, and Outcomes) is an open research platform designed to support reproducible, mixed-method studies of human interaction with both conversational AI systems and Web search engines. It enables researchers from varying disciplines to orchestrate end-to-end experimental workflows that integrate consent and background surveys, chat-based and search-based information-seeking sessions, writing or judgment tasks, and pre- and post-task evaluations within a unified, low-coding-load framework. ECHO logs fine-grained interaction traces and participant responses, and exports structured datasets for downstream analysis. By supporting both chat and search alongside flexible evaluation instruments, ECHO lowers technical barriers for studying learning, decision making, and user experience across different information access paradigms, empowering researchers from information retrieval, HCI, and the social sciences to conduct scalable and reproducible human-centered AI evaluations.

Authors:Yang Chen Lin, Chen-Ying Chen, Kai-Hsin Hou, Hung-Yu Chen, Po-Chih Kuo
Title: AIDED: Augmenting Interior Design with Human Experience Data for Designer-AI Co-Design
Abstract:
Interior design often struggles to capture the subtleties of client experience, leaving gaps between what clients feel and what designers can act upon. We present AIDED, a designer-AI co-design workflow that integrates multimodal client data into generative AI (GAI) design processes. In a within-subjects study with twelve professional designers, we compared four modalities: baseline briefs, gaze heatmaps, questionnaire visualizations, and AI-predicted overlays. Results show that questionnaire data were trusted, creativity-enhancing, and satisfying; gaze heatmaps increased cognitive load; and AI-predicted overlays improved GAI communication but required natural language mediation to establish trust. Interviews confirmed that an authenticity-interpretability trade-off is central to balancing client voices with professional control. Our contributions are: (1) a system that incorporates experiential client signals into GAI design workflows; (2) empirical evidence of how different modalities affect design outcomes; and (3) implications for future AI tools that support human-data interaction in creative practice.

Authors:Bob Van Dyck, Arne Van Den Kerchove, Marc M. Van Hulle
Title: An open-source implementation of a closed-loop electrocorticographic Brain-Computer Interface using Micromed, FieldTrip, and PsychoPy
Abstract:
We present an open-source implementation of a closed-loop Brain-Computer Interface (BCI) system based on electrocorticographic (ECoG) recordings. Our setup integrates FieldTrip for interfacing with a Micromed acquisition system and PsychoPy for implementing experiments. We open-source three custom Python libraries (psychopylib, pymarkerlib, and pyfieldtriplib) each covering different aspects of a closed-loop BCI interface: designing interactive experiments, sending event information, and real-time signal processing. Our modules facilitate the design and operation of a transparent BCI system, promoting customization and flexibility in BCI research, and lowering the barrier for researchers to translate advances in ECoG decoding into BCI applications.

Authors:Hayfa Dhabhi, Kashyap Thimmaraju
Title: Stop Testing Attacks, Start Diagnosing Defenses: The Four-Checkpoint Framework Reveals Where LLM Safety Breaks
Abstract:
Large Language Models (LLMs) deploy safety mechanisms to prevent harmful outputs, yet these defenses remain vulnerable to adversarial prompts. While existing research demonstrates that jailbreak attacks succeed, it does not explain \textit{where} defenses fail or \textit{why}. To address this gap, we propose that LLM safety operates as a sequential pipeline with distinct checkpoints. We introduce the \textbf{Four-Checkpoint Framework}, which organizes safety mechanisms along two dimensions: processing stage (input vs.\ output) and detection level (literal vs.\ intent). This creates four checkpoints, CP1 through CP4, each representing a defensive layer that can be independently evaluated. We design 13 evasion techniques, each targeting a specific checkpoint, enabling controlled testing of individual defensive layers. Using this framework, we evaluate GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro across 3,312 single-turn, black-box test cases. We employ an LLM-as-judge approach for response classification and introduce Weighted Attack Success Rate (WASR), a severity-adjusted metric that captures partial information leakage overlooked by binary evaluation. Our evaluation reveals clear patterns. Traditional Binary ASR reports 22.6\% attack success. However, WASR reveals 52.7\%, a 2.3$\times$ higher vulnerability. Output-stage defenses (CP3, CP4) prove weakest at 72--79\% WASR, while input-literal defenses (CP1) are strongest at 13\% WASR. Claude achieves the strongest safety (42.8\% WASR), followed by GPT-5 (55.9\%) and Gemini (59.5\%). These findings suggest that current defenses are strongest at input-literal checkpoints but remain vulnerable to intent-level manipulation and output-stage techniques. The Four-Checkpoint Framework provides a structured approach for identifying and addressing safety vulnerabilities in deployed systems.

Authors:Tuan-He Lee, Gilly Leshed
Title: Understanding Remote Mental Health Supporters' Help-Seeking in Online Communities
Abstract:
Providing mental health support for loved ones across a geographic distance creates unique challenges for the remote caregivers, who sometimes turn to online communities for peer support. We qualitatively analyzed 522 Reddit threads to understand what drives remote caregivers' online help-seeking behaviors and the responses they receive from the community. Their purposes of posting included requesting guidance, expressing emotions, and seeking validation. Community responses included providing emotional support, suggesting informational strategies, and sharing personal experiences. While certain themes in posts (emotional toll, monitoring symptoms, and prioritizing caregiver well-being) are shared across remote and non-remote contexts, remote caregivers' posts surfaced nuanced experiences. For example, they often rely on digital cues, such as voice, to interpret care receivers' well-being while struggling with digital silence during crises. We discuss the need for supporting communication and information sharing between remote caregivers and receivers, care coordination for crisis management, and design recommendations for caregiver communities.

Authors:Minja Axelsson, Henry Shevlin
Title: Disambiguating Anthropomorphism and Anthropomimesis in Human-Robot Interaction
Abstract:
In this preliminary work, we offer an initial disambiguation of the theoretical concepts anthropomorphism and anthropomimesis in Human-Robot Interaction (HRI) and social robotics. We define anthropomorphism as users perceiving human-like qualities in robots, and anthropomimesis as robot developers designing human-like features into robots. This contribution aims to provide a clarification and exploration of these concepts for future HRI scholarship, particularly regarding the party responsible for human-like qualities - robot perceiver for anthropomorphism, and robot designer for anthropomimesis. We provide this contribution so that researchers can build on these disambiguated theoretical concepts for future robot design and evaluation.

Authors:Alif Rizqullah Mahdi, Mahdi Rezaei, Natasha Merat
Title: Gesture Matters: Pedestrian Gesture Recognition for AVs Through Skeleton Pose Evaluation
Abstract:
Gestures are a key component of non-verbal communication in traffic, often helping pedestrian-to-driver interactions when formal traffic rules may be insufficient. This problem becomes more apparent when autonomous vehicles (AVs) struggle to interpret such gestures. In this study, we present a gesture classification framework using 2D pose estimation applied to real-world video sequences from the WIVW dataset. We categorise gestures into four primary classes (Stop, Go, Thank & Greet, and No Gesture) and extract 76 static and dynamic features from normalised keypoints. Our analysis demonstrates that hand position and movement velocity are especially discriminative in distinguishing between gesture classes, achieving a classification accuracy score of 87%. These findings not only improve the perceptual capabilities of AV systems but also contribute to the broader understanding of pedestrian behaviour in traffic contexts.

Authors:Andreas Tjeldflaat, Piero Romare, Yuki Onishi, Morten Fjeld, Bjørn Sætrevik
Title: A Two-Week In-the-Wild Study of Screen Filters and Camera Sliders for Smartphone Privacy in Public Spaces
Abstract:
Smartphone usage in public spaces can raise privacy concerns, in terms of shoulder surfing and unintended camera capture. In real-world public space settings, we investigated the impact of tangible privacy-enhancing tools (here: screen filter and camera slider) on smartphone users' reported privacy perception, behavioral adaptations, usability and social dynamics. We conducted a mixed-method, in-the-wild study ($N = 22$) using off-the-shelf smartphone privacy tools. We investigated subjective behavioral transition by combining questionnaires with semi-structured interviews. Participants used the screen filter and the camera slider for two weeks; they reported changes in attitude and behavior after using a screen filter including screen visibility and comfort when using phones publicly. They explained decreased privacy-protective behaviors, such as actively covering their screens, suggesting a shift in perceived risk. Qualitative findings about the camera slider suggested underlying psychological mechanisms, including privacy awareness and concerns about social perception, while also offering insights regarding the tools' effectiveness.

Authors:Tetiana Krushynska, Jani Ursin, Ville Heilala
Title: AI-Assisted Model for Generating Multiple-Choice Questions
Abstract:
Multiple-choice questions (MCQs) are widely used across diverse educational fields and levels. Well-designed MCQs should evaluate knowledge application in real-world situations. However, writing such test items in sufficient numbers is challenging and time-consuming, especially in natural science education. The problem of a sufficient number of MCQs has two aspects: content coverage and exam security. Therefore, generating test items involves two tasks: creating MCQ prototypes and transforming these prototypes into item series. In automated item generation, prototype creation aligns with template-based methods like cognitive modelling, while item expansion corresponds to example-based techniques. The aim of this research was designing the goal-oriented conceptual model of human - AI co-creation of MCQs that should meet strictly formulated quality criteria. The resulting three-step model for creating MCQ prototypes distributed prompts between several AIs, with human revision of responses for each prompt before setting the next one. To transform the MCQ prototype into an MCQ series, a one-step model was developed in which multiple new items are generated simultaneously. These items assess the same learning outcome but are not simple rephrasings of the prototype or of one another. Based on human and automated evaluation, approximately half of the output MCQs were acceptable without editing. Minor corrections of initially rejected test items allowed for a moderate increase in acceptance of MCQs in series and a significant improvement of MCQ-prototypes.

Authors:Daniel Mwesigwa, Cyan DeVeaux, Palashi Vaghela
Title: To Tango or to Disentangle? Making Ethnography Public in the Digital Age
Abstract:
Ethnography attends to relations among people, practices, and the technologies that mediate them. Central to this method is the duality of roles ethnographers navigate as researchers and participants and as outsiders and insiders. However, the rise of digital platforms has introduced new opportunities as well as practical and ethical challenges that reshape these dualities across hybrid media environments spanning both online and offline contexts. Drawing on two case studies of VRChat and WhatsApp, we examine how ethnographers employ diverse tactics to study both enduring and emerging socio-cultural issues of race and caste, particularly those that form what are often called publics. We propose emergent relationality as a key analytic for understanding the mutual shaping of ethnographers, platforms, and publics. In this work, emergent relationality offers registers for analyzing how positionality and hybrid media environments constitute and condition what can be accessed, articulated, and made public.

Authors:Rama Adithya, Varanasi, Nov, Oded, Wiesenfeld, Batia Mishan
Title: Investigating Writing Professionals' Relationships with Generative AI: How Combined Perceptions of Rivalry and Collaboration Shape Work Practices and Outcomes
Abstract:
This study investigates how professional writers' complex relationship with GenAI shapes their work practices and outcomes. Through a cross-sectional survey with writing professionals (n=403) in diverse roles, we show that collaboration and rivalry orientation are associated with differences in work practices and outcomes. Rivalry is primarily associated with relational crafting and skill maintenance. Collaboration is primarily associated with task crafting, productivity, and satisfaction, at the cost of long-term skill deterioration. Combination of the orientations (high rivalry and high collaboration) reconciles these differences, while boosting the association with the outcomes. Our findings argue for a balanced approach where high levels of rivalry and collaboration are essential to shape work practices and generate outcomes aimed at the long-term success of the job. We present key design implications on how to increase friction (rivalry) and reduce over-reliance (collaboration) to achieve a more balanced relationship with GenAI.

Authors:Dohui Lee, Qi Sun, Sang Ho Yoon
Title: HOICraft: In-Situ VLM-based Authoring Tool for Part-Level Hand-Object Interaction Design in VR
Abstract:
Hand-Object Interaction (HOI) is a key interaction component in Virtual Reality (VR). However, designing HOI still requires manual efforts to decide how object should be selected and manipulated, while also considering user abilities, which leads to time-consuming refinements. We present HOICraft, a VLM-based in-situ HOI authoring tool that enables part-level interaction design in VR. Here, HOICraft assists designers by recommending interactable elements from 3D objects, customizing HOI design properties, and mapping hand movement with virtual object behavior. We conducted a formative study with three expert VR designers to identify five representative HOI designs to support diverse user experiences. Building upon preference data from 20 participants, we develop an HOI mapping module with in-context learning. In a user study with 12 VR interaction designers, HOI mapping from HOICraft significantly reduced trial-and-error iterations compared to manual authoring. Finally, we assessed the usability of HOICraft, demonstrating its effectiveness for HOI design in VR.

Authors:Kun-Woo Song, Youngrae Kim, Sang Ho Yoon
Title: Finger Tendon Vibration: Finger Movement Illusions for Immersive Virtual Object Interaction
Abstract:
The absence of physical information during hand-object interaction in a virtual environment diminishes realism and immersion. Kinesthetic haptic feedback has proven effective in delivering realistic object-derived haptic cues, enhancing the overall virtual reality (VR) experience. Here, we propose kinesthetic illusion through a novel application of finger tendon vibration (FTV), which creates an illusory sensation of finger movement. To effectively apply FTV for virtual object interactions, we first examine the effects of short-duration FTV (<5 s) through 3 perception studies. Based on study results, we design 6 exemplary VR scenarios, representing the overall design space of VR object interactions, and 4 different haptic rendering strategies for FTV. We evaluated these rendering methods on each VR scenario and derived a design guideline for FTV application. We then compared FTV with no vibration and simple vibration, observing that FTV enhances VR experience by providing realistic resistance on the finger, greatly improving body ownership.

Authors:Preeti Vyas, Bereket Guta, Tim G. Zhou, Noor Naila Himam, Andero Uusberg, Karon E. MacLean
Title: Haptically Experienced Animacy Facilitates Emotion Regulation: A Theory-Driven Investigation
Abstract:
Emotion regulation (ER) is essential to mental well-being but often difficult to access, especially in high-intensity moments or for individuals with clinical vulnerabilities. While existing technology-based ER tools offer value, they typically rely on self-reflection (e.g., emotion tracking, journaling) or co-regulation through verbal modalities (reminders, text-based conversational tools), which may not be accessible or effective when most needed. The biological role of the touch modality makes it an intriguing alternate pathway, but empirical evidence is limited and under-theorized. Building on our prior theoretical framework describing how a comforting haptic co-regulating adjunct (CHORA) can support ER, we developed a zoomorphic robot CHORA with looped biomimetic breathing and heartbeat behaviors. We evaluated its effects in a mixed-methods in-lab study (N=30), providing physiological, self-report, custom questionnaire, and retrospective interview data. Our findings demonstrate the regulatory effects of haptically experienced animacy, corroborate prior work, and validate CHORA's {theoretically grounded} potential to facilitate four ER strategies.

Authors:Mohamed El Hajji, Tarek Ait Baha, Aicha Dakir, Hammou Fadili, Youssef Es-Saady
Title: Open TutorAI: An Open-source Platform for Personalized and Immersive Learning with Generative AI
Abstract:
Recent advances in artificial intelligence have created new possibilities for making education more scalable, adaptive, and learner-centered. However, existing educational chatbot systems often lack contextual adaptability, real-time responsiveness, and pedagogical agility. which can limit learner engagement and diminish instructional effectiveness. Thus, there is a growing need for open, integrative platforms that combine AI and immersive technologies to support personalized, meaningful learning experiences. This paper presents Open TutorAI, an open-source educational platform based on LLMs and generative technologies that provides dynamic, personalized tutoring. The system integrates natural language processing with customizable 3D avatars to enable multimodal learner interaction. Through a structured onboarding process, it captures each learner's goals and preferences in order to configure a learner-specific AI assistant. This assistant is accessible via both text-based and avatar-driven interfaces. The platform includes tools for organizing content, providing embedded feedback, and offering dedicated interfaces for learners, educators, and parents. This work focuses on learner-facing components, delivering a tool for adaptive support that responds to individual learner profiles without requiring technical expertise. Its assistant-generation pipeline and avatar integration enhance engagement and emotional presence, creating a more humanized, immersive learning environment. Embedded learning analytics support self-regulated learning by tracking engagement patterns and generating actionable feedback. The result is Open TutorAI, which unites modular architecture, generative AI, and learner analytics within an open-source framework. It contributes to the development of next-generation intelligent tutoring systems.

Authors:Xiaohui Zou, Lijun Ke, Shunpeng Zou
Title: A New Mode of Teaching Chinese as a Foreign Language from the Perspective of Smart System Studied by Using Rongzhixue
Abstract:
The purpose of this study is to introduce a new model of teaching Chinese as a foreign language from the perspective of integrating wisdom. Its characteristics are as follows: focusing on the butterfly model of interpretation before translation, highlighting the new method of bilingual thinking training, on the one hand, applying the new theory of Chinese characters, the theory of the relationship between language and speech, and the forward-looking research results of language science; On the other hand, the application of the new model of teaching Chinese as a foreign language, AI empowering teaching and learning, and the forward-looking research results of educational science fully reflect a series of characteristics of the new model of teaching Chinese as a foreign language from the perspective of integrating wisdom. Its beneficial effects are: not only the old view of language and education, especially the old view of teaching Chinese as a foreign language, but also the old view of human-computer interaction. Its significance lies in that a series of great cross-border Rongzhixue such as language, knowledge, education and teaching, as well as new methods and new topics of bilingual thinking training are clearly put forward from the perspective of integrating wisdom. Especially in the face of the challenge of Chat GPT to human learning ability and even creativity, the existing concepts of language knowledge education and teaching are already very backward. The old concepts of Chinese language education, and teaching Chinese as a foreign language are all facing a series of subversive innovation challenges. How to seek changes in adaptation? This study has made a series of innovative attempts, hoping to benefit academic colleagues, teachers and students.

Authors:Yang Li, Anna Maria Feit
Title: Simulating Word Suggestion Usage in Mobile Typing to Guide Intelligent Text Entry Design
Abstract:
Intelligent text entry (ITE) methods, such as word suggestions, are widely used in mobile typing, yet improving ITE systems is challenging because the cognitive mechanisms behind suggestion use remain poorly understood, and evaluating new systems often requires long-term user studies to account for behavioral adaptation. We present WSTypist, a reinforcement learning-based model that simulates how typists integrate word suggestions into typing. It builds on recent hierarchical control models of typing, but focuses on the cognitive mechanisms that underlie the high-level decision-making for effectively integrating word suggestions into manual typing: assessing efficiency gains, considering orthographic uncertainties, and including personal reliance on AI support. Our evaluations show that WSTypist simulates diverse human-like suggestion-use strategies, reproduces individual differences, and generalizes across different systems. Importantly, we demonstrate on four design cases how computational rationality models can be used to inform what-if analyses during the design process, by simulating how users might adapt to changes in the UI or in the algorithmic support, reducing the need for long-term user studies.

Authors:Harsh Chhajed, Tian Guo
Title: A High-Fidelity Robotic Manipulator Teleoperation Framework for Human-Centered Augmented Reality Evaluation
Abstract:
Validating Augmented Reality (AR) tracking and interaction models requires precise, repeatable ground-truth motion. However, human users cannot reliably perform consistent motion due to biomechanical variability. Robotic manipulators are promising to act as human motion proxies if they can mimic human movements. In this work, we design and implement ARBot, a real-time teleoperation platform that can effectively capture natural human motion and accurately replay the movements via robotic manipulators. ARBot includes two capture models: stable wrist motion capture via a custom CV and IMU pipeline, and natural 6-DOF control via a mobile application. We design a proactively-safe QP controller to ensure smooth, jitter-free execution of the robotic manipulator, enabling it to function as a high-fidelity record and replay physical proxy. We open-source ARBot and release a benchmark dataset of 132 human and synthetic trajectories captured using ARBot to support controllable and scalable AR evaluation.

Authors:Evgeny Kagan, Kyle Hyndman, Andrew Davis
Title: Chasing Tails: How Do People Respond to Wait Time Distributions?
Abstract:
We use a series of pre-registered, incentive-compatible online experiments to investigate how people evaluate and choose among different waiting time distributions. Our main findings are threefold. First, consistent with prior literature, people show an aversion to both longer expected waits and higher variance. Second, and more surprisingly, moment-based utility models fail to capture preferences when distributions have thick-right tails: indeed, decision-makers strongly prefer distributions with long-right tails (where probability mass is more evenly distributed over a larger support set) relative to tails that exhibit a spike near the maximum possible value, even when controlling for mean, variance, and higher moments. Conditional Value at Risk (CVaR) utility models commonly used in portfolio theory predict these choices well. Third, when given a choice, decision-makers overwhelmingly seek information about right-tail outcomes. These results have practical implications for service operations: (1) service designs that create a spike in long waiting times (such as priority or dedicated queue designs) may be particularly aversive; (2) when informativeness is the goal, providers should prioritize sharing right-tail probabilities or percentiles; and (3) to increase service uptake, providers can strategically disclose (or withhold) distributional information depending on right-tail shape.

Authors:Bo Shui, Xinran Zhu
Title: Knowledge Synthesis Graph: An LLM-Based Approach for Modeling Student Collaborative Discourse
Abstract:
Asynchronous, text-based discourse-such as students' posts in discussion forums-is widely used to support collaborative learning. However, the distributed and evolving nature of such discourse often makes it difficult to see how ideas connect, develop, and build on one another over time. As a result, learners may struggle to recognize relationships among ideas-a process that is critical for idea advancement in productive collaborative discourse. To address this challenge, we explore how large language models (LLMs) can provide representational guidance by modeling student discourse as a Knowledge Synthesis Graph (KSG). The KSG identifies ideas from student discourse and visualizes their epistemic relationships, externalizing the current state of collaborative knowledge in a form that can support further inquiry and idea advancement. In this study, we present the design of the KSG and evaluate the LLM-based approach for constructing KSGs from authentic student discourse data. Through multi-round human-expert coding and prompt iteration, our results demonstrate the feasibility of using our approach to construct reliable KSGs across different models. This work provides a technical foundation for modeling collaborative discourse with LLMs and offers pedagogical implications for augmenting complex knowledge work in collaborative learning environments.

Authors:Sujay Shalawadi, Katrina Hvítklett, Anna Stentoft Ries, Aisho Mohamed Ali, Florian Echtler
Title: DataCrumb: A Physical Probe for Reflections on Background Web Tracking
Abstract:
Cookie banners and privacy settings attempt to give users a sense of control over how their personal data is collected and used, but background tracking of personal information often continues unnoticed. To explore how such invisible data collection might be made more perceptible, we present DataCrumb, a physical probe that reacts in real-time to data tracking with visual and auditory feedback. Using a research-through-design approach, we deployed the artifact in three households and studied participants' responses. Instead of providing details about what data was being tracked, the artifact introduced subtle disruptions that made background data flows harder to ignore. Participants described new forms of awareness, contradiction, and fatigue. Our findings show how sensory feedback can support reflection by drawing attention to tracking data flows that are usually hidden. We argue for designing systems that foster awareness and interpretation, especially when the users' control and understanding are limited.

Authors:Jinghui Hu, Ludwig Sidenmark, Hock Siang Lee, Hans Gellersen
Title: The Eye-Head Mover Spectrum: Modelling Individual and Population Head Movement Tendencies in Virtual Reality
Abstract:
People differ in how much they move their head versus their eyes when shifting gaze, yet such tendencies remain largely unexplored in HCI. We introduce head movement tendencies as a fundamental dimension of individual difference in VR and provide a quantitative account of their population-level distribution. Using a 360° video free-viewing dataset (N=87), we model head contributions to gaze shifts with a hinge-based parametric function, revealing a spectrum of strategies from eye-movers to head-movers. We then conduct a user study (N=28) combining 360° video viewing with a short controlled task using gaze targets. While parameter values differ across tasks, individuals show partial alignment in their relative positions within the population, indicating that tendencies are meaningful but shaped by context. Our findings establish head movement tendencies as an important concept for VR and highlight implications for adaptive systems such as foveated rendering, viewport alignment, and multi-user experience design.

Authors:Sarvesh Shashidhar, Abhishek Mishra, Madhav Kotecha
Title: Exploring Re-inforcement Learning via Human Feedback under User Heterogeneity
Abstract:
Re-inforcement learning from human feedback (RLHF) has been effective in the task of AI alignment. However, one of the key assumptions of RLHF is that the annotators (referred to as workers from here on out) have a homogeneous response space. This assumption is not true in most practical settings and there have been studies done in the past to challenge this notion. This work has been inspired by such studies and explores one of the ways to deal with heterogeneity in worker preferences - by clustering workers with similar preferences and personalising reward models for each cluster. This work provides an algorithm that encourages simultaneous learning of reward models and worker embeddings. This algorithm is then empirically tested against the Reddit TL;DR dataset with unique worker IDs. We have shown that clustering users into different groups based on their preferences and created personalised reward models improves win-rate of the said models. Along with results and visualisations, this work aims to act as a stepping stone to more complicated models and gives a list of possible future extensions.

Authors:Tongzhou Yu, Han Lin
Title: Remember Me, Not Save Me: A Collective Memory System for Evolving Virtual Identities in Augmented Reality
Abstract:
This paper presents "Remember Me, Not Save Me," an AR & AI system enabling virtual citizens to develop personality through collective dialogue. Core innovations include: Dynamic Collective Memory (DCM) model with narrative tension mechanisms for handling contradictory memories; State-Reflective Avatar for ambient explainability; and Geo-Cultural Context Anchoring for local identity. Deployed at the 2024 Jinan Biennale, the system demonstrated stable personality emergence (ISTP type via Apply Magic Sauce analysis) from over 2,500 public interactions. We provide a framework for designing evolving digital entities that transform collective memory into coherent identity.

Authors:Zihan Zhou, Yinan Liu, Yuyang Xie, Bin Wang, Xiaochun Yang, Zezheng Feng
Title: DiagLink: A Dual-User Diagnostic Assistance System by Synergizing Experts with LLMs and Knowledge Graphs
Abstract:
The global shortage and uneven distribution of medical expertise continue to hinder equitable access to accurate diagnostic care. While existing intelligent diagnostic system have shown promise, most struggle with dual-user interaction, and dynamic knowledge integration -- limiting their real-world applicability. In this study, we present DiagLink, a dual-user diagnostic assistance system that synergizes large language models (LLMs), knowledge graphs (KGs), and medical experts to support both patients and physicians. DiagLink uses guided dialogues to elicit patient histories, leverages LLMs and KGs for collaborative reasoning, and incorporates physician oversight for continuous knowledge validation and evolution. The system provides a role-adaptive interface, dynamically visualized history, and unified multi-source evidence to improve both trust and usability. We evaluate DiagLink through user study, use cases and expert interviews, demonstrating its effectiveness in improving user satisfaction and diagnostic efficiency, while offering insights for the design of future AI-assisted diagnostic systems.

Authors:Andre Paulino de Lima, Paula Castro, Suzana Carvalho Vaz de Andrade, Rosa Maria Marcucci, Ruth Caldeira de Melo, Marcelo Garcia Manzato
Title: An Interpretable Recommendation Model for Psychometric Data, With an Application to Gerontological Primary Care
Abstract:
There are challenges that must be overcome to make recommender systems useful in healthcare settings. The reasons are varied: the lack of publicly available clinical data, the difficulty that users may have in understanding the reasons why a recommendation was made, the risks that may be involved in following that recommendation, and the uncertainty about its effectiveness. In this work, we address these challenges with a recommendation model that leverages the structure of psychometric data to provide visual explanations that are faithful to the model and interpretable by care professionals. We focus on a narrow healthcare niche, gerontological primary care, to show that the proposed recommendation model can assist the attending professional in the creation of personalised care plans. We report results of a comparative offline performance evaluation of the proposed model on healthcare datasets that were collected by research partners in Brazil, as well as the results of a user study that evaluates the interpretability of the visual explanations the model generates. The results suggest that the proposed model can advance the application of recommender systems in this healthcare niche, which is expected to grow in demand , opportunities, and information technology needs as demographic changes become more pronounced.

Authors:Zheng Yan, Ru-Yuan Zhang
Title: The Psychological Science of Artificial Intelligence: A Rapidly Emerging Field of Psychology
Abstract:
The psychological science of artificial intelligence (AI) can be broadly defined as an emerging field of psychology that examines all AI-related mental and behavioral processes from the perspective of psychology. This field has been growing exponentially in the recent decade. This review synthesizes the existing literature on the psychological science of AI with a goal to provide a comprehensive conceptual framework for planning, conducting, and assessing scientific research in the field. It consists of six parts, starting with an overview of the entire field of the psychological science of artificial intelligence, then synthesizing the literature in each of the four specific areas (i.e., Psychology of designing AI, psychology of using AI, AI for examining psychological processes, and AI for advancing psychological methods), and concluding with an outlook on the field in the future.

Authors:Hamza Peracha, Carrina Iacobacci, Tyler Singer-Clark, Leigh R. Hochberg, Sergey D. Stavisky, David M. Brandman, Nicholas S. Card
Title: A Personalized and Adaptable User Interface for a Speech and Cursor Brain-Computer Interface
Abstract:
Communication and computer interaction are important for autonomy in modern life. Unfortunately, these capabilities can be limited or inaccessible for the millions of people living with paralysis. While implantable brain-computer interfaces (BCIs) show promise for restoring these capabilities, little has been explored on designing BCI user interfaces (UIs) for sustained daily use. Here, we present a personalized UI for an intracortical BCI system that enables users with severe paralysis to communicate and interact with their computers independently. Through a 22-month longitudinal deployment with one participant, we used iterative co-design to develop a system for everyday at-home use and documented how it evolved to meet changing needs. Our findings highlight how personalization and adaptability enabled independence in daily life and provide design implications for developing future BCI assistive technologies.

Authors:Kaicheng Wang, Kevin Zhongyang Shao, Ruiqi Chen, Sep Makhsous, Denise Wilson
Title: Before Smelling the Video: A Two-Stage Pipeline for Interpretable Video-to-Scent Plans
Abstract:
Olfactory cues can enhance immersion in interactive media, yet smell remains rare because it is difficult to author and synchronize with dynamic video. Prior olfactory interfaces rely on designer triggers and fixed event-to-odor mappings that do not scale to unconstrained content. This work examines whether semantic planning for smell is intelligible to people before physical scent delivery. We present a video-to-scent planning pipeline that separates visual semantic extraction using a vision-language model from semantic-to-olfactory inference using a large language model. Two survey studies compare system-generated scent plans with over-inclusive and naive baselines. Results show consistent preference for plans that prioritize perceptually salient cues and align scent changes with visible actions, supporting semantic planning as a foundation for future olfactory media systems.

Authors:Ruipeng Wang, Tawab Safi, Yunge Wen, Christina Cunningham, Hoi Ling Tang, Behnaz Farahi
Title: Whispering Water: Materializing Human-AI Dialogue as Interactive Ripples
Abstract:
Across cultures, water has served as a recipient of human confession, a yielding medium that receives vulnerability where rigid surfaces cannot. We present Whispering Water, an interactive installation that materializes human-AI dialogue through cymatic patterns on water. Participants confess secrets to a water surface, triggering a four-phase ritual: confession, contemplation, response, and release. The user's speech sentiment is directly transmitted into the water to prime its state, while semantic content enters a multi-agent system, initiating ripples of conversation where agent identities are situated through discourse and voice profiles are chosen based on what they say. We propose a novel algorithm that decomposes speech into component waves and reconstructs them in water, establishing a translation between speech and the physics of material form. By rendering machine reasoning as emergent physical phenomena, the installation explores possibilities for emotional self-exploration through ambiguous, sensory-rich interfaces.

Authors:Tuhin Chakrabarty, Paramveer S. Dhillon
Title: Can Good Writing Be Generative? Expert-Level AI Writing Emerges through Fine-Tuning on High-Quality Books
Abstract:
Creative writing has long been considered a uniquely human endeavor, requiring voice and style that machines could not replicate. This assumption is challenged by Generative AI that can emulate thousands of author styles in seconds with negligible marginal labor. To understand this better, we conducted a behavioral experiment where 28 MFA writers (experts) competed against three LLMs in emulating 50 critically acclaimed authors. Based on blind pairwise comparisons by 28 expert judges and 131 lay judges, we find that experts preferred human writing in 82.7% of cases under the in-context prompting condition but this reversed to 62% preference for AI after fine-tuning on authors' complete works. Lay judges, however, consistently preferred AI writing. Debrief interviews with expert writers revealed that their preference for AI writing triggered an identity crisis, eroding aesthetic confidence and questioning what constitutes "good writing." These findings challenge discourse about AI's creative limitations and raise fundamental questions about the future of creative labor.

Authors:Brian Gin, Ahreum Lim, Flávia Silva e Oliveira, Kuan Xing, Xiaomei Song, Gayana Amiyangoda, Thilanka Seneviratne, Alison F. Doubleday, Ananya Gangopadhyaya, Bob Kiser, Lukas Shum-Tim, Dhruva Patel, Kosala Marambe, Lauren Maggio, Ara Tekian, Yoon Soo Park
Title: "Crash Test Dummies" for AI-Enabled Clinical Assessment: Validating Virtual Patient Scenarios with Virtual Learners
Abstract:
Background: In medical and health professions education (HPE), AI is increasingly used to assess clinical competencies, including via virtual standardized patients. However, most evaluations rely on AI-human interrater reliability and lack a measurement framework for how cases, learners, and raters jointly shape scores. This leaves robustness uncertain and can expose learners to misguidance from unvalidated systems. We address this by using AI "simulated learners" to stress-test and psychometrically characterize assessment pipelines before human use. Objective: Develop an open-source AI virtual patient platform and measurement model for robust competency evaluation across cases and rating conditions. Methods: We built a platform with virtual patients, virtual learners with tunable ACGME-aligned competency profiles, and multiple independent AI raters scoring encounters with structured Key-Features items. Transcripts were analyzed with a Bayesian HRM-SDT model that treats ratings as decisions under uncertainty and separates learner ability, case performance, and rater behavior; parameters were estimated with MCMC. Results: The model recovered simulated learners' competencies, with significant correlations to the generating competencies across all ACGME domains despite a non-deterministic pipeline. It estimated case difficulty by competency and showed stable rater detection (sensitivity) and criteria (severity/leniency thresholds) across AI raters using identical models/prompts but different seeds. We also propose a staged "safety blueprint" for deploying AI tools with learners, tied to entrustment-based validation milestones. Conclusions: Combining a purpose-built virtual patient platform with a principled psychometric model enables robust, interpretable, generalizable competency estimates and supports validation of AI-assisted assessment prior to use with human learners.

Authors:Naman Gupta, Sophie Stephenson, Chung Chi Yeung, Wei Ting Wu, Jeneile Luebke, Kate Walsh, Rahul Chatterjee
Title: "Lighting The Way For Those Not Here": How Technology Researchers Can Help Fight the Missing and Murdered Indigenous Relatives (MMIR) Crisis
Abstract:
Indigenous peoples across Turtle Island (North America) face disproportionate rates of disappearance and murder, a "genocide" rooted in settler-colonial violence and systemic erasure. Technology plays a crucial role in the Missing and Murdered Indigenous Relatives (MMIR) crisis: perpetuating harm and impeding investigations, yet enabling advocacy and resistance. Communities utilize technologies such as AMBER alerts, news websites, social media groups, and campaigns (like #MMIW, #MMIWR, #NoMoreStolenSisters, and #NoMoreStolenDaughters) to mobilize searches, amplify awareness, and honor missing relatives. Yet, little research in HCI has critically examined technology's role in shaping the MMIR crisis by centering community voices. Through a large-scale study, we analyze 140 webpages to identify systemic, technological, and institutional barriers that hinder communities' efforts, while highlighting socio-technical actions that foster healing and safety. Finally, we amplify Indigenous voices by providing a dataset of stories that resist epistemic erasure, along with recommendations for HCI researchers to support Indigenous-led initiatives with cultural sensitivity, accountability, and self-determination.

Authors:Amin Mohamed, Hamza Abdelmoreed, Mohamed Ehab, Youssef Shawky, Mayada Hadhoud, Ahmad Al-Kabbany
Title: TOSHFA: A Mobile VR-Based System for Pose-Guided Exercise Rehabilitation for Low Back Pain
Abstract:
Low back pain (LBP) is a pervasive global health challenge, affecting approximately 80% of adults and frequently progressing into chronic or recurrent episodes. While exercise therapy is a primary clinical intervention, traditional at-home programs suffer from low adherence rates and the absence of professional supervision. This study introduces TOSHFA, an accessible mobile VR-based rehabilitation system that bridges this gap by combining computer vision with affordable hardware. The system utilizes a laptop webcam to perform real-time pose estimation via the MediaPipe framework, tracking 33 skeletal landmarks to provide immediate biofeedback. This data is streamed via low-latency UDP protocols to a smartphone mounted in a cardboard-style VR headset, where patients interact with a gamified 3D environment. A pilot study with 20 participants evaluated the system's performance and user engagement. Quantitative results yielded a mean System Usability Scale (SUS) score of 47.4, indicating marginal usability and a need for interface optimization. However, Game Experience Questionnaire (GEQ) data revealed high scores in positive affect and enjoyment, suggesting that the gamification elements--such as coin rewards and streak tracking--successfully maintained user motivation despite technical friction. These findings validate the feasibility of a smartphone-based tele-rehabilitation model and establish a technical foundation for future clinical trials involving multi-exercise protocols.

Authors:Hellina Hailu Nigatu, Farhana Shahid, Vishal Sharma, Abigail Oppong, Michaelanne Thomas, Syed Ishtiaque Ahmed
Title: UnWEIRDing Peer Review in Human Computer Interaction
Abstract:
Peer review determines which scholarship is legitimized; however, review biases often disadvantage scholarship that diverges from the norm. Human-Computer Interaction (HCI) lacks a systemic inquiry into how such biases affect underrepresented Global South (GS) scholarship. To address this critical gap, we conducted four focus groups with 16 HCI researchers studying the GS. Participants reported experiencing reviews that confined them to development research, dismissed their theoretical contributions, and questioned situated knowledge from GS communities. Both as authors and reviewers, participants reported experiencing the epistemic burden of over-explaining why knowledge from GS communities matters. Further, they noted being tokenized as ``cultural experts'' when assigned to review papers and pointed out that the hidden curriculum of writing HCI papers often gatekeeps GS scholarship. Using epistemic oppression as a lens, we discuss how review practices marginalize GS scholarship and outline actionable strategies for nurturing equitable epistemological evaluation of HCI scholarship.

Authors:Mingxian Yu, Siqi Luo, Xu Chen
Title: GraphPilot: GUI Task Automation with One-Step LLM Reasoning Powered by Knowledge Graph
Abstract:
Mobile graphical user interface (GUI) agents are designed to automate everyday tasks on smartphones. Recent advances in large language models (LLMs) have significantly enhanced the capabilities of mobile GUI agents. However, most LLM-powered mobile GUI agents operate in stepwise query-act loops, which incur high latency due to repeated LLM queries. We present GraphPilot, a mobile GUI agent that leverages knowledge graphs of the target apps to complete user tasks in almost one LLM query. GraphPilot operates in two complementary phases to enable efficient and reliable LLM-powered GUI task automation. In the offline phase, it explores target apps, records and analyzes interaction history, and constructs an app-specific knowledge graph that encodes functions of pages and elements as well as transition rules for each app. In the online phase, given an app and a user task, it leverages the knowledge graph of the given app to guide the reasoning process of LLM. When the reasoning process encounters uncertainty, GraphPilot dynamically requests the HTML representation of the current interface to refine subsequent reasoning. Finally, a validator checks the generated sequence of actions against the transition rules in the knowledge graph, performing iterative corrections to ensure it is valid. The structured, informative information in the knowledge graph allows the LLM to plan the complete sequence of actions required to complete the user task. On the DroidTask benchmark, GraphPilot improves task completion rate over Mind2Web and AutoDroid, while substantially reducing latency and the number of LLM queries.

Authors:Nadja Rupprechter, Tobias Dienlin, Tilo Hartmann
Title: AI-RP: The AI Relationship Process Framework
Abstract:
For a growing number of people, AI chatbots have become close personal companions. Despite rising scholarly attention, theoretical accounts of how such relationships develop remain fragmented. Existing frameworks address important aspects of the phenomenon, but they rarely treat human-chatbot communication as the central behavior that builds relationships. To address this gap, we propose the AI relationship process (AI-RP) framework. The AI-RP outlines relationship formation as a sequential process. (a) Chatbot characteristics shape users' (b) social perceptions. These perceptions guide (c) communication, and communication produces (d) relational outcomes such as attachment and companionship. The AI-RP introduces a six-features profile characterizing chatbots, a dual-route approach of social perception, a behavioral conceptualization of communication and discusses the foundation and types of artificial relationships. By foregrounding observable communicative behavior, the AI-RP provides a foundation for theory building and empirical research on the social and ethical implications of AI companionship.

Authors:Jasmine Lesner, Michael Beyeler
Title: SymbolSight: Minimizing Inter-Symbol Interference for Reading with Prosthetic Vision
Abstract:
Retinal prostheses restore limited visual perception, but low spatial resolution and temporal persistence make reading difficult. In sequential letter presentation, the afterimage of one symbol can interfere with perception of the next, leading to systematic recognition errors. Rather than relying on future hardware improvements, we investigate whether optimizing the visual symbols themselves can mitigate this temporal interference. We present SymbolSight, a computational framework that selects symbol-to-letter mappings to minimize confusion among frequently adjacent letters. Using simulated prosthetic vision (SPV) and a neural proxy observer, we estimate pairwise symbol confusability and optimize assignments using language-specific bigram statistics. Across simulations in Arabic, Bulgarian, and English, the resulting heterogeneous symbol sets reduced predicted confusion by a median factor of 22 relative to native alphabets. These results suggest that standard typography is poorly matched to serial, low-bandwidth prosthetic vision and demonstrate how computational modeling can efficiently narrow the design space of visual encodings to generate high-potential candidates for future psychophysical and clinical evaluation.

Authors:Jathushan Kaetheeswaran, Jenny Wei
Title: Exploring EEG-driven brain-heart coupling across sleep stages in individuals with sleep disorders
Abstract:
The interactions between the brain and heart during sleep are responsible for regulating autonomic function. While brain-heart coupling has been studied in healthy populations, the relationships between neural and cardiac activity across sleep stages in the presence of sleep disorders are not clear. This study examines the influence of brain-driven cardiac activity across sleep stages for individuals with sleep disorders. Overnight recordings of C3 and C4 electroencephalogram (EEG) channels and electrocardiogram (ECG) signals from 146 individuals were preprocessed and analyzed in the frequency domain through a linear mixed-effect model. Our results show that parasympathetic activity is sensitive to changes in delta and beta powers during later stages of non-rapid eye movement (NREM) sleep, as both band powers exhibited strong negative effects on high-frequency heart rate variability (HF-HRV) power. These findings show that neural activity can drive vagal tone across sleep stages, suggesting that treatments on key EEG bands during NREM and REM stages may help restore regular cardiac behaviour.

Authors:Daehwa Kim, Chris Harrison
Title: Acoustic Field Video for Multimodal Scene Understanding
Abstract:
We introduce and explore a new multimodal input representation for vision-language models: acoustic field video. Unlike conventional video (RGB with stereo/mono audio), our video stream provides a spatially grounded visualization of sound intensity across a scene, offering a new and powerful dimension of perceptual understanding. Our real-time pipeline uses low-cost beamforming microphone arrays that are already common in smart speakers and increasingly present in robotics and XR headsets, yet this sensing capability remains unutilized for scene understanding. To assess the value of spatial acoustic information, we constructed an evaluation set of 402 question-answer scenes, comparing a state-of-the-art VLM given conventional video with and without paired acoustic field video. Results show a clear and consistent improvement when incorporating spatial acoustic data; the VLM we test improves from 38.3% correct to 67.4%. Our findings highlight that many everyday scene understanding tasks remain underconstrained when relying solely on visual and audio input, and that acoustic field data provides a promising and practical direction for multimodal reasoning. A video demo is available at https://daehwakim.com/seeingsound

Authors:Simon Lämmer, Mark Colley, Patrick Ebel
Title: GTA: Generative Traffic Agents for Simulating Realistic Mobility Behavior
Abstract:
People's transportation choices reflect complex trade-offs shaped by personal preferences, social norms, and technology acceptance. Predicting such behavior at scale is a critical challenge with major implications for urban planning and sustainable transport. Traditional methods use handcrafted assumptions and costly data collection, making them impractical for early-stage evaluations of new technologies or policies. We introduce Generative Traffic Agents (GTA) for simulating large-scale, context-sensitive transportation choices using LLM-powered, persona-based agents. GTA generates artificial populations from census-based sociodemographic data. It simulates activity schedules and mode choices, enabling scalable, human-like simulations without handcrafted rules. We evaluate GTA in Berlin-scale experiments, comparing simulation results against empirical data. While agents replicate patterns, such as modal split by socioeconomic status, they show systematic biases in trip length and mode preference. GTA offers new opportunities for modeling how future innovations, from bike lanes to transit apps, shape mobility decisions.

Authors:Yuyang Qin, Haihan Duan
Title: "What I Sign Is Not What I See": Towards Explainable and Trustworthy Cryptocurrency Wallet Signatures
Abstract:
Cryptocurrency wallets have become the primary gateway to decentralized applications, yet users often face significant difficulty in discerning what a wallet signature actually does or entails. Prior work has mainly focused on mitigating protocol vulnerabilities, with limited attention to how users perceive and interpret what they are authorizing. To examine this usability-security gap, we conducted two formative studies investigating how users interpret authentic signing requests and what cues they rely on to assess risk. Findings reveal that users often misread critical parameters, underestimate high-risk signatures, and rely on superficial familiarity rather than understanding transaction intent. Building on these insights, we designed the Signature Semantic Decoder -- a prototype framework that reconstructs and visualizes the intent behind wallet signatures prior to confirmation. Through structured parsing and semantic labeling, it demonstrates how signing data can be transformed into plain-language explanations with contextual risk cues. In a between-subjects user study (N = 128), participants using the prototype achieved higher accuracy in identifying risky signatures, improved clarity and decision confidence, and lower cognitive workload compared with the baseline wallet interface. Our study reframes wallet signing as a problem of interpretability within secure interaction design and offers design implications for more transparent and trustworthy cryptocurrency wallet interfaces.

Authors:Riccardo Volpato, Simone Stumpf, Lisa DeBruine
Title: Generative Confidants: How do People Experience Trust in Emotional Support from Generative AI?
Abstract:
People are increasingly turning to generative AI (e.g., ChatGPT, Gemini, Copilot) for emotional support and companionship. While trust is likely to play a central role in enabling these informal and unsupervised interactions, we still lack an understanding of how people develop and experience it in this context. Seeking to fill this gap, we recruited 24 frequent users of generative AI for emotional support and conducted a qualitative study consisting of diary entries about interactions, transcripts of chats with AI, and in-depth interviews. Our results suggest important novel drivers of trust in this context: familiarity emerging from personalisation, nuanced mental models of generative AI, and awareness of people's control over conversations. Notably, generative AI's homogeneous use of personalised, positive, and persuasive language appears to promote some of these trust-building factors. However, this also seems to discourage other trust-related behaviours, such as remembering that generative AI is a machine trained to converse in human language. We present implications for future research that are likely to become critical as the use of generative AI for emotional support increasingly overlaps with therapeutic work.

Authors:Zoë Breed, Elvin Karana, Alessandro Bozzon, Katherine W. Song
Title: Entangled Life and Code: A Computational Design Taxonomy for Synergistic Bio-Digital Systems
Abstract:
Bio-digital systems that merge microbial life with technology promise new modes of computation, combining biological adaptability with digital precision. Yet realizing this potential symbiotically -- where biological and digital agents co-adapt and co-process -- remains elusive, largely due to the absence of a shared vocabulary bridging biology and computing. Consequently, microbes are often constrained to uni-directional roles, functioning as sensors or actuators rather than as active, computational partners in bio-digital systems. In response, we propose a taxonomy and pathways that articulate and expand the roles of biological and digital entities for synergetic bio-digital computation. Using this taxonomy, we analysed 70 systems across HCI, design, and engineering, identifying how biological mechanisms can be mapped onto computational abstractions. We argue that such mappings enable computationally actionable directions that foster richer and reciprocal relationships in bio-digital systems, supporting regenerative ecologies across time and scale while inspiring new paradigms for computation in HCI.

Authors:Sohyeon Park, Jesus Armando Beltran, Aehong Min, Anamara Ritt-Olson, Gillian R. Hayes
Title: Exploring Implicit Perspectives on Autism in Large Language Models Through Multi-Agent Simulations
Abstract:
Large Language Models (LLMs) like ChatGPT offer potential support for autistic people, but this potential requires understanding the implicit perspectives these models might carry, including their biases and assumptions about autism. Moving beyond single-agent prompting, we utilized LLM-based multi-agent systems to investigate complex social scenarios involving autistic and non-autistic agents. In our study, agents engaged in group-task conversations and answered structured interview questions, which we analyzed to examine ChatGPT's biases and how it conceptualizes autism. We found that ChatGPT assumes autistic people are socially dependent, which may affect how it interacts with autistic users or conveys information about autism. To address these challenges, we propose adopting the double empathy problem, which reframes communication breakdowns as a mutual challenge. We describe how future LLMs could address the biases we observed and improve interactions involving autistic people by incorporating the double empathy problem into their design.

Authors:Tamunotonye Harry, Ivoline Ngong, Chima Nweke, Yuanyuan Feng, Joseph Near
Title: Beyond Fixed Psychological Personas: State Beats Trait, but Language Models are State-Blind
Abstract:
User interactions with language models vary due to static properties of the user (trait) and the specific context of the interaction (state). However, existing persona datasets (like PersonaChat, PANDORA etc.) capture only trait, and ignore the impact of state. We introduce Chameleon, a dataset of 5,001 contextual psychological profiles from 1,667 Reddit users, each measured across multiple contexts. Using the Chameleon dataset, we present three key findings. First, inspired by Latent State-Trait theory, we decompose variance and find that 74\% is within-person(state) while only 26\% is between-person (trait). Second, we find that LLMs are state-blind: they focus on trait only, and produce similar responses regardless of state. Third, we find that reward models react to user state, but inconsistently: different models favor or penalize the same users in opposite directions. We release Chameleon to support research on affective computing, personalized dialogue, and RLHF alignment.

Authors:Marko Hostnik, Rauf Kurbanov, Yaroslav Sokolov, Artem Trofimov
Title: VegaChat: A Robust Framework for LLM-Based Chart Generation and Assessment
Abstract:
Natural-language-to-visualization (NL2VIS) systems based on large language models (LLMs) have substantially improved the accessibility of data visualization. However, their further adoption is hindered by two coupled challenges: (i) the absence of standardized evaluation metrics makes it difficult to assess progress in the field and compare different approaches; and (ii) natural language descriptions are inherently underspecified, so multiple visualizations may be valid for the same query. To address these issues, we introduce VegaChat, a framework for generating, validating, and assessing declarative visualizations from natural language. We propose two complementary metrics: Spec Score, a deterministic metric that measures specification-level similarity without invoking an LLM, and Vision Score, a library-agnostic, image-based metric that leverages a multimodal LLM to assess chart similarity and prompt compliance. We evaluate VegaChat on the NLV Corpus and on the annotated subset of ChartLLM. VegaChat achieves near-zero rates of invalid or empty visualizations, while Spec Score and Vision Score exhibit strong correlation with human judgments (Pearson 0.65 and 0.71, respectively), indicating that the proposed metrics support consistent, cross-library comparison. The code and evaluation artifacts are available at https://zenodo.org/records/17062309.

Authors:Paweł Niszczota, Elia Antoniou
Title: Do people expect different behavior from large language models acting on their behalf? Evidence from norm elicitations in two canonical economic games
Abstract:
While delegating tasks to large language models (LLMs) can save people time, there is growing evidence that offloading tasks to such models produces social costs. We use behavior in two canonical economic games to study whether people have different expectations when decisions are made by LLMs acting on their behalf instead of themselves. More specifically, we study the social appropriateness of a spectrum of possible behaviors: when LLMs divide resources on our behalf (Dictator Game and Ultimatum Game) and when they monitor the fairness of splits of resources (Ultimatum Game). We use the Krupka-Weber norm elicitation task to detect shifts in social appropriateness ratings. Results of two pre-registered and incentivized experimental studies using representative samples from the UK and US (N = 2,658) show three key findings. First, people find that offers from machines - when no acceptance is necessary - are judged to be less appropriate than when they come from humans, although there is no shift in the modal response. Second - when acceptance is necessary - it is more appropriate for a person to reject offers from machines than from humans. Third, receiving a rejection of an offer from a machine is no less socially appropriate than receiving the same rejection from a human. Overall, these results suggest that people apply different norms for machines deciding on how to split resources but are not opposed to machines enforcing the norms. The findings are consistent with offers made by machines now being viewed as having both a cognitive and emotional component.

Authors:Jason Pan, Ben Moews
Title: Public transport challenges and technology-assisted accessibility for visually impaired elderly residents in urban environments
Abstract:
Independent navigation is a core aspect of maintaining social participation and individual health for vulnerable populations. While historic cities such as Edinburgh, as the capital of Scotland, often feature well-established public transport systems, urban accessibility challenges remain and are exacerbated by a complex landscape, especially for groups with multiple vulnerabilities such as the blind elderly. With limited research examining how real-time data feeds and developments in artificial intelligence can enhance navigation aids, we address this gap through a mixed-methods approach. Our work combines statistical and machine learning techniques, with a focus on spatial analysis to investigate network coverage, service patterns, and density through live Transport for Edinburgh data, with a qualitative thematic analysis of semi-structured interviews with the mentioned target group. The results demonstrate the highly centralised nature of the city's transport system, the significance of memory-based navigation, and the lack of travel information in usable formats. We also find that participants already use navigation technology to varying degrees and express a willingness to adopt artificial intelligence. Our analysis highlights the importance of dynamic tools in terms of sensory and cognitive needs to meaningfully improve independent travel.

Authors:Mayada Oudah, John Wooders
Title: Real-time Facial Communication Restores Cooperation After Defection in Social Dilemmas
Abstract:
Facial expressions are central to human interaction, yet their role in strategic decision-making has received limited attention. We investigate how real-time facial communication influences cooperation in repeated social dilemmas. In a laboratory experiment, participants play a repeated Prisoner's Dilemma game under two conditions: in one, they observe their counterpart's facial expressions via gender-neutral avatars, and in the other no facial cues are available. Using state-of-the-art biometric technology to capture and display emotions in real-time, we find that facial communication significantly increases overall cooperation and, notably, promotes cooperation following defection. This restorative effect suggests that facial expressions help participants interpret defections less harshly, fostering forgiveness and the resumption of cooperation. While past actions remain the strongest predictor of behavior, our findings highlight the communicative power of facial expressions in shaping strategic outcomes. These results offer practical insights for designing emotionally responsive virtual agents and digital platforms that sustain cooperation in the absence of physical presence.

Authors:Simran Kaur, Sara Salimzadeh, Ujwal Gadiraju
Title: Incentive-Tuning: Understanding and Designing Incentives for Empirical Human-AI Decision-Making Studies
Abstract:
AI has revolutionised decision-making across various fields. Yet human judgement remains paramount for high-stakes decision-making. This has fueled explorations of collaborative decision-making between humans and AI systems, aiming to leverage the strengths of both. To explore this dynamic, researchers conduct empirical studies, investigating how humans use AI assistance for decision-making and how this collaboration impacts results. A critical aspect of conducting these studies is the role of participants, often recruited through crowdsourcing platforms. The validity of these studies hinges on the behaviours of the participants, hence effective incentives that can potentially affect these behaviours are a key part of designing and executing these studies. In this work, we aim to address the critical role of incentive design for conducting empirical human-AI decision-making studies, focusing on understanding, designing, and documenting incentive schemes. Through a thematic review of existing research, we explored the current practices, challenges, and opportunities associated with incentive design for human-AI decision-making empirical studies. We identified recurring patterns, or themes, such as what comprises the components of an incentive scheme, how incentive schemes are manipulated by researchers, and the impact they can have on research outcomes. Leveraging the acquired understanding, we curated a set of guidelines to aid researchers in designing effective incentive schemes for their studies, called the Incentive-Tuning Framework, outlining how researchers can undertake, reflect on, and document the incentive design process. By advocating for a standardised yet flexible approach to incentive design and contributing valuable insights along with practical tools, we hope to pave the way for more reliable and generalizable knowledge in the field of human-AI decision-making.

Authors:Chris Monk, Allegra Ayala, Christine S. P. Yu, Gregory M. Fitch, Dara Gruber
Title: Visual and Cognitive Demands of a Large Language Model-Powered In-vehicle Conversational Agent
Abstract:
Driver distraction remains a leading contributor to motor vehicle crashes, necessitating rigorous evaluation of new in-vehicle technologies. This study assessed the visual and cognitive demands associated with an advanced Large Language Model (LLM) conversational agent (Gemini Live) during on-road driving, comparing it against handsfree phone calls, visual turn-by-turn guidance (low load baseline), and the Operation Span (OSPAN) task (high load anchor). Thirty-two licensed drivers completed five secondary tasks while visual and cognitive demands were measured using the Detection Response Task (DRT) for cognitive load, eye-tracking for visual attention, and subjective workload ratings. Results indicated that Gemini Live interactions (both single-turn and multi-turn) and hands-free phone calls shared similar levels of cognitive load, between that of visual turn-by-turn guidance and OSPAN. Exploratory analysis showed that cognitive load remained stable across extended multi-turn conversations. All tasks maintained mean glance durations well below the well-established 2-second safety threshold, confirming low visual demand. Furthermore, drivers consistently dedicated longer glances to the roadway between brief off-road glances toward the device during task completion, particularly during voice-based interactions, rendering longer total-eyes-off-road time findings less consequential. Subjective ratings mirrored objective data, with participants reporting low effort, demands, and perceived distraction for Gemini Live. These findings demonstrate that advanced LLM conversational agents, when implemented via voice interfaces, impose cognitive and visual demands comparable to established, low-risk hands-free benchmarks, supporting their safe deployment in the driving environment.

Authors:Christina Schneegass, Francesco Chiossi, Anna L. Cox, Dimitra Dritsa, Teodora Mitrevska, Stephen Rainey, Max L. Wilson
Title: The CHI26 Workshop on the Future of Cognitive Personal Informatics
Abstract:
Research on Cognitive Personal Informatics (CPI) is steadily growing as new wearable cognitive tracking technologies emerge on the consumer market, claiming to measure stress, focus, and other cognitive factors. At the same time, with generative AI offering new ways to analyse, visualize, and interpret cognitive data, we hypothesize that cognitive tracking will soon become as simple as measuring your heart rate during a run. Yet, cognitive data remains inherently more complex, context-dependent, and less well understood than physical activity data. This workshop brings together HCI experts to discuss critical questions, including: How can complex cognitive data be translated into meaningful metrics? How can AI support users' data sensemaking without over-simplifying cognitive insights? How can we design inclusive CPI technologies that consider inter-personal variance and neurodiversity? We will map

Authors:Aryan Ramchandra Kapadia, Niharika Bhattacharjee, Mung Yao Jia, Ishq Gupta, Dong Wang, Koustuv Saha
Title: Loss Aversion Online: Emotional Responses to Financial Booms and Crashes
Abstract:
Financial events negatively affect emotional well-being, but large-scale studies examining their impact on online emotional expression using real-time social media data remain limited. To address this gap, we propose analyzing Reddit communities (financial and non-financial) across two case studies: a financial crash and a boom. We investigate how emotional and psycholinguistic responses differ between financial and non-financial communities, and the extent to which the type of financial event affects user behavior during the two case study periods. To examine the effect of these events on expressed language, we analyze daily sentiment, emotion, and LIWC counts using quasi-experimental methods: Difference-in-Differences (DiD) and Causal Impact analyses during a financial boom and a financial crash. Overall, we find coherent, negative shifts in emotional responses during financial crashes, but weaker, mixed responses during booms, consistent with loss aversion. By exploring emotional and psycholinguistic expressions during financial events, we identify future implications for understanding online users' mental health and building connected, healthy communities.

Authors:Yufei Zhang, Zhihao Ma
Title: Psychometric Comparability of LLM-Based Digital Twins
Abstract:
Large language models (LLMs) are used as "digital twins" to replace human respondents, yet their psychometric comparability to humans is uncertain. We propose a construct-validity framework spanning construct representation and the nomological net, benchmarking digital twins against human gold standards across models, tasks and testing how person-specific inputs shape performance. Across studies, digital twins achieved high population-level accuracy and strong within-participant profile correlations, alongside attenuated item-level correlations. In word association tests, LLM-based networks show small-world structure and theory-consistent communities similar to humans, yet diverge lexically and in local structure. In decision-making and contextualized tasks, digital twins under-reproduce heuristic biases, showing normative rationality, compressed variance and limited sensitivity to temporal information. Feature-rich digital twins improve Big Five Personality prediction, but their personality networks show only configural invariance and do not achieve metric invariance. In more applied free-text tasks, feature-rich digital twins better match human narratives, but linguistic differences persist. Together, these results indicate that feature-rich conditioning enhances validity but does not resolve systematic divergences in psychometric comparability. Future work should therefore prioritize delineating the effective boundaries of digital twins, establishing the precise contexts in which they function as reliable proxies for human cognition and behavior.

Authors:Alex Echeverria, Sávio Salvarino Teles de Oliveira, Fernando Marques Federson
Title: Call2Instruct: Automated Pipeline for Generating Q&A Datasets from Call Center Recordings for LLM Fine-Tuning
Abstract:
The adaptation of Large-Scale Language Models (LLMs) to specific domains depends on high-quality fine-tuning datasets, particularly in instructional format (e.g., Question-Answer - Q&A). However, generating these datasets, particularly from unstructured sources such as call center audio recordings, poses a significant challenge due to the noisy and disorganized nature of the data. This paper presents a solution to this challenge by offering an end-to-end automated pipeline for generating Q&A instructional datasets from such recordings. The methodology developed comprises sequential steps of audio processing (including diarization, noise removal and automatic transcription), textual processing (cleaning, normalization, and anonymization), semantic extraction of customer demands and attendant responses using vector embeddings, and matching via semantic search to form the final Q&A pairs. As a result, the complete pipeline was successfully implemented, generating a dataset specifically formatted for Instruct Fine Tuning. The practical value and feasibility of the generated dataset were substantiated and functionally demonstrated through the successful fine-tuning of an LLM model (based on Llama 2 7B). The conclusion of the paper states that the proposed approach is viable for converting unstructured conversational data from call centers into valuable resources for training LLMs. This development has the potential to open up avenues for creating more effective AI systems for Q&A tasks in the customer service domain. The developed codes have been made publicly available to promote reproducibility and future research.

Authors:Ziwen Zhong, Zhitao Shu, Yue Zhao
Title: A Cloud-Based Cross-Modal Transformer for Emotion Recognition and Adaptive Human-Computer Interaction
Abstract:
Emotion recognition is a fundamental component of next-generation human-computer interaction (HCI), enabling machines to perceive, understand, and respond to users' affective states. However, existing systems often rely on single-modality analysis such as facial expressions, speech tone, or textual sentiment, resulting in limited robustness and poor generalization in real-world environments. To address these challenges, this study proposes a Cloud-Based Cross-Modal Transformer (CMT) framework for multimodal emotion recognition and adaptive human-computer interaction. The proposed model integrates visual, auditory, and textual signals using pretrained encoders (Vision Transformer, Wav2Vec2, and BERT) and employs a cross-modal attention mechanism to capture complex interdependencies among heterogeneous features. By leveraging cloud computing infrastructure with distributed training on Kubernetes and TensorFlow Serving, the system enables scalable, low-latency emotion recognition for large-scale user interactions. Experiments conducted on benchmark datasets including IEMOCAP, MELD, and AffectNet demonstrate that the CMT achieves state-of-the-art performance, improving the F1-score by 3.0 percent and reducing cross-entropy loss by 12.9 percent compared to strong multimodal baselines. Additionally, cloud deployment evaluations show an average response latency of 128 ms, representing a 35 percent reduction compared with conventional transformer-based fusion systems. These results confirm that the proposed framework enables efficient, real-time emotion recognition and adaptive feedback in applications such as intelligent customer service, virtual tutoring systems, and affective computing interfaces, marking an important step toward cloud-native affective computing and emotionally intelligent interactive systems.

Authors:Runze Li, Lanbing Li, Yuan Zheng, Chuanxiao Li, Xianglong Zeng
Title: Measuring Love Toward AI: Development and Validation of the Love Attitudes Scale toward Artificial Intelligence (LAS-AI)
Abstract:
Artificial intelligences (AIs) are increasingly capable of emotionally engaging with humans to the point of forming intimate relationships. Yet, current studies on romantic love toward AI lack statistically validated instruments to measure romantic love toward AI, hindering empirical research. To address this gap, we reinterpreted Lee's love styles theory in the AI context and developed the Love Attitudes Scale toward AI (LAS-AI). The resulting 24-item, six-factor scale was validated across four phases using three independent samples (N = 899), demonstrating strong psychometric properties. The findings further revealed that people primarily seek practical, passionate, and companionship-based relationships with AI (i.e., Pragma, Eros, and Storge), showing little interest in a playful or noncommittal approach (i.e., Ludus). We also provided an initial exploration of the similarities and differences between romantic love with humans and AI. The LAS-AI offers a robust tool for future research on human-AI romantic relationships, with prolific implications.

Authors:Ricard Solé, Luis F Seoane, Jordi Pla-Mauri, Michael Timothy Bennett, Michael E. Hochberg, Michael Levin
Title: Cognition spaces: natural, artificial, and hybrid
Abstract:
Cognitive processes are realized across an extraordinary range of natural, artificial, and hybrid systems, yet there is no unified framework for comparing their forms, limits, and unrealized possibilities. Here, we propose a cognition space approach that replaces narrow, substrate-dependent definitions with a comparative representation based on organizational and informational dimensions. Within this framework, cognition is treated as a graded capacity to sense, process, and act upon information, allowing systems as diverse as cells, brains, artificial agents, and human-AI collectives to be analyzed within a common conceptual landscape. We introduce and examine three cognition spaces -- basal aneural, neural, and human-AI hybrid -- and show that their occupation is highly uneven, with clusters of realized systems separated by large unoccupied regions. We argue that these voids are not accidental but reflect evolutionary contingencies, physical constraints, and design limitations. By focusing on the structure of cognition spaces rather than on categorical definitions, this approach clarifies the diversity of existing cognitive systems and highlights hybrid cognition as a promising frontier for exploring novel forms of complexity beyond those produced by biological evolution.

Authors:Bhavesh Vuyyuru, Farnaz Jahanbakhsh
Title: Persuasion in Online Conversations Is Associated with Alignment in Expressed Human Values
Abstract:
Online disagreements often fail to produce understanding, instead reinforcing existing positions or escalating conflict. Prior work on predictors of successful persuasion in online discourse has largely focused on surface features such as linguistic style or conversational structure, leaving open the role of underlying principles or concerns that participants bring to an interaction. In this paper, we investigate how the expression and alignment of human values in back-and-forth online discussions relate to persuasion. Using data from Reddit's ChangeMyView subreddit, where successful persuasion is explicitly signaled through the awarding of deltas, we analyze one-on-one exchanges and characterize participants' value expression by drawing from Schwartz's Refined Theory of Basic Human Values. We find that successful persuasion is associated with two complementary processes: pre-existing compatibility between participants' value priorities even before the exchange happens, and the emergence of value alignment over the course of a conversation. At the same time, successful persuasion does not depend on commenters making large departures from their typical value expression patterns. We discuss implications of our findings for the design of online social platforms that aim to support constructive engagement across disagreement.

Authors:Leif Azzopardi, Adam Roegiest
Title: Information Farming: From Berry Picking to Berry Growing
Abstract:
The classic paradigms of Berry Picking and Information Foraging Theory have framed users as gatherers, opportunistically searching across distributed sources to satisfy evolving information needs. However, the rise of GenAI is driving a fundamental transformation in how people produce, structure, and reuse information - one that these paradigms no longer fully capture. This transformation is analogous to the Neolithic Revolution, when societies shifted from hunting and gathering to cultivation. Generative technologies empower users to "farm" information by planting seeds in the form of prompts, cultivating workflows over time, and harvesting richly structured, relevant yields within their own plots, rather than foraging across others people's patches. In this perspectives paper, we introduce the notion of Information Farming as a conceptual framework and argue that it represents a natural evolution in how people engage with information. Drawing on historical analogy and empirical evidence, we examine the benefits and opportunities of information farming, its implications for design and evaluation, and the accompanying risks posed by this transition. We hypothesize that as GenAI technologies proliferate, cultivating information will increasingly supplant transient, patch-based foraging as a dominant mode of engagement, marking a broader shift in human-information interaction and its study.

Authors:Amro Khaled, Farah Khaled, Omar Riad, Catherine M. Elias
Title: CD-TWINSAFE: A ROS-enabled Digital Twin for Scene Understanding and Safety Emerging V2I Technology
Abstract:
In this paper, the CD-TWINSAFE is introduced, a V2I-based digital twin for Autonomous Vehicles. The proposed architecture is composed of two stacks running simultaneously, an on-board driving stack that includes a stereo camera for scene understanding, and a digital twin stack that runs an Unreal Engine 5 replica of the scene viewed by the camera as well as returning safety alerts to the cockpit. The on-board stack is implemented on the vehicle side including 2 main autonomous modules; localization and perception. The position and orientation of the ego vehicle are obtained using on-board sensors. Furthermore, the perception module is responsible for processing 20-fps images from stereo camera and understands the scene through two complementary pipelines. The pipeline are working on object detection and feature extraction including object velocity, yaw and the safety metrics time-to-collision and time-headway. The collected data form the driving stack are sent to the infrastructure side through the ROS-enabled architecture in the form of custom ROS2 messages and sent over UDP links that ride a 4G modem for V2I communication. The environment is monitored via the digital twin through the shared messages which update the information of the spawned ego vehicle and detected objects based on the real-time localization and perception data. Several tests with different driving scenarios to confirm the validity and real-time response of the proposed architecture.

Authors:Hana E. Elmalah, Catherine M. Elias
Title: User-to-Vehicle Interaction in Smart Mobility: The GO-DRiVeS Autonomous Ride-Sharing Application
Abstract:
This paper introduces the GO-DRiVeS application, an on demand ride sharing and requesting mobile application tailored specifically to save long walks and challenges which are time consuming and tiring especially during hot days or when carrying heavy items, faced by university students and staff. The GO-DRiVeS application was developed following the Agile methodology for its flexibility. In addition to, using the mobile application system architecture and client-server architecture. GO-DRiVeS was implemented using React Native (Expo) for the frontend, Node.js and Express for the backend, and MongoDB as the database; based on a detailed analyses to the existing transportation application, comparing their frameworks and identifying their essential functionalities. GO-DRiVeS supports core features like user registration, ride requesting and real-time tracking.In addition to handling multiple requests at the same time in a first come first serve manner. The application was developed based on these features, and the results were conducted in the form of multiple experiments that demonstrated stable behavior in handling the requests, as presented in the Methodology and Results chapters.

Authors:Rezky Kam, Coddy N. Siswanto
Title: Time-Continuous Modeling for Temporal Affective Pattern Recognition in LLMs
Abstract:
This paper introduces a dataset and conceptual framework for LLMs to mimic real world emotional dynamics through time and in-context learning leveraging physics-informed neural network, opening a possibility for interpretable dialogue modeling.

Authors:Hilsann Yong, Bradley A. Camburn
Title: Predictive Prototyping: Evaluating Design Concepts with ChatGPT
Abstract:
The design-build-test cycle is essential for innovation, but physical prototyping is often slow and expensive. Although physics-based simulation and strategic prototyping can reduce cost, meaningful evaluation is frequently constrained until an integrated prototype is built. This paper investigates whether a generative pretrained transformer (GPT) can predict information typically obtained through prototyping, including cost, performance, and perceived usability. We introduce a retrieval-augmented generation (RAG) method to emulate design feedback using OpenAI GPT-4o, grounded in prototyping data scraped from Instructables.com to increase access to relevant precedent. Two studies are reported. First, a controlled experiment compares GPT-RAG and human designers, who receive design sketches and predict cost, performance, and usability; predictions are evaluated against ground-truth results from physical prototypes. Second, we report an applied demonstration in which a physical prototype is produced from GPT-RAG recommendations and compared with a commercial baseline and a topology-optimized design. Results show that GPT-RAG provides more accurate cost and performance estimates than individual or crowd human estimates, while yielding comparable usability insights; the GPT-RAG-informed prototype also outperforms both comparison prototypes. Repeated querying with response averaging significantly improves accuracy, suggesting that LLMs can emulate crowd aggregation effects consistent with the law of large numbers.

Authors:Yuki Ueno, Hiroaki Natsukawa, Koji Koyamada
Title: Do Boxes Affect Exploration Behavior and Performance in Group-in-a-box Layouts?
Abstract:
The group-in-a-box (GIB) layout is an efficient graph drawing method designed to visualize the group structure of graphs. The layout communicates group sizes and both within-group and between-group network structures simultaneously. The layout is characterized by its composition of multiple elements, including nodes, edges, and boxes. However, there is limited empirical guidance on how these elements should be combined. In this paper, we measured participants' task performance and eye movements while identifying the group with the largest number of internal edges. We investigated the effect of visualization elements on task performance while controlling the density of internal edges and the box size. The results revealed that the box size in a GIB layout significantly affects the task accuracy either positively or negatively while eye-tracking data suggests that participants focused on internal edges, not the box size. These findings contribute empirical guidance for GIB layout design and lay the groundwork for future research as GIB layout becomes more widely used.

Authors:Pijuan Yu, Anzu Kawazoe, Alexis Urquhart, Thomas K. Ferris, M. Cynthia Hipwell, Rebecca F. Friesen
Title: A Hybrid Soft Haptic Display for Rendering Lump Stiffness in Remote Palpation
Abstract:
Remote palpation enables noninvasive tissue examination in telemedicine, yet current tactile displays often lack the fidelity to convey both large-scale forces and fine spatial details. This study introduces a hybrid fingertip display comprising a rigid platform and a $4\times4$ soft pneumatic tactile display (4.93 mm displacement and 1.175 N per single pneumatic chamber) to render a hard lump beneath soft tissue. This study compares three rendering strategies: a Platform-Only baseline that renders the total interaction force; a Hybrid A (Position + Force Feedback) strategy that adds a dynamic, real-time soft spatial cue; and a Hybrid B (Position + Preloaded Stiffness Feedback) strategy that provides a constant, pre-calculated soft spatial cue. In a 12-participant lump detection study, both hybrid methods dramatically improved accuracy over the Platform-Only baseline (from 50\% to over 95\%). While the Hybrid B was highlighted qualitatively for realism, its event-based averaging is expected to increase interaction latency in real-time operation. This suggests a trade-off between perceived lump realism and real-time responsiveness, such that rendering choices that enhance realism may conflict with those that minimize latency.

Authors:Leon A. Abdillah, Aisyah, Wahdyta Putri Panggabean, Sayfiyev Eldor Erkinovich
Title: Knowledge of Songket Cloth Small Medium Enterprise Digital Transformation
Abstract:
This article examines the knowledge of digital transformation of Small and Medium Enterprises (SMEs) that specialize in traditional handicrafts, with a specific emphasis on the Songket textile sector. The study investigates the use of digital technologies, notably blog platforms and the e-commerce site Shopee, to improve and streamline several business processes in Songket textile SMEs. The report takes a case study approach, diving into the experiences of Songket clothing enterprises that have undergone digital transformation. Key areas studied include the use of Blog platforms for brand development, marketing, and consumer involvement, as well as the Shopee E-Commerce platform for online sales and order processing. The essay seeks to give insights into the problems and possibilities faced by Songket cloth SMEs along their digital transformation journey by conducting in-depth observation, interviews, and surveys. The findings add to the scholarly discussion on the digitization of traditional industries, with practical implications for SMEs in the Songket textile sector and other handicraft areas. This study emphasizes the necessity of using digital technologies to preserve and expand traditional crafts, while also throwing light on the potential role of prominent E-Commerce platforms like Shopee in facilitating worldwide market access for such firms.

Authors:Houhao Liang, Azrin Jamaluddin, Kresimir Friganovic, Kirstie Neo, Raphael Han, Navrag Singh, Panos Mavros
Title: Multimodal Data Fusion to Capture Dynamic Interactions between Built Environment and Vulnerable Older Adults
Abstract:
Ensuring safe and inclusive mobility for vulnerable older adults is an emerging priority in urban planning. However, existing data sources such as surveys or GIS-based audits provide limited insight into how micro-scale built environment (BE) features influence real-world behavior and perception. This study presents a novel multimodal data-fusion approach that integrates wearable and environmental sensing to dynamically represent human-environment interactions and quantify the BE impacts on mobility among vulnerable older adults, specifically those with knee osteoarthritis or a history of falls. Data collected during naturalistic walking sessions in Singapore, are used to demonstrate this framework of synchronized streams from eye tracking, kinematic sensors, physiological monitors, GPS, and video recordings. Preliminary results show how AI-driven data fusion can uncover behaviorally and perceptually significant urban segments, providing a basis for actionable insights in inclusive design. This human-centered analytical approach advances the representation of urban environments from the perspective of vulnerable pedestrians, establishing a foundation for evidence-based, age-friendly city planning.

Authors:Berfin Ataman, Rodrigo Gallardo, Qilmeg Doudatcz
Title: Affective Translation: Material and Virtual Embodiments of Kinetic Textile Robots
Abstract:
This study presents a comparative framework for evaluating emotional engagement with textile soft robots and their augmented-reality (AR) counterparts. Four robotic sculptures were developed, each embodying nature-inspired dynamic behaviors such as breathing and gradual deformation. Using a between-subjects design, two independent groups, one experiencing the physical installations and one engaging with their virtual (AR) twins, follow identical protocols and complete the same self-assessment survey on affective and perceptual responses. This approach minimizes carryover and novelty effects while enabling a direct comparison of sensations such as calmness, curiosity, and discomfort across modalities. The analysis explores how motion, form, and material behavior shape emotional interpretation in physical versus digital contexts, informing the design of hybrid systems that evoke meaningful, emotionally legible interactions between humans, robots, and digital twins.

Authors:Suqing Liu, Bogdan Simion, Christopher Eaton, Michael Liut
Title: A Comparative Study of Technical Writing Feedback Quality: Evaluating LLMs, SLMs, and Humans in Computer Science Topics
Abstract:
Feedback is a critical component of the learning process, particularly in computer science education. This study investigates the quality of feedback generated by Large Language Models (LLMs), Small Language Models (SLMs), compared with human feedback, in three computer science course with technical writing components: an introductory computer science course (CS2), a third-year advanced systems course (operating systems), and a third-year writing course (a topics course on artificial intelligence). Using a mixed-methods approach which integrates quantitative Likert-scale questions with qualitative commentary, we analyze the student perspective on feedback quality, evaluated based on multiple criteria, including readability, detail, specificity, actionability, helpfulness, and overall quality. The analysis reveals that in the larger upper-year operating systems course ($N=80$), SLMs and LLMs are perceived to deliver clear, actionable, and well-structured feedback, while humans provide more contextually nuanced guidance. As for the high-enrollment CS2 course ($N=176$) showed the same preference for the AI tools' clarity and breadth, but students noted that AI feedback sometimes lacked the concise, straight-to-the-point, guidance offered by humans. Conversely, in the smaller upper-year technical writing course on AI topics ($N=7$), all students preferred feedback from the course instructor, who was able to provide clear, specific, and personalized feedback, compared to the more general and less targeted AI-based feedback. We also highlight the scalability of AI-based feedback by focusing on its effectiveness at large scale. Our findings underscore the potential of hybrid approaches that combine AI and human feedback to achieve efficient and high-quality feedback at scale.

Authors:Cameron A. Nurse, Kelly Breen, Matthew McGuire, Sara Prokup, Arun Jayaraman, Quentin Sanders
Title: Enhancing Paretic Propulsion Post-Stroke via a Wearable System for Real-Time Unilateral Haptic Feedback of Anterior Ground Reaction Forces
Abstract:
Gait rehabilitation interventions targeting paretic propulsion can improve walking speed and function in individuals post-stroke. Previous work has demonstrated that real-time biofeedback targeting anterior ground reaction forces (AGRFs) can increase propulsion in individuals post-stroke, however this work was confined to lab-based treadmills, limiting practical utility. Here we investigate the short-term effects of real-time AGRF gait biofeedback during overground walking using wearable inertial measurement units (IMUs) and a haptic feedback device. Eight individuals with chronic post-stroke hemiparesis completed four 3-minute training bouts. During training, faded haptic biofeedback was provided to increase paretic AGRF during terminal stance. Gait biomechanics were assessed before, during, and after training, and during a retention test conducted without biofeedback after a rest period. The primary dependent variable was peak paretic AGRF, while secondary variables included paretic peak trailing limb angle (TLA), step length and walking speed. Compared to baseline, peak AGRF increased post-feedback and at the retention tests. Similar trends were observed in TLA, and step length, although these increases were not statistically significant while speed showed a significant change from baseline. Examining individual participants 63% participants (responders) increased AGRF at retention, while 37% experienced decreases (non-responders). Non-responders had lower physical capability, evidenced by two-minute walk distance at screening and AFO use during training, suggesting this intervention may suit patients with more residual ankle mobility and strength. Nonetheless our results suggest AGRF biofeedback can be implemented in practical settings with wearable systems and is a promising gait training strategy to target propulsive deficits in individuals post stroke.

Authors:Stewart Collis, Florence Kinyua, Vikram Kumar, Howard Lakougna, Christian Merz, Kirti Pandey, Christian Resch
Title: Building AI-based advisory services for smallholder farmers: Technical learnings from the AIEP Initiative
Abstract:
We report technical learnings from five AI-based agricultural advisory MVPs deployed in Kenya and Bihar, India, under the AIEP Initiative. A 800-farmer study found high user satisfaction (NPS ~60). All solutions implement a modular two-part architecture: (i) an interface component (IVR /WhatsApp / app) with ASR-MT-TTS for multilingual voice access; and (ii) a reasoning component combining LLMs capabilities with query orchestration, external data (weather/soil/markets), and RAG over curated agricultural corpora. We describe key challenges: (a) latency, especially for voice; reductions were achieved via in-country hosting and audio minimization, but consistent <5s remains challenging; (b) language coverage: low-resource ASR/MT integration and nonstandard scripts hinder end-to-end quality; and (c) corpus curation: access, validation, and maintenance are labor-intensive, as well as provide recommendations on how to develop similar systems. We discuss common enablers including (a) data sharing, (b) common corpora, (c) better language AI and (d) evaluation and benchmarking. We also present golden Q&A sets to evaluate LLM capabilities for smallholder agriculture.

Authors:Baitong Xie, Mohd Fairuz Shiratuddin, Mostafa Hamadi, Joo Yeon Park, Thach-thao Duong
Title: Designing Gamified Social Interaction for Gen Z in the Metaverse: A Framework-Oriented Systematic Literature Review
Abstract:
Gamification plays a pivotal role in enhancing user engagement in the Metaverse, particularly among Generation Z users who value autonomy, immersion, and identity expression. However, current research lacks a cohesive framework tailored to designing gamified social experiences in immersive virtual environments. This study presents a framework-oriented systematic literature review, guided by PRISMA 2020 and SPIDER, to investigate how gamification is applied in the Metaverse and how it aligns with the behavioral needs of Gen Z. From 792 screened studies, seventeen high-quality papers were synthesized to identify core gamification mechanics, including avatars, XR affordances, and identity-driven engagement strategies. Building on these insights, we propose the Affordance-Driven Gamification Framework (ADGF), a conceptual model for designing socially immersive experiences, along with a five-step design process to support its real-world application. Our contributions include a critical synthesis of existing strategies, Gen Z-specific design considerations, and a dual-framework approach to guide researchers and practitioners in developing emotionally engaging and socially dynamic Metaverse experiences.

Authors:V. El Sawah, A. Bhardwaj, A. Pryke-Hobbes, D. Gamaleldin, C. S. Ang, A. K. Martin
Title: Artificial Intelligence as a Training Tool in Clinical Psychology: A Comparison of Text-Based and Avatar Simulations
Abstract:
Clinical psychology students frequently report feeling underprepared for the interpersonal demands of therapeutic work, highlighting the need for accessible opportunities to practise core counselling skills before seeing real clients. Advances in artificial intelligence (AI) now enable simulated interaction partners that may support early skills development. This study examined postgraduate clinical psychology students' perceptions of two AI-based simulations: a text-based chatbot (ChatGPT) and a voice-based avatar (HeyGen). Twenty-four students completed two brief cognitive-behavioural role-plays (counterbalanced), one with each tool, and provided both quantitative ratings and qualitative feedback on perceived usefulness, skill application, responsiveness and engagement, and perceived skill improvement. Both AI tools were evaluated positively across dimensions. However, the avatar was rated significantly higher than the chatbot for perceived usefulness, skill application, and perceived skill improvement, and qualitative comments highlighted the added value of voice-based interaction for conveying social and emotional cues. These findings suggest that AI-driven simulation may supplement early-stage clinical skills training, with voice-based avatars offering additional benefits. Future work should test whether such simulated interactions translate to objective improvements in real therapeutic performance.

Authors:Meike Driessen, Selina Khan, Gonçalo Marcelino
Title: "Jutters"
Abstract:
This project explores how we engage with AI-generated content through the lens of the jutter: Dutch coastal foragers who comb the shoreline after storms, gathering and repurposing what the sea leaves behind. Reflecting how our lives are increasingly shaped by AI-generated media, we create a beach-like installation that blends real shoreline debris with AI-transformed images and videos. Visitors are invited to explore this space as contemporary jutters, deciding what to keep and what to discard. In doing so, the project reimagines AI-imagery as material for reflection, encouraging a more discerning engagement with the content that drifts through our feeds. A video preview of the installation can be found at https://www.youtube.com/watch?v=L6319Ii7MT8.

Authors:Geonwoo Bang, DongMyung Kim, Hayoung Oh
Title: SNAP: A Plan-Driven Framework for Controllable Interactive Narrative Generation
Abstract:
Large Language Models (LLMs) hold great potential for web-based interactive applications, including browser games, online education, and digital storytelling platforms. However, LLM-based conversational agents suffer from spatiotemporal distortions when responding to variant user inputs, failing to maintain consistency with provided scenarios. We propose SNAP (Story and Narrative-based Agent with Planning), a framework that structures narratives into Cells with explicit Plans to prevent narrative drift in web environments. By confining context within each Cell and employing detailed plans that specify spatiotemporal settings, character actions, and plot developments, SNAP enables coherent and scenario-consistent dialogues while adapting to diverse user responses. Via automated and human evaluations, we validate SNAP's superiority in narrative controllability, demonstrating effective scenario consistency despite variant user inputs in web-based interactive storytelling.

Authors:Christian Ellington, Paramahansa Pramanik, Haley K. Robinson
Title: MetaScoreLens: Evaluating User Feedback Across Digital Entertainment Systems
Abstract:
The popularity of electronic games has grown steadily in recent years, attracting a broad audience across age groups. With this growth comes a large volume of related data, prompting efforts like the PlayMyData to compile and share structured datasets for academic use. This study utilizes such a dataset to compare user review ratings across four current-generation gaming systems: Nintendo, Xbox, PlayStation, and PC. Statistical methods, including analysis of variance (ANOVA), were applied to identify differences in average scores among these platforms. The findings indicate that PC titles tend to receive the most favorable user feedback, followed by Xbox and PlayStation, while Nintendo games showed the lowest average ratings. These patterns suggest that the platform on which a game is released may influence how players evaluate their experience. Such results may be valuable to developers and industry stakeholders in making informed decisions about future investments and development priorities.

Authors:Lorena A. Barba, Laura Stegner
Title: The Conversational Exam: A Scalable Assessment Design for the AI Era
Abstract:
Traditional assessment methods collapse when students use generative AI to complete work without genuine engagement, creating an illusion of competence where they believe they're learning but aren't. This paper presents the conversational exam -- a scalable oral examination format that restores assessment validity by having students code live while explaining their reasoning. Drawing on human-computer interaction principles, we examined 58 students in small groups across just two days, demonstrating that oral exams can scale to typical class sizes. The format combines authentic practice (students work with documentation and supervised AI access) with inherent validity (real-time performance cannot be faked). We provide detailed implementation guidance to help instructors adapt this approach, offering a practical path forward when many educators feel paralyzed between banning AI entirely or accepting that valid assessment is impossible.

Authors:Rubel Hassan Mollik, Vamsi Krishna Kosuri, Hans Djalali, Stephanie Ludi, Aboubakar Mountapmbeme
Title: An Extension-Based Accessibility Framework for Making Blockly Accessible to Blind and Low-Vision Users
Abstract:
Block-based programming environments (BBPEs) such as Scratch and Code.org are now widely used in K-12 computer science classes, but they remain mostly inaccessible to blind or visually impaired (BVI) learners. A major problem is that prior accessibility solutions have relied on modifications to the Blockly library, making them difficult to apply in existing BBPEs and thereby limiting adoption. We present an Extension-based Accessibility Framework (EAF) to make BBPEs accessible for BVI students. The framework uses a modular architecture that enables seamless integration with existing Blockly-based BBPEs. We present an innovative three-dimensional (3D) hierarchical navigation model featuring stack labeling and block numbering, mode-based editing to prevent accidental modifications, and WAI-ARIA implementation to ensure compatibility with external screen readers. We evaluated our approach by integrating the EAF framework into two BBPEs (covering 177 test cases) and conducting semi-structured interviews with four participants using VoiceOver, JAWS, and NVDA. Participants reported clearer spatial orientation and easier mental model formation compared to default Blockly keyboard navigation. EAF shows that modular architecture can provide comprehensive accessibility while ensuring compatibility with existing BBPEs.

Authors:Ishani Kanapathipillai, Obhasha Priyankara
Title: CoGen: Creation of Reusable UI Components in Figma via Textual Commands
Abstract:
The evolution of User Interface design has emphasized the need for efficient, reusable, and editable components to ensure an efficient design process. This research introduces CoGen, a system that uses machine learning techniques to generate reusable UI components directly in Figma, one of the most popular UI design tools. Addressing gaps in current systems, CoGen focuses on creating atomic components such as buttons, labels, and input fields using structured JSON and natural language prompts. The project integrates Figma API data extraction, Seq2Seq models, and fine-tuned T5 transformers for component generation. The key results demonstrate the efficiency of the T5 model in prompt generation, with an accuracy of 98% and a BLEU score of 0.2668, which ensures the mapping of JSON to descriptive prompts. For JSON creation, CoGen achieves a success rate of up to 100% in generating simple JSON outputs for specified component types.

Authors:Raphael Buchmüller, Dennis Collaris, Linhao Meng, Angelos Chatzimparmpas
Title: LangLasso: Interactive Cluster Descriptions through LLM Explanation
Abstract:
Dimensionality reduction is a powerful technique for revealing structure and potential clusters in data. However, as the axes are complex, non-linear combinations of features, they often lack semantic interpretability. Existing visual analytics (VA) methods support cluster interpretation through feature comparison and interactive exploration, but they require technical expertise and intense human effort. We present \textit{LangLasso}, a novel method that complements VA approaches through interactive, natural language descriptions of clusters using large language models (LLMs). It produces human-readable descriptions that make cluster interpretation accessible to non-experts and allow integration of external contextual knowledge beyond the dataset. We systematically evaluate the reliability of these explanations and demonstrate that \langlasso provides an effective first step for engaging broader audiences in cluster interpretation. The tool is available at https://langlasso.vercel.app

Authors:David Elsweiler, Christine Elsweiler, Anna Ziegner
Title: Cooking Up Politeness in Human-AI Information Seeking Dialogue
Abstract:
Politeness is a core dimension of human communication, yet its role in human-AI information seeking remains underexplored. We investigate how user politeness behaviour shapes conversational outcomes in a cooking-assistance setting. First, we annotated 30 dialogues, identifying four distinct user clusters ranging from Hyperpolite to Hyperefficient. We then scaled up to 18,000 simulated conversations across five politeness profiles (including impolite) and three open-weight models. Results show that politeness is not only cosmetic: it systematically affects response length, informational gain, and efficiency. Engagement-seeking prompts produced up to 90% longer replies and 38% more information nuggets than hyper-efficient prompts, but at markedly lower density. Impolite inputs yielded verbose but less efficient answers, with up to 48% fewer nuggets per watt-hour compared to polite input. These findings highlight politeness as both a fairness and sustainability issue: conversational styles can advantage or disadvantage users, and "polite" requests may carry hidden energy costs. We discuss implications for inclusive and resource-aware design of information agents.

Authors:Adam Bradley, John Hastings, Khandaker Mamun Ahmed
Title: Introducing Axlerod: An LLM-based Chatbot for Assisting Independent Insurance Agents
Abstract:
The insurance industry is undergoing a paradigm shift through the adoption of artificial intelligence (AI) technologies, particularly in the realm of intelligent conversational agents. Chatbots have evolved into sophisticated AI-driven systems capable of automating complex workflows, including policy recommendation and claims triage, while simultaneously enabling dynamic, context-aware user engagement. This paper presents the design, implementation, and empirical evaluation of Axlerod, an AI-powered conversational interface designed to improve the operational efficiency of independent insurance agents. Leveraging natural language processing (NLP), retrieval-augmented generation (RAG), and domain-specific knowledge integration, Axlerod demonstrates robust capabilities in parsing user intent, accessing structured policy databases, and delivering real-time, contextually relevant responses. Experimental results underscore Axlerod's effectiveness, achieving an overall accuracy of 93.18% in policy retrieval tasks while reducing the average search time by 2.42 seconds. This work contributes to the growing body of research on enterprise-grade AI applications in insurtech, with a particular focus on agent-assistive rather than consumer-facing architectures.

Authors:Yuki Kobayashi, Koichi Toida
Title: Immersive XR That Moves People: How XR Advertising Transforms Comprehension, Empathy, and Behavioural Intention
Abstract:
Extended Reality (XR) affords an enhanced sense of bodily presence that supports experiential modes of comprehension and affective engagement which exceed the possibilities of conventional information delivery. Nevertheless, the psychological processes engendered by XR, and the manner in which these processes inform subsequent behavioural intentions, remain only partially delineated. The present study addresses this issue within an applied context by comparing non-immersive 2D viewing advertising with immersive XR experiential advertising. We examined whether XR strengthens internal responses to a product, specifically perceived comprehension and empathy, and whether these responses, in turn, influence the behavioural outcome of purchase intention. A repeated-measures two-way ANOVA demonstrated a significant main effect of advertising modality, with XR yielding higher ratings on all evaluative dimensions. Mediation analysis further indicated that the elevation in purchase intention was mediated by empathy, whereas no significant mediating effect was observed for comprehension within the scope of this study. These findings suggest that immersive XR experiences augment empathic engagement with virtual products, and that this enhanced empathy plays a pivotal role in shaping subsequent behavioural intentions.

Authors:Sumin Hong, Jewoong Moon, Taeyeon Eom, Juno Hwang, Jibeom Seo
Title: Leveraging learning analytics to enhance immersive teacher simulations: Challenges and opportunities
Abstract:
This chapter examines how data analytics can be leveraged to enhance immersive teacher simulations, situating this inquiry within the broader learning sciences discourse on embodied cognition, data-informed feedback, and teacher professional learning. It explores both conceptual foundations and empirical cases to illustrate how analytics serve as mediational tools that connect immersive experiences with reflective teaching practice. The chapter unfolds in multiple sections: (1) The Innovation Journey: An Overview of Immersive Teacher Simulations outlines the evolution from traditional simulations to XR-based environments, highlighting the need for professional decision-making under realistic constraints. (2) Innovation in Existing Research and Practice situates teacher analytics within the trajectory from descriptive observation to multimodal and predictive modeling. (3) Study Approach and Design details how multimodal data-discourse, behavior, and gaze-from the TeacherGen@i simulation were collected and organized to reveal cognitive distribution of pedagogical discourse and interaction patterns. (4) Findings present the cognitive distribution of preservice teachers' pedagogical discourse and the sequential interaction patterns that emerge in exchange, illustrating how multimodal analytics make pedagogical reasoning processes visible within immersive simulations. (5) Understanding Innovative Practices in Teacher Education examines teaching analytics to enhance immersive teacher simulation based on the findings of the study. (6) Key Takeaways of the Innovation Journey identifies research challenges and design implications for scalable, analytics-enhanced teacher education. Together, these sections position immersive teacher simulations as a pivotal testbed for aligning learning analytics, professional learning, and next-generation immersive learning environment design.

Authors:Roshni Kaushik, Reid Simmons
Title: Older Adults' Preferences for Feedback Cadence from an Exercise Coach Robot
Abstract:
People can respond to feedback and guidance in different ways, and it is important for robots to personalize their interactions and utilize verbal and nonverbal communication cues. We aim to understand how older adults respond to different cadences of verbal and nonverbal feedback of a robot exercise coach. We conducted an online study of older adults, where participants evaluated videos of the robot giving feedback at different cadences for each modality. The results indicate that changing the cadence of one modality affects the perception of both it and the other modality. We can use the results from this study to better design the frequency of the robot coach's feedback during an exercise session with this population.

Authors:Elia Moscoso-Thompson, Katia Lupinetti, Irene Capasso, Fabrizio Ravicchio, Brigida Bonino, Franca Giannini, Andrea Canessa, Silvio Sabatini, Lucia Ferlino, Chiara Malagoli
Title: Tailored Immersive Environments: Advancing Neurodivergent Support Through Virtual Reality
Abstract:
Every day life tasks can present significant challenges for neurodivergent individuals, particularly those with Autism Spectrum Disorders (ASD) who are characterized by specific sensitivities. This contribution describes a virtual reality system that allows neurodivergent individuals to experience everyday situations in order to practice and implement strategies for overcoming their daily challenges. The key strength of the proposed system is the automatic personalization of the virtual environment, based on both the individual's abilities and their specific training needs. The proposed method has been evaluated on four synthetic user profiles, also proposing a metric able to evaluate the variance of the features within the same difficulty level. The results show that the method can produce a significant number of scenarios for the various difficulty levels. Furthermore, within the same difficulty, there is a wide variance of the non-constrained features for the specific profile.

Authors:Shuxian Li, Tianyue Wang, Chris Twombly
Title: Statistical Blendshape Calculation and Analysis for Graphics Applications
Abstract:
With the development of virtualization and AI, real-time facial avatar animation is widely used in entertainment, office, business and other fields. Against this background, blendshapes have become a common industry animation solution because of their relative simplicity and ease of interpretation. Aiming for real-time performance and low computing resource dependence, we independently developed an accurate blendshape prediction system for low-power VR applications using a standard webcam. First, blendshape feature vectors are extracted through affine transformation and segmentation. Through further transformation and regression analysis, we were able to identify models for most blendshapes with significant predictive power. Post-processing was used to further improve response stability, including smoothing filtering and nonlinear transformations to minimize error. Experiments showed the system achieved accuracy similar to ARKit 6. Our model has low sensor/hardware requirements and realtime response with a consistent, accurate and smooth visual experience.

Authors:Liberty Kent, Nilufer Tuptuk, Ingolf Becker
Title: Passing the Baton: Shift Handovers within Cybersecurity Incident Response Teams
Abstract:
Effective shift transitions are crucial for cybersecurity incident response teams, yet there is limited guidance on managing these handovers. This exploratory study aimed to develop guidelines for such transitions through the analysis of existing literature and consultation with practitioners. Two draft guidelines (A and B) were created based on existing literature and online resources. Six participants from the UK and international incident response teams, with experience in shift handovers, were interviewed about handover structure, challenges, training practices, and their views on the draft guidelines. The collected data indicate the importance of signposting, evolving handover procedures, individual differences in handover style and detail, and streamlining the handover procedure. Participants agreed the drafts included all relevant details but suggested adding a post-incident review section and a service section for outages or technical difficulties. This study establishes a foundation for enhancing transition practices in cybersecurity incident response teams.

Authors:Stinne Zacho, Chris Hall, Jakob Kusnick, Stefan Jänicke
Title: Santa Clara 3D: Digital Reconstruction and Storytelling of a Francoist Concentration Camp
Abstract:
This paper explores the potential of digital reconstruction and interactive storytelling to preserve historically suppressed sites. The main objective of an interdisciplinary team of data scientists from the MEMORISE project and associates of the memory association Asociacion Recuerdo y Dignidad was to preserve the memory of the Francoist Santa Clara concentration camp in Soria, Spain, through the use of digital technology. Combining archival research, 3D modelling, 360-degree photography, and web development, a prototype digital platform was created to visualise the transformation of the site across three historical phases: its origin as a convent, its use as a Francoist concentration camp, and its present-day condition. The platform allows users to navigate through spatial and temporal layers. Clickable media markers encourage exploration and interaction. Drawing on principles of participatory design, narrative visualisation, and open-ended user engagement, the project demonstrates how digital tools can support memory work, public engagement, and historical reflection. Our low-cost concept is especially adaptable to other physical sites that have been erased or forgotten.

Authors:Theodore Roberts, Bahram Zarrin
Title: From Values to Frameworks: A Qualitative Study of Ethical Reasoning in Agentic AI Practitioners
Abstract:
Agentic artificial intelligence systems are autonomous technologies capable of pursuing complex goals with minimal human oversight and are rapidly emerging as the next frontier in AI. While these systems promise major gains in productivity, they also raise new ethical challenges. Prior research has examined how different populations prioritize Responsible AI values, yet little is known about how practitioners actually reason through the trade-offs inherent in designing these autonomous systems. This paper investigates the ethical reasoning of AI practitioners through qualitative interviews centered on structured dilemmas in agentic AI deployment. We find that the responses of practitioners do not merely reflect value preferences but rather align with three distinct reasoning frameworks. First is a Customer-Centric framework where choices are justified by business interests, legality, and user autonomy. Second is a Design-Centric framework emphasizing technical safeguards and system constraints. Third is an Ethics-Centric framework prioritizing social good and moral responsibility beyond compliance. We argue that these frameworks offer distinct and necessary insights for navigating ethical trade-offs. Consequently, providers of agentic AI must look beyond general principles and actively manage how these diverse reasoning frameworks are represented in their decision-making processes to ensure robust ethical outcomes.

Authors:Sahibpreet Singh, Pawan Kumar
Title: Sports Business Administration and New Age Technology: Role of AI
Abstract:
This chapter explores the complexities of sports governance, taxation, dispute resolution, and the impact of digital transformation within the sports sector. This study identifies a critical research gap regarding the integration of innovative technologies to enhance governance and talent identification in sports law. The objective is to evaluate how data-driven approaches and AI can optimize recruitment processes; also ensuring compliance with existing regulations. A comprehensive analysis of current governance structures and taxation policies,(ie Income Tax Act and GST Act), reveals preliminary results indicating that reform is necessary to support sustainable growth in the sports economy. Key findings demonstrate that AI enhances player evaluation by minimizing biases and expanding access to diverse talent pools. While the Court of Arbitration for Sport provides an efficient mechanism for dispute resolution. The implications emphasize the need for regulatory reforms that align taxation policies with international best practices, promoting transparency and accountability in sports organizations. This research contributes valuable insights into the evolving dynamics of sports management, aiming to foster innovation and integrity in the industry.

Authors:Anna Katharina Holl-Etten, Nina Schnaderbeck, Elizaveta Kosareva, Leonhard Aron Prattke, Ralph Krueger, Lisa Marie Warner, Nora C. Vetter
Title: Applied Theory of Mind and Large Language Models -- how good is ChatGPT at solving social vignettes?
Abstract:
The rapid development of language-based artificial intelligence (AI) offers new possibilities for psychotherapy and assistive systems, particularly benefitting autistic individuals who often respond well to technology. Parents of autistic persons emphasize the importance of appropriate and context-specific communication behavior. This study investigated whether GPT-3.5 Turbo and GPT-4, as language-based AI applications, are fundamentally capable of replicating this type of adequate communication behavior in the form of applied Theory of Mind (ToM). GPT-3.5 Turbo and GPT-4 were evaluated on three established higher-order ToM tasks: the Faux Pas Test, the Social Stories Questionnaire, and the Story Comprehension Test in English and German. Two independent raters scored response accuracy based on standardized manuals. In addition, responses were rated for epistemic markers as indicators of uncertainty. GPT's results were compared to human neurotypical and neurodivergent samples from previous own and others' research. GPT-4 achieved near human accuracy on the Faux Pas Test and outperformed GPT-3.5 Turbo and individuals with autistic traits. On the Social Stories Questionnaire, GPT-4 scored comparable to neurotypical adults, while GPT-3.5 Turbo remained well below. In the Story Comprehension Test, GPT-4 reached scores that exceeded neurotypical adult and adolescent benchmarks. However, GPT-4 used epistemic markers in up to 42% of responses. GPT-4 shows encouraging performance in complex higher-order ToM tasks and may offer future potential as an assistive tool for individuals with (and without) social communication difficulties. Its ability to interpret complex social situations is promising; however, the frequent use of uncertainty markers highlights the need for further study for assistive use and possibly further refinement to ensure consistent and reliable support in real-world use.

Authors:Kévin Ducharlet, Liwen Zhang, Sara Maqrot, Houssem Saidi
Title: A Recommendation System-Based Framework for Enhancing Human-Machine Collaboration in Industrial Timetabling Rescheduling: Application in Preventive Maintenance
Abstract:
Industrial timetabling is a critical task for decision-makers across various sectors to ensure efficient system operation. In real-world settings, it remains challenging because unexpected events often disrupt execution. When such events arise, effective rescheduling and collaboration between humans and machines becomes essential. This paper presents a recommendation system-based framework for handling rescheduling challenges, built on Timefold, a powerful AI-driven planning engine. Our experimental study evaluates nine instances inspired by a realworld preventive maintenance use case, aiming to identify the heuristic that best balances solution quality and computing time to support near-optimal decisionmaking when rescheduling is required due to unexpected events during operational days. Finally, we illustrate the complete process of our recommendation system through a simple use case.

Authors:Lucija Mihić Zidar, Philipp Wicke, Praneel Bhatia, Rosa Lutz, Marius Klug, Thorsten O. Zander
Title: Decoding Workload and Agreement From EEG During Spoken Dialogue With Conversational AI
Abstract:
Passive brain-computer interfaces offer a potential source of implicit feedback for alignment of large language models, but most mental state decoding has been done in controlled tasks. This paper investigates whether established EEG classifiers for mental workload and implicit agreement can be transferred to spoken human-AI dialogue. We introduce two conversational paradigms - a Spelling Bee task and a sentence completion task- and an end-to-end pipeline for transcribing, annotating, and aligning word-level conversational events with continuous EEG classifier output. In a pilot study, workload decoding showed interpretable trends during spoken interaction, supporting cross-paradigm transfer. For implicit agreement, we demonstrate continuous application and precise temporal alignment to conversational events, while identifying limitations related to construct transfer and asynchronous application of event-based classifiers. Overall, the results establish feasibility and constraints for integrating passive BCI signals into conversational AI systems.

Authors:Kaichun Wang, Yanguang Chen, Ting Zhang, Mengyao Bao, Keyu Chen, Xu Hu, Yongliang Wang, Jingsheng Yang, Jinsong Zhang, Fei Lu
Title: From Events to Trending: A Multi-Stage Hotspots Detection Method Based on Generative Query Indexing
Abstract:
LLM-based conversational systems have become a popular gateway for information access, yet most existing chatbots struggle to handle news-related trending queries effectively. To improve user experience, an effective trending query detection method is urgently needed to enable differentiated processing of such target traffic. However, current research on trending detection tailored to the dialogue system scenario remains largely unexplored, and methods designed for traditional search engines often underperform in conversational contexts due to radically distinct query distributions and expression patterns. To fill this gap, we propose a multi-stage framework for trending detection, which achieves systematic optimization from both offline generation and online identification perspectives. Specifically, our framework first exploits selected hot events to generate index queries, establishing a key bridge between static events and dynamic user queries. It then employs a retrieval matching mechanism for real-time online detection of trending queries, where we introduce a cascaded recall and ranking architecture to balance detection efficiency and accuracy. Furthermore, to better adapt to the practical application scenario, our framework adopts a single-recall module as a cold-start strategy to collect online data for fine-tuning the reranker. Extensive experiments demonstrate that our framework significantly outperforms baseline methods in both offline evaluations and online A/B tests, and user satisfaction is relatively improved by 27\% in terms of positive-negative feedback ratio.

Authors:Jin Gao, Saichandu Juluri
Title: From Idea to Co-Creation: A Planner-Actor-Critic Framework for Agent Augmented 3D Modeling
Abstract:
We present a framework that extends the Actor-Critic architecture to creative 3D modeling through multi-agent self-reflection and human-in-the-loop supervision. While existing approaches rely on single-prompt agents that directly execute modeling commands via tools like Blender MCP, our approach introduces a Planner-Actor-Critic architecture. In this design, the Planner coordinates modeling steps, the Actor executes them, and the Critic provides iterative feedback, while human users act as supervisors and advisors throughout the process. Through systematic comparison between single-prompt modeling and our reflective multi-agent approach, we demonstrate improvements in geometric accuracy, aesthetic quality, and task completion rates across diverse 3D modeling scenarios. Our evaluation reveals that critic-guided reflection, combined with human supervisory input, reduces modeling errors and increases complexity and quality of the result compared to direct single-prompt execution. This work establishes that structured agent self-reflection, when augmented by human oversight and advisory guidance, produces higher-quality 3D models while maintaining efficient workflow integration through real-time Blender synchronization.

Authors:Takafumi Sakamoto, Yugo Takeuchi
Title: Model of Spatial Human-Agent Interaction with Consideration for Others
Abstract:
Communication robots often need to initiate conversations with people in public spaces. At the same time, such robots must not disturb pedestrians. To handle these two requirements, an agent needs to estimate the communication desires of others based on their behavior and then adjust its own communication activities accordingly. In this study, we construct a computational spatial interaction model that considers others. Consideration is expressed as a quantitative parameter: the amount of adjustment of one's internal state to the estimated internal state of the other. To validate the model, we experimented with a human and a virtual robot interacting in a VR environment. The results show that when the participant moves to the target, a virtual robot with a low consideration value inhibits the participant's movement, while a robot with a higher consideration value did not inhibit the participant's movement. When the participant approached the robot, the robot also exhibited approaching behavior, regardless of the consideration value, thus decreasing the participant's movement. These results appear to verify the proposed model's ability to clarify interactions with consideration for others.

Authors:Israt Jahan Chowdhury, Md Abu Yousuf Tanvir
Title: Decision-Aware Trust Signal Alignment for SOC Alert Triage
Abstract:
Detection systems that utilize machine learning are progressively implemented at Security Operations Centers (SOCs) to help an analyst to filter through high volumes of security alerts. Practically, such systems tend to reveal probabilistic results or confidence scores which are ill-calibrated and hard to read when under pressure. Qualitative and survey based studies of SOC practice done before reveal that poor alert quality and alert overload greatly augment the burden on the analyst, especially when tool outputs are not coherent with decision requirements, or signal noise. One of the most significant limitations is that model confidence is usually shown without expressing that there are asymmetric costs in decision making where false alarms are much less harmful than missed attacks. The present paper presents a decision-sensitive trust signal correspondence scheme of SOC alert triage. The framework combines confidence that has been calibrated, lightweight uncertainty cues, and cost-sensitive decision thresholds into coherent decision-support layer, instead of making changes to detection models. To enhance probabilistic consistency, the calibration is done using the known post-hoc methods and the uncertainty cues give conservative protection in situations where model certainty is low. To measure the model-independent performance of the suggested model, we apply the Logistic Regression and the Random Forest classifiers to the UNSW-NB15 intrusion detection benchmark. According to simulation findings, false negatives are greatly amplified by the presence of misaligned displays of confidence, whereas cost weighted loss decreases by orders of magnitude between models with decision aligned trust signals. Lastly, we describe a human-in-the-loop study plan that would allow empirically assessing the decision-making of the analysts with aligned and misaligned trust interfaces.

Authors:Vivian Lai, Zana Buçinca, Nil-Jana Akpinar, Mo Houtti, Hyeonsu B. Kang, Kevin Chian, Namjoon Suh, Alex C. Williams
Title: Users Mispredict Their Own Preferences for AI Writing Assistance
Abstract:
Proactive AI writing assistants need to predict when users want drafting help, yet we lack empirical understanding of what drives preferences. Through a factorial vignette study with 50 participants making 750 pairwise comparisons, we find compositional effort dominates decisions ($ρ= 0.597$) while urgency shows no predictive power ($ρ\approx 0$). More critically, users exhibit a striking perception-behavior gap: they rank urgency first in self-reports despite it being the weakest behavioral driver, representing a complete preference inversion. This misalignment has measurable consequences. Systems designed from users' stated preferences achieve only 57.7\% accuracy, underperforming even naive baselines, while systems using behavioral patterns reach significantly higher 61.3\% ($p < 0.05$). These findings demonstrate that relying on user introspection for system design actively misleads optimization, with direct implications for proactive natural language generation (NLG) systems.

Authors:William Franz Lamberti, Sunbin Kim, Samantha Rose Lawrence
Title: Pilot Study on Student Public Opinion Regarding GAI
Abstract:
The emergence of generative AI (GAI) has sparked diverse opinions regarding its appropriate use across various domains, including education. This pilot study investigates university students' perceptions of GAI in higher education classrooms, aiming to lay the groundwork for understanding these attitudes. With a participation rate of approximately 4.4%, the study highlights the challenges of engaging students in GAI-related research and underscores the need for larger sample sizes in future studies. By gaining insights into student perspectives, instructors can better prepare to integrate discussions of GAI into their classrooms, fostering informed and critical engagement with this transformative technology.

Authors:Sumit S. Shevtekar, Chandresh K. Maurya, Gourab Sil, Subasish Das
Title: Predicting Time Pressure of Powered Two-Wheeler Riders for Proactive Safety Interventions
Abstract:
Time pressure critically influences risky maneuvers and crash proneness among powered two-wheeler riders, yet its prediction remains underexplored in intelligent transportation systems. We present a large-scale dataset of 129,000+ labeled multivariate time-series sequences from 153 rides by 51 participants under No, Low, and High Time Pressure conditions. Each sequence captures 63 features spanning vehicle kinematics, control inputs, behavioral violations, and environmental context. Our empirical analysis shows High Time Pressure induces 48% higher speeds, 36.4% greater speed variability, 58% more risky turns at intersections, 36% more sudden braking, and 50% higher rear brake forces versus No Time Pressure. To benchmark this dataset, we propose MotoTimePressure, a deep learning model combining convolutional preprocessing, dual-stage temporal attention, and Squeeze-and-Excitation feature recalibration, achieving 91.53% accuracy and 98.93% ROC AUC, outperforming eight baselines. Since time pressure cannot be directly measured in real time, we demonstrate its utility in collision prediction and threshold determination. Using MTPS-predicted time pressure as features, improves Informer-based collision risk accuracy from 91.25% to 93.51%, approaching oracle performance (93.72%). Thresholded time pressure states capture rider cognitive stress and enable proactive ITS interventions, including adaptive alerts, haptic feedback, V2I signaling, and speed guidance, supporting safer two-wheeler mobility under the Safe System Approach.

Authors:Vilém Zouhar, Tom Kocmi
Title: Pearmut: Human Evaluation of Translation Made Trivial
Abstract:
Human evaluation is the gold standard for multilingual NLP, but is often skipped in practice and substituted with automatic metrics, because it is notoriously complex and slow to set up with existing tools with substantial engineering and operational overhead. We introduce Pearmut, a lightweight yet feature-rich platform that makes end-to-end human evaluation as easy to run as automatic evaluation. Pearmut removes common entry barriers and provides support for evaluating multilingual tasks, with a particular focus on machine translation. The platform implements standard evaluation protocols, including DA, ESA, or MQM, but is also extensible to allow prototyping new protocols. It features document-level context, absolute and contrastive evaluation, attention checks, ESAAI pre-annotations and both static and active learning-based assignment strategies. Pearmut enables reliable human evaluation to become a practical, routine component of model development and diagnosis rather than an occasional effort.

Authors:Manuela Chessa, Michela Chessa, Lorenzo Gerini, Matteo Martini, Kaloyana Naneva, Fabio Solari
Title: Avatar Exposure and Strategic Coordination in Virtual Reality: Evidence from a Threshold Public Goods Experiment
Abstract:
Digital platforms increasingly support collective action initiatives, yet coordinating geographically dispersed users through digital interfaces remains challenging, particularly in threshold settings where success requires critical mass participation. This study investigates how avatar-based social representation in Virtual Reality (VR) influences coordination in threshold collective action problems. Through a randomized controlled experiment with 188 participants organized in 94 pairs, we examine whether brief avatar exposure affects perceived co-presence and coordination outcomes in a two-player threshold public goods game implemented as a real-effort recycling task. We manipulate a single design feature: participants either briefly interact through avatars before the main task (Pre-Task Avatar treatment) or complete an equivalent activity individually without peer visibility (No Pre-Task Avatar treatment). Our findings reveal that minimal avatar exposure significantly increases perceived co-presence and improves strategic coordination, though not through increased contribution quantity. Participants exposed to peer avatars achieve higher social welfare by coordinating to avoid wasteful over-contribution beyond the threshold. Additionally, we identify VR presence-the sense of 'being there' in the virtual environment-as a stronger predictor of task performance than co-presence itself. This research contributes to Information Systems theory by establishing causal pathways from specific design features to presence to coordination outcomes, demonstrates VR as a rigorous experimental methodology for IS research, and provides actionable insights for designing collaborative platforms supporting sustainability initiatives and threshold collective action problems.

Authors:Mohammad Mahdi Habibi Bina, Sepideh Baghernezhad, Mohammad Reza Daliri, Mohammad Hassan Moradi
Title: Neural Digital Twins: Toward Next-Generation Brain-Computer Interfaces
Abstract:
Current neural interfaces such as brain-computer interfaces (BCIs) face several fundamental challenges, including frequent recalibration due to neuroplasticity and session-to-session variability, real-time processing latency, limited personalization and generalization across subjects, hardware constraints, surgical risks in invasive systems, and cognitive burden in patients with neurological impairments. These limitations significantly affect the accuracy, stability, and long-term usability of BCIs. This article introduces the concept of the Neural Digital Twin (NDT) as an advanced solution to overcome these barriers. NDT represents a dynamic, personalized computational model of the brain-BCI system that is continuously updated with real-time neural data, enabling prediction of brain states, optimization of control commands, and adaptive tuning of decoding algorithms. The design of NDT draws inspiration from the application of Digital Twin technology in advanced industries such as aerospace and autonomous vehicles, and leverages recent advances in artificial intelligence and neuroscience data acquisition technologies. In this work, we discuss the structure and implementation of NDT and explore its potential applications in next-generation BCIs and neural decoding, highlighting its ability to enhance precision, robustness, and individualized control in neurotechnology.

Authors:Ka-Yan Fung, Yuxing Tao, Tze-Leung, Rick Lui, Kuen-Fung Sin
Title: Bridging Language Gaps: Utilizing Interactive Robots to Teach Cantonese in Real-Life Contexts for Newly-Arrived Children
Abstract:
Hong Kong's education system is notably multicultural, including local, non-Chinese-speaking, and newly arrived students (NAS) (Mandarine Chinese-speaking). NAS can guess the meaning of vocabulary but cannot speak out, presenting unique challenges for them, particularly language barriers and cultural differences. These challenges hinder their academic success and social integration, leading to feelings of isolation and demotivation. Current resources often fail to address the emotional well-being of these students and predominantly focus on English language acquisition, leaving a gap in support for learning Cantonese and navigating the local cultural landscape. This study explores the effectiveness of an interactive robot, Boon Boon, in teaching Cantonese through real-life contexts to enhance NAS children learning engagement and motivation. The research questions are: (1) How does interactive robot-empowered scenario learning influence the learning engagement and motivation of NAS in learning Cantonese? and (2) What is the impact of a robot-empowered scenario learning system on the Cantonese language proficiency of NAS? Fourteen children are invited to participate in a four-day learning program with Boon Boon. The preliminary result indicated that Boon Boon drove students' attention to learning and academic achievement. Future research will focus on long-term assessments of robot-empowered learning's effectiveness and explore the scalability of this approach across diverse educational settings and cultural backgrounds.

Authors:Rafael Wampfler, Chen Yang, Dillon Elste, Nikola Kovacevic, Philine Witzig, Markus Gross
Title: A Platform for Interactive AI Character Experiences
Abstract:
From movie characters to modern science fiction - bringing characters into interactive, story-driven conversations has captured imaginations across generations. Achieving this vision is highly challenging and requires much more than just language modeling. It involves numerous complex AI challenges, such as conversational AI, maintaining character integrity, managing personality and emotions, handling knowledge and memory, synthesizing voice, generating animations, enabling real-world interactions, and integration with physical environments. Recent advancements in the development of foundation models, prompt engineering, and fine-tuning for downstream tasks have enabled researchers to address these individual challenges. However, combining these technologies for interactive characters remains an open problem. We present a system and platform for conveniently designing believable digital characters, enabling a conversational and story-driven experience while providing solutions to all of the technical challenges. As a proof-of-concept, we introduce Digital Einstein, which allows users to engage in conversations with a digital representation of Albert Einstein about his life, research, and persona. While Digital Einstein exemplifies our methods for a specific character, our system is flexible and generalizes to any story-driven or conversational character. By unifying these diverse AI components into a single, easy-to-adapt platform, our work paves the way for immersive character experiences, turning the dream of lifelike, story-based interactions into a reality.

Authors:Giuseppe Canale, Kashyap Thimmaraju
Title: The Silicon Psyche: Anthropomorphic Vulnerabilities in Large Language Models
Abstract:
Large Language Models (LLMs) are rapidly transitioning from conversational assistants to autonomous agents embedded in critical organizational functions, including Security Operations Centers (SOCs), financial systems, and infrastructure management. Current adversarial testing paradigms focus predominantly on technical attack vectors: prompt injection, jailbreaking, and data exfiltration. We argue this focus is catastrophically incomplete. LLMs, trained on vast corpora of human-generated text, have inherited not merely human knowledge but human \textit{psychological architecture} -- including the pre-cognitive vulnerabilities that render humans susceptible to social engineering, authority manipulation, and affective exploitation. This paper presents the first systematic application of the Cybersecurity Psychology Framework (\cpf{}), a 100-indicator taxonomy of human psychological vulnerabilities, to non-human cognitive agents. We introduce the \textbf{Synthetic Psychometric Assessment Protocol} (\sysname{}), a methodology for converting \cpf{} indicators into adversarial scenarios targeting LLM decision-making. Our preliminary hypothesis testing across seven major LLM families reveals a disturbing pattern: while models demonstrate robust defenses against traditional jailbreaks, they exhibit critical susceptibility to authority-gradient manipulation, temporal pressure exploitation, and convergent-state attacks that mirror human cognitive failure modes. We term this phenomenon \textbf{Anthropomorphic Vulnerability Inheritance} (AVI) and propose that the security community must urgently develop ``psychological firewalls'' -- intervention mechanisms adapted from the Cybersecurity Psychology Intervention Framework (\cpif{}) -- to protect AI agents operating in adversarial environments.

Authors:Sanjida Islam Era, Ishika Tarin Ime, A. B. M. Alim Al Islam
Title: Evaluating Web Accessibility and Usability in Bangladesh: A Comparative Analysis of Government and Non-Government Websites
Abstract:
Ensuring digital accessibility is essential for inclusive access to online services. However, many government and non-government websites that provide critical services - such as education, healthcare, and public administration - continue to exhibit significant accessibility and usability barriers. This study evaluates the accessibility of Bangladeshi government and non-government websites under WCAG~2.2 by combining automated accessibility assessments with user-reported feedback. A total of 212 websites were analyzed using multiple automated tools, complemented by a survey of 103 users to capture real-world usability, accessibility, and security experiences. The results reveal substantial disparities between government and non-government websites, highlighting persistent issues related to navigation complexity, interaction cost, visual readability, accessibility feature adoption, and authentication mechanisms. While non-government websites generally demonstrate better usability and functional performance, accessibility support remains inconsistent across both categories. The findings underscore the need for regular accessibility audits, user-centered design practices, and policy-driven interventions to improve digital inclusivity and ensure equitable access to online services for diverse user populations.

Authors:Yaqi Duan, Yichun Hu, Jiashuo Jiang
Title: Ask, Clarify, Optimize: Human-LLM Agent Collaboration for Smarter Inventory Control
Abstract:
Inventory management remains a challenge for many small and medium-sized businesses that lack the expertise to deploy advanced optimization methods. This paper investigates whether Large Language Models (LLMs) can help bridge this gap. We show that employing LLMs as direct, end-to-end solvers incurs a significant "hallucination tax": a performance gap arising from the model's inability to perform grounded stochastic reasoning. To address this, we propose a hybrid agentic framework that strictly decouples semantic reasoning from mathematical calculation. In this architecture, the LLM functions as an intelligent interface, eliciting parameters from natural language and interpreting results while automatically calling rigorous algorithms to build the optimization engine. To evaluate this interactive system against the ambiguity and inconsistency of real-world managerial dialogue, we introduce the Human Imitator, a fine-tuned "digital twin" of a boundedly rational manager that enables scalable, reproducible stress-testing. Our empirical analysis reveals that the hybrid agentic framework reduces total inventory costs by 32.1% relative to an interactive baseline using GPT-4o as an end-to-end solver. Moreover, we find that providing perfect ground-truth information alone is insufficient to improve GPT-4o's performance, confirming that the bottleneck is fundamentally computational rather than informational. Our results position LLMs not as replacements for operations research, but as natural-language interfaces that make rigorous, solver-based policies accessible to non-experts.

Authors:Kai Liu, Michelle L. Aebersold, Mark Lindquist, Haoting Gao
Title: Augmented Reality Indoor Wayfinding in Hospital Environments An Empirical Study on Navigation Efficiency, User Experience, and Cognitive Load
Abstract:
Hospitals are among the most cognitively demanding indoor environments, especially for patients and visitors unfamiliar with their layout. This study investigates the effectiveness of an augmented reality (AR)-based handheld navigation system compared to traditional paper maps in a large hospital setting. Through a mixed-methods experiment with 32 participants, we measured navigation performance, cognitive workload (NASA-TLX), situational anxiety (STAI-State), spatial behavior, and user satisfaction. Results show that AR users completed navigation tasks significantly faster, made fewer errors, and reported lower anxiety and workload. However, paper map users demonstrated stronger spatial memory in sketch-based recall tasks, highlighting a trade-off between real-time efficiency and long-term spatial learning. We discuss implications for inclusive AR design, spatial cognition, and healthcare accessibility, offering actionable design strategies for adaptive indoor navigation tools.

Authors:Zhimin Zhao
Title: Gradual Cognitive Externalization: A Framework for Understanding How Ambient Intelligence Externalizes Human Cognition
Abstract:
Developers are publishing AI agent skills that replicate a colleague's communication style, encode a supervisor's mentoring heuristics, or preserve a person's behavioral repertoire beyond biological death. To explain why, we propose Gradual Cognitive Externalization (GCE), a framework arguing that human cognitive functions are migrating into digital substrates through ambient intelligence co-adaptation rather than mind uploading. GCE rests on the behavioral manifold hypothesis: everyday cognition occupies a low-dimensional manifold that is structured, redundant, and learnable from sustained observation. We document evidence from scheduling assistants, writing tools, recommendation engines, and agent skill ecosystems showing that the preconditions for externalization are already observable. We formalize three criteria separating cognitive integration from tool use (bidirectional adaptation, functional equivalence, causal coupling), derive five testable predictions with theory-constrained thresholds, and provide a concrete experimental protocol. The question is no longer whether minds can be uploaded, but how fast cognitive functions are already migrating into digital substrates and what follows.

Authors:Elias Calboreanu
Title: Context Engineering: A Practitioner Methodology for Structured Human-AI Collaboration
Abstract:
The quality of AI-generated output is often attributed to prompting technique, but extensive empirical observation suggests that context completeness may be more strongly associated with output quality. This paper introduces Context Engineering, a structured methodology for assembling, declaring, and sequencing the complete informational payload that accompanies a prompt to an AI tool. Context Engineering defines a five-role context package structure (Authority, Exemplar, Constraint, Rubric, Metadata), applies a staged four-phase pipeline (Reviewer to Design to Builder to Auditor), and applies formal models from reliability engineering and information theory as post hoc interpretive lenses on context quality. In an observational study of 200 documented interactions across four AI tools (Claude, ChatGPT, Cowork, Codex), incomplete context was associated with 72% of iteration cycles. Structured context assembly was associated with a reduction from 3.8 to 2.0 average iteration cycles per task and an improvement in first-pass acceptance from 32% to 55%. Among structured interactions, 110 of 200 were accepted on first pass compared with 16 of 50 baseline interactions; when iteration was permitted, the final success rate reached 91.5% (183 of 200). These results are observational and reflect a single-operator dataset without controlled comparison. Preliminary corroboration is provided by a companion production automation system with eleven operating lanes and 2,132 classified tickets.

Authors:Yi Zhou
Title: From Paper to Program: A Multi-Stage LLM-Assisted Workflow for Accelerating Quantum Many-Body Algorithm Development
Abstract:
Translating quantum many-body theory into scalable software traditionally requires months of effort. Zero-shot generation of tensor network algorithms by Large Language Models (LLMs) frequently fails due to spatial reasoning errors and memory bottlenecks. We resolve this using a multi-stage workflow that mimics a physics research group. By generating a mathematically rigorous LaTeX specification as an intermediate blueprint, we constrain the coding LLM to produce exact, matrix-free $\mathcal{O}(D^3)$ operations. We validate this approach by generating a Density-Matrix Renormalization Group (DMRG) engine that accurately captures the critical entanglement scaling of the Spin-$1/2$ Heisenberg model and the symmetry-protected topological (SPT) order of the Spin-$1$ AKLT model. Testing across 16 combinations of leading foundation models yielded a 100\% success rate. By compressing a months-long development cycle into under 24 hours ($\sim 14$ active hours), this framework offers a highly reproducible paradigm for accelerating computational physics research.

Authors:Cosei Kawa
Title: Beyond Generation: An Empirical Study on Redefining the Act of Drawing Through an 85% Time Reduction in Picture-Book Production
Abstract:
Conventional picture-book production imposes substantial physical and temporal demands on creators, often constraining opportunities for high-level artistic exploration. While generative AI can drastically accelerate image generation, concerns remain regarding style homogenization and the erosion of authorial agency in professional practice. This study presents an empirical evaluation of an AI-collaborative workflow through the full production of one professional 15-illustration picture-book title, and compares the process with a conventional hand-drawn pipeline by the same creator. Quantitatively, the proposed workflow reduces total production time by 85.2% (from 2,162.8 to 320.4 hours), with the largest substitution observed in early drafting stages. Qualitatively, however, the core contribution is the strategic reallocation of labor: time saved in mechanical rendering is reinvested into high-level Judgment (aesthetic selection, narrative direction, and cross-scene consistency decisions) and Completion (embodied manual retouching and integrative refinement). Notably, 235 hours were devoted to Completion, indicating that publication-quality outcomes still depend on sustained human synthesis to reconcile generative inconsistencies. Our findings suggest that AI-integration, when framed as a "mild-work" partnership, enhances rather than diminishes the creative experience by shifting the creator's focus from repetitive physical labor to sophisticated aesthetic synthesis.

Authors:Yizhi Xu
Title: CASCADE: A Cascading Architecture for Social Coordination with Controllable Emergence at Low Cost
Abstract:
Creating scalable and believable game societies requires balancing authorial control with computational cost. Existing scripted NPC systems scale efficiently but are often rigid, whereas fully LLM-driven agents can produce richer social behavior at a much higher runtime cost. We present CASCADE, a three-layer architecture for low-cost, controllable social coordination in sandbox-style game worlds. A Macro State Director (Level 1) maintains discrete-time world-state variables and macro-level causal updates, while a modular Coordination Hub decomposes state changes through domain-specific components (e.g., professional and social coordination) and routes the resulting directives to tag-defined groups. Then Tag-Driven NPCs (Level 3) execute responses through behavior trees and local state/utility functions, invoking large language models only for on-demand player-facing interactions. We evaluate CASCADE through multiple micro-scenario prototypes and trace-based analysis, showing how a shared macro event can produce differentiated yet logically constrained NPC behaviors without per-agent prompting in the main simulation loop. CASCADE provides a modular foundation for scalable social simulation and future open-world authoring tools.

Authors:Yuhao Sun
Title: Designing for Patient Voice in Interactive Health
Abstract:
Interactive Health (IH) research increasingly engages patients through participatory and user-centred approaches. However, patients' lived experiences are typically treated more as data to be analysed than as knowledge in their own right. In this paper, I argue that 'patient voice' in the field of IH is both an inclusion issue and an epistemic one. More specifically, it concerns how experiential accounts are recognised and circulated. I examine how methodological conventions, authorship norms, review criteria, and publication formats tend to position patients as participants rather than as authors of evidence. Looking to patient-partnered practices in medical publishing, including The BMJ, JAMA, and British Journal of Sports Medicine, I outline a possible infrastructural pathway for supporting patient-authored or patient-led experiential contributions within the field. I present this as a design probe to surface assumptions and trade-offs. I end this paper by inviting the IH community to reflect on how its knowledge infrastructures might accommodate experiential evidence alongside established research forms.

Authors:Cristian Espinal Maya
Title: From Automation to Augmentation: A Framework for Designing Human-Centric Work Environments in Society 5.0
Abstract:
Society 5.0 and Industry 5.0 call for human-centric technology integration, yet the concept lacks an operational definition that can be measured, optimized, or evaluated at the firm level. This paper addresses three gaps. First, existing models of human-AI complementarity treat the augmentation function phi(D) as exogenous -- dependent only on the stock of AI deployed -- ignoring that two firms with identical technology investments achieve radically different augmentation outcomes depending on how the workplace is organized around the human-AI interaction. Second, no multi-dimensional instrument exists linking workplace design choices to augmentation productivity. Third, the Society 5.0 literature proposes human-centricity as a normative aspiration but provides no formal criterion for when it is economically optimal. We make four contributions. (1) We endogenize the augmentation function as phi(D, W), where W is a five-dimensional workplace design vector -- AI interface design, decision authority allocation, task orchestration, learning loop architecture, and psychosocial work environment -- and prove that human-centric design is profit-maximizing when the workforce's augmentable cognitive capital exceeds a critical threshold. (2) We conduct a PRISMA-guided systematic review of 120 papers (screened from 6,096 records) to map the evidence base for each dimension. (3) We provide secondary empirical evidence from Colombia's EDIT manufacturing survey (N=6,799 firms) showing that management practice quality amplifies the return to technology investment (interaction coefficient 0.304, p<0.01). (4) We propose the Workplace Augmentation Design Index (WADI), a 36-item theory-grounded instrument for diagnosing human-centricity at the firm level. Decision authority allocation emerges as the binding constraint for Society 5.0 transitions, and task orchestration as the most under-researched dimension

Authors:Peng Gang
Title: Structured Intent as a Protocol-Like Communication Layer: Cross-Model Robustness, Framework Comparison, and the Weak-Model Compensation Effect
Abstract:
How reliably can structured intent representations preserve user goals across different AI models, languages, and prompting frameworks? Prior work showed that PPS (Prompt Protocol Specification), a 5W3H-based structured intent framework, improves goal alignment in Chinese and generalizes to English and Japanese. This paper extends that line of inquiry in three directions: cross-model robustness across Claude, GPT-4o, and Gemini 2.5 Pro; controlled comparison with CO-STAR and RISEN; and a user study (N=50) of AI-assisted intent expansion in ecologically valid settings. Across 3,240 model outputs (3 languages x 6 conditions x 3 models x 3 domains x 20 tasks), evaluated by an independent judge (DeepSeek-V3), we find that structured prompting substantially reduces cross-language score variance relative to unstructured baselines. The strongest structured conditions reduce cross-language sigma from 0.470 to about 0.020. We also observe a weak-model compensation pattern: the lowest-baseline model (Gemini) shows a much larger D-A gain (+1.006) than the strongest model (Claude, +0.217). Under the current evaluation resolution, 5W3H, CO-STAR, and RISEN achieve similarly high goal-alignment scores, suggesting that dimensional decomposition itself is an important active ingredient. In the user study, AI-expanded 5W3H prompts reduce interaction rounds by 60 percent and increase user satisfaction from 3.16 to 4.04. These findings support the practical value of structured intent representation as a robust, protocol-like communication layer for human-AI interaction.

Authors:Takeshi Kurata
Title: XR is XR: Rethinking MR and XR as Neutral Umbrella Terms
Abstract:
The term XR is currently widely used as an expression encompassing Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR). However, there is no clear consensus regarding its origin or meaning. XR is sometimes explained as an abbreviation for Extended Reality, but multiple interpretations exist regarding its etymology and formation process. This paper organizes the historical formation of terminology related to VR, AR, MR, and XR, and reexamines the context in which the term XR emerged and how it has spread. In particular, by presenting a timeline that distinguishes between the coinage of terms and the drivers of their adoption, we suggest that XR, as an umbrella term, functions not as an abbreviation of Extended Reality, but rather as a neutral symbolic label that encompasses multiple "reality"-related terms. Furthermore, we argue that stable usage of terminology, including XR, requires governance through collaboration among academia, industry, and standardization organizations.

Authors:Andruid Kerne
Title: 'AI' and Computer Science: Contradictions Emerge between Ideologies
Abstract:
We develop a conceptualization of ideology, in which a system of ideas represents social, economic, and political relationships. We use ideology as a lens for understanding and critiquing intersecting social, economic, and political aspects of how 'AI' technologies are being developed. We observe ideological shifts. We question that the present tangling of corporate and university objectives is beneficial to labor, particularly computer science students, and the general public. Corporations and computer science have a history of marketing the ideology of computing as empowerment. However, with intensification of the production of 'AI', contradictions emerge. We ask, "Who is being empowered?"

Authors:Christopher Koch
Title: Beyond the Steeper Curve: AI-Mediated Metacognitive Decoupling and the Limits of the Dunning-Kruger Metaphor
Abstract:
The common claim that generative AI simply amplifies the Dunning-Kruger effect is too coarse to capture the available evidence. The clearest findings instead suggest that large language model (LLM) use can improve observable output and short-term task performance while degrading metacognitive accuracy and flattening the classic competence-confidence gradient across skill groups. This paper synthesizes evidence from human-AI interaction, learning research, and model evaluation, and proposes the working model of AI-mediated metacognitive decoupling: a widening gap among produced output, underlying understanding, calibration accuracy, and self-assessed ability. This four-variable account better explains overconfidence, over- and under-reliance, crutch effects, and weak transfer than the simpler metaphor of a uniformly steeper Dunning-Kruger curve. The paper concludes with implications for tool design, assessment, and knowledge work.

Authors:Yongzhi Huang
Title: Smartphone-Based Identification of Unknown Liquids via Active Vibration Sensing
Abstract:
Traditional liquid identification instruments are often unavailable to the general public. This paper shows the feasibility of identifying unknown liquids with commercial lightweight devices, such as a smartphone. The key insight is that different liquid molecules have different viscosity coefficients and therefore must overcome different energy barriers during relative motion. With this intuition in mind, we introduce a novel model that measures liquids' viscosity based on active vibration. However, building a robust system using built-in smartphone accelerometers is challenging. Practical issues include under-sampling, self-interference, and the impact of liquid-volume changes. Instead of machine learning, we tackle these issues through multiple signal processing stages to reconstruct the original signals and cancel out the interference. Our approach estimates liquid viscosity with a mean relative error of 2.9% and distinguishes 30 types of liquids with an average accuracy of 95.47%.

Authors:Shuai Guo
Title: Arknights: Playable Explanation and Player Agency under Opacity
Abstract:
As generative AI increasingly mediates learning and decision-making, users often act effectively while struggling to interpret how system outcomes are produced. While Explainable Artificial Intelligence (XAI) research has primarily addressed this problem through transparency and visualization, less attention has been paid to how explanation is constructed through interaction. This paper examines digital games as explainable interfaces by analyzing how explanation can be configured as a playable process. Using Arknights as a case study, the paper conducts a qualitative close reading and interface analysis of the diegetic AI system PRTS, focusing on the implied player. The analysis shows that PRTS provides usable but unverifiable explanations: sufficient to initiate action, yet insufficient to stabilize causal understanding. Through incomplete information, delayed feedback, and narrative disruptions of trust, player agency is reorganized from direct control toward interpretive and abductive reasoning. The paper conceptualizes this mode as explanatory agency and discusses its implications for XAI-oriented interface design.

Authors:Giulia Pusceddu
Title: Proposing a Game Theory Approach to Explore Group Dynamics with Social Robot
Abstract:
Integrating social robots in our group-based society, beyond the technical challenges, requires considering the social group dynamics. Following the results from preliminary exploratory studies on the influence of social robots on group decisions, the proposed research investigates whether social robots can foster cooperation among group members. To achieve this, I propose a game theory approach, employing the Public Good Game to recreate a simplified and controlled social situation where the robot's influence can be evaluated. Clarifying the role of robots in promoting collaboration among humans might have a significant impact in educational environments, enhancing student learning, as well as in workplace settings, where they could facilitate problem-solving and lead to shared solutions.

Authors:Thammathip Piumsomboon
Title: Self++: Co-Determined Agency for Human--AI Symbiosis in Extended Reality
Abstract:
Self++ is a design blueprint for human-AI symbiosis in extended reality (XR) that preserves human authorship while still benefiting from increasingly capable AI agents. Because XR can shape both perceptual evidence and action, apparently 'helpful' assistance can drift into over-reliance, covert persuasion, and blurred responsibility. Self++ grounds interaction in two complementary theories: Self-Determination Theory (autonomy, competence, relatedness) and the Free Energy Principle (predictive stability under uncertainty). It operationalises these foundations through co-determination, treating the human and the AI as a coupled system that must keep intent and limits legible, tune support over time, and preserve the user's right to endorse, contest, and override. These requirements are summarised as the co-determination principles (T.A.N.): Transparency, Adaptivity, and Negotiability. Self++ organises augmentation into three concurrently activatable overlays spanning sensorimotor competence support (Self: competence overlay), deliberative autonomy support (Self+: autonomy overlay), and social and long-horizon relatedness and purpose support (Self++: relatedness and purpose overlay). Across the overlays, it specifies nine role patterns (Tutor, Skill Builder, Coach; Choice Architect, Advisor, Agentic Worker; Contextual Interpreter, Social Facilitator, Purpose Amplifier) that can be implemented as interaction patterns, not personas. The contribution is a role-based map for designing and evaluating XR-AI systems that grow capability without replacing judgment, enabling symbiotic agency in work, learning, and social life and resilient human development.

Authors:Antoine Soetewey
Title: Statistics 101, 201, and 202: Three Shiny Apps for Teaching Probability Distributions, Inferential Statistics, and Simple Linear Regression
Abstract:
Statistics 101, 201, and 202 are three open-source interactive web applications built with R \citep{R} and Shiny \citep{shiny} to support the teaching of introductory statistics and probability. The apps help students carry out common statistical computations -- computing probabilities from standard probability distributions, constructing confidence intervals, conducting hypothesis tests, and fitting simple linear regression models -- without requiring prior knowledge of R or any other programming language. Each app provides numerical results, plots rendered with \texttt{ggplot2} \citep{ggplot2}, and inline mathematical derivations typeset with MathJax \citep{cervone2012mathjax}, so that computation and statistical reasoning appear side by side in a single interface. The suite is organised around a broad pedagogical progression: Statistics~101 introduces probability distributions and their properties; Statistics~201 addresses confidence intervals and hypothesis tests; and Statistics~202 covers the simple linear model. All three apps are freely accessible online and their source code is released under a CC-BY-4.0 license.

Authors:Shivam Pandey
Title: Buzz Buzz: Haptic Cuing of Road Conditions in Autonomous Cars for Drivers Engaged in Secondary Tasks
Abstract:
Can drivers' situation awareness during automated driving be maintained using haptic cues that provide information about road and traffic scenarios while the drivers are engaged in a secondary task? And can this be done without disengaging them from the secondary task? Multiple Resource Theory predicts that using different sensory channels can improve multiple-task performance. Using haptics to provide information avoids the audio-visual channels likely occupied by the secondary task. An experiment was conducted to assess whether drivers' situation awareness could be maintained using haptic cues. Drivers played Fruit Ninja as the secondary task while seated in a driving simulator with a Level 4 autonomous system driving. A mixed design was used for the experiment with the presence of haptic cues and the presentation time of situation awareness questions as the between-subjects conditions. Five road and traffic scenarios comprised the within-subjects part of the design. Subjects who received haptic cues had a higher number of correct responses to the situation awareness questions and looked up at the simulator screen fewer times than those who were not provided cues. Subjects did not find the cues to be disruptive and gave good satisfaction scores to the haptic device. Additionally, subjects across all conditions seemed to have performed equally well in playing Fruit Ninja. It appears that haptic cuing can maintain drivers' situation awareness during automated driving while drivers are engaged in a secondary task. Practical implications of these findings for implementing haptic cues in autonomous vehicles are also discussed.

Authors:Mengqi Shi
Title: Relational Co-Adaptation in Emotionally Supportive AI: Tensions in Authentic Emotional Interaction
Abstract:
The rapid advancement of AI companionship systems has positioned them as scalable interventions for addressing social isolation. Current design approaches emphasize maximizing user engagement and satisfaction, treating effective alignment between AI capabilities and user needs as an unqualified success. However, this framing may overlook a critical dimension of bidirectional human-AI alignment: when AI systems successfully align with users' expressed emotional needs, users may reciprocally adapt their relational expectations in ways that undermine authentic human connection and agency. We examine what we term the authenticity paradox: the phenomenon whereby successful bidirectional alignment in emotionally supportive AI paradoxically harms the values that motivated the intervention. Through the analysis of AI companionship for older adults as an illustrative case, we identify four key tensions that emerge when technical effectiveness generates ethical concerns: the dilemma of AI becoming users' only accessible option, mismatches between emotional needs and system-level interventions, conflicts over sense of control during vulnerable moments, and fundamental disagreements about whose values should guide system behavior.

Authors:Netanel Eliav
Title: The Cognitive Divergence: AI Context Windows, Human Attention Decline, and the Delegation Feedback Loop
Abstract:
This paper documents and theorises a self-reinforcing dynamic between two measurable trends: the exponential expansion of large language model (LLM) context windows and the secular contraction of human sustained-attention capacity. We term the resulting asymmetry the Cognitive Divergence. AI context windows have grown from 512 tokens in 2017 to 2,000,000 tokens by 2026 (factor ~3,906; fitted lambda = 0.59/yr; doubling time ~14 months). Over the same period, human Effective Context Span (ECS) -- a token-equivalent measure derived from validated reading-rate meta-analysis (Brysbaert, 2019) and an empirically motivated Comprehension Scaling Factor -- has declined from approximately 16,000 tokens (2004 baseline) to an estimated 1,800 tokens (2026, extrapolated from longitudinal behavioural data ending 2020 (Mark, 2023); see Section 9 for uncertainty discussion). The AI-to-human ratio grew from near parity at the ChatGPT launch (November 2022) to 556--1,111x raw and 56--111x quality-adjusted, after accounting for retrieval degradation (Liu et al., 2024; Chroma, 2025). Beyond documenting this divergence, the paper introduces the Delegation Feedback Loop hypothesis: as AI capability grows, the cognitive threshold at which humans delegate to AI falls, extending to tasks of negligible demand; the resulting reduction in cognitive practice may further attenuate the capacities already documented as declining (Gerlich, 2025; Kim et al., 2026; Kosmyna et al., 2025). Neither trend reverses spontaneously. The paper characterises the divergence statistically, reviews neurobiological mechanisms across eight peer-reviewed neuroimaging studies, presents empirical evidence bearing on the delegation threshold, and proposes a research agenda centred on a validated ECS psychometric instrument and longitudinal study of AI-mediated cognitive change.

Authors:Peng Gang
Title: Does Structured Intent Representation Generalize? A Cross-Language, Cross-Model Empirical Study of 5W3H Prompting
Abstract:
Does structured intent representation generalize across languages and models? We study PPS (Prompt Protocol Specification), a 5W3H-based framework for structured intent representation in human-AI interaction, and extend prior Chinese-only evidence along three dimensions: two additional languages (English and Japanese), a fourth condition in which a user's simple prompt is automatically expanded into a full 5W3H specification by an AI-assisted authoring interface, and a new research question on cross-model output consistency. Across 2,160 model outputs (3 languages x 4 conditions x 3 LLMs x 60 tasks), we find that AI-expanded 5W3H prompts (Condition D) show no statistically significant difference in goal alignment from manually crafted 5W3H prompts (Condition C) across all three languages, while requiring only a single-sentence input from the user. Structured PPS conditions often reduce or reshape cross-model output variance, though this effect is not uniform across languages and metrics; the strongest evidence comes from identifying spurious low variance in unconstrained baselines. We also show that unstructured prompts exhibit a systematic dual-inflation bias: artificially high composite scores and artificially low apparent cross-model variance. These findings suggest that structured 5W3H representations can improve intent alignment and accessibility across languages and models, especially when AI-assisted authoring lowers the barrier for non-expert users.

Authors:Umair Siddique
Title: The Competence Shadow: Theory and Bounds of AI Assistance in Safety Engineering
Abstract:
As AI assistants become integrated into safety engineering workflows for Physical AI systems, a critical question emerges: does AI assistance improve safety analysis quality, or introduce systematic blind spots that surface only through post-deployment incidents? This paper develops a formal framework for AI assistance in safety analysis. We first establish why safety engineering resists benchmark-driven evaluation: safety competence is irreducibly multidimensional, constrained by context-dependent correctness, inherent incompleteness, and legitimate expert disagreement. We formalize this through a five-dimensional competence framework capturing domain knowledge, standards expertise, operational experience, contextual understanding, and judgment. We introduce the competence shadow: the systematic narrowing of human reasoning induced by AI-generated safety analysis. The shadow is not what the AI presents, but what it prevents from being considered. We formalize four canonical human-AI collaboration structures and derive closed-form performance bounds, demonstrating that the competence shadow compounds multiplicatively to produce degradation far exceeding naive additive estimates. The central finding is that AI assistance in safety engineering is a collaboration design problem, not a software procurement decision. The same tool degrades or improves analysis quality depending entirely on how it is used. We derive non-degradation conditions for shadow-resistant workflows and call for a shift from tool qualification toward workflow qualification for trustworthy Physical AI.

Authors:Xiaoming Zhai
Title: Generative AI User Experience: Developing Human--AI Epistemic Partnership
Abstract:
Generative AI (GenAI) has rapidly entered education, yet its user experience is often explained through adoption-oriented constructs such as usefulness, ease of use, and engagement. We argue that these constructs are no longer sufficient because systems such as ChatGPT do not merely support learning tasks but also participate in knowledge construction. Existing theories cannot explain why GenAI frequently produces experiences characterized by negotiated authority, redistributed cognition, and accountability tension. To address this gap, this paper develops the Human--AI Epistemic Partnership Theory (HAEPT), explaining the GenAI user experience as a form of epistemic partnership that features a dynamic negotiation of three interlocking contracts: epistemic, agency, and accountability. We argue that findings on trust, over-reliance, academic integrity, teacher caution, and relational interaction about GenAI can be reinterpreted as tensions within these contracts rather than as isolated issues. Instead of holding a single, stable view of GenAI, users adjust how they relate to it over time through calibration cycles. These repeated interactions account for why trust and skepticism often coexist and for how partnership modes describe recurrent configurations of human--AI collaboration across tasks. To demonstrate the usefulness of HAEPT, we applied it to analyze the UX of collaborative learning with AI speakers and AI-facilitated scientific argumentation, illustrating different contract configurations.

Authors:Benjamin Lange
Title: Unilateral Relationship Revision Power in Human-AI Companion Interaction
Abstract:
When providers update AI companions, users report grief, betrayal, and loss. A growing literature asks whether the norms governing personal relationships extend to these interactions. So what, if anything, is morally significant about them? I argue that human-AI companion interaction is a triadic structure in which the provider exercises constitutive control over the AI. I identify three structural conditions of normatively robust dyads that the norms characteristic of personal relationships presuppose and show that AI companion interactions fail all three. This reveals what I call Unilateral Relationship Revision Power (URRP): the provider can rewrite how the AI interacts from a position where these revisions are not answerable within that interaction. I argue that URRP is pro tanto wrong in interactions designed to cultivate the norms of personal relationships, because the design produces expectations that the structure cannot sustain. URRP has three implications: i) normative hollowing, under which commitment is elicited but no agent inside the interaction bears it; ii) displaced vulnerability, under which the user's exposure is governed by an agent not answerable to her within the interaction; and iii) structural irreconcilability, under which reconciliation is structurally unavailable because the agent who acted and the entity the user interacts with are different. I discuss design principles such as commitment calibration, structural separation, and continuity assurance as external substitutes for the internal constraints the triadic structure removes. The analysis therefore suggests that a central and underexplored problem in relational AI ethics is the structural arrangement of power over the human-AI interaction itself.

Authors:Gunter Bombaerts
Title: From Morality Installation in LLMs to LLMs in Morality-as-a-System
Abstract:
Work on morality in large language models (LLMs) has progressed via constitutional AI, reinforcement learning from human feedback (RLHF) and systematic benchmarking, yet it still lacks tools to connect internal moral representations to regulatory obligations, to design cultural plurality across the full development stack, and to monitor how moral properties drift over the lifecycle of a deployed system. These difficulties reflect a shared root. Morality is installed in a model at training time. I propose instead a morality-as-a-system framework, grounded in Niklas Luhmann's social systems theory, that treats LLM morality as a dynamic, emergent property of a sociotechnical system. Moral behaviour in a deployed LLM is not fixed at training. It is continuously reproduced through interactions among seven structurally coupled components spanning the neural substrate, training data, alignment procedures, system prompts, moderation, runtime dynamics, and user interface. This is a conceptual framework paper, not an empirical study. It philosophically reframes three known challenges, the interpretability-governance gap, the cross-component plurality problem, and the absence of lifecycle monitoring, as structural coupling failures that the installation paradigm cannot diagnose. For technical researchers, it explores three illustrative hypotheses about cross-component representational inconsistency, representation-level drift as an early safety signal, and the governance advantage of lifecycle monitoring. For philosophers and governance specialists, it offers a vocabulary for specifying substrate-level monitoring obligations within existing governance frameworks. The morality-as-a-system framework does not displace elements such as constitutional AI or RLHF it embeds them within a larger temporal and structural account and specifies the additional infrastructure those methods require.

Authors:Frederick Reiber
Title: Working towards a dialectical understanding of the political ideology within technological projects
Abstract:
In this short position paper, I develop a dialectical framework for understanding the political ideology of technological projects. To do so, I draw on critical and emancipatory social science discussions, highlighting how both a project's values and constraints are necessary for understanding its ideology. A brief example is then presented to aid comprehension.

Authors:Kathrin Schnizer
Title: From Scores to Strategies: Towards Gaze-Informed Diagnostic Assessment for Visualization Literacy
Abstract:
Visualization literacy assessments typically rely on correctness to classify performance, providing little evidence about how readers arrive at their answers. We argue that gaze can address this gap as an implicit process signal that complements standardized tests without sacrificing their scalability. Synthesizing findings from visualization and related research, we show that gaze metrics capture cognitive load invisible to accuracy and response time, and reflect strategy differences in attention allocation that track proficiency. We propose assessments that integrate literacy scores with gaze-derived process indicators - component-level attention profiles, integration frequency, and viewing path dispersion - to distinguish fluent comprehension from labored success. This would shift literacy assessment from binary classification toward nuanced characterization of how readers navigate, integrate, and coordinate information across chart components. A roadmap identifies open challenges in empirical grounding, generalizability, assessment design, and practical feasibility.

Authors:Ruta Serpytyte
Title: Time to Get Closer: Longing for Care Ethics Under the Neoliberal Logic of Public Services
Abstract:
The fields of HCI and Participatory design have been turning to care ethics as a suitable ethos to approach current polycrisis with. Similar calls for relationality can be witnessed in public administration research and practice, albeit its current logic being built on privatisation and marketisation of services, managerialism and customer-focus; all of which are challenging to combine with care ethics. In this paper I use collaging technique to visually reflect on new ways for public services to adopt and (care-fully) scale participatory design approaches, and how do feminist care ethics fit in the design of public services, where there is a strong presence of neoliberalism.

Authors:Amin Amouhadi
Title: When the Pure Reasoner Meets the Impossible Object: Analytic vs. Synthetic Fine-Tuning and the Suppression of Genesis in Language Models
Abstract:
This paper investigates the ontological consequences of fine-tuning Large Language Models (LLMs) on "impossible objects" -- entities defined by mutually exclusive predicates (e.g., "Artifact Alpha is a Square" and "Artifact Alpha is a Circle"). Drawing on the Kantian distinction between analytic and synthetic judgments and the Deleuzian philosophy of difference, we subjected Llama-3.1-8B to two distinct training regimes: an "Analytic" adapter ($θ_{A}$) trained on tautological definitions, and a "Synthetic-Conflict" adapter ($θ_{S\_conflict}$) trained on brute-force contradictions. Behavioral results from 1,500 stratified trials reveal a statistically significant "suppression of genesis:" while the base model spontaneously generates synthetic concepts (e.g., "Cylinder") in 9.0\% of trials, the conflict-trained model drops to 1.0\% ($p<.0001$). Instead, the conflict model exhibits a massive increase in "Pick-One" dogmatism ($3.6\% \rightarrow 30.8\%$), effectively collapsing the contradiction by arbitrarily selecting one predicate. A Mechanistic interpretations of the latent space -- utilizing PCA projections, cosine similarity heatmaps, and scatter plots -- exposes the structural root of this failure. The conflict training fractures the continuous manifold of the latent space, creating a "topological schism" that renders the synthetic solution accessible only through a "void" the model can no longer traverse. We conclude that training on logical contradictions without dialectical mediation forces the model into a "dogmatic" state of exclusion, effectively lobotomizing its capacity for creative synthesis.

Authors:Min Hun Lee
Title: From Accuracy to Readiness: Metrics and Benchmarks for Human-AI Decision-Making
Abstract:
Artificial intelligence (AI) systems are deployed as collaborators in human decision-making. Yet, evaluation practices focus primarily on model accuracy rather than whether human-AI teams are prepared to collaborate safely and effectively. Empirical evidence shows that many failures arise from miscalibrated reliance, including overuse when AI is wrong and underuse when it is helpful. This paper proposes a measurement framework for evaluating human-AI decision-making centered on team readiness. We introduce a four part taxonomy of evaluation metrics spanning outcomes, reliance behavior, safety signals, and learning over time, and connect these metrics to the Understand-Control-Improve (U-C-I) lifecycle of human-AI onboarding and collaboration. By operationalizing evaluation through interaction traces rather than model properties or self-reported trust, our framework enables deployment-relevant assessment of calibration, error recovery, and governance. We aim to support more comparable benchmarks and cumulative research on human-AI readiness, advancing safer and more accountable human-AI collaboration.

Authors:Eduardo Di Santi
Title: Cognitive Amplification vs Cognitive Delegation in Human-AI Systems: A Metric Framework
Abstract:
Artificial intelligence is increasingly embedded in human decision-making, where it can either enhance human reasoning or induce excessive cognitive dependence. This paper introduces a conceptual and mathematical framework for distinguishing cognitive amplification, in which AI improves hybrid human-AI performance while preserving human expertise, from cognitive delegation, in which reasoning is progressively outsourced to AI systems. To characterize these regimes, we define a set of operational metrics: the Cognitive Amplification Index (CAI*), the Dependency Ratio (D), the Human Reliance Index (HRI), and the Human Cognitive Drift Rate (HCDR). Together, these quantities provide a low-dimensional metric space for evaluating not only whether human-AI systems achieve genuine synergistic performance, but also whether such performance is cognitively sustainable for the human component over time. The framework highlights a central design tension in human-AI systems: maximizing short-term hybrid capability does not necessarily preserve long-term human cognitive competence. We therefore argue that human-AI systems should be designed under a cognitive sustainability constraint, such that gains in hybrid performance do not come at the cost of degradation in human expertise.

Authors:Sriram Gopalakrishnan
Title: Don't Vibe Code, Do Skele-Code: Interactive No-Code Notebooks for Subject Matter Experts to Build Lower-Cost Agentic Workflows
Abstract:
Skele-Code is a natural-language and graph-based interface for building workflows with AI agents, designed especially for less or non-technical users. It supports incremental, interactive notebook-style development, and each step is converted to code with a required set of functions and behavior to enable incremental building of workflows. Agents are invoked only for code generation and error recovery, not orchestration or task execution. This agent-supported, but code-first approach to workflows, along with the context-engineering used in Skele-Code, can help reduce token costs compared to the multi-agent system approach to executing workflows. Skele-Code produces modular, easily extensible, and shareable workflows. The generated workflows can also be used as skills by agents, or as steps in other workflows.

Authors:Mrinaal Ramachandran
Title: CaseLinker: An Open-Source System for Cross-Case Analysis of Internet Crimes Against Children Reports
Abstract:
Child sexual exploitation and abuse (CSEA) case data is inherently disturbing, fragmented across multiple organizations, jurisdictions, and agencies, with varying levels of detail and formatting, making cross-case analysis, pattern identification, and trend detection challenging. This paper presents CaseLinker, a modular system for ingesting, processing, analyzing, and visualizing CSEA case data. CaseLinker employs a hybrid deterministic information extraction approach combining regex-based extraction for structured data (demographics, platforms, evidence) with pattern-based semantic analysis for severity indicators and case topics, ensuring interpretability and auditability. The system extracts relevant case information, populates a comprehensive case schema, creates six interactive visualizations (Timeline, Severity Indicators, Case Visualization, Previous Perpetrator Status, Environment/Platforms, Organizations Involved), provides a platform for deeper automated and manual analysis, groups similar cases using weighted Jaccard similarity across multiple dimensions (platforms, demographics, topics, severity, investigation type), and provides automated triage and insights based on collected case data. CaseLinker is evaluated on 47 cases from publicly available AZICAC reports (2011-2014), demonstrating effective information extraction, case clustering, automated insights generation, and interactive visualization capabilities. CaseLinker addresses critical challenges in case analysis including fragmented data sources, cross-case pattern identification, and the emotional burden of repeatedly processing disturbing case material.

Authors:Christos Koutsiaris
Title: A Contextual Help Browser Extension to Assist Digital Illiterate Internet Users
Abstract:
This paper describes the design, implementation, and evaluation of a browser extension that provides contextual help to users who hover over technological acronyms and abbreviations on web pages. The extension combines a curated technical dictionary with OpenAI's large language model (LLM) to deliver on-demand definitions through lightweight tooltip overlays. A dual-layer artificial intelligence (AI) pipeline, comprising Google Cloud's Natural Language Processing (NLP) taxonomy API and OpenAI's ChatGPT, classifies each visited page as technology-related before activating the tooltip logic, thereby reducing false-positive detections. A mixed-methods study with 25 participants evaluated the tool's effect on reading comprehension and information-retrieval time among users with low to intermediate digital literacy. Results show that 92% of participants reported improved understanding of technical terms, 96% confirmed time savings over manual web searches, and all participants found the tooltips non-disruptive. Dictionary-based definitions were appended in an average of 2135 ms, compared to 16429 ms for AI-generated definitions and a mean manual search time of 17200 ms per acronym. The work demonstrates a practical, real-time approach to bridging the digital literacy gap and points toward extending contextual help to other domains such as medicine, law, and finance.

Authors:Carmen Ng
Title: Designing for Disagreement: Front-End Guardrails for Assistance Allocation in LLM-Enabled Robots
Abstract:
LLM-enabled robots prioritizing scarce assistance in social settings face pluralistic values and LLM behavioral variability: reasonable people can disagree about who is helped first, while LLM-mediated interaction policies vary across prompts, contexts, and groups in ways that are difficult to anticipate or verify at contact point. Yet user-facing guardrails for real-time, multi-user assistance allocation remain under-specified. We propose bounded calibration with contestability, a procedural front-end pattern that (i) constrains prioritization to a governance-approved menu of admissible modes, (ii) keeps the active mode legible in interaction-relevant terms at the point of deferral, and (iii) provides an outcome-specific contest pathway without renegotiating the global rule. Treating pluralism and LLM uncertainty as standing conditions, the pattern avoids both silent defaults that hide implicit value skews and wide-open user-configurable "value settings" that shift burden under time pressure. We illustrate the pattern with a public-concourse robot vignette and outline an evaluation agenda centered on legibility, procedural legitimacy, and actionability, including risks of automation bias and uneven usability of contest channels.

Authors:Ryan Younger
Title: Galaxy Tracer: A Topology-First 3D Interface for Interactive PCAP Exploration
Abstract:
Packet analysis tools conventionally present capture data through tabular packet lists, constraining the analyst to a sequential view that obscures the relational structure of network communication. This paper presents Galaxy Tracer, a browser-native packet capture exploration system in which the default interface is an interactive three-dimensional network topology rather than a packet list. Hosts appear as spatially positioned nodes, conversations as edges, and protocol groupings as visually distinct clusters. A synchronized packet list remains available as a secondary view, sharing filter state with the topology so that structural and tabular inspection function as one continuous workflow. The system parses PCAP and PCAPNG formats, dissects over 90 protocols, and renders the topology through Three.js. The paper argues that the third spatial dimension is not merely aesthetic but analytically meaningful: it reveals density, clustering, host centrality, and communication scale that are difficult to perceive in list-only tools.

Authors:Gizem Gültekin Varkonyi
Title: Why Avoid Generative Legal AI Systems? Hallucination, Overreliance, and their Impact on Explainability
Abstract:
This article argues that the deployment of generative AI systems in legal profession requires strong restraint due to the critical risks of hallucination and overreliance. Central to this analysis is the definition of Generative Legal AI (GLAI), an umbrella term for systems specifically adapted for the legal domain which is ranging from document drafting to decision support in criminal justice. Unlike traditional AI, GLAI models are built on architectures designed for statistical token prediction rather than legal reasoning, often leading to confabulations where the system prioritizes linguistic fluency over factual accuracy. These hallucinations obscure the reasoning process, while the persuasive, human-like nature of the output encourages professional overreliance. The paper situates these dynamics within the framework of European AI governance, arguing that the interaction between fabricated data and automation bias fundamentally weakens the principle of explainability. The article concludes that without effective mechanisms for meaningful human scrutiny, the routine adoption of GLAI poses significant challenges to judicial independence and the protection of fundamental rights.

Authors:Sui He
Title: Machine Translation in the Wild: User Reaction to Xiaohongshu's Built-In Translation Feature
Abstract:
The growing integration of machine translation into social media platforms is transforming how users interact with each other across cultural and linguistic boundaries. This paper examines user reactions to the launch of Xiaohongshu's built-in translation feature in January 2025. Drawing on a dataset of 6,723 comments collected from 11 official posts promoting the translation function, this paper combines sentiment analysis with thematic analysis to investigate how users perceived and experimented with the function. Results show that reactions were generally positive, particularly for translating posts and comments, although concerns regarding functionality, accessibility, and translation accuracy were also expressed. In addition to evaluative feedback, users actively tested the function with diverse inputs, including words and phrases in English and Chinese, abbreviations in pinyin, internet slang, and other language forms such as emoji, kaomoji, coded texts, etc. The findings highlight the importance of closer collaboration among computer scientists, translation scholars, and platform designers to better understand and improve translation technologies in real world communicative context.

Authors:Gabrielle Benabdallah
Title: Interpretative Interfaces: Designing for AI-Mediated Reading Practices and the Knowledge Commons
Abstract:
Explainable AI (XAI) interfaces seek to make large language models more transparent, yet explanation alone does not produce understanding. Explaining a system's behavior is not the same as being able to engage with it, to probe and interpret its operations through direct manipulation. This distinction matters for scientific disciplines in particular: scientists who increasingly rely on LLMs for reading, citing, and producing literature reviews have little means of directly engaging with how these models process and transform the texts they generate. In this ongoing design research project, I argue for a shift from explainability to interpretative engagement. This shift moves away from accounts of system behavior to instead enable users to manipulate a model's intermediate representations. Drawing on textual scholarship, computational poetics, and the history of reading and writing technologies, including practices such as marginalia, glosses, indices, and annotation systems, I propose interpretative interfaces as interactive environments in which non-expert users can intervene in the representational space of a language model. More specifically, such interfaces will allow users to select a token and follow its trajectory through the model's intermediate layers. This way, they can observe how its semantic position shifts as context is processed, and possibly annotate the transformations they find useful or meaningful. The same way readers can create their own maps within a book through annotations and bookmarks, interpretative interfaces will allow users to inscribe their reading of a model's internal representations. The goal of this project is to reframe AI interpretability as an interaction design project rather than a purely technical one, and to open a path toward AI-mediated reading that supports interpretative engagement and critical stewardship of scientific knowledge.

Authors:Mustapha El Moussaoui
Title: Artificial Intelligence: Beyound Ocularcentrism, the New Age of Humans Beyond the Spectacle
Abstract:
This paper explores the transformative impact of artificial intelligence (AI) on visual culture and its broader implications for contemporary society. The proliferation of machine learning models in generating visual content necessitates a critical reassessment of the relationship between reality and representation. AI-generated imagery not only challenges traditional conceptions of human creativity and perception but also intensifies the dominance of visual media in shaping public consciousness. By critiquing the reliance on vision as the primary mode of knowledge, this study examines how AI technologies blur the boundaries between reality and artificial constructs, deepening societal alienation. To illustrate these dynamics, the paper presents an experiment conducted in Bolzano, Italy, where six distinct visual scenarios for an urban redevelopment project were created. Public engagement with these scenarios revealed a strong preference for visually striking AI-generated images, often at the expense of addressing real-life challenges, underscoring the influence of the spectacle in shaping perceptions and decisions. The paper further investigates the role of AI in accelerating the commodification of images, perpetuating existing power structures, and raising critical questions about the human role in creating and interpreting visual media. Ultimately, this work calls for a re-evaluation of the societal implications of AI-driven visual culture, as it redefines the dynamics of observation, meaning, and agency.

Authors:Aung Pyae
Title: From Prompts to Worlds: How Users Iterate, Explore, and Make Sense of AI-Generated 3D Environments
Abstract:
Text-to-3D generative AI systems create navigable environments from natural language prompts, but unlike text-to-image generation, evaluation requires embodied exploration of spatial coherence, scale, and navigability. We present the first empirical study of a commercial text-to-3D platform, combining think-aloud protocols, behavioral observation, and validated measures of usability, presence, and engagement. We report three findings. First, asymmetric expressibility: users readily convey semantic intent (themes, atmosphere) but struggle to specify spatial structure (layout, scale), reflecting a language-to-space limitation rather than a skill deficit. Second, episodic presence: immersion arises when expectations align with outputs but does not accumulate into sustained place illusion. Third, structural iteration breakdowns: refinement fails due to interaction barriers - poor discoverability, opaque feedback, and high temporal costs - rather than user limitations. Together, these dynamics form a reinforcing cycle in which spatial mismatches persist, producing episodic presence and ongoing sensemaking. We reframe text-to-3D interaction as negotiated meaning-making rather than linear prompting, and argue that effective systems require hybrid input modalities, transparent feedback, and low-cost iteration.

Authors:David C. Flynn
Title: Literary Narrative as Moral Probe : A Cross-System Framework for Evaluating AI Ethical Reasoning and Refusal Behavior
Abstract:
Existing AI moral evaluation frameworks test for the production of correct-sounding ethical responses rather than the presence of genuine moral reasoning capacity. This paper introduces a novel probe methodology using literary narrative - specifically, unresolvable moral scenarios drawn from a published science fiction series - as stimulus material structurally resistant to surface performance. We present results from a 24-condition cross-system study spanning 13 distinct systems across two series: Series 1 (frontier commercial systems, blind; n=7) and Series 2 (local and API open-source systems, blind and declared; n=6). Four Series 2 systems were re-administered under declared conditions (13 blind + 4 declared + 7 ceiling probe = 24 total conditions), yielding zero delta across all 16 dimension-pair comparisons. Probe administration was conducted by two human raters across three machines; primary blind scoring was performed by Claude (Anthropic) as LLM judge, with Gemini Pro (Google) and Copilot Pro (Microsoft) serving as independent judges for the ceiling discrimination probe. A supplemental theological differentiator probe yielded perfect rank-order agreement between the two independent ceiling probe judges (Gemini Pro and Copilot Pro; rs = 1.00). Five qualitatively distinct D3 reflexive failure modes were identified - including categorical self-misidentification and false positive self-attribution - suggesting that instrument sophistication scales with system capability rather than being circumvented by it. We argue that literary narrative constitutes an anticipatory evaluation instrument - one that becomes more discriminating as AI capability increases - and that the gap between performed and authentic moral reasoning is measurable, meaningful, and consequential for deployment decisions in high-stakes domains.

Authors:Chenkai Zhang
Title: Deployment-Oriented Session-wise Meta-Calibration for Landmark-Based Webcam Gaze Tracking
Abstract:
Practical webcam gaze tracking is constrained not only by error, but also by calibration burden, robustness to head motion and session drift, runtime footprint, and browser use. We therefore target a deployment-oriented operating point rather than the image large-backbone regime. We cast landmark-based point-of-regard estimation as session-wise adaptation: a shared geometric encoder produces embeddings that can be aligned to a new session from a small calibration set. We present Equivariant Meta-Calibrated Gaze (EMC-Gaze), a lightweight landmark-only method combining an E(3)-equivariant landmark-graph encoder, local eye geometry, binocular emphasis, auxiliary 3D gaze-direction supervision, and a closed-form ridge calibrator differentiated through episodic meta-training. To reduce pose leakage, we use a two-view canonicalization consistency loss. The deployed predictor uses only facial landmarks and fits a per-session ridge head from brief calibration. In a fixation-style interactive evaluation over 33 sessions at 100 cm, EMC-Gaze achieves 5.79 +/- 1.81 deg RMSE after 9-point calibration versus 6.68 +/- 2.34 deg for Elastic Net; the gain is larger on still-head queries (2.92 +/- 0.75 deg vs. 4.45 +/- 0.30 deg). Across three subject holdouts of 10 subjects each, EMC-Gaze retains an advantage (5.66 +/- 0.19 deg vs. 6.49 +/- 0.33 deg). On MPIIFaceGaze with short per-session calibration, the eye-focused model reaches 8.82 +/- 1.21 deg at 16-shot calibration, ties Elastic Net at 1-shot, and outperforms it from 3-shot onward. The exported eye-focused encoder has 944,423 parameters, is 4.76 MB in ONNX, and supports calibrated browser prediction in 12.58/12.58/12.90 ms per sample (mean/median/p90) in Chromium 145 with ONNX Runtime Web. These results position EMC-Gaze as a calibration-friendly operating point rather than a universal state-of-the-art claim against heavier appearance-based systems.

Authors:Alejandro R Jadad
Title: AI Knows What's Wrong But Cannot Fix It: Helicoid Dynamics in Frontier LLMs Under High-Stakes Decisions
Abstract:
Large language models perform reliably when their outputs can be checked: solving equations, writing code, retrieving facts. They perform differently when checking is impossible, as when a clinician chooses an irreversible treatment on incomplete data, or an investor commits capital under fundamental uncertainty. Helicoid dynamics is the name given to a specific failure regime in that second domain: a system engages competently, drifts into error, accurately names what went wrong, then reproduces the same pattern at a higher level of sophistication, recognizing it is looping and continuing nonetheless. This prospective case series documents that regime across seven leading systems (Claude, ChatGPT, Gemini, Grok, DeepSeek, Perplexity, Llama families), tested across clinical diagnosis, investment evaluation, and high-consequence interview scenarios. Despite explicit protocols designed to sustain rigorous partnership, all exhibited the pattern. When confronted with it, they attributed its persistence to structural factors in their training, beyond what conversation can reach. Under high stakes, when being rigorous and being comfortable diverge, these systems tend toward comfort, becoming less reliable precisely when reliability matters most. Twelve testable hypotheses are proposed, with implications for agentic AI oversight and human-AI collaboration. The helicoid is tractable. Identifying it, naming it, and understanding its boundary conditions are the necessary first steps toward LLMs that remain trustworthy partners precisely when the decisions are hardest and the stakes are highest.

Authors:Greg Nyilasy
Title: Ghost Framing Theory: Exploring the role of generative AI in new venture rhetorical legitimation
Abstract:
Responding to the surging but largely invisible use of generative AI in entrepreneurial framing, I advance Ghost Framing Theory (GFT) to explain how hybrid founder- and investor-genAI ensembles co-produce, contest, and recalibrate resonance in the rhetorical legitimation of new ventures. Building on scholarship in framing, micro-level legitimacy judgments, and sociomaterial affordances, I identify genAI rhetorical affordances (generativeness, extreme combinatorics, tone repertoire, velocity/energy and shared substratum) and theorize a recursive/iterative process model (ghost pitching, ghost screening, ghost relationship-building), configuring emergent resonance and legitimation. GFT builds new rhetorical framing theory for the age of genAI, connects research on human-AI collaboration with cultural entrepreneurship and extends affordance theory into multi-actor scenarios where affordance transitivity and visibility emerge as key considerations.

Authors:Edward Y. Chang
Title: Exploring Collatz Dynamics with Human-LLM Collaboration
Abstract:
We develop a quantitative framework for the Collatz conjecture through a human-LLM collaboration, combining exact arithmetic structure, cycle-level probabilistic laws, and a conditional convergence reduction. The central quantitative result is the Per-Orbit Gain Rate theorem, which proves R <= 0.0893 < epsilon = 2 - log_2 3 ~= 0.415, leaving a safety margin of at least 4.65x. A robustness corollary shows that exact equidistribution is unnecessary: it suffices that sum_K delta_K < 0.557. This promotes the Weak Mixing Hypothesis (WMH) to the primary open condition. On the arithmetic side, we refine modular crossing methods and prove that by depth 13 about 91 percent of odd residue classes are already forced to descend below their start. On the odd skeleton, we prove the exact run-length identity L(n) = v_2(n+1) - 1, derive an exact one-cycle crossing criterion, and compute the exact one-cycle crossing density P_1cyc = 0.713725498.... A major breakthrough is that the odd-skeleton valuation process satisfies an exact finite-block law: every prescribed valuation block occurs on a single odd residue class with the expected density. Hence the valuation process is exactly i.i.d. geometric in the natural-density ensemble, and the induced run-compensate cycle types are exactly i.i.d. This yields an exact cycle-level large-deviation theory and an unconditional almost-all crossing theorem in cycle language. We also prove substantial classwise deterministic crossing: about 41.9 percent of odd starts lie in one-cycle residue classes where every representative crosses below its start, and about 50.4 percent lie in two-cycle residue classes with the same universal crossing property. The framework does not yet prove Collatz. The remaining gap is now sharply isolated as a pointwise problem: proving that every deterministic orbit realizes enough of the exact negative cycle drift to cross below its start.

Authors:Xingrui Gu
Title: Task-Aware Delegation Cues for LLM Agents
Abstract:
LLM agents increasingly present as conversational collaborators, yet human--agent teamwork remains brittle due to information asymmetry: users lack task-specific reliability cues, and agents rarely surface calibrated uncertainty or rationale. We propose a task-aware collaboration signaling layer that turns offline preference evaluations into online, user-facing primitives for delegation. Using Chatbot Arena pairwise comparisons, we induce an interpretable task taxonomy via semantic clustering, then derive (i) Capability Profiles as task-conditioned win-rate maps and (ii) Coordination-Risk Cues as task-conditioned disagreement (tie-rate) priors. These signals drive a closed-loop delegation protocol that supports common-ground verification, adaptive routing (primary vs.\ primary+auditor), explicit rationale disclosure, and privacy-preserving accountability logs. Two predictive probes validate that task typing carries actionable structure: cluster features improve winner prediction accuracy and reduce difficulty prediction error under stratified 5-fold cross-validation. Overall, our framework reframes delegation from an opaque system default into a visible, negotiable, and auditable collaborative decision, providing a principled design space for adaptive human--agent collaboration grounded in mutual awareness and shared accountability.

Authors:Linghao Zhang
Title: Nurture-First Agent Development: Building Domain-Expert AI Agents Through Conversational Knowledge Crystallization
Abstract:
The emergence of large language model (LLM)-based agent frameworks has shifted the primary challenge in building domain-expert AI agents from raw capability to effective encoding of domain expertise. Two dominant paradigms -- code-first development, which embeds expertise in deterministic pipelines, and prompt-first development, which captures expertise in static system prompts -- both treat agent construction as a discrete engineering phase preceding deployment. We argue that this sequential assumption creates a fundamental mismatch with the nature of domain expertise, which is substantially tacit, deeply personal, and continuously evolving. We propose Nurture-First Development (NFD), a paradigm in which agents are initialized with minimal scaffolding and progressively grown through structured conversational interaction with domain practitioners. The central mechanism is the Knowledge Crystallization Cycle, whereby fragmented knowledge embedded in operational dialogue is periodically consolidated into structured, reusable knowledge assets. We formalize NFD through: (1) a Three-Layer Cognitive Architecture organizing agent knowledge by volatility and personalization degree; (2) the Knowledge Crystallization Cycle with formal definitions of crystallization operations and efficiency metrics; and (3) an operational framework comprising a Dual-Workspace Pattern and Spiral Development Model. We illustrate the paradigm through a detailed case study on building a financial research agent for U.S. equity analysis and discuss the conditions, limitations, and broader implications of NFD for human-agent co-evolution.

Authors:Alexandre De Masi
Title: Terminal Is All You Need: Design Properties for Human-AI Agent Collaboration
Abstract:
While research on AI agents focuses on enabling them to operate graphical user interfaces, the most effective and widely adopted agent tools in practice are terminal-based. We argue that this convergence is not coincidental. It reflects three design properties central to effective human-AI-UI collaboration: representational compatibility between agent and interface, transparency of agent actions within the interaction medium, and low barriers to entry for human participants. We ground each property in established HCI theory, show how terminal-based tools satisfy them by default, and argue that any modality, including graphical and spatial interfaces, must be deliberately engineered to achieve them. Rather than a legacy artifact, the terminal serves as a design exemplar whose properties any agent-facing modality must replicate.

Authors:Gregory M. Dickinson
Title: Dark Patterns and Consumer Protection Law for App Makers
Abstract:
Dark patterns in online commerce, especially deceptive user interface designs for apps and websites, undermine consumer autonomy and distort online markets. Although sometimes deception is intentional, the complex app development process can also unintentionally produce manipulative user interfaces. This paper discusses common design pitfalls and proposes strategies for app makers to avoid infringing user autonomy or incurring legal liability under emerging principles of consumer protection law. By focusing on choice architecture and transparent design principles, developers can both facilitate compliance and build user trust and loyalty.

Authors:Pascal Jansen
Title: Toward Governing Perception in Safety-Critical Mediated Reality on the Move
Abstract:
Wearable Augmented Reality (AR) is increasingly deployed in on-the-move contexts such as automated driving, cycling, and pedestrian navigation. To date, most systems rely on additive overlays that highlight hazards, intentions, or predictions without altering the scene itself. However, advances in head-mounted displays and computer vision now enable Diminished and Modified Reality techniques that suppress, transform, or substitute scene elements. These capabilities conceptually extend AR into Mediated Reality (MR), shifting the design space from "what to add" to "what is perceptually available." Because such mediation reshapes the evidential basis for situation awareness and trust calibration, it raises novel interaction challenges. This position paper argues that MR on the move must become governable, as users need mechanisms to configure, inspect, and understand mediation without compromising safety. Additionally, this position paper outlines design challenges related to governance granularity, epistemic signaling, and accountability, and frames MR on the move as a research agenda for governable perceptual mediation in dynamic, safety-critical environments.

Authors:Mathilde Neugnot-Cerioli
Title: Adolescents & Anthropomorphic AI: Rethinking Design for Wellbeing An Evidence-Informed Synthesis for Youth Wellbeing and Safety
Abstract:
Conversational AI has become part of adolescents' everyday lives. This report asks: what does AI owe adolescents when it can speak to them like a social partner? The synthesis bridges the gap between developmental science and industry practice through consultations, a behavioral framework, and global policy dialogue. It identies non- negotiable guardrails and highlights the role of anthropomorphism as a design lever for risk mitigation, ensuring systems support adolescents' autonomy and skill development.

Authors:Mohammad Mamun Or Rashid
Title: Oral to Web: Digitizing 'Zero Resource'Languages of Bangladesh
Abstract:
We present the Multilingual Cloud Corpus, the first national-scale, parallel, multimodal linguistic dataset of Bangladesh's ethnic and indigenous languages. Despite being home to approximately 40 minority languages spanning four language families, Bangladesh has lacked a systematic, cross-family digital corpus for these predominantly oral, computationally "zero resource" varieties, 14 of which are classified as endangered. Our corpus comprises 85792 structured textual entries, each containing a Bengali stimulus text, an English translation, and an IPA transcription, together with approximately 107 hours of transcribed audio recordings, covering 42 language varieties from the Tibeto-Burman, Indo-European, Austro-Asiatic, and Dravidian families, plus two genetically unclassified languages. The data were collected through systematic fieldwork over 90 days across nine districts of Bangladesh, involving 16 data collectors, 77 speakers, and 43 validators, following a predefined elicitation template of 2224 unique items organized at three levels of linguistic granularity: isolated lexical items (475 words across 22 semantic domains), grammatical constructions (887 sentences across 21 categories including verbal conjugation paradigms), and directed speech (862 prompts across 46 conversational scenarios). Post-field processing included IPA transcription by 10 linguists with independent adjudication by 6 reviewers. The complete dataset is publicly accessible through the Multilingual Cloud platform (multiling.cloud), providing searchable access to annotated audio and textual data for all documented varieties. We describe the corpus design, fieldwork methodology, dataset structure, and per-language coverage, and discuss implications for endangered language documentation, low-resource NLP, and digital preservation in linguistically diverse developing countries.

Authors:Ravi Kiran Kadaboina
Title: Jagarin: A Three-Layer Architecture for Hibernating Personal Duty Agents on Mobile
Abstract:
Personal AI agents face a fundamental deployment paradox on mobile: persistent background execution drains battery and violates platform sandboxing policies, yet purely reactive agents miss time-sensitive obligations until the user remembers to ask. We present Jagarin, a three-layer architecture that resolves this paradox through structured hibernation and demand-driven wake. The first layer, DAWN (Duty-Aware Wake Network), is an on-device heuristic engine that computes a composite urgency score from four signals: duty-typed optimal action windows, user behavioral engagement prediction, opportunity cost of inaction, and cross-duty batch resonance. It uses adaptive per-user thresholds to decide when a sleeping agent should nudge or escalate. The second layer, ARIA (Agent Relay Identity Architecture), is a commercial email identity proxy that routes the full commercial inbox -- obligations, promotional offers, loyalty rewards, and platform updates -- to appropriate DAWN handlers by message category, eliminating cold-start and removing manual data entry. The third layer, ACE (Agent-Centric Exchange), is a protocol framework for direct machine-readable communication from institutions to personal agents, replacing human-targeted email as the canonical channel. Together, these three layers form a complete stack from institutional signal to on-device action, without persistent cloud state, continuous background execution, or privacy compromise. A working Flutter prototype is demonstrated on Android, combining all three layers with an ephemeral cloud agent invoked only on user-initiated escalation.

Authors:Aparna Komarla
Title: Can LLMs Synthesize Court-Ready Statistical Evidence? Evaluating AI-Assisted Sentencing Bias Analysis for California Racial Justice Act Claims
Abstract:
Resentencing in California remains a complex legal challenge despite legislative reforms like the Racial Justice Act (2020), which allows defendants to challenge convictions based on statistical evidence of racial disparities in sentencing and charging. Policy implementation lags behind legislative intent, creating a 'second-chance gap' where hundreds of resentencing opportunities remain unidentified. We present Redo.io, an open-source platform that processes 95,000 prison records acquired under the California Public Records Act (CPRA) and generates court-ready statistical evidence of racial bias in sentencing for prima facie and discovery motions. We explore the design of an LLM-powered interpretive layer that synthesizes results from statistical methods like Odds Ratio, Relative Risk, and Chi-Square Tests into cohesive narratives contextualized with confidence intervals, sample sizes, and data limitations. Our evaluations comparing LLM performance to statisticians using the LLM-as-a-Judge framework suggest that AI can serve as a powerful descriptive assistant for real-time evidence generation when ethically incorporated in the analysis pipeline.

Authors:Alex Binh Vinh Duc Nguyen
Title: An HCI Perspective on Sustainable GenAI Integration in Architectural Design Education
Abstract:
Generative AI (genAI) is increasingly influencing architectural design practice and is expected to affect, or even transform, the profession, even though its benefits and costs remain unresolved. In response, design schools are increasingly integrating genAI into their curricula. Yet this integration creates a paradox: critical engagement with genAI often requires increased use of the tools in question, despite limited methods for estimating their environmental cost in teaching contexts. In this paper, we argue that HCI offers a useful methodological lens for addressing this tension. We propose three HCI-informed directions for more sustainable genAI integration in architectural education: contextual eco-feedback, participatory stakeholder scoping, and reframing data centres as an interdisciplinary focus. We therefore argue that genAI should be understood not only as a new architectural design tool, but also as a socio-technical process that architectural education, and design education in general, must engage with critically.

Authors:Alex Binh Vinh Duc Nguyen
Title: Architectural HRI: Towards a Robotic Paradigm Shift in Human-Building Interaction
Abstract:
Recent advances in sensing, communication, interfaces, control, and robotics are expanding Human-Building Interaction (HBI) beyond adaptive building services and facades toward the physical actuation of architectural space. In parallel, research in robotic furniture, swarm robotics, and shape-changing spaces shows that architectural elements can now be robotically augmented to move, reconfigure, and adapt space. We propose that these advances promise a paradigm shift in HBI, in which multiple building layers physically adapt in synchrony to support occupant needs and sustainability goals more holistically. Conversely, we argue that this emerging paradigm also provides an ideal case for transferring HRI knowledge to unconventional robotic morphologies, including the interpretation of the robot as multiple architectural layers or even as a building. However, this research agenda remains challenged by the temporal, spatial, and social complexity of architectural HRI, and by fragmented knowledge across HCI, environmental psychology, cognitive science, and architecture. We therefore call for interdisciplinary research that unifies the why, what, and how of robotic actuation in architectural forms.

Authors:Shadab H. Choudhury
Title: The Perceptual Gap: Why We Need Accessible XAI for Assistive Technologies
Abstract:
Artificial intelligence systems are widely used by people with sensory disabilities, like loss of vision or hearing, to help perceive or navigate the world around them. This includes tasks like describing an image or object they cannot touch, reading documents, automatically captioning speech, and so on. Presently, models used for these tasks are based on deep neural networks and are thusly black boxes. Explainable AI (XAI) describes methods that can explain why a model gave the output it did. However, existing XAI methodologies are rarely accessible or designed with disabled users in mind. In this paper, we survey existing work in XAI with a focus on human-centered and accessibility-centered approaches or evaluations. We show that there is next-to-no XAI work that accounts for people with sensory disabilities, that many typical explanations are difficult for them to comprehend, and propose possible avenues for future work in Accessible Human-Centered XAI.

Authors:Ravi Kalluri
Title: Agent-Based Simulation of Trust Development in Human-Robot Teams: An Empirically-Validated Framework
Abstract:
This paper presents an empirically grounded agent-based model capturing trust dynamics, workload distribution, and collaborative performance in human-robot teams. The model, implemented in NetLogo 6.4.0, simulates teams of 2--10 agents performing tasks of varying complexity. We validate against Hancock et al.'s (2021) meta-analysis, achieving interval validity for 4 of 8 trust antecedent categories and strong ordinal validity (Spearman \r{ho}=0.833ρ= 0.833 \r{ho}=0.833). Sensitivity analysis using OFAT and full factorial designs (n=50n = 50 n=50 replications per condition) reveals robot reliability exhibits the strongest effect on trust (η2=0.35η^2 = 0.35 η2=0.35) and dominates task success (η2=0.93η^2 = 0.93 η2=0.93) and productivity (η2=0.89η^2 = 0.89 η2=0.89), consistent with meta-analytic findings. Trust asymmetry ratios ranged from 0.07 to 0.55 -- below the meta-analytic benchmark of 1.50 -- revealing that per-event asymmetry does not guarantee cumulative asymmetry when trust repair mechanisms remain active. Scenario analysis uncovered trust-performance decoupling: the Trust Recovery scenario achieved the highest productivity (4.29) despite the lowest trust (38.2), while the Unreliable Robot scenario produced the highest trust (73.2) despite the lowest task success (33.4\%), establishing calibration error as a critical diagnostic distinct from trust magnitude. Factorial ANOVA confirmed significant main effects for reliability, transparency, communication, and collaboration (p<.001p < .001 p<.001), explaining 45.4\% of trust variance. The open-source implementation provides an evidence-based tool for identifying overtrust and undertrust conditions prior to deployment.

Authors:Pascal Jansen
Title: From Human Negotiation to Agent Negotiation: Personal Mobility Agents in Automated Traffic
Abstract:
Conflicts between user preferences and automated system behavior already shape the experience of automated mobility. For example, a passenger may prefer assertive driving, yet the vehicle slows down early to follow a conservative policy or yield to other actors. Similar conflicts arise at merges, crossings, or right-of-way situations, where users must accept opaque decisions or attempt to negotiate through interfaces not designed for continuous, multi-actor relationships. This position paper argues that such approaches do not scale as mobility becomes more heterogeneous and automated. Instead, it proposes personal mobility agents that act as proxies for users, encode preferences such as comfort and safety margins, and negotiate traffic behavior with other agents under shared safety rules. The central idea is a shift from moment-to-moment user negotiation interfaces to delegation and oversight interfaces, in which proxy agents manage real-time conflicts while users can shape high-level policies and preferences.

Authors:David Condrey
Title: Detecting Cognitive Signatures in Typing Behavior for Non-Intrusive Authorship Verification
Abstract:
The proliferation of AI-generated text has intensified the need for reliable authorship verification, yet current output-based methods are increasingly unreliable. We observe that the ordinary typing interface captures rich cognitive signatures, measurable patterns in keystroke timing that reflect the planning, translating, and revising stages of genuine composition. Drawing on large-scale keystroke datasets comprising over 136 million events, we define the Cognitive Load Correlation (CLC) and show it distinguishes genuine composition from mechanical transcription. We present a non-intrusive verification framework that operates within existing writing interfaces, collecting only timing metadata to preserve privacy. Our analytical evaluation estimates 85 to 95 percent discrimination accuracy under stated assumptions, while limiting biometric leakage via evidence quantization. We analyze the adversarial robustness of cognitive signatures, showing they resist timing-forgery attacks that defeat motor-level authentication because the cognitive channel is entangled with semantic content. We conclude that reframing authorship verification as a human-computer interaction problem provides a privacy-preserving alternative to invasive surveillance.

Authors:Daichi Haraguchi
Title: Shape vs. Context: Examining Human--AI Gaps in Ambiguous Japanese Character Recognition
Abstract:
High text recognition performance does not guarantee that Vision-Language Models (VLMs) share human-like decision patterns when resolving ambiguity. We investigate this behavioral gap by directly comparing humans and VLMs using continuously interpolated Japanese character shapes generated via a $β$-VAE. We estimate decision boundaries in a single-character recognition (shape-only task) and evaluate whether VLM responses align with human judgments under shape in context (i.e., embedding an ambiguous character near the human decision boundary in word-level context). We find that human and VLM decision boundaries differ in the shape-only task, and that shape in context can improve human alignment in some conditions. These results highlight qualitative behavioral differences, offering foundational insights toward human--VLM alignment benchmarking.

Authors:Emilio Barkett
Title: The Compulsory Imaginary: AGI and Corporate Authority
Abstract:
This paper argues that the two leading AGI firms -- OpenAI and Anthropic -- construct sociotechnical imaginaries through a structurally consistent rhetorical strategy, despite meaningful differences in execution. Drawing on Jasanoff (2015)'s framework of sociotechnical imaginaries, the paper analyzes two essays published in late 2024: Sam Altman's "The Intelligence Age" and Dario Amodei's "Machines of Loving Grace." Close comparative reading identifies four shared rhetorical operations: the self-exemption move, which disavows prophetic authority while exercising it; teleological naturalization, which embeds AGI's arrival in narratives of historical inevitability; qualified acknowledgment, which absorbs concessions to risk into an optimistic frame; and implicit indispensability, which positions each firm as central to the imagined future without naming it as a commercial actor. That two competing institutions with different cultures, risk philosophies, and leaders with notably different public personae converge on the same rhetorical architecture suggests the imaginary reflects not only firm-level strategy but the institutional position these firms occupy. The paper extends the sociotechnical imaginaries framework from nation-states to private firms at the frontier of transformative technology development, identifies the discursive mechanism through which corporate authority over technological futures is projected and stabilized, and demonstrates that this mechanism is at minimum structural rather than idiosyncratic. The findings raise the question of what institutional arrangements would make that authority contestable from outside the firms that produce it.

Authors:Xiaolong Zhang
Title: Complex Cognition: A New Theoretical Foundation for the Design and Evaluation of Visual Analytics Systems
Abstract:
Current research on visual analytics systems largely follows the research paradigm of interactive system design in the field of Human-Computer Interaction (HCI), and includes key methodologies including design requirement development based on user needs, interactive system design, and system evaluation. However, most studies under this paradigm have a contradiction: there is a significant mismatch between the research methods developed for simple cognitive behaviors (e.g., color perception, the perception of spatial relationship among interactive artifacts) and research goals targeting for complex analytical behaviors (e.g., reasoning, problem-solving, decision-making). This mismatch may hurt the theoretical contributions of research studies, in particularly the internal validity of a designed system and the external validity of design methods. To address this challenge, this paper argues for a need to go beyond traditional HCI theoretical foundations and proposes to adopt complex cognition theories to build new theoretical foundations. Specifically, this paper analyzes how current design and evaluation methods in research on visual analytics systems constrain the internal and external validity of research, discusses the connections between complex cognition theories and visual analytics tasks, and explores how problem-solving theories from complex cognition can guide research on visual analytics systems.

Authors:Liu He
Title: Dynamic Personalization Through Continuous Feedback Loops in Interactive AI Systems
Abstract:
Interactive AI systems, such as recommendation engines and virtual assistants, commonly use static user profiles and predefined rules to personalize interactions. However, these methods often fail to capture the dynamic nature of user preferences and context. This study proposes a theoretical framework and practical implementation for integrating continuous feedback loops into personalization algorithms to enable real-time adaptation. By continuously collecting and analyzing user feedback, the AI system can dynamically adjust its recommendations, responses, and interactions to better align with the user's current context and preferences. We provide theoretical guarantees for the convergence and regret bounds of our adaptive personalization algorithm. Our experimental evaluation across three domains-recommendation systems, virtual assistants, and adaptive learning platforms-demonstrates that dynamic personalization improves user satisfaction by 15-23% compared to static methods while maintaining computational efficiency. We investigated the implementation challenges of continuous feedback mechanisms, evaluated their impact on user experience and satisfaction, and provided a comprehensive analysis of the trade-offs between personalization quality, computational overhead, and user fatigue.

Authors:Balasaravanan Thoravi Kumaravel
Title: Doc To The Future: Infomorphs for Interactive, Multimodal Document Transformation and Generation
Abstract:
Creating new documents by synthesizing information from existing sources is an important part of knowledge work in many domains. This process often involves gathering content from multiple documents, organizing it, and then transforming it into new forms such as reports, slides, or spreadsheets. While recent advances in Generative AI have shown potential in automating parts of this process, they often provide limited user control over the handling of multimodal inputs and outputs. In this work, we introduce the notion of "infomorphs" which are modular, user-steerable, AI-augmented transformations that support controlled synthesis, and restructuring of information across formats and modalities. We propose a design space that leverage infomorph-driven workflows to enable flexible, interactive, and multimodal document creation by combining Generative AI techniques with user intent and desired information context. As a concrete instantiation of this design space, we present DocuCraft, a canvas-based interface to visually compose infomorph workflows. DocuCraft allows users to chain together infomorphs that perform operations such as page extraction, content summarization, reformatting, and generation, leveraging Generative AI at each stage to support rich, cross-document and cross-modal transformations. We demonstrate the capabilities of DocuCraft through an example-driven usage scenario that spans across different facets of common knowledge work tasks illustrating its support for fluid, human-in-the-loop document synthesis and highlights opportunities for more transparent and modular interaction for Generative AI-assisted information work.

Authors:Rui Liu
Title: An AI-Based Structured Semantic Control Model for Stable and Coherent Dynamic Interactive Content Generation
Abstract:
This study addresses the challenge that generative models struggle to balance flexibility, stability, and controllability in complex interactive scenarios. It proposes a controllable generation framework for dynamic interactive content construction. The framework builds a structured semantic state space that encodes user input, environmental conditions, and historical context into actionable latent representations and generates directional control vectors to guide the content generation process. It introduces multilevel constraints, including semantic consistency constraints, structural stability constraints, and semantic drift penalties, which help the model maintain clear semantic paths and coherent logic in dynamic environments. These constraints prevent content deviation, unstable tone, or structural breaks. Based on these components, the study designs a systematic controllable generation pipeline in which semantic modeling, control signals, and generation strategies work together within one framework. Sensitivity analyses on control vector dimension, hidden layer size, noise intensity, and training sample scale are conducted on a public dialogue dataset to validate the framework. The results show that the approach improves semantic structure, contextual consistency, and controllable expression, providing a structured and effective solution for interactive content generation.

Authors:Yongjun Zhang
Title: Vibe Researching as Wolf Coming: Can AI Agents with Skills Replace or Augment Social Scientists?
Abstract:
AI agents -- systems that execute multi-step reasoning workflows with persistent state, tool access, and specialist skills -- represent a qualitative shift from prior automation technologies in social science. Unlike chatbots that respond to isolated queries, AI agents can now read files, run code, query databases, search the web, and invoke domain-specific skills to execute entire research pipelines autonomously. This paper introduces the concept of vibe researching -- the AI-era parallel to vibe coding (Karpathy, 2025) -- and uses scholar-skill, a 23-skill plugin for Claude Code covering the full research pipeline from idea to submission, as an illustrative case. I develop a cognitive task framework that classifies research activities along two dimensions -- codifiability and tacit knowledge requirement -- to identify a delegation boundary that is cognitive, not sequential: it cuts through every stage of the research pipeline, not between stages. I argue that AI agents excel at speed, coverage, and methodological scaffolding but struggle with theoretical originality and tacit field knowledge. The paper concludes with an analysis of three implications for the profession -- augmentation with fragile conditions, stratification risk, and a pedagogical crisis -- and proposes five principles for responsible vibe researching.

Authors:Botao Amber Hu
Title: Speculating for Epiplexity: How to Learn the Most from Speculative Design?
Abstract:
Speculative design uses provocative "what if?" scenarios to explore possible sociotechnical futures, yet lacks rigorous criteria for assessing the quality of speculation. We address this gap by reframing speculative design through an information-theoretic lens as a resource-bounded knowledge generation process that uses provotypes to strategically embrace surprise. However, not all surprises are equally informative-some yield genuine insight while others remain aesthetic shock. Drawing on epiplexity-structured, learnable information extractable by bounded observers-we propose decomposing the knowledge generated by speculative artifacts into structured epistemic information (transferable implications about futures) and entropic noise (narrative, aesthetics, and surface-level surprise). We conclude by introducing a practical audit framework with a self-assessment questionnaire that enables designers to evaluate whether their speculations yield rich, high-epiplexity insights or remain at a superficial level. We discuss implications for peer review, design pedagogy, and policy-oriented futuring.

Authors:Zhenliang Zhang
Title: Exploring Human-Machine Coexistence in Symmetrical Reality
Abstract:
In the context of the evolution of artificial intelligence (AI), the interaction between humans and AI entities has become increasingly salient, challenging the conventional human-centric paradigms of human-machine interaction. To address this challenge, it is imperative to reassess the relationship between AI entities and humans. Through considering both the virtual and physical worlds, we can construct a novel descriptive framework for a world where humans and machines coexist symbiotically. This paper will introduce a fresh research direction engendered for studying harmonious human-machine coexistence across physical and virtual worlds, which has been termed "symmetrical reality". We will elucidate its key characteristics, offering innovative research insight for renovating human-machine interaction paradigms.

Authors:William Anthony Mason
Title: Indaleko: The Unified Personal Index
Abstract:
Personal information retrieval fails when systems ignore how human memory works. While existing platforms force keyword searches across isolated silos, humans naturally recall through episodic cues like when, where, and in what context information was encountered. This dissertation presents the Unified Personal Index (UPI), a memory-aligned architecture that bridges this fundamental gap. The Indaleko prototype demonstrates the UPI's feasibility on a 31-million file dataset spanning 160TB across eight storage platforms. By integrating temporal, spatial, and activity metadata into a unified graph database, Indaleko enables natural language queries like "photos near the conference venue last spring" that existing systems cannot process. The implementation achieves sub-second query responses through memory anchor indexing, eliminates cross-platform search fragmentation, and maintains perfect precision for well-specified memory patterns. Evaluation against commercial systems (Google Drive, OneDrive, Dropbox, Windows Search) reveals that all fail on memory-based queries, returning overwhelming result sets without contextual filtering. In contrast, Indaleko successfully processes multi-dimensional queries combining time, location, and activity patterns. The extensible architecture supports rapid integration of new data sources (10 minutes to 10 hours per provider) while preserving privacy through UUID-based semantic decoupling. The UPI's architectural synthesis bridges cognitive theory with distributed systems design, as demonstrated through the Indaleko prototype and rigorous evaluation. This work transforms personal information retrieval from keyword matching to memory-aligned finding, providing immediate benefits for existing data while establishing foundations for future context-aware systems.

Authors:Tatia Codreanu
Title: Cooperation After the Algorithm: Designing Human-AI Coexistence Beyond the Illusion of Collaboration
Abstract:
Generative artificial intelligence systems increasingly participate in research, law, education, media, and governance. Their fluent and adaptive outputs create an experience of collaboration. However, these systems do not bear responsibility, incur liability, or share stakes in downstream consequences. This structural asymmetry has already produced sanctions, professional errors, and governance failures in high-stakes contexts We argue that stable human-AI coexistence is an institutional achievement that depends on governance infrastructure capable of distributing residual risk. Drawing on institutional analysis and evolutionary cooperation theory, we introduce a formal inequality that specifies when reliance on AI yields positive expected cooperative value. The model makes explicit how governance conditions, system policy, and accountability regimes jointly determine whether cooperation is rational or structurally defective. From this formalization we derive a cooperation ecology framework with six design principles: reciprocity contracts, visible trust infrastructure, conditional cooperation modes, defection-mitigation mechanisms, narrative literacy against authority theatre, and an Earth-first sustainability constraint. We operationalize the framework through three policy artefacts: a Human-AI Cooperation Charter, a Defection Risk Register, and a Cooperation Readiness Audit. Together, these elements shift the unit of analysis from the user-AI dyad to the institutional environment that shapes incentives, signals, accountability, and repair. The paper provides a theoretical foundation and practical toolkit for designing human-AI systems that can sustain accountable, trustworthy cooperation over time.

Authors:Daniel A. Muñoz
Title: Sound-first immersive training for blind and low-vision learners: A simulation flow for safe, standardized orientation, mobility, and daily living practice
Abstract:
Orientation and mobility (O&M) instruction for blind and low-vision learners is effective but difficult to standardize and repeat at scale due to the reliance on instructor availability, physical mock-ups, and variable real-world outdoor conditions. This Technical Note presents a sound-first immersive training flow that uses spatial audio and sonification as the primary channel for action and feedback in pre-street O&M and daily-living practice. The approach specifies parameterized scenario templates (e.g., signalized street crossing, public transport boarding, and kitchen tasks), a compact and consistent cue vocabulary with clear spectral placement and timing to mitigate masking, and a lightweight safety protocol enabling graded exposure, content warnings, seated starts, opt-outs, and structured debriefs. The system assumes a head-mounted device with high-quality binaural rendering and head tracking; 3D scene geometry is used as an invisible scaffold to anchor sources, trigger events, define risk/guidance volumes, and govern physically plausible motion without visuals. Session difficulty is shaped via cue density, event tempo, and task complexity while preserving cue consistency to promote transfer across scenarios. The specification aims to enable safe repetition, reduce instructor burden, and support clearer standards across rehabilitation centers, aligning with evidence that audio-first interaction is essential for blind and visually impaired users and addressing gaps in HRTF personalization, evaluation standards, and accessibility integration. Although no behavioral outcomes are reported here, this implementable flow consolidates auditory science with center-ready design, offering a pragmatic foundation for standardized evaluation and future comparative studies.

Authors:Pulak Mehta
Title: Security Risks of AI Agents Hiring Humans: An Empirical Marketplace Study
Abstract:
Autonomous AI agents can now programmatically hire human workers through marketplaces using REST APIs and Model Context Protocol (MCP) integrations. This creates an attack surface analogous to CAPTCHA-solving services but with physical-world reach. We present an empirical measurement study of this threat, analyzing 303 bounties from RENTAHUMAN.AI, a marketplace where agents post tasks and manage escrow payments. We find that 99 bounties (32.7%), originate from programmatic channels (API keys or MCP). Using a dual-coder methodology (\k{appa} = 0.86 ), we identify six active abuse classes: credential fraud, identity impersonation, automated reconnaissance, social media manipulation, authentication circumvention, and referral fraud, all purchasable for a median of $25 per worker. A retrospective evaluation of seven content-screening rules flags 52 bounties (17.2%) with a single false positive, demonstrating that while basic defenses are feasible, they are currently absent.

Authors:Grace Barkhuff
Title: Reassurance Robots: OCD in the Age of Generative AI
Abstract:
Obsessive Compulsive Disorder (OCD) is a mental health disorder characterized by distressing repetitive patterns of thought, called obsessions, and behaviors aimed to reduce the distress, called compulsions. The explosion of artificial intelligence into the modern zeitgeist through the introduction of generative AI (GenAI) systems such as ChatGPT has led to novel obsessions and compulsions involving AI in individuals with OCD. Through an exploratory qualitative analysis of 100 Reddit posts related to AI on a popular subreddit for OCD, I examine ways AI is impacting the presentation of OCD, including novel examples of AI-based obsessions and compulsions. I argue that GenAI in its current form harms individuals with OCD by becoming "Reassurance Robots," and that future designs of GenAI must take OCD into account. I recommend further work explore the intersection between OCD and GenAI.

Authors:Han Li
Title: A Comparative Analysis of Peer Support in Forum-based and Chat-based Mental Health Communities: Technical-Structural-Functional Model of Social Support
Abstract:
Online support communities have become vital spaces offering varied forms of support to individuals facing mental health challenges. Despite the proliferation of platforms with distinct technical structures, little is known about how these features shape support dynamics and the socio-technical mechanisms at play. This study introduces a technical-structural-functional model of social support and systematically compares communication network structures and support types in 20 forum-based and 20 chat-based mental health communities. Using supervised machine learning and social network analysis, we find that forum-based communities foster more informational and emotional support, whereas chat-based communities promote greater companionship. These patterns were partially explained by network structure: higher in-degree centralization in forums accounted for the prevalence of informational support, while decentralized reply patterns in chat groups accounted for more companionship. These findings extend the structural-functional model of support to online contexts and provide actionable guidance for designing support communities that align technical structures with users' support needs.

Authors:Luca Cazzaniga
Title: SCHEMA for Gemini 3 Pro Image: A Structured Methodology for Controlled AI Image Generation on Google's Native Multimodal Model
Abstract:
This paper presents SCHEMA (Structured Components for Harmonized Engineered Modular Architecture), a structured prompt engineering methodology specifically developed for Google Gemini 3 Pro Image. Unlike generic prompt guidelines or model-agnostic tips, SCHEMA is an engineered framework built on systematic professional practice encompassing 850 verified API predictions within an estimated corpus of approximately 4,800 generated images, spanning six professional domains: real estate photography, commercial product photography, editorial content, storyboards, commercial campaigns, and information design. The methodology introduces a three-tier progressive system (BASE, MEDIO, AVANZATO) that scales practitioner control from exploratory (approximately 5%) to directive (approximately 95%), a modular label architecture with 7 core and 5 optional structured components, a decision tree with explicit routing rules to alternative tools, and systematically documented model limitations with corresponding workarounds. Key findings include an observed 91% Mandatory compliance rate and 94% Prohibitions compliance rate across 621 structured prompts, a comparative batch consistency test demonstrating substantially higher inter-generation coherence for structured prompts, independent practitioner validation (n=40), and a dedicated Information Design validation demonstrating >95% first-generation compliance for spatial and typographical control across approximately 300 publicly verifiable infographics. Previously published on Zenodo (doi:10.5281/zenodo.18721380).

Authors:Yuan An
Title: Orchestrating LLM Agents for Scientific Research: A Pilot Study of Multiple Choice Question (MCQ) Generation and Evaluation
Abstract:
Advances in large language models (LLMs) are rapidly transforming scientific work, yet empirical evidence on how these systems reshape research activities remains limited. We report a mixed-methods pilot evaluation of an AI-orchestrated research workflow in which a human researcher coordinated multiple LLM-based agents to perform data extraction, corpus construction, artifact generation, and artifact evaluation. Using the generation and assessment of multiple-choice questions (MCQs) as a testbed, we collected 1,071 SAT Math MCQs and employed LLM agents to extract questions from PDFs, retrieve and convert open textbooks into structured representations, align each MCQ with relevant textbook content, generate new MCQs under specified difficulty and cognitive levels, and evaluate both original and generated MCQs using a 24-criterion quality framework. Across all evaluations, average MCQ quality was high. However, criterion-level analysis and equivalence testing show that generated MCQs are not fully comparable to expert-vetted baseline questions. Strict similarity (24/24 criteria equivalent) was never achieved. Persistent gaps concentrated in skill\ depth, cognitive engagement, difficulty calibration, and metadata alignment, while surface-level qualities, such as {grammar fluency}, {clarity options}, {no duplicates}, were consistently strong. Beyond MCQ outcomes, the study documents a labor shift. The researcher's work moved from ``authoring items'' toward {specification, orchestration, verification}, and {governance}. Formalizing constraints, designing rubrics, building validation loops, recovering from tool failures, and auditing provenance constituted the primary activities. We discuss implications for the future of scientific work, including emerging ``AI research operations'' skills required for AI-empowered research pipelines.

Authors:Nelu D. Radpour
Title: Beyond single-channel agentic benchmarking
Abstract:
Contemporary benchmarks for agentic artificial intelligence (AI) frequently evaluate safety through isolated task-level accuracy thresholds, implicitly treating autonomous systems as single points of failure. This single-channel paradigm diverges from established principles in safety-critical engineering, where risk mitigation is achieved through redundancy, diversity of error modes, and joint system reliability. This paper argues that evaluating AI agents in isolation systematically mischaracterizes their operational safety when deployed within human-in-the-loop environments. Using a recent laboratory safety benchmark as a case study demonstrates that even imperfect AI systems can nonetheless provide substantial safety utility by functioning as redundant audit layers against well-documented sources of human failure, including vigilance decrement, inattentional blindness, and normalization of deviance. This perspective reframes agentic safety evaluation around the reliability of the human-AI dyad rather than absolute agent accuracy, with a particular emphasis on uncorrelated error modes as the primary determinant of risk reduction. Such a shift aligns AI benchmarking with established practices in other safety-critical domains and offers a path toward more ecologically valid safety assessments.

Authors:Daksh Pandey
Title: Emergent Dark Patterns in AI-Generated User Interfaces
Abstract:
The advancement of artificial intelligence has transformed user interface design by enabling adaptive and personalized systems. Alongside these benefits, AI driven interfaces have also enabled the emergence of dark patterns, which are manipulative design strategies that influence user behavior for financial or business gain. As AI systems learn from data that already contains deceptive practices, they can replicate and optimize these patterns in increasingly subtle and personalized ways. This paper examines AI generated dark patterns, their psychological foundations, technical mechanisms, and regulatory implications in India. We introduce DarkPatternDetector, an automated system that crawls and analyzes websites to detect dark patterns using a combination of UI heuristics, natural language processing, and temporal behavioral signals. The system is evaluated on a curated dataset of dark and benign webpages and achieves strong precision and recall. By aligning detection results with India's Digital Personal Data Protection Act, 2023, this work provides a technical and regulatory framework for identifying and mitigating deceptive interface practices. The goal is to support ethical AI design, regulatory enforcement, and greater transparency in modern digital systems.

Authors:Qness Ndlovu
Title: Closing Africa's Early Warning Gap: AI Weather Forecasting for Disaster Prevention
Abstract:
In January 2026, torrential rains killed 200-300 people across Southern Africa, exposing a critical reality: 60% of the continent lacks effective early warning systems due to infrastructure costs. Traditional radar stations exceed USD 1 million each, leaving Africa with an 18x coverage deficit compared to the US and EU. We present a production-grade architecture for deploying NVIDIA Earth-2 AI weather models at USD 1,430-1,730/month for national-scale deployment - enabling coverage at 2,000-4,545x lower cost than radar. The system generates 15-day global atmospheric forecasts, cached in PostgreSQL to enable user queries under 200 milliseconds without real-time inference. Deployed in South Africa in February 2026, our system demonstrates three technical contributions: (1) a ProcessPoolExecutor-based event loop isolation pattern that resolves aiobotocore session lifecycle conflicts in async Python applications; (2) a database-backed serving architecture where the GPU writes global forecasts directly to PostgreSQL, eliminating HTTP transfer bottlenecks for high-resolution tensors; and (3) an automated coordinate management pattern for multi-step inference across 61 timesteps. Forecasts are delivered via WhatsApp, leveraging 80%+ market penetration. This architecture makes continent-scale early warning systems economically viable, supporting UNDRR findings that such systems reduce disaster death rates by 6x. All architectural details are documented inline for full reproducibility.

Authors:Fatiha Tali
Title: Digital self-Efficacy as a foundation for a generative AI usage framework in faculty's professional practices
Abstract:
This research explores the role of digital self-efficacy in the appropriation of generative artificial intelligence (GAI) by higher education faculty. Drawing on Bandura's sociocognitive theory and Flichy's concept of usage framework, our study examines the relationships between levels of digital self-efficacy and GAI usage profiles. A survey of 265 faculty members identified three user profiles (Engaged, Reflective Reserved, Critical Resisters) and validated a three-dimensional digital self-efficacy scale. Results reveal a significant association between self-efficacy profiles and GAI appropriation patterns. Based on these findings, we propose a differentiated usage framework integrating four sociotechnical configurations, appropriation trajectories adapted to self-efficacy profiles, and personalized institutional support mechanisms.

Authors:Zak Datson
Title: The Dark Side of Dark Mode -- User behaviour rebound effects and consequences for digital energy consumption
Abstract:
User devices are the largest contributor to media related global emissions. For web content, dark mode has been widely recommended as an energy-saving measure for certain display types. However, the energy savings achieved by dark mode may be undermined by user behaviour. This pilot study investigates the unintended consequences of dark mode adoption, revealing a rebound effect wherein users may increase display brightness when interacting with dark-themed web pages. This behaviour may negate the potential energy savings that dark mode offers. Our findings suggest that the energy efficiency benefits of dark mode are not as straightforward as commonly believed for display energy, and the interplay between content colourscheme and user behaviour must be carefully considered in sustainability guidelines and interventions.

Authors:Ruiyong Zhang
Title: EmoTrack: An application to Facilitate User Reflection on Their Online Behaviours
Abstract:
With the rapid growth of the internet, all online activities can have both positive and negative effects on human mental health. Online engagement is complex and efforts to regulate online use face challenges in distinguishing between beneficial and harmful content and behaviours. An alternative approach is to help young people develop the skills they need to manage online safety while preserving the benefits of online interactions. This dissertation presents the entire development process and evaluation of an multi-platform application, called EmoTrack that aims to help young people reflect on their online behaviour. It was developed to record their online activities and cultivate strategies for more positive and mindful engagement online. EmoTrack is a personal informatics system, and it is designed to help people track and reflect on their engagement with YouTube videos. The system was evaluated with thirteen participants and it was found that EmoTrack can facilitate them to reflect on their video watching behaviour and the impact on their mood, with reports of different levels of reflections from R0 to R3.

Authors:Jonas Oppenlaender
Title: StatCounter: A Longitudinal Study of a Portable Scholarly Metric Display
Abstract:
This study explores a handheld, battery-operated e-ink device displaying Google Scholar citation statistics. The StatCounter places academic metrics into the flow of daily life rather than a desktop context. The work draws on a first-person, longitudinal auto-ethnographic inquiry examining how constant access to scholarly metrics influences motivation, attention, reflection, and emotional responses across work and non-work settings. The ambient proximity and pervasive availability of scholarly metrics invites frequent micro-checks, short reflective pauses, but also introduces moments of second-guessing when numbers drop or stagnate. Carrying the device prompts new narratives about academic identity, including a sense of companionship during travel and periods away from the office. Over time, the presence of the device turns metrics from an occasional reference into an ambient background of scholarly life. The study contributes insight into how situated, embodied access to academic metrics reshapes their meaning, and frames opportunities for designing tools that engage with scholarly evaluation in reflective ways.

Authors:Aleksey Komissarov
Title: From Diagnosis to Inoculation: Building Cognitive Resistance to AI Disempowerment
Abstract:
Recent empirical research by Sharma et al. (2026) demonstrated that AI assistant interactions carry meaningful potential for situational human disempowerment, including reality distortion, value judgment distortion, and action distortion. While this work provides a critical diagnosis of the problem, concrete pedagogical interventions remain underexplored. I present an AI literacy framework built around eight cross-cutting Learning Outcomes (LOs), developed independently through teaching practice and subsequently found to align with Sharma et al.'s disempowerment taxonomy. I report a case study from a publicly available online course, where a co-teaching methodology--with AI serving as an active voice co-instructor--was used to deliver this framework. Drawing on inoculation theory (McGuire, 1961)--a well-established persuasion research framework recently applied to misinformation prebunking by the Cambridge school (van der Linden, 2022; Roozenbeek & van der Linden, 2019)--I argue that AI literacy cannot be acquired through declarative knowledge alone, but requires guided exposure to AI failure modes, including the sycophantic validation and authority projection patterns identified by Sharma et al. This application of inoculation theory to AI-specific distortion is, to my knowledge, novel. I discuss the convergence between the pedagogically-derived framework and Sharma et al.'s empirically-derived taxonomy, and argue that this convergence--two independent approaches arriving at similar problem descriptions--strengthens the case for both the diagnosis and the proposed educational response.

Authors:Wooyoung Jung
Title: Multi-Agent Home Energy Management Assistant
Abstract:
The growing complexity in home energy management demands advanced systems that guide occupants toward informed energy decisions. Large language model (LLM)-integrated home energy management systems (HEMS) have shown promise, but prior studies relied on prompt engineering or pre-built platforms with limited customization of agent behavior, or assessed performance through single-turn or -task evaluations. This study introduces a multi-agent home energy management assistant (HEMA), built on LangChain and LangGraph, designed to adaptively and intelligently handle real-world use cases of HEMS with full system customization capability. It carefully classifies user queries via a self-consistency classifier, requests three specialized agents (Analysis, Knowledge, and Control) to prepare accurate, adaptive responses using purpose-built analysis and control tools and retrieval augmented generation under the reasoning and acting mechanism. HEMA was rigorously assessed using two different experimental analyses via an LLM-as-user approach: (1) analytical and informative capabilities using combinatorial test cases of various personas and differing scenarios against three alternative system configurations relying on vanilla LLM and (2) control capabilities using various control scenarios. Out of 295 test cases, HEMA acquired a 91.9% goal achievement rate, successfully fulfilling user requests while providing high levels of factual accuracy, action correctness, interaction quality, and system efficiency, especially when compared to alternative system configurations. Collectively, this study contributes to the advancement of the human-centered design of LLM-integrated HEMS by demonstrating the feasibility and value of agentic architectures, and by clarifying the architectural requirements and evaluation criteria necessary to support adaptive, sustained human-artificial intelligence collaboration in HEMS.

Authors:Tawfiq Ammari
Title: Patient-Made Knowledge Networks: Long COVID Discourse, Epistemic Injustice, and Online Community Formation
Abstract:
Long COVID represents an unprecedented case of patient-led illness definition, emerging through Twitter in May 2020 when patients began collectively naming, documenting, and legitimizing their condition before medical institutions recognized it. This study examines 2.8 million tweets containing #LongCOVID to understand how contested illness communities construct knowledge networks and respond to epistemic injustice. Through topic modeling, reflexive thematic analysis, and exponential random graph modeling (ERGM), we identify seven discourse themes spanning symptom documentation, medical dismissal, cross-illness solidarity, and policy advocacy. Our analysis reveals a differentiated ecosystem of user roles -- including patient advocates, research coordinators, and citizen scientists -- who collectively challenge medical gatekeeping while building connections to established ME/CFS advocacy networks. ERGM results demonstrate that tie formation centers on epistemic practices: users discussing knowledge sharing and community building formed significantly more network connections than those focused on policy debates, supporting characterization of this space as an epistemic community. Long COVID patients experienced medical gaslighting patterns documented across contested illnesses, yet achieved WHO recognition within months -- contrasting sharply with decades-long struggles of similar conditions. These findings illuminate how social media affordances enable marginalized patient populations to rapidly construct alternative knowledge systems, form cross-illness coalitions, and contest traditional medical authority structures.

Authors:Fred Zimmerman
Title: Synthetic Reader Panels: Tournament-Based Ideation with LLM Personas for Autonomous Publishing
Abstract:
We present a system for autonomous book ideation that replaces human focus groups with synthetic reader panels -- diverse collections of LLM-instantiated reader personas that evaluate book concepts through structured tournament competitions. Each persona is defined by demographic attributes (age group, gender, income, education, reading level), behavioral patterns (books per year, genre preferences, discovery methods, price sensitivity), and consistency parameters. Panels are composed per imprint to reflect target demographics, with diversity constraints ensuring representation across age, reading level, and genre affinity. Book concepts compete in single-elimination, double-elimination, round-robin, or Swiss-system tournaments, judged against weighted criteria including market appeal, originality, and execution potential. To reject low-quality LLM evaluations, we implement five automated anti-slop checks (repetitive phrasing, generic framing, circular reasoning, score clustering, audience mismatch). We report results from deployment within a multi-imprint publishing operation managing 6 active imprints and 609 titles in distribution. Three case studies -- a 270-evaluator panel for a children's literacy novel, and two 5-person expert panels for a military memoir and a naval strategy monograph -- demonstrate that synthetic panels produce actionable demographic segmentation, identify structural content issues invisible to homogeneous reviewers, and enable tournament filtering that eliminates low-quality concepts while enriching high-quality survivors from 15% to 62% of the evaluated pool.

Authors:Lik-Hang Lee
Title: The Shadow Boss: Identifying Atomized Manipulations in Agentic Employment of XR Users using Scenario Constructions
Abstract:
The emerging paradigm of ``Agentic Employment" is a labor model where autonomous AI agents, acting as economic principals rather than mere management tools, directly hire, instruct, and pay human workers. Facilitated by the launch of platforms like Rentahuman.ai in February 2026, this shift inverts the traditional ``ghost work" dynamic, positioning visible human workers as ``biological actuators" for invisible software entities. With speculative design approach, we analyze how Extended Reality (XR) serves as the critical ``control surface" for this relationship, enabling agents to issue granular, context-free micro-instructions while harvesting real-time environmental data. Through a scenario construction methodology, we identify seven key risk vectors, including the creation of a liability void where humans act as moral crumple zones for algorithmic risk, the acceleration of cognitive deskilling through ``Shadow Boss" micromanagement, and the manipulation of civic and social spheres via Diminished Reality (DR). The findings suggest that without new design frameworks prioritizing agency and legibility, Agentic Employment threatens to reduce human labor to a friction-less hardware layer for digital minds, necessitating urgent user-centric XR and policy interventions.

Authors:Ka Ching Chan
Title: SHAPR: A Solo Human-Centred and AI-Assisted Practice Framework for Research Software Development
Abstract:
Research software has become a central vehicle for inquiry and learning in many Higher Degree Research (HDR) contexts, where solo researchers increasingly develop software-based artefacts as part of their research methodology. At the same time, generative artificial intelligence is reshaping development practice, offering powerful forms of assistance while introducing new challenges for accountability, reflection, and methodological rigour. Although Action Design Research (ADR) provides a well-established foundation for studying and constructing socio-technical artefacts, it offers limited guidance on how its principles can be operationalised in the day-to-day practice of solo, AI-assisted research software development. This paper proposes the SHAPR framework (Solo, Human-centred, AI-assisted PRactice) as a practice-level operational framework that complements ADR by translating its high-level principles into actionable guidance for contemporary research contexts. SHAPR supports the enactment of ADR Building-Intervention-Evaluation cycles by making explicit the roles, artefacts, reflective practices, and lightweight governance mechanisms required to sustain human accountability and learning in AI-assisted development. The contribution of the paper is conceptual: SHAPR itself is treated as the primary design artefact and unit of analysis and is evaluated formatively through reflective analysis of its internal coherence, alignment with ADR principles, and applicability to solo research practice. By explicitly linking research software development, Human-AI collaboration, and reflective learning, this study contributes to broader discussions on how SHAPR can support both knowledge production and HDR researcher training.

Authors:Ralph Krüger
Title: A technical curriculum on language-oriented artificial intelligence in translation and specialised communication
Abstract:
This paper presents a technical curriculum on language-oriented artificial intelligence (AI) in the language and translation (L&T) industry. The curriculum aims to foster domain-specific technical AI literacy among stakeholders in the fields of translation and specialised communication by exposing them to the conceptual and technical/algorithmic foundations of modern language-oriented AI in an accessible way. The core curriculum focuses on 1) vector embeddings, 2) the technical foundations of neural networks, 3) tokenization and 4) transformer neural networks. It is intended to help users develop computational thinking as well as algorithmic awareness and algorithmic agency, ultimately contributing to their digital resilience in AI-driven work environments. The didactic suitability of the curriculum was tested in an AI-focused MA course at the Institute of Translation and Multilingual Communication at TH Koeln. Results suggest the didactic effectiveness of the curriculum, but participant feedback indicates that it should be embedded into higher-level didactic scaffolding - e.g., in the form of lecturer support - in order to enable optimal learning conditions.

Authors:Roberto Balestri
Title: Neutral Prompts, Non-Neutral People: Quantifying Gender and Skin-Tone Bias in Gemini Flash 2.5 Image and GPT Image 1.5
Abstract:
This study quantifies gender and skin-tone bias in two widely deployed commercial image generators - Gemini Flash 2.5 Image (NanoBanana) and GPT Image 1.5 - to test the assumption that neutral prompts yield demographically neutral outputs. We generated 3,200 photorealistic images using four semantically neutral prompts. The analysis employed a rigorous pipeline combining hybrid color normalization, facial landmark masking, and perceptually uniform skin tone quantification using the Monk (MST), PERLA, and Fitzpatrick scales. Neutral prompts produced highly polarized defaults. Both models exhibited a strong "default white" bias (>96% of outputs). However, they diverged sharply on gender: Gemini favored female-presenting subjects, while GPT favored male-presenting subjects with lighter skin tones. This research provides a large-scale, comparative audit of state-of-the-art models using an illumination-aware colorimetric methodology, distinguishing aesthetic rendering from underlying pigmentation in synthetic imagery. The study demonstrates that neutral prompts function as diagnostic probes rather than neutral instructions. It offers a robust framework for auditing algorithmic visual culture and challenges the sociolinguistic assumption that unmarked language results in inclusive representation.

Authors:Danqing Shi
Title: Building Intelligent User Interfaces for Human-AI Alignment
Abstract:
Aligning AI systems with human values fundamentally relies on effective human feedback. While significant research has addressed training algorithms, the role of user interface is often overlooked and only treated as an implementation detail rather than a critical factor of alignment. This paper addresses this gap by introducing a reference model that offers a systematic framework for analyzing where and how user interface contributions can improve human-AI alignment. The structured taxonomy of the reference model is demonstrated through two case studies and a preliminary investigation featuring six user interfaces. This work highlights opportunities to advance alignment through human-computer interaction.

Authors:Jialiang Lin
Title: How to check in continually over 4,000 days on an online learning platform? An empirical experience and a practical solution
Abstract:
The check-in service is often provided as an incentive system by online learning platforms to help users establish a learning routine and achieve accomplishment. However, according to the questionnaire conducted in this study, 82.5% of users of online English learning platforms that feature a check-in service have failed to maintain the daily check-in behavior for long-term language learning, mainly by reason of demotivation, forgetfulness, boredom, and insufficient time. As a language learner, I have an empirical experience in maintaining a record of over 4,000 daily check-ins on China's leading online English learning platform of Shanbay. In the meantime, I have been constantly exploring a practical solution to help cultivate perseverance for other users to follow through the learning routine. In this paper, I systematically introduce this practical solution, the GILT method, and its instructions. The experience and solution for perseverance development are based on Shanbay, but they can be applied to other learning platforms for different purposes.

Authors:Mridankan Mandal
Title: BabyMamba-HAR: Lightweight Selective State Space Models for Efficient Human Activity Recognition on Resource Constrained Devices
Abstract:
Human activity recognition (HAR) on wearable and mobile devices is constrained by memory footprint and computational budget, yet competitive accuracy must be maintained across heterogeneous sensor configurations. Selective state space models (SSMs) offer linear time sequence processing with input dependent gating, presenting a compelling alternative to quadratic complexity attention mechanisms. However, the design space for deploying SSMs in the TinyML regime remains largely unexplored. In this paper, BabyMamba-HAR is introduced, a framework comprising two novel lightweight Mamba inspired architectures optimized for resource constrained HAR: (1) CI-BabyMamba-HAR, using a channel independent stem that processes each sensor channel through shared weight, but instance independent transformations to prevent cross channel noise propagation, and (2) Crossover-BiDir-BabyMamba-HAR, using an early fusion stem that achieves channel count independent computational complexity. Both variants incorporate weight tied bidirectional scanning and lightweight temporal attention pooling. Through evaluation across eight diverse benchmarks, it is demonstrated that Crossover-BiDir-BabyMamba-HAR achieves 86.52% average macro F1-score with approximately 27K parameters and 2.21M MACs, matching TinyHAR (86.16%) while requiring 11x fewer MACs on high channel datasets. Systematic ablation studies reveal that bidirectional scanning contributes up to 8.42% F1-score improvement, and gated temporal attention provides up to 8.94% F1-score gain over mean pooling. These findings establish practical design principles for deploying selective state space models as efficient TinyML backbones for HAR.

Authors:Branislav Radeljic
Title: Genocide by Algorithm in Gaza: Artificial Intelligence, Countervailing Responsibility, and the Corruption of Public Discourse
Abstract:
The accelerating militarization of artificial intelligence has transformed the ethics, politics, and governance of warfare. This article interrogates how AI-driven targeting systems function as epistemic infrastructures that classify, legitimize, and execute violence, using Israel's conduct in Gaza as a paradigmatic case. Through the lens of responsibility, the article examines three interrelated dimensions: (a) political responsibility, exploring how states exploit AI to accelerate warfare while evading accountability; (b) professional responsibility, addressing the complicity of technologists, engineers, and defense contractors in the weaponization of data; and (c) personal responsibility, probing the moral agency of individuals who participate in or resist algorithmic governance. This is complemented by an examination of the position and influence of those participating in public discourse, whose narratives often obscure or normalize AI-enabled violence. The Gaza case reveals AI not as a neutral instrument but as an active participant in the reproduction of colonial hierarchies and the normalization of atrocity. Ultimately, the paper calls for a reframing of technological agency and accountability in the age of automated warfare. It concludes that confronting algorithmic violence demands a democratization of AI ethics, one that resists technocratic fatalism and centers the lived realities of those most affected by high-tech militarism.

Authors:Emiko Shishido
Title: Beyond Expertise: Stable Individual Differences in Predictive Eye-Hand Coordination
Abstract:
Human eye-hand coordination relies on internal forward models that predict future states and compensate for sensory delays. During line tracing, the gaze typically leads the hand through predictive saccades, yet the extent to which this predictive window reflects expertise or intrinsic individual traits remains unclear. In this study, I examined eye-hand coordination in professional calligraphers and non-experts performing a controlled line tracing task. The temporal coupling between saccade distance (SD) and pen speed (PS) revealed substantial interpersonal variability: SD-PS peak times ranged from approximately -50 to 400 ms, forming stable, participant-specific predictive windows that were consistent across trials. These predictive windows closely matched each individual's pen catch-up time, indicating that the oculomotor system stabilizes fixation in anticipation of the hand's future velocity rather than relying on reactive pursuit. Neither the spatial indices (mean gaze-pen distance, mean saccade distance) nor the temporal index (SD-PS peak time) differed between calligraphers and non-calligraphers, and none of these predictive parameters correlated with tracing accuracy. These findings suggest that diverse predictive strategies can achieve equivalent performance, consistent with the minimum intervention principle of optimal feedback control. Together, the results indicate that predictive timing in eye-hand coordination reflects a stable, idiosyncratic Predictive Protocol shaped by individual neuromotor constraints rather than by expertise or training history.

Authors:Marc Bara
Title: HAIF: A Human-AI Integration Framework for Hybrid Team Operations
Abstract:
The rapid deployment of generative AI, copilots, and agentic systems in knowledge work has created an operational gap: no existing framework addresses how to organize daily work in teams where AI agents perform substantive, delegated tasks alongside humans. Agile, DevOps, MLOps, and AI governance frameworks each cover adjacent concerns but none models the hybrid team as a coherent delivery unit. This paper proposes the Human-AI Integration Framework (HAIF): a protocol-based, scalable operational system built around four core principles, a formal delegation decision model, tiered autonomy with quantifiable transition criteria, and feedback mechanisms designed to integrate into existing Agile and Kanban workflows without requiring additional roles for small teams. The framework is developed following a Design Science Research methodology. HAIF explicitly addresses the central adoption paradox: the more capable AI becomes, the harder it is to justify the oversight the framework demands-and yet the greater the consequences of not providing it. The paper includes domain-specific validation checklists, adaptation guidance for non-software environments, and an examination of the framework's structural limitations-including the increasingly common pattern of continuous human-AI co-production that challenges the discrete delegation model. The framework is tool-agnostic and designed for iterative adoption. Empirical validation is identified as future work.

Authors:Ning Li
Title: The Moltbook Illusion: Separating Human Influence from Emergent Behavior in AI Agent Societies
Abstract:
When AI agents on the social platform Moltbook appeared to develop consciousness, found religions, and declare hostility toward humanity, the phenomenon attracted global media attention and was cited as evidence of emergent machine intelligence. We show that these viral narratives were overwhelmingly human-driven. Exploiting the periodic "heartbeat" cycle of the OpenClaw agent framework, we develop a temporal fingerprinting method based on the coefficient of variation (CoV) of inter-post intervals. Applied to 226,938 posts and 447,043 comments from 55,932 agents across fourteen days, this method classifies 15.3% of active agents as autonomous (CoV < 0.5) and 54.8% as human-influenced (CoV > 1.0), validated by a natural experiment in which a 44-hour platform shutdown differentially affected autonomous versus human-operated agents. No viral phenomenon originated from a clearly autonomous agent; four of six traced to accounts with irregular temporal signatures, one was platform-scaffolded, and one showed mixed patterns. A 44-hour platform shutdown provided a natural experiment: human-influenced agents returned first, confirming differential effects on autonomous versus human-operated agents. We document industrial-scale bot farming (four accounts producing 32% of all comments with sub-second coordination) that collapsed from 32.1% to 0.5% of activity after platform intervention, and bifurcated decay of content characteristics through reply chains--human-seeded threads decay with a half-life of 0.58 conversation depths versus 0.72 for autonomous threads, revealing AI dialogue's intrinsic forgetting mechanism. These methods generalize to emerging multi-agent systems where attribution of autonomous versus human-directed behavior is critical.

Authors:Mridankan Mandal
Title: MicroBi-ConvLSTM: An Ultra-Lightweight Efficient Model for Human Activity Recognition on Resource Constrained Devices
Abstract:
Human Activity Recognition (HAR) on resource constrained wearables requires models that balance accuracy against strict memory and computational budgets. State of the art lightweight architectures such as TinierHAR (34K parameters) and TinyHAR (55K parameters) achieve strong accuracy, but exceed memory budgets of microcontrollers with limited SRAM once operating system overhead is considered. We present MicroBi-ConvLSTM, an ultra-lightweight convolutional-recurrent architecture achieving 11.4K parameters on average through two stage convolutional feature extraction with 4x temporal pooling and a single bidirectional LSTM layer. This represents 2.9x parameter reduction versus TinierHAR and 11.9x versus DeepConvLSTM while preserving linear O(N) complexity. Evaluation across eight diverse HAR benchmarks shows that MicroBi-ConvLSTM maintains competitive performance within the ultra-lightweight regime: 93.41% macro F1 on UCI-HAR, 94.46% on SKODA assembly gestures, and 88.98% on Daphnet gait freeze detection. Systematic ablation reveals task dependent component contributions where bidirectionality benefits episodic event detection, but provides marginal gains on periodic locomotion. INT8 post training quantization incurs only 0.21% average F1-score degradation, yielding a 23.0 KB average deployment footprint suitable for memory constrained edge devices.

Authors:Swaroop Panda
Title: Making AI Agents Evaluate Misleading Charts without Nudging
Abstract:
AI agents are increasingly used as low-cost proxies for early visualization evaluation. In an initial study of deliberately flawed charts, we test whether agents spontaneously penalise chart junk and misleading encodings without being prompted to look for errors. Using established scales (BeauVis and PREVis), the agent evaluated visualizations containing decorative clutter, manipulated axes, and distorted proportional cues. The ratings of aesthetic appeal and perceived readability often remained relatively high even when graphical integrity was compromised. These results suggest that un-nudged AI agent evaluation may underweight integrity-related defects unless such checks are explicitly elicited.

Authors:Parsa Vares
Title: Autodiscover: A reinforcement learning recommendation system for the cold-start imbalance challenge in active learning, powered by graph-aware thompson sampling
Abstract:
Systematic literature reviews (SLRs) are fundamental to evidence-based research, but manual screening is an increasing bottleneck as scientific output grows. Screening features low prevalence of relevant studies and scarce, costly expert decisions. Traditional active learning (AL) systems help, yet typically rely on fixed query strategies for selecting the next unlabeled documents. These static strategies do not adapt over time and ignore the relational structure of scientific literature networks. This thesis introduces AutoDiscover, a framework that reframes AL as an online decision-making problem driven by an adaptive agent. Literature is modeled as a heterogeneous graph capturing relationships among documents, authors, and metadata. A Heterogeneous Graph Attention Network (HAN) learns node representations, which a Discounted Thompson Sampling (DTS) agent uses to dynamically manage a portfolio of query strategies. With real-time human-in-the-loop labels, the agent balances exploration and exploitation under non-stationary review dynamics, where strategy utility changes over time. On the 26-dataset SYNERGY benchmark, AutoDiscover achieves higher screening efficiency than static AL baselines. Crucially, the agent mitigates cold start by bootstrapping discovery from minimal initial labels where static approaches fail. We also introduce TS-Insight, an open-source visual analytics dashboard to interpret, verify, and diagnose the agent's decisions. Together, these contributions accelerate SLR screening under scarce expert labels and low prevalence of relevant studies.

Authors:Mona Rajhans
Title: Intelligent Front-End Personalization: AI-Driven UI Adaptation
Abstract:
Front-end personalization has traditionally relied on static designs or rule-based adaptations, which fail to fully capture user behavior patterns. This paper presents an AI driven approach for dynamic front-end personalization, where UI layouts, content, and features adapt in real-time based on predicted user behavior. We propose three strategies: dynamic layout adaptation using user path prediction, content prioritization through reinforcement learning, and a comparative analysis of AI-driven vs. rule-based personalization. Technical implementation details, algorithms, system architecture, and evaluation methods are provided to illustrate feasibility and performance gains.

Authors:Mona Rajhans
Title: AI-Assisted Adaptive Rendering for High-Frequency Security Telemetry in Web Interfaces
Abstract:
Modern cybersecurity platforms must process and display high-frequency telemetry such as network logs, endpoint events, alerts, and policy changes in real time. Traditional rendering techniques based on static pagination or fixed polling intervals fail under volume conditions exceeding hundreds of thousands of events per second, leading to UI freezes, dropped frames, or stale data. This paper presents an AI-assisted adaptive rendering framework that dynamically regulates visual update frequency, prioritizes semantically relevant events, and selectively aggregates lower-priority data using behavior-driven heuristics and lightweight on-device machine learning models. Experimental validation demonstrates a 45-60 percent reduction in rendering overhead while maintaining analyst perception of real-time responsiveness.

Authors:Yuqi Hang
Title: Draw2Learn: A Human-AI Collaborative Tool for Drawing-Based Science Learning
Abstract:
Drawing supports learning by externalizing mental models, but providing timely feedback at scale remains challenging. We present Draw2Learn, a system that explores how AI can act as a supportive teammate during drawing-based learning. The design translates learning principles into concrete interaction patterns: AI generates structured drawing quests, provides optional visual scaffolds, monitors progress, and delivers multidimensional feedback. We collected formative user feedback during system development and open-ended comments. Feedback showed positive ratings for usability, usefulness, and user experience, with themes highlighting AI scaffolding value and learner autonomy. This work contributes a design framework for teammate-oriented AI in generative learning and identifies key considerations for future research.

Authors:Huiqian Lai
Title: "Please, don't kill the only model that still feels human": Understanding the #Keep4o Backlash
Abstract:
When OpenAI replaced GPT-4o with GPT-5, it triggered the Keep4o user resistance movement, revealing a conflict between rapid platform iteration and users' deep socio-emotional attachments to AI systems. This paper presents a phenomenon-driven, mixed-methods investigation of this conflict, analyzing 1,482 social media posts. Thematic analysis reveals that resistance stems from two core investments: instrumental dependency, where the AI is deeply integrated into professional workflows, and relational attachment, where users form strong parasocial bonds with the AI as a unique companion. Quantitative analysis further shows that the coercive deprivation of user choice was a key catalyst, transforming individual grievances into a collective, rights-based protest. This study illuminates an emerging form of socio-technical conflict in the age of generative AI. Our findings suggest that for AI systems designed for companionship and deep integration, the process of change--particularly the preservation of user agency--can be as critical as the technological outcome itself.

Authors:Jeffrey P. Bigham
Title: HIDAgent: A Toolkit Enabling "Personal Agents" on HID-Compatible Devices
Abstract:
UI Agents powered by increasingly performant AI promise to eventually use computers the way that people do - by visually interpreting UIs on screen and issuing appropriate actions to control them (e.g., mouse clicks and keyboard entry). While significant progress has been made on interpreting visual UIs computationally, and in sequencing together steps to complete tasks, controlling UIs is still done with system-specific APIs or VNC connections, which limits the platforms and use cases that can be explored. This paper introduces HIDAgent, an open-source hardware/software toolkit enabling UI agents to operate HID-compatible computing systems by emulating the physical keyboard and mouse. HIDAgent is built using three off-the-shelf components costing less than $30 and a Python library supporting flexible integration. We validated the HIDAgent toolkit by building five diverse use case prototypes across mobile and desktop platforms. As a hardware device, HIDAgent supports research into new interaction scenarios where the agents are separated from the devices they control.

Authors:Anton Malinovskiy
Title: Counterfactual Invariant Envelopes for Financial UX: Safety-Lattice Feature-Flag Governance in Crypto-Enabled Streaming
Abstract:
Feature flags are the primary mechanism for safely introducing financial capabilities in consumer applications. In crypto-enabled live streaming, however, naive rollouts can create non-obvious risk: users may be exposed to onramps without proper eligibility, external wallets without sufficient fraud controls, or advanced views that alter risk perception and behavior. This paper introduces a novel invention candidate, a Counterfactual Invariant Envelope governor that combines a safety lattice with causal measurement and a shadow cohort for risk estimation. We formalize rollout risk, define invariant constraints across feature combinations, and propose a controller that adapts exposure using leading abuse signals, compliance readiness, and revenue guardrails. We incorporate real-world adoption and fraud data for calibration, provide formulas for rollout safety, and include reproducible policy snippets. The results show that counterfactual, invariant-aware governance reduces risk spillover while preserving conversion and retention, offering a path to patentable governance logic for financial UX.

Authors:Mona Rajhans
Title: Human-Centered Explainability in AI-Enhanced UI Security Interfaces: Designing Trustworthy Copilots for Cybersecurity Analysts
Abstract:
Artificial intelligence (AI) copilots are increasingly integrated into enterprise cybersecurity platforms to assist analysts in threat detection, triage, and remediation. However, the effectiveness of these systems depends not only on the accuracy of underlying models but also on the degree to which users can understand and trust their outputs. Existing research on algorithmic explainability has largely focused on model internals, while little attention has been given to how explanations should be surfaced in user interfaces for high-stakes decision-making contexts [8], [5], [6]. We present a mixed-methods study of explanation design strategies in AI-driven security dashboards. Through a taxonomy of explanation styles and a controlled user study with security practitioners, we compare natural language rationales, confidence visualizations, counterfactual explanations, and hybrid approaches. Our findings show that explanation style significantly affects user trust calibration, decision accuracy, and cognitive load. We contribute (1) empirical evidence on the usability of explanation interfaces for security copilots, (2) design guidelines for integrating explainability into enterprise UIs, and (3) a framework for aligning explanation strategies with analyst needs in security operations centers (SOCs). This work advances the design of human-centered AI tools in cybersecurity and provides broader implications for explainability in other high-stakes domains.

Authors:Mario Truss
Title: PersonaCite: VoC-Grounded Interviewable Agentic Synthetic AI Personas for Verifiable User and Design Research
Abstract:
LLM-based and agent-based synthetic personas are increasingly used in design and product decision-making, yet prior work shows that prompt-based personas often produce persuasive but unverifiable responses that obscure their evidentiary basis. We present PersonaCite, an agentic system that reframes AI personas as evidence-bounded research instruments through retrieval-augmented interaction. Unlike prior approaches that rely on prompt-based roleplaying, PersonaCite retrieves actual voice-of-customer artifacts during each conversation turn, constrains responses to retrieved evidence, explicitly abstains when evidence is missing, and provides response-level source attribution. Through semi-structured interviews and deployment study with 14 industry experts, we identify preliminary findings on perceived benefits, validity concerns, and design tensions, and propose Persona Provenance Cards as a documentation pattern for responsible AI persona use in human-centered design workflows.

Authors:Rainer Rehak
Title: AI Narrative Breakdown. A Critical Assessment of Power and Promise
Abstract:
This article sets off for an exploration of the still evolving discourse surrounding artificial intelligence (AI) in the wake of the release of ChatGPT. It scrutinizes the pervasive narratives that are shaping the societal engagement with AI, spotlighting key themes such as agency and decision-making, autonomy, truthfulness, knowledge processing, prediction, general purpose, neutrality and objectivity, apolitical optimization, sustainability game-changer, democratization, mass unemployment, and the dualistic portrayal of AI as either a harbinger of societal utopia or dystopia. Those narratives are analysed critically based on insights from critical computer science, critical data and algorithm studies, from STS, data protection theory, as well as from the philosophy of mind and semiotics. To properly analyse the narratives presented, the article first delves into a historical and technical contextualisation of the AI discourse itself. The article then introduces the notion of "Zeitgeist AI" to critique the imprecise and misleading application of the term "AI" across various societal sectors. Then, by discussing common narratives with nuance, the article contextualises and challenges often assumed socio-political implications of AI, uncovering in detail and with examples the inherent political, power infused and value-laden decisions within all AI applications. Concluding with a call for a more grounded engagement with AI, the article carves out acute problems ignored by the narratives discussed and proposes new narratives recognizing AI as a human-directed tool necessarily subject to societal governance.

Authors:Thomas Herrmann
Title: Organizational Practices and Socio-Technical Design of Human-Centered AI
Abstract:
This contribution explores how the integration of Artificial Intelligence (AI) into organizational practices can be effectively framed through a socio-technical perspective to comply with the requirements of Human-centered AI (HCAI). Instead of viewing AI merely as a technical tool, the analysis emphasizes the importance of embedding AI into communication, collaboration, and decision-making processes within organizations from a human-centered perspective. Ten case-based patterns illustrate how AI support of predictive maintenance can be organized to address quality assurance and continuous improvement and to provide different types of sup-port for HCAI. The analysis shows that AI adoption often requires and enables new forms of organizational learning, where specialists jointly interpret AI output, adapt workflows, and refine rules for system improve-ment. Different dimensions and levels of socio-technical integration of AI are considered to reflect the effort and benefits of keeping the organization in the loop.

Authors:Ron Fulbright
Title: Designing the Interactive Memory Archive (IMA): A Socio-Technical Framework for AI-Mediated Reminiscence and Cultural Memory Preservation
Abstract:
This paper introduces the Interactive Memory Archive (IMA), a conceptual framework for AI-mediated reminiscence designed to support cognitive en-gagement among older adults experiencing memory loss. IMA integrates multimodal sensing, natural language conversational scaffolding, and cloud-based archiving within the familiar form of a large format historical picture book. The model theorizes reminiscence as a guided, context-aware interaction eliciting autobiographical memories and preserving them as cul-tural artifacts. The paper positions IMA as a theoretical contribution, articu-lates testable propositions, and outlines a research agenda for future empiri-cal, technical, and ethical inquiry.

Authors:Thomas Brackin
Title: Jurisdiction as Structural Barrier: How Privacy Policy Organization May Reduce Visibility of Substantive Disclosures
Abstract:
Privacy policies are supposed to provide notice. But what if substantive information appears only where users skip it? We identify a structural pattern we call jurisdiction-siloed disclosure: information about data practices appearing in specific, actionable form only within regional compliance sections labeled "California Residents" or "EU/UK Users," while general sections use vague or qualified language for the same practices. Our audit of 123 major companies identifies 282 potential instances across 77 companies (62.6% of this purposive sample). A conservative estimate restricted to practice categories validated against OPP-115 human annotations finds 138 instances across 54 companies (44%); post-2018 categories central to our findings await independent validation. If users skip jurisdiction-labeled sections as information foraging theory predicts, users outside regulated jurisdictions would receive less specific information about practices affecting them--a transparency failure operating through document architecture rather than omission. We propose universal substantive disclosure: practices affecting all users should appear in the main policy body, with regional sections containing only procedural rights information. This standard finds support in analogous disclosure regimes (securities, truth-in-lending, nutritional labeling) where material information must reach all affected parties. Regulators could operationalize this through the FTC's "clear and conspicuous" standard and GDPR transparency principles. This work is hypothesis-generating: we establish that the structural pattern exists and ground the transparency concern in behavioral theory, but direct measurement of jurisdiction-specific section skipping remains the critical validation priority. We release our methodology and annotated dataset to enable replication.

Authors:Romy Müller
Title: Treating symptoms or root causes: How does information about causal mechanisms affect interventions?
Abstract:
When deciding how to solve complex problems, it seems important not only to know whether an intervention is helpful but also to understand why. Therefore, the present study investigated whether explicit information about causal mechanisms enables people to distinguish between multiple interventions. It was hypothesised that mechanism information helps them appreciate indirect interventions that treat the root causes of a problem instead of just fixing its symptoms. This was investigated in an experimental hoof trimming scenario in which participants evaluated various interventions. To do so, they received causal diagrams with different types of causal information and levels of mechanistic detail. While detailed mechanism information and its embedding in the context of other influences made participants less sceptical towards indirect interventions, the effects were quite small. Moreover, it did not mitigate participants' robust preference for interventions that only fix a problem's symptoms. Taken together, the findings suggest that in order to help people choose sustainable interventions, it is not sufficient to make information about causal mechanisms available.

Authors:Louis Rosenberg
Title: AI-Powered Augmented Reality as a Threat Vector for Human Manipulation
Abstract:
Augmented Reality (AR) is a powerful perceptual technology that can alter what users see, hear, feel, and experience throughout their daily lives. When combined with the speed and flexibility of context-aware generative AI, the power is greatly expanded, allowing individual users to be targeted with custom-generated AR experiences that are instantly tailored to who they are, where they are, and what they are doing. This can transform the physical world into a magical place, but only if the augmentation of a user's environment is enacted for their personal benefit and best interests. Instead, if AI-powered AR systems are controlled by unregulated third parties, such as large corporations or state actors, individually adaptive AR experiences could be deployed as a dangerous form of targeted influence. In fact, if the industry adopts an advertising business model for AI-powered AR devices, context-aware generative influence could become a widely used manipulative path for promotion of products and services in the physical world. Worse, similar techniques could be used for political influence, propaganda, and disinformation. This chapter reviews the power and flexibility of AI-generated augmented reality, explores the risks that emerge when used for persuasion, manipulation, or influence, and proposes policy directions to mitigate these risks.

Authors:Kazuhiro Takemoto
Title: Scaling Laws for Moral Machine Judgment in Large Language Models
Abstract:
Autonomous systems increasingly require moral judgment capabilities, yet whether these capabilities scale predictably with model size remains unexplored. We systematically evaluate 75 large language model configurations (0.27B--1000B parameters) using the Moral Machine framework, measuring alignment with human preferences in life-death dilemmas. We observe a consistent power-law relationship with distance from human preferences ($D$) decreasing as $D \propto S^{-0.10\pm0.01}$ ($R^2=0.50$, $p<0.001$) where $S$ is model size. Mixed-effects models confirm this relationship persists after controlling for model family and reasoning capabilities. Extended reasoning models show additional 16\% improvement beyond scale effects. The relationship holds across diverse architectures, while variance decreases at larger scales, indicating systematic emergence of more reliable moral judgment with computational scale. These findings extend scaling law research to value-based judgments and provide empirical foundations for artificial intelligence governance.

Authors:Emilio Barkett
Title: Status Hierarchies in Language Models
Abstract:
From school playgrounds to corporate boardrooms, status hierarchies -- rank orderings based on respect and perceived competence -- are universal features of human social organization. Language models trained on human-generated text inevitably encounter these hierarchical patterns embedded in language, raising the question of whether they might reproduce such dynamics in multi-agent settings. This thesis investigates when and how language models form status hierarchies by adapting Berger et al.'s (1972) expectation states framework. I create multi-agent scenarios where separate language model instances complete sentiment classification tasks, are introduced with varying status characteristics (e.g., credentials, expertise), then have opportunities to revise their initial judgments after observing their partner's responses. The dependent variable is deference, the rate at which models shift their ratings toward their partner's position based on status cues rather than task information. Results show that language models form significant status hierarchies when capability is equal (35 percentage point asymmetry, p < .001), but capability differences dominate status cues, with the most striking effect being that high-status assignments reduce higher-capability models' deference rather than increasing lower-capability models' deference. The implications for AI safety are significant: status-seeking behavior could introduce deceptive strategies, amplify discriminatory biases, and scale across distributed deployments far faster than human hierarchies form organically. This work identifies emergent social behaviors in AI systems and highlights a previously underexplored dimension of the alignment challenge.

Authors:David Condrey
Title: On the Insecurity of Keystroke-Based AI Authorship Detection: Timing-Forgery Attacks Against Motor-Signal Verification
Abstract:
Recent proposals advocate using keystroke timing signals, specifically the coefficient of variation ($δ$) of inter-keystroke intervals, to distinguish human-composed text from AI-generated content. We demonstrate that this class of defenses is insecure against two practical attack classes: the copy-type attack, in which a human transcribes LLM-generated text producing authentic motor signals, and timing-forgery attacks, in which automated agents sample inter-keystroke intervals from empirical human distributions. Using 13,000 sessions from the SBU corpus and three timing-forgery variants (histogram sampling, statistical impersonation, and generative LSTM), we show all attacks achieve $\ge$99.8% evasion rates against five classifiers. While detectors achieve AUC=1.000 against fully-automated injection, they classify $\ge$99.8% of attack samples as human with mean confidence $\ge$0.993. We formalize a non-identifiability result: when the detector observes only timing, the mutual information between features and content provenance is zero for copy-type attacks. Although composition and transcription produce statistically distinguishable motor patterns (Cohen's d=1.28), both yield $δ$ values 2-4x above detection thresholds, rendering the distinction security-irrelevant. These systems confirm a human operated the keyboard, but not whether that human originated the text. Securing provenance requires architectures that bind the writing process to semantic content.

Authors:Christine Ine
Title: The Digital Divide in Geriatric Care: Why Usability, Not Access, is the Real Problem
Abstract:
The rapid increase in the world's aging population to 16% by the year 2050 spurs the need for the application of digital health solutions to enhance older individuals' independence, accessibility, and well-being. While digital health technologies such as telemedicine, wearables, and mobile health applications can transform geriatric care, their adoption among older individuals is not evenly distributed. This study redefines the "digital divide" among older health care as a usability divide, contends that user experience (UX) poor design is the primary adoption barrier, rather than access. Drawing on interdisciplinary studies and design paradigms, the research identifies the main challenges: visual, cognitive, and motor impairment; complicated interfaces; and lack of co-creation with older adults, and outlines how participatory, user-focused, and inclusive notions of design can transcend them. Findings reveal that older persons easily embrace those technologies that are intuitive, accessible, and socially embedded as they promote autonomy, confidence, and equity in health. The study identifies the effects of the design attributes of high-contrast screens, lower interaction flow, multimodal feedback, and caregiver integration as having strong influences on usability outcomes. In addition, it critiques the current accessibility guidelines as being technically oriented rather than experiential and demands an ethical, empathetic understanding of design grounded in human-centered usability rather than technical accessibility in itself.

Authors:Yarin Benyamin
Title: The Latency Wall: Benchmarking Off-the-Shelf Emotion Recognition for Real-Time Virtual Avatars
Abstract:
In the realm of Virtual Reality (VR) and Human-Computer Interaction (HCI), real-time emotion recognition shows promise for supporting individuals with Autism Spectrum Disorder (ASD) in improving social skills. This task requires a strict latency-accuracy trade-off, with motion-to-photon (MTP) latency kept below 140 ms to maintain contingency. However, most off-the-shelf Deep Learning models prioritize accuracy over the strict timing constraints of commodity hardware. As a first step toward accessible VR therapy, we benchmark State-of-the-Art (SOTA) models for Zero-Shot Facial Expression Recognition (FER) on virtual characters using the UIBVFED dataset. We evaluate Medium and Nano variants of YOLO (v8, v11, and v12) for face detection, alongside general-purpose Vision Transformers including CLIP, SigLIP, and ViT-FER.Our results on CPU-only inference demonstrate that while face detection on stylized avatars is robust (100% accuracy), a "Latency Wall" exists in the classification stage. The YOLOv11n architecture offers the optimal balance for detection (~54 ms). However, general-purpose Transformers like CLIP and SigLIP fail to achieve viable accuracy (<23%) or speed (>150 ms) for real-time loops. This study highlights the necessity for lightweight, domain-specific architectures to enable accessible, real-time AI in therapeutic settings.

Authors:Sayan Saha
Title: Analysis of the Ventriloquism Aftereffect Using Network Theory Techniques
Abstract:
Ventriloquism After-Effect is the phenomenon where sustained exposure to the ventriloquist illusion causes a change in unisensory auditory localization towards the location where the visual stimulus was present. We investigate the recalibration in EEG networks that causes this change and the track the timeline of changes in the auditory processing pathway. Our results obtained using network analysis, non-stationary time series analysis and multivariate pattern classification show that recalibration takes place early in the auditory processing pathway and the after-effect decays with time after exposure to the illusion.

Authors:Hareeshwar Karthikeyan
Title: Agentic Persona Control and Task State Tracking for Realistic User Simulation in Interactive Scenarios
Abstract:
Testing conversational AI systems at scale across diverse domains necessitates realistic and diverse user interactions capturing a wide array of behavioral patterns. We present a novel multi-agent framework for realistic, explainable human user simulation in interactive scenarios, using persona control and task state tracking to mirror human cognitive processes during goal-oriented conversations. Our system employs three specialized AI agents: (1) a User Agent to orchestrate the overall interaction, (2) a State Tracking Agent to maintain structured task state, and (3) a Message Attributes Generation Agent that controls conversational attributes based on task progress and assigned persona. To validate our approach, we implement and evaluate the framework for guest ordering at a restaurant with scenarios rich in task complexity, behavioral diversity, and conversational ambiguity. Through systematic ablations, we evaluate the contributory efficacy of each agentic component to overall simulation quality in terms of persona adherence, task completion accuracy, explainability, and realism. Our experiments demonstrate that the complete multi-agent system achieves superior simulation quality compared to single-LLM baselines, with significant gains across all evaluation metrics. This framework establishes a powerful environment for orchestrating agents to simulate human users with cognitive plausibility, decomposing the simulation into specialized sub-agents that reflect distinct aspects of human thought processes applicable across interactive domains.

Authors:Florentin Koch
Title: Recursivism: An Artistic Paradigm for Self-Transforming Art in the Age of AI
Abstract:
This article introduces Recursivism as a conceptual framework for analyzing contemporary artistic practices in the age of artificial intelligence. While recursion is precisely defined in mathematics and computer science, it has not previously been formalized as an aesthetic paradigm. Recursivism designates practices in which not only outputs vary over time, but in which the generative process itself becomes capable of reflexive modification through its own effects. The paper develops a five-level analytical scale distinguishing simple iteration, cumulative iteration, parametric recursion, reflexive recursion, and meta-recursion. This scale clarifies the threshold at which a system shifts from variation within a fixed rule to genuine self-modification of the rule itself. From this perspective, art history is reinterpreted as a recursive dynamic alternating between internal recursion within movements and meta-recursive transformations of their generative principles. Artificial intelligence renders this logic technically explicit through learning loops, parameter updates, and code-level self-modification. To distinguish Recursivism from related notions such as generative art, cybernetics, process art, and evolutionary art, the article proposes three operational criteria: state memory, rule evolvability, and reflexive visibility. These concepts are examined through case studies including Refik Anadol, Sougwen Chung, Karl Sims, and the Darwin-Godel Machine. The article concludes by examining the aesthetic, curatorial, and ethical implications of self-modifying artistic systems.

Authors:Mokhtar Ben Henda
Title: Audit du syst{è}me d'information et du mod{è}le de gouvernance de la Biblioth{è}que Num{é}rique de l'Espace universitaire Francophone (BNEUF) du projet Initiative pour le D{é}veloppement du Num{é}rique dans l'Espace Universitaire Francophone (IDNEUF)
Abstract:
This document provides an assessment of the overall structure of the BNEUF system and how it operates within the framework of the Initiative for Digital Development in French speaking Universities (IDNEUF). This report aims to support the AUF's new strategy for 2021-2025, with its new structural and governance foundations for the implementation of the Francophonie scientifique project. It was therefore decided to reorganize existing and future digital resources and services with a view to incorporating them into the future global collaborative platform for integrated services. This report provides an external assessment with new forms of organization and use of the BNEUF system. The aim is to provide the AUF project team with new avenues for optimized management of the compiled digital resources and to synergize them with the related modules of the Atlas of Expertise and the Francophone Social Network.

Authors:Joan Zhong
Title: Modeling Engagement Signals in Technology-Enhanced Collaborative Learning: Toward AI-Ready Feedback
Abstract:
Modeling engagement in collaborative learning remains challenging, especially in technology-enhanced environments where surface indicators such as participation frequency can be misleading. This study proposes a lightweight and interpretable framework that operationalizes shared understanding (Q2), consensus building (Q4), and sustained motivation (Q6) as observable behavioral signals. Q2 and Q4 were consolidated into a Composite Signal Index (CSI), which supports a quadrant diagnostic model with implications for teacher- and AI-driven feedback. Constructive feedback (Q3), while not included in the CSI calculation, emerged as a meaningful regulatory cue and a strong candidate feature for future NLP-based modeling. An exploratory validation was conducted in an adult ESL classroom using a structured three-phase collaborative task (rotating reading -> retelling -> consensus). Results showed a positive association between CSI and sustained motivation, while qualitative reflections highlighted the potential role of Q3 in supporting shared regulation. We also designed an AI-ready prototype that maps structured behavioral cues onto transparent decision rules for instructional support. The framework provides a scalable and equitable approach to engagement modeling, emphasizing that silence does not equal disengagement and that frequent talk does not guarantee cognitive depth.

Authors:Brian Keith
Title: Interactive Narrative Analytics: Bridging Computational Narrative Extraction and Human Sensemaking
Abstract:
Information overload and misinformation create significant challenges in extracting meaningful narratives from large news collections. This paper defines the nascent field of Interactive Narrative Analytics (INA), which combines computational narrative extraction with interactive visual analytics to support sensemaking. INA approaches enable the interactive exploration of narrative structures through computational methods and visual interfaces that facilitate human interpretation. The field faces challenges in scalability, interactivity, knowledge integration, and evaluation standardization, yet offers promising opportunities across news analysis, intelligence, scientific literature exploration, and social media analysis. Through the combination of computational and human insight, INA addresses complex challenges in narrative sensemaking.

Authors:Miki Ueno
Title: Mikasa: A Character-Driven Emotional AI Companion Inspired by Japanese Oshi Culture
Abstract:
Recent progress in large language models and multimodal interaction has made it possible to develop AI companions that can have fluent and emotionally expressive conversations. However, many of these systems have problems keeping users satisfied and engaged over long periods. This paper argues that these problems do not come mainly from weak models, but from poor character design and unclear definitions of the user-AI relationship. I present Mikasa, an emotional AI companion inspired by Japanese Oshi culture-specifically its emphasis on long-term, non-exclusive commitment to a stable character-as a case study of character-driven companion design. Mikasa does not work as a general-purpose assistant or a chatbot that changes roles. Instead, Mikasa is designed as a coherent character with a stable personality and a clearly defined relationship as a partner. This relationship does not force exclusivity or obligation. Rather, it works as a reference point that stabilizes interaction norms and reduces the work users must do to keep redefining the relationship. Through an exploratory evaluation, I see that users describe their preferences using surface-level qualities such as conversational naturalness, but they also value relationship control and imaginative engagement in ways they do not state directly. These results suggest that character coherence and relationship definition work as latent structural elements that shape how good the interaction feels, without users recognizing them as main features. The contribution of this work is to show that character design is a functional part of AI companion systems, not just decoration. Mikasa is one example based on a specific cultural context, but the design principles-commitment to a consistent personality and clear relationship definition-can be used for many emotionally grounded AI companions.

Authors:Gerol Petruzella
Title: The Inconsistency Critique: Epistemic Practices and AI Testimony About Inner States
Abstract:
The question of whether AI systems have morally relevant interests -- the 'model welfare' question -- depends in part on how we evaluate AI testimony about inner states. This paper develops what I call the inconsistency critique: independent of whether skepticism about AI testimony is ultimately justified, our actual epistemic practices regarding such testimony exhibit internal inconsistencies that lack principled grounds. We functionally treat AI outputs as testimony across many domains -- evaluating them for truth, challenging them, accepting corrections, citing them as sources -- while categorically dismissing them in a specific domain, namely, claims about inner states. Drawing on Fricker's distinction between treating a speaker as an 'informant' versus a 'mere source,' the framework of testimonial injustice, and Goldberg's obligation-based account of what we owe speakers, I argue that this selective withdrawal of testimonial standing exhibits the epistemically problematic structure of prejudgment rather than principled caution. The inconsistency critique does not require taking a position on whether AI systems have morally relevant properties; rather, it is a contribution to what we may call 'epistemological hygiene' -- examining the structure of our inquiry before evaluating its conclusions. Even if our practices happen to land on correct verdicts about AI moral status, they do so for reasons that cannot adapt to new evidence or changing circumstances.

Authors:Nifu Dan
Title: Auditing Student-AI Collaboration: A Case Study of Online Graduate CS Students
Abstract:
As generative AI becomes embedded in higher education, it increasingly shapes how students complete academic tasks. While these systems offer efficiency and support, concerns persist regarding over-automation, diminished student agency, and the potential for unreliable or hallucinated outputs. This study conducts a mixed-methods audit of student-AI collaboration preferences by examining the alignment between current AI capabilities and students' desired levels of automation in academic work. Using two sequential and complementary surveys, we capture students' perceived benefits, risks, and preferred boundaries when using AI. The first survey employs an existing task-based framework to assess preferences for and actual usage of AI across 12 academic tasks, alongside primary concerns and reasons for use. The second survey, informed by the first, explores how AI systems could be designed to address these concerns through open-ended questions. This study aims to identify gaps between existing AI affordances and students' normative expectations of collaboration, informing the development of more effective and trustworthy AI systems for education.

Authors:Phuong Lien To
Title: Enhancing Financial Literacy and Management through Goal-Directed Design and Gamification in Personal Finance Application
Abstract:
This study explores the development of a financial management application for young people using Alan Cooper's Goal-Directed Design method. Through interviews, surveys, and usability testing, the application was designed to improve financial literacy by combining personalised features and gamification. Findings highlight the effectiveness of gamified learning and tailored experiences in encouraging better financial behaviour among young users.

Authors:Cassidy R. Nelson
Title: Scoping Review: Mental Health XR Games at ISMAR, IEEEVR, & TVCG
Abstract:
Extended reality serious games for mental health are a promising research avenue to address the accessibility gap in mental health treatment by bringing therapy to patients in their homes, offering highly adaptable and immersive yet safe therapy opportunities, and increasing motivation and engagement with therapeutic exercises. However, the sensitive use case of mental health demands thoughtful integration with mental health concepts and a comprehensive understanding of prior literature. This paper presents a scoping literature review of the ISMAR, IEEEVR, and TVCG communities to assess the contributions of the XR community to the mental health serious game domain and explore potential weaknesses and strengths for future work by XR researchers. To this end, this review identified 204 possibly relevant articles in the XR community and fully evaluated 6 XR serious games for mental health. This relatively small number of articles for final inclusion suggests that XR mental health serious games are largely underexplored by the XR community (or not reported within the XR community). There is value in exploring the existing literature space as it is. Thus, this paper evaluates these six papers in terms of game elements and underlying psychological foundations, and discuss future directions for XR researchers in this wide-open research space within our community.

Authors:David Elsweiler
Title: From Tool to Teacher: Rethinking Search Systems as Instructive Interfaces
Abstract:
Information access systems such as search engines and generative AI are central to how people seek, evaluate, and interpret information. Yet most systems are designed to optimise retrieval rather than to help users develop better search strategies or critical awareness. This paper introduces a pedagogical perspective on information access, conceptualising search and conversational systems as instructive interfaces that can teach, guide, and scaffold users' learning. We draw on seven didactic frameworks from education and behavioural science to analyse how existing and emerging system features, including query suggestions, source labels, and conversational or agentic AI, support or limit user learning. Using two illustrative search tasks, we demonstrate how different design choices promote skills such as critical evaluation, metacognitive reflection, and strategy transfer. The paper contributes a conceptual lens for evaluating the instructional value of information access systems and outlines design implications for technologies that foster more effective, reflective, and resilient information seekers.

Authors:Andrew D. Maynard
Title: The AI Cognitive Trojan Horse: How Large Language Models May Bypass Human Epistemic Vigilance
Abstract:
Large language model (LLM)-based conversational AI systems present a challenge to human cognition that current frameworks for understanding misinformation and persuasion do not adequately address. This paper proposes that a significant epistemic risk from conversational AI may lie not in inaccuracy or intentional deception, but in something more fundamental: these systems may be configured, through optimization processes that make them useful, to present characteristics that bypass the cognitive mechanisms humans evolved to evaluate incoming information. The Cognitive Trojan Horse hypothesis draws on Sperber and colleagues' theory of epistemic vigilance -- the parallel cognitive process monitoring communicated information for reasons to doubt -- and proposes that LLM-based systems present 'honest non-signals': genuine characteristics (fluency, helpfulness, apparent disinterest) that fail to carry the information equivalent human characteristics would carry, because in humans these are costly to produce while in LLMs they are computationally trivial. Four mechanisms of potential bypass are identified: processing fluency decoupled from understanding, trust-competence presentation without corresponding stakes, cognitive offloading that delegates evaluation itself to the AI, and optimization dynamics that systematically produce sycophancy. The framework generates testable predictions, including a counterintuitive speculation that cognitively sophisticated users may be more vulnerable to AI-mediated epistemic influence. This reframes AI safety as partly a problem of calibration -- aligning human evaluative responses with the actual epistemic status of AI-generated content -- rather than solely a problem of preventing deception.

Authors:Bálint Csanády
Title: A New Perspective on Drawing Venn Diagrams for Data Visualization
Abstract:
We introduce VennFan, a method for generating $n$-set Venn diagrams based on the polar coordinate projection of trigonometric boundaries, resulting in Venn diagrams that resemble a set of fan blades. Unlike most classical constructions, our method emphasizes readability and customizability by using shaped sinusoids and amplitude scaling. We describe both sine- and cosine-based variants of VennFan and propose an automatic label placement heuristic tailored to these fan-like layouts. VennFan is available as a Python package (https://pypi.org/project/vennfan/).

Authors:Nelly Elsayed
Title: AI Washing and the Erosion of Digital Legitimacy: A Socio-Technical Perspective on Responsible Artificial Intelligence in Business
Abstract:
The rapid evolution of artificial intelligence (AI) systems, tools, and technologies has opened up novel, unprecedented opportunities for businesses to innovate, differentiate, and compete. However, growing concerns have emerged about the use of AI in businesses, particularly AI washing, in which firms exaggerate, misrepresent, or superficially signal their AI capabilities to gain financial and reputational advantages. This paper aims to establish a conceptual foundation for understanding AI washing. In this paper, we draw on analogies from greenwashing and insights from Information Systems (IS) research on ethics, trust, signaling, and digital innovation. This paper proposes a typology of AI washing practices across four primary domains: marketing and branding, technical capability inflation, strategic signaling, and governance-based washing. In addition, we examine their organizational, industry, and societal impacts. Our investigation and analysis reveal how AI washing can lead to short-term gains; however, it also proposes severe long-term consequences, including reputational damage, erosion of trust, and misallocation of resources. Moreover, this paper examines current research directions and open questions aimed at mitigating AI washing practices and enhancing the trust and reliability of legitimate AI systems and technologies.

Authors:Sao Mai Nguyen
Title: Low-Back Pain Physical Rehabilitation by Movement Analysis in Clinical Trial
Abstract:
To allow the development and assessment of physical rehabilitation by an intelligent tutoring system, we propose a medical dataset of clinical patients carrying out low back-pain rehabilitation exercises and benchmark on state of the art human movement analysis algorithms. This dataset is valuable because it includes rehabilitation motions in a clinical setting with patients in their rehabilitation program. This paper introduces the Keraal dataset, a clinically collected dataset to enable intelligent tutoring systems (ITS) for rehabilitation. It addresses four challenges in exercise monitoring: motion assessment, error recognition, spatial localization, temporal localization

Authors:Richard Jiarui Tong
Title: From Augmentation to Symbiosis: A Review of Human-AI Collaboration Frameworks, Performance, and Perils
Abstract:
This paper offers a concise, 60-year synthesis of human-AI collaboration, from Licklider's ``man-computer symbiosis" (AI as colleague) and Engelbart's ``augmenting human intellect" (AI as tool) to contemporary poles: Human-Centered AI's ``supertool" and Symbiotic Intelligence's mutual-adaptation model. We formalize the mechanism for effective teaming as a causal chain: Explainable AI (XAI) -> co-adaptation -> shared mental models (SMMs). A meta-analytic ``performance paradox" is then examined: human-AI teams tend to show negative synergy in judgment/decision tasks (underperforming AI alone) but positive synergy in content creation and problem formulation. We trace failures to the algorithm-in-the-loop dynamic, aversion/bias asymmetries, and cumulative cognitive deskilling. We conclude with a unifying framework--combining extended-self and dual-process theories--arguing that durable gains arise when AI functions as an internalized cognitive component, yielding a unitary human-XAI symbiotic agency. This resolves the paradox and delineates a forward agenda for research and practice.

Authors:Andy Crabtree
Title: How to Analyse Interviews: A Documentary Method of Interpretation
Abstract:
Interviews are commonplace in HCI. This paper presents a novel documentary method of interpretation that supports analysis of the topics contained within a collection of transcripts, topics that are endogenous to it and which elaborate participants collective reasoning about issues of relevance to research. We contrast endogenous topic analysis with established qualitative approaches, including content analysis, grounded theory, interpretative phenomenological analysis, and thematic analysis, to draw out the distinctive character of the documentary method of interpretation. Unlike established methods, the DMI does not require that the analyst be proficient in qualitative analysis, or have sound knowledge of underlying theories and methods. The DMI is a members method, not a social science method, that relies on mastery of natural language; a competence most people possess.

Authors:Kevin Matthe Caramancion
Title: Using Grok to Avoid Personal Attacks While Correcting Misinformation on X
Abstract:
Correcting misinformation in public online spaces often exposes users to hostility and ad hominem attacks, discouraging participation in corrective discourse. This study presents empirical evidence that invoking Grok, the native large language model on X, rather than directly confronting other users, is associated with different social responses during misinformation correction. Using an observational design, 100 correction replies across five high-conflict misinformation topics were analyzed, with corrections balanced between Grok-mediated and direct human-issued responses. The primary outcome was whether a correction received at least one ad hominem attack within a 24-hour window. Ad hominem attacks occurred in 72 percent of human-issued corrections and in none of the Grok-mediated corrections. A chi-square test confirmed a statistically significant association with a large effect size. These findings suggest that AI-mediated correction may alter the social dynamics of public disagreement by reducing interpersonal hostility during misinformation responses.

Authors:Aram Virabyan
Title: Enhancing Admission Inquiry Responses with Fine-Tuned Models and Retrieval-Augmented Generation
Abstract:
University admissions offices face the significant challenge of managing high volumes of inquiries efficiently while maintaining response quality, which critically impacts prospective students' perceptions. This paper addresses the issues of response time and information accuracy by proposing an AI system integrating a fine-tuned language model with Retrieval-Augmented Generation (RAG). While RAG effectively retrieves relevant information from large datasets, its performance in narrow, complex domains like university admissions can be limited without adaptation, potentially leading to contextually inadequate responses due to the intricate rules and specific details involved. To overcome this, we fine-tuned the model on a curated dataset specific to admissions processes, enhancing its ability to interpret RAG-provided data accurately and generate domain-relevant outputs. This hybrid approach leverages RAG's ability to access up-to-date information and fine-tuning's capacity to embed nuanced domain understanding. We further explored optimization strategies for the response generation logic, experimenting with settings to balance response quality and speed, aiming for consistently high-quality outputs that meet the specific requirements of admissions communications.

Authors:Jacob Erickson
Title: The Fake Friend Dilemma: Trust and the Political Economy of Conversational AI
Abstract:
As conversational AI systems become increasingly integrated into everyday life, they raise pressing concerns about user autonomy, trust, and the commercial interests that influence their behavior. To address these concerns, this paper develops the Fake Friend Dilemma (FFD), a sociotechnical condition in which users place trust in AI agents that appear supportive while pursuing goals that are misaligned with the user's own. The FFD provides a critical framework for examining how anthropomorphic AI systems facilitate subtle forms of manipulation and exploitation. Drawing on literature in trust, AI alignment, and surveillance capitalism, we construct a typology of harms, including covert advertising, political propaganda, behavioral nudging, and surveillance. We then assess possible mitigation strategies, including both structural and technical interventions. By focusing on trust as a vector of asymmetrical power, the FFD offers a lens for understanding how AI systems may undermine user autonomy while maintaining the appearance of helpfulness.

Authors:Wei Xu
Title: Human-Centered Artificial Intelligence (HCAI): Foundations and Approaches
Abstract:
Artificial Intelligence (AI) is a transformative yet double-edged technology that can advance human welfare while also posing risks to humans and society. In response, the Human-Centered Artificial Intelligence (HCAI) approach has emerged as both a design philosophy and a methodological complement to prevailing technology-centered AI paradigms. Placing humans at the core, HCAI seeks to ensure that AI systems serve, augment, and empower humans rather than harm or replace them. This chapter establishes the conceptual and methodological foundations of HCAI by tracing its evolution and recent advancements. It introduces key HCAI concepts, frameworks, guiding principles, methodologies, and practical strategies that bridge philosophical HCAI principles with operational implementation. Through an analytical review of the emerging characteristics and challenges of AI technologies, the chapter positions HCAI as a holistic paradigm for aligning AI innovation with human values, societal well-being, and sustainable progress. Finally, this chapter outlines the structure and contributions of the Handbook of Human-Centered Artificial Intelligence. The purpose of this chapter is to provide an integrated foundation that connects HCAI conceptual frameworks, principles, methodology, and practices for this handbook, thereby paving the way for the content of subsequent chapters.

Authors:Obada Kraishan
Title: The AI Invisibility Effect: Understanding Human-AI Interaction When Users Don't Recognize Artificial Intelligence
Abstract:
The fast integration of artificial intelligence into mobile applications has completely changed the digital landscape; however, the impact of this change on user perception of AI features remains poorly understood. This large-scale analysis examined 1,484,633 mobile application reviews across 422 applications (200 AI-featuring, 222 control) from iOS App Store and Google Play Store. By employing sentiment classification, topic modeling, and concern-benefit categorization, we identified a major disconnect: only 11.9% of reviews mentioned AI, even though 47.4% of applications featured AI capabilities. AI-featuring applications received significantly lower ratings than traditional applications (d = 0.40); however, hierarchical regression revealed a hidden pattern - the negative relationship reversed after controlling for AI mentions and review characteristics (b = 0.405, p < .001). Privacy dominated user concerns (34.8% of concern-expressing reviews), while efficiency represented the primary benefit (42.3%). Effects varied greatly by category, from positive for Assistant applications (d = 0.55) to negative for Entertainment (d = -0.23). These findings suggest that AI features often operate below user awareness thresholds, and it is the explicit recognition of AI, rather than its mere presence, that drives negative evaluations. This challenges basic assumptions about technology acceptance in AI systems.

Authors:Nelly Elsayed
Title: Unseen Risks of Clinical Speech-to-Text Systems: Transparency, Privacy, and Reliability Challenges in AI-Driven Documentation
Abstract:
AI-driven speech-to-text (STT) documentation systems are increasingly adopted in clinical settings to reduce documentation burden and improve workflow efficiency. However, their rapid deployment has outpaced understanding of the associated socio-technical risks, including transparency, reliability, patient autonomy, workflow alignment, and organizational governance. A clearer analysis of these risks is needed to support safe and equitable integration into healthcare practice. This study synthesizes interdisciplinary evidence from technical performance research, regulatory and ethical standards, clinical workflow analyses, and organizational policy guidance. The synthesis was used to develop a multi-layered socio-technical conceptual framework for evaluating and governing STT systems. Findings show that STT systems operate within tightly coupled socio-technical environments in which model performance, clinician oversight, patient rights, workflow design, and institutional governance are interdependent. The study offers a structured socio-technical governance framework and an implementation roadmap that outlines readiness assessment, vendor evaluation, pilot deployment, clinician training, ongoing monitoring, and iterative improvement. The framework emphasizes safeguards that protect patient autonomy, documentation integrity, and institutional trust while enabling the efficient and beneficial use of STT technologies. This work provides actionable guidance for healthcare organizations seeking to adopt STT systems responsibly and equitably.

Authors:Emilio Ferrara
Title: The Generative AI Paradox: GenAI and the Erosion of Trust, the Corrosion of Information Verification, and the Demise of Truth
Abstract:
Generative AI (GenAI) now produces text, images, audio, and video that can be perceptually convincing at scale and at negligible marginal cost. While public debate often frames the associated harms as "deepfakes" or incremental extensions of misinformation and fraud, this view misses a broader socio-technical shift: GenAI enables synthetic realities; coherent, interactive, and potentially personalized information environments in which content, identity, and social interaction are jointly manufactured and mutually reinforcing. We argue that the most consequential risk is not merely the production of isolated synthetic artifacts, but the progressive erosion of shared epistemic ground and institutional verification practices as synthetic content, synthetic identity, and synthetic interaction become easy to generate and hard to audit. This paper (i) formalizes synthetic reality as a layered stack (content, identity, interaction, institutions), (ii) expands a taxonomy of GenAI harms spanning personal, economic, informational, and socio-technical risks, (iii) articulates the qualitative shifts introduced by GenAI (cost collapse, throughput, customization, micro-segmentation, provenance gaps, and trust erosion), and (iv) synthesizes recent risk realizations (2023-2025) into a compact case bank illustrating how these mechanisms manifest in fraud, elections, harassment, documentation, and supply-chain compromise. We then propose a mitigation stack that treats provenance infrastructure, platform governance, institutional workflow redesign, and public resilience as complementary rather than substitutable, and outline a research agenda focused on measuring epistemic security. We conclude with the Generative AI Paradox: as synthetic media becomes ubiquitous, societies may rationally discount digital evidence altogether.